Offline Batch Inference Error after Upgrading SDK

After upgrading to Furiosa SDK 2025.1, I followed the documentation to build the model and run the offline batch inference script. However, I encountered the following error during execution:

2025-03-08T15:22:58.228626031+09:00  INFO furiosa_generator::backend::furiosa_rt: Trying to open:
DeviceRow([Device::npu_fused(0, 0..=3)])
DeviceRow([Device::npu_fused(0, 4..=7)])
2025-03-08T15:22:58.490746937+09:00  INFO furiosa_generator::backend::furiosa_rt: npu:0:0-3 is memory-unified with npu:0:4-7
2025-03-08T15:22:58.490759801+09:00  INFO furiosa_generator::backend::furiosa_rt: npu:0:4-7 is memory-unified with npu:0:0-3
2025-03-08T15:22:58.490827218+09:00  INFO furiosa_generator::backend: KV caches on for each layer (I8, total 4.3 GB * num_blocks over 32 layers)  will be allocated
2025-03-08T15:22:58.492115341+09:00  INFO furiosa_generator::backend::furiosa_rt: Loading 1491 parameters from storages has started ...
2025-03-08T15:22:58.492618874+09:00  INFO furiosa_sprinter::buffer::alloc: Support for huge page size of 2 MiB has been detected.
2025-03-08T15:23:08.384780571+09:00  INFO furiosa_generator::backend::furiosa_rt: 1491 parameters (12.5 GiB) has been successfully loaded (9 secs).
2025-03-08T15:23:08.385359746+09:00  WARN furiosa_compiler_ir: CompiledIr capsule was made with a different compiler version: expected "24f1d0abe", got "86d6cfd47"
2025-03-08T15:23:08.436369811+09:00  INFO furiosa_generator::scheduler::action_provider: Preparing Backend (DeviceIndex(0): npu:0:4-7, npu:0:0-3):
2025-03-08T15:23:08.43638535+09:00  INFO furiosa_generator::scheduler::action_provider:  - [Pipeline 0] prefill batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.436390209+09:00  INFO furiosa_generator::scheduler::action_provider:  - [Pipeline 1] decode batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.437877756+09:00  INFO furiosa_generator::scheduler::generator: Starting scheduler loop with config: SchedulerConfig { npu_queue_limit: 2, max_processing_samples: 65536, spare_blocks_ratio: 0.2, is_offline: false, prefill_chunk_size: None }
2025-03-08T15:23:08.43929409+09:00  INFO furiosa_generator::scheduler::hf_compat: num samples received: 1
thread 'tokio-runtime-worker' panicked at /workspace/source/3.10/furiosa-runtime/furiosa-generator/src/frontend/v1/random_sampling.rs:63:14:
called `Result::unwrap()` on an `Err` value: InvalidWeight
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-03-08T15:23:08.540861932+09:00 ERROR furiosa_generator::frontend: Furiosa LLM Engine terminated - pending/subsequent requests will fail
thread 'tokio-runtime-worker' panicked at furiosa-generator/src/scheduler/generator.rs:365:17:
5 blocks are not reclaimed
[553309, 553306, 553308, 553305, 553307]
Traceback (most recent call last):
  File "/home/ubuntu/repo/nwkim/testing/single_query.py", line 12, in <module>
2025-03-08T15:23:08.540953343+09:00 ERROR furiosa_generator::scheduler::generator: Furiosa LLM Engine terminated - pending/subsequent requests will fail
    response = llm.generate(prompts, sampling_params)
  File "/home/ubuntu/miniconda3/envs/rngd/lib/python3.10/site-packages/furiosa_llm/api.py", line 2095, in generate
    native_outputs = self.engine.generate(prompt_token_ids, sampling_params)
ValueError: engine terminated

Before the upgrade, this script ran without issues. After upgrading, the error persists across multiple retries. What could be causing this?
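
For reference, single_query.py is essentially the minimal offline batch inference script below. The artifact path, prompt, and sampling parameters shown here are placeholders rather than my exact values, and the artifact is loaded with LLM.load_artifact as in the documented quick-start example (use whichever loader your SDK version documents):

from furiosa_llm import LLM, SamplingParams

# Placeholder path to the pre-built model artifact directory.
ARTIFACT_PATH = "./model-artifact"

# Load the pre-built artifact.
llm = LLM.load_artifact(ARTIFACT_PATH)

# Placeholder sampling parameters.
sampling_params = SamplingParams(max_tokens=128, temperature=0.8, top_p=0.95)

prompts = ["Hello, my name is"]

# This is the call that panics (line 12 in the traceback above).
response = llm.generate(prompts, sampling_params)
print(response)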

  • Current Environment:
$ furiosa-smi info
+------+--------+------------------+------------------+---------+---------+--------------+
| Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+------+--------+------------------+------------------+---------+---------+--------------+
| rngd | npu0   | 2024.2.0+7a11888 | 2025.1.0+1694e18 | 40.19°C | 34.00 W | 0000:2d:00.0 |
+------+--------+------------------+------------------+---------+---------+--------------+
$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-compiler/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-driver-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-firmware-image-rngd/jammy,now 2025.1.0 amd64 [installed]
furiosa-firmware-tools-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-libsmi/jammy,now 2025.1.0-3 amd64 [installed,automatic]
furiosa-mlperf-resources/jammy,now 5.0.0 amd64 [installed,automatic]
furiosa-mlperf/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-pert-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-smi/jammy,now 2025.1.0-3 amd64 [installed]

Hi,

You seem to have upgraded from 2024.2.x to 2025.1.0, but the upgrade appears to have been only partial: your firmware is still on the old version. Sorry about the partial upgrade; we will make the package dependencies stricter in the next release.

Could you go through the upgrade instructions again?

After the upgrade, the furiosa-smi output should look like this:

+------+--------+------------------+------------------+---------+---------+--------------+
| Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+------+--------+------------------+------------------+---------+---------+--------------+
| rngd | npu0   | 2025.1.0+696efad | 2025.1.0+1694e18 | 31.95°C | 34.00 W | 0000:b3:00.0 |
+------+--------+------------------+------------------+---------+---------+--------------+

Also, the output of pip list should be as follows:

$ pip list | grep furiosa
furiosa-llm                   2025.1.2
furiosa-llm-models            2025.1.0
furiosa-model-compressor      2025.1.0
furiosa-model-compressor-impl 2025.1.0
furiosa-native-compiler       2025.1.0
furiosa-native-runtime        2025.1.0
furiosa-smi-py                2025.1.0
furiosa-torch-ext             2025.1.0
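
If you want to double-check from inside Python that the environment your runtime uses really moved to 2025.1, a quick standard-library check like the one below (package names taken from the pip list output above) will print the installed versions:

from importlib.metadata import version

# Print the installed versions of the core furiosa-llm Python packages.
for pkg in ("furiosa-llm", "furiosa-native-runtime", "furiosa-native-compiler"):
    print(pkg, version(pkg))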

After upgrading to 2025.1, you will also need the new pre-built artifacts; I'll send them via direct message.

Hello,
I'm leaving a comment because I'm running into the same 'tokio-runtime-worker' issue.

First, let me describe my environment.

  • Current Environment:


  1. The error occurred while running the furiosa-mlperf gpt-j-offline and server tests.
  2. The test mode is performance-only.

command1:
 furiosa-mlperf gpt-j-offline /data/LLM/mlperf-bert-large /data/LLM/ju_results --test-mode performance-only --mlperf-conf /usr/share/furiosa/mlperf/v4.1/mlperf.conf

command2:
 furiosa-mlperf gpt-j-server /data/LLM/mlperf-bert-large /data/LLM/ju_results --test-mode performance-only --mlperf-conf /usr/share/furiosa/mlperf/v4.1/mlperf.conf

The error is being generated like this.
Could you please take a look?
Thank you!

Have a good day!

Hi @juyeon91629,

I'm sorry for the late reply. Could you upgrade your SDK first? The latest SDK is more stable. You can follow the upgrade instructions on the "Upgrading FuriosaAI's Software" page in the FuriosaAI Developer Center 2025.1.0 documentation. Please let me know if you still have the problem after upgrading.