After upgrading to Furiosa SDK 2025.1, I followed the documentation to build the model and run the offline batch inference script, but execution fails with the following error:
2025-03-08T15:22:58.228626031+09:00 INFO furiosa_generator::backend::furiosa_rt: Trying to open:
DeviceRow([Device::npu_fused(0, 0..=3)])
DeviceRow([Device::npu_fused(0, 4..=7)])
2025-03-08T15:22:58.490746937+09:00 INFO furiosa_generator::backend::furiosa_rt: npu:0:0-3 is memory-unified with npu:0:4-7
2025-03-08T15:22:58.490759801+09:00 INFO furiosa_generator::backend::furiosa_rt: npu:0:4-7 is memory-unified with npu:0:0-3
2025-03-08T15:22:58.490827218+09:00 INFO furiosa_generator::backend: KV caches on for each layer (I8, total 4.3 GB * num_blocks over 32 layers) will be allocated
2025-03-08T15:22:58.492115341+09:00 INFO furiosa_generator::backend::furiosa_rt: Loading 1491 parameters from storages has started ...
2025-03-08T15:22:58.492618874+09:00 INFO furiosa_sprinter::buffer::alloc: Support for huge page size of 2 MiB has been detected.
2025-03-08T15:23:08.384780571+09:00 INFO furiosa_generator::backend::furiosa_rt: 1491 parameters (12.5 GiB) has been successfully loaded (9 secs).
2025-03-08T15:23:08.385359746+09:00 WARN furiosa_compiler_ir: CompiledIr capsule was made with a different compiler version: expected "24f1d0abe", got "86d6cfd47"
2025-03-08T15:23:08.436369811+09:00 INFO furiosa_generator::scheduler::action_provider: Preparing Backend (DeviceIndex(0): npu:0:4-7, npu:0:0-3):
2025-03-08T15:23:08.43638535+09:00 INFO furiosa_generator::scheduler::action_provider: - [Pipeline 0] prefill batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.436390209+09:00 INFO furiosa_generator::scheduler::action_provider: - [Pipeline 1] decode batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.437877756+09:00 INFO furiosa_generator::scheduler::generator: Starting scheduler loop with config: SchedulerConfig { npu_queue_limit: 2, max_processing_samples: 65536, spare_blocks_ratio: 0.2, is_offline: false, prefill_chunk_size: None }
2025-03-08T15:23:08.43929409+09:00 INFO furiosa_generator::scheduler::hf_compat: num samples received: 1
thread 'tokio-runtime-worker' panicked at /workspace/source/3.10/furiosa-runtime/furiosa-generator/src/frontend/v1/random_sampling.rs:63:14:
called `Result::unwrap()` on an `Err` value: InvalidWeight
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-03-08T15:23:08.540861932+09:00 ERROR furiosa_generator::frontend: Furiosa LLM Engine terminated - pending/subsequent requests will fail
thread 'tokio-runtime-worker' panicked at furiosa-generator/src/scheduler/generator.rs:365:17:
5 blocks are not reclaimed
[553309, 553306, 553308, 553305, 553307]
Traceback (most recent call last):
File "/home/ubuntu/repo/nwkim/testing/single_query.py", line 12, in <module>
2025-03-08T15:23:08.540953343+09:00 ERROR furiosa_generator::scheduler::generator: Furiosa LLM Engine terminated - pending/subsequent requests will fail
response = llm.generate(prompts, sampling_params)
File "/home/ubuntu/miniconda3/envs/rngd/lib/python3.10/site-packages/furiosa_llm/api.py", line 2095, in generate
native_outputs = self.engine.generate(prompt_token_ids, sampling_params)
ValueError: engine terminated
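For context on the first panic: `InvalidWeight` is also the name of an error variant in the Rust `rand` crate (`WeightedError::InvalidWeight`), which `WeightedIndex::new` returns when a weight is negative, NaN, or otherwise invalid. I have not confirmed that furiosa-generator's `random_sampling.rs` uses that crate, so the Python sketch below is only an analogy built on that assumption. It shows how a single NaN logit turns every softmax probability into NaN, which a weighted sampler would reject. `softmax` and `check_sampling_weights` are hypothetical helpers written for illustration, not part of furiosa_llm.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def check_sampling_weights(weights):
    """Return an error name if the weights cannot form a weighted
    distribution, or None if they look usable.

    Assumption: roughly mirrors the checks in Rust rand's
    WeightedIndex::new (NoItem, InvalidWeight, AllWeightsZero).
    """
    if not weights:
        return "NoItem"
    total = 0.0
    for w in weights:
        # NaN compares false with everything, so test for it explicitly;
        # negative and infinite weights are treated as invalid too.
        if math.isnan(w) or math.isinf(w) or w < 0:
            return "InvalidWeight"
        total += w
    if total == 0:
        return "AllWeightsZero"
    return None


# A NaN anywhere in the logits propagates through softmax to every
# probability, so the sampler would see only NaN weights.
print(check_sampling_weights(softmax([1.0, 2.0, 3.0])))
print(check_sampling_weights(softmax([1.0, float("nan")])))
```

If this analogy holds, the panic would point at NaN or otherwise invalid logits coming out of the model rather than at the sampling parameters themselves.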
Before the upgrade, the same script ran without issues. Since upgrading, the error occurs on every attempt. What could be causing this?
- Current Environment:
$ furiosa-smi info
+------+--------+------------------+------------------+---------+---------+--------------+
| Arch | Device | Firmware | PERT | Temp. | Power | PCI-BDF |
+------+--------+------------------+------------------+---------+---------+--------------+
| rngd | npu0 | 2024.2.0+7a11888 | 2025.1.0+1694e18 | 40.19°C | 34.00 W | 0000:2d:00.0 |
+------+--------+------------------+------------------+---------+---------+--------------+
$ apt list --installed | grep furiosa
furiosa-compiler/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-driver-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-firmware-image-rngd/jammy,now 2025.1.0 amd64 [installed]
furiosa-firmware-tools-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-libsmi/jammy,now 2025.1.0-3 amd64 [installed,automatic]
furiosa-mlperf-resources/jammy,now 5.0.0 amd64 [installed,automatic]
furiosa-mlperf/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-pert-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-smi/jammy,now 2025.1.0-3 amd64 [installed]
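One thing I noticed while collecting this: `furiosa-firmware-image-rngd` is installed at 2025.1.0, but `furiosa-smi info` still reports firmware 2024.2.0+7a11888 on npu0. I am not sure whether that mismatch matters, and I am assuming (not confirmed) that the device firmware release is expected to match the installed image package. The small Python sketch below compares the release prefixes from the output pasted above. `parse_smi_row`, `parse_apt`, and `mismatches` are hypothetical helpers for illustration only, not FuriosaAI tools.

```python
import re

# Output copied from the furiosa-smi / apt commands above.
SMI_ROW = "| rngd | npu0 | 2024.2.0+7a11888 | 2025.1.0+1694e18 | 40.19°C | 34.00 W | 0000:2d:00.0 |"
APT_LINES = """furiosa-firmware-image-rngd/jammy,now 2025.1.0 amd64 [installed]
furiosa-pert-rngd/jammy,now 2025.1.0-3 amd64 [installed]"""

# Assumption: which apt package corresponds to which furiosa-smi column.
PACKAGE_FOR = {"firmware": "furiosa-firmware-image-rngd", "pert": "furiosa-pert-rngd"}


def release(version):
    """Reduce '2024.2.0+7a11888' or '2025.1.0-3' to '2024.2' / '2025.1'."""
    m = re.match(r"(\d+\.\d+)", version)
    return m.group(1) if m else None


def parse_smi_row(row):
    """Pull the Firmware and PERT columns out of one furiosa-smi table row."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return {"firmware": cells[2], "pert": cells[3]}


def parse_apt(text):
    """Map package name -> version for lines like 'pkg/dist,now VER arch [...]'."""
    pkgs = {}
    for line in text.splitlines():
        m = re.match(r"(\S+?)/\S+\s+(\S+)\s", line)
        if m:
            pkgs[m.group(1)] = m.group(2)
    return pkgs


def mismatches(smi, pkgs):
    """List components whose device-reported release differs from the package."""
    out = []
    for key, pkg in PACKAGE_FOR.items():
        if release(smi[key]) != release(pkgs[pkg]):
            out.append(f"{key}: device reports {smi[key]}, installed package is {pkgs[pkg]}")
    return out


for line in mismatches(parse_smi_row(SMI_ROW), parse_apt(APT_LINES)):
    print(line)
```

With the output above, only the firmware entry is flagged; PERT reports 2025.1 on both sides.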