Offline Batch Inference Error after Upgrading SDK

After upgrading to Furiosa SDK 2025.1, I followed the documentation to build the model and run the offline batch inference script. However, I encountered the following error during execution:

2025-03-08T15:22:58.228626031+09:00  INFO furiosa_generator::backend::furiosa_rt: Trying to open:
DeviceRow([Device::npu_fused(0, 0..=3)])
DeviceRow([Device::npu_fused(0, 4..=7)])
2025-03-08T15:22:58.490746937+09:00  INFO furiosa_generator::backend::furiosa_rt: npu:0:0-3 is memory-unified with npu:0:4-7
2025-03-08T15:22:58.490759801+09:00  INFO furiosa_generator::backend::furiosa_rt: npu:0:4-7 is memory-unified with npu:0:0-3
2025-03-08T15:22:58.490827218+09:00  INFO furiosa_generator::backend: KV caches on for each layer (I8, total 4.3 GB * num_blocks over 32 layers)  will be allocated
2025-03-08T15:22:58.492115341+09:00  INFO furiosa_generator::backend::furiosa_rt: Loading 1491 parameters from storages has started ...
2025-03-08T15:22:58.492618874+09:00  INFO furiosa_sprinter::buffer::alloc: Support for huge page size of 2 MiB has been detected.
2025-03-08T15:23:08.384780571+09:00  INFO furiosa_generator::backend::furiosa_rt: 1491 parameters (12.5 GiB) has been successfully loaded (9 secs).
2025-03-08T15:23:08.385359746+09:00  WARN furiosa_compiler_ir: CompiledIr capsule was made with a different compiler version: expected "24f1d0abe", got "86d6cfd47"
2025-03-08T15:23:08.436369811+09:00  INFO furiosa_generator::scheduler::action_provider: Preparing Backend (DeviceIndex(0): npu:0:4-7, npu:0:0-3):
2025-03-08T15:23:08.43638535+09:00  INFO furiosa_generator::scheduler::action_provider:  - [Pipeline 0] prefill batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.436390209+09:00  INFO furiosa_generator::scheduler::action_provider:  - [Pipeline 1] decode batch: 1, attn_size: 1024, 1 segments per device (total: 2 devices)
2025-03-08T15:23:08.437877756+09:00  INFO furiosa_generator::scheduler::generator: Starting scheduler loop with config: SchedulerConfig { npu_queue_limit: 2, max_processing_samples: 65536, spare_blocks_ratio: 0.2, is_offline: false, prefill_chunk_size: None }
2025-03-08T15:23:08.43929409+09:00  INFO furiosa_generator::scheduler::hf_compat: num samples received: 1
thread 'tokio-runtime-worker' panicked at /workspace/source/3.10/furiosa-runtime/furiosa-generator/src/frontend/v1/random_sampling.rs:63:14:
called `Result::unwrap()` on an `Err` value: InvalidWeight
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2025-03-08T15:23:08.540861932+09:00 ERROR furiosa_generator::frontend: Furiosa LLM Engine terminated - pending/subsequent requests will fail
thread 'tokio-runtime-worker' panicked at furiosa-generator/src/scheduler/generator.rs:365:17:
5 blocks are not reclaimed
[553309, 553306, 553308, 553305, 553307]
Traceback (most recent call last):
  File "/home/ubuntu/repo/nwkim/testing/single_query.py", line 12, in <module>
2025-03-08T15:23:08.540953343+09:00 ERROR furiosa_generator::scheduler::generator: Furiosa LLM Engine terminated - pending/subsequent requests will fail
    response = llm.generate(prompts, sampling_params)
  File "/home/ubuntu/miniconda3/envs/rngd/lib/python3.10/site-packages/furiosa_llm/api.py", line 2095, in generate
    native_outputs = self.engine.generate(prompt_token_ids, sampling_params)
ValueError: engine terminated

Before the upgrade, this script ran without issues. After upgrading, the error persists across multiple retries. What could be causing this?
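
For reference, single_query.py is essentially the minimal offline batch inference script below. The artifact path, prompt, and sampling parameters shown here are placeholders rather than my exact values, and the artifact is loaded with LLM.load_artifact as in the documented quick-start example (use whichever loader your SDK version documents):

from furiosa_llm import LLM, SamplingParams

# Placeholder path to the pre-built model artifact directory.
ARTIFACT_PATH = "./model-artifact"

# Load the pre-built artifact.
llm = LLM.load_artifact(ARTIFACT_PATH)

# Placeholder sampling parameters.
sampling_params = SamplingParams(max_tokens=128, temperature=0.8, top_p=0.95)

prompts = ["Hello, my name is"]

# This is the call that panics (line 12 in the traceback above).
response = llm.generate(prompts, sampling_params)
print(response)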

  • Current Environment:
$ furiosa-smi info
+------+--------+------------------+------------------+---------+---------+--------------+
| Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+------+--------+------------------+------------------+---------+---------+--------------+
| rngd | npu0   | 2024.2.0+7a11888 | 2025.1.0+1694e18 | 40.19°C | 34.00 W | 0000:2d:00.0 |
+------+--------+------------------+------------------+---------+---------+--------------+
$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-compiler/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-driver-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-firmware-image-rngd/jammy,now 2025.1.0 amd64 [installed]
furiosa-firmware-tools-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-libsmi/jammy,now 2025.1.0-3 amd64 [installed,automatic]
furiosa-mlperf-resources/jammy,now 5.0.0 amd64 [installed,automatic]
furiosa-mlperf/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-pert-rngd/jammy,now 2025.1.0-3 amd64 [installed]
furiosa-smi/jammy,now 2025.1.0-3 amd64 [installed]

Hi,

You seem to have upgraded from 2024.2.x to 2025.1.0, but the upgrade appears to have been only partial: your firmware is still on the old version. Sorry about the partial upgrade; we will make the package dependencies stricter in the next release.

Could you go through the upgrade instructions again?

After the upgrade, the furiosa-smi output should look like this:

+------+--------+------------------+------------------+---------+---------+--------------+
| Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+------+--------+------------------+------------------+---------+---------+--------------+
| rngd | npu0   | 2025.1.0+696efad | 2025.1.0+1694e18 | 31.95°C | 34.00 W | 0000:b3:00.0 |
+------+--------+------------------+------------------+---------+---------+--------------+

Also, the output of pip list should be as follows:

$ pip list | grep furiosa
furiosa-llm                   2025.1.2
furiosa-llm-models            2025.1.0
furiosa-model-compressor      2025.1.0
furiosa-model-compressor-impl 2025.1.0
furiosa-native-compiler       2025.1.0
furiosa-native-runtime        2025.1.0
furiosa-smi-py                2025.1.0
furiosa-torch-ext             2025.1.0
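
If you want to double-check from inside Python that the environment your runtime uses really moved to 2025.1, a quick standard-library check like the one below (package names taken from the pip list output above) will print the installed versions:

from importlib.metadata import version

# Print the installed versions of the core furiosa-llm Python packages.
for pkg in ("furiosa-llm", "furiosa-native-runtime", "furiosa-native-compiler"):
    print(pkg, version(pkg))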

After upgrading to 2025.1, you will also need the new pre-built artifacts; I'll send them via direct message.

Hello,
I'm leaving a comment because I'm running into the same 'tokio-runtime-worker' issue.

First, let me describe my environment.

  • Current Environment:


  1. The error occurred while running the furiosa-mlperf gpt-j-offline and server tests.
  2. The test mode is performance-only.

command1:
 furiosa-mlperf gpt-j-offline /data/LLM/mlperf-bert-large /data/LLM/ju_results --test-mode performance-only --mlperf-conf /usr/share/furiosa/mlperf/v4.1/mlperf.conf

command2:
 furiosa-mlperf gpt-j-server /data/LLM/mlperf-bert-large /data/LLM/ju_results --test-mode performance-only --mlperf-conf /usr/share/furiosa/mlperf/v4.1/mlperf.conf

The error is being generated like this.
Could you please take a look?
Thank you!

Have a good day!

Hi @juyeon91629,

I'm sorry for the late reply. Could you upgrade your SDK first? The latest SDK is more stable. You can follow the upgrade instructions on the "Upgrading FuriosaAI's Software" page in the FuriosaAI Developer Center 2025.1.0 documentation. Please let me know if you still have the problem after upgrading.