Issue: error occurs when building an artifact

Thank you for the quick support so far.
This time my question is about artifact creation, as described below.

I quantized a model fine-tuned from Llama 3.1 8B and then tried to build an
artifact with the following command.

furiosa-llm build ./quantized_llam318_komedic ./komedic_qt8b \
  -tp 8 --max-seq-len-to-capture 4096 \
  --num-pipeline-builder-workers 4 \
  --num-compile-workers 4

The model is published on Hugging Face as unidocs/llama-3.1-8b-komedic-instruct.
I downloaded it, quantized it to 8-bit, and saved the result to the ./quantized_llam318_komedic folder.

To speed up building the artifact from the quantized model, I increased --num-pipeline-builder-workers and --num-compile-workers. The machine has 36 cores, so I initially set them to 16 and 16, but since the build did not get any faster I lowered them to 4 and 4.
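For reference, a quick way to check whether the extra workers are actually doing work is plain Linux tooling; the following is only a sketch using nproc and ps (nothing furiosa-specific is assumed):

nproc                                                    # total core count (36 in this case)
ps -eo pid,pcpu,etime,comm --sort=-pcpu | grep -i ray    # per-process CPU% of the ray workers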

After running the command above, checking the process list shows many processes in the ray::IDLE state.

(llama_ve310) elicer@f5e3ec3ba50d:~$ source /home/elicer/workspaces/llama_ve310/bin/activate
(llama_ve310) elicer@f5e3ec3ba50d:~$ ps -ef | grep ray
elicer 12237 12156 0 01:59 pts/2 00:00:20 ray::__build_fx_pipelines_with_ray
elicer 12238 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12239 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12749 5379 0 01:59 pts/3 00:00:00 grep --color=auto ray

With the values raised to 16 and 16, most workers are IDLE and CPU usage stays below 5%.
When I completed this job successfully before, it finished in about 50 minutes; now no artifact has been produced even after running for nearly 10 hours. (Could switching from SSD to HDD be related?)
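To see whether the HDD could be the bottleneck, one option is to watch I/O wait while the build is running. This is only a sketch and assumes the sysstat package is installed for iostat:

iostat -x 5    # %iowait and per-device %util; consistently high values point to disk-bound work
vmstat 5       # the "wa" column is CPU time spent waiting on I/O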

After about 10 hours, the following messages appeared.

  • Embedding
    tp_config:
    inputs:
    outputs:
    operators: {}

(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Compiling pipeline Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096, supertask 2 for renegade-8pe.
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Block type: first
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Loading compiler config for CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=8, num_chip=1, block_type=<BlockType.FIRST: ‘first’>, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Using compiler config {‘progress_mode’: ‘ProgressBar’, ‘propagate_sparse_axis_from_op’: ‘Gather’, ‘cast_64bit_types’: True, ‘support_i64_index’: False, ‘implicit_type_casting’: False, ‘implicit_type_casting_all_activation’: False, ‘reshape_remover_ignores_slice_concat’: False, ‘ignore_casts_for_precision_gain’: True, ‘arithmetic_qk_masking’: False, ‘separate_vector_ops_from_dpe’: False, ‘separate_vrf_from_interleaving’: True, ‘no_interleaving_at_fusion_stage’: False, ‘experimental_demote_concat’: True, ‘activation_shape_guide’: None, ‘normalize_sine_cosine_operator_domain’: False, ‘max_operation_memory_ratio’: 1.0, ‘tensor_size_in_page_after_split’: 2, ‘tensor_unit_bridge_threshold_in_page’: 12, ‘dma_bridge_threshold_in_bytes’: 4294967296, ‘use_dma_bridges_only’: False, ‘use_split_operation_in_tk’: True, ‘allow_unlowered_operators’: False, ‘remove_lower’: False, ‘remove_unlower’: False, ‘lowering_mode’: ‘Optimal’, ‘max_num_partitioning_axes’: 2, ‘max_num_unique_shapes’: 500, ‘padding_policy’: {‘Small’: 1.1}, ‘allow_unlimited_padding’: True, ‘tactic_hint’: ‘ForLlmModelDecode’, ‘tactic_context_config’: ‘None’, ‘local_population_threshold’: 0, ‘use_sparse_bridge_population’: True, ‘reshape_einsum_mode’: {‘Reshape’: {‘permute’: False}}, ‘use_efficiently_broadcasted_tactic’: False, ‘populate_dma_optimized_einsum_tactics’: True, ‘use_exhaustive_search_in_binary_lowering’: False, ‘dma_segment_mode’: ‘Outer’, ‘populate_irregular_indices_shape’: True, ‘allow_hp_lowering’: True, ‘use_attention_kernel’: True, ‘attention_kernel_hint_mode’: ‘Decode’, ‘tactic_sorting_policy’: ‘ByEstimation’, ‘dont_care_bridge_cost’: False, ‘dma_preference’: 1.0, ‘num_transaction_simulation_per_pe’: 1024, ‘tactic_tail_shape_alignment’: 8, ‘enable_vrf_half_mode’: True, ‘vrf_reuse_optimization_level’: 2, ‘reduce_by_ve_allow_partitioning_tail_split’: True, ‘enable_tactic_pruning’: True, ‘use_aligned_repartition2’: True, ‘bf16_partial_sum_policy’: ‘EnableForSplitAndChipAndCluster’, ‘coalesce_tensors_by_common_split’: False, ‘use_block_compile’: False, ‘skip_clustering_by_scc’: False, ‘allow_external_operators_in_lir’: False, ‘no_dram_reuse’: False, ‘scheduler_beam_search’: True, ‘subgraph_scheduling’: False, ‘attention_mask_reuse_traverse’: True, ‘expected_total_beam_states’: 100000, ‘focused_ops_coverage_limit’: None, ‘dma_grace_period’: 10000, ‘sync_grace_period_per_chip’: 10000, ‘estimate_dma_command_gather_with_concrete_samples’: False, ‘mimic_sync_io’: False, ‘profile_sync’: False, ‘dump_in_nvp’: False, ‘instruction_mem_budget’: 720896, ‘instruction_chunk_size’: 2000, ‘enable_tuc_profile’: False, ‘dedup_task_commands’: True, ‘insert_wait_by_estimation’: False, ‘resolve_noc_timeout’: True, ‘c_compile_debug_mode’: False, ‘dma_throughput_per_pe’: 180, ‘remove_dtoh_htod’: False, ‘use_einsum_by_dpe_for_interleaved_mul’: False, ‘use_dma_stos_for_concat_paste’: True, ‘use_custom_broadcast_for_concat_paste’: False, ‘profile_exact_command_cycle’: False, ‘duplicate_arm_binary’: False, ‘allow_reduce_by_ve_cluster_chip_reduce’: True, ‘allow_multiple_consumer_rf’: False, ‘allow_reduce_by_ve_cluster_chip_reduce_base_population’: False, ‘all_reduce_as_dma_reduce_broadcast_tactic’: True}
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Generated graph metadata: ---
(__compile_supertasks_with_ray pid=12237) valid_length: ~
(__compile_supertasks_with_ray pid=12237) graph_io_category:
(__compile_supertasks_with_ray pid=12237) input_category:
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
INFO:2025-10-26 02:00:21+0000 hash for the graph: 015314bf4f0084e296cc44b2da7571f3
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) output_category:
(__compile_supertasks_with_ray pid=12237) - Intermediate:
(__compile_supertasks_with_ray pid=12237) named_axes:
(__compile_supertasks_with_ray pid=12237) - Batch
(__compile_supertasks_with_ray pid=12237) - Sequence
(__compile_supertasks_with_ray pid=12237) - Embedding
(__compile_supertasks_with_ray pid=12237) tp_config:
(__compile_supertasks_with_ray pid=12237) inputs:
(__compile_supertasks_with_ray pid=12237) outputs:
(__compile_supertasks_with_ray pid=12237) operators: {}
(__compile_supertasks_with_ray pid=12237)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 hash for the graph: d3809467d55a7b74d967ae5279a35b1f
source /home/elicer/workspaces/llama_ve310/bin/activate
ERROR: invalid npu id: renegade-8pe
Encountered exception!
non_shared_configs(w/o past_kv): [NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: True, ‘causal_mask’: TensorGenInfo(shape=torch.Size([1, 4096, 4096]), dtype=torch.bool)}, pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv0-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=0), phase=<PipelineMode.LLM_PREFILL: ‘prefill’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=LogitsSliceConfig(slice_direction=‘left’, slice_size=1), num_blocks_per_supertask=1), NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘attention_mask’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.bool), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: False, ‘past_valid_key_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32), ‘past_valid_value_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32)}, 
pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=None, num_blocks_per_supertask=1)]

Traceback (most recent call last):
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 819, in compile_gm_and_get_preprocessed_gm_hash
compiled = compile(
RuntimeError: fail to compile: Invalid NPU ID

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/elicer/workspaces/llama_ve310/bin/furiosa-llm", line 7, in <module>
sys.exit(main())
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/main.py", line 20, in main
args.dispatch_function(args)
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/convert.py", line 246, in convert
builder.build(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 687, in build
target_model_artifact, target_model_pipelines = self._build_model_artifact(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 590, in _build_model_artifact
pipelines_with_metadata = build_pipelines(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/helper.py", line 285, in build_pipelines
pipelines = pipeline_builder.build_pipelines(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 844, in build_pipelines
return PipelineBuilder.compile_supertasks_in_parallel(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1064, in compile_supertasks_in_parallel
local_pipelines = PipelineBuilder.__compile_supertasks_aux(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1108, in __compile_supertasks_aux
_compile_supertasks_in_pipeline(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 250, in _compile_supertasks_in_pipeline
compile_result, hash_val = GraphModuleConverter.compile_gm_and_get_preprocessed_gm_hash(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 837, in compile_gm_and_get_preprocessed_gm_hash
raise RuntimeError(f"Compilation failed with error {e}")
RuntimeError: Compilation failed with error fail to compile: Invalid NPU ID
(llama_ve310) elicer@f5e3ec3ba50d:~$

What could be wrong to cause this Invalid NPU ID error?

(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ furiosa-smi info
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| Index | Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 0     | rngd | npu5   | 2025.3.0+c097ea0 | 2025.3.0+52e5705 | 40.04°C | 34.56 W | 0000:bc:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 1     | rngd | npu6   | 2025.3.0+c097ea0 | 2025.3.1+52e5705 | 30.37°C | 34.56 W | 0000:bd:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+

Because building an artifact with furiosa-llm 2025.3.3 took too long,
I downgraded to 2025.3.1 and am testing with that version.
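A version pin with pip would look roughly like the sketch below; the exact package index needed for furiosa-llm may differ, so please follow the Furiosa docs for that part:

pip install "furiosa-llm==2025.3.1"    # pin the downgraded version
pip list | grep furiosa                # confirm the installed versions afterwards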

(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ pip list | grep furiosa
furiosa-llm 2025.3.1
furiosa-llm-models 2025.3.0
furiosa-model-compressor 2025.3.0
furiosa-model-compressor-impl 2025.3.0
furiosa-models-lang 2025.3.0
furiosa-native-compiler 2025.3.1
furiosa-native-llm-common 2025.3.1
furiosa-native-runtime 2025.3.2
furiosa-smi-py 2025.3.0
furiosa-torch-ext 2025.3.1

Hello, and sorry for the inconvenience. Invalid NPU ID is an error that should not occur if you passed the -tp 8 option. I will try to reproduce it with the versions you shared and give you an update by tomorrow.

Could you also share the output of the apt list --installed | grep furiosa command?

For example, the output looks like this:

apt list --installed | grep furiosa 

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-compiler/jammy-rc,now 2025.3.0-3 amd64 [installed]
furiosa-driver-rngd/jammy-rc,now 1.11.0-3 all [installed,upgradable to: 2025.3.1-3]
furiosa-firmware-image-rngd/jammy-rc,now 1.7.0 all [installed,upgradable to: 2025.3.1]
furiosa-firmware-tools-rngd/jammy-rc,now 2025.3.1-3 amd64 [installed]
furiosa-libhal-nvp/jammy-nightly,now 0.11.0-3+nightly-250714+9f4c23a amd64 [installed,upgradable to: 0.11.0-3+nightly-251023+99925dc]
furiosa-libsmi/jammy-rc,now 2025.3.0-3 amd64 [installed]
furiosa-mlperf-resources/jammy-rc,now 5.0.0 amd64 [installed]
furiosa-pert-rngd/jammy-nightly,now 0.1.0-3+nightly-250929+f9abfb9 amd64 [installed,upgradable to: 2025.3.1-3]
furiosa-smi/jammy-rc,now 2025.3.0-3 amd64 [installed]

Here is the output you requested.

(llama_ve310) elicer@f5e3ec3ba50d:~$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/now 0.10.3-3 amd64 [installed,upgradable to: 2025.3.0-3]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-libsmi/jammy,now 2025.3.0-3 amd64 [installed,automatic]
furiosa-smi/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

The results are quite different... Do I need to install the packages to match those versions?
I am using an instance created on a CSP, so a server restart has to be requested separately.

For reference, the artifact I built previously has been published on Hugging Face,
and that artifact works fine.

This error occurred when I set up a new development environment on a new instance,
ran quantization, and then tried to build an artifact.
Please take this into account.

Originally I planned to build an artifact directly without quantization and compare performance,
but because building the artifact took so long, I evaluated performance with the quantized model instead.


Hello.

Looking at the installed packages, these are packages for Warboy, our first-generation product. That would explain the Invalid NPU ID error.

If these packages came pre-installed on the instance, there are probably separate instance types for Warboy and for RNGD.

If you installed them yourself by following the installation documents, please refer to Welcome to Furiosa Docs — FuriosaAI Developer Center 2025.3.1 documentation.
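For reference, a sketch of installing the RNGD-side packages via APT, using the package names visible in the example listing above (the exact set and versions your instance needs may differ, so please defer to the installation docs):

sudo apt update
sudo apt install furiosa-driver-rngd furiosa-firmware-image-rngd \
                 furiosa-firmware-tools-rngd furiosa-pert-rngd \
                 furiosa-compiler furiosa-libsmi furiosa-smi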

I tried to match the packages by hand, but building the artifact still takes just as long.

I manually got the system into the state below, but
the newly installed firmware requires a cold boot,
and furiosa-libhal-nvp could not be installed,
so in this state quantization works but no artifact is ever produced.
The build just keeps waiting.

furiosa-compiler/now 0.10.3-3 amd64 [installed,upgradable to: 2025.3.0-3]
furiosa-driver-rngd/jammy,now 2025.3.1-3 all [installed]
furiosa-firmware-image-rngd/jammy,now 2025.3.1 all [installed]
furiosa-firmware-tools-rngd/jammy,now 2025.3.1-3 amd64 [installed]
furiosa-libsmi/jammy,now 2025.3.0-3 amd64 [installed,automatic]
furiosa-pert-rngd/jammy,now 2025.3.1-3 amd64 [installed]
furiosa-smi/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

So I deleted the instance and created a new one,
and it comes up configured for Warboy, as shown below.

elicer@a9c6e8a77490:~$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/now 0.10.3-3 amd64 [installed,local]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

There is nothing in the instance-creation process that specifically selects an environment for Warboy or Renegade.
In this case, should I ask the CSP to set up the environment for Renegade?
I am not sure whether I would have to request this every time I create an instance,
but I would like to know how I should phrase such a request to the CSP.

Hello, this is Jongwook Kim from FuriosaAI.

Could you delete the /etc/apt/sources.list.d/furiosa.list and /etc/apt/auth.conf.d/furiosa.conf files in your environment, follow only the "Setting up APT" section on this page, and then install the compiler matched to the RNGD release, as shown below?
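Removing the existing APT configuration, the first step described above, would be (a sketch):

$ sudo rm /etc/apt/sources.list.d/furiosa.list /etc/apt/auth.conf.d/furiosa.conf

After that, the version check and the pinned install: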

$ sudo apt update
$ apt list -a | grep furiosa-compiler              # check the 2025.3.0-based version available here
$ sudo apt install furiosa-compiler=2025.3.0-3     # then install that specific version

The results are as follows.

elicer@a9c6e8a77490:/etc/apt/trusted.gpg.d$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

The compiler appears to have been installed correctly, so you should be able to build the artifact itself. Could you give it a try? However, to actually run it you will likely need a separate RNGD instance environment.

Artifact creation used to finish within an hour, but this time it did not complete even after more than 3 hours,
so I have stopped it for now.

Regarding this, I asked the CSP on Friday to change the default packages to the ones for Renegade, and I am waiting for their response.