Issue: error occurs when building an artifact

Thank you for the quick support so far.
This time my question is about artifact creation, as described below.

I quantized a model fine-tuned from Llama 3.1 8B and then tried to build an
artifact with the following command.

furiosa-llm build ./quantized_llam318_komedic ./komedic_qt8b \
  -tp 8 --max-seq-len-to-capture 4096 \
  --num-pipeline-builder-workers 4 \
  --num-compile-workers 4

The model is published on Hugging Face as unidocs/llama-3.1-8b-komedic-instruct.
I downloaded it, quantized it to 8-bit, and saved the result to the ./quantized_llam318_komedic folder.

To speed up building the artifact from the quantized model, I increased --num-pipeline-builder-workers and --num-compile-workers. The machine has 36 cores, so I initially set them to 16 and 16, but since the build did not get any faster I lowered them to 4 and 4.
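For reference, a quick way to check whether the extra workers are actually doing work is plain Linux tooling; the following is only a sketch using nproc and ps (nothing furiosa-specific is assumed):

nproc                                                    # total core count (36 in this case)
ps -eo pid,pcpu,etime,comm --sort=-pcpu | grep -i ray    # per-process CPU% of the ray workers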

After running the command above, checking the process list shows many processes in the ray::IDLE state.

(llama_ve310) elicer@f5e3ec3ba50d:~$ source /home/elicer/workspaces/llama_ve310/bin/activate
(llama_ve310) elicer@f5e3ec3ba50d:~$ ps -ef | grep ray
elicer 12237 12156 0 01:59 pts/2 00:00:20 ray::__build_fx_pipelines_with_ray
elicer 12238 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12239 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12749 5379 0 01:59 pts/3 00:00:00 grep --color=auto ray

With the values raised to 16 and 16, most workers are IDLE and CPU usage stays below 5%.
When I completed this job successfully before, it finished in about 50 minutes; now no artifact has been produced even after running for nearly 10 hours. (Could switching from SSD to HDD be related?)
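To see whether the HDD could be the bottleneck, one option is to watch I/O wait while the build is running. This is only a sketch and assumes the sysstat package is installed for iostat:

iostat -x 5    # %iowait and per-device %util; consistently high values point to disk-bound work
vmstat 5       # the "wa" column is CPU time spent waiting on I/O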

After about 10 hours, the following messages appeared.

  • Embedding
    tp_config:
    inputs:
    outputs:
    operators: {}

(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Compiling pipeline Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096, supertask 2 for renegade-8pe.
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Block type: first
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Loading compiler config for CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=8, num_chip=1, block_type=<BlockType.FIRST: ‘first’>, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Using compiler config {‘progress_mode’: ‘ProgressBar’, ‘propagate_sparse_axis_from_op’: ‘Gather’, ‘cast_64bit_types’: True, ‘support_i64_index’: False, ‘implicit_type_casting’: False, ‘implicit_type_casting_all_activation’: False, ‘reshape_remover_ignores_slice_concat’: False, ‘ignore_casts_for_precision_gain’: True, ‘arithmetic_qk_masking’: False, ‘separate_vector_ops_from_dpe’: False, ‘separate_vrf_from_interleaving’: True, ‘no_interleaving_at_fusion_stage’: False, ‘experimental_demote_concat’: True, ‘activation_shape_guide’: None, ‘normalize_sine_cosine_operator_domain’: False, ‘max_operation_memory_ratio’: 1.0, ‘tensor_size_in_page_after_split’: 2, ‘tensor_unit_bridge_threshold_in_page’: 12, ‘dma_bridge_threshold_in_bytes’: 4294967296, ‘use_dma_bridges_only’: False, ‘use_split_operation_in_tk’: True, ‘allow_unlowered_operators’: False, ‘remove_lower’: False, ‘remove_unlower’: False, ‘lowering_mode’: ‘Optimal’, ‘max_num_partitioning_axes’: 2, ‘max_num_unique_shapes’: 500, ‘padding_policy’: {‘Small’: 1.1}, ‘allow_unlimited_padding’: True, ‘tactic_hint’: ‘ForLlmModelDecode’, ‘tactic_context_config’: ‘None’, ‘local_population_threshold’: 0, ‘use_sparse_bridge_population’: True, ‘reshape_einsum_mode’: {‘Reshape’: {‘permute’: False}}, ‘use_efficiently_broadcasted_tactic’: False, ‘populate_dma_optimized_einsum_tactics’: True, ‘use_exhaustive_search_in_binary_lowering’: False, ‘dma_segment_mode’: ‘Outer’, ‘populate_irregular_indices_shape’: True, ‘allow_hp_lowering’: True, ‘use_attention_kernel’: True, ‘attention_kernel_hint_mode’: ‘Decode’, ‘tactic_sorting_policy’: ‘ByEstimation’, ‘dont_care_bridge_cost’: False, ‘dma_preference’: 1.0, ‘num_transaction_simulation_per_pe’: 1024, ‘tactic_tail_shape_alignment’: 8, ‘enable_vrf_half_mode’: True, ‘vrf_reuse_optimization_level’: 2, ‘reduce_by_ve_allow_partitioning_tail_split’: True, ‘enable_tactic_pruning’: True, ‘use_aligned_repartition2’: True, ‘bf16_partial_sum_policy’: ‘EnableForSplitAndChipAndCluster’, ‘coalesce_tensors_by_common_split’: False, ‘use_block_compile’: False, ‘skip_clustering_by_scc’: False, ‘allow_external_operators_in_lir’: False, ‘no_dram_reuse’: False, ‘scheduler_beam_search’: True, ‘subgraph_scheduling’: False, ‘attention_mask_reuse_traverse’: True, ‘expected_total_beam_states’: 100000, ‘focused_ops_coverage_limit’: None, ‘dma_grace_period’: 10000, ‘sync_grace_period_per_chip’: 10000, ‘estimate_dma_command_gather_with_concrete_samples’: False, ‘mimic_sync_io’: False, ‘profile_sync’: False, ‘dump_in_nvp’: False, ‘instruction_mem_budget’: 720896, ‘instruction_chunk_size’: 2000, ‘enable_tuc_profile’: False, ‘dedup_task_commands’: True, ‘insert_wait_by_estimation’: False, ‘resolve_noc_timeout’: True, ‘c_compile_debug_mode’: False, ‘dma_throughput_per_pe’: 180, ‘remove_dtoh_htod’: False, ‘use_einsum_by_dpe_for_interleaved_mul’: False, ‘use_dma_stos_for_concat_paste’: True, ‘use_custom_broadcast_for_concat_paste’: False, ‘profile_exact_command_cycle’: False, ‘duplicate_arm_binary’: False, ‘allow_reduce_by_ve_cluster_chip_reduce’: True, ‘allow_multiple_consumer_rf’: False, ‘allow_reduce_by_ve_cluster_chip_reduce_base_population’: False, ‘all_reduce_as_dma_reduce_broadcast_tactic’: True}
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Generated graph metadata: ---
(__compile_supertasks_with_ray pid=12237) valid_length: ~
(__compile_supertasks_with_ray pid=12237) graph_io_category:
(__compile_supertasks_with_ray pid=12237) input_category:
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
INFO:2025-10-26 02:00:21+0000 hash for the graph: 015314bf4f0084e296cc44b2da7571f3
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) output_category:
(__compile_supertasks_with_ray pid=12237) - Intermediate:
(__compile_supertasks_with_ray pid=12237) named_axes:
(__compile_supertasks_with_ray pid=12237) - Batch
(__compile_supertasks_with_ray pid=12237) - Sequence
(__compile_supertasks_with_ray pid=12237) - Embedding
(__compile_supertasks_with_ray pid=12237) tp_config:
(__compile_supertasks_with_ray pid=12237) inputs:
(__compile_supertasks_with_ray pid=12237) outputs:
(__compile_supertasks_with_ray pid=12237) operators: {}
(__compile_supertasks_with_ray pid=12237)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 hash for the graph: d3809467d55a7b74d967ae5279a35b1f
source /home/elicer/workspaces/llama_ve310/bin/activate
ERROR: invalid npu id: renegade-8pe
Encountered exception!
non_shared_configs(w/o past_kv): [NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: True, ‘causal_mask’: TensorGenInfo(shape=torch.Size([1, 4096, 4096]), dtype=torch.bool)}, pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv0-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=0), phase=<PipelineMode.LLM_PREFILL: ‘prefill’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=LogitsSliceConfig(slice_direction=‘left’, slice_size=1), num_blocks_per_supertask=1), NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘attention_mask’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.bool), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: False, ‘past_valid_key_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32), ‘past_valid_value_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32)}, 
pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=None, num_blocks_per_supertask=1)]

Traceback (most recent call last):
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 819, in compile_gm_and_get_preprocessed_gm_hash
compiled = compile(
RuntimeError: fail to compile: Invalid NPU ID

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/elicer/workspaces/llama_ve310/bin/furiosa-llm", line 7, in <module>
sys.exit(main())
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/main.py", line 20, in main
args.dispatch_function(args)
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/convert.py", line 246, in convert
builder.build(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 687, in build
target_model_artifact, target_model_pipelines = self._build_model_artifact(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 590, in _build_model_artifact
pipelines_with_metadata = build_pipelines(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/helper.py", line 285, in build_pipelines
pipelines = pipeline_builder.build_pipelines(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 844, in build_pipelines
return PipelineBuilder.compile_supertasks_in_parallel(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1064, in compile_supertasks_in_parallel
local_pipelines = PipelineBuilder.__compile_supertasks_aux(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1108, in __compile_supertasks_aux
_compile_supertasks_in_pipeline(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 250, in _compile_supertasks_in_pipeline
compile_result, hash_val = GraphModuleConverter.compile_gm_and_get_preprocessed_gm_hash(
File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 837, in compile_gm_and_get_preprocessed_gm_hash
raise RuntimeError(f"Compilation failed with error {e}")
RuntimeError: Compilation failed with error fail to compile: Invalid NPU ID
(llama_ve310) elicer@f5e3ec3ba50d:~$

What could be wrong to cause this Invalid NPU ID error?

(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ furiosa-smi info
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| Index | Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 0     | rngd | npu5   | 2025.3.0+c097ea0 | 2025.3.0+52e5705 | 40.04°C | 34.56 W | 0000:bc:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 1     | rngd | npu6   | 2025.3.0+c097ea0 | 2025.3.1+52e5705 | 30.37°C | 34.56 W | 0000:bd:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+

Because building an artifact with furiosa-llm 2025.3.3 took too long,
I downgraded to 2025.3.1 and am testing with that version.
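A version pin with pip would look roughly like the sketch below; the exact package index needed for furiosa-llm may differ, so please follow the Furiosa docs for that part:

pip install "furiosa-llm==2025.3.1"    # pin the downgraded version
pip list | grep furiosa                # confirm the installed versions afterwards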

(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ pip list | grep furiosa
furiosa-llm 2025.3.1
furiosa-llm-models 2025.3.0
furiosa-model-compressor 2025.3.0
furiosa-model-compressor-impl 2025.3.0
furiosa-models-lang 2025.3.0
furiosa-native-compiler 2025.3.1
furiosa-native-llm-common 2025.3.1
furiosa-native-runtime 2025.3.2
furiosa-smi-py 2025.3.0
furiosa-torch-ext 2025.3.1

Hello, and sorry for the inconvenience. Invalid NPU ID is an error that should not occur if you passed the -tp 8 option. I will try to reproduce it with the versions you shared and give you an update by tomorrow.

Could you also share the output of the apt list --installed | grep furiosa command?

For example, the output looks like this:

apt list --installed | grep furiosa 

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-compiler/jammy-rc,now 2025.3.0-3 amd64 [installed]
furiosa-driver-rngd/jammy-rc,now 1.11.0-3 all [installed,upgradable to: 2025.3.1-3]
furiosa-firmware-image-rngd/jammy-rc,now 1.7.0 all [installed,upgradable to: 2025.3.1]
furiosa-firmware-tools-rngd/jammy-rc,now 2025.3.1-3 amd64 [installed]
furiosa-libhal-nvp/jammy-nightly,now 0.11.0-3+nightly-250714+9f4c23a amd64 [installed,upgradable to: 0.11.0-3+nightly-251023+99925dc]
furiosa-libsmi/jammy-rc,now 2025.3.0-3 amd64 [installed]
furiosa-mlperf-resources/jammy-rc,now 5.0.0 amd64 [installed]
furiosa-pert-rngd/jammy-nightly,now 0.1.0-3+nightly-250929+f9abfb9 amd64 [installed,upgradable to: 2025.3.1-3]
furiosa-smi/jammy-rc,now 2025.3.0-3 amd64 [installed]

Here is the output you requested.

(llama_ve310) elicer@f5e3ec3ba50d:~$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/now 0.10.3-3 amd64 [installed,upgradable to: 2025.3.0-3]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-libsmi/jammy,now 2025.3.0-3 amd64 [installed,automatic]
furiosa-smi/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

The results are quite different... Do I need to install the packages to match those versions?
I am using an instance created on a CSP, so a server restart has to be requested separately.

For reference, the artifact I built previously has been published on Hugging Face,
and that artifact works fine.

This error occurred when I set up a new development environment on a new instance,
ran quantization, and then tried to build an artifact.
Please take this into account.

Originally I planned to build an artifact directly without quantization and compare performance,
but because building the artifact took so long, I evaluated performance with the quantized model instead.


Hello.

Looking at the installed packages, these are packages for Warboy, our first-generation product. That would explain the Invalid NPU ID error.

If these packages came pre-installed on the instance, there are probably separate instance types for Warboy and for RNGD.

If you installed them yourself by following the installation documents, please refer to Welcome to Furiosa Docs — FuriosaAI Developer Center 2025.3.1 documentation.
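For reference, a sketch of installing the RNGD-side packages via APT, using the package names visible in the example listing above (the exact set and versions your instance needs may differ, so please defer to the installation docs):

sudo apt update
sudo apt install furiosa-driver-rngd furiosa-firmware-image-rngd \
                 furiosa-firmware-tools-rngd furiosa-pert-rngd \
                 furiosa-compiler furiosa-libsmi furiosa-smi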

I tried to match the packages by hand, but building the artifact still takes just as long.

I manually got the system into the state below, but
the newly installed firmware requires a cold boot,
and furiosa-libhal-nvp could not be installed,
so in this state quantization works but no artifact is ever produced.
The build just keeps waiting.

furiosa-compiler/now 0.10.3-3 amd64 [installed,upgradable to: 2025.3.0-3]
furiosa-driver-rngd/jammy,now 2025.3.1-3 all [installed]
furiosa-firmware-image-rngd/jammy,now 2025.3.1 all [installed]
furiosa-firmware-tools-rngd/jammy,now 2025.3.1-3 amd64 [installed]
furiosa-libsmi/jammy,now 2025.3.0-3 amd64 [installed,automatic]
furiosa-pert-rngd/jammy,now 2025.3.1-3 amd64 [installed]
furiosa-smi/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

So I deleted the instance and created a new one,
and it comes up configured for Warboy, as shown below.

elicer@a9c6e8a77490:~$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/now 0.10.3-3 amd64 [installed,local]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

There is nothing in the instance-creation process that specifically selects an environment for Warboy or Renegade.
In this case, should I ask the CSP to set up the environment for Renegade?
I am not sure whether I would have to request this every time I create an instance,
but I would like to know how I should phrase such a request to the CSP.

Hello, this is Jongwook Kim from FuriosaAI.

Could you delete the /etc/apt/sources.list.d/furiosa.list and /etc/apt/auth.conf.d/furiosa.conf files in your environment, follow only the "Setting up APT" section on this page, and then install the compiler matched to the RNGD release, as shown below?
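Removing the existing APT configuration, the first step described above, would be (a sketch):

$ sudo rm /etc/apt/sources.list.d/furiosa.list /etc/apt/auth.conf.d/furiosa.conf

After that, the version check and the pinned install: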

$ sudo apt update
$ apt list -a | grep furiosa-compiler              # check the 2025.3.0-based version available here
$ sudo apt install furiosa-compiler=2025.3.0-3     # then install that specific version

The results are as follows.

elicer@a9c6e8a77490:/etc/apt/trusted.gpg.d$ apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
furiosa-bench/now 0.10.3-3 amd64 [installed,local]
furiosa-compiler/jammy,now 2025.3.0-3 amd64 [installed]
furiosa-libcompiler/now 0.10.1-3 amd64 [installed,local]
furiosa-libhal-warboy/now 0.12.0-3 amd64 [installed,local]
furiosa-libnux/now 0.10.1-3 amd64 [installed,local]
furiosa-toolkit/now 0.11.0-3 amd64 [installed,local]

The compiler appears to have been installed correctly, so you should be able to build the artifact itself. Could you give it a try? However, to actually run it you will likely need a separate RNGD instance environment.

Artifact creation used to finish within an hour, but this time it did not complete even after more than 3 hours,
so I have stopped it for now.

Regarding this, I asked the CSP on Friday to change the default packages to the ones for Renegade, and I am waiting for their response.