Thank you for the quick support.
This time my question is about artifact generation.
I quantized a fine-tuned Llama 3.1 8B model and then tried to build an artifact from it with the following command:
furiosa-llm build ./quantized_llam318_komedic ./komedic_qt8b \
  -tp 8 --max-seq-len-to-capture 4096 \
  --num-pipeline-builder-workers 4 \
  --num-compile-workers 4
The model is published on Hugging Face as unidocs/llama-3.1-8b-komedic-instruct;
I downloaded it, quantized it to 8-bit, and saved the result in the ./quantized_llam318_komedic directory.
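For reference, the download step was roughly as follows (a minimal sketch assuming the huggingface_hub package; the 8-bit quantization into ./quantized_llam318_komedic was done separately with Furiosa's quantization tooling and is not shown here):

from huggingface_hub import snapshot_download

# Fetch the fine-tuned model from Hugging Face; the quantized copy in
# ./quantized_llam318_komedic was produced from this checkout afterwards.
local_path = snapshot_download(
    repo_id="unidocs/llama-3.1-8b-komedic-instruct",
    local_dir="./llama-3.1-8b-komedic-instruct",  # hypothetical local path
)
print("downloaded to:", local_path)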
To speed up artifact building from the quantized model, I increased --num-pipeline-builder-workers and --num-compile-workers. The machine has 36 cores, so I initially set them to 16 and 16, but the build did not get any faster, so I lowered them to 4 and 4.
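The worker sizing was based on the available core count, roughly as in this sketch (standard-library Python only; the 4/4 values are the ones actually passed on the command line above):

import os

# The machine reports 36 logical cores; 16/16 workers left most of them
# idle, so the counts actually used for the build were lowered to 4/4.
logical_cores = os.cpu_count()
num_pipeline_builder_workers = 4   # --num-pipeline-builder-workers
num_compile_workers = 4            # --num-compile-workers
print(logical_cores, num_pipeline_builder_workers, num_compile_workers)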
After running the command above, checking the process list shows many workers sitting in the ray::IDLE state:
(llama_ve310) elicer@f5e3ec3ba50d:~$ source /home/elicer/workspaces/llama_ve310/bin/activate
(llama_ve310) elicer@f5e3ec3ba50d:~$ ps -ef | grep ray
elicer 12237 12156 0 01:59 pts/2 00:00:20 ray::__build_fx_pipelines_with_ray
elicer 12238 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12239 12156 0 01:59 pts/2 00:00:05 ray::IDLE
elicer 12749 5379 0 01:59 pts/3 00:00:00 grep --color=auto ray
With the counts raised to 16 and 16, most of the workers are IDLE and CPU utilization stays below 5%.
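The CPU-utilization figure came from watching the ray worker processes; a rough way to reproduce that check (a sketch assuming the third-party psutil package is installed) is:

import time
import psutil

# Collect every process whose name/cmdline contains "ray::" (as in the ps
# output above) and sum their CPU utilization over a 5-second window.
ray_procs = []
for p in psutil.process_iter(["name", "cmdline"]):
    text = (p.info["name"] or "") + " " + " ".join(p.info["cmdline"] or [])
    if "ray::" in text:
        ray_procs.append(p)

for p in ray_procs:
    p.cpu_percent(None)            # first call just primes the counters
time.sleep(5.0)
total = sum(p.cpu_percent(None) for p in ray_procs)
print(f"{len(ray_procs)} ray processes, total CPU ~{total:.1f}%")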
When a previous build completed successfully, it finished in about 50 minutes, but this time no artifact has been produced even after running for nearly 10 hours. (Could switching the storage from SSD to HDD be related?)
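On the SSD-to-HDD question, a crude sequential-read check over the quantized model directory (standard-library Python only; results are optimistic if the files are already in the page cache) would look like this:

import os
import time

# Read every file under the quantized model directory once and report the
# effective sequential-read throughput, to see if disk speed is a factor.
model_dir = "./quantized_llam318_komedic"
total_bytes = 0
start = time.monotonic()
for root, _, files in os.walk(model_dir):
    for name in files:
        with open(os.path.join(root, name), "rb") as f:
            while chunk := f.read(16 * 1024 * 1024):   # 16 MiB chunks
                total_bytes += len(chunk)
elapsed = time.monotonic() - start
print(f"read {total_bytes / 2**30:.2f} GiB in {elapsed:.1f} s "
      f"({total_bytes / 2**20 / elapsed:.0f} MiB/s)")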
After about 10 hours, the following messages appear:
- Embedding
tp_config:
inputs:
outputs:
operators: {}
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Compiling pipeline Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096, supertask 2 for renegade-8pe.
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Block type: first
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Loading compiler config for CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=8, num_chip=1, block_type=<BlockType.FIRST: ‘first’>, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 Using compiler config {‘progress_mode’: ‘ProgressBar’, ‘propagate_sparse_axis_from_op’: ‘Gather’, ‘cast_64bit_types’: True, ‘support_i64_index’: False, ‘implicit_type_casting’: False, ‘implicit_type_casting_all_activation’: False, ‘reshape_remover_ignores_slice_concat’: False, ‘ignore_casts_for_precision_gain’: True, ‘arithmetic_qk_masking’: False, ‘separate_vector_ops_from_dpe’: False, ‘separate_vrf_from_interleaving’: True, ‘no_interleaving_at_fusion_stage’: False, ‘experimental_demote_concat’: True, ‘activation_shape_guide’: None, ‘normalize_sine_cosine_operator_domain’: False, ‘max_operation_memory_ratio’: 1.0, ‘tensor_size_in_page_after_split’: 2, ‘tensor_unit_bridge_threshold_in_page’: 12, ‘dma_bridge_threshold_in_bytes’: 4294967296, ‘use_dma_bridges_only’: False, ‘use_split_operation_in_tk’: True, ‘allow_unlowered_operators’: False, ‘remove_lower’: False, ‘remove_unlower’: False, ‘lowering_mode’: ‘Optimal’, ‘max_num_partitioning_axes’: 2, ‘max_num_unique_shapes’: 500, ‘padding_policy’: {‘Small’: 1.1}, ‘allow_unlimited_padding’: True, ‘tactic_hint’: ‘ForLlmModelDecode’, ‘tactic_context_config’: ‘None’, ‘local_population_threshold’: 0, ‘use_sparse_bridge_population’: True, ‘reshape_einsum_mode’: {‘Reshape’: {‘permute’: False}}, ‘use_efficiently_broadcasted_tactic’: False, ‘populate_dma_optimized_einsum_tactics’: True, ‘use_exhaustive_search_in_binary_lowering’: False, ‘dma_segment_mode’: ‘Outer’, ‘populate_irregular_indices_shape’: True, ‘allow_hp_lowering’: True, ‘use_attention_kernel’: True, ‘attention_kernel_hint_mode’: ‘Decode’, ‘tactic_sorting_policy’: ‘ByEstimation’, ‘dont_care_bridge_cost’: False, ‘dma_preference’: 1.0, ‘num_transaction_simulation_per_pe’: 1024, ‘tactic_tail_shape_alignment’: 8, ‘enable_vrf_half_mode’: True, ‘vrf_reuse_optimization_level’: 2, ‘reduce_by_ve_allow_partitioning_tail_split’: True, ‘enable_tactic_pruning’: True, ‘use_aligned_repartition2’: True, ‘bf16_partial_sum_policy’: ‘EnableForSplitAndChipAndCluster’, ‘coalesce_tensors_by_common_split’: False, ‘use_block_compile’: False, ‘skip_clustering_by_scc’: False, ‘allow_external_operators_in_lir’: False, ‘no_dram_reuse’: False, ‘scheduler_beam_search’: True, ‘subgraph_scheduling’: False, ‘attention_mask_reuse_traverse’: True, ‘expected_total_beam_states’: 100000, ‘focused_ops_coverage_limit’: None, ‘dma_grace_period’: 10000, ‘sync_grace_period_per_chip’: 10000, ‘estimate_dma_command_gather_with_concrete_samples’: False, ‘mimic_sync_io’: False, ‘profile_sync’: False, ‘dump_in_nvp’: False, ‘instruction_mem_budget’: 720896, ‘instruction_chunk_size’: 2000, ‘enable_tuc_profile’: False, ‘dedup_task_commands’: True, ‘insert_wait_by_estimation’: False, ‘resolve_noc_timeout’: True, ‘c_compile_debug_mode’: False, ‘dma_throughput_per_pe’: 180, ‘remove_dtoh_htod’: False, ‘use_einsum_by_dpe_for_interleaved_mul’: False, ‘use_dma_stos_for_concat_paste’: True, ‘use_custom_broadcast_for_concat_paste’: False, ‘profile_exact_command_cycle’: False, ‘duplicate_arm_binary’: False, ‘allow_reduce_by_ve_cluster_chip_reduce’: True, ‘allow_multiple_consumer_rf’: False, ‘allow_reduce_by_ve_cluster_chip_reduce_base_population’: False, ‘all_reduce_as_dma_reduce_broadcast_tactic’: True}
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 [furiosa-llm] Generated graph metadata: —
(__compile_supertasks_with_ray pid=12237) valid_length: ~
(__compile_supertasks_with_ray pid=12237) graph_io_category:
(__compile_supertasks_with_ray pid=12237) input_category:
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
INFO:2025-10-26 02:00:21+0000 hash for the graph: 015314bf4f0084e296cc44b2da7571f3
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - ModelInput
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) - Weight
(__compile_supertasks_with_ray pid=12237) output_category:
(__compile_supertasks_with_ray pid=12237) - Intermediate:
(__compile_supertasks_with_ray pid=12237) named_axes:
(__compile_supertasks_with_ray pid=12237) - Batch
(__compile_supertasks_with_ray pid=12237) - Sequence
(__compile_supertasks_with_ray pid=12237) - Embedding
(__compile_supertasks_with_ray pid=12237) tp_config:
(__compile_supertasks_with_ray pid=12237) inputs:
(__compile_supertasks_with_ray pid=12237) outputs:
(__compile_supertasks_with_ray pid=12237) operators: {}
(__compile_supertasks_with_ray pid=12237)
(__compile_supertasks_with_ray pid=12237) INFO:2025-10-26 02:00:21+0000 hash for the graph: d3809467d55a7b74d967ae5279a35b1f
ERROR: invalid npu id: renegade-8pe
Encountered exception!
non_shared_configs(w/o past_kv): [NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: True, ‘causal_mask’: TensorGenInfo(shape=torch.Size([1, 4096, 4096]), dtype=torch.bool)}, pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv0-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=0), phase=<PipelineMode.LLM_PREFILL: ‘prefill’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=LogitsSliceConfig(slice_direction=‘left’, slice_size=1), num_blocks_per_supertask=1), NonSharedPipelineBuildConfig(args_data=(), kwargs_data={‘input_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘attention_mask’: TensorGenInfo(shape=torch.Size([1, 4096]), dtype=torch.bool), ‘position_ids’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_key_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘new_value_location’: TensorGenInfo(shape=torch.Size([1, 1]), dtype=torch.int32), ‘bucket_size’: 4096, ‘use_cache’: False, ‘is_prefill’: False, ‘past_valid_key_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32), ‘past_valid_value_indices’: TensorGenInfo(shape=torch.Size([4095]), dtype=torch.int32)}, 
pipeline_name=‘Quantized_furiosa_llm_models.llama3.symbolic.mlperf_submission.LlamaForCausalLM-kv4095-b1-attn4096’, compile_config=CompilerConfigContext(model_metadata=ModelMetadata(pretrained_id=‘meta-llama/Llama-3.1-8B’, task_type=‘text-generation’, llm_config=LLMConfig(optimization_config=OptimizationConfig(attention_type=<AttentionType.PAGED_ATTENTION: ‘PAGED_ATTENTION’>, optimize_rope=True, optimize_packed=True, decompose_layernorm=False, optimize_furiosa=False, use_unsplit_packed=False, compact_causal_mask=False, use_rngd_gelu=False, causal_mask_free_decoding=True, kv_cache_sharing_across_beams=False, inbound_beamsearch_softmax=False, calculate_logit_only_for_last_token=False, optimized_for_speculative_decoding=False, use_2d_masks=False, merged_kv_indices=False), quantization_config=QuantizationConfig(weight=<QDtype.FP8: ‘fp8’>, activation=<QDtype.FP8: ‘fp8’>, kv_cache=<QDtype.FP8: ‘fp8’>, use_mcp=True)), hf_configs={‘vocab_size’: 128256, ‘max_position_embeddings’: 131072, ‘hidden_size’: 4096, ‘intermediate_size’: 14336, ‘num_hidden_layers’: 32, ‘num_attention_heads’: 32, ‘num_key_value_heads’: 8, ‘hidden_act’: ‘silu’, ‘initializer_range’: 0.02, ‘rms_norm_eps’: 1e-05, ‘pretraining_tp’: 1, ‘use_cache’: False, ‘rope_theta’: 500000.0, ‘rope_scaling’: {‘factor’: 8.0, ‘high_freq_factor’: 4.0, ‘low_freq_factor’: 1.0, ‘original_max_position_embeddings’: 8192, ‘rope_type’: ‘llama3’}, ‘attention_bias’: False, ‘attention_dropout’: 0.0, ‘mlp_bias’: False, ‘head_dim’: 128, ‘torch_dtype’: ‘bfloat16’, ‘tie_word_embeddings’: False, ‘architectures’: [‘LlamaForCausalLM’], ‘bos_token_id’: 128000, ‘eos_token_id’: [128001, 128008, 128009], ‘_name_or_path’: ‘./quantized_llam318_komedic’, ‘transformers_version’: ‘4.48.1’, ‘model_type’: ‘llama’}, model_weight_path=None, trust_remote_code=False, allow_bfloat16_cast_with_mcp=True, auto_bfloat16_cast=None), num_pe_per_chip=None, num_chip=None, block_type=None, num_blocks_per_graph=1, embedding_as_single_block=False, bucket=Bucket(batch_size=1, attention_size=4096, kv_cache_size=4095), phase=<PipelineMode.LLM_DECODE: ‘decode’>, beam_size=None, compiler_config_overrides=None, enable_bf16_partial_sum_for_split=True), logits_slice_config=None, num_blocks_per_supertask=1)]Traceback (most recent call last):
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 819, in compile_gm_and_get_preprocessed_gm_hash
    compiled = compile(
RuntimeError: fail to compile: Invalid NPU ID

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elicer/workspaces/llama_ve310/bin/furiosa-llm", line 7, in <module>
    sys.exit(main())
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/main.py", line 20, in main
    args.dispatch_function(args)
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/cli/convert.py", line 246, in convert
    builder.build(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 687, in build
    target_model_artifact, target_model_pipelines = self._build_model_artifact(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/builder.py", line 590, in _build_model_artifact
    pipelines_with_metadata = build_pipelines(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/artifact/helper.py", line 285, in build_pipelines
    pipelines = pipeline_builder.build_pipelines(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 844, in build_pipelines
    return PipelineBuilder.compile_supertasks_in_parallel(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1064, in compile_supertasks_in_parallel
    local_pipelines = PipelineBuilder.__compile_supertasks_aux(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 1108, in __compile_supertasks_aux
    _compile_supertasks_in_pipeline(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/api.py", line 250, in _compile_supertasks_in_pipeline
    compile_result, hash_val = GraphModuleConverter.compile_gm_and_get_preprocessed_gm_hash(
  File "/home/elicer/workspaces/llama_ve310/lib/python3.10/site-packages/furiosa_llm/parallelize/pipeline/builder/converter.py", line 837, in compile_gm_and_get_preprocessed_gm_hash
    raise RuntimeError(f"Compilation failed with error {e}")
RuntimeError: Compilation failed with error fail to compile: Invalid NPU ID
(llama_ve310) elicer@f5e3ec3ba50d:~$
The build fails with an Invalid NPU ID error. What could be wrong?
(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ furiosa-smi info
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| Index | Arch | Device | Firmware         | PERT             | Temp.   | Power   | PCI-BDF      |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 0     | rngd | npu5   | 2025.3.0+c097ea0 | 2025.3.0+52e5705 | 40.04°C | 34.56 W | 0000:bc:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
| 1     | rngd | npu6   | 2025.3.0+c097ea0 | 2025.3.1+52e5705 | 30.37°C | 34.56 W | 0000:bd:00.0 |
+-------+------+--------+------------------+------------------+---------+---------+--------------+
Artifact building took too long with furiosa-llm 2025.3.3, so I downgraded to 2025.3.1 and am testing with that version.
(llama_ve310) elicer@f5e3ec3ba50d:~/workspaces$ pip list | grep furiosa
furiosa-llm 2025.3.1
furiosa-llm-models 2025.3.0
furiosa-model-compressor 2025.3.0
furiosa-model-compressor-impl 2025.3.0
furiosa-models-lang 2025.3.0
furiosa-native-compiler 2025.3.1
furiosa-native-llm-common 2025.3.1
furiosa-native-runtime 2025.3.2
furiosa-smi-py 2025.3.0
furiosa-torch-ext 2025.3.1
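For completeness, the same check done from Python (a sketch using importlib.metadata from the standard library), since the furiosa packages above are on a mix of 2025.3.0/3.1/3.2:

from importlib.metadata import distributions

# List installed furiosa-* packages and versions, mirroring `pip list | grep furiosa`.
for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = dist.metadata["Name"] or ""
    if name.lower().startswith("furiosa"):
        print(f"{name:35s} {dist.version}")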