Quantization issue with an sLLM based on Llama 3.1 8B Instruct

tobewiseys · August 8, 2025, 2:06am

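Hello,

Following the guide documentation with furiosa-llm (2025.3.0), I quantized the llama 3.1 8b instruct model, built the artifact, and ran inference, all successfully.

Next, I would like to quantize and build an artifact for unidocs/llama-3.1-8b-komedic-instruct, a model our company produced by further pretraining Llama 3.1 8B Instruct and released on Hugging Face.

I ran the code from the guide documentation with only the model ID changed, as shown below: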

from furiosa_llm.optimum.dataset_utils import create_data_loader
from furiosa_llm.optimum import QuantizerForCausalLM, QuantizationConfig

# model_id = "meta-llama/Llama-3.1-8B-Instruct"
model_id = "unidocs/llama-3.1-8b-komedic-instruct"

# Create a dataloader for calibration
dataloader = create_data_loader(
    tokenizer=model_id,
    dataset_name_or_path="mit-han-lab/pile-val-backup",
    dataset_split="validation",
    num_samples=5,  # Increase this number for better calibration
    max_sample_length=1024,
)

# Output directory for the quantized model
quantized_model = "./quantized_model"

# Load a pre-trained model from Hugging Face model hub
quantizer = QuantizerForCausalLM.from_pretrained(model_id)

# Calibrate, quantize the model, and save the quantized model
quantizer.quantize(quantized_model, dataloader, QuantizationConfig.w_f8_a_f8_kv_f8())

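The following error occurs.

The same environment runs without problems when the model is meta-llama/Llama-3.1-8B-Instruct.

The unidocs/llama-3.1-8b-komedic-instruct model has been downloaded successfully.

In addition, unidocs/llama-3.1-8b-komedic-instruct was produced by additionally training Meta's Llama 3.1 8B Instruct model above at the pretraining stage.

I would appreciate it if you could tell me what the problem is.

The following message also appears: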

LlamaForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From v4.50 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.

If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See Auto Classes

If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).

If you are not the owner of the model architecture class, please contact the model code owner to update it.

hyunsik · August 8, 2025, 4:20am

Hello,

Sorry for the inconvenience. Part of the code in the quantization path depends on pretrained_id. We will fix this and ship a hotfix release next week.

Best regards,
Hyunsik Choi

tobewiseys · August 8, 2025, 12:26pm

Understood.
Thank you for the quick diagnosis and response.
I will test it once the release is out next week.

hyunsik · August 14, 2025, 3:50am

Hello,

The release is running somewhat late and should be ready around the middle of next week. I apologize for the delay.

hyunsik · August 22, 2025, 6:41am

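Hello,

Sorry for being a little behind schedule. The 2025.3.1 minor release has just been published. Because it is later than the date I gave you, I am letting you know before the release notes go up. You can upgrade by following the instructions below.

The driver and firmware updates are minor, so they are not required. For a simple upgrade, running the command below is enough.

pip install --upgrade furiosa-llm

The release notes will be published officially, but the main changes are as follows.

- Fine-tuned model support (in other words, quantization and compilation are supported when the model config's model_type is the same and the model size is similar to the existing model)
  - The model you posted as an example was also tested.
- Structured output support (json object, json schema, choices, regex, ebnf/lark grammar)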
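The condition in the first item (same model_type, similar model size) can be checked before quantizing by comparing the two models' config.json files. The following is a minimal sketch, assuming both files are already available locally. The helper names (`load_config`, `config_diff`, `same_architecture`) and the choice of shape keys are illustrative assumptions; the thread does not say which fields furiosa-llm actually inspects.

```python
import json
from pathlib import Path

# Fields of a Hugging Face Llama-style config.json that determine model shape.
# ASSUMPTION: this selection is for illustration only; it is not taken from
# furiosa-llm and may not match what the library checks.
SHAPE_KEYS = (
    "model_type",
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
)


def load_config(path):
    """Read a config.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def config_diff(base, tuned, keys=SHAPE_KEYS):
    """Return {key: (base_value, tuned_value)} for every key that differs."""
    return {
        key: (base.get(key), tuned.get(key))
        for key in keys
        if base.get(key) != tuned.get(key)
    }


def same_architecture(base, tuned):
    """True when model_type and every shape key match."""
    return not config_diff(base, tuned)
```

With both files on disk, `config_diff(load_config("base/config.json"), load_config("tuned/config.json"))` returns an empty dict when the shapes match; the two paths here are placeholders.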

Feel free to ask if you have any questions while using it.

hyunsik · August 22, 2025, 6:43am

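One more thing: quantization is fine, but an 8B model is not large, so I would recommend compiling it directly in bf16. You can build it without quantization using furiosa-llm build and similar commands. Please leave a message if you have questions.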
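To put the size argument in numbers: weight memory scales with bytes per parameter, so bf16 needs about twice what FP8 needs. The following is a rough sketch, assuming about 8.03 billion parameters for Llama 3.1 8B and counting weights only (KV cache and activations are ignored). `weight_gib` is an illustrative helper, not a furiosa-llm function.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# ASSUMPTION: approximate parameter count of Llama 3.1 8B.
NUM_PARAMS = 8.03e9


def weight_gib(num_params, dtype):
    """Weight-only memory footprint in GiB for the given storage format."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("bf16", "fp8"):
    print(f"{dtype}: {weight_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights take roughly 15 GiB in bf16 versus about 7.5 GiB in FP8. Whether the bf16 build fits comfortably still depends on the device memory and the KV cache budget, which this estimate does not cover.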

tobewiseys · August 22, 2025, 6:46am

Thank you for the 2025.3.1 minor release.
I will test it.

tobewiseys · August 22, 2025, 8:47am

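Here are the test results.

Quantization
- The quantized model was generated successfully.

Artifact generation
- After upgrading furiosa-llm, attempting to build the artifact raised the following error:

  OSError: [Errno 107] Transport endpoint is not connected: '/proc/stat'

- After some trial and error, restarting the instance made the error go away.

Inference
- Ran inference after the artifact was built successfully: it succeeded.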
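Errno 107 is ENOTCONN on Linux, and in the report above it appeared only once the artifact build was already running. The following is a minimal pre-flight sketch, assuming the symptom is simply that /proc/stat cannot be read inside the instance (the thread does not establish why that happened). `check_readable` is an illustrative helper, not part of furiosa-llm.

```python
import errno


def check_readable(path="/proc/stat"):
    """Try to open and read one byte from path.

    Returns (True, "ok") on success, or (False, description) when the
    operating system reports an error such as ENOTCONN or ENOENT.
    """
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True, "ok"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name} (errno {exc.errno}): {exc.strerror}"


if __name__ == "__main__":
    ok, detail = check_readable()
    print("/proc/stat readable:", ok, "-", detail)
```

Running this before a long build gives an early signal that the instance may need a restart, which is the workaround that resolved the error in this case.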

I think we can now move on to more detailed work.
Thank you for the support.
