How to compile and run a non-LLM vision model or custom model on RNGD

I would like to ask how, in the environment below, I can compile and run a non-LLM vision model (e.g., ResNet) or a custom model implemented as a torch.nn.Module.

# apt list --installed | grep furiosa

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

furiosa-compiler/now 2025.3.0-3 amd64 [installed,local]
furiosa-libsmi/now 2025.3.0-3 amd64 [installed,local]
furiosa-smi/now 2025.3.0-3 amd64 [installed,local]

# pip list | grep furiosa
furiosa-llm                   2025.3.4
furiosa-llm-models            2025.3.0
furiosa-model-compressor      2025.3.0
furiosa-model-compressor-impl 2025.3.0
furiosa-models-lang           2025.3.0
furiosa-native-compiler       2025.3.1
furiosa-native-llm-common     2025.3.1
furiosa-native-runtime        2025.3.2
furiosa-smi-py                2025.3.0
furiosa-torch-ext             2025.3.1

I tried the approaches below, but none of them worked, so I am posting this question.

  1. Exported to ONNX, then compiled with the furiosa-compiler CLI → fails with `ERROR: io error: unexpected end of file, error: Invalid model`

  2. Used the furiosa.native_compiler.compile method on a custom model containing a single nn.Linear → during `[3/4] Compiling from cdfg to lir`, the beam search step appears to repeat indefinitely

    log

    [3/4] Compiling from cdfg to lir
    2025-11-03T08:32:50.366617Z INFO Finished graph mutation for subgraphs.
    2025-11-03T08:33:55.944642Z INFO Baseline scheduling step #23180 (schedule actions: 25130 / 25130) @ 60.00s
    2025-11-03T08:34:10.960088Z INFO Baseline scheduling finished: cycle: 21508509 (method: Baseline, step: 37380)
    2025-11-03T08:34:13.833770Z INFO Beam size: 20
    2025-11-03T08:35:13.851061Z INFO Beam search step #188 @ 60.02s
    2025-11-03T08:36:13.874138Z INFO Beam search step #571 @ 120.04s
    2025-11-03T08:37:13.993064Z INFO Beam search step #942 @ 180.16s
    2025-11-03T08:38:14.055478Z INFO Beam search step #1316 @ 240.22s
    2025-11-03T08:39:14.226939Z INFO Beam search step #1684 @ 300.39s
    2025-11-03T08:40:14.298412Z INFO Beam search step #2047 @ 360.46s
    2025-11-03T08:41:14.428752Z INFO Beam search step #2419 @ 420.59s
    2025-11-03T08:42:14.554889Z INFO Beam search step #2789 @ 480.72s
    2025-11-03T08:43:14.601654Z INFO Beam search step #3159 @ 540.77s
    2025-11-03T08:44:14.615889Z INFO Beam search step #3632 @ 600.78s
    2025-11-03T08:45:14.635573Z INFO Beam search step #4204 @ 660.80s
    2025-11-03T08:46:14.683514Z INFO Beam search step #4734 @ 720.85s
    2025-11-03T08:47:14.771746Z INFO Beam search step #5236 @ 780.94s
    2025-11-03T08:48:14.848452Z INFO Beam search step #5715 @ 841.01s
    2025-11-03T08:49:14.959629Z INFO Beam search step #6179 @ 901.12s
    2025-11-03T08:50:15.052284Z INFO Beam search step #6624 @ 961.22s
    2025-11-03T08:51:15.169065Z INFO Beam search step #7051 @ 1021.33s
    2025-11-03T08:52:15.211545Z INFO Beam search step #7472 @ 1081.38s
    2025-11-03T08:53:15.369175Z INFO Beam search step #7879 @ 1141.53s
    2025-11-03T08:54:15.513916Z INFO Beam search step #8281 @ 1201.68s
    2025-11-03T08:55:15.573405Z INFO Beam search step #8670 @ 1261.74s
    2025-11-03T08:56:15.663664Z INFO Beam search step #9054 @ 1321.83s
    2025-11-03T08:57:15.820988Z INFO Beam search step #9432 @ 1381.99s
    2025-11-03T08:58:15.986759Z INFO Beam search step #9803 @ 1442.15s
    2025-11-03T08:59:16.044614Z INFO Beam search step #10166 @ 1502.21s
    2025-11-03T09:00:16.169216Z INFO Beam search step #10524 @ 1562.33s
    2025-11-03T09:01:16.218971Z INFO Beam search step #10879 @ 1622.38s
    2025-11-03T09:02:16.236491Z INFO Beam search step #11226 @ 1682.40s
    2025-11-03T09:03:16.315402Z INFO Beam search step #11571 @ 1742.48s
    2025-11-03T09:04:16.328344Z INFO Beam search step #11910 @ 1802.49s
    2025-11-03T09:05:16.491118Z INFO Beam search step #12248 @ 1862.66s
    2025-11-03T09:06:16.521900Z INFO Beam search step #12582 @ 1922.69s
    2025-11-03T09:07:16.578968Z INFO Beam search step #12913 @ 1982.74s
    2025-11-03T09:08:16.690849Z INFO Beam search step #13242 @ 2042.86s
    2025-11-03T09:09:16.738329Z INFO Beam search step #13567 @ 2102.90s

  3. Used the furiosa.native_compiler.compile method on torchvision.models.resnet18 → torch dynamo error

    error log

    Traceback (most recent call last):
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1446, in _call_user_compiler
        compiled_fn = compiler_fn(gm, self.example_inputs())
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
        compiled_gm = compiler_fn(gm, example_inputs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
        compiled_gm = compiler_fn(gm, example_inputs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/__init__.py", line 2279, in __call__
        return self.compiler_fn(model_, inputs, **self.kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/furiosa_torch_ext/torch_ext.py", line 223, in __call__
        self.traced_gm = preprocess(gm, inputs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/furiosa_torch_ext/torch_ext.py", line 207, in preprocess
        traced_gm = do_make_fx(fake_gm, fake_inputs, decomposition_table)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/furiosa_torch_ext/torch_ext.py", line 79, in do_make_fx
        gm = make_fx(gm, decomposition_table=decomposition_table)(*inputs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 2110, in wrapped
        return make_fx_tracer.trace(f, *args)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 2048, in trace
        return self._trace_inner(f, *args)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 2034, in _trace_inner
        t = dispatch_trace(
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
        return disable_fn(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
        return fn(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 1127, in dispatch_trace
        graph = tracer.trace(root, concrete_args)  # type: ignore[arg-type]
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
        return fn(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 823, in trace
        (self.create_arg(fn(*args)),),
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 1182, in wrapped
        out = f(*tensors)
      File "", line 1, in
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py", line 1688, in wrapped
        func_outputs = func(*func_args, **func_kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 801, in module_call_wrapper
        return self.call_module(mod, forward, args, kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 1039, in call_module
        return forward(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 794, in forward
        return _orig_module_call(mod, *args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
        return forward_call(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_lazy_graph_module.py", line 126, in _lazy_forward
        return self(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
        return self._wrapped_call(self, *args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
        raise e
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
        return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 801, in module_call_wrapper
        return self.call_module(mod, forward, args, kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 1039, in call_module
        return forward(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 794, in forward
        return _orig_module_call(mod, *args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
        return forward_call(*args, **kwargs)
      File "<eval_with_key>.17", line 277, in forward
        copy__default = torch.ops.aten.copy_.default(bn1_running_mean, getitem_3); bn1_running_mean = getitem_3 = copy__default = None
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_ops.py", line 716, in __call__
        return self._op(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py", line 1230, in __torch_function__
        return func(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_ops.py", line 716, in __call__
        return self._op(*args, **kwargs)
    RuntimeError: false INTERNAL ASSERT FAILED at "aten/src/ATen/RegisterFunctionalization_0.cpp":3931, please report a bug to PyTorch. mutating a non-functional tensor with a functional tensor is not allowed. Please ensure that all of your inputs are wrapped inside of a functionalize() call.

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/root/cj/compile_test.py", line 56, in
        results = compile(
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/furiosa_torch_ext/torch_ext.py", line 241, in trace_module
        traced_callable(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
        return forward_call(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
        return fn(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
        return self._wrapped_call(self, *args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 361, in __call__
        raise e
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/fx/graph_module.py", line 348, in __call__
        return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
        return inner()
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in inner
        result = forward_call(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
        return self._torchdynamo_orig_callable(
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
        return _compile(
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
        guarded_code = compile_inner(code, one_graph, hooks, transform)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
        return _compile_inner(code, one_graph, hooks, transform)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
        return function(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
        out_code = transform_code_object(code, transform)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
        transformations(instructions, code_options)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
        return fn(*args, **kwargs)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
        tracer.run()
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
        super().run()
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
        while self.step():
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
        self.dispatch_table[inst.opcode](self, inst)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2987, in RETURN_VALUE
        self._return(inst)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2972, in _return
        self.output.compile_subgraph(
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1117, in compile_subgraph
        self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1369, in compile_and_call_fx_graph
        compiled_fn = self.call_user_compiler(gm)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1416, in call_user_compiler
        return self._call_user_compiler(gm)
      File "/root/miniconda3/envs/cj/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
        raise BackendCompilerFailed(self.compiler_fn, e) from e
    torch._dynamo.exc.BackendCompilerFailed: backend='<furiosa_torch_ext.torch_ext.GmCapturer object at 0x742567b635b0>' raised:
    RuntimeError: false INTERNAL ASSERT FAILED at "aten/src/ATen/RegisterFunctionalization_0.cpp":3931, please report a bug to PyTorch. mutating a non-functional tensor with a functional tensor is not allowed. Please ensure that all of your inputs are wrapped inside of a functionalize() call.

    Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

Also, I would like to know whether this environment supports LLMs only, and whether furiosa-sdk is no longer in use.

Hello. We are sorry, but at the moment our support is centered on generative language models. Support for vision models and arbitrary models is still a work in progress; we are preparing it so that you can keep using the PyTorch framework as-is, via torch.compile() or eager mode. We expect to be able to share news about this starting next year.

While running various tests, I confirmed that a simple model like the one below, extracted as a GraphModule, does complete the compilation step with furiosa.native_compiler.compile (though I am not sure whether the compilation result is actually correct). Based on your reply, am I right to assume that even if it compiles this way, the runtime will not support executing it?

import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.linear(x)
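
The GraphModule extraction described above can be sketched as follows. This is a self-contained illustration using torch.fx symbolic tracing; the final hand-off to furiosa.native_compiler.compile is left as a comment, since that API's exact signature is not publicly documented:

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.linear(x)

# Extract a GraphModule via torch.fx symbolic tracing.
gm = torch.fx.symbolic_trace(Model())
print(gm.graph)  # one call_module node targeting 'linear'

# The GraphModule was then handed to furiosa.native_compiler.compile;
# exact arguments omitted here, as the API is not public/stable.
```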

If so, in the environment I described in the original post, is it correct to understand that:

  • only the currently supported LLM models can be compiled, and the runtime that executes them likewise supports LLM models only; and
  • compiling a simple model (e.g., one containing a single linear layer) extracted as a GraphModule, or exported to ONNX (the way compilation worked with furiosa-sdk), is not supported (furiosa-compiler -h indicates it accepts ONNX input, but it raises an error)?

The currently released version officially supports compiling and running LLM models only.

Some models can be compiled successfully through hacky workarounds. Fundamentally, the compiler was designed to be general-purpose and accepts PyTorch models as input.

However, from a software-product standpoint, compilation of arbitrary or vision models is still at the alpha stage and has not been released externally. Development is still in progress, targeting a public release early next year.

If you can share your use case, we may be able to give you more specific guidance; I will follow up with a message.

Thank you for the reply!

The case I am currently looking into is measuring FLOPS for unit operations such as matrix multiplication and attention, not just for LLMs as a whole. I am interested in whole-model performance, but also in per-operation performance, so I would like to know whether this kind of estimation is possible.

Is this feasible in the currently released environment, or is there an existing tool for it?
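
For the analytic half of this, theoretical FLOP counts for the unit operations can be computed by hand; dividing them by measured wall-clock time gives effective FLOPS. A sketch using the standard 2·M·K·N matmul count and the usual multi-head attention decomposition (softmax and bias terms deliberately ignored, so these are approximations):

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for (M,K) @ (K,N), counting each multiply-add as 2 FLOPs."""
    return 2 * m * k * n

def attention_flops(seq_len: int, d_model: int, num_heads: int) -> int:
    """Approximate FLOPs for one multi-head self-attention layer:
    Q/K/V/output projections + QK^T + (scores @ V).
    Softmax and biases are ignored for simplicity."""
    proj = 4 * matmul_flops(seq_len, d_model, d_model)            # Q, K, V, out
    d_head = d_model // num_heads
    scores = num_heads * matmul_flops(seq_len, d_head, seq_len)   # QK^T per head
    context = num_heads * matmul_flops(seq_len, seq_len, d_head)  # scores @ V
    return proj + scores + context

print(matmul_flops(1024, 1024, 1024))    # 2147483648 (2 GFLOP)
print(attention_flops(2048, 4096, 32))
```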

Thank you for explaining your intent in detail. As you suggested, compiling at the level of individual operations would be the way to do this, but furiosa-compiler performs global optimization by default, so the measured FLOPS can differ substantially between compiling many operators at once and compiling small operators individually. To get meaningful numbers, you would need to compile a sequence of operators large enough for the performance benefits of that optimization to materialize.

The compiler and runtime for arbitrary models that I mentioned above are still in alpha. I will check internally whether we can deliver them to you separately, even a bit early, and get back to you with an update.


Hello,

I discussed this with the person preparing the feature. There is still a lot of essential work remaining, so as originally mentioned, it will not be available until early next year. We apologize for not being able to help you sooner.
