Example end-to-end flow with batch size>1 with the new runtime in 0.10.0

Priority: High
Do we have an example of an end-to-end flow with batch size > 1 with the new runtime in 0.10.0?

If you change only the batch size from 1 to a power of 2 in your existing code, it will work the same way.
Here is example code for batch size > 1 with the runtime in 0.10.x.

import numpy as np
from furiosa.runtime.sync import create_runner

# Dummy batched input: batch=2, NCHW layout, uint8.
# Scale before casting; rand() values in [0, 1) would otherwise truncate to all zeros.
input_ = (np.random.rand(2, 3, 640, 640) * 255).astype(np.uint8)

with create_runner("yolov8n_i8.onnx") as runner:
    outputs = runner.run([input_])

Thank you, an end-to-end example would be very helpful. Meanwhile, I went ahead and tried out some parts and had a doubt about model loading. With the create_runner module, it goes through three steps of splitting, lowering, and optimising the graph, which takes about 8 minutes while loading; with the FuriosaRT module we get a chance to precompile this work beforehand and then just load the .enf model. Are we taking the create_runner option because it supports batching, or can we load an .enf model with create_runner as well?
I don't think model load time is a higher-priority concern than an end-to-end example for integration; I am just posting this question to confirm my understanding.

That's exactly right: compiling the model is a long, one-time cost, and the resulting .enf file can be loaded almost instantaneously.

furiosa-compiler [quantization_file path] -o [enf file path] --target-npu=warboy
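
As a concrete sketch (the file names below are placeholders, not from a real project): compile once on the command line, then point the runner at the resulting .enf file instead of the ONNX model.

# One-time step in a shell:
#   furiosa-compiler yolov8n_i8.onnx -o yolov8n_i8.enf --target-npu=warboy

import numpy as np
from furiosa.runtime.sync import create_runner

# Same style of batched dummy input as in the earlier example (batch=2, NCHW, uint8).
input_ = (np.random.rand(2, 3, 640, 640) * 255).astype(np.uint8)

# Loading the precompiled .enf skips splitting/lowering/optimising entirely.
with create_runner("yolov8n_i8.enf") as runner:
    outputs = runner.run([input_])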

We hear you about creating an end-to-end example with BS>1.

It hasn’t started yet.

A realistic schedule would be the end of this week.

If you need it sooner, please say so, and we can see who is available to put it together.


Okay, thank you!
We would need support for loading the model from an .enf file with batching support, instead of from ONNX. In the end-to-end example, it would be helpful to cover everything from model export to ONNX with the batch configuration, then calibration, quantisation, detail on sending multiple images for inference, and post-processing with NMS support to handle batching.
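
For the NMS piece, something along the lines of this rough per-image NumPy sketch is what we have in mind (the box format x1, y1, x2, y2 and the IoU threshold are assumptions, and YOLOv8-specific output decoding is not shown):

import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    # Greedy NMS for one image; boxes are (N, 4) in x1, y1, x2, y2 order.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_thresh]
    return keep

# With batch size > 1, NMS still runs independently per image:
# for b in range(batch_size):
#     kept = nms(batch_boxes[b], batch_scores[b])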
We wanted to close the integration within this week, so it would be helpful if the end-to-end example were ready sooner.

Thank you!
Loading an .enf is already supported (it happens transparently at load time).
At load time, you may see something like this for a precompiled model:

✨  Finished in 0.000006634s
✨  Finished in 0.000005981s

If it's precompiled, just point to it.

model_path = "enf_models/borde_model_single_2.enf"
# model_path = "borde_model_i8.onnx" 
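
With that path, the same runner code as in the earlier example works unchanged; a minimal sketch, assuming a batched uint8 input prepared as above:

from furiosa.runtime.sync import create_runner

with create_runner(model_path) as runner:
    outputs = runner.run([input_])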

In-line compilation happens only if:

  1. the model is supplied in ONNX format, and
  2. the compiled model is not in the cache.

For production scenarios, precompiled .enf is the way to go.
