Example end-to-end flow with batch size>1 with the new runtime in 0.10.0

Priority: High
Do we have an example of an end-to-end flow with batch size > 1 with the new runtime in 0.10.0?

If you change only the batch size from 1 to a power of 2 in your existing code, it will work the same way.
Here is example code for batch size > 1 with the runtime in 0.10.x.

import numpy as np
from furiosa.runtime.sync import create_runner

# Dummy batched input: batch=2, NCHW layout, uint8.
# Scale before casting; rand() values in [0, 1) would otherwise truncate to all zeros.
input_ = (np.random.rand(2, 3, 640, 640) * 255).astype(np.uint8)

with create_runner("yolov8n_i8.onnx") as runner:
    outputs = runner.run([input_])

Thank you, an end-to-end example would be very helpful. Meanwhile, I went ahead and tried out some parts and had a doubt about model loading. With the create_runner module, it goes through three steps of splitting, lowering, and optimising the graph, which takes about 8 minutes while loading; with the FuriosaRT module we get a chance to precompile this work beforehand and then just load the .enf model. Are we taking the create_runner option because it supports batching, or can we load an .enf model with create_runner as well?
I don't think model load time is a higher-priority concern than an end-to-end example for integration; I am just posting this question to confirm my understanding.

That's exactly right: compiling the model is a long, one-time cost, and the resulting .enf file can be loaded almost instantaneously.

furiosa-compiler [quantization_file path] -o [enf file path] --target-npu=warboy
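
As a concrete sketch (the file names below are placeholders, not from a real project): compile once on the command line, then point the runner at the resulting .enf file instead of the ONNX model.

# One-time step in a shell:
#   furiosa-compiler yolov8n_i8.onnx -o yolov8n_i8.enf --target-npu=warboy

import numpy as np
from furiosa.runtime.sync import create_runner

# Same style of batched dummy input as in the earlier example (batch=2, NCHW, uint8).
input_ = (np.random.rand(2, 3, 640, 640) * 255).astype(np.uint8)

# Loading the precompiled .enf skips splitting/lowering/optimising entirely.
with create_runner("yolov8n_i8.enf") as runner:
    outputs = runner.run([input_])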

We hear you about creating an end-to-end example with BS>1.

It hasn’t started yet.

A realistic schedule would be the end of this week.

If you need it sooner, please say so, and we can see who is available to put it together.


Okay, thank you!
We would need support for loading the model from an .enf file with batching support, instead of from ONNX. In the end-to-end example, it would be helpful to cover everything from model export to ONNX with the batch configuration, then calibration, quantisation, detail on sending multiple images for inference, and post-processing with NMS support to handle batching.
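
For the NMS piece, something along the lines of this rough per-image NumPy sketch is what we have in mind (the box format x1, y1, x2, y2 and the IoU threshold are assumptions, and YOLOv8-specific output decoding is not shown):

import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    # Greedy NMS for one image; boxes are (N, 4) in x1, y1, x2, y2 order.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_thresh]
    return keep

# With batch size > 1, NMS still runs independently per image:
# for b in range(batch_size):
#     kept = nms(batch_boxes[b], batch_scores[b])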
We wanted to close the integration within this week, so it would be helpful if the end-to-end example were ready sooner.

Thank you!
Loading an .enf is already supported (it happens transparently at load time).
At load time, you may see something like this for a precompiled model:

✨  Finished in 0.000006634s
✨  Finished in 0.000005981s

If it's precompiled, just point to it.

model_path = "enf_models/borde_model_single_2.enf"
# model_path = "borde_model_i8.onnx" 
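
With that path, the same runner code as in the earlier example works unchanged; a minimal sketch, assuming a batched uint8 input prepared as above:

from furiosa.runtime.sync import create_runner

with create_runner(model_path) as runner:
    outputs = runner.run([input_])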

In-line compilation happens only if:

  1. the model is supplied in ONNX format, and
  2. the compiled model is not in the cache.

For production scenarios, precompiled .enf is the way to go.
