Furiosa litmus --skip-quantization model_path runs correctly, but inference is extremely slow

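Following the provided guidelines, I used only operators from the supported list, quantized the model, and converted it to DFG format. Running litmus on the result completed without errors, but it printed warnings and reported an absurdly high latency.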

furiosa litmus --skip-quantization model.dfg
libfuriosa_hal.so --- v0.11.0, built @ 43c901f
furiosa-quantizer 0.10.2 (rev. d51ae81) furiosa-litmus 0.10.2 (rev. d51ae81)
[Step 1] Skip model loading and optimization
[Step 2] Skip model quantization
[Step 3] Checking if the model can be compiled for the NPU family [warboy-2pe] ...
[1/6] 🔍   Compiling from onnx to dfg
Done in 0.12505654s
[2/6] 🔍   Compiling from dfg to ldfg
Done in 17.377073s
[3/6] 🔍   Compiling from ldfg to cdfg
Done in 0.001495814s
[4/6] 🔍   Compiling from cdfg to gir
Done in 0.003468882s
[5/6] 🔍   Compiling from gir to lir
Done in 0.011159081s
[6/6] 🔍   Compiling from lir to enf
Done in 0.004156906s
✨  Finished in 17.522451s

ok: compiled successfully! (output.enf)

[Step 3] Passed
[Step 4] Perform inference once for data collection... (Optional)
✨  Finished in 0.000003343s
2024-08-02 02:50:09.164468374 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692237, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:09.164726629 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692238, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:09.182492232 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692242, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:09.182507515 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692243, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:09.185461072 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692248, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:09.185478501 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692249, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:49.121352475 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692641, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:49.121351197 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692640, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:49.145515268 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692646, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:50:49.154792637 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1692645, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:52:33.029261684 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1693659, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:52:33.029395553 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1693660, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:52:33.045603355 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1693665, index: 2, mask: {3, 9, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-08-02 02:52:33.045619390 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 1693664, index: 1, mask: {2, 8, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
======================================================================
This benchmark was executed with latency-workload which prioritizes latency of individual queries over throughput.
1 queries executed with batch size 1
Latency stats are as follows
QPS(Throughput): 0.00/s

Per-query latency:
Min latency (us)    : 376796285
Max latency (us)    : 376796285
Mean latency (us)   : 376796285
50th percentile (us): 376796285
95th percentile (us): 376796285
99th percentile (us): 376796285
99th percentile (us): 376796285
[Step 4] Finished

What could be causing this?
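For scale, the latency figure in the log can be converted to more familiar units. This is only a small arithmetic sketch: the variable names are illustrative, and the single input taken from the log is the 376796285 µs value.

```python
# Latency reported by litmus for the single query, in microseconds.
latency_us = 376_796_285

# Convert to seconds and minutes.
latency_s = latency_us / 1_000_000
latency_min = latency_s / 60

# With one query at batch size 1, throughput is the reciprocal of the latency.
qps = 1 / latency_s

print(f"latency: {latency_s:.1f} s ({latency_min:.2f} min)")
print(f"throughput: {qps:.2f} queries/s")
```

In other words, the single query took a little over six minutes, which is why the throughput rounds to the 0.00/s shown in the log.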

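Hello, this is Jongwook Kim from FuriosaAI.

To pin down the exact cause, could you send the ONNX model structure to jongwook.kim@furiosa.ai? If that is difficult, you could instead apply an arbitrary quantization, run performance profiling, and share the resulting JSON file. That should let us identify where inference is slowing down.

Thank you.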

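There is currently a problem compiling a particular form of the Conv1D operator, so those operations are not being accelerated on WARBOY. As a workaround, we replaced every Conv1D operation with Conv2D and confirmed that the model you shared runs roughly 10 times faster.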
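The reply does not say how the Conv1D operations were rewritten, so the following is only a sketch of the general idea, not FuriosaAI's actual procedure. It treats a 1D convolution as a 2D convolution whose input and kernel have a height of 1. All function names here are hypothetical, and the code is written in dependency-free Python purely to show that the two forms give the same result (stride 1, no padding, no bias).

```python
def conv1d(x, w):
    """Valid 1D cross-correlation. x: [C_in][L], w: [C_out][C_in][K]."""
    c_in, length, k = len(x), len(x[0]), len(w[0][0])
    return [
        [
            sum(f[c][j] * x[c][i + j] for c in range(c_in) for j in range(k))
            for i in range(length - k + 1)
        ]
        for f in w
    ]


def conv2d(x, w):
    """Valid 2D cross-correlation. x: [C_in][H][W], w: [C_out][C_in][KH][KW]."""
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    return [
        [
            [
                sum(
                    f[c][a][b] * x[c][r + a][col + b]
                    for c in range(c_in)
                    for a in range(kh)
                    for b in range(kw)
                )
                for col in range(width - kw + 1)
            ]
            for r in range(h - kh + 1)
        ]
        for f in w
    ]


def conv1d_via_conv2d(x, w):
    """Compute a 1D convolution using only the 2D operator."""
    x2 = [[channel] for channel in x]                  # [C_in][1][L]
    w2 = [[[taps] for taps in f] for f in w]           # [C_out][C_in][1][K]
    y2 = conv2d(x2, w2)                                # [C_out][1][L-K+1]
    return [plane[0] for plane in y2]                  # drop the height-1 axis


# Two input channels of length 4, one output filter with kernel size 3.
x = [[1, 2, 3, 4], [0, 1, 0, 1]]
w = [[[1, 0, -1], [2, 2, 2]]]

assert conv1d_via_conv2d(x, w) == conv1d(x, w)
```

In a real model the same reshaping would be applied to the layer's input and weights before export, so the numerical result is unchanged and only the operator type seen by the compiler differs.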

We will share the details with you by email.

I have received your email. Thank you for the reply.
