Furiosa SDK 2025.3.1 release

hyunsik · August 26, 2025, 6:20am

The 2025.3.1 release is a minor update but introduces several important new features. Because these features are added at the Furiosa-LLM frontend, they do not introduce any breaking changes. This release also includes a number of bug fixes.

Highlights of this release:

Print out the log of average throughput, KV cache usages, and running/waiting requests regularly.
Expose the more production metrics (running/waiting requests, total KV cache and usage) through /metrics endpoint.
Fix the compilation error of small LLMs like Qwen 2.5 3B.
Fix the bug that occurs when initializing the runtime multiple times in the same Python interpreter.
Support tool_choice: “required” and tool_choice with a named function.
Support the structured output with guided_choice, guided_regex, guided_json, and guided_grammar (see Structured Output).
Add llguidance as the default guided decoding backend.
Bundle a set of NPU programs as a single zip file.
Allow to quantize and build the fine-tuned models.
More pre-compield models of Qwen 2.5 family available in Hugging Face Hub:
- Qwen2.5-32B-Instruct
- Qwen2.5-Coder-14B-Instruct

Topic		Replies	Views
Furiosa SDK 2025.2.0 release Announcements release , sdk , rngd	5	89	May 20, 2025
Furiosa SDK 2025.1.0 release Announcements release	0	101	February 24, 2025
Furiosa SDK 2025.3.0 release Announcements release , rngd , sdk	0	47	August 4, 2025
A brief introduction to Furiosa SDK 2025.2 Documentation release , rngd , furiosa-llm	0	57	June 1, 2025
Furiosa SDK 2024.2.0 Release Announcements release	0	33	January 13, 2025

Furiosa SDK 2025.3.1 release

Highlights of this release:

Related topics