The 2025.3.1 release is a minor update that introduces several important new features. Because these features are added in the Furiosa-LLM frontend, they involve no breaking changes. This release also includes a number of bug fixes.
Highlights of this release:
- Periodically log average throughput, KV cache usage, and the number of running/waiting requests.
- Expose more production metrics (running/waiting requests, total KV cache size and usage) through the /metrics endpoint.
- Fix a compilation error affecting small LLMs such as Qwen 2.5 3B.
- Fix a bug triggered by initializing the runtime multiple times in the same Python interpreter.
- Support tool_choice: “required” and tool_choice with a named function.
- Support structured output with guided_choice, guided_regex, guided_json, and guided_grammar (see Structured Output).
- Add llguidance as the default guided decoding backend.
- Bundle a set of NPU programs as a single zip file.
- Allow quantizing and building fine-tuned models.
- More pre-compiled models of the Qwen 2.5 family available on the Hugging Face Hub:
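The new production metrics listed above are served in the standard Prometheus exposition format from the /metrics endpoint. As a minimal sketch of consuming them, the snippet below parses a sample response into a name-to-value map; the metric names in the sample are illustrative placeholders, not the actual names Furiosa-LLM exports.

```python
# Hypothetical sample of what a /metrics scrape might return; the metric
# names here are illustrative, not the actual Furiosa-LLM metric names.
sample = """\
# HELP num_requests_running Number of requests currently running
# TYPE num_requests_running gauge
num_requests_running 3
num_requests_waiting 7
kv_cache_usage_ratio 0.42
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple Prometheus exposition lines into a name -> value map."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

stats = parse_metrics(sample)
```

In production these values would be scraped by Prometheus directly rather than parsed by hand; the parser is only to show the shape of the data.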
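The two tool_choice forms mentioned above follow the OpenAI-compatible chat API. A sketch of the request bodies, assuming a hypothetical get_weather tool (the tool name and message content are illustrative):

```python
import json

# A single tool definition in the OpenAI-compatible format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function name
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Seoul?"}]

# tool_choice: "required" forces the model to call at least one tool.
required_payload = {"messages": messages, "tools": tools,
                    "tool_choice": "required"}

# A named function forces a call to that specific tool.
named_payload = {
    "messages": messages,
    "tools": tools,
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

# Both are plain JSON bodies for POST /v1/chat/completions.
body = json.dumps(named_payload)
```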
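For structured output, a request sets exactly one of the guided_* fields listed above. The sketch below builds a chat request constrained by guided_json; the model id and JSON schema are illustrative assumptions, and the guided_* field names come from the release notes (see Structured Output for the authoritative usage).

```python
import json

# JSON schema the model's output must conform to (illustrative).
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

payload = {
    "model": "Qwen/Qwen2.5-7B-Instruct",  # hypothetical model id
    "messages": [{"role": "user", "content": "Describe Seoul as JSON."}],
    # One of guided_choice / guided_regex / guided_json / guided_grammar
    # per request; here the output is constrained to match the schema.
    "guided_json": schema,
}

body = json.dumps(payload)  # ready to POST to /v1/chat/completions
```

With guided decoding (backed by llguidance by default in this release), the server constrains token sampling so the response parses against the schema.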