Furiosa SDK 2025.3.1 release

The 2025.3.1 release is a minor update but introduces several important new features. Because these features are added at the Furiosa-LLM frontend, they do not introduce any breaking changes. This release also includes a number of bug fixes.

Highlights of this release:

  • Print out the log of average throughput, KV cache usages, and running/waiting requests regularly.
  • Expose the more production metrics (running/waiting requests, total KV cache and usage) through /metrics endpoint.
  • Fix the compilation error of small LLMs like Qwen 2.5 3B.
  • Fix the bug that occurs when initializing the runtime multiple times in the same Python interpreter.
  • Support tool_choice: “required” and tool_choice with a named function.
  • Support the structured output with guided_choice, guided_regex, guided_json, and guided_grammar (see Structured Output).
  • Add llguidance as the default guided decoding backend.
  • Bundle a set of NPU programs as a single zip file.
  • Allow to quantize and build the fine-tuned models.
  • More pre-compield models of Qwen 2.5 family available in Hugging Face Hub:
2 Likes