Hello! We are excited to announce that Furiosa SDK 2025.2.0 has officially been published today. This is the fourth major release for RNGD, and it provides a streamlined stack to enable LLMs on RNGD, spanning the driver, SoC firmware, PERT, HAL, Model Compressor, Furiosa Compiler, and all Furiosa SDK components including Furiosa-LLM.
Here are the release notes and documents:
- Release Note (2025.2.0)
- Upgrading FuriosaAI’s Software (2025.2.0)
- Furiosa Docs (2025.2.0)
- Furiosa SDK 2025.2 (Beta 2) Supplementary Guide
Key features and improvements in the 2025.2.0 release:
- Introduce the `LLM.chat()` API to support chat-based models (see the first sketch after this list).
- Add support for the `/v1/models` and `/v1/models/{model_id}` endpoints in furiosa-llm (see the server sketch after this list).
- Add support for the chunked prefill feature in furiosa-llm.
- Enable direct building of `bfloat16`/`float16`/`float32` models without a quantization step.
- Add support for the reasoning model parser in the OpenAI-Compatible Server.
- The LLM API, furiosa-mlperf, and furiosa-llm serve now support loading artifacts from the Hugging Face Hub.
- furiosa-llm now supports Python 3.11 and 3.12.
- Optimize NPU DRAM stack usage in furiosa-llm.
- Support Ubuntu 24.04 (Noble Numbat).
- Add support for `abort()` in the `LLMEngine` and `AsyncLLMEngine` APIs (see the abort sketch after this list).
- Add support for the metrics endpoint (`/metrics`) used to monitor the health of the OpenAI-Compatible Server.
- Support the sampling parameter `logprobs` in Furiosa-LLM.
- Add support for the Container Device Interface (CDI) for container runtimes (e.g., Docker, containerd, and CRI-O).
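To make the list above concrete, here is a minimal sketch of the new `LLM.chat()` API together with the `logprobs` sampling parameter and Hugging Face Hub artifact loading. It assumes a vLLM-style interface where `LLM(...)` accepts an artifact identifier and `chat()` takes OpenAI-style messages; the model id, constructor shape, and output structure below are assumptions, so consult the Furiosa-LLM API reference for the exact signatures.

```python
from furiosa_llm import LLM, SamplingParams

# Load a pre-compiled artifact; 2025.2.0 lets this come straight from the
# Hugging Face Hub. The model id below is hypothetical.
llm = LLM("furiosa-ai/Llama-3.1-8B-Instruct")

# The new "logprobs" sampling parameter requests per-token log
# probabilities alongside the generated text.
params = SamplingParams(temperature=0.7, max_tokens=128, logprobs=5)

# chat() accepts OpenAI-style message dicts for chat-tuned models.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the 2025.2.0 release in one line."},
]

# The return shape here assumes a vLLM-style list of request outputs.
outputs = llm.chat(messages, params)
print(outputs[0].outputs[0].text)
```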
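The new server endpoints can be tried with plain HTTP against a running `furiosa-llm serve` instance. A minimal sketch, assuming the server listens on `localhost:8000` (the host, port, and returned model ids are illustrative); `/v1/models` follows the OpenAI list-models response shape, and `/metrics` emits Prometheus-style text.

```python
import requests

BASE = "http://localhost:8000"  # assumed address of a running `furiosa-llm serve`

# List the models the server exposes (OpenAI-compatible endpoint).
models = requests.get(f"{BASE}/v1/models").json()
print([m["id"] for m in models["data"]])

# Fetch details for a single model via /v1/models/{model_id}.
model_id = models["data"][0]["id"]
detail = requests.get(f"{BASE}/v1/models/{model_id}").json()
print(detail)

# Scrape the new metrics endpoint to monitor server health.
print(requests.get(f"{BASE}/metrics").text[:500])
```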
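And a hedged sketch of the new `abort()` support: cancelling an in-flight request in `AsyncLLMEngine` when it exceeds a deadline. The `generate()`/`abort()` shapes below mirror vLLM's async engine API and are an assumption about furiosa-llm's compatible interface; note that `asyncio.timeout()` requires Python 3.11+, which this release now supports.

```python
import asyncio

async def generate_with_timeout(engine, prompt, params, request_id, timeout_s):
    """Stream tokens from an AsyncLLMEngine, aborting the request on timeout.

    `engine` is an already-constructed furiosa_llm AsyncLLMEngine; the
    generate()/abort() calls below follow vLLM's async API and are an
    assumption about the compatible interface.
    """
    try:
        async with asyncio.timeout(timeout_s):
            final = None
            # generate() is assumed to yield partial outputs as they arrive.
            async for output in engine.generate(prompt, params, request_id):
                final = output
            return final
    except TimeoutError:
        # The new abort() API cancels the in-flight request on the engine
        # so the NPU stops spending cycles on it.
        await engine.abort(request_id)
        return None
```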
You can find more details in the Release Note (2025.2.0).
Also, please check out the available pre-compiled models on the Hugging Face Hub.