Furiosa SDK 2024.2.0 Release

We are very excited to announce the Furiosa SDK 2024.2.0 release. 2024.2.0 is the second major SDK release for RNGD. This release includes many new features and significant improvements, including new model support, 8k context length support, Tensor Parallelism support, a PyTorch upgrade, the Optimum API, and performance improvements. You can find more details in the highlights below. Please refer to Upgrading Furiosa Software Stack to upgrade the Furiosa software stack.

Highlights

  • New Model support: Solar, EXAONE-3.0, CodeLLaMA2, Vicuna, …
  • Up to 8k context length (<= 8192 tokens) support in models such as LLaMA 3.1
  • Tensor Parallelism support (tensor_parallel_size <= 8)
  • Torch 2.4.1 support
  • Transformers 4.44.2 support
  • Furiosa LLM
    • ArtifactBuilder API and CLI tools (refer to ArtifactBuilder API)
      • Users can build artifacts from Hugging Face Hub models with a Hugging Face Transformers-compatible API
    • Hugging Face Transformers-compatible API support (furiosa_llm.optimum)
      • AutoModel, AutoModelForCausalLM, AutoModelForQuestionAnswering API
      • QuantizerForCausalLM API support for calibration and quantization
    • LLMEngine and AsyncLLMEngine API support, compatible with vLLM
  • Approximately 20% performance improvement in models based on LlamaForCausalLM
    • e.g., 3580 tokens/sec for the LLaMA 3.1 8B model on a single RNGD card
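A minimal sketch of how the vLLM-compatible API above might be used, assuming an RNGD-equipped machine with the furiosa-llm package installed; the artifact path is hypothetical, and exact signatures may differ from this sketch:

```python
# Sketch only: requires Furiosa hardware and the furiosa-llm package.
# The artifact path below is a hypothetical placeholder.
from furiosa_llm import LLM, SamplingParams

# Load a prebuilt artifact (see the ArtifactBuilder API above).
llm = LLM.load_artifacts("./Llama-3.1-8B-artifact")

# vLLM-style generation call.
params = SamplingParams(max_tokens=128)
outputs = llm.generate(["Explain tensor parallelism in one sentence."], params)
print(outputs[0].outputs[0].text)
```

For serving workloads, the release similarly exposes LLMEngine and AsyncLLMEngine as vLLM-compatible entry points.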

Breaking Changes

  • The LLM.from_artifacts() API has been deprecated. Please use LLM.load_artifacts() instead.
  • Artifacts built with 2024.1.x are not compatible with 2024.2.x. Please rebuild and use artifacts built with 2024.2.x.
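A migration sketch for the deprecation above (the artifact path is a hypothetical placeholder; only the two method names come from this release note):

```python
from furiosa_llm import LLM

# Before (deprecated in 2024.2.0):
# llm = LLM.from_artifacts("./my-artifact")

# After: rebuild the artifact with 2024.2.x, then load it with:
llm = LLM.load_artifacts("./my-artifact")  # hypothetical path
```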

You can find more details about the 2024.2.0 release in the Release Note of Furiosa SDK 2024.2.0 Beta0.