FuriosaAI Launches AI Accelerator Chip 'RNGD' for High-Performance LLM Inference

AI semiconductor vendor FuriosaAI has unveiled its AI accelerator RNGD (pronounced "Renegade") at Hot Chips 2024. RNGD is positioned as the most efficient data center accelerator for high-performance inference of large language models and multimodal models.

Furiosa completed the full bring-up of RNGD after receiving the first silicon samples from its manufacturing partner, TSMC. Early testing has shown promising results with large language models such as GPT-J and Llama 3.1: a single RNGD PCIe card delivers 2,000 to 3,000 tokens per second of throughput on models with around 10 billion parameters.
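As a rough back-of-envelope measure, and assuming the card draws its full 150W TDP (stated below) during inference, 2,000 to 3,000 tokens per second works out to roughly 0.05 to 0.075 joules per token.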

RNGD's key innovations include:

- A non-matmul architecture based on the Tensor Contraction Processor (TCP), designed to balance efficiency, programmability, and performance (see the sketch below).
- Programmability, through a robust compiler co-designed and optimized for TCP that treats entire models as single fused operations.
- Efficiency, with a TDP of 150W compared to 1,000W+ for leading GPUs.
- High performance, with 48GB of HBM3 memory enabling models such as Llama 3.1 8B to run efficiently on a single card.
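To make the contraction-based design concrete, the minimal sketch below uses NumPy's einsum; it is an illustration of the general concept only, not Furiosa's API or instruction set, and all shapes are toy values. It shows that tensor contraction subsumes plain matrix multiplication, and that batched, multi-axis operations such as attention scores are also a single contraction, which a matmul-only primitive would have to decompose into reshapes and loops.

```python
import numpy as np

# Illustration only -- this is NumPy, not Furiosa's API or ISA.
# Tensor contraction generalizes matrix multiplication: a matmul is
# just a contraction over one shared index.
A = np.random.rand(4, 8)              # (m, k)
B = np.random.rand(8, 5)              # (k, n)
C = np.einsum("mk,kn->mn", A, B)      # contract over k
assert np.allclose(C, A @ B)

# Batched attention scores are also a single contraction: contract the
# head dimension d across queries and keys, batched over (b, h). A
# matmul-only primitive must express this with reshapes and loops; a
# contraction primitive expresses it as one operation.
Q = np.random.rand(2, 3, 6, 16)       # (batch, heads, seq_q, d)
K = np.random.rand(2, 3, 7, 16)       # (batch, heads, seq_k, d)
scores = np.einsum("bhqd,bhkd->bhqk", Q, K)
print(scores.shape)                   # (2, 3, 6, 7)
```

In broad terms, this is why a contraction-first primitive pairs naturally with a compiler that fuses entire models: more of an LLM's operators map directly onto the hardware's native operation.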

June Paik, Co-Founder and CEO, FuriosaAI
"The launch of RNGD is the result of years of innovation, leading to a one-shot silicon success and an exceptionally rapid bring-up process. RNGD is a sustainable and accessible AI computing solution that meets the industry's real-world needs for inference. With our hardware now starting to run LLMs at high performance, we're entering an exciting phase of continuous advancement. I am incredibly proud of and grateful to the team for their hard work and dedication."
Aditya Raina, CMO, GUC
"The collaboration between GUC and FuriosaAI to deliver RNGD with exceptional performance and power efficiency hinges on meticulous planning and execution. Achieving this requires a deep understanding of modern AI software and hardware. FuriosaAI has consistently demonstrated excellence from design to delivery, creating the most efficient AI inference chips in the industry."