AI semiconductor vendor FuriosaAI has unveiled its AI accelerator RNGD (pronounced "Renegade") at Hot Chips 2024. Furiosa positions RNGD as the most efficient data center accelerator for high-performance inference on large language models (LLMs) and multimodal models.
Furiosa completed the full bring-up of RNGD after receiving the first silicon samples from its partner, TSMC. Early testing with large language models such as GPT-J and Llama 3.1 has shown promising results: a single RNGD PCIe card delivers a throughput of 2,000 to 3,000 tokens per second on models with around 10 billion parameters.
RNGD's key innovations fall into four areas. Its architecture is built on the Tensor Contraction Processor (TCP) rather than on matrix multiplication (matmul), which Furiosa says balances efficiency, programmability and performance. For programmability, a compiler co-designed with and optimized for the TCP treats entire models as single fused operations. For efficiency, the card has a TDP of 150W, compared with 1,000W or more for leading GPUs. For performance, its 48GB of HBM3 memory lets it run models such as Llama 3.1 8B efficiently on a single card.
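To see why 48GB is enough for Llama 3.1 8B on one card, and what the quoted throughput means relative to the 150W TDP, the arithmetic can be sketched as below. This is a back-of-envelope illustration, not a Furiosa-published calculation. It assumes BF16 weights (2 bytes per parameter), a nominal 8 billion parameters, and the publicly documented Llama 3.1 8B shape (32 layers, 8 key/value heads, head dimension 128). It treats the TDP as the card's actual power draw and ignores activations and runtime overhead. Note that the 2,000 to 3,000 tokens per second figure was quoted for models of around 10 billion parameters, not specifically for Llama 3.1 8B. The helper names are hypothetical.

```python
# Back-of-envelope sketch using RNGD figures quoted in the article.
# Assumptions (not from Furiosa): BF16 weights, ~8e9 params, Llama 3.1 8B
# shape of 32 layers / 8 KV heads / head_dim 128, TDP used as power draw.

GB = 10**9  # decimal gigabytes

HBM_CAPACITY_BYTES = 48 * GB  # RNGD's quoted HBM3 capacity
TDP_WATTS = 150               # RNGD's quoted TDP


def weight_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights; the default assumes BF16/FP16 (2 bytes)."""
    return num_params * bytes_per_param


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV-cache size per token: one key and one value vector per KV head per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Throughput divided by power, i.e. tokens generated per joule."""
    return tokens_per_second / watts


# Hypothetical Llama 3.1 8B configuration (see assumptions above).
weights = weight_bytes(8e9)
headroom = HBM_CAPACITY_BYTES - weights
kv_per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
max_cached_tokens = int(headroom // kv_per_token)

print(f"weights: {weights / GB:.1f} GB, headroom: {headroom / GB:.1f} GB")
print(f"KV cache per token: {kv_per_token} bytes, ~{max_cached_tokens:,} tokens fit")
for tps in (2000, 3000):
    print(f"{tps} tok/s at {TDP_WATTS} W -> {tokens_per_joule(tps, TDP_WATTS):.1f} tokens/J")
```

Under these assumptions the weights take about 16GB, which leaves roughly 32GB for the KV cache. That is on the order of 244,000 cached tokens shared across all concurrent sequences. The quoted throughput works out to about 13 to 20 tokens per joule. Because a card typically draws no more than its TDP, this efficiency figure is a conservative estimate rather than a measurement.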