Intel Unveils Gaudi 3 AI Accelerator Chip
Intel introduced its latest AI accelerator chip, Gaudi 3, at the Vision 2024 event in Phoenix. The chip is positioned as an alternative to Nvidia’s H100, a sought-after data center GPU that has been in short supply.
Performance Claims and Comparison
According to Intel, Gaudi 3 holds significant performance advantages over Nvidia’s H100: the company claims 50% faster training for models such as OpenAI’s GPT-3 175B LLM and Meta’s Llama 2, and 50% faster inference for models such as Llama 2 and Falcon 180B.
The H100 dominates the data center GPU market, and Nvidia has plans for more powerful AI accelerator chips such as the H200 and Blackwell B200, which have yet to be released to the public. Meanwhile, the ongoing supply constraints of the H100 have prompted tech giants to explore custom AI accelerator chip designs.
Gaudi 3 Features and Specifications
Intel’s Gaudi 3 is an evolution of its predecessor, Gaudi 2, built from two identical silicon dies joined by a high-bandwidth connection. Each die contains 48 megabytes of central cache memory, four matrix multiplication engines, and 32 programmable tensor processor cores, for a total of eight matrix engines and 64 tensor processor cores across the chip.
Intel says Gaudi 3 delivers double the AI compute of Gaudi 2 when using its 8-bit floating-point (FP8) infrastructure, and a fourfold increase when using the BFloat16 number format. Gaudi 3 is equipped with 128GB of HBM2e memory and offers a memory bandwidth of 3.7TB/s.
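For readers unfamiliar with the BFloat16 format mentioned above, the following is a minimal Python sketch of the idea: a BFloat16 value keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of a 32-bit float. The function names are hypothetical, and the sketch uses simple truncation for clarity; real hardware, including Gaudi 3, may round differently, and nothing here reflects Intel’s actual implementation.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit BFloat16 pattern for x (truncation, not rounding)."""
    # Reinterpret x as a 32-bit IEEE 754 float and read its raw bits.
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep the upper 16 bits: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BFloat16 pattern back to a Python float."""
    # Pad the dropped lower 16 mantissa bits with zeros.
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


def to_bfloat16(x: float) -> float:
    """Round-trip x through BFloat16 to show the precision that survives."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


if __name__ == "__main__":
    for sample in (1.0, 3.14159265, 1e30):
        print(sample, "->", to_bfloat16(sample))
```

Because the exponent field is the same width as in a 32-bit float, BFloat16 covers roughly the same numeric range while halving storage, at the cost of only two to three significant decimal digits of precision.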
Efficiency and Energy Consumption
Recognizing the power consumption challenges in data centers, Intel emphasizes Gaudi 3’s power efficiency. The company claims 40% greater inference power efficiency than Nvidia’s H100 across models of various parameter counts, which it attributes to Gaudi’s large-matrix math engines requiring less memory bandwidth.
Comparison with Blackwell Architecture
Gaudi 3 is manufactured on TSMC’s N5 process technology, which narrows the technological gap with Nvidia even as the latter prepares to introduce the Blackwell architecture, built on a custom N4P process. The decision to use HBM2e memory in Gaudi 3, rather than a newer HBM generation, reflects Intel’s focus on competitive pricing.
While a direct performance comparison between Gaudi 3 and Nvidia’s B200 awaits third-party benchmarks, the progress of Intel’s upcoming Falcon Shores chip and the adoption of nanosheet transistor technology remain topics of interest.