Intel Vision Brings Gaudi 3 To Life

by Harold Fritts

At the core of Intel’s Vision lies a significant forecast: enterprise spending on GenAI is poised to nearly quadruple, from $40 billion in 2024 to $151 billion by 2027. Amid this optimism, however, the path to enterprise adoption is strewn with obstacles.

Intel is committed to addressing these challenges by delivering scalable, secure, end-to-end GenAI solutions that tackle the most critical AI hurdles, from security and integration complexity to cost. With a focus on enabling transformation, Intel is building a comprehensive enterprise AI strategy around an open, ecosystem-based approach across its entire product line, spanning AI PCs, edge computing, and the data center.

Stealing The Show: Intel Gaudi 3 Accelerator

At the forefront of Intel’s Vision stands the unveiling of the Intel Gaudi 3 AI accelerator, which builds on the established performance and efficiency of its predecessor, the Intel Gaudi 2 AI accelerator. Offering customers the flexibility of open, community-based software and industry-standard Ethernet networking, Gaudi 3 marks a significant leap forward in system scalability.

Intel Gaudi 3

Designed for GenAI performance and efficiency, Gaudi 3 delivers substantial gains over its predecessor. With 4x the BF16 AI compute, 1.5x the memory bandwidth, and 2x the networking bandwidth of Gaudi 2, it promises a major boost in productivity for AI training and inference on large language models (LLMs) and multimodal models.

Justin Hotard, Intel’s executive vice president and general manager of the Data Center and AI Group, underscored Gaudi 3’s significance in the evolving AI market. He highlighted growing enterprise demand for choice and emphasized Gaudi 3’s compelling combination of price performance, scalability, and time-to-value.

In critical sectors like finance, manufacturing, and healthcare, enterprises are rapidly expanding AI accessibility, transitioning from experimental to full-scale implementation of generative AI (GenAI) projects. Intel views the Gaudi 3 accelerator as pivotal in meeting these requirements, offering versatility through open community-based software and standard Ethernet networking to scale AI systems and applications flexibly.

The Gaudi 3’s custom architecture, manufactured on a 5-nanometer (nm) process, is tailored for efficient large-scale AI compute and includes the following key features:

  • AI-Dedicated Compute Engine: Featuring a heterogeneous compute engine comprising 64 AI-custom and programmable Tensor Processor Cores (TPCs) and eight Matrix Multiplication Engines (MMEs), the Gaudi 3 excels in handling complex matrix operations fundamental to deep learning algorithms.
  • Memory Boost for LLM Capacity Requirements: With ample memory capacity, bandwidth, and on-board static random access memory (SRAM), the Gaudi 3 efficiently processes large GenAI datasets, enhancing workload performance and data center cost efficiency.
  • Efficient System Scaling for Enterprise GenAI: Integrated with twenty-four 200 gigabit Ethernet ports, the Gaudi 3 enables flexible and open-standard networking, facilitating efficient scaling to support large compute clusters while eliminating vendor lock-in.
  • Open Industry Software for Developer Productivity: The Gaudi software integrates the PyTorch framework and provides optimized Hugging Face community-based models, enhancing developer productivity and ease of model porting across hardware types (a minimal porting sketch follows this list).
  • Gaudi 3 PCIe: Introducing a new form factor in the product line, the Gaudi 3 PCIe add-in card offers high efficiency and lower power consumption, ideal for workloads such as fine-tuning, inference, and retrieval-augmented generation (RAG).
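
To make the open-software point concrete, below is a minimal, illustrative sketch of running a PyTorch model on a Gaudi device. It assumes the Intel Gaudi software stack with its habana_frameworks PyTorch bridge is installed; the toy model and tensor shapes are arbitrary examples, not an Intel reference workload.

```python
# Minimal sketch: running a PyTorch model on a Gaudi device ("hpu").
# Assumes the Intel Gaudi PyTorch bridge (habana_frameworks) is installed;
# the model here is an arbitrary toy example, not an Intel reference workload.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device with PyTorch

device = torch.device("hpu")

# A small stand-in model; real workloads would load an LLM, e.g. via Hugging Face.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device)

x = torch.randn(8, 1024, device=device)

with torch.no_grad():
    y = model(x)
    htcore.mark_step()  # in lazy mode, flushes the accumulated graph to the accelerator

print(y.shape)  # torch.Size([8, 1024])
```

Porting from a GPU typically comes down to targeting the "hpu" device instead of "cuda", which is the ease of model porting the list above refers to.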

Expected to deliver significant performance improvements for training and inference tasks on leading GenAI models, the Gaudi 3 accelerator stands poised to revolutionize AI and high-performance computing (HPC). It will play a vital role in Falcon Shores, Intel’s next-generation GPU, integrating Intel Gaudi and Intel Xe intellectual property with a single GPU programming interface based on the Intel oneAPI specification.

Intel Gaudi 3 vs NVIDIA

During the opening session, Intel CEO Pat Gelsinger took center stage to introduce the Intel Gaudi 3 AI accelerator. Gaudi 3 is expected to deliver 50 percent faster time-to-train than the NVIDIA H100 on Llama2 7B and 13B parameter models and the GPT-3 175B parameter model. It is also projected to deliver 50 percent higher inference throughput and 40 percent better inference power efficiency than the NVIDIA H100 across Llama 7B and 70B parameter models and the Falcon 180B parameter model, and roughly 30 percent faster inferencing than the NVIDIA H200 on those same models.

Overall, Gaudi 3 boasts 50 percent better inference throughput and 60 percent better power efficiency than industry alternatives.

Supported by top OEMs such as Dell Technologies, Supermicro, Lenovo, and HPE, and available as a PCIe add-in card, Gaudi 3 emerges as an accelerator with tremendous potential in the AI landscape.

Intel Xeon 6 Processors

In addition to the Intel Gaudi 3 accelerator, Intel provided updates on its next-generation products and services across all segments of enterprise AI, including the new Intel Xeon 6 processors.

Intel Xeon 6 is the new brand for Intel’s next-generation processors for the data center, cloud, and edge. These processors will offer performance-efficient platforms for running current GenAI solutions, including RAG workloads that produce business-specific results from proprietary data. Intel Xeon 6 processors with Efficient-cores (E-cores) will prioritize power efficiency, while Intel Xeon 6 processors with Performance-cores (P-cores) will deliver increased AI performance and launch soon after the E-core parts.
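
Because RAG figures prominently in this positioning, here is a minimal, CPU-only sketch of the retrieval half of a RAG pipeline. It uses scikit-learn TF-IDF similarity as a stand-in for a production embedding model; the documents, the query, and the omitted LLM call are illustrative assumptions, not part of any Intel software stack.

```python
# Minimal, illustrative RAG retrieval step on CPU, assuming scikit-learn is
# installed; the generation call is a placeholder, not an Intel or Xeon API.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Proprietary documents the model should ground its answer in (toy examples).
docs = [
    "Q3 revenue grew 12% on data center demand.",
    "The warranty covers accelerator cards for three years.",
    "Gaudi 3 nodes connect over standard 200 GbE Ethernet.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF + cosine)."""
    vec = TfidfVectorizer().fit(docs + [query])
    doc_vecs = vec.transform(docs)
    query_vec = vec.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [docs[i] for i in top]

query = "How are Gaudi 3 systems networked?"
context = "\n".join(retrieve(query))
# The retrieved context is prepended to the prompt before calling an LLM;
# the LLM call itself is omitted here because it depends on the deployment.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```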

Intel Xeon 6 processors with E-cores (formerly code-named Sierra Forest) are expected to deliver 2.4x better performance per watt and 2.7x better rack density than 2nd Gen Intel Xeon processors. The Intel Xeon 6 processors with P-cores (formerly code-named Granite Rapids) incorporate software support for the MXFP4 data format, which can reduce next-token latency by up to 6.5x versus 4th Gen Xeon processors using FP16 and enables running 70-billion-parameter Llama2 models.
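
MXFP4 is a microscaling format in which small blocks of values share one scale and each value is stored as a 4-bit float (E2M1), so weights occupy roughly a quarter of the memory of FP16, which is where much of the claimed latency reduction comes from. The toy NumPy snippet below illustrates only that block-scaling idea; it is not Intel’s implementation, it substitutes a plain floating-point scale for the format’s shared power-of-two scale, and it makes no claims about Xeon 6 internals.

```python
# Toy illustration of block-scaled 4-bit quantization, the idea behind MXFP4.
# Not Intel's implementation: real MX formats store a shared power-of-two
# (E8M0) scale per 32-element block; a plain float scale is used here for clarity.
import numpy as np

E2M1_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # representable FP4 (E2M1) magnitudes

def quantize_block(block: np.ndarray) -> tuple[float, np.ndarray]:
    """Map one block to a shared scale plus values snapped to the FP4 grid."""
    scale = float(np.abs(block).max() / E2M1_MAGNITUDES[-1]) or 1.0
    scaled = block / scale
    # Snap each scaled magnitude to the nearest representable E2M1 magnitude.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_MAGNITUDES[None, :]).argmin(axis=1)
    return scale, np.sign(scaled) * E2M1_MAGNITUDES[idx]

weights = np.random.randn(64).astype(np.float32)
blocks = weights.reshape(-1, 32)  # MX formats group 32 values per shared scale
reconstructed = np.concatenate([s * q for s, q in (quantize_block(b) for b in blocks)])
print("max abs error:", np.abs(weights - reconstructed).max())
```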
