
Meta Unveils Next-Gen Meta Training and Inference Accelerator (MTIA)

by Lyle Smith

Meta has announced advancements in its AI infrastructure by unveiling the next-generation Meta Training and Inference Accelerator (MTIA). This development marks a significant leap in Meta’s efforts to enhance AI-based products, services, and research, responding to the escalating demands for more sophisticated AI models.


Building on its first-generation accelerator, the MTIA program refines the compute efficiency essential for Meta’s distinctive AI workloads, including the deep learning recommendation models that power user experiences across Meta’s platforms.

Next-Gen vs. First-Gen MTIA

The next-gen MTIA builds on its predecessor with significant technological advancements to meet the growing demands of AI workloads. Built on TSMC’s 7nm process, the first-gen MTIA ran at 800MHz, packed 1.12 billion gates, and delivered up to 102.4 TFLOPS of INT8 compute. It was equipped with 128MB of on-chip memory and 64GB of off-chip LPDDR5 memory within a 25-watt TDP, balancing performance against power efficiency, with memory bandwidth of 400GB/s per processing element (PE) for local memory and 800GB/s for on-chip memory.

The next-gen MTIA shifts to TSMC’s more advanced 5nm process, enabling the accelerator to operate at a higher frequency of 1.35GHz. The upgrade roughly doubles the gate count to 2.35 billion and substantially raises peak compute throughput. The next-gen MTIA also triples local PE storage and doubles on-chip SRAM to 256MB while expanding off-chip LPDDR5 memory to 128GB. These capacity increases are matched with enhanced memory bandwidth, reaching up to 1TB/s per PE for local memory and 2.7TB/s for on-chip memory, ensuring higher data throughput and efficiency.
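The gen-over-gen spec changes can be sanity-checked with quick arithmetic. A minimal Python sketch, using only the figures quoted in this article (no unpublished numbers):

```python
# First-gen vs. next-gen MTIA figures as quoted in this article.
first_gen = {"freq_mhz": 800, "gates_b": 1.12, "sram_mb": 128,
             "lpddr5_gb": 64, "pe_bw_gbs": 400, "onchip_bw_gbs": 800}
next_gen = {"freq_mhz": 1350, "gates_b": 2.35, "sram_mb": 256,
            "lpddr5_gb": 128, "pe_bw_gbs": 1000, "onchip_bw_gbs": 2700}

# Gen-over-gen improvement ratio for each spec.
ratios = {k: next_gen[k] / first_gen[k] for k in first_gen}
for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.2f}x")
```

The output shows a ~1.69x clock bump, a ~2.1x gate count (the "doubling" above), exactly 2x SRAM and LPDDR5 capacity, 2.5x per-PE local bandwidth, and ~3.4x on-chip memory bandwidth.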

In addition, the TDP is now 90 watts to accommodate the higher performance levels. The host connection has also been upgraded to 8x PCIe Gen5, doubling the bandwidth to 32GB/s, which supports faster data transfers between the accelerator and the host system. These notable improvements provide a stronger foundation for developing and deploying AI-driven applications and services.
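The quoted 32GB/s host link follows directly from the PCIe Gen5 spec: 32 GT/s per lane with 128b/130b line encoding, across eight lanes. A quick sketch of that arithmetic:

```python
# PCIe Gen5 signaling: 32 GT/s raw per lane, 128b/130b line encoding.
RAW_GT_PER_LANE = 32
ENCODING_EFFICIENCY = 128 / 130
LANES = 8  # the next-gen MTIA host interface is 8x Gen5

# Usable bandwidth per direction, in gigabytes per second.
gbit_per_s = RAW_GT_PER_LANE * ENCODING_EFFICIENCY * LANES
gbyte_per_s = gbit_per_s / 8
print(f"{gbyte_per_s:.1f} GB/s")  # ~31.5 GB/s, rounded to 32GB/s on spec sheets
```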

Next Gen MTIA Features

At its core, the MTIA features sophisticated 8×8 grid PEs that significantly boost dense and sparse compute performances. This enhancement is attributed to architectural advancements and a substantial increase in local PE storage, on-chip SRAM, and bandwidth. Moreover, the accelerator’s improved network-on-chip (NoC) architecture facilitates speedy coordination between PEs, ensuring low-latency data processing essential for complex AI tasks.

Meta’s approach extends beyond silicon innovation. The next-gen MTIA is supported by a robust rack-based system capable of housing up to 72 accelerators, offering considerable scaling potential for Meta’s ambitious AI projects. The system’s design allows for higher operation frequencies and efficiency, easily accommodating diverse model complexities.

Software integration also plays a pivotal role in the MTIA’s ecosystem, with Meta leveraging its work on PyTorch to ensure seamless compatibility and developer productivity. The inclusion of advanced programming and execution frameworks, like Triton-MTIA, facilitates the efficient translation of AI models into high-performance computing instructions, streamlining the development process.

Initial Performance Results of Next-Gen MTIA

Meta says preliminary performance metrics indicate a significant improvement over the first generation, with the chip efficiently processing both simple and complex ranking and recommendation algorithms. The chip handles models that vary significantly in size and computational demand, outperforming standard commercial GPUs thanks to Meta’s integrated technology approach. The company is focused on enhancing energy efficiency as it deploys these chips across its systems.

Initial testing has shown that the next-generation MTIA chip triples the performance of its predecessor across key models. With an upgraded system that includes double the devices and a high-powered dual-socket CPU, Meta has achieved a six-fold increase in model processing throughput and a 50% improvement in energy efficiency over its first-generation MTIA setup. These improvements result from extensive optimizations in computing components and server architecture. Optimizing models has become quicker as the developer ecosystem matures, with ample room for further efficiency gains.
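The throughput and efficiency figures above are mutually consistent, as a little arithmetic shows. Note that the implied system power ratio at the end is a derived estimate, not a figure Meta has published:

```python
# Figures quoted in this article.
per_chip_speedup = 3.0    # next-gen chip vs. first-gen across key models
device_count_ratio = 2.0  # the upgraded system doubles the device count

# Doubling devices that each run 3x faster yields the quoted 6x throughput.
throughput_gain = per_chip_speedup * device_count_ratio
print(f"throughput gain: {throughput_gain:.0f}x")

# perf/W = throughput / power, so a 50% efficiency gain at 6x throughput
# implies roughly 4x the system power draw (an estimate, not published).
efficiency_gain = 1.5
implied_power_ratio = throughput_gain / efficiency_gain
print(f"implied system power ratio: {implied_power_ratio:.0f}x")
```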

Now active in data centers, the MTIA chip is enhancing Meta’s AI workload processing, proving to be a strategic complement to commercial GPUs. With several initiatives underway to extend MTIA’s functionality, this release marks another significant step in the company’s commitment to advancing AI technology and its applications.

Meta AI
