At the 2025 OCP Global Summit, Intel emphasized AI inference by unveiling two key advancements: a new data center GPU called “Crescent Island” and a rack-scale reference design for Gaudi 3. Both developments align with the growing transition from model training to real-time, widespread inference, where factors such as latency, memory bandwidth, efficiency, and operational simplicity are now crucial.
Intel’s CTO, Sachin Katti, summarized the transition by noting that as agentic AI becomes more widespread, inference turns continuous, context-rich, and increasingly system-intensive. To manage the growing token volumes, complex modality combinations, and strict SLA requirements, a heterogeneous infrastructure combining optimal silicon with an open, developer-centric software stack becomes necessary. In this environment, Intel’s Xe architecture data center GPUs offer the capacity and reliability needed as sequence lengths and token rates increase, while Gaudi 3 supports an open, scalable inference ecosystem with predictable total cost of ownership (TCO).
The key takeaway is not simply “faster chips.” Inference success hinges on systems integration: topology-aware memory, flexible interconnects, power and cooling that match utilization profiles, and orchestration that treats models, tokens, and streams as first-class citizens. Enterprises don’t want lock-in at precisely the moment they need to iterate across models, frameworks, and serving stacks.
Intel argues it’s uniquely positioned to deliver end-to-end breadth, with AI PCs, industrial edge, and data center racks grounded in Xeon 6 CPUs, Gaudi 3 accelerators, and Intel GPUs. The common thread is openness: partnering with OCP reflects the company’s preference for open, referenceable designs, which are easier to procure, validate, and scale in operation.
Crescent Island is Intel’s upcoming data center GPU designed for air-cooled enterprise servers that need high token throughput without requiring advanced power or cooling solutions. Its focus is practical: prioritize cost efficiency and energy performance while delivering the memory capacity and bandwidth essential for modern inference tasks.
Highlights include:
Crescent Island is about inference economics at scale: air-cooled, capacity-forward, and focused on performance per watt. If your workloads are dominated by token-heavy serving with strict latency SLOs, the 160 GB of onboard memory and efficient data types should translate into more concurrent sessions per watt and fewer cluster surprises as sequence lengths creep up.
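The link between memory capacity and concurrent sessions can be made concrete with back-of-envelope KV-cache math. The sketch below is illustrative only: the model dimensions (an assumed 70B-class model with 80 layers, 8 KV heads, 128-dim heads, 8-bit weights and cache) are hypothetical inputs, not Crescent Island or any published model's specifications.

```python
# Rough sizing: how many concurrent sessions fit in a GPU memory budget
# once model weights and per-session KV cache are accounted for.
# All model figures below are illustrative assumptions.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=1):
    """Per-session KV-cache size: 2 (K and V) * layers * KV heads * head dim
    * sequence length * bytes per element (1 byte assumes an 8-bit cache)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def max_sessions(mem_gb, weights_gb, seq_len, **model):
    """Sessions that fit after reserving memory for weights."""
    budget = (mem_gb - weights_gb) * 1024**3  # bytes left for KV cache
    return int(budget // kv_cache_bytes(seq_len, **model))

# Hypothetical 70B-class model served in 8-bit (~70 GB of weights)
model = dict(n_layers=80, n_kv_heads=8, head_dim=128)

for seq in (4096, 16384, 65536):
    print(f"seq_len {seq:>6}: {max_sessions(160, 70, seq, **model)} sessions")
# seq_len   4096: 144 sessions
# seq_len  16384: 36 sessions
# seq_len  65536: 9 sessions
```

The point of the arithmetic is the article's: at a fixed memory budget, session capacity falls in direct proportion as sequence lengths grow, which is why capacity-forward designs matter for token-heavy serving.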
Gaudi 3 extends from PCIe deployments to complete rack-scale configurations, offering a path to scale without forcing an all-at-once architectural bet. The new rack-scale reference design targets enterprises standardizing on large-model inference and latency-critical real-time systems.
Key elements:
In production inference, factors like memory topology and consistent thermal management matter more than peak theoretical performance. The Gaudi 3 rack, with 8.2 TB of HBM across 64 accelerators, addresses practical needs: loading multiple large model variants, handling long prompts, and absorbing bursts without thrashing. At these densities, liquid cooling isn’t optional; it’s essential for maintaining QPS and latency under sustained thermal load.
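The rack-level capacity figure is easy to sanity-check. The sketch below derives the quoted ~8.2 TB from 64 accelerators at 128 GB of HBM each (the per-accelerator figure implied by the article's totals), then estimates how many resident copies of large model variants fit. The 30% reserve for KV cache and activations, and the variant sizes, are assumptions for illustration, not Intel figures.

```python
# Rack-level capacity math for a 64-accelerator rack with 128 GB HBM each.
ACCELERATORS = 64
HBM_PER_ACCEL_GB = 128

rack_hbm_tb = ACCELERATORS * HBM_PER_ACCEL_GB / 1000
print(rack_hbm_tb)  # 8.192 -> the quoted ~8.2 TB

# Resident copies of hypothetical model variants, after reserving an
# assumed 30% of HBM for KV cache, activations, and fragmentation.
usable_gb = ACCELERATORS * HBM_PER_ACCEL_GB * 0.7
variants_gb = {"70B @ int8": 70, "180B @ fp16": 360, "405B @ int8": 405}
for name, size in variants_gb.items():
    print(f"{name}: {int(usable_gb // size)} resident copies")
```

This is the practical argument for aggregate HBM: enough headroom to keep several large variants hot simultaneously instead of swapping weights in and out under load.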