Dell Expands AI Platform with AMD GPU and Modular Infrastructure Options

Dell Technologies has announced two major updates to its Dell AI Platform with AMD, targeting organizations scaling from pilot AI deployments to full production environments. The enhancements focus on high-performance training infrastructure and a modular architecture that balances cost, scalability, and operational control.

The first update introduces a large-scale configuration featuring Dell PowerEdge XE9785 server nodes equipped with AMD Instinct MI355X GPUs and AMD EPYC CPUs. It is designed for demanding AI workloads, including model training, pre-training, and high-throughput inference. The platform integrates Dell PowerSwitch networking and PowerScale storage, ensuring a consistent infrastructure stack across deployments.

Using AMD Instinct MI355X GPUs increases per-node memory capacity, enabling support for larger models and more efficient scaling across clusters. This configuration targets enterprises and service providers with continuous AI workloads that require predictable performance at scale.
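To make the link between per-GPU memory and model size concrete, here is a rough back-of-the-envelope sketch in Python. It estimates how many GPUs are needed just to hold a model's weights. The per-GPU memory figure, the 80% usable fraction, and the function names are illustrative assumptions, not numbers from Dell's announcement, and the estimate ignores KV cache, activations, and optimizer state, which can dominate in training.

```python
import math

# Assumption for illustration only: per-GPU memory in GB. Check the vendor's
# published specification for the real figure before sizing a cluster.
ASSUMED_GPU_MEMORY_GB = 288


def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    gpu_memory_gb: float = ASSUMED_GPU_MEMORY_GB,
    bytes_per_param: int = 2,
    usable_fraction: float = 0.8,
) -> int:
    """Smallest GPU count whose usable memory can hold the weights.

    usable_fraction is a hypothetical headroom factor that reserves part of
    each GPU for everything other than weights.
    """
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    usable_gb = gpu_memory_gb * usable_fraction
    needed_gb = weights_memory_gb(num_params, bytes_per_param)
    return max(1, math.ceil(needed_gb / usable_gb))


if __name__ == "__main__":
    # Two hypothetical model sizes stored at 2 bytes per parameter (16-bit).
    for params in (70e9, 405e9):
        gpus = min_gpus_for_weights(params)
        print(f"{params / 1e9:.0f}B parameters: at least {gpus} GPU(s) for weights")
```

The point of the sketch is the direction of the effect: the more memory each GPU carries, the fewer devices a given model has to be sharded across, which is the scaling benefit the announcement describes.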

The second enhancement extends Dell’s modular AI Factory architecture to support AMD Instinct MI350P PCIe GPUs paired with AMD EPYC CPUs. This configuration uses Dell PowerEdge XE7745 and R7725 servers, PowerSwitch networking, and PowerScale storage, and integrates with the Dell AI Data Platform. It is positioned as a cost-effective path for organizations moving from pilot to production, enabling incremental scaling of compute, memory, storage, and network resources to address specific bottlenecks.

Both configurations are built on the AMD ROCm software stack and support open frameworks such as PyTorch and vLLM. Integration with the Dell Automation Platform provides deployment and lifecycle management capabilities that simplify cluster provisioning and scaling.

Dell also cited findings from an Omdia study comparing on-premises deployments with public cloud alternatives. According to the report, a configuration featuring PowerEdge XE9785 servers with AMD Instinct MI355X GPUs can deliver up to 65% lower total cost of ownership than public cloud alternatives, driven by infrastructure efficiency and open software ecosystems.
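As a quick illustration of what an "up to 65% lower" claim implies, the Python sketch below turns a cloud cost estimate into the best-case on-premises figure. The function name and the $10 million cloud figure are hypothetical; only the 65% ceiling comes from the cited study, and because it is an upper bound, actual savings for a given workload may be smaller.

```python
def on_prem_tco_lower_bound(cloud_tco: float, max_reduction_pct: int = 65) -> float:
    """Best-case on-prem TCO implied by an 'up to N% lower' claim.

    cloud_tco is the estimated public cloud total cost of ownership over the
    same period; max_reduction_pct is the claimed ceiling on savings.
    """
    if cloud_tco < 0:
        raise ValueError("cloud_tco must be non-negative")
    if not 0 <= max_reduction_pct <= 100:
        raise ValueError("max_reduction_pct must be between 0 and 100")
    return cloud_tco * (100 - max_reduction_pct) / 100


if __name__ == "__main__":
    # Hypothetical multi-year cloud spend, used only for illustration.
    cloud_estimate = 10_000_000
    best_case = on_prem_tco_lower_bound(cloud_estimate)
    print(f"Cloud TCO estimate:    ${cloud_estimate:,.0f}")
    print(f"Best-case on-prem TCO: ${best_case:,.0f}")
```

Under these assumptions, a buyer would treat the result as a floor on on-premises cost rather than a forecast, and would substitute their own cloud estimate and study period.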

Modular Design Targets Predictable AI Scaling

Dell positions the updated platform around a modular design that supports a consistent path from small-scale testing to enterprise production. Organizations can start with a single-node configuration using as few as two GPUs, then scale by adding compute nodes, GPU capacity, storage, and network bandwidth as demand grows. This approach enables reuse of initial infrastructure investments while scaling in controlled increments.

The platform is also designed for workload flexibility. Standardizing on AMD’s enterprise AI software stack and open frameworks supports a range of AI use cases without requiring specialized infrastructure silos. This enables model portability and reduces the operational overhead of managing multiple environments.

Security and governance remain central to the platform’s design. By prioritizing on-premises deployment, the platform reduces exposure to external risks and keeps data locality under the organization’s control. AMD’s enterprise AI Resource Manager provides additional governance capabilities, including policy controls and access management, to help organizations enforce data protection and compliance requirements.

Harold Fritts