Dell Technologies has announced two major updates to its Dell AI Platform with AMD, targeting organizations scaling from pilot AI deployments to full production environments. The enhancements focus on high-performance training infrastructure and a modular architecture that balances cost, scalability, and operational control.
The first update introduces a large-scale configuration built on Dell PowerEdge XE9785 server nodes equipped with AMD Instinct MI355X GPUs and AMD EPYC CPUs. It is designed for demanding AI workloads, including model pre-training, training, and high-throughput inference. The platform integrates Dell PowerSwitch networking and PowerScale storage, providing a consistent infrastructure stack across deployments.
The high memory capacity of the AMD Instinct MI355X increases the memory available per node, supporting larger models and more efficient scaling across clusters. This configuration targets enterprises and service providers running continuous AI workloads that require predictable performance at scale.
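As a rough illustration of why per-node GPU memory matters for model size, the sketch below estimates whether a model's weights fit in a node's aggregate GPU memory. The 8-GPU node layout, the 288 GB of HBM per GPU, and the 20% overhead factor are illustrative assumptions, not Dell or AMD specifications.

```python
# Rough estimate of whether a model's weights fit in one node's GPU memory.
# Assumptions (illustrative, not Dell/AMD specifications):
#   - 8 GPUs per node, 288 GB HBM per GPU
#   - weights dominate memory; a 20% overhead factor covers activations/KV cache

def fits_on_node(params_billion: float, bytes_per_param: float,
                 gpus_per_node: int = 8, gb_per_gpu: float = 288.0,
                 overhead: float = 1.2) -> bool:
    """Return True if the model's weights (plus overhead) fit in aggregate HBM."""
    required_gb = params_billion * bytes_per_param * overhead  # 1B params ~ 1 GB per byte/param
    available_gb = gpus_per_node * gb_per_gpu
    return required_gb <= available_gb

# A 405B-parameter model in FP8 (1 byte/param): ~486 GB needed vs 2,304 GB available.
print(fits_on_node(405, 1.0))  # True
# The same model in FP32 (4 bytes/param): ~1,944 GB needed, which still fits.
print(fits_on_node(405, 4.0))  # True
```

The same arithmetic explains the scaling claim: models too large for one node's aggregate memory force multi-node sharding, so more memory per node defers that complexity.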
The second enhancement extends Dell’s modular AI Factory architecture to support AMD Instinct MI350P PCIe GPUs paired with AMD EPYC CPUs. This configuration uses Dell PowerEdge XE7745 and R7725 servers, PowerSwitch networking, and PowerScale storage, and integrates with the Dell AI Data Platform. It is positioned as a cost-effective path for organizations moving from pilot to production, enabling incremental scaling of compute, memory, storage, and network resources to address specific bottlenecks.
Both configurations are built on the AMD ROCm software stack and support open frameworks such as PyTorch and vLLM. Integration with the Dell Automation Platform provides deployment and lifecycle management capabilities that simplify cluster provisioning and scaling.
Dell also cited findings from an Omdia study comparing on-premises deployments with public cloud alternatives. According to the report, a configuration featuring PowerEdge XE9785 servers with AMD Instinct MI355X GPUs can deliver up to 65% lower total cost of ownership (TCO) than comparable public cloud deployments, driven by infrastructure efficiency and open software ecosystems.
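To make the headline figure concrete, the sketch below works through what "up to 65% lower TCO" means arithmetically. The dollar amount is a hypothetical placeholder; only the percentage relationship comes from the cited Omdia claim.

```python
# Illustrative TCO arithmetic. The cloud cost figure is a hypothetical
# placeholder; only the "up to 65% lower" relationship is from the Omdia claim.

def on_prem_tco(cloud_tco: float, savings_pct: float = 65.0) -> float:
    """On-premises TCO implied by a given percentage saving versus cloud TCO."""
    return cloud_tco * (1 - savings_pct / 100.0)

cloud_cost = 10_000_000  # hypothetical multi-year public-cloud TCO, in dollars
print(on_prem_tco(cloud_cost))  # roughly 3.5 million at the maximum 65% saving
```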
Modular Design Targets Predictable AI Scaling
Dell positions the updated platform around a modular design that supports a consistent path from small-scale testing to enterprise production. Organizations can start with a single-node configuration using as few as two GPUs, then scale by adding compute nodes, GPU capacity, storage, and network bandwidth as demand grows. This approach enables reuse of initial infrastructure investments while scaling in controlled increments.
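The incremental-scaling path described above can be sketched as a simple capacity-planning helper: start from a minimal two-GPU pilot, densify existing nodes first, then add whole nodes. The node sizes and growth policy here are hypothetical illustrations, not Dell sizing guidance.

```python
# Hypothetical capacity-planning sketch for incremental scale-out.
# Node sizes and the growth policy are illustrative, not Dell sizing guidance.

from dataclasses import dataclass

@dataclass
class Cluster:
    nodes: int = 1
    gpus_per_node: int = 2  # a pilot can start with as few as two GPUs

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

def scale_to(cluster: Cluster, target_gpus: int, max_gpus_per_node: int = 8) -> Cluster:
    """Grow in controlled increments: fill existing nodes first, then add nodes."""
    # Step 1: densify existing nodes up to the assumed chassis limit.
    while cluster.total_gpus < target_gpus and cluster.gpus_per_node < max_gpus_per_node:
        cluster.gpus_per_node += 2  # assume GPUs are added in pairs
    # Step 2: add whole nodes until the target is met.
    while cluster.total_gpus < target_gpus:
        cluster.nodes += 1
    return cluster

pilot = Cluster()  # 1 node x 2 GPUs
prod = scale_to(pilot, target_gpus=24)
print(prod.nodes, prod.gpus_per_node, prod.total_gpus)  # 3 8 24
```

The point of the two-phase policy is the one the article makes: the initial investment (the first node) is reused rather than replaced as the deployment grows.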
The platform is also designed for workload flexibility. Standardizing on AMD’s enterprise AI software stack and open frameworks supports a range of AI use cases without requiring specialized infrastructure silos. This enables model portability and reduces the operational overhead of managing multiple environments.
Security and governance remain central to the platform’s design. Prioritizing on-premises deployment keeps sensitive data within the organization’s own environment and preserves control over data locality. AMD’s enterprise AI Resource Manager provides additional governance capabilities, including policy controls and access management, to help organizations enforce data protection and compliance requirements.
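As a generic illustration of the kind of policy control such a resource manager enforces (a hypothetical sketch, not the AMD Resource Manager API), consider a rule table mapping teams to permitted GPU quotas and data classifications:

```python
# Hypothetical policy-enforcement sketch. This is a generic illustration of
# quota and data-classification checks, NOT the AMD Resource Manager API.

POLICIES = {
    # team: (max GPUs per job, data classifications the team may access)
    "research": (8, {"public", "internal"}),
    "finance":  (2, {"public", "internal", "confidential"}),
}

def authorize(team: str, gpus_requested: int, data_class: str) -> bool:
    """Allow a job only if it is within quota and cleared for the data class."""
    policy = POLICIES.get(team)
    if policy is None:
        return False  # unknown teams are denied by default
    max_gpus, allowed_classes = policy
    return gpus_requested <= max_gpus and data_class in allowed_classes

print(authorize("research", 4, "internal"))      # True: within quota, cleared
print(authorize("research", 4, "confidential"))  # False: not cleared for this class
print(authorize("finance", 4, "confidential"))   # False: over the GPU quota
```

Deny-by-default for unknown teams mirrors the compliance posture the article describes: access must be explicitly granted, never assumed.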