At GTC 2026, VDURA showcased updates to its Data Platform aimed at improving GPU utilization and storage efficiency in AI environments. The announcement includes general availability of Remote Direct Memory Access (RDMA) support, a preview of its Context-Aware Tiering technology, and validated infrastructure configurations based on AMD EPYC Turin CPUs and NVIDIA ConnectX-7 networking.
The updates aim to eliminate data movement bottlenecks between GPU clusters and storage and to optimize data placement across storage tiers for large-scale AI training and inference workloads.
RDMA Enables GPU-Direct Data Paths
VDURA has added RDMA support across its platform, allowing GPU servers to access storage directly over the network without CPU involvement. This enables GPU-to-storage data transfers that bypass traditional kernel and CPU-mediated paths, reducing latency and increasing throughput.
The implementation integrates with VDURA DirectFlow, the company’s data movement layer, so that all GPU server traffic uses RDMA. Because the CPU is removed from the data path, compute resources stay dedicated to model training and inference. The approach is intended to sustain higher GPU utilization while minimizing pipeline latency in distributed AI clusters.
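For context, the sketch below shows what registering server memory for RDMA looks like with pyverbs, the open-source Python bindings shipped with rdma-core. It is a generic illustration of the verbs API rather than VDURA's DirectFlow code; it requires an RDMA-capable NIC, and the device name mlx5_0 is an assumption.

```python
# Generic RDMA memory registration with pyverbs (rdma-core's
# Python bindings). Illustrative only; not VDURA code. Requires
# an RDMA-capable NIC. The device name 'mlx5_0' is an assumption.
from pyverbs.device import Context
from pyverbs.pd import PD
from pyverbs.mr import MR
import pyverbs.enums as e

# Open the RDMA device and allocate a protection domain.
ctx = Context(name='mlx5_0')
pd = PD(ctx)

# Register a 1 MiB buffer so the NIC can read and write it
# directly, without the host CPU copying data through the kernel.
access = (e.IBV_ACCESS_LOCAL_WRITE |
          e.IBV_ACCESS_REMOTE_READ |
          e.IBV_ACCESS_REMOTE_WRITE)
mr = MR(pd, 1024 * 1024, access)

# A peer that learns this rkey (plus the buffer address) can issue
# one-sided RDMA READ/WRITE operations against the region.
print(f"registered MR: lkey={mr.lkey} rkey={mr.rkey}")
```

Once a remote peer knows the region's rkey and address, it can move data with one-sided RDMA reads and writes, with no CPU-mediated copy on the target host.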
Context-Aware Tiering Targets Data Placement Efficiency
VDURA also detailed the first phase of its Context-Aware Tiering capability, scheduled for release later this year. This feature introduces automated data placement across storage tiers based on workload behavior and access patterns.
The initial phase extends the DirectFlow buffer into local NVMe SSDs, allowing frequently accessed data to reside closer to compute resources. This reduces dependency on shared or network-attached storage for hot data and improves response times for active workloads.
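The placement pattern itself is straightforward. The hypothetical sketch below promotes objects from shared storage to a local NVMe path once an access counter crosses a threshold; the mount points and threshold are illustrative assumptions, not details VDURA has published.

```python
import os
import shutil
from collections import Counter

# Hypothetical mount points for illustration only; the real
# DirectFlow buffer policy is not public.
SHARED_TIER = "/mnt/shared"    # network-attached storage
LOCAL_NVME_TIER = "/mnt/nvme"  # node-local SSD
HOT_THRESHOLD = 3              # promote after N accesses (assumed)

access_counts: Counter = Counter()

def read_object(name: str) -> bytes:
    """Read an object, preferring the local NVMe copy, and promote
    it to local NVMe once it becomes hot."""
    access_counts[name] += 1
    local_path = os.path.join(LOCAL_NVME_TIER, name)
    shared_path = os.path.join(SHARED_TIER, name)

    # Serve from the fast local tier when a copy already exists.
    if os.path.exists(local_path):
        with open(local_path, "rb") as f:
            return f.read()

    # Promote frequently accessed data closer to compute.
    if access_counts[name] >= HOT_THRESHOLD:
        shutil.copy2(shared_path, local_path)

    with open(shared_path, "rb") as f:
        return f.read()
```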
The platform also introduces KVCache writeback controls, which selectively persist only critical inference data to durable storage. This reduces unnecessary write activity while maintaining persistence guarantees required by production inference pipelines.
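To make the idea concrete, here is a toy model of selective writeback: every KV entry stays in memory, but only entries flagged as critical are appended to durable storage. The KVEntry type, the critical flag, and the log path are hypothetical names, not VDURA's interface.

```python
import json
from dataclasses import dataclass, field

@dataclass
class KVEntry:
    key: str
    value: list             # e.g. serialized attention KV tensors
    critical: bool = False  # writeback control: persist only if True

@dataclass
class SelectiveWritebackCache:
    """Toy model of selective KV-cache writeback: all entries live
    in memory, but only critical entries reach durable storage."""
    path: str
    entries: dict = field(default_factory=dict)

    def put(self, entry: KVEntry) -> None:
        self.entries[entry.key] = entry
        if entry.critical:
            # Persist critical inference state; skip scratch data
            # to avoid unnecessary write amplification.
            with open(self.path, "a") as f:
                f.write(json.dumps({"key": entry.key,
                                    "value": entry.value}) + "\n")

cache = SelectiveWritebackCache(path="kvcache.log")
cache.put(KVEntry("session-1/prefix", [0.1, 0.2], critical=True))
cache.put(KVEntry("session-1/scratch", [0.3, 0.4]))  # memory only
```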
Additionally, VDURA is implementing a unified Context Cache Tiering framework that spans DRAM and local SSD. This enables high-speed read and write access aligned with LMCache-class performance, supporting use cases such as long-context LLM inference and retrieval-augmented generation.
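Conceptually, such a framework behaves like a two-level read-through cache. The sketch below models that behavior with a small DRAM-resident LRU backed by files on local SSD; the capacity, the eviction policy, and the use of keys as file names are arbitrary assumptions for illustration.

```python
import os
from collections import OrderedDict

class TwoTierCache:
    """Minimal two-level cache: a small LRU held in DRAM backed by
    files on local SSD. Sizes and policy are assumptions; keys are
    assumed to be filesystem-safe."""

    def __init__(self, ssd_dir: str, dram_capacity: int = 4):
        self.dram: OrderedDict[str, bytes] = OrderedDict()
        self.dram_capacity = dram_capacity
        self.ssd_dir = ssd_dir
        os.makedirs(ssd_dir, exist_ok=True)

    def _ssd_path(self, key: str) -> str:
        return os.path.join(self.ssd_dir, key)

    def put(self, key: str, value: bytes) -> None:
        # Write through: DRAM for speed, SSD for capacity.
        self.dram[key] = value
        self.dram.move_to_end(key)
        with open(self._ssd_path(key), "wb") as f:
            f.write(value)
        # Evict the least recently used DRAM entry; the SSD copy
        # remains, so the data stays in the unified tier.
        if len(self.dram) > self.dram_capacity:
            self.dram.popitem(last=False)

    def get(self, key: str) -> bytes:
        if key in self.dram:  # DRAM hit: fastest path
            self.dram.move_to_end(key)
            return self.dram[key]
        with open(self._ssd_path(key), "rb") as f:  # SSD fallback
            value = f.read()
        # Promote back into DRAM for subsequent reads.
        self.dram[key] = value
        if len(self.dram) > self.dram_capacity:
            self.dram.popitem(last=False)
        return value
```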
VDURA indicated that future phases of Context-Aware Tiering will expand into application-aware data placement, improved cache coherence across nodes, and support for emerging infrastructure components such as NVIDIA BlueField-4 DPUs.
Validated AMD and NVIDIA Configurations
The company also introduced optimized platform configurations that pair AMD EPYC Turin processors with NVIDIA ConnectX-7 network adapters. These configurations are designed to complement the RDMA-enabled data path and sustain high-throughput, low-latency communication between GPU clusters and storage systems.
Full-Stack AI Data Pipeline Focus
VDURA CEO Ken Claffey positioned the platform as an AI storage stack that spans the entire data hierarchy, from memory to long-term storage. He pointed to RDMA for direct, CPU-free data access and to Context-Aware Tiering for automated data placement across storage tiers. According to Claffey, the combined approach helps organizations support larger models, serve more inference requests, and scale AI infrastructure while meeting the reliability requirements of production deployments.
Availability
RDMA is now available on the VDURA V5000 and V7000 platforms. Context-Aware Tiering Phase 1 is expected to reach general availability later in 2026, with early access programs currently underway.