MinIO Introduces MemKV for Petabyte-Scale AI Inference Memory

MinIO has announced MemKV, a context memory store designed to address a growing bottleneck in large-scale AI inference environments. Positioned as the second core component of the company’s portfolio alongside AIStor, MemKV extends MinIO’s data infrastructure into the memory tier, targeting persistent, shared context for agentic AI workloads operating across GPU clusters.

As AI systems evolve from single-response interactions to multi-step reasoning and task execution, maintaining context across inference cycles has become critical. In current architectures, context is frequently lost because GPU-adjacent memory tiers such as HBM and DRAM have limited capacity. GPUs must then recompute previously generated context, which increases latency, wastes compute cycles, and raises energy consumption. MinIO characterizes this as a “recompute tax” that compounds at scale, particularly in hyperscale and cloud environments.
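In transformer inference, the context a GPU would otherwise recompute is largely the attention key-value (KV) cache. A rough sizing sketch shows why it outgrows HBM. The model shape (80 layers, 8 KV heads, 128-dimensional heads, 16-bit values, typical of a 70B-class model with grouped-query attention) and the 80 GB of HBM per GPU are illustrative assumptions, not figures from MinIO.

```python
# Illustrative model shape (assumption, typical of a 70B-class model with
# grouped-query attention); these are not MemKV or MinIO parameters.
NUM_LAYERS = 80
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2          # FP16/BF16 storage

HBM_PER_GPU = 80 * 10**9     # assumed 80 GB of HBM on one GPU


def kv_cache_bytes(tokens: int) -> int:
    """Bytes of KV cache: one key and one value vector per layer, KV head, and token."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


context_tokens = 128 * 1024  # a 128K-token context window
kv = kv_cache_bytes(context_tokens)

print(f"KV cache per token: {kv_cache_bytes(1) // 1024} KiB")   # → 320 KiB
print(f"KV cache for 128K tokens: {kv / 2**30:.0f} GiB")        # → 40 GiB
print(f"Share of one GPU's HBM: {kv / HBM_PER_GPU:.0%}")        # → 54%
```

Under these assumptions, one 128K-token context occupies more than half of a GPU's HBM before model weights are counted, so long contexts are evicted and later rebuilt, which is the recompute described above.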

MemKV is designed to mitigate this issue by providing a shared, persistent memory layer capable of microsecond retrieval at petabyte scale. By maintaining context across inference operations, the platform reduces redundant computation and improves overall system efficiency. In internal benchmarks, MinIO reports improvements in time-to-first-token at production concurrency levels. In a representative deployment with 128 GPUs and 128K-token context windows, GPU utilization rose from about 50 percent to over 90 percent, which MinIO says translates into significant annual compute cost savings.
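The utilization figures can be restated as useful capacity with simple arithmetic. This sketch assumes useful throughput scales linearly with utilization, a simplification MinIO does not state, and uses the rounded 50 and 90 percent values from the benchmark.

```python
NUM_GPUS = 128
UTIL_BEFORE = 0.50   # "about 50 percent" (MinIO's reported figure, rounded)
UTIL_AFTER = 0.90    # "over 90 percent" (taken as exactly 90% here)


def effective_gpus(num_gpus: int, utilization: float) -> float:
    """GPU-equivalents of useful work, assuming work scales linearly with utilization."""
    return num_gpus * utilization


before = effective_gpus(NUM_GPUS, UTIL_BEFORE)
after = effective_gpus(NUM_GPUS, UTIL_AFTER)
# Fleet size that would deliver the original useful work at the higher utilization.
gpus_needed_after = before / UTIL_AFTER

print(f"Useful GPU-equivalents: {before:.1f} -> {after:.1f}")
print(f"Same work at 90% utilization needs about {gpus_needed_after:.1f} GPUs")
```

Under that assumption, the same 128 GPUs deliver about 1.8 times the useful work (115.2 versus 64 GPU-equivalents), or the original workload could run on roughly 71 GPUs.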

MinIO’s leadership noted that recompute overhead has historically been masked in smaller deployments but becomes a structural inefficiency at scale. As GPU clusters grow, the cost of repeatedly regenerating context rises in both power consumption and infrastructure requirements, making purpose-built memory systems necessary for sustainable AI operations.

Addressing the Memory-Scale Tradeoff

Traditional AI infrastructure forces a tradeoff between speed and scale. High-performance memory tiers such as HBM and DRAM provide microsecond latency but are capacity-constrained and expensive. Conversely, storage systems offer scale but introduce millisecond latency, which is unsuitable for real-time inference and long-context reasoning.

MemKV is designed to bridge this gap by introducing a shared memory tier that combines low-latency access with large-scale capacity. Built to run on NVIDIA BlueField-4 STX and integrated with NVIDIA Dynamo and NIXL, the platform enables an entire GPU cluster to access a common pool of context data at speeds aligned with inference requirements. This approach eliminates the need to shuttle context between disparate memory and storage layers, reducing latency and improving throughput.
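The idea of one context pool shared by every GPU in the cluster can be sketched as a store keyed by token prefix: the first worker to see a prefix pays the prefill cost and publishes the result, and any other worker reuses it. This is a hypothetical, in-memory toy in the spirit of prefix caching; `SharedContextStore`, `get_or_compute`, and `fake_prefill` are illustrative names, not MinIO's API, and a real deployment would move KV blocks over RDMA through the Dynamo and NIXL integration rather than through a Python dict.

```python
import hashlib
from typing import Callable


class SharedContextStore:
    """Hypothetical stand-in for a cluster-wide context tier (not MinIO's API).

    Keys are hashes of token prefixes; values stand in for KV-cache blocks.
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(tokens: list[int]) -> str:
        # Identical token prefixes map to the same entry, whichever GPU asks.
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def get_or_compute(self, tokens: list[int],
                       compute: Callable[[list[int]], bytes]) -> bytes:
        k = self.key(tokens)
        if k in self._blocks:
            self.hits += 1      # context reused: no recompute needed
            return self._blocks[k]
        self.misses += 1        # first sighting: pay the prefill cost once, then publish
        block = compute(tokens)
        self._blocks[k] = block
        return block


def fake_prefill(tokens: list[int]) -> bytes:
    # Placeholder for an expensive prefill pass that would produce the KV cache.
    return bytes(len(tokens))


store = SharedContextStore()
prompt = [101, 2023, 2003, 1037, 3231]
store.get_or_compute(prompt, fake_prefill)  # worker on "GPU 0" computes and publishes
store.get_or_compute(prompt, fake_prefill)  # worker on "GPU 1" reuses the shared copy
print(store.hits, store.misses)             # → 1 1
```

Because the store is shared rather than private to each GPU, a context computed anywhere in the cluster becomes a hit everywhere else, which is where the reduction in redundant prefill comes from.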

Architecture Optimized for Inference Workloads

MemKV is purpose-built for the inference data path and aligns with MinIO’s description of the G3.5 layer in the GPU memory hierarchy. It delivers petabyte-scale capacity on NVMe-based infrastructure while maintaining microsecond-level access characteristics, effectively decoupling memory scale from GPU compute resources.
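Decoupling memory scale from GPU count is easiest to see as a capacity comparison. The sketch below counts how many long contexts fit in aggregate HBM versus a 1 PB shared pool. The per-context size (about 40 GiB, roughly one 128K-token KV cache for a 70B-class model), the 80 GB of HBM per GPU, and the pool size are illustrative assumptions, not MemKV specifications.

```python
# Hypothetical sizing inputs; none of these are MemKV specifications.
BYTES_PER_CONTEXT = 40 * 2**30      # assumed KV cache for one 128K-token context (~40 GiB)
NUM_GPUS = 128
HBM_PER_GPU = 80 * 10**9            # assumed 80 GB HBM per GPU
SHARED_POOL = 10**15                # a 1 PB shared NVMe-backed pool


def contexts_that_fit(capacity_bytes: int, per_context: int = BYTES_PER_CONTEXT) -> int:
    """Whole contexts a tier can hold, ignoring metadata and fragmentation."""
    return capacity_bytes // per_context


hbm_total = NUM_GPUS * HBM_PER_GPU  # optimistic: model weights also occupy HBM
print(f"Aggregate HBM holds at most {contexts_that_fit(hbm_total):,} contexts")
print(f"A 1 PB pool holds about {contexts_that_fit(SHARED_POOL):,} contexts")
```

Even as an upper bound that ignores model weights, 128 GPUs' worth of HBM holds 238 such contexts, while the 1 PB pool holds 23,283; the shared tier's capacity grows with NVMe rather than with the number of GPUs.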

The system avoids traditional storage abstractions by moving data directly from NVMe into the AI data path via end-to-end RDMA transport. This eliminates overhead from HTTP protocols, file-system translation, and intermediary storage servers, which are common in object- and file-based architectures.

Key architectural elements include native execution on NVIDIA BlueField-4 STX as an ARM64 binary embedded in the storage layer, reducing reliance on external x86 storage nodes. Data transfers occur over RDMA from GPU memory to NVMe, bypassing conventional storage stacks. MemKV also uses larger block sizes, ranging from 2 MB to 16 MB, optimized for GPU throughput patterns rather than legacy 4 KB storage blocks. Networking performance is aligned with modern high-speed fabrics, including NVIDIA Spectrum-X Ethernet and PCIe Gen6, enabling near wire-speed data movement across the cluster.
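Block size matters because every transfer carries a fixed per-operation cost (request setup, completion handling), so fewer, larger transfers move the same data with less overhead. Counting operations for a fixed payload makes the difference concrete. The 40 GiB payload is an assumption chosen to resemble one long-context KV cache, and the sizes are treated as binary units (MiB), which the announcement does not specify.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def io_ops(total_bytes: int, block_bytes: int) -> int:
    """Number of block-sized transfers needed to move total_bytes (ceiling division)."""
    return -(-total_bytes // block_bytes)


# Assumed payload: about one long-context KV cache; illustrative, not a MemKV figure.
payload = 40 * GIB

for label, block in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("16 MiB", 16 * MIB)]:
    print(f"{label:>6} blocks: {io_ops(payload, block):>10,} operations")
```

Moving the same 40 GiB takes 10,485,760 operations with 4 KiB blocks, 20,480 with 2 MiB blocks, and 2,560 with 16 MiB blocks, a 512x to 4,096x reduction in the number of requests the data path must handle.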

Availability

MinIO MemKV is available immediately.

Harold Fritts

I have been in the tech industry since IBM created the Selectric. My background, though, is writing. So I decided to get out of the pre-sales biz and return to my roots, doing a bit of writing but still being involved in technology.
