MinIO has announced MemKV, a context memory store designed to address a growing bottleneck in large-scale AI inference environments. Positioned as the second core component of the company’s portfolio alongside AIStor, MemKV extends MinIO’s data infrastructure into the memory tier, targeting persistent, shared context for agentic AI workloads operating across GPU clusters.
As AI systems evolve from single-response interactions to multi-step reasoning and task execution, maintaining context across inference cycles has become critical. In current architectures, context is frequently lost due to limited capacity in GPU-adjacent memory tiers such as HBM and DRAM. This forces GPUs to recompute previously generated context, increasing latency, compute load, and energy consumption. MinIO characterizes this as a recompute tax that compounds at scale, particularly in hyperscale and cloud environments.
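To give a sense of why GPU-adjacent memory runs out, the sketch below estimates the size of the context for a single long sequence. It assumes the "context" in question is a transformer key/value cache, and every model dimension in it (80 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical illustration, not a figure from MinIO.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Estimate KV cache size for one sequence.

    The factor of 2 accounts for storing one key vector and one value
    vector per layer for every token.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model dimensions (assumptions, not from the announcement).
LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 128
BYTES_FP16 = 2

# A 128K-token context window, taken as 128 * 1024 tokens.
CONTEXT_TOKENS = 128 * 1024

per_token = kv_cache_bytes(1, LAYERS, KV_HEADS, HEAD_DIM, BYTES_FP16)
per_sequence = kv_cache_bytes(CONTEXT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM, BYTES_FP16)

print(f"KV cache per token:    {per_token / 1024:.0f} KiB")
print(f"KV cache per sequence: {per_sequence / 2**30:.0f} GiB")
```

Under these assumed dimensions, one full-length sequence occupies tens of gibibytes, so a handful of concurrent long-context sessions can exceed the memory of a single GPU and force eviction followed by recomputation.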
MemKV is designed to mitigate this issue by providing a shared, persistent memory layer capable of microsecond retrieval at petabyte scale. By maintaining context across inference operations, the platform reduces redundant computation and improves overall system efficiency. In internal benchmarks, MinIO reports improvements in time-to-first-token at production concurrency levels. In a representative deployment with 128 GPUs and 128K-token context windows, the company says GPU utilization increased from about 50 percent to over 90 percent, which it describes as yielding significant annual compute cost savings.
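The utilization figures can be turned into a rough "effective GPU" count. This is a back-of-the-envelope sketch using only the numbers quoted above (128 GPUs, about 50 percent and 90 percent utilization); the function name is illustrative, and the result says nothing about actual cost, which depends on pricing the announcement does not give.

```python
def effective_gpus(total_gpus, utilization):
    """GPU-equivalents doing useful work at a given utilization (0.0 to 1.0)."""
    return total_gpus * utilization


TOTAL_GPUS = 128

before = effective_gpus(TOTAL_GPUS, 0.50)  # about 50 percent, per MinIO
after = effective_gpus(TOTAL_GPUS, 0.90)   # "over 90 percent", lower bound
gain = after / before

print(f"Useful GPU-equivalents before: {before:.1f}")
print(f"Useful GPU-equivalents after:  {after:.1f}")
print(f"Relative useful throughput:    {gain:.2f}x")
```

Read this way, the same 128-GPU cluster delivers roughly the useful work of 115 GPUs instead of 64, assuming utilization maps linearly to useful output.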
MinIO’s leadership noted that recompute overhead has historically been masked in smaller deployments but becomes a structural inefficiency at scale. As GPU clusters grow, the cost of repeatedly regenerating context rises in both power consumption and infrastructure requirements, making purpose-built memory systems necessary for sustainable AI operations.
Traditional AI infrastructure forces a tradeoff between speed and scale. High-performance memory tiers such as HBM and DRAM provide microsecond latency but are capacity-constrained and expensive. Conversely, storage systems offer scale but introduce millisecond latency, which is unsuitable for real-time inference and long-context reasoning.
MemKV is designed to bridge this gap by introducing a shared memory tier that combines low-latency access with large-scale capacity. Built to run on NVIDIA BlueField-4 STX and integrated with NVIDIA Dynamo and NIXL, the platform enables an entire GPU cluster to access a common pool of context data at speeds aligned with inference requirements. This approach eliminates the need to shuttle context between disparate memory and storage layers, reducing latency and improving throughput.
MemKV is purpose-built for the inference data path and aligns with MinIO’s description of the G3.5 layer in the GPU memory hierarchy. It delivers petabyte-scale capacity on NVMe-based infrastructure while maintaining microsecond-level access characteristics, effectively decoupling memory scale from GPU compute resources.
The system avoids traditional storage abstractions by moving data directly from NVMe into the AI data path via end-to-end RDMA transport. This eliminates overhead from HTTP protocols, file-system translation, and intermediary storage servers, which are common in object- and file-based architectures.
Key architectural elements include native execution on NVIDIA BlueField-4 STX as an ARM64 binary embedded in the storage layer, which reduces reliance on external x86 storage nodes. Data transfers occur over RDMA between GPU memory and NVMe, bypassing conventional storage stacks. MemKV also uses larger block sizes, ranging from 2 MB to 16 MB, optimized for GPU throughput patterns rather than legacy 4 KB storage blocks. Performance is aligned with modern high-speed fabrics and interconnects, including NVIDIA Spectrum-X Ethernet and PCIe Gen6, enabling near wire-speed data movement across the cluster.
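The effect of block size on the number of transfer operations is simple arithmetic. The sketch below counts how many block-sized transfers are needed to move a 1 GiB payload at each block size mentioned above; the 1 GiB payload is an arbitrary illustration, and the sizes are treated as binary units (KiB, MiB), which is an assumption about how the quoted figures are meant.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def transfers_needed(payload_bytes, block_bytes):
    """Number of block-sized transfers required to move the payload (ceiling division)."""
    return -(-payload_bytes // block_bytes)


payload = 1 * GIB  # arbitrary example payload

block_sizes = [
    ("4 KB legacy block", 4 * KIB),
    ("2 MB block", 2 * MIB),
    ("16 MB block", 16 * MIB),
]

for label, block in block_sizes:
    print(f"{label:>18}: {transfers_needed(payload, block):>7} transfers")
```

Moving the same payload takes 262,144 transfers at 4 KB but only 512 at 2 MB and 64 at 16 MB, which illustrates why larger blocks reduce per-operation overhead for bulk, throughput-oriented access.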
MinIO MemKV is available immediately.