Tenstorrent QuietBox 2 Brings RISC‑V AI Inference to the Desktop

by Harold Fritts on March 11, 2026

Tenstorrent has introduced TT‑QuietBox 2 (Blackhole), a liquid‑cooled AI workstation designed to run models up to 120 billion parameters entirely on the desktop. The system combines a fully open‑source software stack with RISC‑V–based silicon and is positioned as a teraflop‑class inference platform that does not require racks, a server room, or specialized power.

Inference as the Primary AI Workload

QuietBox 2 targets developers and organizations that want direct control over their infrastructure, including visibility into silicon architecture at the compiler and runtime levels. It is aimed at on‑premises deployments in labs, offices, and small to medium businesses that need high‑end inference capability without moving to a rack‑scale footprint.

Tenstorrent CEO Jim Keller characterized the effort as a deliberate push to deliver an open, teraflop‑class development system suitable for lab or office environments, emphasizing that the platform is open throughout the stack, including mechanical design. Keller’s previous work includes serving as an architect on AMD Zen, Apple A4/A5, and Tesla Full Self-Driving chips.

Pre-integrated Workloads for Developers

QuietBox 2 ships with a set of real workloads that can be deployed immediately across language, multimodal, and scientific domains.

For LLMs and coding, GPT‑OSS 120B runs entirely on the device, keeping a full 120‑billion‑parameter model resident at the desk for private, local inference. Llama 3.1 70B is reported at 476.5 tokens per second, a notable figure for interactive workloads, code assistance, and agentic pipelines. Qwen3‑32B is presented as a local coding agent capable of reasoning through entire codebases without cloud token limits, providing a path for offline development copilots and constrained‑environment deployments.
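To put the reported Llama 3.1 70B figure in interactive terms, the short Python sketch below converts tokens per second into approximate generation time. It assumes the 476.5 tokens per second applies as a sustained decode rate for the response being measured; the article does not say whether the figure is per user or aggregated across a batch, and the function name is illustrative only.

```python
# Reported by Tenstorrent for Llama 3.1 70B on QuietBox 2 (from the article).
LLAMA_70B_TOKENS_PER_SECOND = 476.5


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to emit num_tokens at a constant decode rate.

    Assumption: the rate is sustained and excludes prompt processing time.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        secs = generation_seconds(n, LLAMA_70B_TOKENS_PER_SECOND)
        print(f"{n:>5} tokens: {secs:.2f} s")
```

If the figure is an aggregate across concurrent requests, per-request latency would be correspondingly higher than this estimate.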

On the creative and multimodal side, Flux handles image generation locally, while Wan 2.2 delivers on‑device video synthesis. Both use cases keep media assets and prompts on premises, which will appeal to teams with strict IP and data residency requirements.

For scientific research, the system targets specialized models such as Boltz‑2, a biomolecular ML workload. Tenstorrent reports that Boltz‑2 predicts the structure of a 686‑amino‑acid protein in 49 seconds on a single Blackhole processor, compared to roughly 45 minutes on a modern CPU, a speedup of about 55x. That result puts QuietBox 2 in line with flagship workstation GPUs on performance while focusing on cost and power efficiency. Running four protein structure predictions in parallel on a single system provides roughly 4x the throughput of single‑job runs, which is relevant for labs and biotech workflows.
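The arithmetic behind the Boltz‑2 comparison can be checked with a few lines of Python. This is a back-of-the-envelope sketch using only the figures quoted above; it assumes the four parallel jobs each finish in the same 49 seconds as a single job, which the article implies with its "roughly 4x" claim but does not state outright. The function names are illustrative.

```python
# Figures quoted in the article for a 686-amino-acid protein.
CPU_SECONDS = 45 * 60        # "roughly 45 minutes" on a modern CPU
BLACKHOLE_SECONDS = 49       # single Blackhole processor
PARALLEL_JOBS = 4            # structures predicted in parallel per system


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def structures_per_hour(seconds_per_structure: float, parallel_jobs: int = 1) -> float:
    """Throughput, assuming parallel jobs do not slow one another down."""
    return parallel_jobs * 3600 / seconds_per_structure


if __name__ == "__main__":
    print(f"Speedup vs. CPU: {speedup(CPU_SECONDS, BLACKHOLE_SECONDS):.1f}x")
    print(f"Single job:      {structures_per_hour(BLACKHOLE_SECONDS):.1f} structures/hour")
    print(f"Four in parallel: "
          f"{structures_per_hour(BLACKHOLE_SECONDS, PARALLEL_JOBS):.1f} structures/hour")
```

Real throughput depends on protein length and on whether parallel jobs contend for shared resources, so these numbers are an upper-bound estimate.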

For models beyond the preinstalled catalog, TT‑Forge, Tenstorrent’s open‑source AI compiler, accepts graphs from PyTorch, ONNX, TensorFlow, JAX, and PaddlePaddle and targets them directly to the hardware. The design principle is straightforward: if a model runs on a standard framework, it should be possible to compile and run it on QuietBox 2.

RISC‑V Silicon and Memory Architecture

At the hardware level, QuietBox 2 uses four Blackhole ASICs operating as a unified mesh inside a desk‑friendly enclosure. Together they provide 480 Tensix cores and a claimed 2,654 TFLOPS at BlockFP8 precision, backed by 128 GB of GDDR6 high‑speed memory and 256 GB of DDR5 system memory.
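A rough sizing exercise shows how these totals relate to the 120‑billion‑parameter claim. The Python sketch below divides the system totals evenly across the four ASICs and estimates weight storage at a few bit widths. It is a simplification under stated assumptions: the even per-chip split is inferred rather than stated in the article, GB is treated as 10^9 bytes, and the estimate ignores KV cache, activations, and any per-block scaling metadata that block floating-point formats add.

```python
# System totals quoted in the article.
NUM_CHIPS = 4
TOTAL_TENSIX_CORES = 480
TOTAL_TFLOPS_BLOCKFP8 = 2654
TOTAL_GDDR6_GB = 128


def per_chip(total: float, chips: int = NUM_CHIPS) -> float:
    """Even split of a system-wide total across the ASICs (an assumption)."""
    return total / chips


def weight_footprint_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return params_billions * bits_per_param / 8


if __name__ == "__main__":
    print(f"Tensix cores per chip: {per_chip(TOTAL_TENSIX_CORES):.0f}")
    print(f"BlockFP8 TFLOPS per chip: {per_chip(TOTAL_TFLOPS_BLOCKFP8):.1f}")
    print(f"GDDR6 per chip: {per_chip(TOTAL_GDDR6_GB):.0f} GB")
    for bits in (16, 8, 4):
        size = weight_footprint_gb(120, bits)
        verdict = "fits" if size <= TOTAL_GDDR6_GB else "does not fit"
        print(f"120B params at {bits}-bit: {size:.0f} GB, {verdict} in GDDR6")
```

Under these assumptions, a 120‑billion‑parameter model needs 8‑bit or narrower weights to sit within the 128 GB of GDDR6, which is consistent with the system's emphasis on block floating-point precision.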

The architecture integrates compute and high‑density SRAM on a single die, following a dataflow approach that prioritizes efficient tensor movement through on‑chip memory. By leaning on SRAM and GDDR6, the system is designed to sidestep traditional DRAM bandwidth bottlenecks that limit sustained throughput on conventional CPU and GPU platforms. Tenstorrent also explicitly avoids High‑Bandwidth Memory (HBM), which insulates QuietBox 2 from some of the supply and pricing pressures currently affecting GPU‑class accelerators.

The workstation runs Ubuntu 24.04 and plugs into a standard 120V wall outlet, with no requirement for dedicated power, racks, or a controlled server room. That configuration aligns it more with high‑end professional desktops than with datacenter hardware, even though its inference performance targets rack‑class use cases.

Open Source from Compiler to Kernel

A central design point for QuietBox 2 is that its software stack is entirely open source. Tenstorrent promotes this as full‑stack visibility for teams that need to understand and audit behavior at every level of the system.

TT‑Forge provides control over graph lowering, transformation, and optimization flows, exposing how models are mapped onto the underlying hardware. TT‑Metalium serves as the low‑level AI SDK, providing kernel‑level control and deterministic execution for developers who need to tune performance or implement custom operators. TT‑LLK handles low‑level kernel software, extending visibility into how workloads execute on the Tensix cores.

This structure allows developers to inspect the pipeline from model graph to kernel execution, debug at the hardware boundary, fork or modify individual components, and adapt the stack to specialized workloads. For regulated industries, sovereign AI initiatives, and research institutions that must document how infrastructure manages data and executes models, this level of transparency is a core part of the platform proposition rather than an optional add‑on.

QuietBox 2 ships with Ubuntu 24.04 preconfigured, along with the full open‑source stack and TT‑Studio, Tenstorrent’s development environment. The goal is to make the system usable out of the box for both experimentation and production‑grade inference.

Power, Thermals, and Desk‑Side Deployment

The second‑generation QuietBox emphasizes power and acoustic efficiency. Tenstorrent reports that engineering changes have cut idle power consumption and heat output by roughly 50 percent compared to previous generations. The new liquid‑cooled chassis is designed for quiet, sustained operation under heavy workloads in a desk‑side setting rather than a dedicated equipment room.

Alongside the hardware changes, Tenstorrent has expanded documentation and developer tooling to target faster bring‑up and shorter time to usable performance. For small and medium businesses, the combination of a single 120V plug, liquid cooling, and a preinstalled stack is intended to provide rack‑class inference capability without adding server-room build‑outs or incremental IT headcount.

Availability

TT‑QuietBox 2 is scheduled to ship globally in Q2 2026, with pricing starting at $9,999. Interested organizations can join the waitlist via Tenstorrent’s site at www.tenstorrent.com/waitlist/tt-quietbox.

Harold Fritts

I have been in the tech industry since IBM created Selectric. My background, though, is writing. So I decided to get out of the pre-sales biz and return to my roots, doing a bit of writing but still being involved in technology.
