At CES 2026, NVIDIA outlined a software update for DGX Spark that expands its role as a compact, on-premises AI system. The changes focus on new runtimes, quantization formats, and deployment playbooks that make it easier to run open-source models, automated workflows, and creator and 3D pipelines locally. The update is intended for teams already using DGX Spark in labs, edge deployments, or engineering environments and does not introduce new hardware.
Open-Source AI in Enterprise Settings
NVIDIA highlighted the rapid rise of open-source AI: monthly downloads of open-source AI frameworks and models grew from around seven million in June 2023 to approximately 390 million by December 2025. DGX Spark has been updated to follow this trend, with PyTorch, vLLM, SGLang, llama.cpp, LlamaIndex, and many community models from Qwen, Meta, Stability, and Wan now supported out of the box.
For businesses standardizing on open tools, this means less custom integration work and more reliable operation for the stacks they already use.
Up to 2.5× Faster Since Launch
NVIDIA claims that performance on key workloads has improved by up to 2.5× compared to DGX Spark at launch. This boost comes from enhancements in TensorRT-LLM, aggressive quantization, and decoding optimizations rather than hardware changes.
A specific example is Qwen-235B under TensorRT-LLM: moving from FP8 to NVFP4 quantization and adding speculative decoding with Eagle3 more than doubles throughput. Qwen3-30B under CUDA and Stable Diffusion 3.5 Large under llama.cpp each achieve roughly 1.4× improvements. Fine-tuning, whether full or parameter-efficient, shows noticeable but smaller gains as PyTorch pipelines are optimized for Spark's GPUs.
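Speculative decoding is easy to see in miniature: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in one pass, keeping the agreeing prefix. The sketch below is a toy illustration in plain Python with made-up integer "token" models, not the TensorRT-LLM/Eagle3 API:

```python
def draft_next(token):
    # Toy draft model: cheap guess, usually agrees with the target.
    return token + 1

def target_next(token):
    # Toy target model: the "ground truth" next token, diverging on multiples of 5.
    return token + 1 if token % 5 != 0 else token + 2

def speculative_step(token, k=4):
    """Draft proposes k tokens; target verifies and keeps the agreeing prefix."""
    proposals, t = [], token
    for _ in range(k):
        t = draft_next(t)
        proposals.append(t)
    accepted, t = [], token
    for p in proposals:
        correct = target_next(t)
        if p == correct:
            accepted.append(p)       # draft guessed right: a "free" token
            t = p
        else:
            accepted.append(correct) # target corrects the draft and we stop
            break
    return accepted  # one target pass yielded len(accepted) tokens

seq = [1]
while len(seq) < 12:
    seq.extend(speculative_step(seq[-1]))
print(seq)
```

When draft and target agree, one verification pass emits up to k+1 tokens instead of one, which is where the throughput gain comes from.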
Playbooks: From “Box on Desk” to Usable Workflows
To bridge the gap between raw computing power and usable solutions, NVIDIA is introducing new DGX Spark playbooks that combine tools, models, and guides for common enterprise tasks.
The new content includes:
- Inference: vLLM, SGLang, TensorRT-LLM, speculative decoding, multi-modal inference, NIM on Spark, NVFP4 quantization, and local Nemotron-3 Nano.
- Fine-tuning: multi-Spark PyTorch fine-tuning, FLUX.1 DreamBooth LoRA, LLaMA Factory, NeMo, and Unsloth-based efficient training.
- Data science: CUDA-X Data Science, optimized JAX, text-to-knowledge-graph, and domain-specific workflows like single-cell RNA sequencing and quantitative portfolio optimization.
- Tooling and connectivity: ComfyUI, VS Code, DGX Dashboard, RAG in AI Workbench, Tailscale, and NCCL-based “Connect Two Sparks.”
The focus is less on flashy demos and more on repeatability. These playbooks serve as pre-validated reference frameworks that can be deployed on-premises without reworking the entire stack.
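To make one of these workflows concrete, the text-to-knowledge-graph playbook boils down to pulling (subject, relation, object) triples out of text. A minimal regex-based sketch of that idea follows; the corpus and relation set are invented for illustration, and the real playbook would use LLM-based extraction rather than patterns:

```python
import re

# Hypothetical mini-corpus; a real pipeline would extract triples with an LLM.
TEXT = (
    "DGX Spark runs vLLM. "
    "vLLM serves Qwen3-30B. "
    "TensorRT-LLM accelerates Nemotron-3 Nano."
)

def extract_triples(text):
    """Naive pattern: only '<Subject> <relation> <Object>.' sentences."""
    triples = []
    for sentence in re.split(r"\.\s*", text):
        m = re.match(r"(.+?) (runs|serves|accelerates) (.+)", sentence)
        if m:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

graph = extract_triples(TEXT)
for s, r, o in graph:
    print(f"{s} --{r}--> {o}")
```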
8× Faster AI Video Pipelines with Hybrid MacBook and Spark
In content creation, NVIDIA tested AI video generation workflows pairing a MacBook Pro M4 Max with DGX Spark via ComfyUI and FLUX.1-dev, WAN 2.2, and GPU-accelerated upscalers.
The reference pipeline, a 4K video of a red sports car in a futuristic city, showcased the improvements. Running the entire workflow on the MacBook alone took about eight minutes; offloading the heavy processing to DGX Spark cut the time to around one minute, an 8× speedup.
This progress arises from operating FLUX.1-dev and WAN 2.2 in quantized NVFP4/NVFP8 on Spark, followed by RTX Video Super Resolution. For media, design, and marketing teams, this transforms generative video from batch-style rendering to a process much closer to interactive iteration while keeping control within the data center or edge rack.
3D and RTX Remix: Spark as a Background AI Co-Processor
DGX Spark is also positioned as a co-processor for RTX Remix-based 3D workflows: an RTX 5090 system runs the interactive modding environment while Spark generates AI textures in batches (e.g., 100 materials) and feeds the results back into the scene without interrupting the artists.
A before-and-after comparison using SWAT 4 illustrated how AI-enhanced materials can revitalize legacy assets. However, the key business benefit lies in workflow design: Spark acts as a continuous background engine for generative tasks while primary workstations remain active.
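The co-processor pattern described above is, at its core, a background work queue: the workstation keeps queuing jobs and stays interactive while a second machine drains the batch. The sketch below mimics that split with local threads; `generate_texture` is a made-up placeholder for the expensive generative step that would run on Spark:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def generate_texture(name):
    # Placeholder for the generative texture work that would run on Spark.
    return f"{name}_enhanced"

def spark_worker():
    # Drains the queue in the background, like Spark batching 100 materials.
    while True:
        name = jobs.get()
        if name is None:
            break
        results.append(generate_texture(name))
        jobs.task_done()

worker = threading.Thread(target=spark_worker, daemon=True)
worker.start()

# The "workstation" enqueues materials without blocking on generation.
for i in range(100):
    jobs.put(f"material_{i:03d}")

jobs.join()     # wait for the whole batch to finish
jobs.put(None)  # signal the worker to stop
worker.join()
print(len(results), results[0])
```

The design point is the same one NVIDIA is making: the producer never waits on an individual texture, only (optionally) on the completed batch.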
Nsight Copilot: Local CUDA AI Assistant
One noteworthy addition is Nsight Copilot, which operates locally on DGX Spark.
Developers can ask for tasks such as “write matmul in FP4 in CUDA,” and Nsight Copilot produces a complete cuBLASLt example that quantizes FP32 inputs to FP4 on-device, sets per-tensor scaling, runs the GEMM with FP32 accumulation, and returns FP32 output. Importantly, this happens without sending code or data to a cloud service:
- No cloud inference costs.
- Source, data, and IP stay on-premises.
- Local GPUs manage latency and throughput.
For organizations with strict data governance rules, this allows for AI-assisted development while keeping everything secure within their network.
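The quantize-scale-multiply-dequantize pipeline that Nsight Copilot generates can be mimicked in NumPy to see what per-tensor FP4 scaling actually does. This is a toy simulation of the arithmetic (using the FP4 E2M1 value grid), not the generated cuBLASLt code:

```python
import numpy as np

# Magnitudes representable in FP4 (E2M1): 0, 0.5, 1, 1.5, 2, 3, 4, 6.
FP4_GRID = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])

def quantize_fp4(x):
    """Pick a per-tensor scale so the max magnitude maps to FP4's max (6),
    then snap every scaled value to the nearest representable FP4 number."""
    scale = np.abs(x).max() / 6.0
    scaled = x / scale
    idx = np.abs(np.abs(scaled)[:, :, None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q.astype(np.float32), scale

a = np.array([[1.0, -2.0], [3.0, 0.5]], dtype=np.float32)
b = np.array([[0.25, 1.0], [2.0, -1.0]], dtype=np.float32)

qa, sa = quantize_fp4(a)
qb, sb = quantize_fp4(b)

# GEMM on quantized values, rescaled back to an FP32 result.
c = (qa @ qb) * (sa * sb)
print(c)
print(np.abs(c - a @ b).max())  # quantization error of the FP4 path
```

Comparing `c` against the exact `a @ b` shows the error introduced by snapping values to the sparse FP4 grid, which is the trade-off the per-tensor scale is there to minimize.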
Agents, Robotics, and Edge: From Lab to Production
NVIDIA is expanding Spark into agentic and robotics applications through a partnership with Hugging Face and the open-source Reachy Mini robot. DGX Spark operates the agents, LLMs, VLMs, and planners, while Reachy Mini provides a physical form suitable for human-robot interaction research, education, and prototyping.
DGX Spark will be included in NVIDIA AI Enterprise by the end of January. This offers businesses a supported software stack, lifecycle management, and security updates for running Spark in edge and on-prem scenarios, such as smart manufacturing and inspection, intelligent retail and loss prevention, and point-of-care healthcare applications.
Ecosystem and OEM Support
NVIDIA concluded the update by highlighting the extensive ecosystem: model providers (OpenAI, Mistral, Stability, Qwen, Meta, ElevenLabs, Black Forest Labs, and others), tools (vLLM, ComfyUI, LlamaIndex, Docker, Jupyter, Weights & Biases, Anyscale, along with scientific codes like VASP and GROMACS), and a wide range of OEMs, including Dell, Lenovo, HP, ASUS, MSI, Supermicro, and more.
The message for enterprise buyers is that DGX Spark is no longer just a developer-focused tool. It is a small-form-factor node that fits into existing tools, vendor relationships, and production AI strategies.