A New Epoch in AI Hardware: At CES 2026, NVIDIA Didn’t Just Unveil a New Chip
At CES 2026 in Las Vegas, NVIDIA Chief Executive Jensen Huang stood before an audience of technologists, developers and global press and delivered a message that will define the next decade of computing: “The future of AI depends not on isolated processors but on system-scale innovation.” To make that future real, NVIDIA introduced the Vera Rubin architecture, a thoroughly re-engineered AI computing platform designed to propel trillion-parameter models and advanced reasoning workloads beyond the limits of current hardware.
Unlike previous hardware milestones that focused narrowly on faster chips, Vera Rubin represents an end-to-end infrastructure shift. It tackles the most pressing constraints in modern AI, memory bandwidth and context scalability, with radical design choices across processors, interconnects, storage, and integrated networking. The result is a platform built not just for speed, but for scale, cost-efficiency and AI’s next frontier.
Why the “Vera Rubin” Name Matters
The architecture’s name is inspired by Vera Florence Cooper Rubin, an American astronomer whose pioneering observations fundamentally altered our understanding of the universe’s structure. In a similar spirit, NVIDIA’s Rubin platform seeks to alter the architecture of large-scale AI computing, expanding the horizons of what machines can model, reason and generate at massive scales.
Just as Rubin’s work challenged assumptions about dark matter and cosmic motion, the Vera Rubin platform challenges a core assumption in AI hardware: that performance gains come only from incremental transistor counts or clock-speed boosts. Instead, NVIDIA’s approach redefines the bottlenecks, elevating memory bandwidth, sparse reasoning, and system integration to first-class design goals.
Six-Chip Ecosystem: More Than a GPU
At the core of Vera Rubin is a radical idea: components should be designed together, not in isolation. The Rubin platform consists of six tightly integrated chips that function as a single AI supercomputer when deployed in rack-scale configurations.
1. Rubin GPU: A 336-Billion-Transistor Powerhouse
The heart of the architecture is the Rubin GPU, with approximately 336 billion transistors, roughly 1.6× the transistor count of NVIDIA’s prior Blackwell chips. Each GPU pairs with HBM4 memory, up to 288 GB per GPU at 22 TB/s of bandwidth, delivering transformative capability for models that must manipulate vast token contexts.
- Inference Performance: ~50 PFLOPS NVFP4, about five times Blackwell’s output in key AI workloads.
- Training Performance: ~35 PFLOPS, significantly accelerating model convergence.
- Memory Breakthrough: 22 TB/s bandwidth redefines data access speeds critical to large-model training and inference.
This shift isn’t just about raw compute; it’s about feeding that compute rapidly and without bottlenecks, a longstanding challenge as models have ballooned in parameter counts.
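To make that point concrete, here is a back-of-envelope roofline check in Python built only from the headline figures above. The 4-bit NVFP4 weight size, the 2-FLOPs-per-parameter rule of thumb, and the batch-1 decoding scenario are our own illustrative assumptions, not NVIDIA’s numbers.

```python
# Back-of-envelope roofline check using the Rubin figures quoted above.
PEAK_NVFP4_FLOPS = 50e15   # ~50 PFLOPS NVFP4 inference (article figure)
HBM4_BANDWIDTH   = 22e12   # ~22 TB/s memory bandwidth (article figure)

# A kernel is compute-bound only if its arithmetic intensity
# (FLOPs per byte moved) exceeds the machine balance point.
machine_balance = PEAK_NVFP4_FLOPS / HBM4_BANDWIDTH
print(f"Machine balance: ~{machine_balance:.0f} FLOPs/byte")   # ~2273

# Assumed scenario: batch-1 decoding of a dense trillion-parameter model
# streams every weight once per token. At 0.5 byte/param (4-bit NVFP4)
# and ~2 FLOPs/param, intensity is ~4 FLOPs/byte -- far below the balance
# point, so decoding stays memory-bound and bandwidth sets the token rate.
params      = 1e12                        # assumed model size
bytes_moved = params * 0.5                # 4-bit weights
t_per_token = bytes_moved / HBM4_BANDWIDTH
print(f"Weight-streaming floor: {t_per_token*1e3:.1f} ms/token "
      f"(~{1/t_per_token:.0f} tokens/s per GPU)")
```

On those assumptions, even a 22 TB/s GPU can emit only a few dozen tokens per second for a dense trillion-parameter model at batch 1, which is exactly why bandwidth, batching and sparsity dominate the design conversation.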
2. Vera CPU: Orchestrating Data for Reasoning
To complement the GPU’s raw throughput, NVIDIA introduced a custom Vera CPU featuring 88 ARM-based “Olympus” cores with spatial multi-threading for 176 threads and massive memory capacity. This CPU is engineered not as a traditional processor but as a data movement and coordination hub, essential for tasks such as long-context reasoning and agentic AI workflows that involve dynamic memory access.
Each Vera CPU integrates 1.5 TB of LPDDR5X memory with bandwidth scaled to match the GPU’s throughput, enabling seamless data exchange across AI workloads. This massive memory reservoir addresses a growing trend: modern AI agents require far more than GPU-local memory; they demand unified, coherent memory that spans CPU and GPU domains.
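As a rough illustration of what that coherent pool buys, the sketch below combines the per-chip memory figures quoted in this article with the two-GPUs-per-CPU ratio implied by the NVL72 configuration described later. The 20% HBM headroom and the weights-in-HBM, context-in-LPDDR split are hypothetical workload assumptions.

```python
# Sketch: the unified memory pool on one Vera node (assumptions labeled).
GPU_HBM4_GB  = 288    # per Rubin GPU (article figure)
CPU_LPDDR_GB = 1536   # 1.5 TB LPDDR5X per Vera CPU (article figure)
GPUS_PER_CPU = 2      # implied by NVL72's 72 GPUs : 36 CPUs

unified_pool_gb = CPU_LPDDR_GB + GPUS_PER_CPU * GPU_HBM4_GB
print(f"Coherent pool per node: {unified_pool_gb} GB")        # 2112 GB

# Hypothetical agentic split: weights stay hot in HBM (with ~20% headroom
# for activations), while long-lived context spills to CPU memory.
weights_budget_gb = GPUS_PER_CPU * GPU_HBM4_GB * 0.8
print(f"~{weights_budget_gb:.0f} GB of HBM for weights, "
      f"{CPU_LPDDR_GB} GB of coherent LPDDR5X for spilled context")
```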
3. Interconnects, Networking and Storage Chips
The Vera Rubin platform also breaks new ground in how data moves and lives in an AI ecosystem:
- NVLink 6 Switch: Provides up to 3.6 TB/s bidirectional bandwidth between GPUs, crucial for coherent scaling across racks.
- ConnectX-9 SuperNIC: Ultra-fast network interface cards tailored for distributed AI workloads.
- BlueField-4 DPU: Offloads data processing functions, helping accelerate inference and reduce system overhead.
- Spectrum-X Ethernet Switch: High-capacity networking infrastructure supporting multi-rack deployments.
In effect, these pieces unify compute, storage and networking into a cohesive AI engine far beyond what traditional server architectures deliver.
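One way to see why the per-GPU NVLink figure matters is a simple synchronous-training sketch. The trillion-parameter model, 16-bit gradients and classic ring all-reduce cost model below are our assumptions, and real systems overlap much of this communication with compute.

```python
# Sketch: gradient-synchronization floor over NVLink 6 (assumptions labeled).
NVLINK6_BW = 3.6e12     # bytes/s bidirectional per GPU (article figure)
grad_bytes = 1e12 * 2   # assumed: 1T parameters, 16-bit gradients

# Ring all-reduce moves ~2 * (N-1)/N * data_size per GPU; for large GPU
# counts that approaches 2x the gradient size, independent of N.
allreduce_floor_s = 2 * grad_bytes / NVLINK6_BW
print(f"Gradient sync floor per step: ~{allreduce_floor_s:.2f} s")  # ~1.11 s
```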
Scaling Trillion-Parameter Models: Breaking Bottlenecks
By 2026, AI research has entered a new design frontier: models with parameter counts in the trillions. While advances in model architectures have accelerated, hardware has struggled to keep pace, especially in memory handling and efficient data flow. Rubin’s architecture addresses this head-on.
Memory Bandwidth as the New Performance Frontier
Historically, AI performance scaled with raw FLOPS. But as researchers and engineers push toward agentic models, the real constraint is memory bandwidth and latency. Vast model state, long-context windows, and multi-step reasoning amplify the need for lightning-fast access to context data.
- HBM4 Memory: At 22 TB/s per GPU, the Rubin platform dissolves traditional bottlenecks, ensuring that compute units spend more cycles working and fewer waiting on data.
- Inference Context Memory Storage: A new storage tier powered by BlueField-4 DPUs shifts large token caches out of limited GPU HBM into scalable shared memory, boosting throughput up to 5× over legacy architectures.
These innovations are key for mixture-of-experts (MoE) models and other architectures that split computation across multiple subnetworks, allowing mega-models to be trained with up to 4× fewer GPUs and at a fraction of token cost compared to the Blackwell generation.
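A quick sizing exercise shows why token caches outgrow GPU HBM in the first place. The article gives no model dimensions, so every shape below (layer count, KV heads, head size, context length, session count) is an illustrative assumption for a large dense transformer.

```python
# Sketch: KV-cache footprint for long-context serving (all shapes assumed).
layers, kv_heads, head_dim = 128, 16, 128
bytes_per_elem = 2            # 16-bit K and V entries
context_len    = 1_000_000    # a million-token context window
sessions       = 64           # concurrent long-context sessions

# Per token: one K vector and one V vector per layer per KV head.
kv_per_token = layers * kv_heads * head_dim * 2 * bytes_per_elem
total_gb = kv_per_token * context_len * sessions / 1e9
print(f"{kv_per_token / 2**20:.1f} MiB of KV cache per token; "
      f"{total_gb:,.0f} GB across {sessions} sessions")
```

At roughly a mebibyte per token under these assumptions, a single million-token session already exceeds a Rubin GPU’s 288 GB of HBM several times over, which is the gap the BlueField-4-backed context-memory tier is meant to absorb.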
Rack-Scale Integration: NVL72 and Beyond
NVIDIA didn’t simply build a chip; it made the entire rack the unit of compute. The NVL72, a rack-scale system consisting of 72 Rubin GPUs, 36 Vera CPUs, and advanced interconnects, acts as a single massive logical GPU, delivering exascale performance in a modular form factor; the sketch after the list below tallies the implied aggregates.
- Exascale AI Performance: NVL72 systems can deliver performance on par with leading AI supercomputers, making them suitable for research labs, cloud service providers and enterprises alike.
- Modular Design: Liquid-cooled and cable-free configurations dramatically reduce deployment time and operational complexity.
- High-Speed Networking: Up to 260 TB/s total NVLink bandwidth across a rack ensures tight coupling between compute and memory.
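Multiplying out the per-chip numbers quoted earlier gives a feel for those rack-level aggregates. This is plain arithmetic on the article’s figures, not a measured benchmark.

```python
# Sketch: NVL72 rack aggregates implied by the per-chip figures above.
GPUS, CPUS = 72, 36
rack_nvfp4_eflops = GPUS * 50 / 1000     # 72 x ~50 PFLOPS  -> ~3.6 EFLOPS
rack_hbm4_tb      = GPUS * 288 / 1000    # 72 x 288 GB HBM4 -> ~20.7 TB
rack_lpddr_tb     = CPUS * 1.5           # 36 x 1.5 TB      -> 54 TB
print(f"~{rack_nvfp4_eflops:.1f} EFLOPS NVFP4, ~{rack_hbm4_tb:.1f} TB HBM4, "
      f"~{rack_lpddr_tb:.0f} TB LPDDR5X per NVL72 rack")
```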
Major cloud partners, including AWS, Microsoft Azure, Google Cloud and others, will begin offering Vera Rubin-based services in the second half of 2026, making this technology accessible beyond NVIDIA’s own data centers.
Strategic Implications and Competitive Landscape
NVIDIA’s unveiling of Vera Rubin is more than a product launch; it’s a strategic statement that hardware remains central to AI progress. As AI workloads evolve from static model training toward continuous reasoning, agentic AI and AI factories, high-bandwidth memory and integrated systems will define competitive advantage.
Cloud and Data Center Dynamics
Hyperscalers are already embracing Rubin:
- Nebius plans to deploy NVL72 systems across the U.S. and Europe, enabling next-generation AI applications.
- Early commitments from AWS, Google Cloud and Azure underscore the demand for Rubin infrastructure.
Industry Challenges Ahead
While Rubin redefines performance, supply-chain pressures, especially for cutting-edge HBM4 memory, and global competition in AI hardware remain serious challenges. Ensuring widespread availability across regions and industries will be key to maintaining NVIDIA’s leadership.
New Standard for AI Infrastructure
NVIDIA’s Vera Rubin architecture marks a turning point in AI infrastructure. It moves beyond the CPU-GPU dichotomy that dominated the previous decade, embracing system-level design that directly responds to the needs of trillion-parameter AI models, massive-context reasoning, and next-generation AI services.
By integrating GPU compute, server-grade CPUs, ultra-high-bandwidth memory, advanced interconnects and scalable storage, Vera Rubin doesn’t just accelerate AI workloads; it redefines the very architecture of compute. It signals that the future of AI lies not in isolated chips, but in holistic platforms built for scale, agility and reasoning.
In an era where AI models grow exponentially and applications demand unprecedented throughput, the Vera Rubin platform may prove to be the Rosetta Stone of AI infrastructure. It answers a fundamental question: how do we build systems that not only compute faster, but think bigger? The answer, at CES 2026, was unveiled in full.

