NVIDIA introduces Rubin platform for large-scale AI systems

The platform combines six new chips and rack-scale systems, with early adoption expected across cloud providers, AI labs and system manufacturers.

NVIDIA launched the Rubin platform, which includes six chips intended to be used together in a rack-scale AI system. NVIDIA said Rubin is designed to support building, deploying and securing large AI systems while reducing training time and inference cost.

The Rubin platform uses extreme codesign across its six chips: the NVIDIA Vera CPU, NVIDIA Rubin GPU, NVIDIA NVLink 6 Switch, NVIDIA ConnectX-9 SuperNIC, NVIDIA BlueField-4 DPU and NVIDIA Spectrum-6 Ethernet Switch. Together, they slash training time and inference token costs.

Named for Vera Florence Cooper Rubin — the trailblazing American astronomer whose discoveries transformed humanity’s understanding of the universe — the Rubin platform features the NVIDIA Vera Rubin NVL72 rack-scale solution and the NVIDIA HGX Rubin NVL8 system.

The Rubin platform introduces five innovations: the latest generations of NVIDIA NVLink interconnect technology, the Transformer Engine, Confidential Computing and the RAS Engine, as well as the NVIDIA Vera CPU. These breakthroughs will accelerate agentic AI, advanced reasoning and massive-scale mixture-of-experts (MoE) model inference at up to 10x lower cost per token than the NVIDIA Blackwell platform. The Rubin platform also trains MoE models with 4x fewer GPUs than its predecessor, accelerating AI adoption.
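
To make those headline ratios concrete, here is a trivial arithmetic sketch. Only the 10x and 4x ratios come from this announcement; the baseline cost and GPU counts below are made-up placeholders, not NVIDIA figures.

```python
# Illustrative arithmetic only: the 10x and 4x ratios are NVIDIA's claims;
# the baseline values below are made-up placeholders.

blackwell_cost_per_mtok = 1.00                       # hypothetical $ per 1M tokens
rubin_cost_per_mtok = blackwell_cost_per_mtok / 10   # "up to 10x lower cost per token"

blackwell_gpus_for_moe_job = 4096                    # hypothetical training job size
rubin_gpus_for_same_job = blackwell_gpus_for_moe_job // 4  # "4x fewer GPUs"

print(rubin_cost_per_mtok, rubin_gpus_for_same_job)  # 0.1 1024
```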

Broad ecosystem support

Among the world’s leading AI labs, cloud service providers, computer makers and startups expected to adopt Rubin are Amazon Web Services (AWS), Anthropic, Black Forest Labs, Cisco, Cohere, CoreWeave, Cursor, Dell Technologies, Google, Harvey, HPE, Lambda, Lenovo, Meta, Microsoft, Mistral AI, Nebius, Nscale, OpenAI, OpenEvidence, Oracle Cloud Infrastructure (OCI), Perplexity, Runway, Supermicro, Thinking Machines Lab and xAI.

Engineered to scale intelligence

Agentic AI and reasoning models, along with state-of-the-art video generation workloads, are redefining the limits of computation. Multistep problem-solving requires models to process, reason and act across long sequences of tokens. To serve the demands of these complex AI workloads, the Rubin platform introduces five groundbreaking technologies:

  • Sixth-Generation NVIDIA NVLink: Delivers the fast, seamless GPU-to-GPU communication required for today’s massive MoE models. Each GPU offers 3.6TB/s of bandwidth, while the Vera Rubin NVL72 rack provides 260TB/s, more bandwidth than the entire internet (see the quick arithmetic check after this list). With built-in, in-network compute to speed collective operations, as well as new features for enhanced serviceability and resiliency, the NVIDIA NVLink 6 Switch enables faster, more efficient AI training and inference at scale.
  • NVIDIA Vera CPU: Designed for agentic reasoning, NVIDIA Vera is the most power‑efficient CPU for large-scale AI factories. The NVIDIA CPU is built with 88 NVIDIA custom Olympus cores, full Armv9.2 compatibility and ultrafast NVLink-C2C connectivity. Vera delivers exceptional performance, bandwidth and industry‑leading efficiency to support a full range of modern data center workloads.
  • NVIDIA Rubin GPU: Featuring a third-generation Transformer Engine with hardware-accelerated adaptive compression, Rubin GPU delivers 50 petaflops of NVFP4 compute for AI inference.
  • Third-Generation NVIDIA Confidential Computing: Vera Rubin NVL72 is the first rack-scale platform to deliver NVIDIA Confidential Computing, which maintains data security across CPU, GPU and NVLink domains, protecting the world’s largest proprietary models and training and inference workloads.
  • Second-Generation RAS Engine: The Rubin platform — spanning GPU, CPU and NVLink — features real-time health checks, fault tolerance and proactive maintenance to maximize system productivity. The rack’s modular, cable-free tray design enables up to 18x faster assembly and servicing than Blackwell.
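
As a quick sanity check, the per-GPU and rack-level NVLink figures cited above are consistent: 72 GPUs at 3.6TB/s each yields roughly 260TB/s. The short sketch below works through that arithmetic; it assumes aggregate bandwidth simply sums across GPUs, which is a simplification of the real switched fabric.

```python
# Back-of-the-envelope check of the NVLink 6 figures cited above.
# Assumes aggregate rack bandwidth is per-GPU bandwidth times GPU count,
# a simplification of the real switched-fabric topology.

GPUS_PER_RACK = 72            # Vera Rubin NVL72
PER_GPU_NVLINK_TBPS = 3.6     # TB/s per GPU, per the announcement

aggregate_tbps = GPUS_PER_RACK * PER_GPU_NVLINK_TBPS
print(f"Aggregate NVLink bandwidth: {aggregate_tbps:.1f} TB/s")
# -> 259.2 TB/s, consistent with the ~260TB/s figure above
```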

AI-native storage and secure, software-defined infrastructure

NVIDIA Rubin introduces NVIDIA Inference Context Memory Storage Platform, a new class of AI-native storage infrastructure designed to scale inference context at gigascale.

Powered by NVIDIA BlueField-4, the platform enables efficient sharing and reuse of key-value cache data across AI infrastructure, improving responsiveness and throughput while enabling predictable, power-efficient scaling of agentic AI.
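
NVIDIA has not published an interface for the Inference Context Memory Storage Platform, so the following is only a minimal, hypothetical sketch of the underlying idea: caching transformer key-value state by prompt prefix so shared context is prefilled once and reused rather than recomputed. Every name here (PrefixKVCache, prefill_fn) is illustrative, not NVIDIA software.

```python
import hashlib

# Hypothetical sketch of key-value cache reuse: store per-layer KV state
# keyed by a hash of the prompt prefix, so a shared prefix (e.g. a long
# system prompt) is prefilled once and reused across sessions. This
# illustrates the concept only; it is not NVIDIA's actual interface.

class PrefixKVCache:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV state

    def _key(self, prefix_tokens):
        return hashlib.sha256(str(prefix_tokens).encode("utf-8")).hexdigest()

    def get_or_compute(self, prefix_tokens, prefill_fn):
        key = self._key(prefix_tokens)
        if key not in self._store:
            # Cache miss: run the expensive prefill exactly once.
            self._store[key] = prefill_fn(prefix_tokens)
        return self._store[key]

# Usage: two sessions sharing a system prompt prefill it only once.
cache = PrefixKVCache()
system_prompt = [101, 7592, 2088]  # toy token IDs
kv_a = cache.get_or_compute(system_prompt, lambda t: {"layers": len(t)})
kv_b = cache.get_or_compute(system_prompt, lambda t: {"layers": len(t)})
assert kv_a is kv_b  # the second session reuses the cached KV state
```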

As AI factories increasingly adopt bare-metal and multi-tenant deployment models, maintaining strong infrastructure control and isolation becomes essential.

BlueField-4 also introduces Advanced Secure Trusted Resource Architecture, or ASTRA, a system-level trust architecture that gives AI infrastructure builders a single, trusted control point to securely provision, isolate and operate large-scale AI environments without compromising performance.

With AI applications evolving toward multi-turn agentic reasoning, AI-native organizations must manage and share far larger volumes of inference context across users, sessions and services, a demand the Inference Context Memory Storage Platform is designed to address.

Different forms for different workloads

NVIDIA Vera Rubin NVL72 offers a unified, secure system that combines 72 NVIDIA Rubin GPUs, 36 NVIDIA Vera CPUs, NVIDIA NVLink 6, NVIDIA ConnectX-9 SuperNICs and NVIDIA BlueField-4 DPUs.

NVIDIA will also offer the NVIDIA HGX Rubin NVL8 platform, a server board that links eight Rubin GPUs through NVLink to support x86-based generative AI platforms. The HGX Rubin NVL8 platform accelerates training, inference and scientific computing for AI and high-performance computing workloads.

NVIDIA DGX SuperPOD serves as a reference for deploying Rubin-based systems at scale, integrating either NVIDIA DGX Vera Rubin NVL72 or DGX Rubin NVL8 systems with NVIDIA BlueField-4 DPUs, NVIDIA ConnectX-9 SuperNICs, NVIDIA InfiniBand networking and NVIDIA Mission Control software.

Next-generation Ethernet networking

Advanced Ethernet networking and storage are critical components of AI infrastructure, keeping data centers running at full speed, improving performance and efficiency, and lowering costs.

NVIDIA Spectrum-6 Ethernet is the next generation of Ethernet for AI networking, built to scale Rubin-based AI factories with higher efficiency and greater resilience, and enabled by 200G SerDes communication circuitry, co-packaged optics and AI-optimized fabrics.

Built on the Spectrum-6 architecture, Spectrum-X Ethernet Photonics co-packaged optical switch systems deliver 10x greater reliability and 5x longer uptime for AI applications while achieving 5x better power efficiency, maximizing performance per watt compared with traditional methods. Spectrum-XGS Ethernet technology, part of the Spectrum-X Ethernet platform, enables facilities separated by hundreds of kilometers or more to function as a single AI environment.

Together, these innovations define the next generation of the NVIDIA Spectrum-X Ethernet platform, engineered with extreme codesign for Rubin to enable massive-scale AI factories and pave the way for future million-GPU environments.

Rubin readiness

NVIDIA Rubin is in full production, and Rubin-based products will be available from partners in the second half of 2026.

Among the first cloud providers to deploy Vera Rubin-based instances in 2026 will be AWS, Google Cloud, Microsoft and OCI, as well as NVIDIA Cloud Partners CoreWeave, Lambda, Nebius and Nscale.

Microsoft will deploy NVIDIA Vera Rubin NVL72 rack-scale systems as part of next-generation AI data centers, including future Fairwater AI superfactory sites.

Designed to deliver unprecedented efficiency and performance for training and inference workloads, the Rubin platform will provide the foundation for Microsoft’s next-generation cloud AI capabilities. Microsoft Azure will offer a tightly optimized platform enabling customers to accelerate innovation across enterprise, research and consumer applications.

CoreWeave will integrate NVIDIA Rubin-based systems into its AI cloud platform beginning in the second half of 2026. CoreWeave is built to operate multiple architectures side by side, enabling customers to bring Rubin into their environments, where it will deliver the greatest impact across training, inference and agentic workloads.

Together with NVIDIA, CoreWeave will help AI pioneers take advantage of Rubin’s advancements in reasoning and MoE models, while continuing to deliver the performance, operational reliability and scale required for production AI across the full lifecycle with CoreWeave Mission Control.

In addition, Cisco, Dell Technologies, HPE, Lenovo and Supermicro are expected to deliver a wide range of servers based on Rubin products.

AI labs including Anthropic, Black Forest Labs, Cohere, Cursor, Harvey, Meta, Mistral AI, OpenAI, OpenEvidence, Perplexity, Runway, Thinking Machines Lab and xAI are looking to the NVIDIA Rubin platform to train larger, more capable models and to serve long-context, multimodal systems at lower latency and cost than with prior GPU generations.

Infrastructure software and storage partners AIC, Canonical, Cloudian, DDN, Dell, HPE, Hitachi Vantara, IBM, NetApp, Nutanix, Pure Storage, Supermicro, SUSE, VAST Data and WEKA are working with NVIDIA to design next-generation platforms for Rubin infrastructure.

The Rubin platform marks NVIDIA’s third-generation rack-scale architecture and is supported by more than 80 NVIDIA MGX ecosystem partners.

Building on this ecosystem, Red Hat today announced an expanded collaboration with NVIDIA to deliver a complete AI stack optimized for the NVIDIA Rubin platform through Red Hat’s hybrid cloud portfolio, including Red Hat Enterprise Linux, Red Hat OpenShift and Red Hat AI. These solutions are used by the vast majority of Fortune Global 500 companies.

For more information, visit nvidia.com.