NVIDIA Rubin Platform: Key Features & Components – Summary
Here’s a breakdown of the key features and components of the NVIDIA Rubin platform, based on the provided text:
Core Innovations of the Rubin Platform:
* Rubin GPU: The foundation of the platform, offering a significant generational performance uplift (up to 5.5x the NVFP4 FLOPS of Blackwell systems; see the system details below).
* Fifth-Generation NVLink: Next-generation interconnect providing 1.8 TB/s of bidirectional bandwidth and enabling direct GPU-to-GPU and GPU-to-CPU communication.
* Second-Generation Transformer Engine: Accelerates large language model (LLM) training and inference with features such as FP8 precision and accelerated compression (see the FP8 sketch after this list).
* Third-Generation NVIDIA Confidential Computing: Secures data across CPU, GPU, and NVLink domains.
* Second-Generation RAS Engine: Provides real-time health monitoring, fault tolerance, and proactive maintenance.
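In practice, hardware FP8 precision like that called out above is typically reached through NVIDIA's open-source Transformer Engine library rather than programmed directly. A minimal sketch, assuming the transformer_engine Python package and an FP8-capable GPU (the layer sizes are illustrative, not Rubin specifications):

```python
import torch
import transformer_engine.pytorch as te

# FP8-capable linear layer; dimensions are illustrative, not Rubin specs.
layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(16, 4096, device="cuda")

with te.fp8_autocast(enabled=True):   # matmuls inside this block run in FP8
    y = layer(x)
y.sum().backward()                    # gradients still flow in higher precision
```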
DGX SuperPOD Components (Scale-Out):
* DGX Vera Rubin NVL72 or DGX Rubin NVL8 systems (the core compute units)
* NVIDIA BlueField-4 DPUs: For secure, software-defined infrastructure.
* NVIDIA Inference Context Memory Storage Platform: A storage tier for the large context data of next-generation inference workloads.
* NVIDIA ConnectX-9 SuperNICs: High-performance network interface cards.
* NVIDIA Quantum-X800 InfiniBand & NVIDIA Spectrum-X Ethernet: Networking solutions.
* NVIDIA Mission Control: Automated AI infrastructure orchestration and operations.
Specific System Details:
* DGX Vera Rubin NVL72:
  * 576 Rubin GPUs across 8 unified systems (checked in the sketch below)
  * 28.8 exaflops of FP4 performance
  * 600 TB of fast memory
  * 36 Vera CPUs, 72 Rubin GPUs, and 18 BlueField-4 DPUs per system
  * 260 TB/s of aggregate NVLink throughput
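A quick back-of-the-envelope check of the NVL72 figures quoted above; note that the per-GPU rate below is derived from the quoted totals and is not an official per-GPU specification:

```python
# Sanity-check the DGX Vera Rubin NVL72 aggregate figures.
gpus_per_system = 72
systems = 8
total_gpus = systems * gpus_per_system               # 576 Rubin GPUs
fp4_exaflops = 28.8
# Derived from the totals, not an official spec: implied per-GPU throughput.
per_gpu_pflops = fp4_exaflops * 1000 / total_gpus    # = 50 PFLOPS of FP4 per GPU
print(total_gpus, per_gpu_pflops)
```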
* DGX Rubin NVL8:
  * 512 Rubin GPUs across 64 systems
  * 5.5x the NVFP4 FLOPS of comparable Blackwell systems (see the NVFP4 sketch below)
  * Liquid-cooled and x86 CPU-based
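NVFP4 is NVIDIA's block-scaled 4-bit floating-point format. The toy sketch below illustrates the general idea of block-scaled FP4 (E2M1) quantization: each block of values shares one scale factor, and each scaled value snaps to the nearest 4-bit-representable point. The 16-value block size and max-based scaling are assumptions for illustration, not NVIDIA's actual implementation:

```python
import numpy as np

# The 15 signed values representable by a 4-bit E2M1 float.
POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.unique(np.concatenate([-POS, POS]))

def quantize_fp4_blocked(x, block=16):
    """Round each value to the nearest E2M1 point, one scale per block."""
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 6.0   # map block max to 6
    scale = np.where(scale == 0, 1.0, scale)              # avoid divide-by-zero
    idx = np.argmin(np.abs(xb[..., None] / scale[..., None] - GRID), axis=-1)
    return (GRID[idx] * scale).reshape(x.shape)           # dequantized values

x = np.random.randn(64).astype(np.float32)
print(np.abs(x - quantize_fp4_blocked(x)).max())          # quantization error
```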
Networking for AI Factories:
* NVIDIA Spectrum-6 Ethernet switches
* NVIDIA Quantum-X800 InfiniBand switches
* BlueField-4 DPUs
* ConnectX-9 SuperNICs
Key Benefit:
* Up to 10x reduction in inference token cost compared to the previous generation.
In essence, the Rubin platform is designed to be a highly scalable, secure, and efficient infrastructure for running demanding AI workloads, notably large language models. It focuses on eliminating bottlenecks in compute, memory, and networking to create a true “AI factory.”