NVIDIA confirms its entry into the next phase of the AI race. The first test samples of the Vera Rubin platform (VR200) have already been delivered to customers, with full-scale deliveries expected to begin in the second half of 2026. The company promises a significant increase in performance while simultaneously lowering the cost of model training and inference. This is a response to record demand: in fiscal year 2025, NVIDIA reported USD 215.9 billion in revenue, of which USD 68.1 billion was earned in the fourth quarter alone.
576 GB of HBM4 and up to 100 PFLOPS per Superchip
The heart of the platform is the Rubin accelerator, offering up to 50 PFLOPS (FP4) per GPU and up to 100 PFLOPS in the Superchip configuration. Each GPU uses two compute chiplets and eight stacks of HBM4 memory, providing 288 GB per GPU and 576 GB in the Superchip module. The new Vera processor (Armv9.2 "Olympus") has 88 cores and 176 threads and pairs with up to 1.5 TB of LPDDR5X (SOCAMM). NVIDIA says that with this architecture, training trillion-parameter models may require up to four times fewer GPUs than the Blackwell generation, and inference costs are expected to drop by as much as 10×.
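To put the memory figures into perspective, here is a minimal back-of-envelope sketch, not NVIDIA's own methodology: it estimates how many devices are needed just to hold the weights of a trillion-parameter model at different numeric precisions, using the 288 GB per GPU and 576 GB per Superchip capacities quoted above. The bytes-per-parameter values are standard for each format; the 1.2× overhead factor and the function name are illustrative assumptions.

```python
# Illustrative capacity sketch (assumptions, not NVIDIA's sizing method):
# devices needed to hold the weights of a 1-trillion-parameter model,
# using the HBM4 figures quoted in the article.

PARAMS = 1_000_000_000_000                    # 1 trillion parameters
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

HBM4_PER_GPU_GB = 288                         # Rubin GPU (per the article)
HBM4_PER_SUPERCHIP_GB = 576                   # Vera Rubin Superchip (per the article)

def devices_needed(precision: str, hbm_gb: float, overhead: float = 1.2) -> int:
    """Minimum device count to fit the weights alone, with a rough
    assumed `overhead` factor for runtime working memory."""
    weights_gb = PARAMS * BYTES_PER_PARAM[precision] / 1e9
    return int(-(-weights_gb * overhead // hbm_gb))   # ceiling division

for prec in ("FP16", "FP8", "FP4"):
    print(f"{prec}: ~{devices_needed(prec, HBM4_PER_GPU_GB)} Rubin GPUs "
          f"(~{devices_needed(prec, HBM4_PER_SUPERCHIP_GB)} Superchips) "
          "to hold the weights alone")
```

This only counts weight storage; real training also needs memory for gradients, optimizer state, and activations, which is why NVIDIA's 4× claim concerns whole training systems rather than raw capacity.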
Cloud, scale and market advantage
Interest in the VR200 is expected to be very high, and deployments in the data centres of the largest cloud providers are anticipated once mass production begins. If these claims are borne out, NVIDIA will solidify its position as the leader in AI infrastructure, setting new standards for memory density and compute performance. The pace of deliveries and real-world TCO in production environments will be critical. On paper, the VR200 represents a generational leap; in practice, everything will be decided by benchmarks and availability.
The Vera Rubin (VR200) is another step by NVIDIA towards extreme-scale AI: hundreds of PFLOPS, hundreds of gigabytes of HBM4, and up to 1.5 TB of system memory. With record financial results, the company has both the resources and the demand to maintain its advantage. The second half of 2026 will reveal whether the promises of 4× fewer GPUs and 10× cheaper inference translate into real-world deployments.
Source: NVIDIA
Katarzyna Petru