NVIDIA confirms its entry into the next phase of the AI race. The first test units of the Vera Rubin platform (VR200) have already reached customers, and full-scale deliveries are set to begin in the second half of 2026. The company promises a significant jump in performance alongside lower training and inference costs. It is a response to record demand: in fiscal year 2026, NVIDIA posted revenue of $215.9 billion, with $68.1 billion generated in the fourth quarter alone.
576 GB HBM4 and up to 100 PFLOPS in Superchip
The heart of the platform is the Rubin accelerator, offering up to 50 PFLOPS (FP4) per chip and 100 PFLOPS in the Superchip configuration. Each GPU combines two compute chiplets with eight stacks of HBM4 memory, providing 288 GB per GPU and 576 GB in the Superchip module. The new Vera processor (Armv9.2 “Olympus”) features 88 cores and 176 threads and pairs with up to 1.5 TB of LPDDR5X memory (SOCAMM). NVIDIA claims that with this architecture, training models on the order of a trillion parameters may require up to four times fewer GPUs than the Blackwell generation, while inference costs are expected to drop by as much as 10×.
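For readers who want to check the arithmetic, the sketch below simply aggregates the per-GPU figures quoted above into Superchip-level totals; the Blackwell GPU count used to illustrate the claimed 4× reduction is a hypothetical placeholder, not a published specification.

```python
# Back-of-the-envelope aggregation of the Vera Rubin figures quoted above.
# The per-GPU numbers come from NVIDIA's announcement as cited in this article.

RUBIN_FP4_PFLOPS_PER_GPU = 50   # FP4 compute per Rubin GPU
HBM4_GB_PER_GPU = 288           # eight HBM4 stacks per GPU
GPUS_PER_SUPERCHIP = 2          # two Rubin GPUs per Superchip module

superchip_pflops = RUBIN_FP4_PFLOPS_PER_GPU * GPUS_PER_SUPERCHIP  # 100 PFLOPS
superchip_hbm4_gb = HBM4_GB_PER_GPU * GPUS_PER_SUPERCHIP          # 576 GB

print(f"Superchip: {superchip_pflops} PFLOPS FP4, {superchip_hbm4_gb} GB HBM4")

# Illustrating the claimed 4x reduction: a trillion-parameter training job
# that needs N Blackwell GPUs would, per NVIDIA's claim, need about N / 4
# Rubin GPUs. The Blackwell count below is a made-up example, not a spec.
blackwell_gpus_assumed = 8192
rubin_gpus_claimed = blackwell_gpus_assumed // 4
print(f"Claimed equivalent: {blackwell_gpus_assumed} Blackwell GPUs "
      f"-> {rubin_gpus_claimed} Rubin GPUs")
```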
Cloud, Scale and Market Advantage
Interest in the VR200 is expected to be high, with deployments in the data centres of the largest cloud providers anticipated once mass production begins. If the announced figures hold, NVIDIA will solidify its position as the leader in AI infrastructure, setting new standards for memory density and computational power. Delivery speed and actual TCO in production environments will be decisive. On paper, the VR200 represents a generational leap; in practice, everything will come down to benchmarks and availability.
Vera Rubin (VR200) is NVIDIA's next step towards extreme AI scale: hundreds of PFLOPS, hundreds of gigabytes of HBM4 and up to 1.5 TB of system memory. With record financial results, the company has both the resources and the demand to maintain its advantage. The second half of 2026 will show whether the promises of 4× fewer GPUs and 10× cheaper inference translate into real deployments.
Source: NVIDIA
Katarzyna Petru