
Nvidia Blackwell Ultra Delivers Fifty Times Performance Boost for AI Inference

February 17, 2026


Nvidia has released benchmark data showing its GB300 NVL72 systems powered by Blackwell Ultra GPUs deliver up to fifty times higher throughput per megawatt and thirty-five times lower cost per token compared to the previous Hopper platform. Major cloud providers including Microsoft, CoreWeave, and Oracle are already deploying the systems at scale for agentic AI workloads.

Blackwell Ultra Raises the Bar for AI Inference

Nvidia has published new performance data demonstrating that its latest GB300 NVL72 rack-scale systems, built around the Blackwell Ultra GPU architecture, deliver a massive leap in AI inference efficiency. According to benchmarks validated by SemiAnalysis, the platform achieves up to fifty times higher throughput per megawatt and thirty-five times lower cost per token compared to the company's previous-generation Hopper systems.
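The two headline ratios can be made concrete with a quick back-of-envelope calculation. The baseline figures below are hypothetical placeholders chosen for illustration, not published Hopper numbers; only the fifty-times and thirty-five-times ratios come from the benchmarks described above.

```python
# Illustrative arithmetic for the headline efficiency claims.
# Baseline values are assumed for illustration only; the 50x and
# 35x multipliers are the figures quoted in the article.

hopper_tokens_per_sec_per_mw = 1_000_000   # assumed Hopper baseline
hopper_cost_per_million_tokens = 3.50      # assumed baseline, USD

# Apply the quoted improvements.
ultra_tokens_per_sec_per_mw = hopper_tokens_per_sec_per_mw * 50
ultra_cost_per_million_tokens = hopper_cost_per_million_tokens / 35

print(ultra_tokens_per_sec_per_mw)                   # 50000000
print(round(ultra_cost_per_million_tokens, 2))       # 0.1
```

Under these assumed baselines, the same megawatt of data-center power would serve fifty million tokens per second instead of one million, and the cost of a million tokens would fall from $3.50 to roughly ten cents.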

Architectural Improvements Under the Hood

The performance gains stem from a combination of hardware and software advances. Blackwell Ultra Tensor Cores deliver one point five times the compute performance of standard Blackwell GPUs, while attention-layer throughput has been doubled through accelerated softmax execution. These improvements directly target the bottlenecks in transformer attention layers that reasoning models with large context windows depend on. The system pairs seventy-two Blackwell Ultra GPUs with thirty-six Grace CPUs in a fully liquid-cooled, rack-scale design, with up to two hundred and eighty-eight gigabytes of HBM3e memory per GPU.
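Multiplying out the per-GPU memory figure gives a sense of the rack's total capacity. This is simple arithmetic on the two numbers quoted above, not a specification beyond them:

```python
# Back-of-envelope memory total for one GB300 NVL72 rack,
# using the figures quoted above.
gpus_per_rack = 72        # Blackwell Ultra GPUs per rack
hbm3e_gb_per_gpu = 288    # up to 288 GB of HBM3e per GPU

total_hbm_gb = gpus_per_rack * hbm3e_gb_per_gpu
print(total_hbm_gb)            # 20736 (GB of HBM3e per rack)
print(total_hbm_gb / 1024)     # 20.25 (TiB per rack)
```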

Nvidia's TensorRT-LLM inference library has also seen significant optimisation, with throughput per GPU doubling at some interactivity levels since late twenty twenty-five.

Cloud Providers Racing to Deploy

Major cloud infrastructure providers have moved quickly to bring GB300 NVL72 systems into production. CoreWeave was among the first to deploy the systems, integrating them with its Kubernetes-based cloud stack. Microsoft launched what it describes as the world's first large-scale GB300 NVL72 supercomputing cluster, achieving over one point one million tokens per second on a single rack. Oracle Cloud Infrastructure is also scaling its deployments beyond one hundred thousand Blackwell GPUs.
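Dividing Microsoft's quoted rack throughput across the seventy-two GPUs gives a rough per-GPU figure. This assumes the one point one million tokens per second is an aggregate across the whole rack, which the figures above suggest but do not state explicitly:

```python
# Rough per-GPU share of the quoted single-rack throughput,
# assuming the 1.1 million tokens/s is aggregate across all GPUs.
rack_tokens_per_sec = 1_100_000
gpus_per_rack = 72

per_gpu_tokens_per_sec = rack_tokens_per_sec / gpus_per_rack
print(round(per_gpu_tokens_per_sec))   # 15278 tokens/s per GPU
```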

Reshaping AI Economics

The cost reductions could fundamentally change AI deployment economics. Leading inference providers including Baseten, DeepInfra, Fireworks AI, and Together AI have reported up to ten times cost reductions using the standard Blackwell platform. The Ultra variant extends these gains further for low-latency workloads, making large-scale deployment of AI agents and coding assistants more economically viable.

MLPerf Dominance and What Comes Next

In the latest MLPerf Inference benchmarks, Blackwell Ultra set new records across all newly added tests, including DeepSeek-R1 and Llama 3.1, delivering forty-five percent higher throughput than the previous Blackwell generation on reasoning workloads. Looking ahead, Nvidia has already unveiled its next-generation Vera Rubin platform at CES twenty twenty-six, promising another ten times performance improvement over Blackwell, with availability planned for the second half of twenty twenty-six.

