Nvidia Corporation (NASDAQ:NVDA) has built a $4.4 trillion empire selling chips for training AI models, but the AI business, previously defined by massive training runs, may soon not require chips in the same quantities.
Hyperscalers are still spending heavily on training, but the priority has shifted to inference, the real-time computing that actually delivers AI to end users.
Jensen Huang has been calling 2026 the year inference takes over, and the numbers back him up.
OpenAI and Anthropic are producing thousands of times more inference tokens than a year ago as agentic AI workloads explode.
But Nvidia’s bestselling Grace Blackwell servers may not be the right hardware for the job.
Users say the systems consume too much energy and lack the memory for efficient inference.
‘There Is No Moat In Inference’
Cerebras CEO Andrew Feldman is leading the charge.
He told The Wall Street Journal that Nvidia’s proprietary CUDA software ecosystem, the moat that locked in developers for training, simply doesn’t apply to inference.
Cerebras recently signed Amazon Web Services as a customer and landed a deal with OpenAI reportedly worth over $10 billion for 750 megawatts of compute through 2028.
In a recent Benzinga interview, Feldman called the $20 billion Groq deal a strategic admission that GPU dominance is ending.
Cerebras shelved a previous IPO attempt in October 2025 and has since refiled confidentially, with a public listing possible as early as April.
Nvidia Isn’t Standing Still
At GTC on Monday, Huang unveiled Nvidia’s answer to the inference threat.
The Groq 3 LPU is a new type of chip built specifically for inference, not repurposed from training hardware.
It came out of the $20 billion licensing deal Nvidia struck with AI chip startup Groq in December, and is designed to process user queries at higher speeds and lower cost per token than Nvidia’s existing GPUs.
Meta Platforms (NASDAQ:META) is already buying in.
The company announced a long-term infrastructure partnership that includes deploying thousands of Nvidia’s Vera CPUs without any GPUs attached, a first for the company.
Deepwater’s Gene Munster recently told Benzinga he expects Nvidia’s revenue growth in calendar 2027 to hit 40%, well above the Street’s 28% estimate, driven largely by inference demand.
Nvidia CFO Colette Kress said the company remains confident: “Right now, we’re the king of inference.” Bettors agree, for now.
On Polymarket, traders still give Nvidia a 70% chance of remaining the world’s largest company at year-end.