NVIDIA RTX Spark + Cosmos 3: Everything From the Keynote

TL;DR

NVIDIA used its GTC Taipei 2026 keynote to reposition from chip maker to full-stack AI platform. The headline was RTX Spark, a Windows-on-Arm superchip putting 1 petaflop of local AI compute on slim laptops and compact desktops, built to run personal AI agents without cloud inference bills. NVIDIA also open-sourced Cosmos 3 for physical AI and robotics, pushed its Vera Rubin platform into full production across 350+ factories, and released Nemotron 3 Ultra, its largest open-weights model yet.

The under-covered detail: NVIDIA named two open source agent projects, OpenClaw and Hermes Agent, as the flagships of its local-agent push, with Nous Research’s CEO quoted directly on stage. RTX Spark systems arrive in fall 2026, with no price announced. Jensen and Microsoft framed it as reinventing the PC for the agentic era, with Huang appearing at Microsoft Build the next day to continue the pitch.

Best for anyone tracking AI hardware, local agents, or where the industry is heading. Not ideal for readers who only care about chatbots and not the machines underneath them.

Jensen Huang got on stage at GTC Taipei last night and said it directly: a long time ago NVIDIA used to be a GPU company.

Then he spent two hours proving the past tense.

This wasn’t a graphics card keynote with some AI sprinkled on top. It was a chip company announcing that it now sells the whole stack: the silicon, the operating layer, the models, and the agents that run on top of them. Here’s everything that dropped and what actually matters.

RTX Spark Puts a Petaflop on Your Laptop

The headline was RTX Spark, a new class of Windows machine built around a chip NVIDIA calls a superchip. It pairs an Arm CPU with a Blackwell GPU and 128GB of unified memory, delivering roughly 1 petaflop of AI performance. It runs models up to 120B parameters entirely on the machine. Laptops and desktops arrive fall 2026. No price was given, which usually translates to expensive.
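To see how the 120B-parameter claim squares with 128GB of unified memory, here is a rough back-of-the-envelope sketch. It assumes weight memory is simply parameters times bits per weight, and it ignores the KV cache, activations, and whatever the operating system needs. The precision levels are illustrative assumptions; the keynote recap does not say which quantization NVIDIA had in mind.

```python
UNIFIED_MEMORY_GB = 128  # figure quoted in the keynote recap


def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int,
         memory_gb: float = UNIFIED_MEMORY_GB) -> bool:
    """True if the weights alone fit in memory.

    Ignores KV cache, activations, and OS overhead, so a 'True' here
    is a necessary condition, not a guarantee.
    """
    return weights_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    # Assumed precision levels: 16-bit, 8-bit, and 4-bit weights.
    for bits in (16, 8, 4):
        size = weights_gb(120, bits)
        print(f"120B @ {bits}-bit: {size:.0f} GB, fits={fits(120, bits)}")
```

Under these assumptions, a 120B model only fits once the weights are quantized to 8 bits or fewer, and 8-bit leaves very little headroom for anything else.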

The Pitch Is Independence, Not Speed

The pitch isn’t raw speed. Instead, it’s independence from the cloud. RTX Spark is built to run personal AI agents locally rather than renting inference by the token. NVIDIA and Microsoft framed it as the first real reinvention of the PC in 40 years. In their version, AI agents act as a new interface on top of the mouse and keyboard. You tell the agent what you want in plain language. Then it sets goals, calls tools, checks its own work, and refines.

Jensen pushed the phone comparison hard. After all, people barely use phones for calls anymore. So what does a PC become when the primary user isn’t you, but an agent acting for you? His answer: an agentic AI computer running in your house, doing tasks while you’re elsewhere.

Why the Economics Matter

The strategic weight here is real. If agents run locally on hardware you bought once, then the per-token cloud business that funds OpenAI and Anthropic suddenly has a competitor. And that competitor doesn’t charge by the request. The model is open weight. The bill is the laptop. That is a different economic model from the one the entire cloud-AI industry is built on.
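The buy-once versus pay-per-token trade-off can be sketched as a simple break-even calculation. Every number below is a hypothetical placeholder: NVIDIA announced no RTX Spark price, and cloud token prices vary by provider and model. The sketch also ignores electricity, depreciation, and any quality gap between local and cloud models.

```python
def break_even_tokens(hardware_usd: float,
                      cloud_usd_per_million_tokens: float) -> float:
    """Tokens you would have to generate before a one-time hardware
    purchase costs less than paying a cloud provider per token.

    Both inputs are assumptions supplied by the caller; neither comes
    from the keynote.
    """
    if cloud_usd_per_million_tokens <= 0:
        raise ValueError("cloud price must be positive")
    return hardware_usd / cloud_usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical placeholders, NOT announced prices.
    laptop_price = 4000.0   # USD, assumed
    cloud_price = 2.0       # USD per million tokens, assumed
    tokens = break_even_tokens(laptop_price, cloud_price)
    print(f"Break-even at about {tokens / 1e9:.1f} billion tokens")
```

The shape of the result matters more than the placeholder values: the heavier the agent workload, the sooner the fixed hardware cost wins.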

NVIDIA also announced a DGX Station for Windows. It’s positioned as a deskside supercomputer for trillion-parameter models, aimed at enterprise developers inside the Windows ecosystem rather than Linux.

The Hermes and OpenClaw Moment Almost Nobody Covered

Here’s the piece that matters most for the open source agent world, and most mainstream recaps skipped it entirely.

NVIDIA named two open source agent projects as the flagships of its local-agent strategy: OpenClaw and Hermes Agent. Both are integrating NVIDIA’s new OpenShell runtime and Microsoft’s security primitives into native Windows apps. The NemoClaw installer now ships with streamlined Hermes Agent support too. And Nous Research CEO Dillon Rolnick was quoted directly in the keynote. RTX Spark buyers, he said, will feel like they bought a full-fledged assistant, not a typical laptop.

For context, Hermes crossed 140,000 GitHub stars in under three months and became the most-used agent on OpenRouter. So this is arguably the largest institutional endorsement an open source agent has ever gotten. NVIDIA put Hermes on stage at the exact keynote that repositioned the whole company. We broke down why it works in the Hermes Agent complete guide. The short version: Hermes is a real orchestration layer, not a thin wrapper. That’s exactly what makes it run well on the smaller local models RTX Spark is designed for.

The read underneath the announcement is simple. The local-agent stack NVIDIA just blessed runs on open source agents and open-weight models: Hermes, OpenClaw, Qwen 3.6, Nemotron. Not Claude. Not GPT. Anthropic and OpenAI were absent from the local-agent story entirely. In this version of the future they’re the cloud layer, while the thing running on your laptop is somebody else’s software. That’s the same positioning fork the OpenClaw complete guide has been tracking for months. Last night, NVIDIA made it concrete.

Performance got a real lift too. New inference optimizations deliver 2x throughput in llama.cpp and 2.6x in vLLM for top agentic models. Additionally, there are multi-GPU gains in llama.cpp and ComfyUI. H Company is also bringing its computer-use tools to NVIDIA platforms, optimized for RTX and DGX machines.

Cosmos 3 Opens Up Physical AI

NVIDIA open-sourced Cosmos 3, its foundation model for physical AI, with Super and Nano variants live on Hugging Face and GitHub. One model handles vision reasoning, world generation, and action prediction together. The goal is cutting robotics training time from months to days.

This is the robotics play. The founding Cosmos Coalition includes Runway, Black Forest Labs, Skild AI, Agile Robots, Generalist, and LTX. The pitch is that physical AI needs a shared foundation model, the same way language AI standardized on transformers. NVIDIA wants Cosmos to be that base. And open-sourcing it is how you make a standard.

Cosmos 3 ranked #1 across seven-plus robotics benchmarks per NVIDIA’s own numbers. As always with vendor benchmarks, treat that as a starting point rather than gospel. Still, the direction is clear. NVIDIA is trying to own the robotics stack the same way it owns the AI training stack.

Vera Rubin Enters Full Production

The infrastructure announcement was Vera Rubin, NVIDIA’s next-generation AI platform, entering full production. Jensen put real numbers on it. The supply chain is twice the size of Grace Blackwell, with 150 Taiwan-based partners across 350+ factories in 30 countries. The platform pairs an 88-core Vera CPU with Rubin GPUs and Spectrum-X Ethernet Photonics, targeting AI factories running a million GPUs.

That’s not a product. It’s an industrial base. Moreover, it’s the part of the keynote that explains why NVIDIA can afford to give away Cosmos 3 and subsidize local agents on RTX Spark. The data center platform is the profit engine. Everything else is NVIDIA expanding where its chips show up, from the cloud to the robot to your desk.

Jensen also laid out the roadmap beyond Vera Rubin: Vera Rubin Spark processors in 2028, then Rosa Feynman Spark in 2030. The naming convention pairs a CPU name with a GPU name, each honoring a scientist, and the roadmap now runs on a multi-year cadence.

Nemotron 3 Ultra and the Open-Weights Push

NVIDIA released Nemotron 3 Ultra, its largest open-weights model to date. The specs: 550B parameters with 55B active, an Intelligence Index of 48, and 300+ tokens per second. NVIDIA says it tops the US open-weights rankings.
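The gap between 550B total and 55B active parameters is what makes those throughput numbers plausible. As a rough sketch, the snippet below computes the active fraction and estimates forward-pass compute per token using the common rule of thumb of about 2 FLOPs per active parameter. That rule is an approximation brought in here for illustration, not a figure from NVIDIA.

```python
TOTAL_PARAMS_B = 550   # total parameters, in billions (from the keynote recap)
ACTIVE_PARAMS_B = 55   # parameters active per token, in billions


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def forward_flops_per_token(active_params_b: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter.

    This is an approximation; real cost depends on architecture,
    context length, and attention overhead.
    """
    return 2 * active_params_b * 1e9


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    flops = forward_flops_per_token(ACTIVE_PARAMS_B)
    print(f"Active fraction: {frac:.0%}")
    print(f"Approx. forward FLOPs per token: {flops:.2e}")
```

In other words, each token pays roughly the compute cost of a 55B dense model, while all 550B parameters still have to be stored somewhere.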

The point of Nemotron isn’t to beat Claude or GPT on raw capability. Rather, it’s to give the local-agent stack a strong open model that runs on NVIDIA hardware. That way the whole loop stays inside the NVIDIA ecosystem: chip, runtime, model, agent. A company that sells the hardware has every reason to make sure there’s a great open model to run on it. Nemotron 3 Ultra is that model.

The Gaming Stuff, Briefly

It wasn’t all AI. NVIDIA announced DLSS 4.5 Ray Reconstruction coming in August, with 11 more games adding support and over 1,000 RTX-enhanced titles now available. Adobe Premiere and Photoshop are also getting RTX Spark updates that double their speed. But Jensen’s own framing said it all. The GPU announcements were the warm-up act. The main event was a chip company becoming an AI platform company in real time.

What This Actually Means

Three things came out of this keynote that matter past the news cycle.

First, NVIDIA is now selling the entire stack, not just the chips. That means silicon, the OpenShell agent runtime, open models like Nemotron and Cosmos, and the hardware to run them locally. The company that powered everyone else’s AI is now building the full vertical. Notably, it’s the same move Microsoft is making with its MAI models and OpenAI is making with its consulting arm. Everyone is going vertical at once.

Second, the local-agent era stopped being theoretical. A petaflop on a laptop running 120B-parameter models is enough to run a serious agent without touching the cloud. So if RTX Spark ships and performs as claimed, the cloud-inference business has a real competitor for a chunk of the agent workload. And that competitor charges once for hardware instead of forever by the token.

Third, the open source agents won the endorsement war. Hermes Agent and OpenClaw didn’t get a passing mention. Instead, they got named as the flagships of NVIDIA’s local-agent push, with Nous Research’s CEO quoted on stage. The agents the open source community built are the agents NVIDIA chose to put on its hardware. The frontier labs make the cloud models. Meanwhile, the community makes the things that run on your desk. Last night NVIDIA picked a side, and it wasn’t the obvious one.

A long time ago NVIDIA used to be a GPU company. After last night, it’s the company building the floor the entire agentic AI era is going to stand on.
