NVIDIA GTC: New Open Models & AI Advancements for Agent PCs
NVIDIA unveiled a new category of computing, dubbed “agent computers,” at this year’s GTC conference, shifting the paradigm from personal devices like PCs and smartphones to systems designed to run personalized AI agents locally and privately. The announcement centers around the emergence of “openclow,” an open-source stack designed to maximize the user experience of running these agents on NVIDIA hardware.
The company introduced new open models for local agents, including NVIDIA Nemotron 3 Nano 4B and Nemotron 3 Super 120B, alongside optimizations for Qwen 3.5 and Mistral Small 4. NVIDIA’s NemoClaw, an open-source stack for openclow, aims to enhance security and support local models, optimizing the experience for users on NVIDIA devices. The company also highlighted Unsloth Studio, a tool designed to simplify fine-tuning, further improving the accuracy of open models for agentic workflows.
Throughout the GTC conference, which ran daily from 8 AM to 5 PM local time until March 19th, attendees participated in the “build-a-claw” event at GTC Park. NVIDIA experts assisted participants in building and deploying personalized, always-on AI assistants tailored to their individual devices. The event was designed to be accessible to all skill levels, allowing users to name their agents, define their personalities, and grant access to necessary tools for employ within existing messaging applications.
The newly released open models are designed to deliver cloud-level quality locally. Nemotron 3 Super, featuring 120 billion parameters and 120 billion active parameters, is specifically engineered for complex agentic AI systems and performs optimally on systems like the DGX Spark or NVIDIA RTX PRO workstations. It achieved a score of 85.6% on PinchBench, a new benchmark for large language model (LLM) performance within the openclow environment, surpassing comparable open models.
Mistral Small 4, with 119 billion parameters and 60 billion active parameters, consolidates the capabilities of Mistral’s flagship models, offering a high-efficiency model optimized for both general chat, coding, and agentic tasks. Both models are designed to run locally on DGX Spark and RTX PRO GPUs.
For GeForce RTX users seeking lighter models, NVIDIA released Nemotron 3 Nano 4B, a smaller yet powerful model for building local agents and assistants on RTX AI PCs. This model is particularly suited for implementing conversational personas that can perform actions within hardware-constrained environments like games or applications, offering high instruction-following capabilities and tool usage performance with minimal VRAM requirements.
NVIDIA also announced optimizations for Alibaba’s Qwen 3.5 model, demonstrating high accuracy across various sizes (27B, 9B, 4B) and suitability for running local agents on NVIDIA GPUs. The model natively supports vision, multi-token prediction, and a large context window of up to 262,000 tokens. The 27 billion parameter version demonstrates enhanced performance when paired with an RTX 5090 GPU.
These models are accessible through platforms like Ollama, LM Studio, and llama.cpp, enabling users to experience accelerated inference on RTX GPUs and DGX Spark systems.
Recent updates to Lightricks’ LTX 2.3, an audio-video model, now support NVFP4 and FP8 distilled models, resulting in a 2.1x performance improvement. Black Forest Labs’ FLUX.2 Klein 9B model has also seen image editing speeds increased by up to 2x with recent updates, with NVIDIA collaborating to release an FP8 version optimized for performance and memory efficiency on RTX GPUs.
Addressing concerns around token costs, security, and privacy associated with agentic systems like openclow, NVIDIA introduced NemoClaw, an open-source stack for optimizing openclow on NVIDIA devices. NemoClaw initially features NVIDIA Nemotron open models and the NVIDIA OpenShell runtime, allowing users to execute inference locally, enhancing privacy and eliminating token costs. OpenShell is designed as a secure runtime for running claws.
To simplify the process of fine-tuning open models, NVIDIA launched Unsloth Studio, a web-based, user-friendly interface. Supporting over 500 AI models, Unsloth Studio streamlines training and fine-tuning, allowing users to upload datasets, generate high-quality synthetic data through a graph-based canvas, and initiate fine-tuning. The studio supports quantized low-rank adaptation, low-rank adaptation, and full fine-tuning, providing real-time monitoring and visualization of the process. The new interface leverages custom GPU kernels to accelerate training by up to 2x and reduce VRAM usage by up to 70%, enabling users to maximize the performance of NVIDIA RTX GPUs and DGX Spark systems. Unsloth Studio now supports Nemotron 3 Nano 4B and Qwen 3.5.
