The Dawn of Multi-Agents: Enter Nemotron 3 Nano
If 2025 was the year we started seeing AI agents pop up everywhere, then 2026 is shaping up to be the year of the multi-agent. Imagine a team of AI assistants working together, each contributing its unique skills to accomplish complex tasks. This exciting leap forward, however, hinges on a critical capability: the ability to process and generate a vast number of tokens with models that are both lightweight and highly accurate.
This is where the innovation race truly heats up. The challenge lies in a classic trade-off: smaller, more efficient models are fast and budget-friendly, but often fall short in the deep reasoning, robustness, and extended memory required for sophisticated multi-agent systems. On the flip side, larger, more powerful models deliver impressive accuracy but can become prohibitively slow and expensive when multiple agents need to operate in parallel.
As agentic systems become more complex and ambitious, the costs of running them can skyrocket, context windows become limiting bottlenecks, and overall reliability can start to waver. This is precisely why efficiency has become paramount in the AI landscape.
It’s within this crucial balancing act that NVIDIA has engineered a significant advancement: the NVIDIA Nemotron 3 Nano 30B A3B. This isn’t just another model; it’s a key component of NVIDIA’s Nemotron 3 family, which also includes the Super and Ultra variants, designed to set a new standard for open, intelligent, and highly efficient agentic models.
Nemotron 3 Nano distinguishes itself with a pioneering hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture. What’s truly remarkable is its colossal 1 million-token context window. This powerful combination empowers developers to build high-throughput, dependable agents that are not only more accurate and scalable but also excel at specialized sub-tasks within long-running, multi-step workflows.
Nemotron 3 Nano: The Quick Take (TL;DR)
- Hybrid Mamba-Transformer MoE Architecture: A sophisticated blend of Mamba-2 for swift, low-latency long-context processing and Transformer attention for precise, fine-grained reasoning.
- Efficient Parameter Usage: Boasts 31.6 billion total parameters, but only utilizes approximately 3.6 billion active parameters per token, leading to impressive throughput and reduced latency.
- Exceptional Inference Efficiency: Achieves up to 4x faster inference speeds compared to Nemotron Nano 2 and up to 3.3x faster than leading models in its size category.
- Best-in-Class Reasoning Accuracy: Demonstrates superior performance across reasoning, coding, tool utilization, and complex agentic tasks.
- Intelligent Reasoning Controls: Features ‘Reasoning ON/OFF’ modes and a configurable ‘thinking budget’ to precisely control ‘thinking’ tokens, ensuring predictable inference costs.
- Massive 1M-Token Context Window: Ideal for handling extended workflows, retrieval-augmented tasks, and maintaining persistent memory.
- Fully Open: Comes with open weights, datasets, training recipes, and framework access, fostering community collaboration.
- Comprehensive Open Data Stack: Includes 3 trillion new high-quality pre-training tokens, 13 million cross-disciplinary post-training samples, and over 10 Reinforcement Learning (RL) environments covering more than 900,000 tasks in math, coding, reasoning, and tool-use, alongside approximately 11,000 agent-safety traces.
- Effortless Deployment: Seamless integration with popular serving frameworks like vLLM and SGLang, and easy access via OpenRouter, various inference service providers, and build.nvidia.com endpoints.
- License: Released under the permissive NVIDIA Open Model License.
What Exactly is Nemotron 3 Nano?
Nemotron 3 Nano (30B/A3B) represents NVIDIA’s latest stride in creating compact yet potent reasoning models. It builds upon the foundation laid by Nemotron Nano 2, inheriting its innovative hybrid Mamba-2 + Transformer architecture, intuitive ‘Reasoning ON/OFF’ modes, and explicit thinking budget controls. The significant evolutionary leap in Nano is the introduction of a sparse Mixture-of-Experts (MoE) design.
At its core, this architecture involves:
- 31.6 billion total parameters: A substantial model size.
- ~3.6 billion active parameters per token: Thanks to the MoE routing, this dramatically enhances efficiency.
- A hybrid layer stack: Interleaving Mamba-2 layers for efficient long-context handling with Grouped-Query Attention (GQA) Transformer layers for high-accuracy reasoning.
- A learned MLP router: This intelligent router activates 6 of the 128 specialized ‘experts’ for each token, striking a remarkable balance between efficiency and sophisticated reasoning capability.
This sophisticated combination allows Nemotron 3 Nano to deliver reasoning quality akin to much larger models, all while maintaining the speed and cost-effectiveness expected of a more lightweight architecture.
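The core mechanism is easy to state in code. Below is a minimal PyTorch sketch of top-k expert routing in the spirit of the design above (6 of 128 experts per token); the hidden sizes, activation, and dense gather loop are illustrative simplifications, not the production implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative sparse MoE layer: each token is routed to k of n experts."""
    def __init__(self, d_model: int = 2048, d_ff: int = 4096,
                 n_experts: int = 128, k: int = 6):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.k):            # naive loops; real kernels batch this
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out
```

Because only k of the n expert MLPs run per token, the active parameter count is a small fraction of the total – which is exactly how a 31.6B-parameter model can behave like a ~3.6B-parameter one at inference time.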
The Architecture: A Symphony of Mamba and Transformers
[Figure: Nemotron 3 Nano’s hybrid stack – interleaved Mamba-2 and Transformer layers, with a learned router dispatching tokens to multiple expert blocks.]
Nemotron 3 Nano’s architecture seamlessly integrates Mamba-2 layers, known for their efficiency in handling long sequences, with Transformer layers employing Grouped-Query Attention (GQA) for robust reasoning. Crucially, the standard Feed-Forward Network (FFN) layers found in conventional Transformers have been replaced by sparse MoE layers. These MoE layers dynamically activate specific ‘experts’ based on the input, significantly boosting both efficiency and scalability. This is the secret sauce that allows the model to punch above its weight class in terms of performance and cost.
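To visualize the interleaving, here is a toy sketch of how such a hybrid stack could be laid out. The attention-to-Mamba ratio and the naming are illustrative assumptions, not the model’s published layout.

```python
def build_hybrid_stack(n_blocks: int, attn_every: int = 4) -> list[str]:
    """Sketch of an interleaved hybrid stack: mostly Mamba-2 mixers, with a
    GQA attention layer every few blocks, and every block's MLP replaced by
    a sparse MoE. The ratio (attn_every) is an illustrative assumption."""
    stack = []
    for i in range(n_blocks):
        mixer = "gqa_attention" if (i + 1) % attn_every == 0 else "mamba2"
        stack.append(f"{mixer} + top6of128_moe")
    return stack

print(build_hybrid_stack(8))
# ['mamba2 + top6of128_moe', 'mamba2 + top6of128_moe',
#  'mamba2 + top6of128_moe', 'gqa_attention + top6of128_moe', ...]
```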
Nemotron 3 Nano is purpose-built for a wide array of demanding applications, including agentic systems, complex reasoning tasks, tool utilization, and conversational AI. Its support for a context length of up to 1 million tokens is a game-changer for any task requiring deep historical context or the ability to process extensive documents.
This release continues the evolution of the Nemotron model family, pushing the boundaries of open, accurate, and efficient models specifically designed for the burgeoning field of AI agent development.
The NVIDIA Nemotron Family: Powering the Future of AI Agents
[Figure: The Nemotron family – Nano, Super, and Ultra – each tuned for a different balance of speed, accuracy, and scale.]
NVIDIA’s Nemotron family of open models is engineered to tackle advanced reasoning and agentic tasks head-on. They consistently deliver leading accuracy while setting new benchmarks for efficiency, making sophisticated AI more accessible and practical for a wider range of applications.
Crafting Nemotron 3 Nano: A Multi-Stage Masterpiece
Building a model like Nemotron 3 Nano is no small feat. NVIDIA employed a sophisticated, multi-stage pipeline that combines massive-scale pre-training, highly specialized supervised fine-tuning (SFT), and cutting-edge reinforcement learning techniques to hone its reasoning abilities and agentic behaviors.
Phase 1: Pre-Training – Laying the Foundation
Nemotron 3 Nano’s journey begins with an immense pre-training phase built on a 25-trillion-token corpus. This dataset is not just vast; it’s meticulously curated. It includes a substantial 2.5 trillion tokens derived from new Common Crawl data, alongside extensive collections of code, mathematical texts, encyclopedic knowledge from Wikipedia, academic papers, and a diverse range of multilingual content spanning 15 languages.
The pre-training strategy itself is a two-stage approach designed for maximum impact:
- Stage 1: Diversity (the first 94%): A broad and diverse data mixture that maximizes the model’s coverage of domains and its ability to generalize across different types of information.
- Stage 2: Quality (the final 6%): In the concluding stretch, the focus shifts to extremely high-quality sources like Wikipedia, refining accuracy, ensuring consistency, and imbuing the model with a deeper grounding in factual information.
Extending the Horizon: The 1M-Token Context Window
One of the standout features of Nemotron 3 Nano is its ability to handle an astounding 1 million tokens. This context length was achieved through a dedicated continued pre-training (CPT) stage conducted at a 512K-token sequence length. To ensure performance wasn’t sacrificed on shorter contexts, the training mixture included both 512K- and 4K-token sequences.
This extended training involved carefully constructed synthetic data designed to enhance critical long-range capabilities – a toy example follows the list below. These include:
- Long-range retrieval: The ability to find relevant information within very large documents or datasets.
- Multi-hop reasoning: Connecting disparate pieces of information across multiple steps to arrive at a conclusion.
- Multi-document information aggregation: Synthesizing information from several sources into a cohesive understanding.
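As a toy illustration of what long-range retrieval data can look like, here is a minimal ‘needle in a haystack’ sample generator. This is our sketch of the general technique, not NVIDIA’s actual synthetic-data pipeline.

```python
import random
import string

def make_long_retrieval_sample(n_filler_tokens: int = 100_000) -> dict:
    """Toy long-range retrieval sample: hide one fact at a random position in
    a very long context, then ask the model to retrieve it. Training on such
    samples rewards attending across the full window."""
    key = "".join(random.choices(string.ascii_lowercase, k=8))
    needle = f"The passcode for project {key} is {random.randint(1000, 9999)}."
    filler = ["lorem"] * n_filler_tokens
    filler.insert(random.randrange(len(filler)), needle)  # bury the needle
    return {
        "context": " ".join(filler),
        "question": f"What is the passcode for project {key}?",
        "answer": needle.split()[-1].rstrip("."),
    }
```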
NVIDIA is releasing a significant portion of these pre-training datasets openly on Hugging Face. These additions contribute 3 trillion new tokens to the Nemotron-Pretraining series, offering enhanced fidelity in areas like code, mathematics, and complex reasoning. Furthermore, advanced synthetic augmentation and annotation pipelines have been employed to increase data density and structure, leading to more efficient training and directly contributing to Nemotron 3 Nano’s exceptional quality.
The team at NVIDIA has learned a crucial lesson: raw quantity of data is meaningless without quality. Their pre-training data strategy continues to emphasize efficiency, incorporating smarter filtration methods, rewriting and improving existing samples, and even rescuing nearly half a trillion tokens of mathematical and code data that previous pipelines might have discarded. This focus on extracting valuable ‘signal’ from the ‘noise’ is what enables smarter, smaller models that are not only cheaper to train and run but also achieve superior accuracy.
Phase 2: Post-Training – Specialization and Refinement
Following the massive pre-training phase, Nemotron 3 Nano undergoes a critical post-training process. This involves three key stages: Supervised Fine-Tuning (SFT), Reinforcement Learning from Verifiable Rewards (RLVR), and Reinforcement Learning from Human Feedback (RLHF). These stages are meticulously designed to specialize the model for agentic workflows, adept tool use, high-quality reasoning, and engaging chat interactions.
Supervised Fine-Tuning (SFT): Mastering Agentic Behavior
NVIDIA’s SFT recipe has been significantly refined from the Nemotron Nano v2 version to better equip the model for complex agentic behaviors. Key improvements include:
- Greater Dataset Diversity: Exposing the model to a wider range of scenarios and interaction styles.
- Higher Data Quality: Ensuring the training data is accurate, consistent, and representative of desired outcomes.
- Explicit Training for Multi-Step and Multi-Turn Reasoning: Directly teaching the model how to handle sequential actions and maintain context across extended conversations.
The SFT process also teaches the model its ‘Reasoning ON/OFF’ modes directly from the chat template. With Reasoning ON, the model operates in a multi-step mode, preserving and building on its previous chain of thought within a task. With Reasoning OFF, it operates in a multi-turn mode where reasoning content is not carried over, yielding more concise, direct responses – crucial for many conversational applications.
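As a concrete illustration, here is how a client might toggle these modes through the chat template. The model ID and the `/think` / `/no_think` system markers are assumptions for illustration; the chat template on the model card is authoritative.

```python
from transformers import AutoTokenizer

# Model ID and reasoning-toggle markers below are assumptions; consult the
# Hugging Face model card for the authoritative convention.
tok = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B")

def build_prompt(user_msg: str, reasoning_on: bool) -> str:
    messages = [
        {"role": "system", "content": "/think" if reasoning_on else "/no_think"},
        {"role": "user", "content": user_msg},
    ]
    return tok.apply_chat_template(messages, tokenize=False,
                                   add_generation_prompt=True)

prompt = build_prompt("Plan a three-step refactor of this module.",
                      reasoning_on=True)
```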
Nemotron 3 Nano: Throughput and Accuracy Synergy
[Figure 4: Throughput and accuracy of Nemotron 3 Nano compared against similarly sized models such as Qwen3-30B and GPT-OSS-20B.]
Figure 4 elegantly illustrates how Nemotron 3 Nano achieves its remarkable performance. The hybrid MoE architecture drives exceptional throughput efficiency, while advanced Reinforcement Learning techniques, honed within the NVIDIA NeMo Gym environment, ensure leading accuracy. This synergy is what makes Nemotron 3 Nano a standout in the current AI landscape.
NVIDIA is committed to transparency and community empowerment, releasing the majority of their SFT datasets and codebase openly. The expanded post-training data release further enhances the model’s intelligence: with 13 million new post-training samples – nearly tripling the previous release – it now stands as the largest openly available post-training corpus, by a factor of 2.5. To achieve even higher reasoning accuracy, cross-disciplinary domains such as code, mathematics, physics, and chemistry were blended. This creates novel, multi-step problems that don’t exist in standard scraped web data, enabling the model to reason effectively about questions that bridge different fields – the very nexus where scientific and technical breakthroughs often occur.
Multi-Environment Reinforcement Learning from Verifiable Rewards (RLVR): Mastering Diverse Tasks
Nemotron 3 Nano was simultaneously trained across a multitude of distinct environments. This comprehensive training regimen encompassed diverse areas such as mathematics, coding, question answering, instruction following, multi-step tool use, multi-turn conversations, and structured output generation. The training employed a synchronous Group Relative Policy Optimization (GRPO) algorithm. This multi-environment RLVR stage is critical for ensuring:
- Uniform Improvement: The model’s performance is enhanced consistently across all trained domains.
- Reduced Overfitting: By training on a wide array of benchmarks, the model avoids becoming overly specialized to any single task.
- Reliable Agentic Behavior: It fosters more dependable and predictable performance in real-world, complex workflows.
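At the heart of GRPO is a simple idea: sample a group of responses per prompt, score each with a verifiable reward, and baseline every reward against the group itself rather than a learned value network. Below is a minimal sketch of that advantage step – our simplification; the full GRPO loss also includes a clipped policy ratio and KL regularization.

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: normalize each response's reward against the
    mean and standard deviation of its own sampling group, eliminating the
    need for a separate value network."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# e.g. 8 rollouts for one math prompt; reward 1.0 if the verifier accepts the answer
adv = grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]))
```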
The AI Training Gym: NeMo Gym Opens Doors
[Figure: NeMo Gym – a training ground for AI models, with environments and datasets in place of weights and equipment.]
Models don’t just learn from textbooks; they need a dynamic training ground – a ‘gym’. NVIDIA stands out as one of the few open model providers that not only releases reinforcement learning datasets but also the very environments used to train them. This empowers developers to rigorously test their agents, identify and capture critical edge cases, and proactively prevent model drift over time.
This latest release introduces over 10 new RL environments. These environments cover a wide spectrum, from competitive coding challenges and advanced mathematical problems to realistic calendar scheduling simulations. Crucially, NVIDIA is open-sourcing all the essential RLVR infrastructure – the environments themselves, along with their associated datasets and the code used to build and scale them. These components form the backbone of the new NVIDIA NeMo Gym library, a powerful tool designed for scalable RL environment construction.
Training at scale is facilitated by NVIDIA NeMo RL, their high-performance RL library, which enables efficient and advanced reinforcement learning training pipelines.
Reinforcement Learning from Human Feedback (RLHF): Polishing Conversational Skills
To further elevate the model’s conversational quality, a generative reward model (GenRM) was trained with GRPO on top of Qwen3-235B-A22B. The GenRM is designed to analyze conversations: given a conversation history, a new user query, and two candidate assistant responses, it explicitly reasons about the strengths and weaknesses of each response, assigns individual helpfulness scores, and produces a relative ranking between the two candidates. These reward signals are then used in an RLHF stage to enhance Nemotron 3 Nano’s helpfulness, coherence, correctness, and overall chat experience.
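To make the GenRM’s job concrete, here is a hypothetical sketch of a pairwise judging prompt in the spirit of what the paragraph describes. The exact format, score scale, and output convention are our assumptions, not NVIDIA’s actual interface.

```python
def build_genrm_prompt(history: str, query: str,
                       resp_a: str, resp_b: str) -> str:
    """Illustrative pairwise-judge prompt: the judge model reasons about both
    candidates, scores each for helpfulness, and states a preference. These
    parsed scores and rankings become the RLHF reward signal."""
    return (
        "Conversation so far:\n" + history +
        f"\n\nNew user query:\n{query}" +
        f"\n\nResponse A:\n{resp_a}\n\nResponse B:\n{resp_b}" +
        "\n\nAnalyze the strengths and weaknesses of each response, then output"
        "\nlines 'SCORE_A: <1-10>', 'SCORE_B: <1-10>', and 'PREFERRED: <A|B>'."
    )
```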
The culmination of this sophisticated post-training pipeline – SFT, RLVR, and RLHF – results in the final, highly capable Nemotron 3 Nano 30B-A3B model.
As AI models evolve into sophisticated multi-step agents that can interact with tools, they inevitably encounter entirely new safety and security challenges. To support responsible deployment, NVIDIA is releasing an agentic safety dataset. This dataset features nearly 11,000 labeled traces from realistic, tool-using workflows, providing developers with the essential data needed to evaluate, diagnose, and mitigate potential safety risks before agentic systems are deployed in production.
The Need for Better RL Infrastructure
During the development of Nemotron 3, the limitations of existing Reinforcement Learning tooling became starkly apparent. Training large, complex reasoning models with RL presents significant hurdles:
- Complexity of Orchestration: Managing multi-step rollouts can be incredibly intricate.
- Brittle Tool Integrations: Integrating external tools often proves to be a fragile process.
- Conflicting Logic: Orchestration logic can sometimes clash with the core training loop design.
- Data Collection Bottlenecks: Gathering rollout data at scale is a slow and demanding undertaking.
- Proprietary Environments: Most high-quality RL environments are kept as closed, proprietary systems, limiting accessibility.
As a consequence, meaningful RL training has historically been the exclusive domain of major AI research labs.
NeMo Gym: Democratizing Reinforcement Learning
To dismantle these barriers, NVIDIA developed NeMo Gym, an open-source, standardized library for building and scaling RL environments. NeMo Gym is the engine that powers the reinforcement learning pipelines used in Nemotron 3 Nano. Now, it offers developers:
- Ready-to-Use Environments: A collection of pre-built RL environments covering math, code, tool use, multi-turn reasoning, and agentic workflows.
- Custom Environment Creation: The ability to build bespoke RL environments with verifiable reward logic.
- Ecosystem Interoperability: Seamless integration with NeMo RL and other popular training frameworks such as TRL, Unsloth, and verl (integration underway).
- High-Throughput Rollout Orchestration: Enabling large-scale RL training by efficiently managing the data collection process.
- A Practical Pathway for RL: Providing a clear and accessible method for researchers and developers to perform RL on their own models.
NeMo Gym is more than just a library; it’s a flexible open-source platform for building and executing RL training environments, and an integral part of the broader NVIDIA NeMo software suite, which supports end-to-end model training. It provides the infrastructure needed to design, run, and scale complex RL environments.

Rigorously tested during the development of the entire Nemotron 3 model family, NeMo Gym includes core environment-development infrastructure, a growing catalog of ready-to-use training environments complete with their RLVR datasets, and tight integration with NeMo RL – NVIDIA’s high-performance, efficient RL training engine, which supports advanced RL algorithms, end-to-end FP8 training, and asynchronous RL.
With NeMo Gym, teams can rapidly assemble environments using modular server components and templates. They can integrate external tools, systems, or databases and orchestrate complex long-context, multi-step, multi-turn rollouts. This modularity allows training environments to be iterated upon and shared independently of the training loop, fostering agility and collaboration.
The NeMo Gym Training Loop Integration
[Figure 6: The NeMo Gym architecture – the RL training framework sends task prompts to NeMo Gym, whose agent server coordinates a policy model server and an external resources server; trajectories and rewards flow back to the framework.]
Figure 6 details how NeMo Gym fits into the RL training loop. The RL training framework (such as NeMo RL) sends task prompts to NeMo Gym, which operates as a collection of independent HTTP services. Within NeMo Gym, an agent server orchestrates the rollouts by coordinating the policy model server (responsible for generation) and the external resources server (handling tools and rewards). NeMo Gym then returns the resulting model trajectories and rewards back to the training framework, which uses this data to update and refine the policy model.
By decoupling RL environments from the RL training frameworks, NeMo Gym achieves seamless compatibility with numerous popular frameworks, including NeMo RL. It supports high-throughput, concurrent rollout collection, and enables large-scale distributed RL training. This clear separation of concerns makes scaling RL workflows and adapting environments as training objectives evolve remarkably straightforward.
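From the training framework’s perspective, a rollout is then just an HTTP round trip. The sketch below illustrates that service boundary; the endpoint name, port, and payload fields are hypothetical stand-ins, not NeMo Gym’s real API.

```python
import requests

# Hypothetical agent-server address and endpoint; see the NeMo Gym docs for
# the actual service interface.
AGENT_SERVER = "http://localhost:8000"

def collect_rollout(task_prompt: str) -> dict:
    """One step of the loop from the framework's point of view: hand a task
    to the agent server, which coordinates the policy and resource servers,
    and receive back a finished trajectory with its reward."""
    resp = requests.post(f"{AGENT_SERVER}/rollout", json={"prompt": task_prompt})
    resp.raise_for_status()
    return resp.json()  # e.g. {"trajectory": [...], "reward": 1.0}

batch = [collect_rollout(p) for p in ["Solve 17*24.", "Book a 30-min meeting."]]
# The RL framework (e.g. NeMo RL) now computes advantages and updates the policy.
```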
To accelerate experimentation, NeMo Gym comes equipped with an expanding RL Hub – a curated catalog of ready-to-use, domain-specific environments. Developers can leverage these environments immediately or extend them to suit their specific needs. Current domains include mathematics, coding, instruction following, multi-step tool use, and multi-turn structured conversations. Practitioners can fine-tune models using these environments right out of the box, contribute their own creations, or reuse valuable community contributions.
Get Started with Nemotron 3 Nano Today!
Nemotron 3 Nano (30B A3B) delivers state-of-the-art accuracy within an exceptionally cost-efficient package. It offers up to 3.3x higher throughput compared to leading open-source models of similar size (as highlighted in Figure 1). Critically, it supports a massive 1 million-token context window, demonstrating strong performance on long-context reasoning benchmarks.
Engineered for high-volume, real-time execution, Nemotron 3 Nano excels in demanding tasks like mathematics and coding, intricate multi-step tool calling, and dynamic multi-turn agentic workflows. It also retains the familiar Nemotron ‘Reasoning ON/OFF’ modes and ‘thinking budget’ controls, giving developers precise control over how the model allocates its computational resources for each task.
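One way to see what a thinking budget means in practice is the client-side pattern below: cap generation during the thinking phase and close the thought when the cap is hit. The `</think>` marker, the `generate` callable, and the two-call flow are illustrative assumptions, not necessarily the model’s native budget mechanism.

```python
THINK_BUDGET = 512  # max 'thinking' tokens we are willing to pay for

def answer_with_budget(generate, prompt: str) -> str:
    """Client-side budget forcing (illustrative): let the model think for at
    most THINK_BUDGET tokens, force the close-of-thought marker if it runs
    over, then continue decoding the final answer. 'generate' stands in for
    any text-completion call that accepts max_tokens and stop strings."""
    draft = generate(prompt, max_tokens=THINK_BUDGET, stop=["</think>"])
    if "</think>" not in draft:
        draft += "</think>"  # budget exhausted: close the thought ourselves
    return generate(prompt + draft, max_tokens=1024)
```

This keeps the cost of ‘thinking’ tokens bounded and predictable per request, which is the point of the budget control.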
With this release, NVIDIA is also introducing NeMo Gym, providing ready-to-use training environments developed during the Nemotron 3 training process, along with the infrastructure to build your own training environments and scale rollout collection.
What’s available to the community?
- Full Model Weights: The complete Nemotron 3 Nano model.
- Complete Training Recipe: Detailed instructions and code for SFT, RLVR, and RLHF.
- Most Datasets: Access to the pre-training and post-training datasets used throughout the training pipeline.
- Training Frameworks: The foundational frameworks that power Nemotron 3.
Essentially, everything needed to study, reproduce, or extend this advanced model is now openly accessible.
Ready to dive in? Here’s how to get started with Nemotron 3 Nano:
- Download the Model: Available now on Hugging Face.
- Try Hosted Endpoints: Run instant queries on OpenRouter or build.nvidia.com.
- Deploy at Scale: Utilize NVIDIA’s specialized cookbooks for vLLM, TRT-LLM, and SGLang (a minimal vLLM sketch follows this list).
- Experiment on the Edge: Run locally on devices like NVIDIA RTX AI PCs and workstations, and DGX Spark, via llama.cpp, LM Studio, and Unsloth.
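As a starting point for the vLLM route, here is a minimal offline-inference sketch. The Hugging Face model ID is an assumption (check the model card for the exact repo), and long-context serving typically needs additional engine configuration beyond this.

```python
from vllm import LLM, SamplingParams

# Model ID below is an assumption; substitute the exact Hugging Face repo
# from the model card. Long-context use may require extra engine flags.
llm = LLM(model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B", trust_remote_code=True)
params = SamplingParams(temperature=0.6, max_tokens=1024)

outputs = llm.chat(
    [{"role": "user", "content": "Summarize the Nemotron 3 Nano architecture."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```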
For an in-depth exploration of the architecture, datasets, and comprehensive benchmarks, be sure to read the full Nemotron 3 Nano Technical Report.