Google today released Gemini 3.1 Pro, a significant update to its flagship artificial intelligence model, introducing a new tiered reasoning system designed to mimic the capabilities of its specialized Deep Think tool. The release marks a departure from Google’s previous update cadence, signaling a shift towards more frequent, incremental improvements to its AI offerings.
Gemini 3.1 Pro introduces three levels of adjustable “thinking” – low, medium, and high – allowing developers to scale the model’s reasoning effort to match the complexity of each task. Previously, Gemini 3 Pro offered only low and high reasoning modes. The new medium setting is roughly comparable to the previous high mode, while the revamped high setting effectively turns the model into a “mini version of Gemini Deep Think,” according to Google.
The update is particularly aimed at enterprise users, offering a way to manage computational costs and response times. Instead of routing different requests to specialized models, organizations can now utilize a single model endpoint and adjust the reasoning depth as needed. Simple tasks, like document summarization, can be handled with the “low” setting for quick responses, while complex analytical challenges can leverage the “high” setting for more in-depth reasoning.
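The single-endpoint pattern described above can be sketched as a small routing helper that varies only the reasoning depth per request. This is a minimal illustration, not the actual Gemini SDK: the model name, the `thinking_level` field, and the `build_request` helper are assumptions chosen for the example, not the official API schema.

```python
# Minimal sketch of per-request reasoning control against a single model
# endpoint. The model name and "thinking_level" request field are
# illustrative assumptions, not the official Gemini API schema.

TASK_LEVELS = {
    "summarize": "low",       # quick, cheap responses for simple tasks
    "classify": "low",
    "code_review": "medium",  # moderate reasoning depth
    "analysis": "high",       # deepest reasoning for complex problems
}

def build_request(task_type: str, prompt: str) -> dict:
    """Build one request payload, varying only the reasoning depth."""
    level = TASK_LEVELS.get(task_type, "medium")  # default to medium
    return {
        "model": "gemini-3.1-pro",  # one endpoint for every task type
        "thinking_level": level,    # low | medium | high
        "contents": prompt,
    }

req = build_request("summarize", "Summarize this contract.")
print(req["thinking_level"])  # low
```

The point of the pattern is that cost and latency become a per-call knob rather than a routing decision between separate specialized models.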
Google’s benchmarks indicate substantial performance gains with Gemini 3.1 Pro, particularly in areas requiring abstract reasoning and agentic capabilities. On the ARC-AGI-2 benchmark, which assesses a model’s ability to solve novel abstract reasoning problems, Gemini 3.1 Pro achieved a score of 77.1%, more than doubling the 31.1% achieved by Gemini 3 Pro. The result also surpasses scores from Anthropic’s Sonnet 4.6 (58.3%), Opus 4.6 (68.8%), and OpenAI’s GPT-5.2 (52.9%).
Improvements also extend to academic reasoning: Gemini 3.1 Pro scored 44.4% on Humanity’s Last Exam, a rigorous academic reasoning benchmark, compared to 37.5% for Gemini 3 Pro. The model also showed strong scientific knowledge, achieving 94.3% on GPQA Diamond and outperforming the competing models in Google’s comparison.
The gains are particularly notable in agentic benchmarks, which measure a model’s ability to perform complex, multi-step tasks. On Terminal-Bench 2.0, evaluating agentic terminal coding, Gemini 3.1 Pro scored 68.5% versus 56.9% for its predecessor. Similarly, on MCP Atlas, a benchmark measuring multi-step workflows, the new model reached 69.2%, a 15-point improvement over Gemini 3 Pro. Agentic web search capability, as measured by BrowseComp, also saw a significant boost, with Gemini 3.1 Pro achieving 85.9% compared to 3 Pro’s 59.2%.
The decision to release Gemini 3.1 Pro as a “point one” update, rather than a full version launch, signals a strategic shift for Google. The company previously employed a preview-based release cycle for its Gemini models. According to Google, the 3.1 Pro update builds directly on lessons learned from the Gemini Deep Think series, incorporating techniques from both earlier and more recent versions. Reinforcement learning appears to have played a key role in the improvements, particularly in reasoning, coding, and agentic tasks.
Gemini 3.1 Pro is currently available in preview through the Gemini API via Google AI Studio, Gemini CLI, Google’s agentic development platform Antigravity, Vertex AI, Gemini Enterprise, Android Studio, the consumer Gemini app, and NotebookLM. Google has indicated it will continue to refine agentic workflows before a general availability launch.