Podcast Episode
Google Teaches AI to Look Twice with Agentic Vision
January 28, 2026
Google has unveiled Agentic Vision for Gemini 3 Flash, a groundbreaking feature that transforms how AI models analyse images. Instead of processing visuals in a single glance, the AI can now actively zoom in, crop, annotate, and re-examine images through code execution, delivering five to ten percent accuracy improvements across vision benchmarks.
A New Era for AI Vision
Google has introduced Agentic Vision for its Gemini 3 Flash model, marking a fundamental shift in how artificial intelligence processes and understands images. Traditional AI models analyse images in a single pass, often missing fine details like serial numbers on microchips or distant street signs. When this happens, they simply guess.

The Think, Act, Observe Loop
Agentic Vision transforms this passive approach into an active investigation. The system operates through a three-phase loop: during the Think phase, the model analyses the query and image to formulate a multi-step plan. In the Act phase, it generates and executes Python code to manipulate images through cropping, rotating, or annotating. Finally, the Observe phase appends the transformed image back into the model's context window for further inspection.

Real-World Impact
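To make the three-phase loop concrete, here is a deliberately toy simulation. Everything in it is hypothetical: a nested list of numbers stands in for image pixels, and quadrant cropping stands in for the Python code a real model would generate and execute against an actual image.

```python
# Hypothetical, simplified simulation of the think-act-observe loop.
# A real system would run model-generated Python against real pixels;
# here a nested list stands in for an image, and "observing" is just
# reading values from the cropped region.

def crop(image, top, left, height, width):
    """Act: return a sub-region of the image (rows x cols)."""
    return [row[left:left + width] for row in image[top:top + height]]

def find_detail(image, target, max_steps=3):
    """Think/Act/Observe: repeatedly zoom toward the quadrant that
    contains the target value until it dominates the view."""
    view = image
    for step in range(max_steps):
        # Observe: is the detail already legible (view is small enough)?
        if len(view) <= 2 and any(target in row for row in view):
            return step, view
        # Think: decide which quadrant to zoom into next.
        h, w = len(view) // 2, len(view[0]) // 2
        for top in (0, h):
            for left in (0, w):
                candidate = crop(view, top, left, h, w)
                if any(target in row for row in candidate):
                    view = candidate  # Act: zoom into that quadrant
                    break
            else:
                continue
            break
    return max_steps, view

# An 8x8 "image" whose interesting detail (the value 9) sits near a corner.
image = [[0] * 8 for _ in range(8)]
image[6][6] = 9
steps, zoomed = find_detail(image, 9)  # two zooms reach a 2x2 view
```

The point of the sketch is the loop shape, not the vision: each iteration plans a transformation, executes it as code, and feeds the result back in for another look.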
The technology is already proving its worth in practical applications. PlanCheckSolver, a platform that validates building plans against complex codes, has improved its accuracy by five percent using Agentic Vision. The AI can now iteratively inspect high-resolution architectural inputs, cropping and analysing specific sections like roof edges or building facades to confirm compliance.

Beyond Simple Looking
The capability enables the model to draw bounding boxes and numeric labels over objects it identifies, creating what Google calls a visual scratchpad. For visual mathematics involving high-density tables, the model offloads computation to a deterministic Python environment rather than relying on probabilistic guessing.

Availability and Future Plans
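As a toy illustration of that offloading idea: rather than estimating totals by "eyeballing" a dense table image, a model would transcribe the cell values it reads and hand the arithmetic to Python, which computes exactly. The table values below are invented example data, not output from the actual system.

```python
# Hypothetical sketch of offloading "visual mathematics" to code.
# The model's job is reading the cells; Python's job is the exact,
# deterministic arithmetic over whatever was read.

def column_totals(rows):
    """Sum each column of a rectangular table of numbers exactly."""
    return [sum(col) for col in zip(*rows)]

# Values a model might transcribe from a high-density table image.
table = [
    [12.5, 7.0, 3.25],
    [ 4.0, 9.5, 8.75],
    [ 1.5, 2.5, 6.00],
]
totals = column_totals(table)  # computed, not guessed: [18.0, 19.0, 18.0]
```

The division of labour is the point: perception stays with the model, while any step that has a single correct answer runs as ordinary code.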
Agentic Vision is available now through the Gemini API in Google AI Studio and Vertex AI, with rollout beginning in the Gemini app. Google is working to make behaviours like image rotation and visual mathematics fully implicit in future updates, and is exploring additional tools including web and reverse image search.

Published January 28, 2026 at 12:46am