Podcast Episode
Google Teaches AI to Look Twice with Agentic Vision
January 28, 2026
Google has unveiled Agentic Vision for Gemini 3 Flash, a groundbreaking feature that transforms how AI models analyse images. Instead of processing visuals in a single glance, the AI can now actively zoom in, crop, annotate, and re-examine images through code execution, delivering five to ten percent accuracy improvements across vision benchmarks.
A New Era for AI Vision
Google has introduced Agentic Vision for its Gemini 3 Flash model, marking a fundamental shift in how artificial intelligence processes and understands images. Traditional AI models analyse images in a single pass, often missing fine details like serial numbers on microchips or distant street signs. When this happens, they simply guess.

The Think, Act, Observe Loop
Agentic Vision transforms this passive approach into an active investigation. The system operates through a three-phase loop: during the Think phase, the model analyses the query and image to formulate a multi-step plan. In the Act phase, it generates and executes Python code to manipulate images through cropping, rotating, or annotating. Finally, the Observe phase appends the transformed image back into the model's context window for further inspection.

Real-World Impact
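To make the three-phase loop concrete, here is a deliberately toy simulation. Everything in it is hypothetical: a nested list of numbers stands in for image pixels, and quadrant cropping stands in for the Python code a real model would generate and execute against an actual image.

```python
# Hypothetical, simplified simulation of the think-act-observe loop.
# A real system would run model-generated Python against real pixels;
# here a nested list stands in for an image, and "observing" is just
# reading values from the cropped region.

def crop(image, top, left, height, width):
    """Act: return a sub-region of the image (rows x cols)."""
    return [row[left:left + width] for row in image[top:top + height]]

def find_detail(image, target, max_steps=3):
    """Think/Act/Observe: repeatedly zoom toward the quadrant that
    contains the target value until it dominates the view."""
    view = image
    for step in range(max_steps):
        # Observe: is the detail already legible (view is small enough)?
        if len(view) <= 2 and any(target in row for row in view):
            return step, view
        # Think: decide which quadrant to zoom into next.
        h, w = len(view) // 2, len(view[0]) // 2
        for top in (0, h):
            for left in (0, w):
                candidate = crop(view, top, left, h, w)
                if any(target in row for row in candidate):
                    view = candidate  # Act: zoom into that quadrant
                    break
            else:
                continue
            break
    return max_steps, view

# An 8x8 "image" whose interesting detail (the value 9) sits near a corner.
image = [[0] * 8 for _ in range(8)]
image[6][6] = 9
steps, zoomed = find_detail(image, 9)  # two zooms reach a 2x2 view
```

The point of the sketch is the loop shape, not the vision: each iteration plans a transformation, executes it as code, and feeds the result back in for another look.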
The technology is already proving its worth in practical applications. PlanCheckSolver, a platform that validates building plans against complex codes, has improved its accuracy by five percent using Agentic Vision. The AI can now iteratively inspect high-resolution architectural inputs, cropping and analysing specific sections like roof edges or building facades to confirm compliance.

Beyond Simple Looking
The capability enables the model to draw bounding boxes and numeric labels over objects it identifies, creating what Google calls a visual scratchpad. For visual mathematics involving high-density tables, the model offloads computation to a deterministic Python environment rather than relying on probabilistic guessing.

Availability and Future Plans
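As a toy illustration of that offloading idea: rather than estimating totals by "eyeballing" a dense table image, a model would transcribe the cell values it reads and hand the arithmetic to Python, which computes exactly. The table values below are invented example data, not output from the actual system.

```python
# Hypothetical sketch of offloading "visual mathematics" to code.
# The model's job is reading the cells; Python's job is the exact,
# deterministic arithmetic over whatever was read.

def column_totals(rows):
    """Sum each column of a rectangular table of numbers exactly."""
    return [sum(col) for col in zip(*rows)]

# Values a model might transcribe from a high-density table image.
table = [
    [12.5, 7.0, 3.25],
    [ 4.0, 9.5, 8.75],
    [ 1.5, 2.5, 6.00],
]
totals = column_totals(table)  # computed, not guessed: [18.0, 19.0, 18.0]
```

The division of labour is the point: perception stays with the model, while any step that has a single correct answer runs as ordinary code.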
Agentic Vision is available now through the Gemini API in Google AI Studio and Vertex AI, with rollout beginning in the Gemini app. Google is working to make behaviours like image rotation and visual mathematics fully implicit in future updates, and is exploring additional tools including web and reverse image search.

Published January 28, 2026 at 12:46am