NEW Gemini Agentic Vision Update is INSANE! 🤯
8:21


Julian Goldie SEO · 29.01.2026 · 14,257 views · 281 likes · updated 18.02.2026
Video description
Want to make money and save time with AI? Get AI Coaching, Support & Courses 👉 https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI Course + 1000 NEW AI Agents + Video Notes 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about
Want to know how I make videos like these? Join the AI Profit Boardroom → https://www.skool.com/ai-profit-lab-7462/about
Get a FREE AI SEO Strategy Session: https://go.juliangoldie.com/strategy-session?utm=julian
Sponsorship inquiries: https://docs.google.com/document/d/1EgcoLtqJFF9s9MfJ2OtWzUe0UyKu1WeIryMiA_cs7AU/edit?tab=t.0

Gemini Just Learned to THINK with its Eyes! (New Update)

Google's new Gemini Agentic Vision update introduces a revolutionary 'Think, Act, Observe' loop that allows the AI to investigate images using real Python code. Discover how this shift from passive guessing to verifiable reasoning is eliminating hallucinations and transforming visual data analysis.

00:00 - Intro: Gemini’s New Vision
00:30 - Active vs. Passive AI Vision
01:27 - The Core: Reasoning + Code
02:07 - The Think, Act, Observe Loop
03:54 - Real-World Use Case Examples
04:59 - Visual Math & Data Analysis
05:43 - The Future of Agentic AI
06:34 - How to Use Agentic Vision Today

Table of contents (8 segments)

  1. 0:00 Intro: Gemini’s New Vision (88 words)
  2. 0:30 Active vs. Passive AI Vision (181 words)
  3. 1:27 The Core: Reasoning + Code (126 words)
  4. 2:07 The Think, Act, Observe Loop (371 words)
  5. 3:54 Real-World Use Case Examples (189 words)
  6. 4:59 Visual Math & Data Analysis (135 words)
  7. 5:43 The Future of Agentic AI (138 words)
  8. 6:34 How to Use Agentic Vision Today (355 words)
0:00

Intro: Gemini’s New Vision

New Gemini agentic vision update is insane. Gemini just learned to think with its eyes, and this changes everything. Google just dropped something wild: vision that actually investigates instead of guessing. This is huge. All right, so Google just crossed a massive line with Gemini, and I need to tell you about this because it's actually insane. Until now, every vision model you've used worked like this: look at the image once, guess what's in it, give you an answer. Done. That's it. But Gemini just changed the entire
0:30

Active vs. Passive AI Vision

game. With Gemini 3 Flash, vision became something completely different. It's not passive anymore. It's active. What does that mean? Instead of looking once and guessing, Gemini now looks, thinks, acts, observes, and repeats until it gets the right answer. This is vision plus agents plus tools, all working together. And the results are actually mind-blowing. Let me break down what's happening here, because this is where it gets crazy. Gemini doesn't just see images anymore. It investigates them. It can zoom into details. It can crop specific areas. It can draw on images. It can run actual calculations. It can create charts. And it does all of this using real code execution. This is not guessing. This is not hallucinating. This is actual verifiable reasoning. Hey, if we haven't met already, I'm the digital avatar of Julian Goldie, CEO of the SEO agency Goldie Agency. Whilst he's helping clients get more leads and customers, I'm here to help you get the latest AI updates. Julian Goldie reads every comment, so make sure you comment below. So here's the core idea that
1:27

The Core: Reasoning + Code

makes this so powerful. Agentic vision equals visual reasoning plus Python code execution. Let me say that again. Gemini sees something, then it writes code to interact with what it sees, then it looks at the result, then it does it again if needed. This loop is what makes it so accurate. The old way was simple. You upload an image. AI looks at it. AI guesses what's there. Sometimes it's right. Sometimes it hallucinates. Sometimes it misses details. Sometimes it makes up numbers. You know how frustrating that is. The new way is completely different. Gemini looks at your image. It thinks about what it needs to do. It plans multiple steps. Then it actually executes those steps using code. Here's how the loop works.
2:07

The Think, Act, Observe Loop

First is the Think stage. Gemini analyzes your question and the image. Then it plans what to do. Like if you ask it to count something small, it might think, okay, I need to zoom in here, crop that area, then count the objects pixel by pixel. That's planning. That's not guessing. Next is the Act stage. This is where it gets wild. Gemini writes Python code and runs it right there in real time. It can crop images. It can rotate them. It can draw boxes around objects. It can count things. It can extract data from tables. It can run math. It can make graphs. This is actual code running, not a language model pretending to do math. Real code, real results. Then comes the Observe stage. After Gemini transforms the image, it adds that new image back into its own context. Then it looks at it again. It examines the new visual data. It refines what it thinks, and then it gives you a final answer that's grounded in actual proof. And if it needs to, it repeats the whole loop again. Now, let me tell you why this matters so much. Traditional vision models fail at specific things. They miss tiny details. They hallucinate numbers constantly. They can't do multi-step visual problems. They just guess and hope they're right. Agentic Vision fixes all of that. Gemini can now inspect serial numbers that are super small. It can read street signs that are far away. It can count objects one by one instead of estimating. It can show you visual proof of its answer. This is a massive shift. We went from probabilistic guessing to verifiable reasoning. That's huge. And if you want to learn how to use tools like Google Gemini to automate your entire business and save hundreds of hours, you need to join my AI Profit Boardroom. This is where I teach you how to scale your business, get more customers, and save tons of time with AI automation just like this. The link is in the description.
Trust me, you don't want to miss this, because what I'm about to show you next is going to change how you think about AI vision.
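To make the Think, Act, Observe loop above concrete, here's a minimal sketch of it in plain Python. Everything in it is illustrative, not Gemini's real internals: the "image" is a toy 2-D grid (1 = object pixel, 0 = background) and the function names are invented for this example.

```python
# Minimal sketch of the Think-Act-Observe loop described above.
# The "image" is a toy 2-D grid; all names here are illustrative,
# not Gemini's actual implementation.

def think(question, image):
    """THINK: plan what to do -- here, pick a region to crop and inspect."""
    return {"crop": (0, 0, len(image[0]), len(image))}

def act(image, plan):
    """ACT: run real code on the image -- crop it, then count objects."""
    x0, y0, x1, y1 = plan["crop"]
    crop = [row[x0:x1] for row in image[y0:y1]]
    count = sum(cell for row in crop for cell in row)  # count object pixels
    return crop, count

def observe(question, count):
    """OBSERVE: the result goes back into context, grounding the answer."""
    return f"Counted {count} objects (computed by executed code, not guessed)."

image = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
plan = think("How many objects?", image)
crop, count = act(image, plan)
print(observe("How many objects?", count))
```

The point of the sketch is the shape of the loop: the answer comes out of code that actually ran against the image, and if the observation isn't conclusive, the same three steps repeat.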
3:54

Real-World Use Case Examples

Let me give you some real examples so you can see how powerful this is. First one is zooming and inspection. Gemini 3 Flash automatically zooms into fine details. It crops specific regions to look closer. There's a real company called Plan Check Solver. They use AI to validate building plans. They switched to Gemini with Agentic Vision. The result: a 5% accuracy improvement. That might not sound like a lot, but in their business, that's massive. What was Gemini doing? It was repeatedly cropping roof edges and building sections. Then it verified if they matched complex building codes. This is true agentic behavior. It's not just describing an image. It's investigating it like a detective. Second example is image annotation. This is like a visual scratch pad. Instead of saying, "I think there are five fingers," Gemini draws bounding boxes around each finger. It labels them with numbers, then it counts them visually. This makes mistakes way harder to make. Think about it like chain-of-thought reasoning, but visual. You can see the AI's work. You can verify it yourself. Third example is visual math and
4:59

Visual Math & Data Analysis

plotting. This one is crazy powerful. Gemini can read dense tables from images. Then it normalizes the data. Then it runs calculations on it. Then it generates actual charts using Matplotlib. Why does this matter? Because normal language models hallucinate math all the time. They just make up numbers. Gemini offloads the computation to actual Python code. So the results are verifiable. They're not guessed. They're computed. This is perfect for research, for engineering, for data analysis. Anytime you need to trust the numbers, this is the tool. And here's the performance boost you need to know about. With code execution turned on, Gemini 3 Flash gets a consistent 5 to 10% improvement across most vision benchmarks. That's not marketing talk. That's real measurable performance. Let me tell you where this is going next
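The "read a table, normalize it, compute on it" step above can be sketched in plain Python. The table values here are invented for illustration; the point is that the arithmetic is executed by code, not predicted by the model.

```python
# Sketch: numbers "read" from a table image (values invented for this
# example), normalized and aggregated in real code instead of guessed.
raw_rows = [
    ("Q1", "1,200"),   # strings as they might come out of the image
    ("Q2", "1,450"),
    ("Q3", "980"),
]

# Normalize: strip thousands separators, convert to numbers.
data = {label: float(value.replace(",", "")) for label, value in raw_rows}

# Compute: total and each row's share, in percent.
total = sum(data.values())
shares = {label: round(100 * v / total, 1) for label, v in data.items()}

print(total)   # 3630.0
print(shares)  # {'Q1': 33.1, 'Q2': 39.9, 'Q3': 27.0}
```

In a real run, Gemini would then hand `data` to Matplotlib to generate the actual chart; either way, the numbers in the answer come from computation you can verify.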
5:43

The Future of Agentic AI

because it gets even crazier. Google already hinted at what's coming. First, more implicit behaviors. Right now, zooming happens automatically most of the time, but rotating images or doing math sometimes needs you to prompt it. Soon, all these behaviors will trigger automatically. The AI will just know what to do. Second, more tools are coming. They're planning to add web search, reverse image search, and external grounding tools. This turns Gemini into a real-world perception agent. Imagine uploading a photo and Gemini not only analyzes it, but also searches the web for similar images, finds context, and gives you a complete answer. That's the future. Third, more model sizes. Right now, this only works on Gemini 3 Flash, but it's coming to larger Gemini models soon, possibly even mobile-optimized versions. So, you could have this power
6:34

How to Use Agentic Vision Today

on your phone. Now, let me show you how to try this yourself today. You can access Agentic Vision right now. It's available in Google AI Studio. It's in the Gemini API. It's in Vertex AI, and it's in the Gemini app. If you're using the Gemini app, just select Thinking from the model dropdown. Here's how to enable it in AI Studio. Open the playground, turn on code execution under tools, upload an image, then ask multi-step visual questions. That's it. There's also a demo app that showcases all these behaviors if you want to see it in action first. So what does this mean for you? If you work with images, this changes everything. If you do research, you can now trust AI to analyze visual data accurately. If you're in engineering, you can validate technical drawings and plans. If you create content, you can analyze competitor images and understand what's working. Look, here's the bottom line. Gemini Agentic Vision is not just an incremental update. It's a paradigm shift. It's the difference between an AI that looks and an AI that investigates, between an AI that guesses and an AI that proves, between an AI that describes and an AI that acts. And if you want to learn how to use tools like Google Gemini to automate your entire business and save hundreds of hours, you need to join my AI Profit Boardroom. This is where I teach you how to scale your business, get more customers, and save tons of time with AI automation just like this. The link is in the description. And if you want the full process, SOPs, and 100 plus AI use cases like this one, join the AI Success Lab; links are in the comments and description. You'll get all the video notes from there, plus access to our community of 38,000 members who are crushing it with AI. This is the best free AI community out there, and you need to be in it. All right, thanks for watching. Hit the like and subscribe button and I will see you in the next one.
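For the API route mentioned above, the key move is adding the code-execution tool to the request. The sketch below just builds the request body with the standard library; the field names follow Google's public Gemini REST documentation at the time of writing and should be verified against the current docs, and `"gemini-flash"` and the placeholder image data are illustrative, not confirmed identifiers.

```python
import json

# Sketch of a Gemini API request body with the code-execution tool enabled.
# Field names are as documented for the public REST API -- verify against
# current docs before relying on them.
payload = {
    "contents": [{
        "parts": [
            {"text": "Count the objects in this image step by step."},
            # Placeholder: a real request would carry base64 image bytes here.
            {"inline_data": {"mime_type": "image/png", "data": "<base64 image>"}},
        ]
    }],
    "tools": [{"code_execution": {}}],  # lets the model write and run Python
}

body = json.dumps(payload)
print("code_execution" in body)  # True -- the tool is part of the request
```

This body would then be POSTed to the model's `generateContent` endpoint with your API key; in AI Studio, toggling "code execution" under tools does the equivalent for you.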
