Want to get more customers, make more profit & save 100s of hours with AI? https://go.juliangoldie.com/ai-profit-boardroom
Get a FREE AI Course + Community +1,000 AI Agents + video notes + links to the tools 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about
🤖 Need AI Automation Services? Book a FREE AI Discovery Session Here: https://juliangoldieaiautomation.com/
🚀 Get a FREE SEO strategy Session + Discount Now: https://go.juliangoldie.com/strategy-session
🤯 Want more money, traffic and sales from SEO? Join the SEO Elite Circle👇
https://go.juliangoldie.com/register
Click below for FREE access to ✅ 50 FREE AI SEO TOOLS 🔥 200+ AI SEO Prompts! 📈 FREE AI SEO COMMUNITY with 2,000 SEOs! 🚀 Free AI SEO Course 🏆 Plus TODAY's Video NOTES...
https://go.juliangoldie.com/chat-gpt-prompts
FREE AI SEO Skool Group: 🚀 Want to rank #1 and make more money with SEO?
- Join here → https://www.skool.com/ai-seo-mastermind-group-3510/about
- Join our FREE AI SEO Accelerator here: https://www.facebook.com/groups/aiseomastermind
New Gemini 2.5 computer use AI agents are insane. Google just dropped something absolutely nuts, and I'm talking about AI that can actually use your computer. Not just talk about it or write code, but actually click buttons, type text, and fill out forms. This is the kind of stuff that makes regular automation look like child's play. And if you've been waiting for AI to actually do the work instead of just helping you think about it, this is your moment. I'm about to show you exactly how it works and why this changes everything.

All right, let's get into this, because Google just released the Gemini 2.5 Computer Use model, and this thing is genuinely different from anything we've seen before. Most AI models can write code, answer questions, and generate images, but they can't actually interact with software the way humans do. Until now. This model can control user interfaces: navigate websites, click buttons, type into forms, scroll through pages, and submit information, which is basically like having an assistant that can actually touch your screen. Here's the crazy part: it's built on Gemini 2.5 Pro, so it has insane visual understanding and reasoning capabilities, which means it can see your screen, understand what's on it, and decide what to do next. Google released this through the Gemini API, and you can access it in Google AI Studio and Vertex AI, and both are free to start testing right now.

Now, let me tell you why this actually matters in the real world. Right now, most AI tools need structured APIs to work, which means someone has to build a technical connection between the AI and the software, and that's complicated and doesn't work for everything. But graphical user interfaces are everywhere: on every website, every app, and every form you fill out online. And they're designed for humans, not for APIs. So if AI can't interact with these interfaces directly, it can't do a huge chunk of the real work that needs to get done.
That's exactly what this solves, in a massive way. The Gemini 2.5 Computer Use model can fill out forms, use dropdowns, apply filters, navigate pages, and even work behind logins, which is how you build general-purpose AI agents that can do tasks the way you would actually do them yourself.

Let me explain how it actually works under the hood. The model uses something called the computer use tool, which is part of the Gemini API, and it runs in a continuous loop. Here's the process from start to finish. You give it a request, something like, "Go to this website and fill out this form." Then you send it a screenshot of your screen, so the model can see what's currently on it. You also send it a history of recent actions so it knows what it just did, which helps it stay on track throughout the entire workflow. The model analyzes all of this information together, figures out what to do next, and sends back a function call, which is an action like clicking a button or typing text into a field. Sometimes the model will ask for confirmation, especially for high-stakes actions like making a purchase or sending an email, which is a smart safety feature. Your code executes the action, takes a new screenshot, and sends it back to the model, and the loop continues, action after action, until the task is completely done. This is how the model can complete multi-step workflows automatically: it's not just one action, it's dozens or sometimes even hundreds of actions in sequence. The model is primarily optimized for web browsers, and it also works on mobile UIs, though it's not ready for desktop OS-level control yet. That's probably coming soon.

Now, let me show you what this thing can actually do with real examples. Google shared some demos, and they're absolutely wild. The first demo had this prompt: "From this pet care signup form, get all details for any pet with a California residency and add them as a guest in my spa CRM,"
then set up a follow-up visit appointment with the specialist for October 10th, anytime after 8:00 a.m., with the reason for the visit the same as their requested treatment. That's a genuinely complex task with multiple steps, multiple websites, data entry, and appointment booking all combined. The model did it completely automatically. It navigated to the form, found the California pets, copied their details, went to the CRM, added them as guests, and then set up the appointment with the right specialist at the right time with the right reason, all without any human input after the initial prompt.

The second demo had this prompt: "My art club brainstormed tasks ahead of our fair. The board is chaotic, and I need your help organizing the tasks into some categories I created. So go to this sticky note app and ensure notes are clearly in the right sections, and drag them there if not." The model went to the app, looked at the board, identified which notes were in the wrong sections, dragged them to the right places, and organized everything perfectly. No human input needed, just the initial prompt, and the AI figured out the rest. This is insane, because these aren't simple tasks at all. They require visual understanding, reasoning, multi-step planning, and precise execution, and the model does it faster than humans can.

Speaking of speed, let's talk about performance and how this stacks up against other models. If you want step-by-step tutorials and 100-plus AI use cases, check out the AI Money Lab link in the comments and description. We have 28,000-plus members, and we give away free trainings
and SOPs every day. Inside the Skool feed, you get video notes, checklists, and access to all the trainings, and it's the best place to learn AI automation, with a massive community. Link in the comments and description.

Google tested this model on multiple benchmarks, including web control and mobile control benchmarks, and it outperformed every leading alternative on the market. On the browser-based harness for Online-Mind2Web, Gemini 2.5 Computer Use had the highest accuracy and the lowest latency combined. Lower latency means faster responses, and faster responses mean faster task completion, which is critical for real-world use. Some of the other models were slower, some were less accurate, but Gemini beat them on both metrics at the same time. And this isn't just Google saying it, either, because Browserbase ran their own independent evaluations, and third parties confirmed the results. So the model is genuinely legit.

Now, let's talk about safety, because this is actually super important. AI agents that control computers are incredibly powerful, but they're also risky if not handled correctly. There are three main risks you need to understand. First is intentional misuse by users, where someone could try to use the model to do something harmful, like hack into systems or bypass security measures. Second is unexpected model behavior, where the model might do something you didn't intend because it misunderstood the task or made a mistake along the way. Third is prompt injections and scams, where malicious content on websites could try to trick the model by injecting commands or showing fake information. Google built safety features directly into the model to address all three of these risks from the ground up. They also give developers safety controls to prevent misuse. There's a per-step safety service, an out-of-model system that checks every action before it's executed, and if the action looks risky, it stops it immediately.
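To make the mechanics concrete, here's a minimal sketch of the screenshot → reason → act loop with a per-step safety gate, as described above. Everything here is a stand-in: `take_screenshot`, `ask_model`, `is_risky`, and `execute` are made-up stubs, not the actual Gemini API. In a real build, `ask_model` would call the Gemini API's computer use tool and `execute` would drive a browser.

```python
# Illustrative sketch of the computer-use agent loop. All functions are
# stubs standing in for the real pieces (Gemini API call, browser driver,
# per-step safety service).

def take_screenshot(state):
    """Stand-in for capturing the browser's current pixels."""
    return f"screenshot-of:{state['screen']}"

def ask_model(goal, screenshot, history):
    """Stand-in for the model: look at the screen, pick the next action."""
    done_names = [a["name"] for a in history]
    if "form" in screenshot and "type_text" not in done_names:
        return {"name": "type_text", "args": {"text": "Jane Doe"}}
    if "click" not in done_names:
        return {"name": "click", "args": {"target": "Submit"}}
    return {"name": "done", "args": {}}

def is_risky(action):
    """Stand-in for the per-step safety service: flag high-stakes actions."""
    return action["name"] in {"purchase", "send_email"}

def execute(action, state):
    """Stand-in for actually clicking/typing in the browser."""
    if action["name"] == "click":
        state["screen"] = "confirmation page"

def run_agent(goal, state, max_steps=10):
    history = []
    for _ in range(max_steps):
        shot = take_screenshot(state)           # 1. show the model the screen
        action = ask_model(goal, shot, history)  # 2. model proposes an action
        if action["name"] == "done":
            return history
        if is_risky(action):                     # 3. safety gate before acting
            raise RuntimeError(f"confirmation required for {action['name']}")
        execute(action, state)                   # 4. act, then loop again
        history.append(action)
    return history

state = {"screen": "signup form"}
steps = run_agent("fill out the signup form", state)
print([a["name"] for a in steps])  # → ['type_text', 'click']
```

The key design point is that the model never touches the screen directly: your code sits between the model's proposed function calls and the browser, which is exactly where the per-step safety check lives.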
There are also system instructions, where developers can tell the model to refuse certain actions or ask for user confirmation before doing them. For example, the model won't automatically complete actions that harm system integrity, compromise security, bypass CAPTCHAs, or control medical devices, which are all critical safety boundaries. These guardrails are absolutely critical, because without them this technology could be genuinely dangerous in the wrong hands. Google also published a full system card that explains all the safety measures in detail and gives developers best practices to follow. But they're very clear that developers need to test their systems thoroughly before launching anything to production, because the safeguards reduce risk, but they don't eliminate it completely.

Now, let's talk about who's already using this in the real world. Google teams have deployed the model to production for UI testing, which makes software development way faster than traditional methods. The model can automatically test user interfaces, find bugs, and report issues without human testers needing to manually click through everything. The model is also powering Project Mariner, which is Google's experimental AI agent project, and it's powering the Firebase testing agent and some features in AI Mode in Search. But it's not just Google using it internally. Early access users are testing the model, too, for personal assistance, workflow automation, and UI testing, and they're seeing real results that matter. One company is Poke, and they build a proactive AI assistant for iMessage, WhatsApp, and SMS with multiple third-party agentic workflows. They said that a lot of their workflows require interacting with interfaces meant for humans, where speed is especially important, and that Gemini 2.5 Computer Use is far ahead of the competition, often being 50% faster and better than the next best solutions they've considered.
Another company is Autotab, and they build AI agents that run fully autonomously, performing work where small mistakes in collecting and passing data are completely unacceptable. They said Gemini 2.5 Computer Use outperformed other models at reliably passing context in complex cases, increasing performance by up to 18% on their hardest evaluations. Google's payments platform team used the model as a contingency mechanism to address fragile end-to-end UI tests that contributed to 25% of all test failures. They said that when conventional scripts encounter failures, the model assesses the current screen state and autonomously figures out the actions required to complete the workflow, and this implementation now successfully rehabilitates over 60% of executions that used to take multiple days to fix manually.

So, what can you actually do with this in your own business or workflow? Let's get super practical here. You can automate data entry for forms, spreadsheets, and CRMs: anywhere you're manually typing information, the model can do it for you automatically. You can automate multi-step workflows across multiple websites or apps, where the model navigates through them, completes each step, and finishes the entire task from start to finish. You can build personal assistants that can actually do things, not just answer questions: book appointments, submit forms, and manage tasks in real applications. You can
automate UI testing for software development, where the model tests your interfaces, finds bugs, and reports issues faster than human testers ever could. You can automate research, where the model navigates websites, collects information, organizes it, and saves it in a structured format. The possibilities are genuinely huge here. And the best part is that it's free to start testing right now. You can access the Gemini API through Google AI Studio or through Vertex AI, and both have free tiers available. Google AI Studio is the easiest option, because it's a web-based interface where you can start building with the API right away without any complex setup.

Now, let's talk about the bigger picture of what this means for AI. This is a genuinely huge step forward for AI agents overall. For years, we've been talking about AI agents that can complete tasks autonomously and work like employees. But most agents have been severely limited in what they can actually do. They can answer questions, generate content, and write code, but they can't interact with the tools we use every day in our actual workflows. Computer use capabilities change that completely. Agents can do real work by using websites, apps, and software just like humans do. And this is just the beginning of what's possible. Right now, the model is optimized for web and mobile, but desktop OS-level control is coming next. So imagine an agent that can control your entire computer: open apps, manage files, and run programs completely autonomously. That's the future, and it's closer than most people think. Google isn't the only company working on this, either, because Anthropic released a computer use model earlier this year, and OpenAI is probably working on something similar behind the scenes. This is the next frontier for AI overall, with computer use, agentic workflows, and autonomous task completion becoming the new standard.
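The pattern Google's payments team described earlier, falling back to the model only when a conventional script fails, is worth sketching, because it's the lowest-risk way to start adopting this. Everything below is an illustrative stub: `scripted_step` and `model_recover` are made-up names, and `model_recover` stands in for a real Gemini computer use call.

```python
# Sketch of the "contingency" pattern: run the brittle scripted UI step
# first, and only hand control to a computer-use model when it breaks.

def scripted_step(page):
    """A selector-based test step that breaks when the UI changes."""
    if page.get("submit_button_id") != "submit-v1":
        raise LookupError("selector #submit-v1 not found")
    return "clicked #submit-v1"

def model_recover(page, goal):
    """Stand-in: the model reads the screen and finds the button anyway."""
    return f"clicked button labeled '{page['submit_label']}' to {goal}"

def run_step(page, goal):
    try:
        return scripted_step(page)
    except LookupError:
        # Fall back to the model only when the conventional script fails.
        return model_recover(page, goal)

old_ui = {"submit_button_id": "submit-v1", "submit_label": "Submit"}
new_ui = {"submit_button_id": "submit-v2", "submit_label": "Submit"}

print(run_step(old_ui, "submit the form"))  # scripted path succeeds
print(run_step(new_ui, "submit the form"))  # model rescues the broken script
```

This keeps the cheap, deterministic script on the happy path and only pays for a model call when something actually breaks, which is why it works well as a contingency mechanism rather than a full replacement.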
The companies that adopt this technology early will have a massive competitive advantage, because automation is no longer about coding scripts manually. It's about giving AI a task and letting it figure out how to complete it on its own.

Let me give you some real-world use cases, broken down by type. For businesses, you can automate customer onboarding, where the model navigates your CRM, fills out customer information, sets up accounts, and sends welcome emails, all automatically. You can automate data collection, where the model scrapes websites, collects competitor pricing, monitors reviews, and organizes everything into spreadsheets without manual work. You can automate reporting, where the model pulls data from multiple sources, generates reports, and sends them to stakeholders on a schedule. For agencies, you can automate client reporting, where the model accesses analytics platforms, pulls performance data, creates reports, and sends them to clients without you touching anything. You can automate outreach, where the model navigates LinkedIn, finds prospects, sends connection requests, and follows up based on your criteria. For individuals, you can automate job applications, where the model fills out forms, uploads résumés, and submits applications to multiple companies. You can automate research, where the model navigates websites, collects information, and summarizes findings into a clean document. You can automate scheduling, where the model accesses calendars, finds available times, and books appointments with the right people. The use cases are genuinely endless, and we're only scratching the surface. And the best part is you don't need to be a developer to use this, because the Gemini API is accessible and the documentation is clear enough that you can start building today.

Now, here's what you need to know about current limitations. First, the model is optimized for web and mobile, but desktop OS-level control isn't there yet, though it's probably coming soon.
Second, the model sometimes needs confirmation for high-stakes actions, so it's not fully autonomous for everything right now. Third, the model can make mistakes, especially on complex tasks, so you need to monitor it, test it, and make sure it's doing what you actually expect. Fourth, safety guardrails might block certain actions even if they're legitimate, so you might need to adjust your approach or provide confirmation. But these limitations are honestly minor compared to what the model can already do right now, and Google is actively improving it, with future versions that will be better, faster, and more capable.

Now, let's talk about competition in this space. Anthropic released a computer use model earlier this year, Claude's computer use capability, and it works similarly with screenshots, actions, and loops. But based on the benchmarks, Gemini 2.5 Computer Use is faster and more accurate overall. OpenAI hasn't released a computer use model yet, but they're almost certainly working on it behind the scenes. This is going to be a major feature for all AI companies moving forward, because it's the next logical step in AI evolution. We're going from chatbots to agents, from assistants to actual workers that can complete tasks. And the companies that nail computer use will dominate the AI market over the next few years. Right now, Google is leading with Gemini 2.5 Computer Use, but the race is just getting started, and things are going to move fast.

Now, here's what you should do next to take action. First, go test the model yourself by getting access to Google AI Studio and trying simple tasks
to see what it can actually do. Second, think about your own workflows and identify where you're doing repetitive tasks, where you're manually clicking and typing, because those are perfect opportunities for automation. Third, start building by using the Gemini API to build agents, automate tasks, and save yourself massive amounts of time.

And if you want help scaling your business with AI automation, check out the AI Profit Boardroom, where we have 1,000 members, and it's the best place to learn how to get more customers and save hundreds of hours with AI. Link in the comments and description. Also, if you want a free SEO strategy session, we're offering those right now. Link in the comments and description, and you can book a call where we'll show you how to get more traffic and leads for your business. And remember, the AI Money Lab has SOPs and processes for everything we're talking about, plus 100-plus use cases, 28,000-plus members, and free trainings every single day. Drop a comment below and tell me what tasks you're going to automate and what workflows you're going to build. Julian Goldie reads every comment, so make sure you comment. This is absolutely huge, and if you're not paying attention, you're going to miss it.