Gemini’s Agentic Update Is Here, OpenAI’s Prism & New Model Leaks!

Universe of AI · 29.01.2026
Video description
Google just rolled out Agentic Vision in Gemini, letting AI actively inspect images instead of guessing. OpenAI also introduced Prism, a new AI-native workspace for scientific writing, and we break down the latest model leaks: what's real, what's noise, and what might be coming next. Stay tuned for clear, no-hype AI updates. For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: /intheworldofai
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

ai news, artificial intelligence news, gemini agentic vision, gemini ai update, google gemini ai, agentic ai, ai vision models, openai prism, prism ai, ai for research, ai scientific writing, ai latex, model leaks, ai model leaks, gemini 3 update, gemini 3 flash, claude sonnet leaks, anthropic claude update, ai multimodal models, latest ai updates, ai industry news, llm news, universe of ai

#AINews #ArtificialIntelligence #GeminiAI #AgenticAI #OpenAI #PrismAI #ModelLeaks #ClaudeAI #AIUpdates #UniverseOfAI

0:00 - Intro
0:14 - Gemini Voice Cloning
0:57 - Gemini Agentic Vision
4:43 - OpenAI Prism
6:36 - New Model Leak Update
8:24 - Outro

Contents (5 segments)

  1. 0:00 Intro (52 words)
  2. 0:57 Gemini Agentic Vision (777 words)
  3. 4:43 OpenAI Prism (333 words)
  4. 6:36 New Model Leak Update (350 words)
  5. 8:24 Outro (69 words)
0:00

Intro

Google is rolling out new agent capabilities. OpenAI just dropped a new research-focused workspace. And later in the video, we're also clearing up some of the most recent model leaks. What's real? What's noise? And what might actually be coming next. So, let's get into it.
0:57

Gemini Agentic Vision

Google appears to be testing voice cloning inside Google AI Studio. According to TestingCatalog, from what's been spotted, users can record or upload their own voice and let Gemini generate audio using that voice. This is showing up directly in AI Studio, which suggests it's aimed at developers rather than a consumer release, at least initially. There's no official announcement or timeline yet, but this could be tied to the upcoming Gemini 3 Flash native audio upgrade. If it rolls out more broadly, it will let developers build applications with consistent, personalized voices directly on Gemini. For now, this looks like early testing, but it's another signal that Google is pushing deeper into native multimodal audio capabilities.

Google has introduced a new capability called Agentic Vision in Gemini 3 Flash, and it represents a meaningful shift in how AI systems interpret images. Most vision models today process images in a single static pass. They take one look at an image and generate an answer based on that snapshot. If a small or fine-grained detail is missed, such as text on a distant sign, a serial number, or a subtle structural feature, the model has no way to re-examine the image and must rely on inference. Agentic Vision changes this by treating visual understanding as an active, multi-step process rather than a one-time observation.

At a high level, Agentic Vision introduces a think, act, observe loop into image understanding. First, the model analyzes the user's query along with the image and forms a plan for how to answer it. Next, it can generate and execute Python code to manipulate or analyze the image. This could include actions like cropping specific regions, zooming into small details, annotating objects, counting elements, or performing calculations based on the visual inputs. Finally, the transformed images or computed outputs are added back into the model's context, which allows it to reason over updated visual evidence before producing a final response. By combining visual reasoning with code execution, the model produces a more grounded answer than the single-pass interpretation that current models rely on, which often leads to hallucinations or misreadings of the image. According to Google, enabling code execution with Gemini 3 Flash results in a consistent 5 to 10% quality improvement across most vision benchmarks.

Google shared several examples that illustrate how this works in real applications. In inspection and compliance tasks, Gemini 3 Flash can repeatedly zoom into high-resolution images to examine specific regions of interest. In one case, a building plan validation platform uses this capability to inspect structural details such as roof edges and building sections. By repeatedly cropping and analyzing specific areas, the system improved validation accuracy without changing the underlying model. Another example focuses on image annotation. Instead of only describing what it sees, the model can directly draw on the image to support its reasoning. For tasks like counting objects, the model can label and mark each detected element before generating an answer, which reduces the visual counting errors that are common in today's models.
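To make the "act" step a bit more concrete, here is a minimal, hypothetical Python sketch of the kind of image operations such a loop might generate and run: cropping and upscaling a region of interest, and marking each counted object before answering. The helper names, file paths, and coordinates are invented for illustration (using Pillow) and are not Google's actual tooling; they only mirror the pattern described above.

```python
# Hypothetical sketch of the "act" step in a think/act/observe loop.
# The model would emit code like this, execute it, and feed the resulting
# images back into its own context ("observe") before answering.
from PIL import Image, ImageDraw


def zoom_region(image_path, box, scale=4):
    """Crop a region of interest and upscale it so fine details
    (serial numbers, distant signage) are legible on a second look."""
    img = Image.open(image_path)
    crop = img.crop(box)  # box = (left, upper, right, lower) in pixels
    w, h = crop.size
    return crop.resize((w * scale, h * scale), Image.Resampling.LANCZOS)


def annotate_detections(image_path, boxes):
    """Draw a numbered box around each detected object so the final count
    is grounded in explicit marks rather than a single-pass guess."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for i, box in enumerate(boxes, start=1):
        draw.rectangle(box, outline="red", width=3)
        draw.text((box[0] + 4, box[1] + 4), str(i), fill="red")
    return img


# Example outputs that would be appended to the model's context:
zoom_region("building_plan.png", box=(1200, 300, 1600, 600)).save("roof_edge_zoom.png")
annotate_detections("shelf_photo.png", boxes=[(40, 60, 120, 200), (140, 60, 220, 200)]).save("counted.png")
```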
Agentic Vision also enables more reliable visual math and data analysis, which is huge. When presented with dense tables or charts embedded in images, the model can extract the underlying data and delegate calculations to a Python execution environment. This allows it to normalize values, perform multi-step calculations, and generate plots based on computed results rather than estimated ones. What makes Agentic Vision important is that it changes how models actually approach images. Instead of taking one look and moving on, the model can go back, zoom in, transform what it is seeing, and verify details step by step. That makes a real difference in situations where accuracy matters: inspections, technical diagrams, dense charts, or any task where you need to be confident that the model didn't miss something important. Google has also been pretty clear that this is just a starting point. Right now, some behaviors, like rotating images or doing more complex visual math, still need to be explicitly triggered, but over time those actions are expected to become more automatic. They're also planning to add more tools into this loop and expand Agentic Vision beyond Gemini 3 Flash to other model sizes, which suggests that this isn't a one-off feature, but a direction they're committing to for the long run.
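As a rough illustration of that last point, here is a small, hypothetical snippet of the kind of code the model could hand off to a Python execution environment once it has read the numbers out of a chart image. The figures below are invented; the point is that totals, normalized shares, and growth rates come from executed arithmetic rather than from the model estimating them off the pixels.

```python
# Hypothetical values extracted from a dense chart embedded in an image (in $M).
quarterly_revenue_musd = {"Q1": 4.2, "Q2": 5.1, "Q3": 4.8, "Q4": 6.3}

# Multi-step calculation done by executed code, not by eyeballing the chart.
total = sum(quarterly_revenue_musd.values())
share_of_year = {q: round(v / total * 100, 1) for q, v in quarterly_revenue_musd.items()}
q4_vs_q1_growth = round((quarterly_revenue_musd["Q4"] / quarterly_revenue_musd["Q1"] - 1) * 100, 1)

print(f"Total revenue: ${total:.1f}M")
print("Share of year (%):", share_of_year)   # normalized values
print(f"Q4 vs Q1 growth: {q4_vs_q1_growth}%")
```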
4:43

OpenAI Prism

Next up, OpenAI just introduced something called Prism, which is basically a new workspace built specifically for scientific and academic writing. If you've ever worked on a research paper, you know how messy the workflow can get. You're writing in LaTeX, managing references somewhere else, sending drafts back and forth, and then separately using AI tools to proofread or rewrite sections by copying and pasting everything around. Prism is trying to put all of that into one place. It's a cloud-based, LaTeX-native editor with GPT-5.2 built directly into the document itself. So you're drafting, compiling, and collaborating in the same workspace, with live previews and real-time edits from co-authors, and there's no local setup and no need to juggle tools.

What's interesting is how the AI works here. Instead of just looking at one paragraph at a time, it understands the entire paper: the structure, equations, figures, citations, all of it. So when you ask it to revise a section or clean something up, it can actually make changes that stay consistent across the document rather than breaking formatting or missing references. Prism also bakes in a lot of the stuff researchers usually do elsewhere: literature search, citation management, formatting fixes, equation handling, all inside the same interface. The idea is that you spend less time on cleanup and more time actually thinking about the work. Collaboration is another big piece. You can have unlimited collaborators editing at the same time, leaving comments and iterating quickly without dealing with version conflicts or merging drafts.

Zooming out, this kind of feels like OpenAI pushing deeper into real workflows and not just chat windows. Instead of saying use ChatGPT to help write papers, they're saying here's the place where papers actually get written, and you should use it. Prism is free to use and available now, so if you're doing any serious technical or academic writing, this is definitely something you want to try out.
6:36

New Model Leak Update

All right, a quick check-in on some of the recent model leaks, because a few things are getting mixed together. First, the so-called Snow Bunny model that I've shown on this channel: according to Chattisula, based on internal testing and what's been observed so far, Snow Bunny has a high chance of simply being Gemini 3 Pro going generally available, known as GA. So it's not a brand-new model or a major architectural jump. That lines up with performance tests too. In testing, it still struggles with very long outputs, things like generating thousands of lines of code, without a little extra prompting, which is what you would expect from a GA release, not a next-gen model. Looking ahead, if Google sticks to its usual release cadence, Gemini 3.5 would likely land sometime in the second half of April. I hope it's a little earlier, but following the six-month pattern they've used before, this kind of makes sense. So if you're expecting something substantially new, that's probably the window to watch, but not right now.

Now, on the Anthropic side, there's something more interesting brewing. There have been mentions of an unreleased Fenic model showing up in testing, which could point to an upcoming Claude model, potentially in the Sonnet tier. To be clear, this part is still unconfirmed and early, and there's not enough detail yet to say exactly what Fenic is or when it might ship. But it does suggest that Anthropic is actively testing new variants behind the scenes, which lines up with how they usually stage releases before making everything public. The big takeaway here is that not every leak means we're getting a launch right away, and not every codename signals a major model jump. A lot of this is internal testing noise that only makes sense once you zoom out and look at the release patterns. I'll keep track of these leaks as they come up, and if you want to stay up to date as all of this unfolds, make sure you're subscribed to the channel.
8:24

Outro

Make sure to subscribe to our channel. We do real tests, not just headlines. Make sure you're also subscribed to World of AI, and don't forget to check out our newsletter for deeper breakdowns you won't see on YouTube. I'm also growing my Twitter following, so make sure you follow me on Twitter as well. Hope you guys enjoyed today's video, and I'll see you in the next one.
