Gemini 3.5 Testing: 3000 Lines of Code in One Prompt!


Universe of AI · 22.01.2026 · 13,371 views · 306 likes · updated 18.02.2026


Video description

Google's Gemini 3.5 Early Checkpoint (codename Snowbunny) testing results:
- 3000+ lines of code
- full Game Boy emulator
- 88% on Heiroglyph lateral reasoning benchmark
- Music generation, SVG graphics, and more
- Currently in A/B testing on AI Studio

For hands-on demos, tools, workflows, and dev-focused content, check out World of AI, our channel dedicated to building with these models: @intheworldofai

🔗 My Links:
📩 Sponsor a Video or Feature Your Product: intheuniverseofaiz@gmail.com
🔥 Become a Patron (Private Discord): /worldofai
🧠 Follow me on Twitter: https://x.com/UniverseofAIz
🌐 Website: https://www.worldzofai.com
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/

#Gemini35 #googlegemini #ai #machinelearning #aicoding #Snowbunny #aimodels #technews

Gemini 3.5, Gemini 3.5 Snowbunny, Gemini 3.5 testing, Google Gemini, Gemini AI, AI coding, AI code generation, Gemini 3.5 benchmark, Gemini 3.5 vs GPT, Gemini 3.5 leaked, Google AI, AI model testing, lateral reasoning AI, Heiroglyph benchmark, AI Studio, Gemini 3.5 early checkpoint, Gemini Snowbunny leaked, AI generates game boy emulator, 3000 lines of code AI, Gemini 3.5 code generation, GPT-5 vs Gemini, Claude vs Gemini, best AI for coding, AI benchmark results, artificial intelligence, machine learning, AI news, tech news, coding AI

0:00 - Intro
0:32 - Early Benchmark Results
2:05 - Test 1: Frontend Coding
3:21 - Test 2: Generating Music!
4:00 - The Music
4:40 - Leaked Gemini Flash Model
5:31 - Test 3: SVG Robots
7:40 - Test 4: Nintendo Gameboy Emulator
8:42 - Outro

Contents (9 segments)

Intro

It looks like it's Snowbunny season, and I'm not joking. Leaks suggest that a new Gemini model, codenamed Snowbunny, is being tested internally, and the performance numbers are kind of insane. We're talking about something called either Gemini 3.5 or possibly Gemini 3.0 Pro GA. There's actually some confusion about what to call it, and honestly, that confusion is part of the story here. So, I've gathered some leaks today that I'm going to show you guys, and we can see why these leaks matter and what they mean for the potential upcoming Gemini model. So, let's get into it. Let's

Early Benchmark Results

start with the hard numbers because this is where things get interesting. Leo, who runs the Heiroglyph benchmark, posted a result showing two versions of this model, both called Snowbunny. One is called Raw and one is called Less Raw. I didn't name them that, just saying. And they're scoring about 16 out of 20 on lateral reasoning tasks. That's a total score of 80%. Now, if you're not familiar with the Heiroglyph benchmark, it's a benchmark specifically designed to test lateral reasoning capabilities. We're not talking about simple question answering or even complex math. This is about making connections that aren't obvious and thinking around problems, the kind of reasoning that usually trips up AI models. And if you look at the competition here, Lithium Flow, which is another leaked model, is at 10 out of 20. The various GPT-5 models are sitting around 11 out of 20, and the Gemini 3 Pro preview is at 9 out of 20. And then there's Claude 4 Opus at 1 out of 20. Now, that Claude score isn't necessarily a knock on Claude. Different models are optimized for different things, but it shows you how specialized this kind of reasoning is. What really stands out is that both Snowbunny versions hit the same score. The Raw version and the Less Raw version both hit 16 out of 20. That suggests that this isn't a fluke or a cherry-picked result. The model is consistently performing at this level. Leo mentions he's going to need to work on a version two of the benchmark, which tells you something. When the person who made the test says they need to make it harder because a model is outperforming it, that's a significant improvement. The
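The score-to-percentage arithmetic in this segment can be checked with a short sketch. The numbers are the ones read out in the video, not official benchmark data, and the `scores` dictionary and `to_percent` helper are illustrative names only.

```python
# Scores out of 20 as quoted in the video (assumed, not official data).
TOTAL_TASKS = 20

scores = {
    "Snowbunny (Raw)": 16,
    "Snowbunny (Less Raw)": 16,
    "GPT-5 variants (approx.)": 11,
    "Lithium Flow": 10,
    "Gemini 3 Pro preview": 9,
    "Claude 4 Opus": 1,
}


def to_percent(correct: int, total: int = TOTAL_TASKS) -> float:
    """Convert a count of solved tasks into a percentage."""
    return 100.0 * correct / total


percentages = {name: to_percent(score) for name, score in scores.items()}

# Print the models from highest to lowest score.
for name, pct in sorted(percentages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26} {pct:5.1f}%")
```

With only 20 tasks, each task is worth 5 percentage points, so small differences between models should be read with caution.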

Test 1: Frontend Coding

first tweet I want to show you is from Chedula, who's a moderator in the AI community and is well known for testing these leaks. They're testing this model in AI Studio and posted a demo of a website it built. So, let me show you the website. If we take a look at it, it looks pretty good. We see that it's something called Ops Forge, and it looks like it's supposed to be a SOC 2 compliant business application. I wonder what the prompt must have been, but if you look at it, the front end is pretty good. The tweet also mentions that they're planning to release a bunch of other demos that they have tested, covering everything from front end to back end and even 3D to music. So the range is notable. The description that caught my attention is that this model feels like something big, like Deep Think, but fast like Flash. That's an interesting comparison. Deep Think models, or what people call reasoning models, take their time to work through problems. They're thorough but slow. Flash models are the opposite: quick responses, less depth. If this model is actually managing both, that's a different beast entirely. Testing is apparently happening in AI Studio for at least a week before they post various types of tests. So, we're seeing early results from internal testing, not the polished demo that Google will put out in a blog post.

Test 2: Generating Music!

The second tweet is also from Legit API, and it's about music generation. The claim is that this model even outperforms Lithium Flow in music. Lithium Flow has a pretty solid reputation for audio generation, so that's not a small claim. The tweet shows what looks like a waveform visualization of the music, so you can see these layers of red and pink creating this almost organic-looking audio pattern. What this tells us is that this model isn't just good at one thing. It's not a coding specialist or a reasoning specialist. We're seeing claims of strong performance across completely different domains: code, reasoning benchmarks, and now music generation. So, why don't you guys take a listen?

The Music

The question in this tweet is whether this is Gemini 3 Pro GA or Gemini 3.5 Pro. That uncertainty is showing up in multiple places, which is kind of weird for a Google release. Usually, by the time people are testing something, the naming is clearer, but right now nobody knows what this new model

Leaked Gemini Flash Model

could be. Now, to make matters even more confusing, last night there was a new notification from LMArena saying that there's an updated Gemini 3 Flash model available. It was called something like Gemini 3 Flash 202620, which was yesterday's date. So everyone got excited. People were looking for this model. Even I was trying to find this model on LMArena, where it looks like it could have been dropped. But the funny thing is, after a couple of minutes, it was gone. The model was removed. So, this new model could be an updated Gemini 3 Flash model that we might be seeing. It might not be just a 3.5 Pro. It could be an updated Flash model, which is kind of interesting because the Flash model, I mean, it's good. It's amazing, but it would be surprising for Google to update it over the other model, the one that powers Flash. Anyways, the third

Test 3: SVG Robots

example is SVG generation. Legit API posted these cyberpunk robot designs, 10 different variations, all with this consistent aesthetic: dark backgrounds, neon accents in cyan, purple, and pink. Each robot has a distinct design, but they all kind of feel like they belong to the same set. The tweet says that these were made by a new Gemini Pro model currently in A/B testing in AI Studio. Again, that A/B testing detail keeps coming up, so this is clearly a limited rollout to specific users. Now, SVG generation is interesting because it requires both creativity and technical precision. You're writing code that renders as graphics, so the geometry has to be correct, the styling has to work, and ideally, it should look good. These robots show pretty clean design work. The proportions are consistent, the color schemes work together, and there's actual variation between them. To see how good this new model could be, I want to generate five cyberpunk robot SVGs side by side. I'm going to use two models, Claude Opus 4.5 and GPT-5.2 High, and compare what we get to the leaked Gemini model. Looks like our generation is done. Now, I'm not going to dunk on the generation too much because obviously I don't have the specific prompt that Legit API used. The first example is from Claude Opus 4.5. I mean, these cyberpunk units kind of look good, but they're not as detailed as what we saw in the leaked Gemini version. I like the naming, like Centennial X7, Hunter V9, which is kind of cool. Good touch. A for effort on that. It's not as good as the other one, but not bad. Then we have the GPT-5.2 High one, which, not going to lie, looks the worst out of the three in my opinion. Bonus points for creating a website where I can add random neon accents, which doesn't really do much. Once again, nice names and all of that, but compared to what we saw with the leaked Gemini model, which looks more like actual robots and has limbs that are longer than these little droid bots here, I'm going to have to say that the leaked model wins in this case. This is what I mean when I
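To make the point about SVG being "code that renders as graphics" concrete, here is a minimal hand-written sketch in Python. It is not the prompt or output from any of the models discussed; the palette values, the `robot_svg` function, and the stick-figure geometry are all assumptions chosen for illustration. It builds one small robot per neon accent on a dark background and checks that the markup is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical palette: neon accents on a dark background, as described above.
NEON = {"cyan": "#00e5ff", "purple": "#b266ff", "pink": "#ff4fd8"}
BACKGROUND = "#0b0b14"


def robot_svg(accent: str = "cyan", size: int = 200) -> str:
    """Return a minimal robot drawing as an SVG document string."""
    color = NEON[accent]
    u = size // 20  # grid unit, so every coordinate scales with the canvas
    stroke = f'fill="none" stroke="{color}" stroke-width="3"'
    shapes = [
        f'<rect width="{size}" height="{size}" fill="{BACKGROUND}"/>',
        f'<rect x="{7*u}" y="{3*u}" width="{6*u}" height="{4*u}" rx="{u}" {stroke}/>',  # head
        f'<circle cx="{9*u}" cy="{5*u}" r="{u//2}" fill="{color}"/>',  # left eye
        f'<circle cx="{11*u}" cy="{5*u}" r="{u//2}" fill="{color}"/>',  # right eye
        f'<rect x="{6*u}" y="{8*u}" width="{8*u}" height="{7*u}" {stroke}/>',  # torso
        f'<line x1="{6*u}" y1="{9*u}" x2="{3*u}" y2="{14*u}" {stroke}/>',  # left arm
        f'<line x1="{14*u}" y1="{9*u}" x2="{17*u}" y2="{14*u}" {stroke}/>',  # right arm
        f'<line x1="{8*u}" y1="{15*u}" x2="{8*u}" y2="{19*u}" {stroke}/>',  # left leg
        f'<line x1="{12*u}" y1="{15*u}" x2="{12*u}" y2="{19*u}" {stroke}/>',  # right leg
    ]
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}" viewBox="0 0 {size} {size}">'
    )
    return "\n".join([header, *shapes, "</svg>"])


def is_well_formed(svg: str) -> bool:
    """Check only that the markup parses as XML; this says nothing about looks."""
    try:
        ET.fromstring(svg)
    except ET.ParseError:
        return False
    return True


# One robot per accent color, sharing the same geometry and background.
robots = {accent: robot_svg(accent) for accent in NEON}
```

Saving any of the strings in `robots` to a `.svg` file and opening it in a browser shows the result. A well-formedness check like this catches broken markup, but judging proportions and style still takes a human eye, which is what the comparison in the video is doing.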

Test 4: Nintendo Gameboy Emulator

talk about capability range. We went from compliance dashboards to music generation to SVG graphics to a 3,000-line Game Boy emulator. That's not an incremental improvement over existing models. That's a different category of capability. The question everyone's asking is when this becomes publicly available and what it costs, because a model that can do this but costs $100 per request is not going to be useful for most people. But if Google prices it competitively and it maintains this level of performance or better, this changes what's possible with AI-assisted development. We still don't have official confirmation from Google. We don't know the final name, the release date, the pricing, the rate limits, or any of the practical details that matter for actually using this. I'll keep watching for updates. When Google makes an official announcement or when more testing results come out, I'll cover it for sure. If you're in the AI Studio program and you get access to Snowbunny, definitely experiment with it and share what you find. That's it for

Outro

today. Make sure to subscribe to our channel. We do real tests, not just headlines. Make sure you're also subscribed to The World of AI. And don't forget to check out our newsletter for deeper breakdowns you won't see on YouTube. And I'm growing my Twitter following, so make sure you follow me on Twitter as well. Hope you guys enjoyed today's video and I'll see you in the next
