# They Have Better AI Than They’re Shipping! Gemini Math, Open Weights, 3D Asset Upgrades

## Metadata

- **Channel:** MattVidPro
- **YouTube:** https://www.youtube.com/watch?v=9LSS7hL2IEc
- **Date:** 17.01.2026
- **Duration:** 15:47
- **Views:** 15,079

## Description

In this episode, I dive into some amazing recent developments in the AI world. First, I discuss a novel theorem in algebraic geometry proved by Google DeepMind's internal Gemini model, which has impressed top mathematicians. Then, I explore Tencent's 3D Studio 1.2, an advanced 3D creation pipeline now available for public beta. We take a look at some stunning AI-powered 3D object details and manipulations. Next, I talk about exciting updates like Phil's Crystal Video Upscale, the most accurate transcription model by 11 Labs, and Google's new personal intelligence feature that connects with your Google apps for a personalized experience. Lastly, I touch on open-source AI models aimed at improving AI-based image generation and more. Tune in for the latest highlights and advancements in artificial intelligence!
▼ Link(s) From Today’s Video:
Novel Algebra Theorem: https://x.com/AISafetyMemes/status/2011838955894022537
Tencent HY Studio 1.2: https://x.com/TencentHunyuan/status/2012005104153678331
LTX-2 DeepZoom Lora: https://x.com/wildmindai/status/2011832875809997261
Pixverse R1: https://x.com/PixVerse_/status/2011100288690897317
Niji V7: https://x.com/midjourney/status/2009748519133827304
Crystal Video Upscaler: https://x.com/philz1337x/status/2012193228993569154
https://fal.ai/models/clarityai/crystal-video-upscaler
Elevenlabs Scribe V2: https://x.com/elevenlabsio/status/2009626517521797288
ChatGPT Translate: https://chatgpt.com/translate
Google Personal Intelligence: https://x.com/google/status/2011473056921706852?s=46
Black Forest Labs Flux 2: https://x.com/LlmStats/status/2011843251536322822
https://x.com/ComfyUI/status/2011830637062471830
GLM-Image: https://z.ai/blog/glm-image
https://huggingface.co/zai-org/GLM-Image
https://wavespeed.ai/models/z-ai/glm-image/text-to-image

► MattVidPro Discord: https://discord.gg/mattvidpro

► Follow Me on Twitter: https://twitter.com/MattVidPro

► Buy me a Coffee! https://buymeacoffee.com/mattvidpro
-------------------------------------------------

▼ Extra Links of Interest:

General AI Playlist: https://www.youtube.com/playlist?list=PLrfI66qWYbW3acrBQ4qltDBsjxaoGSl3I

AI I use to edit videos: https://www.descript.com/?lmref=nA4fDg

Instagram: instagram.com/mattvidpro

Tiktok: tiktok.com/@mattvidpro
Gaming & Extras Channel: https://www.youtube.com/@MattVidProGaming

Let's work together!
- For brand & sponsorship inquiries: https://tally.so/r/3xdz4E
- For all other business inquiries: mattvidpro@smoothmedia.co

Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, tutorials, and reviews! Enjoy your stay here, and subscribe!

All Suggestions, Thoughts And Comments Are Greatly Appreciated… Because I Actually Read Them.
00:00 Introduction and AI Safety Memes
01:36 Tencent's 3D Studio 1.2: A Major Upgrade
03:50 LTX Two and Deep Zoom Lora
04:56 PixVerse's Real-Time World Models
07:50 Midjourney's Anime Generation and Crystal Video Upscale
09:48 11 Labs' Transcription Model and ChatGPT Translation
10:26 Google's Personal Intelligence and AI Image Generation
14:48 Conclusion and Future of AI

## Contents

### [0:00](https://www.youtube.com/watch?v=9LSS7hL2IEc) Introduction and AI Safety Memes

Hey everybody, how's it going? Welcome back to the MattVidPro AI YouTube channel. Today I'm rounding up everything that has caught my eye recently regarding AI. Anyways, I want to open with this, which comes from @AISafetyMemes. A paper was released proving a novel theorem in algebraic geometry with an internal, math-specialized version of Gemini. So this isn't something that is released to the public. It is a research model, a collaboration between Google DeepMind and various professors. Co-author Professor Ravi Vakil, president of the American Mathematical Society, said that Gemini's proof was "rigorous, correct, and elegant. The kind of insight I would have been proud to produce myself." Oh boy. I oftentimes see comments from you guys about stuff like this. The truth of the matter is that the models distributed at large scale by closed companies like Google or OpenAI have precursor models like this internal math-specialized version of Gemini. The capabilities exist today, and they choose, sometimes and to some degree, what they really share with us. And there is the meme to wrap it all up. It's absurd stuff. And honestly, guys, at this point, this is an internal Gemini model, and these are real mathematicians, professors, saying, hey, this is legit. This is real progress being made in this field with the use of AI technology. Whether that is perceived as a positive thing is up to the eye of the beholder, but the progress is undeniable. Next up, Tencent

### [1:36](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=96s) Tencent's 3D Studio 1.2: A Major Upgrade

has released Hunyuan 3D Studio 1.2, a major upgrade to their 3D creation pipeline. You can generate assets with sculpt-level detail and fine-grained interactive control. The studio is officially open for public beta. The component partitioning and resolution are pretty high here for basic-level work, and for things like video game assets, I feel like the resolution is starting to get pretty good. Intuitive brush-based control for precise manual component editing, essentially (hopefully) allowing you to fix any mistakes the AI makes. Drastically improved geometry integrity for even the most intricate objects. We're generating these 3D objects from input views, and you can have up to eight of them. Take a look, guys. This stuff is typically super mind-blowing. Hunyuan Studio 1.5 detail preservation: this is PartGen 1.0, the old model, versus PartGen 1.5, the new model. You can definitely see some accuracy and resolution bumps. Complex geometry here. Ooh, I like that one with all the gears; that's not easy to do. The gears look pretty stable and pretty legit here, and the same goes for the piping. You can see PartGen 1.0 had a component missing that PartGen 1.5 was able to pick up. Good to see stuff like that. Okay, here is the brush interaction for touch-ups. I see: if I wanted to make the gears move, I can highlight a gear and then split the object, and it actually pulls away and comes off as a new region. Very, very useful if it can really do that automatically. Pretty darn cool. Yeah, definitely more texture, more detail here. These are some pretty good-looking 3D objects. I mean, there's some weirdness going on with some of the more fine-grained pieces, but it's really not all bad. The closer up you get, the bigger the difference is; it's actually pretty insane. It would be so cool to hook something like this up to an agent and see if you can generate assets for a video game or something.
Here's the starting picture versus the finalized asset. That's pretty awesome. So, here is the website if you want to try it. The waitlist is gone for this one, but it does look like this is all being translated from Chinese, and these aren't your typical login methods, at least for me here in the States. So, it looks like you're going to have to sign

### [3:50](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=230s) LTX Two and Deep Zoom Lora

up. My last video was an update surrounding LTX-2, which dropped fully open source. We're starting to see LoRAs pop up for LTX-2, which I'm very, very excited about. This is a DeepZoom LoRA. It creates this reveal effect: you can essentially upload any image, and then it will kind of zoom in and try to get this macro, high-detail effect going on with whatever you upload. It's pretty awesome. We've seen LoRAs like this before at this caliber of quality with other open-source AI video models, but this one does have sound. These all have music in the background, although I'm sure you could prompt that out and just ask for cool sound effects. — It's kind of cool. Just transitional, you know. Went for a different kind of song for the sneaker. PixVerse is working

### [4:56](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=296s) PixVerse's Real-Time World Models

on real-time world models. This is PixVerse R1: infinite, continuous, alive. Yes, a world model. It's not out for everybody yet; this is just sort of a preview showing off the capabilities. Let's take a look at their video. — Welcome to my world. — I don't know, that's creepy. — Generation is active. Input code to control the world. Model one: World of War. — Soldier lying on a snowy mountain holding binoculars. Crow patrol officers run desperately to escape. Fell down beside a tank. That's pretty cool. It's not first person, you know? It's like watching a movie you can control. A pair of binoculars; something about it is very cool to me. — Model two: undersea world, skull statue. — Whoa, they just kind of grow out. It's going to be interesting to see how we try to control this and use it for different applications. Diver turns into a skeleton. — Model three: busy room. — Oh, the book turned into a phone. — Wake up, dear. — Does it actually have audio? Yeah, and you can see it has a little bit of trouble; it's not perfect. — Real-time generation achieved. Returning to reality. — This is the future. — That's creepy. — Connection. Infinite possibilities. — Don't talk to me like that, very scary robot lady. I like to see that smaller organizations like PixVerse are investing in trying to build real-time world models. It's not an obvious investment for these AI companies, necessarily. Google seems to be the most ahead, but the more competition, the more people trying to get there with innovative new models, the better for everybody. I'm excited to try some decent baby-step world models this year, something I can prompt and control a little bit.

### [7:50](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=470s) Midjourney's Anime Generation and Crystal Video Upscale

Midjourney is still around, and they just released Niji V7. This model is fine-tuned for anime, and don't forget they also have a video model. So, this is just showing off some of those new anime generation capabilities with Niji V7. This is a little bit more specialized, which is why I'm talking about it. I haven't really kept up with or followed Midjourney since V7; they have fallen behind, especially for a lot of real-world workflows. They're more focused on aesthetics, I think, and on capturing different styles. I'm not saying they have no place as an AI image generator; I'm just not so sure they're still on the bleeding edge. However, I've got to say, a lot of these anime frames look pretty good. This seems to be a model that is fine-tuned on a specific style and does it very, very well. Are other image generators also very capable in this style? Absolutely. A new AI video upscaler is here from Phil (philz1337x). It's called Crystal Video Upscaler, and it's already been getting improvements since it launched just three days ago. Bugs have been fixed, and it can now upscale longer video durations: at 4K, a maximum of 43 seconds; at 1080p, 2 minutes and 50 seconds; and at 720p, 6 minutes and 40 seconds (interestingly, all three caps work out to roughly the same processing budget, on the order of 350 megapixel-seconds of footage). Apparently it's still pretty slow at upscaling, though, and Phil wants to make improvements. Honestly, this appears to be a very good video upscaler, possibly a real alternative to the Topaz AI video upscaling that kind of dominates the market right now. Topaz is the more expensive option compared to something like this, which you can run through an API for 10 cents per megapixel per second. I don't know; the detail here is pretty great. Humans are a pretty difficult benchmark. Obviously, you can tell there's a little bit of AI shimmering and such going on, but compared to everything else I've seen on the market, this is really impressive. 11 Labs recently

### [9:48](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=588s) 11 Labs' Transcription Model and ChatGPT Translation

dropped the most accurate transcription model ever; it even beats OpenAI's. They have a real-time model built for agents and low latency, and a batch model built for scale: subtitling and captioning. AI transcription models are already very good, but they're only getting better, picking up more subtle nuances and increasing accuracy; this one is over 95% accurate on the benchmarks. But if you care about translation, not transcription, ChatGPT now has a dedicated, and what appears to be free, translation page. A little bit of a competitor to Google Translate.
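For context on that "over 95% accurate" figure: transcription accuracy is conventionally reported via word error rate (WER), the word-level edit distance divided by the reference length, so 95%+ accuracy roughly corresponds to a WER under 0.05. Here is a minimal, stdlib-only sketch of the standard metric (an illustration of how such benchmarks are scored, not ElevenLabs' actual evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    between the word sequences, divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deleting all remaining ref words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # inserting all remaining hyp words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("the" -> "a") over six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Production evaluations normalize casing and punctuation first, but the core metric is exactly this.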

### [10:26](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=626s) Google's Personal Intelligence and AI Image Generation

Regardless, if you want complex AI-based translation: chatgpt.com/translate. Google is introducing something called Personal Intelligence. With your permission (and if you're wondering, yes, I actually have opted into this), Gemini can now securely connect information from Google apps like Gmail, Google Photos, and your YouTube history, so Gemini can be uniquely helpful and personalized to you. It's in beta through the Gemini app. It's looking through your data to get to know you a little bit better. Some people definitely aren't going to be cool with that, and I understand. I'm a little crazy and I just love testing AI products, so of course I'm like, sure, opt in. This is a very interesting use case that I might have something to say about: reasoning across sources. Say you need to buy parts for your car but don't have the info handy: "Recommend tires for my car." It references Gmail and Photos, can understand your car's make and model, and can get info like your license plate number to make your visit to the auto shop go a little more smoothly. I don't know, maybe. The thing with cars and tires is that they need to be pretty specific; even some of the books they have in tire shops might actually have incorrect information for your vehicle. Typically, it's best to look at the tires yourself. This is something it might get right, but I would definitely double-check. See, this is where I could see it working a little better: recommend some hidden gems or activities I might enjoy, based on my current location and what it knows about me from all the different sources. It could determine what you might be interested in and maybe get you some good results. Here you can see they're actually doing a side-by-side comparison of a standard chat versus Personal Intelligence, and you can see the model goes a little bit deeper, basing the answer on the person's interests.
You can see it is, of course, absolutely getting lampooned in the replies, but this is one of those things that I might end up actually liking more than I think. The product may be effective, but do you want to share your information? Black Forest Labs has released some new Flux models under the name Klein. You can get them through the API or run them locally: Klein 4B under Apache 2.0, Klein 9B as open weights. I like to see this hit the scene as open weights, and you can see that major platforms like ComfyUI are supporting it right out of the gate. If you are into local image gen, this is definitely a nice treat. AI image generation models have gotten so good. Models like Flux Klein are really designed to produce aesthetically pleasing imagery that is decently coherent and sharp. But for photorealistic text, infographics, and more complicated things like that, you're going to need an autoregressive model, something built for dense knowledge and high-fidelity image gen. This is GLM-Image, and it's actually released open source, which is pretty awesome. This is like a Nano Banana or GPT Image 2 type of model. You can see it definitely is high quality: it can do photorealistic stuff, it can do artistic things, it can do good text. I assume all of this text is correct, or else they wouldn't be presenting it on their page. I mean, look at all of this; they definitely went pretty hard. I have seen GPT Image 2 from OpenAI; it is not as good as the most recent Nano Banana models, but I don't know, this right here might actually throw GPT Image 2 for a little bit of a loop in terms of its ability to produce infographics or text. The project page has an absolute ton of benchmarks, if you're into that, as well as the techniques for how they actually built this model. I like that in the paper they describe it as industrial grade. But yeah, open source. How cool is this? You can download everything from Hugging Face; it's about 35 GB. Gemini, how much VRAM do you need to run this model?
Over 32 GB of VRAM just for the weights; even with CPU offloading, you still need 23 GB of GPU memory. Wow. Okay, so this one's still going to be pretty tough to run locally, although, it being open source, hopefully the community can figure out some fun things with it, break down that barrier, and bring it toward consumer-class GPUs. They also have an API for this one through WaveSpeed, though the model doesn't look too cheap to run: 12 cents per run is pretty expensive. And obviously it's not going to have the same quality as Nano Banana Pro, but being open source, that's a huge win.
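As a back-of-envelope sanity check on those numbers (my own heuristic, not anything from the GLM-Image docs): a ~35 GB checkpoint implies roughly 17.5B parameters if the weights are stored in bf16 at 2 bytes each (an assumption), and loading the full model on-GPU needs at least the checkpoint size plus some working headroom for activations and framework buffers.

```python
def estimate_vram_gb(checkpoint_gb: float, overhead: float = 0.2) -> float:
    """Rule-of-thumb VRAM estimate: full weights resident on the GPU,
    plus ~20% headroom for activations and framework buffers.
    Quantization or CPU offloading can bring this down substantially."""
    return checkpoint_gb * (1 + overhead)

BYTES_PER_PARAM_BF16 = 2
checkpoint_gb = 35  # approximate Hugging Face download size from the video

# Implied parameter count, assuming a bf16 checkpoint (2 bytes/param).
params_billion = checkpoint_gb / BYTES_PER_PARAM_BF16

print(f"~{params_billion:.1f}B params, "
      f"~{estimate_vram_gb(checkpoint_gb):.0f} GB VRAM to load in full")
```

That lands in the same ballpark as the "over 32 GB" figure quoted above. For the download itself, `huggingface_hub.snapshot_download("zai-org/GLM-Image")` is the usual way to pull a full model repo.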

### [14:48](https://www.youtube.com/watch?v=9LSS7hL2IEc&t=888s) Conclusion and Future of AI

Thanks so much for joining me for this AI news roundup, guys. Have the larger players in the AI space zipped their mouths shut for now? Yeah, their heads are all down; they're working on the next thing to bring you in the springtime. That leaves some room on the stage for smaller competitors that bring, honestly, a lot of great open-source models and developments. I'm excited, because early in 2026, local AI is looking like a beautiful, fun little hobby to get into, with lots of experimentation. If you have a pretty beefy GPU, LTX-2 is accessible and is now starting to see LoRAs; there's new image generation, and even this new 3D pipeline from Tencent can create real 3D objects from images that you can actually bring into a video game or something like that. If you want to stay the most up-to-date in the AI world, make sure you join my Discord server and check out the latest AI news leaks channel. Seriously, pretty much as soon as something drops, it's posted there. Thanks so much for watching. I'll see you guys in the next video, and goodbye.

---
*Source: https://ekstraktznaniy.ru/video/11379*