Chinese Website Builder, LLM Secrets & More AI Use Cases
19:32


The AI Advantage · 04.04.2025 · 18,808 views · 719 likes · updated 18.02.2026
Video Description
Anthropic published a study this week claiming they've discovered a way to understand LLM thought processes better than ever before. Plus, Runway Gen-4 launched, OpenAI published new benchmarks for AI agents, and more. Join the AI Advantage Community today! 👉 https://myaiadvantage.com/community?nab=0

Links:
https://huggingface.co/spaces/enzostvs/deepsite
https://www.anthropic.com/research/tracing-thoughts-language-model
https://www.aboutamazon.com/news/innovation-at-amazon/amazon-nova-website-sdk
https://openai.com/index/paperbench/
https://x.com/elevenlabsio/status/1905653402429723110
https://x.com/runwayml/status/1906718935778545964
https://myaiadvantage.com/community?nab=0
https://huggingface.co/spaces/Stable-X/Hi3DGen
https://huggingface.co/spaces/VAST-AI/TripoSG
https://www.nvidia.com/en-us/software/nvidia-app/g-assist/

Chapters:
0:00 What's New?
1:08 DeepSite
3:28 Anthropic Research Paper
5:27 Amazon Nova
7:02 OpenAI PaperBench
9:03 ChatGPT Updates
11:03 ElevenLabs Actor Mode
13:20 Runway GEN-4
15:18 AIA Community
17:45 TripoSG & Hi3DGen
18:50 Project G-Assist

#ai

🔑 Get My Free ChatGPT Templates: https://myaiadvantage.com/newsletter
🌟 Receive Tailored AI Prompts + Workflows: https://v82nacfupwr.typeform.com/to/cINgYlm0
👑 Explore Curated AI Tool Rankings: https://community.myaiadvantage.com/c/ai-app-ranking/
💼 LinkedIn: https://www.linkedin.com/company/the-ai-advantage
🐦 Twitter: https://x.com/IgorPogany
📸 Instagram: https://www.instagram.com/ai.advantage/

Premium Options:
🎓 Join the AI Advantage Courses + Community: https://myaiadvantage.com/community
🛒 Discover Work Focused Presets in the Shop: https://shop.myaiadvantage.com/

0:00

What’s New?

It has been another week in generative AI, and as per usual, there have been a bunch of releases. This week we'll be looking at a use case for a new tool, built on the Chinese open-source model that came out last week, that lets you build websites with just one sentence. No strings attached, no installations, nothing. It's as simple as it could get and freely available. So that should be interesting. There's brand-new research out of Anthropic that is finally starting to understand how these generative AI models actually work instead of just calling them a black box. And big releases like Runway's Gen-4 foundational video model. As you can see, I'm not exactly in my typical studio setup, because I'm in San Diego, California right now. It has been an absolutely insane week for me personally: I came here to speak at the Social Media Marketing World conference, and I held a lecture on a community webinar for a company, all while staying on top of all this madness and going through the hiring process with some new team members. By the way, if you've never been, San Diego is absolutely magnificent. About 10 years ago, I studied at San Francisco State University, and I was lucky enough to experience most of California, but not this little spot down here. Absolutely stunning. And I'm not just talking about the weather. All right, enough talk. Let's get into this week's AI news that you can use.
1:08

DeepSite

Starting out with this service called DeepSite, which is essentially built on DeepSeek V3, a model that was released under an open-source license last week by the Chinese company behind DeepSeek R1. This is their non-thinking model, V3, and apparently it's really good at building sites. Now, I also have to shout out Gemini 2.5 Pro at this point, which has really established itself within the space amongst builders, vibe coders, and developers. I always kind of brush over these new model releases because the benchmarks only tell part of the story; real usage is what matters most here. And after one week, it turns out there's pretty much a consensus that Gemini 2.5 Pro is the current king for development use cases. But this thing is open source. So you could even do things like building a website from one prompt locally on your machine if you wanted to. So let's give this a shot. I'll just say, "Build me a 3D racing game." And then V3 builds it. And in this web viewer, we can see the result right away. A really fun little way for you to experiment with the capabilities of something like DeepSeek yourself. And I wanted to start with this because it gives people who have never attempted vibe coding a completely free and intuitive way to do it. This is your chance. You don't need anything. Just go to the link in the description and give it a shot. If you like it, I would recommend you go down the rabbit hole and start using models like Gemini 2.5 Pro inside of Cursor or Windsurf. There are a few more hoops to jump through than with this simple interface, where it's literally prompt-to-website, but then you also get more options, and you still don't have to know how to code to do things like this. Yeah. And there we go. Our 3D racing game. Let's just start the race. Let's see how this goes. Game over. One more. Oh, and we're gaming. Okay, game over at 40. So, as you can see, this is pretty much working out of the box. Now, I could follow up and improve this, but rather than doing that, I'll go over to the DeepSite gallery, which you can also check out from another link in the description. It pretty much shows the various sites that people have built with this. Very inspiring: various landing pages, games, calendar apps, and even this photo booth site. You can just click them, try all of these out, and gain inspiration for what you too could build from a simple prompt. Look at that. How about a YouTube competitor? No problemo. Again, we've seen this before, but it has never been made this accessible to people who just want to give it a shot for free. Okay, on to the next one.
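If you'd rather script this than use the Space, here's a minimal sketch of the same one-prompt-to-website idea against DeepSeek's OpenAI-compatible API. The endpoint and model name follow DeepSeek's public docs; the DEEPSEEK_API_KEY environment variable and the system prompt are my own assumptions, so treat this as a starting point, not as how DeepSite itself is built.

```python
# Minimal sketch: one prompt -> one self-contained HTML page via DeepSeek V3.
# Endpoint/model name per DeepSeek's public docs; verify before relying on them.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],   # assumed env var
    base_url="https://api.deepseek.com",      # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # DeepSeek V3, the non-thinking model
    messages=[
        {"role": "system", "content": "Return a single self-contained HTML file. No explanations."},
        {"role": "user", "content": "Build me a 3D racing game."},
    ],
)

# Save the generated page and open it in a browser to try it out.
with open("site.html", "w", encoding="utf-8") as f:
    f.write(response.choices[0].message.content)
```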
3:28

Anthropic Research Paper

Next up, we have an Anthropic paper, which is maybe not something you can directly use, but this is one of the most revolutionary things in the generative AI space as of recently, so I just had to briefly talk about it. They're basically starting to decode what goes on inside the black box of an LLM. If you're not familiar: so far, even the most sophisticated people building these models did not fully understand the end-to-end process of how they generate this level of quality and show these levels of intelligence. Now, this new Anthropic research paper has, for the very first time, managed to pull the curtain back a little bit, and they're starting to understand the thinking process that goes on. They created a little three-minute video that I recommend you watch if this piqued your curiosity, in which they show how the AI writes a poem and how they can infer how it's actually thinking depending on one word being switched out. The paper contains a bunch more examples across various domains, but I think the writing ones are the most reliable. And the bottom line of this whole thing, if you don't care about the technicalities, is that with knowledge like this, they'll be able to make things way more precise, particularly in the hallucination department. This is something we had no remedies for up until now, except including better sources and prompting more precisely. But with understanding like this, LLMs might be able to get to a point where they can discern between information that is factual and information that is hallucinated, simply rerunning that prompt and giving you factual information. Now, this is a hard problem. I'm not saying it has been solved right now. I'm just saying these are the first steps towards something like that. It's crazy, because that disproves even the biggest AI haters who say, "Hey, you will never be able to fully rely on these models. There's always going to be an error rate." Sure, but all it takes is one piece of research that goes a bit further than this and, voila, it might be solved. Okay, maybe that statement is a bit hopeful and naive. I'm just saying this is a real step in that direction, and honestly the first glimmer of hope showing that this problem might actually be solvable and not just absolutely impossible, as has been assumed up until now.
5:27

Amazon Nova

All right, next up, just a super quick story. This is nothing state-of-the-art, but Amazon did release a brand-new platform with a bunch of new LLMs that people can use. So far, these LLMs have only been accessible through their service Amazon Bedrock. Even more interestingly, they're releasing a new thing called Amazon Nova Act, which is essentially an agent that can use a browser. And right here on screen, you can see an overview of all these models. Now, one thing I do want to note is that these support over 200 languages. Usually, LLMs support around 100. At least that's what they say in the release; I believe this is the first time I've seen over 200 languages, but feel free to correct me if I'm wrong. The bottom ones, Reel and Canvas, only support English because they're visual models, Canvas for images and Reel for video generation. All of them are fine-tunable, and Amazon Nova Lite and Nova Pro have a context window of 300,000 tokens. So, some really solid models. Again, as I mentioned, none of these are cutting-edge releases, but with this release, which as of now is only accessible in the US and is probably only interesting to developers who want to build with it, we see Amazon entering the agentic browser-use AI space. That space has become increasingly crowded, but not a single one of these tools is at a level that should make you, as a consumer, feel obliged to use them yet. They're just a little stupid and very, very unreliable. I can tell you, I got excited when Operator came out. I was using it daily and trying to incorporate it into all parts of my life, until the incident with the 26 bananas or whatever it was occurred, and it just randomly spent $60 or $70 and ordered the bananas to the wrong address. This category of agentic tools that work in a browser is just not at a level yet where it's ready for the consumer. I wish it were, but it's just not. Don't worry though, this show will keep you updated whenever that changes.
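For the developers mentioned above, here's a rough sketch of what calling a Nova model through Bedrock looks like with boto3's Converse API. The request shape and model ID follow AWS's documented pattern; the region and your account's model access are assumptions you'd need to verify.

```python
# Sketch of a Nova call via Amazon Bedrock's Converse API with boto3.
# Region and model availability depend on your AWS account — verify first.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="amazon.nova-lite-v1:0",  # Nova Lite; Nova Pro is amazon.nova-pro-v1:0
    messages=[
        {"role": "user", "content": [{"text": "Summarize this week's AI news in one sentence."}]}
    ],
    inferenceConfig={"maxTokens": 256},
)

# The Converse API returns the assistant message under output.message.content.
print(response["output"]["message"]["content"][0]["text"])
```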
7:02

OpenAI PaperBench

Next up, OpenAI's PaperBench. This one in particular caught my interest because it tests different LLMs on something super unique: it lets LLMs attempt the recreation of state-of-the-art AI research. The way it does that is it employs an LLM judge to score the results on how well the model did in replicating some of the research findings from these papers. Of course, there are a lot more details that go into this, but essentially that is it. So, when new models come out, once they run PaperBench on top of them, the model will try to replicate what the state-of-the-art papers in the AI space have found. They applied this to a few models that you already know. In particular, they tested GPT-4o, o3-mini, DeepSeek R1, Claude 3.5 Sonnet (the new version), and Gemini 2.0 Flash. Now, obviously, there are more models that should be there now, like Claude 3.7 Sonnet, but there were limitations in the API that didn't allow them to run it fully. And I would also love to see the new Gemini 2.5 Pro here. But for now, this will do. Amongst these models, would you have guessed that Claude 3.5 Sonnet performed by far the best? I mean, look at these numbers. DeepSeek R1, which is a thinking model, performs almost on the level of GPT-4o here. And again, Claude 3.5 Sonnet just crushes it. So, this is really interesting to me, because when people use these models, the results are usually vastly different from what the benchmarks say. Claude 3.5 Sonnet didn't crush all the benchmarks, but it quickly became the go-to model for development and even writing. People just really, really loved the model: the way it behaved, the way it talked, the way it thought about things. And I love seeing benchmarks pop up that cover this ethereal quality, these "vibes," as people call them. It got me thinking: maybe I could create a benchmark myself for some of these vibe-coding tools, on use cases that I really prefer, marketing- and business-efficiency-focused ones, with an LLM judge at the end. Sort of an automated testing suite based on various use cases that we teach at the AI Advantage. I don't know, just a thought that I added to my to-do list. But moving forward, I'll at least be looking forward to more alternative benchmarks like PaperBench.
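To make the LLM-judge idea concrete, here's a toy sketch of that pattern. This is not OpenAI's actual PaperBench grading harness; the rubric, judge model choice, and inputs are all illustrative.

```python
# Toy illustration of the LLM-as-judge pattern — not OpenAI's grading harness.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Score the candidate's reproduction attempt from 0 to 10:
1. Did it reimplement the paper's core method?
2. Do the reported numbers roughly match the paper's?
Reply with just the integer score."""

def judge(paper_summary: str, candidate_output: str) -> int:
    """Ask a judge model to grade one reproduction attempt against the rubric."""
    response = client.chat.completions.create(
        model="gpt-4o",  # judge model; illustrative choice
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Paper: {paper_summary}\n\nAttempt: {candidate_output}"},
        ],
    )
    # Toy parsing: the rubric asks for a bare integer reply.
    return int(response.choices[0].message.content.strip())

score = judge(
    "Paper on sparse autoencoders for interpretability.",
    "Reimplemented the SAE training loop; loss curves match within 5%.",
)
print(score)
```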
9:03

ChatGPT Updates

Okay, next up, I want to quickly tell you about some ChatGPT updates that came out over the last weeks. These were rather small but significant. The first one was this popup that I received in my interface, as many other people have too: one, image creation just got better, and two, it's now rolled out to everybody, including all the free users. So it took them over a week to actually get this out to everybody; now that has been done, and they've already improved it. The model thinks a little longer to give more accurate outputs. Practically speaking, I tested it a bunch of times, and I'm not sure it's slower now; I don't really think so. It was already relatively slow compared to some of the competitors that just spit out an image within 2-3 seconds. This usually takes 10, 20, sometimes 30 seconds. But you should be able to expect more accuracy now. And Sam Altman tweeted that ChatGPT on the web has gotten way faster across the board. As somebody who uses these tools daily and ran a few test prompts, I cannot really say that I feel this yet. Maybe it's in some specific categories. GPT-4o seems pretty much the same as before, and the thinking models, I don't know, maybe there's a boost of 20-30%, but I haven't been able to feel it myself. What I have been able to feel is that, now that we have image generation inside of 4o, it has become my go-to model for so many things, because this modality of just generating visuals from what you're already working on is very powerful. So yeah, image generation is more precise, and it's supposed to be faster. Maybe it also depends on the time of day; I don't know, can't really confirm myself. And one more fantastic piece of news for all students: ChatGPT Plus, the paid version that costs $20 a month, is free for college students in the US and Canada through May. So most likely you will require a college email from the US or Canada to get this for free. And this meme in the replies on Twitter is just too accurate; it's also how I felt right away: "How do you do, fellow kids?" Everybody will want to be a college student now, I suppose. If you're enjoying this content, don't forget to leave a quick like. It really helps out the channel, and I always underestimate how much it helps. So, go hit that like button if you like this, and let's move on to the next story.
11:03

ElevenLabs Actor Mode

All right, next up we have a new release from ElevenLabs. This one is called Actor Mode. What it essentially does is allow you to create an audio file with ElevenLabs as per usual, but then a person with a microphone, the actor, can map their intonation onto the produced voice recording. You could put emphasis on certain words: "to suffer the slings and arrows of outrageous fortune." You could change up the pacing: "That is the question." And then have that applied to the AI-generated voice. What I figured would be the perfect way to test it is to use it the way we actually use ElevenLabs all the time. Because, let me tell you, I've shared this before, but I'll say it again: we have a voice that is trained on me. If I misspeak something really important or get some piece of information wrong and we want to change it in the edit, well, me re-recording takes a lot of effort. So, our video editing team has access to my voice in ElevenLabs, and we just recreate that little word. Now, the problem is often that the word or number does not smoothly edit into the final video. This is the worst, because it just sounds different. I mean, just look at the way I speak; the last thing I said, I said in a manner like this. The standard voice generation is not going to slot in there smoothly. But now we have Actor Mode. So how about we give this a spin? Let me do two quick statements, and then I would kindly ask the video editing team to use this brand-new ElevenLabs Actor Mode: do one generation without Actor Mode to replace one word, and then right afterwards use Actor Mode to integrate it more smoothly. Okay, here we go. Phrase number one, my original recording: "Training this brand new model cost them $20 million." Let's try and replace the 20 with 30 and see how that goes. "Training this brand new model cost them $30 million." Example number two, let's try and change the word "annoying" to "impressive": "And that is the most annoying thing about AI." Let's see if we can completely manipulate the meaning of that sentence with this new Actor Mode. "And that is the most impressive thing about AI." There you go. As you can see, we didn't really manage to get this to work the way we were hoping. Maybe we did it wrong, but at the very least, this feature seems unintuitive and not working as well as we would hope. On to the next one.
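For context, the baseline word-replacement workflow described above can be scripted with the ElevenLabs Python SDK, roughly as sketched below. Actor Mode itself is a feature of the ElevenLabs interface rather than of this API call, and the voice ID and model choice here are placeholders.

```python
# Sketch of the plain regenerate-one-phrase workflow (the baseline the video
# compares Actor Mode against). Voice ID is a placeholder; model choice is
# illustrative — check the current ElevenLabs SDK docs.
from elevenlabs.client import ElevenLabs

client = ElevenLabs()  # assumes ELEVENLABS_API_KEY is set in the environment

audio = client.text_to_speech.convert(
    voice_id="YOUR_CLONED_VOICE_ID",    # the voice trained on the host
    model_id="eleven_multilingual_v2",  # illustrative model choice
    text="Training this brand new model cost them $30 million.",
)

# convert() streams audio chunks; write them out for the editing team.
with open("replacement_line.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```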
13:20

Runway GEN-4

All right, next up, we have a big release. Runway came out with a brand-new foundational model, the successor to Gen-3. They're calling it Gen-4, and I'm about to show you a bunch of comparisons to some of the other best models in the space. So, let's do it. Okay. Right away, even from the first example, I can already tell you the quality of this is super high. On this car example in particular, a lot of models struggle with the wheels: the wheels stay static, like on a parked car, even though it's driving on a highway. This model gets it right. Now, some of the other state-of-the-art models also get this right, so we do have to get a bit more detail-oriented here. I would say this drone shot looks authentic, but it's less polished than some of the examples I've even seen out of Gen-3. That's weird. It sort of looks like a lower-quality camera, which is more realistic, I guess. Interesting result there. What about this one? This one is also interesting, because water is hard to do, and human anatomy and movement are hard to do. Okay, so there are definitely some trippy movements in there. Here's a comparison to some of the other models. And the water, just like with almost any model, is not perfect. If you really look at these waves, and you've ever seen water break on a beach before, you'll realize that's not entirely it. This close-up is very good, though. Maybe some slight twitching in the eyes that is not supposed to be there. I don't know; I'm just really trying to be nitpicky here, but you can make up your own mind based on these examples. We also ran a few extra image-to-video tests, so we'll show both the images and the resulting videos here. This first one was really easy, and it handled it quite well, but the other ones are very tricky because they include human anatomy. And I have to say, I thought it was really good with everything besides the fingers; the hands are just all over the place. But the other models wouldn't even have gotten the bodies right. So, strong performance. And this one is obviously very hard to do in terms of realism when it comes to human anatomy. Eh, the face doesn't look real. I don't know; some of these image-to-video examples are not the best. Definitely better than Gen-3, but probably not state-of-the-art, if I just had to give my first opinion. We'll do more extensive testing and publish the results in our monthly video generator rankings. Okay, on to the next one.
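If you want to run image-to-video tests like these programmatically, Runway also exposes an API with a Python SDK. A hedged sketch follows: the Gen-4 model ID ("gen4_turbo") and the exact fields are assumptions based on the SDK's documented Gen-3 pattern, so check Runway's API docs before relying on this.

```python
# Hedged sketch of an image-to-video call with Runway's Python SDK ("runwayml").
# The Gen-4 model ID is assumed — the video only covers the in-app release.
import time
from runwayml import RunwayML

client = RunwayML()  # assumes RUNWAYML_API_SECRET is set in the environment

task = client.image_to_video.create(
    model="gen4_turbo",                          # assumed Gen-4 model ID
    prompt_image="https://example.com/car.jpg",  # placeholder input frame
    prompt_text="A car driving down a highway, wheels spinning realistically.",
)

# Generation is asynchronous: poll the task until it finishes.
while (task := client.tasks.retrieve(task.id)).status not in ("SUCCEEDED", "FAILED"):
    time.sleep(5)
print(task.status, getattr(task, "output", None))  # output holds the video URL(s)
```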
15:18

AIA Community

Okay, look, if you're watching this video, you know exactly how exhausting and overwhelming it can be to stay on top of AI and navigate all the different spaces across the internet that house the information. YouTube is fantastic, but it can be a sensory overload. On X, you get good info quickly, but it's mixed with a bunch of topics that you probably don't even want to see. And I love Reddit, but it often feels like the posts are driven by somebody trying to prove themselves to the world, sharing their knowledge just so they can prove to themselves how smart they are. Not always the case, but if you've been around, you know there's some truth to those statements. Now, I myself was also looking for an alternative. And after some conversations, I realized that what I really wanted is that old-school forum vibe. You probably have to be at least 25 or 26 to know what I'm talking about here, but back in the day, around 10 to 15 years ago, there used to be these traditional forums with genuine discussions. Even old Reddit was a completely different culture than it is today. Genuine discussions could flourish, and you actually knew the different users by name, because there weren't tens of thousands of them. If you saw a specific profile picture, you knew: oh wow, this person created a new post, I might really want to read that, because what they do is high quality. At the AI Advantage, you might know that we started our very own community. Now, this community is not for everyone, and it is paid, but I check it every single day, and I do get the feeling I used to get in the golden age of internet forums. The people in our community are genuine and helpful, and nobody's posting just to make themselves feel better. Everybody there is on this common journey of trying to master these tools and get the most out of them to improve their professional or personal life. And there's zero clickbait, because there's no point in it. Generally speaking, you only join the community if you have an open mind, can afford it, and you're curious about the different possibilities that AI tools absolutely do hide; not all the use cases are obvious. And if you pass those filters, you don't need to use inflammatory language to get people to click on a guide or a course. That's not how humans actually communicate; it's how humans have to communicate when one person is broadcasting to hundreds of thousands. But with a few dozen or a hundred, you don't need that. So, it just creates this unique environment that I myself and many other members cherish. I kind of just wanted to take a second to communicate that in a bit more of a human way, just ranting about it a little. But it really is a space where you can ask genuine questions and get proper answers, share your progress, and actually feel heard while acquiring and developing your skills in generative AI along with others on the same journey. So, if you enjoy this channel and you're looking for a place to connect with others who are interested in generative AI and its possibilities just as much as you are, then this is it. That's why we created the community. All right, that's all I have to say here. Now, let's continue with the video.
17:45

TripoSG & Hi3DGen

All right, next up, we have two new Hugging Face Spaces that popped up this week that let you generate 3D models from an image. Both of these are promising. It's a bit hard for me to say if they're actually state-of-the-art, and these releases are probably only relevant to you if you work with 3D models, so we'll keep this brief, but I do want to keep everyone informed. One of them only generates meshes, and the other generates full 3D models. We decided to skip the pre-loaded test images and uploaded our own image of a Labrador. And here are the results from model number one and model number two. Now, I think it's most interesting to view things like this in contrast to the current state-of-the-art solutions. So, here's a comparison between this new tool and what is currently considered state-of-the-art, which is Hunyuan3D-2. I threw in that Labrador, which I admittedly screenshotted from our test footage, so there are buttons up top; hope that won't disturb it too much. And here are the results side by side. From what I can tell, they look very similar, but, hm, maybe some of the lines on this new model are even cleaner. It's definitely close. And if you want to AI-generate 3D models, this new one is definitely one you need to consider.
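Since both tools are Hugging Face Spaces, you can also drive them from Python with gradio_client instead of the web UI. The sketch below is hypothetical in its endpoint name and arguments; inspect the view_api() output for the real signature.

```python
# Scripting one of the linked Spaces with gradio_client. The predict() call is
# a hypothetical shape — run view_api() first to see the real endpoints/args.
from gradio_client import Client, handle_file

client = Client("VAST-AI/TripoSG")  # Space from the description links
client.view_api()                   # prints the real endpoints and arguments

# Hypothetical image-to-3D call; replace api_name/args with what view_api() shows.
result = client.predict(handle_file("labrador.png"), api_name="/generate")
print(result)  # typically a path/URL to the generated mesh file
```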
18:50

Project G-Assist

The next one is a super quick one, and it's only relevant to you if you own high-end Nvidia hardware. It's a new piece of software called Nvidia's Project G-Assist. The way I understand it, this existed before, but now they've upgraded it with some serious AI features; it basically helps you optimize your settings for the hardware you have in your computer. So it's like having a tech nerd who knows everything about your setup making recommendations and adjustments to your system so you get the most out of it. If you're on Windows and you have an Nvidia graphics card, you might want to look into the new version of G-Assist. I just personally thought this looked really interesting, and I would want to know about it if I owned this hardware. And that's pretty much everything we have for this week. I hope you found something interesting. I'm going to go catch up on some sleep, and I will see you next week.
