about Sora 2 is, the moment it's out, that forever will be the worst that AI ever is at video generation. Likewise, Claude Sonnet 4.5, which is claimed to be the best coding model in the world, although they don't fully have the stats to back that up, is I guess the worst that coding is ever...this is with thinking enabled and it was 54%. Big step up from Claude 4 Sonnet, and it does feel in the ballpark of Claude 4.1 Opus when I'm doing coding. On one benchmark at least, SWE-bench Verified, it even beats Opus 4.1, and you might say, well, that's already a model from Anthropic
dominate Chinese leaderboards. They're in the top 1% on Sinua, Superbench for text generation, reasoning, and coding tasks. But on global benchmarks, GPT-5 still leads in coding and math. Claude is ahead on nuanced reasoning and writing. GLM-5 is being designed to compete head-to-head with these models. If they pull it off, you're looking...using ChatGPT or Claude to generate blog posts, social media content or video scripts that cost you per token. But if GLM-5 comes out as an open model with similar or better quality, you could run it locally or use it through cheap API providers. Or think about coding. If GLM-5 matches GPT-5 on coding tasks, you could
share what these AI coding assistants like Claude Code can already do. They're not just chatbots that write code for you anymore. Now, they can access your command line, install packages, manage your development environment, and even troubleshoot issues on their own. When they hit an error, they don't give up. They keep trying different approaches. They
jump or anything. It says, yeah, that's pretty poor. That is not a dinosaur, mate. Not a pixelated dinosaur and definitely not a dinosaur. Claude is still coding it out. So, wait for that. And we have the code back from Grok 3 as well. So, let's see what we get back here. So far, Gemini is winning...preview, but there's nothing inside the preview, which is strange. All right, so we're just going to copy this code, and we'll go to p5.js and just preview this again. Let's see how Claude performed. Pixelated Dino Runner. Let's test it out. Not bad, actually. That is a hard game to play, though. Look
well, quickly had, uh, DeepSeek having their reasoning models. Um, hybrid reasoning. I'm not sure what they meant by that, but you've got Think Mode on Claude coming out as well. - It's just like a visual explanation for the average Joe. - Oh, got you. Okay. Yeah, it's just split. I just remember thinking that...they kind of separated into very distinct and unique use cases, and I think Claude was the biggest one, with I think 80% - of all the coding usage is literally through Claude, which was, like, very impressive. For running these models, uh, yourself, what's the, say if you're working on GPT-OSS, where are you, like, setting
made for coding. The most popular code editor is called Visual Studio Code, which now has an AI-powered tool called Cursor. So Cursor can basically read your entire coding project and make contextual suggestions. And with its agent feature you can type in what you want and it's going to attempt to build that for you within Cursor...different AI models and workflows. Finally, it's important to point out that these general-purpose LLMs that you already use for brainstorming ideas, like ChatGPT or Claude, can also write working code. This means you always have a coding partner on standby, so you're not locked into any single tool. You can go back and forth between tools
going to run the command on the index HTML and we'll compare these side by side. So this is the output that we get on Roo Code with Claude. All right, and this is the output that we get with Gemini. Now, Gemini was a lot faster and easier to use, with fewer errors. Claude took a lot longer...number four actually gives us a better quality output, especially on responsive mode. So, I mean, that shows you how powerful Roo Code is. Pretty awesome to see, pretty powerful. And honestly, why would you use Claude at this point? Like, Gemini does it like magic, and it does it faster, easier, cheaper, and better
browser. On your desktop. It can see your screen, control your apps, write code, build websites, do your work while you do something else. And it's beating Claude Code and Codex on every benchmark. You can download it right now. Let me show you how. Deep Agent Desktop is made by Abacus AI. They call it a god-tier...Second is the code editor. This is where you build apps and websites. It handles complex coding tasks, writes features, ships code fast. Third is the chat mode, like ChatGPT but way more powerful. It talks to Claude, Gemini, even GPT-5, all in one place. Uses the best model for each task. Plus, it has desktop features
Recently, with the launch of Claude 4 Opus and Claude 4 Sonnet, there has been a rise of confusion as to which coding model is the best. Both of these models are exceptional at elite coding and structured reasoning. On the SWE-bench Verified test, you can see both of these models lead in this category versus many of these other...Claude Opus the same prompt, it is likely going to perform better overall due to its strengths in structured reasoning as well as clean code generation and building applications from scratch. Claude is a model that excels at producing logically organized, maintainable code with clear separation of concepts and concerns. Ideal for GUIs as well as different sorts
switch to thinking and we'll see what we got back. But I think so far Claude is still in first place for coding. Gemini, it's missed out L2 and R2. Test Genspark. I would test Genspark, but we'd be here for 10 minutes otherwise. And it's also, it's only using Qwen and Code...fair. It says it's better for creative, emotional, and collaborative interactions. So, I would say, like, it's not really designed for coding that much. We can see some examples right here. But yeah, Claude for sure wins. Let's see what the next
want to see more details on that, you can actually see some details on it right here. But yeah, today we're going to be looking at Ralph Claude Code, which is basically like a powerful way to create an infinite loop and just let your agents run on autopilot and just autonomously go off, cycle through and create stuff...that and then inside Claude Code, which you can run inside the desktop, by the way, there was a new update inside Claude this week where if you go to Claude Desktop, then you go to the code section at the top, so you can switch between chat and code, you can run Claude Code inside Claude Desktop right here
Today we're going to be looking at Ralph Claude Code, which is basically like a powerful way to create an infinite loop and just let your agents run on autopilot and just autonomously go off, cycle through and create stuff. For example, normally if you're in Claude Code, let me show you an example...that. And then inside Claude Code, which you can run inside the desktop, by the way, there was a new update inside Claude this week where if you go to Claude Desktop, then you go to the code section at the top. So you can switch between chat and code. You can run Claude Code inside Claude Desktop right here
your computer. And then we're going to run Claude after. I'm going to go and open up terminal again. And I'm just going to type claude in here and hit enter. And look at that. We're ready to set up. And our next step will be logging in. But let me show...back up again, all you're going to need to do is paste that code into it and hit enter. And it's going to go through and run Claude Code on here. It won't take too long. When it's all done, you can type claude at the end just like I did with the Windows
like there's a ton of new features that we've recently launched. There's a lot of momentum, and now there's other offerings as well, like the Claude Code SDK and things coming out soon. What are you most excited about, Katelyn? What's the future looking like here in the next 6-12 months? - Yeah...right now, essentially everybody is using Claude and it doesn't have a computer. So I'm really excited about giving Claude a computer, and you see the very baby steps of that with the code execution tool, where the model can write code, execute it on the VM, and get the results back. So it can zoom in on images
That's of course just anecdotal. I mean, you could get the outputs from Claude Code, ask about it in GPT-5 Codex, and vice versa, and far more often Claude Code will say, oh yeah, I'm wrong, sorry, Codex is right. And that is borne out in testing too. And remember, coding is Anthropic's specialty, so OpenAI are really
Claude Code power setup and CLI. Um, yes and no. Yes, I think it can code complex SaaS, Clawdbot. Uh, and at the same time, I still do think Claude Code's necessary. Like, Peter Steinberger said this in an interview and I agree with him. Like, the Ralph loop is interesting and all that...mentioning Ralph lately, huh? Ain't no one talking about Ralph. OpenClaw is a less ambiguous Clawdbot. Ambitious realism creates another $5 donation. Thank you, my friend. Alex, you got to try running teams in team sessions. Duncan runs two coding teams. They ship six apps while I sleep last night. Game changer. Hash game changer
using Claude when you're interacting with MCP servers and creating apps or tools or websites, etc., is that when you're doing this and you're coding out the files, Claude can actually, number one, it's one of the best for coding, number two, front-end development as well, but number three, it can actually use browser...also it can use browser use, so it can, for example, check through that website, it can deploy that website, it can actually show you a preview of Claude navigating through the website as it's checking all the information. Pretty powerful stuff right there. We'll wait for that to load. So here's an example right
there's some built-in tools in the Strands framework, like HTTP request, but then I also just made my own tool, which is literally this one line of code, return this lens. So it's very easy to make your own custom tools. You just put this tool decorator and that's it. So very streamlined. All right...there's a whole bunch of them listed here, but I'm going to highlight these two as an example. So let's go back, and we can also ask Claude Code to explain it as well. So let me go back and open my Claude Code. So I have it explain how the MCP diag one works. So let me open that
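Editor's sketch of the "tool decorator" idea mentioned above, in plain Python. This is not the framework's actual API: the `TOOLS` registry, the `tool` decorator, `call_tool`, and the `text_length` example tool are all hypothetical names chosen for illustration, and the one-line body is only a stand-in for the speaker's tool.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to callables.
# A real agent framework would keep richer metadata (schemas, descriptions).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func under its own name and return it unchanged."""
    TOOLS[func.__name__] = func
    return func


@tool
def text_length(text: str) -> int:
    """Stand-in one-line tool: return the number of characters in text."""
    return len(text)


def call_tool(name: str, *args: Any, **kwargs: Any) -> Any:
    """Look up a registered tool by name and call it, as an agent loop might."""
    return TOOLS[name](*args, **kwargs)
```

The design point is that the decorator does nothing to the function itself. It only records it somewhere an agent can find it by name, which is why adding a custom tool can be as short as one decorated function.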
launch Claude Code without any slash commands, then you can use this. So you can do claude --disable-slash-commands. And then if you do something like /mcp or /resume, you won't see any of the commands available. So I guess this gives you a more minimal, like, just-chatting-with-Claude experience. Maybe they now also
more things you want to know as you use Claude for web and mobile is one, you want to avoid overlap of functionality. So, you're going to be spinning up a bunch of agents that work on a whole bunch of tasks. You want to avoid overlap between what they're working on. So, when giving tasks to agents...think of three tasks I can give my AI agent to build out before I go to bed. So, here's the challenge to you if you're using Claude Code. Before you go to bed tonight, pull out your phone, go into the Claude app, and force yourself to spin up three AI agents to work on functionality