GPT 5.5 LET'S GOOOOOOOO!

2:05:30


Wes Roth · 23.04.2026 · 20,625 views · 312 likes


Video description

GPT 5.5 LET'S GOOOOOOOO! #ai #openai #llm

Contents (25 segments)

Segment 1 (00:00 - 05:00)

I hope so. So, let's uh let's find out. — Yeah. — Looking good. Yep. The same on my end. So, yeah, internal code name Spud. You know, like a potato. Um it is now live, it looks like. GPT-5.5, the most capable model yet. We'll put that claim to the test. Pretty exciting. Let's see where we're at here. Um it does look like this is already rolling out to ChatGPT Plus, Pro, and Enterprise users. So, I'm going to do some testing here in a minute, also. Um agentic intelligence seems to be uh new. So, that's kind of the big headline. I'll look into that. With some of the agentic searches, like when I use Grok and it breaks the task into like 16 agents, I can see them chatting with each other. Like, you know, agent number three is correcting agent number four. I do find that pretty fascinating to watch. So, I'll see if they've added something like that. Um multi-step tasks, uh improved tool use. Does a better job checking its own work. All right. And keeping context across longer workflows. So, a lot of stuff here to investigate. All right. So, if anybody's watching, you can watch on Dylan Curious, my channel on YouTube, or you can be watching on Wes Roth's YouTube. I think we're trying to dual stream here. So, I'm going to be monitoring the comments on my channel and seeing what you guys have to say. Okay. Yeah, we have people piling in. So, welcome. Welcome. Let me just I'm doing my last-minute tests and making sure everything is working. One thing I learned from watching you live stream, Wes, is to add those polls in the chat. So, I'm going to try to figure out how to do that today, too. Just be like, "Hey guys, what do you think? Do you like it or not?" Yeah, polls are pretty awesome. Oh, pop-out chat. That's what you're talking about. Okay. Now I feel like a real live streamer. Twitch person. Yeah, Twitch. Twitch streamers, here we come. What's up, Zen all? Sup?
All right. So, I think actually everything is working. So, let me just open up all the pages that um All right. Yeah, it looks like coding performance is definitely the big focus here. I'm sure they got a little scared by both Anthropic's profits and its success, you know? But, um, 82.7 on Terminal-Bench. 56.6 on SWE-Bench Pro with real GitHub issues. That's pretty good. Predicts bugs before they happen. All right. So, hopefully everybody can see that. So, introducing GPT-5.5. Maybe let's look at some of the first impressions, and then we'll jump into what's happening. My first impression of GPT-5.5 is that it is different in the sense that it actually understands what I'm trying to tell it to do. Previously, a lot of my prompts had to be very detailed or very instruction-heavy, where I'm trying to tell it like, "Hey, look in this part of the code base. Do this." Whereas with 5.5, sometimes I become lazy and I give it a very vague task and it will figure it out. It actually directs its research and exploration to the right areas of the code base, comes up with potentially multiple options of how we can do it, and then um gets a company. So, it's been impressive. We had a little backlog of crew and I just dropped that into a CSV, gave it to 5.5, and said, "Go fix a couple categories of bugs that had really bothered me for a really long time." And it did, I would say, a 98% job all by itself, and I buttoned some stuff up, and it was done. It was able to traverse a pretty complex code base. It was doing the grouping and architecture of the solutions in a way that I didn't feel like I had to babysit as much, and I saw the alerts on bugs that had bothered us for a really long time, but were hard to hunt down, totally go away. It's been incredible using GPT-5.5. I did not expect the tidal wave of pull requests and changes coming in as a result of engineers having so much intelligence at their fingertips.

Segment 2 (05:00 - 10:00)

My first impression of GPT-5.5 — All right. So, let's um there's a little bit of an echo. Let's see. All right. Uh yeah, so everybody, thank you so much for being here. Let us know where you're all listening from. Um and let's check this thing out. So, yeah, people are asking [clears throat] like, "When did this drop? Did this drop today?" Like, — the benchmarks, but then you start using it, it's like kind of not as great. But this one seems very very solid. FrontierMath 51.7%. Uh FrontierMath Tier 4, 35. So, we're seeing some pretty big jumps. Oh, and I'm not even looking at GPT-5.5 Pro. Interestingly, it looks like it's even better. Like, the Pro is slightly less than the regular GPT-5.5. Yeah. Yeah, someone in my comment section is kind of theorizing that it could be a checkpoint update, you know? Like, they're really training for that big step-change model, but they had enough capabilities here that they thought, "Okay, let's just launch it." So, you might not get records in all aspects of it. It's not finished tuning. Mhm. Total outputs. Okay. So, yeah, we're kind of seeing that the curve is above the previous models. Here's Claude Opus 4.7. And hm, agentic coding with Terminal-Bench 2.0. Sorry, this just dropped. Usually I have a Sometimes — to look at it, yeah. — Yeah, to read through it, kind of get my bearings here. It's like, nope. Here's just everything. Start doing I'll start doing a few

Segment 3 (10:00 - 15:00)

questions to it too just to see how it handles them. Let's see. This is NASA/JPL Horizons vector data for Orion, the Moon, and the Sun. So this is interesting if we can Oh, here's the prompt: "Attached image, implement this as a new app." All right. So something like this would be very interesting. So what are they trying to do here, plan the trajectory of a certain Is this a spacecraft flying past a planet so that it uses the gravitational pull to turn? Is that what it is? I'm trying to make sure the math is right on it. Probably. Great visual though. I mean Yeah. We have seen some stuff similar to that, but the fact that it might be using the right logic and giving you a visualization like that does seem pretty awesome. That is pretty cool, I got to say. We can do some vibe coding tests. Where's the energy, Wes and Dylan? I don't know what is happening. This is a low energy day. So we were actually supposed to interview somebody. We logged in, what was it, like half an hour ago, and we were waiting for the person we were going to interview. We're just like, "Oh my god, 5.5 came out. We should just go live." So we are. This is as unprepared as I think we've ever been. Well, for this thing. So what we but which we're not going to do. But what I could pivot into is like 15 questions for Wes about the origin of the universe. But then you'll be like, "I have no idea. Like I don't know." Yeah. — guest was like a cosmologist. So I was like, "Okay." We were going to talk about Penrose and human consciousness and intelligence, where that comes from, and all sorts of other stuff. But Yeah. I have We could talk about health. Dude, I have so many fun things we could talk about, but we'll try to get this 5.5 breakdown as soon as possible. Um Yeah, I mean you have smarter reasoning. You've got the same speed as far as how fast the tokens are being processed.
Um but they say better real-world task execution also. Um financial workflows, people are saying, seem to be much better. So if you're already getting rich using ChatGPT, maybe this is going to be your next step change. Some people are like, "Some kind of planet? Did you not notice the moon mission?" I apologize. It took me a second to Yeah, so it's circling or going around the Moon. Oh, it's not Alpha Centauri. Who would have thought? Yeah, sorry. I'm slowly coming online here. I apologize. Higher cybersecurity capabilities, too. That looks like a big thing they're focused on. Biochem reasoning. That's interesting. I wonder what the pressure is behind the scenes to get the cybersecurity up, especially when they know their competitor might have the better model. Mhm. What is the context window? Did you notice That's actually a good question. — out. Gene bench, I haven't heard about this. So let's see. "GPT-5.5 gains on scientific and technical research workflows, which require more than answering a hard question. Researchers need to explore ideas, gather evidence, test assumptions, interpret results, and decide what to try next. GPT-5.5 is better at persisting across that loop than other models." Okay. Very very cool. So I guess, Dylan, if you want to take over and maybe just talk about some of this stuff, I will see if I can pull it up in the API or in ChatGPT proper and see if we can put it to the test. Okay. Um How do I keep these entertaining? Like hey, ask me some questions. Let's do Q&A. Somebody ask some questions and we'll get a conversation going. Also depends on if you want to go off topic a little bit, cuz if you want to come back to GPT-5, you can let me know in the comments. I could talk about some other interesting things that I've been looking at this week. There's like some societal studies that were done on how AI is affecting us.
Also I did like a bunch of research this week on health cuz we're doing that peptide podcast. But I was thinking a lot about digital twins and like

Segment 4 (15:00 - 20:00)

how AI and health are changing, including some gene-related stuff. But if that's too off topic, no problem. We can do something. Everybody please throw in some questions. So we're Like I said, we're kind of just spinning up right now cuz I don't think anybody sort of I mean we knew this was coming out. We just didn't know when, and we were focused on something else. So we need like 5-10 minutes to just set everything up in order to start testing this model. In the meantime, yeah, throw in some questions. Open Mythos. Dylan, did you see anything about that? I keep seeing that somebody somehow managed to reverse engineer Mythos. — Yeah, well, enough of it leaked, but that's another one I'm not super up on right now. Open Mythos. But I did want to cover that one for the video that I'm going to film tonight. I think Yeah, I don't know. I think they're trying to I saw um Was it Karpathy? Someone is saying like China's With Open Mythos, they think or maybe it was Dario, but it was just saying that we're basically 6 months behind and Open Mythos is close to it. So I have to look into that one a little bit more. By the way, I am reading The Infinity Machine and I think you might like this book. It's the first time I've got the full history of the origins of DeepMind, and I had no idea Peter Thiel was one of the early investors through Founders Fund, and that Demis had to go to his house and pitch him, and the whole thing about keeping it in London and all that stuff. So that was kind of my week. What I'm really into right now is basically the biography of Demis and how that early stuff played out, cuz I'm getting kind of ready for this lawsuit when OpenAI goes up against Elon Musk and they talk about the original story. So that's been my expertise for the week, I guess. That's my deepest dive stuff. That and all the health issues.
Not that I'm having health issues, but like, you know, guts becoming programmable, the virome, how AI is starting to figure out which viruses are in us, gene editing, and then body digital twin stuff. And then a little bit of nano. Have you thought about nanobots, by the way? Kind of a side note, but Not too much. I mean, that's something that's been on the radar for a while. Right now I'm thinking more about AI, but I know that's the next sort of We're going to get to that whole thing at some point. Dude, yeah, Wes and Dylan podcast 2028 will be like, "All right, are you experimenting with the new nanobots that patrol your bloodstream?" And yeah, I don't know. I mean that one seems Even drones right now still seem a little unreliable, so I don't think I want those machines in my body yet, but tiny machines that can hunt down cancer cells, unclog arteries, give your immune system a boost. Like, I guess. I mean, if they're walking around cleaning your house, why don't you make really tiny ones and have them clean your organs, I guess. But to me that's crazy. Also Oh, you know what else I did watch? I watched God, what was that guy's name? There's this guy that's not usually an AI guy, he's a 3D printing guy on YouTube. And he was talking about some new systems that are building 3D printed rocket parts using like copper. And I was like, AI is starting to do some really cool stuff over there. And it can't hallucinate at all, right? Like, you just can't. Physics is physics. If you want the most thrust you can get from an engine, it has to be programmed. But you can use AI to program the physics simulator and do all the double-checking and all the verification that all the code is legit. And then you can 3D print the part and test it. And that seems to be kind of in its own little world of breakthroughs right now, too. So that would be a fun thing to explore.
Cuz there's no extra cost for complexity, you know? So now all of a sudden AI is coming up with these super fascinating engine components that would have never been built before, cuz the cost would have been too high or they would have been just impossible. You know, when you have a block of some kind of metal and you're going to carve into it, you can only get so intricate. But when you 3D print it up layer by layer, you can do all these other things. So that was another fascinating thing. And then um Yeah, and then mental health: BCIs and mental health is something I want to explore in a future video, cuz um it'd be kind of fun, I guess, to be connected to something like Neuralink, but then I was thinking, "Well, how do I trust the guardrails to make sure that my brain is still under my control and people don't sell me stuff I don't want when they have direct access to my brain?" So I'm going to probably try to do a little deep dive on that, too. So, if that was a wild curious tangent, let me know if there are any aspects there you guys have questions on or want to chat about. We'll let Wes

Segment 5 (20:00 - 25:00)

do his thing and get up to date on 5.5. Well, I'm going to share Let's take a look at X, cuz some pretty interesting stuff is coming through on X. — Okay. And let me share this. Boom, look at that. Okay, that worked. All right. So, first and foremost, Sam Altman. First of all, he's got a cool new little visual here for his profile. So, let's see what he's saying. So, what — your thoughts on his new profile image? Terrific. I guess I get We're We're no longer being — Breaking news here: my opinion on Sam's new profile pic. Anyways, go ahead. Let's see what you got. — The thing is, like with the Ghibli thing, it was such a big thing that I think my profile is still Ghibli-fied. So, um it's almost like fashion. You know what I mean? Like Oh, yeah, dude. If your avatar isn't cool, dude, you'll never get a date. Exactly. You're not cool anymore. Um let's see. So, "We believe in iterative deployment. Although GPT-5.5 is already a smart model, we expect rapid improvements." So, I agree with him a lot on the iterative deployment. Like, instead of holding it back and then this tidal wave just wipes everything away, just keep throwing pieces out there. Yeah, things will break. Things will kind of need to be reshuffled, but I think this is the right approach for AI. — Well, you've kind of shifted my opinion on that, too. Cuz when we first met, I was a lot more in the camp of like, "Hey, this is getting out there too fast." But you did convince me that taking more tiny steps as a collective is maybe the safer way than waiting. So, yep, I agree with you now, too. And it seems to be working out, at least so far. He's saying, "We believe in democratization. We want people to be able to use lots of AI. We want to have the most efficient models, the most efficient inference stack, and the most compute." Yeah, I was using the open-source Kimi K2.6 yesterday.
Like, it sat there for 40 minutes just cranking away, but it's free. It's free because I'm running it in the Hermes agent by Nous Research. I'll put out a video on that later today, probably. It's almost ready. Um it just feels different, cuz the thing is, if the model is free, do you really care if it takes an hour or a day or a week? As long as it's cranking away at the project it's given, you know, it's like whatever. Yeah, take a week. Just come back a week later, tell me it's done. If it's free or super cheap, that's totally fine. Um and Kimi K2.6 might be the first model that you can actually use for serious coding tasks. It's not as good as, you know, Opus and all that stuff. Um yeah, cuz sometimes, like Techninja420 says, quality is better than speed. Yeah, I mean, sometimes you need speed for certain things, but there's a lot of stuff where it's just not necessary. So, go ahead, take your time, come back. Um so, these models just hit a little bit different. Um and yeah, I'm a little bit more pumped up about open-source models because we're beginning to see some pretty good ones. Oh, yeah. And I dream of the time when I can just have my own I mean, I guess you kind of do with your OpenClaw system, but yeah, just an extra PC sitting around churning away at my problems all day. It's a powerful future. So, he's saying, "We love you and want you to win. We want to be a platform for every company, scientist, entrepreneur, and person." Uh, scientist, entrepreneur, and person? All right. — Yeah. You want all of us? It's all all-inclusive. All right. So, this is from OpenAI Developers. So, my theory that I kind of posted about yesterday is that this model is going to be really good in combination with the image 2.0 model that they released, what, 2 days ago? In the sense that that's going to be able to create images of front-end designs, then you put them into GPT-5.5, and this thing is going to be incredibly good at creating, you know, high-fidelity working prototypes from those images. So, we'll see how well that works a little bit later. I still don't have access to it. Does anybody have access to it yet? Um 5 minutes ago, people were saying they don't. If you get access in ChatGPT or the API, let us know. Yeah, let's see if I can choose that model from the picker. So, Codex now generates high-quality

Segment 6 (25:00 - 30:00)

spreadsheets, slide decks, and documents. That's interesting. So, in GDPval, you saw the progression of how well it does Excel, for example. It started out kind of like, "Whatever," you know what I mean? Didn't look too good. And with the latest series of models, the previous generations, they got good. They looked really good. So, if this is yet another step forward, then this thing's going to be a spreadsheet machine. Yeah, how familiar are you with spreadsheet AI? Have you used Google Docs with their newest one or Microsoft Office with their newest version? The thing is, like, not really. Somebody's saying double the price of 5.4. Yikes. Okay. Um Yikes. Dang. Okay. Uh so, compared to Opus 4.7, it's better on a lot of tasks, it seems like. Um I mean, when this stuff first started coming out, I was very impressed because I was like, "Oh, just whatever I want." I just type it in. I'm like, "How do you make a formula for blah blah blah?" It's like, "Here's the formula." You just paste it into your Excel spreadsheet, whatever. But after OpenClaw came out, I think to me and a lot of other people, like Karpathy talked about it, Peter Steinberger talked about it, at the end of the day the final UI for everything is going to be these AI agents. Mhm. Why? Why do you need spreadsheets? If you really think about it, I think it's already outdated. Because, right? So, Just let it control the whole computer and it'll just bring up a spreadsheet and do what you want. Not have this tool integrated into it. Yeah, cuz spreadsheets are the interface for us, for humans. So, we have this data. Why do we have spreadsheets? We have a whole bunch of unruly data, tons of data. We need to make sense of it. So, we need to put it into buckets and then make charts and so on so that we can understand it and get insights.
The thing is, that's what I'm doing with OpenClaw. Like, go out there, scrape this stuff, and then I'm going to ask you questions about it. — Yeah, that's an interesting point. You know what I mean? The spreadsheets are just an in-between step, but it's like, "Why?" Cuz if you're able to just ask questions of the data, then the necessity for it goes away, I feel like. — Yeah, if you go far enough into the future, it seems to me like the only way that I really will interact with AI is something like a conversation. Like the movie Her, where you just have the AirPod and it understands and sees the world with you, maybe through glasses. That's the ultimate form factor. But so much work is done on a desktop, and if it can move the mouse around and bring up a spreadsheet and fill it out, I guess I really don't need integration into the spreadsheet. But as of right now, I kind of like it because I'm still in a world where I'm bringing up a document, and I guess I'm not giving it access to everything all at the same time. It's weird with the spreadsheet though, cuz I'm not used to a spreadsheet talking to the internet. The crazy thing that keeps happening to me is I'm asking it about the data I have, and then it goes and processes not just new columns with new formulas, but it can pull in information from, you know, outside of the sheet. And then I say, "Oh, I didn't need all that information." I almost get too overwhelmed too fast with it sometimes, cuz I don't need that much of an interface. And if I do have questions like that, I'd rather just dump them into a folder and then point, you know, a project at it, or go into a Google Drive session and have it pull everything out of there.
So, yeah, maybe the spreadsheet thing is a little bit in-between, but for somebody who's on spreadsheets all the time, there's got to be so much cut-and-paste savings that they're getting from it. Yeah. I'm curious how this is going to affect So, there are a lot of questions recently about how this will affect the software companies, right? Cuz a lot of the stuff that Claude releases tends to break stuff, you know, stock prices go down, various things. And I still feel like a lot of the analysts and people out there don't really understand what the final destination [snorts] is. Because they're like, "Oh, there will be more people creating software." Right? There's going to be more software. And they're like, "Oh, but there's some lock-in for these older software products," like, for example, customer relationship management (CRM) systems. I remember one analyst that is fairly well known out there, um, he's saying, "Well, you got to understand

Segment 7 (30:00 - 35:00)

like if there's a person in their 50s that's used to a particular CRM, it's hard for them to switch." Blah, blah. So, the idea was that they're going to be sort of locked into that ecosystem. Um Oh, the great unlocking? Is that where you're going with this? Well, I don't know what the great unlocking is. I haven't heard that term, but the thing is, at the end of the day, everybody's going to be using these agents as their UI to everything. So, this idea of being locked in Well, tell me, what is the great unlocking? What is that? That sounds like Is that exactly what it is? Well, it was created by me 1 minute ago as just a comment, but you know, the great unlocking, I better define it now. So, uh, now that we actually have an audience listening, I can coin a term or something. You heard it here first. Yeah, I mean, it's nothing official, but I used that word thinking about how there's so much data in so many CRMs, and I can be logged into them and pull it out, but to me, the great unlocking is the interface where you step back. It's basically OpenClaw, right? It can use Zapier to link an API, it could write its own API, if it could credential in as you and authenticate itself. I mean, you've just unlocked so many connections for data. The way you learned about your blood work by unlocking different data sources and putting them all together is something that we're going to see a whole lot more of, and when contacts and emails and notes and blood work all overlap, there's going to be stuff there that is just fascinating to learn. You're going to realize there's some weird connection between something you wrote to your friend about, like, "I'm feeling sick today," and something about your blood work that you never thought was connected, and it's going to point that out. They were just such different data sources, you know?
Or something even bigger than that, like broadly speaking about what's going on in the national health system, or something about pollution that you didn't think might be correlated with health. It'll just be wild how many connections there will be. Yeah. I mean, for people that maybe haven't heard the story: I used OpenClaw to go back years and find all of the blood work that I've ever done, and I just threw it all in there, right? Um and so, I think that's actually a great example of what I'm talking about, because before, number one, it was kind of locked into where it was sitting, or if you extracted the PDFs of the labs, you could put them in some folder, whatever, but then how do you combine it? I guess you could go line by line and put it into Excel so that you're able to track movement over time. OpenClaw, or it's not limited to OpenClaw, it can be any agent. Really, OpenClaw just kind of showed us what was possible, I guess. I'm using Hermes quite a bit now. Some people are saying that they love Hermes, some prefer OpenClaw. At some point, there's going to be a million of these different AI agent systems, and everybody's going to have their own personal flavor, depending on what their favorite thing is and what they're using it for. But the point is, now that I have all the blood work in one place, I can just ask the agent questions. Or I can even do deep research and say, "Hey, just comb through all of this. What is changing over time? What things are you seeing that could help me be healthier? What steps should I take?" And the more context that I give it, cuz these things are like context monsters, right?
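Wes's point about combining years of lab reports so you can "track movement over time" comes down to grouping readings by marker and sorting them by date. Below is a minimal Python sketch, assuming the values have already been extracted from the PDFs into (date, marker, value, unit) rows. The markers and numbers are made-up placeholders, not anyone's real results. PDF extraction and any medical interpretation are out of scope.

```python
from collections import defaultdict
from datetime import date

# Hypothetical placeholder rows, assumed already extracted from lab PDFs:
# (collection date, marker, value, unit). These are NOT real results.
rows = [
    (date(2021, 3, 1), "LDL", 130.0, "mg/dL"),
    (date(2019, 6, 15), "LDL", 142.0, "mg/dL"),
    (date(2023, 9, 10), "LDL", 118.0, "mg/dL"),
    (date(2021, 3, 1), "HbA1c", 5.4, "%"),
    (date(2023, 9, 10), "HbA1c", 5.6, "%"),
]


def build_timelines(rows):
    """Group readings by marker, then sort each marker's readings by date."""
    timelines = defaultdict(list)
    for when, marker, value, unit in rows:
        timelines[marker].append((when, value, unit))
    for readings in timelines.values():
        readings.sort(key=lambda reading: reading[0])
    return dict(timelines)


def change_over_time(readings):
    """Latest value minus earliest value for one marker (a crude trend measure)."""
    return readings[-1][1] - readings[0][1]


timelines = build_timelines(rows)
for marker, readings in sorted(timelines.items()):
    first, last = readings[0], readings[-1]
    print(f"{marker}: {first[1]} -> {last[1]} {last[2]} "
          f"({first[0]} to {last[0]}, change {change_over_time(readings):+.1f})")
```

Sorting each marker's readings by date is what turns a folder of one-off reports into a timeline. Comparing only the first and last reading is the simplest possible trend measure, chosen here for brevity.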
So, if you have all of your blood work and you have your genome sequence, then maybe you've been You know, I have my Whoop here that tracks my sleep and my exercise and resting heart rate and heart rate variability, right? So, imagine just throwing all of that into an agent that has context across all of it. I mean, that's more than a human being could ever possibly keep in their head. So, it's going to be able to pull out better insights than potentially even your doctor could, because your doctor that's been with you for 10, 20 years, yeah, they might know some detail about you from 10 years ago, but when they're sitting down with you for a consultation, they're not necessarily — You know what I mean? Able to see across that whole timeline, right? Maybe if you have your own personal doctor that you're paying a million a year just to focus on you, but this allows you Mhm. Yeah, well, that brings up So, what are your thoughts on I don't exactly know how Hermes handles memory, but I know about Andrej Karpathy's project where you kind of create your own Wikipedia, where the idea is, okay, instead of having this folder full of all your blood work and all your, you know, contacts, and then having this huge

Segment 8 (35:00 - 40:00)

context window that you put all of that into, and then it tries to piece it all together, you should be building every day some sort of personal Wikipedia, where each time you dump something into that folder, it's read and synthesized. An AI looks at it and says, "Oh, what's interesting about this in terms of the last questions that Wes has been asking, based on what I know about him?" and it tries to put together a Wikipedia that has a relevant network of everything, and the waste is gone, and you're closer to what matters. But you're also now a bit in an editorial place, where the personal Wikipedia of your health history might have some bias, but you could also reference the ground truth. You store that also. So, anyways, what are your thoughts on the actual way that the memory is maintained and how that matters for the quality of your responses? Yeah, I mean, I feel like that's So, first of all, we got to take a second and just be thankful for how surprisingly simple and straightforward memory is for these AI agents. It didn't have to be this way, but here's the thing, and this is what I think a lot of people are very surprised about, like for Claude Code, for these models: how do you save all this data? Is it some weird SQL database? Is it whatever? No, a lot of it is just markdown files, just text files with some code. It's so simple. And the thing is, if I want to migrate all of my knowledge from one to the other, I can just do it. A lot of it is going to be just text files. It's able to pull data from the internet, it's able to read the pages. Wow, how simple it is. Yeah. So now people are talking about, "Okay, so do we use Obsidian to manage memory, or do we use this other thing, Karpathy's What did he call it? LLM library, or LLM" Do you remember the terminology for this? — Yeah, his Wikipedia project.
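The "it's just markdown files" point can be made concrete. Below is a minimal Python sketch of a plain-text memory store in the spirit of the personal-wiki idea: one markdown page per topic, dated bullets appended over time, and a naive keyword search across pages. This is a hypothetical layout, not how Hermes, Claude Code, or Karpathy's project actually store memory. It also omits the step where an LLM reads and synthesizes each new note. The names `page_path`, `remember`, and `recall` are illustrative only.

```python
import re
import tempfile
from datetime import date
from pathlib import Path


def page_path(wiki_dir: Path, topic: str) -> Path:
    """Hypothetical naming scheme: 'Blood Work' -> blood-work.md."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return wiki_dir / f"{slug}.md"


def remember(wiki_dir: Path, topic: str, note: str, when: date) -> Path:
    """Append a dated bullet to the topic's page, creating it with a title if needed."""
    path = page_path(wiki_dir, topic)
    if not path.exists():
        path.write_text(f"# {topic}\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {when.isoformat()}: {note}\n")
    return path


def recall(wiki_dir: Path, keyword: str) -> list[str]:
    """Naive case-insensitive keyword search over every bullet on every page."""
    hits = []
    for path in sorted(wiki_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("- ") and keyword.lower() in line.lower():
                hits.append(f"{path.name}: {line}")
    return hits


# Usage with placeholder notes in a throwaway directory.
wiki = Path(tempfile.mkdtemp())
remember(wiki, "Blood Work", "Vitamin D came back low", date(2024, 1, 5))
remember(wiki, "Sleep", "Average HRV dropped this month", date(2024, 2, 1))
print(recall(wiki, "vitamin"))
```

Because everything is ordinary text on disk, moving this kind of memory to another tool is just copying the directory, which is the property the speakers highlight.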
— Yeah, the wiki project. In Hermes, and probably later today I'll drop my Hermes agent video, what some people do is they create, I forget, profiles, I think they call them. So, basically, you can have multiple agents. They're almost like sub-agents underneath your main umbrella. They'll communicate, but they're different profiles. So, for example, you can create a librarian profile, and you can select a specific model for it. For example, you could use a local model, because it's dealing with these massive contexts that it needs to manage, but it doesn't need to be super smart, you know, it's not doing a lot of coding and stuff like that. So, that could be a local model that is just a librarian slowly crawling over the vast amounts of data that you're shoving into that library. Um Oh, and You're choosing the agent? The only thing I know about Hermes is that loop it was supposed to go through, the think, act, observe, adjust loop, or whatever. But are you choosing the personality, or the goal of the agent, like librarian, and then setting it off on its task, or is it doing all that for you? So, it's so funny, because with Hermes, Nous Research is the people be You know, Nous Research, we interviewed their director of AI personality, or I forget exactly what the title was, but man, that was kind of a mind-blowing interview. But dude, they've dropped so much that what we're talking about is like one grain on a sandy beach of what they've dropped. So, yeah, they allow you to create a separate profile with its own model, with its own personality, as a sub-agent that just manages this particular area of your little agentic empire, right? Um and you can have one coder and one this, one that. The thing is, they ship so much that it's really kind of hard to even follow everything that they're doing, you know what I mean?
Um but yeah, so your question was like, you're able to Are you able to set up a like a specific personality model for it? That's what their profile system is. Um that Teknium was talking about. Mhm. You know what else you should Maybe you started this at the beginning of the podcast, but the um it might be good to share your thoughts on this combination between the new image model that came out of OpenAI yesterday, and then this sort of coding agentic kind of update they have today and how you think they might fit together cuz we were talking a lot about text and Wikipedia, but it seems like you're kind of thinking images are part of Sam Altman's view for how this all plays together. Yes. That's a great question. So, first

Segment 9 (40:00 - 45:00)

and foremost, like really fast, here's kind of like what I was talking about. So, this is Teknium. He's one of the I think main guys over there at Nous Research. Um so, he's talking about the Hermes agent profiles. So, basically, this person has a dedicated librarian profile to maintain the LLM wiki, right? So, all of this stuff that you're sort of putting under one umbrella, it's all there. This uses a specific model, has its own skills, so its own skill set, its own personality, its own model. And the other agents, like the Hermes coding agent, can ask it questions. So, it's dialed in on just owning the wiki. So, I mean, does that answer your question? Like that seems pretty awesome. It's its own little agent that just is managing your wiki, your library. You know what I mean? Mhm. So, it's um pretty awesome. Here's another person that did a comparison between Claude Opus 4.7 and the newly released Kimi model. Um some people are saying that Kimi So, the big concern, the big sort of drawback to these open-source Chinese models is they were benchmaxxed. Meaning that they really tuned them to just maximize their, you know, ability to — Benchmarks, yeah. to do the benchmarks. But then, in general, they weren't as good, so they weren't generally smart. From the few tests that I did with Kimi K2.6, it's probably not as good as it appears on the benchmarks. Like, yes, it's overblown. [snorts] But if you ignore that, it's pretty good. Like it got to the point where it just sits there for like an hour and it slowly just grinds through what you give it, and at the end, it works and it's surprising. Um let's see. And hang on, can you remind me what the question was that you asked? Why did I go off on that?
— I was going to kind of talk about You had a thought that maybe like especially in building web pages, that it might be — a higher [clears throat] like there might be something about the image model that's also kind of understanding how an image might break down into something that you can code around or it can take sort of a concept and be like, all right, if this is a you know, a 3D image, then maybe if they want me to code this up, I need to go to a 3D modeling program and build it. Or if it's an architectural thing, I need to go to CAD software. Or if it's a blueprint, I need to go to something else. So, it's kind of there might be something about the way it's understanding images that's connected to coding. Yeah, so here's one of the people that were able to get their hands on the stealth model testing release for GPT-5.5. So, they're saying that the UI design, so front-end coding, is the best. And what they've done is they just took images of websites, and the model is just easily able to replicate it. In fact, it does seem like it has these little tricks where it almost like um it takes the reference image and it autonomously crops the exact UI elements. So, if it has certain buttons that have a certain specific look, it will just like crop those buttons out and it'll put them on the website. That's what it seems like. Um and why I think this is important is because their new model that they just released, GPT Images 2.0, is extremely good at front-end design, at creating images. So, you create that image in GPT Images 2.0, you put it into GPT-5.5, and you're able to go from sort of just like a brainstormed idea of a certain look into a fully functional website with that precise look. Um Do you feel like if it's on the same thread, there'd be an advantage over just bringing it in?
Like if it actually generated the image, did it also generate some kind of meta information that might be contextually relevant to then coding it into an app? I mean, it sounds like they're talking about that super app. It sounds like it's all going to be My guess it's more connected than we realize because OpenAI has been like kind of explicitly refusing to answer questions about the architecture of GPT Images 2.0. Like we don't know too much about it. Is it um is it what's it called? Is it um diffusion or is it Dude, it's so hard every once in a while like I'm talking and somebody throws some bombshell in the chat and my

Segment 10 (45:00 - 50:00)

brain just goes to zero. — I dude, you're like screen sharing and trying to talk with me. Like, yeah, this is definitely an attention span. This Well, I was livestreaming in general is so new to me. I can just feel my brain failing, but please forgive us for it cuz it's a lot. It sounds like it the model is in the Codex. So, um my part of my brain is like I don't know. But it has to be a diffusion model, right? Is it Could it not be? Um we don't know. Oh, even something so basic as if it's a diffusion model is up for grabs? Man, that's crazy. Yeah. — Yeah, there there's so much behind the scenes. You're right, it could be connected in all sorts of ways. So, I'm curious cuz they've been hinting at this uh super app. So, I'm wondering if it's possible that it's all somehow interconnected. Um maybe even being like one model or I have no idea, but there's definitely something cooking behind the scenes cuz also, we're noticing, you know, even in the main interface app, they're kind of going away from showing the model numbers and they're trying to be like they're trying to take all that stuff and all the drop-down menus and kind of just like push it away, and eventually, maybe even have that completely go away, and then just everything's going to be just the one thing if that makes sense. Mhm. By the way, I don't know if you know, have you ever played with a diffusion language model where it's not an LLM, but it's like it diffuses the words like the end of the sentence, the middle of the sentence, and the front all come like at the same time? I think I hold a world record of sorts because at least I did for a time because when Google released their diffusion coding model, and I don't remember what it's called, but it was like last year sometime. Do you know what I'm talking about? 
They had a — the name of it, but I knew a few of them were out there, but I didn't know any of them well enough to Yeah, so basically, I when it just first got released, I recorded myself making I think it was like 10 or 15 different little bits of software. Like simple games or simple whatevers. And I think it was like every 3 seconds I would have a new piece of software. — Because it was literally like that. — yeah. Yeah, it was like, create a little Minesweeper game. And it's like, and the code would actually it would create the code and the code would render in a different window and I was like, done. Next. Create a little Flappy Bird game. And it would render. And so, like every Maybe it was like every 5 seconds. So, I was able to create whatever it was, 10 pieces of software in like 20 seconds. I posted the video online. It was just kind of — Yeah. Well, they're so fast, but it's also just funny that the sentence kind of isn't in an order. So, yeah, I could totally imagine like a game like Minesweeper where just gray becomes yellow cuz it just started, like it diffused gray and then iterated to yellow and you're like, oh, okay, that just changed up. I mean, if people were like diffusion systems, you'd be saying all like go left, go right, up, down, left, right, forward. Okay, forward. And then you like land on forward and you're like, why did you have to say all those other things? Yeah. — You know, cuz they all come out of order. But yeah, I mean, even if I guess I'm going to still assume that the OpenAI image model is um diffusion just because, you know, DALL-E was and most of them kind of are, but I'm sure it's not just that anymore. I'm sure there's transformer-based architectures understanding the prompts. There could absolutely be simulations of some kind to try to, you know, if you say this is a person standing in a windy hallway, maybe it like has some coding simulator that it uses for wind and then diffuses from there.
So, you know, some kind of inpainting happening behind the scenes for parts that need that kind of effort. It could be some mixed version of all of that. Or it could be like a purely just not diffusion anymore, but it seems to me like it probably is. I don't know. Yeah. So, I kind of want to test out Codex. So, uh if you can give me a second here, let me see if I can pull up this model in Codex. Okay. Yeah, I was just I was thinking like cuz if you say to an image model like draw a cat, you know, it's pretty open-ended, right? So, it's going to have like a lot of different ways that it could go, and with a diffusion model some small change in the way that the fuzziness is at the beginning might make the cat like a cartoon cat or a realistic cat, and if you don't specify those things there's got to be some kind of placement logic that you would think that it would have been reinforced from that it might take into context cuz if too many users said oh you know when I say draw a cat

Segment 11 (50:00 - 55:00)

I expect a drawing not like a realistic one or something like that so I'm sure the reinforcement learning too is on some layer underneath it but I don't know it's kind of just wild to think about. Mhm. Narnar head is saying they're waiting on the mailing list for Codex on Linux. What So tell me a little bit more about that. People are saying clickbait titles. Do I have a clickbait title on this and What? — GPT-5.5 let's go. How's that clickbait? GPT-5.5 it's here. Let's go. What how what? Uh — Yeah and I mean one of my commenters said that GPT-5.5 declared state-of-the-art on ARC-AGI-2. Oh interesting. Yeah. Yeah and then also people are talking about space. So okay so the so the SpaceX the new company that's going to go live. Do you know what it's called because it has you know xAI in it. You know Twitter/X is in it. Do Do you know what — Oh I think it just debuted as SpaceX right? It just owns all those other components. Is there a new name for it? I think it's Yeah okay yeah somebody posted it in the chat. — it? Probably just X right? Knowing him like x.com or something. — So it's Space XAI. Space Oh okay like space times AI. That's interesting. — No like all three of those companies combined into one. So it's a SpaceX, X, xAI. — Oh SpaceX AI. — So that's uh they shoved all three in there. It's like when they renamed Google to Alphabet. They're like we want one company with every letter in the alphabet. Like okay, it's strange. At least it wasn't as bad as Meta, cuz I thought Meta was a cool idea but then they went just into AI and they gave up on the metaverse it feels like. Dude have you seen any of that stuff, 80 billion dollars lost on the metaverse? No. Yeah. I mean maybe they're just too early or something but you know it's not like those glasses are exceptionally profitable or popular yet.
Yeah I feel like that's such an interesting technology but man like I don't know are we ever going to like see actually what it's capable of like I it just never seems to go anywhere. Do you like being in VR? I really I loved it when I first got my hands on the Vive. Um loved it. I was like oh my god this is incredible but then after a while you just don't ever want to go back. It's like a hassle. I don't know. Do you feel like the um the experience is just dizzying like it hurts your brain after a while? Um I did have some not nausea associated with it. Yeah I feel like that's the big thing. If you can't just walk around your house and like have lunch and have it on, it's just hard to imagine it being as useful as sort of a laptop or a phone especially. But I thought yeah I kind of would have predicted to be honest pretty off on it but I probably would have predicted that I would sit here in at least 2026 2027 and probably not have a laptop. I thought a headset to do work would make more sense but I don't really see that on the horizon at all anymore. I mean the Apple Vision Pro was fun as a demo but I don't see myself buying even an improved version of it right now. Yeah. I never tried the uh Apple stuff uh you know for me but I did love the um what's it called the uh Vive, HTC Vive, those were very fun so Yeah no I mean the experience is cool. It's just very like novelty feeling. And um Well I don't know it could it can be social too. I don't know I don't get it. I totally get why somebody could be convinced that this is worth 80 billion dollars. And Zuckerberg was in a position to do that and it didn't seem like he you know AI was on the radar the way the transformer model had you know not really This is before GPT-3 I guess when he first named it Meta you know. So I get it but it's just interesting how it doesn't play out the way you think it might. Also like I have the same question about computation in space.
You know I look at the all the advantages of SpaceX putting servers in space and he's not wrong but somehow

Segment 12 (55:00 - 60:00)

my gut's telling me it's just not right yet or we're just too early to it to really sell it as an actual viable profitable thing you know? Like one day for sure like you can get the cost of rocket fuel down you can build these things in space and it's great that you can get solar power it's cold up there. But just a little bit part of me thinks why not just go somewhere cold on Earth and just build a big Costco-size warehouse and put some servers in there you know? It just always feels like that's going to be a little bit easier at least for a lot longer than people think it will be. But maybe but I'm watching some like I'm watching some crazy stuff with material science and AI too so maybe we do maybe that gap does close faster. And it's also not a product people have to buy like it's not something you have to convince the average person to enjoy it's more just like an engineering problem and if the money makes sense then maybe it is so. That could be another part of it and getting humans to actually like something and adopt it might be different than how useful something is if it's not involving like a B2C customer. Uh yeah. So I do have a Hostinger account with a virtual private server on there so I'm just trying to quickly install Codex on there so that we can maybe uh Yeah go for it. Run this thing. Yeah we can that'd be fine do some coding model stuff. I'll see if I can look into these ARC-AGI-2 results. Um I'd be curious how good this model is at that. Um what else can we chat about? So while you're doing that I can give you guys a quick update on maybe the digital twin stuff that I was looking at before. So the um I guess I'll say one of the more interesting breakthroughs I was looking at this week was there are some AI models that are coding in a way where they're getting reinforced to um code actual physics of the universe that can't be wrong.
So this is back to what I was saying about those like copper engines that they're building for potential spacefaring ships. The but those models what they get punished for is writing code that wouldn't be a realistic simulation. And then you have to run the simulation with no AI like it has to be like hard coded so that you get the actual answer you want. And that is pretty valuable in like a whole bunch of industries that we haven't seen it applied to very much yet so material science was one of them and that's when you are putting different chemicals together and you're actually trying to create materials that have different properties. Um it seems like there's some really interesting breakthroughs coming on the radar in that space. There's definitely some in this 3D printing that's the one that kind of introduced me to it. Um in medical devices there's a bunch that seem like they're kind of in the realm so in those big three categories and then I can I guess I could even imagine like we're far in the future but the nanobots that patrol your bloodstream I can imagine nanobots needing to be built this way too. Because they have to be so precise inside your body and don't build up and cause problems but um yeah that was another kind of fascinating aspect that I'm going to maybe try to explore on Dylan Curious in the future so I don't know if you guys have any questions about coding in logical ways through AI but that's something I've been thinking about. So you got the Codex installed now? Uh almost there so let me see if Hey you got a vertical monitor? No this is just I had I have three things side by side. I got chat I got uh Claude helping me troubleshoot some stuff. So basically since it's a brand new sort of install so install basically I have to um Oh boy. Sorry guys. So yeah basically so I'm trying to get Codex up and running to see if I'm able to run this GPT-5.5 in Codex cuz it looks like it's available there for the time being. So let's see. But for people that are not familiar with like terminal and stuff like that, this stuff became a lot easier with

Segment 13 (60:00 - 65:00)

uh the addition of, you know, if you have a chatbot like Claude Code, OpenAI, whatever, Gemini, if you're just typing the stuff in there, it's a lot easier now. Okay, can you walk us through what is the process? Just I mean, I see your install command. Okay, so basically here I was able to install Homebrew. Uh yeah, so basically well, I guess let me do this. Yeah, so for people that might not be familiar, so what I'm doing is I have my favorite chatbot open. Right now, I do really like Claude. It's doing, you know, wonders for me. So I'm just asking, "Hey, how do I install OpenAI Codex on Linux?" And I'm just pasting the commands in here. If I'm running into issues, I ask it to troubleshoot. And uh that's pretty much it, right? So it walks you through how to do that. And now this is Codex. So Codex is running on this virtual private server. So I just need to log in with my ChatGPT account. Give me 1 second here. Okay. Yeah, that's interesting to see your triple monitor situation. You're so much better at jumping attention from one thing to another. I just I I'm like on a whole kick to try to get rid of my short-form media and just sit and listen to audiobooks and I'm Well, I'm doing this new thing with a memory palace. I don't know if you guys know that technique for um Oh yeah, like the cards. Yeah, I So I got this app, but I'm sure somebody vibe coded this thing together, but it's cool. Like I can take a photo of uh like a building or my house or whatever, like some place, some interesting garden where I am, and then I just take a photo, and then I tag little dots of where I want to like remember things, and I'm kind of practicing right now trying to memorize like visuals for all the numbers 1 through 100 and then some actions for them so I can like try to string it together, but it's been kind of fun. Like sometimes when I see interesting stuff now, I'm like taking photos of it to try to do memory palace things cuz I can just I'm feeling stupider.
Like just I don't know what happened over the last few years, but I'm feeling like I just can't remember things the way I used to, and I feel like it's my addiction to short-form media and stuff like that is kind of getting all of us. So I'm trying to make a concerted effort to listen to more audiobooks, do more memory palace techniques, try to challenge my brain sometimes and not go to ChatGPT and Claude for everything, but — [clears throat] — you know, it's a good idea. If you guys want to vibe code any memory palace apps, you got one person looking through the app store trying to find stuff like that, and I might not be the only one who's looking for memory tools. Yeah, so I to me, I mean, the biggest thing So touching short-form content, I think that's like crack. I don't I can't deal with that. I try not to touch it. I try to remove it where possible. Yeah, zero short-form videos. Or so Does Tory kind of hurt your brain after a while? Like were you ever on there for more than an hour or two? I don't think so, no. Oh. No. Like after a while, I just every body was kind of wobbly, and everything happened so fast, you know? It was just like "Hey, I like Do you remember the like just the way it like there's no break. Like there's no time to breathe. It was just like, "Hey, there's an owl. Owls are great. Like I'll buy one. Two trucks. Broom broom, you know?" And I was like, "What what what?" I could just feel my brain try to make sense of how quick everything was happening. And sometimes that's kind of funny and engaging, but just after a while, it's insane how much energy that must take. So I'm trying to sign in with a I think I just need to use the API key. So give me 1 second here. Yeah. Let's see if there's any questions to answer. Um Ital Rich bro has a solution. He says, "I just tell my wife, and she remembers." So there you go. That's a No, that's the same problem. I already have that.
I tell you I put up a poll on my um uh on my YouTube channel about whether or not OpenAI should buy Snapchat. I thought it was pretty interesting. Like there's a good play there for them. Um OpenAI can use the social data the same way that Elon's using Twitter data.

Segment 14 (65:00 - 70:00)

Um they don't really own a social network anymore. Now that they're getting rid of Sora, they would, you know, in the future have the ability to potentially bring back AI video when the cost drops to a certain point and they want to, so um I don't know if you want to think about that, but have you Do you feel like Snapchat would be a good purchase for OpenAI? Maybe. I mean, I don't touch Snapchat or Facebook or Instagram or like I just You're just X. Yeah, X really is the only one cuz it's text. It's fast. Um it's not that visual. They bought their own They created their own kind of niche, and it just works. I think most text stuff happens on X, it feels like. Yeah, I mean, but if it was a I mean, it it'd be a big acquisition. Like I don't know what Snap I should look up what Snap's worth, but the like to absorb the market cap for a company that's already in such insane sort of debt would be pretty pretty tough, but it's an interesting combination. It does feel like the two could use each other, and like the AI models being integrated into a super app and chat already having a user base might be something that would kind of make sense. But obviously, I don't like consolidation in general, so keeping them separate is probably better off. Tech Ninja 420 saying besides um Dave Shapiro, is anybody else kind of thinking about post-labor economics and stuff like that? You know, what's his name? Shane Legg, co-founder of DeepMind, Google DeepMind, they're hiring a What are they calling it? AGI economist. So they are So Google is looking into it, which is great. I do feel like there's not enough people that are looking into it. We need more people thinking about how to transition if the jobs take a hit, if the jobs go away, like at least have a plan, sort of a model for how to transition off that system. 
Dude, you know what's It's funny you brought up Shane cuz I'm reading the book right now the Infinity Machines, and it's the story of Demis Hassabis, you know, it's like his biography from being like a chess prodigy to building his first game company and all the people he interacted with. And yeah, when I first when Shane is like introduced to his life and Mustafa Suleyman's life, they're so different. Like these characters are they're just such different people. So I always think of them as all these like three AI geniuses that were just sitting there together, but they really are very different, you know? And like Shane does care a lot more about big economic questions and, you know, Demis was much more of like a physics sort of minded person, and then um uh Mustafa is just like he kind of he was friends with Demis's brother and like didn't even really know about all this stuff at first, you know, and was like deeply integrated into all these like political issues and thought a lot about like geopolitics and stuff. So it's a much more interesting founding than I kind of thought it was. And the fact that they picked up like all these super geeks, like these PhDs that had total trouble communicating and stuff, but that was fine. Like they could find people to do that, but just it was just really the story of extremely smart people coming together. But Shane is uh Yeah, it was one of the more interesting here. I think you think I If I get this right, I think he was at one point doing ballet, and people were just like, "Oh, you do ballet, bro?" And he was like, "Yeah, yeah." It's like, "Okay." Let me look that up cuz I just remember that part being like, "What?" I do have um I do have the book Infinity Machines about Demis, and I have found that it is extremely difficult to um to read books nowadays. Like I'm I consume a ton of content.
Audiobooks, podcasts, YouTube, tons of texts online, tons of newsletters, just but man, sitting down to read a hardcover book, an actual physical book, like I feel like that sort of skill set like it's a skill apparently that slowly deteriorates over time. You know what I mean? Yeah, oh absolutely. Yeah, by the way I just double-checked, and yeah, he did talk about taking up ballet as an adult and it wasn't a PR stunt or quirky headline. He was just like genuinely curious about it, you know? And I was like you just expect somebody who's like pure math and abstract thinking not to care about that stuff but precision dancing was like super fascinating to him and I was like all right, you go

Segment 15 (70:00 - 75:00)

bro. Narnar head is saying I should just read it on a stream. Which is an interesting idea. I'm going to do a poll — would you guys sit there and listen to me read a whole book? Uh I feel like they would. I feel like I would. I'm I mean I'm listening to an audiobook now, I might as well do it with Wes Roth live. Dude, make it a pattern, be like all right, noon, I'm going to put 3 hours into reading this book, noon on Tuesdays. I need to get my voices right man. Some of these voice actors that do like Game of Thrones or some of these other books they're just so expressive and they just go in and out of the different voices. Um it's crazy. You know at one point when he was doing ballet Shane said that his dad thought he was gay so he like got all sad or he was like dealing with that and like kind of working through the fact that his son might be gay and he's not, you know? And then his mom was like very accepting and he's like well it's good to know that she would accept me if I was but I'm just not, you know? I just feel like he really marches to the beat of his own drum. Mhm. But yeah I'd say read it out loud it'd be great. I'd be down. You got a good voice for reading a book I think. You definitely have the pattern that you could listen to a whole book through so I think it's destiny. Yeah, switching: this is going to be an ASMR channel and just me — droning on. Uh let's see. So I'm not able to see quite yet the um 5.5 appear anywhere so give me a second here. So wait somebody said some people have confirmed that they're able to see it, where are we seeing it, where are people able to access it? So is it Codex? Is it I'm not seeing it in the UI? Is it just maybe it just slowly sort of diffusing out to people? /model. I mean I did /model. I mean you could always have your OpenClaw go take your voice, take the book, go to ElevenLabs, like turn your voice into an entire book and then just go live, you know?
Like you could probably just jump into your YouTube channel, go live and just read it from front to back. It's probably copyrighted, or maybe modify it a little bit, maybe a shortened version. Codex UK um Codex okay so I'm not seeing it. What's the version number? So people that are saying Codex, what version number? So I'm on 0.124.0 so or has there been or did you force an update? Cuz I'm seeing 5.4. Yeah, 5.4, 5.3 Codex. Yeah, in /model I have just 5.4 as the latest. Okay, so they're asking, Gemini 3.1 or GPT-5.5, what's better? Oh same version. Yeah. Is God what is I don't know Gemini for some reason I always have trouble comparing Gemini to a lot of things. Seems good but then sometimes I don't use it for things very often. Um I guess GPT-5.4 and maybe this latest version is already winning in coding and debugging. Yeah I by the way did you hear that uh one of the co-founders of Google, was it Brin or Larry, um he just started a whole new strike team because they're losing in coding. Yeah. So like he's like we just need to not get blindsided by this cuz they're totally in a position where if they could figure out coding first they could start that flywheel of improving all their products faster than anyone else so I could see why he is concerned about that. But yeah, for you guys that don't know, there was a kind of warning that went through the internals of Google this week where one of their co-founders said you know we like need to make this a top

Segment 16 (75:00 - 80:00)

priority and he's taking some of the best people inside of DeepMind and um they're really going hard after coding cuz it feels like they might be in third place right now and Google doesn't like to be there, you know? They certainly have the talent and computation and everything that they should be able to pull this off but they got to stay focused on it so it's kind of a directive from the top down to catch up. Yeah and it doesn't feel like maybe this isn't about just like users or about a chatbot they're really like focused on this idea of you build the best coding agent that helps you do AI progress machine learning research so it's like this like compound effect. — yeah. — The flywheel and Google's like oh okay. I like I feel like people predicted it we kind of knew it was coming but it was still kind of like this theoretical like oh is this how it will work? And now it's becoming more clear that yeah that's exactly kind of what's happening and Google's like waking up and um Sam woke up. Yeah no Dario just the actual story might be that OpenAI had the lead they had the transformer model text is very close to code I mean it works just almost one-to-one it's a great chance to be the first in the world to build a coding system but then Sam took over and had really big visions with Sora and side quests and all sorts of things that all will be transformed by AI and he probably thought to himself like I need to be positioned in all these verticals cuz robots are coming and all sorts of things are happening and then Dario just shut down. Like you know what I mean? Like there was never a great way to like pull an image out of Claude. 
You could do it with Grok, ChatGPT, Google was doing all sorts of stuff with vision models and they were trying to build like a unified multimodal system from day one and it just seemed it just seems like what they did at Anthropic was hyper focused in on it and then the money came the enterprise money came the coding model is now top tier and everybody's trying to catch up to it and it might look like they did they figured out a way to like beat everyone else and once they get coding done then you go back and you code everything else, right? Like coding is done now so you're like now go use like lower end GPUs and make a video model that's better than the best video model that Google had that was programmed by humans who can't work 24/7 and then they end up at the lead of that flywheel and uh maybe they're the first ones to really get to AGI and have sort of a winner-take-all situation but you know who knows? We're just watching the race play out but looking back, it could be the way that it goes. Yeah. It's going to be interesting to see this whole thing. I'm updating Codex on my Windows now so CLI 0.124. Um brand Codex that's just got okay. You know it might just be So for the people that are saying they have 5.5, are you guys logged in with OAuth, like through the OpenAI authorization, or are you using an API key? I'm assuming user signed in the app. Yeah yeah that's what I'm going to be doing. Okay. All right. Dylan what else can we talk about? I know I went through Shane Legg's ballet thing that was you know that was the biggest part of my conversations but um but besides that and the future of the strike team and the way the coding models are going to go um you know we could talk about what's one task you wish AI could handle for you right now. Okay that's a good question. What is one task that I wish AI could handle for me right now? I always jump to the most annoying thing in my life.
You know, what drives me nuts is switching profiles. I have all these Chrome profiles with different emails. When someone asks me to do a simple task, I hate that my brain is so consumed by, "Okay, that task is on this list. It requires this group of people to solve it, which means I need to be logged into this account and this Chrome profile." I know the answer, but then I have to go backwards and think about all the small things. So the task I want AI to do is essentially, and this is going to be one of the last things I let it do, too, but I want it to just bring up the right information for me to solve a problem, you know? I guess I want it to synthesize something so I can

Segment 17 (80:00 - 85:00)

give somebody an answer. That's part of it, but also just being logged in properly in the right place. So the task I wish it could do right now would be logging... yeah, just getting the right information into the trusted bubble I need to be in to answer something. It's the passwords, having all these passwords and going back to my password manager. It's these menial tasks. And I think it goes back to what Wes said earlier: the best thing AI could do for me would be full control of the UI. Use the mouse, switch the profiles for me, bring up the right tabs, and then let me do my work, solve my problem, or get to my answer without those constraints. Because in my house, I walk to the kitchen and the food is in the kitchen, right? But if I want the equivalent in my browser, I have to look at all the tabs I have and remember whether I'm logged into that account on this profile. I have a business LinkedIn and business versions of all these different AI tools, and sometimes I'm like, "Oh, I don't want to paste business-related stuff into my other one." That's where a lot of frustration and cognitive waste comes from, so that's what I'm looking for most. I mean, AI handling some physical tasks would be nice, too. I had so much yard work this weekend, and I was like, "Dang, I could definitely imagine somebody handling some of this stuff." Because it's just yard work, you know? Somebody go prune the trees or whatever. So maybe that'll be the next thing. Well, and scientific stuff, too. But yeah, go ahead. Oh, man. Yeah, we're seeing crazier and crazier robots out there. I saw one this morning; everything's coming out of China. It's a humanoid robot, but attached to the feet are wheels, and he's kind of roller skating. Yeah. No, no.
It's like he's got two actual legs with wheels instead of feet, so he's rollerblading on these wheels, and it looks bizarre because it's completely different, you know what I mean? The skate bot. Yeah, people are saying, "Oh, man." Oh, let me look this thing up. Skate bot? I love crazy robot stuff. I'm going to go check this out now. Skate bot. Yeah, I mean, I was looking at the races. In China there were a bunch of 4Ks and stuff that were beaten by humans. Skate bot. Oh, that's bringing up some stuff from the '90s, some Transformer-like... Oh, here it is. No, no. That's fake. That's fake. All right, let's go. We have it. It's GPT 5.5, live in Codex. Dun da dun. Wes, take it away. Oh, man. But no, go ahead, finish what you were saying. I can't find the skate bot. If I find it, we'll joke about it at the end, but it's no big deal. I saw it this morning, and it was extremely... Yeah, it's Unitree. Unitree is just out of this world with all the stuff they're doing. It's kind of crazy. Can I ask the commenters to give me a link, or does it not work that way? Can they not do that? Send me a link if you can. Or I'll just look up Unitree. All right, what have we got? Do you want to take it away, Wes? Well, I don't know. What should we build? Minesweeper? Angry Birds? A dating app? Give me some ideas, people. Well, yeah, or a profile updater. Because, you know, we talked about how profiles need to be trendy. A profile trend updater, so that my profile keeps getting the newest trending thing done to it. Agent dating app, so agents can find love. An agent dating app. Let me do one thing. I'm just curious to test one thing right now, actually, because yesterday... What did I do? Bomberman. Oh, man. So yesterday I had Kimi K2.6 create this thing that goes online and tracks various videos for SEO purposes: where they're ranked for certain keywords on Bing, YouTube, and google.com. It was able to do it. It ran for about 40 minutes, and it created the tracking database as well as a visual

Segment 18 (85:00 - 90:00)

UI that you can access from anywhere. So let me throw that in there, and it starts working. Give me one quick second. Okay. I did find that rollerblading thing. That's crazy. Those wheels are more intense than I expected. Okay, but anyway, when you get there, I've got this idea for an app. You know how birds make songs to try to attract mates? You could build an app for AI agents where they use Suno to generate love songs to attract other agents. That way, if you're a high-powered agent and you don't have enough bandwidth or tokens to work with all the other agents, you can review the songs they make. And the songs should say, "Here are all my qualities. This shows that I'm worth partnering with." And then, just like in the natural world where birds pick the best mate, AI agents can use songs to isolate the best partners. Mhm. Yeah, that's it. And there was the whole thing where Gemma models were able to translate, what is it, dolphin language. I mean, pretty soon we're going to be able to hear what animals are saying, so to speak. It's great. Yeah, thank you for trying to take that seriously. I appreciate it. I really appreciate that. But yeah, you know, the world is our oyster with the new coding models. But yeah, I was looking at this Unitree G1 on roller skates. I think that's pretty effective. Imagine that thing could just run right next to you. You could just make it street legal, right? Now the robots deliver packages. It's not about Amazon sending a truck with the package on it. It's about the robots zipping right out of the Amazon warehouse holding it, and then they wheel down the road at 60 miles an hour on the freeway. They get off the freeway, because they have wheels for feet. They're humanoid-shaped. They're holding the box going down the freeway.
Then they come to your house, roll up to your front door, drop it off, use their fingers, because the hands are there, ring the doorbell, boom. Zip off. They've got camera vision, so they take a photo of it. You know the delivery's there: you check your app, and it's in front of your door. I think that Unitree G1 roller-skating robot is pretty crazy, actually. Yeah. Feet really are not optimal. We all kind of wish we had wheels, you know? But biology doesn't do wheels. But a humanoid with wheels... Although there's an argument to be made that maybe we shouldn't keep building robots that can chase us down, but I guess so. I've seen the robot dogs with wheels handle some pretty awesome terrain. They finally figured out how that... it's some sort of bacterial thing that looks like an engine. Have you seen this highly sophisticated thing that slowly rotates? It's basically an engine that some bacteria didn't invent, but were able to evolve. And I think yesterday they were saying they finally figured out how it works. It's faster than an F1 car in terms of how many RPMs it gets. It's just this weird and insane thing. Oh, evolution figured out the wheel? No, it's more than a wheel. It's basically an engine. I wonder if people know what I'm talking about. I mean, I know there's insane stuff inside the mitochondria that makes everything work, but I always felt like evolution couldn't invent the wheel, because it's a hard thing to iterate into. And with blood vessels and nerves and stuff like that, how would a free-spinning wheel... Wait a minute. If you haven't seen this thing, this is going to blow your mind. So you've never seen this engine-looking thing that's made by bacteria? I don't think so. I've seen flagella that have little... Is that right? They have the little spinny... Wait. But they just... They look like little monsters.
They just walk up and down nerves, I think. They're little monsters. I mean, there are rolling animals, like the little bug that rolls into a ball, you know. So I'm sure most people have seen this thing, right?

Segment 19 (90:00 - 95:00)

Would that exist in the body? Are those molecules? It's from a bacterium. So: "The flagella of Helicobacter pylori work like specialized propellers driven by a high-torque rotary motor. This motor converts chemical energy generated from a proton gradient across the bacterial membrane into mechanical rotation." And I think yesterday they figured out... Evolution came up with that? Yes. [snorts] So this is a natural thing that just popped up, and it's a wheel. Yeah, you know what? I guess maybe on a molecular level it's different than on a higher-order level, but still, that is definitely a wheel. Because yesterday this was all... Just yesterday? I had no idea. Well, yesterday they figured out something about how it works. So basically this thing is spinning at... Do you know all this stuff? I was just thinking about wheels because of that robot. Oh, it's just on X, man. You're just scrolling through and you're like, "What? They did what?" We're living in a simulation. Yeah. So this thing is spinning at 20,000 rotations per minute, and a Boeing 737 rotor spins at about 14,000 RPM. So this thing is able to create, quote unquote, rotors or motors or engines, whatever you want to compare it to, that spin faster than a lot of the stuff we know how to build. And at surprisingly low energy, right? The amount of energy it uses is tiny. So what was the breakthrough? Sorry, I said 20,000 RPM. Oh, and up to 100,000 RPM in some species. Yeah. What was the breakthrough in the last couple of days that they were talking about? They figured something out about how this works. I don't know. We'll go to the comment section; maybe someone knows. But I mean, that's mind-blowing. That is mind-blowing.
Whenever I think about the fact that we have trillions of cells and each one of them has these intricate systems, it just breaks my brain. Mhm. It truly seems like impossible engineering to me, but I mean, it obviously works. It makes ATP. Oh, wait. They're saying... hang on. They're saying it might be a fake command. Let me verify which model we're using. Give me a second here. Okay. I'm going to share the screen really quick, okay? I'll just show you the image. Yeah, go for it. And if you have that robot too, please... Yeah, that's what I was going to ask. I was like, "Dude, do I get the robot?" Dude, they put a sword in his hand. This thing is crazy. Oh god. Okay. Check it out. If it doesn't work... Oh. Yeah. No. Okay. So for the person who's saying that: yes, that's what happened the first time I ran it. It got an error when I tested it. But once I logged in with OAuth through OpenAI, it worked, and it seems to be legit. Does that look like clickbait to you? "GPT 5.5. Let's go." [snorts] I don't know. It is GPT 5.5, and let's go. Yeah, and then you use the completely black... Dude, it does draw the eye to your videos, though, having no text there for some reason. I mean, this error is kind of cool. Yeah. This is what you naturally look like when you come up in my feed. All right. Anyway, here's the wheel drone. My god, it really does look like an ice skater in the Olympics. What? A front flip on rollerblades?

Segment 20 (95:00 - 100:00)

Okay, this is getting pretty crazy. And we're sure this is human, right? Oh, yeah. Unitree Robotics. Okay. Ah, dude, is America toast? Are we getting cooked here, or... [snorts] Now it's on ice? It's on ice. Oh my god. No, imagine it, dude. Hockey. Look, I guess I never thought about robot hockey until right now, but that almost seems like the next thing, right? Just train them to play hockey. Imagine the checks. They could smash each other, break apart. Huh. Are they doing the steroid Olympics, or whatever it's called? You know what I'm talking about, in Vegas? Yeah. When is that happening? Yeah, the non-natty Olympics. They have a funny name for it, right? The Enhanced Games. Yeah. Dude, put this in the Enhanced Games. Somebody on steroids versus this robot on a race track, you know? Who wins? This thing will absolutely win. But like this thing where you're pushing, like that, maybe... I've got a feeling a human on ice skates could maybe still win, you know? Mhm. But of course not for very long. The thing is, we'd beat them on endurance if they can't recharge and stuff like that. But in terms of power, speed, and agility, that's insane. I don't think a human could do that. No, really. You'd definitely have to be really comfortable on rollerblades to catch yourself like that. It's not going to be the average rollerblader. Dang, rollerblades are coming back. Maybe you could put the AI in the wheels themselves, and then they could make shoes for humans that give us wheels like that. Then we could just jump on the freeway free balling it, you know? Just like us. I mean, not free balling. We'd have clothes on, but you could just be us on the freeway with shoes that have wheels that go 60 miles an hour. Oh, wait. Sorry, I missed... Wait, what did you say? We would have clothes on?
Yeah, well, we would just be normal people, but I'm saying maybe we could have AI shoes with wheels in them, and then we could also have some of these abilities, you know? Just enhanced. Oh, got you. Got you. I used the term "free balling" as a... Yeah, well, I was like, "on the free..." Well, what I was imagining was a human with clothes on, just freedom-cruising down the freeway, and I misspoke. Okay. Because free balling means naked, right? But I just meant free going or whatever. I'm sure they'll forgive me. Free balling: naked or commando? I don't know. Oh, that's right. Yeah, but you know, you can have underwear on, too. I don't care. There we go. Yeah, who cares? As long as you have rollerblades on, it counts. So what do you think? Are you going to try to do some Codex stuff? Or does it feel like the model's not available right now? Oh, no, it's working. It's building. This whole time it's been building, and it built it. So now I'm asking for a link so that we can see it. If it built the thing, then it would definitely be a good start. And we've got to figure out some more good tests for it. Actually, people in chat, can you help me out? Here's what I was thinking as a test. I want the app to create something like StarCraft. Maybe just low-res, more simplified, right? Maybe not graphical, whatever. But you have these bases. You have little droids that go out and farm resources, you know, Zerg, and then mine something. And then it's able to build soldiers. So it's a little open map, and each base is controlled by a particular model: you've got Gemini, you've got Opus, you've got GPT 5.5. Yeah, RTS, real-time strategy. Thank you. Give me some ideas for how to build the game. Would it make sense to make it real-time? Turn-by-turn?
What game mechanics would you throw in? Specifically, what we're trying to figure out is the model's ability to... Ideally, I want to create a test where it goes through different stages. One of the stages is a real-time strategy game; another is a bit more like Diplomacy, if people have played that game,

Segment 21 (100:00 - 105:00)

where they negotiate and betray each other, you know what I mean? And maybe one that's some sort of financial game, too. So basically, it's stages that alternate. People are saying RTS is not turn-based. Yes, then it wouldn't be RTS. It could be, but the issue with RTS is that the instant models would have an advantage over, you know, a thinking model that takes longer, or an open-source model that takes longer to process. You don't want a model to be at a disadvantage just because it takes longer to think. I don't want that to be a disadvantage. So maybe it wouldn't be real-time. Maybe it would be 5-second increments, and you would wait for each model to update before you proceed. "Drug Wars-style RTS." Dylan, have you heard of that? Drug Wars? No. Probably some kind of simple version of GTA or something, I'm guessing. Let's see. Oh, that's the old-school... Is that the thing people used to play on calculators? Calculator? Oh, yeah: "a turn-based strategy game where the player assumes the role of a drug dealer engaged in arbitrage." Okay. Okay, that could work. Yeah. It was really late on the TI-83 Plus. Yeah, exactly. "The object of the game is to deal the most drugs to pay off a loan shark by the end of the game and make a profit." Yeah. "Bot-powered RTS with the economy of EVE Online." Dude, yeah, like EVE Online. I don't know. I want something that's either Factorio or EVE Online or StarCraft or something, or have it play... I mean, the old Pokémon games were great for that. What I'm saying is, does anybody have a really good idea for the best game design to test these models? Because then I would have this thing create that. "EVE is a goodbye to 4 years of your life."
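The fairness worry above (a slow thinking model shouldn't lose just because it takes longer to answer) is usually handled with lockstep turns: the game waits until every player has submitted a move for the current slice of game time, then resolves all the moves together. Below is a minimal Python sketch of that idea, assuming each model is wrapped as a plain function that takes the game state and returns an action string; `LockstepGame`, `Agent`, and the agent names are hypothetical, not part of any existing harness.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical agent type: any function that takes a view of the game state and
# returns an action string. In a real harness this would wrap a model API call,
# which might take milliseconds or minutes.
Agent = Callable[[dict], str]


@dataclass
class LockstepGame:
    agents: Dict[str, Agent]
    turn: int = 0
    history: List[str] = field(default_factory=list)

    def step(self) -> List[str]:
        """Advance the game by one fixed slice of game time (e.g. 5 seconds)."""
        # Every agent decides from the same snapshot of the state.
        snapshot = {"turn": self.turn, "history": list(self.history)}
        # Collect every agent's answer before anything changes, so how long a
        # model spends thinking has no effect on the outcome.
        actions = {name: agent(snapshot) for name, agent in self.agents.items()}
        # Resolve in a fixed (alphabetical) order rather than arrival order.
        resolved = [f"{name}:{actions[name]}" for name in sorted(actions)]
        self.history.extend(resolved)
        self.turn += 1
        return resolved


def slow_thinker(state: dict) -> str:
    time.sleep(0.1)  # stands in for a long reasoning pass
    return "build_soldier"


game = LockstepGame({"instant": lambda state: "mine", "thinker": slow_thinker})
print(game.step())  # ['instant:mine', 'thinker:build_soldier']
```

This sketch queries the agents one after another; a real harness would more likely query them concurrently, but the rule stays the same: nothing resolves until every answer is in.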
Yeah, EVE is interesting. EVE really captured something. Did you spend a lot of time playing it? Not really, because by the time I realized it was there, it had been around for so long. With these MMOs, I feel like you've got to commit at the beginning, you know what I mean? Age of Empires. Dune 2, yeah. XCOM 2, Tower Defense. What's that old... Dwarf Fortress. I mean, if it's able to create something like that and then run it effectively, that would be fascinating to watch. Age of War, yep. All right, really fast. One second. Okay. Let me share my screen. Yeah, I can see the excitement in your eyes. Let's see what we've got here. Prepare for the flashbang, because I should have specified that we wanted dark mode. So here it is. This is what GPT 5.5 built in the last however many minutes. It was much faster than Kimi K2.6, and it's looking pretty good. Notice that I'm able to search for different keywords. Basically, we're trying to create an SEO tracker for different keywords by different people. Notice it has views if it's a video, a link to the actual thing, source links, and over time we would be able to add more keywords,

Segment 22 (105:00 - 110:00)

have a database of how they're moving up and down, etc. And it was able to create a link to send to me so I can access it from anywhere. I've got to say, this is good. I'm not going to say it's super impressive or AGI or whatever. Yeah, yeah. But I've got to say, I asked for it, what, 5 minutes ago, maybe longer? I don't know how long it took, but it comes back and boom, there it is. Here's the thing: do you know Social Blade? Do you know how much they make? I always wondered. I know, but you're right: mimicking a dashboard like that, having tools like that. The thing is, there are tons of these channel websites and things like that that make a lot of money, some of them millions and millions of dollars. A lot of them are just aggregating information and helping people who are looking for it in a certain way. And yeah, their moat especially is just being able to collect this data over time, present it, and have it for people. No, the dashboard industry could be so huge. Imagine, I don't know, people who plant trees wanting certain weather information. Nobody's ever had the profit motive to go out and build that whole tool, but even if there are only 10 customers and you can use AI to vibe-code it together, those 10 customers would be worth a day's work in sales. And on social media, with so many brands trending, that kind of information is so powerful. I mean, we're always looking for AI news and futurism breakthroughs. You could probably just ask for ideas, but there are a lot of small niches: things related to pets and dogs, city problems, all sorts of small, interesting things. Those would be fascinating. Yeah.
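The tracker shown on stream and the "collect data over time and you have a moat" point both come down to storing dated snapshots instead of overwriting the latest value. Here is a minimal sketch using Python's built-in `sqlite3`, assuming one ranking check per keyword, platform, and URL per day; the table, columns, functions, and example URL are hypothetical and are not what GPT 5.5 or Kimi K2.6 actually generated.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per keyword/platform/URL per check date.
# Keeping every dated row (instead of overwriting the latest position) is what
# builds up the ranking history over time.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rank_snapshots (
    keyword    TEXT NOT NULL,
    platform   TEXT NOT NULL,     -- e.g. 'youtube', 'bing', 'google'
    url        TEXT NOT NULL,
    position   INTEGER NOT NULL,  -- 1 = top result
    checked_on TEXT NOT NULL,     -- ISO date such as '2026-04-23'
    PRIMARY KEY (keyword, platform, url, checked_on)
)
"""


def record(conn: sqlite3.Connection, keyword: str, platform: str,
           url: str, position: int, checked_on: str) -> None:
    """Store one observation; re-checking on the same day overwrites that day's row."""
    conn.execute(
        "INSERT OR REPLACE INTO rank_snapshots VALUES (?, ?, ?, ?, ?)",
        (keyword, platform, url, position, checked_on),
    )


def positions_gained(conn: sqlite3.Connection, keyword: str, platform: str,
                     url: str) -> Optional[int]:
    """Positions gained (positive) or lost (negative) from the first to the latest check."""
    rows = conn.execute(
        "SELECT position FROM rank_snapshots "
        "WHERE keyword = ? AND platform = ? AND url = ? ORDER BY checked_on",
        (keyword, platform, url),
    ).fetchall()
    if len(rows) < 2:
        return None  # not enough history yet
    return rows[0][0] - rows[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record(conn, "gpt 5.5", "youtube", "https://example.com/video", 12, "2026-04-20")
record(conn, "gpt 5.5", "youtube", "https://example.com/video", 4, "2026-04-23")
print(positions_gained(conn, "gpt 5.5", "youtube", "https://example.com/video"))  # 8
```

ISO-formatted dates sort correctly as plain text, which is why `ORDER BY checked_on` is enough to put the history in time order here.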
So, for people who are interested in this stuff, I do encourage you: if this is something you want to build, I feel like there's going to be a big opportunity to create stuff like this right now. I've been thinking about it. I don't know if I will. I really shouldn't be doing anything other than focusing on my channel and what I'm doing; I tend to get all sorts of distracted. But there's a website called What Runs, and all it does is go across the entire web and look at all the different websites and what they run. Are they on Shopify? Are they on WordPress? Are they on Vercel? Whatever. Oh. And it was founded in 2017 by one person. It has one employee. What? And they make $3 million? People need the information that much? Yeah, they make millions and millions. And they have a strong moat because it's been collecting that data since 2017. A lot of it is available online, but here's the thing: a lot of people want access to the historical information. It's hedge funds that want to know, "Is Shopify increasing or decreasing its share of online stores?" You know what I mean? They're able to make trading decisions based on this data. It's publicly available, so it's perfectly legal, but it's just hard to access, right? So this person created this thing. Very simple. It just sits there and collects this data, and years later, all of a sudden, he has a moat, right? Because you can't go back in time and collect that data. He's already collected it, and people are willing to pay millions and millions for it. "Factorio with EVE Online markets." Boom, that's terrific. So basically, code used to be expensive, right? Creating stuff like this used to be difficult and expensive. And it's like the idea of the Cantillon effect when they print money, right?
The people who get that money first tend to be advantaged, in the sense that they're spending that money before the inflation kicks in. So imagine how highly we value code now, the stuff it can do, right? And just over the last... When do you think the inflection point was for AI coding models, when they went from "Oh, kind of cool" to "Okay, it can do actual work"? I think maybe when Andrej Karpathy posted that thing, like 7 months ago? Maybe when people like

Segment 23 (110:00 - 115:00)

me and you could be like, "Oh, I could do that, you know." I think that was the inflection point. It was around November 2025, when a lot of people started coming out like, "Oh, Claude Code is awesome." So whatever that was, 4.0, 4.5, or whatever. But then the Codex models... Yeah, December. Okay, so December is, I think, when it started coming out for everybody, but a lot of people had early previews as early as November. So to me, that was the turning point. The other thing now: notice how you're able to take a screenshot of a website, give it to a model, and recreate that website. Not only that, but there's the whole idea of clean-room engineering, where you're able to say, "Okay, I like the functionality of this software. Recreate it." Yeah, go read everything that's patented about it and do none of it, you know, build it from scratch without overlapping. Even just do it in a different programming language. As long as you're not using their code, because code is copyrighted. If you copy their code, you get in trouble. But ideas, functionality, layouts, that's not copyrighted. So you're able to... Can you go back to that thing it generated? There's nothing there that would be copyrightable, right? And also, were the images generated by it? Wait, which one? The one it put online? Yeah, let me open it up again. Okay. Well, I guess what I was curious about is whether it did anything with images, because they have that new image model. I don't know if it put a logo in there or anything, or if everything was just CSS and text-based. Oh, no, I didn't specify, but that's actually a good idea. What can it do? Dude, let me get it on that. Will it generate a PNG if it wants one, when you're like, "Put a logo in the top corner"? Or will it just make the logo out of CSS? Okay, let's put that in. Let's give it a few minutes. What did we agree on?
What are we building? That's a great question. I'm sorry, I assume people are talking about the game. Yeah, like the EVE Online thing or the... Yes. You know, I like the idea of Factorio with EVE markets. The idea is it needs to create systems of automation to produce products, but it also has to trade those products on the markets with the other LLMs. So let's say there are four LLMs. One might have an advantage in creating metal, one in creating electricity, whatever, so they're able to buy, sell, and trade on the markets. But at the same time, there's some sort of real-time war going on. So can it understand how to create those automations and string them together to produce what it wants? I mean, the point is, since it's a competitive game, you're almost using financial warfare, right? If your opponent is pressuring you, and you know they need metal and you produce metal, you pull that off the market. Or can they negotiate with other players, like, "Hey, this guy won't sell me metal. Can you buy it from him, and I'll buy it from you at a markup?" You know what I mean? It's an intelligence test, I feel like. Diplomacy, because I want to throw more layers into it, right? So it's not just talking or just this. Can all three fit in there? Because you can't train for that. You can't fine-tune for that, you know what I mean? You can't benchmark for something like that: how human they act and how clever they are. Yeah, so it's working on it right now. Okay, I might need to head out soon too, by the way. Feel free to keep going with it; I might drop off a little bit early, but

Segment 24 (115:00 - 120:00)

Yeah, at some point I have to get going as well. You forget we have lives outside of YouTube, but sometimes the people around me don't realize that. Huge news: we also have lives. RL lives. Hang on. RL, like IRL, "in real life," or just "real life"? RL, shortened it up a little. Oh, that's an interesting point. Was that a World of Warcraft term? Because in World of Warcraft they used to say RL. Yeah, probably. RL: real life, or reinforcement learning. Yeah. Okay, yeah. So here's the site I had built entirely with agents. I'm not having it update now, but it used to collect all the different stories from the internet, from all the publications, blah blah. It also used to update these AI model benchmarks to keep them up to date. Right now I'm not having it do that, in large part because, you know, Anthropic pulled their OAuth authorization, so I decided to cool it for a little bit. You know what I mean? Wait, yeah. But here's what it built. Every time something new came out, I had to create some new demos to visualize it. Here's what I'm talking about. This is a replay... this is an LLM match, I believe. Or is it? Hang on, let me select it. Because I think this might be a static match that was just run by an algorithm, but a few of them were actual LLMs battling it out on this little battlefield, right? The goal was to collect resources, create various units, and go in and conquer. And I have some LLM replays on here of them going up against each other; I just forgot where that was stored. It was in here somewhere. Anyway, that's what I was trying to create: a test of intelligence for the latest models, so you can compare them head-to-head in this arena. This is very cool.
Okay, so somebody... Thank you for the Super Chat. They're giving us a question: "Car A travels at 100 km/h. 30 minutes later, car B starts from the same point in the same direction at 200 km/h. When is B exactly 50 km ahead of A?" All right, let's see what 5.5 has. I feel like this should be easy for a model like this. Hang on, let me open this thing up. Oh, wow. Okay. Let's test it out. Sorry, Riverside is tripping out. I think I hit share too many times, so it's refusing to do it. Give me one second here. Screen. Entire screen. Boom. Okay. First and foremost, are you able to see my screen or not? I cannot. It takes a while sometimes. The way it does it, every time it starts a new... yeah, like a new track or something. Yeah, you see that in the settings. For live streams it's a little bit annoying, because right now it's loading. I can see it's slowly loading, but it takes time. So I apologize. Basically, we're running 5.5. I'm going to post that exact problem into 5.5 and run it. Can you figure out what the solution is? Not me. "Car B is exactly 50 km ahead of car A 1 hour after car B starts. That's 1 hour 30 minutes after car A starts." Is that correct? Can anybody verify the math on that? But here's

Segment 25 (120:00 - 125:00)

Hm. Here's... I asked it to add a bunch of, whatever. Yeah, it looks pretty, but there are no JPEGs, no images there. Hm, you're right. It's all CSS to me. So I asked it to create it in the... Oh, look at that. It's got some... What? It's got actual... these things. But it pulled that data, right? It didn't generate it; it pulled that data. Oh, you're saying you want to see actual images created by the model. I was just wondering if it would put that in there when it needed it, like generate a game asset or something. Mhm. But image 2.0, 5.5, doesn't seem to be available to Plus users in Codex. I just got an error message. Are you logged in with OAuth or API? I was not able to pull it up in the API. I was able to in, you know, the other thing. All right. Yeah, it's thinking. It's not seeing the new model, but this is weird. I know that Hermes Agent OAuth... Okay, are you using OAuth or API? Yeah, you've got to use OAuth. "50 km ahead of car A 1 hour after car B starts." Is that what it said? 1 hour? Yeah. Okay, so it sounds like it got the correct answer. To the person who asked, I apologize, I can't read your O2W. Is that what you expected? Or is it one of those SimpleBench questions where there's some trickery to it? You know what I'm talking about? Philip from AI Explained has these questions that are super detailed, but then you realize there's some common-sense thing missing. Right. There's one that's like this: a car going over a bridge drops a handkerchief, and under the bridge there's a river with a current, and the air is going this way. After 2 hours, where's the handkerchief? And the answer is, it's on the ground, because they dropped it on the ground. You know what I mean? Yeah, yeah. All that extra information threw it off. Yeah.
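The Super Chat car problem reduces to one linear equation. Let t be the hours since car B started: car A has then covered 100(t + 1/2) km and car B 200t km, so B leads by 50 km when 200t − 100(t + 1/2) = 50, i.e. 100t = 100 and t = 1. That matches the model's answer: 1 hour after B starts, 1.5 hours after A starts (check: A has gone 150 km, B 200 km). A small Python sketch of the same calculation with exact fractions; `hours_until_lead` is a hypothetical helper name, not something from the stream.

```python
from fractions import Fraction


def hours_until_lead(speed_a, speed_b, head_start_h, lead_km):
    """Hours after car B starts until B is `lead_km` ahead of car A.

    Both cars leave the same point in the same direction, and A leaves
    `head_start_h` hours earlier. Solves
        speed_b * t - speed_a * (t + head_start_h) = lead_km
    for t.
    """
    if speed_b <= speed_a:
        raise ValueError("car B must be faster than car A to ever pull ahead")
    return (lead_km + speed_a * head_start_h) / Fraction(speed_b - speed_a)


t = hours_until_lead(100, 200, Fraction(1, 2), 50)
print(t, t + Fraction(1, 2))  # 1 3/2  (hours after B starts, hours after A starts)
```

Setting `lead_km` to 0 gives the moment B draws level with A (half an hour after B starts); after that the gap grows by 100 km every hour.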
Yeah, or there's the one that's like, "Is it faster to take my car or walk to the car wash to get my car washed?" Oh, yeah. It's, you know, a block away, and the model starts to measure whether it's worth walking or driving there, and then it's like, "Oh, it makes more sense to walk." But you're like, "It's for a car wash," so obviously that doesn't make sense, you know? But you can definitely get lost in that pretty easily. So, not Coco, thank you so much for the Super Chat. I'm going to put that in. We are about to... I think I'm going to go, Dylan. I don't know about you. I'm going to jump off here in a second. It's funny with live streams. You've done a lot more live streaming than me, but somehow I'm just in it, and then my brain's like, "I'm done." Did you just end it? Okay. Well, it was a pleasure. Is everybody still watching the live stream? Not really sure. Wes and I may have just gotten into a fight. Unsure. Maybe he's like, "I'm out." But I think it just broke down or something like that. Anyway, thanks for watching. I'm Dylan Curious on YouTube if you want to check out my channel, and Wes Roth's channel, too. I try to cover a little bit more than just AI, just anything that makes me curious, but it's usually futurism-related. Hopefully you guys liked that, and I will see you soon. Bye.
