Learn How to Use AI With me - https://www.skool.com/postagiprepardness
🐤 Follow Me on Twitter https://twitter.com/TheAiGrid
🌐 Checkout My website - https://theaigrid.com/
00:00 – Sim-to-Real Breakthrough
03:10 – Autonomous Home Cleaner
06:50 – China's Robot Surge
09:20 – AI vs Virus Experts
11:40 – Virtual Employees Soon?
14:05 – Meta Talent Shift?
16:30 – Sherlock Benchmarking
18:10 – Grok 3.5 Incoming
19:30 – AGI? Not Yet
21:45 – AI Buys For You
24:20 – Explosive Code Growth
27:10 – Half-Human Dev Teams
29:35 – Apple & Anthropic Collab
31:25 – OpenAI’s Open Model Strategy
Links From Today's Video:
https://x.com/TomValentinoo/status/1918391629552996357
https://x.com/cixliv/status/1918544040561115177
https://x.com/M1Astra/status/1912201785181155761
https://x.com/vitrupo/status/1917998798397423735
https://x.com/OpenAI/status/1916947243044856255
https://x.com/slow_developer/status/1918334668669079640/video/1
https://x.com/slow_developer/status/1917570427020533987/video/1
https://x.com/slow_developer/status/1917639663935971392/video/1
https://x.com/chelseabfinn/status/1914727362996129924
https://www.reddit.com/r/singularity/comments/1kegemk/noam_brown_openai_recently_made_this_plot_on_ai/
Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.
Was there anything I missed?
(For Business Enquiries) contact@theaigrid.com
Music Used
LEMMiNO - Cipher
https://www.youtube.com/watch?v=b0q5PR1xpA0
CC BY-SA 4.0
LEMMiNO - Encounters
https://www.youtube.com/watch?v=xdwWCl_5x2s
#LLM #LargeLanguageModel #ChatGPT
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
Chapters (14 segments)
Sim-to-Real Breakthrough
So, one of the first things I'm going to talk about is, of course, Boston Dynamics. They partnered with Nvidia to demonstrate and deploy Nvidia's DextrAH-RGB workflow, using the upper torso of their Atlas robot, equipped with three-fingered grippers, to showcase what DextrAH-RGB can do. The robot was trained entirely in simulation using Nvidia's Isaac Lab and then successfully transferred to the real world without any extra fine-tuning. That's zero-shot sim-to-real performance: trained purely in simulation, then deployed and performing well. As you can see here, the robot can grasp both lightweight and heavy industrial objects, and it shows retry behaviors, so if it drops an object it's able to try again. Boston Dynamics provided the real-world robotic hardware platform to prove that Nvidia's AI workflow can go beyond simulation and do real tasks. Overall, this isn't exactly a monumental milestone, but it goes to show what this technology will be able to do, and we really have to think about the implications. Humanoid robots are consistently getting better and better, and when we have general robots that can handle things they haven't seen before, trained in simulation for the equivalent of millions and millions of hours and immediately improving, like we've seen with the Unitree robots, I think people are going to start to appreciate just how capable robots are going to get. Right now, robots are still in that early-internet, first-iPhone phase: the technology is there, but it isn't really impacting you until the later stages. I think Boston Dynamics are going to show us some crazy things within the next 10 years.
Now, something I really want to talk about, because I genuinely cannot believe this AI/robotics story wasn't picked up more widely, is what this next company managed to achieve. I'm going to play the video for you all, because it's absolutely incredible: they demonstrated a robot that can, zero-shot, clean a house it has never seen before. If you aren't familiar with this company, it's Physical Intelligence, and they've built a foundation model for robotics called π0, which is absolutely insane. As they put it here: humans can operate in new environments, but robots can only operate in environments they've been trained in, and π0.5 is their first step towards changing that. They tested it in Airbnbs across San Francisco that the robots had never seen before, and the robots were able to clean effectively in those places. You can see here she says, "Can you please place the dishes in the sink, too?", and the homes are entirely new to the robots. I remember making a video about this ages ago, saying that this is the moment, the ChatGPT moment for robotics, and the video only got around 9 or 10K views. It wasn't really received that well, because I don't think there
Autonomous Home Cleaner
was a flashy demo. But this is completely autonomous: a robot going into a house and doing tasks it hasn't done in that environment before. Now, you might be thinking, okay, why is that such a big deal? Who cares? Isn't that what robots usually do? No. Robots don't generalize well to new environments. In fact, I remember some famous roboticists kept stating that one benchmark that would prove robots have truly leveled up is whether one can go into an unfamiliar home and make a coffee from scratch; that's often called the "coffee test" and is usually attributed to Steve Wozniak. I'm not sure this robot could pass that just yet, but going into a random home it hasn't seen before, non-teleoperated, and being able to clean that environment is absolutely outstanding. I know it seems super basic to people who aren't following the field, but trust me, guys, this is a major milestone. Now remember, this company has only existed for around 18 months, I think. They've been building for quite some time, and they're backed by some of the biggest players out there; they've raised hundreds of millions of dollars in funding, and I remember that companies like Amazon and OpenAI have invested in them. So we can actually expect this company to be a major player. The reason I talk about this company as well is that they aren't building the hardware side; that's not their main game. Their main game is building the brain for these robots, so that when companies like Tesla and Figure and all these others need to deploy their robots in new environments, this is the company they go to. So honestly, guys, I am super surprised.
This is something I probably should have done a dedicated video on, but it honestly slipped my mind, which is why I'm including it in this video. And it goes to show that while AI moves ahead at an alarming pace, robotics is not that far behind if we're now generalizing to unseen environments. So π0.5 is absolutely crazy, and it's quite likely we'll get even better robots in the future. As always, I'll leave a link to this one, but it literally blew my mind as someone in the AI field. Now, staying with AI and robotics, we can look at XPeng's Iron. I wanted you guys to see the robot's old gait. This is the robot walking several months ago; this is how it used to walk. Some people hilariously called it the Joe Biden walk, or said the robot looks like it urgently needs the bathroom, which is amusing because it's not far off. The initial versions of this robot aren't that smooth, but being able to build a robot that walks like this at all is still an achievement. And then, on the new version, you can see just how much they've managed to improve that gait in such a small amount of time. Often the hardware is not really the issue; the physical robot's body isn't what's driving the problems. It's the software: the training, the reinforcement learning, getting the robot to really understand how to move itself. Over time it's quite likely this robot will become more capable. Like I've said before, we've seen many different robots start out super basic, barely able to do anything, and then become super capable as reinforcement learning takes place and as they're trained extensively in simulation.
So I'll definitely be watching what XPeng are doing, because this is an interesting company: they have a bunch of electric vehicles and autonomous vehicles, so they are basically the Chinese version of Tesla,
China's Robot Surge
and I know that China does not mess around when it comes to production. So I'll certainly be paying attention to what they do, because China is always trying to out-compete the US, which means that now that they're really in the AI race, it wouldn't be surprising if they manage to surpass us in some of these robotics tasks. Although I think the US definitely has a steady lead, I wouldn't be surprised if China starts making some incredible progress. And we've already seen things like EngineAI. There was this video going viral on the internet, with some people claiming it shows the world's first robot attack, and it's easy to understand why. Now, I know I'm talking quite a bit about robotics for now, but essentially what happened is that this robot started to glitch out and move in a crazy way, and you can see the engineers surrounding it quickly trying to rein it in, seemingly quite confused about what had just happened. This went viral on Chinese social media, and also on Twitter. Most people were like, "Okay, is the robot uprising here? What on earth is going on?" But as always, there are pieces of information that don't make the headlines and that clear things up quite a bit. If we take a look at this, it's the same robot in simulation, and interestingly, the simulated robot shows the same kind of movements we just saw. So it's clear to me that they probably made a mistake in simulation, and this was simply the flawed policy the robot ended up adopting.
And then that policy got deployed onto the real robot, and the robot just started executing it. So I don't think the robot was angry and trying to chop the heads off its AI engineers; it isn't driven by an LLM or any kind of AI brain like that just yet. But you can see that what happened in simulation is exactly what we saw in the real video. So once again, it isn't going to kill us just yet, but nonetheless I wanted to show you guys what it looks like in simulation. Now, if we are going to talk about AI actually killing us: Dan Hendrycks did a study on OpenAI's o3. He's showing that OpenAI's o3 models are getting really good at working with viruses. So good, in fact, that they can now outperform 94% of real expert virologists on a special test called the Virology Capabilities Test. So
AI vs Virus Experts
this test checks whether AI can understand complex experiments like flu-virus lab work, troubleshoot problems when a virology experiment goes wrong, and solve tricky science questions, just like real virologists. The chart basically shows that AI has gotten way better over time, and OpenAI's o3 is now far above previous AIs at understanding and solving virus-related problems. And Dan is raising a real concern here: if AI can already troubleshoot virus experiments this well, it might also be able to help create dangerous bioweapons someday, meaning AI is moving into very serious areas that could become risky if not handled correctly. The thing we really have to think about is whether this gets worse over time. One of the things I saw recently is that if you want to use OpenAI's image-generation API, you now have to complete organization verification; I'm guessing that's to deter generating harmful material, though I don't have confirmed information on that. The point I'm trying to make is that if AI reaches the point where its capabilities could enable real danger, OpenAI may have to restrict certain models, because the capabilities would be far too intense for the average user. It's like putting a shotgun in the hands of a child: you just wouldn't do it; it would be quite irresponsible. So the thing to take account of here is that this is the worst these models are ever going to be. They're going to get smarter over time, reason more effectively, and in doing so become better at a wide range of tasks. Remember that these AI systems are general in nature.
So they're not being specifically trained to do this, but as a byproduct of becoming so much smarter, they will be able to. It's going to be really interesting to see how OpenAI manages to handle that without crippling the model. Now, this is where Anthropic is warning that fully AI employees are a year away. Anthropic expects AI-powered virtual employees to begin roaming corporate networks in the next year, the company's top security leader told
Virtual Employees Soon?
Axios in an interview this week. This matters because managing those AI identities will require companies to reassess their cybersecurity strategies or risk exposing their networks to major security breaches. The big picture is that virtual employees could be the next AI innovation. AI agents typically focus on a specific task, but virtual employees would take that automation a step further: those AI identities would have their own memories, their own roles in the company, and even their own corporate accounts and passwords. They would have a level of autonomy that far exceeds what agents have today. "And in that world, there are so many problems that we haven't solved yet from a security perspective that we need to solve," the representative said. Between the lines, the problems include how to secure the AI employee's user accounts, which networks it should be given access to, and who is responsible for managing its actions if something goes wrong. These are all the things they're talking about. Anthropic believes it has two responsibilities in navigating AI-related security challenges: first, to thoroughly test Claude models to ensure they can withstand cyberattacks, and second, to monitor safety issues and mitigate the ways malicious actors can abuse Claude. So before these AI employees ever debut, we have to make sure they are extremely safe, because the level of autonomy these models will have is like nothing we've seen before. They'll arguably have the same level of access as you do, which means that if one of these AI models goes rogue, that could be pretty bad for the company.
Overall, I do think it's interesting to talk about AI employees being only a year away when we haven't really solved the hallucination problem for AI agents. They might have to solve that one first, because when AI agents hallucinate, the ramifications aren't exactly the best. Now, there was also this tweet that was rather interesting. Peyman Milanfar is a distinguished scientist at Google, where he leads the computational imaging team, and he's recognized for pioneering work in computational imaging, image processing, and the application of AI to imaging technologies. It's pretty striking, because he tweeted something that I won't call controversial, but it gave me insight into what might be going on behind the scenes in AI. He posted this without context: "I've never received so many résumés from Meta."
Meta Talent Shift?
Now, let me give you some context that might explain why he said that. Recently there was a video I covered in which Yann LeCun basically said, "I'm no longer interested in LLMs." I do think he has good reasons for that, but a lot of people seem to assume that because he's no longer interested in LLMs, Meta's AI effort was a failure. I do have to say that Meta's Llama 4 underperformed relative to what they were expecting to achieve, especially on their timeline: Meta delivered Llama 4 quite late, and by then it was also quite a bit behind DeepSeek and Qwen 3. And of course, there were some very shady tactics around the benchmarks, which led people to believe that Llama 4 was something they just didn't succeed at, and there were even individuals trying hard to distance themselves from it. So right now things aren't looking the best for Meta. Generative AI is very competitive across the entire industry, but I think over time Meta will manage to find their footing, because they have billions of users, a lot of distribution, and plenty still going for them. Now, I also want to showcase an interesting benchmark someone tweeted at me that I hadn't seen anywhere before: Sherlock Bench, a benchmarking system designed to test an LLM's ability to proactively investigate a problem. Sherlock Bench is interesting because it is resistant to memorization: it doesn't use a Q&A format. Essentially, it tests whether the model can practice the scientific method, i.e. hypothesize, experiment, and analyze.
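To make that "hypothesize, experiment, analyze" loop concrete, here is a minimal sketch of the kind of tool-calling setup such a benchmark relies on. All names here (`run_experiment`, `probe_rule`, the doubling rule itself) are hypothetical illustrations, not the actual Sherlock Bench API:

```python
import json

# Hypothetical "experiment" tool the model can call via function calling.
# The hidden rule (output = 2*x + 1) is what the model must discover.
def run_experiment(x: int) -> dict:
    return {"input": x, "output": 2 * x + 1}

# Score a hypothesized rule by checking it against fresh experiments.
def probe_rule(guess, trials) -> bool:
    return all(guess(t) == run_experiment(t)["output"] for t in trials)

# A model under test would loop: call run_experiment() as a tool,
# analyze the structured JSON result, and refine its hypothesis.
observations = [run_experiment(x) for x in (0, 1, 5)]
print(json.dumps(observations))

# Suppose the model hypothesizes the rule from those observations:
hypothesis = lambda x: 2 * x + 1
print(probe_rule(hypothesis, trials=[2, 3, 10]))  # True if the rule holds
```

The point is that a memorized answer can't help here: the model only scores well if it actually runs experiments and reasons about the structured results, which is why function calling and structured outputs are prerequisites.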
A model has to use function calling and structured outputs to perform well at this benchmark, which means it gives an indication of whether the model is actually useful for professional or business workloads. The benchmark system is open source, and it is easy to make new problem sets for it. Now, when we get into the actual results, there are a few interesting things to see. Right now, it looks like o4-mini from OpenAI is the current state of the art on this benchmark. Of course, this is an interesting one because it's a new kind of benchmark, and
Sherlock Benchmarking
it's based on a different kind of reasoning. But interestingly, we can see that o4-mini is levels ahead when it comes to this kind of reasoning. Now, if you're wondering where the two models most people love are, DeepSeek R1 and Gemini 2.5 Pro: DeepSeek R1 does not support function calling, so they couldn't benchmark it, and Gemini 2.5 Pro was unfortunately giving errors, which is why it isn't here. But overall, it does show us that these models are getting smarter across various benchmarks. Now, this is where we start to get into the speculative section of the video. Microsoft seems to be cooking up a new batch of large language models around coding. Something called NextCoder has been spotted on Hugging Face, described as a family of code-editing LLMs developed with selective knowledge transfer, along with its training data. So it could be an open-source model for coding. There are no public files yet, but frequent updates hint at a code-focused LLM, possibly tied to the Phi series or Copilot. That's not much information, and I wish I had more for you, but if it's in the Phi series it could be a lightweight coding model that performs really well on certain coding tasks. Of course, that's not the only company working on its next model. We also have Grok 3.5. Elon Musk states that next week Grok 3.5 will get an early beta release, to SuperGrok subscribers only, and it's
Grok 3.5 Incoming
going to be, in his words, the first AI that can reason accurately about technical questions on rocket engines or electrochemistry, with Grok reasoning from first principles and coming up with answers that simply do not exist on the internet. Interestingly, I've been browsing around to see whether anyone has access yet. Some people claim they do; they're posting screenshots saying, "Look, I have Grok 3.5, I can use this." I'm a little skeptical, because I don't know anyone who has actually been able to use the model yet, but they're stating that these models are completely different and just so good. I do wonder if it's an o3-type model that's just super effective. Overall, it will be interesting to see what Grok 3.5 brings, because the last update, Grok 3, was really top-notch, and it was surprising how capable the model was. Now, next we have the computer scientist Stuart Russell making some major predictions for the future: one, scaling up LLMs won't lead to AGI; two, AI labs already realize this and are exploring new methods; and three, governments won't act on AI safety until a major incident, basically stating that the best-case scenario is some kind of Chernobyl-like event, where an AI does something so drastic that it forces new rules to be made. Now, if you don't know, Chernobyl was basically a nuclear
AGI? Not Yet
reactor that exploded, leaking nuclear radiation into nearby areas and causing deaths, cancer, and a whole load of issues for anyone in the surrounding area. And the thing is, this is pretty true: people aren't going to realize how bad things can get until they actually get bad. I know that sounds super simplistic, but the human mind works correctively; it waits for things to get really bad and then implements a correction, rather than safeguarding beforehand. So it's going to be really interesting to see how that plays out. Here's Russell: "There are maybe half a dozen futures that could unfold. This is my current best guess about the most likely path that the world is going to take. The first is that further scaling of these large language models, such as ChatGPT, is not going to lead to AGI. I think the big AI companies already understand this, and they are working on alternative and complementary approaches. They claim to be making a lot of progress, and I think it's likely that within a decade we will see those transformative advances, where AI systems start to exceed human capabilities in very important ways; not necessarily in all ways, but in ways that are enough to create massive transformation in our world and to pose significant risks to us. I also believe that governments are not going to legislate and enforce regulations on the safety of AI systems. So they are going to allow the companies to do what the companies themselves claim will lead to at least a 25% chance of human extinction. So in the best case, we will have a Chernobyl-like disaster, and then the governments will wake up and do something. That's the best case. The worst case, obviously, is that the disaster is something that's irreversible, and we lose control."
Now, in an update to ChatGPT, there is also a shopping feature: OpenAI is experimenting with making shopping simpler and faster, so you can find, compare, and buy products inside of ChatGPT. They've improved product results, visual product details, pricing, reviews, and direct links to buy, and product results are chosen independently and are not ads. These shopping improvements are rolling out today. Now, the reason I think this is so interesting is that it goes a
AI Buys For You
little bit deeper than just having your AI find you the best espresso machine after doing a search. Of course, search is being disrupted on an incredible scale, the likes of which we've never seen before. And Visa CEO Ryan McInerney says the company is launching AI agents that will shop and pay on your behalf. With partners like OpenAI and Perplexity, Visa is going to enable AI credentials, spending rules, and merchant trust, turning payments into infrastructure for agents. Apparently, we're not just going to see this in months but in the next couple of quarters; agents are going to be buying things for us. In his words: "First of all, we're all going to have agents, and they're going to be able to go out and scour the world's inventory and find what it is that we're looking for, either for ourselves, maybe it's tickets to a hard-to-get concert, or as a gift, you know, a hyper-relevant gift for my wife for Mother's Day. And what we've announced is a set of tools that give agents the capabilities to go make payments on your behalf. Think of that as AI-enabled Visa credentials, and also the rules and the capabilities that will provide trust: trust that consumers will have in their agents, trust that merchants are going to have that they're going to get paid, and trust that financial institutions will have that you've actually empowered your agent to make those purchases on your behalf. So we announced AI-enabled Visa cards. That's going to allow you to give your agent agency, giving them your Visa credentials to go buy something on your behalf. But we also announced important capabilities, things that enable you to set parameters and tell your agent how long you want it to go look for something, maybe those concert tickets we were talking about, how much you're willing for your agent to spend, and even to designate specific merchants that you want your agent to look at, and not look at other merchants."
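The parameters he describes, a spending cap and an allow-list of merchants, can be sketched as a simple authorization check. This is purely a hypothetical illustration of the concept; none of these names reflect Visa's actual APIs or products:

```python
from dataclasses import dataclass, field

# Hypothetical agent spending rules: a cap on amount and an
# allow-list of merchants the agent may buy from.
@dataclass
class SpendingRules:
    max_amount: float
    allowed_merchants: set = field(default_factory=set)

    def authorize(self, merchant: str, amount: float) -> bool:
        # A purchase goes through only if both rules are satisfied.
        return merchant in self.allowed_merchants and amount <= self.max_amount

rules = SpendingRules(max_amount=250.0,
                      allowed_merchants={"ticketmaster", "stubhub"})
print(rules.authorize("ticketmaster", 180.0))  # True: within the rules
print(rules.authorize("scalper-site", 180.0))  # False: merchant not allowed
print(rules.authorize("stubhub", 400.0))       # False: over the spending cap
```

The design point is that the constraints live with the payment credential, not with the agent, so even a misbehaving agent can only spend within the rules you set.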
"So it's the products, it's the capabilities, that are going to give you and everyone else trust that your agent is going to go buy exactly what you're looking for on your behalf." Asked when companies will start to embrace agents that go out scouring and buying on our behalf, he said: "Yeah, you're going to see this in months, the next couple of quarters. We announced a series of partnerships with all of the leading players in this space: with OpenAI, with Microsoft, with Anthropic, with Perplexity, with IBM, with Stripe, with all of the key players that are building out the infrastructure to enable this type of buying and shopping experience. So it's going to be here soon." Now, another interesting
Explosive Code Growth
metric to look at is, of course, the progress AI is making. Noam Brown, the lead for reasoning at OpenAI, said he recently made this plot for a talk he gave on AI progress, and it helped him appreciate just how quickly AI models are improving. There are still a lot of benchmarks where progress is flat, and progress on Codeforces was quite flat for some time too. We can see right here that from GPT-3 to GPT-4 there wasn't actually much of a jump on Codeforces, and even GPT-4o was only a minor jump. But with o1-preview, things start to get remarkably impressive as we dive into the o-series. I do find it funny that they skipped the name "o2" because of naming conflicts, so o3, which is only the second major iteration of these reasoning models, is making such a crazy jump. That should show you just how fast this is moving: GPT-3.5 came out at the end of 2022, and less than three years later we've moved to a model that is nearly at the level of the top human competitors. When we look at that, the curve is starting to bend upwards, exponential growth if you will. Of course, you can't really call AI progress exponential in the strict sense of things compounding on themselves; maybe they will in the future, but just looking at the growth here, it's impressive how quickly things are moving. Now, something that also surprised me: Satya Nadella said that AI actually generates 20 to 30% of Microsoft's code. The reason it's surprising is that just last week Google said AI generates about 30% of their code as well, so Satya saying this too was a bit of an eye-opener.
In his words: "There are two things we're tracking. One is the accept rates themselves, right? That's at whatever, 30, 40%, and it's going up monotonically. And it depends: one of the big challenges we had for a long time is that a lot of our code is still C++ and C. C# support is pretty good, but C++ was not that great; Python is fantastic. We've now gotten better at that, so as language support has increased, the code completions have gotten good. As for agentic code, it's still nascent: for new greenfield projects it's very high, but as I said, nothing is greenfield in many cases. Oh, by the way, code reviews are very high; in fact, usage of the agents we have for reviewing code has increased. So I would say maybe 20 to 30% of the code that is inside of our repos today, in some of our projects, is probably all written by software." Now, when it comes to AI developing things and code, Mark Zuckerberg actually said that half of Llama's development will soon be done by
Half-Human Dev Teams
AI and not people. And Satya Nadella says they're going to re-imagine the infrastructure that agents use, from sandboxes to repos, and the remaining human engineers will basically be like tech leads with their own little army of AI agents. It's going to be interesting to see how this development occurs, because Meta is clearly trying to build its own AI research agent. Here's Zuckerberg: "The big one that we're focused on is building an AI and machine-learning engineer to advance Llama development itself. Our bet is that in the next year, probably, I don't know, maybe half the development is going to be done by AI as opposed to people, and then that will just kind of increase from there." And Nadella: "To me, the agent is sort of the first attempt. So the question for us is, in the next year, can we get something like a kernel optimization to happen? I think that's more likely; whether it comes up with a novel model-architecture change, probably not. So the question is which tasks: optimizations, security improvements, that type of stuff seems like a pretty high opportunity. The other thing for us, to your point, is our core business. Bill started the company as a tools company, so the interesting thing I think about now is that maybe the way we should reconceptualize our tools, and frankly our infrastructure, is as the tools and the infrastructure for the agents to use, because even the agent needs a bunch of tools. What shape should they be? What should their infrastructure, their sandboxes be? So a lot of what we're going to do is essentially evolve what even the GitHub repo construct looks like for the agent." Yeah, that's a very interesting concept.
And I mean, I tend to think that every engineer is effectively going to end up being more of a tech lead in the future, one that has their own little army of engineering agents that they work with. As well, GPT-4 is now gone. GPT-4 was a great model. Everyone remembers the release; you could only message it 25 times within 3 hours. And today Sam Altman officially sent off the model: Goodbye GPT-4. You kicked off a revolution. We will proudly keep your weights on a special hard drive to give to some historians in the future. Now, another interesting update that I wanted
Apple & Anthropic Collab
to share with you guys as well is the fact that Anthropic and Apple are teaming up to build an AI-powered vibe coding platform. So this one kind of caught me off guard, because Apple, whilst they have some of the most talented AI researchers and developers out there, don't seem to really be deploying anything worthwhile. But it seems they're teaming up with Anthropic on a new vibe coding software platform that will use AI to write, edit, and test code on behalf of programmers. The system is a new version of Xcode, Apple's programming software, that will integrate Anthropic's Claude Sonnet model, according to people with knowledge of the matter. So apparently they're going to roll this out internally, and they haven't decided whether to launch it publicly. I personally don't think Apple will launch this publicly. We know they have this perfectionism where they want to release everything to a certain standard, and that's understandable; it's how they've built their brand for the last 20 years or so, and AI just doesn't really mesh well with that. AI is a little bit imperfect; models can often make mistakes. So it will be interesting to see how Apple manages to integrate this. Now, Google also introduced a new image feature. I spoke about this before, when they had that Flash image generation tool released just before GPT-4o image generation. But I will say, interestingly, this one does seem a little bit better than the one before. Here we can see that it actually keeps the subject of the image and just adds and replaces things. Before, when I used it a second or third time, I saw that things didn't maintain their world consistency; they just kind of got a little bit out of control after two or three prompts. But here we can see that they've updated this, and you can try it in the latest version of Gemini.
So I'll definitely go on over to Gemini to try this. Now, another thing that we had here was OpenAI CPO Kevin Weil talking about the fact that when they release their open weights model, it's not actually going to be the
OpenAI’s Open Model Strategy
frontier model. It's going to be an open weights model for something that is really good, but of course they're going to keep their proprietary models locked down. And I think that does make sense. I know a lot of people give OpenAI flack for not releasing open-source models, but I do think it makes sense. I mean, you pour billions and billions of dollars into something to maintain a competitive advantage; there really isn't any point giving that stuff away for free. I want the best open weights model in the world to be a US model. I want it to be built on democratic values. I don't want the best open weights model in the world to be a Chinese model. And so from that perspective also, we think it's really important that we put a great model out there. Now, it will not be our frontier model. The way we think about it is probably something like a generation behind, because putting a frontier model out is also accelerative to China. But we think we can put out a great open weights model that the entire world can adopt, so we have a model that the entire world is using built on democratic values. We will also have our full frontier models that'll be a generation ahead, which we offer like we do today, and the US will have the best models in the world regardless of how they're used. I think the critical point there is that the frontier models are not open-sourced; it's the smaller models. But core to your question, we do need to be very mindful collectively about the conditions to keep the US in the lead, because we've been saying for quite some time that the gap between the US and China, whether it's on models or on chips, has been narrowing. Now we've seen it with DeepSeek, we see it with Huawei's progress on Ascend and HarmonyOS, and you see it in a lot of the data around graduates, the ascendancy of universities and