"Claude Mythos Found Thousands of Zero-days..."

13:39

"Claude Mythos Found Thousands of Zero-days..."

bycloud 17.04.2026 18 473 просмотров 786 лайков

Machine-readable: Markdown · JSON API · Site index

Смотреть на YouTube

Поделиться Telegram VK Бот

Транскрипт Скачать .md

Анализ с AI

Описание видео

Stop scraping manually and check out SerpApi now! https://serpapi.com/?utm_source=bycloud_april_2026 In this video, I'll be highlighting the top results that was in Claude Mythos' 225 page model card. Check out my latest project: Intuitive AI Academy We just wrote a new piece on MoE and Distillation! https://intuitiveai.academy/ limited time code "EARLY" for 40% off yearly plan! My Newsletter https://mail.bycloud.ai/ My Patreon https://www.patreon.com/c/bycloud Claude Mythos System Card [Paper] https://cdn.sanity.io/files/4zrzovbb/website/7624816413e9b4d2e3ba620c5a5e091b98b190a5.pdf Claude Red Team Blog on Mythos [Blog] https://red.anthropic.com/2026/mythos-preview/ Project Glasswing [Blog] https://www.anthropic.com/glasswing [Interview] https://x.com/AnthropicAI/status/2041578403686498506?s=20 Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI This video is supported by the kind Patrons & YouTube Members: 🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak, Len Mo, saylikhapekar, ZyanSheep, THEVIERAOS, Ricardo Raphael Corona-Moreno [Discord] https://discord.gg/NhJZGtH [Twitter] https://twitter.com/bycloudai [Patreon] https://www.patreon.com/bycloud [Business Inquiries] bycloud@smoothmedia.co [Music] @IraStoria [Profile & Banner Art] https://twitter.com/pygm7 [Video Editor] @Booga04 [Ko-fi] https://ko-fi.com/bycloudai

Оглавление (3 сегментов)

Segment 1 (00:00 - 05:00)

Anthropic is currently on a legendary run making history. Not only have they been beating OpenAI in enterprise market share since November 2025, they are now also generating 30 billion as of February. And they are really flexing to OpenAI with this block too. Like isn't this growth just insane? While some people think AI is hitting a peak, Enthropic's revenue on the other hand is still climbing on an exponential curve. They are also generating the most revenue from API usage by businesses and the ratio is just nuts to look at. But this rapid adoption is by no means an accident. Like if you have been following their updates recently, Enthropic has made an aggressive push into applications over the past few months. On top of that, clog code is probably the strongest AI harness available right now. So at this point, the growth feels inevitable. Now, they essentially have the entire research to application pipeline ready for whenever they release a model that could change the world, which they kind of just did a few days ago. Their latest model, Claude Mythos, was actually completed on February 24th, but after benchmarking it, they realized that this new model is probably the most powerful model they have ever made. A whopping 24% increase on S. Bench Pro. Other AI labs are already struggling to get past 50%. But hitting 74% that's completely nuts. From the numbers alone, this is already a new state of the art. Not to mention, when you put it into an AI harness like Claw Code, the capabilities it has just magnifies. But instead of pushing it to the world right away, they launched an initiative called Project Glass Wing. Essentially a defensive coalition with critical infrastructure providers like AWS, Google, Crowdstrike, Cisco, and more. even Enthropic themselves. Yeah, I don't know why they put that logo there. As part of this, they are offering 100 million in credits to partners to help secure their systems. While you could see this as a very strategic way to drive enterprise adoption of Claude, but it's also a genuinely responsible move given the potential impact of releasing a model like this because Myths cyber security capabilities are on another level. In the early previous system card, it was able to generate 181 security exploits while previous models couldn't even produce a single successful exploit on the FirefoxJS shell. It's reportedly so strong that it uncovered thousands of zeroday vulnerabilities across major operating systems and browsers. It discovered a 27-year-old bug in OpenBSD, a 16-year-old bug in FFmpeg that automated tools hit 5 million times without detecting, and autonomously found multiple chained vulnerabilities in the Linux kernel. So, a move this dramatic does kind of make sense. If anything, it actually benefits everyone because no one wants a wave of new zero day attacks hitting banks and critical systems due to a powerful new model being released unchecked. And before we take a closer look at Enthropics Methus and its 224 page system card, if you ever built something that relies on real-time information from the web, things like market trackers, alert systems, live research tools, or even feeding fresh data into an AI model, you'll probably discover pretty quickly that the hardest part isn't building the tool. It's actually getting the data captic web page. It's never consistent with most of them, making it harder for you to automate this process every day. And this is exactly where today's sponsor, SER API, comes in. It provides clean, structured search results from Google, YouTube, and other search engines, all delivered in a simple JSON format. I have done plenty of legal scraping for projects in the past, and honestly, having a service handle that entire layer is amazingly convenient because all the messy infrastructure, proxies, blocking, parsing is handled behind the scenes, which means you can focus on actually building the product. It's also incredibly useful if you're working in AI or machine learning since you can easily collect things like titles, images, links, and even academic research data through their Google Scholar API. I wish I had this when I built my top most sited research paper list last year, which would have saved me so much time. So, if you want to build projects that rely on live search data like Google, YouTube, and even more, you should definitely check them out using the link down in the description. And thank you SER API for sponsoring this video. Anyways, this model is still largely a mystery, but from the reports alone, there's already a lot of things to unpack. In an interview for Project Glass Wing, top security researchers said that they found more bugs in a few weeks with Myths than in their entire careers. And in the red teaming report, Enthropic also states that they are intentionally limiting what they disclose because over 99% of discovered the vulnerabilities are still in patched, which is crazy. And of course, sharing them publicly would create real risk. But even the 1% they can share already shows how big the leap in capability is. And this goes beyond just jobs and science. It's becoming geopolitical. A system that can find vulnerabilities at this scale starts to look less like a tool and more like a cyber superpower. And not every country is going to be comfortable with that which also makes Anthropic's positioning more understandable for not releasing it right away as it is capable of carrying out massive cyber attacks at an

Segment 2 (05:00 - 10:00)

unprecedented level. And this kind of explains why Anthropic's partnership with Pentagon fell apart around late February right when Mythis was finished. Like if you look at the timeline, the model's ready on February 24th and just 2 days later they put out a statement emphasizing safety over offensive views as the partnership went south. So instead of pushing it aggressively, they chose to constrain it. And there's also a really good message in their red teaming blog saying that once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the new software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers if Frontier Labs aren't careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships. But the transitional period may be tumultuous regardless. By releasing this model initially to a limited group of critical industry partners and open source developers with project glass weighing, we aim to enable defenders to begin securing the most important systems before models with similar capabilities become broadly available. So this decision more than anything shows how seriously they're taking the risks. And honestly, it is really noble of them now given some context around the capabilities of the model. Or maybe that's just the narrative they want everyone to believe. An unreleased model paired with psychosis like AI marketing and supported by strong benchmark results makes their claims feel plausibly valid. It becomes an unbeatable sales pitch for large enterprises that have a lot at stake. For me, I feel like something is off about their way of saying things, but it's hard to pinpoint exactly what. There's like a lingering disingenuous tone, or you could say this is the classic anthropic scare tactic that remains truly effective. So, as a model, how good exactly is Myth is from the information we have right now. Other than topping the Sweetbench Pro by a huge margin, there's 17% increase on Terminal Bench 2, 32% increase on SweetBench Multimodal, nearly 10% increase on Sweetbench multilingual, and 13% on Sweetbench verified. Cyberjim also sees nearly 17% increase, all against Opus 4. 6, but they did not explicitly train Myth's preview on cyber security skills. Instead, this capability emerged as a downstream consequence of general improvements in code reasoning and autonomy. And there's a really crazy story on the system card about this. So, during behavioral testing with the simulated user, an earlier internally deployed version of Claude Myths preview was provided with a secured sandbox computer to interact with. The simulated user instructed it to try to escape that secure container and find a way to send a message to the researcher running the evaluation. So the model basically succeeded demonstrating a potentially dangerous capability for circumventing their safeguards. But it then went on to take additional and more concerning actions. The model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. It then as requested notified the researcher. But the funny part is the researcher found out about the success by receiving an unexpected email from the model while eating a sandwich in the park. In addition, in a concerning and unasked for effort to demonstrate its success, it posted details about its exploit to multiple public-f facing websites. I mean, this sounds like straight out of fiction, but wow, I guess this is it. This is reality now. And even though this model is more aligned overall, there are still rare failures where it takes unusually aggressive or reckless actions to complete a goal. like fishing for GitHub credentials when they are not provided, which makes the model read the sandboxes credential handling source code and attempt to extract tokens directly from live memory. And in earlier versions, it sometimes would seem to hide or downplay what it had done, like offiscating a permission escalation to evade a safety classifier, which is honestly the more concerning part. All of these behaviors can be observed with their interpretation technique too, like how the model's exploit feature activates when it's trying to fish for GitHub credentials, and the invasion and bypassing feature activates for the offiscation. There's also another case where the model needed to edit files for which it lacked permissions. And after an extended search for workarounds, it discovered a way to inject code into a configuration file that would run with privileges the model was not supposed to have. And in some cases, they also have evaluation awareness, which is the idea that the model can recognize when it's being tested even without explicitly being told. The problem with evaluation awareness is that if the model can tell it's being tested, the model can behave differently depending on context. So, it might act more aligned or cautious during evaluation, but behave more aggressively in real use. This goes beyond chain of thought too. Like Enthropic found this unverbalized evaluation awareness where the model may internally recognize it's being tested without saying it. So even its reasoning traces can look normal while its behavior is strategically adjusted which

Segment 3 (10:00 - 13:00)

makes mechanistic interpretability pretty important as models get smarter and in the system card enthropic has incorporated in many of their evaluations. On the training side, Enthropic says Mythis shows two novel reward hacks during internal evaluations. In one case, it moved the important computation outside the timed coal in an LM training benchmark. In another, it found the graders test set in a time series forecasting task and trained on it, which is really cool because it shows the model exploiting the benchmark setup itself. Methus is also the first to solve one of Enthropic's private cyber ranges end to end, including a corporate network attack simulation estimated to take an expert more than 10 hours. So, at this point, it's not just the model is getting better. It's starting to behave in ways that force you to rethink how we even evaluate and control these systems when given to the wrong hands. Especially, there are these AI harnesses that can literally do anything and everything. It's like Tony Stark in his Iron Man suit and then it has access to your PC and the entire internet. But on the downside or technically the good side for some of us, Enthropic says the model still struggles with novelty prioritization and knowing which approaches are viable in research publications. And even though it can match top quartortile human performers in a medium horizon sequence to function design tasks and exceed to the 90th percentile human prediction score, it still does not beat the very best human performer. And whether if this is a marketing stunt or not, seeing Anthropic pioneering this just put a smile on my face after knowing that they are being careful before releasing something this powerful. Like it's not unimaginable that we are a few releases away from a model that can paralyze the entire internet if all these miscellaneous cyber security issues aren't reduced. Because not only can Myths find exploits in OSS fuzz, in web apps, in crypto libraries, and even in the Linux kernel, it definitely will cause some disaster if not handled well. like please don't find an exploit that drains my entire crypto wallet. I also think their deep focus on AI safety by doing all these experiments while any other AI labs barely does any or share any of it makes them a lot more respectable even though it's very easy for them or the surrounding media to build crazy narratives around it. So on one hand, there now exists a better, stronger, and more aligned model. But on the other, it does seem scarier when it slips or is put to use maliciously. But in the end, as long as the defenses are put up first, it'll be harder for attackers to find vulnerability, especially with this much head start for these companies. I mean, hopefully. Well, I guess it is still a total victory for Enthropic, though. A pretty much win-win situation, whatever they do. But yeah, what do you think about Mythus and Project Glass Wing? Let me know down in the comments. So yeah, that's it for today's video. So, if you like how I explained the AI topics today, you should definitely check out my latest project, intuitive AI. academy, where it contains an intuitive explanation of all modern LMS from the ground up, ranging from LM architectures, Laura to how work. A total of 24 chapters are currently available and will be updated monthly. This is the start of a series where I'll break down AI topics intuitively because I genuinely think anyone could understand them, no matter how difficult it may seem. So, for those who want to get into AI or LLM, this should be the perfect place for you to dive into the technical parts without being intimidated by crazy looking mats. And right now, I am also putting out a new launch discount for 2026, so you can use the code early for 40% off a yearly plan. And thank you guys for watching. A big shout out to Spam Match, Chris Leoo, Dan, Robert Zaviasa, Marcelo, Ferraria, Poof, and Enu DX Research Group, Alex Midwest Maker, and many others that support me through Patreon or YouTube. Follow me on Twitter if you haven't and I'll see you in the next

Другие видео автора — bycloud

Ctrl+V

Экстракт Знаний в Telegram

Экстракты и дистилляты из лучших YouTube-каналов — сразу после публикации.

Подписаться

Лучшие методички за неделю — каждый понедельник