Mistral's Devstral: NEW Opensource Coding LLM! #1 On SWE Bench! (Fully Tested)

Universe of AI · 24.05.2025 · 12,738 views · 279 likes · updated 18.02.2026
Video description
In this video, we dive into Devstral, a brand-new agentic LLM built by Mistral AI in collaboration with All Hands AI. It's fully open-source under the Apache 2.0 license and is already ranked #1 on the SWE-Bench Verified benchmark, beating out massive closed models like GPT-4.1 Mini by over 20%.

[🔗 My Links]:
Sponsor a Video or Do a Demo of Your Product, Contact me: intheworldzofai@gmail.com
🔥 Become a Patron (Private Discord): https://patreon.com/WorldofAi
☕ To Help and Support Me, Buy a Coffee or Donate to Support the Channel: https://ko-fi.com/worldofai - It would mean a lot if you did! Thank you so much, guys! Love yall
🧠 Follow me on Twitter: https://twitter.com/intheworldofai
📅 Book a 1-On-1 Consulting Call With Me: https://calendly.com/worldzofai/ai-consulting-call-1
📖 Want to Hire Me For AI Projects? Fill Out This Form: https://www.worldzofai.com/
🚨 Subscribe To The FREE AI Newsletter For Regular AI Updates: https://intheworldofai.com/
👩‍💻 My Recommended AI Engineer course is Scrimba: https://v2.scrimba.com/the-ai-engineer-path-c02v?via=worldofai
👾 Join the World of AI Discord!: https://discord.gg/NPf8FCn4cD

[Must Watch]:
DeepCoder-14B: NEW Opensource Coding Model Beats o3-Mini! (Tested): https://youtu.be/U_OcMM_h-9g?si=MCkwIyGfxeLjSE72
Google Launches an Agent SDK - Agent Development Kit + Agent2Agent (Opensource): https://youtu.be/Cv6mUjdTowo?si=h0yqRsm0ZBAtkPVU
Cline v3.10 UPDATE: Fully FREE Autonomous AI Coding Agent! (Chrome Browser, YOLO Mode, Drag & Drop): https://youtu.be/PodEIhAJco0

[Links Used]:
Blog Post: https://mistral.ai/news/devstral
Continue Setup Guide: https://blog.continue.dev/devstral/
All Hands Setup: https://docs.all-hands.dev/modules/usage/llms/local-llms
OpenRouter API: https://openrouter.ai/mistralai/devstral-small:free

But numbers aside, we fully tested Devstral in real-world coding workflows, and it seriously delivers. From understanding large codebases to autonomously fixing GitHub issues, this model is built for true software engineering, not just toy examples.

🔥 What We Cover:
What is Devstral and why it matters
How it compares to other models like GPT-4.1 Mini and DeepSeek-V3
Running it locally with ollama/devstral
Integration with Continue, OpenHands, and SWE-agent
Use cases for dev teams, IDE copilots, and enterprise

💡 Whether you're building your own coding assistant, testing LLMs on real-world bugs, or just looking for the best OSS model for autonomous agents, Devstral is a must-try.

📥 Available on: HuggingFace, Ollama, LM Studio, Kaggle, Unsloth
💻 Local-friendly: Runs on an RTX 4090 or a Mac with 32GB RAM
📌 API Name: devstral-small-2505
🔧 Tags: Devstral, Mistral AI, Devstral LLM, Devstral benchmark, SWE Bench, Open Source LLM, Agentic LLM, Coding Agent, GitHub Copilot alternative, Mistral Devstral test, LLM benchmark, SWE-Bench Verified, Ollama Devstral, Continue AI, OpenHands, SWE-Agent, local LLM, software engineering AI, Devstral vs GPT-4, AI coding tools

#MistralAI #Devstral #OpenSourceAI #CodingAgent #SWEbench #AItools #LLM #AutonomousAgents 👨‍💻🧠

Contents (7 segments)

  1. 0:00 Introduction (138 words)
  2. 0:54 What is Devstral (199 words)
  3. 2:03 Performance (266 words)
  4. 3:29 Testing the Model (135 words)
  5. 4:10 SaaS Landing Page (108 words)
  6. 4:45 Debugging (363 words)
  7. 6:44 Outro (174 words)
0:00

Introduction

While everyone was busy this week with the Google I/O conference and the release of Anthropic's new models, Mistral AI was cooking behind the scenes. Just two days ago, they introduced a new coding model that is completely open-source, and it's called Devstral. It's their state-of-the-art open model, released under the Apache 2.0 license and designed specifically for coding agents. They actually partnered with All Hands AI to develop it. Devstral is a new open-source agentic large language model built for real software engineering tasks. It significantly outperforms all previous open-source models on the SWE-Bench Verified benchmark, scoring 46.8%. That's more than 6 percentage points higher than the previous best. It even surpasses some large closed models like GPT-4.1 Mini by over 20%. And the thing is, unlike
0:54

What is Devstral

typical large language models that struggle with real-world coding challenges like navigating large codebases, resolving bugs across multiple files, and understanding project structure, Devstral is purpose-built for this. It doesn't just spit out functions. It works through actual GitHub issues using agent scaffolds like OpenHands or SWE-agent. It is built upon OpenHands, which is something we've showcased multiple times on this channel, simulating how a real developer would actually approach a problem. And get this: despite its strong performance, this is something you can actually run locally. They have plenty of instructions on how to set this up with LM Studio or with something like Continue. You can run it on an RTX 4090 or even a Mac with 32GB of RAM. Before we get started, I just want to mention that you should definitely subscribe to the World of AI newsletter. I'm constantly posting newsletters on a weekly basis, so this is where you can easily stay up to date on what is happening in the AI space. So definitely go ahead and subscribe, as it is completely free. Devstral is
2:03

Performance

flexible, fast, and actually enterprise-ready. You can see that within the SWE-Bench Verified test it is outpacing even Claude 3.5 Haiku, SWE-smith-LM 32B, and even GPT-4.1 Mini. This is a 24-billion-parameter model, and it is something you can actually access through Continue. You can simply go into the Continue extension within VS Code, add the Ollama Devstral model within Continue, and you'll see it running inside the extension, executing any sort of complex coding task for you. And here's a guide to set this up with Continue. It's super simple. You're just going to need the Continue extension in VS Code (all of this is open-source, by the way) and you're going to need Ollama installed. Once you have that, just run the "ollama run devstral" command in your terminal, then go into Continue and switch it to Agent mode to access the Devstral model. OpenRouter also provides the Devstral Small model for free. It is not going to get you the same performance, and there is a rate limit to it, but it is something you can access for free if you do not have the compute to run it locally. Also, just an FYI: if you're going to be using Mistral's API, you're able to access this model with the following pricing structure: 10 cents per 1 million input tokens and 30 cents per 1 million output tokens.
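Beyond Continue's UI, the same local setup can be driven programmatically. Below is a minimal Python sketch against Ollama's default local HTTP endpoint (/api/generate on port 11434); it assumes you have already pulled the model with "ollama run devstral", and the helper names and example prompt are my own illustration, not part of the video:

```python
import json
from urllib import request

# Ollama's default local endpoint (assumes the Ollama server is running).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "devstral") -> dict:
    """Build the JSON payload Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_devstral(prompt: str) -> str:
    """Send a prompt to a locally served Devstral model and return its reply."""
    data = json.dumps(build_request(prompt)).encode("utf-8")
    req = request.Request(OLLAMA_URL, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example call (requires Ollama running with the devstral model pulled):
# print(ask_devstral("Write a Python function that reverses a string."))
```

The /api/generate route and the stream flag are standard Ollama API; setting stream to False returns the whole completion in one JSON object instead of line-by-line chunks.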
3:29

Testing the Model

Now, what I'm going to be doing is using Cline to showcase the performance of this model. This is where I'm putting in the OpenRouter API key and selecting the Devstral Small model, not the free one, so we get better performance out of it. So, let's first start off by having it generate a SaaS landing page. This is a good benchmark to see how well the model tackles front-end designs, as well as whether it's capable of generating something that is appealing. So, let's see what it actually ends up generating. Now, before I showcase the SaaS landing page, just keep in mind that this is an open-source model. So, it's not going to generate something too
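Since pricing came up, a quick back-of-the-envelope helper makes the cost of the paid tier concrete. This is just a sketch with the per-million-token rates quoted in the video hard-coded; actual rates may change:

```python
# Devstral Small API rates quoted in the video (USD per 1M tokens); may change over time.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.30

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_RATE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M

# Example: a 20k-token codebase prompt with a 2k-token reply costs well under a cent.
print(f"${estimate_cost(20_000, 2_000):.4f}")
```

Even feeding a fairly large prompt on every turn, an agentic debugging session at these rates stays cheap, which is part of the enterprise appeal mentioned above.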
4:10

SaaS Landing Page

impressive, but it's still pretty decent for a 24-billion-parameter model. This is the SaaS landing page it was capable of generating. You can see the simple pricing structure that's listed out. I actually made a video on the new Opus model generating a SaaS landing page, and there's a huge difference between the two. It's just crazy to think how far AI has come. But anyways, for an open-source model that is not even geared towards developing front ends, it did a pretty decent job. It's mostly meant for tackling complex software engineering tasks, and that is something I'll be showcasing next.
4:45

Debugging

So you may wonder, what's the purpose of actually using this? Well, the reason you may use this is to tackle complex software engineering tasks. You can deploy this model to debug anything. In this case, I have given it context on my overall codebase, and it is slowly working through all the files with the help of Cline to see whether any revision is needed; if there is any sort of error, it will work on tackling that error for us. This is where it can automatically navigate and edit large codebases, and that's why it is something you would want to use as an open-source model. So, I know this model may not be great as a front-end developer, but in terms of debugging, it did a pretty exceptional job at identifying the issues. It was able to find six different issues within my repository. I actually have around eight that I had tweaked on my own. This is a bugged repository that I use for testing different models, and this model was able to find six potential issues and is now working on resolving all of them. So I'm pretty impressed to see that it did a decent job of finding all, or at least most, of the issues. And you can see that whenever it makes a change, it is able to autonomously navigate and edit any of the different components within the large codebase that I have. It makes sure that if an edit is needed for an individual file, the change will not negatively affect the other components in the codebase. So if a change is needed across multiple files, it will execute that task by making the changes in all the necessary files, so it won't leave your code broken. And this is why I find this to be pretty impressive for that particular use case.
6:44

Outro

But that's basically it for today's video on this new Devstral model. This is actually their smaller-sized model. They're potentially going to be releasing a medium-sized and a large-sized model for it. So, I'm definitely going to keep you guys posted whenever they drop, because it's going to be insanely powerful if they release a large version of this, and it could rival many closed-source models, even Anthropic's older Claude 3.7 Sonnet. So, stay tuned for that. But that's basically it, guys. I'll leave all these links in the description below. Make sure you go and join our newsletter as well as our Discord. Follow me on Twitter. And lastly, make sure you guys subscribe, turn on the notification bell, like this video, and please take a look at our previous videos, because there's a lot of content over here that you'll truly benefit from. But with that thought, guys, have an amazing day, spread positivity, and I'll see you guys fairly shortly.
