Cosines New AI Software Developer GENIE Surprises Everyone! (AI Software Engineer)
11:26


TheAIGRID · 01.09.2024 · 16,917 views · 478 likes


Video description
Prepare for AGI with me - https://www.skool.com/postagiprepardness 🐤 Follow Me on Twitter https://twitter.com/TheAiGrid 🌐 Checkout My website - https://theaigrid.com/ Links From Todays Video: https://cosine.sh/ Welcome to my channel where i bring you the latest breakthroughs in AI. From deep learning to robotics, i cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos. Was there anything i missed? (For Business Enquiries) contact@theaigrid.com #LLM #Largelanguagemodel #chatgpt #AI #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #Robotics #DataScience

Contents (3 segments)

Segment 1 (00:00 - 05:00)

So software development has taken another massive stride, with Cosine's Genie coming in and showing us a new state-of-the-art fine-tuned version of GPT-4 that scores 43.8% on the new SWE-bench Verified benchmark announced last Tuesday. Take a look at their announcement video. It's rather fascinating.

Hi, I'm Ally, co-founder and CEO of Cosine, a human reasoning lab, and I'd like to show you Genie, our state-of-the-art, fully autonomous software engineering colleague. Genie has the highest score on SWE-bench in the world, and the way we achieved this was by taking a completely different approach. We believe that if you want a model to behave like a software engineer, it has to be shown how a human software engineer works. We've designed new techniques to derive human reasoning from real examples of software engineers doing their jobs. Our data represents perfect information lineage, incremental knowledge discovery, and step-by-step decision making, representing everything a human engineer does logically. By actually training Genie on this unique dataset, rather than simply prompting base models, which is what everyone else is doing, we've seen that we're no longer simply generating random code until some works. It's tackling problems like a human.

So let's take a look at Genie solving a real problem from a real repo. You'll notice you can prompt Genie with a natural language prompt, a ticket, or, in our case, a GitHub issue. So I'll go ahead and start. Now Genie's fetched the GitHub issue. When I click solve, it'll start looking into the problem. As you can see, it's started thinking about what it'll need to find in order to solve this problem. This process is iterative, and it will keep going until the model is satisfied that it's found everything it needs. There we go: we can see that it's pulled a couple of examples of files from the codebase that intuitively look like they're relevant to the issue we're looking at.

Now it's going to start writing code to try to solve the problem. Much like the retrieval step, this process is also iterative. Genie will write code, run it, and then react as a function of what it's seen. One of the great advantages of our data-first approach is that, because our model has watched more humans solve problems than any human could in a lifetime, it has a great grasp of how software engineers really break down and triage issues. It's also easily able to edit code in place without rewriting entire sections, which is something that foundational models struggle with. Genie is now running the code it's writing and is using the debugging tools that we've given it to look at application state and execution flow, just like a developer would. Again, it's seen humans do this millions of times and is emulating that process.

So, back to the task. We've just watched Genie try a couple of different approaches to solving this problem, and at first it wasn't successful, so it planned again and has just written an alternative approach. This process can continue indefinitely, and because of the long context window that Genie has available to it, many different approaches can be tried without losing any information along the way. There we go: all the tests are now passing. Genie has successfully solved this problem, and it solved it in just 84 seconds, which I'd guess is much faster than any human could come to an unknown repo with an unknown issue and solve a problem. So now it'll write a PR title and body and actually open the PR on our linked GitHub repo. Through the Cosine web platform, any comments or reviews left on that PR will be heard by Genie and acted upon as if it were a real human colleague.

We'd like to thank OpenAI for allowing us to fine-tune such a long-context-window model, and I'm extremely excited to see where and how you guys use Genie. If you'd like to give Genie a try, just head over to our website at cosine.sh. We truly believe that software engineering is just the starting point and that we can codify human reasoning for any job or industry. We can't wait to show you what we've been working on.

Now, with this, what we can see here is the other models that are on this benchmark. The SWE-bench Verified leaderboard puts together all of the previous agents, models, and agentic workflows that work to solve these issues. The previous high score was Amazon Q's developer agent at 38.8%. What's crazy about all of this is the rate at which models are improving: we can see that we've gone from 7% earlier this year all the way up to 43.8%. This is a remarkable level of improvement. Now, the reason this is truly remarkable is not mainly the fact that we got better models. The craziest thing about all of this is that one of the things that Leopold Aschenbrenner, someone who worked at OpenAI on the superalignment team, spoke about in his paper, The

Segment 2 (05:00 - 10:00)

Decade Ahead, was this thing called unhobbling gains. This is where, by default, the model learns a lot of amazing raw capabilities, but they are hobbled in all sorts of dumb ways, limiting their practical value. With simple algorithmic improvements like reinforcement learning, chain-of-thought prompting, tools, and scaffolding, we can unlock significant latent capability. Basically, he's stating that the way we use LLMs is rudimentary, and over time we're going to figure out ways to get better and better with these models. So over time it's going to be interesting to see how these models perform in terms of the abilities we manage to extract from them once we understand what they're capable of.

For example, the paper describes unhobbling like this: imagine you had to solve a hard math problem, but you had to instantly answer with the very first thing that came to mind. It seems obvious that you would have a pretty hard time, except for the simplest problems. But until recently, that's how we asked LLMs to solve math problems. You remember, in the first days of GPT-4, people would just ask it a question. After that, what we decided to do was chain of thought: we gave it a step-by-step scratchpad, and it was able to solve much more difficult problems. So chain-of-thought prompting unlocked that for LLMs.

The reason I'm going over this is that, with new methods and the way new AI systems are performing, we're managing to unlock more and more capabilities from these systems. You can see here how far GPT-4 has climbed on this measure. It says: from 5% with just the GPT-4 base model, to 20% with GPT-4 as post-trained on release, to nearly 40% today with better post-training, tools, and agent scaffolding.

Now, the reason I spoke about this is that it relates exactly to what Cosine is doing with Genie. In their paper, where they talk about this model, they state that Genie was always designed to be agentic, although when they first dreamt up the idea back in 2022, that term hadn't really cemented itself. 2022 was really early on. So basically, what they're stating here is that from the start of developing this model, they designed it to be autonomous. They wanted this model to act independently and make decisions, rather than be a smart assistant, which would just make it a passive tool. They wanted this to be like an actual assistant. They wanted Genie to actually understand what it was looking at and respond in the most logical way, much like a human programmer would.

You can see here it says this is the tip of the iceberg when it comes to the work that was done to make as much of the implied information in a developer's mind explicit. For every task they trained Genie on, they had to teach it how to first gather essential background information about the project. This was to prevent Genie from making up code that doesn't fit the existing project structure. That's where they talk about making sure it wouldn't hallucinate code, and would give solutions that were in line with how the codebase was already organized and already operated. So they put a lot of effort into teaching Genie the kind of background knowledge that experienced programmers already have in their heads but don't always write down. Basically, it's like teaching someone not just the rules of the game but all of the unwritten strategies too.

Now, here's where they talk about how Genie works, how its agentic loop actually works. They say the agentic loop is comprised of four main processes: planning, retrieval, code writing, and code running. These alone are not new; most tools in this space will use a mix of all of these. But they say that because Genie is trained to perform each of these tasks like a human would, rather than how a base LLM would, they're able to get so much more performance from the model. So once again, as I've spoken about before with unhobbling, it seems that the Genie team has managed to extract more performance out of this model.

Another crazy thing I saw was that they talk about the use of self-improvement in training the model. They say that much of the data they were training on was in a perfect state, because the vast majority of the time, code published by humans is in a working state when it's published. So what they did here, which was rather genius, was use the first version of Genie to try to solve coding problems. When it made mistakes, they showed it how to correct those mistakes, and they then added these examples of mistakes and corrections to the training data for the next version of Genie. Then they repeated this process multiple times. So they basically used self-improvement to train the model, and I'm wondering whether they could somehow repeat this loop in the future to make these models even better. You can see it says that every time they repeated this process, the initial candidate solution from Genie was stronger and in many cases was correct, and in the cases where it wasn't,

Segment 3 (10:00 - 11:00)

the amount of correction they had to show the model in the dataset was much reduced. So there was this iterative improvement, the model improving the model, that was just completely crazy.

They also talk about the future. They state that despite Genie's impressive state-of-the-art performance, they know there's untapped potential, and they're committed to refining the dataset to enhance Genie's capabilities. They're going to be broadening the data and introducing new capabilities, and Genie will become proficient in more programming languages and the latest frameworks. So overall, they're going to be creating different sizes of AI models, smaller ones for simple tasks and bigger ones for more complex jobs, and they can turn any advanced model into a Genie with their method of fine-tuning. What's interesting is that they're stating they're going to do an open-source model and pre-training: extending a foundational model on their extensive dataset, aiming for improved generalization and specialized data reconciliation.

One of the things they talk about is a really exciting feature for businesses: they can fine-tune Genie to perfectly understand specific large codebases. This works even for uncommon or company-specific programming languages. It's like teaching Genie to become an expert in a company's unique dialect of code. So this is going to be rather fascinating, because the software development space for AI has evolved so rapidly, and it seems like nearly every month we get a large update that shows how much these companies are improving.
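
The four-step agentic loop described in Segment 2 (planning, retrieval, code writing, code running, repeated until the tests pass) can be pictured with a minimal Python sketch. This is not Cosine's implementation: the names `agentic_loop`, `AgentState`, `RunResult`, and `demo`, and the idea of passing each step in as a plain function, are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    passed: bool  # did the tests pass for this patch?
    log: str      # output the agent can react to on the next iteration


@dataclass
class AgentState:
    issue: str
    history: List[str] = field(default_factory=list)  # logs of earlier attempts


def agentic_loop(
    issue: str,
    plan: Callable[[AgentState], str],
    retrieve: Callable[[str], List[str]],
    write_code: Callable[[str, List[str]], str],
    run_code: Callable[[str], RunResult],
    max_iterations: int = 5,
) -> Optional[str]:
    """Plan, retrieve, write, run; repeat until a patch passes or the budget ends."""
    state = AgentState(issue)
    for _ in range(max_iterations):
        step_plan = plan(state)               # 1. planning (sees earlier logs)
        files = retrieve(step_plan)           # 2. retrieval of relevant files
        patch = write_code(step_plan, files)  # 3. code writing
        result = run_code(patch)              # 4. code running
        state.history.append(result.log)
        if result.passed:
            return patch
    return None  # no passing patch within the iteration budget


def demo() -> Optional[str]:
    """Hypothetical stand-ins: the second attempt is the one that passes."""
    plan = lambda state: f"attempt {len(state.history) + 1}"
    retrieve = lambda step_plan: ["app.py"]
    write_code = lambda step_plan, files: f"{step_plan}: edit {files[0]}"
    run_code = lambda patch: RunResult(patch.startswith("attempt 2"), f"ran {patch}")
    return agentic_loop("example issue", plan, retrieve, write_code, run_code)


if __name__ == "__main__":
    print(demo())
```

In a real system each of the four callables would be backed by a model or a tool. The sketch only shows the control flow: a failed run feeds its log back into the next round of planning.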

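The self-improvement process described at the end of Segment 2 (run the current model, record where it fails, pair each failure with a correction, fold those pairs into the next round of training) can also be sketched as a toy. Everything here is an assumption for illustration: "fine-tuning" is replaced by a lookup table that memorizes corrections, and `CorrectionExample`, `collect_corrections`, and `run_rounds` are hypothetical names, not anything from Cosine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CorrectionExample:
    task: str
    failed_attempt: str
    corrected_solution: str


def collect_corrections(
    tasks: List[str],
    solve: Callable[[str], str],
    reference: Dict[str, str],
) -> List[CorrectionExample]:
    """Run the current model on every task; keep each mistake with its fix."""
    examples = []
    for task in tasks:
        attempt = solve(task)
        if attempt != reference[task]:
            examples.append(CorrectionExample(task, attempt, reference[task]))
    return examples


def run_rounds(tasks: List[str], reference: Dict[str, str], rounds: int) -> List[int]:
    """Toy stand-in for repeated fine-tuning.

    The 'model' is a lookup table that absorbs each round's corrections.
    Returns the number of mistakes observed in each round.
    """
    learned: Dict[str, str] = {}
    mistakes_per_round = []
    for _ in range(rounds):
        solve = lambda task: learned.get(task, "unknown")
        corrections = collect_corrections(tasks, solve, reference)
        mistakes_per_round.append(len(corrections))
        for example in corrections:  # "train" the next version on the fixes
            learned[example.task] = example.corrected_solution
    return mistakes_per_round


if __name__ == "__main__":
    reference = {"fix off-by-one": "patch-a", "handle None": "patch-b"}
    print(run_rounds(list(reference), reference, rounds=3))
```

A memorizing table fixes everything after a single round, which a real model would not. The point is only the shape of the loop: the corrections gathered from one version become training data for the next.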