Zapier Co-Founder on AI Agents and the Path to AGI | Mike Knoop (Zapier)
45:22

Peter Yang, December 1, 2024, 4,024 views, 92 likes, updated February 18, 2026
Video description
My guest today is Mike Knoop, co-founder of Zapier. Mike co-founded Zapier and started the ARC Prize to accelerate progress towards open AGI. We had a great chat about how Zapier works, how PMs and marketers can use AI to automate even more work, and why LLMs won’t reach AGI by themselves. Timestamps: (00:00) There are two types of AI automation (01:57) How Zapier automation works (05:14) I quit being Zapier's CPO to do AI research (07:54) I gave employees a week off to just use AI (11:57) How PMs can use AI agents to save time (14:03) We saved $100K by using AI agents in marketing (21:28) Robotic process automation might be better than APIs (26:30) Most definitions of AGI are wrong (30:45) Why LLMs won't get us to AGI and what might work instead Get the takeaways: https://creatoreconomy.so/p/zapier-co-founder-ai-agents-10x-productivity-mike-knoop Where to find Mike: X: https://x.com/mikeknoop Website: https://mikeknoop.com/ 📌 Subscribe to this channel – more interviews coming soon!

Contents (9 segments)

  1. 0:00 There are two types of AI automation (431 words)
  2. 1:57 How Zapier automation works (759 words)
  3. 5:14 I quit being Zapier's CPO to do AI research (658 words)
  4. 7:54 I gave employees a week off to just use AI (917 words)
  5. 11:57 How PMs can use AI agents to save time (482 words)
  6. 14:03 We saved $100K by using AI agents in marketing (1773 words)
  7. 21:28 Robotic process automation might be better than APIs (1136 words)
  8. 26:30 Most definitions of AGI are wrong (933 words)
  9. 30:45 Why LLMs won't get us to AGI and what might work instead (3289 words)
0:00

There are two types of AI automation

I think there are two broad categories of automation. One is API-based automation, and then there's another type called robotic process automation; RPA is the industry name. It's basically keyboard and mouse automation, if you want to think about them that way, and they have trade-offs, pros and cons. API-based automation is what Zapier does. It's very permission-oriented: you have to go get API keys. But it's also way more reliable. The API space is much smaller, so it's simpler to create automations over, and this is why Zapier, without any AI to date, was able to get as many users as we did. The downside is that the API is limited, and this is where RPA gets pretty exciting. Just due to the nature of the technology, without any AI it tends to be very hard to set up, because you literally have to say, "Follow my mouse and keyboard and copy this sequence of patterns." So I think that's why Zapier and the API-based approach have generally won the day so far. Now, importantly, I've described the world without AI. I do think this RPA-based future is a much better one if we can figure out how to get it to work, because you get anything that a human could do with a given piece of software. My belief today is that we need weak forms of AGI in order to get the reliability and the setup ease of use high enough on, let's call it, the RPA style of automation to bring consumer products to market on that. It's certainly within the realm of possibility now, but the full version I think we all want, one that can work with any app and learn anything I can give it...

All right, so my guest today is Mike Knoop, co-founder of Zapier. Mike led Zapier's push into AI agents that can automate daily tasks, and he also recently founded the ARC Prize to accelerate development of open AGI. So welcome, Mike.

Yeah, thank you, Peter. Excited to chat about AI here today.

Yeah,
so let's talk about Zapier first. For people who are not familiar, can you describe what Zapier is and why AI might be a great fit?

Yeah, Zapier is an automation platform.
1:57

How Zapier automation works

Our mission is to put automation in the hands of everyone, including non-technical folks: folks who don't know how to code, who aren't engineers or developers. In fact, the majority of Zapier users are line-of-business folks in marketing, sales, support, product management, and design, who learn and figure out how to use Zapier to automate workflows between a lot of the apps they use day to day. One of the things I started realizing a couple of years ago, as I was getting into AI, is that the promise of automation and the promise of AI are kind of the same thing. It's the desire for computers to just do more work for us, so we don't have to do the grunt work, the lame, grindy work that a lot of our jobs entail. Once I realized that, I thought, "Ah, that's both why our best users love Zapier and why people are so enamored with the promise and dreams of AI." It's the same thing. So I realized, oh yeah, Zapier is an AI company. It might not know it yet, but it is.

Yeah, and I wrote this whole blog post on Zapier because I really believe that after you learn how to use Claude and ChatGPT, Zapier is the next tool you should really learn how to use, because it can actually do work for you when you're not there. With ChatGPT I still have to talk to it all the time. Let's talk about how you led the push for Zapier AI, or just AI in general, inside the company in the first place. How do you get people to even start using AI stuff?

Yeah, so I guess the genesis goes back to my time in college. I did mechanical engineering, focused on finite element analysis and optimization. Imagine trying to optimize the structure of a race car, optimizing weight versus performance,
and it turned out, and I didn't know this at the time, that all the math you need in order to be really good at modeling in those types of software is linear algebra and statistics. It's the exact same math that underlies AI. So when I realized that, maybe in 2015 or 2016, I got into AI, but mostly as a curiosity. I had a day job at Zapier that was my main focus, but I paid attention to a lot of the developments in deep learning and the language model stuff. I had been paying attention to GPT-2, and then GPT-3 when it came out. Heck, I remember giving a whole presentation at an all-hands at Zapier on how cool GPT-3 was and how it worked, trying to get people to figure out use cases for it. I felt like I had a pretty good model of what these large language models could do and what they couldn't do. Then in January 2022 the original chain-of-thought paper, the "let's think step by step" paper, got published. I read that, and all these reasoning benchmarks that people had created to test language models were spiking based on this technique, from, I don't know, 30% accuracy up to 70 or 80% accuracy. That was my first "whoa" moment: hey, this stuff can do something that I didn't price in to what I thought it could do. And that has a really material impact, both from a Zapier customer perspective on automation and also just as a human. I'm curious: shoot, if we're on a path to artificial general intelligence on a short timeline, I want to know that, and I would like to do what I can to make that happen, because I think there's a lot of benefit from that. And so I
5:14

I quit being Zapier's CPO to do AI research

gave... I was still an exec at Zapier. I was running half the company, with all of product and engineering reporting to me. Fortunately, Zapier has multiple co-founders, and I was able to give it all back over to Wade. I said, "Wade, you've got to run this so that I can go just be an AI researcher. I think it's the most important thing I can do here at this point." Fortunately he said yes, and that's basically what I did for the next six to twelve months. Me and my co-founder Bryan, our CTO, just wrote code pretty much all day, trying to prototype with these language model products to get a sense of what they could do and what they couldn't do. The very first thing we identified, and remember this was summer 2022, so before ChatGPT even came out, was that tool use was one of the most important things Zapier could help out with, because Zapier has a lot of tools. You could think of all the integrations on Zapier, all the actions, as tools. And these language models were frozen in time, because their weights were frozen from a training perspective. So we started prototyping with tool use, and that's ultimately what led us to getting involved in the original OpenAI ChatGPT plugin launch. I think that really accelerated Zapier's awareness of AI, and from there it kind of snowballed: users were coming to us and telling us what they wanted to do. Now we've got AI pretty much across the entire product, and we have several dedicated AI products as well. It all started from that personal moment.

Got it. And how did you encourage the average employee at Zapier to actually use Zapier AI, or any other AI tool? How do you do it?

I was looking at the numbers. Over 50% of the company is using AI tools on a daily basis at Zapier,
and I've been interviewing a lot of executives, or at least I did this last year more than I've done it this year. When I was meeting a lot of executives on this AI stuff, a lot of them had a finger in the air: "I think a lot of people are using AI tools at our company; I'm not totally sure." I actually know for sure, because I can look at the usage of AI features on Zapier, and that is what I just cited. About 50% of Zapier employees use Zapier AI on the platform on a daily basis, so they have automations that AI is a part of, in the middle. So I can say that with some degree of confidence now. I think your question is still valid, though: how did you get there? One of the things we did, and this was in the week between when GPT-4 came out and the plugin store came out (there was a one-week window, if you remember back to March of 2023), was that Wade, Bryan, and I went to the company and said, "Hey, we're going to have a code red moment on AI. This is Zapier's future." I think we all saw it at that point: Zapier's an automation company, but, man, we're an AI company; we just aren't acting like it yet. And we went to
7:54

I gave employees a week off to just use AI

the company and said, "Code red. We don't expect everyone sees this yet, but we think this is going to happen, and if we want to have a chance to help our customers and users learn how to use these AI tools, build them into automation, and get a lot of business and productivity value from them, we need to be the first adopters." So we went to everyone and said, "Code red. We've got this upcoming hackathon, and we're going to change it." We had a week set for early April, I think, and we said, "We're not going to do that anymore. Instead, everyone in the company gets the week off, basically, from your day job, and all we want you to do is learn how to use AI tools. That's it. It doesn't need to be Zapier-related. Just go learn how to use these tools and figure them out, because in order to benefit our customers, we have to have the knowledge of how to use them ourselves." I can actually chart the adoption of usage on Zapier AI, and it's literally an S-curve where the steepest slope of the S goes right through the week where we had that hackathon. It really forced a lot of internal figuring out. These are things most companies adopting AI tools have had to figure out over the last 18 to 24 months: what's our security and privacy policy, how do we allow usage over private data, all the considerations folks went through. We went through them in a very focused, finite one-week period. We just got it all figured out, because we said this must happen now. I think that really accelerated our ability to build useful, hopefully useful, AI tools for our users and customers.

Yeah, I love that, because at most companies the security department is super risk-averse, and it takes a while for AI to get adopted. I mean, I think the reality is that all the employees are just using it.

It's existential opportunity, existential
risk, so I think when you're staring into the abyss you can get a lot done.

Yeah, got it, makes sense. I think more companies should probably think that way.

Actually, I think not enough probably do. With the advent of AGI, which we don't have yet, I think most folks still underestimate the degree of progress we will see in purely digital realms, the world of bits, and I think people still way overestimate the degree of progress we're going to see in the world of atoms once we have AGI. The first version of AGI is going to be information-constrained, so the question is: what are the systems where you can create, generate, or collect information fast? It's going to be in the world where you can model things digitally. With digital software, you're primarily rate-limited by CPU clock cycles, network bandwidth, things like that, whereas in the real world you're rate-limited by very different things: can I convince a venture capitalist to invest in my company, buy some equipment from overseas and ship it here, buy office space? There are just very different rate limiters in play, and I think that's one thing most folks don't realize or don't fully acknowledge yet, once we have weak and strong forms of AGI existing.

All right, Mike, let's take a quick tour of the different functions in tech and how they can use AI to really save a lot of time. Let's start with product managers. How can product managers use AI?

I'm actually going to give you the high-level, general form of how we're using it across Zapier, and then I'm going to ground it to be specific, because I think you're going to hear a similar answer across all of them, and it's worth getting the general answer. So I think the
most useful thing we've found internally for language models across Zapier, the thing the majority of folks are using them for, is parsing, handling, and working with unstructured text. Wherever your business is getting inbound data that's highly unstructured is, I think, one of the newest opportunities to use language models: analysis, feature extraction, summarization, drafting responses. These are the kinds of use cases that were not possible with Zapier four years ago; they're the unique use cases that AI can do. There are lots of things Zapier can already do where AI is just an alternative way to do them, but this handling of unstructured text is one of the really special new use cases. So you asked about the product
11:57

How PMs can use AI agents to save time

manager. One of the cool ones we had here was when we first launched this product called AI Actions, which is a developer toolkit that lets you plug the Zapier tool library into your own LLM products. When we first launched this thing, we had some evals that we had built for it, but we never really knew how good it was when we launched it. We launched it originally on just vibe checks, getting a beta out there, and it turned out the first version was actually not very good. In fact, I think the accuracy was something like 30% to 40%, which was really low. But we had scoring on it: thumbs up, thumbs down, with text boxes attached to every single transaction. We got feedback from users on whether this was a good translation of a user request or utterance into a tool or not. All that thumbs-up, thumbs-down data was just getting plumbed into a database table somewhere, maybe a Google Sheet or a Django admin, something like that. Then we built a Zap that picked up each transaction, each record, and looked over it nightly. We said, let's collect the last 24 hours of all of our feedback, and then we had an LLM summarize all of the issues we had collected over the last 24 hours into basically a priority and a summary, with the raw links attached as well. We dumped that into a Slack channel for the engineering and product management folks on this project, along with the overall eval score, because we had this eval running against all the feedback we were collecting. Over the course of about two to three months, based on that workflow, literally just looking at the feedback we were getting on a nightly basis and trying to react to it as best we could, we were able to get accuracy from somewhere around sub-50% all the way up to 80% or 90% in just
a few months. So yeah, that's a job that totally could be done by a human. It's just a really large data processing and understanding job that certainly was much easier, and I think happened with a higher degree of quality, because we were able to use this AI summarization step in the middle.

Yeah, and I guess the same AI summarization can be applied to, say, some arcane task like transcribing sales calls.

Oh yeah, I
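
The nightly feedback digest described in this segment can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Zapier's implementation: the `Feedback` record, the function names, and the `keyword_summary` stand-in for the LLM summarization step are all hypothetical, and posting the digest to Slack is left out.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Feedback:
    """One thumbs-up/thumbs-down rating (hypothetical record shape)."""
    created_at: datetime
    thumbs_up: bool
    comment: str
    link: str


def recent(records: list[Feedback], now: datetime, hours: int = 24) -> list[Feedback]:
    """Keep only the feedback collected in the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [r for r in records if r.created_at >= cutoff]


def accuracy(records: list[Feedback]) -> float:
    """Share of thumbs-up ratings; 0.0 when there is no feedback."""
    if not records:
        return 0.0
    return sum(r.thumbs_up for r in records) / len(records)


def keyword_summary(comments: list[str]) -> str:
    """Stand-in for the LLM step: the most common longer words in the comments."""
    words = Counter(w for c in comments for w in c.lower().split() if len(w) > 4)
    return ", ".join(w for w, _ in words.most_common(3)) or "no issues reported"


def build_digest(records: list[Feedback], now: datetime,
                 summarize: Callable[[list[str]], str]) -> str:
    """Build the text of one nightly digest: score, summary, raw links."""
    window = recent(records, now)
    complaints = [r for r in window if not r.thumbs_up]
    lines = [
        f"Feedback digest for the 24h ending {now:%Y-%m-%d %H:%M}",
        f"Score: {accuracy(window):.0%} thumbs-up over {len(window)} ratings",
        "Summary: " + summarize([r.comment for r in complaints]),
        "Raw links:",
    ]
    lines += [f"- {r.link}" for r in complaints]
    return "\n".join(lines)
```

In a real setup, `summarize` would call a language model and the returned text would be sent to the team channel; passing the summarizer in as a function keeps the scoring and windowing logic testable without any model.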
14:03

We saved $100K by using AI agents in marketing

got to tell you about our go-to-market stuff here, because it's really cool. This is probably the one spot where we're making the most direct dollars from AI, on the go-to-market side. We use HubSpot, probably no surprise there, as our CRM. We also use Gong, which is a sales call transcription tool. The process of old was that sales reps would wrap up their calls and spend the next 10 to 15 minutes reviewing their notes, the transcript, or their memory, and trying to extract that into custom fields we had set up in HubSpot that we thought were important to know about: how likely this particular lead is to buy, what their blockers are, what information we need to get them, what the next steps are, all these kinds of meta signals. Every morning our sales managers wake up and look over our HubSpot funnel, trying to answer, "Who are the customers I should go spend time with today?" They're trying to find the folks who are most likely to purchase and have the easiest blockers to resolve. You want to book the revenue that you can book, generally, in a sales funnel. So this process is dependent on the quality of the data you actually get into HubSpot. Unfortunately, this manual transcription job, the 10-to-15-minute wrap-up documentation that every sales rep has to do after every call, is one of the least favorite things that sales reps do, because it's the most error-prone and the most grindy. You want to be talking to people. That's the fun part of the job: figuring out how to solve their problems, not transcribing junk into a database table. So we've got this whole process now where every time a Gong sales call wraps up, we automatically get the transcript coming off of the Gong side and feed it into some LLM steps where we do feature
extraction over it. We grab out lead name, email, company, propensity to buy, blockers, what integrations and apps they care about, who else matters on the team, all these signals, and automatically fill them into HubSpot. Sales reps don't have to spend any time after calls manually entering that data, and our sales managers in the morning get a much higher-quality signal and an up-to-date, accurate view of the funnel to go mine. So you've probably given sales reps back maybe 15% of their time to actually spend with customers, and you're also booking additional revenue, because our sales reps actually know which of the leads are the right people to spend time with.

That's crazy. So you probably saved six figures, seven figures, just by doing this?

At one point I calculated it last year. I haven't gone back and updated the number yet, but yeah, it was hundreds of thousands of dollars in additional revenue with this setup.

Yeah, I totally agree, man. I think that's my number one use case: just copying and pasting even long internal Slack threads. I don't want to read all this. It's like, "Hey, why did this person tag me on this 15-branch Slack thread? Tell me what's going on." It's really good for synthesizing everything.

Yeah, it comes back to the other point: we all just want computers to do the grindy work for us so we don't have to. I think that's the promise of this stuff.

Do you also use it personally? What tools do you use?

Yeah, the main thing I use a lot of our AI tools for, across ChatGPT and Claude and some Zapier tools built in here, is learning new things. I find that these large language models are an extraordinary learning and teaching tool, particularly for learning a new codebase. For example, I was picking up JAX this past summer, which is a
sort of flavor of PyTorch. I don't know how much detail you care to know about this, but suffice it to say it's a very different way to think about how to do machine learning and train models, and it requires a mental shift. I didn't get it at first. I knew the promise of it, but it's a completely different API surface area, and it has a lot of quirks and catches that don't exist in the underlying PyTorch-style libraries. So I spent a lot of time with tools like ChatGPT and Claude to get a handle on the API shape and space and understand the gotchas. Now, it can't actually write the code for me. I think you're going to see this in every case: if all you want is code that matches a tutorial that's somewhere on the internet, you'll be able to get that, but as soon as you step one degree of complication away from it, you're going to have to start being more skeptical about whether it's going to be valid or not. In all my cases it never was, but it gave me the right style, sense, and shape, and helped me understand, "Okay, here's the next important question to ask," and get to the frontier of knowledge there. That's probably my personal number one use case for all these AI systems and tools: accelerating my learning in domains that I'm unfamiliar with.

And do you just have a conversation with it, or do you paste in the documentation, or just chat?

For me personally it's more targeted than that, more surgical. I have a very narrow, specific question. I'm generally going to read documentation directly from somewhere online, but once I've sat down and read parts of the documentation and I have a project plan in my head of what I'm actually trying to do, I will then have a very targeted initial
question, like, "Show me an example of how to do this." That initial step is going from zero to one. I find it's much easier to work within an existing codebase when you're learning than it is to write zero-to-one code. Writing zero-to-one code requires a really quite deep, fundamental understanding of what you're doing and the programming toolkit you're working with, and if you don't have that, it's much easier to work within an existing codebase. So I think you can accelerate your initial productivity by letting these things at least go zero to one. Even if it's not quite there, maybe three quarters of the way to one, you can use it as a starting point to figure out, "Okay, how do I get from that to one?" Along the way you're learning some of the key concepts and getting familiar with the syntax and shape of the API libraries. It gives you starting spots.

Yeah, makes sense. It's much easier to edit a draft blog post than to write from scratch. It's the same kind of principle.

Yeah. For what it's worth, I'm very different from my co-founder Bryan. He's a very prolific user of, you know, Cursor, Visual Studio, completion, using language models to do autocomplete and stuff like that. He has a very different workflow than I do in terms of how to leverage language models for programming and coding. I've tried to do his and it hasn't worked as well; he's tried to do mine and it hasn't either. So I think there are some different personal preferences in how people learn and work, and they find different tools. It's worth experimenting with different patterns.

I mean, that's also one of the big benefits of AI: you can give really personalized education based on your preferences. It's not like a classroom where everyone has to learn
the same thing.

Yeah, figure out what works for you. You kind of have to have tried enough stuff to know, "Okay, this is what I like." But yeah, I definitely agree with that statement.

Yeah, I was at a panel yesterday with Kevin and Mike from Anthropic and OpenAI, and there was a really good quote they said about how AI is already intelligent, but you've got to teach it how to do things. It's already smart; you've just got to give it the right instructions so it can actually do stuff for you.

Yeah, we can debate that definition, which we should, but I'll accept it at face value for the purpose of this conversation.

Okay, all right, cool. Let's talk about AI agents. Like you said, the promise is to just get AI to order a pizza for me, or get it to do stuff for me without me having to be there. Do you think that's almost here? I mean, in some ways Zapier is kind of like a mini AI agent, but how close are we to AI agents that actually do stuff without supervision, that do a lot more complicated tasks, I guess?

Yeah, let me give an overview, because I
21:28

Robotic process automation might be better than APIs

think it's kind of interesting to consider in the agent world. Or not "agents"; that's a very overloaded term. Even at Zapier, I think back in 2023, I said, "You can't use the word agent anymore; you have to be more specific," because everyone was using it and had a different context loaded up. So let me talk about it from an automation standpoint. I think there are two broad categories of automation that exist in the world. One is API-based automation. This is what Zapier does. There are quite a few competitors to Zapier at this point, but I'll categorize it as API-based automation. Then there's another type of automation called robotic process automation; RPA is the industry name. It's basically keyboard and mouse automation. "Computer use," if you want, is a more friendly name that Anthropic came up with recently, I think, for the Claude model. But these are very distinct types of agent automation, if you want to think about them that way, and they have trade-offs, pros and cons. API-based automation, which is what Zapier does, is very permission-oriented. You have to go get API keys, and you're working with the API that third-party vendors have given you, where they've said, "Hey, here's how we want robots to work with our service." So it's very upfront: there's a partnership agreement, you get API keys. But it's also way more reliable. The API space is much smaller, so it's simpler to create automations over, and this is why Zapier, without any AI to date, or up until 2022 or 2023, was able to get as many users as we did. Because the shape of the API is relatively small, we can build it into some pretty nice UX, I think, to let non-technical users work with it. The downside is that the API is limited. The APIs you get from, say, Salesforce or HubSpot or Slack are not perfect APIs to everything
you can do with those products. They're actually narrow, and this is where RPA gets pretty exciting. RPA historically has been quite fragile and hard to set up. UiPath is an example of one of the more famous RPA companies, I think, and the primary audience they sell to is legacy IT and engineering organizations who are trying to automate, you know, COBOL code from the '90s that's running on a mainframe server somewhere that no one wants to touch because no one knows how it works anymore. It's easier to slap a UI automation layer on top and literally have software drive a mouse and keyboard around the interface and click through this legacy software than to try to re-engineer it in a more modern language or framework. So just due to the nature of the technology, without any AI, it tends to be very hard to set up, because you literally have to say, "Follow my mouse and keyboard and copy this sequence of patterns." It also tends to be very fragile. Imagine automating a Windows box or a Mac laptop: what happens when you close the lid? Okay, well, it stops working. Or what happens if you get a system permission pop-up at some point, or it says, "Hey, you need to update your software"? Generally those RPA-based automations are very fragile and they're just going to break. So it's generally an order of magnitude harder to set RPA up than API automation, and an order of magnitude harder to maintain it compared to API-based automation. I think that's why Zapier and the API-based approach have generally won the day so far. Importantly, I've described the world without AI. I do think this RPA-based future is a much better one if we can figure
out how to get it to work, because you get the full interface. Anything that a human could do with a given piece of software, whether it's in a browser, on a desktop, or on your phone, anything humans can do is actually part of the API, or the interface, basically, that you could have it operate over. Whereas with the API-based approach, it's very permission-oriented, and you're never going to get the full version of what you can do. So I think that's kind of the world today, and that's absolutely the future. My belief today is that we need weak forms of AGI in order to get the reliability and the setup ease of use high enough on, let's call it, the RPA style of automation to bring consumer products to market on that. You could look at Rabbit, that product that came out last year, the little orange thing. I think that actually was a consumer RPA product. They were trying to create a point-and-click automation layer over Android apps running on a remote server. It's just a really hard problem when language models still require a ton of domain-specific prompting. They're not truly generalizable, because they don't truly understand any of the underlying concepts they're working with; they're just pattern matching. To the degree that you can engineer pattern matching, even in a somewhat [unclear] domain, it can work, but it requires a lot of effort. It's certainly within the realm of possibility now, but the full version I think we all want, one that can work with any app and learn anything I can give it... well, frankly, learning is not one of language models' specialties today. The current AI is actually quite bad at learning new skills.

We talked about AI. Now,
26:30

Most definitions of AGI are wrong

when you say weak AI, like, what do you mean? Like, have you tried the computer use thing from Claude?

Yeah, so I think definitions are actually really quite important here. I think the two most commonly accepted definitions of AGI are these. One is "I know it when I see it": I'm not even going to worry about defining it because it's undefinable in some way, or it's not worth debating. I sort of disagree with this. Definitions are important because we can turn them into benchmarks, and with benchmarks we can measure progress. So I actually think definitions are useful. Now, one definition that is used quite often right now in the AI industry is the one that OpenAI and Microsoft put in their contract from a couple of years back, where they said AGI is a system that can outperform humans at the majority of economically useful work, or something to the thrust of this. And I also think this is not a great definition of AGI, because I think it is actually one that narrow AI systems can meet today without actually having AGI. In fact, there was an article a couple of weeks ago where it was reported that OpenAI was trying to renegotiate that contract by, you know, threatening to claim they already had AGI. The thing we have today is undefinable in that sense, so it's not a very good definition for a contract, because you can accomplish that goal of automating a lot of economically useful work without actually having general intelligence. I think you can do it purely through the language model, you know, memorization or near-time generalization regime that you get from them. So I'll cut to the answer here. I think the right definition of AGI is one that François Chollet coined back in 2019 in his paper "On the Measure of Intelligence," and it is a system that can efficiently learn a new skill and apply it to new, open-ended problems. So efficiency is a really important part of the definition, and this
idea of learning new skills is a really important one. And here's the thought experiment to ground this. So, AI versus AGI: we've had AI systems for years now that can outperform humans, per the OpenAI definition, in games of skill like Go or chess or poker. And how do researchers and engineers accomplish this? Well, they have to start over from zero every single time, right? They have to go invent new architectures, invent new search algorithms; often they have to go collect new data; they sometimes have to reach new levels of scale in order to accomplish those feats. The important thing is that every time, they're starting over in each domain. And this is in direct contrast to you, Peter, right? I mean, I could sit you down here and probably teach you a new card game in a few hours, maybe a new board game in a few days, and get you up to human-level proficiency at it. And that, I think, is really the hallmark of what makes you generally intelligent: your ability to rapidly and efficiently learn that new skill, and learn any new skill. It's not just card games or board games; it's any new skill I could sit you down and teach you, and you'd be able to pick it up quite quickly and apply it towards new problems that you had never encountered before in your life. And that is what no one knows how to get computers to do today. That is AGI. And so when I say weak forms of AGI, I think the first forms of AGI we're going to see are likely going to be quite inefficient. It will be on the spectrum of efficiency of learning a new skill and applying it, but I think it's going to be very weak. It may take days to learn new tasks, it may be very slow, it may still need a large degree of training data relative to what humans need. But I suspect the spectrum of efficiency is going to be the one that gets better and better over time, and it's probably the main gradient of how we'll get from weak AGI up to a stronger AGI.

Yeah, you make a good
point. Like, yeah, with the game of Go, it was just trained on like millions of Go games or something, right? There's also a lot of search. This was how chess was beat in the 90s, right? It's largely a search algorithm; that was how those games initially got defeated. There was a paper that came out just a few weeks ago, I think maybe from Google or someone, who trained a really large AI model on pure pre-training alone that was able to beat chess. So I think this fits within the same regime as how large language models work, which is the paradigm of deep learning. Even more fundamentally than just language models, we improve system accuracy by showing them more situations and more answers, as opposed to, that's right, the orthogonal dimension of actually measuring their ability to learn that skill with efficiency and apply it towards these open-ended problems. So you
30:45

Why LLMs won't get us to AGI and what might work instead

have this ARC Prize thing going on. You went to many different schools, and I went to the website, and the tagline is "AGI progress has stalled. New ideas are needed." Can you explain why you think AGI progress has stalled, or open AGI progress?

Yeah, so I think this claim was accurate coming into the summer, and I think that we might not be in this regime anymore, and there's a bunch of reasons why. So I think we're going to need to update that for ARC Prize 2025. But I do think it was an accurate claim coming into the summer, and the reason is that, coming into the summer, basically the largest mainline system that we could use was GPT-4o, right? And this is just a large-scale pre-trained language model doing a pass through the network weights and generating tokens on the other side. You're basically scaling up the same regime going all the way back to GPT-2, which came out in 2019, actually around the same time that François Chollet published the "On the Measure of Intelligence" paper, which is where the benchmark that ARC Prize is testing against was also introduced. And so the claim is that we really haven't seen an improvement in generalization power since 2019, at best, probably going back to 2017 with the invention, or discovery, of the transformer. I think the claim would be that these systems have had the same amount of generalization power due to their fundamental architecture. We've improved their accuracy on benchmarks by giving them more training data, but the generalization power has been fixed and constant and quite low. And that is going all the way up to the summer, when we launched ARC Prize, and a couple of things happened. We launched ARC Prize; there are, I think, four or five startups that I'm aware of that are all now working on research directions completely orthogonal to deep learning to try and beat ARC Prize, and they're
starting to make pretty impressive progress along these directions. In fact, I would make the claim that the top approaches on both the private and the public leaderboard for ARC Prize are just not pure models. They are models plus some extra search, or something interesting, something distinct and different from the deep learning paradigm, that's allowing them to get the results they get. And this is quite similar to o1 now, actually, from OpenAI, right? They kind of made this big claim when they released it that o1 is not just another large language model; in fact, it's a new paradigm of scaling towards AGI. And so we actually tested it on ARC Prize and it got about 20%, 21%. It's on par with Claude 3.5 Sonnet. This is about half or so of the state of the art, actually, still, on ARC. But despite its kind of middling score, I do believe that o1 is truly the biggest improvement in generalization power we've seen in a commercial model going all the way back to GPT-2. This idea of being able to do test-time search, where they're allowing the model to not just generate one chain of thought but multiple, and search over them, do backtracking, allows these models to more fully navigate the situation space around the prompt that the user gave. Now, I think there's a debate of, well, is it AGI? And I would claim no. I think there's one thing that's really missing from it that conceptually limits it from reaching AGI, at least by the definition I just shared, which is that it's still operating over the pre-training distribution that it was fundamentally trained on. The way they did this was they went and generated lots of synthetic chains of thought over formal domains like math and code and programming and things like that. There was likely informal pre-training as well, things maybe like goal and sub-goal breakdowns that got scored by humans. They use that signal as a reward signal for the
pre-train, but you're still fundamentally limited by what data got put into the pre-train here. And I think, in order to at least conceptually relax the constraints on your architecture sufficiently, you need both some form of test-time search, like what o1 is doing and like what a lot of the top scores on ARC Prize are doing, and the ability to incorporate information from test time back into your model and carry it forward, right? You need the ability to make contact with reality and learn from it, instead of having to learn from it, sort of, every single time. And this is something I think AlphaProof from DeepMind actually does. They have a combination of both test-time fine-tuning plus this search mechanism at inference time. And so all of this landed this summer. We got o1, we got AlphaProof, we've got a ton of frontier approaches now that are way more than just language models sort of at the top of the ARC Prize leaderboard. So I think it's actually a really exciting time in the field of generality research. I think ARC Prize has hopefully played a small role in helping catalyze some of that to happen as well.

That's amazing, man. What are some companies that we should check out that are part of ARC Prize? You said there's like a couple of companies that are leading this research.

The top of the leaderboard, they're worth talking about, probably. The ones that I know I can talk about: the top of the public leaderboard is actually this guy, his name is Ryan Greenblatt, with a really interesting approach where he's using GPT-4o. The public leaderboard doesn't have the prize money attached, but it lets you use commercial models like OpenAI's. He's using GPT-4o, prompts it for like 2,000 Python programs for each puzzle, and then he searches over the Python programs to look for ones that match the puzzles, and then he applies it. And then the other team, on the private leaderboard, the one that's actually in the lead
right now for the prize money, is this team called MindsAI. It's actually a super team: this was a team of three people that were individually working on ARC for the last couple of years separately, and after we launched ARC Prize they joined forces into one team to pull together all their insights and thinking, and they've actually made a ton of progress this summer from doing that. And they're doing something quite different to Ryan. They're doing this test-time fine-tuning thing, where they pre-train a, I think it's like a "Coen" [unclear] model from Salesforce, a very small model, 200 million parameters. They pre-train it on a bunch of ARC puzzles, and then inside Kaggle, which is the competition platform we're using, they fine-tune their pre-trained model with the private puzzles that they get exposed to. And so their system has found a way to make contact with that reality and incorporate some understanding of reality back into the model, into the distribution, for when they're prompting it. And they're doing, of course, a lot more bells and whistles than just that, but that's the thrust of what they're doing. And I should mention, the goal of the contest is that in order to win any of the prize money, you have to open source your code. And so by the end of this year we expect to have new open source code from at least five, maybe seven, different teams that are working on ARC, and hopefully use that to re-baseline everyone working on it going into next year, to make it more likely we can figure this thing out.

Yeah, it's really impressive to see just some random person like Ryan beating out o1-preview and some of these big models.

That's great. Ryan is a fantastic prompt engineer. He's probably one of the best people in the world at figuring out how to get language models to do interesting things.

And are these college students, or...?

No, they're, like... he works at, like, a
research firm in SF, and then the MindsAI folks all have pretty different backgrounds. I think one of them has a psychology background, you know, not at a big AI tech lab scaling the next trillion-parameter model or something like that. They're all folks that have pretty interesting, unique journeys.

Yeah, so that's a great segue into, you know, how can really smart kids, or people who just want to switch careers, get ahead of this AGI thing that's coming, or actually have a career? Like, should I learn how to fix toilets, or what?

Yeah, you know, I think I'll take a third tack. If you're interested in AI, it's easier to answer this question, I think, for someone who's kind of interested in getting into AI in the first place. One of the things that I spent a lot of time doing over the last year or so with ARC Prize was talking with college students, and one of the more disappointing things that I sometimes heard, and this was more true going into the summer, was like, "Man, it feels like AI is already all figured out, and I don't really want to go work on it because there's nothing else I can contribute. We're just scaling this thing up, we're just going to reach AGI." You know, almost this mindset of defeatism, right? And on one hand I don't blame them, because they're getting beaten over the head by this narrative from all these major hyperscalers: "Oh, no, we've got it all figured out, don't compete with us, we're just going to scale our way up, you need data centers with nuclear power to be able to build this stuff." And, yeah, sure, if that's true, okay, I get it. But I think things like ARC, the benchmark, demonstrate that there are tasks that are extraordinarily easy and simple and straightforward for humans to do that AI cannot do. And it's not a random happenstance that this ARC benchmark just isn't beaten yet. It's because it was
fundamentally designed to subvert the very thing that deep learning is good at, which is this memorization regime. And so, sure, while we made progress on ARC Prize this year, we're up 20% now from the start of the contest in June, we don't have the core ideas yet to get to the Grand Prize solution. So I think there is still a lot of fertile ground for individual people, students, researchers, early graduates, people that just want to try and make an impact. We are still idea-bound and conceptually constrained on getting to AGI. Again, no one in the world knows how to build a computer that can efficiently learn new skills and solve open-ended problems that it has not been trained on. So if that's an exciting problem for you and you want to get into it, I guess I would just say jump in, because, yeah, it's not all figured out yet.

Yeah, I'm really glad you're out there telling that story, because I think the general public just thinks, you know, you've got to buy a ton of Nvidia chips and you've got to spend a lot of money to even compete here.

Yeah, I think if you zoom out enough on the public consciousness, progress is progress, right? You just zoom out: I might check this thing out once every six months, maybe I use it, maybe I don't, it hasn't had a sort of material impact on my day-to-day life, but businesses seem to use it. That's kind of the vibe check you get on this stuff. And on that side, I don't know that the technical details on the architecture and the benchmarks really matter that much. I think if you zoom out on a long enough time horizon, this stuff's going to just feel like it's going to keep getting better and better, as long as we actually keep focusing on the right problems to solve, though. And I think that's the point that ARC is trying to make: that most of the industry is
oriented in the wrong direction. At least it has been; I think we're finally starting to see some course correction here. But over the last three years, four years, of scaling a particular paradigm around deep learning, I mean, it's really over-rotated. In 2023 there was, I don't know, something like 200 billion, by my count, invested into language model startups and maybe like a hundred million into AGI startups doing new stuff. It was like a 100-to-1 attention and funding disparity ratio. And so I think that's kind of the reaction we're trying to have: hey, we want ARC Prize to be a North Star for AGI and have it always be pointing in the right research direction towards it, to increase the chance that we actually get this thing. That's one of my final beliefs: we should seek to try and accelerate to this discovery. It's one of the most important ones I think we can ever get to. And it's just kind of frustrating to see, you know, man, we have this counterexample that's really important, and most of the world isn't aware of it, and we're all over-rotated on an approach that's never going to solve it. And that's, I think, the core mission of ARC Prize.

Do you think OpenAI and some of these other companies are doing the same thing internally? Because they're trying to reach AGI too, right? I think they have the same approach.

Same approach in what sense?

Like, they're not just trying to scale the existing LLM, they're trying to...

I know this definitively. I mean, you can just go read Noam Brown's tweets, the head of reasoning and planning at OpenAI. Yeah, Noam's an interesting fellow, because he was one of the people who is probably most individually responsible for building those really great AI systems that can outperform humans. I think he started out on poker, he might have done a couple of other games, he did Diplomacy at Meta when he was there. And I was
listening to a podcast with him, I think it was last year, before he joined OpenAI, and he had this quote, to paraphrase, which was like, "I don't think games of skill are interesting anymore for AI. It's a solved problem. What we don't know how to do is this other, general form of reasoning and working over many different domains." And now he's working as the head of reasoning and planning at OpenAI. So I know for sure that a lot of the labs are trying to figure this out. I think DeepMind's another pretty impressive lab. I think AlphaProof is maybe one of the systems in the world today that's closest to AGI, just in terms of its ability to do search plus test-time fine-tuning. It's still limited to a formal domain, with Lean, the math theorem prover. But it's kind of funny: AlphaProof and o1 have very different pros and cons, right? With AlphaProof, they figured out some really important things around generality, but it's very constricted to a formal domain of reasoning with Lean, whereas o1 works over unstructured text but maybe is not quite as good at doing the formal reasoning. And so they're both trying to get to each other's spot today. So I do think there's certainly a lot of frontier research on all these. At the spear tip of research, I think we're past "oh, we're just going to go train a model bigger than GPT-4."

That's great. So just to close on this: when you say just jump in, what do you mean? Like, just start contributing to ARC Prize, or is there...?

Not to talk my own book too much here, but yeah, I think ARC is not a bad place to get started if you are curious. Here's the frame for what this benchmark looks like: it has resisted every single language model that has been invented over the last five years. Remember, this
benchmark was created in 2019, before LLMs even existed. It was created as a response to deep learning, not LLMs, and it has resisted every amount of scale that we've been able to throw at it with LLMs. If that just makes you a little bit curious, if your curiosity is sparked and you're like, "Yeah, what is that benchmark?", go to arcprize.org. There are some puzzles from the benchmark there; it's a very visual benchmark, and you can do them as a human to get a feel for the data set. Go check it out. You can try some of these, and I think you might be surprised at the degree of simplicity and straightforwardness of the actual tasks that are in the benchmark. And if you're curious to go further, there's a really good technical website to get started: you can read about the benchmark, and there's even some open source code you can start running on Kaggle and on your own machine to get started in the contest.

Awesome. Well, I'm seeing this graph, and it's good that in this area humans are still way more advanced than any kind of AI system so far. But yeah, hopefully if ARC Prize succeeds, AI will catch up quickly.

Yeah.

Cool. Well, thanks so much, Mike. This was a super awesome conversation.

Yeah, thanks for having me here. It's always exciting to chat about this stuff.
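
The generate-and-filter idea Mike attributes to Ryan Greenblatt earlier in this segment (prompt a model for many candidate Python programs per puzzle, keep one that reproduces the puzzle's example pairs, then apply it to the test input) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not Ryan's actual code: the hard-coded `CANDIDATE_SOURCES` list is a hypothetical stand-in for the thousands of programs a language model would be sampled for, the tiny grids are made up, and the names `transform`, `compile_candidate`, `fits_examples`, and `solve` are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)

# Hypothetical stand-ins for programs a language model might propose.
# A real system would sample these from a model rather than hard-code them.
CANDIDATE_SOURCES = [
    "def transform(grid):\n    return grid",                                # identity
    "def transform(grid):\n    return [row[::-1] for row in grid]",         # mirror left-right
    "def transform(grid):\n    return [list(col) for col in zip(*grid)]",   # transpose
    "def transform(grid):\n    return grid[::-1]",                          # flip top-bottom
    "def transform(grid):\n    return grid[0][5]",                          # buggy candidate
]


def compile_candidate(source: str) -> Optional[Callable[[Grid], Grid]]:
    """Turn candidate source text into a callable, or None if it does not load.

    Note: exec on untrusted, model-written code is unsafe outside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception:
        return None
    fn = namespace.get("transform")
    return fn if callable(fn) else None


def fits_examples(fn: Callable[[Grid], Grid], examples: List[Example]) -> bool:
    """True only if the candidate reproduces every example output exactly."""
    for grid_in, grid_out in examples:
        try:
            # Pass a copy so a candidate cannot mutate the example in place.
            if fn([row[:] for row in grid_in]) != grid_out:
                return False
        except Exception:
            return False  # a crashing candidate is simply rejected
    return True


def solve(sources: List[str], examples: List[Example], test_input: Grid) -> Optional[Grid]:
    """Search the candidates and apply the first one consistent with the examples."""
    for source in sources:
        fn = compile_candidate(source)
        if fn is not None and fits_examples(fn, examples):
            return fn(test_input)
    return None  # no candidate explained the examples


# Made-up puzzle: every example output is the input mirrored left to right.
train = [
    ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
    ([[5, 0, 0]], [[0, 0, 5]]),
]

print(solve(CANDIDATE_SOURCES, train, [[7, 8, 9]]))  # prints [[9, 8, 7]]
```

The point of the sketch is the division of labor: the model only has to propose plausible programs, and the example pairs act as an exact checker that discards everything that does not fit, which is the "model plus search" pattern discussed in this segment.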
