Ben Firshman CEO of Replicate on Building Community, Open Source, and Navigating the AI Industry

AssemblyAI · 17.10.2024 · 1,111 views · 32 likes


Video description
In our third installment of Assembly Required, a series of candid conversations with AI founders sharing insights and learnings, AssemblyAI CEO Dylan Fox sits down with Replicate CEO Ben Firshman to recount his experiences building a machine learning research community during a time of AI transformation, and what he's learned from the journey. Back when AI was more commonly referred to as machine learning, and long before AI was integrated into almost every product we use, Ben Firshman was thinking about ways to make it easier for researchers and software developers to share their findings and build with machine learning and AI. His experience at companies like Docker, as well as prior frustrations with having to piece together findings from papers to rebuild machine learning models from scratch, spurred him to start a research community for machine learning researchers and developers. The community would be an all-in-one place for researchers and developers to "put their work inside a box" and more easily share their machine learning (or AI) models with other developers in the space, so they could continue to tinker with and build on those models. This idea became the foundation for Replicate, an open-source community for developers trying to "replicate" machine learning models. To help achieve this goal, the models shared on the Replicate platform had to be packaged in a way that let other developers implement them for real-world tasks without having to rebuild them from the initial research paper, saving time, unlocking innovation, and allowing more opportunities for collaboration. Since founding the company in 2019, Ben has expanded Replicate to match its users' needs, hosting thousands of models contributed by the community and building a tool suite to further support the open-source community as the complexity of, and demand for, AI products has skyrocketed.
To read more on Replicate's story, visit: https://www.assemblyai.com/assembly-required/assemblyai-replicate

0:00 - Replicate's founding story, Ben's background, and the growth of Replicate's developer and research community
6:13 - Early use cases for stable diffusion models and initial products built in the Replicate community
8:55 - The introduction of Llama and open source LLMs and how that changed the community's ability to build
10:28 - Top products that are being built and finding traction and success in the Replicate community
14:04 - What level of heuristics, customization, and duct tape are needed to make different models work for your product needs
22:02 - Difference between building prototypes and production-ready AI products
25:43 - Replicate CEO's biggest surprise over the last 2 years building in AI

▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: https://www.assemblyai.com
🐦 Twitter: https://twitter.com/AssemblyAI
🦾 Discord: https://assemblyai.com/discord
▶️ Subscribe: https://www.youtube.com/c/AssemblyAI?sub_confirmation=1
🔥 We're hiring! Check our open roles: https://www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
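The "put their work inside a box" idea the description refers to means a published model can be invoked through a uniform API without touching its weights or dependencies. As a rough illustration only, here is how a text-to-image model on Replicate might be called with the official `replicate` Python client; the model reference, input fields, and defaults below are illustrative assumptions, not a pinned real version:

```python
# Sketch of calling a community-packaged model through Replicate's API.
# Requires `pip install replicate` and a REPLICATE_API_TOKEN environment
# variable; the model identifier and input schema here are illustrative.
import os


def build_input(prompt: str, width: int = 512, height: int = 512) -> dict:
    """Assemble the JSON-shaped input a Stable Diffusion-style model expects."""
    return {"prompt": prompt, "width": width, "height": height}


def generate(prompt: str):
    # Deferred import so the payload helper above works without the client.
    import replicate

    return replicate.run(
        "stability-ai/stable-diffusion",  # hypothetical model reference
        input=build_input(prompt),
    )


if __name__ == "__main__":
    if os.environ.get("REPLICATE_API_TOKEN"):
        print(generate("an astronaut riding a horse, watercolor"))
    else:
        # Without credentials, just show the payload that would be sent.
        print(build_input("an astronaut riding a horse, watercolor"))
```

Because the heavy lifting (weights, CUDA, system dependencies) lives inside the container on the hosting side, the caller only ever deals with this small JSON input and the returned output URLs.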

Table of contents (7 segments)

Replicate’s founding story, Ben’s background, and the growth of Replicate’s developer and research community

Ben, I'm really excited to sit down and chat with you. We met, I think, around 18 to 24 months ago, just as this current AI wave was starting to explode and kick off. This was before your Series A; I think you were raising it when we met, and you said, "We're about to close it, don't tell anyone yet." Since then, Replicate has become a very popular, almost household name within the developer community, and I'm really excited to sit down with you and chat about what the last 18 to 24 months have looked like from your point of view. You have a really unique perspective: a lot of AI products are being built with Replicate, so the insights you've been able to gather over the last two years will be really interesting for people who are just starting to build with AI. To start, though, I'd love to hear your founding story with Replicate. You started this company before this AI wave, so I'd love to learn what you saw and how you predicted all this was coming. It'd be great to start there.

Yeah, for sure. It's a very roundabout story; I quit my last job about seven years ago, so it's been a long journey, but I'll give a somewhat paraphrased version. It came primarily from an old friend of mine, Andreas, who was a machine learning researcher and engineer at Spotify. At the time it was called machine learning rather than AI, and all of the advances were being published as academic papers. A lot of his job was taking these new advances in machine learning and applying them to real things inside Spotify. What you'd have to do is take an academic paper, which described a machine learning model in prose and math, and try to turn it into real production software, and it was just a nightmare, because not all the details were in there and it was really hard to actually get it working efficiently and in production. The sort of tragedy of this is that at some point a researcher actually had a running piece of software; at some point they compressed it down to prose, and then Andreas had to decompress it back into software again.

I used to work at Docker, doing product there; I created Docker Compose, which is one of their core tools. Through conversations with Andreas, I connected the problem he was seeing back to the things we were doing at Docker, because at Docker we had sort of solved that problem for normal software by telling software engineers to just put their work inside this metaphorical box, inside this container. Then you would know that this piece of software would run on other people's machines, in test environments, and on all sorts of different clouds. So we took that metaphorical container (and literal container: it is powered by containers behind the scenes) to machine learning, and we thought, okay, what if we can get machine learning researchers to put their work inside a box, such that they could share it with other researchers and it could run in production on various clouds? That's where the initial idea for Replicate came about.

We initially started with the research community. We created this place where researchers could publish their work, packaged up in a way such that people could also use these models for real-world tasks without having to reimplement the paper. While we were doing this, we noticed a really interesting community of people who were building models. What they were doing, I think, was trying to replicate DALL·E; that's where the name comes from, actually: we want to make research reproducible. They were taking CLIP, a model that OpenAI open-sourced, and smushing it together with a GAN to try to create images that matched a piece of text. What was really interesting about this community is that they were doing it all in Colabs, which was very different from the research community; they were much more like open-source hackers. They were just tinkering around in Colabs, sharing them on Twitter, Reddit, and Discord, and riffing off each other's ideas: oh, what if I swap out this BigGAN with VQGAN? What if I tweak these parameters to see if you can get a better result? People were just forking all these things like crazy, and that's where this really interesting initial text-to-image community came from.

We saw this and thought it was really interesting. At the time it was really low quality; the images took something like 15 minutes to generate, and it was more like art than a crisp image of something, but that's almost what made it interesting. We built tools for this community, too: these Colabs were very hard to use and unreliable, so we made a way to turn them into a really nice web form that you could call with an API and integrate into products. We worked with this community and built tools for it, and then Stable Diffusion happened. That's when these text-to-image models really reached the masses, and we were positioned perfectly as the place where these models were being published, and where people were tinkering with them and making variations of them as well, which was really the interesting bit about the open-source community. Right at that time, loads of people wanted to build products out of these models: image editors, AI avatar generators, generative games, and all those kinds of things, just around the time of ChatGPT, when there was all this interest. It was this sort of perfect storm of supply and demand that helped us grow. So, late 2022,

Early use cases for stable diffusion models and initial products built in the Replicate community

early 2023: what did you see people trying to build right away with these Stable Diffusion models and these image generators?

It was relatively simple stuff initially, as you can imagine. Partly it was just that the core technology was really magical: for the first time ever, you could write a description of something you wanted and it just appeared, instantly. So most of the initial traction, and the initial products being built, were people plugging Stable Diffusion into an existing application or building a new application out of it. I remember the first app that went viral. We were seeing a huge increase in demand early on and really feeling the strain of it on our kind of rickety early infrastructure, and we knew we had to completely rewrite it. Then a few days later, this user from Japan emailed us saying, "I'm going to be on Japanese TV next week, is it all right if I send you..." I can't remember what it was, but hundreds of requests a second. And we were like, uh, sure. So we had a deadline for rewriting the infrastructure. What that app was, was just Stable Diffusion with a bunch of buttons to help you prompt: I would like this style, and this style, and I'd like this in the image. It was incredibly compelling, and it went viral in Japan. In those early days, just the idea of generating an image was magical.

The first place it got really interesting was when fine-tuning became possible. The first system where it worked really well was DreamBooth, and we built one of the first APIs for training these models, where you can feed it ten images of an object or a style (it works really well with both). One that's very common is feeding it pictures of your face, and then you can generate avatars, generate pictures with you in them. We saw lots of people build avatar apps, and we saw people build things like generating content in the style of your game or your product, things like that. That was the next thing that really took off. The next thing after that was probably ControlNets, where you could control the output of the image, or make it output things in a certain shape. That opened up a whole other area; it's particularly good for image editing and similar cases where you need things to be constrained in a certain way. But those were the early things being built. So image models are

The introduction of Llama and open source LLMs and how that changed the community’s ability to build

where you guys got your initial traction, where you started to see a lot of demand, and then open-source language models started to happen.

Well, LLaMA 1 came out in April 2023, I think, or about that time, spring 2023. What was really interesting about LLaMA is that it was the first really capable large language model within the same realm as the proprietary models, like OpenAI's models, but it was tinkerable: you could fine-tune it, you could mess with the code, you could plug it into other models, that kind of thing. That really caught the imagination of the open-source community. So there was this community of hackers, a bit like that early Colab community around Stable Diffusion, who were tinkering on these models and making interesting variants of them. But it was encumbered by licenses, which meant you couldn't use it for commercial use. It still caught the imagination of the hacker community, because it felt naughty, you know; it felt like something you weren't allowed to do. A leak, yeah, exactly, which made it really compelling to hack on. So that community grew from there, but it was non-commercial, and it wasn't really good enough. It wasn't until LLaMA 2 that Meta realized this was really interesting and thought, we should probably make a version that's not encumbered by this research license. So they made LLaMA 2, and that's when it really took off, because LLaMA 2 was much better and it was possible to use in products as well. So

Top products that are being built and finding traction and success in the Replicate community

you guys have been at the forefront of the hacker community building with these text-to-image models, these diffusion models, and then you saw this happen with LLMs. One of the things that's really cool about Replicate, and this didn't exist when we first met, is that you can take a model and deploy it on Replicate, and then other people in the community can access that model. It really meets this "ML in a box" Replicate vision. What I'm curious about is how you're seeing people actually build products with these models, especially the ones that are finding traction, that are durable, that are creating a lot of end-user value. There are a lot of really cool demos and prototypes on Twitter, but there's a lack of those prototypes making it into production, for various reasons. One of the stats I talk about a lot is that enterprise deployments of AI into production have actually gone down year over year, even though prototypes have gone up, when you splice the data across production versus prototypes. These data citations are always a bit fuzzy, but I do think that's probably directionally correct. You're again in a unique position; you're seeing so many products being built on Replicate. How are people actually building successful products on top of all these 10,000 different models that are on Replicate?

One way to understand this is segmenting by different mediums, and by different stages of the company as well. I think language is quite different from image, audio, video, 3D, and other multimodal stuff, and the things startups are doing are quite different from enterprises. We're primarily building for startups, and when we say startups, that's both actual startups and small teams inside large companies that behave like startups. These teams are having a lot of success building either whole products that are native to AI, or particular point solutions to certain things inside their products. To give you a few examples of things that are really working with our customers: we have lots of people building consumer apps with generative AI. People are building these virtual photo-shoot applications, which are enormously popular; some of our top customers are building applications that let you create your own photo shoots. And people are building image-editing apps that use generative AI either as a starting point for an image or as a way to edit images. So there are whole startups built around these new mediums. We also see it being mixed into existing products in a really successful way. Generative photo editing is a bit like that, but also generative game content: we see people having conversations with their game characters using language models, games where the content is generated by image models or 3D models, and even audio and music being generated as well. That's just getting started, and it's being weaved into existing products. For all the developers and

What level of heuristics, customization, and duct tape are needed to make different models work for your product needs

companies that are building these types of applications that you're seeing: how much stringing together of different models do they need to do? Are these models just working off the shelf from Replicate? How many heuristics, how much duct tape (a word you've used) are they having to put around these things? I'd love to learn what you're seeing there.

It depends a lot on the medium, I think. Language models you can prompt to do what you want quite effectively, and in fact, as context windows get larger, you can encode a lot of behavior in your prompt very effectively. So you don't necessarily need to customize, and of the users we talk to, most people aren't fine-tuning language models or customizing them to any real extent. But we are seeing plenty of that for image and multimodal use cases, which is really interesting. For quite simple use cases you can use these off-the-shelf models, but what we find is that anybody doing anything sufficiently sophisticated needs to customize the models, whether that's fine-tuning them, tinkering with the code, or plumbing multiple models together in some way. You need to be able to customize them somehow. Out of our top ten customers by revenue, eight are customizing the models to some extent: they're taking open-source models and customizing them, whether that's fine-tuning, pipelining, or editing the code.

But what you're seeing is that people are not interested in fine-tuning these LLMs? I'd love to understand that, because it's something we're finding too.

I think it depends a lot on the medium. Again, image models and similar non-language mediums are quite different from language. What's quite interesting about image models is that fine-tuning them is, in some sense, unreasonably effective: you can give one ten examples and it can represent that object or that style extremely well, whereas fine-tuning language models is much harder, for whatever reason. It might just be that we haven't got the right techniques or tools yet, but you typically need to produce an awful lot more data, on the order of thousands of examples instead of tens of examples, and it's much slower and more expensive to do, and much more finicky to get right. And it turns out it's actually quite easy to prompt language models to get the result you want. My intuition for the reason is a few things. It's partly because it's the same medium in and out: you're prompting with text and text is coming out. But it's also that text has much lower dimensionality, whereas image, audio, and video are very rich, information-dense mediums. Text obviously has a lot of information in it, but a lot less. To be more specific about this: it's very hard to prompt an image model to get exactly the style you want. I couldn't prompt an image model to get Dylan into the image; it would be very difficult to do that. (I hope so!) But fine-tuning works so well for that. I think another intuition here is that when you're dealing with things that are not text, you need to work at a much lower level. It requires a lot more duct tape and plumbing to get it to work right. It's almost like systems programming, where you need to plumb these video models into audio models into text models at a relatively low level, in programming terms, to get them to be low-latency.

Interesting. So many follow-up questions to that, because even a year ago you might have been a proponent of "fine-tune an open-source LLM, it will be cheaper, it will be better, and that's the path you should go." Now it seems like you've switched camps, or at least you're a bit more neutral.

Yeah, it's a bit more nuanced than that. There are use cases for fine-tuned language models, and to be clear, we see customers fine-tuning language models, and in fact people are deploying fine-tuned language models on Replicate. It's just not a first-class part of the product anymore.

What are those use cases where it does make sense?

There are two things it works really well for. One is similar to image models: if you want output in a very particular style. If you have a house style for how you write, or you want it to learn a language, or learn a programming language, something like that. A great example of where this works really well: a customer that is deploying fine-tuned models on Replicate has a proprietary query language, and they want to turn natural language into that query language. For that, it's almost like teaching the model a style, and it works very well; it's possible to prompt it to do that, but fine-tuning works quite well. The other use case where it works very well is when you have a very specific thing you're trying to do, like summarizing legal text, or extracting fields from free-form text, and you want to do it as cheaply and fast as possible. You can actually train a very small language model, like a 7B model, to perform as well as a very large one on just that very specific task. It's almost like fine-tuning is for the point when you really want to optimize. But what's interesting is that base models are getting better and smaller, and proprietary models are also getting better, cheaper, and faster. So it's a spectrum, almost an equilibrium: maybe six months to a year ago, for some of those tasks, it would have made sense to fine-tune an open-source language model, but today it might actually make sense to build that on GPT-4o mini, and in six months to a year it might change again, or shift in the same direction or another direction. It's all shifting around as the market changes, in a really interesting way.

And how are you seeing those dynamics play out differently in these other modalities? You talked about video, audio, and image needing a lot more in-the-weeds tuning, peeling back the layers of these models and getting in there and getting your hands dirty. Why do you think that's the case? You spoke about the dimensionality; do you think those models are just weaker, underinvested?

To some extent we just need to wait and see. I think there are some interesting developments with multimodal models that do whole things end to end, and I think that's super interesting. In reality, for a lot of these complex multimodal applications, there's almost always going to need to be some kind of duct-taping, I think. What we find when these models hit the real world is that you can't just ship a big model as the product. There's always going to be some duct tape, some heuristics, some filters, some massaging of the output to make it behave how you want. However it pans out with how these models end up behaving, there's always going to be a lot of duct tape involved in building products.

Are you seeing that as a

Difference between building prototypes and production-ready AI products

pattern when it comes to building products? Everyone's trying to, I say fine-tune, which is probably not the right word, but put that duct tape in place and get it to something that's production quality.

Yeah, we see this both in our customers and in a lot of people who have built real products. The sort of intuition for understanding this is that it's very easy to make a really good prototype, something that looks really cool, but to some extent, like most products (only to an even more extreme extent), that is 10% of the work. The remaining 90% is getting it actually working in production as a reliable, robust product, and a lot of that is work on the model, but it's also just a lot of heuristics and duct tape surrounding it.

We see that too. A lot of what I call last-mile work is not even the fun, fancy research problem. It's product development work, it's engineering work, it's putting those guardrails in place, and it's testing with users.

Testing, exactly, yeah. I think something we're learning from the open-source world as well is who we're building for: primarily the software engineer building products, but the sophisticated, ambitious software engineer who wants to use AI and learn about AI. We're not building for the machine learning researcher; we're building for somebody who's at a somewhat higher level but digging down, if that makes sense.

I think I've heard you call them the AI engineer.

Yeah, this was a term coined by somebody in the AI community called swyx. It's effectively an ambitious software engineer who can build AI systems at a high level; they might not be training models from scratch or anything like that, but they can get in there, they can tinker, they understand how it works. Exactly, and we're building for that developer. I think something we find about that developer, which is really interesting (and I think this is a software developer thing, really), is that they don't want guardrails; they don't want to use things that feel like toys.

I'm curious, just because you've seen so many people build AI products on Replicate: people start out with an idea of what they want to build, a product, a feature, a service. They then take it out to market, they put it in front of their users, their customers, and they completely change their plans, either because people interact with that thing differently or because it didn't work as well as they thought it was going to. Are you seeing that as a pattern?

A hundred percent, yeah. It's similar to building any kind of product, but I think it's just exacerbated by the fact that these systems are very complicated and people are not really sure what to do with them. Like with any product, it comes from two sides: it can be technology-driven, or it can be pulled by the problem. We see customers who are like, ah, we have this problem with our product, let me see if I can try to use AI to solve it, and they try a bunch of things, try different models, see if they can solve the problem. Or, which is quite common, particularly for startups and particularly with a lot of these new models that have come about in the past couple of years: here's this really cool piece of technology; what new kinds of products or features could we build in our application with it? It's a mix of both, and both require iteration and experimentation. And yeah, it's a process.

So you guys have really

Replicate CEO’s biggest surprise over the last 2 years building in AI

ridden the wave, I feel like, over the last 18 to 24 months, in a good way, you know.

I agree. I think there was this perfect storm of Stable Diffusion and then LLaMA, and everyone was talking and thinking about AI, and Replicate was helping people quickly experiment, prototype, and customize AI for new products and new prototypes. What's the biggest thing you've been surprised by over the last two years, with all the AI craziness that's just kind of taken over?

One thing that's been really striking, as you've experienced as well, is just the extraordinary rate of innovation. It's been incredibly extraordinary, particularly in the midst of some of the developments last year, when a new thing was coming out each week that was not possible before.

Yeah. Coming into January 2023 compared to now, the perspective most people had on what the market and the tooling layer would look like is so different. It's been crazy. Well, cool. Thank you so much for sitting down with me. It's been awesome to chat with you today, and I really appreciate you doing this.

Yeah, it's been great fun. Thanks for inviting me. Thanks.
