Day 19: Prague Roadtrip, Single vs. Serial AI Agent showdown, EmailSpy update [Update #07]


n8n · 20.09.2024 · 1,363 views · 36 likes


Video description
The week has been both exciting and exhausting, but the community's enthusiasm keeps me going. Your ideas and participation are truly inspiring.

00:00 - Rollup from last days
04:31 - Prague trip
05:32 - Single vs. Serial AI Agents with Oleg
17:14 - Evaluating the Agents
21:04 - Wrap up and next up

🔗 Explore @Notion Database Assistant build-in-public project: https://bit.ly/notion-db-assistant
🔗 Try StarLens here: https://starlens.aisprint.dev/
🔗 Follow the 30-Day AI Sprint journey: https://30dayaisprint.notion.site/
🔗 AI Newsletter Signup: https://bit.ly/aisprint-register
🔗 Follow me on Twitter: x.com/maxtkacz
🔗 Connect with me on LinkedIn: https://linkedin.com/in/maxtkacz

Table of contents (5 segments)

Rollup from last days

It's the end of day 19 and I'm going to keep this intro segment pretty quick, because it's like 8:00 p.m. on a Friday. The Sprint is picking up intensity. It's been an insane week, in a really good way and a really tiring way as well. I'm getting so much inbound love that I've decreased my building velocity a little bit, just because people keep hitting me up: hey, what about this use case, what about that? A big part of this is engaging in communities, and I'm totally okay with that, because I'm getting a lot of folks DMing me saying, hey Max, I started a Sprint. We've got people waking up on a Saturday at 6:30 a.m. to build AI workflows, and I'm truly inspired by that. I'm having so much fun, especially with the opportunity to collaborate with folks, so please keep pushing, keep giving me your ideas: you're the wind beneath my sails.

First up, I went to Prague this week, a two-day trip, in and out, and it was really fun. We had a meetup, our first meetup in Prague, with some great talks. Le from the community did some fantastic stuff with a fork of n8n for doing some unit testing, and Oleg gave some amazing demos of upcoming AI features. I did record an interview with Oleg about that; I don't have time to put it into this update because I want to get it out, but I'll put it out in the next one. So definitely catch the next one, because you're getting a little sneak peek at what's coming down the AI pipeline at n8n.

The day after the meetup I had a hackathon day with Oleg. Oleg is one of the main engineers on the AI features and super knowledgeable. What we ended up doing was comparing a single-agent approach with a serial-agent approach, that is, where you have one AI agent output some results and feed those into another AI agent. There's some evidence that for some use cases that could be better. Our results found that the single-agent approach is probably going to be simpler for most use cases, with less work to do. You get a lot more control with the serial-agent approach, but it's going to take longer. So for most use cases, especially if you're not building some enterprise use case where you need infinite control, I would definitely start with a single agent. The majority of the rest of this video is a look at Prague and my hack with Oleg, where we get into detail. I didn't have time to publish all the workflows for that, but on Monday I plan to publish them all so you can have a play around. The use case we built: the user inputs an academic topic, and the AI researches keywords, finds relevant academic papers, summarizes them, and outputs references.

My Flux generator that I put out has been getting some love, and I think I get why there are so many crappy sites charging money for things that are free: they're just putting a little UI on top. But I still want more people to get that value, so I'm doing a sneaky little Product Hunt launch tomorrow, Saturday morning. Very last minute, I threw some assets together; let's see what happens. Already today you can try the generator on my own n8n instance. I'm serving it through a form trigger, but you can also clone it: it could be a headless service, and you could tweak the trigger to have different inputs and whatnot. So definitely check that out if you haven't, and, I know it's a Saturday, but please head to Product Hunt, check it out, and if you like it, support the product. PS, I did not ask you to upvote it, and that's important: I'm not asking you to upvote it, I'm asking you to check it out and, if you like it, show it some love.

And then EmailSpy. I've been a bit of a bad egg on the EmailSpy project because there's been so much on the go, but Oscar has been an absolute gem and has done so much work on it. He's done four workflows so far: the main workflow; a URL tool, which is the tool the agent uses to get URLs, using Puppeteer; a text tool, same idea as the URL tool; and a test version of that. What he'd like me to build next is a cache for all this, similar to what we did on StarLens, so that when we launch it on Product Hunt next week, if people paste in the exact same query, it just returns the cached result.

He was also having some issues on Vercel. On Vercel, the maximum timeout you can have when you make a request to some API is 60 seconds. A lot of this stuff, scraping multiple web pages and running things through LLMs, takes longer than 60 seconds, and this has been a common thread in the AI Sprint. Sixty seconds in the API world is rather long; usually you'd use an async pattern at that point, where you make a request, wait some time, and then fetch again to try to get the result. Or you do some sort of callback pattern: you make the request and tell the service, hey, when it's ready, ping my endpoint, which would be something like your webhook trigger, and that's how you close the loop. But with the advent of all these LLMs, and the idea that 60 seconds might not be that long to do something that took a human two days, there are definitely some different strategies around that. So if you're building on Vercel, keep in mind that 60-second limit.
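The submit-then-poll pattern described above can be sketched like this. This is a hypothetical in-process illustration of the shape of the pattern, not any real Vercel or n8n API: the job store is a plain dict, and the function names are made up.

```python
import time

def submit_job(jobs, payload):
    """Pretend API: register a long-running job and return its id immediately,
    instead of holding one HTTP request open past a 60-second platform timeout."""
    job_id = len(jobs) + 1
    jobs[job_id] = {"status": "pending", "payload": payload, "result": None}
    return job_id

def poll_job(jobs, job_id):
    """Pretend API: return the current job state without blocking."""
    return jobs[job_id]

def wait_for_result(jobs, job_id, interval=0.01, timeout=1.0):
    """Client-side loop: poll at short intervals until the job is done
    or the client-side deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = poll_job(jobs, job_id)
        if job["status"] == "done":
            return job["result"]
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} not done in {timeout}s")

if __name__ == "__main__":
    jobs = {}
    jid = submit_job(jobs, {"url": "https://example.com"})
    # Simulate a background worker finishing the job.
    jobs[jid].update(status="done", result="scraped text")
    print(wait_for_result(jobs, jid))  # -> scraped text
```

The callback variant mentioned above is the same idea inverted: instead of the client polling, the worker calls the client's webhook URL when the result is ready.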

Prague trip

I just got into Prague. It's a 20-minute walk to the hotel, it's a beautiful day, and I'm taking in a bit of the city on the way. I've just checked out of the hotel and now I'm waiting on an Uber to go to a co-working space. Oleg is meeting me there, and then we're going to start hacking away at something; exactly what, I'm not sure yet. Later today I've got to catch a train back to Berlin. There he is. Hey Oleg, good morning man, how are you doing?

Single vs. Serial AI Agents with Oleg

So over our morning coffee we were thinking about what we should do, because by the time I got here I actually had no idea what we were going to do. Something I've been seeing a bit of lately is serial agent workflows: you have one agent do something, and those outputs go to another agent, which is different from the multi-agent approach. I built a very simple workflow like this for doing some podcast customization. One thing we were talking about with Oleg, and thought would be pretty cool, is to build a use case with both a single-agent workflow and a serial-agent approach, then compare and benchmark the two and see which one's better, so you guys can take those learnings and use them in your own workflows.

So Oleg, what's the use case we want to build out? So, we want to compare two approaches. The use case is going to be: you provide a list of keywords, topics you're interested in, and we search for those keywords and find some relevant papers, whatever relevancy means at this point; that's something we still have to figure out. We get those PDFs, let's say five, chunk them, and then ask an agent, or a series of agents, to provide a summarization of each paper. Then ideally we benchmark the summarizations based on some criteria, like how readable they are and how well the summarization was done, using some smarter models, and also by actually reading the papers ourselves and concluding how good each is: reading and comparing which one-sheet we would rather read, the single-agent one or the multi-agent one. It's maybe not the most scientific test, but if that's the end result of why you'd use this tool, I think it's okay, at least for this one-day hackathon, to run a couple of versions, compare them, and do a blind vote: we just write them out and see which one you'd prefer.

At a high level, here's how this is going to work: you type in your topic, like AI, it fetches relevant academic papers, summarizes those, and outputs the result. That's quite a lot of steps for a single agent, right? And one thing I've seen in the community is that when an agent is doing a lot of things, it gets a little more confused, or at least less predictable: it might have good outcomes, but less often. The serial-agent approach is going to be more expensive, right? It's going to take longer; there are multiple LLMs running, or at least we predict it will take longer. So it'd be nice to see what benefits it has, because if it's a bit more expensive, maybe a lot more expensive, but higher quality, people can use that in their decision-making case by case. For some use cases cost is no problem, but for others, say some free online tool, it's going to be a big problem, and you'd rather have worse quality but cheap.

So this all sounds like a great plan; what's the first step? Well, we can start by getting the keywords you're interested in, and there we can already use even a simple chain to generate, from this list of keywords, some search phrases for the search engine we'll use to find the papers. Because, you know, I could just type "AI", and that might not be the ideal search phrase; it's very broad. Once we have that, we get a bunch of papers and grade them on how old or new they are and on some relevance factors, maybe citations, if we get access to that in the API. What do you think for the first step? Say we've got to capture the input: what if we use a form trigger instead? We could use the chat trigger, but this sounds like a headless service we could launch, something you could consume via API. The form trigger, the way it's set up, is in my experience very easy to later swap out for a webhook trigger, so we can ship this and say: hey, you can consume
this headless, or you can use it including the UI and test it out. Sound good? Yeah, that sounds like a good first step; then let's build.

Okay, quick status update. We split the project into two parts. I'm working on the first part: getting the topic from the user and then outputting keywords for it. What Oleg and I did was agree on the output schema my part of the work would hand to his, just like engineers would do with a frontend and a backend, right? That way we could break up the work and work in parallel, which is pretty cool. So my part's working, but it's also a very simple part; I just have to output an array of keywords. What's your progress? I created a tool that one of the agents is going to use sequentially. It takes the keywords provided by Max's workflow, fires a request to the papers API, and gets papers and summaries of those papers for every keyword. It can fetch an arbitrary number of results; for now we're using two papers per keyword, but that's something the agent can decide on its own, or we can specify per run. And it gives you an aggregated list of these papers. Next I'm going to set up a tool that can summarize an individual paper, then provide these tools to the agent, and we'll see how well it uses them and comes to some conclusions.

Cool. Since my very easy part is done, could we split up something else for me to work on? Maybe you could start on some of the evaluations. We also need the single-agent approach too, right? Should I try to build a naive version of that? I thought I'd start there, because the single agent is just an agent with access to these tools, and you're building the tools already, so it should be quick for you. Cool, so I'll start building the eval thing, because we're going to need a workflow that runs these, gets the results, and adds them to a table, let's say Notion, since that's where we're keeping the whole project so everyone can see those runs. Let me get that done first: I need to be able to run these workflows and collect the results. Once we have results, and if we run out of time, because I've got to catch a train in a couple of hours, the crunching and the quick hot take on which one we think is better we can do afterwards. All right, let's get building.

A few moments later. So, while Oleg works, we're not going to distract him, what I'm working on now, since the keyword part was really fast, is a way to evaluate these. What my workflow does here is: it defines some test variables, this is the topic we want to run, and then it uses these Execute Workflow nodes to run both the single-agent version and the serial-agent version. It then merges those results and adds them to a Supabase table. Each time I run this perform-a-run workflow, it runs each version once with the specific topic. I think what will be good is to do 5 to 10 executions per topic, so we generate 30, 40, 50 of these things. We might not make it to actually evaluating the results today, but then we'll have them in a single source of truth, we can crunch the numbers, figure out which one wins, and share that as a template for you guys. I'm going to make sure this is all available so you can inspect it and evaluate it yourself, and in a sec we'll check in with Oleg on his progress. Oleg, can you get back to work please? I'll figure out lunch, thank you so much.

Lunch is here; we've got to keep our developers happy, right? So I'm going to go grab it. Let's find our empanada delivery guy. Hello, yeah, are you downstairs or upstairs? Hello. Let's try not to drop these kombuchas, they were not cheap. So we got lunch, we got some empanadas. Gents, lunch is here, grab that.
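Stepping back to the eval workflow described above: run both variants several times per topic and append every run to one results table. A rough sketch of that harness, with stub runners standing in for the n8n workflow executions and a plain list standing in for the Supabase table:

```python
def run_single_agent(topic):
    # Stub standing in for executing the single-agent workflow.
    return {"variant": "single", "topic": topic, "summary": f"summary of {topic}"}

def run_serial_agents(topic):
    # Stub standing in for executing the serial (agent -> agent) workflow.
    return {"variant": "serial", "topic": topic, "summary": f"summary of {topic}"}

def collect_runs(topics, runs_per_topic=5):
    """Run both variants runs_per_topic times per topic; append every run
    as one row so the results live in a single source of truth."""
    table = []
    for topic in topics:
        for run_idx in range(runs_per_topic):
            for runner in (run_single_agent, run_serial_agents):
                row = runner(topic)
                row["run"] = run_idx
                table.append(row)  # real setup: insert into Supabase instead
    return table

rows = collect_runs(["intermittent fasting", "AI"], runs_per_topic=5)
print(len(rows))  # 2 topics x 5 runs x 2 variants -> 20
```

With the rows in one table, the later "crunch the numbers" step is just a group-by over `variant`.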
But before we let you eat: what's your progress, how are you doing? So, I connected the tools and it's now running for the input. I'm testing my second prompt, where I'm also generating the keywords based on the user's question, and I just got some output. I asked, what are the latest effects of intermittent fasting; it came up with some keywords based on that and started getting the papers. Is this the single agent or the serial one? Yeah, see, here's some result: it selected the papers it thinks might be relevant for summarization, ran the summarization tool on them, finally came up with output, and it also included the relevant references and citations. Now we're going to eat, and then we'll see how useful this is.

So with this sequential approach, the question is how strict we want it to be. Here we know the user asked for a topic, intermittent fasting, which we converted to a few keywords that we fetched the papers with, and now I'm using this chain. You see, I had to retrieve 900 items to get roughly five that are relevant; that's how much non-relevant stuff the API returned. The question is whether we want to match only on the user's topic. When I asked about intermittent fasting, if I include the keywords in the prompt, I get things like "personalized weight management through wearable devices and artificial intelligence", which is somewhat related to weight loss and fasting, but not directly. So we need to decide whether to also check the keywords when picking relevant papers, or to check strictly against the topic. I think in the end this is going to be a variable: the end user consuming this service will want one or the other, it depends. I could see some wanting strictly just the topic. So for now, given the time crunch, I'd say this version is fine, because we know it will be easy to add that later as logic, a filter, you know what I mean, captured from the form or whatever.
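The strict-versus-loose matching question above can be sketched as a tiny filter. This is just the shape of the decision, not the actual chain; the function and its parameters are hypothetical:

```python
def relevant(paper_text, topic, keywords, strict=True):
    """Strict mode: the paper must mention the user's topic itself.
    Loose mode: any expanded keyword is enough to keep the paper."""
    text = paper_text.lower()
    if strict:
        return topic.lower() in text
    return any(k.lower() in text for k in [topic] + keywords)

abstract = "Personalized weight management through wearable devices and AI"
print(relevant(abstract, "intermittent fasting", ["weight management"], strict=True))   # -> False
print(relevant(abstract, "intermittent fasting", ["weight management"], strict=False))  # -> True
```

Exposing `strict` as a form field is one way to make it the per-user variable discussed above.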
Then have that populate the prompt here. Yeah, that sounds good.

So right now Oleg is raising my blood pressure, because we're not done yet and we've got to leave in a sec, so hopefully these words of encouragement help. I'm just kidding, positive vibes only. How are we doing? Well, I'm running the test execution to get the papers summarized. It's seven papers; maybe I should have limited it to two. It's currently running. Okay, so that might take a while; seven papers is potentially a lot. But the good news is, even if we don't make it to test it right now, we basically have things in place to run some evals, and from those we might want to tweak a few things. Basically by tomorrow, because it's Thursday today, and I'd love this to go out in the Friday update, so for the Friday update I'll be able to pick this up from here. Yeah, it's possible. And I'll make it back home tonight, so that's good. Cool, I'll grab my laptop. I wonder how much of this cheesy pop music in the background of this co-working space is going to make it into the edit.

So far, what we've got done: two versions of a workflow, done or almost done, that take a topic, output keywords, find academic papers, summarize them, and output a text, two different versions of that, plus an eval stack, which we had to switch from Notion to Supabase. The one thing we might not get to is testing it all today. I think that's pretty damn good; between you and me, we started at 10:00 a.m., 10:30 probably, after the coffee. If I ever say we'll just cut that out: I'm not going to, it's going to be included. Oh crap. All right, while Oleg is finishing that, let me use the power of AI and see if there's going to be traffic on the way. Yes, it's going to take 20 minutes; before, it was 15. We'd better leave soon. Oh, trick shot, no way! He's got skills. Yo. Okay, so I made it to the train with two minutes to spare, a little out of breath, but I just saved the company a bit of money on a last-minute ticket. Let's see how much further I can get; I've got four hours on this train ride. Great trip to Prague, so cool to see the community out here, talk with some folks, see some great use cases, and build with Oleg. Day two.

Evaluating the Agents

Running the evaluations right now. One of the issues I've been having, or one of the comparisons I'm already seeing, is just the runtime: the serial agent is taking 10x longer to run, and it's also failing a bit more often. The runs that succeed take around 14 and a half minutes, while the single-agent one does it in about a minute and a half, so about 10x longer. Now, if the results were demonstrably better, then of course in a lot of use cases a minute and a half versus ten-plus minutes would be totally fine, if it's something asynchronous, as long as you understand that. But it is erroring out, and when it errors out, what's happening is this: if we zoom in on one of these sub-workflows we're running, this one here, and go into the HTTP request, we see the parent is sending in a null PDF link. The way this is structured, we would now have to build error validation in between those two agents. Whereas if a single agent were doing this and it got an error from an HTTP request, it would understand it got an error and might even try again. In fact, in a previous run where I had an HTTP request for Bing that failed due to my own setup error, the agent tried to use it two or three times and then said, sorry, it seems like the tool's not working.

So one learning: if you take a serial-agent approach, after each agent you're going to have to validate the outputs, because the next agent expects a specific schema or format and it will break if it doesn't get it; you have to add handling for when that isn't the case. With the single-agent approach, so far it just worked. I'm not sure we'll be able to do a proper evaluation run, but I think that's okay: not being able to do the evaluation run, because this thing takes 10 times longer and is more brittle, is already some kind of a verdict. Again, this is not an academic test; the way we built this serial agent is one way, and there are other ways and other use cases, so I wouldn't say never do a serial approach.

In fact, I have a different workflow I can show you while this runs, one I was getting good results from: the Generate History Podcast project. I tried this before with a single-agent approach, and the results were too short. What it does is take a topic, a historical topic, and I modeled out what might happen in a production house creating a podcast. The producer does an outline, which outputs chapters; we break those chapters up, because the output is in markdown, and then we have a loop where a research agent researches each chapter. What I'm experimenting with here is also having it go to Supabase and add to a research log a summary of the previous chapter, almost like a blockchain, so that when it goes back into the agent, it knows what it researched in the previous chapter without having to feed all of that back in. It does the chapter-level research and adds it to Supabase, and those rows all share a workflow execution ID, so that by the time we reach the done loop and the actual script writer, it has a Supabase tool to pull all the research for every chapter and then output the script. That was better than the single approach so far. But as we're seeing with the one we built with Oleg, this single execution is still running, so we won't get a formal eval today. I'm going to cut this off because I've got other things to do, but I will post an update. The verdict is that the single-agent approach seems to be better here. So whatever solution you have, I'd extrapolate and say: build a single-agent solution first. That's your benchmark; you can do some runs on it and know where you stand. Then, if that's not good enough, you can take on the effort of a serial-agent approach. It will take longer, but perhaps for your use case that's okay. I would always go with a single approach first.
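The null-PDF-link failure above is exactly the kind of thing a handoff check between serial agents catches. A minimal sketch of validating one agent's output before it feeds the next; the field names are hypothetical, just illustrating the pattern:

```python
# Fields the downstream (summarizer) agent expects to be present and non-null.
REQUIRED_FIELDS = {"title", "pdf_link"}

def validate_handoff(paper):
    """Return (ok, reason) for a single paper record passed between agents."""
    present = {k for k, v in paper.items() if v}  # keys with truthy values
    missing = REQUIRED_FIELDS - present
    if missing:
        return False, f"missing or null fields: {sorted(missing)}"
    return True, "ok"

papers = [
    {"title": "Paper A", "pdf_link": "https://example.org/a.pdf"},
    {"title": "Paper B", "pdf_link": None},  # the failure mode seen above
]
valid = [p for p in papers if validate_handoff(p)[0]]
print(len(valid))  # -> 1
```

Dropping or retrying invalid records at the boundary keeps the downstream agent from failing opaquely mid-run.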

Wrap up and next up

That's what I did with the History Podcast: I wasn't happy with the single approach, so I did the serial one.

A few minutes later. So, it's been a really long week. I'm going to go recharge and get my rave on in Berlin, but next week is the last full week and we've got to make it count. Please show some love to the project and build some workflows; I have it on good authority that we're going to put together a little competition for the best workflows built by the community, which is pretty awesome. So really get building on those workflows; you'll have a little extra time to submit them. And next week there's a lot to do: we've got to launch EmailSpy, which is a non-trivial app; there's a Qdrant collab; there are probably some projects I won't get to, but we've got to push. One thing I'm seeing: you can use Siri to run an n8n workflow. Siri handles all the voice stuff, and it's a super easy way in, because Siri can be run through the Shortcuts app on iOS and on Mac, and the Shortcuts app can invoke an API endpoint. So before the Sprint is over, mark my words, I've got to ship something with voice: you're going to see me talk with a workflow. I know, I'm saying that on camera, so that's a little stressful and my blood pressure is definitely going up, but it's a Sprint, let's get it done.

I really hope you enjoyed this video. It took a hell of a long time to edit down, but it was also a lot of fun. We're getting closer to the end of the Sprint now, and every day counts. I really ask for your support: build along with me and give me feedback on what you'd like to see more of, because at this point the Sprint's doing pretty well, and I think we have a pretty good shot at turning this into my full-time job. Every day I'm flowgramming and sharing it, and I've had a ton of learnings from the Sprint; in a long-term format we could do it a bit more efficiently, going deeper into tutorial content, because I think that's one place where I've been a little light, so apologies for that. I'm doing a lot of different stuff on the go, filming, the flowgramming, collaborations, and editing. It's all fun, but it can only get better with time. Please send me your ideas, on Twitter, LinkedIn, or in the comments. I'm monitoring everything, and if I get too many comments, we'll just automate it and summarize them, so I'll definitely get your idea. Thanks, everyone.
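The Shortcuts-to-API idea above boils down to a webhook that accepts the transcribed text from a Siri shortcut (via the Shortcuts app's ability to call a URL) and hands it to a workflow. A minimal sketch of that handler's logic; the JSON field `text` and the response shape are hypothetical, not the actual n8n webhook schema:

```python
import json

def handle_voice_webhook(raw_body: bytes) -> dict:
    """Parse the Shortcut's JSON body and route the spoken command onward."""
    payload = json.loads(raw_body or b"{}")
    text = payload.get("text", "").strip()
    if not text:
        return {"ok": False, "error": "empty command"}
    # Real setup: forward `text` to the workflow's webhook trigger here.
    return {"ok": True, "command": text}

print(handle_voice_webhook(b'{"text": "run the daily report"}'))
# -> {'ok': True, 'command': 'run the daily report'}
```

On the phone side, a Shortcut would dictate text, build that JSON body, and POST it to the endpoint serving this handler.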
