‘n8n AI Projects at Datakoppelaars’ - from the Amsterdam Meetup (November 2024)
23:06


n8n · 25.11.2024 · 1,063 views · 31 likes


Video description
In this talk from the n8n Meetup in Amsterdam, Jelle de Rijke from https://datakoppelaars.nl showcased a fascinating project his company developed: using AI to rewrite complex legal documents into a more consumer-friendly format. The project involves a series of AI agents, each handling a different aspect of the process.

Links:
- Read the full report on the Amsterdam meetup here: https://community.n8n.io/t/amsterdam-november-2024-meetup-report/62428
- Interested in hosting a community event in your area? Become an n8n Ambassador: https://n8n.io/ambassadors

Table of contents (5 segments)

Segment 1 (00:00 - 05:00)

Okay, so my name is Jelle and I'm from Datakoppelaars. For non-Dutch speakers: Datakoppelaars roughly translates to data couplers, or integrators, or whatever. Over the past few years we've built, I don't know the exact number because I don't have a CRM system yet, but I guess 100-plus integrations with n8n. First of all, I'd like to acknowledge that I don't want to act like we know everything about n8n or anything. I'd just like to present some, well, I think interesting use cases of n8n, some AI implementations and some other stuff, and especially the stuff where we found challenges. The stuff that's really easy, integrations that were done in two hours or so, I could show you a lot of those, but I guess for this audience that is not as interesting.

So yeah, like Bart mentioned, we integrate with the bigger Dutch accountancy software systems such as AFAS and Exact and the like, so that's one project I'd like to demo. I have another case where we work with, well, in our perspective, larger data sets. Perhaps for one of you this is a tiny data set, but let's see. And I have a project that is longing to be migrated to n8n, but so far I haven't been able to do it, so maybe one of you afterwards can help me discover how to fix this issue. Oh yeah, and for the people that didn't get it: in Dutch, a monkey is called an "aap", or "aapje".

Okay, so we have this first use case, for one of our clients that is in the, let's say, legal industry, roughly (that's all I'm going to say about this client), where we applied AI to do something that would usually take a lot of effort. They have legal documents that are actually saved as PDF files in AFAS, which is rather unfortunate, because then you have to go through all these steps to get at the actual textual content. What they usually do is present these legal documents in a readable format on their website, in a form that is
understandable for people that are definitely not legal professionals. That usually takes a professional text writer, a copywriter, whatever: she gets these long documents and makes fairly readable summaries of them. This client asked us: hey, could that be done in a modern, easier, and more scalable way?

So what we did is this. We get all this data (I've blurred it out here, but it's just a request node getting the data from AFAS), so we fetch the files and extract the text from them. And what we found is that even though the latest OpenAI models are quite intelligent, we tend to treat them as, I would say, a junior employee, because if you give them ten instructions at once, they're maybe going to pick up two or three of those at random, and the rest they're going to drop, forget, and not do. OpenAI does this as well. So what we do here is inject current summaries that are good (great, actually, really good summaries of these legal documents), and we instruct the AI: okay, we need a summary just like this; here is the legal document, and we want you to summarize it just like this. But we found out that if you do this, it sometimes forgets to include some of the more important subjects, it uses difficult-to-understand words, all sorts of issues. So what we did is, in series, we added some different steps, some different nodes, where we consider each AI call a new junior employee that gets its own little task. So in this case, this step is

Segment 2 (05:00 - 10:00)

the one that corrects the summary: we instruct the AI to remove certain difficult-to-understand words. We have a forbidden-words list, because those words are used quite often in these legal documents but probably none of us could really understand what they mean, so we have a specific instruction saying: don't use these. We found out that if you ask the AI to summarize a document and, in the same prompt, say "and don't do this", it picks one of those two tasks and not both.

And then in the end I figured: okay, so maybe the result should be reviewed, and it should get a score based on "is it readable" and "are all the relevant facts included". And we do this twice because, well, why not, it's cheap. So we let the AI calculate a score for the summary, and it only continues past this node if the relevancy score is high enough. And what is "high enough"? I don't know; we just did a lot of runs and discussed it with the client. Basically, it never happens that the AI gives us a summary that doesn't make any sense at all, so that's great news. Eventually we do this Execute Workflow thing, because this client does not want fully automated summaries on their website at all. So we send the result back to AFAS, into a system called InSite (maybe some of you have heard of it), and there a human needs to either approve or disapprove the suggested summary.

So yeah, that's what I thought was one fairly interesting case. Would you like to continue to the next case, or do you perhaps have questions on this particular example? Yeah, you first.

Audience: What about data privacy? You're sending data to OpenAI. What about the names?

That's a great question. So the question, for the online viewers, is: what about data privacy, because you send all sorts of stuff to OpenAI, is that okay? It's not okay, but the documents we send are already anonymized. These documents, the actual PDFs we are talking about, are
also put on the website, but to be fair, for normal readers they're not easy to read. So no sensitive data is being sent to OpenAI in any case.

Audience: Well, OpenAI will not train on API data.

No, they won't, but it's still not okay to send it, because in this case this legal company handles complaints about, for example, your GP. So it's a great question, because the data is really sensitive. It's about complaining that your doctor didn't treat you appropriately, so you definitely don't want any names in there whatsoever. Great question.

Audience: Question 1B: so you have configured three different agents?

No, in this case it's not an agent; it's just the regular OpenAI node.

Audience: Okay, so just a chat completion: you're asking something, getting an answer, and then doing another chat completion?

Yeah. So in a sense this is not as intelligent as an agent, but in my experience it is more controllable.

Audience: Two questions. The more important one is about this Execute Workflow node and the human task behind it: what are the mechanics of this? I know in n8n you can use a Wait node to assign a task and start again with a webhook, but you're not doing that here; you're sending it to another system. What is it calling?

Oh yeah, the system we send it into is an AFAS product. AFAS has two products: one is called OutSite, that's to build a website, and the other is called InSite, and that is to be used internally in your company. You can't make this up. InSite is basically, let's say, mostly used as a sort of CRM-geared application. So what we do is send a request; there's no
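The chain of small, single-purpose OpenAI steps described above can be sketched outside n8n as plain function calls. This is a minimal illustration of the idea rather than the speaker's actual workflow: the `llm` callable stands in for the OpenAI node, and the forbidden-words list, prompt wording, and the 7.0 score threshold are all made-up placeholders.

```python
# Sketch of the "each AI call is one junior employee" pipeline from the talk.
# `llm` is any callable that takes a prompt string and returns a string.

FORBIDDEN_WORDS = ["notwithstanding", "heretofore", "pursuant"]  # hypothetical list

def contains_forbidden(text: str) -> list[str]:
    """Return which forbidden words still appear in the summary."""
    lowered = text.lower()
    return [w for w in FORBIDDEN_WORDS if w in lowered]

def run_pipeline(document: str, example_summary: str, llm, min_score: float = 7.0):
    """Each LLM call gets one small task instead of ten instructions at once."""
    # Step 1: draft a summary in the style of a curated example (in-context example).
    draft = llm(
        f"Write a summary in the style of this example:\n{example_summary}\n\n"
        f"Document:\n{document}"
    )
    # Step 2: a dedicated correction pass that only removes difficult words.
    if contains_forbidden(draft):
        draft = llm(f"Rewrite this summary without these words: {FORBIDDEN_WORDS}\n\n{draft}")
    # Step 3: score the result twice ("why not, it's cheap") and gate on the scores.
    scores = [float(llm(f"Score this summary 0-10 for relevancy:\n{draft}")) for _ in range(2)]
    if min(scores) < min_score:
        return None  # stop: do not forward a low-scoring summary
    return draft  # in the real workflow this goes to a human for approval
```

In the real setup nothing is published automatically: the surviving summary is posted back to AFAS InSite, where a person approves or disapproves it.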

Segment 3 (10:00 - 15:00)

magic going on; it's just a request where we do a POST and send the textual data. The appropriate person then gets (they call it a "bakje", so, their bucket) an item in their bucket that shows up as needing review, and they have buttons to either approve or disapprove it. And then from InSite it will be moved to OutSite. Actually, if you want to get technical about it: AFAS's main database system is Profit, and Profit stores all the data, so for both OutSite and InSite all the data lives inside Profit. But when you click that button, the item gets publicly displayed instead.

Audience: Does AFAS do that move itself?

I'm afraid AFAS is not able to do this, so we do it with another n8n automation.

Audience: Yeah, that was my question. That's what I was trying to understand: are you waiting for some external event in that orange-colored node?

No, no, we just do a POST request, that's it.

Audience: Cool. Second question, which may be out of scope: in steps three through five there, you have a sort of linear progression of AI calls. How do you know it's three steps and not 33 steps? Why don't you dynamically generate sub-workflows based on a planning AI that says "we need 10 steps for this thing" or "three steps for that thing"? Why hard-code the steps?

Okay. First of all, this sounds super interesting and fairly genius. I've had two beers (yeah, they're 0%, right). But no. So the question is: why do we statically set these serialized AI steps? My first reply is: that's a great idea, but this does the job. When we only had step three, the results were very promising, but the client kept saying: okay, but you keep using these words and these sentences and we don't want that. So I tried injecting those instructions into the first step, and it didn't work, because then all sorts of facts went missing. So I just
added steps until it worked, and now it works consistently and they're happy. That's the reason.

Audience: Are you able to train the AI with the feedback?

Are we able to train the AI? I would almost say that's a question out of scope, because it involves knowledge about LLMs. As a basic answer: if you use OpenAI the way we are using it right here, the LLM stays the way it is. There are ways to work around this (I have one use case that sort of does this), and there are other approaches I'm not as well educated on, but here, no.

Audience: Actually you are, because you're injecting the previous summaries that you manually selected and curated to be the correct style of summary, and that's called in-context learning. That's actually what is happening there. It's also some sort of training, but it's a prompting technique, basically.

Right. So to compare it with the earlier question about why we don't use an agent here: in my experience, compared to a zero-shot agent, where you basically hope that one injection of instructions ("you've got to do it like this") will be followed, it turns out this approach works.

Audience: The thing to keep in mind is the limitation of 128k tokens which you can inject, and not more. So if you really have a big document with a lot of pages...

That is correct, but these documents aren't that long.

Audience: What about error handling? What if the first step to OpenAI gives an error?

Okay, so first of all, this company currently has around 350 of these documents, so processing them all in batch is not that big a deal. But it's really important and quite sensitive, because there are a lot of people involved, all sorts of... I'm not
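As the audience pointed out, injecting curated past summaries is in-context learning, and the assembled prompt has to stay under the model's context window. A rough sketch of that prompt assembly follows; the 128k-token window and the crude four-characters-per-token estimate are assumptions for illustration, not measured values.

```python
# Sketch: build a prompt with as many curated example summaries as fit
# under a rough token budget. Heuristics here are deliberately crude.

CONTEXT_LIMIT_TOKENS = 128_000  # assumed context window from the discussion

def estimate_tokens(text: str) -> int:
    # Very rough heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def build_prompt(document: str, curated_examples: list[str], instruction: str) -> str:
    """Add curated examples one by one until the token budget would be exceeded."""
    parts = [instruction]
    budget = CONTEXT_LIMIT_TOKENS - estimate_tokens(instruction) - estimate_tokens(document)
    for ex in curated_examples:
        cost = estimate_tokens(ex)
        if cost > budget:
            break  # stop adding examples rather than overflow the window
        parts.append(f"Example summary:\n{ex}")
        budget -= cost
    parts.append(f"Document to summarize:\n{document}")
    return "\n\n".join(parts)
```

In practice you would use the model's real tokenizer instead of a character heuristic, but the shape of the idea (examples first, document last, bounded by the window) stays the same.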

Segment 4 (15:00 - 20:00)

really familiar with the English terms for it, but anyway, the volume is not that big. And this will actually be triggered from within a custom web application that we built. That's basically our thing: we do integrations, because that makes sense with "Datakoppelaars", but we also build web applications. So we will use this as a sort of API endpoint from a web application. That's how it works: if it errors, the user will see an error and will probably call us, like, "what's going on?"

Audience: Have you tried other LLMs besides the GPTs from OpenAI?

Oh yeah, okay, so the question is: have you tried other LLMs? Recently I've been trying Claude 3.5, and I tend to think that if there are a lot of technical details involved, Claude 3.5 sometimes does a better job, but so far GPT has been good. But yes, it's relevant: for instance, for meeting summaries and stuff like that, I think Claude works better. I'm also confident that OpenAI will fight back.

Host: Okay, so given the time, if you could take another five minutes max? We're running over a little, but I do want to hear a bit more.

Cool. Okay, so I think this is an interesting thing to discuss: how about working with, in our perception, larger data sets in n8n? I have two slides, let's get through them. Basically, here we had a challenge where we got this request: okay, we've got 80,000 invoices, and we've got to get them away from some system real fast. So we built this workflow. It does some stuff, and what we found works for scaling is that you loop over items and execute a separate workflow for each batch, because then n8n tends to choke up less. Here you can see what the Execute Workflow looks like. This works for 80k-plus docs, and that's great. So if anyone is wondering, "hey, can you do semi-larger-scale stuff?": yes, you can.

Another use case was a similar job, but with 500,000 invoices.
This was actually done, from first thinking about how to do it, by Harm a couple of weeks ago, and it took one and a half days, I think, including time to figure the stuff out. What we did is create a MySQL database where we registered which IDs, which invoices, were already processed, because the job would crash from time to time, or we would get some API rate-limit error or whatever. But in the end we managed to transfer all 500,000 invoices. When we got into conversation with Bart later on, he said: you're going to love this new feature, because we have a new de-duplicate node that can get IDs from previous executions. And my question to Bart is: does that also work when the execution fails? Because that is the situation where you need the IDs; if it only covers successful executions, then it doesn't solve the problem that this workflow solves. I don't know if it makes sense the way I explain it right now, but we register in the database which records we have processed, so if all hell breaks loose and n8n becomes unresponsive (which I don't blame n8n for, but it can happen), then we know: okay, these records we can skip now. So that's the basic how-to. There are all sorts of other settings, and you can have multiple workers and everything, but if you just want to do it simply, it's possible.

Bart: I don't have the answer right now, but I can get it for you, so I'll do that.

Audience: An alternative solution could be to set up a queue like RabbitMQ and push all your jobs in there; then you can easily process them from different workers as well.

Sure. And like I said, I'm not acting like we know everything about this; I'm just showing how we did it, and it worked, and the client was happy, so we're happy.

Audience: I think your point about using sub-workflows is essential here, because they free up the memory after they

Segment 5 (20:00 - 23:00)

run each batch, and that really allows you to run at larger scales.

Okay. And one comment from my side: it would be great if, in the future, it were easier to discover rate limits on both API endpoints, because then you can calculate things like "loop over how many items, and wait for how long". It would also be great to have some way to see what my current n8n instance is doing, whether it's self-hosted or cloud-hosted (we mostly use cloud instances): some intel on what is holding us up. Is it CPU, is it memory, what is it?

Okay, I'll just quickly click through this last one because I have 15 seconds left. We're not going to discuss it in detail, but I have this project I've been longing to migrate to n8n: this bit here. It's a FlutterFlow-based mobile application with chat functionality inside it, and the chat is 100% client-to-AI. The AI is an agent-based model on LangChain that I started building when LangChain was first released, so probably with really old concepts that are now outdated. The agent searches a vectorized database (too bad the AI specialist left to get pizza) and it returns documents with a relevancy score: how relevant is this document to the question asked by the client? If the relevancy score of the returned documents is not high enough, we are afraid the AI is going to come up with a BS answer, so we exit the LangChain agent. We say: okay, stop right now, don't do your thing anymore; instead, respond with "you're going to get a human answer within, I don't know, 24 hours". Then we call a Telegram bot and ask an actual human: okay, here is the suggested AI answer that was generated, and you have a button to either approve or disapprove it; if you disapprove, you need to input your manual answer.
When you respond via Telegram, the answer gets sent back to the FlutterFlow application. I thought that was pretty cool, but I want this in n8n, because I don't want this one thing running in Python, and I can't get the filter on relevancy score working. So please, when I'm done with my presentation, bother me with your input. Yeah, let's do it. So that'd be it. Thank you!
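The relevancy gate described for the FlutterFlow chat (exit the agent and hand off to a human when no retrieved document scores high enough) boils down to a small filter. This is a hedged sketch, not the actual LangChain code: the function names, the `(document, score)` retrieval shape, and the 0.75 threshold are assumptions.

```python
# Sketch of the relevancy-score gate: answer only when the vector search
# returns at least one sufficiently relevant document, otherwise escalate
# to a human (in the talk, via a Telegram bot with approve/disapprove buttons).

def answer_or_escalate(question, retrieve, generate_answer, threshold=0.75):
    """`retrieve` returns (document, score) pairs from the vector store."""
    hits = retrieve(question)
    relevant = [doc for doc, score in hits if score >= threshold]
    if not relevant:
        # Exit early: a human will answer instead of the AI.
        return {"status": "escalated",
                "message": "You will receive a human answer within 24 hours."}
    return {"status": "answered", "message": generate_answer(question, relevant)}
```

In the Python original this decision point is where the agent is told to stop; porting it to n8n would roughly correspond to an If node on the score between the vector-store lookup and the agent's answer step.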
