‘Combining your knowledge with AI’ - n8n Business Lab (November 2025)
38:49


n8n · 06.01.2026 · 1,713 views · 44 likes


Video description
Daniel Pötzinger and Christian Gaege, n8n expert partners from the @aoepeople team, share how they use n8n to combine their company knowledge with LLMs, using Retrieval-Augmented Generation (RAG) as a framework. In this keynote at the first edition of n8n Business Lab in Wiesbaden, Germany, they explain why the indexing pipeline is crucial for preparing data for retrieval, and how essential evaluations are for building and continuously improving enterprise RAG applications.

Table of contents (8 segments)

Segment 1 (00:00 - 05:00)

Our topic is combining your knowledge with AI: the challenges you typically face when you start with it, and the solution options you have in n8n. What we are planning to talk about: first, of course, why you need company knowledge to work with your AI at all. We start with the nature of LLMs very quickly and then introduce RAG. We begin with the so-called naive RAG approach — what it is, how it works, how you can build it with n8n, and what typical challenges you see when starting with this approach — and then step by step introduce a more advanced RAG approach where we solve some of those challenges. Then we come to evaluations. Nick already mentioned it: how can you do serious AI automation without evaluation? So we'll cover that in the talk as well. And then a summary, key takeaways, and Q&A. So let's get started. I introduced myself at the beginning, so — Christian. — Yeah. Hi, I'm Christian, AI architect here at AOE. Very nice to meet you here. Today we're going to talk about business use cases that leverage AI and the importance of bringing in your own data. As you know, with ChatGPT you can answer generic questions about open-world knowledge that all the big models already include, but they are typically lacking company-specific know-how that is not available worldwide and has not been included in the training data of those models. And this information typically plays a crucial role — it's really essential for business use cases. You need to consider domain specificity: the answers your AI application provides should reflect your internal knowledge, like product know-how or information about the internal processes you want to automate. You also need up-to-date data.
Typical large language models have a training cutoff date one or two years in the past, and if you need real-time data for your automation processes, you have to bring it along yourself. There is also traceability and trust: we've all seen models hallucinate, and to get truly grounded information into your answers, you need to supply your company knowledge alongside the large language model. So, to summarize how LLMs typically work: you have text input — your prompt, the question you're asking — and you get text output. That's basically it, and that's the only interface we have for getting our questions answered. These models have been trained on publicly available data from the internet, not on your internal data, because they simply had no access to it. And the only way to integrate internal data into those LLMs is by adding it to the prompt. Nowadays we typically don't train custom models anymore, and we rarely do fine-tuning; instead, since large language models now have huge context windows, we can put a lot of information into the prompt — and that is our gateway for adding internal information. As I mentioned, context windows are getting bigger and bigger, so we can put more and more information into the prompt. But leveraging a large context window — having a huge prompt — is expensive, slow, and can confuse the LLM. Most of you have probably heard of the needle-in-a-haystack problem: you overload the model with so much information that it becomes hard for it to identify which small piece is actually essential for answering a specific question. So we need a solution where we include as much information as is required to answer a specific question, but not more — and this is where RAG comes into play.
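The "as much as required, but not more" idea can be sketched as a simple context-budget selection. This is a minimal illustration, not from the talk — the relevance scores, chunk contents, and budget value are all invented:

```python
def fit_context(scored_chunks, budget_chars=2200):
    """Greedily keep the highest-scoring chunks until the character budget
    is exhausted: as much information as required, but not more."""
    selected, used = [], 0
    for score, text in sorted(scored_chunks, reverse=True):
        if used + len(text) <= budget_chars:
            selected.append(text)
            used += len(text)
    return selected

# Hypothetical (relevance_score, chunk_text) pairs from a retrieval step.
chunks = [(0.91, "A" * 1200), (0.85, "B" * 900), (0.40, "C" * 500)]
print([c[0] for c in fit_context(chunks)])  # ['A', 'B'] — the low-score chunk no longer fits
```

A real system would count tokens rather than characters, but the trade-off is the same: every extra chunk costs money, latency, and model attention.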
— Yeah. So this is where RAG comes into play. Who has heard about RAG? Okay, good. Who has not heard about it? Okay, good — so we can still tell you something; there weren't many hands for the second question. So, quickly: what is RAG? RAG is a kind of framework that

Segment 2 (05:00 - 10:00)

gives the LLM access to external data — external knowledge — in order to ground its answers. The goal is that the LLM does not answer from its trained model knowledge, but from the knowledge we give it, so that the answer is actually grounded. How does it work? A typical RAG architecture has two parts. The first part is indexing, also called RAG ingestion: you take whatever knowledge you want to use and prepare it to be retrieved later. We'll come to the details shortly, but that is the indexing part, and normally it involves a vector store — very typical for RAG approaches; we come to this in a minute. The second part is retrieving this data, that is, using it at inference time. When you actually want to work with the model, it usually goes like this: your AI agent or application has a task to solve, and the first thing it does is retrieve the knowledge relevant to that task — that is the retrieval part of RAG. Then it augments the context, meaning the prompt that is sent to the LLM; the LLM generates its answer based on the prompt augmented with company knowledge, and you hopefully get a response grounded in your own knowledge. That is the basic idea of RAG. As I said, a very popular tool for RAG is the vector database, so let me quickly introduce what vector databases are and which role they play in a RAG context. A vector database is a special type of database that is able to store information by its meaning.
It is able to detect, for example, that we have animals on the left-hand side. On the screen you see a multi-dimensional vector space; each dot represents a vector, and vectors that are close to each other have similar meanings — a chicken and a bird, or a cat and a dog, are close to each other, while concepts that are not related to each other have more distance in the vector space. (Sorry, just losing my headphone here.) As you can see, concepts like bananas and apples are pretty close to each other. That is the basic idea: store information by capturing its context and meaning. And how does it typically work — how do you get information into that vector database? You take unstructured data like images, documents, or videos and process it using a so-called embedding model. For text, the embedding model takes sentences, words, or entire documents and converts them into multi-dimensional vectors — numeric representations. In the vector database we then store these vector embeddings, and this is the foundation for later being able to fetch specific information from the database by its meaning: we can find possible answers to questions by identifying the meaning of the question and which pieces of information could be possible answers to it. Choosing the right embedding model is the first challenge. There are many available embedding models — open-source and proprietary — and they have been optimized for specific languages and use cases. There are open benchmarks that compare different embedding models, and that is probably your first starting point: pick a highly ranked embedding model that fits your language. Good.
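"Closeness in vector space" boils down to comparing embedding vectors, most commonly with cosine similarity. A tiny sketch with hand-made 3-dimensional "embeddings" — real models produce hundreds or thousands of dimensions, and the vectors here are invented purely for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy embeddings: food concepts point one way, vehicles another.
vectors = {
    "apple":   [0.9, 0.8, 0.1],
    "banana":  [0.8, 0.9, 0.2],
    "tractor": [0.1, 0.2, 0.9],
}

print(cosine(vectors["apple"], vectors["banana"]))   # close to 1: related concepts
print(cosine(vectors["apple"], vectors["tractor"]))  # much lower: unrelated concepts
```

A vector database does exactly this comparison at scale, using approximate nearest-neighbor indexes so it never has to compare the query against every stored vector.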
Let's dive into an example in n8n. We've prepared a cooking-recipe example — maybe not the perfect company example, but one everyone can relate to. We have some data that we want to use as knowledge; in this example it's stored in a database. You can see the data structure: recipe name, ingredients, how to cook it, timing — everything you need for a recipe. This is our example, and we'll dive directly into n8n to see

Segment 3 (10:00 - 15:00)

how both indexing and inference work in this naive RAG approach. Let's see if it works. We have prepared an n8n workflow here; the top flow is the indexing pipeline. A typical indexing pipeline starts with retrieving the data — in this case from a database — and then you normally do some transformation; in our case it's very simple: we take the data from the recipe. I can do a quick run — I don't think we'll break anything — and then we can look at what happened. So: some configuration, then we fetched 90 recipes from the database; for showcase purposes we limit it to two in the flow, really just to keep the demo small. The next step prepares the text that we want to embed for the vector store: we assemble a full recipe — the name, the ingredients, the timing, the directions — everything you want available for the later semantic retrieval. This is then stored in the vector database; we are using the Weaviate vector store here, and it's nicely presented in the flow. Normally, storage in the vector database is done in two steps. First you chunk the document: if you have very large PDFs, you don't vectorize the complete PDF; you come up with useful chunks. That is done here by the document loader and the chunker — you can configure the chunk size and the chunking overlap, and you can see how it's chunked. We chunk pretty small here (we'll come back to this later): one recipe is chunked into texts no bigger than 500 characters. These chunks are then embedded — you can even inspect the resulting embedding vector — and stored in the Weaviate vector store. So this is the first part, the indexing part. Let's see how we can use it.
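The two-step storage (chunk, then embed) can be illustrated with a minimal character-based chunker. This is a sketch of the idea, not n8n's actual document-loader implementation; the 500-character size mirrors the demo's configuration, while the overlap value is an assumed example:

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of at most `size` characters, repeating
    `overlap` characters between consecutive chunks to preserve context."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks

recipe = "Apple Pie. Ingredients: apples, flour, butter, sugar. " * 30  # ~1.6k chars
parts = chunk_text(recipe)
print(len(parts), max(len(p) for p in parts))  # 4 chunks, each at most 500 chars
```

Each chunk would then be passed through the embedding model and written to the vector store together with its vector.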
So we have a very simple inference pipeline here that can be triggered by a chat. Let me clear the execution. Let's try: "How can I cook a nice apple pie?" It's running, and we first retrieve information from our knowledge base — the vector store. You can also see what was retrieved for that question: just one fitting chunk, not much. All the found chunks are then passed into the AI node, and here you can also see the prompt engineering — very simple. As usual, you give the AI Agent node a system prompt; in this case it's a helpful cook that should answer cooking-related questions. Then you give it all the content found in the vector store — all the retrieved knowledge chunks — together with the user question, and it should make sense of them. And I think it does: "Here's a simple recipe to cook a delicious apple pie," grounded in the knowledge retrieved from the recipe database. Good. So: a very simple indexing and inference pipeline, called naive RAG — that is how it started. Can I start the presentation again? Perfect. We have seen this; now to the challenges. — Yeah, Daniel just nicely explained the so-called naive RAG approach. What you typically see is that nowadays it's pretty easy to throw in your PDFs, not worry about the indexing details, and the magic just happens. But if you dig deeper, you will notice that sometimes the results are not as expected: the answers you get seem okay at first glance, but they are not really grounded. We now want to share some of the learnings we've made in various customer projects, where we saw that naive RAG falls short in some respects, and then we

Segment 4 (15:00 - 20:00)

will show you some possible solutions for addressing them. The first issue, sticking with our recipe example, is poor indexing and retrieval. It's not actually the LLM that is misbehaving — you know the saying "garbage in, garbage out": if we provide the LLM with insufficient or unrelated information, it will have a hard time giving us high-quality answers. For example, too much context was lost during indexing, or we give the LLM conflicting information, or we put irrelevant documents into the context — this results in incorrect responses or incoherent answers. Mixed chunks without context are a common problem. A naive indexing approach is like taking your cookbook with all its recipes, one recipe per page, tearing all the pages out, cutting them into small pieces with scissors, throwing everything together, and then handing the pieces to the LLM to answer your questions. For example, say we have a recipe for spaghetti carbonara with the chunk "Whisk the eggs and cheese together in a bowl," and another recipe for vegan pancakes: "Whisk the almond milk and flour until smooth." The wording is pretty close — both chunks involve whisking and a bowl — so in the vector store these two chunks might end up close together, and both might be returned for one query. The LLM might then use both pieces, and as you can see in the result, we get spaghetti mixed with pancakes — probably not what the user wanted to cook. So how can we fix this? The idea is that while indexing, we have to preserve the overall context of the chunks; in the context of a recipe, we may want to attach metadata like a recipe ID or a recipe name to each of the chunks.
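The fix — attaching a recipe ID and name to every chunk — can be sketched like this. The field names are invented for illustration; in n8n you would configure these as metadata properties on the vector-store node:

```python
def chunks_with_metadata(recipe, texts):
    """Wrap each chunk text with metadata identifying its parent recipe,
    so retrieved chunks can never be mixed up across recipes."""
    return [
        {"text": t,
         "metadata": {"recipe_id": recipe["id"], "recipe_name": recipe["name"]}}
        for t in texts
    ]

carbonara = {"id": "r-101", "name": "Spaghetti Carbonara"}
docs = chunks_with_metadata(
    carbonara, ["Whisk the eggs and cheese together in a bowl."]
)
print(docs[0]["metadata"]["recipe_name"])  # Spaghetti Carbonara
```

When a chunk is retrieved, the metadata travels with it into the prompt, so the LLM sees which recipe each instruction belongs to instead of a loose, anonymous sentence.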
So the LLM will later be able to detect which chunks belong to which recipe, which is valuable information. This metadata can be attached at indexing time and stored along with the actual vectors in the database, and it can also be included later, when you retrieve information from the vector database and send it to the LLM. So, for example, we don't send only that one sentence — we include the recipe name or recipe ID, a unique identifier, with it, and that basically solves the problem. The other observation: on the right-hand side you see a typical recipe. It fits on one page, so another approach is to not cut this single document into many pieces at all — the information just isn't that big. We could index the entire recipe as one chunk and later provide a self-contained chunk with the whole recipe to the LLM. The next problem is hallucination. You have all probably heard about the general LLM problem of hallucination: the AI generates information that seems right but actually isn't factually right. It's one of the big challenges with LLMs — it's inherent to how they work, but it's something we still don't want. When it comes to RAG, there is a RAG-specific kind of hallucination, which this example shows: the question here is which desserts go well with fish. In this case, no knowledge was actually found for the question, but the LLM came up with information that seems right yet is not grounded. That's something we usually don't want: answers that may even be correct, but are not grounded. There are two possible solutions — one is easy, the other goes more into detail.
Normally you want to design RAG systems so that they only answer with grounded information — so that they refuse to answer when they have not

Segment 5 (20:00 - 25:00)

found suitable information in the retrieval process. One thing you should always do, and which works to a certain extent, is to tune your system prompt so that the model really does not come up with its own answers. The other option requires more logic: building your own guardrails that check whether knowledge was actually retrieved and whether it fits the answer. So after the AI-generated answer, we have a guardrail that explicitly checks whether the answer was grounded, or whether the LLM came up with nonsense or a non-grounded answer. Good. Next thing: — semantic matching fails on logical filtering. In our recipe example, we have a combination of unstructured and structured data. You can see a screenshot from the database here: we have the text information — the actual recipe — but we also have structured information: how long it takes to prepare the meal, how many servings, and the total time (preparation plus cooking). This is a scenario we also want to take into account for questions like: "We need a recipe with apples that can be prepared and cooked in under 10 minutes." Here the LLM answered with a recipe that takes longer than 10 minutes, even though there is an entry — a kiwi banana apple smoothie — that can be done in 10 minutes. So the question is how to filter information within the vector database so that it does not rely only on semantic search, but also takes additional metadata — like the cooking time or the preparation time — into account at query time. This is the typical way to address these issues when you have both structured and unstructured data. — Yeah, and let's show how you can tackle this in our example.
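Combining semantic search with a structured filter means applying the metadata condition before (or alongside) the vector ranking. A self-contained sketch — the index layout and field names are assumptions for illustration, not Weaviate's actual API:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, index, max_total_minutes=None, k=3):
    """Semantic search with an optional structured pre-filter on metadata."""
    candidates = [d for d in index
                  if max_total_minutes is None
                  or d["meta"]["total_minutes"] <= max_total_minutes]
    return sorted(candidates,
                  key=lambda d: cosine(query_vec, d["vector"]),
                  reverse=True)[:k]

# Invented 2-d vectors and timing metadata for two recipes.
index = [
    {"text": "Classic apple pie", "vector": [0.9, 0.1],
     "meta": {"total_minutes": 75}},
    {"text": "Kiwi banana apple smoothie", "vector": [0.8, 0.3],
     "meta": {"total_minutes": 10}},
]

hits = retrieve([1.0, 0.0], index, max_total_minutes=10)
print([h["text"] for h in hits])  # only the 10-minute recipe survives the filter
```

Without the filter, the semantically closest match (the pie) would win even though it violates the time constraint — exactly the failure the speakers describe.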
So I jump again into our prepared, more advanced workflow, and we start with the indexing pipeline, which now contains more transformation and enrichment logic — which is what you normally want when preparing your data for retrieval. In this case, the plan is to store not only the chunk text but also a set of metadata, because we want to use it for retrieval and for context enrichment. So for every additional property of a recipe, we add a corresponding property here. We also have one numeric property — the total preparation time in hours — that we want to use for filtering. That value doesn't exist in our recipe data structure yet: if you look at some recipes, the timing is really stored as a string, like "10 minutes cooking, 20 minutes waiting" — it's plain text. So we need to derive this total preparation time, and to demonstrate it, we've done so with AI logic in the indexing pipeline. That's actually something we use more and more: AI logic to prepare your data for better retrieval. You can probably come up with other ideas — we use it, for example, to summarize support tickets or to make sense of long conversations, really preparing the data for better retrieval. In this case, the node gets the recipe data and its goal is to extract the total preparation time and produce a clean list of the ingredients — everything that exists only in the text should be extracted, because we want to use that extracted data later in retrieval. I'm not hitting "run" now, because it takes a while for each of the thousand recipes. That also illustrates another point about indexing pipelines: if you have a large amount of data, you want to index incrementally.
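The talk extracts the total preparation time with an LLM node; as a deterministic sketch of the same normalization, a regex version handles the simple "10 minutes cooking, 20 minutes waiting" case. The function name and exact behavior are illustrative, not the speakers' implementation:

```python
import re

def total_preparation_hours(raw_timing):
    """Turn a free-text timing field into a numeric hours value that can be
    stored as a filterable metadata property in the vector store."""
    minutes = sum(int(m) for m in re.findall(r"(\d+)\s*min", raw_timing))
    hours = sum(int(h) for h in re.findall(r"(\d+)\s*hour", raw_timing))
    return hours + minutes / 60

print(total_preparation_hours("10 minutes cooking, 20 minutes waiting"))  # 0.5
```

An LLM-based extractor earns its keep where regexes break down — "roughly half an hour, plus overnight resting" — but the output contract is the same: a clean numeric field derived from messy text at indexing time.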
But to give you the idea of what we are showing — should we open Weaviate? We can do it later. So trust me: the recipes are now stored in bigger chunks; we have also changed the chunking so that one chunk can

Segment 6 (25:00 - 30:00)

contain the complete recipe, with all the metadata included. Good — that is optimized for retrieval. So we have this part. Should I show the retrieval agent as well, or later? — You can do it. — Okay. We also have a more advanced setup here for using the data, and what you see is no longer a naive RAG approach — we've changed a lot. There's an agent node, the recipe agent, and it has access to retrieval tools: the vector store as before, plus another tool that allows it to get recipes by semantic search and by preparation time. If you look at it briefly — don't be afraid if you've never seen this before — it's a tool that lets the AI pass a query for the semantic search and also pass the maximum total preparation time the meal should fit into. So we now have an agent with access to two different retrieval tools. Let's run two examples. First: "Give me a nice apple pie recipe." There are no time constraints given, so it just uses the normal recipe tool, and the search request is simply "apple pie." That's another benefit of this approach: it doesn't blindly pass along the user query; it reasons, "Okay, I need to search for apple pie — the 'give me' part is not relevant for the semantic search." And you can see we have also tuned the output a bit: the metadata is included, so you get a nice image and a nicely formatted recipe. That looks much better. Let's try the other question: "Give me a recipe with apples that I can prepare in under 12 minutes." Let's see if that works. It should detect — and it did — "Okay, I'd better use the other retrieval tool that also lets me filter by preparation time."
That tool is used here — in this case it's a sub-workflow — and we get the microwave-baked apples, with a total time of 10 minutes. So this is something the naive RAG approach was not capable of. Good. What are the key messages here? One key message is definitely: when you're building RAG applications, really think, in your indexing pipeline, about how to prepare your data for retrieval. It's much like classical data pipelining — ETL knowledge — and all of that knowledge is required. You've probably also heard that good data architecture empowers good AI. The typical high-level approach to data pipelines: you have your raw data; you normally have some loading process and somewhere to store the prepared data — for bigger companies this can even be a data lake or a level-one store — and then you have all the application-specific transformation, cleaning, filtering, and enriching. We have seen that you can use AI for these steps — something that wasn't possible two or three years ago, so these are really powerful tools — and then you store the data, optimized for retrieval, based on what you want to do. — Yeah, and we have seen huge progress in RAG applications and modern RAG architecture since ChatGPT came out some years ago. I think in 2025, agentic RAG, with far more autonomy, has become the mainstream architecture. Maybe we can jump one slide further and take a closer look. Daniel just briefly highlighted the agentic approach, and what distinguishes it from the classical workflow style is that the agent now has far more autonomy. It has this

Segment 7 (30:00 - 35:00)

clear goal — assist with answering recipe-related questions — but we no longer have the strict sequential flow. We leave it up to the LLM to decide whether it should query the vector database or use the other tool we have provided, and the LLM can also decide whether an additional retrieval round with a different query is necessary if the first round didn't return a good answer. So there's far more flexibility and far more autonomy. I think we can go on. — Yeah. And here's an example of what you can see — maybe we can drop this slide, since it was in the live demo. — We saw it, yeah — but you also see n8n's nice debugging here; I don't think I've shown it yet. You can see what the LLM passes to that new tool: the query and the maximum cooking time, and then you only get knowledge that fits both properties. Good. Of course, all these terms are pretty new, so there's no very clear definition of what exactly agentic RAG is versus what an agent is — the concepts overlap a lot. If you think of the pure AI-agent approach in a RAG setting, you are of course not limited to the well-proven vector databases for semantically fitting content; depending on the task your AI node needs to solve, you can give it many more tools to retrieve company knowledge. Something that is getting very mature and popular is text-to-SQL: models are really good at writing SQL, and you can prepare data in whatever format you want for the agent. For example, "summarize how many recipes I have in each category" — the agent would be able to formulate a SQL query and give you the answer. Graph databases are also getting very popular, because they offer other ways to retrieve knowledge — based on taxonomy, based on graph traversal — also quite sophisticated. And so forth.
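Text-to-SQL in practice: the agent turns "how many recipes do I have in each category?" into a query against a relational store. A minimal sketch with SQLite and an invented schema — in the real setup the SQL string comes from the LLM, while here it is hard-coded:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE recipes (name TEXT, category TEXT)")
con.executemany("INSERT INTO recipes VALUES (?, ?)", [
    ("Apple Pie", "dessert"),
    ("Spaghetti Carbonara", "pasta"),
    ("Tiramisu", "dessert"),
])

# The kind of query an agent might generate for
# "summarize how many recipes I have in each category":
sql = "SELECT category, COUNT(*) FROM recipes GROUP BY category ORDER BY category"
print(con.execute(sql).fetchall())  # [('dessert', 2), ('pasta', 1)]
```

Note the design point: aggregation questions like this are a poor fit for semantic chunk retrieval, because no single chunk contains the count — structured retrieval tools fill exactly that gap.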
API calls to retrieve information, MCP — you name it. — Good. But how do you know what to do? — Yeah. We've learned that agents are now able to handle more and more logic by themselves, and the question is: how can we guarantee the quality of our AI applications, given all the logic and autonomy we hand over to the agents? This is where AI evaluation comes into play, and it plays a really essential role in real-world enterprise AI applications. What AI evaluation basically means is checking how well an AI system works and whether it gives reliable, correct, and useful results. (— How much time do we have?) And why do we need it? We've all learned that LLMs are not deterministic: the same question may lead to different results. We also see that the models keep evolving. You start on day one with a specific model version and everything works fine, but maybe OpenAI releases a new model version four weeks later and you have to decide whether to upgrade, because it might be better — but you never really know. Without automated evaluation, you have to rely on manual spot tests for a couple of questions, or on gut feeling, and that is mostly not a good foundation for an enterprise project. In our example scenario, evaluation means: we request information or a suggestion for a specific recipe, and as a human we have in mind what an expected response would look like; we can then compare the expected response with the actual response from the AI and decide whether it's a good answer or not. The challenge with real-world applications is that spot tests don't really move the needle. You need a big evaluation set with hundreds of questions and answers in

Segment 8 (35:00 - 38:00)

order to really do a 360-degree evaluation across a variety of questions and answers. What we see here is a dataset in n8n where we prepared example questions with expected answers and actual answers, and this builds the foundation for our evaluation. We could now go through that list and compare the actual answers with the expected answers. The problem: that doesn't scale for a human — you can look at four or five of them, but it gets boring and it's slow. So we need a mechanism to compare actual results with expected results, and the familiar mechanisms — regexes or exact-match assertions — no longer work, because the actual results may differ with every call. We need an automatable mechanism that is able to understand and compare expected answers and actual answers — and this is again a case where LLMs come into play. We can use another LLM to compare the actual results with the expected results and let it rate how good each answer is: is it a good, helpful answer? Is it perhaps correct but not helpful? We can build different kinds of metrics and let the LLM act as a judge for our automated AI evaluation. What you see here in n8n: we have the agent we've already seen, we have our evaluation dataset, and we can add specific evaluation nodes in n8n. They take all the questions from our dataset, feed them through the agent, take the actual answers, write them into the data store, and then let the LLM-as-a-judge compare all the rows — expected answers versus actual answers — and calculate metrics over this larger dataset. Then we have real numbers that help us decide whether a change of the model improved or degraded the results on specific metrics. So we have far more control than relying on gut feeling or manual spot tests.
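The evaluation loop — feed every dataset question through the agent, then score actual versus expected — can be sketched as below. The judge here is a naive token-overlap stand-in for the LLM-as-a-judge call, and the dataset rows are invented; in the real pipeline an LLM produces the per-row rating:

```python
def judge(expected, actual):
    """Stand-in for an LLM-as-a-judge: score by token overlap with the
    expected answer (1.0 = every expected token appears in the actual one)."""
    exp = set(expected.lower().split())
    act = set(actual.lower().split())
    return len(exp & act) / len(exp)

dataset = [
    {"question": "How do I cook apple pie?",
     "expected": "bake sliced apples in a pastry crust",
     "actual":   "Bake sliced apples in a pastry crust until golden."},
    {"question": "Which dessert fits fish?",
     "expected": "no grounded recommendation found",
     "actual":   "A lemon sorbet pairs wonderfully."},  # ungrounded hallucination
]

scores = [judge(row["expected"], row["actual"]) for row in dataset]
print(round(sum(scores) / len(scores), 2))  # aggregate metric over the dataset
```

The aggregate number is what makes model upgrades decidable: rerun the same dataset against the new model version and compare the metric, instead of eyeballing a handful of answers.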
This brings us to the summary — more or less just in time. Three key takeaways from today. First: focus on your indexing pipeline, because this is where you prepare data for retrieval, and that is typically part of your business logic. Second: RAG paradigms, RAG patterns, and tool usage will constantly improve, so we expect that using external knowledge will keep getting better and better. And third: to leverage all of this — to actually see whether something gets better or worse when you start tuning your retrieval or your RAG application — evaluation is a must-have, as we have just seen. We'd even say "evaluation first" is how you should approach it: evaluation-driven AI development, the counterpart to test-driven development in classical coding. Yeah, and that's it. We have time for some questions. Thank you.
