RAG for Documents with Images and Charts, Step by Step (n8n, ConvertAPI)
16:52

ИИшенка | AI Automation 17.08.2025 3,906 views 168 likes (updated 18.02.2026)
Video description
🚀 A beefed-up version of this automation, which works even with scanned documents and the tables inside them, is available in the Pro community here: https://t.me/iishenka_pro_bot ⭐️ Free materials from this video are here: https://t.me/+W1SnvvkcV6A3NWMy In this video I explain how to make your RAG even more powerful. What is special about this lesson? We will not only learn how to vectorize data from charts and images, but also build an example of such an automation step by step in n8n. Next video: https://youtu.be/xbF8GfXx7PE 🔥 Whether you already work with AI agents or are just starting out with n8n automations, this lesson will help you master setting up AI for any task. 💡 Don't forget to like and subscribe so you don't miss new lessons on n8n and AI agents. Let's make AI automation simple! 🙌 Timestamps: 00:01 - Introduction. Why this? 00:50 - RAG theory 02:18 - Building the automation 04:36 - Extracting images from files 07:30 - Analyzing the images 11:50 - Setting up vectorization 14:50 - Setting up the agent 15:50 - Advanced version of the automation I am Ilya Bovkunov, founder and CEO of Sendforsign, a company doing AI automation of contracts and document workflows. Previously I was Director of Product and Product Design at international AI startups. To invite me to a podcast or propose other collaboration: aiiszdes@gmail.com Don't forget to like, subscribe, and hit the bell so you don't miss new videos about AI agents and automations!

Table of contents (9 segments)

  1. 0:00 <Untitled Chapter 1> 14 words
  2. 0:01 Introduction. Why this? 178 words
  3. 0:50 RAG theory 276 words
  4. 2:18 Building the automation 449 words
  5. 4:36 Extracting images from files 484 words
  6. 7:30 Analyzing the images 839 words
  7. 11:50 Setting up vectorization 540 words
  8. 14:50 Setting up the agent 172 words
  9. 15:50 Advanced version of the automation 194 words
0:00

<Untitled Chapter 1>

Friends, hello everyone. Well, today is another mega-practical video. Today we will figure out
0:01

Introduction. Why this?

how to vectorize documents that contain text, images, and charts. RAG is a topic we raise regularly on this channel, and today we are doing the next iteration and digging deeper into it. But before we go that deep, be sure to subscribe, like, and leave a comment; I will be very grateful. Let's go. By the end of this video we will build an automation in which we upload files through a Telegram bot, extract the text from those files, extract the images, analyze those images, and make sure all of it is vectorized into our database. At the very end, we will check how it works using an agent. As always, the finished automation will be uploaded to my free Telegram channel at the end of the video; the link is in the description. Now a little theory. RAG is a topic we have touched on many, many times. But
0:50

RAG theory

every time we talked about it, we worked mainly with text files or with Excel, with tabular data. Most real documents, however, contain not only text: they also contain tables, images, and charts. A very common question in the comments under such videos is: how do you work with this? Can you vectorize images in a way that lets you search the data in them? With classic documents we usually extract the text, form chunks from it, and vectorize those chunks separately. So what is the approach for documents that contain other elements? The key idea is that we should not try to vectorize such a document as a single object. First, as before, we extract the text; then we extract all the images; then all the charts, which are usually just a special case of images. Once the images are extracted, we save them separately and pass them to a neural network specialized in image analysis, which returns a description of each image. After that, all the text in the document and the descriptions of the images go into a vector database, from which we can later retrieve both the text and the image descriptions. Naturally, we must make sure that in this database the images and text chunks stay linked; in this video we will do that based on the name of the document itself. Well, let's get down to the automation. Open a new project
2:18

Building the automation

in n8n. Let's come up with a way to get files into n8n. There are lots of options; I will keep it simple and just drop in files, PDFs. I will create a Telegram trigger on message and add my credentials (if you don't know how to add credentials, watch the very first video, it's easy). After that I will immediately add another Telegram node, Get a File. This node downloads a document's binary by its file ID. Let's run a quick test right away: launch the trigger and upload some PDF. I have a fairly odd report here, but the important thing is that it contains charts and data. We upload the file, send it, and see what we get. Great, the file has arrived. We grab it and see that it has a file ID, which means that if we connect the next node, we can pass that file ID to its input; and we remember that we can download the binary itself by file ID. Excellent, the binary is downloaded. We can open it: the same file we just sent to our Telegram bot. Next, as we discussed, we need to split the content of this file into several parts. We know for sure it contains text, and we know for sure it contains images and charts. How do we extract text from such files? n8n has a node called Extract from File; more precisely, it is a set of nodes for different cases. We want Extract from PDF, so we drop it in. It is basically configured by default, and as soon as we execute it and switch to the JSON view, we see that all the text has been extracted from our binary.
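For reference, what the Telegram Get a File node does can be sketched in plain Python. The Telegram Bot API uses a two-step download: getFile resolves a file_id to a file_path, and the binary is then served from a separate file URL. This is a minimal sketch with a placeholder token and no error handling:

```python
import json
import urllib.request

def file_url(token: str, file_path: str) -> str:
    # Telegram serves file binaries from a dedicated /file/ URL prefix
    return f"https://api.telegram.org/file/bot{token}/{file_path}"

def download_document(token: str, file_id: str) -> bytes:
    # Step 1: getFile resolves the file_id to a short-lived file_path
    meta_url = f"https://api.telegram.org/bot{token}/getFile?file_id={file_id}"
    with urllib.request.urlopen(meta_url) as resp:
        file_path = json.load(resp)["result"]["file_path"]
    # Step 2: fetch the actual bytes from the file URL
    with urllib.request.urlopen(file_url(token, file_path)) as resp:
        return resp.read()
```

Note that Bot API downloads are capped (files up to 20 MB), which is plenty for typical PDF reports.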
This is great. But if we look more closely, there are no images here, and we know for sure they were in the file. What do we do in such cases? I suggest using a specialized service called ConvertAPI. I use it all the time myself. It
4:36

Extracting images from files

has different APIs for different occasions, and one of them is exactly what we need now. You sign in (there is a free tier, so don't worry about that), go to your dashboard, and choose what you want to convert from and to. We want "from PDF", and there is a separate function called image extraction. It selects an API endpoint for us called extract PDF images: exactly the endpoint that extracts every single image from a PDF. There is already a code snippet here. We copy the whole snippet, go back to our automation, add an HTTP Request node, and use the import option. It fills in all the fields we need; it even pulled in my API key. The only thing left to specify is where the endpoint expects the data. The previous node returned a binary object in the data field, so we reference data here. And since the API key is already in place, we are ready to go. (If it did not fill in the authorization key, go to your dashboard, click Authentication, find your test and production keys, and use one of them for authorization.) Let's click Execute. We get results back very quickly: an array with five elements. So it extracted five images and even gave us a URL for each one. Let's open a random one. It downloaded quickly; open it, and we see that the service extracted, with maximum accuracy, exactly the images contained in our document.
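The same response can be handled outside n8n as well. Assuming ConvertAPI's JSON body contains a Files array with a Url per extracted image (the sample below is illustrative, not real output; check the response in your own dashboard), pulling the URLs out of that array looks like this:

```python
def extract_image_urls(response: dict) -> list:
    # One URL per extracted image; empty list if nothing was found
    return [f["Url"] for f in response.get("Files", [])]

# Illustrative sample of the response shape, not actual ConvertAPI output
sample = {
    "Files": [
        {"FileName": "page1-img1.png", "Url": "https://v2.convertapi.com/d/abc/page1-img1.png"},
        {"FileName": "page2-img1.png", "Url": "https://v2.convertapi.com/d/abc/page2-img1.png"},
    ]
}
urls = extract_image_urls(sample)
```

Each URL in this list is then handled as a separate item, which is exactly what the workflow does next.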
Well, that's great. What do we do next? Now we want to turn the array of five elements into five separate items, so we can process each one individually. How is this done? We look for the node called Split Out, connect it, and choose which field to split by, so that it gives us exactly five items (or however many files there are). Let's check right away: yes, it gave us five items. Great. So we now have five images, and we want to analyze each one. How? Again we open the OpenAI node set and see
7:30

Analyzing the images

which nodes are there. There is one for image analysis. Keep in mind that image analysis is built into most modern neural networks by now; pick a model you like and see how well it handles images. We already have OpenAI connected, and we know it works great with URLs, so we can simply pass it the link to any image and ask about it. The default prompt is already written: what is in this picture. For this experiment, let's pick a smaller model, GPT-4o mini. For this video I will leave the simple "what is in this picture" description. Keep in mind that for your own cases, if you know your images are of a specific kind, you can state directly what you expect and which details matter. For example, for a chart you can say: "Be sure to state the axis names, and be sure to state the periods the chart is divided into," so the model specifically tries to read those off the image. If you leave the prompt as is, like I am doing now, it will return a generalized description, which usually works fine too. Let's try. Remember that we feed five items to the input, which means the node will run five times. We send them to the model, wait a bit, and five items come back. I wrote the prompt in English ("what is in this picture"), so the descriptions came back in English; specify whichever language matters to you, for example Russian.
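Outside n8n, this analysis step is a Chat Completions request with the image passed by URL. The sketch below only builds the request body (it does not call the API); the message shape follows OpenAI's vision documentation, and gpt-4o-mini matches the model chosen in the video:

```python
def vision_request(image_url: str,
                   prompt: str = "What is in this picture?",
                   model: str = "gpt-4o-mini") -> dict:
    # A user message mixing a text part and an image_url part,
    # as accepted by OpenAI's Chat Completions endpoint
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

body = vision_request("https://example.com/chart.png")
```

A more specific prompt, such as "State the axis names and the time periods shown," goes into the prompt argument, exactly as described above for charts.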
Although for RAG the language is not particularly important, if you care about readability yourself: look at picture number two, that same chart. It writes that the image contains a graph of revenue and profit dynamics, financial data for several months, a blue line for revenue in millions of rubles, and so on. It describes things in quite a lot of detail, and you tune the level of detail with the prompt. If you are satisfied, leave it as is. What matters to us is that we have essentially obtained two blocks of text from this file. The first is the text contained in the file itself. The second is the descriptions of all the images found in it. And we already know perfectly well how to vectorize text. What next? Next I want to aggregate everything the model returned: we gave it five items, and it returned five items, each with its output in the content field. So if we now ask an aggregation node to group all these items, we get a single unified text. Great. All the image descriptions are now one text, and we can move toward vectorization. But first let's go back to our earlier node. Remember that it also returns all the text, only in the text field. Here is a small life hack: since the field names at the two outputs differ, let's add a renaming node (the Edit Fields node works for this) and place it right after Extract from File. Everything that arrives in the text field we simply map over and call content.
That means this branch will now output content with the text from the PDF, while the other branch outputs content with the descriptions of all the images. Let's run a big test and see how it all works, and in what order: first it extracted the text, then it went to extract the images and analyze each of them. What matters is what we end up with at the output of this workflow. On one side we have the descriptions of every single image, which is what we need; on the other, all the text, which we also want to vectorize. Great. How do we vectorize? We have done this together many times.
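At this point the workflow has two streams that both expose a content field: PDF text chunks and image descriptions. The merge into uniform, file-tagged records can be sketched in a few lines of Python (all names here are illustrative, not n8n internals):

```python
def build_records(file_name, text_chunks, image_descriptions):
    """Combine text chunks and image descriptions into uniform
    vector-store records, all tagged with the source file name
    so they can be grouped back together at query time."""
    records = []
    for chunk in text_chunks:
        records.append({"content": chunk,
                        "metadata": {"file_name": file_name, "kind": "text"}})
    for desc in image_descriptions:
        records.append({"content": desc,
                        "metadata": {"file_name": file_name, "kind": "image"}})
    return records

records = build_records(
    "report.pdf",
    ["Revenue grew 3% in July.", "Costs stayed flat."],
    ["Line chart of revenue vs profit by month."],
)
```

The shared file_name metadata is what lets the agent later trace an image description and a text chunk back to the same source document.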
11:50

Setting up vectorization

Let's add a vectorization node from Supabase: the Supabase Vector Store node. We will route both the text and the image descriptions into it; we will see in a moment why. What do we configure inside? We create a vector table in Supabase (if you have never created one, watch the previous video; we have worked with Supabase and its tables many times). I enter my table, which I just created; we can see it is completely empty. Then we need to add the data, so we attach a data loader, which describes exactly what we want to store. We know that both previous branches output content, so in the data loader we can simply reference the content field from the JSON, regardless of whether it is the text from our first branch or the image descriptions; either way it gets vectorized. How do we preserve the link between the images and the texts we vectorize? We add metadata. Click the Property button, add a metadata entry, and store the file name there. Where does the file name come from? The very first Telegram Trigger node already contains it (the document's name), so we don't need to invent anything; we just drag that value in. That way the file name from Telegram is attached to every chunk of this information during vectorization. We still need a text splitter, so let's add the recursive one with chunk size 1000. It can stay at 1000; it is not particularly important right now. And for the embedding model, since we already have OpenAI connected, let's use it: text-embedding-3-small. Now we need to run a big test and see how all this is vectorized.
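n8n's recursive splitter is smarter than this (it prefers paragraph and sentence boundaries and supports overlap), but a naive fixed-size splitter is enough to illustrate the chunk size of 1000 plus the per-chunk file-name metadata configured above:

```python
def split_with_metadata(text: str, file_name: str, chunk_size: int = 1000) -> list:
    """Naive fixed-size splitter: cut the text every chunk_size
    characters and attach the source file name to every chunk."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [{"content": c, "metadata": {"file_name": file_name}} for c in chunks]

# 2500 characters at chunk size 1000 -> chunks of 1000, 1000, and 500
docs = split_with_metadata("x" * 2500, "report.pdf")
```

Each of these per-chunk dicts is what gets embedded (here, with text-embedding-3-small) and written as one row of the vector table.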
Let's launch it and watch. The first vectorization, the document's own text, has already completed, and now the workflow is waiting for the image analysis to finish, then the aggregation, and then the vectorization of the full text we receive from OpenAI. The vectorization runs, and we head over to our Supabase. We already have several rows at once: the text itself was split into three whole chunks, and then each image's description is vectorized separately. Most importantly, the name of the file these images were taken from is saved in the metadata, so we can always trace which file both the text and the image descriptions came from. Now for the most interesting part: let's ask our agent something based on this data. I simply
14:50

Setting up the agent

copy-paste the agent from our previous video, because it does not change from video to video. The only thing I specify is that we take data from our documents table, including the metadata. Everything else stays roughly the same. Let's open a chat and ask: "Look at the charts, how is our revenue doing?" We can see it has already queried the vector knowledge base and is forming its answer. According to the latest data, the company's revenue in July 2025 was 5.2 million rubles, which is 3% more. Some interesting details: the cost of production was 3.1 million, and although revenue is growing, net profit remains at the level of 900,000 rubles. It took all of this data from the charts. You can see this is a completely different level of data analysis, and this is how you can vectorize your documents, including all the images and charts. One separate, more complex topic remains.
15:50

Advanced version of the automation

How do you vectorize documents that were scanned, and what do you do with tables inside documents? This is a more technical matter, and there is no need to reinvent it yourself: there are specialized services that recognize such documents well. We built that automation in our Pro group. There you will find a more advanced version of this automation, where we even remember the specific places where images appeared and leave markers in the text, so that all the cause-and-effect relationships can be traced later. And of course we also cover working with tables found in documents, even converting them into Markdown, that is, into real tables, so that the model can later read them, for example, row by row, as a single block of text. If that interests you, be sure to check it out in the Pro group. Today we figured out how to build, in 10-20 minutes, an automation that carefully extracts the data and images from your PDFs and vectorizes them for subsequent analysis. Thank you all for watching and have a nice day.
