two-step process of how RAG works underneath. Here I have an HR policy document from Atliq, which has a section on retirement benefits. If I go to ChatGPT, paste this section, and ask a question about contributions to the employees' retirement fund, ChatGPT will be able to answer it. That works because you are asking the question and also providing the knowledge as a reference in the context, so it can pull the answer from there. But what if your HR policy document is a 3,000-page PDF? What if the knowledge is very large? In that case you will run out of context window. And even if you have a huge context window, you still should not feed it all of the knowledge, because that is too many tokens and it will be costly.

So what people do is chunk the document. The basic strategy is fixed-size chunks. Then, for a given question, you pull the relevant chunks. For this particular question, let's say the first chunk is a 70% match, so it is very likely to contain the answer, and the second chunk is a 60% match. I'm showing just three chunks, but you can have 1,000, and some of them might be only a 5% or 10% match. For example, if a chunk contains information on when Atliq was founded, its culture, its founders, and so on, it has nothing to do with the retirement question you are asking, so the relevance of that chunk will be very low.

Now, how exactly do you measure this similarity? This is where embeddings come in. Embedding is the process of converting text into a vector that represents its meaning. You convert all the chunks into vector embeddings and store them in a vector database. This is different from your regular database.
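To make the chunk-and-score idea concrete, here is a minimal Python sketch of fixed-size chunking followed by ranking chunks against a question with cosine similarity. It assumes a toy `embed` function (a bag-of-words word counter) in place of a real embedding model; unlike a real model, it only captures word overlap, not meaning. The in-memory `index` list stands in for a vector database, and the word-based chunk size, the overlap parameter, the prompt wording, and the sample chunks are all illustrative assumptions, not details from the video.

```python
import math
import re
from collections import Counter


def chunk_fixed_size(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words; neighbouring chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    A real model would return a dense vector that captures meaning."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Indexing: embed every chunk once and keep (vector, text) pairs."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(question: str, index: list[tuple[Counter, str]], top_k: int = 2) -> list[tuple[float, str]]:
    """Retrieval: embed the question with the same function and return the top-k chunks."""
    q = embed(question)
    scored = [(cosine(q, vec), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Put only the retrieved chunk texts below the question (illustrative wording)."""
    context = "\n\n".join(text for _, text in hits)
    return f"Question: {question}\n\nAnswer using only this context:\n{context}"


# Hypothetical sample chunks, not Atliq's actual policy text.
chunks = [
    "The company was founded by its founders and values an open culture.",
    "The company makes a monthly contribution to each employee's retirement fund.",
    "Employees receive paid leave and vacation days every year.",
]
index = build_index(chunks)
hits = retrieve("What is the contribution to the employees retirement fund?", index, top_k=1)
print(hits[0][1])  # the retirement-fund chunk scores highest
```

Swapping `embed` for a real embedding model and `index` for a vector database keeps the same shape: embed chunks once, embed the question with the same model, and rank by similarity.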
A regular database searches using exact values, whereas a vector database can search by meaning. So when you search for, let's say, a company that is a leader in electric vehicles, it will return Tesla from the database. It is searching based on meaning, not on the exact words. To generate embeddings you can use a variety of models, such as Sentence Transformers, text-embedding-3-small, and so on. And there are many vector databases on the market: Milvus, Qdrant, Chroma, and so on. This step is called indexing. It is the first step in the RAG process, where you index the vectors of all the chunks into a vector database.

The second step is retrieval. For a given question, you generate an embedding using the same embedding model. Then you find the relevant chunks in the vector database. Here it is doing semantic search, returning the most relevant vectors. You can specify a top-K value; let's say you want two chunks or five chunks. You then take the original text of those chunks and put it in your prompt along with the question, and when this prompt is given to the LLM, it gives you the answer. So below the question, you are providing only the relevant chunks. Because the answer is grounded in those chunks, the LLM is much less likely to hallucinate and much more likely to give you an accurate answer. That takes us
example. Let's now look at RAG categories. The first one is vector RAG, and naive RAG is the example we just saw, where you pull the top-K relevant chunks from a vector database and use them to answer the user's question.

The second category is vectorless RAG, in which you can perform keyword RAG. Here you are not generating any vector embeddings and you don't have a vector database; instead you use keyword-matching techniques like BM25, TF-IDF, etc., to query the documents using exact keywords. This method works well when you have a lot of codes, jargon, IDs, or citations. Let's say you are doing research and you always search using a particular ID or keyword; then this will work. It is weak at semantic understanding: when you are searching by meaning rather than by exact keywords, it is not a good choice. Key tools that use keyword-retrieval techniques like BM25 are Elasticsearch and Apache Solr.

The next category in vector RAG is hybrid RAG, where you combine vector search and keyword search. You run both in parallel and merge the results. This is the best choice for most production systems. The key tools here are Elasticsearch plus any vector DB. At Atliq, we worked on a RAG project for a client where we developed our own custom hybrid method, and we have covered that approach in a different video. You can check it out if you are interested. Also, if you want to learn AI engineering by building production-grade AI systems similar to the projects I just mentioned, check out our AI engineering cohort, where we have live sessions on weekends, teach all the concepts, and you build eight-plus production-grade projects.

The next category in vectorless RAG is graph RAG, also known as KG RAG. Here you generate a knowledge graph. Let's say your knowledge is about Elon Musk and all the companies he has founded.
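The keyword RAG idea can be sketched with a small BM25 scorer in Python. It uses the Okapi BM25 formula with the IDF form that adds `1` inside the logarithm (which keeps scores non-negative) and commonly used defaults `k1=1.5`, `b=0.75`; the tokenizer, which keeps hyphenated IDs such as `E-4012` as one token, and the sample documents are illustrative assumptions. In practice you would let Elasticsearch or Solr do this scoring rather than hand-rolling it.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep hyphenated runs together so IDs such as 'E-4012' stay one token."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for `query` (higher means a better keyword match)."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each term once per document
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # exact-match only: no credit for synonyms
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical documents: exact IDs are where keyword search shines.
docs = [
    "Ticket E-4012: retirement fund sync failed for payroll batch 7.",
    "Overview of retirement benefits for all employees.",
    "Company history, culture and founders.",
]
scores = bm25_scores("E-4012", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best)  # 0
```

The weakness mentioned above shows up immediately: `bm25_scores("pension contributions", docs)` scores every document `0.0`, because no document contains those exact words, even though the second one is about a closely related topic.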
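For hybrid RAG, the video only says the vector and keyword results are merged. One common way to merge them, assumed here and not necessarily what any particular tool or the Atliq project does, is Reciprocal Rank Fusion (RRF): each document earns `1 / (k + rank)` from every ranked list it appears in, with `k = 60` as the conventional constant. The chunk IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking using RRF."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list get the biggest boost.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


vector_hits = ["c2", "c1", "c3"]   # e.g. from a vector database, best first
keyword_hits = ["c1", "c4", "c2"]  # e.g. from BM25, best first
merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)  # ['c1', 'c2', 'c4', 'c3']
```

Because RRF only looks at ranks, BM25 scores and cosine similarities never have to be put on the same scale, which is one reason it is a popular default for running both searches in parallel and merging the results.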
In that case you build a knowledge graph that says Elon Musk founded Tesla, SpaceX, Neuralink, OpenAI, and so on. These companies, in turn, operate in different domains. These are all entities, and they are connected through relationships. Now when you ask, "Which companies founded by Elon Musk are working in AI?", you traverse this path: you look at all the companies he founded, do a breadth-first traversal, and find that OpenAI is working in AI.

The next one is SQL RAG, also known as text-to-SQL. This method is very simple. Let's say you have a sales database containing product sales, and you ask, "Which product sold the most last month?" Using an LLM, you first generate a query for that database. You execute the query, get the results, and give them back to the LLM to generate a comprehensive answer. So you are taking a sentence in natural language, converting it to SQL with an LLM, and running that query against your database to get the results.

The last method, which is relatively new, is called PageIndex. It is reasoning-based RAG. Let's say you have a 3,000-page PDF document. First you generate the table of contents, which is your information structure. It is like having a book with its full chapter and topic layout. Now when somebody asks, "What does the contract say about compensation for breach of contract?", the LLM uses its reasoning capability and this table of contents to traverse the tree and locate what it is looking for. In this case, it first figures out that the question relates to performance of contracts, because the contract has already been executed. So it goes to that section, and then it finds compensation for breach.
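The graph traversal can be sketched with an adjacency map and a breadth-first search that only follows edges matching a relation pattern. The `founded` and `operates_in` relation names and the domain labels for Tesla, SpaceX and Neuralink are illustrative choices; a real graph RAG system would extract entities and relationships from documents and usually store them in a graph database.

```python
from collections import deque

# Knowledge graph as adjacency lists: entity -> list of (relation, entity) edges.
GRAPH = {
    "Elon Musk": [("founded", "Tesla"), ("founded", "SpaceX"),
                  ("founded", "Neuralink"), ("founded", "OpenAI")],
    "Tesla": [("operates_in", "Electric vehicles")],
    "SpaceX": [("operates_in", "Space")],
    "Neuralink": [("operates_in", "Neurotechnology")],
    "OpenAI": [("operates_in", "AI")],
}


def traverse(graph: dict, start: str, pattern: list[str]) -> list[list[str]]:
    """Breadth-first search from `start` that, at hop i, only follows edges whose
    relation equals pattern[i]. Returns every complete path of len(pattern) hops."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        hop = len(path) - 1
        if hop == len(pattern):
            paths.append(path)
            continue
        for relation, target in graph.get(path[-1], []):
            if relation == pattern[hop]:
                queue.append(path + [target])
    return paths


# "Which companies founded by Elon Musk are working in AI?"
paths = traverse(GRAPH, "Elon Musk", ["founded", "operates_in"])
ai_companies = [path[1] for path in paths if path[-1] == "AI"]
print(ai_companies)  # ['OpenAI']
```

The answer comes from following explicit relationships rather than from text similarity, which is why graph RAG suits multi-hop questions like this one.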
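Here is a minimal text-to-SQL sketch using Python's built-in `sqlite3`. Both LLM calls are replaced by hypothetical stubs: `generate_sql` returns a hard-coded query (a real system would send the question plus the table schema to an LLM), and `compose_answer` formats the rows (a real system would ask the LLM to write the answer). The `SELECT`-only check is an extra safety guard not mentioned in the video, since running model-generated SQL unchecked is risky. The sample table and rows are made up.

```python
import sqlite3


def generate_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM call that turns a question into SQL."""
    return (
        "SELECT product, SUM(quantity) AS units FROM sales "
        "WHERE sale_date >= date('now', 'start of month', '-1 month') "
        "AND sale_date < date('now', 'start of month') "
        "GROUP BY product ORDER BY units DESC LIMIT 1"
    )


def compose_answer(question: str, rows: list[tuple]) -> str:
    """Hypothetical stand-in for the second LLM call that turns rows into prose."""
    if not rows:
        return "No sales were recorded last month."
    product, units = rows[0]
    return f"{product} sold the most last month ({units} units)."


def text_to_sql_answer(conn: sqlite3.Connection, question: str) -> tuple[list[tuple], str]:
    sql = generate_sql(question)
    if not sql.lstrip().upper().startswith("SELECT"):  # only run read-only queries
        raise ValueError(f"Refusing to run non-SELECT SQL: {sql}")
    rows = conn.execute(sql).fetchall()
    return rows, compose_answer(question, rows)


# In-memory sample database; dates are relative to today so "last month" always has data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product TEXT, quantity INTEGER, sale_date TEXT);
    INSERT INTO sales VALUES
        ('Laptop',    5, date('now', 'start of month', '-1 month', '+2 days')),
        ('Mouse',    12, date('now', 'start of month', '-1 month', '+5 days')),
        ('Laptop',    4, date('now', 'start of month', '-1 month', '+10 days')),
        ('Keyboard', 30, date('now', 'start of month'));
""")
rows, answer = text_to_sql_answer(conn, "Which product sold the most last month?")
print(answer)  # Mouse sold the most last month (12 units).
```

The Keyboard row falls in the current month, so the date filter excludes it; Mouse (12 units) beats Laptop (5 + 4 = 9 units) for last month.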
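The table-of-contents descent can be sketched as a loop that starts at the root and repeatedly picks one child until it reaches a leaf, then returns that leaf's page range so you can go back to the original document. This is not PageIndex's actual code or API: `choose_child` is a hypothetical stand-in that scores children by shared words, where PageIndex would have an LLM reason over the node titles and summaries, and the contract-law tree and page ranges are made up for illustration.

```python
import re
from dataclasses import dataclass, field

STOPWORDS = {"what", "does", "the", "say", "about", "for", "of", "a", "and", "or", "to", "is"}


@dataclass
class Node:
    title: str
    summary: str
    pages: tuple[int, int]  # page range in the original document
    children: list["Node"] = field(default_factory=list)


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def choose_child(question: str, children: list[Node]) -> Node:
    """Stand-in for the LLM reasoning step: pick the child whose title and summary
    share the most content words with the question."""
    q = content_words(question)
    return max(children, key=lambda n: len(q & content_words(f"{n.title} {n.summary}")))


def tree_search(question: str, root: Node) -> tuple[list[str], tuple[int, int]]:
    """Descend from the root to a leaf; return the path of titles and the leaf's page range."""
    node, path = root, [root.title]
    while node.children:
        node = choose_child(question, node.children)
        path.append(node.title)
    return path, node.pages


# Hypothetical table of contents for a contract-law document.
toc = Node("Contract Act", "Rules on contracts.", (1, 120), [
    Node("Formation of contracts", "How offer, acceptance and consideration create a contract.", (1, 30)),
    Node("Performance of contracts", "Obligations once a contract is executed, including breach and its consequences.", (31, 80), [
        Node("Time and place of performance", "When and where contractual obligations must be performed.", (31, 45)),
        Node("Compensation for breach", "Compensation for loss or damage caused by breach of contract.", (46, 80), [
            Node("Loss or damage from breach", "Compensation for loss that naturally arises from the breach.", (46, 60)),
            Node("Penalty and liquidated damages", "Sums stipulated in advance as payable on default.", (61, 80)),
        ]),
    ]),
    Node("Indemnity and guarantee", "Promises to make good a loss caused by another party.", (81, 120)),
])

path, pages = tree_search("What does the contract say about compensation for breach of contract?", toc)
print(" > ".join(path), pages)
```

No embeddings or vector database are involved: the search only reads the structure and the node summaries, and the returned page range is what you would use to pull the original text.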
So it goes from here to here, and then this part discusses loss. Out of all these nodes, it goes to this particular node and pulls the relevant content. This might give you only an index, and using that index you may have to refer back to the original knowledge. Here is the GitHub repository for PageIndex. It is often called vectorless RAG, but it is really one of the categories of vectorless RAG; the right term is reasoning-based RAG. As you can see, from the document you generate a tree, which is a knowledge tree structure index of the documents, and then the LLM reasons over it to find the relevant chunk. You are not using any vector embeddings, and there is no vector DB. Just by looking at the structure, the table-of-contents layout, which looks something like this, the LLM tries to find a given node. And notice there is a summary here. Using the summary, the LLM can reason, "Okay, maybe the answer is in this particular node," then go to that node, refer to the original document, and pull the answer. I have