Introduction to RAG

(Retrieval-Augmented Generation)

Learn how RAG improves LLM responses by retrieving relevant context from your data and generating answers that are safer, more reliable, and more accurate.


Definitions


LLM - Large Language Model

Examples: OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, etc.


RAG - Retrieval Augmented Generation



Background


RAG is a popular generative AI technique that was introduced in 2020.


You may have already used it without knowing it.



High Level Description


RAG is the process of retrieving relevant context from a data store, combining it with your input, and feeding it to an LLM so that a more accurate and relevant response is generated.
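This description can be sketched in a few lines of Python. The keyword-overlap retrieval below is a stand-in for illustration (real systems typically rank by embedding similarity), and the in-memory list standing in for the data store is an assumption; the sketch only shows how the prompt an LLM would receive gets assembled.

```python
# Minimal RAG sketch: retrieve relevant context from a data store,
# combine it with the user's input, and build the prompt an LLM receives.

def retrieve(question: str, data_store: list[str], top_k: int = 2) -> list[str]:
    """Rank stored chunks by how many question words they share (toy ranking)."""
    q_words = set(question.lower().split())
    scored = sorted(data_store,
                    key=lambda chunk: len(q_words & set(chunk.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks with the question into one prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

data_store = [
    "RAG retrieves relevant context from a data store.",
    "The retrieved context is combined with the user's input.",
    "Bananas are yellow.",
]

question = "What does RAG retrieve?"
prompt = build_prompt(question, retrieve(question, data_store))
print(prompt)
```

Note that the irrelevant chunk never reaches the prompt, which is exactly what keeps the LLM focused on your data.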



Question


What happens when you upload a file to ChatGPT and ask it a question?



Answer


ChatGPT leverages RAG to generate the responses. Here’s how:


  1. File is uploaded to ChatGPT.

  2. The file is split into chunks.

  3. The chunks are inserted into a data store.

  4. Then, when you send a question to ChatGPT, it executes the following steps:

    1. Retrieves the relevant chunks from the data store.

    2. Combines them with your question.

    3. Feeds it all to the LLM.

  5. And then, voila, the LLM generates a response.


That’s an example of RAG in the wild!



Benefits


  1. Generate responses that are more accurate and relevant than calling an LLM directly.

  2. Customize how responses get generated by an LLM.

    1. Users can explicitly define the scope of the material used to generate the responses.

  3. Receive responses with their sources cited.

    1. Easy to fact-check and find the exact section of a document that was used to generate the response.

  4. And many more!
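The source-citation benefit falls out naturally if each chunk is stored with metadata about where it came from. Here's a hypothetical sketch of that idea; the dict-based store and its field names are illustrative assumptions, not a real API.

```python
# Store each chunk alongside its source so retrieved context can be cited.
# The fields ("text", "source", "section") are illustrative assumptions.

data_store = [
    {"text": "RAG was introduced in 2020.",
     "source": "intro.pdf", "section": "Background"},
    {"text": "RAG retrieves context from a data store.",
     "source": "intro.pdf", "section": "Overview"},
]

def retrieve_with_citations(question: str, store: list[dict]) -> list[tuple[str, str]]:
    """Return (chunk text, citation) pairs for chunks sharing words with the question."""
    q_words = set(question.lower().split())
    hits = [c for c in store if q_words & set(c["text"].lower().split())]
    return [(c["text"], f'{c["source"]} / {c["section"]}') for c in hits]

for text, citation in retrieve_with_citations("When was RAG introduced?", data_store):
    print(f"{text}  [source: {citation}]")
```

Because each retrieved chunk carries its citation, the final answer can point back to the exact section of the document it came from.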



RAG Call Flow Diagrams


Diagram 1: User Uploads a File


Step 1: User uploads a document to ChatGPT.


Step 2: ChatGPT breaks the document down into small chunks (e.g., 100-word blocks) and inserts them into a data store.
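The chunking step above can be sketched in a few lines. The 100-word block size and the plain list standing in for the data store are assumptions for illustration; production systems typically store chunks in a vector database.

```python
# Split a document into fixed-size word chunks and insert them into
# a data store (a plain list here; real systems use a vector database).

def split_into_chunks(text: str, chunk_size: int = 100) -> list[str]:
    """Break the text into consecutive blocks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

data_store: list[str] = []

document = "word " * 250          # a toy 250-word document
chunks = split_into_chunks(document)
data_store.extend(chunks)

print(len(chunks))                # 3 chunks: 100 + 100 + 50 words
print(len(data_store[0].split())) # the first chunk holds 100 words
```

Real chunkers often split on sentence or paragraph boundaries with some overlap between chunks, but the fixed-size version shows the core idea.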


Diagram 2: User Sends a Question


Step 1: User asks ChatGPT a question.


Step 2: ChatGPT retrieves the relevant chunks from the data store based on the question.


Step 3: ChatGPT provides both the relevant chunks and the question to the LLM.


Step 4: The LLM generates a response using all that information.


Step 5: ChatGPT provides a response to the user.
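The five steps above can be sketched end to end. The keyword-overlap retrieval and the llm_generate placeholder are illustrative assumptions, not a real model API; a production system would use embedding similarity for step 2 and call an actual LLM in step 4.

```python
# Query-time RAG flow: retrieve chunks (step 2), combine them with the
# question (step 3), and hand the prompt to the model (step 4).

data_store = [
    "The Eiffel Tower is 330 metres tall.",
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
]

def retrieve(question: str, store: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by shared words with the question."""
    q_words = set(question.lower().split())
    return sorted(store,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:top_k]

def llm_generate(prompt: str) -> str:
    """Hypothetical placeholder: a real system would call a model here."""
    return f"(model answer grounded in {prompt.count('-')} retrieved chunks)"

question = "How tall is the Eiffel Tower?"              # step 1: user asks
chunks = retrieve(question, data_store)                 # step 2: retrieve
prompt = ("Context:\n" + "\n".join(f"- {c}" for c in chunks)
          + f"\n\nQuestion: {question}")                # step 3: combine
answer = llm_generate(prompt)                           # step 4: generate
print(answer)                                           # step 5: return to user
```

The irrelevant Everest chunk is filtered out at step 2, so only on-topic context reaches the model.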



Resources


https://aws.amazon.com/what-is/retrieval-augmented-generation/


https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/


https://en.wikipedia.org/wiki/Retrieval-augmented_generation


© Qikr AI 2025