Introduction to RAG
(Retrieval-Augmented Generation)



Learn how RAG improves LLM responses by retrieving relevant context from your data, producing answers that are safer, more reliable, and more accurate.
Definitions
LLM - Large Language Model
Examples: OpenAI's ChatGPT, Google's Gemini, Anthropic's Claude, etc.
RAG - Retrieval-Augmented Generation
Background
RAG is a popular generative AI technique that was introduced in 2020.
You may have already used it without knowing it.
High Level Description
RAG is the process of retrieving relevant context from a data store, combining it with your input, and feeding it to an LLM so that a more accurate and relevant response is generated.
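In code form, the flow looks roughly like the sketch below. This is a minimal illustration, not how ChatGPT is actually implemented; retrieve_chunks, build_prompt, and call_llm are placeholder helpers that a real system would back with a vector database, a prompt template, and an LLM provider's API.

    # Minimal sketch of the RAG loop. All three helpers are placeholders,
    # not a real API; swap in a real data store and LLM call for real use.
    def retrieve_chunks(question: str) -> list[str]:
        # Placeholder: a real system would query a vector or search index here.
        return ["Relevant chunk of your document that matches the question."]

    def build_prompt(chunks: list[str], question: str) -> str:
        # Combine the retrieved context with the user's question.
        context = "\n".join(chunks)
        return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context above."

    def call_llm(prompt: str) -> str:
        # Placeholder: replace with a call to your LLM provider of choice.
        return f"(response the LLM would generate from a {len(prompt)}-character prompt)"

    question = "What does the document say about refunds?"
    answer = call_llm(build_prompt(retrieve_chunks(question), question))
    print(answer)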
Question
What happens when you upload a file to ChatGPT and ask it a question?
Answer
ChatGPT leverages RAG to generate the responses. Here’s how:
File is uploaded to ChatGPT.
File is split into chunks.
Chunks are inserted into a data store.
Then, when you send a question to ChatGPT, it executes the following steps:
Retrieves the relevant chunks from the data store.
Combines them with your question.
Feeds the combined input to the LLM.
And then, voila, the LLM generates a response.
That’s an example of RAG in the wild!
Benefits
Generate responses that are more accurate and relevant than calling an LLM directly.
Customize how responses get generated by an LLM.
Users can explicitly define the scope of the material used to generate the responses.
Receive responses with their sources cited.
Easy to fact-check a response and pinpoint the exact section of a document that was used to generate it (see the sketch after this list).
And many more!
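Because the system knows exactly which chunks it retrieved, each chunk can carry source metadata that the model is asked to cite. The sketch below is only illustrative; the file names and text are made up.

    # Sketch: attach source metadata to each chunk so the answer can cite it.
    chunks = [
        {"source": "refund_policy.pdf", "chunk_id": 3, "text": "Refunds are issued within 14 days."},
        {"source": "faq.md", "chunk_id": 7, "text": "Contact support to start a refund."},
    ]

    context = "\n".join(f"[{c['source']}#{c['chunk_id']}] {c['text']}" for c in chunks)
    prompt = (
        f"Context:\n{context}\n\n"
        "Question: How long do refunds take?\n"
        "Answer using only the context above and cite the [source#chunk] you used."
    )
    print(prompt)  # this prompt would be sent to the LLM, which can then cite its sources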
RAG Call Flow Diagrams



Diagram 1: User Uploads a File
Step 1: User uploads a document to ChatGPT.
Step 2: ChatGPT breaks down the document into small chunks (e.g., 100-word blocks) and inserts them into a data store.
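ChatGPT's internal pipeline isn't public, so the sketch below is only an approximation of this ingest step: it splits a document into fixed-size word chunks and appends them to an in-memory list that stands in for a real data store (in practice, a vector database keyed by embeddings).

    # Sketch of the upload phase: split a document into ~100-word chunks
    # and insert them into a data store (here, an in-memory list).
    def split_into_chunks(text: str, words_per_chunk: int = 100) -> list[str]:
        words = text.split()
        return [
            " ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)
        ]

    data_store: list[dict] = []  # stand-in for a vector database

    def ingest(doc_name: str, text: str) -> None:
        for i, chunk in enumerate(split_into_chunks(text)):
            # A real system would also store an embedding of each chunk here.
            data_store.append({"doc": doc_name, "chunk_id": i, "text": chunk})

    ingest("uploaded_file.txt", "Your document text goes here. " * 300)
    print(f"Stored {len(data_store)} chunks")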



Diagram 2: User Sends a Question
Step 1: User asks ChatGPT a question.
Step 2: ChatGPT retrieves the relevant chunks from the data store based on the question.
Step 3: ChatGPT provides both the relevant chunks and the question to the LLM.
Step 4: The LLM generates a response using all that information.
Step 5: ChatGPT provides a response to the user.
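The question phase can be sketched the same way. A real system ranks chunks by embedding similarity; the toy scorer below just counts overlapping words, and call_llm is a placeholder for whichever LLM API is actually used.

    # Sketch of the question phase: retrieve the most relevant chunks,
    # combine them with the question, and hand the result to the LLM.
    import re

    def words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
        # Toy relevance score: count shared words. Real systems compare embeddings.
        return sorted(chunks, key=lambda c: len(words(c) & words(question)), reverse=True)[:top_k]

    def call_llm(prompt: str) -> str:
        # Placeholder: replace with a call to your LLM provider's API.
        return "(the LLM's answer, grounded in the retrieved chunks, goes here)"

    chunks = [
        "Refunds are issued within 14 days of purchase.",
        "Our office is open Monday through Friday.",
        "To request a refund, contact customer support.",
    ]
    question = "How do I get a refund?"
    context = "\n".join(retrieve(question, chunks))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context above."
    print(call_llm(prompt))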
Resources
https://aws.amazon.com/what-is/retrieval-augmented-generation/
https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
https://en.wikipedia.org/wiki/Retrieval-augmented_generation
© Qikr AI 2025