Introduction to RAG

(Retrieval-Augmented Generation)

Learn how RAG improves LLM responses by retrieving relevant context from your data and generating answers that are safer, more reliable, and more accurate.


Definitions


LLM - Large Language Model

Examples: OpenAI’s ChatGPT, Google’s Gemini, Anthropic’s Claude, etc.


RAG - Retrieval Augmented Generation



Background


RAG is a popular generative AI technique that was introduced in 2020.


You may have already used it without knowing it.



High Level Description


RAG is the process of retrieving relevant context from a data store, combining it with your input, and feeding it to an LLM so that a more accurate and relevant response is generated.
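This description can be sketched in a few lines of Python. The keyword-overlap retrieval below is a stand-in for illustration (real systems typically rank by embedding similarity), and the in-memory list standing in for the data store is an assumption; the sketch only shows how the prompt an LLM would receive gets assembled.

```python
# Minimal RAG sketch: retrieve relevant context from a data store,
# combine it with the user's input, and build the prompt an LLM receives.

def retrieve(question: str, data_store: list[str], top_k: int = 2) -> list[str]:
    """Rank stored chunks by how many question words they share (toy ranking)."""
    q_words = set(question.lower().split())
    scored = sorted(data_store,
                    key=lambda chunk: len(q_words & set(chunk.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks with the question into one prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

data_store = [
    "RAG retrieves relevant context from a data store.",
    "The retrieved context is combined with the user's input.",
    "Bananas are yellow.",
]

question = "What does RAG retrieve?"
prompt = build_prompt(question, retrieve(question, data_store))
print(prompt)
```

Note that the irrelevant chunk never reaches the prompt, which is exactly what keeps the LLM focused on your data.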



Question


What happens when you upload a file to ChatGPT and ask it a question?



Answer


ChatGPT leverages RAG to generate the responses. Here’s how:


  1. File is uploaded to ChatGPT.

  2. The file is split into chunks.

  3. The chunks are inserted into a data store.

  4. Then, when you send a question to ChatGPT, it executes the following steps:

    1. Retrieves the relevant chunks from the data store.

    2. Combines them with your question.

    3. Feeds it all to the LLM.

  5. And then, voila, the LLM generates a response.


That’s an example of RAG in the wild!



Benefits


  1. Generate responses that are more accurate and relevant than calling an LLM directly.

  2. Customize how responses get generated by an LLM.

    1. Users can explicitly define the scope of the material used to generate the responses.

  3. Receive responses with their sources cited.

    1. Easy to fact-check and find the exact section of a document that was used to generate the response.

  4. And many more!
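The source-citation benefit falls out naturally if each chunk is stored with metadata about where it came from. Here's a hypothetical sketch of that idea; the dict-based store and its field names are illustrative assumptions, not a real API.

```python
# Store each chunk alongside its source so retrieved context can be cited.
# The fields ("text", "source", "section") are illustrative assumptions.

data_store = [
    {"text": "RAG was introduced in 2020.",
     "source": "intro.pdf", "section": "Background"},
    {"text": "RAG retrieves context from a data store.",
     "source": "intro.pdf", "section": "Overview"},
]

def retrieve_with_citations(question: str, store: list[dict]) -> list[tuple[str, str]]:
    """Return (chunk text, citation) pairs for chunks sharing words with the question."""
    q_words = set(question.lower().split())
    hits = [c for c in store if q_words & set(c["text"].lower().split())]
    return [(c["text"], f'{c["source"]} / {c["section"]}') for c in hits]

for text, citation in retrieve_with_citations("When was RAG introduced?", data_store):
    print(f"{text}  [source: {citation}]")
```

Because each retrieved chunk carries its citation, the final answer can point back to the exact section of the document it came from.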



RAG Call Flow Diagrams


Diagram 1: User Uploads a File


Step 1: User uploads a document to ChatGPT.


Step 2: ChatGPT breaks the document down into small chunks (e.g., 100-word blocks) and inserts them into a data store.
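The chunking step above can be sketched in a few lines. The 100-word block size and the plain list standing in for the data store are assumptions for illustration; production systems typically store chunks in a vector database.

```python
# Split a document into fixed-size word chunks and insert them into
# a data store (a plain list here; real systems use a vector database).

def split_into_chunks(text: str, chunk_size: int = 100) -> list[str]:
    """Break the text into consecutive blocks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

data_store: list[str] = []

document = "word " * 250          # a toy 250-word document
chunks = split_into_chunks(document)
data_store.extend(chunks)

print(len(chunks))                # 3 chunks: 100 + 100 + 50 words
print(len(data_store[0].split())) # the first chunk holds 100 words
```

Real chunkers often split on sentence or paragraph boundaries with some overlap between chunks, but the fixed-size version shows the core idea.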


Diagram 2: User Sends a Question


Step 1: User asks ChatGPT a question.


Step 2: ChatGPT retrieves the relevant chunks from the data store based on the question.


Step 3: ChatGPT provides both the relevant chunks and the question to the LLM.


Step 4: The LLM generates a response using all that information.


Step 5: ChatGPT provides a response to the user.
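The five steps above can be sketched end to end. The keyword-overlap retrieval and the llm_generate placeholder are illustrative assumptions, not a real model API; a production system would use embedding similarity for step 2 and call an actual LLM in step 4.

```python
# Query-time RAG flow: retrieve chunks (step 2), combine them with the
# question (step 3), and hand the prompt to the model (step 4).

data_store = [
    "The Eiffel Tower is 330 metres tall.",
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain on Earth.",
]

def retrieve(question: str, store: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank chunks by shared words with the question."""
    q_words = set(question.lower().split())
    return sorted(store,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:top_k]

def llm_generate(prompt: str) -> str:
    """Hypothetical placeholder: a real system would call a model here."""
    return f"(model answer grounded in {prompt.count('-')} retrieved chunks)"

question = "How tall is the Eiffel Tower?"              # step 1: user asks
chunks = retrieve(question, data_store)                 # step 2: retrieve
prompt = ("Context:\n" + "\n".join(f"- {c}" for c in chunks)
          + f"\n\nQuestion: {question}")                # step 3: combine
answer = llm_generate(prompt)                           # step 4: generate
print(answer)                                           # step 5: return to user
```

The irrelevant Everest chunk is filtered out at step 2, so only on-topic context reaches the model.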



Resources


https://aws.amazon.com/what-is/retrieval-augmented-generation/


https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/


https://en.wikipedia.org/wiki/Retrieval-augmented_generation


© Qikr AI 2025