Retrieval-Augmented Generation (RAG)

RAG grounds an AI's answers in your own documents. Instead of relying only on what the model learned in training, you retrieve the most relevant passages and inject them into the prompt, so the model answers from real source text.

Part of the free Prompt Engineering course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

The result: fewer made-up answers, plus fresh and private knowledge the model never trained on. This lesson explains chunking, embeddings, vector stores, and retrieval.

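RAG has two phases. First you prepare your data: split your documents into chunks, embed each chunk, and store the embeddings in a vector store. Then, for each question, you retrieve the k most relevant chunks and have the model answer from them.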
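The two phases can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: `embed` here is a hypothetical stand-in that just counts words (a real system would call an embedding model), a plain list stands in for the vector store, and the function names and sample documents are made up for the example.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Phase 1: chunk and embed every document. The list stands in for a vector store."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index, question, k=2):
    """Phase 2: embed the question and return the k most similar chunks (top-k)."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


docs = [
    "Refunds are accepted within 30 days of purchase with a receipt.",
    "Standard shipping takes 5 business days inside the country.",
]
index = build_index(docs)
top = retrieve(index, "Are refunds accepted without a receipt?", k=1)
print(top)
```

Word counts only match exact words, so this sketch would miss a question phrased with synonyms. Matching by meaning rather than by wording is the reason real pipelines use learned embeddings.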

Without grounding, the model guesses from memory. With grounding, it has the relevant source text in front of it when it answers.

RAG is often compared with fine-tuning, which changes the model's weights through training. For adding up-to-date or private knowledge, RAG is usually the simpler choice, and the two can be combined.

📋 Copy-paste RAG prompt template
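A minimal sketch of what such a template might look like, written as a Python string plus a small helper that fills it in. The wording of the template, the `[1]`-style citation format, and the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions made for illustration; adjust them to your model and use case.

```python
# Hypothetical template: the exact wording is an assumption, not a fixed standard.
PROMPT_TEMPLATE = """Answer the question using only the numbered sources below.
Cite the source number for each claim, like [1].
If the sources do not contain the answer, say "I don't know."

Sources:
{sources}

Question: {question}
Answer:"""


def build_prompt(question, chunks):
    """Number the retrieved chunks and insert them, with the question, into the template."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return PROMPT_TEMPLATE.format(sources=sources, question=question)


prompt = build_prompt(
    "Are refunds accepted without a receipt?",
    ["Refunds are accepted within 30 days of purchase with a receipt."],
)
print(prompt)
```

Numbering the sources lets the model cite them, so a reader can check each claim against the original text. The instruction to say "I don't know" gives the model an alternative to guessing when retrieval returns nothing useful.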

Practice quiz

What does RAG (Retrieval-Augmented Generation) do?

Grounds answers in your own documents by retrieving relevant text and adding it to the prompt
Makes the model bigger
Deletes documents
Speeds up typing

Answer: Grounds answers in your own documents by retrieving relevant text and adding it to the prompt. RAG retrieves relevant chunks of your data and injects them into context to ground the answer.

Why split documents into 'chunks'?

For decoration
To delete them
So you can retrieve only the most relevant pieces instead of whole files
To make them longer

Answer: So you can retrieve only the most relevant pieces instead of whole files. Chunking lets you fetch just the relevant passages rather than entire documents.

What is an embedding in RAG?

A font setting
A numeric representation of text used to find similar chunks
A picture
A password

Answer: A numeric representation of text used to find similar chunks. Embeddings turn text into vectors so similar meaning can be matched numerically.

A vector store is used to…

Print documents
Block the model
Translate languages
Store embeddings and quickly find the chunks most similar to a query

Answer: Store embeddings and quickly find the chunks most similar to a query. A vector store indexes embeddings and retrieves the closest matches to a query.

'Top-k retrieval' means…

Fetching the k most relevant chunks for the question
Waiting k seconds
Deleting k files
Using k models

Answer: Fetching the k most relevant chunks for the question. Top-k pulls the k closest-matching chunks to feed into the prompt.

A major benefit of RAG is…

It hides sources
It reduces hallucination by grounding answers in real source text
It makes answers random
It removes all context

Answer: It reduces hallucination by grounding answers in real source text. Grounding in retrieved text reduces made-up answers.

RAG is especially useful for…

Changing the font
Deleting the model
Slowing replies
Adding fresh or private knowledge the model was not trained on

Answer: Adding fresh or private knowledge the model was not trained on. It injects up-to-date or private information at query time.

Citing sources in a RAG answer helps because…

It looks fancy
It wastes space
Users can verify the answer against the original documents
It confuses readers

Answer: Users can verify the answer against the original documents. Citations let users check the answer against the retrieved source text.

How does RAG differ from fine-tuning?

Fine-tuning needs no data
RAG injects knowledge at query time; fine-tuning bakes patterns into the model by training
They are identical
RAG deletes the model

Answer: RAG injects knowledge at query time; fine-tuning bakes patterns into the model by training. RAG adds context per query without retraining; fine-tuning changes the model's weights.

The first step before retrieval in a RAG pipeline is usually…

Chunking and embedding your documents into a vector store
Turning off the model
Printing everything
Deleting the question

Answer: Chunking and embedding your documents into a vector store. You chunk and embed documents into a vector store so they can be retrieved later.