Retrieval-Augmented Generation (RAG)

RAG grounds an AI's answers in your own documents. Instead of relying only on training knowledge, you retrieve the most relevant passages and inject them into the prompt, so the model answers from real source text.

Part of the free Prompt Engineering course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

The result: fewer made-up answers, plus fresh and private knowledge the model never trained on. This lesson explains chunking, embeddings, vector stores, and retrieval.

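RAG has two phases. First you prepare your data; then, for each question, you retrieve and answer:

  • Prepare (once): split your documents into chunks, turn each chunk into an embedding, and store the embeddings in a vector store.
  • Retrieve and answer (per question): embed the question, fetch the top-k most similar chunks, add them to the prompt, and let the model answer from that text.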
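
Both phases can be sketched in a few lines of plain Python. This is a toy built on stated assumptions: `embed` is a hypothetical stand-in that counts words instead of calling a real embedding model, chunking simply splits on blank lines, and the "vector store" is an in-memory list. The names `embed`, `cosine`, `build_index`, and `retrieve` are illustrative, not any library's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """ASSUMPTION: toy stand-in for an embedding model (plain word counts).
    Real systems call a learned model that returns dense vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Phase 1 (prepare): chunk each document on blank lines, embed each chunk,
    and keep (source, chunk, vector) triples in a list acting as the store."""
    index = []
    for source, text in documents.items():
        for chunk in text.split("\n\n"):
            chunk = chunk.strip()
            if chunk:
                index.append((source, chunk, embed(chunk)))
    return index


def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Phase 2 (per question): embed the question, return the top-k chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]


docs = {
    "refunds.md": "Refunds are issued within 14 days of purchase.\n\n"
                  "Refunds go to the original payment method.",
    "shipping.md": "Orders ship within 2 business days.",
}
index = build_index(docs)
for source, chunk in retrieve(index, "How many days do refunds take?", k=1):
    print(source, "->", chunk)  # prints: refunds.md -> Refunds are issued within 14 days of purchase.
```

Swapping `embed` for a real embedding model and the list for a vector store changes the quality of the matches, but the prepare-once, retrieve-per-question shape stays the same.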

Without grounding, the model can only answer from memory, so it may guess. With retrieved passages in the prompt, it has real source text in front of it.
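
Here is one way to put retrieved passages in front of the model, sketched in Python. The instruction wording and the numbered `[n]` source format are assumptions, not a fixed template, and `build_grounded_prompt` is a hypothetical helper; it only builds the prompt string and does not call any model.

```python
def build_grounded_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble a prompt that places retrieved passages before the question.

    `chunks` is a list of (source, text) pairs, e.g. the output of a retrieval
    step. Each passage is numbered so the model can cite it as [1], [2], ...
    """
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, e.g. [1]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_grounded_prompt(
    "How many days do refunds take?",
    [("refunds.md", "Refunds are issued within 14 days of purchase.")],
)
print(prompt)
```

Telling the model to admit when the sources are silent is what keeps it from falling back on memory.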

Compared with fine-tuning, RAG is usually the simpler way to add up-to-date or private knowledge, and the two can be combined.

Practice quiz

What does RAG (Retrieval-Augmented Generation) do?

  • Grounds answers in your own documents by retrieving relevant text and adding it to the prompt
  • Makes the model bigger
  • Deletes documents
  • Speeds up typing

Answer: Grounds answers in your own documents by retrieving relevant text and adding it to the prompt. RAG retrieves relevant chunks of your data and injects them into context to ground the answer.

Why split documents into 'chunks'?

  • For decoration
  • To delete them
  • So you can retrieve only the most relevant pieces instead of whole files
  • To make them longer

Answer: So you can retrieve only the most relevant pieces instead of whole files. Chunking lets you fetch just the relevant passages rather than entire documents.
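
A common refinement is to let neighboring chunks overlap, so text near a chunk boundary appears in both pieces. The sketch below measures size in words purely for simplicity (an assumption; many pipelines count tokens or characters instead), and `chunk_words` with its `size`/`overlap` defaults is a hypothetical helper with illustrative values.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `size` words.

    Consecutive chunks share `overlap` words, so text near a boundary
    appears in both neighboring chunks.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this chunk reached the end of the text
            break
    return chunks
```

For example, `chunk_words("1 2 3 4 5 6 7 8 9 10", size=4, overlap=1)` returns `["1 2 3 4", "4 5 6 7", "7 8 9 10"]`.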

What is an embedding in RAG?

  • A font setting
  • A numeric representation of text used to find similar chunks
  • A picture
  • A password

Answer: A numeric representation of text used to find similar chunks. Embeddings turn text into vectors so similar meaning can be matched numerically.
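
To match meaning numerically, a common choice (one of several) is cosine similarity between the query embedding q and a chunk embedding c, both with n dimensions. It ranges from -1 to 1, and higher means more similar. The worked example uses made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.

```latex
\cos(\mathbf{q}, \mathbf{c})
  = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}
  = \frac{\sum_{i=1}^{n} q_i c_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} c_i^2}}
% Worked example: q = (1, 1, 0), c = (1, 0, 1)
\cos(\mathbf{q}, \mathbf{c})
  = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2} \, \sqrt{2}}
  = \frac{1}{2}
```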

A vector store is used to…

  • Print documents
  • Block the model
  • Translate languages
  • Store embeddings and quickly find the chunks most similar to a query

Answer: Store embeddings and quickly find the chunks most similar to a query. A vector store indexes embeddings and retrieves the closest matches to a query.

'Top-k retrieval' means…

  • Fetching the k most relevant chunks for the question
  • Waiting k seconds
  • Deleting k files
  • Using k models

Answer: Fetching the k most relevant chunks for the question. Top-k pulls the k closest-matching chunks to feed into the prompt.

A major benefit of RAG is…

  • It hides sources
  • It reduces hallucination by grounding answers in real source text
  • It makes answers random
  • It removes all context

Answer: It reduces hallucination by grounding answers in real source text. Grounding in retrieved text reduces made-up answers.

RAG is especially useful for…

  • Changing the font
  • Deleting the model
  • Slowing replies
  • Adding fresh or private knowledge the model was not trained on

Answer: Adding fresh or private knowledge the model was not trained on. It injects up-to-date or private information at query time.

Citing sources in a RAG answer helps because…

  • It looks fancy
  • It wastes space
  • Users can verify the answer against the original documents
  • It confuses readers

Answer: Users can verify the answer against the original documents. Citations let users check the answer against the retrieved source text.
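
Citations only help if they point at passages the model was actually given. Assuming the prompt numbered the retrieved sources and asked for `[n]`-style citations, a hypothetical `check_citations` helper like the one below can flag numbers that match no retrieved source. It checks only that each number exists; it cannot tell whether the cited passage really supports the claim.

```python
import re


def check_citations(answer: str, num_sources: int) -> tuple[list[int], list[int]]:
    """Find [n]-style citations in a model answer and split them into
    valid numbers (1..num_sources) and invalid ones that match no source."""
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    valid = [n for n in cited if 1 <= n <= num_sources]
    invalid = [n for n in cited if not 1 <= n <= num_sources]
    return valid, invalid


answer = "Refunds are issued within 14 days [1] to the original payment method [3]."
valid, invalid = check_citations(answer, num_sources=2)
print(valid, invalid)  # prints: [1] [3]
```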

How does RAG differ from fine-tuning?

  • Fine-tuning needs no data
  • RAG injects knowledge at query time; fine-tuning bakes patterns into the model by training
  • They are identical
  • RAG deletes the model

Answer: RAG injects knowledge at query time; fine-tuning bakes patterns into the model by training. RAG adds context per query without retraining; fine-tuning changes the model's weights.

The first step before retrieval in a RAG pipeline is usually…

  • Chunking and embedding your documents into a vector store
  • Turning off the model
  • Printing everything
  • Deleting the question

Answer: Chunking and embedding your documents into a vector store. You chunk and embed documents into a vector store so they can be retrieved later.