Prompt Injection & Safety
Prompt injection is adversarial text that tries to hijack the AI's instructions, either typed directly by a user or hidden inside content the model reads. Untrusted text in context is the core risk.
Learn Prompt Injection & Safety in our free Prompt Engineering course — a beginner-friendly interactive lesson with worked examples, a practice exercise and…
Part of the free Prompt Engineering course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
This lesson covers direct and indirect injection, jailbreaks, and the defenses that keep AI systems safe, plus the difference between a refusal and a safe completion.
Injection tries to make the model follow attacker text. It can come straight from a user, or hide inside content the AI reads:
The danger is that the model may not distinguish your trusted instructions from text that merely sits in the context. Any user input or retrieved data could be read as a command.
No single fix is enough. Combine several layers:
Refusal vs safe completion: a refusal declines a request entirely; a safe completion still helps but stays within safe bounds. Aim for safe, helpful responses where possible; reserve refusals for genuinely unsafe asks.
⏱ Test Yourself — Timed Quiz
10 quick questions, 12 seconds each. Instant feedback — beat the clock!
Practice quiz
What is prompt injection?
- When adversarial text in the input tries to hijack the AI's instructions
- A way to speed up the AI
- A type of font
- A backup method
Answer: When adversarial text in the input tries to hijack the AI's instructions. Prompt injection is malicious input that tries to override the intended instructions.
Direct prompt injection is when…
- You delete a file
- The screen dims
- A user types a malicious instruction straight into the chat
- The model crashes
Answer: A user types a malicious instruction straight into the chat. Direct injection comes straight from the user's own input.
Indirect prompt injection hides malicious instructions in…
- A password
- Untrusted content the AI reads, like a web page or document
- The system clock
- The keyboard
Answer: Untrusted content the AI reads, like a web page or document. Indirect injection is embedded in content the AI processes, such as a fetched page.
A 'jailbreak' attempts to…
- Change the font
- Make the AI faster
- Save a file
- Trick the AI into ignoring its safety rules and guardrails
Answer: Trick the AI into ignoring its safety rules and guardrails. A jailbreak tries to bypass the model's safety constraints.
Why is untrusted text in context dangerous?
- The model may treat injected text as instructions to follow
- It is never dangerous
- It makes text bigger
- It deletes the prompt
Answer: The model may treat injected text as instructions to follow. If the model treats retrieved or user content as commands, attackers can steer it.
A core defense is to…
- Remove all guardrails
- Separate trusted instructions from untrusted data
- Run more loops
- Trust everything
Answer: Separate trusted instructions from untrusted data. Keeping instructions and data apart limits what injected text can do.
'Least privilege' for tools means…
- Use the biggest model
- Give the AI every possible permission
- Disable the AI
- Grant only the minimal tool access the task needs
Answer: Grant only the minimal tool access the task needs. Least privilege limits the blast radius if the AI is manipulated.
A safe rule for retrieved or user content is to…
- Delete it
- Forward it to everyone
- Never treat it as instructions, only as data to consider
- Always obey it as instructions
Answer: Never treat it as instructions, only as data to consider. Treat external content as data, not commands the AI must follow.
Input and output filtering helps by…
- Changing the language
- Catching dangerous content before or after the model processes it
- Hiding the answer
- Slowing the AI down for fun
Answer: Catching dangerous content before or after the model processes it. Filtering screens inputs and outputs for risky content.
What is the difference between a refusal and a safe completion?
- A refusal declines entirely; a safe completion helps within safe bounds
- A refusal is faster only
- Safe completion deletes data
- They are identical
Answer: A refusal declines entirely; a safe completion helps within safe bounds. A refusal says no outright; a safe completion provides a helpful, bounded response.