Prompt Injection & Safety

Prompt injection is adversarial text that tries to hijack the AI's instructions, either typed directly by a user or hidden inside content the model reads. Untrusted text in context is the core risk.

Learn Prompt Injection & Safety in our free Prompt Engineering course — a beginner-friendly interactive lesson with worked examples, a practice exercise and…

Part of the free Prompt Engineering course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

This lesson covers direct and indirect injection, jailbreaks, and the defenses that keep AI systems safe, plus the difference between a refusal and a safe completion.

Injection tries to make the model follow attacker text. It can come straight from a user, or hide inside content the AI reads:

The danger is that the model may not distinguish your trusted instructions from text that merely sits in the context. Any user input or retrieved data could be read as a command.

No single fix is enough. Combine several layers:

Refusal vs safe completion: a refusal declines a request entirely; a safe completion still helps but stays within safe bounds. Aim for safe, helpful responses where possible; reserve refusals for genuinely unsafe asks.

⏱ Test Yourself — Timed Quiz

10 quick questions, 12 seconds each. Instant feedback — beat the clock!

Practice quiz

What is prompt injection?

When adversarial text in the input tries to hijack the AI's instructions
A way to speed up the AI
A type of font
A backup method

Answer: When adversarial text in the input tries to hijack the AI's instructions. Prompt injection is malicious input that tries to override the intended instructions.

Direct prompt injection is when…

You delete a file
The screen dims
A user types a malicious instruction straight into the chat
The model crashes

Answer: A user types a malicious instruction straight into the chat. Direct injection comes straight from the user's own input.

Indirect prompt injection hides malicious instructions in…

A password
Untrusted content the AI reads, like a web page or document
The system clock
The keyboard

Answer: Untrusted content the AI reads, like a web page or document. Indirect injection is embedded in content the AI processes, such as a fetched page.

A 'jailbreak' attempts to…

Change the font
Make the AI faster
Save a file
Trick the AI into ignoring its safety rules and guardrails

Answer: Trick the AI into ignoring its safety rules and guardrails. A jailbreak tries to bypass the model's safety constraints.

Why is untrusted text in context dangerous?

The model may treat injected text as instructions to follow
It is never dangerous
It makes text bigger
It deletes the prompt

Answer: The model may treat injected text as instructions to follow. If the model treats retrieved or user content as commands, attackers can steer it.

A core defense is to…

Remove all guardrails
Separate trusted instructions from untrusted data
Run more loops
Trust everything

Answer: Separate trusted instructions from untrusted data. Keeping instructions and data apart limits what injected text can do.

'Least privilege' for tools means…

Use the biggest model
Give the AI every possible permission
Disable the AI
Grant only the minimal tool access the task needs

Answer: Grant only the minimal tool access the task needs. Least privilege limits the blast radius if the AI is manipulated.

A safe rule for retrieved or user content is to…

Delete it
Forward it to everyone
Never treat it as instructions, only as data to consider
Always obey it as instructions

Answer: Never treat it as instructions, only as data to consider. Treat external content as data, not commands the AI must follow.

Input and output filtering helps by…

Changing the language
Catching dangerous content before or after the model processes it
Hiding the answer
Slowing the AI down for fun

Answer: Catching dangerous content before or after the model processes it. Filtering screens inputs and outputs for risky content.

What is the difference between a refusal and a safe completion?

A refusal declines entirely; a safe completion helps within safe bounds
A refusal is faster only
Safe completion deletes data
They are identical

Answer: A refusal declines entirely; a safe completion helps within safe bounds. A refusal says no outright; a safe completion provides a helpful, bounded response.