Strings, Bytes & Unicode Encodings
In Python a str is readable Unicode text while bytes is raw binary data, and you move between them with encode and decode using an encoding such as UTF-8 — the foundation for working correctly with files, networks, and every language's characters.
Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Mixing up text and bytes is one of the most common sources of confusing errors. Once the difference clicks, encoding bugs stop being mysterious.
A str is text you can read; bytes is raw data you store or send. To turn text into bytes you encode it; to turn bytes back into text you decode them. Always name the encoding:
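As a minimal sketch of that round trip, the snippet below encodes a short string to UTF-8 bytes and decodes it back. The sample text and variable names are illustrative choices, not part of the lesson's own exercise.

```python
text = "café"

# str -> bytes: encode, naming the encoding explicitly
data = text.encode("utf-8")
print(data)        # b'caf\xc3\xa9'
print(len(text))   # 4 characters
print(len(data))   # 5 bytes, because é takes 2 bytes in UTF-8

# bytes -> str: decode with the same encoding
restored = data.decode("utf-8")
print(restored == text)  # True
```

Both methods default to UTF-8 in Python 3, but spelling the encoding out keeps the intent visible to the next reader.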
Decoding bytes with the wrong encoding either raises UnicodeDecodeError or silently produces garbled text. The cure is to decode with the same encoding the data was created in. When you can't, the errors= argument lets decoding continue:
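The sketch below shows what a mismatch can look like, assuming UTF-8 bytes are read back as ASCII or Latin-1. The sample data and variable names are made up for illustration.

```python
raw = "café".encode("utf-8")   # b'caf\xc3\xa9'

# ASCII cannot represent bytes above 0x7F, so strict decoding fails
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc)

# A wrong encoding that accepts every byte gives garbled text instead
garbled = raw.decode("latin-1")
print(garbled)    # cafÃ©

# Best-effort fallbacks when the real encoding is unknown
replaced = raw.decode("ascii", errors="replace")  # each bad byte becomes U+FFFD
ignored = raw.decode("ascii", errors="ignore")    # bad bytes are dropped
print(replaced)
print(ignored)    # caf
```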
Prefer fixing the encoding over hiding the error. Use errors="replace" only for best-effort display of possibly-corrupt data, never for data you must keep intact.
ord() turns a character into its Unicode code point number, and chr() turns a number back into a character. Escape sequences like \n and \u let you write special characters in source code:
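A short sketch of these conversions follows. The characters chosen are arbitrary examples.

```python
# Character <-> code point
print(ord("A"))        # 65
print(chr(65))         # A
print(chr(127881))     # 🎉
print(hex(ord("é")))   # 0xe9

# Escape sequences in string literals
print("line one\nline two")   # \n starts a new line
print("caf\u00e9")            # \u takes exactly 4 hex digits: café
print("\U0001F389")           # \U takes exactly 8 hex digits: 🎉
print("\u00e9" == "é")        # True
```

Code points above U+FFFF, such as most emoji, do not fit in the four-digit \u form, which is why the eight-digit \U form exists.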
Replace each ___ so the text round-trips through UTF-8 bytes and comes back unchanged.
Write a report comparing how many bytes each name takes in UTF-8 — handy for understanding storage and network limits across languages.
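One possible shape for such a report is sketched below. The list of names is hypothetical (the exercise's own data is not shown here), and the layout is only one of many reasonable choices.

```python
# Hypothetical sample names; swap in the exercise's own list
names = ["Alice", "José", "李雷", "Олег"]

# Map each name to (character count, UTF-8 byte count)
report = {name: (len(name), len(name.encode("utf-8"))) for name in names}

for name, (chars, size) in report.items():
    print(f"{name}: {chars} chars, {size} bytes")
```

Names in ASCII take one byte per character, while accented Latin and Cyrillic letters take two and most CJK characters take three.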
Lesson complete — text and bytes no longer confuse you!
You know that str is text and bytes is binary, you convert with encode/decode using UTF-8, you can diagnose a UnicodeDecodeError, and you can map characters to code points with ord and chr.
🚀 Up next: Iterators — __iter__ & __next__ — how the for-loop really works.
Practice quiz
What is the difference between str and bytes?
- str is binary data; bytes is text
- They are identical in Python 3
- str is human-readable Unicode text; bytes is raw 8-bit numbers
- str can only hold ASCII; bytes holds Unicode
Answer: str is human-readable Unicode text; bytes is raw 8-bit numbers. str is a sequence of Unicode characters (text); bytes is a sequence of raw 8-bit values (binary data).
Which method converts a str to bytes?
- .encode()
- .decode()
- .bytes()
- .str()
Answer: .encode(). encode() goes str to bytes ('encode to send'); decode() goes bytes to str ('decode to read').
How many bytes does 'café' take when encoded as UTF-8?
- 3
- 4
- 8
- 5
Answer: 5. café is 4 characters but 5 bytes in UTF-8, because é takes 2 bytes.
Why does UTF-8 keep English text compact?
- It compresses all text
- ASCII characters stay one byte each
- It drops non-ASCII characters
- It uses 4 bytes for every character
Answer: ASCII characters stay one byte each. UTF-8 encodes ASCII characters in a single byte and uses more bytes only for other scripts, staying ASCII-compatible.
What causes a UnicodeDecodeError?
- Decoding bytes with the wrong encoding
- Encoding a string twice
- Using a string longer than 1024 characters
- Mixing tabs and spaces
Answer: Decoding bytes with the wrong encoding. Decoding bytes with an encoding that can't interpret them — like reading UTF-8 data as ASCII — raises UnicodeDecodeError.
What does ord('A') return?
- 'A'
- 1
- 65
- 0x41 as a string
Answer: 65. ord() returns the Unicode code point as an integer; ord('A') is 65, and chr(65) gives 'A' back.
What does chr(127881) produce?
- 'A'
- The 🎉 emoji
- An error
- 127881
Answer: The 🎉 emoji. chr() turns a code point back into its character; 127881 is the party-popper emoji 🎉.
What does "a" + b"b" raise?
- Nothing — it gives 'ab'
- A UnicodeDecodeError
- A ValueError
- A TypeError, because str and bytes can't be concatenated
Answer: A TypeError, because str and bytes can't be concatenated. You can't combine str and bytes. Decode the bytes to text or encode the text to bytes first.
What does raw.decode('ascii', errors='ignore') do with undecodable bytes?
- Raises an error anyway
- Drops the undecodable bytes
- Replaces them with a placeholder character
- Returns the original bytes object
Answer: Drops the undecodable bytes. errors='ignore' silently drops bytes it can't decode; errors='replace' inserts a placeholder instead.
What is the right mental model for encode vs decode?
- encode: bytes to str; decode: str to bytes
- Both convert str to str
- encode: str to bytes; decode: bytes to str
- Both convert bytes to bytes
Answer: encode: str to bytes; decode: bytes to str. encode turns readable text into bytes to send; decode turns received bytes back into readable text.