Strings, Bytes & Unicode Encodings

In Python a str is readable Unicode text while bytes is raw binary data, and you move between them with encode and decode using an encoding such as UTF-8 — the foundation for working correctly with files, networks, and every language's characters.

Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Mixing up text and bytes is one of the most common sources of confusing errors. Once the difference clicks, encoding bugs stop being mysterious.

A str is text you can read; bytes is raw data you store or send. To turn text into bytes you encode it; to turn bytes back into text you decode them. Always name the encoding:
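The lesson's original example is not preserved here, so the following is a minimal sketch of the round trip, assuming the sample word "café" (borrowed from the quiz further down) and UTF-8 as the encoding:

```python
# Text (str) to bytes: encode, naming the encoding explicitly.
text = "café"
data = text.encode("utf-8")

print(data)        # b'caf\xc3\xa9'
print(len(text))   # 4 characters
print(len(data))   # 5 bytes, because é takes 2 bytes in UTF-8

# Bytes back to text: decode with the same encoding.
restored = data.decode("utf-8")
print(restored == text)  # True
```

Naming the encoding on both sides keeps the round trip independent of any platform default.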

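Decoding bytes with the wrong encoding can raise UnicodeDecodeError, or silently produce garbled text if the wrong encoding happens to accept every byte. The cure is to decode with the same encoding the data was created in. When you can't, the errors= argument lets decoding continue: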
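As an illustration (a sketch, not the lesson's original snippet), the code below decodes UTF-8 bytes as ASCII to trigger the error, then shows the proper fix and the two common errors= fallbacks. The variable names are made up for this example:

```python
raw = "café".encode("utf-8")   # b'caf\xc3\xa9'

# ASCII cannot interpret the two bytes that make up é, so this fails.
try:
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc)

# Best fix: decode with the encoding the data was created in.
fixed = raw.decode("utf-8")
print(fixed)      # café

# Fallbacks when the true encoding is unknown:
replaced = raw.decode("ascii", errors="replace")  # each bad byte becomes U+FFFD
ignored = raw.decode("ascii", errors="ignore")    # bad bytes are dropped
print(replaced)   # caf followed by two replacement characters
print(ignored)    # caf
```

Both fallbacks lose information: the é cannot be recovered from either result.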

Prefer fixing the encoding over hiding the error. Use errors="replace" only for best-effort display of possibly-corrupt data, never for data you must keep intact.

ord() turns a character into its Unicode code point number, and chr() turns a number back into a character. Escape sequences like \n and \u let you write special characters in source code:
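A short sketch of these built-ins and escapes follows. The sample characters are chosen for illustration, and the \U (eight hex digits) form is included alongside \u (four hex digits) because the emoji's code point does not fit in four:

```python
# Character to code point, and back again.
print(ord("A"))      # 65
print(chr(65))       # A
print(chr(127881))   # 🎉

# \n is a newline; \u takes exactly four hex digits, \U takes eight.
e_acute = "\u00e9"       # é
party = "\U0001F389"     # 🎉 (0x1F389 == 127881)
two_lines = "a\nb"

print(ord(e_acute))            # 233
print(party == chr(127881))    # True
print(len(two_lines))          # 3: 'a', newline, 'b'
```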

Replace each ___ so the text round-trips through UTF-8 bytes and comes back unchanged.

Write a report comparing how many bytes each name takes in UTF-8 — handy for understanding storage and network limits across languages.

Lesson complete — text and bytes no longer confuse you!

You know that str is text and bytes is binary, you convert with encode/decode using UTF-8, you can diagnose a UnicodeDecodeError, and you can map characters to code points with ord and chr.

Practice quiz

What is the difference between str and bytes?

  • str is binary data; bytes is text
  • They are identical in Python 3
  • str is human-readable Unicode text; bytes is raw 8-bit numbers
  • str can only hold ASCII; bytes holds Unicode

Answer: str is human-readable Unicode text; bytes is raw 8-bit numbers. str is a sequence of Unicode characters (text); bytes is a sequence of raw 8-bit values (binary data).

Which method converts a str to bytes?

  • .encode()
  • .decode()
  • .bytes()
  • .str()

Answer: .encode(). encode() goes str to bytes ('encode to send'); decode() goes bytes to str ('decode to read').

How many bytes does 'café' take when encoded as UTF-8?

  • 3
  • 4
  • 8
  • 5

Answer: 5. café is 4 characters but 5 bytes in UTF-8, because é takes 2 bytes.

Why does UTF-8 keep English text compact?

  • It compresses all text
  • ASCII characters stay one byte each
  • It drops non-ASCII characters
  • It uses 4 bytes for every character

Answer: ASCII characters stay one byte each. UTF-8 encodes ASCII characters in a single byte and uses more bytes only for other scripts, staying ASCII-compatible.

What causes a UnicodeDecodeError?

  • Decoding bytes with the wrong encoding
  • Encoding a string twice
  • Using a string longer than 1024 characters
  • Mixing tabs and spaces

Answer: Decoding bytes with the wrong encoding. Decoding bytes with an encoding that can't interpret them — like reading UTF-8 data as ASCII — raises UnicodeDecodeError.

What does ord('A') return?

  • 'A'
  • 1
  • 65
  • 0x41 as a string

Answer: 65. ord() returns the Unicode code point as an integer; ord('A') is 65, and chr(65) gives 'A' back.

What does chr(127881) produce?

  • 'A'
  • The 🎉 emoji
  • An error
  • 127881

Answer: The 🎉 emoji. chr() turns a code point back into its character; 127881 is the party-popper emoji 🎉.

What does "a" + b"b" raise?

  • Nothing — it gives 'ab'
  • A UnicodeDecodeError
  • A ValueError
  • A TypeError, because str and bytes can't be concatenated

Answer: A TypeError, because str and bytes can't be concatenated. You can't combine str and bytes. Decode the bytes to text or encode the text to bytes first.
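A small sketch of the failure and both fixes, assuming the bytes hold UTF-8 text (the variable names are illustrative):

```python
text = "a"
data = b"b"

# Mixing the two types fails immediately.
try:
    text + data
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("cannot mix:", exc)

# Pick one side and convert the other explicitly.
as_text = text + data.decode("utf-8")    # 'ab'
as_bytes = text.encode("utf-8") + data   # b'ab'
print(as_text, as_bytes)
```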

What does raw.decode('ascii', errors='ignore') do with undecodable bytes?

  • Raises an error anyway
  • Drops the undecodable bytes
  • Replaces them with a placeholder character
  • Returns the original bytes object

Answer: Drops the undecodable bytes. errors='ignore' silently drops bytes it can't decode; errors='replace' inserts a placeholder instead.

What is the right mental model for encode vs decode?

  • encode: bytes to str; decode: str to bytes
  • Both convert str to str
  • encode: str to bytes; decode: bytes to str
  • Both convert bytes to bytes

Answer: encode: str to bytes; decode: bytes to str. encode turns readable text into bytes to send; decode turns received bytes back into readable text.