Checkpoint: A Real-World Script

You've learned regex, datetime, JSON, HTTP requests, CSV/Excel, web scraping, itertools and pathlib — let's build a real script. No new concepts here: this checkpoint is about wiring your skills together into one program that fetches data, cleans it, analyses it, and saves a report.

Part of the free Python course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

This is exactly the kind of small automation tool that working developers write every week.

You'll build a script that produces a tidy CSV report about a GitHub user's recent repositories. It touches almost every tool from this section.

Fill in the four TODOs. The sample data mimics a real GitHub API response, so your logic will work unchanged when you later point it at the live API.

Here's one clean, complete implementation. Read it top to bottom — every section maps to a lesson you've finished.

Notice how each tool slots in: re.match filters, strptime + subtraction gives the age, Counter finds the top language, csv.DictWriter + pathlib save the report.
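As a rough illustration of how those pieces might fit together, here is a minimal sketch. It is not necessarily the lesson's exact solution: the sample repos, the "py-" name filter, the fixed TODAY date, and the function names build_rows, top_language, and save_report are all assumptions made for this example. The dictionary keys (name, language, pushed_at) follow the shape of GitHub's repository objects.

```python
import csv
import re
from collections import Counter
from datetime import datetime
from pathlib import Path

# Hypothetical sample data shaped like GitHub repository objects.
repos = [
    {"name": "py-tools", "language": "Python", "pushed_at": "2024-06-01T10:00:00Z"},
    {"name": "py-notes", "language": "Python", "pushed_at": "2024-05-20T08:30:00Z"},
    {"name": "web-demo", "language": "JavaScript", "pushed_at": "2024-06-10T12:00:00Z"},
    {"name": "py-old", "language": None, "pushed_at": "2023-01-15T09:00:00Z"},
]

TODAY = datetime(2024, 6, 17)   # fixed (assumed) date so results are reproducible
NAME_PATTERN = r"py-"           # assumed filter: keep names that start with "py-"


def build_rows(repos, today=TODAY):
    """Filter repos by name and compute the age of the last push in days."""
    rows = []
    for repo in repos:
        # re.match anchors at the start of the name.
        if not re.match(NAME_PATTERN, repo["name"]):
            continue
        pushed = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
        rows.append({
            "name": repo["name"],
            "language": repo["language"] or "Unknown",
            "days_since_push": (today - pushed).days,
        })
    return rows


def top_language(rows):
    """Return the most common language among the kept repos, or None."""
    counts = Counter(row["language"] for row in rows)
    return counts.most_common(1)[0][0] if counts else None


def save_report(rows, path):
    """Write the rows to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "language", "days_since_push"])
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = build_rows(repos)
print(top_language(rows))
```

Calling save_report(rows, "output/report.csv") would then write the report, creating the output folder on the first run.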

To make it live, replace the hard-coded repos list with a single request. Everything downstream stays identical:
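One possible shape for that request is sketched below, assuming the third-party requests package is installed. The function name fetch_repos and its get parameter are hypothetical (the parameter exists only so the function can be exercised without a network); the URL is GitHub's public endpoint for listing a user's repositories.

```python
import requests

API_URL = "https://api.github.com/users/{username}/repos"


def fetch_repos(username, get=requests.get):
    """Return the user's repos as a list of dicts, or [] if the request fails.

    `get` is a hypothetical hook that defaults to requests.get.
    """
    try:
        response = get(
            API_URL.format(username=username),
            params={"sort": "pushed", "per_page": 30},
            timeout=10,                 # never wait forever on a dead connection
        )
        response.raise_for_status()     # turn 4xx/5xx statuses into exceptions
        return response.json()          # decoded JSON: a list of repo dicts
    except requests.RequestException as exc:
        print(f"Could not fetch repos for {username}: {exc}")
        return []
```

With this in place, repos = fetch_repos("octocat") would stand in for the hard-coded list, and an empty list on failure lets the rest of the script run without crashing.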

Six questions covering the whole section. Try to answer before revealing each one.

1. You fetched JSON with requests. Which method turns the response body into a Python dict?

response.json() — it runs json.loads on the body and returns a dict or list.

2. Which datetime function converts the string "2024-06-17" into a datetime object?

datetime.strptime("2024-06-17", "%Y-%m-%d") — strptime parses a string into a datetime. (strftime does the reverse.)
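A quick round trip makes the direction of each function easier to remember. This is a small sketch; the output format "%d/%m/%Y" is just an example choice.

```python
from datetime import datetime

# strptime: string -> datetime ("p" for parse)
parsed = datetime.strptime("2024-06-17", "%Y-%m-%d")
print(parsed)  # 2024-06-17 00:00:00

# strftime: datetime -> string ("f" for format)
formatted = parsed.strftime("%d/%m/%Y")
print(formatted)  # 17/06/2024
```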

3. What's the difference between re.match and re.search?

re.match only checks the start of the string; re.search looks anywhere in it. Both return None if there's no match.
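The difference shows up as soon as the pattern sits somewhere other than the start. A small sketch, using a made-up sample string:

```python
import re

text = "repo: py-tools"   # hypothetical sample string

at_start = re.match(r"py-", text)    # pattern is not at position 0
anywhere = re.search(r"py-", text)   # scans the whole string

print(at_start)           # None
print(anywhere.start())   # 6
```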

4. When writing a CSV with csv.DictWriter, what must you call before writerows to add the column titles?

writer.writeheader(): it writes the header row from the fieldnames you passed to DictWriter.
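The sketch below writes to an in-memory buffer instead of a file so the result is easy to inspect. The rows and the column names are made up for illustration.

```python
import csv
import io

# Hypothetical rows; the keys must match the fieldnames below.
rows = [{"name": "py-tools", "stars": 12}, {"name": "py-notes", "stars": 3}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "stars"])
writer.writeheader()      # without this call the file has no title row
writer.writerows(rows)

report_text = buffer.getvalue()
print(report_text)
```

When writing to a real file, open it with newline="" so the csv module controls the line endings itself.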

5. Which pathlib expression safely creates the folder chain output/2024/ even if output/ doesn't exist yet, without erroring on a second run?

Path("output/2024").mkdir(parents=True, exist_ok=True) — parents=True makes missing parents, exist_ok=True avoids the FileExistsError on re-runs.
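To see both flags at work without leaving folders behind, this sketch runs inside a temporary directory (the tempfile usage is only for the demonstration, not part of the lesson's script).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "2024"

    target.mkdir(parents=True, exist_ok=True)   # creates output/ and output/2024/
    target.mkdir(parents=True, exist_ok=True)   # second run: silently does nothing
    created = target.is_dir()

    # A plain mkdir() on a folder that already exists raises FileExistsError.
    try:
        target.mkdir()
    except FileExistsError:
        plain_call_failed = True
    else:
        plain_call_failed = False

print(created, plain_call_failed)
```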

6. Why does itertools.groupby sometimes split identical values into separate groups, and how do you fix it?

Because groupby only groups consecutive equal items. Fix it by sorting the data on the same key first: data.sort(key=k), then groupby(data, key=k).
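Counting group sizes on a small, made-up list shows the fragmentation and the fix side by side:

```python
from itertools import groupby

languages = ["Python", "Go", "Python", "Go", "Python"]   # hypothetical data

# Unsorted: every change of value starts a new group.
unsorted_groups = [(key, len(list(group))) for key, group in groupby(languages)]

# Sorted first: equal values sit next to each other, so each forms one group.
sorted_groups = [(key, len(list(group))) for key, group in groupby(sorted(languages))]

print(unsorted_groups)  # [('Python', 1), ('Go', 1), ('Python', 1), ('Go', 1), ('Python', 1)]
print(sorted_groups)    # [('Go', 2), ('Python', 3)]
```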

Checkpoint cleared — you can build real tools now!

You combined regex, datetime, JSON, requests, CSV, itertools and pathlib into a single working script that fetches, cleans, analyses, and saves data. That orchestration skill — wiring tools together — is what separates someone who knows Python from someone who builds with it.

🚀 Up next: File Handling — go deeper on reading and writing files, handling encodings, and managing resources safely.

Practice quiz

You fetched JSON with requests. Which call turns the response body into a Python dict or list?

  • response.text()
  • response.dict()
  • response.json()
  • response.parse()

Answer: response.json(). response.json() runs json.loads on the body and returns the decoded dict or list.

Which datetime call converts the string "2024-06-17" into a datetime object?

  • datetime.strptime("2024-06-17", "%Y-%m-%d")
  • datetime.strftime("2024-06-17", "%Y-%m-%d")
  • datetime.fromisostring("2024-06-17")
  • datetime.parse("2024-06-17")

Answer: datetime.strptime("2024-06-17", "%Y-%m-%d"). strptime parses a string into a datetime. strftime does the reverse (datetime to string).

What is the difference between re.match and re.search?

  • match scans the whole string, search only the start
  • They are identical
  • search returns a list, match returns one object
  • match only checks the start of the string, search looks anywhere

Answer: match only checks the start of the string, search looks anywhere. re.match anchors at the start of the string; re.search looks anywhere in it. Both return None when there is no match.

With csv.DictWriter, what must you call before writerows to add the column titles?

  • writer.writecolumns()
  • writer.writeheader()
  • writer.writetitles()
  • writer.header()

Answer: writer.writeheader(). writer.writeheader() writes the field names row using the fieldnames passed to DictWriter.

Which pathlib call creates output/2024/ even if output/ is missing, without erroring on a second run?

  • Path("output/2024").mkdir(parents=True, exist_ok=True)
  • Path("output/2024").mkdir()
  • Path("output/2024").create(force=True)
  • Path("output/2024").makedirs()

Answer: Path("output/2024").mkdir(parents=True, exist_ok=True). parents=True builds missing parent folders and exist_ok=True avoids FileExistsError on re-runs.

Why does itertools.groupby sometimes split identical values into separate groups?

  • It always sorts before grouping
  • It hashes values incorrectly
  • It only groups consecutive equal items, so unsorted data fragments
  • It ignores the key function

Answer: It only groups consecutive equal items, so unsorted data fragments. groupby only groups consecutive equal items. Sort the data on the same key first so equal items sit next to each other.

In the GitHub reporter, which tool finds the most common language across the kept repos?

  • re.findall
  • collections.Counter(...).most_common(1)
  • sorted with reverse=True
  • json.dumps

Answer: collections.Counter(...).most_common(1). Counter counts each language and most_common(1) returns the single most frequent one.

How is "days since last push" computed from a parsed pushed_at datetime?

  • pushed - TODAY
  • TODAY.subtract(pushed)
  • pushed.days_until(TODAY)
  • (TODAY - pushed).days

Answer: (TODAY - pushed).days. Subtracting two datetimes yields a timedelta; its .days attribute gives the whole-day difference.

Why does the script use embedded sample data shaped like the real API response?

  • Sample data is faster to hash
  • So the filter/parse/summarise logic can be built and tested offline, then pointed at the live API unchanged
  • Because requests cannot return lists
  • To avoid using pathlib

Answer: So the filter/parse/summarise logic can be built and tested offline, then pointed at the live API unchanged. Keeping the data source swappable lets you build and debug the processing offline, then switch to requests.get() without touching the downstream code.

When the live version calls requests.get(), how should network failures be handled?

  • Ignore them — requests never fails
  • Wrap the call in a while loop forever
  • Catch requests.RequestException and handle it gracefully
  • Call sys.exit() on any response

Answer: Catch requests.RequestException and handle it gracefully. requests.RequestException is the base class for the exceptions requests raises, including connection errors, timeouts, and the HTTPError raised by response.raise_for_status(); catching it lets the script degrade gracefully instead of crashing.