Checkpoint: Time Series & I/O
A checkpoint is a hands-on review that ties recent lessons together — here you'll combine resampling, interpolation, exploding, cross-tabs, SQL/Parquet I/O, and ranking into a single working pipeline.
Learn Checkpoint: Time Series & I/O in our free Pandas course — a beginner-friendly interactive lesson with worked examples, a practice exercise and a quick…
Part of the free Pandas course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.
Recap the six skills, then prove them with a multi-step build challenge and a short quiz before moving on to the capstone.
Here is the whole toolkit from this stretch of the course, each as a one-liner you can reach for:
Put the pipeline together end to end. Starting from a sales table stored in SQLite, your job is to:
The starter below runs as-is and prints the raw table. Extend it to produce the ranked report, then check yourself against the full solution.
Answer each in your head, then expand to check.
df.resample("ME").sum() . It requires a DatetimeIndex (parse the dates and set_index first, or pass on="date" ). Remember the alias is "ME" , not the old "M" .
Use interpolate() when the value changes smoothly between known points (temperatures, prices) so a straight-line estimate makes sense. Use ffill() for step-like data that stays constant until it changes — it copies the last value forward rather than estimating a trend.
First split into a list: df["col"] = df["col"].str.split(",") . Then explode it: df.explode("col") . explode only unpacks real list-like values, so the split must come first.
In to_sql it drops the existing table and recreates it, so any rows already there are lost. Use "append" when you want to keep accumulating data, and "fail" (the default) to be warned that the table exists.
With "min" the tied values are both 2 and the next value is 4 (a gap). With "dense" the tied values are both 2 and the next value is 3 (no gap).
df.nlargest(3, "revenue") is faster (and shorter) because it only has to locate the top 3 rather than sorting the entire DataFrame. Both return the same rows.
Checkpoint complete — your pipeline skills are solid!
You can move data with read_sql / to_sql and binary formats, reshape time with resample and interpolate , restructure with explode and crosstab , and surface winners with rank and nlargest — all in one flow.
🚀 Up next: the Capstone — an end-to-end data-cleaning project bringing the whole course together.
Practice quiz
Which call turns a daily series into monthly totals, given a DatetimeIndex?
- df.groupby("month").sum()
- df.rolling("ME").sum()
- df.resample("ME").sum()
- df.asfreq("ME")
Answer: df.resample("ME").sum(). resample("ME").sum() aggregates daily rows into monthly totals. It needs a DatetimeIndex (or on="date").
In current pandas, what is the correct alias for month-end resampling?
- "ME"
- "M"
- "MON"
- "EOM"
Answer: "ME". The modern alias is "ME" (month end). The old "M" alias is deprecated.
You upsampled and now have NaN gaps in smoothly changing data (e.g. temperatures). Which method best fills them?
- ffill()
- dropna()
- fillna(0)
- interpolate()
Answer: interpolate(). interpolate() estimates the values between known points with a straight-line trend, ideal for smoothly varying data.
A column holds the string "a,b,c". What two steps turn that one row into three rows?
- explode() then str.split(",")
- str.split(",") then explode()
- pivot() then melt()
- crosstab() then stack()
Answer: str.split(",") then explode(). explode only unpacks list-like values, so first str.split(",") makes a list, then explode("col") spreads it into rows.
What does pd.crosstab(df["region"], df["choice"]) produce?
- A count of every region/choice combination
- A correlation matrix
- A merged DataFrame
- A time-resampled table
Answer: A count of every region/choice combination. crosstab builds a frequency table counting how often each combination of the two categories occurs.
In df.to_sql("t", conn, if_exists="replace"), what does "replace" do?
- Appends new rows to the table
- Raises an error if the table exists
- Drops the existing table and recreates it
- Updates only changed rows
Answer: Drops the existing table and recreates it. "replace" drops the existing table and recreates it, so any rows already there are lost. Use "append" to accumulate.
You want the 3 highest-revenue rows. Which is the most direct and efficient call?
- df.sort_values("revenue").head(3)
- df.nlargest(3, "revenue")
- df.rank("revenue").head(3)
- df.max(3, "revenue")
Answer: df.nlargest(3, "revenue"). nlargest(3, "revenue") only locates the top 3 rather than sorting the whole DataFrame, so it is shorter and faster.
Two values tie for 2nd place. What rank does the NEXT value get with method="dense"?
- 4
- 2
- 5
- 3
Answer: 3. With "dense" there are no gaps: the tied values are both 2 and the next value is 3.
Which I/O format is the recommended always-available binary round-trip for caching a DataFrame?
- CSV
- Pickle
- JSON
- XML
Answer: Pickle. to_pickle / read_pickle give a fast binary round-trip with no extra dependencies, ideal for caching intermediate results.
What does interpolate(method="time") weight its estimates by?
- Row position only
- The column order
- The elapsed time between index points
- Alphabetical labels
Answer: The elapsed time between index points. method="time" weights the interpolation by the actual elapsed time between DatetimeIndex points, not just row spacing.