Streaming and Real-Time SQL

Traditional SQL runs once over a finite table and stops. Streaming SQL runs forever over an endless stream of events, continuously updating its answers. By the end of this lesson you'll understand bounded vs unbounded data, continuous queries , event-time vs processing-time , the three windowing styles (tumbling, sliding, session), watermarks for late data, incrementally-updated materialized views , and the tools — Flink SQL, ksqlDB, Materialize, RisingWave — that power real-time dashboards, fraud detection, and alerting.

Learn Streaming and Real-Time SQL in our free SQL course — an interactive lesson with worked examples, a practice exercise and a quick reference.

Part of the free SQL course at LearnCodingFast — hands-on lessons with examples you run in your browser, plus practice exercises and a quick quiz.

Imagine click events flowing in. As a finite snapshot (a bounded table) it looks like this — but a real stream would keep appending rows forever. Each row has a user, the minute bucket it fell in, and its event-time.

Bounded data is finite: a table, a file, yesterday's logs — it has a definite end, so a query can read all of it and produce a final answer. This is classic batch processing. Unbounded data is an endless stream of events that never stops arriving. A query over it can never see "all the data," so it must run continuously and keep revising its output as new events flow in.

Batch is a lake : a fixed body of water you can measure end to end. A stream is a river : always flowing past you. You can't measure "all" of a river — you can only measure what passes a point in a window of time. Streaming SQL is how you put a measuring station on the river.

In batch SQL, you run a query and it returns once. A continuous query (also called a standing query) is registered once and then runs indefinitely, emitting updated results every time relevant events arrive. The same mental model as a spreadsheet cell that recalculates whenever its inputs change — except the inputs are an infinite event stream.

Every event really has two clocks. Event-time is when the event actually happened — a timestamp carried inside the record (when the user clicked). Processing-time is when your system happens to handle it. They diverge constantly: a phone goes offline and uploads its clicks an hour late, or a network hiccup reorders messages.

For correct analytics you almost always want event-time : "how many clicks happened at 10:01" should mean when they occurred , not when they reached the server. Processing-time is simpler and lower-latency, but gives wrong window counts whenever data is delayed or out of order.

Since a stream never ends, you aggregate over windows — finite slices of time. There are three classic shapes:

If you aggregate by event-time, here's the hard question: when is a window done ? You can't wait forever for stragglers, but closing too early loses late events. A watermark is the answer — a moving marker that says "event-time has advanced to T; we believe all events older than T have arrived." When the watermark passes a window's end, that window is finalized and emitted.

A watermark bounds how long you wait. Allow more lateness (e.g. 30s) and results are more complete but slower; allow less and they're faster but may drop late events. Events arriving after the allowed lateness are typically dropped or sent to a side output for separate handling.

A normal materialized view stores a query's result but goes stale until you refresh it (recomputing the whole thing). Streaming databases like Materialize and RisingWave take a different approach: you write a familiar SELECT once, and the engine keeps the result continuously up to date by applying only the deltas as new events arrive — never recomputing from scratch. You then query the view like any table and always get a millisecond-fresh answer.

This is the magic that makes streaming feel like ordinary SQL: the standing query is hidden behind a view. Your dashboard just runs SELECT * FROM clicks_per_user and the numbers update themselves.

Fill in the blanks to count events per minute bucket. In a streaming engine this becomes a tumbling-window query that emits each minute forever — but the aggregation logic is exactly what you write here.

Q: Is streaming SQL just normal SQL run repeatedly?

No. Re-running a batch query on a schedule (micro-batch) is one approximation, but true streaming SQL maintains a continuous query with persistent state, event-time semantics, windows, and watermarks, emitting incremental updates as events arrive.

Q: Tumbling vs sliding vs session — how do I choose?

Use tumbling for clean, non-overlapping periods ("revenue per hour"). Use sliding for smooth rolling metrics ("error rate over the last 5 minutes, updated each minute"). Use session when activity comes in bursts and you want to group each burst (user sessions, device activity).

Q: What happens to events that arrive very late?

It depends on the allowed lateness in your watermark. Within the bound, late events still update their window; beyond it, they're usually dropped or routed to a side output so you can handle them separately rather than silently losing them.

Not strictly, but a durable log like Apache Kafka (or Pulsar/Kinesis) is the common backbone — it's where the unbounded streams live. ksqlDB targets Kafka directly; Flink, Materialize, and RisingWave connect to Kafka and other sources.

Practice quiz

What is the key difference between bounded and unbounded data?

  • Bounded data has a known, finite end; unbounded data is a continuous, never-ending stream
  • Bounded data is encrypted while unbounded data is not
  • Bounded data is always smaller than one gigabyte
  • There is no real difference between them

Answer: Bounded data has a known, finite end; unbounded data is a continuous, never-ending stream. Bounded (batch) data is finite with a definite end; unbounded (stream) data arrives continuously and has no end, so queries must run forever.

How does a continuous query differ from a traditional batch query?

  • It must be re-typed every minute
  • It runs once and returns a single final result
  • It keeps running and updates its results as new events arrive
  • It can only count rows

Answer: It keeps running and updates its results as new events arrive. A continuous (standing) query stays active over an unbounded stream and incrementally updates its output as new data flows in.

What is event-time?

  • The time a dashboard refreshes
  • The time the event actually occurred, usually carried in the data
  • The time the system happens to process an event
  • The time a server boots up

Answer: The time the event actually occurred, usually carried in the data. Event-time is when the event truly happened (a timestamp in the record). Processing-time is when the system happens to handle it, which can differ due to delays.

A tumbling window is best described as which kind of window?

  • Overlapping windows that slide by a small step
  • Windows defined by gaps of inactivity
  • A single window covering all of history
  • Fixed-size, non-overlapping, contiguous time buckets

Answer: Fixed-size, non-overlapping, contiguous time buckets. Tumbling windows are fixed-size and non-overlapping, so each event falls into exactly one window (for example, counts per 1-minute bucket).

What distinguishes a sliding (hopping) window from a tumbling window?

  • Sliding windows can overlap because they advance by a step smaller than their size
  • Sliding windows have no fixed size
  • Sliding windows never overlap
  • Sliding windows ignore event-time

Answer: Sliding windows can overlap because they advance by a step smaller than their size. A sliding window has a size and a smaller slide or hop interval, so consecutive windows overlap and an event can belong to several.

A session window groups events by what?

  • The size of each event in bytes
  • Periods of activity separated by gaps of inactivity
  • Fixed clock minutes
  • User ID in alphabetical order

Answer: Periods of activity separated by gaps of inactivity. Session windows close after a configured gap of inactivity, capturing a burst of related activity (like a user session) of variable length.

What problem does a watermark solve in stream processing?

  • It compresses messages on the wire
  • It assigns primary keys to events
  • It encrypts the stream end to end
  • It tracks event-time progress so the system knows when a window can be considered complete despite late data

Answer: It tracks event-time progress so the system knows when a window can be considered complete despite late data. A watermark marks event-time progress; it tells the engine that events up to a point have likely arrived, so a window can be finalized while bounding how long to wait for stragglers.

What makes an incrementally-maintained materialized view different from a normal one?

  • It can never be queried directly
  • It stores no data at all
  • It updates its stored result automatically as underlying data changes, instead of being fully recomputed
  • It only works in batch systems

Answer: It updates its stored result automatically as underlying data changes, instead of being fully recomputed. An incrementally-maintained materialized view (as in Materialize or RisingWave) keeps results fresh by applying just the deltas, rather than recomputing the whole query.

Which tool lets you run SQL queries directly over Apache Kafka topics?

  • Apache Hadoop MapReduce
  • ksqlDB
  • Microsoft Excel
  • SQLite

Answer: ksqlDB. ksqlDB is a streaming SQL engine built for Kafka, letting you define streams, tables, and continuous queries over Kafka topics with SQL.

Which of the following is a classic use case for streaming SQL?

  • Real-time fraud detection and live dashboards that update as events occur
  • Compressing an archive of old logs
  • Designing a normalized schema on paper
  • Generating a one-time year-end report from a static file

Answer: Real-time fraud detection and live dashboards that update as events occur. Streaming SQL shines for continuously updated needs: real-time dashboards, fraud detection, monitoring, and alerting on live event streams.