1What is the key difference between tumbling and hopping windows?
Windowing divides an unbounded event stream into finite, time-bounded segments (windows) for aggregation and analysis. The three fundamental window types are tumbling (fixed, non-overlapping), hopping (fixed, overlapping), and session (dynamic, activity-based). Windowing is essential for computing time-based metrics like 'clicks per minute,' 'average latency over 5 minutes,' or 'purchases per user session.'
Batch processing operates on bounded datasets: 'process all of yesterday's logs.' Stream processing operates on unbounded data: events arrive continuously, forever. To compute aggregations (counts, averages, sums) on unbounded data, you need to define finite segments -- **windows** -- over which to compute.
The three fundamental window types are:
**Tumbling windows** divide time into fixed, non-overlapping intervals. A 1-minute tumbling window creates buckets [00:00-01:00), [01:00-02:00), [02:00-03:00), etc. Each event belongs to exactly one window. This is the simplest window type, used for metrics like 'requests per minute' or 'errors per hour.'
**Hopping windows** (also called sliding windows) have a fixed size and a fixed advance interval (hop). A 5-minute window with a 1-minute hop creates overlapping windows: [00:00-05:00), [01:00-06:00), [02:00-07:00), etc. Each event belongs to multiple windows (size/hop = 5 windows per event in this case). This smooths out metric spikes: instead of a sharp drop at window boundaries, the metric updates every hop interval.
**Session windows** are dynamic: they start with the first event and close after an inactivity gap (e.g., 30 minutes with no events). A user who clicks at 10:00, 10:05, 10:20, and 11:00 creates two sessions: [10:00-10:20] and [11:00-11:00]. Session windows capture user behavior patterns: page sessions, shopping sessions, or chat conversations.
A critical distinction is **event time** vs **processing time**. Event time is when the event occurred (embedded timestamp). Processing time is when the system processes the event. Due to network delays, batching, and retransmission, events may arrive out of order. A click at 10:00:05 may arrive at 10:00:12. If you use processing time, the click falls in the wrong window. Event-time windowing uses the embedded timestamp to place events in the correct window.
**Watermarks** solve the problem of knowing when a window is complete. A watermark is a declaration: 'all events with timestamp ≤ T have arrived.' When the watermark passes a window's end time, the window is closed and its result is emitted. Late events (arriving after the watermark) can be handled by allowed lateness: the window stays open for a grace period to accommodate stragglers.
Click-Through Rate by Minute
An ad platform computes click-through rate (CTR) per ad per minute. Using a 1-minute tumbling window keyed by ad_id: each window collects all click events for a specific ad within that minute, counts them, divides by impressions (from another stream), and emits the CTR. At 10:01:00, the window [10:00-10:01) closes and emits 'ad_123 had 47 clicks / 1,200 impressions = 3.9% CTR.' A click event timestamped 10:00:58 that arrives at 10:01:03 (3 seconds late) is still assigned to the [10:00-10:01) window because event-time windowing uses the embedded timestamp, not the arrival time.
Uber
Uber uses session windows to detect driver trips. A trip session starts when the driver picks up a rider (first GPS event) and ends after 5 minutes of no movement (session gap). Within each session, Uber computes trip distance, duration, average speed, and route efficiency. Session windows naturally handle stops at traffic lights (short gaps within the session) while detecting trip completion (extended gap).
Netflix
Netflix uses hopping windows for real-time quality-of-experience monitoring. A 5-minute window hopping every 30 seconds computes average buffering ratio, bitrate, and error rate per title per region. The overlapping windows provide smooth trend lines on the monitoring dashboard, updating every 30 seconds with 5 minutes of context.
Twitter/X
Twitter uses tumbling windows for trending topic detection. A 1-minute tumbling window counts tweet volume per hashtag. The top-N hashtags by volume across the last window, with a velocity filter (comparing against the previous window), identify trending topics. The simplicity of tumbling windows enables processing millions of tweets per minute with predictable resource usage.
| Aspect | Description |
|---|---|
| Accuracy vs Latency (Watermark Strategy) | Aggressive watermarks (advance quickly) emit results sooner but may miss late events. Conservative watermarks (advance slowly) capture more events but delay results. The allowed lateness parameter controls this trade-off: higher lateness means more complete results but more memory usage to keep windows open. |
| Window Size vs Resource Usage | Larger windows consume more memory (more events buffered). Hopping windows multiply this by size/hop. Session windows with long gaps can grow very large. Operators must provision sufficient state storage (RocksDB for Flink, in-memory for Kafka Streams). |
| Event Time vs Processing Time Complexity | Processing-time windows are trivial to implement (wall clock) but incorrect with out-of-order events. Event-time windows are correct but require watermark generation, state management for open windows, and late event handling. Most production systems use event time because correctness trumps simplicity. |
| Session Window Merging Complexity | Session windows require merging: when a new event arrives that falls within the gap of an existing session, the sessions merge. This is computationally expensive and creates variable-size state. Flink and Kafka Streams support session window merging, but it adds overhead compared to fixed windows. |
Spotify's Real-Time Listening Analytics
Scenario
Spotify needed to compute real-time listening statistics: plays per song per minute, average listen duration per album, and user session length. With billions of play events daily and events arriving from mobile devices with variable network latency (seconds to minutes of delay), processing-time windowing produced inaccurate metrics. A song play at 10:00 arriving at 10:03 would be counted in the 10:03 window instead of the 10:00 window.
Solution
Spotify migrated to event-time windowing using Apache Beam on Google Cloud Dataflow. Play events carry an embedded timestamp from the client device. Tumbling windows (1-minute) compute plays-per-song with event time. Session windows (30-minute gap) compute listening sessions per user. Watermarks are generated from the stream's observed event times with a 2-minute allowed lateness. Late events trigger window refires that update the aggregation.
Outcome
Metric accuracy improved from ~94% (processing time) to 99.7% (event time with 2-minute lateness). Real-time dashboards show plays-per-song within 3 minutes of the event (watermark delay + processing). Session analytics accurately capture listening behavior: average session length, session frequency, and genre transitions. The 2-minute allowed lateness captures 99.7% of late events; the remaining 0.3% (>2 minutes late, typically from reconnecting offline devices) are routed to a side output for batch reconciliation.
See Windowing in Stream Processing in action
Explore system design templates that use windowing in stream processing and run traffic simulations to see how these concepts perform under real load.
Browse Templates1What is the key difference between tumbling and hopping windows?
2Why are watermarks needed for event-time windowing?