Event Streaming
Processing high-throughput, ordered log histories (e.g. Apache Kafka) persistently at scale.
What you'll learn
- Kafka Topics & Partitions
- Consumer Groups & Offset Management
- Retention & Replay
- Ordering Guarantees
- Exactly-Once Semantics (EOS)
- Stream Processing (Kafka Streams, Flink)
TL;DR
Processing high-throughput, ordered log histories (e.g. Apache Kafka) persistently at scale.
Visual System Topology
Event Streaming Network Handshake Flow
Concept Overview
Event streaming is the practice of capturing, storing, and processing continuous streams of events as they occur, in real time. Unlike message queues (where messages are deleted after consumption) or pub/sub (transient fan-out), event streaming persists events in a durable, ordered, append-only log. Any consumer can read the full event history from any point in time.
Apache Kafka is the dominant event streaming platform, used by Uber, LinkedIn, Netflix, and thousands of other companies to process billions of events per day. Alternatives include Amazon Kinesis (managed, simpler), Apache Pulsar (cloud-native, multi-tenancy), and Azure Event Hubs.
Event streaming enables three powerful capabilities: (1) Real-time processing — react to events as they occur (fraud detection, live dashboards); (2) Replay — reprocess historical events to build new views or fix bugs; (3) Event sourcing — the event log is the source of truth; state is derived by replaying events.
Key Architectural Pillars
Kafka Topics & Partitions
A topic is a named stream of events. Topics are divided into partitions — each partition is an independent, ordered, immutable log. Partitions are the unit of parallelism: more partitions = more consumers can read concurrently. The partition count determines max throughput.
Consumer Groups & Offset Management
A consumer group is a set of consumer instances sharing the same group_id. Kafka assigns each partition to exactly one consumer in the group. Each consumer tracks its offset (last processed message position) in Kafka. Restarting a consumer resumes from the last committed offset.
Retention & Replay
Kafka stores events for a configurable retention period (hours to forever). Any consumer can seek to any offset — including offset 0 (the beginning of time). This enables: debugging by replaying production traffic, rebuilding materialized views from scratch, and bootstrapping new consumers.
Ordering Guarantees
Kafka guarantees strict ordering within a partition, but not across partitions. Use a consistent partition key (e.g., user_id, order_id) to ensure all events for the same entity go to the same partition — maintaining entity-level ordering.
Exactly-Once Semantics (EOS)
Kafka Transactions (KIP-98) enable exactly-once processing: read-process-write cycles that commit atomically. The producer can write to multiple partitions atomically; if the producer crashes mid-write, the partial write is rolled back. Critical for financial and inventory systems.
Stream Processing (Kafka Streams, Flink)
Frameworks for processing Kafka streams in real time: filtering, aggregation, joining, windowing. Kafka Streams runs inside the application (no separate cluster). Apache Flink is a separate cluster with more powerful stateful processing.
