Event Streaming

TL;DR

Event streaming persists high-throughput event data in durable, ordered, append-only logs (e.g. Apache Kafka) so that it can be processed and replayed at scale.

Visual System Topology

[Diagram: Event Streaming Network Handshake Flow. Client node initiates request → (multiplexed) → event streaming gateway routes traffic → (fast payload) → backend server executes logic.]

Concept Overview

Event streaming is the practice of capturing, storing, and processing continuous streams of events as they occur. Unlike message queues (where messages are typically deleted after consumption) or pub/sub (transient fan-out), event streaming persists events in a durable, ordered, append-only log. Any consumer can read the retained event history starting from any offset.

Apache Kafka is the dominant event streaming platform, used by Uber, LinkedIn, Netflix, and thousands of other companies to process billions of events per day. Alternatives include Amazon Kinesis (managed, simpler), Apache Pulsar (cloud-native, multi-tenancy), and Azure Event Hubs.

Event streaming enables three powerful capabilities: (1) Real-time processing — react to events as they occur (fraud detection, live dashboards); (2) Replay — reprocess historical events to build new views or fix bugs; (3) Event sourcing — the event log is the source of truth; state is derived by replaying events.

Key Architectural Pillars

Kafka Topics & Partitions

A topic is a named stream of events. Topics are divided into partitions; each partition is an independent, ordered, immutable log. Partitions are the unit of parallelism: within a consumer group, each partition is read by at most one consumer, so the partition count caps how many consumers can read concurrently and is a major factor in maximum throughput.

Example: Topic "orders" with 12 partitions → up to 12 consumers in a consumer group can read in parallel, each handling 1 partition.
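The partition model above can be sketched with a toy in-memory topic. This is only an illustration, not Kafka's implementation: the `Topic` class and the `active_consumers` helper are hypothetical names introduced here, and offsets are simply list indexes.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy in-memory topic: each partition is an independent append-only list."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, event: str) -> int:
        """Append an event to one partition and return its offset there."""
        log = self.partitions[partition]
        log.append(event)
        return len(log) - 1


def active_consumers(num_partitions: int, num_consumers: int) -> int:
    """Within one group, a partition has at most one reader, so extras sit idle."""
    return min(num_partitions, num_consumers)


orders = Topic("orders", num_partitions=12)
first = orders.append(3, "order_created")   # offset 0 in partition 3
second = orders.append(3, "order_paid")     # offset 1 in partition 3
other = orders.append(7, "order_created")   # offset 0 in partition 7
```

Offsets are per partition, which is why partition 7 starts again at 0. With 12 partitions, a 20-consumer group would still have only 12 consumers doing work.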

Consumer Groups & Offset Management

A consumer group is a set of consumer instances sharing the same group_id. Kafka assigns each partition to exactly one consumer in the group. Each consumer periodically commits its offset for each assigned partition (the position of the next record to read) back to Kafka. A restarted consumer resumes from the last committed offset, so records processed after the last commit may be processed again.

Example: group_id="order-processor", 3 consumers, 12 partitions → each consumer reads 4 partitions. If one consumer crashes, Kafka rebalances its partitions to the remaining 2 consumers.
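The example above can be simulated with a simple round-robin assignment. This is a sketch under stated assumptions: Kafka ships several assignment strategies whose exact output differs, and the `assign` function and consumer names here are hypothetical.

```python
def assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: every partition goes to exactly one consumer."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three healthy consumers share 12 partitions.
before = assign(12, ["c1", "c2", "c3"])

# c3 crashes; the group rebalances across the two survivors.
after = assign(12, ["c1", "c2"])
```

Before the crash each consumer owns 4 partitions; after the rebalance the two survivors own 6 each, and every partition still has exactly one owner.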

Retention & Replay

Kafka stores events for a configurable retention period (hours to forever). A consumer can seek to any retained offset, including the earliest one (offset 0 if nothing has expired yet). This enables debugging by replaying production traffic, rebuilding materialized views from scratch, and bootstrapping new consumers.

Example: Fraud detection team deploys a new ML model. They replay 30 days of transaction events through the new model to test it against historical data, then go live.
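A minimal sketch of replay against a retained log, assuming a toy `PartitionLog` class and two made-up threshold "models" (all hypothetical names and values). Clamping an expired offset to the earliest retained one is a simplification chosen for this sketch; a real consumer's behaviour for out-of-range offsets depends on its configuration.

```python
class PartitionLog:
    """Toy log with retention: old records can be dropped, offsets never change."""

    def __init__(self):
        self.start_offset = 0  # offset of the oldest retained record
        self.records = []

    def append(self, value):
        self.records.append(value)
        return self.start_offset + len(self.records) - 1

    def expire(self, count):
        """Simulate retention removing the oldest `count` records."""
        count = min(count, len(self.records))
        del self.records[:count]
        self.start_offset += count

    def read_from(self, offset):
        """Yield (offset, value) from `offset`, clamped to the earliest retained one."""
        offset = max(offset, self.start_offset)
        for i in range(offset - self.start_offset, len(self.records)):
            yield self.start_offset + i, self.records[i]


def old_model(txn):
    return txn["amount"] > 8000


def new_model(txn):
    return txn["amount"] > 1000


log = PartitionLog()
for amount in [20, 35, 5000, 12, 9000]:
    log.append({"amount": amount})

# Replay the same history through both models and compare what they flag.
flagged_old = [off for off, txn in log.read_from(0) if old_model(txn)]
flagged_new = [off for off, txn in log.read_from(0) if new_model(txn)]

# After retention drops the three oldest records, replay starts at offset 3.
log.expire(3)
remaining = [off for off, _ in log.read_from(0)]
```

Reading never removes records, so the same history can be replayed through any number of models. Only retention shrinks what is available.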

Ordering Guarantees

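Kafka guarantees strict ordering within a partition, but not across partitions. Use a consistent partition key (e.g., user_id, order_id) to ensure all events for the same entity go to the same partition, which maintains entity-level ordering. Because the key is hashed onto the current set of partitions, this mapping holds only while the partition count stays the same.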

Example: All events for order_123 use partition key "order_123". They always land on the same partition and are processed in order: Created → Paid → Shipped → Delivered.
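Key-based routing can be sketched as a stable hash of the key modulo the partition count. The sketch uses CRC32 purely as a stand-in (Kafka's default partitioner uses a different hash), and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 12
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Events for two orders arrive interleaved.
events = [
    ("order_123", "Created"),
    ("order_456", "Created"),
    ("order_123", "Paid"),
    ("order_456", "Paid"),
    ("order_123", "Shipped"),
    ("order_123", "Delivered"),
]
for key, status in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, status))

# All order_123 events sit in one partition, in the order they were produced.
target = partition_for("order_123", NUM_PARTITIONS)
history = [status for key, status in partitions[target] if key == "order_123"]
```

A consumer reading that single partition sees the lifecycle of order_123 in order, regardless of what happens on the other 11 partitions.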

Exactly-Once Semantics (EOS)

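Kafka Transactions (KIP-98) enable exactly-once processing: read-process-write cycles that commit atomically. The producer can write to multiple partitions atomically; if the producer crashes mid-transaction, the transaction is aborted and its partial writes are never exposed to consumers that read only committed data. The guarantee covers reads and writes within Kafka; side effects in external systems need their own idempotency. Critical for financial and inventory systems.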

Example: Kafka Streams application: read payment event, deduct balance, write updated balance event. With EOS, this happens exactly once even if the app crashes mid-operation.
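The idea behind the example above is that the output record and the consumed input offset are committed together or not at all. The following is a toy in-memory model of that idea, not Kafka's transactional API; `ToyTransactionalLog`, `run`, and the payment amounts are hypothetical.

```python
class ToyTransactionalLog:
    """In-memory stand-in for an output topic plus the committed input offset."""

    def __init__(self):
        self.output = []           # committed output records
        self.committed_offset = 0  # next input offset to process

    def commit(self, new_records, next_offset):
        """Apply output records and the new input offset as one atomic step."""
        self.output.extend(new_records)
        self.committed_offset = next_offset


def run(payments, state, opening_balance, crash_at=None):
    """Read-process-write loop; `crash_at` simulates a crash before commit."""
    # Restart point: the last committed balance, or the opening balance if none.
    balance = state.output[-1]["balance"] if state.output else opening_balance
    for offset in range(state.committed_offset, len(payments)):
        new_balance = balance - payments[offset]
        pending = [{"offset": offset, "balance": new_balance}]
        if crash_at == offset:
            # Nothing pending was committed, so the work is simply lost.
            raise RuntimeError("simulated crash before commit")
        state.commit(pending, offset + 1)
        balance = new_balance
    return balance


payments = [30, 20, 50]
state = ToyTransactionalLog()
try:
    run(payments, state, opening_balance=100, crash_at=1)
except RuntimeError:
    pass  # the application restarts below

final_balance = run(payments, state, opening_balance=100)
```

Because the crash happens before the commit, the second payment is neither recorded nor marked as consumed. The restart processes it once, so each payment is deducted exactly one time.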

Stream Processing (Kafka Streams, Flink)

Stream processing frameworks operate on Kafka topics in real time: filtering, aggregation, joining, windowing. Kafka Streams is a library that runs inside the application (no separate cluster). Apache Flink runs as a separate cluster and offers more powerful stateful processing.

Example: Kafka Streams: count user purchases per hour window. Emit an alert if any user exceeds 10 purchases/hour (potential fraud indicator).
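The windowed count above can be sketched in plain Python as a tumbling one-hour window keyed by user. This is a batch illustration of the windowing logic, not Kafka Streams code; `hourly_alerts` and the sample data are hypothetical, and timestamps are assumed to be epoch seconds.

```python
from collections import Counter

WINDOW_SECONDS = 3600  # one-hour tumbling windows
THRESHOLD = 10         # alert when a user exceeds this many purchases per window


def hourly_alerts(purchases):
    """Take (user_id, epoch_seconds) pairs; return sorted (user_id, window_start) alerts."""
    counts = Counter()
    for user_id, ts in purchases:
        window_start = ts - (ts % WINDOW_SECONDS)  # align to the start of the hour
        counts[(user_id, window_start)] += 1
    return sorted(key for key, n in counts.items() if n > THRESHOLD)


purchases = [("alice", 60 * i) for i in range(11)]          # 11 purchases in hour 0
purchases += [("bob", 3600 + 60 * i) for i in range(10)]    # exactly 10 in hour 1
purchases += [("carol", 3300 + 60 * i) for i in range(11)]  # 11 split across two hours

alerts = hourly_alerts(purchases)
```

Only alice triggers an alert. Bob stays at the threshold without exceeding it, and carol's 11 purchases fall into two different windows, which shows how tumbling windows can miss bursts that straddle a boundary.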
