How Uber uses Apache Kafka in production
Uber’s Kafka deployment is one of the most extensively documented in the industry, with the company publishing more than a dozen engineering blog posts and conference talks describing how its Streaming Platform team has evolved the infrastructure over a decade. By 2021, Uber was processing trillions of messages per day across tens of thousands of topics, with throughput that had grown from roughly one million to twelve million messages per second over five years. Apache Kafka sits at the centre of nearly every real-time system Uber operates, from matching riders with drivers to billing advertisers to detecting fraud.
Company overview
Uber operates a global ride-hailing, food delivery, and freight platform across more than 70 countries. At peak, thousands of trips are being coordinated simultaneously, each generating a continuous stream of GPS updates, payment events, and state changes from driver and rider applications. That volume, combined with strict latency requirements for matching and pricing, pushed Uber toward an event-driven architecture early in its growth.
Kafka was adopted in early 2015, beginning with a small cluster in a single region. Within a year, the platform was auditing approximately one trillion messages per day. By 2021, that figure had grown to trillions, and the team had built a suite of internal tools - uReplicator, Chaperone, uForwarder, and uGroup - on top of it. The timeline below traces the major milestones.
Timeline
| Date | Milestone |
|---|---|
| Early 2015 | Kafka adopted; small cluster in one region |
| November 2015 | uReplicator deployed to production |
| January 2016 | Chaperone auditing approximately 1 trillion messages per day |
| August 2016 | uReplicator open-sourced |
| December 2016 | Chaperone blog post published |
| June 2017 | Hadoop Summit: real-time infrastructure scaled to trillions of events per day |
| February 2018 | Dead letter queue / reliable reprocessing blog post published |
| March 2019 | DBEvents CDC framework blog post published |
| April 2019 | Kafka Summit SF: Kafka Cluster Federation and multi-region disaster recovery |
| January 2020 | Kappa architecture blog post published |
| December 2020 | Multi-region Kafka disaster recovery blog post published |
| August 2021 | Consumer Proxy blog published; 200,000 partitions, 12 million messages per second |
| September 2021 | Exactly-once ad event processing blog published |
| October 2021 | uGroup consumer management framework blog published |
| 2022-2023 | Kafka tiered storage deployed to production workloads |
| April 2024 | Kafka Summit London: Exactly-Once Stream Processing at Scale at Uber |
| July 2024 | Kafka tiered storage blog published |
| February 2026 | uForwarder open-sourced; 1,000+ consumer services onboarded |
Uber’s Kafka use cases
Kafka underpins real-time operations across Uber’s core products and its internal data platform. The use cases below span multiple engineering teams and reflect how the system has expanded from transport to advertising and insurance.
Rider-driver matching and surge pricing
GPS location events from rider and driver applications flow into Kafka, where stream processing jobs analyse supply and demand in near real-time. The same event streams power dynamic pricing, which updates fares every few seconds. UberEats ETAs are also calculated from Kafka-sourced event streams. These pipelines operate with strict latency targets: the platform targets API latency below 5ms and 99.99% availability.
Fraud detection
Streaming analysis of transaction and session events runs continuously to identify fraud patterns. Bot detection and rider session analytics feed into similar pipelines.
Change data capture
The DBEvents framework reads MySQL binary logs via an internal tool called StorageTapper and streams change events into Kafka. Cassandra CDC events follow the same path. Downstream, these events land in Uber’s Hadoop data lake via Apache Hudi, which supports upsert-capable writes for incremental updates.
Ad event processing
Impressions and clicks from UberEats advertising flow through a Kafka pipeline that handles ad pacing, budget management, and customer billing. This pipeline operates with exactly-once guarantees, described in more detail under Special techniques below.
Async microservice queuing
More than 1,000 internal consumer services use Kafka as an asynchronous queue via the uForwarder Consumer Proxy rather than consuming directly from brokers. The proxy handles offset management and delivery, shielding application teams from Kafka consumer group mechanics.
Dead letter queues and retry pipelines
The Driver Injury Protection insurance product deducts per-mile premiums on each trip. When payment processing fails, events route through tiered retry topics with increasing delays before landing in a dead letter queue. Independent consumer groups maintain retry pipelines for each failure tier.
Data archival and real-time analytics
Kafka events sink to Apache Pinot for real-time OLAP queries and to Apache Hive for warehouse analysis. This combination gives Uber both low-latency query access to recent data and long-term historical analysis from the same event stream.
Kappa architecture backfill
Streaming and batch jobs share a single codebase. Rather than replaying historical data into Kafka for backfill runs, Uber models Apache Hive as a streaming source in Spark Structured Streaming, allowing the same job logic to run against historical data without reloading it into the cluster.
Scale and throughput
| Metric | Value | Source |
|---|---|---|
| Messages per day | Trillions (multiple petabytes) | Uber Engineering Blog, 2020-2021 |
| Messages per second (2021) | 12 million | Consumer Proxy blog, August 2021 |
| Messages per second (baseline) | 1 million | Consumer Proxy blog (5 years prior) |
| Topics | Tens of thousands | Hadoop Summit 2017 |
| Topics audited by Chaperone | 20,000+ | Hadoop Summit 2017 |
| Partitions | 200,000 | Consumer Proxy blog, August 2021 |
| Consumer services on uForwarder | 1,000+ | uForwarder blog, February 2026 |
| Clusters | Dozens, across multiple regions | Kafka Cluster Federation talk, 2019 |
| API latency target | Less than 5ms | Hadoop Summit 2017 |
| Availability target | 99.99% | Hadoop Summit 2017 |
Throughput grew twelve-fold over five years. Much of that growth was absorbed by the Consumer Proxy rather than by adding partitions, since the proxy decouples partition count from consumer service concurrency.
Uber’s Kafka architecture
Cluster topology
Uber operates two tiers of Kafka clusters across multiple regions. Regional Clusters receive all producer traffic; producers always publish to their local region. Aggregate Clusters hold cross-region replicas and provide a unified global view of each topic. Replication between tiers is handled by uReplicator, Uber’s in-house replacement for Kafka MirrorMaker.
In addition to the regional/aggregate split, clusters are segmented by use case: separate clusters exist for logging, database changelogs, and high-reliability messaging. This isolation contains blast radius when one cluster has problems and allows retention and replication policies to be tuned per use case.
A Kafka Cluster Federation layer presents multiple physical clusters as a single logical cluster. A Kafka Proxy handles metadata routing, and a central Metadata Service maintains the registry of which topics live on which physical clusters.
Producer architecture
Producers publish through a Local Agent, a persistence layer that buffers messages on the producer side before writing to brokers. This improves durability by ensuring messages survive transient broker unavailability without being dropped at the client. Multi-language producer support covers Java, Go, Python, Node.js, and C++. An internal Kafka REST Proxy provides an HTTP interface for non-JVM producers; through internal optimisation work, its throughput was raised from 7,000 to 45,000 QPS per box.
Consumer architecture
Rather than having each microservice manage its own Kafka consumer group, Uber routes most consumer traffic through uForwarder, a push-based Consumer Proxy. uForwarder fetches messages from Kafka partitions and delivers them to consumer service endpoints via gRPC. This design separates partition count from processing concurrency: a consumer service can scale its processing threads independently of how many Kafka partitions the topic has. Offset commits are managed centrally by the proxy rather than by individual service instances.
uForwarder also handles out-of-order offset commits. Rather than blocking a partition on a slow message, it acknowledges individual messages and commits only contiguous completed ranges. This allows parallel processing within a partition without the risk of losing acknowledgement state.
Stream processing
Apache Flink is the primary stream processing engine for stateful workloads, including fraud detection and ad event aggregation. Apache Samza runs alongside Flink for some pipelines. Apache Spark Structured Streaming is used for the Kappa architecture backfill jobs described above.
Kafka Connect ecosystem
CDC ingestion uses StorageTapper, Uber’s internal MySQL binlog reader, to produce change events into Kafka. Downstream, sinks connect to Apache Hive, HDFS, and Apache Pinot. The Chaperone audit system also consumes every Kafka message as part of the observability pipeline.
Special techniques and engineering innovations
uReplicator: custom cross-cluster replication
Uber replaced Kafka MirrorMaker with uReplicator after MirrorMaker caused weekly production outages. The root cause was consumer group rebalancing: whenever a topic was added or changed, MirrorMaker would pause replication for five to ten minutes while rebalancing completed. uReplicator addresses this with Apache Helix for static partition assignment and a DynamicKafkaConsumer that eliminates rebalance-triggered pauses. New topics can be added to replication at runtime without restarting the cluster. uReplicator also applies header-based filters during replication to prevent cyclic data duplication across federated clusters.
Exactly-once ad event processing
Uber’s advertising billing pipeline uses a combination of Flink transactional producers, Kafka read_committed consumer isolation, two-minute checkpoint intervals, and per-record UUIDs to achieve end-to-end exactly-once delivery. Deduplication at the sink layer uses Apache Pinot’s native upsert capability and Hive keyed on the same UUIDs. This pipeline was built specifically to avoid double-counting billable advertising events.
Kafka tiered storage
Uber implemented Kafka tiered storage using the KIP-405 pluggable storage interface. Recent log segments remain on broker disk for low-latency access. Older segments are offloaded to remote storage (HDFS, S3, GCS, or Azure) transparently, without the consumer needing to know which tier a message is fetched from. This decouples storage retention from broker capacity: longer retention periods no longer require adding brokers or running separate data pipelines to external storage.
Kappa architecture backfill via Hive as a streaming source
When a streaming job needs to backfill historical data, replaying that data into Kafka adds significant cluster load and can disrupt real-time consumers. Uber avoids this by modelling Apache Hive as a streaming source in Spark Structured Streaming. The same job code handles both real-time Kafka consumption and historical Hive reads, and windowing semantics are preserved across the two modes.
Dead letter queue topology
Uber’s dead letter queue implementation uses a multi-tier retry topology: a main consumption topic feeds into retry topics with increasing delay intervals, and messages that exhaust retries land in a dead letter topic. Each tier uses Avro schemas and a leaky bucket pattern for flow control. Multiple independent consumer groups can maintain their own retry pipelines against the same underlying topics.
uGroup: consumer group visibility via __consumer_offsets decoding
Standard Kafka consumer group monitoring relies on active consumers reporting metrics. This misses consumer groups that are stopped or failing silently. uGroup is a streaming job that decodes Kafka’s internal __consumer_offsets topic directly, making all consumer group activity visible regardless of consumer state. It also tracks offset state across regions to support disaster recovery failover.
Operating Kafka at scale
End-to-end auditing with Chaperone
Chaperone is Uber’s audit system for Kafka pipelines. It consumes every message across all topics and uses ten-minute tumbling windows to compute count, p99 latency, and duplication metrics at four points in the pipeline: the proxy client, the proxy server, regional brokers, and aggregate brokers. It audits more than 20,000 topics and uses write-ahead logging and UUIDs to ensure that audit records themselves are written with exactly-once semantics. When a discrepancy appears, operators can pinpoint which pipeline tier introduced data loss or duplication.
Consumer group observability with uGroup
uGroup emits lag metrics and stuck-partition alerts for all consumer groups, including those that are not currently running. During a multi-region failover, uGroup provides the offset state mapping needed for consumers to resume from the correct position in the target region.
Schema governance with Schema-Service and Heatpipe
Uber maintains an in-house schema registry called Schema-Service that enforces backward-compatible Avro schema evolution. The Heatpipe library, used by producers, validates messages against the registered schema at ingestion time. This prevents malformed or schema-incompatible data from entering Kafka pipelines.
Topic ownership enforcement
Uber requires ownership metadata at topic creation. Automated tooling infers ownership where possible. This is part of a broader data culture initiative to ensure every dataset - including Kafka topics - has an identifiable owner who can be contacted during an incident.
Centralised upgrades via Consumer Proxy
Before uForwarder, each of Uber’s 1,000+ consumer services maintained its own Kafka client library, often across multiple languages. Upgrading Kafka client versions required coordinating changes across hundreds of services. With uForwarder, the Kafka consumer implementation is centralised in the proxy. Client upgrades happen in the proxy without requiring changes to the services it serves.
Deployment
Uber runs Kafka as a self-managed deployment across its own infrastructure in multiple geographic regions.
Challenges and how they solved them
| Challenge | Solution | Outcome |
|---|---|---|
| Kafka MirrorMaker caused weekly outages due to rebalancing delays of five to ten minutes | Built uReplicator with Apache Helix for static partition assignment and a DynamicKafkaConsumer | Rebalancing delays eliminated; dynamic topic whitelisting without cluster restart |
| Scaling partitions: 1 msg/sec throughput required one partition per consumer thread, which would require millions of partitions across 1,000+ services | Consumer Proxy multiplexes a single partition across many service instances via gRPC | Throughput scaled from 1 million to 12 million messages per second without proportional partition growth |
| Head-of-line blocking from slow or poisoned messages halted entire partitions | Consumer Proxy detects stuck consumers and routes problem messages to DLQ; uForwarder adds active head-of-line blocking resolution | Individual slow messages no longer stall partition processing |
| Backfilling streaming jobs required replaying data into Kafka, creating significant cluster load | Kappa architecture models Hive as a streaming source in Spark Structured Streaming | Historical backfill runs without additional Kafka load |
| Ad billing double-counting risk from at-least-once delivery | Flink transactions, Kafka read_committed, per-record UUIDs, and Pinot upserts | End-to-end exactly-once delivery for billing events |
| Storage and compute were coupled: longer retention required adding broker capacity | Kafka tiered storage (KIP-405) offloads old segments to S3, GCS, or HDFS | Retention extended without adding brokers |
| Kafka REST Proxy throughput was insufficient at 7,000 QPS per box | Internal performance optimisations | Throughput raised to 45,000 QPS per box |
| No visibility into consumer groups that weren’t actively running | uGroup decodes __consumer_offsets directly | Full consumer group visibility including offline consumers |
| Offset mapping during multi-region failover with out-of-order replication | Active/Passive mode with periodic cross-region offset synchronisation and offset mapping tables | Consumers can resume from the correct position after a regional failover |
Full tech stack
| Category | Tools | Notes |
|---|---|---|
| Message broker | Apache Kafka | Self-managed, multi-region |
| Schema registry | Schema-Service (internal) + Heatpipe | Avro backward compatibility enforced at ingestion |
| Stream processing | Apache Flink, Apache Samza, Apache Spark Structured Streaming | Flink primary for stateful workloads |
| Cross-cluster replication | uReplicator (open-source, Uber-built) | Replaces MirrorMaker; Apache Helix for partition assignment |
| Consumer proxy | uForwarder (open-source, Uber-built) | Push-based; gRPC delivery; 1,000+ services |
| Connectors / CDC | StorageTapper (MySQL binlog reader, internal) | CDC to Kafka; downstream to Hive and Pinot |
| Real-time OLAP | Apache Pinot | Native upsert for exactly-once deduplication |
| Data lake / warehouse | Apache Hadoop, HDFS, Apache Hive | CDC and event archival; also used as streaming source |
| Upsert storage | Apache Hudi | Incremental CDC updates on HDFS |
| Tiered storage backends | HDFS, Amazon S3, Google Cloud Storage, Azure | KIP-405 pluggable interface |
| Monitoring / auditing | Chaperone (internal), uGroup (internal) | End-to-end audit; consumer group lag and DR offset tracking |
| Cluster coordination | Apache Helix, Apache ZooKeeper | uReplicator partition assignment; cluster state |
| Serialisation | Apache Avro | Enforced by Heatpipe + Schema-Service |
| Transport (Consumer Proxy) | gRPC / Protobuf | uForwarder to consumer service endpoints |
| HTTP producer interface | Kafka REST Proxy (internal) | Optimised to 45,000 QPS per box |
| Languages | Java, Go, Python, Node.js, C++ | Multi-language producer support |
Key contributors
| Name | Role | Contribution |
|---|---|---|
| Chinmay Soman | Software Engineer, Streaming Platform | Led uReplicator design; authored the uReplicator blog post |
| Yuanchi Ning, Xiang Fu, Hongliang Xu | Streaming Platform engineers | Co-built uReplicator |
| Xiaobing Li | Software Engineer, Core Infrastructure | Co-authored Chaperone blog post |
| Ankur Bansal | Senior Software Engineer, Streaming Team | Co-authored Chaperone; presented at Hadoop Summit 2017 |
| Mingmin Chen | Director of Engineering, SSD Team | Hadoop Summit 2017; co-authored DR blog; uGroup |
| Yupeng Fu | Principal Software Engineer, SSD/Streaming Team | Disaster recovery blog; uGroup; ad events; Kafka Cluster Federation talk |
| Xiaoman Dong | Senior Software Engineer, Streaming Data | Kafka Cluster Federation talk; uGroup |
| Ovais Tariq | Sr. Manager, Core Storage | Led DBEvents CDC framework |
| Amey Chaugule | Senior Software Engineer, Marketplace Experimentation | Authored Kappa Architecture blog |
| Qichao Chu, George Teo, Haitao Zhang, Zhifeng Chen | Streaming Data Team | Co-authored Consumer Proxy blog; led uForwarder |
| Jacob Tsafatinos, Yuriy Bondaruk, Yupeng Fu, James Kwon | Ads Platform / Ads Billing | Co-authored exactly-once ad events blog |
| Ning Xia | Software Engineer, Payments Team | Authored dead letter queue blog |
| Abhijeet Kumar, Kamal Chandraprakash, Satish Duggana | Kafka Team | Co-authored Kafka tiered storage blog; Satish Duggana is an Apache Kafka committer and PMC member |
| Roshan Naik, Si Lao | Uber Engineering | Presented Exactly-Once Stream Processing at Scale at Kafka Summit London 2024 |
Key takeaways for your own Kafka implementation
- Replication tooling matters at scale. MirrorMaker’s consumer group rebalancing caused weekly outages for Uber. If you are replicating across clusters or regions, understand how your replication tool handles topic changes and partition rebalancing before it becomes a production problem.
- Partition count and consumer concurrency are separate concerns. Uber’s Consumer Proxy approach shows that you do not have to create more partitions to scale consumer throughput. A proxy or multiplexing layer can decouple the two, which is relevant if you have many small consumers or need to avoid partition-count overhead.
- Exactly-once requires a coordinated strategy across producers, consumers, and sinks. Uber’s ad billing pipeline combines Flink transactions, Kafka
read_committedisolation, and per-record UUIDs with Pinot upserts. No single piece provides the guarantee on its own. If you need exactly-once for a high-stakes pipeline, plan the deduplication strategy at every tier from the start. - Tiered storage changes the broker-sizing conversation. Decoupling log retention from broker disk means you can extend retention without adding brokers. If your current retention policy is constrained by storage cost, tiered storage is worth evaluating before the next round of broker capacity planning.
- Centralising consumer infrastructure simplifies client upgrades. Coordinating a Kafka client upgrade across hundreds of services in multiple languages is operationally expensive. If you are managing many consumer services, a shared consumer proxy or library layer reduces the coordination overhead significantly.
Sources and further reading
| # | Source |
|---|---|
| 1 | Chinmay Soman et al. - uReplicator: Uber Engineering’s Robust Apache Kafka Replicator - Uber Engineering Blog, 2016-08-04 |
| 2 | Xiaobing Li, Ankur Bansal - Chaperone: Audit Apache Kafka End-to-End - Uber Engineering Blog, 2016-12-08 |
| 3 | Ankur Bansal, Mingmin Chen - How Uber Scaled Its Real-Time Infrastructure to Trillion Events Per Day - Hadoop Summit, 2017-06-14 |
| 4 | Ovais Tariq, Nishith Agarwal - DBEvents: Uber’s Ingestion Framework for Database Events - Uber Engineering Blog, 2019-03-14 |
| 5 | Yupeng Fu, Xiaoman Dong - Kafka Cluster Federation at Uber - Kafka Summit SF 2019 |
| 6 | Amey Chaugule - Kappa Architecture: Uber’s Approach to Unifying Streaming and Batch - Uber Engineering Blog, 2020-01-23 |
| 7 | Yupeng Fu, Mingmin Chen - Disaster Recovery for Multi-Region Kafka at Uber - Uber Engineering Blog, 2020-12-21 |
| 8 | Qichao Chu, George Teo, Haitao Zhang, Zhifeng Chen - Kafka Async Queuing with Consumer Proxy - Uber Engineering Blog, 2021-08-31 |
| 9 | Jacob Tsafatinos et al. - Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot - Uber Engineering Blog, 2021-09-23 |
| 10 | Qichao Chu et al. - Introducing uGroup: Uber’s Consumer Management Framework - Uber Engineering Blog, 2021-10-21 |
| 11 | Abhijeet Kumar et al. - Kafka Tiered Storage at Uber - Uber Engineering Blog, 2024-07-01 |
| 12 | Zhifeng Chen, Haifeng Chen - Introducing uForwarder - Uber Engineering Blog, 2026-02-05 |
| 13 | Roshan Naik, Si Lao - Exactly-Once Stream Processing at Scale in Uber - Kafka Summit London 2024 |
| 14 | Ning Xia - Reliable Reprocessing: Dead Letter Queues for Apache Kafka - Uber Engineering Blog, 2018-02-16 |
If you want to explore your own Kafka topics and consumer groups with the kind of visibility Uber has built internally, Kpow offers a free 30-day trial and connects to any Kafka cluster in minutes.