How can we do Kafka Cluster Setup? Guidance on multi-broker clusters and ZooKeeper coordination (noting that newer Kafka versions replace ZooKeeper with KRaft).

Kafka Cluster Setup: Setting up a Kafka cluster involves choosing between the ZooKeeper-based (legacy) and KRaft-based (ZooKeeper-less) architectures, so it is worth understanding the differences before proceeding. Below is a structured guide to both, including key configurations and best practices.


1. ZooKeeper-Based Cluster Setup

Core Components

  • ZooKeeper Ensemble: Coordinates broker metadata and leader elections (minimum 3 nodes for production).
  • Brokers: Kafka servers handling data storage and client requests.

Configuration Steps

  1. Set Up ZooKeeper:
    • Configure zoo.cfg with server entries:
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
    • Create a myid file on each node (e.g., containing 1 for zk1).
  2. Configure Brokers:
    • Set unique broker.id in server.properties:
broker.id=1
listeners=PLAINTEXT://broker1:9092
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
log.dirs=/var/lib/kafka/logs
    • Repeat with incremental broker.id for each node.
  3. Validate the Cluster:
    • Create a topic with a replication factor matching the broker count:
kafka-topics --create --bootstrap-server broker1:9092 \
  --topic test --partitions 3 --replication-factor 3
    • Verify replication status with kafka-topics --describe.
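The verification step above can be scripted. Below is a minimal sketch that flags under-replicated partitions by comparing the Replicas and Isr fields of kafka-topics --describe output; the hard-coded sample stands in for real output (its tab-separated format is an assumption based on typical describe output), which you would normally pipe in from the command shown in the comment:

```shell
# Sketch: flag under-replicated partitions (ISR smaller than the replica set).
# Real input would come from:
#   kafka-topics --describe --bootstrap-server broker1:9092 --topic test
# The sample below stands in for that output (tab-separated fields).
describe_output=$(
  printf '%s\t%s\t%s\t%s\t%s\n' \
    'Topic: test' 'Partition: 0' 'Leader: 1' 'Replicas: 1,2,3' 'Isr: 1,2,3'
  printf '%s\t%s\t%s\t%s\t%s\n' \
    'Topic: test' 'Partition: 1' 'Leader: 2' 'Replicas: 2,3,1' 'Isr: 2,3'
)

report=$(echo "$describe_output" | awk -F'\t' '{
  replicas = ""; isr = ""
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^Replicas: /) replicas = substr($i, 11)
    if ($i ~ /^Isr: /)      isr      = substr($i, 6)
  }
  # A healthy partition has every replica in the in-sync replica set.
  if (split(replicas, r, ",") != split(isr, s, ","))
    print "under-replicated -> " $0
}')

echo "$report"
```

In this sample, partition 1 is reported because one of its three replicas has fallen out of the ISR.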

2. KRaft-Based Cluster Setup

Core Components

  • Controllers: Manage cluster metadata via the Raft protocol (minimum 3 nodes for quorum).
  • Brokers: Handle data storage and client I/O.

Configuration Steps

  1. Generate a Cluster ID:
    kafka-storage.sh random-uuid
    # Example output (a random 22-character cluster ID): ABCDEFGHIJKLMNOPQRSTUV
  2. Configure Controllers:
    • Edit controller.properties:
      process.roles=controller
      node.id=1
      controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
      listeners=CONTROLLER://controller1:9093
      controller.listener.names=CONTROLLER
      log.dirs=/var/lib/kafka/controller-logs
    • Repeat for the other controllers with a unique node.id.
    • Format each controller's storage with the generated cluster ID:
      kafka-storage.sh format -t <cluster-id> -c controller.properties
  3. Configure Brokers:
    • Edit server.properties:
      process.roles=broker
      node.id=101
      controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
      controller.listener.names=CONTROLLER
      listeners=PLAINTEXT://broker1:9092
      log.dirs=/var/lib/kafka/broker-logs
    • Format each broker's storage with the same cluster ID, then start it with kafka-server-start.
  4. Validate the Cluster:
    • Check metadata quorum health:
      kafka-metadata-quorum --bootstrap-server broker1:9092 describe --status
    • Produce and consume test messages to verify functionality.
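A common KRaft misconfiguration is a controller.quorum.voters string that does not match the controllers' node.id values. As a small sketch, the string used in the configs above can be assembled from an ordered host list (hostnames and port are the examples from this guide):

```shell
# Sketch: build controller.quorum.voters from an ordered controller host list.
# node.id values are assigned 1..N in list order, matching the configs above.
hosts="controller1 controller2 controller3"
port=9093

voters=""
id=1
for h in $hosts; do
  # Append "id@host:port", comma-separating after the first entry.
  voters="${voters:+$voters,}${id}@${h}:${port}"
  id=$((id + 1))
done

echo "controller.quorum.voters=$voters"
# -> controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
```

The same string must appear verbatim in every controller's and broker's properties file, which is why generating it once is less error-prone than typing it per node.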

3. Key Considerations

| Factor               | ZooKeeper-Based                      | KRaft-Based                                |
| -------------------- | ------------------------------------ | ------------------------------------------ |
| Architecture         | Requires external ZooKeeper ensemble | Self-contained; no external dependencies   |
| Scalability          | Limited by ZooKeeper performance     | Improved metadata handling and scalability |
| Production Readiness | Deprecated (removed in Kafka 4.0)    | Recommended for new deployments (Kafka 3.3+) |
| Setup Complexity     | Higher (dual-system management)      | Simplified (single-system management)      |

4. Best Practices

  • KRaft for New Deployments: Use KRaft unless legacy dependencies require ZooKeeper.
  • Controller Nodes: Deploy 3+ dedicated controllers in production (avoid combined roles).
  • Network Configuration:
    • Use separate listeners for internal (controller) and external (client) traffic.
    • Enable TLS/SSL for inter-node communication.
  • Monitoring: Track metrics such as ActiveControllerCount and metadata replication lag for KRaft clusters.
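As a sketch of the listener separation and TLS practices above, here is an illustrative broker server.properties fragment; the listener names, hostnames, and keystore paths are assumptions chosen for the example, not fixed values:

```properties
# Illustrative listener layout: client traffic and inter-broker replication
# on separate named listeners, controller traffic on its own listener.
listeners=CLIENT://broker1:9092,REPLICATION://broker1:9094
advertised.listeners=CLIENT://broker1:9092,REPLICATION://broker1:9094
listener.security.protocol.map=CLIENT:SSL,REPLICATION:SSL,CONTROLLER:SSL
inter.broker.listener.name=REPLICATION
controller.listener.names=CONTROLLER

# TLS material for inter-node communication (paths and passwords are examples).
ssl.keystore.location=/etc/kafka/ssl/broker1.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=/etc/kafka/ssl/truststore.jks
ssl.truststore.password=changeit
```

Separating listeners this way lets you firewall controller and replication traffic onto an internal network while exposing only the client listener.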

For migrations from ZooKeeper to KRaft, carefully follow Kafka’s phased approach (backup data, test in staging, and monitor dual-write states). Importantly, note that direct upgrades between modes are unsupported.
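For reference, the documented migration path (KIP-866, available from Kafka 3.6) drives a dual-write phase from newly provisioned KRaft controllers that still point at the existing ZooKeeper ensemble. A sketch of a controller config for that phase follows; the node.id and hostnames are illustrative, and the full procedure in the Kafka documentation should be followed:

```properties
# KRaft controller config during ZooKeeper-to-KRaft migration (dual-write phase).
process.roles=controller
node.id=3000
controller.quorum.voters=3000@controller1:9093
controller.listener.names=CONTROLLER
listeners=CONTROLLER://controller1:9093

# Enable the migration and keep pointing at the existing ZooKeeper
# ensemble until the migration completes.
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
```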
