ClickHouse Cluster Setup and Configuration


As a full-stack ClickHouse optimization, support, and managed services provider, we often get questions about ClickHouse installation and configuration for both standalone and clustered infrastructure. We wrote this blog to help anyone interested in setting up and configuring ClickHouse. This document is intended purely for learning ClickHouse installation and configuration; please do not use it as a checklist or as guidance for installing and configuring ClickHouse on your production infrastructure. MinervaDB and its group companies/subsidiaries are not responsible for any damage caused to your business by following this document for a production setup. Technically, ClickHouse installation is quite straightforward: you can run ClickHouse on Linux, FreeBSD, or macOS with x86_64, AArch64, or PowerPC64LE CPU architecture.

ClickHouse installation on Debian systems:

sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

echo "deb https://repo.clickhouse.com/deb/stable/ main/" | sudo tee \
    /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start
clickhouse-client
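
Once the server is up, a quick sanity check (assuming a default installation with no password set for the default user):

clickhouse-client --query "SELECT version()"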

From RPM packages

Add the official repository to install from pre-compiled RPM packages on CentOS, Red Hat, and other RPM-based Linux distributions:

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/stable/x86_64

Install ClickHouse from the repository configured above:

sudo yum install clickhouse-server clickhouse-client
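
The RPM packages ship a systemd service unit as well, so you can then start the server and connect in the usual way:

sudo systemctl start clickhouse-server
clickhouse-client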

Installation and configuration of ClickHouse from source:

You can build and install ClickHouse from source using the repository here: https://github.com/ChistaDATA/ClickHouse
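
The usual ClickHouse build flow should apply to this fork as well; a minimal sketch, assuming a recent clang, cmake, and ninja are installed:

git clone --recursive https://github.com/ChistaDATA/ClickHouse
cd ClickHouse
mkdir build && cd build
cmake ..
ninja clickhouse-server clickhouse-client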

Single server with Docker:

Run the server:

docker run -d --name clickhouse-server -p 9000:9000 --ulimit nofile=262144:262144 yandex/clickhouse-server

Run the client:

docker run -it --rm --link clickhouse-server:clickhouse-server yandex/clickhouse-client  --host clickhouse-server
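
The server image also exposes the HTTP interface on port 8123; if you publish that port as well (for example, by adding -p 8123:8123 to the docker run command above), you can query the server with curl:

curl 'http://localhost:8123/?query=SELECT%20version()'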

Step-by-step ClickHouse Cluster setup

  • We will have 1 cluster with 3 shards in this setup
  • Each shard will have 2 replica servers
  • We are using ReplicatedMergeTree and Distributed tables for this setup; the shard/replica layout is sketched below
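
Concretely, with the metrika.xml shown later in this post, the containers map to shards as follows:

    shard 01: clickhouse-01 (replica 1), clickhouse-06 (replica 2)
    shard 02: clickhouse-02 (replica 1), clickhouse-03 (replica 2)
    shard 03: clickhouse-04 (replica 1), clickhouse-05 (replica 2)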

Cluster setup

The docker-compose.yml we use is copied below for your reference:

version: '3'

services:
    clickhouse-zookeeper:
        image: zookeeper
        ports:
            - "2181:2181"
            - "2182:2182"
        container_name: clickhouse-zookeeper
        hostname: clickhouse-zookeeper

    clickhouse-01:
        image: yandex/clickhouse-server
        hostname: clickhouse-01
        container_name: clickhouse-01
        ports:
            - 9001:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-01.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-01:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"

    clickhouse-02:
        image: yandex/clickhouse-server
        hostname: clickhouse-02
        container_name: clickhouse-02
        ports:
            - 9002:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-02.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-02:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"

    clickhouse-03:
        image: yandex/clickhouse-server
        hostname: clickhouse-03
        container_name: clickhouse-03
        ports:
            - 9003:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-03.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-03:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"

    clickhouse-04:
        image: yandex/clickhouse-server
        hostname: clickhouse-04
        container_name: clickhouse-04
        ports:
            - 9004:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-04.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-04:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"

    clickhouse-05:
        image: yandex/clickhouse-server
        hostname: clickhouse-05
        container_name: clickhouse-05
        ports:
            - 9005:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-05.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-05:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"

    clickhouse-06:
        image: yandex/clickhouse-server
        hostname: clickhouse-06
        container_name: clickhouse-06
        ports:
            - 9006:9000
        volumes:
            - ./config/clickhouse_config.xml:/etc/clickhouse-server/config.xml
            - ./config/clickhouse_metrika.xml:/etc/clickhouse-server/metrika.xml
            - ./config/macros/macros-06.xml:/etc/clickhouse-server/config.d/macros.xml
            # - ./data/server-06:/var/lib/clickhouse
        ulimits:
            nofile:
                soft: 262144
                hard: 262144
        depends_on:
            - "clickhouse-zookeeper"
networks:
    default:
        external:
            name: clickhouse-net

So we have six ClickHouse server containers and one ZooKeeper container. For replication to work reliably, ZooKeeper has to be configured properly; it is then ClickHouse itself that takes care of database reliability and consistency across the replicated infrastructure. Incidentally, we do have customers who replicate data by writing directly into all the available replicas. We strongly discourage this model, as it makes your application accountable for data consistency across replicas; it does not scale and is operationally very complex. The config files are copied below.

./config/clickhouse_config.xml is the default config file shipped in the Docker image; the relevant part, which pulls in metrika.xml, is:

    <!-- If an element has an 'incl' attribute, then its value will be taken from the corresponding substitution in another file.
         By default, the path to the file with substitutions is /etc/metrika.xml. It can be changed with the 'include_from' element in the config.
         Values for substitutions are specified in /yandex/name_of_substitution elements in that file.
      -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>

./config/clickhouse_metrika.xml is copied below:

<yandex>
    <clickhouse_remote_servers>
        <cluster_1>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-01</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-06</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-02</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-03</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <weight>1</weight>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse-04</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>clickhouse-05</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_1>
    </clickhouse_remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>clickhouse-zookeeper</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <networks>
        <ip>::/0</ip>
    </networks>
    <clickhouse_compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>lz4</method>
        </case>
    </clickhouse_compression>
</yandex>

Each server has its own macros.xml; the one for clickhouse-01 is copied below:

<yandex>
    <macros>
        <replica>clickhouse-01</replica>
        <shard>01</shard>
        <layer>01</layer>
    </macros>
</yandex>

Note: Please confirm that your macros settings are in sync with the remote server settings in metrika.xml.
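
For example, following the shard layout in metrika.xml above, clickhouse-06 is the second replica of shard 01, so its macros-06.xml would look like this:

<yandex>
    <macros>
        <replica>clickhouse-06</replica>
        <shard>01</shard>
        <layer>01</layer>
    </macros>
</yandex>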

Start the cluster:

docker network create clickhouse-net
docker-compose up -d

Connect to a ClickHouse server to confirm that the cluster settings are operational:
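
One way to open a client session on the first server is to exec into its container (container names as defined in docker-compose.yml above):

docker exec -it clickhouse-01 clickhouse-client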

clickhouse-01 :) select * from system.clusters;

SELECT *
FROM system.clusters 

┌─cluster─────────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─────┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┐
│ cluster_1                   │         1 │            1 │           1 │ clickhouse-01 │ 192.168.1.105 │ 9000 │        1 │ default │                  │
│ cluster_1                   │         1 │            1 │           2 │ clickhouse-06 │ 192.168.1.106 │ 9000 │        1 │ default │                  │
│ cluster_1                   │         2 │            1 │           1 │ clickhouse-02 │ 192.168.1.107 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         2 │            1 │           2 │ clickhouse-03 │ 192.168.1.108 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         3 │            1 │           1 │ clickhouse-04 │ 192.168.1.109 │ 9000 │        0 │ default │                  │
│ cluster_1                   │         3 │            1 │           2 │ clickhouse-05 │ 192.168.1.110 │ 9000 │        0 │ default │                  │
│ test_shard_localhost        │         1 │            1 │           1 │ localhost     │ 127.0.0.1     │ 9000 │        1 │ default │                  │
│ test_shard_localhost_secure │         1 │            1 │           1 │ localhost     │ 127.0.0.1     │ 9440 │        0 │ default │                  │
└─────────────────────────────┴───────────┴──────────────┴─────────────┴───────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┘

If you see output like this, the cluster configuration is successful.
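
You can also confirm that ClickHouse can reach ZooKeeper by querying the system.zookeeper table (note that this table requires a path condition in the WHERE clause):

SELECT name FROM system.zookeeper WHERE path = '/';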

Replicated Table

We have now successfully configured the ClickHouse cluster and replica settings. Next, create a ReplicatedMergeTree table as a local table on each server:

CREATE TABLE test_house (id Int32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test_house', '{replica}')
PARTITION BY id
ORDER BY id;
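
Rather than running this statement on every server by hand, you can also let ClickHouse distribute the DDL with ON CLUSTER; a sketch, assuming the cluster name cluster_1 from metrika.xml:

CREATE TABLE default.test_house ON CLUSTER cluster_1 (id Int32)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/test_house', '{replica}')
PARTITION BY id
ORDER BY id;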

Create a Distributed table pointing to the local table:

CREATE TABLE test_house_all AS test_house ENGINE = Distributed(cluster_1, default, test_house, rand());

Test the setup with an INSERT script

Generate and load data:

# docker exec into clickhouse-01 (see the connection example above) and run:
for ((idx=1;idx<=100;++idx)); do clickhouse-client --host clickhouse-01 --query "INSERT INTO default.test_house_all VALUES ($idx)"; done

Count records on the Distributed table (this should return all 100 rows):

SELECT count(*) FROM test_house_all;

Count records on the local table (this returns only the rows stored on that shard, roughly a third of the total with rand() sharding):

SELECT count(*) FROM test_house;
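
To see how rand() spread the rows across the shards, you can group by the server that holds each row; hostName() is a built-in ClickHouse function, and when queried through the Distributed table it reports the remote host that served each block:

SELECT hostName() AS host, count() AS rows FROM test_house_all GROUP BY host;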
