ClickHouse Performance Benchmarking – Brown University

Brown University Benchmark on ClickHouse


At ChistaDATA, we work with some of the largest ClickHouse installations worldwide, and performance is critical for our customers. ChistaDATA is committed to delivering optimal ClickHouse operations to clients globally, and we publish ClickHouse application performance results. We publish details of both the hardware and the software infrastructure used for benchmarking ClickHouse, along with the data on GitHub, so that the results can be reproduced if needed. The published performance metrics cover both average response time (latency) and throughput. We strongly believe it is the responsibility of ClickHouse infrastructure stakeholders, DBAs, data SREs and performance engineers to understand the thresholds of their ClickHouse operations, which ultimately depend on RAM, CPU, disk performance and network latency. The key to ClickHouse performance benchmarking is consistently reproducible results: they let you rerun the tests and gain confidence in the overall benchmarking exercise.
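As a rough illustration of the two headline metrics, the sketch below times repeated sequential runs of a query and reports average latency and throughput (completed queries per wall-clock second). The `run_query` callable is a hypothetical stand-in for whatever client you use to send a query to ClickHouse; in the usage line it is replaced by a dummy workload.

```python
import statistics
import time
from typing import Callable, Dict


def measure(run_query: Callable[[], None], iterations: int = 5) -> Dict[str, float]:
    """Time `iterations` sequential calls and summarise latency and throughput.

    `run_query` is a hypothetical stand-in for a real ClickHouse client call.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    latencies = []
    wall_start = time.perf_counter()
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    wall_elapsed = time.perf_counter() - wall_start
    return {
        "avg_latency_s": statistics.mean(latencies),
        "median_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        # Sequential runs: throughput = completed queries / wall-clock seconds.
        "throughput_qps": iterations / wall_elapsed if wall_elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy workload standing in for a real query (sleeps about 10 ms).
    for name, value in measure(lambda: time.sleep(0.01), iterations=5).items():
        print(f"{name}: {value:.4f}")
```

Because the runs are sequential, throughput here is roughly the inverse of average latency; measuring throughput under concurrency would need several clients issuing queries in parallel.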

Hardware Infrastructure and Software Platforms Information

CPU

Vendor and model details

CPU count

Memory Available (RAM)

Disk Infrastructure Operations – Capacity and Throughput

Software Infrastructure
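The hardware and software facts listed above can be captured alongside each benchmark run. The sketch below uses only the Python standard library; `platform.processor()` often returns just the architecture rather than the full CPU model (on Linux, `lscpu` or `/proc/cpuinfo` gives more detail), memory is read via POSIX `sysconf` and reported as `None` where that is unavailable, and the `data_path` parameter is a hypothetical choice you would point at the disk holding the ClickHouse data. It does not measure disk throughput, which needs a dedicated I/O benchmarking tool.

```python
import os
import platform
import shutil
from typing import Dict, Optional


def total_memory_bytes() -> Optional[int]:
    """Physical RAM via POSIX sysconf; returns None where unsupported (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def collect_host_info(data_path: str = "/") -> Dict[str, object]:
    """Gather basic host facts worth publishing alongside benchmark results."""
    disk = shutil.disk_usage(data_path)  # capacity only, not throughput
    return {
        "cpu_model": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),
        "memory_bytes": total_memory_bytes(),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "os": platform.platform(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in collect_host_info().items():
        print(f"{key}: {value}")
```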

Building ClickHouse Infrastructure for Performance Benchmarking

Source: the Brown University Benchmark (MgBench), a new analytical benchmark for machine-generated log data, provided by Andrew Crotty.

Step 1: Download the benchmark schema and data for ClickHouse:
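A minimal download sketch follows. The base URL and the three file names (`mgbench1.csv.xz` to `mgbench3.csv.xz`) are an assumption based on the ClickHouse documentation's Brown University Benchmark page; verify them before use. Existing files are skipped so that reruns of the benchmark do not download the data again.

```python
import urllib.request
from pathlib import Path
from typing import List

# Assumed location of the MgBench files; verify before use.
BASE_URL = "https://datasets.clickhouse.com"


def mgbench_urls(base_url: str = BASE_URL, parts: int = 3) -> List[str]:
    """Return the URLs mgbench1.csv.xz ... mgbench{parts}.csv.xz under base_url."""
    return [f"{base_url.rstrip('/')}/mgbench{i}.csv.xz" for i in range(1, parts + 1)]


def download_all(dest_dir: str = ".") -> List[Path]:
    """Download every file into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in mgbench_urls():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```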

Step 2: Unpack the downloaded data:
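If the `xz` command-line tool is not available, Python's standard-library `lzma` module can decompress `.xz` files. This sketch streams the data in chunks so that large files are never held in memory at once; the output is written next to the input with the `.xz` suffix dropped.

```python
import lzma
import shutil
from pathlib import Path


def unpack_xz(src: str, chunk_size: int = 1 << 20) -> Path:
    """Decompress `name.csv.xz` to `name.csv` next to it, streaming in chunks."""
    src_path = Path(src)
    if src_path.suffix != ".xz":
        raise ValueError(f"expected a .xz file, got {src_path.name}")
    dst_path = src_path.with_suffix("")  # mgbench1.csv.xz -> mgbench1.csv
    with lzma.open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst_path
```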

Step 3: Create the schema objects (tables):
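Schema statements can be sent through ClickHouse's HTTP interface, which accepts a query in the body of a POST request (default port 8123). The sketch assumes a local server without authentication; the `CREATE_DB` statement is a hypothetical example, and the actual MgBench table definitions should come from the benchmark's own schema. Send each statement in its own request.

```python
import urllib.request


def clickhouse_request(sql: str, host: str = "localhost", port: int = 8123) -> urllib.request.Request:
    """Build a POST request that sends `sql` to ClickHouse's HTTP interface."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/",
        data=sql.encode("utf-8"),
        method="POST",
    )


def run(sql: str, **kwargs) -> str:
    """Send one statement and return the server's response body as text."""
    with urllib.request.urlopen(clickhouse_request(sql, **kwargs)) as resp:
        return resp.read().decode("utf-8")


# Hypothetical example statement; take the real DDL from the benchmark schema.
CREATE_DB = "CREATE DATABASE IF NOT EXISTS mgbench"
```

With a server running, `run(CREATE_DB)` would create the database; building the request separately from sending it keeps the request easy to inspect.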

Step 4: Load the data:
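A CSV file can be loaded over the same HTTP interface by putting the `INSERT ... FORMAT` statement in the `query` URL parameter and the file contents in the POST body. This sketch assumes the unpacked CSV files start with a header row (hence `CSVWithNames`; use `CSV` if they don't), and the table name in the usage line is hypothetical. It reads the whole file into memory, so for very large files `clickhouse-client` is the more practical loader.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def insert_csv_request(table: str, csv_path: str,
                       host: str = "localhost", port: int = 8123) -> urllib.request.Request:
    """Build a POST that sends a CSV file (with a header row) into `table`."""
    query = f"INSERT INTO {table} FORMAT CSVWithNames"
    url = f"http://{host}:{port}/?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url=url, data=Path(csv_path).read_bytes(), method="POST")


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # Usage: python load.py mgbench.logs1 mgbench1.csv  (table name hypothetical)
        with urllib.request.urlopen(insert_csv_request(sys.argv[1], sys.argv[2])) as resp:
            print(resp.status)
```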

Benchmarking ClickHouse Performance

SQL 1: What is the CPU/network utilization for each web server since midnight?
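The SQL for this question is not reproduced here, but its shape is a time filter followed by per-machine `min`/`avg`/`max` aggregates. The plain-Python sketch below shows that computation over in-memory rows; the row layout (machine name, timestamp, CPU percentage, network bytes) is hypothetical and does not reflect the MgBench column names.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (machine_name, log_time, cpu_percent, net_bytes)
Row = Tuple[str, datetime, float, float]


def _stats(values: List[float]) -> Dict[str, float]:
    return {"min": min(values), "avg": sum(values) / len(values), "max": max(values)}


def utilization_since_midnight(rows: Iterable[Row], now: datetime) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Per-machine min/avg/max of CPU and network for samples between midnight and `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    cpu: Dict[str, List[float]] = defaultdict(list)
    net: Dict[str, List[float]] = defaultdict(list)
    for machine, ts, cpu_pct, net_bytes in rows:
        if midnight <= ts <= now:  # keep only today's samples
            cpu[machine].append(cpu_pct)
            net[machine].append(net_bytes)
    return {m: {"cpu": _stats(cpu[m]), "net": _stats(net[m])} for m in cpu}
```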

SQL 2: Which computer lab machines have been offline in the past day?

SQL 3: What are the hourly average metrics during the past 10 days for a specific workstation?

SQL 4: Over one month, how often was each server blocked on disk I/O?

SQL 5: Which externally reachable VMs have run low on memory?

SQL 7: What is the total hourly network traffic across all file servers?

SQL 8: Which requests have caused server errors within the past two weeks?

SQL 9: During a specific two-week period, was the user password file leaked?

SQL 10: What was the average path depth for top-level requests in the past month?
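"Path depth" needs a concrete definition before it can be averaged. The sketch below assumes depth is the number of `/` characters in the request path (query string removed) and groups requests by their first path segment; both choices are assumptions for illustration, not necessarily what the benchmark query does.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def path_depth(path: str) -> int:
    """Depth measured as the number of '/' characters (an assumed definition)."""
    return path.count("/")


def avg_depth_by_top_level(paths: Iterable[str]) -> Dict[str, float]:
    """Average depth per top-level directory, e.g. '/courses/cs1/' -> '/courses'."""
    depths: Dict[str, List[int]] = defaultdict(list)
    for raw in paths:
        p = raw.split("?", 1)[0]  # ignore any query string
        top = "/" + p.lstrip("/").split("/", 1)[0]
        depths[top].append(path_depth(p))
    return {top: sum(d) / len(d) for top, d in depths.items()}
```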

SQL 11: During the last three months, which clients have made an excessive number of requests?

SQL 11: How many unique visitors are there each day?
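Daily unique visitors amounts to grouping by calendar day and counting distinct client identifiers. The sketch shows the exact computation in plain Python, assuming a client is identified by a string such as an IP address. In ClickHouse, `uniqExact` gives an exact distinct count and `uniq` an approximate but faster one; which one the benchmark query uses is not stated here.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Set, Tuple


def daily_unique_visitors(events: Iterable[Tuple[datetime, str]]) -> Dict[date, int]:
    """Count distinct client identifiers (e.g. IP addresses) per calendar day."""
    seen: Dict[date, Set[str]] = defaultdict(set)
    for ts, client in events:
        seen[ts.date()].add(client)
    return {day: len(clients) for day, clients in sorted(seen.items())}
```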

SQL 12: What are the average and maximum data transfer rates (Gbps)?
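Converting logged byte counts into Gbps is simple arithmetic: rate = bytes × 8 / seconds / 10^9, using decimal gigabits as is usual for network rates. The sketch assumes each sample is a hypothetical (bytes transferred, duration in seconds) pair and computes the average and maximum of the per-sample rates.

```python
from typing import Iterable, Tuple


def bytes_to_gbps(num_bytes: float, seconds: float) -> float:
    """Convert a byte count over an interval to gigabits per second (1 Gb = 10**9 bits)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e9


def avg_and_max_gbps(samples: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Average and maximum of per-sample rates for (bytes, seconds) pairs."""
    rates = [bytes_to_gbps(b, s) for b, s in samples]
    if not rates:
        raise ValueError("no samples")
    return sum(rates) / len(rates), max(rates)
```

Averaging per-sample rates weights every sample equally; dividing total bytes by total time instead weights long transfers more heavily, so it is worth stating which definition a published number uses.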

SQL 17: Did the indoor temperature reach freezing over the weekend?
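This question combines a calendar filter (Saturday or Sunday) with a threshold test. The sketch assumes readings are hypothetical (time, sensor, temperature) tuples in degrees Fahrenheit, where water freezes at 32 °F (0 °C); if the sensors report Celsius, the threshold becomes 0.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

FREEZING_F = 32.0  # water freezes at 32 °F (0 °C); assumes Fahrenheit readings

Reading = Tuple[datetime, str, float]  # hypothetical (time, sensor, temp_F)


def freezing_weekend_readings(readings: Iterable[Reading]) -> List[Reading]:
    """Return readings taken on Saturday or Sunday at or below freezing."""
    return [
        r for r in readings
        if r[0].weekday() >= 5 and r[2] <= FREEZING_F  # weekday(): Mon=0 ... Sun=6
    ]
```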

SQL 18: Over the past six months, how frequently was each door opened?

SQL 19: For each device category, what are the monthly power consumption metrics?


About MinervaDB Corporation
A boutique private-label, enterprise-class MySQL, MariaDB, MyRocks, PostgreSQL and ClickHouse consulting, 24/7 consultative support and remote DBA services company with core expertise in performance, scalability and high availability. Our consultants have several years of experience architecting and building web-scale database infrastructure operations for internet properties in diverse verticals such as CDN, mobile advertising networks, e-commerce, social media applications, SaaS, gaming and digital payment solutions. Our globally distributed team works across multiple time zones to guarantee 24/7 consulting, support and remote DBA services for MySQL, MariaDB, MyRocks, PostgreSQL and ClickHouse.