Brown University Benchmark on ClickHouse
At ChistaDATA, We work for some of the largest ClickHouse installation worldwide and performance is very critical for our customers and ChistaDATA is committed to delivering optimal ClickHouse operations to the clients globally. ChistaDATA provides/publishes ClickHouse Application performance. We publish data of both hardware and software infrastructure used for benchmarking ClickHouse with GitHub / data if ever needed to reproduce the same. The performance metrics published covers both Average Response Time / Latency and Throughput. We strongly believe that it’s the responsibility of ClickHouse infrastructure stakeholders / DBAs / Data SREs / Performance Engineers to understand the thresholds of their ClickHouse operations which eventually depend on RAM, CPU, Disk performance and Network latency. The key to ClickHouse performance benchmarking is to deliver consistently reproducible results. This is really important because the reproducible results allows you to rerun the tests and also gain confidence in the overall ClickHouse performance benchmarking exercise.
Hardware Infrastructure and Software Platforms Information
CPU
vendor and model details
1 2 3 4 |
# view vendor name root@Performance-LAB:~# cat /proc/cpuinfo | grep 'vendor' | uniq vendor_id : GenuineIntel |
1 2 3 4 |
# Display model name root@Performance-LAB:~# cat /proc/cpuinfo | grep 'model name' | uniq model name : Intel Xeon Processor (Icelake) |
CPU count
1 2 3 4 |
# Count the number of processing units root@Performance-LAB:~# cat /proc/cpuinfo | grep processor | wc -l 8 |
1 2 3 4 5 6 7 8 9 |
root@Performance-LAB:~# cat /proc/cpuinfo | grep 'core id' core id : 0 core id : 1 core id : 2 core id : 3 core id : 4 core id : 5 core id : 6 core id : 7 |
Memory Available / RAM Info.
1 2 3 4 5 6 7 8 9 |
## RAM info. root@Performance-LAB:~# root@Performance-LAB:~# free -h total used free shared buff/cache available Mem: 15Gi 286Mi 14Gi 0.0Ki 1.3Gi 14Gi Swap: 0B 0B 0B root@Performance-LAB:~# |
Disk Infrastructure Operations – Capacity and Throughput
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
### Disk Information root@Performance-LAB:~# fdisk -l Disk /dev/loop0: 61.83 MiB, 64835584 bytes, 126632 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop1: 72.56 MiB, 76083200 bytes, 148600 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/loop2: 32.44 MiB, 34017280 bytes, 66440 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk /dev/vda: 200 GiB, 214748364800 bytes, 419430400 sectors Units: sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disklabel type: gpt Disk identifier: ED13961A-1618-437B-A9EB-A140AAA7F944 Device Start End Sectors Size Type /dev/vda1 227328 419430366 419203039 199.9G Linux filesystem /dev/vda14 2048 10239 8192 4M BIOS boot /dev/vda15 10240 227327 217088 106M Microsoft basic data |
Software Infrastructure
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
## Linux distribution / Ubuntu release root@PERFORMANCE-LAB:~# lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 21.10 Release: 21.10 Codename: impish ## ClickHouse version root@PERFORMANCE-LAB:~# clickhouse-client --password ClickHouse client version 21.11.6.7 (official build). Password for user (default): Connecting to localhost:9000 as user default. Connected to ClickHouse server version 21.11.6 revision 54450. PERFORMANCE-LAB :) |
Building ClickHouse Infrastructure for Performance Benchmarking
Source: Brown University Benchmark (MgBench is a new analytical benchmark for machine-generated log data) provided by Andrew Crotty
Step 1: Download the data (schema and data) for benchmarking ClickHouse:
1 |
wget https://datasets.clickhouse.com/mgbench{1..3}.csv.xz |
Step 2: Unpack the data downloaded:
1 |
xz -v -d mgbench{1..3}.csv.xz |
Step 3: Create Schema Objects / Tables:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
CREATE DATABASE mgbench; CREATE TABLE mgbench.logs1 ( log_time DateTime, machine_name LowCardinality(String), machine_group LowCardinality(String), cpu_idle Nullable(Float32), cpu_nice Nullable(Float32), cpu_system Nullable(Float32), cpu_user Nullable(Float32), cpu_wio Nullable(Float32), disk_free Nullable(Float32), disk_total Nullable(Float32), part_max_used Nullable(Float32), load_fifteen Nullable(Float32), load_five Nullable(Float32), load_one Nullable(Float32), mem_buffers Nullable(Float32), mem_cached Nullable(Float32), mem_free Nullable(Float32), mem_shared Nullable(Float32), swap_free Nullable(Float32), bytes_in Nullable(Float32), bytes_out Nullable(Float32) ) ENGINE = MergeTree() ORDER BY (machine_group, machine_name, log_time); CREATE TABLE mgbench.logs2 ( log_time DateTime, client_ip IPv4, request String, status_code UInt16, object_size UInt64 ) ENGINE = MergeTree() ORDER BY log_time; CREATE TABLE mgbench.logs3 ( log_time DateTime64, device_id FixedString(15), device_name LowCardinality(String), device_type LowCardinality(String), device_floor UInt8, event_type LowCardinality(String), event_unit FixedString(1), event_value Nullable(Float32) ) ENGINE = MergeTree() ORDER BY (event_type, log_time); |
Step 4: Data loading
1 2 3 |
clickhouse-client --query "INSERT INTO mgbench.logs1 FORMAT CSVWithNames" < mgbench1.csv clickhouse-client --query "INSERT INTO mgbench.logs2 FORMAT CSVWithNames" < mgbench2.csv clickhouse-client --query "INSERT INTO mgbench.logs3 FORMAT CSVWithNames" < mgbench3.csv |
Benchmarking ClickHouse Performance
— SQL 1: What is the CPU/network utilization for each web server since midnight?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
SELECT machine_name, MIN(cpu) AS cpu_min, MAX(cpu) AS cpu_max, AVG(cpu) AS cpu_avg, MIN(net_in) AS net_in_min, MAX(net_in) AS net_in_max, AVG(net_in) AS net_in_avg, MIN(net_out) AS net_out_min, MAX(net_out) AS net_out_max, AVG(net_out) AS net_out_avg FROM ( SELECT machine_name, COALESCE(cpu_user, 0.0) AS cpu, COALESCE(bytes_in, 0.0) AS net_in, COALESCE(bytes_out, 0.0) AS net_out FROM logs1 WHERE machine_name IN ('anansi','aragog','urd') AND log_time >= TIMESTAMP '2017-01-11 00:00:00' ) AS r GROUP BY machine_name\G Query id: 6772d840-bc4f-4d90-8d78-d47a8ecf332e Row 1: ────── machine_name: urd cpu_min: 0 cpu_max: 23.493331909179688 cpu_avg: 1.499065357465967 net_in_min: 1168.3499755859375 net_in_max: 368025.625 net_in_avg: 22100.285078089873 net_out_min: 220.95001220703125 net_out_max: 108069.65625 net_out_avg: 8129.441456009021 Row 2: ────── machine_name: anansi cpu_min: 0 cpu_max: 22.5 cpu_avg: 0.1426860871961902 net_in_min: 938.949951171875 net_in_max: 2355335.75 net_in_avg: 9177.160347696004 net_out_min: 216.8800048828125 net_out_max: 2354536.25 net_out_avg: 13467.316385337206 Row 3: ────── machine_name: aragog cpu_min: 0 cpu_max: 31.14000129699707 cpu_avg: 0.46608482347828106 net_in_min: 1318.68994140625 net_in_max: 27836488 net_in_avg: 328922.02496959124 net_out_min: 308.9499816894531 net_out_max: 40642404 net_out_avg: 825490.5620742807 ↖ Progress: 217.89 thousand rows, 607.23 KB (25.75 million rows/s., 71.75 M 3 rows in set. Elapsed: 0.009 sec. Processed 217.89 thousand rows, 607.23 KB (25.39 million rows/s., 70.75 MB/s.) |
–SQL 2: Which computer lab machines have been offline in the past day?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
SELECT machine_name, log_time FROM logs1 WHERE (machine_name LIKE 'cslab%' OR machine_name LIKE 'mslab%') AND load_one IS NULL AND log_time >= TIMESTAMP '2017-01-10 00:00:00' ORDER BY machine_name, log_time; Query id: fb5f4823-8221-404c-aec5-4e4ef59a95db Row 1: ────── machine_name: cslab9e log_time: 2017-01-10 14:14:15 Row 2: ────── machine_name: cslab9e log_time: 2017-01-10 14:14:30 Row 3: ────── machine_name: cslab9e log_time: 2017-01-10 14:14:45 Row 4: ────── machine_name: cslab9e log_time: 2017-01-10 14:15:15 Row 5: ────── machine_name: cslab9e log_time: 2017-01-10 14:15:30 Row 6: ────── machine_name: cslab9e log_time: 2017-01-10 14:15:45 Row 7: ────── machine_name: cslab9e log_time: 2017-01-10 14:16:15 Row 8: ────── machine_name: cslab9e log_time: 2017-01-10 14:16:30 Row 9: ─────── machine_name: cslab9e log_time: 2017-01-10 14:16:45 Row 10: ─────── machine_name: cslab9e log_time: 2017-01-10 14:17:15 Row 11: ─────── machine_name: cslab9e log_time: 2017-01-10 14:17:30 Row 12: ─────── machine_name: cslab9e log_time: 2017-01-10 14:17:45 Row 13: ─────── machine_name: cslab9e log_time: 2017-01-10 14:18:15 Row 14: ─────── machine_name: cslab9e log_time: 2017-01-10 14:18:30 Row 15: ─────── machine_name: cslab9e log_time: 2017-01-10 14:18:45 Row 16: ─────── machine_name: cslab9e log_time: 2017-01-10 14:19:15 Row 17: ─────── machine_name: cslab9e log_time: 2017-01-10 14:19:30 Row 18: ─────── machine_name: cslab9e log_time: 2017-01-10 14:19:45 ↗ Progress: 1.63 million rows, 14.84 MB (92.92 million rows/s., 843.85 MB/s 18 rows in set. Elapsed: 0.018 sec. Processed 1.63 million rows, 14.84 MB (91.67 million rows/s., 832.47 MB/s.) |
–SQL 3: What are the hourly average metrics during the past 10 days for a specific workstation?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 |
SELECT dt, hr, AVG(load_fifteen) AS load_fifteen_avg, AVG(load_five) AS load_five_avg, AVG(load_one) AS load_one_avg, AVG(mem_free) AS mem_free_avg, AVG(swap_free) AS swap_free_avg FROM ( SELECT CAST(log_time AS DATE) AS dt, EXTRACT(HOUR FROM log_time) AS hr, load_fifteen, load_five, load_one, mem_free, swap_free FROM logs1 WHERE machine_name = 'babbage' AND load_fifteen IS NOT NULL AND load_five IS NOT NULL AND load_one IS NOT NULL AND mem_free IS NOT NULL AND swap_free IS NOT NULL AND log_time >= TIMESTAMP '2017-01-01 00:00:00' ) AS r GROUP BY dt, hr ORDER BY dt, hr\G Row 1: ────── dt: 2017-01-01 hr: 0 load_fifteen_avg: 0.12023027762770652 load_five_avg: 0.09362472295761108 load_one_avg: 0.08744388920943796 mem_free_avg: 2170359.9625 swap_free_avg: 8388604 Row 2: ────── dt: 2017-01-01 hr: 1 load_fifteen_avg: 0.06573361121118068 load_five_avg: 0.06255777780897916 load_one_avg: 0.05611999999382533 mem_free_avg: 2170050.05 swap_free_avg: 8388604 Row 3: ────── dt: 2017-01-01 hr: 2 load_fifteen_avg: 0.06406694483011961 load_five_avg: 0.08363861131171385 load_one_avg: 0.07870277822948993 mem_free_avg: 2170090.8666666667 swap_free_avg: 8388604 Row 4: ────── dt: 2017-01-01 hr: 3 load_fifteen_avg: 0.0986541671678424 load_five_avg: 0.1006138886945943 load_one_avg: 0.10358000136911868 mem_free_avg: 2169553.683333333 swap_free_avg: 8388604 Row 5: ────── dt: 2017-01-01 hr: 4 load_fifteen_avg: 0.09523666650056839 load_five_avg: 0.09710055496543646 load_one_avg: 0.09572555573152688 mem_free_avg: 2167171.0083333333 swap_free_avg: 8388604 ↗ Progress: 193.32 thousand rows, 2.34 MB (15.36 million rows/s., 186.18 255 rows in set. Elapsed: 0.013 sec. Processed 193.32 thousand rows, 2.34 MB (15.21 million rows/s., 184.41 MB/s.) |
–SQL 4 – Over 1 month, how often was each server blocked on disk I/O?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 |
SELECT machine_name, COUNT(*) AS spikes FROM logs1 WHERE machine_group = 'Servers' AND cpu_wio > 0.99 AND log_time >= TIMESTAMP '2016-12-01 00:00:00' AND log_time < TIMESTAMP '2017-01-01 00:00:00' GROUP BY machine_name ORDER BY spikes DESC LIMIT 10\G Query id: 3d9c7259-85f6-44cf-a538-acad00c162df Row 1: ────── machine_name: sourpatch spikes: 8861 Row 2: ────── machine_name: louie spikes: 581 Row 3: ────── machine_name: zotz spikes: 550 Row 4: ────── machine_name: pieces spikes: 495 Row 5: ────── machine_name: reeces spikes: 453 Row 6: ────── machine_name: thing spikes: 438 Row 7: ────── machine_name: adminhost spikes: 366 Row 8: ────── machine_name: cadbury spikes: 357 Row 9: ─────── machine_name: york spikes: 193 Row 10: ─────── machine_name: reflect spikes: 156 ↙ Progress: 1.65 million rows, 16.66 MB (104.63 million rows/s., 1.06 GB/s. 10 rows in set. Elapsed: 0.016 sec. Processed 1.65 million rows, 16.66 MB (102.57 million rows/s., 1.04 GB/s.) |
–SQL 5: Which externally reachable VMs have run low on memory?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 |
SELECT machine_name, dt, MIN(mem_free) AS mem_free_min FROM ( SELECT machine_name, CAST(log_time AS DATE) AS dt, mem_free FROM logs1 WHERE machine_group = 'DMZ' AND mem_free IS NOT NULL ) AS r GROUP BY machine_name, dt HAVING MIN(mem_free) < 10000 ORDER BY machine_name, dt; Query id: f4af0029-2707-4c09-b775-160dfcebc957 Row 1: ────── machine_name: chimera dt: 2016-11-04 mem_free_min: 6135.48 Row 2: ────── machine_name: chimera dt: 2016-11-26 mem_free_min: 6045.8667 Row 3: ────── machine_name: chimera dt: 2016-11-27 mem_free_min: 9374.527 Row 4: ────── machine_name: chimera dt: 2016-11-30 mem_free_min: 6933.56 Row 5: ────── machine_name: chimera dt: 2016-12-01 mem_free_min: 6535.8135 Row 6: ────── machine_name: chimera dt: 2016-12-02 mem_free_min: 6393.74 Row 7: ────── machine_name: chimera dt: 2016-12-11 mem_free_min: 8010.5 Row 8: ────── machine_name: chimera dt: 2016-12-30 mem_free_min: 8167.6665 Row 9: ─────── machine_name: chimera dt: 2016-12-31 mem_free_min: 6324.6665 Row 10: ─────── machine_name: chimera dt: 2017-01-01 mem_free_min: 5972 Row 11: ─────── machine_name: chimera dt: 2017-01-03 mem_free_min: 5948 Row 12: ─────── machine_name: chimera dt: 2017-01-04 mem_free_min: 7320 Row 13: ─────── machine_name: chimera dt: 2017-01-05 mem_free_min: 8144 Row 14: ─────── machine_name: chimera dt: 2017-01-07 mem_free_min: 6308.4663 Row 15: ─────── machine_name: chimera dt: 2017-01-09 mem_free_min: 9370 Row 16: ─────── machine_name: chimera dt: 2017-01-10 mem_free_min: 7147.467 ↖ Progress: 3.54 million rows, 39.12 MB (108.02 million rows/s., 1.19 GB/s. 16 rows in set. Elapsed: 0.033 sec. Processed 3.54 million rows, 39.12 MB (106.94 million rows/s., 1.18 GB/s.) |
— SQL 7: What is the total hourly network traffic across all file servers?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
SELECT dt, hr, SUM(net_in) AS net_in_sum, SUM(net_out) AS net_out_sum, SUM(net_in) + SUM(net_out) AS both_sum FROM ( SELECT CAST(log_time AS DATE) AS dt, EXTRACT(HOUR FROM log_time) AS hr, COALESCE(bytes_in, 0.0) / 1000000000.0 AS net_in, COALESCE(bytes_out, 0.0) / 1000000000.0 AS net_out FROM logs1 WHERE machine_name IN ('allsorts','andes','bigred','blackjack','bonbon', 'cadbury','chiclets','cotton','crows','dove','fireball','hearts','huey', 'lindt','milkduds','milkyway','mnm','necco','nerds','orbit','peeps', 'poprocks','razzles','runts','smarties','smuggler','spree','stride', 'tootsie','trident','wrigley','york') ) AS r GROUP BY dt, hr ORDER BY both_sum DESC LIMIT 10\G Query id: 78f32d40-af55-49e1-8338-768144f54c2b Row 1: ────── dt: 2017-01-10 hr: 17 net_in_sum: 87.94858539170787 net_out_sum: 71.38525362068188 both_sum: 159.33383901238975 Row 2: ────── dt: 2017-01-10 hr: 18 net_in_sum: 89.64930670570355 net_out_sum: 69.15606039811588 both_sum: 158.80536710381944 Row 3: ────── dt: 2017-01-10 hr: 19 net_in_sum: 73.60724927620798 net_out_sum: 81.40722209611724 both_sum: 155.0144713723252 Row 4: ────── dt: 2017-01-10 hr: 20 net_in_sum: 68.54794632249272 net_out_sum: 85.50667667204637 both_sum: 154.0546229945391 Row 5: ────── dt: 2017-01-11 hr: 13 net_in_sum: 42.0006900292216 net_out_sum: 69.61384714089114 both_sum: 111.61453717011274 Row 6: ────── dt: 2017-01-10 hr: 16 net_in_sum: 52.18435269713231 net_out_sum: 56.94834975170359 both_sum: 109.1327024488359 Row 7: ────── dt: 2017-01-10 hr: 15 net_in_sum: 58.92187069434693 net_out_sum: 47.70097635240866 both_sum: 106.6228470467556 Row 8: ────── dt: 2017-01-10 hr: 14 net_in_sum: 60.98027388591966 net_out_sum: 44.4488195209332 both_sum: 105.42909340685286 Row 9: ─────── dt: 2017-01-10 hr: 21 net_in_sum: 45.79773242298867 net_out_sum: 54.048210619660104 both_sum: 99.84594304264877 Row 10: ─────── dt: 2017-01-11 hr: 4 net_in_sum: 48.96217601967555 net_out_sum: 48.37625873783819 both_sum: 97.33843475751374 ↗ Progress: 1.79 million rows, 20.13 MB (78.76 million rows/s., 887.07 MB/s 10 rows in set. Elapsed: 0.023 sec. Processed 1.79 million rows, 20.13 MB (77.66 million rows/s., 874.69 MB/s.) |
–SQL 8 – Which requests have caused server errors within the past 2 weeks?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
SELECT * FROM logs2 WHERE status_code >= 500 AND log_time >= TIMESTAMP '2012-12-18 00:00:00' ORDER BY log_time\G Query id: 41fbe700-34ff-4fd4-ab90-4fcbfeae9715 Row 1: ────── log_time: 2012-12-21 16:58:00 client_ip: 229.50.247.232 request: //cgi-bin/feedback/ status_code: 500 object_size: 410 Row 2: ────── log_time: 2012-12-23 23:28:02 client_ip: 229.50.247.232 request: //cgi-bin/feedback/ status_code: 500 object_size: 410 ↗ Progress: 2.00 million rows, 4.72 MB (338.82 million rows/s., 798.60 MB/s 2 rows in set. Elapsed: 0.006 sec. Processed 2.00 million rows, 4.72 MB (324.94 million rows/s., 765.89 MB/s.) |
–SQL 9 – During a specific 2-week period, was the user password file leaked?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
SELECT * FROM logs2 WHERE status_code >= 200 AND status_code < 300 AND request LIKE '%/etc/passwd%' AND log_time >= TIMESTAMP '2012-05-06 00:00:00' AND log_time < TIMESTAMP '2012-05-20 00:00:00'\G Query id: 5782f776-56c9-4701-a1eb-0213f2ee96a9 Row 1: ────── log_time: 2012-05-09 14:46:58 client_ip: 201.183.185.11 request: /?-nd+auto_prepend_file%3D/etc/passwd status_code: 200 object_size: 21173 Row 2: ────── log_time: 2012-05-13 20:17:05 client_ip: 201.183.185.11 request: /?-nd+auto_prepend_file%3D/etc/passwd status_code: 200 object_size: 21809 ↖ Progress: 1.83 million rows, 120.13 MB (45.43 million rows/s., 2.99 GB/s. 2 rows in set. Elapsed: 0.040 sec. Processed 1.83 million rows, 120.13 MB (45.12 million rows/s., 2.97 GB/s.) |
–SQL 10: What was the average path depth for top-level requests in the past month?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 |
SELECT top_level, AVG(LENGTH(request) - LENGTH(REPLACE(request, '/', ''))) AS depth_avg FROM ( SELECT SUBSTRING(request FROM 1 FOR len) AS top_level, request FROM ( SELECT POSITION(SUBSTRING(request FROM 2), '/') AS len, request FROM logs2 WHERE status_code >= 200 AND status_code < 300 AND log_time >= TIMESTAMP '2012-12-01 00:00:00' ) AS r WHERE len > 0 ) AS s WHERE top_level IN ('/about','/courses','/degrees','/events', '/grad','/industry','/news','/people', '/publications','/research','/teaching','/ugrad') GROUP BY top_level ORDER BY top_level\G Query id: dfeb4126-4564-443c-b571-c70130ca1768 ↑ Progress: 4.33 million rows, 230.42 MB (39.11 million rows/s., 2.08 GB/s.Row 1: ────── top_level: /about depth_avg: 4.180225643663013 Row 2: ────── top_level: /courses depth_avg: 5.138040610251345 Row 3: ────── top_level: /degrees depth_avg: 3.385165907612232 Row 4: ────── top_level: /events depth_avg: 3.105441792057925 Row 5: ────── top_level: /grad depth_avg: 3.6942030710629687 Row 6: ────── top_level: /industry depth_avg: 4.094312105007292 Row 7: ────── top_level: /news depth_avg: 2.963451710977424 Row 8: ────── top_level: /people depth_avg: 4.485873513820637 Row 9: ─────── top_level: /publications depth_avg: 2.245890410958904 Row 10: ─────── top_level: /research depth_avg: 5.040612211618354 Row 11: ─────── top_level: /teaching depth_avg: 2 Row 12: ─────── top_level: /ugrad depth_avg: 2.4527363184079602 ↗ Progress: 4.33 million rows, 230.42 MB (27.43 million rows/s., 1.46 GB/s.→ Progress: 4.80 million rows, 255.41 MB (30.39 million rows/s., 1.62 GB/s. 12 rows in set. Elapsed: 0.158 sec. Processed 4.80 million rows, 255.41 MB (30.35 million rows/s., 1.62 GB/s.) |
–SQL 11: During the last 3 months, which clients have made an excessive number of requests?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
SELECT client_ip, COUNT(*) AS num_requests FROM logs2 WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00' GROUP BY client_ip HAVING COUNT(*) >= 100000 ORDER BY num_requests DESC\G Query id: 605feb05-9719-46fa-ba69-324b53ec1277 Row 1: ────── client_ip: 219.63.173.93 num_requests: 1540391 Row 2: ────── client_ip: 229.50.247.232 num_requests: 743801 Row 3: ────── client_ip: 97.211.80.244 num_requests: 733261 Row 4: ────── client_ip: 152.149.228.251 num_requests: 492221 Row 5: ────── client_ip: 198.156.249.133 num_requests: 370834 Row 6: ────── client_ip: 70.86.124.37 num_requests: 273057 Row 7: ────── client_ip: 67.153.111.239 num_requests: 167287 Row 8: ────── client_ip: 249.92.17.134 num_requests: 112909 ↓ Progress: 17.98 million rows, 143.83 MB (221.43 million rows/s., 1.77 GB/ 8 rows in set. Elapsed: 0.082 sec. Processed 17.98 million rows, 143.83 MB (220.44 million rows/s., 1.76 GB/s.) |
–SQL 11: What are the daily unique visitors ?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 |
SELECT dt, COUNT(DISTINCT client_ip) FROM ( SELECT CAST(log_time AS DATE) AS dt, client_ip FROM logs2 ) AS r GROUP BY dt ORDER BY dt\G Row 361: ──────── dt: 2012-12-26 uniqExact(client_ip): 8799 Row 362: ──────── dt: 2012-12-27 uniqExact(client_ip): 9435 Row 363: ──────── dt: 2012-12-28 uniqExact(client_ip): 9185 Row 364: ──────── dt: 2012-12-29 uniqExact(client_ip): 8247 Row 365: ──────── dt: 2012-12-30 uniqExact(client_ip): 8328 Row 366: ──────── dt: 2012-12-31 uniqExact(client_ip): 7824 ← Progress: 75.75 million rows, 605.98 MB (273.04 million rows/s., 2.18 G 366 rows in set. Elapsed: 0.278 sec. Processed 75.75 million rows, 605.98 MB (272.73 million rows/s., 2.18 GB/s.) |
–SQL 12: What are the average and maximum data transfer rates (Gbps)?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
SELECT AVG(transfer) / 125000000.0 AS transfer_avg, MAX(transfer) / 125000000.0 AS transfer_max FROM ( SELECT log_time, SUM(object_size) AS transfer FROM logs2 GROUP BY log_time ) AS r\G Query id: 06ad0671-d9ea-4881-a68f-7865b193d219 ↖ Progress: 17.24 million rows, 206.83 MB (168.95 million rows/s., 2.03 G↑ Progress: 35.32 million rows, 423.79 MB (174.44 million rows/s., 2.09 G↗ Progress: 62.59 million rows, 751.14 MB (206.70 million rows/s., 2.48 G→ Progress: 70.30 million rows, 843.54 MB (174.35 million rows/s., 2.09 G↘ Progress: 70.69 million rows, 848.26 MB (140.37 million rows/s., 1.68 G↓ Progress: 72.26 million rows, 867.13 MB (119.64 million rows/s., 1.44 G↙ Progress: 75.75 million rows, 908.98 MB (107.54 million rows/s., 1.29 GRow 1: ────── transfer_avg: 0.0046296999419207785 transfer_max: 295.028835936 ← Progress: 75.75 million rows, 908.98 MB (64.98 million rows/s., 779.80 1 rows in set. Elapsed: 1.166 sec. Processed 75.75 million rows, 908.98 MB (64.94 million rows/s., 779.28 MB/s.) |
— SQL 17: Did the indoor temperature reach freezing over the weekend?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
SELECT * FROM logs3 WHERE event_type = 'temperature' AND event_value <= 32.0 AND log_time >= '2019-11-29 17:00:00.000'\G Row 1: ────── log_time: 2019-12-01 09:30:13.417 device_id: 16C448031108ABA device_name: balcony_5 device_type: door device_floor: 5 event_type: temperature event_unit: F event_value: 32 Row 2: ────── log_time: 2019-12-01 09:59:58.874 device_id: 16C448031108ABA device_name: balcony_5 device_type: door device_floor: 5 event_type: temperature event_unit: F event_value: 32 ↑ Progress: 7.35 thousand rows, 228.52 KB (1.56 million rows/s., 48.57 MB 2 rows in set. Elapsed: 0.005 sec. Processed 7.35 thousand rows, 228.52 KB (1.51 million rows/s., 46.96 MB/s.) |
–SQL 18: Over the past 6 months, how frequently were each door opened?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 |
SELECT device_name, device_floor, COUNT(*) AS ct FROM logs3 WHERE event_type = 'door_open' AND log_time >= '2019-06-01 00:00:00.000' GROUP BY device_name, device_floor ORDER BY ct DESC\G Row 15: ─────── device_name: lobby_left_1 device_floor: 1 ct: 2944 Row 16: ─────── device_name: stairs_north_5 device_floor: 5 ct: 1731 Row 17: ─────── device_name: lobby_right_1 device_floor: 1 ct: 1641 Row 18: ─────── device_name: stairs_east_5 device_floor: 5 ct: 1637 Row 19: ─────── device_name: balcony_3 device_floor: 3 ct: 1619 Row 20: ─────── device_name: stairs_east_4 device_floor: 4 ct: 1093 → Progress: 245.76 thousand rows, 2.71 MB (14.90 million rows/s., 164.09 20 rows in set. Elapsed: 0.017 sec. Processed 245.76 thousand rows, 2.71 MB (14.67 million rows/s., 161.52 MB/s.) |
–SQL 19: For each device category, what are the monthly power consumption metrics?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 |
SELECT yr, mo, SUM(coffee_hourly_avg) AS coffee_monthly_sum, AVG(coffee_hourly_avg) AS coffee_monthly_avg, SUM(printer_hourly_avg) AS printer_monthly_sum, AVG(printer_hourly_avg) AS printer_monthly_avg, SUM(projector_hourly_avg) AS projector_monthly_sum, AVG(projector_hourly_avg) AS projector_monthly_avg, SUM(vending_hourly_avg) AS vending_monthly_sum, AVG(vending_hourly_avg) AS vending_monthly_avg FROM ( SELECT dt, yr, mo, hr, AVG(coffee) AS coffee_hourly_avg, AVG(printer) AS printer_hourly_avg, AVG(projector) AS projector_hourly_avg, AVG(vending) AS vending_hourly_avg FROM ( SELECT CAST(log_time AS DATE) AS dt, EXTRACT(YEAR FROM log_time) AS yr, EXTRACT(MONTH FROM log_time) AS mo, EXTRACT(HOUR FROM log_time) AS hr, CASE WHEN device_name LIKE 'coffee%' THEN event_value END AS coffee, CASE WHEN device_name LIKE 'printer%' THEN event_value END AS printer, CASE WHEN device_name LIKE 'projector%' THEN event_value END AS projector, CASE WHEN device_name LIKE 'vending%' THEN event_value END AS vending FROM logs3 WHERE device_type = 'meter' ) AS r GROUP BY dt, yr, mo, hr ) AS s GROUP BY yr, mo ORDER BY yr, mo\G Row 25: ─────── yr: 2019 mo: 9 coffee_monthly_sum: 862784.4318728528 coffee_monthly_avg: 1198.3117109345178 printer_monthly_sum: 63348.6489027506 printer_monthly_avg: 87.98423458715362 projector_monthly_sum: 56973.39566326607 projector_monthly_avg: 126.32681965247465 vending_monthly_sum: 322162.84531000565 vending_monthly_avg: 447.4483962638967 Row 26: ─────── yr: 2019 mo: 10 coffee_monthly_sum: 900170.2429565514 coffee_monthly_avg: 1209.9062405329992 printer_monthly_sum: 64953.121362668586 printer_monthly_avg: 87.30258247670508 projector_monthly_sum: 98168.02805722524 projector_monthly_avg: 218.6370335350228 vending_monthly_sum: 310577.09491537174 vending_monthly_avg: 417.44233187549963 Row 27: ─────── yr: 2019 mo: 11 coffee_monthly_sum: 881836.9061517064 coffee_monthly_avg: 1224.773480766259 printer_monthly_sum: 61941.82869595774 printer_monthly_avg: 86.03031763327463 projector_monthly_sum: 181716.37704881537 projector_monthly_avg: 294.9941185857392 vending_monthly_sum: 258322.18736711267 vending_monthly_avg: 358.7808157876565 Row 28: ─────── yr: 2019 mo: 12 coffee_monthly_sum: 45118.35452979575 coffee_monthly_avg: 1253.28762582766 printer_monthly_sum: 2884.8668441554055 printer_monthly_avg: 80.13519011542793 projector_monthly_sum: 13576.481448950077 projector_monthly_avg: 377.1244846930577 vending_monthly_sum: 12249.124855753844 vending_monthly_avg: 340.25346821538454 ← Progress: 107.25 million rows, 1.50 GB (94.32 million rows/s., 1.32 GB/↖ Progress: 108.96 million rows, 1.52 GB (95.82 million rows/s., 1.34 GB/ 28 rows in set. Elapsed: 1.137 sec. Processed 108.96 million rows, 1.52 GB (95.80 million rows/s., 1.34 GB/s.) |
☛ MinervaDB is trusted by top companies worldwide