Troubleshooting Cassandra Write Performance Bottlenecks: A runbook for Cassandra DBAs



Identifying Write Performance Issues

Write performance bottlenecks in Cassandra typically manifest through increased write latency, timeouts, and reduced throughput. Key indicators include:

  • High write latency (>10ms for local writes)
  • WriteTimeoutExceptions in application logs
  • Increased pending compactions
  • High CPU utilization during write operations
  • Memory pressure and frequent garbage collection

Core Write Performance Optimization Strategies

1. Hardware and Infrastructure Tuning

Storage Configuration:

  • Use SSD storage for commit logs and data directories
  • Separate commit log and data directories on different disks
  • Ensure adequate IOPS capacity (minimum 1000 IOPS per node)
  • Configure RAID 0 for data directories to maximize throughput

Memory Allocation:

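  • Set heap size to 8-14GB; larger heaps tend to lengthen GC pauses, so treat 14GB as the ceiling unless GC logs justify more
  • Allocate 25-50% of total RAM to the Cassandra heap within that range, leaving the remainder for off-heap structures and the OS page cache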
  • Configure off-heap memory for memtables and caches
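
The two heap rules above can conflict on very small or very large machines, so it helps to compute the overlap explicitly. The following is a minimal sketch, not an official sizing tool; the function name heap_range_gb is hypothetical, and the 8GB floor, 14GB cap and 25-50% window are taken directly from the bullets above.

```python
def heap_range_gb(total_ram_gb, floor_gb=8.0, cap_gb=14.0):
    """Return (lower, upper) heap sizes in GB that satisfy the runbook rules.

    Rules applied (all taken from the bullets above):
      * heap should be 25-50% of total RAM
      * heap should stay between floor_gb and cap_gb
    When the rules cannot all hold, the 50%-of-RAM limit and the cap win.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    low = 0.25 * total_ram_gb
    high = 0.50 * total_ram_gb
    upper = min(high, cap_gb)                 # never exceed 50% of RAM or the cap
    lower = min(max(low, floor_gb), upper)    # prefer the floor, but stay <= upper
    return lower, upper


if __name__ == "__main__":
    for ram in (16, 32, 64):
        print(ram, heap_range_gb(ram))
```

On a 32GB node this yields the full 8-14GB range, while a 64GB node collapses to the 14GB cap, which is why -Xms and -Xmx are usually pinned to one value inside the range.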

2. Cassandra Configuration Optimization

# cassandra.yaml optimizations for write performance

# Commit log settings
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
commitlog_compression:
  - class_name: LZ4Compressor

# Memtable settings
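# offheap_objects keeps memtable cell data off heap; with heap_buffers the
# memtable_offheap_space_in_mb setting below would go unused
memtable_allocation_type: offheap_objects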
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11

# Compaction settings
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 64
compaction_large_partition_warning_threshold_mb: 1000

# Write settings
concurrent_writes: 128
write_request_timeout_in_ms: 10000
batch_size_warn_threshold_in_kb: 64
batch_size_fail_threshold_in_kb: 640
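
The concurrent_writes value above is consistent with the rule of thumb given in the comments of the stock cassandra.yaml, roughly eight writer threads per CPU core. A minimal sketch of that arithmetic follows; the function name suggested_concurrent_writes is hypothetical, and the result is a starting point to validate under load rather than a guaranteed optimum.

```python
import os


def suggested_concurrent_writes(cores=None):
    """Apply the 8-threads-per-core rule of thumb for concurrent_writes.

    cores: CPU core count of the Cassandra node; defaults to the local
    machine's count, which is only meaningful when run on the node itself.
    """
    n = cores if cores is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("cores must be at least 1")
    return 8 * n


if __name__ == "__main__":
    # A 16-core node gives the value used in the configuration above.
    print(suggested_concurrent_writes(16))
```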

3. JVM Tuning for Write Performance

# JVM options for optimal write performance
-Xms8G
-Xmx8G
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=300
-XX:InitiatingHeapOccupancyPercent=70
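# Experimental JDK 8 flag (8u131+), relevant only in containers; it was removed in
# JDK 11, and it does not change the heap when -Xms/-Xmx are set explicitly as above
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap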
-Djdk.nio.maxCachedBufferSize=262144

4. Schema Design for Write Optimization

Partition Key Strategy:

  • Design partition keys to distribute writes evenly across nodes
  • Avoid hotspots by using compound partition keys
  • Limit partition size to under 100MB

Clustering Key Optimization:

  • Use time-based clustering keys for time-series data
  • Minimize the number of clustering columns
  • Consider bucketing strategies for high-volume writes

Table Configuration:

CREATE TABLE events (
    partition_key text,
    time_bucket int,
    event_time timestamp,
    data text,
    PRIMARY KEY ((partition_key, time_bucket), event_time)
) WITH
    compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': 1}
    AND gc_grace_seconds = 86400
    AND bloom_filter_fp_chance = 0.01;
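
The time_bucket column in the table above has to be computed by the writer. One way to derive it is sketched below, assuming the bucket is the number of whole bucket-sized intervals since the Unix epoch; the function names and the 24-hour default are illustrative choices, not something the schema mandates.

```python
from datetime import datetime, timezone


def time_bucket(event_time, bucket_hours=24):
    """Map a timezone-aware timestamp to an integer bucket number.

    The bucket is the count of whole bucket_hours intervals since the Unix
    epoch, so all events in the same interval share one partition.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    if bucket_hours < 1:
        raise ValueError("bucket_hours must be at least 1")
    return int(event_time.timestamp()) // (bucket_hours * 3600)


def partition_for(partition_key, event_time, bucket_hours=24):
    """Return the (partition_key, time_bucket) pair used as the partition key."""
    return partition_key, time_bucket(event_time, bucket_hours)


if __name__ == "__main__":
    ts = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
    print(partition_for("sensor-42", ts))
```

Smaller buckets spread a hot key over more partitions at the cost of more partitions to query per time range, so the bucket width is best chosen from the expected write rate and the 100MB partition guideline.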

5. Compaction Strategy Optimization

Size-Tiered Compaction Strategy (STCS):

  • Best for write-heavy workloads with mixed access patterns
  • Configure min_threshold and max_threshold based on write volume

Time Window Compaction Strategy (TWCS):

  • Optimal for time-series data with TTL
  • Reduces compaction overhead for expired data

Leveled Compaction Strategy (LCS):

  • Use only for read-heavy workloads
  • Avoid for write-intensive applications due to write amplification

6. Monitoring and Diagnostics

Key Metrics to Monitor:

  • Write latency percentiles (P95, P99)
  • Pending compactions count
  • Memtable flush frequency
  • Commit log utilization
  • GC pause times
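
When latency samples are collected on the client side, the P95 and P99 figures listed above can be computed directly. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions; monitoring systems may interpolate and report slightly different values.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples.

    p is an integer from 1 to 100. Integer arithmetic is used for the rank
    so the result does not depend on floating-point rounding.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100   # ceil(p * n / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [1.2, 0.9, 1.1, 15.0, 1.0, 1.3, 0.8, 1.4, 2.0, 1.1]
    print("P95:", percentile(latencies_ms, 95))
    print("P99:", percentile(latencies_ms, 99))
```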

Diagnostic Commands:

# Check write performance metrics
nodetool tablestats keyspace.table
nodetool compactionstats
nodetool tpstats

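# Inspect current log levels, then enable DEBUG logging for the commit log
# (set it back to INFO once the investigation is finished)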
nodetool getlogginglevels
nodetool setlogginglevel org.apache.cassandra.db.commitlog DEBUG
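
For write problems, the most telling part of the nodetool tpstats output is usually the MutationStage row and the dropped mutation count. The sketch below extracts those numbers from captured output. The sample text is illustrative only, and the column layout and message-type names (MUTATION versus MUTATION_REQ) are assumptions that vary between Cassandra versions, so check them against your own output before relying on the parser.

```python
# Illustrative sample only; real `nodetool tpstats` output has more rows and
# its exact columns and message-type names differ between Cassandra versions.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32      1024        9876543         0                17
ReadStage                         0         0         123456         0                 0

Message type           Dropped
READ                         0
MUTATION                    42
"""


def mutation_pressure(tpstats_text):
    """Pull write-path pressure indicators out of tpstats-style text."""
    summary = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "MutationStage":
            summary["active"] = int(fields[1])
            summary["pending"] = int(fields[2])
            summary["all_time_blocked"] = int(fields[5])
        elif len(fields) >= 2 and fields[0] in ("MUTATION", "MUTATION_REQ"):
            summary["dropped"] = int(fields[1])
    return summary


if __name__ == "__main__":
    print(mutation_pressure(SAMPLE_TPSTATS))
```

A persistently non-zero pending count or a growing dropped count suggests the node cannot keep up with incoming writes.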

Advanced Write Performance Techniques

1. Batch Optimization

  • Use unlogged batches for single partition writes
  • Limit batch size to 64KB
  • Avoid cross-partition logged batches in high-throughput scenarios
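
The batching rules above can be sketched as a planning step that runs before any statements are sent: group rows by partition key, then split each group so no batch exceeds the size limit. This is a minimal illustration, not driver code; plan_batches is a hypothetical helper, and the payload length is used as a rough stand-in for the serialized mutation size that Cassandra actually measures.

```python
from collections import defaultdict

MAX_BATCH_BYTES = 64 * 1024  # matches the 64KB guideline above


def plan_batches(rows, max_bytes=MAX_BATCH_BYTES):
    """Group (partition_key, payload_bytes) rows into single-partition batches.

    Returns a list of (partition_key, [payloads]) tuples. A payload larger
    than max_bytes is placed in a batch of its own rather than dropped.
    """
    by_partition = defaultdict(list)
    for key, payload in rows:
        by_partition[key].append(payload)

    batches = []
    for key, payloads in by_partition.items():
        current, size = [], 0
        for payload in payloads:
            # Start a new batch when adding this payload would cross the limit.
            if current and size + len(payload) > max_bytes:
                batches.append((key, current))
                current, size = [], 0
            current.append(payload)
            size += len(payload)
        if current:
            batches.append((key, current))
    return batches


if __name__ == "__main__":
    rows = [("a", b"x" * 40000), ("b", b"y" * 10), ("a", b"x" * 40000), ("a", b"z" * 10)]
    for key, payloads in plan_batches(rows):
        print(key, [len(p) for p in payloads])
```

Each resulting tuple maps to one unlogged batch against a single partition, which avoids the coordinator overhead of cross-partition logged batches.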

2. Consistency Level Tuning

  • Use ONE or LOCAL_ONE for maximum write throughput
  • Consider LOCAL_QUORUM for balanced consistency and performance
  • Avoid ALL consistency level in production
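
The trade-off behind these recommendations comes down to how many replicas must acknowledge a write before the coordinator reports success. The sketch below computes that number and the replica failures a write can tolerate; the function names are hypothetical, and for LOCAL_QUORUM and LOCAL_ONE the replication factor passed in is assumed to be that of the local datacenter.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replica acknowledgements needed for a write to succeed."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    level = consistency_level.upper()
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return replication_factor // 2 + 1   # strict majority
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerated_failures(consistency_level, replication_factor):
    """Replicas that can be down while writes at this level still succeed."""
    return replication_factor - replicas_required(consistency_level, replication_factor)


if __name__ == "__main__":
    for level in ("LOCAL_ONE", "LOCAL_QUORUM", "ALL"):
        print(level, replicas_required(level, 3), tolerated_failures(level, 3))
```

With a replication factor of 3, ALL tolerates no replica failures at all, which is why a single slow or down node turns into write timeouts at that level.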

3. Client-Side Optimizations

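// Driver configuration for write-heavy clients (DataStax Java driver 3.x API)
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.QueryOptions;

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    // Allow up to 8 connections per local node, each multiplexing up to 1024 in-flight requests
    .withPoolingOptions(new PoolingOptions()
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 8)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024))
    // Default to LOCAL_ONE for throughput; override per statement where stronger consistency is needed
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .build();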

Troubleshooting Common Write Issues

High Write Latency

  1. Check for GC pressure and tune JVM settings
  2. Verify adequate hardware resources
  3. Review compaction strategy and pending compactions
  4. Analyze partition size distribution

Write Timeouts

  1. Increase write_request_timeout_in_ms as a short-term mitigation while the underlying cause is investigated
  2. Scale cluster horizontally
  3. Optimize data model to reduce hotspots
  4. Check network connectivity and latency

Memory Pressure

  1. Tune memtable settings
  2. Adjust flush thresholds
  3. Monitor off-heap memory usage
  4. Consider increasing heap size within limits

Performance Testing and Validation

Use cassandra-stress for write performance testing:

# Write performance test
cassandra-stress write n=1000000 -rate threads=100 -node 192.168.1.10
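
To see where throughput stops scaling, the same test is often repeated at several client thread counts. The sketch below only generates the command lines for such a sweep, using the same flags as the command above; the function name stress_commands and the chosen thread counts are illustrative, and the commands still need to be run and compared by hand or by your own tooling.

```python
import shlex


def stress_commands(node, thread_counts, n=1_000_000):
    """Build one cassandra-stress write command line per thread count."""
    commands = []
    for threads in thread_counts:
        argv = [
            "cassandra-stress", "write", f"n={n}",
            "-rate", f"threads={threads}",
            "-node", node,
        ]
        commands.append(shlex.join(argv))   # quote safely for a POSIX shell
    return commands


if __name__ == "__main__":
    for command in stress_commands("192.168.1.10", [50, 100, 200]):
        print(command)
```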

Regular performance testing ensures configuration changes deliver expected improvements and helps identify regressions before they impact production workloads.

By implementing these optimization strategies systematically, Cassandra DBAs can achieve significant improvements in write performance while maintaining cluster stability and data consistency.

About MinervaDB Corporation
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support (24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, Databricks, Amazon Redshift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL with core expertise in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.
