Cassandra Write Performance: Proven Troubleshooting Strategies

Troubleshooting Cassandra Write Performance Bottlenecks: A runbook for Cassandra DBAs

Identifying Write Performance Issues

When Cassandra struggles with write performance, symptoms often surface as increased write latency, timeouts, and diminished throughput. Key indicators include:

High write latency (>10ms for local writes)
WriteTimeoutExceptions in application logs
Increased pending compactions
High CPU utilization during write operations
Memory pressure and frequent garbage collection

Transitioning to Optimization

Once issues are detected, the next step is systematic optimization across the infrastructure, configuration, JVM, schema design, and monitoring layers.

Core Write Performance Optimization Strategies

1. Hardware and Infrastructure Tuning

Storage Configuration:

Use SSD storage for commit logs and data directories
Separate commit log and data directories on different disks
Ensure adequate IOPS capacity (minimum 1000 IOPS per node)
Configure RAID 0 for data directories to maximize throughput

Memory Allocation:

Equally important, allocate memory wisely:

Set heap size to 8-14GB (larger heaps tend to lengthen GC pauses, so test carefully before going above this range)
Allocate 25-50% of total RAM to Cassandra heap
Configure off-heap memory for memtables and caches
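
The two heap guidelines above can pull in different directions on small or very large hosts, so it may help to see them combined. The following is a minimal sketch, assuming the 8-14GB range and the 25-50% band are both treated as hard bounds; the function name `suggest_heap_gb` is hypothetical and the result is a starting point for testing, not a recommendation.

```python
def suggest_heap_gb(total_ram_gb: float) -> float:
    """Suggest a heap size from the guidelines above (a sketch, not a rule)."""
    min_heap_gb = 8    # lower bound of the 8-14GB guideline
    max_heap_gb = 14   # upper bound of the 8-14GB guideline
    quarter = 0.25 * total_ram_gb  # low end of the 25-50% band
    half = 0.50 * total_ram_gb     # high end of the 25-50% band
    # Start from 25% of RAM, raise it to the 8GB floor, then cap it at 14GB
    # and at 50% of RAM so the host keeps room for off-heap data and page cache.
    return min(max(quarter, min_heap_gb), max_heap_gb, half)


for ram in (16, 32, 64, 128):
    print(f"{ram}GB RAM -> {suggest_heap_gb(ram):.0f}GB heap")
```

On hosts with less than 16GB of RAM the 50% cap wins and the result drops below 8GB, which is a hint that the machine is small for a write-heavy node.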

2. Cassandra Configuration Optimization

Fine-tuning cassandra.yaml is crucial. Here are some highlights:

# cassandra.yaml optimizations for write performance

# Commit log settings
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
commitlog_compression:
  - class_name: LZ4Compressor

# Memtable settings
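# offheap_objects moves memtable cell data off the JVM heap, so that
# memtable_offheap_space_in_mb below actually takes effect
memtable_allocation_type: offheap_objects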
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11

# Compaction settings
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 64
compaction_large_partition_warning_threshold_mb: 1000

# Write settings
concurrent_writes: 128
write_request_timeout_in_ms: 10000
batch_size_warn_threshold_in_kb: 64
batch_size_fail_threshold_in_kb: 640
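
With commitlog_sync set to periodic, writes are acknowledged before the commit log is flushed to disk, so a longer sync period trades durability on a single node for throughput. The arithmetic below is a rough sketch of that trade-off; the write rate of 5 MB/s per node is an assumed figure for illustration, and the function name is hypothetical.

```python
def unsynced_window_mb(sync_period_ms: int, write_rate_mb_per_s: float) -> float:
    """Upper bound on commit log data one node has acknowledged but not yet synced."""
    return (sync_period_ms / 1000.0) * write_rate_mb_per_s


# Assumed per-node write rate; replace with a measured value.
ASSUMED_WRITE_RATE_MB_PER_S = 5.0

for period_ms in (1000, 10000):
    at_risk = unsynced_window_mb(period_ms, ASSUMED_WRITE_RATE_MB_PER_S)
    print(f"sync period {period_ms} ms -> up to {at_risk:.0f} MB unsynced per node")
```

Replication usually covers this window, since other replicas hold the same writes, but the number is worth knowing before raising the sync period.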

3. JVM Tuning for Write Performance

Next, adjust JVM options to reduce pause times and optimize GC:

# JVM options for optimal write performance
-Xms8G
-Xmx8G
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=300
-XX:InitiatingHeapOccupancyPercent=70
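# Java 8 (8u131+) only: this experimental flag was removed in later JDKs,
# where container support is on by default; drop both lines on Java 11+
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap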
-Djdk.nio.maxCachedBufferSize=262144
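
The options above pin -Xms and -Xmx to the same value, which avoids heap resizing at runtime. A small check like the following can catch drift between the two flags before a node is restarted. This is a sketch only: the function names are hypothetical, and it assumes the options are available as a list of strings, one flag per entry.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(options, flag):
    """Return the size in bytes given by -Xms or -Xmx, or None if the flag is absent."""
    for opt in options:
        m = re.fullmatch(rf"-{flag}(\d+)([kKmMgG]?)", opt)
        if m:
            return int(m.group(1)) * _UNITS.get(m.group(2).lower(), 1)
    return None


def check_heap_flags(options):
    """Return a list of warnings about the heap flags (empty when they look consistent)."""
    warnings = []
    xms, xmx = heap_bytes(options, "Xms"), heap_bytes(options, "Xmx")
    if xms is None or xmx is None:
        warnings.append("set both -Xms and -Xmx explicitly")
    elif xms != xmx:
        warnings.append("-Xms and -Xmx differ; heap resizing can add pauses")
    return warnings


print(check_heap_flags(["-Xms8G", "-Xmx8G", "-XX:+UseG1GC"]))
print(check_heap_flags(["-Xms4G", "-Xmx8G"]))
```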

4. Schema Design for Write Optimization

A well-designed schema ensures even write distribution:

Partition Key Strategy:

Design partition keys to distribute writes evenly across nodes
Avoid hotspots by using compound partition keys
Limit partition size to under 100MB

Clustering Key Optimization:

Use time-based clustering keys for time-series data
Minimize the number of clustering columns
Consider bucketing strategies for high-volume writes
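
Choosing a bucket width comes down to keeping each partition under the 100MB guideline at the expected write rate. The sketch below estimates raw partition size as rows times average row size and picks the widest bucket that fits. The write rate, row size, and candidate widths are assumptions for illustration, and the estimate ignores compression and storage overhead, so treat it as a first pass.

```python
def estimated_partition_mb(writes_per_sec: float, avg_row_bytes: int, bucket_seconds: int) -> float:
    """Raw data written to one partition during one time bucket, in MB."""
    return writes_per_sec * avg_row_bytes * bucket_seconds / (1024 * 1024)


def widest_safe_bucket_seconds(writes_per_sec, avg_row_bytes, limit_mb=100,
                               candidates=(86400, 3600, 600, 60)):
    """Return the widest candidate bucket that stays under limit_mb, or None."""
    for seconds in candidates:  # ordered widest first
        if estimated_partition_mb(writes_per_sec, avg_row_bytes, seconds) <= limit_mb:
            return seconds
    return None


# Assumed workload: 50 writes/s per partition key, 500 bytes per row.
print(widest_safe_bucket_seconds(50, 500))
print(round(estimated_partition_mb(50, 500, 3600), 1), "MB per hourly bucket")
```

A result of None means even the narrowest candidate is too wide, which usually points to adding another component to the partition key.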

Table Configuration:

CREATE TABLE events (
    partition_key text,
    time_bucket int,
    event_time timestamp,
    data text,
    PRIMARY KEY ((partition_key, time_bucket), event_time)
) WITH
    compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'HOURS', 'compaction_window_size': 1}
    AND gc_grace_seconds = 86400
    AND bloom_filter_fp_chance = 0.01;
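
The table leaves it to the application to fill in time_bucket. One common approach, sketched here under the assumption of one bucket per hour (chosen to line up with the 1-hour compaction window above), is to divide the epoch timestamp by the bucket width. The function name and constant are hypothetical.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 3600  # assumption: hourly buckets


def time_bucket(event_time: datetime, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Map a timezone-aware timestamp to an integer bucket number."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return int(event_time.timestamp()) // bucket_seconds


first = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
same_hour = datetime(2024, 1, 1, 12, 59, 59, tzinfo=timezone.utc)
next_hour = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)

print(time_bucket(first) == time_bucket(same_hour))  # same bucket within the hour
print(time_bucket(next_hour) - time_bucket(first))   # buckets advance by one per hour
```

Hours since the epoch fit comfortably in a CQL int. Readers must compute the same bucket values to find the data, so the bucket width is effectively part of the schema.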

5. Compaction Strategy Optimization

Different workloads call for different strategies:

Size-Tiered Compaction Strategy (STCS):

Best for write-heavy workloads with mixed access patterns
Configure min_threshold and max_threshold based on write volume

Time Window Compaction Strategy (TWCS):

Optimal for time-series data with TTL
Reduces compaction overhead for expired data

Leveled Compaction Strategy (LCS):

Use only for read-heavy workloads
Avoid for write-intensive applications due to write amplification

6. Monitoring and Diagnostics

Proactive monitoring helps detect and prevent issues.

Key Metrics to Monitor:

Write latency percentiles (P95, P99)
Pending compactions count
Memtable flush frequency
Commit log utilization
GC pause times
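
Percentiles matter here because a handful of slow writes can hide behind a healthy average. As an illustration, the sketch below computes nearest-rank percentiles from a list of latency samples. The sample values are made up, and in practice these figures would come from your metrics system instead of hand-collected lists.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up write latencies in milliseconds, with one slow outlier.
latencies_ms = [1.2, 0.9, 1.1, 1.0, 14.5, 1.3, 0.8, 1.0, 2.1, 1.1]

print("P50:", percentile(latencies_ms, 50))
print("P95:", percentile(latencies_ms, 95))
print("P99:", percentile(latencies_ms, 99))
```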

Diagnostic Commands:

Start with nodetool to check table, compaction, and thread pool statistics, then raise the commit log logging level if you need more detail:

# Check write performance metrics
nodetool tablestats keyspace.table
nodetool compactionstats
nodetool tpstats

# Monitor commit log
nodetool getlogginglevels
nodetool setlogginglevel org.apache.cassandra.db.commitlog DEBUG
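
For write problems, the MutationStage row of nodetool tpstats is usually the first thing to read: a growing Pending count suggests the node cannot keep up with incoming writes. The parser below is a sketch that assumes the common six-column layout (pool name, Active, Pending, Completed, Blocked, All time blocked), which can differ between Cassandra versions. The sample text is made up for illustration.

```python
def parse_tpstats(text):
    """Parse thread pool rows from `nodetool tpstats` output into a dict keyed by pool name."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Pool rows are a name followed by five integer columns; skip everything else.
        if len(parts) == 6 and all(p.isdigit() for p in parts[1:]):
            name, active, pending, completed, blocked, all_blocked = parts
            pools[name] = {
                "active": int(active),
                "pending": int(pending),
                "completed": int(completed),
                "blocked": int(blocked),
                "all_time_blocked": int(all_blocked),
            }
    return pools


# Made-up sample in the assumed layout, not output from a real cluster.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                    32       417        9876543         0                 0
ReadStage                         0         0         123456         0                 0
"""

stats = parse_tpstats(SAMPLE)
print("MutationStage pending:", stats["MutationStage"]["pending"])
```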

Advanced Write Performance Techniques

1. Batch Optimization

Use unlogged batches for single partition writes
Limit batch size to 64KB
Avoid cross-partition logged batches in high-throughput scenarios
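
On the client side, these rules amount to grouping rows by partition key and splitting each group before it crosses the size limit. The following sketch shows that grouping with plain Python data. It uses the payload length as a stand-in for the serialized mutation size, which is an approximation, and the function name is hypothetical.

```python
from collections import defaultdict

MAX_BATCH_BYTES = 64 * 1024  # the 64KB limit from the guideline above


def plan_batches(rows, max_bytes=MAX_BATCH_BYTES):
    """Group (partition_key, payload) rows into single-partition batches under max_bytes."""
    by_partition = defaultdict(list)
    for key, payload in rows:
        by_partition[key].append(payload)

    batches = []
    for key, payloads in by_partition.items():
        current, size = [], 0
        for payload in payloads:
            # Close the current batch if adding this payload would exceed the limit.
            if current and size + len(payload) > max_bytes:
                batches.append((key, current))
                current, size = [], 0
            current.append(payload)
            size += len(payload)
        if current:
            batches.append((key, current))
    return batches


rows = [("a", b"x" * 40000), ("a", b"y" * 40000), ("b", b"z" * 10), ("a", b"w" * 100)]
for key, payloads in plan_batches(rows):
    print(key, len(payloads), "rows,", sum(len(p) for p in payloads), "bytes")
```

Each planned batch would then be sent as one unlogged batch. A single payload larger than the limit still ends up alone in its own batch, so oversized rows need separate handling.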

2. Consistency Level Tuning

Use ONE or LOCAL_ONE for maximum write throughput
Consider LOCAL_QUORUM for balanced consistency and performance
Avoid ALL consistency level in production
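
The throughput difference between these levels comes from how many replicas must acknowledge a write before the coordinator responds. A short sketch of that arithmetic follows, where replication_factor is assumed to be the replication factor of the datacenter the level applies to (the local one for LOCAL_ONE and LOCAL_QUORUM). The function names are hypothetical.

```python
def required_acks(consistency: str, replication_factor: int) -> int:
    """Number of replica acknowledgements a write needs at the given consistency level."""
    quorum = replication_factor // 2 + 1
    table = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "QUORUM": quorum,
        "LOCAL_QUORUM": quorum,
        "ALL": replication_factor,
    }
    return table[consistency]


def tolerated_replica_failures(consistency: str, replication_factor: int) -> int:
    """How many replicas can be down while writes at this level still succeed."""
    return replication_factor - required_acks(consistency, replication_factor)


for level in ("LOCAL_ONE", "LOCAL_QUORUM", "ALL"):
    print(level, required_acks(level, 3), tolerated_replica_failures(level, 3))
```

With a replication factor of 3, ALL tolerates no replica failures, which is why it is a poor fit for production writes.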

3. Client-Side Optimizations

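// Driver configuration for write-heavy clients (DataStax Java driver 3.x API)
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;
import com.datastax.driver.core.QueryOptions;

Cluster cluster = Cluster.builder()
    .addContactPoint("127.0.0.1")
    .withPoolingOptions(new PoolingOptions()
        // Up to 8 connections per local host, each carrying up to 1024 concurrent requests
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 8)
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024))
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .build();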
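
The two pooling settings multiply: together they cap how many requests a client can have in flight to each host. The arithmetic is sketched below as a rough upper bound. The node count of 6 is an assumed figure, and the function name is hypothetical.

```python
def max_in_flight_per_host(max_connections: int, max_requests_per_connection: int) -> int:
    """Upper bound on concurrent requests one client can have open to one host."""
    return max_connections * max_requests_per_connection


per_host = max_in_flight_per_host(8, 1024)
local_nodes = 6  # assumed number of nodes in the local datacenter

print("per host:", per_host)
print("per client across local nodes:", per_host * local_nodes)
```

If this ceiling is far above what the cluster can absorb, application-side throttling matters more than raising the pool limits further.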

Troubleshooting Common Write Issues

High Write Latency

Check for GC pressure and tune JVM settings
Verify adequate hardware resources
Review compaction strategy and pending compactions
Analyze partition size distribution

Write Timeouts

Increase write_request_timeout_in_ms
Scale cluster horizontally
Optimize data model to reduce hotspots
Check network connectivity and latency

Memory Pressure

Tune memtable settings
Adjust flush thresholds
Monitor off-heap memory usage
Consider increasing heap size within limits

Performance Testing and Validation

Use cassandra-stress for write performance testing:

# Write performance test
cassandra-stress write n=1000000 -rate threads=100 -node 192.168.1.10

Regular performance testing ensures configuration changes deliver expected improvements and helps identify regressions before they impact production workloads.

By implementing these optimization strategies systematically, Cassandra DBAs can achieve significant improvements in write performance while maintaining cluster stability and data consistency.

FAQs

Q1: How do I reduce write latency in Cassandra?
A: Tune JVM, optimize commit log sync, and balance write load via partition keys.

Q2: What’s the best consistency level for high write throughput?
A: LOCAL_ONE gives the highest write throughput; LOCAL_QUORUM trades some of it for stronger consistency.

Q3: Should I use STCS or TWCS for time-series workloads?
A: Use TWCS for time-series with TTL to minimize compaction load.

Q4: How do I monitor write performance metrics?
A: Use nodetool tablestats, compactionstats, and tpstats to track write latency, pending compactions, and thread pool backlog.

Q5: What’s the impact of logged vs. unlogged batches?
A: Logged batches add coordination overhead; use unlogged batches for single-partition writes to avoid it.

