Mastering Azure Cosmos DB: Performance, Query, and Cost Optimization



Azure Cosmos DB is Microsoft’s globally distributed, multi-model database service designed for high availability, low latency, and seamless scalability. As organizations increasingly rely on real-time data processing and global application reach, optimizing Cosmos DB becomes critical to achieving peak performance while managing operational costs. This guide dives deep into tuning Cosmos DB for optimal performance, refining query efficiency, and implementing cost-effective strategies.

Understanding Cosmos DB Architecture

Before delving into optimization techniques, it’s essential to understand the core architectural components of Cosmos DB. At its foundation lies a globally replicated, partitioned database engine that supports multiple APIs, including SQL (Core), MongoDB, Cassandra, Gremlin, and Table. While this article focuses primarily on the SQL API, many principles apply across other interfaces.

Cosmos DB operates on a resource model measured in Request Units (RUs). Every operation—read, write, query, or metadata lookup—consumes a certain number of RUs based on factors like item size, indexing overhead, and query complexity. Throughput is provisioned in RU/s, and understanding how operations consume RUs is central to performance and cost management.

Data in Cosmos DB is stored in containers, which are logical groupings of items. Containers scale by using a partition key to distribute data across physical partitions, and each logical partition can hold up to 20 GB of data. Choosing the right partition key is therefore one of the most impactful decisions in Cosmos DB design.

The Role of the Partition Key

The partition key determines how data is distributed and accessed. An ideal partition key exhibits high cardinality, even distribution of data and workload, and supports common query patterns. Poorly chosen keys can lead to “hot” partitions—those handling disproportionate traffic—which degrade performance and increase latency.

For example, using a user ID as a partition key in a multi-tenant application ensures that each user’s data resides in a predictable location, enabling efficient point reads. Conversely, using a timestamp or status field with low cardinality can result in skewed data distribution, especially if many items share the same value.

Partitioning also affects query execution. Queries that include the partition key in the filter condition are routed directly to the relevant partition, minimizing latency and RU consumption. Cross-partition queries, while supported, require fan-out across all partitions and are inherently more expensive.
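
As a rough illustration, here is a minimal .NET SDK (v3) sketch, assuming an existing Container instance named container partitioned on /userId; the query and values are placeholders:

using Microsoft.Azure.Cosmos;

// Scoped query: the SDK routes this to the single partition for "user123"
FeedIterator<dynamic> scoped = container.GetItemQueryIterator<dynamic>(
    new QueryDefinition("SELECT * FROM c WHERE c.userId = @uid")
        .WithParameter("@uid", "user123"),
    requestOptions: new QueryRequestOptions { PartitionKey = new PartitionKey("user123") });

// The same query without the PartitionKey option fans out to every physical
// partition, consuming more RUs and adding latency.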

Performance Tuning Strategies

Optimizing performance in Cosmos DB involves a combination of schema design, indexing policies, throughput configuration, and client-side practices.

1. Schema Design and Data Modeling

Unlike traditional relational databases, Cosmos DB is schema-agnostic, allowing flexible JSON documents. However, this flexibility demands careful modeling to avoid anti-patterns.

Denormalization Over Joins: Since joins in Cosmos DB are executed server-side and can be costly, it’s often better to denormalize data. Embedding related entities within a single document reduces the need for multiple queries. For instance, in an e-commerce application, including product details within an order document eliminates the need to join with a products collection.
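
As a hypothetical illustration (type and property names are assumptions, not a prescribed schema), a denormalized order in C# might look like this:

using System.Collections.Generic;

// Product details are copied into the order at write time, so reading an
// order never requires a second query against a products container.
public class OrderItem
{
    public string ProductId { get; set; }
    public string Name { get; set; }     // snapshot of the product name
    public decimal Price { get; set; }   // price as sold
}

public class Order
{
    public string id { get; set; }       // Cosmos DB items require a lowercase "id"
    public string userId { get; set; }   // assumed partition key
    public List<OrderItem> Items { get; set; }
}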

Document Size Management: Large documents increase RU consumption for both reads and writes. A point read of a 1 KB document costs approximately 1 RU at session consistency (strong and bounded staleness reads cost roughly double), and the charge grows with size: a 10 KB document costs around 10 RUs. Keeping documents lean, ideally under 1 KB for frequently accessed data, improves efficiency.

Avoiding Large Property Names: While seemingly minor, verbose property names contribute to payload size. Using concise names like “uid” instead of “userId” or “ts” instead of “timestamp” reduces storage and transfer overhead, especially at scale.

2. Indexing Policies

By default, Cosmos DB automatically indexes every property in a document, enabling flexible querying. However, this comes at a cost: indexing increases storage usage and write latency, as every write must update the index.

Excluding Unnecessary Properties: If certain properties are never queried, they should be excluded from the index. For example, audit logs or telemetry data may contain fields used only for downstream processing. Marking these as “Excluded” in the indexing policy reduces indexing overhead.

{
  "indexingMode": "consistent",
  "includedPaths": [
    { "path": "/*" }
  ],
  "excludedPaths": [
    { "path": "/telemetryLogs/*" },
    { "path": "/debugInfo/?" }
  ]
}

Including Specific Paths: Conversely, for queries targeting specific paths, an inclusion policy can improve precision. This is particularly useful when most properties don’t need indexing, allowing fine-grained control over which fields are indexed.

Composite Indexes for Range Queries: When queries filter on multiple properties with range conditions (e.g., ORDER BY with multiple fields), composite indexes are required. Defining composite indexes on frequently queried field combinations improves sort performance and avoids in-memory sorting, which is RU-intensive.

For example, a query filtering by date range and sorting by price benefits from a composite index on (date ASC, price ASC). Without it, Cosmos DB may perform a full scan and sort results in memory, consuming significantly more RUs.
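
A sketch of defining such a composite index with the .NET SDK at container-creation time, assuming an existing Database instance named database (container and path names are placeholders):

using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

var props = new ContainerProperties(id: "orders", partitionKeyPath: "/userId");
props.IndexingPolicy.CompositeIndexes.Add(new Collection<CompositePath>
{
    new CompositePath { Path = "/date",  Order = CompositePathSortOrder.Ascending },
    new CompositePath { Path = "/price", Order = CompositePathSortOrder.Ascending }
});
Container container = await database.CreateContainerIfNotExistsAsync(props);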

Spatial and Custom Indexes: For geospatial queries, enabling spatial indexes on geometry fields accelerates distance-based searches. Similarly, custom indexing policies can be defined for specialized data types or access patterns.

3. Throughput Configuration

Cosmos DB offers two throughput models: provisioned and serverless. Provisioned throughput is suitable for predictable workloads, while serverless handles sporadic traffic with pay-per-request pricing.

Autoscale vs. Manual Provisioning: Autoscale automatically adjusts RU/s based on traffic, scaling instantly between 10% and 100% of a configured maximum. For stable workloads, manual provisioning may be more cost-effective, but autoscale provides headroom during traffic spikes without manual intervention.

Reserved Capacity Discounts: Committing to 1-year or 3-year reservations can reduce throughput costs substantially (Microsoft advertises savings of up to roughly 65% for 3-year terms) compared to pay-as-you-go pricing. This is ideal for production workloads with consistent usage.

4. Consistency Levels and Latency

Cosmos DB supports five consistency levels: strong, bounded staleness, session, consistent prefix, and eventual. Each offers a trade-off between consistency, availability, and performance.

Session Consistency as Default: For most applications, session consistency provides a good balance, guaranteeing read-your-writes, monotonic reads, and monotonic writes within a session while maintaining low latency. It’s the default and recommended setting for web and mobile apps.

Strong Consistency Overhead: Strong consistency ensures linearizability but incurs higher latency and RU costs due to coordination across replicas. It should be used sparingly, only when strictly required (e.g., financial transactions).

Multi-Region Writes: Enabling multi-region writes allows clients to write to the nearest region, reducing latency. However, it increases complexity in conflict resolution and may require custom conflict resolution policies.

5. Client-Side Optimization

The application layer plays a crucial role in performance. Using the latest SDKs, configuring connection policies, and leveraging caching can significantly improve efficiency.

Direct Mode vs. Gateway Mode: The .NET and Java SDKs support direct connectivity (TCP with TLS), which bypasses the gateway and reduces latency. Gateway mode (HTTPS) is simpler and works through restrictive firewalls but adds a network hop. Direct mode is recommended for low-latency scenarios.

Connection Pooling and Retry Policies: Reusing TCP connections and implementing exponential backoff for throttled requests (429 errors) prevents connection exhaustion and improves resilience. The SDKs include built-in retry logic, but custom policies can be tuned based on workload characteristics.
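
A minimal client-configuration sketch covering both points (the endpoint, key, and retry values are placeholders, not recommendations):

using System;
using Microsoft.Azure.Cosmos;

var client = new CosmosClient(
    "https://myaccount.documents.azure.com:443/",  // placeholder endpoint
    "<primary-key>",                               // placeholder credential
    new CosmosClientOptions
    {
        ConnectionMode = ConnectionMode.Direct,           // TCP path, skips the gateway
        MaxRetryAttemptsOnRateLimitedRequests = 9,        // built-in 429 retry tuning
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
    });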

Bulk Operations: When ingesting large volumes of data, enabling bulk mode in the SDK allows the client to group concurrent operations into fewer, larger requests per partition. For large ingestion jobs this typically improves write throughput severalfold compared to issuing operations sequentially.
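
A bulk-ingestion sketch, assuming an in-memory collection named orders using the hypothetical Order model sketched earlier, plus placeholder database and container names:

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

var bulkClient = new CosmosClient(endpoint, key,
    new CosmosClientOptions { AllowBulkExecution = true });
Container bulkContainer = bulkClient.GetContainer("mydb", "orders");

// With bulk mode enabled, the SDK transparently groups these concurrent
// operations into batched requests per partition.
var tasks = new List<Task>();
foreach (var order in orders)
{
    tasks.Add(bulkContainer.CreateItemAsync(order, new PartitionKey(order.userId)));
}
await Task.WhenAll(tasks);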

Query Optimization Techniques

Efficient querying is central to Cosmos DB performance. Poorly written queries can consume excessive RUs, increase latency, and strain system resources.

1. Writing Efficient SQL Queries

Cosmos DB uses a SQL-like language for querying JSON data. While familiar, it has nuances that impact performance.

Filter Early with WHERE Clauses: Always apply filters as early as possible. Use equality conditions on the partition key to target specific partitions. For example:

SELECT * FROM c WHERE c.partitionKey = 'user123' AND c.status = 'active'

This query targets a single partition and uses an indexed property for filtering.

Avoid Functions in Filters: Applying functions like UPPER(), LOWER(), or mathematical operations in WHERE clauses prevents index usage. Instead, store data in a query-friendly format. For case-insensitive searches, store a normalized version of the field (e.g., “emailLower”) and index it.
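
A small sketch of that pattern (property names are illustrative; assumes a container partitioned on /id):

// Normalize once at write time...
var user = new { id = "u1", email = "Jane@Example.com",
                 emailLower = "jane@example.com" };
await container.CreateItemAsync(user, new PartitionKey("u1"));

// ...so the filter stays a plain, index-served equality comparison
var query = new QueryDefinition("SELECT * FROM c WHERE c.emailLower = @e")
    .WithParameter("@e", "Jane@Example.com".ToLowerInvariant());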

Use BETWEEN for Range Queries: When querying ranges, BETWEEN is equivalent to combining separate >= and <= conditions; both forms are served from the range index, and BETWEEN keeps the predicate concise.

SELECT * FROM c WHERE c.timestamp BETWEEN '2025-01-01' AND '2025-12-31'

2. Minimizing Result Set Size

Large result sets increase network transfer time and RU consumption.

Projection with SELECT: Only retrieve needed fields using SELECT. Avoid SELECT * unless all fields are required.

SELECT c.id, c.name, c.createdAt FROM c WHERE c.category = 'electronics'

Pagination with Continuation Tokens: For large datasets, use OFFSET LIMIT sparingly, as it still scans skipped items. Instead, rely on continuation tokens returned by the SDK to page through results efficiently.

var query = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c",
    requestOptions: new QueryRequestOptions { MaxItemCount = 100 });

while (query.HasMoreResults)
{
    var response = await query.ReadNextAsync();
    // response.ContinuationToken can be persisted and passed back to
    // GetItemQueryIterator to resume paging in a later request
    foreach (var item in response)
    {
        // Process each item in the current page
    }
}

3. Handling Joins and Subqueries

Joins in Cosmos DB are expensive because they require cross-document operations.

Prefer Denormalized Data: As mentioned earlier, embedding related data avoids joins. If joins are unavoidable, ensure the joined collections are filtered as narrowly as possible.

Use EXISTS for Existence Checks: When checking for the presence of related items, use EXISTS instead of JOIN. EXISTS stops at the first match and is more efficient.

SELECT c.id, c.name 
FROM customers c 
WHERE EXISTS (
  SELECT VALUE r FROM r IN c.reviews WHERE r.rating > 4
)

4. Aggregation Queries

Cosmos DB supports aggregations like COUNT, SUM, MIN, MAX, and AVG. However, these operate over the entire result set and can be costly.

Filter Before Aggregating: Always apply WHERE conditions to reduce the dataset before aggregation.

SELECT COUNT(1) FROM c WHERE c.status = 'completed'

Use Exact Aggregates Judiciously: Aggregates in Cosmos DB return exact results, not approximations, and a cross-partition COUNT or SUM must visit every matching item. Where precise real-time numbers aren’t critical, serve periodically refreshed, slightly stale values instead of aggregating on every request.

Materialize Aggregates: For frequently accessed aggregates (e.g., daily order counts), consider precomputing and storing them in a separate document. This shifts the cost to write time and enables instant reads.
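
For example, a counter document could be maintained with a partial-document patch (available in .NET SDK v3.23 and later; the document id and shape are assumptions):

using Microsoft.Azure.Cosmos;

// Increment a precomputed daily order counter on each new order
await container.PatchItemAsync<dynamic>(
    id: "ordercount-2025-01-15",
    partitionKey: new PartitionKey("ordercount-2025-01-15"),
    patchOperations: new[] { PatchOperation.Increment("/count", 1) });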

5. Query Execution Plans and Metrics

Understanding how queries execute helps identify bottlenecks.

Request Charge Monitoring: Every query response includes an RU charge. Monitoring this helps compare query efficiency and identify expensive operations.
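
A sketch of summing the charge across result pages with the .NET SDK:

double totalRu = 0;
FeedIterator<dynamic> it = container.GetItemQueryIterator<dynamic>(
    "SELECT * FROM c WHERE c.status = 'active'");

while (it.HasMoreResults)
{
    FeedResponse<dynamic> page = await it.ReadNextAsync();
    totalRu += page.RequestCharge;   // RU charge for this page of results
}
Console.WriteLine($"Query consumed {totalRu} RUs");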

Diagnostic Logging: Enable Azure Monitor logs to capture query metrics, including execution time, RU consumption, and partition statistics. Analyzing slow queries reveals patterns like full scans or in-memory sorts.

Use of Cross-Partition Queries: While sometimes necessary, cross-partition queries should be minimized. If unavoidable, cap the result set with TOP or OFFSET LIMIT to reduce the data returned and the RU cost.

Cost Optimization Strategies

Performance gains are meaningless if they come at an unsustainable cost. Cosmos DB pricing is based on provisioned throughput, storage, and data transfer, making cost optimization a multi-dimensional challenge.

1. Right-Sizing Throughput

Over-provisioning RUs is a common cause of overspending.

Monitor and Scale Based on Usage: Use Azure Monitor to track RU utilization. If average consumption is consistently below 70% of provisioned throughput, consider scaling down. Conversely, frequent 429 errors indicate a need for higher throughput.

Use Autoscale for Variable Workloads: Autoscale adjusts capacity dynamically, ensuring you pay only for what you use during peak times while maintaining baseline performance.

Leverage Serverless for Sporadic Traffic: Serverless mode charges per request, making it ideal for development, testing, or applications with unpredictable traffic. It eliminates the need to provision baseline throughput.

2. Storage Optimization

Storage costs accumulate over time, especially with large documents or high ingestion rates.

Implement Time-to-Live (TTL): Set TTL on documents that don’t need to persist indefinitely. Logs, sessions, and temporary data can be automatically purged, reducing storage and indexing overhead.
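
A sketch of both TTL levels, container default and per-item override (names and durations are placeholders):

using Microsoft.Azure.Cosmos;

// Container-level default: items expire 30 days after their last write
var props = new ContainerProperties("sessions", "/userId")
{
    DefaultTimeToLive = 30 * 24 * 60 * 60   // seconds
};
await database.CreateContainerIfNotExistsAsync(props);

// Item-level override: this document expires after one hour
var session = new { id = "s1", userId = "u1", ttl = 3600 };
await container.CreateItemAsync(session, new PartitionKey("u1"));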

Compress Large Fields: For large text or binary data (e.g., JSON blobs, logs), consider compressing the content before storage. While this adds CPU overhead, it reduces RU costs for reads and writes.

Archive Cold Data: Move infrequently accessed data to cheaper storage tiers like Azure Blob Storage. Use Cosmos DB change feed to detect and migrate stale records automatically.

3. Data Transfer Costs

Data transfer between regions and to external services incurs charges.

Co-Locate Applications and Cosmos DB: Deploy applications in the same Azure region as the primary Cosmos DB endpoint to avoid egress fees.

Use Multi-Homing for Global Apps: For globally distributed applications, enable multi-homing in the SDK to route requests to the nearest region, reducing latency and cross-region data transfer.

Minimize Chatty Patterns: Reduce the number of round trips by batching operations or using stored procedures. Each request, even small ones, has a minimum RU cost.
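
When the writes share a logical partition, one way to batch them is the SDK’s TransactionalBatch, which sends a single atomic request; a sketch with placeholder ids:

using Microsoft.Azure.Cosmos;

TransactionalBatchResponse response = await container
    .CreateTransactionalBatch(new PartitionKey("user123"))
    .CreateItem(new { id = "order-1", userId = "user123" })
    .CreateItem(new { id = "order-2", userId = "user123" })
    .ExecuteAsync();   // one round trip; all operations succeed or fail together

if (!response.IsSuccessStatusCode)
{
    // Inspect the per-operation results to find the failing write
}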

4. Reserved Capacity and Commitments

As mentioned earlier, reserved capacity offers significant discounts. Planning for long-term usage and committing to reservations can drastically reduce monthly bills.

Analyze Historical Usage: Before purchasing a reservation, analyze 30–90 days of usage patterns to estimate baseline throughput. Reservations are most cost-effective when utilization exceeds 50%.

Combine with Autoscale: Reserved capacity applies to autoscale throughput as well. Reserving 10,000 RU/s with autoscale allows scaling from 1,000 to 10,000 RU/s within the reserved tier, maximizing discount benefits.

5. Monitoring and Governance

Proactive monitoring prevents cost overruns.

Set Budgets and Alerts: Use Azure Cost Management to set monthly budgets and receive alerts when thresholds are exceeded.

Tag Resources: Apply tags to Cosmos DB accounts for chargeback and cost allocation. Tags like “environment=prod”, “team=backend”, or “project=ecommerce” enable detailed cost reporting.

Regular Audits: Periodically review indexing policies, container configurations, and query patterns. Remove unused containers, adjust partition keys if needed, and optimize queries based on performance data.

Advanced Optimization Scenarios

Beyond foundational practices, certain advanced techniques can further enhance performance and efficiency.

1. Change Feed for Event-Driven Architectures

The change feed provides a sorted, partitioned log of inserts and updates. It’s ideal for building event-driven systems, such as updating search indexes, triggering workflows, or syncing with analytics platforms.

Use Incremental Processing: Process the change feed incrementally using checkpoints. This ensures no events are missed and allows scaling processing across multiple instances.

Filter Change Feed with Predicates: While the change feed doesn’t support SQL filtering, you can filter in your processor logic. However, minimizing the volume of processed changes improves efficiency.
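
A minimal change feed processor sketch (the processor name, instance name, and leaseContainer are placeholders), with filtering done inside the handler as described above:

using System.Collections.Generic;
using System.Threading;
using Microsoft.Azure.Cosmos;

ChangeFeedProcessor processor = container
    .GetChangeFeedProcessorBuilder<dynamic>(
        processorName: "orderSync",
        onChangesDelegate: async (IReadOnlyCollection<dynamic> changes, CancellationToken ct) =>
        {
            foreach (var change in changes)
            {
                // Apply predicate filtering here, e.g. route only completed orders
            }
        })
    .WithInstanceName("worker-1")         // unique name per compute instance
    .WithLeaseContainer(leaseContainer)   // lease container persists checkpoints
    .Build();

await processor.StartAsync();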

2. Stored Procedures and Transactions

Cosmos DB supports JavaScript-based stored procedures, triggers, and user-defined functions (UDFs). These run within the database engine and can perform atomic transactions across documents in the same partition.

Use for Complex Writes: When multiple documents must be updated atomically (e.g., transferring funds between accounts), a stored procedure ensures consistency.
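
For instance, invoking such a stored procedure from C# (the procedure id “transferFunds” and its parameters are hypothetical):

using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Scripts;

StoredProcedureExecuteResponse<string> result =
    await container.Scripts.ExecuteStoredProcedureAsync<string>(
        storedProcedureId: "transferFunds",
        partitionKey: new PartitionKey("account-group-42"),
        parameters: new dynamic[] { "acct-1", "acct-2", 100.00m });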

Avoid Long-Running Scripts: Stored procedures are limited by RU budget and execution time. Complex logic should be broken into smaller operations or moved to the application layer.

3. Caching Strategies

While Cosmos DB is fast, caching frequently accessed data in memory can reduce RU consumption and latency.

Use Azure Cache for Redis: Deploy Redis as a caching layer for hot data. Implement a cache-aside pattern where the application checks the cache before querying Cosmos DB.
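
A cache-aside sketch with StackExchange.Redis (the key format, 5-minute expiry, and Order type are assumptions; redis is an existing ConnectionMultiplexer):

using System;
using System.Text.Json;
using Microsoft.Azure.Cosmos;
using StackExchange.Redis;

IDatabase cache = redis.GetDatabase();
string key = $"order:{orderId}";

// 1. Check the cache first
RedisValue cached = await cache.StringGetAsync(key);
if (cached.HasValue)
    return JsonSerializer.Deserialize<Order>(cached);

// 2. Cache miss: point read from Cosmos DB, then populate the cache
ItemResponse<Order> item = await container.ReadItemAsync<Order>(
    orderId, new PartitionKey(userId));
await cache.StringSetAsync(key, JsonSerializer.Serialize(item.Resource),
    TimeSpan.FromMinutes(5));   // assumed expiry
return item.Resource;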

Handle Cache Invalidation: Use the change feed to invalidate or update cache entries when underlying data changes, ensuring consistency.

4. Monitoring and Diagnostics

Effective optimization requires visibility.

Enable Diagnostic Settings: Stream logs to Log Analytics, Storage, or Event Hubs. Key metrics include:

  • Request Rate and RU Consumption: Identify traffic patterns and peak loads.
  • Throttling (429 Errors): Indicates insufficient throughput.
  • Query Performance: Detect slow queries and high RU operations.
  • Replication Latency: Monitor for consistency and availability issues.

Use Query Metrics: Enable query metrics in the SDK to get detailed breakdowns of RU usage per operation, index lookup time, and document loading cost.

Real-World Optimization Example

Consider a global e-commerce platform using Cosmos DB to store product catalogs, user profiles, and order history.

Challenge: Order queries by user are slow, and monthly costs are rising due to high RU consumption.

Analysis: Queries for user orders are cross-partition, scanning all partitions. The order documents include full product details, making them large (~5 KB). TTL is not enabled, and logs are retained indefinitely.

Optimization Steps:

  1. Refactor Partition Key: Change the orders container’s partition key from “orderId” to “userId” by migrating data into a new container (partition keys cannot be changed in place; the change feed can drive the migration). This enables single-partition queries for a user’s orders.
  2. Denormalize Selectively: Keep only essential product fields (ID, name, price) in the order document, reducing size to ~1 KB.
  3. Enable TTL: Set TTL to 365 days for orders and 30 days for logs.
  4. Optimize Indexing: Exclude debug and audit fields from indexing.
  5. Implement Caching: Cache recent orders for active users in Redis.
  6. Right-Size Throughput: Based on monitoring, reduce provisioned RUs by 40% and enable autoscale.

Result: Query latency drops from 200 ms to 20 ms, RU consumption per query decreases by 75%, and monthly costs are reduced by 50%.

Conclusion

Optimizing Azure Cosmos DB is an ongoing process that requires a holistic approach—balancing performance, query efficiency, and cost. By carefully designing data models, tuning indexing policies, writing efficient queries, and leveraging Cosmos DB’s scalability features, organizations can build high-performance applications without overspending.

The key is to measure, monitor, and iterate. Use built-in metrics to guide decisions, test changes in non-production environments, and stay informed about new features and best practices. With the right strategies, Cosmos DB can deliver exceptional performance and value at global scale.
