Introduction to Azure Cosmos DB Partitioning

In this article, we will take a look at Azure Cosmos DB partitioning, what it is, and how it works.

Azure Cosmos DB Overview

Azure Cosmos DB is a fully managed NoSQL database service offered by Microsoft for modern application development. This means that Azure Cosmos DB does not enforce any specific database schema, it scales automatically as our performance and storage needs grow, and we, the customers, do not need to be concerned about this from an operational perspective.

I would highly recommend that you read the official Microsoft documentation for Azure Cosmos DB if you have not done so already.

Why do we need to partition at all?

Partitioning in this context refers to how our data is divided up, and doing it well improves performance and reduces cost. But why do we need to partition at all? Azure Cosmos DB is a distributed system, which means that it makes use of multiple smaller systems working together (scale-out), rather than a single system that needs to be scaled up for performance gains. Given that Cosmos DB is distributed, our data is stored across these multiple smaller systems. As we would expect, each machine (partition) has its own limits (storage, compute, networking, etc.), and we want to ensure that when we read and write data, the load is distributed evenly (or as evenly as possible) across all partitions.

Azure Cosmos DB uses both logical partitions and physical partitions. The differences are explained below:

  • Logical Partition - Stores all data associated with the same partition key value (we will look at what partition keys are shortly).

  • Physical Partition - The actual machines, which consist of SSD-backed storage and compute power.

We don't need to worry about creating these partitions (logical or physical) manually, as Azure does this for us. However, we are responsible for choosing sensible partition keys that help us reduce our Azure spend while improving our application's performance.

The figure below depicts a simple representation of Cosmos DB's logical and physical partitions and how they relate to each other.

[Figure: azure-cosmosdb-partitioning-figure1.png]
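To make the relationship between logical and physical partitions more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: Cosmos DB's real hashing and placement logic is internal to the service, so the SHA-256 hash, the modulo placement, the count of three physical partitions, and the generated device IDs are all hypothetical stand-ins.

```python
import hashlib
from collections import defaultdict

# Hypothetical number of physical partitions; in the real service
# Azure decides this for us and changes it as the data grows.
PHYSICAL_PARTITION_COUNT = 3


def physical_partition_for(partition_key_value: str,
                           partition_count: int = PHYSICAL_PARTITION_COUNT) -> int:
    """Map a partition key value to a physical partition index.

    Stand-in logic: hash the value and take it modulo the partition count.
    This is NOT the algorithm Cosmos DB uses; it only shows the idea that
    the same key value always lands in the same place.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def place_logical_partitions(key_values, partition_count=PHYSICAL_PARTITION_COUNT):
    """Group logical partitions (one per key value) by physical partition."""
    placement = defaultdict(set)
    for value in key_values:
        placement[physical_partition_for(value, partition_count)].add(value)
    return dict(placement)


# 1,000 made-up device IDs, i.e. 1,000 logical partitions.
device_ids = [f"d-{n:04d}" for n in range(1000)]
placement = place_logical_partitions(device_ids)

for physical_id in sorted(placement):
    print(f"physical partition {physical_id}: "
          f"{len(placement[physical_id])} logical partitions")
```

Each physical partition ends up hosting many logical partitions, and a given key value always maps to the same physical partition, which is the relationship the figure describes.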

How do we actually partition our data?

In the previous section we looked at partitioning, why it is important to Azure Cosmos DB, and how it affects us as consumers of the product. In this section, we will explore how we actually partition our data. Remember, the goal here is not to partition for partitioning's sake, as choosing the wrong partition strategy can result in higher costs and lower performance for our applications.

When we set out to partition our data, or to develop the right partition strategy for our data and the nature of our application, we have control only over the logical partitions. What I mean by this is that for each unique partition key value that we have, Azure will create a logical partition which will store the corresponding documents. Let us look at an example.

{
    "id": "c27ccb94-8e5c-4059-995f-e92e62a2403a",
    "device_id": "d-7827312-c9283",
    "status": "failing",
    "message": "sensor health check failed",
    "priority": 3,
    "_rid": "r29SAMRxaPECAAAAAAAAAA==",
    "_self": "dbs/r29SAA==/colls/r29SAMRxaPE=/docs/r29SAMRxaPECAAAAAAAAAA==/",
    "_etag": "\"0000fa0e-0000-0100-0000-6178e6b80000\"",
    "_attachments": "attachments/",
    "_ts": 1635313336
}

In the above example, we have an Azure Cosmos DB document whose partition key is configured to be the device_id property. This means that documents with "device_id": "d-7827312-c9283" are going to be stored in the same logical partition, and each individual document within a partition can be uniquely identified by the document's id property.
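The grouping described above can be sketched in a few lines of Python. This is an in-memory illustration only, not the Cosmos DB SDK: the helper names are hypothetical, and the second and third documents (including their ids and device IDs) are made up for the example.

```python
from collections import defaultdict

# Assumed partition key name, matching the example document above.
PARTITION_KEY_NAME = "device_id"

documents = [
    # Trimmed version of the document shown above.
    {"id": "c27ccb94-8e5c-4059-995f-e92e62a2403a",
     "device_id": "d-7827312-c9283", "status": "failing"},
    # Hypothetical extra documents, invented for illustration.
    {"id": "hypothetical-id-2",
     "device_id": "d-7827312-c9283", "status": "healthy"},
    {"id": "hypothetical-id-3",
     "device_id": "d-0000001-a0001", "status": "healthy"},
]


def group_into_logical_partitions(docs, key_name=PARTITION_KEY_NAME):
    """Return {partition key value: {document id: document}}."""
    partitions = defaultdict(dict)
    for doc in docs:
        partitions[doc[key_name]][doc["id"]] = doc
    return dict(partitions)


def point_read(partitions, partition_key_value, document_id):
    """Look up one document by its partition key value plus its id."""
    return partitions[partition_key_value][document_id]


logical_partitions = group_into_logical_partitions(documents)
doc = point_read(logical_partitions,
                 "d-7827312-c9283",
                 "c27ccb94-8e5c-4059-995f-e92e62a2403a")
print(len(logical_partitions), doc["status"])
```

Two documents share the device_id d-7827312-c9283, so they sit in the same logical partition, and the pair of partition key value and id is what pins down a single document.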

If we look at the bigger picture now, this means our Azure Cosmos DB service will create a new logical partition for each unique device_id value. Some other things to consider are:

  • A partition key has two parts: a partition key name and a partition key value. In our example above, these are device_id and d-7827312-c9283.

  • Your partition key should be a property that does not change during the lifetime of a document, as you cannot update this value.

  • Your partition key should have a wide range of possible values (high cardinality).
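One rough way to compare candidate partition keys against the cardinality guideline is to count distinct values and check how much of the data the largest value would hold. The Python sketch below assumes a made-up sample of 500 documents and a hypothetical helper named partition_key_stats; it is a back-of-the-envelope check, not a Cosmos DB feature.

```python
from collections import Counter


def partition_key_stats(docs, key_name):
    """Summarise how a candidate partition key would spread the documents."""
    counts = Counter(doc[key_name] for doc in docs)
    total = sum(counts.values())
    return {
        # Number of logical partitions this key would produce.
        "distinct_values": len(counts),
        # Fraction of all documents held by the single largest partition.
        "largest_share": max(counts.values()) / total,
    }


# Made-up sample: 500 readings spread across 50 devices.
sample = [
    {
        "device_id": f"d-{n % 50:03d}",
        "status": "failing" if n % 10 == 0 else "healthy",
        "priority": n % 3 + 1,
    }
    for n in range(500)
]

for candidate in ("device_id", "status", "priority"):
    print(candidate, partition_key_stats(sample, candidate))
```

In this sample, device_id yields 50 logical partitions of equal size, whereas status yields only two, with one of them holding 90% of the documents. That is why a property like device_id is the stronger candidate here.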

Conclusion

Azure Cosmos DB has been built from the ground up to support applications with immense scalability and performance needs. However, we have the responsibility of ensuring that we select a "good" partition key that will allow Cosmos DB to distribute the load across multiple logical and physical partitions.

If you want to learn more about Azure Cosmos DB, please check out the Microsoft documentation and stay tuned for more posts in this series.