THE EVOLUTION OF DB DESIGN
The evolution of database design in Cosmos DB forces us to rethink our traditional relational structure.
The biggest cost factor in databases used to be storage, but today latency and reliability across geographic locations are bigger concerns.
In a typical relational database you have a table for customers and a table for customer addresses. Your primary key in one table is a foreign key in another.
These structures ensure that your master data tables or transactional tables take up as little space as possible. They are easier to search, easier to backup and restore. In Azure Cosmos DB the cost for storage is nominal, and Cosmos indexes everything! Now the bigger concern is designing your data structure for throughput. The key is to minimize the effort required to report on your data to minimize the request units (RUs) used. Now we think about references or embedded data. We have to think about how people will use our data.
QUESTIONS TO ASK
The first question we have to ask is what is our partition based on? What is an important unique record? This will determine when we need a new document.
Next, how can we deliver everything we need from this single document? We can embed up to 2mb worth of data in a single document. Embedding data in a single document minimizes touchpoints for your query, and reduces your costs.
TECHNICAL DOCUMENTATION
For more about Containers and Documents in Cosmos DB see the documentation online from Microsoft.
Partitioning and horizontal scaling – Azure Cosmos DB | Microsoft Learn.
In a nutshell containers are like tables, and documents are like table records, but you can embed a lot in a document. Cosmos DB documents are complete records. This is in contrast to SQL table records which usually join with other records to provide any useful information.
OTHER CONSIDERATIONS
- We have to think about how we update records everywhere a value is referenced since we are embedding values in multiple locations.
- Cosmos DB does not support joins. This seems like a significant limitation to convert most existing processes, as relational databases are heavily dependent on joins.
- Cosmos DB does not support t-sql, it uses .NET, Java, Python, Ruby, JavaScript. It is intended to speak the language of the web, for always available cloud based applications.
- The output is not immediately available as a table view, there is a Table API available for Cosmos DB to allow you to see your data in a tabular format.
COSTS – COSMOS DB CAPACITY CALCULATOR
Cosmos DB is designed for web ready applications that require low latency, flexible, scalable data. When we discuss optimizing for throughput to minimize costs, this should not be a significant concern as long as you have a partner that can help you with your technical design. The costs involved in request units are minimal, there are a few different costing models that should be discussed with your Microsoft partner. The serverless model includes pricing $0.25/ 1million request units (RU), a well designed READ command should cost 1 RU, 2.8 RU for a query, and 7-9 RU for a write command. If you are querying across multiple containers it may become more costly, and you should be cautious of services running queries every few seconds which would become expensive.
Use the online capacity calculator to help calculate the estimated monthly costs at https://cosmos.azure.com/capacitycalculator .
SUMMARY
Cosmos DB and SQL are different databases that store data in different formats and use different languages. Cosmos DB is best for cloud-based applications that rely on high scaling and are always available. SQL databases are best for structured, transactional applications that require data consistency and well-defined schemas.
For help with your data project email MIS today [email protected]
manchesterintegratedsolutions.com