CTEs, subqueries, and views are database programming constructs that help you write reusable, flexible queries.
SUBQUERY
A subquery is a query embedded within another query, most often in the WHERE clause of the main query, though it can also appear in the SELECT or FROM clauses. Subqueries can filter, sort, or aggregate data, and they can be nested to build complex queries.
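As a minimal sketch of a subquery in the WHERE clause, here is one run against an in-memory SQLite database (the table and column names are invented for this illustration):

```python
import sqlite3

# Hypothetical tables for illustration: customers(id, region) and
# orders(id, customer_id, amount).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'East'), (2, 'West');
    INSERT INTO orders VALUES (101, 1, 50.0), (102, 2, 75.0), (103, 1, 20.0);
""")

# Subquery in the WHERE clause: keep only orders from East-region customers.
rows = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE region = 'East')
    ORDER BY id
""").fetchall()
print(rows)  # [(101, 50.0), (103, 20.0)]
```

The inner SELECT runs first and produces the list of matching customer IDs; the outer query then filters against that list.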
VIEW
A view is a virtual table defined by a stored query, where you can program logic. Views are simple to write, and they keep the processing on the server, but their results are generated at run time and can be slow when you're handling a lot of data.
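A quick sketch of a view, again in SQLite (names invented): note that the view stores the query, not the results, so it is re-evaluated each time it is queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (101, 1, 50.0), (102, 2, 75.0), (103, 1, 20.0);

    -- A view is a stored query: it is re-run at query time, not materialized.
    CREATE VIEW customer_totals AS
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id;
""")

rows = conn.execute("SELECT * FROM customer_totals ORDER BY customer_id").fetchall()
print(rows)  # [(1, 70.0), (2, 75.0)]
```

Because the aggregation re-runs on every query, a view over a very large table can be slow, which is the run-time cost described above.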
CTE
A Common Table Expression (CTE) is a temporary named result set that can be referenced multiple times within a larger query. CTEs are useful for recursive queries and for breaking complex logic into readable steps. Restructuring an inefficient query around CTEs can sometimes improve processing time, though the primary benefit is clarity, and the performance impact varies by database engine.
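Here is a small sketch of a recursive CTE, the use case where CTEs are indispensable, walking a reporting hierarchy in SQLite (the employee data is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cam', 2);
""")

# Recursive CTE: start at the employee with no manager, then repeatedly
# join back to the CTE to walk down the reporting chain.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth
""").fetchall()
print(rows)  # [('Ada', 0), ('Ben', 1), ('Cam', 2)]
```

The anchor member (the first SELECT) seeds the result, and the recursive member keeps joining until no new rows are produced.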
Over the next few weeks we will look at some basic approaches to writing queries with some simple examples.
Email today for help with your data optimization questions.
There’s been a lot of talk about star schemas and snowflake schemas at SQLBits this week. If you’re not familiar with the terms, they describe two divergent approaches to structuring data that are important to consider when moving to a cloud-based reporting solution.
WHAT IS IT?
A star schema is a simple, denormalized structure with a fact table at the center surrounded by dimension tables. The diagram resembles a star, hence the name.
On the other hand, a snowflake schema is a more normalized structure that breaks down dimension tables into sub-dimension tables. This makes the schema look like a snowflake, hence the name. This allows for better scalability and more complex relationships between tables. However, it can also make querying data more complicated and slower due to the increased number of joins required.
The star schema is simpler and faster to query but less flexible, while the snowflake schema is more flexible and scalable but can be more complex and slower to query. Both approaches have their purposes and strengths, and in my experience a combination of the two is often best.
The snowflake schema is the standard in older systems and in highly controlled transactional systems. It used to be taboo to store a value redundantly in two places, and star schemas are more likely to fall out of sync because you have to update values in multiple locations. Over time, however, it has become clear that multiple joins in ad hoc querying are inefficient, especially when reporting on large amounts of data, and particularly against tables that are actively processing transactions.
MODERN REPORTING FOR LEGACY SYSTEMS
Preparing your data for consumption by converting legacy snowflake schema tables into a simplified set of star schema tables that users can query without locking or blocking transactional tables makes the most of both design approaches. Talk to one of our data experts about how your data factory or warehouse resources can be simplified and turned into useful, meaningful business intelligence, using modern storage solutions along with an intelligent, efficient design that optimizes scanning and extraction.
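The conversion described above can be sketched in a few lines of SQL: flatten a snowflake's dimension and sub-dimension into one wide star-schema dimension table so reporting queries no longer need the extra join (run here in SQLite; the product/category tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake shape: the product dimension is split into a sub-dimension.
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (10, 'Hardware');
    INSERT INTO products VALUES (1, 'Widget', 10);

    -- Star shape: flatten the sub-dimension into one wide, denormalized
    -- dimension table that reporting queries can hit without extra joins.
    CREATE TABLE dim_product AS
        SELECT p.id, p.name, c.name AS category
        FROM products p JOIN categories c ON p.category_id = c.id;
""")

rows = conn.execute("SELECT id, name, category FROM dim_product ORDER BY id").fetchall()
print(rows)  # [(1, 'Widget', 'Hardware')]
```

The trade-off is the one discussed earlier: the category name is now stored redundantly in every product row, so updates must be propagated, but reads are simpler and faster.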
You can’t make the most of the modern tools available by simply importing your legacy data into a data lake without some tweaks from a reporting expert. Get the advice of someone who understands the pitfalls. Keep your queries efficient to keep your costs low, and watch your data turn into meaningful action.
Email today for help with your data optimization questions.
The evolution of database design in Cosmos DB forces us to rethink our traditional relational structure.
The biggest cost factor in databases used to be storage, but today latency and reliability across geographic locations are bigger concerns.
In a typical relational database you have a table for customers and a table for customer addresses. Your primary key in one table is a foreign key in another.
These structures ensure that your master data and transactional tables take up as little space as possible, and they are easier to search, back up, and restore. In Azure Cosmos DB the cost of storage is nominal, and Cosmos indexes everything! The bigger concern now is designing your data structure for throughput: the key is to minimize the work required to report on your data, which minimizes the request units (RUs) consumed. That means thinking about references versus embedded data, and about how people will actually use the data.
QUESTIONS TO ASK
The first question we have to ask is: what is our partition based on? What constitutes an important, unique record? This determines when we need a new document.
Next, how can we deliver everything we need from a single document? A document can hold up to 2 MB of data. Embedding related data in a single document minimizes the touchpoints for your query and reduces your costs.
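A sketch of what such a document might look like (the field names and data are invented for illustration): the customer's addresses are embedded in the document rather than stored in a separate container, so a single point read returns everything the application needs.

```python
import json

# A hypothetical Cosmos DB document: addresses embedded instead of joined.
document = {
    "id": "customer-1",
    "partitionKey": "customer-1",  # the partition determines where the document lives
    "name": "Contoso",
    "addresses": [                 # embedded data replaces the relational join
        {"type": "billing", "city": "Seattle"},
        {"type": "shipping", "city": "Portland"},
    ],
}

# A document must stay under the 2 MB size limit.
size_bytes = len(json.dumps(document).encode("utf-8"))
print(size_bytes, "bytes")
```

Contrast this with the relational design: here the addresses travel with the customer, at the cost of duplicating any value that also appears in other documents.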
TECHNICAL DOCUMENTATION
For more about Containers and Documents in Cosmos DB see the documentation online from Microsoft.
In a nutshell, containers are like tables and documents are like table records, except that a document can embed a great deal. Cosmos DB documents are complete records, in contrast to SQL table records, which usually must join with other records to provide useful information.
OTHER CONSIDERATIONS
Because we embed values in multiple locations, we have to think about how to update records everywhere a value is referenced.
Cosmos DB does not support joins across containers (its JOIN keyword only unrolls arrays within a single document). This can seem like a significant limitation when converting existing processes, since relational databases depend heavily on joins.
Cosmos DB does not support T-SQL; instead it offers SDKs for .NET, Java, Python, Ruby, and JavaScript. It is intended to speak the language of the web, for always-available, cloud-based applications.
The output is not immediately available as a table view, but a Table API is available for Cosmos DB that lets you see your data in a tabular format.
COSTS – COSMOS DB CAPACITY CALCULATOR
Cosmos DB is designed for web-ready applications that require low-latency, flexible, scalable data. When we discuss optimizing for throughput to minimize costs, this should not be a significant concern as long as you have a partner who can help with your technical design. The costs involved in request units are minimal, and there are several pricing models that should be discussed with your Microsoft partner. The serverless model is priced at $0.25 per 1 million request units (RUs); a well-designed read should cost about 1 RU, a query roughly 2.8 RUs, and a write 7-9 RUs. Querying across multiple containers can become more costly, and you should be cautious of services running queries every few seconds, which can become expensive.
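To make the polling warning concrete, here is a back-of-the-envelope estimate using the serverless price and the rough per-operation RU figures quoted above (the function and its parameters are ours, not part of any Cosmos DB API):

```python
# Serverless price quoted above: $0.25 per 1 million RUs.
PRICE_PER_MILLION_RU = 0.25

def monthly_cost(ops_per_second: float, ru_per_op: float, days: int = 30) -> float:
    """Estimated monthly charge for a steady stream of operations."""
    total_ru = ops_per_second * ru_per_op * 60 * 60 * 24 * days
    return total_ru / 1_000_000 * PRICE_PER_MILLION_RU

# A service running a 2.8 RU query every 5 seconds vs. once a minute:
every_5s = monthly_cost(1 / 5, 2.8)
every_60s = monthly_cost(1 / 60, 2.8)
print(round(every_5s, 2), round(every_60s, 2))
```

One such service is cheap either way, but the 12x gap compounds quickly across many services, containers, and heavier queries, which is why query frequency deserves attention in the design.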
Cosmos DB and SQL databases store data in different formats and use different languages. Cosmos DB is best for cloud-based applications that rely on high scalability and constant availability; SQL databases are best for structured, transactional applications that require data consistency and well-defined schemas.