on
7 minutes read
A Tale Of Two Data Worlds
Contrary to what the title of this post says, I’m not going to tell a story but I want to draw your attention to the two data worlds prevalent in the tech space. I’ve just been into Charles Dickens writings of late, hence the title.
In conversations with people who want to start analytics in their business and other data professionals, I notice some are not aware of the two different data worlds - transactional data and analytical data. For those who are aware, they are often confused or struggle to see why both data worlds exist. So let’s dive into this particular type of data chasm and shed some light on why these two data worlds exist.
The database that sits behind your software application is a transactional database. On the other hand, the database where your analysts answer complex questions from and perform analysis from is an analytical database. This difference in these two data worlds is often reflected in the organisational structures and professions these days. They also have different architectures.
To explain the difference, let’s paint a scenario in which I own a hardware store. My store has an app that the customers log in to borrow tools and the app also processes payments. The database that sits at the back end of this app and stores every interaction the customer makes with the app is called the transactional database (also known as application or operational database). A table in this database might look like this:
Date | Name | City | Tool | Amount |
---|---|---|---|---|
Dec. 01, 2021 | John Doe | Toronto | Chainsaw | $450 |
Dec. 15, 2021 | Jane Doe | Brampton | Chainsaw | $260 |
Jan. 7, 2022 | Bill Smith | Hamilton | Hammer | $300 |
Jan. 7, 2022 | Chris Lock | Toronto | Helmet | $100 |
In the transactional data world, the data in this table is read from and written into by the customer app in a row-wise manner. So if Jane Doe logs in, she only sees the items pertaining to her which are highlighted in gray above. Also, the tables in this world support concurrent multiple data writes so more than 1 customer can be using the app at the same time. In the example above, let’s imagine Bill Smith borrowed the hammer at the same time Chris Lock borrowed the helmet, the database should be able to store both records without any loss of data and in a timely manner without any conflicts. These types of operations on a database is known as Online Transactional Processing (OLTP). Relational Database Managment Systems (RDBMS) have been optimised over the years to handle these type of workloads1. Some common examples include Postgres, MySQL and SQL Server.
So far we have talked about the operational data world. The different personas in the Software Engineering field usually operate in this world and most of the work done here is geared towards increasing usability of the business application and executing business rules. There is another world of data where the work done on data is not to directly improve the performance of an app nor implement business rules but to answer questions, look back to see what has gone on in the business and predicting what might happen. This other world is the analytical data world and this is where the different data and analytics personas operate in.
Still on the example of my hardware store, after operating for a while, I’m curious to know what tools people borrow the most so that I can get more stock of those tools. My gut instinct might point me towards a certain tool based on what I observe, but I want to know what the data says and take action based on that - data-driven. So I enter the analytical world where a similar table like above exists but this time, I don’t access it row by row. Instead, I use the table from top-bottom as seen in the table below and I apply some arithmetic operations on the “Tool” column.
Date | Name | City | Tool | Amount |
---|---|---|---|---|
Dec. 01, 2021 | John Doe | Toronto | Chainsaw | $450 |
Dec. 15, 2021 | Jane Doe | Brampton | Chainsaw | $260 |
Jan. 7, 2022 | Bill Smith | Hamilton | Hammer | $300 |
Jan. 7, 2022 | Chris Lock | Toronto | Helmet | $100 |
Furthermore, if I want to know the total amount I made on each type of tool for each city for every month, I still need to work with the table in a column-wise manner but this time, with multiple columns. These types of operations on a database is known as Online Analytical Processing (OLAP). Column-oriented database systems have been optimised for these and some examples include Redshift, Synapse and Snowflake.
There are lots of differences between these two data worlds and these differences can be seen in the types of problems they try to solve. In the transactional world, rapid processing in the order of milliseconds is often emphasised so that customers can have a good user experience while in the analytical world, the complexity of questions you can ask or insights you can derive drives the value. Even though SQL is a common language spoken in both worlds, they have different data storage systems. For example, in the analytical data world, you have the data warehouses, data marts, data lakes and lake houses while in the operational data world you might find some NoSQL systems. ETL processing is the bridge that links the operational world with the analytical world and most often traffic moves in one direction. There is the concept of reverse ETL which is gaining some traction these days but that’s a topic for another day.
There are lots of databases that are being introduced to the market and the data world is booming with new products announced almost every day. Hopefully, when you read the next announcement of the new data tool, you will be able to tell what data world it belongs to.
Thanks for reading.
-
There are other types of Database Management Systems that are used in the operational data world such as Document based databases. ↩