At first glance, it can be hard to tell whether a gridded report or dashboard visualization counts as operational reporting or as analytics. Usually, the answer comes down to the data source that holds the data and how end users intend to use it.
For example, if an HR user reading a report needs to zoom into granular details, to answer “what is the total number of openings by department in our Northeast Sales Division right now?”, then most likely we are talking about operational reporting. Instead of visualizations of trends over time, users are looking for details, and that will usually require a gridded report, perhaps with outliers highlighted in red. (A sketch of what such a query might look like follows below.)
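As a purely illustrative sketch, that drill-down maps to a simple grouped query against the live system. The `job_openings` table and its columns here are hypothetical, not from any particular HR product:

```python
import sqlite3

# Hypothetical live HR table; an in-memory DB stands in for the OLTP system.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_openings (
        opening_id INTEGER PRIMARY KEY,
        division   TEXT,
        department TEXT,
        status     TEXT   -- 'open' or 'filled'
    )
""")
conn.executemany(
    "INSERT INTO job_openings VALUES (?, ?, ?, ?)",
    [(1, "Northeast Sales", "Inside Sales", "open"),
     (2, "Northeast Sales", "Inside Sales", "open"),
     (3, "Northeast Sales", "Field Sales", "filled")],
)

# The question from the text: open headcount by department,
# restricted to one division, as of right now.
rows = conn.execute("""
    SELECT department, COUNT(*) AS open_count
    FROM job_openings
    WHERE division = 'Northeast Sales' AND status = 'open'
    GROUP BY department
    ORDER BY open_count DESC
""").fetchall()

for department, open_count in rows:
    print(f"{department}: {open_count} openings")
```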
Operational Reporting: Granular, Real-Time
Operational reporting also requires real-time results. It should show what is happening right now. Users need to understand exactly where things stand: where are we against our sales quota? Which machines are available, and how many are offline for service?
Operational reporting’s granular view of data is very different from the 60,000-foot view provided by analytics. Who is looking for the long view? Typically the C-suite, or high-level personnel who are thinking in terms of years, not months or weeks. They may rely on data scientists to compile years of operational data or streams of big data into visualizations that tell a story over time.
To be able to see that story, enterprise companies export their transactional data (typically through an extract, transform, and load pipeline) to another data source: a data warehouse. Why would an organization take perfectly good data and copy it into an entirely different database? Simply this: a data warehouse is designed for data analytics.
For Analytics, a Data Warehouse
When data is loaded into a warehouse, it is pre-aggregated so that common queries return results fast. Database tables are often denormalized: some fields are duplicated across several tables to reduce the number of database joins required. The database engine itself may be optimized for analytic queries. All of this improves query performance, even when fetching tens of thousands (or millions) of data points. The design of a data warehouse is often a very simple star or snowflake schema, with a central fact table surrounded by dimension tables (sketched below).

By contrast, the architecture of an operational database tends to be quite complex. Its tables are normalized: redundant fields are broken out into their own dedicated tables. As a result, data quality is good (there is no redundant copy of a field to fall out of sync), additions and updates to the database are fast, and storage is used efficiently. Transactional (OLTP) databases are designed to optimize additions, deletions, and updates, not read-only queries.
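To make the star schema concrete, here is a minimal sketch in Python with SQLite. The table and column names are invented for illustration; a real warehouse would add indexes and pre-aggregated summary tables on top:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, deliberately denormalized.
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,   -- e.g., 20240115
        year     INTEGER,
        quarter  INTEGER
    );
    CREATE TABLE dim_store (
        store_key  INTEGER PRIMARY KEY,
        store_name TEXT,
        region     TEXT   -- duplicated onto the store row to avoid another join
    );
    -- Central fact table: one row per sale, keyed to each dimension.
    CREATE TABLE fact_sales (
        date_key   INTEGER REFERENCES dim_date(date_key),
        store_key  INTEGER REFERENCES dim_store(store_key),
        units_sold INTEGER,
        revenue    REAL
    );
""")

# A typical trend query scans years of history but needs only two joins --
# far fewer than a normalized OLTP schema would require.
trend = conn.execute("""
    SELECT d.year, s.region, SUM(f.revenue) AS total_revenue
    FROM fact_sales f
    JOIN dim_date  d ON f.date_key  = d.date_key
    JOIN dim_store s ON f.store_key = s.store_key
    GROUP BY d.year, s.region
""").fetchall()
```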
But while the normalized design works well for operational reporting, it is a less-than-optimal solution for analytics. Operational reporting queries a live system, one in use by many personnel during business hours. There is a lot going on at any given time, and sometimes a very high volume of I/O against the database. That can happen, for example, twice a month when payroll is run, quarterly when end-of-quarter reports are generated, or during yearly processes like printing tax forms. Because of this, operational reports tend to be scheduled to run at times that work around these high-volume jobs.
As mentioned earlier, operational reports focus on a granular view of current activity. In contrast, analytics queries give us a view into trends over time. This type of query, then, is going to read a lot of historical data. Unfortunately, an OLTP database is not designed to fetch the massive volumes of data an analytics query requires. At some point, a long-running query will impact production systems. Or a poorly written query will cause production to hang. Or users will simply get tired of long waits while doing what look to them like simple transactions. Using a data warehouse for analytics queries is faster and safer: even long-running queries won't disrupt production systems.
A Data Warehouse Offers More Than Fast Queries
In addition to analytics performance, a data warehouse offers other benefits. Transactional databases tend to be siloed: customer data is held in a completely different database from the general ledger or accounts payable. This makes it difficult, or impossible, to obtain complete, holistic insights across an organization. But a data warehouse can ingest data from all of these sources and make it available via data marts designed for the needs of different business users. Often there is a metadata layer that maps fields to business-relevant names, making it easy for employees to create their own visualizations and dashboards. With an OLTP database, building even a simple report or visualization often requires joining many tables; with a data mart, users don't need to locate and join internal tables at all (a sketch of this follows below).
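Here is one way to picture that metadata layer, again as a hypothetical SQLite sketch: a view exposes business-friendly column names and hides the joins, which is roughly the role a data mart plays for end users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (date_key INTEGER, store_key INTEGER, revenue REAL);
    CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);

    -- The view plays the role of the metadata layer: business-friendly
    -- column names, with the internal joins hidden from users.
    CREATE VIEW sales_by_region AS
    SELECT d.year          AS "Year",
           s.region        AS "Sales Region",
           SUM(f.revenue)  AS "Total Revenue"
    FROM fact_sales f
    JOIN dim_date  d ON f.date_key  = d.date_key
    JOIN dim_store s ON f.store_key = s.store_key
    GROUP BY d.year, s.region;
""")

# End users (or a BI tool) query the friendly names only.
rows = conn.execute('SELECT * FROM sales_by_region').fetchall()
```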
One drawback of a data warehouse is that it is a snapshot of the data as of the last load. So the data is all in one place and curated for analytics, but not real-time. What it lacks in timeliness, though, it makes up for in the speed with which it handles the massive data volumes that make mining the data for insights possible.
Change Data Capture Is Another Solution
However, there is a way to combine the real-time benefits of an operational system with the speed and other benefits of a data warehouse. Operational databases can be set up to replicate changes in near real time to the data warehouse using change data capture (CDC). For example, Oracle supports both log-based and trigger-based change data capture for replicating changes to a data warehouse. Using CDC, organizations get the best of both worlds: a data warehouse that reflects close-to-real-time data and an operational database that is not impacted by analytic queries. (A minimal trigger-based sketch follows below.)
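As an illustration only, here is what trigger-based capture looks like in miniature, using Python and SQLite rather than Oracle's actual mechanisms; every table and trigger name here is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        amount   REAL
    );
    -- A change table that a downstream replication job reads and ships
    -- to the warehouse in near real time.
    CREATE TABLE orders_changes (
        change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        order_id   INTEGER,
        amount     REAL,
        operation  TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Trigger-based capture: every insert/update/delete on the source
    -- table also writes a row to the change table.
    CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
        INSERT INTO orders_changes (order_id, amount, operation)
        VALUES (NEW.order_id, NEW.amount, 'INSERT');
    END;
    CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
        INSERT INTO orders_changes (order_id, amount, operation)
        VALUES (NEW.order_id, NEW.amount, 'UPDATE');
    END;
    CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
        INSERT INTO orders_changes (order_id, amount, operation)
        VALUES (OLD.order_id, OLD.amount, 'DELETE');
    END;
""")

# A replication job would poll orders_changes, remember the last change_id
# it shipped, and apply the rows to the warehouse.
conn.execute("INSERT INTO orders (order_id, amount) VALUES (1, 99.50)")
print(conn.execute("SELECT * FROM orders_changes").fetchall())
```

Log-based CDC reaches the same end by reading the database's transaction (redo) log instead, which avoids the per-write overhead that triggers add to the operational system.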