Data aggregation is the process of collecting data and presenting it in a summarized format, both for statistical analysis and to support business objectives. It is vital to data warehousing because it makes it possible to base decisions on vast amounts of raw data. Aggregated data also supports forecasting of future trends and aids predictive modeling. Effective aggregation techniques help to minimize performance problems, since a summary is smaller and quicker to work with than the raw data behind it.
Aggregation reveals information about groups of related data, such as records grouped by an individual’s income or profession. For example, a store that wants to compare sales performance across regions would aggregate its sales data by region.
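The store example can be sketched in a few lines of Python. This is only an illustration: the region names, the sale amounts, and the function name `total_sales_by_region` are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sales records: (region, sale amount).
sales = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 200.0),
    ("West", 50.0),
    ("South", 40.0),
]

def total_sales_by_region(records):
    """Sum the sale amounts for each region."""
    totals = defaultdict(float)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

region_totals = total_sales_by_region(sales)
print(region_totals)
```

Five individual records collapse into one total per region, which is the summarized view the store would compare.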
Queries that use aggregation (mathematical functions) typically return results faster. For example, a query for the sum of a product's sales in a month returns faster than a query for all of that product's sales records. This is because the first query applies the aggregation and returns only the sum, while the second has to return every individual record. The system still has to read the matching records in both cases, but the aggregated query transfers and displays far less data, which improves the perceived performance of the system.
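The difference between the two kinds of query can be seen with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the `sales` table, its columns, and the sample rows are hypothetical, and an in-memory database this small will not show a measurable speed difference. It only shows how much less data the aggregated query returns.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("widget", "2024-01-05", 10.0),
        ("widget", "2024-01-17", 15.0),
        ("widget", "2024-01-29", 5.0),
        ("widget", "2024-02-02", 20.0),
    ],
)

# Non-aggregated query: returns one row per individual sale in January.
detail_rows = conn.execute(
    "SELECT sale_date, amount FROM sales "
    "WHERE product = 'widget' AND sale_date LIKE '2024-01-%'"
).fetchall()

# Aggregated query: returns a single value, the January total.
monthly_total = conn.execute(
    "SELECT SUM(amount) FROM sales "
    "WHERE product = 'widget' AND sale_date LIKE '2024-01-%'"
).fetchone()[0]

print(len(detail_rows))  # number of individual records returned
print(monthly_total)     # the single aggregated value
conn.close()
```

With millions of sales rows, the first query would return millions of records while the second would still return one number.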
Types of aggregation with mathematical functions:
- Sum: adds together all the specified data to get a total.
- Average: computes the average value of the specified data.
- Max: returns the highest value for each category.
- Min: returns the lowest value for each category.
- Count: counts the total number of data entries for each category.
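As a rough illustration, the five functions listed above can be applied per category with plain Python. The category names, the values, and the helper `summarize_by_category` are assumptions made for this sketch, not part of any particular tool.

```python
from collections import defaultdict

# Hypothetical (category, value) entries.
entries = [
    ("books", 12.0),
    ("books", 8.0),
    ("games", 30.0),
    ("books", 10.0),
    ("games", 50.0),
]

def summarize_by_category(records):
    """Group values by category, then apply each aggregate function."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {
        category: {
            "sum": sum(values),
            "average": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
            "count": len(values),
        }
        for category, values in groups.items()
    }

summary = summarize_by_category(entries)
for category, stats in summary.items():
    print(category, stats)
```

Each category ends up with one row of summary values, regardless of how many entries it contained.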
Data can also be aggregated by date, so that trends can be shown over years, quarters, months, and so on. These aggregations can be arranged in a hierarchy: you view the data trends across years, then drill down into the months of each individual year.
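A date hierarchy of this kind might be built as follows. This is a sketch only: the dates, amounts, and the function name `aggregate_by_year_and_month` are hypothetical, and it covers just two levels (year and month).

```python
from collections import defaultdict
from datetime import date

# Hypothetical dated sales records: (date of sale, amount).
sales = [
    (date(2022, 1, 15), 100.0),
    (date(2022, 1, 28), 50.0),
    (date(2022, 7, 3), 70.0),
    (date(2023, 2, 10), 90.0),
    (date(2023, 2, 11), 30.0),
]

def aggregate_by_year_and_month(records):
    """Return yearly totals plus monthly totals nested under each year."""
    yearly = defaultdict(float)
    monthly = defaultdict(lambda: defaultdict(float))
    for day, amount in records:
        yearly[day.year] += amount
        monthly[day.year][day.month] += amount
    # Convert nested defaultdicts to plain dicts for easy display.
    return dict(yearly), {year: dict(months) for year, months in monthly.items()}

yearly_totals, monthly_totals = aggregate_by_year_and_month(sales)
print(yearly_totals)         # top level: one total per year
print(monthly_totals[2022])  # drill down: monthly totals within 2022
```

The yearly totals give the top-level view, and looking up a single year in `monthly_totals` gives the finer-grained view for that year.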