A data warehouse is a large collection of business data used to help an organization make decisions. The concept of the data warehouse has existed since the 1980s, when it was developed to help transition data from merely powering operations to fueling decision support systems that reveal business intelligence. The large amount of data in data warehouses comes from many places: internal applications such as marketing, sales, and finance; customer-facing apps; and external partner systems, among others.
On a technical level, a data warehouse periodically pulls data from those apps and systems; then, the data goes through formatting and import processes to match the data already in the warehouse. The data warehouse stores this processed data so it’s ready for decision makers to access. How often data is pulled and how it is formatted will vary depending on the needs of the organization.
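The pull, format, and import steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the two source systems, their field names, and the `revenue` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical records, as two source systems might export them.
SALES_APP = [{"customer": "Acme", "amount": "1,200.50", "date": "03/15/2024"}]
FINANCE_APP = [{"cust_name": "acme ", "total_usd": 300.0, "day": "2024-03-16"}]


def normalize_sales(rec):
    """Reformat a sales-app record to match the warehouse's conventions."""
    return (
        rec["customer"].strip().lower(),
        float(rec["amount"].replace(",", "")),
        datetime.strptime(rec["date"], "%m/%d/%Y").date().isoformat(),
        "sales",
    )


def normalize_finance(rec):
    """Reformat a finance-app record to match the warehouse's conventions."""
    return (
        rec["cust_name"].strip().lower(),
        float(rec["total_usd"]),
        rec["day"],
        "finance",
    )


def load(conn, rows):
    """Import already-formatted rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue "
        "(customer TEXT, amount REAL, day TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pull(conn):
    """One periodic pull: extract from each source, format, then import."""
    rows = [normalize_sales(r) for r in SALES_APP]
    rows += [normalize_finance(r) for r in FINANCE_APP]
    load(conn, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
run_pull(conn)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM revenue GROUP BY customer"
).fetchall()
print(totals)  # [('acme', 1500.5)]
```

Because both sources are normalized to the same customer key, number format, and date format before loading, the final query can combine them without any per-source special cases.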
Some benefits of a data warehouse
Organizations that use a data warehouse to assist their analytics and business intelligence see a number of substantial benefits:
- Better data: Adding data sources to a data warehouse enables organizations to ensure that they are collecting consistent and relevant data from each source. They don’t need to wonder whether the data will be inaccessible or inconsistent as it comes into the system. This ensures higher data quality and data integrity for sound decision making.
- Faster decisions: Data in a warehouse is stored in consistent formats, so it is ready to be analyzed. A warehouse also provides the analytical power and the more complete dataset needed to base decisions on hard facts. Decision makers no longer need to rely on hunches, incomplete data, or poor-quality data and risk delivering slow and inaccurate results.
What a data warehouse is not
1. It is not a database
It’s easy to confuse a data warehouse with a database, since the two concepts share some similarities. The primary difference, however, comes into effect when a business needs to perform analytics on a large data collection. Data warehouses are made to handle this type of task, while databases are not. Here’s a comparison of the two:
| | Database | Data warehouse |
|---|---|---|
| What it is | Data collected for multiple transactional purposes. Optimized for read/write access. | Aggregated transactional data, transformed and stored for analytical purposes. Optimized for aggregation and retrieval of large data sets. |
| How it’s used | Databases are made to quickly record and retrieve information. | Data warehouses store data from multiple databases, which makes it easier to analyze. |
| Types | Databases are used in data warehousing. However, the term usually refers to an online, transactional processing database. There are other types as well, including CSV, HTML, and Excel spreadsheets used for database purposes. | A data warehouse is an analytical database that layers on top of transactional databases to allow for analytics. |
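The two access patterns in the comparison can be illustrated with a small sketch. It assumes a hypothetical `orders` table, and it uses a single in-memory SQLite database to stand in for both systems, which a real deployment would keep separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)

# Transactional pattern: many small writes and single-row lookups.
conn.executemany(
    "INSERT INTO orders (region, amount, status) VALUES (?, ?, ?)",
    [("east", 40.0, "new"), ("west", 25.0, "new"), ("east", 10.0, "new")],
)
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (2,))
one_order = conn.execute(
    "SELECT status FROM orders WHERE id = ?", (2,)
).fetchone()


def revenue_by_region(conn):
    """Analytical pattern: scan and aggregate across the whole data set."""
    return conn.execute(
        "SELECT region, SUM(amount), COUNT(*) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()


print(one_order)                # ('shipped',)
print(revenue_by_region(conn))  # [('east', 50.0, 2), ('west', 25.0, 1)]
```

The first half touches one row at a time, which is what a transactional database is tuned for. The second half reads every row to produce a summary, which is the kind of query a warehouse is built to answer at much larger scale.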
2. It is not a data lake
Although both are built for business analytics, the major difference is that a data lake stores all types of data (raw, structured, and unstructured) from all data sources in its native format until it is needed. By contrast, a data warehouse stores processed data in a more organized fashion, so that it is readily available for reporting and data analysis.
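As a rough sketch of that contrast, the example below keeps every payload in a "lake" exactly as it arrived, while the "warehouse" accepts only records that fit a predefined table. The file names, fields, and `page_views` schema are hypothetical, and a Python list and an in-memory SQLite database stand in for the real systems.

```python
import json
import sqlite3

# Data lake: every payload is kept as-is, in its native format.
lake = [
    ("clickstream.json", '{"user": "u1", "page": "/pricing", "ms": 5300}'),
    ("support_note.txt", "Customer u1 asked about annual billing."),
    ("clickstream.json", '{"user": "u2", "page": "/home", "ms": 800}'),
]

# Data warehouse: data must fit a schema that is defined up front.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE page_views "
    "(user_id TEXT NOT NULL, page TEXT NOT NULL, seconds REAL NOT NULL)"
)


def load_page_views(lake, warehouse):
    """Parse structured lake entries and load them in the warehouse format."""
    loaded = 0
    for name, payload in lake:
        if not name.endswith(".json"):
            continue  # unstructured text stays in the lake only
        event = json.loads(payload)
        warehouse.execute(
            "INSERT INTO page_views VALUES (?, ?, ?)",
            (event["user"], event["page"], event["ms"] / 1000),
        )
        loaded += 1
    return loaded


loaded = load_page_views(lake, warehouse)
print(len(lake), loaded)  # 3 2
```

The lake holds all three items, including the free-text note, while the warehouse holds only the two records that were parsed and converted to its schema.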
3. It is not a data mart
Data warehouses are also sometimes confused with data marts. But data warehouses are generally much bigger and contain a greater variety of data, while data marts are limited in their application.
Data marts are often subsets of a warehouse, designed to easily deliver specific data to a specific user, for a specific application. In the simplest terms, data marts can be thought of as single-subject, while data warehouses cover multiple subjects.
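One simple way to picture a data mart as a single-subject subset is a view over a multi-subject warehouse table. The sketch below assumes a hypothetical `facts` table and uses in-memory SQLite; real data marts are often separate, purpose-built stores rather than views.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Warehouse table covering several subjects (hypothetical data).
wh.execute("CREATE TABLE facts (subject TEXT, metric TEXT, value REAL)")
wh.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("marketing", "campaign_spend", 5000.0),
        ("marketing", "leads", 120.0),
        ("finance", "revenue", 90000.0),
        ("sales", "deals_closed", 14.0),
    ],
)

# Data mart: a single-subject slice exposed to one group of users.
wh.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT metric, value FROM facts WHERE subject = 'marketing'"
)

mart_rows = wh.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()
print(mart_rows)  # [('campaign_spend', 5000.0), ('leads', 120.0)]
```

Users of `marketing_mart` see only the marketing rows, while the underlying warehouse table continues to hold every subject.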
The future of the data warehouse: move to the cloud
As businesses make the move to the cloud, so too do their databases and data warehousing tools. The cloud offers many advantages: flexibility, collaboration, and accessibility from anywhere, to name a few. Popular tools like Amazon Redshift, Microsoft Azure SQL Data Warehouse, Snowflake, and Google BigQuery have all offered businesses simple ways to warehouse and analyze their cloud data.
The cloud model lowers the barriers to entry — especially cost, complexity, and lengthy time-to-value — that have traditionally limited the adoption and successful use of data warehousing technology. It permits an organization to scale up or scale down — to turn on or turn off — data warehouse capacity as needed. Plus, it’s fast and easy to get started with a cloud data warehouse. Doing so requires neither a huge up-front investment nor a time-consuming (and no less costly) deployment process.
The cloud data warehouse architecture largely eliminates the risks endemic to the on-premises data warehouse paradigm. You don’t have to budget for and procure hardware and software. You don’t have to set aside a budget line item for annual maintenance and support. In the cloud, the cost considerations that have traditionally preoccupied data warehouse teams — budgeting for planned and unplanned system upgrades — go away.
A data warehouse example
Beachbody, a leading provider of fitness, nutrition, and weight-loss programs, needed to better target and personalize offerings to customers to produce better health outcomes for clients and, ultimately, better business performance.
The company revamped its analytics architecture by adding a Hadoop-based cloud data lake on AWS, powered by Talend Real-Time Big Data. This new architecture has allowed Beachbody to reduce data acquisition time by 5x, while also improving the accuracy of the database for marketing campaigns.
This article has been published from the source link without modifications to the text. Only the headline has been changed.