Difference between a Data Warehouse and a Live Datamart?
Data Warehouses have existed for many years in almost every company. While they are still as good and relevant for the same use cases as they were 20 years ago, they cannot solve the new challenges that exist today or those sure to come in an ever-changing digital world. The upcoming sections will clarify when to still use a Data Warehouse and when to use a modern Live Datamart instead.
What is a Data Warehouse (DWH)?
A Data Warehouse is a central repository of integrated data from one or more disparate sources. It stores historical data to create analytical reports for knowledge workers throughout the enterprise. A DWH includes a server, which stores the historical data, and a client for analysis and reporting.
An ETL (Extract-Transform-Load) process extracts data from homogeneous or heterogeneous data sources such as files or relational databases, transforms it into the proper format or structure for querying and analysis, and loads it into the DWH. Data is usually transferred in long-running batch processes from operational databases to the DWH. When data gets into the DWH, it is already at rest and some minutes, hours, or even days old.
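The three ETL steps can be illustrated with a minimal sketch. It assumes a tiny CSV export as the operational source and an in-memory SQLite database standing in for the DWH; the table name `sales`, the column names, and the sample rows are all made up for illustration and do not come from any specific product.

```python
import csv
import io
import sqlite3

# Hypothetical operational export; a real source would be a file or a database.
RAW_ORDERS = """order_id,region,amount,order_date
1,emea,100.50,2015-03-01
2,EMEA,20.00,2015-03-01
3,apac,75.25,2015-03-02
"""


def extract(text):
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region codes and convert strings to numbers."""
    return [
        (int(r["order_id"]), r["region"].upper(), float(r["amount"]), r["order_date"])
        for r in rows
    ]


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_batch():
    """Run one batch: the data is already 'at rest' once it arrives here."""
    conn = sqlite3.connect(":memory:")  # stand-in for the DWH server
    load(conn, transform(extract(RAW_ORDERS)))
    return conn


def sales_by_region(conn):
    """A typical report query executed by the BI client."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
```

Calling `sales_by_region(run_batch())` returns the per-region totals. The report only reflects whatever the last batch run loaded, which is the latency described above.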
Widely used DWHs include Teradata, EMC Greenplum, and IBM Netezza. A client, often called a Business Intelligence (BI) or Data Discovery tool, is either part of the server product (usually just used for reporting, e.g. weekly or monthly sales reports) or an independent solution such as TIBCO Spotfire, which lets business users explore the data easily to find new patterns or other insights. Reports range from annual and quarterly comparisons and trends to detailed daily sales analyses.
Finally, a Data Warehouse that is deployed for and focused on a single subject or functional area (such as sales, finance, or marketing) can be further classified as a Datamart. Next, we explore how a Live Datamart can enhance your business.
What is a Live Datamart (LDM)?
A Live Datamart is like a Data Warehouse or a Datamart derived from a Data Warehouse, but for real-time streaming data from sensors, social feeds, trading markets, and other messaging systems. It provides a push-based, real-time analytics solution that enables business users to analyze, anticipate, and receive alerts on key events as they occur, and act on opportunities or threats while they matter. You can manage and override escalations while they are happening.
The key technical difference from the “static database” of a DWH is the continuous query engine of an LDM server, which processes high-speed streaming data, creates fully materialized live data tables, manages ad-hoc queries from clients, and continuously pushes live results as conditions change in real time.
The streaming data is ingested, normalized, and viewed in one user interface – the single LDM client. The client can be:
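To make the idea of a continuous query more concrete, here is a toy sketch of a push-based live table. It is not the TIBCO Live Datamart API; the class `LiveTable`, its methods, and the event names (`insert`, `update`, `delete`) are hypothetical and only illustrate the principle that a registered query receives changes as they happen instead of polling for them.

```python
class LiveTable:
    """Toy in-memory table that pushes query result changes to subscribers.

    Hypothetical illustration only, not a real product API.
    """

    def __init__(self, key):
        self.key = key       # name of the field that identifies a row
        self.rows = {}       # materialized current state, one row per key
        self.queries = []    # registered continuous queries

    def subscribe(self, predicate, callback):
        """Register a continuous query: send a snapshot, then push changes."""
        matching = set()
        for k, row in self.rows.items():
            if predicate(row):
                matching.add(k)
                callback("insert", row)
        self.queries.append((predicate, callback, matching))

    def publish(self, row):
        """Ingest one streaming event and notify every affected query."""
        k = row[self.key]
        self.rows[k] = row
        for predicate, callback, matching in self.queries:
            if predicate(row):
                # The row enters the result set or changes within it.
                callback("update" if k in matching else "insert", row)
                matching.add(k)
            elif k in matching:
                # The row no longer satisfies the query condition.
                matching.discard(k)
                callback("delete", row)
```

A client could, for example, subscribe with `lambda r: r["temp"] > 90` on a table keyed by sensor id. It would then be notified when a sensor crosses the threshold, when its reading changes while above it, and when it drops back below.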
- Rich client with out-of-the-box support for tables, charts, and queries via “drag & drop” user interface
- Self-developed custom rich client using Java or .NET APIs
- Web user interface integrated into a website, portal or mobile application using standards such as HTML5 and JavaScript
From an end-user perspective, an LDM client can be used, for example, by a power user on their laptop, by the operations center on a big screen, or by on-site staff at customers using tablets. Of course, events can also be handled automatically where appropriate (e.g. by sending an alert to another system).
Combination of Historical and Real-Time Data
Of course, a Live Datamart can also connect to a historical database and define queries to be executed against that database. To an end user, LiveView makes historical tables look just like live tables, which allows users to access both types of data (live and historical) in the same way, with one user interface. In addition, Live Datamart can easily populate historical databases based on the real-time data it has captured, either with batch end-of-day loads or with parallel capture. See this blog post for some example use cases.
TIBCO Live Datamart is the only available option on the market that lets you combine automated streaming analytics and proactive human interaction in one toolset.
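The idea of one access path for live and historical data, plus a batch end-of-day load, can be sketched as follows. This is an assumption-laden illustration, not LiveView's actual interface: the classes `LiveOrders` and `HistoricalOrders`, the shared `select()` method, and the `orders` table are invented for this example, with SQLite standing in for the historical database.

```python
import sqlite3


class LiveOrders:
    """Today's orders, kept in memory as events arrive (hypothetical)."""

    def __init__(self):
        self.rows = []

    def publish(self, symbol, qty):
        self.rows.append((symbol, qty))

    def select(self, symbol):
        return [r for r in self.rows if r[0] == symbol]


class HistoricalOrders:
    """Past orders in a database, exposed through the same select() call."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS orders (symbol TEXT, qty INTEGER)")

    def select(self, symbol):
        cur = self.conn.execute(
            "SELECT symbol, qty FROM orders WHERE symbol = ? ORDER BY rowid",
            (symbol,),
        )
        return cur.fetchall()

    def load(self, rows):
        self.conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        self.conn.commit()


def end_of_day(live, historical):
    """Batch end-of-day load: move the captured live rows into the history."""
    historical.load(live.rows)
    live.rows = []
```

Because both classes answer the same `select()` call, client code does not need to know which kind of table it is reading from.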
When to use which one?
Essentially, a conventional Data Warehouse or Datamart helps manage yesterday's data, while a Live Datamart helps manage intraday data.
Use a Data Warehouse in combination with a Business Intelligence tool for analysis and reporting of historical data. This way, you can analyze and compare different strategies, departments, financial data, order information, etc. with regard to revenue, costs, and other KPIs. You can also find patterns in historical data and implement these patterns in real time with streaming analytics for new events (e.g. fraud detection, predictive fault management, cross-selling).
Use a Live Datamart to manage operations in real time while they are happening, instead of too late. This way, you can change marketing strategies, change cross-selling offers, or repair and replace machines and devices that will (probably) break soon. A Live Datamart is not just a dashboard for monitoring; it’s actionable!
In summary, the key difference is that a Live Datamart lets you act proactively, both automatically and through human interaction (whichever is appropriate), while events are happening. A Data Warehouse only lets you analyze events that have already happened.
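As a simplified illustration of finding a pattern in historical data and then applying it to new events, the sketch below derives an amount threshold from past transactions and uses it to flag incoming ones. The rule (mean plus three standard deviations), the function names, and the sample data are assumptions chosen for brevity; real fraud detection models are considerably more involved.

```python
from statistics import mean, stdev


def learn_threshold(historical_amounts, k=3.0):
    """Offline step on warehouse data: derive a simple 'unusual amount' limit."""
    return mean(historical_amounts) + k * stdev(historical_amounts)


def flag_stream(events, threshold):
    """Online step: check each new event against the learned limit."""
    for event in events:
        if event["amount"] > threshold:
            yield event


# Hypothetical sample data for illustration.
history = [10, 12, 11, 9, 13, 10, 11, 12]
stream = [
    {"id": 1, "amount": 12},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 14},
]

threshold = learn_threshold(history)
suspicious = list(flag_stream(stream, threshold))
```

Here the threshold is computed once from the historical amounts, and only the event with the unusually large amount ends up in `suspicious`.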
Slide Deck and Webinar
Here is a slide deck discussing this topic:
Data Warehouse vs. Live Datamart – Comparison and Differences, from Kai Wähner
The following 15-minute on-demand webinar contains a video discussing the above slides.
Reference: Difference between a Data Warehouse and a Live Datamart? from our JCG partner Kai Waehner at the Blog about Java EE / SOA / Cloud Computing blog.