
Research on Real Time Data Warehouse Architecture

Rui Jia1, Shicheng Xu1, and Chengbao Peng1,2


1 Neusoft Corporation, Xinxiu Road, Shenyang, P.R. China
2 Northeastern Universities, Wenhua Road, Shenyang, P.R. China
{jia.r,xushicheng,pengcb}@neusoft.com

Abstract. The real-time data warehouse is a research hotspot in the data warehouse
field. It expands the application scope of the data warehouse and provides business
users with real-time decision support. This paper describes the concepts of the
real-time data warehouse and proposes a real-time data warehouse architecture
based on real-time cache storage. The architecture consists of three main
components: real-time data capture and integration, business event management,
and view materialization decision. It relies on two key technologies: real-time
data extraction and materialized view decision-making. This paper describes
existing solutions and their shortcomings, then proposes feasible technical
solutions: real-time data extraction based on transaction log analysis and a
materialized view estimation model with a time factor.

Keywords: real time data warehouse, data cube, real time storage, materialized
view.

1 Introduction

With the development of information technology, massive amounts of data have been
generated, and enterprises need to analyze data efficiently and accurately. Data
warehousing, online analysis, business intelligence and data mining continue to
develop; these technologies help enterprises analyze data and make business
decisions. Traditional data warehouse systems aggregate and analyze historical data
to support strategic decision making, long-term planning, and product management
for corporate decision makers. However, enterprises now expect the data warehouse
to support real-time strategic decision making as well, such as real-time marketing
and personalized service, and traditional data warehouse technologies cannot meet
these needs.
The real-time data warehouse is a new data warehouse architecture developed from
the traditional data warehouse. Real-time means detecting and capturing changed
data from business systems promptly and loading it into the data warehouse. Users
can access and query the real-time data warehouse to perform tactical decision
analysis. There is already some academic research and product development on
real-time data warehouses. Literature [1] proposes the active data warehouse, based
on the ODS and data warehouse concepts, which can provide both strategic and
tactical decision-making for enterprises. Other representative research results
[2-4] on the active data warehouse have also been introduced. Literature [5-8]
describes the challenge of data

Y. Yang, M. Ma, and B. Liu (Eds.): ICICA 2013, Part II, CCIS 392, pp. 333–342, 2013.
© Springer-Verlag Berlin Heidelberg 2013

capture-efficient ETL and proposes solutions. Data integration is one solution,
and Change Data Capture is the key technology for data integration.
This paper compares the traditional data warehouse architecture with the real-time
data warehouse architecture and proposes a new real-time data warehouse
architecture based on a real-time data cache. The paper focuses on two key
technologies, real-time data extraction and real-time view decision-making, and
proposes specific and feasible technical solutions: real-time data extraction based
on log analysis and a real-time view estimation model with good timeliness and
effectiveness.

2 Traditional Data Warehouse Architecture and Problems

A traditional data warehouse usually consists of business data systems, an ETL
tool, the data warehouse itself, and business intelligence and online analysis tools.

Fig. 1. Traditional data warehouse architecture

The ETL tool extracts data from business systems in batch mode at scheduled times,
then cleans and transforms the data, and finally loads the processed data into the
data warehouse, again in batch mode. The business intelligence tool provides
decision analysis, business reports and data mining based on the data warehouse.
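
The batch cycle described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the in-memory `Warehouse` class, the `updated_at` field, the integer timestamps and the cleaning rule are all assumptions chosen only to show the extract, transform and load steps running once per scheduled window.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy in-memory warehouse (hypothetical): one fact table and a load counter."""
    fact_sales: list = field(default_factory=list)
    batches_loaded: int = 0


def extract(source_rows, last_run, now):
    """Pull every source row changed in the window (last_run, now]."""
    return [r for r in source_rows if last_run < r["updated_at"] <= now]


def transform(rows):
    """Clean and normalize: drop rows without an amount, round the rest."""
    return [
        {"order_id": r["order_id"], "amount": round(float(r["amount"]), 2)}
        for r in rows
        if r.get("amount") is not None
    ]


def load(warehouse, rows):
    """Append the whole batch to the fact table in one step."""
    warehouse.fact_sales.extend(rows)
    warehouse.batches_loaded += 1


def run_batch(source_rows, warehouse, last_run, now):
    """One scheduled ETL run; returns the new high-water mark."""
    load(warehouse, transform(extract(source_rows, last_run, now)))
    return now


# Assumed sample data: timestamps are plain integers for simplicity.
source = [
    {"order_id": 1, "amount": "19.999", "updated_at": 5},
    {"order_id": 2, "amount": None, "updated_at": 7},
    {"order_id": 3, "amount": "4.5", "updated_at": 12},
]
wh = Warehouse()
mark = run_batch(source, wh, last_run=0, now=10)
```

In this sketch the order changed at time 12 stays invisible to the warehouse until the next scheduled run, which is the latency that the real-time requirement discussed next is meant to remove.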
Real-time demand requires synchronizing changed data from the business systems to
the data warehouse for analysis. This raises many technical problems that the
traditional data warehouse has to solve [9-12].
A traditional ETL tool extracts data in batches and needs to monopolize the data
warehouse for a certain time. Real-time updates, however, cannot tolerate the data
warehouse being monopolized for a long time. Data warehouse updates and user
queries usually occur at the same time, which requires particular attention.
To improve analysis performance, a traditional data warehouse usually pre-calculates
a large amount of aggregate data, but in a real-time environment the data is updated
frequently, which invalidates the pre-calculated aggregates. In addition, a user
query usually needs to execute many operations to complete an analysis task, so
real-time data updates can make the analysis results inconsistent.
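
The invalidation problem can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the solution proposed in this paper: the `SalesCube` class, its per-region total and the `stale` set are hypothetical names. The sketch shows a pre-calculated aggregate going out of date after a real-time insert, and one simple remedy: mark it invalid and recompute on the next read.

```python
from collections import defaultdict


class SalesCube:
    """Toy fact table with a pre-calculated per-region total (hypothetical)."""

    def __init__(self):
        self.facts = []                    # detail rows: (region, amount)
        self.totals = defaultdict(float)   # pre-calculated aggregate
        self.stale = set()                 # regions whose total is out of date

    def rebuild(self):
        """Batch-style refresh: recompute every aggregate from the detail rows."""
        self.totals = defaultdict(float)
        for region, amount in self.facts:
            self.totals[region] += amount
        self.stale.clear()

    def insert(self, region, amount):
        """Real-time update: the detail row arrives, the aggregate is only marked stale."""
        self.facts.append((region, amount))
        self.stale.add(region)

    def total(self, region):
        """Serve the aggregate, recomputing it only if an update invalidated it."""
        if region in self.stale:
            self.totals[region] = sum(a for r, a in self.facts if r == region)
            self.stale.discard(region)
        return self.totals[region]


cube = SalesCube()
cube.insert("north", 100.0)
cube.rebuild()                         # aggregate is now consistent with the facts
cube.insert("north", 50.0)             # real-time update after the batch refresh
stale_value = cube.totals["north"]     # reads the outdated pre-calculated total
fresh_value = cube.total("north")      # recomputed because the region was marked stale
```

Reading `cube.totals` directly returns the total from the last batch refresh, while `cube.total` reflects the new row. A query that mixes both kinds of reads across several steps would see the inconsistency described above.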
