16 3 Enterprise Application Characteristics
OLAP workloads are read-only and that the two workloads rely on “Opposing
Laws of Database Physics” [Fre95].
Yet, research in current enterprise systems showed that this statement
is not true [KGZP10, KKG+11]. The main difference between systems that
handle these query types is that OLTP systems handle more queries with a
single select or queries that are highly selective returning only a few tuples,
whereas OLAP systems calculate aggregations for only a few columns of a
table, but for a large number of tuples.
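
The two query shapes described above can be illustrated with a small sketch. It is only an illustration: the `sales` table, its columns, and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in for an enterprise database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table holding a few hypothetical sales orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, customer TEXT, "
        "region TEXT, amount REAL)"
    )
    rows = [
        (1, "Acme", "EMEA", 100.0),
        (2, "Bolt", "APJ", 250.0),
        (3, "Acme", "EMEA", 50.0),
        (4, "Core", "AMER", 400.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn


def oltp_lookup(conn, order_id):
    """OLTP-style query: highly selective, returns every column of one tuple."""
    return conn.execute(
        "SELECT * FROM sales WHERE order_id = ?", (order_id,)
    ).fetchone()


def olap_revenue_by_region(conn):
    """OLAP-style query: reads only two columns, but for every tuple."""
    cursor = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(cursor.fetchall())
```

The lookup touches a single row across all of its columns, whereas the aggregation touches all rows but only the `region` and `amount` columns.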
Synchronizing the analytical system with the transactional system(s)
requires a cost-intensive ETL (Extract-Transform-Load) process. The ETL
process is time-consuming and relatively complex: all relevant changes have
to be extracted from the source system or systems, the data has to be
transformed to fit analytical needs, and the result has to be loaded into
the target database.
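
The three ETL stages can be sketched as follows. This is a simplified sketch, not the process used by any particular product: the row layout (`region`, `amount`, `changed_at`), the `last_sync` marker, and all function names are assumptions made for illustration, and the source is assumed to receive new rows only.

```python
from collections import defaultdict


def extract(source_rows, last_sync):
    """Extract: pick up only the rows changed since the previous ETL run."""
    return [row for row in source_rows if row["changed_at"] > last_sync]


def transform(rows):
    """Transform: reshape row-level data into per-region totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def load(target, totals):
    """Load: merge the transformed totals into the analytical store."""
    for region, amount in totals.items():
        target[region] = target.get(region, 0.0) + amount
    return target


def run_etl(source_rows, target, last_sync):
    """Run one ETL cycle; rows up to and including last_sync are skipped."""
    return load(target, transform(extract(source_rows, last_sync)))
```

Until `run_etl` is executed again, the target store does not see rows that arrived in the source after `last_sync`, which is the latency problem discussed in the text.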
While the separation of the database into two systems allows for
workload-specific optimizations in both systems, it also has a number of drawbacks:
• The OLAP system does not have the latest data, because the latency
between the systems can range from minutes to hours, or even days.
Consequently, many decisions have to rely on stale data instead of using
the latest information.
• To achieve acceptable performance, OLAP systems work with predefined,
materialized aggregates, which reduce the query flexibility of the user.
• Data redundancy is high. Similar information is stored in both systems,
just differently optimized.
• The schemas of the OLTP and OLAP systems are different, which introduces
complexity for applications using both of them and for the ETL
process synchronizing data between the systems.
The workload analysis of multiple real customer systems reveals that OLTP
and OLAP systems are not as different as expected. For OLTP systems, the
lookup rate is only 10 % higher than for OLAP systems. The number of
inserts is a little higher on the OLTP side. However, the OLAP systems
also face inserts, as they have to continuously update their data.
The next observation is that the number of updates in OLTP systems is not
very high [KKG+11]. In high-tech companies it is about 12 %. This means
that about 88 % of all tuples saved in the transactional database are never
updated. In other industry sectors, research showed even lower update rates,
e.g., less than 1 % in banking and discrete manufacturing [KKG+11].
This fact leads to the assumption that updating a tuple in place, or
alternatively deleting the old tuple, inserting the new one, and keeping
track of the change in a “side note” as is done in current systems, is no
longer necessary. Instead, changed or deleted tuples can be inserted with
corresponding time stamps or invalidation flags. The additional benefit of
this so-called insert-only approach is that the complete transactional data
history and a tuple’s life cycle are saved in the database automatically.
More details about the insert-only approach will be provided in Chapter 26.
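
A minimal sketch of the insert-only idea follows. It is not the design presented in Chapter 26: the class `InsertOnlyTable`, its methods, and the integer time stamps are hypothetical and serve only to show that changes and deletions can be recorded as new rows.

```python
class InsertOnlyTable:
    """Append-only store: every change is a new row, nothing is overwritten."""

    def __init__(self):
        # Each row is (key, value, timestamp, deleted), kept in insertion order.
        self._rows = []

    def put(self, key, value, ts):
        """Record a new or changed tuple as an additional row."""
        self._rows.append((key, value, ts, False))

    def delete(self, key, ts):
        """Record a deletion as a row carrying an invalidation flag."""
        self._rows.append((key, None, ts, True))

    def get(self, key, as_of=None):
        """Return the value visible at time as_of (the latest one if None)."""
        result, latest_ts = None, None
        for k, value, ts, deleted in self._rows:
            if k != key or (as_of is not None and ts > as_of):
                continue
            if latest_ts is None or ts >= latest_ts:
                latest_ts = ts
                result = None if deleted else value
        return result

    def history(self, key):
        """Return the full life cycle of a tuple as (timestamp, value, deleted)."""
        return [(ts, v, d) for k, v, ts, d in self._rows if k == key]
```

Because no row is ever modified, a query with an `as_of` time stamp can reconstruct any earlier state, and `history` returns the tuple's life cycle without a separate change log.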
The further fact that the workloads are not that different after all leads
to the vision of reuniting the two systems and combining OLTP and OLAP data
in one system.
The main benefit of the combination is that both transactional and analytical
queries can be executed on the same machine using the same set of data as
a “single source of truth”. ETL processing becomes obsolete.
Using modern hardware, pre-computed aggregates and materialized
views can be eliminated as data aggregation can be executed on-demand
and views can be provided virtually. With the expected response time of
analytical queries below one second, it is possible to do the analytical query
processing on the transactional data directly anytime and anywhere. By
dropping the pre-computation of aggregates and the materialization of views,
applications and data structures can be simplified, as the management of
aggregates and views (building, maintaining, and storing them) is no longer
necessary.
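
The difference between a materialized aggregate and a virtual view can be sketched with Python's built-in `sqlite3` module. The table and relation names (`sales`, `revenue_mat`, `revenue_view`) and the data are hypothetical. SQLite merely stands in for an in-memory enterprise database, so the sketch says nothing about response times.

```python
import sqlite3

AGGREGATE_SQL = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"


def setup():
    """Build a small table plus one materialized and one virtual aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)", [("EMEA", 100.0), ("APJ", 40.0)]
    )
    # Materialized aggregate: a physical copy that must be maintained separately.
    conn.execute("CREATE TABLE revenue_mat AS " + AGGREGATE_SQL)
    # Virtual view: only the query is stored; it is evaluated on demand.
    conn.execute("CREATE VIEW revenue_view AS " + AGGREGATE_SQL)
    return conn


def totals(conn, relation):
    """Read per-region totals from one of the two known relations."""
    if relation not in ("revenue_mat", "revenue_view"):
        raise ValueError("unknown relation: " + relation)
    return dict(conn.execute(f"SELECT region, total FROM {relation}").fetchall())
```

After a further insert into `sales`, `totals(conn, "revenue_view")` reflects the new row immediately, whereas `totals(conn, "revenue_mat")` stays stale until the copy is rebuilt.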
A mixed workload combines the characteristics of OLAP and OLTP workloads.
The queries in the workload can have full row operations or retrieve
only a small number of columns. Queries can be simple or complex,
predetermined or ad hoc. This includes analytical queries that now run on
the latest transactional data and are able to see real-time changes.
that are used have a low cardinality of values, i.e., there are very few distinct
values. Furthermore, in many columns NULL or default values are dominant, so
the entropy (information content) of these columns is very low (near
zero).
These characteristics facilitate the efficient use of compression techniques,
resulting in lower memory consumption and better query performance, as
will be seen in later chapters.
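
The link between low cardinality, low entropy, and compressibility can be made concrete with a short sketch. Dictionary encoding is used here only as one common example of a compression technique; this section does not state which techniques the later chapters cover. The function names and sample columns are hypothetical.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy in bits per value; 0.0 if a single value fills the column."""
    n = len(values)
    counts = Counter(values)
    return 0.0 - sum((c / n) * math.log2(c / n) for c in counts.values())


def dictionary_encode(values):
    """Replace each value by its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def bits_per_code(dictionary):
    """Bits needed for one fixed-width code, with a minimum of one bit."""
    return max(1, math.ceil(math.log2(len(dictionary))))
```

For a column with only two distinct country codes, each entry can be stored as a one-bit code plus a two-entry dictionary. A column filled entirely with a default value has an entropy of zero, so it carries almost no information per entry.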
3.7 References