Data Warehousing

Data warehousing is combining data from multiple and
usually varied sources into one comprehensive and easily

manipulated database.
1. Because data warehousing creates one database in the
end, the number of sources can be anything
2. The final result, however, is homogeneous data, which
can be more easily manipulated.
3. Companies may very well use data warehousing to view
day-to-day operations, but its primary function is
facilitating strategic planning resulting from long-term
data overviews.
4. From such overviews, business models, forecasts, and
other reports and projections can be made.
Because the data stored in data warehouses is intended to
provide more overview-like reporting, the data is read-
only. If you want to update the data stored via data
warehousing, you'll need to build a new query.
 Data warehousing is typically used by larger companies
analyzing larger sets of data for enterprise purposes.
 Data warehousing often includes smaller amounts of data
grouped into data marts.
Data marts are much more specific and targeted in their
storage and reporting.
In this way, a larger company might have at its disposal
both data warehousing and data marts, allowing users to
choose the source and functionality depending on current
needs.
You need to load your data warehouse regularly so that it
can serve its purpose of facilitating business analysis.
To do this, data from one or more operational systems
needs to be extracted and copied into the warehouse.
The process of extracting data from source systems and
bringing it into the data warehouse is commonly called
ETL, which stands for extraction, transformation, and
loading.
During extraction, the desired data is identified and
extracted from many different sources, including database
systems and applications.
Very often, it is not possible to identify the specific subset
of interest, therefore more data than necessary has to be
extracted, so the identification of the relevant data will be
done at a later point in time
The size of the extracted data varies from hundreds of
kilobytes up to gigabytes, depending on the source system
and the business situation.
The time between two (logically) identical extractions may
vary between days/hours and minutes to near real-time.
After extracting data, it has to be physically transported to
the target system or an intermediate system for further
processing.
In other words the data warehouse provides data that is
already transformed and summarized, therefore making it
an appropriate environment for more efficient DSS
Characteristics of a data
warehouse
There are generally four characteristics that
describe a data ware house:
subject-oriented: data are organized according to
subject instead of application e.g. an insurance company
using a data warehouse would organize their data by
customer, premium, and claim, instead of by different
products (auto, life, etc.). The data organized by subject
contain only the information necessary for decision
support processing.
time-variant: The data warehouse contains a place for
storing data that are five to 10 years old, or older, to be
used for comparisons, trends, and forecasting. These data
are not updated.
integrated: When data resides in many separate
applications in the operational environment, encoding of
data is often inconsistent. For instance, in one application,
gender might be coded as "m" and "f" in another by 0 and
1. When data are moved from the operational environment
into the data warehouse, they assume a consistent coding
convention e.g. gender data is transformed to "m" and "f".
non-volatile: Data are not updated or changed in any

way once they enter the data warehouse, but are only
loaded and accessed.
Data Mining
Data mining is the search for relationships and

global patterns that exist in large databases but are
`hidden' among the vast amount of data, such as a
relationship between patient data and their medical
diagnosis. These relationships represent valuable
knowledge about the database and the objects in
the database and, the database is a faithful mirror,
of the real world registered by the database.
Phases in Data Mining
Selection - selecting or segmenting the data according to
some criteria e.g. all those people who own a car, in this
way subsets of the data can be determined.
Preprocessing - this is the data cleansing stage where
certain information is removed which is deemed
unnecessary and may slow down queries.Also the data is
reconfigured to ensure a consistent format as there is a
possibility of inconsistent formats because the data is
drawn from several sources e.g. sex may recorded as f or m
and also as 1 or 0.
Transformation -. The data is made useable and
navigable/passable and portable.
Data mining - this stage is concerned with the extraction
of patterns from the data.
Interpretation and evaluation - the patterns identified by
the system are interpreted into knowledge which can then
be used to support human decision-making e.g. prediction
and classification tasks, summarizing the contents of a
database or explaining observed phenomena.

Data Warehousing

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Warehousing

Uploaded by

Copyright:

Available Formats

Data warehousing is combining data from multiple and

usually varied sources into one comprehensive and easily

non-volatile: Data are not updated or changed in any

Data mining is the search for relationships and

You might also like