Professional Documents
Culture Documents
DWA Unit 5-1
DWA Unit 5-1
Government
DATA WAREHOUSE
c1 GISTNIC (GISTNIC)
Informatics Centre
Service Terminal of National
(NIC) to
National Informatics Centr
Information
Géneral taken by
initiaive
ata Warehouse" an issues
is national
database by the government on
informnation in the economy
provide a comprehe1Sive
diverse subjecisikefood and
agriculture lo trénds
was collated
base
ranging across science and technoogy./This
information
most
on economists and,
and latest updates bureaucrats, politicians,
i n f o r m a t i o n needs of
to fulfil the citizens. solutionThe
of all, the KAS softwar
important aWeb-enabler in
GISTNIC data warehouse is to key Gecision-maktiE
The aims at providing opdfne
intormation with
1998.
for Tamil Nadu was implemented during
ns GISTNIC data warehouse
6l
62 a l a Warehousimg-Conepts, Techniques, Produet s and Applicalioins
pursue this in India as welI. We shall endeavor to extend full support at all
imes.
Rolini Midha, Managing Dirrctor, SAS Tustitute ndia Put. Ltd
(contid.)
Data Warclhousing in the lamil Nudiu Gocerment 63
Table C1.1 Data Marts of various sectors and their applications (contd.)
essential commodities|like
ices of various
price Rice analysis
rice, oil, cereals, etc. Using Forecast-rice prices
vegetables, sugar,
warehouse, end-users now
the GISTNIC data
about trends in
have the updated information
and will thus be able to
closely
price change
effectively (see
monitor the prices
more
e
Dlei ANiidaiel aeie
Marie
Heb
hdw
plons
h
ow S.cset
Avslabie Not Avatable
Medlca Faclily[ANA
VilageCount Viage Count obset
ducationa s t tute fA/Ne
Cistrictiame Auaiabl
COREATORE
196
DHARMAPUA 119
DNDIGUL ANRA
BHAVAN
DHARAPURAM
FRODE
GOPICHETPALAYAM
ERODE 12
KANGAYAM
33 2 Close
PERLNDURA
14
KANNYAHLMAR
MACURA
LGIRY
SATHYAMANGALAM
276 T
330
PUDUNKOTTA 241
RAMANATHAPLRAM a50
417
SALEM 183
SMAGANGA 694 632
9OUTH ARCOT
1174
THANLJAVUR
is used to
amenities data. This report
the village
on
2
SAS/multidimensional report based on availability ot
various amenities.
analyse villages in Tamil Nadu
66 Data Warehousing-Concepts, Techniques, Products and Applications
SAS
Mndow tHep
Total number of
Tiruchirappalli
2537
Thanjavur
2393
Thiruvallur
890
Salem
1809
Dharmapuri
1755
Vellore
414
000
2000
3000
Fig. C1.3 An
output of
the
displays the numbertop/bottom
of analysis
primary schools on the
in top village amenities screen
seven districts dda
in Tamil
Data Warehousing in the Tamil Nadu Govrrnment 67
hdow Hhe
obals plens
S6-arites
Ranga Arulyss-Ananie
Post&Telegraph Facility Range
Education
Be tween
e e n -5 Kas
0-5 Kn
Medica
Orinkisg eter
Market
n llage
CoMRuniCet ion
Deto A
5-10 Kas
Between
Freqsency
0-5 KMSs
EDUCATTON=IN UILLAGE7 MEDICALBETUEEN
oepts. Truhngues,
8 atn Warehousine
age Aier 1e
3th
RUPROTOW I\UI SS NOLYSIS VARIA3E
Prinary Schoa
Middie School
High Schoo
Bre-Uni versi ty Co lege
OPROTTOM Graduat ion Coilege
Adul: Literacy Centres
industr ial School
Training School
Distance Fron Toum
irrig. dei thout Elec
BY irrig.e ith Eec.
irrig. Tube del thout Elec.
Irrig. Tube del ith Elec
SHOW RE SUL TS irrigat o GovernMent Canals
rrigat on Priuate Canais
Irrigatio River
Irrigat on Tank
IrrigatioLake
Irigation - aterfa
Irrigation Other Source
Fig. C1.5 A
top/bottom alysis
Vilages based onanalysis sereen
screen
various which helps in
in
parameters mentioned listing
in the
nam
names
mes of
ot
talukas
udistricts/tal
list bOx.
Data
Warchousing in the Tamil
Nadu (.overnment 69
carding
STATON
CHITHER
CHI TRAICHAUIAD
CHTRAR
TTAVEATT
COT MBATORE TOUN
in Tamil
various weather stations
rainfall levels at line
plot for tracking the screen, the
A line in the list box
screen on
Fig. C1.6 stations
weather
one of the On clicking
Nadu. On selecting selected weather station.
rainfall level for the for the data
to reflect the data and rainfall level
plot changes the graph displays the
any point on the line plot
point.
Tehniques, I70uiets 1d Applications
D a t a Warehoising-Coneepts,
Re Fel Me
COMBATORE CUODALORE
dngs-Tobal Aee ds-Te
DHNRIMAPUR
Arie.Te
253 9,132
11408 78
4568 27 805 6e
22906 357 11
46,243 84091 8,363 180 2 113,011 29
17 206 20,630 19806 8
171 8 11
1026 17,74 5.130 52927 94
Indlie 17785 237 2434 413
MEDILM 4512 95
1.178 171 23
1032 37,167 43,180 16,319 92.763 34
ndividus
S3MED Fema 16,006 57 6.986 1,08 11201 4
557
132
76.201 63501 64 2730 101096 73
14,043 9,751 6911 4,445 4976 10
04 16
ls Grephs anes
EWAGMED
600.08
000.08 A100.08
00.00
600.00
20
00 200 00
TOTAO
TOTALMO
TOT
QINTO Moldinge
oldings
Size oldin
Jolnt -Iojtitytionel Total
Class Iidividn
19.0 S.0hec.
24. 5.0 hec.
7.5-10.0 hec.
s.-1.5 hec.
5.0 7.5 hec.
7.5 10.0 bec.
-5.0 hec.
i.s 10.0 bec.
under holdings
helPs analysing the
area
screen The sCreen
.5T82 13
80I LCOMM
.96
.6
90ILFINE 1
1 BOILSF
14
BOILCONN
B0ILCOM
I.271
10.25 11,0
B.75 9.5
BOI LCOMM
Moments
219.0000 Sen gts 214.0000
Me .5782 S 2049.7300
Std Dev 0.6601 Uar iance 0.4357
Shovness 0.6671 Kurtosis -0.82M2
SS 19725.1869 CSS 92 .8090
i g C19 An output of analysis on the retail price of rice in 1998. It displays a line p v
Patients
Year
1998
tunan 19-5 e e a - 4
-H taas 4-4
ove 54
ot 1998.
in tour quarters
malaria patiets
male/temale
C1.10 Number of the
18
74 ata Warchousing-onpls, Tecdhniques, Produts and Applications
268166
200000
122323
4873
2966
25205
362307 360515
2063T9
29.1 16.7X B.X
MSDD
NOATH EW
lth
Fig. C1.11 This screen displays the number of male students examined in tne
camps.
i3 Data Warehouse for the Government
of Andhra Pradesh
96
Data Warehouse for the Government of Andhra Pradesh 97
reasuries handle all the govenment receipts and payments,receipts are paid
o h challans. (A challan is a form which is used for all types of receiptsL
Payments against
against cha based on the submission of different types of bills
challans,are
bills be: abstract contingent bills,
paybills,
hy the drawing officershese may
dvance GPF,
advance G travelling allowances, fully vouchered contingent deposit
repayments, pensions, grants-in-aid, refunds, scholarships and loans.| Every
transaction in the government is made through related departments.
on Internet or
Intranet Web page;
on a network; and
via e-mail.
that has been sent detinitions of queries,
Transformer
model objects may contain authentication
Transformer. user classes and related
dimension views, for
measures, Transformer creates
dimensions, more cubés that
well as objects
for one_or
stores models as files with the
information, as ranstormer
in PowerPlay,
viewing and reporting based on
extensions. J cubes or cube groups
one or m o r e
ready, creating
Once he model
is several cube objects and basing
contents is possible.
By creating that serve the needs
the model can create
subset cubes
objects, one
on those
different cubes also create cubes
that provide drill-through
One can
user groups.
of distinct
access to
other cubes.
100 Data Warehousing-Concepts, Techniques, Products and Applications
with drill-dou
PowerPlay. COGNOS PowerPlay is used to populate reports own
as follows (also refer to Figs. C3.8-C3.14):
facility. Popular reports are
RAE :
E
2 Data
Warehousing-Concepts. Techniques, Products and Applications
D
Data Warehouse for the Government of Andhra Pradesh 103
DTA_FINANGE
DATAWAREHOUSE2 treasury
C Dimensions
TIME
ea
Monthh
-- Dayofnonth
DEPARTMENT
Dept Name
BANK
Bank Name
FORM_DESCRIPTION
FF Descrip
DTO_STO_DDO
Dto Name
Sto a m e
Odo Name
t 2 o t e d Charged
Voted Charged
3 Plan Nonplan
P a n Nonplan
t Major
Majo
2 Amount Range
Amount R a n g e
C Measures
*Y Amount Paid
Amount Recieved
dta finance
AmountRange BANK DEPARTMENT FORM DESCRIPTIONMajorPlanNomplan VotedCharged
AlAnount Range Al BANKA DEPARTMENTAIFORM DESCRPTIONAI Majgr Pan Voted
YearMlonthDayofnmonth
@200 Grand Total
DoName Sto Nane Dde Name Amount Recieved Amourt PaidAmountRecieed Ano
8 DTO R.R CHEVELA 292 3
E HAYATHNAGAR 0
8RAHIMPATNAM PROJECT OFICER NFE IBRPATAN 20063 0
CDPO ICDS IPATAN 174176 1335
ADASCRRDIST 15100
MOPHC MANCHAL
MOPHC DANDU MALARAM
ASST SM RFWPC YACHARAM 23384
EEPRRWSJONRRDIST 9121
GATETED ADMN OFFICER DEO RR
CASPPUNT IPATAN 00
Total 11335 717300 1136
MAHESHNWARAM 201060 0
@MEDCHAL 479306 0
PARGI 3786-8 0
RAJENDRANAGR 4
8888tE
106 Data Warehousing Concepts, Techniques, Producls and Apylioaton
A
Data Warehouse for the
Government of Andhra Pradesh 107
PRE
Applications
Techniques, Products and
Data Warchousing-Concepts,
E888EE EE
ES
Data Warehouse for the Government of Andhra Pradesh 109
MiceesoH Ee
IBocd
iatcet- 1
ort..B DEMR.. OT57
2MoErc
DEPARTMENT Al DEPARTIMENT
Non Plarn Bank Name
Data
CV Amount Paid
CV Amount
Received
aera "har Plan" Puat sRR DESI Aani Rex eived
Vaue 229/4( bunt
Paid
HAYAT Amount
Received
Plan Nonplan RAJ Amount Paid
brot
1 832 336
2644
EASUFE 781 20
5 r n39 1
EHH
CLT RE DEFRTMEN
AN M 34NE
AT
943,
EPARTMEN
ainaneel of FINANCE pn
ar/rr
Egerdtys
1
EDed turs P
gtsi
14 al
9 TANDo 34 4
4
U415
Fo Help prens Fl
PomPlnclR FNANI l
WRIIGNAM SvOLN N
Leend
hepus imems
EL
LEE hA
ETRATE E
ELARE
TA
A'TAM
orte pens F1
Sor faDOO
NANIE! sqk
nacel pe ot
P e l 1ey
E wwIYGRAM FESV06N
Total A95T OR
epto
Due Tes
Rav NorP
960
M e Haad
Watmd C h g
S Te
des 10926
Tma!
Rewerue
AGET DAR Ar
AE3T DAR AH
1R Ae
LLA HLLA
CHEELA
SeH'PARC
9.966 10,926
HAYA
eBH RR DeT 10,926
995
Bank wse
H EATA
Plan/NorFP A
M a Head
Vdedy Dreged
9 FAR
Fos Tpe
Oales
MASI 156,4
Bare e
SBH RR D9T
4
56,440
43
PERATE tEPA NT
Puy Bn
Fia, C3.14 Drill-down
report, using COGNOS.
Data Warehouse for the Government of Andhra Pradesh 113
tm fact
cmo prog
Cino
cmo_date
dwsul
wwwww. www
MO_CUBE
DATAWAREHOUSE2 cmo_mast
Dirresions
DISTPARL_ASSM
Dist Neme
Pal ame
Ass N m e
PROGAAM
Prog N ame
i YEAR MONTH
ea
Manth
MeasUres
Propos als Receved
Estinsted Cost
People Contribuliorn
Adm Sanctioned
Adm Value
Tech Sandioned
Tech Vaiue
No a r oLunded
Grounded Value
No Campleta d
ompleted Vaue
JDPanents
Scher Paymends
Total Paynena
Cleara r c e Ststus
Work S t a t 5 t a t u s
Wark Complele Status
Fig. C3.15 Data Iranstormation map lor Janmabhoomni data
114 Data Warehousing-Concepts, Techniques, Products and Applications
BA
Stop hae Seach Fanley Holby Daou
C4M0 CUBE
Orop FiterFieldsHere
ProgName
Dis Mane Parl Name AsName No Comgleted tle Comp
B ADWABAT GADLAB40 ADLABAD
ASFABAD (S
BOATH(
RUAPUR S
MDHOLE
RMAL
SPPUR
SPEDDAPALL 24
otal 286 9584 18990
EANARNTAPUR 3400 10818
CHTOOR 80262 7668
BCUDOAPA 8 7010 12978
BAST GODAVARI 1512 3804
8 CNTR 5002 30 4230
HOERABA 6372 862
KARIDMNAG 433 9994 074
1
JANMABNDOMp
Fig. C3.16 Sample data drill-down picture tor Jannmabhoomu data.
Data Warehouse for the Government of Andhra Pradesh 115
mlact
CmOog ad e g am dhsara
CFITt
nD_dete
MO_CUBE
DATAWAREHOUSE2 cmo_mat
E Dner1sons
DIST_PARL_ASSM
Dizt Narne
P a l N eme
-- A s Namee
PROGRAM
Prog Name
Y E A R MONTH
Yea
- Manth
Measures
Praposals Receved
E st imated Cost
People Contibulion
Adm Sancbored
Adm Value
Tech Sarctioned
Tach aiue
No G r o U r s d
GCLrded Vahe
No Corvmpeted
Completed Vale
Nt Parment
SehernePawments
Total Payrmenta
Clesr arce Status
Work Stait Stalue
Woik Complete Stalus
9Pwot Tatle Servce Lab Hntng an 0LAP Cuhe mith 02% Web Conponenks Hicioso Inierrel Exglores
Eye Fiole o He
CIups cube
HE
C4.1 INTRODUCTION
Hewlett-Packard is a
installed base of more than 55 million printers,
With an
billion inkjet industry. The company's
worldwide market leader in the $18 of imaging
and manufactures a full range
Consumer Products Group develops for creating
thermal-inkjet technology. Responsible
products using HP-pioneered other part
more new product categories
for HP over the last 10 years than any
the world's first
Products Group introduced
of the company, the Consumer
in 1984 and pioneered large-format
inkjet printers in 1993,
desktop inkjet printer in 1997
devices in 1994, PC photography
al-in-one (printer/fax/scanner/copier)
views expressed in
Corporation and theretore the
This case study provided by Microsoft
is
e Case study belong to Microsoft Corporalion
17
Products and Applications
118 Data Warchousing Concepts, Techniques,
coNCLUSION
When the new SQL Server 7.0 and ProClarity system is
fully operational,
(the manager of Business Analysis Group of Hewlett-Packard) expectsStanley
it to
provide significant benefits to account representatives in the field, business
analysts in his and other groups, and HP's reseller customers: "I expect that
account reps who are
calling on a major account will log onto a Web on
Monday morning before going to call on that account and pull up lastpage
week's
sales and inventory levels all the way down to the store level.
a
report that shows inventory problems in particular stores-for They can produce
printer we just introduced three weeks ago is selling really wellexample,
the
new
but it looks
like we are having stock outs in
audit data, which
particular stores. Then they can take our in-store
might show a
problem with product placement in these stores.
They can suggest that the problem may be that the product is displayed in the
computer aisle, and people may not be shopping the
Then, as analysts, we can study these trends over time computer aisle for a printer.
and say, 'These stores are
Continually out of stock. These other stores are continually over stock and dont
have as high a
sell-through rate. Let's move inventory from these stores where
just sits on the shelves for two weeks to these stores
week" where it sells out in halr d
"The bottom line'", concludes Stanley, "is that this new
accurate, detailed, and timely data, will make system, through more
our business more efficient so
in turn, can
help our resellers make their businesses more efficient". w
in the
6 Data Warehousing
World Bank
C6.1 INTRODUCTION
warehouse
warehouse which called Live Database'. This was so popular and so
was
successtul that the American Productivity Council called it the 'number one
example of the transfer of best practice within an organization'. This product was
given meritoriouS award and also resulted in much cost savings. The Africaan
region reported a saving of $12 million after the introduction to LDB data
warehouse.
C7.1 INTRODUCTION
Any highly available database system or data warehousing system will use
data replication to ensure that the data access continues with no or very few
interruptions, the latter may arise if some computational system or component
fails. Each database object, e.g. a tuple, a partition or a table, will have to be
replicated at least once and distributed among many sites. Normally, the
approaches for high availability include identical sites, identical replicas and
identical ways of storing the replicas and identical mechanisms for distribution.
The system under consideration for a highly available data warehouse, called
HARBOR (Highly Available Replication-Based Online Recovery) is more flexible
and does not insist on all these requirements of identical copies, as long as they
represent logically the same data. With this flexibility, it is possible to store data
redundantly in different sort orders in various data compression formats. The
different updatable materialized views are also possible so that a wider variety of
queries can be answered by the query optimizer and query processor (than is
possible with the uniform formats of copies stored). A system called C-Store
achieves an order of magnitude higher performance than is usual by storing the
data in different sort orders and different formats. Data
compression redundancy
can, therefore, provide higher performance and also higher availability in
HARBOR
Normally, data warehouses cater specifically to large ad hoc query workloadS
Over largeread data sets
intermingled with a small number of OLTP transactions
which may touch a lew records only, but attect
the most recent data. Under suchn
Conditions the conventional
locking techniques cause poor pertormance tOdue
142
HARBOR, A Highly Available Data Warehouse 143
returns a
past time, is read-only query that
a
result, as if the query had been executed on
the database at time T, i.e.
a historical query at time T sees not
committed or uncommitted updates
after time T. This time travel feature enables
the inspection of past states of the
database.
144 Data Warchousing-Concepts, echniques, Products and Applications
deletion time, a, az, d ... ay instead of the usual tuple a1, 42, ay
The time stamp values are assigned at commit time as part ot the commit
tuple.
On doing this, the information necessary to answer historical queries is
preserved. To answer a historical query on a tuple at time 'T', it has to be
confirmed whether the tuple was inserted at or before time T and deleted T, or
after T.
As historical queries view an old time slice of the database and all subsequent
update transactions use later time stamps hence no (current) update transactions
can ever affect the output of a historical query for some past time. Moreover, no
locks are required for historical queries. The amount of history maintained in a
system can be confirmed by the user by deleting the tuples before a specific time.
The algorithm for recovery consists of three phases and uses time stamps
associated with tuples to answer time-based range queries for the tuples inserted
or deleted during a specified time range. In the first phase, a checkpointing
mechanism is used to determine the most recent time T such that it is guaranteed
that all the updates upto and including time T have been flushed to the disk. The
crashed site then uses the time stamps available for historical queries to nun local
update transactions to restore itself to the time of its last checkpoint. In order to
record checkpoints, the buffer pool will have to maintain a duty pages table with
the identity of all in memory pages and a flag for each page indicating whether
it contains any changes not yet flushed to the disk. During the normal processing,
the database
periodically writes a
checkpoint
for some past time T by tirst takin
a snapshot of the dirty pages table at time T+1. For each page that is dirty n the
snapshot, the system obtains a write latch for the page, flushes the page to the
disk, and then releases the latch. After all the dirty pages are flushed, T is
recorded onto the same well-known location on the disk and the checkpoint is
completed. The checkpoint will be 0 on initial condition or when it got corrupted.
In the second phase, the site executes historical queries on other live sites that
contain replicated copies of its data in order to catch up with the changes made
between the last checkpoint and sometime closer to the present. It is to be noted
145
HARBOR, A Highlhy Available Data Warchouse
to run
as a significant feature of the algorithm that read locks are not required
down
historical queries-this ensures that the system is not quitened or slowed
this
While large amounts of data are copied over the network-otherwise
for recovery
approach will not be viable (if read locks are required to be obtained
query).
In the third (or final) phase, the site executes the standard non-historical
between the
queries With read locks to catch up with any committed changes
start of the second phase and the current time. The coordinator
then torwards
C7.5 PERFORMANCE
described in the
Performance evaluation had shown that the recovery approach
warehouses and similar environments,
foregoing section works well for data
insertions with relatively few
where the updated work loads consist prinmarily of
updates to historical data.
CONCLUSION
omplovees, and Order Details table has details of each order. Again, in the real
world this might not always be the case-so simple and straightforward. We may
have to figure out what some of the cryptic object names mean and exactly
which data elements we require. The data warehouse architect often needs to
into the
create a list of data mappings and clean the data as it is loaded
the
warehouse. For example, the customer names might not always be stored in
same format in the various data sources.
Once we have decided which data we need, we can create and populate
a
on the
staging database and then the dimensional data model. IfDepending
we have multiple
database.
project, we may or may not have to have a staging too
data sources and need to correlate the data from these sources prior
database is
populating dimensional data structure, then having a staging
a
tool for testing. We
convenient. Furthermore, a staging database will be a handy
number of
can compare a number of records
in the original data source with the
In
ETL routines work correctly.
records in the staging tables to ensure that the
we need in easily
accessible format;
this case the database already has all the data
database.
therefore, we need not create a staging warehouse.
dimensional modelling for this data
Now let us pertorm the the
from its relational counterpart,
Dimensional modelling is different
tables and
dimensional models consist of fact
relational database modelling. The
fact tables contain n u m e r o u s foreign keys
dimension tables. The typical
dimension tables. The
dimension tables, on the other hand,
referencing the
create, and update
contain very few c o l u m n s - d i m e n s i o n key, value,
usually record o c c u r r e n c e s of a
an obsolete date. The fact tables
date, and perhaps The dimension tables provide a way to
as customer orders.
measurable fact, such for
various diagonals of company's operations;
slice business data
across
example, we can
examine orders by customer or by product. track
obsolete_date column within the dimension tables to
We can use the
values that change over time.
This concept is known as slowly
the history of the
their
dimension. For example, c o n s u m e r s of the products may change
changing within the
reason. Similarly, multiple departments
last names for any personal divided into
combined into one, or one departnment can be
organization can be the current value. If so, we can simply
some cases, we may keep only
many. In In other
value with the new value in the dimension table.
override the existing
must keep track of the old value, and the new
value. This is when we
cases, we
record was
use the
record obsolete date to track the timeframe during which the
valid
In the case of our present data warehouse, the dimensional nmodel will be very
of the dimension tables and one fact table. Since this
following
simple, consisting we need
is just static database and there is no new data to populate it regularly,
a
tact and
not add the obsolete date column to the dimensions. We can create
sion tables using the following script (in MS SQL Server):
L*J
(D (D (D (D (D H D D (D (D ,
P P-
O P O Q
(D
D
+
D
N J
N o H
L
JT
n
A Typical Business Data Warehouse for a
Trading Company 149
supplier dimension:
INSERT dim supplier (
supplier_id,
supplier_name
supplier_city,
country)
SELECT
supplierid,
companyname,
city,
countrY
FROM suppliers
- product dimension:
INSERT dim_product (
product_id,
product_name,
discontinued)
SELECT
productid,
productname,
discontinued
FROM productss
Customer dimension:
INSERT dim_customer (
cust oner_id,
Customer name,
customer_city,
Customer country)
SELECT
customerid,
companyname,
city,
Country
FROM custome r s
employee dimension:
INSERT dim_employee
employee_id,
employee_name,
employee_city
employee_country)
SELECT
employeeid,
150 Data Warehousing-Concepts, Techniques, Products and Applications
Note that dim time is a special dimension. It is not populated by data that is
already in the warehouse. Instead we populate it with calendar dates and date
parts (day, month, quarter, year, and so forth) so that we can aggregate the
warehouse data as needed. You can come up with a routine that populates your
own time dimension; here is a sample store procedure that one can use to
populate the time dimension:
d calendar date dt
BEGIN
SELECT
(acalendar day of_week_num DATEPART (dw, @calendar date
dt)
,acalendar day of Week name= DATENAME (WEEKDAY, calendar date dt)
calendar day of nonth nun =
,
,calendar day of
DATEPART (DD, @calendar date dt)
Year num =DATE PART (DY, Gcalendar date dt)
,acalendar week num =DATEPART (WK, G calendar date dt
A Typical Business Data Warehouse for a Trading Companiy 151
'calendar_month_name,' +
calendar_quarter_num, '
calendar_year_num +
'+
VALUES +
+
END
now use load dim time procedure to populate dim time table with needed dates
MS be
SQL Server has an ETL tool-DTS-which can
used to execute and
schedule the data warehouse population routines. A typical Dis package
determines which data rows need to be extracted from their source and inserts
Such rows into the appropriate dimension and fact tables. As this is a sample
application, there is no need to create any DTS packages, but in the real world
ETL routines may become considerably complicated and might take several
weeks or more to develop.
cONCLUSION
In this case
study, we have introduced the steps involved in building a typical
data warehouse. We showed how to
model, create and populate a dimensional
data model which will be a cornerstone of the warehouse and
analytical reports,
using MS SQL Server 2000 for a
typical trading company
Note: The above case
study is derived from the open websites-we thankhully
acknowledge the original author of the case on the web.
Customer Data Warehouse of the
Egg PLC provides banking, insurance, investments and mortgages to more than
3 million customers
through its Internet site and other distribution channels.
Established in 1998, Egg PLC pioneered online banking not only in the United
Kingdom but also throughout the world./Its technical infrastructure and support
was provided
by Sun Microsystems.
Finally, Egg's data warehouse resided on Sun Fire 15K with 16 CPUs and
10 GB for the core system. All the incoming data feeds for the data warehouse
were fed onto this system. The storage application is an EMC's SANJStorage
Area Network). The system works on Solaris 9 operating system, with Oracle 9
as the DBMS pending migration to Oracle 10g. Virtual domaininga special
feature supported by Solaris 10 is utilized by Egg's data warehouse. Oracle 10g
supports RAC (Real Application Cluster) for better performance by load sharing
along with failover facility within the cluster of 2 or more servers.
C9.5 BENEFITS
Egg has availed great benefits out of
CDW.Sales nd Marketing campaigns, up
to 120 per month, could be
developed using CDWJ It also made possible For
credit decisions with due daily risk assessment and risk management. daily
example, if a customer has a credit card, the Egg management is actually able to
Customer Data Warehouse of the World's First and Largest Online Bank im the United Kingáom 155
make a decision, using the data in the data warehouses, on a transactional basis
to determine whether that customer's account requires collection, is just slightly
customer is a good credit risk and should be
overdue, or conversely, |that the
offered a pre-approved loan.
Campaigns developed on the basis of CDW have resulted in greater sales.
Conforms to the internal policies and also external regulatory requirements. The
data warehousing team comprising DBAs, meta
data analysts and Oracle
can make changes to
developers are responsible for the data security. Only they the
the code The data warehouse team decides and implements changes that are
Acts such as the UK's Data
requiréd from time to time. |External Regulatory Act
Protection Act of1998, Financiat Services Authority within the UK, Banking
CDW has to be made
in the UK provide the regulatory framework|for which the
fully compliant.
c9.9 RELIABILITY
of IT infrastructure is required
to be Hot
reliable.
The architecture and design
cover any failure in
hardware components. Multiple fait-
plugins are provided to
is also provided
disaster recovery system
safes are made available with SAN. A
with the help of the original Sun Fire 6800 server.90%
uptime is maintained
are found to be
reliable and never
with a scheduled 10% downtime.SUN systems
of maintenance engineers on a
has had any failure that was beyond the scope
24/7 basis.
coNCLUSION
In this case study we have seen how a large online bank, Egg of the UK deploys
data warehousing fer their core functions using reliable IT intrastructure. We
have also seen how such deployment has resulted in enhanced success in banking
internal staff in better
service delivery to the customers and was helped the
planning of marketing decisions.
Note: This case study is derived from Egg's websites on its data warehouse. We
EDEKA Gmbh is a German wholesale and retail food chain. The top
management of EDEKA felt a need to be able to forecast specific product
demands by identifying emerging market trends-very crucial for its business
growth.
EDEKA engaged Melsungen, a company based in Germany, to implement the
data warehouse applications based on DB2 of IBM.
EDEKA used IBM's DB2 Universal Database DBMS for iSeries and DB2 Connect
Softwares on IBM hardware environment.
EDEKA used the following hardware environment for its data warehouse
1.IBM Server iseries 850 (processor)
2. IBM TotalStorage" Enterprise Storage Server (disk storage)
3. IBM TotalStorage Enterprise Tape System 3590 (tape storage)
I57
158 Data Warchousing-Concepts, Techniques, Products and Applications
C10.5 PERFORMANCE
With the help of iSeries servers the transaction processing was made 50% faster.
As the data warehouse was run on the IBM DB2 Connect, it contributed to a
better system performance. "Business Objects' product from IBM's business
partner also helped in to better access the sales information and track,
understand and manage the wealth of information stored in the data warehouse.
The information in the data warehouse is updated within an hour the concerned
sales activity is completed, ensuring that the data is fresh always. By using the
timely business analysis, EDEKA is able to anticipate market patterns more
accurately.
A German Supermarket EDEKA's Data Warehouse 159
C10.7 FORECASTING
Based on latest data EDEKA performed the analysis were able and to anticipate
market patterns and purchase patterns in the immediate and near future. This
allowed them for example, to forecast demand growth and suggest corrective
measures in the event of a sudden surge in returned goods. Theretore, the data
warehouse helped the company in responding quickly to changing business
conditions and competitive pressures and enabled it to differentiate and
distinguish itself in the open market place from other super markets in the region.
C10.8 EXPANSION
the data
EDEKA plans to include additional data storage capacity. In future,
to
warehouse will play a key role in addressing EDEKA's ongoing ettorts
understand better consumer shopping patterns. This will help its executives keep
the right products in the stores at the right time, based on accurate and up-to-
C10.9 PERFORMANCE
satisfied with the present data warehouse
immensely
On the whole, EDEKA feels decisions which in
as it "empowers Edeka to make much more logical business
turn result in improving profits and business objectives significantly".
CONCLUSION