Uploaded by John Gong

Solutions to the practice questions for Data Warehousing, OLAP, and Data Mining.

© All Rights Reserved

in Data Warehousing & OLAP and Data Mining

Laks V.S. Lakshmanan

(1) What are the advantages of the cube operator compared to the operators rollup, drilldown, and pivot?

Cube generalizes rollup, drilldown, and pivot and thus subsumes them. The benefit is that if we precompute and materialize the cube, it facilitates a whole class of data explorations (and queries). Exploration and visualization of the cube, or of subsets of group-bys from the cube, can in turn enable the detection of patterns that are otherwise hard to find.
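
To make "subsumes" concrete, here is a minimal Python sketch (not part of the original solution) that enumerates the group-bys of a cube and checks that the levels of one rollup are among them. The dimension names and the helper names `cube_group_bys` and `rollup_group_bys` are illustrative assumptions.

```python
from itertools import combinations

def cube_group_bys(dimensions):
    """Every subset of the dimensions, i.e. every group-by in the cube."""
    dims = list(dimensions)
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def rollup_group_bys(dimensions):
    """The prefix chain produced by a rollup over the given dimension order."""
    dims = tuple(dimensions)
    return [dims[:k] for k in range(len(dims), -1, -1)]

# Illustrative dimensions (assumed for this example only).
dims = ("Product", "Time", "Geography")
cube = cube_group_bys(dims)
rollup = rollup_group_bys(dims)

print(len(cube))                        # 2**3 = 8 group-bys
print(all(g in cube for g in rollup))   # each rollup level is a cube group-by
```

A rollup touches only n + 1 of the 2^n group-bys, which is why a materialized cube can answer the rollup (and any other chain of aggregations) without recomputation.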

(2) Consider a data warehouse with the dimensions D1, D2, D3, D4, D5, which have 999, 99, 24, 4, and 19 values respectively. Suppose the sparsity factor of the cube is 10%. What is the estimated number of tuples in the sparse cube?

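The size of the full cube is (999+1)(99+1)(24+1)(4+1)(19+1) = 250,000,000 tuples, where the +1 on each dimension accounts for the extra ALL value used when that dimension is aggregated away.

With a sparsity factor of 10%, the size of the sparse cube is about 250,000,000 × 0.1 = 25,000,000 tuples.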
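
The arithmetic can be checked with a short Python sketch. The variable names are assumptions for illustration; the sparsity is kept as an integer percentage to avoid floating-point rounding.

```python
from math import prod

cardinalities = [999, 99, 24, 4, 19]   # values per dimension D1..D5
sparsity_percent = 10                  # sparsity factor of the cube

# Each dimension contributes its values plus one extra ALL value.
full_cube = prod(c + 1 for c in cardinalities)
sparse_cube = full_cube * sparsity_percent // 100

print(f"{full_cube:,}")    # 250,000,000
print(f"{sparse_cube:,}")  # 25,000,000
```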

(3) Consider computing the full cube over the dimensions {Product, Time, Geography}, using sorting to speed up the computation of the group-bys that make up the cube. Explain with a diagram of the cube lattice how you'd order the dimensions of each group-by in order to minimize the number of sort operations required.

P=product; T=time; G=geography.

            PGT

     PG     GT     TP

     P      G      T

            {}

Each red path is a sorted pass. By ordering the sort attributes as shown in the cube lattice above, we can compute the entire cube in 3 sort passes of pipesort.
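
The path colouring does not survive in plain text, so the following Python sketch reconstructs the three passes under an assumption: one pass over data sorted in a given attribute order yields every group-by that is a prefix of that order. The helper name `pipeline` is hypothetical, and this is only the prefix-sharing idea, not a full pipesort implementation.

```python
def pipeline(sort_order, already_computed):
    """Group-bys obtainable in one pass over data sorted by sort_order:
    every prefix of the order that has not been computed yet."""
    prefixes = [sort_order[:k] for k in range(len(sort_order), -1, -1)]
    return [p for p in prefixes if frozenset(p) not in already_computed]

# Attribute orders as they are written in the lattice (P, G, T as above).
sort_orders = ["PGT", "GT", "TP"]

done = set()
pipelines = []
for order in sort_orders:
    chain = pipeline(order, done)
    done.update(frozenset(p) for p in chain)
    pipelines.append(chain)

for chain in pipelines:
    print(" -> ".join(p or "{}" for p in chain))
# PGT -> PG -> P -> {}
# GT -> G
# TP -> T
```

Under this assumption the three chains cover all 8 group-bys, which matches the count of 3 sort passes stated above.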

(4) Consider the following cube lattice, with numbers showing the estimated sizes of the group-bys, where M indicates a million tuples. If you are allowed to materialize 3 group-bys, which are the best three you'd materialize in order to optimize the evaluation of queries corresponding to the various group-bys in the cube lattice?

                      (a,b,c) 10 M

      (a,b) 8 M       (a,c) 4 M       (b,c) 6 M

      (a) 2 M         (b) 4 M         (c) 5 M (revised to 3.5 M)

                      {} 1

First, notice that there is an inconsistency in the size estimates provided. (c) has a size of 5 M, whereas (a,c) has a size of 4 M. This is impossible, since the size of group-bys cannot grow as you go down the lattice. Let's work with a revised estimate of 3.5 M for (c). This is the size we will use below.
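
The consistency condition can be checked mechanically. The Python sketch below is an illustration, with group-bys encoded as strings of dimension letters and a hypothetical helper `violations`; it flags every group-by whose estimate exceeds that of a group-by it can be derived from.

```python
def violations(sizes):
    """Pairs (child, parent) where child's dimensions are a proper subset of
    parent's, yet child is estimated to be larger than parent."""
    return sorted((child, parent)
                  for child in sizes for parent in sizes
                  if set(child) < set(parent) and sizes[child] > sizes[parent])

original = {"abc": 10_000_000, "ab": 8_000_000, "ac": 4_000_000,
            "bc": 6_000_000, "a": 2_000_000, "b": 4_000_000,
            "c": 5_000_000, "": 1}
revised = dict(original, c=3_500_000)

print(violations(original))  # [('c', 'ac')]
print(violations(revised))   # []
```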

Of the three group-bys we are allowed to materialize, one is taken: we must always materialize the top element of the cube lattice, for it cannot be derived from any other group-by. So, that leaves two more group-bys to choose. The following table tracks the marginal gain of each group-by, given what has been materialized in previous rounds. Initially, only abc is materialized. The group-by that is the winner of the greedy choice in each round is highlighted in red.

Remember, marginal saving for a single group-by = sum of marginal savings for all group-bys that can be derived from it, compared with the current cheapest way of computing each of those group-bys. E.g., the group-bys that can be derived from ab are ab (yes, you include itself), a, b, and {}. Initially, the cheapest way to compute each of them is to use the top group-by abc, which has a cost of 10 M. On materializing ab, these four group-bys can be computed at a cheaper cost of 8 M. So, the marginal gain of ab = 4 group-bys × savings on each = 4 × 2 M. Keep in mind, sometimes the savings on different group-bys derivable from the same group-by can be different.

Note: For simplicity, we ignore subtraction of small numbers from millions: e.g., we write 10 M - 1 as just 10 M.

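Group-by    Round 1      Round 2

ab          4 × 2 M      2 × 2 M
bc          4 × 4 M      2 × 4 M
ac          4 × 6 M      (already materialized)
a           2 × 8 M      2 × 2 M
b           2 × 6 M      1 × 6 M
c           2 × 6.5 M    2 × 0.5 M
{}          10 M         4 M

Winner of each round (red in the original): ac in round 1 (24 M), bc in round 2 (8 M).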

Thus, the two additional group-bys we should materialize according to the greedy algorithm are ac and bc.
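
The two rounds can be reproduced with a small Python sketch of the greedy selection. It assumes the cost of answering a group-by equals the size of its cheapest materialized ancestor, uses the revised 3.5 M estimate for (c), and encodes group-bys as strings of dimension letters. The names `SIZES`, `derivable`, and `greedy` are illustrative.

```python
SIZES = {"abc": 10_000_000, "ab": 8_000_000, "ac": 4_000_000,
         "bc": 6_000_000, "a": 2_000_000, "b": 4_000_000,
         "c": 3_500_000, "": 1}

def derivable(view):
    """Group-bys answerable from view: those over a subset of its dimensions
    (including view itself)."""
    return [g for g in SIZES if set(g) <= set(view)]

def greedy(extra, top="abc"):
    """Materialize the top group-by, then `extra` more by largest marginal gain."""
    # cost[g] = size of the cheapest materialized group-by that can answer g
    cost = {g: SIZES[top] for g in SIZES}
    chosen = [top]
    for _ in range(extra):
        def gain(v):
            return sum(max(cost[g] - SIZES[v], 0) for g in derivable(v))
        best = max((v for v in SIZES if v not in chosen), key=gain)
        for g in derivable(best):
            cost[g] = min(cost[g], SIZES[best])
        chosen.append(best)
    return chosen

print(greedy(2))  # ['abc', 'ac', 'bc']
```

Unlike the table, the sketch does not round 10 M - 1 to 10 M, but the winners of both rounds are the same.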

(5) Consider the following transaction database.

Transaction_id    Basket of items

t1                {a,c,d,f}
t2                {a,b,d,e,g}
t3                {b,c,d,e}
t4                {a,b,c,d}

Suppose minSup = 3 and minConf = 2/3.

(a) Using the Apriori algorithm, find all itemsets that are frequent, i.e., have a support ≥ minSup.

Round 1: sup(a) = 3; sup(b) = 3; sup(c) = 3; sup(d) = 4; sup(e) = 2; sup(f) = 1; sup(g) = 1. Discard e, f, g and their supersets, as their support is < minSup = 3.

Candidates for round 2 = ab, ac, ad, bc, bd, cd.

Round 2: sup(ab) = 2; sup(ac) = 2; sup(ad) = 3; sup(bc) = 2; sup(bd) = 3; sup(cd) = 3. Discard ab, ac, bc. Candidates for round 3 = abd, acd, bcd.

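Round 3: sup(abd) = 2; sup(acd) = 2; sup(bcd) = 2.

All discarded. (Strictly, Apriori's subset-pruning step would already eliminate these three candidates before counting, since each contains an infrequent 2-itemset: ab, ac, or bc.) No candidates for round 4, so stop!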

The frequent itemsets are a, b, c, d, ad, bd, cd.
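
The rounds above can be replayed with a minimal level-wise Python sketch. It is a simplified rendering of Apriori rather than a tuned implementation: candidates of size k+1 are all combinations of the surviving items whose size-k subsets are all frequent, so the size-3 candidates are pruned before any counting. The function and variable names are assumptions.

```python
from itertools import combinations

transactions = {
    "t1": {"a", "c", "d", "f"},
    "t2": {"a", "b", "d", "e", "g"},
    "t3": {"b", "c", "d", "e"},
    "t4": {"a", "b", "c", "d"},
}

def support(itemset):
    """Number of baskets that contain every item of itemset."""
    return sum(1 for basket in transactions.values() if set(itemset) <= basket)

def apriori(min_sup):
    items = sorted(set().union(*transactions.values()))
    level = [(i,) for i in items if support((i,)) >= min_sup]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        prev = set(level)
        survivors = sorted({i for s in level for i in s})
        # Keep a candidate only if all of its size-k subsets are frequent.
        candidates = [c for c in combinations(survivors, k + 1)
                      if all(sub in prev for sub in combinations(c, k))]
        level = [c for c in candidates if support(c) >= min_sup]
        k += 1
    return frequent

frequent = apriori(min_sup=3)
print(sorted("".join(s) for s in frequent))
# ['a', 'ad', 'b', 'bd', 'c', 'cd', 'd']
```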

(b) Based on (a), find all strong association rules, i.e., association rules whose confidence ≥ minConf.

We use all frequent itemsets found in (a) above to form ARs.

Singleton itemsets never contribute to non-trivial ARs.

That only leaves ad, bd, cd.

conf(a → d) = sup(ad)/sup(a) = 3/3 = 1.

conf(d → a) = sup(ad)/sup(d) = 3/4.

conf(b → d) = sup(bd)/sup(b) = 3/3 = 1.

conf(d → b) = sup(bd)/sup(d) = 3/4.

conf(c → d) = sup(cd)/sup(c) = 3/3 = 1.

conf(d → c) = sup(cd)/sup(d) = 3/4.

All ARs above have a confidence > minConf = 2/3. If minConf were instead, e.g., 0.8, all ARs above with d on the LHS would be disqualified, as their confidence (3/4) would be below minConf.
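
As a check on the confidence values, the following Python sketch forms both rules from each frequent 2-itemset and filters them by minConf. It uses exact fractions, handles only 2-itemsets (all that is needed here), and the names `support` and `strong_rules` are illustrative.

```python
from fractions import Fraction

# Supports of the frequent itemsets found in part (a).
support = {"a": 3, "b": 3, "c": 3, "d": 4, "ad": 3, "bd": 3, "cd": 3}

def strong_rules(support, min_conf):
    """Rules lhs -> rhs from frequent 2-itemsets with confidence >= min_conf."""
    rules = {}
    for itemset, sup in support.items():
        if len(itemset) != 2:
            continue  # singletons give no non-trivial rule
        x, y = itemset
        for lhs, rhs in ((x, y), (y, x)):
            conf = Fraction(sup, support[lhs])  # sup(lhs and rhs) / sup(lhs)
            if conf >= min_conf:
                rules[(lhs, rhs)] = conf
    return rules

for (lhs, rhs), conf in strong_rules(support, Fraction(2, 3)).items():
    print(f"{lhs} -> {rhs}: {conf}")

# With minConf = 0.8, only the rules with d on the right-hand side remain.
print(sorted(strong_rules(support, Fraction(4, 5))))
# [('a', 'd'), ('b', 'd'), ('c', 'd')]
```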
