
An Automatic Data Warehouse Conceptual Design Approach

star schema resulting from applying our algorithm is shown in Figure 5.

Note that IdDate, IdClient and IdProd were added as attributes to identify the dimensions. In addition, several attributes were added to complete the dimension hierarchies. This addition was done in step 3.2.1 of the algorithm.

Constellation Schema Generation

In the above phase, we generated a star schema for each fact in the same analysis domain. These star schemas have to be merged to obtain star/constellation schemas. For this, we adapt the similarity factor of (Feki, 2004) to measure the pertinence of schemas to be integrated, i.e., the number of their common dimensions.

Given two star schemas Si and Sk in the same analysis domain, their similarity factor Sim(Si, Sk) is calculated on the basis of n and m, the numbers of dimensions in Si and Sk respectively, and p, the number of their common dimensions:

\[
Sim(S_i, S_k) =
\begin{cases}
1 & \text{if } (n = p) \wedge (n < m) \\
\dfrac{p}{n + m - p} & \text{otherwise}
\end{cases}
\]

Informally, Sim(Si, Sk) highlights the case where all the dimensions of Si are included in Sk. To dismiss the trivial case where Si has only the Date dimension (present in all schemas), the designer should fix the threshold α to a value strictly greater than 0.5.

In addition, to enhance the quality of the integration result, we define a matrix of similarities MS to measure the similarity between each pair of multidimensional schemas. This matrix is used to decide which schemas should be integrated first.

Consider n star schemas S1, S2, …, Sn of the same analysis domain. Each schema, identified by a name, analyzes a fact and has a set of dimensions. The stopcondition is a Boolean expression that is true if either the size of MS becomes 1 or all the values in MS are lower than a threshold set by the designer. Let us extend our previous example with the additional star S2 (Figure 6).

The five steps of the DM schema construction are:

a. Construct the matrix of similarities MS
b. Find all the occurrences of the maximum max in MS
c. Construct a constellation by merging all schemas having the maximum similarity max
d. Re-dimension MS by:
   • Dropping rows and columns of the merged schemas
   • Adding one row and one column for the newly constructed schema
e. If <stopcondition> then exit, else return to step a.

The similarity matrix for S1 and S2 contains the single value Sim(S1, S2) = 0.75.

The constellation schema resulting from applying the above five steps is depicted by Figure 7.

DM-DS Mapping

The DW is built from several data sources (DS) while its schema is built from the DM schemas. Thus, the DM schemas must be mapped to the DS schemas. In our approach, the DM-DS mapping adapts the heuristics proposed by (Golfarelli, Maio & Rizzi, 1998; Bonifati, Cattaneo, Ceri, Fuggetta & Paraboschi, 2001) to map each element (i.e., fact, dimension…) of the DM

Figure 6. S2 star schema
[Diagram not reproduced. Labels shown: SHIPMENT, Qty, Amount; Date, IdDate, Day, Month, Quarter, Semester, Year; Client, Supplier, Id, City, Department, Region, Social-Reason]
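
The similarity factor and the five merging steps described in the text can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a schema is reduced to a set of dimension names, the inclusion case of Sim is given the value 1, a merged constellation carries the union of its members' dimensions, MS is held as a dictionary over schema pairs in insertion order, and the threshold is tested before merging rather than after. The function names `sim` and `build_constellations` and the dimension set chosen for S1 are hypothetical; S1 was picked only so that the sketch reproduces the value Sim(S1, S2) = 0.75 quoted in the text.

```python
from itertools import combinations


def sim(dims_i, dims_k):
    """Similarity factor Sim(Si, Sk) between two sets of dimension names."""
    n, m = len(dims_i), len(dims_k)
    p = len(dims_i & dims_k)          # number of common dimensions
    if n == p and n < m:              # every dimension of Si also belongs to Sk
        return 1.0                    # assumed value for the inclusion case
    return p / (n + m - p)


def build_constellations(schemas, threshold):
    """Merge schemas (name -> set of dimensions) following steps a to e."""
    schemas = {name: set(dims) for name, dims in schemas.items()}
    while len(schemas) > 1:
        # Step a: similarity "matrix", one entry per pair in insertion order.
        ms = {(a, b): sim(schemas[a], schemas[b])
              for a, b in combinations(schemas, 2)}
        # Step b: find all occurrences of the maximum value.
        best = max(ms.values())
        if best < threshold:          # stop: every value is below the threshold
            break
        max_pairs = [pair for pair, value in ms.items() if value == best]
        # Step c: group the pairs that share a schema, then merge each group.
        groups = []
        for a, b in max_pairs:
            touching = [g for g in groups if a in g or b in g]
            groups = [g for g in groups if g not in touching]
            groups.append({a, b}.union(*touching))
        # Step d: drop the merged schemas and add the new constellations.
        for group in groups:
            names = sorted(group)
            merged_dims = set().union(*(schemas.pop(name) for name in names))
            schemas["+".join(names)] = merged_dims
        # Step e: the while condition exits once a single schema remains.
    return schemas


# Hypothetical dimension sets (assumption, not taken from Figures 5 and 6).
s1 = {"Date", "Client", "Product", "Supplier"}
s2 = {"Date", "Client", "Supplier"}

print(sim(s1, s2))                                          # 0.75
print(build_constellations({"S1": s1, "S2": s2}, 0.6))      # one merged schema
```

Because the first case of the formula depends on which schema is Si, `sim` is not symmetric: with the sets above, `sim(s2, s1)` falls into the inclusion case. The sketch therefore fixes the pair order by insertion order, which is a design choice of this example rather than something the text prescribes.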


