
Exercises

Question 1:

A. Describe data warehouse architecture in detail.

Data Warehouse Back-End Tools and Utilities:


• Data extraction: get data from multiple, heterogeneous, and external sources.
• Data cleaning: detect errors in the data and rectify them when possible.
• Data transformation: convert data from legacy or host format to warehouse format.
• Load: sort, summarize, consolidate, compute views, check integrity, and build indices and partitions.
• Refresh: propagate the updates from the data sources to the warehouse.
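The back-end steps above can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the two in-memory sources, the field names `store` and `amount`, and the cleaning rule (drop rows whose amount is missing, unparseable, or negative) are all hypothetical, and the refresh step is not modelled.

```python
from collections import defaultdict

def extract(sources):
    """Extraction: gather raw rows from several heterogeneous sources."""
    return [row for source in sources for row in source]

def clean(rows):
    """Cleaning: drop rows whose amount is missing, unparseable, or negative."""
    kept = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # cannot be rectified in this toy example, so skip it
        if amount >= 0:
            kept.append(row)
    return kept

def transform(rows):
    """Transformation: convert source-specific types to one warehouse format."""
    return [{"store": str(r["store"]), "amount": float(r["amount"])} for r in rows]

def load(rows):
    """Load: summarize while loading (total amount per store)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["store"]] += r["amount"]
    return dict(totals)

# Hypothetical sources with inconsistent types and some bad rows.
source_a = [{"store": 1, "amount": 40}, {"store": 2, "amount": None}]
source_b = [{"store": "1", "amount": "20.5"}, {"store": 3, "amount": -5}]

warehouse = load(transform(clean(extract([source_a, source_b]))))
print(warehouse)
```

Only the two valid rows for store 1 survive cleaning, so the loaded summary holds a single store total.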
Metadata Repository:
Metadata is the data that defines warehouse objects. It stores:
• A description of the structure of the data warehouse (schema, views, dimensions, hierarchies, derived data
definitions, data mart locations and contents).
• Operational metadata (data lineage, currency of data, monitoring information).
• The algorithms used for summarization.
• The mapping from the operational environment to the data warehouse.
• Data related to system performance: warehouse schema, view and derived data definitions.
• Business data: business terms and definitions, ownership of data, charging policies.
Data Mart:
• A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific,
selected groups, for example a marketing data mart.

B. Compare Data Warehouse and Operational DBMS.
OLTP vs. OLAP
                    OLTP                                     OLAP
Users               clerk, IT professional                   knowledge worker
#users              thousands                                hundreds
DB design           application-oriented                     subject-oriented
DB size             100MB-GB                                 100GB-TB
Data                current, up-to-date, detailed,           historical, summarized, multidimensional,
                    flat relational, isolated                integrated, consolidated
Function            day-to-day operations                    decision support
Usage               repetitive                               ad-hoc
Access              read/write, index/hash on primary key    lots of scans
#records accessed   tens                                     millions
Unit of work        short, simple transaction                complex query
Metric              transaction throughput                   query throughput, response

C. A company sells 4 items, A, B, C and D in 3 stores, 1, 2, and 3. The following table shows the sales for each of
these items at each store for every quarter of a year.

Quarter        Q1               Q2               Q3               Q4
Store       1    2    3      1    2    3      1    2    3      1    2    3
Item A     40   20   10    200   20   50    200  100   50    140  120   10
Item B     20   30   10    100   30   80    150  100   70     30  150   12
Item C    100  100  100    150   20   80    150  120   70     30  170   20
Item D    100   10   40    200   20   80    120   90   50     40  120   12

i. Show the contents of the view (Store, item_type).


Store        1     2     3
Item A     580   260   120
Item B     300   310   172
Item C     430   410   270
Item D     460   240   182

ii. Show the contents of the view (Quarter, Store).


Store         1     2     3
Quarter Q1  260   160   160
Quarter Q2  650    90   290
Quarter Q3  620   410   240
Quarter Q4  240   560    54
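Both views are roll-ups of the original (Item, Quarter, Store) table: the dimension that is not named in the view is summed out. The sketch below recomputes them from the sales figures in part C; the dictionary layout and the helper name `roll_up` are illustrative choices, not part of the exercise.

```python
from collections import defaultdict

# sales[item][quarter] = (store 1, store 2, store 3), copied from the table in part C
sales = {
    "A": {"Q1": (40, 20, 10), "Q2": (200, 20, 50), "Q3": (200, 100, 50), "Q4": (140, 120, 10)},
    "B": {"Q1": (20, 30, 10), "Q2": (100, 30, 80), "Q3": (150, 100, 70), "Q4": (30, 150, 12)},
    "C": {"Q1": (100, 100, 100), "Q2": (150, 20, 80), "Q3": (150, 120, 70), "Q4": (30, 170, 20)},
    "D": {"Q1": (100, 10, 40), "Q2": (200, 20, 80), "Q3": (120, 90, 50), "Q4": (40, 120, 12)},
}

def roll_up(dims):
    """Sum sales over every dimension that is not listed in `dims`."""
    view = defaultdict(int)
    for item, by_quarter in sales.items():
        for quarter, by_store in by_quarter.items():
            for store, amount in enumerate(by_store, start=1):
                cell = {"item": item, "quarter": quarter, "store": store}
                view[tuple(cell[d] for d in dims)] += amount
    return dict(view)

store_item = roll_up(("store", "item"))        # view (Store, item_type)
quarter_store = roll_up(("quarter", "store"))  # view (Quarter, Store)

print(store_item[(1, "A")], quarter_store[("Q4", 3)])
```

For example, the cell for store 1 and item A is 40 + 200 + 200 + 140 = 580, which matches the first view.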

Question 2:

A. What are recommender systems, and what are their applications?
-Answer: Recommender systems are systems that produce individualized recommendations as output. They can guide the user in a
personalized way to interesting or useful objects in a large space of possible options.
-Recommender systems use the opinions of a community of users to help individuals in that community more
effectively identify content of interest from a potentially overwhelming set of choices.
-Categories of Recommender Systems:
• Collaborative Filtering
• Content-Based
• Hybrid
-Recommender systems applications:
1. Product Recommendations
2. Movie Recommendations
3. News Articles
B. Compare collaborative filtering recommender systems and content-based recommender systems in
terms of: techniques used, advantages, and disadvantages.

Techniques
Collaborative filtering:
1. User-user collaborative filtering
2. Item-based approach
3. Classification approach
4. Neural Collaborative Filtering
Content-based:
1. TF-IDF (Term Frequency-Inverse Document Frequency)
2. Cosine similarity (measuring the similarity in their properties).
3. Statistical learning and machine learning methods.

Advantages
Collaborative filtering:
• Other users' scores are used
• No deterministic result since chance is involved in the system
• Works for any kind of item (No feature selection needed).
Content-based:
• No need for data on other users (No cold-start or sparsity problems)
• Able to recommend to users with unique tastes.
• Able to recommend new & unpopular items (No first-rater problem).
• Able to provide explanations (listing content-features that caused an item to be recommended).

Disadvantages
Collaborative filtering:
• Cold Start (Problem with new users and new products)
• Sparsity (The user/ratings matrix is sparse)
• Scalability
• Popularity bias (Cannot recommend items to someone with unique taste)
Content-based:
• Over-specialization
• Finding the appropriate features is hard.
• Recommendations for new users (How to build a user profile?)
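As a rough illustration of user-user collaborative filtering, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings, using cosine similarity between rating vectors. The users, items, and ratings are invented for the example, and real systems typically also normalize ratings and restrict the sum to the nearest neighbours.

```python
import math

# Hypothetical ratings (user -> {item: rating}); not taken from the exercise.
ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m4": 4},
    "cat": {"m1": 1, "m3": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other != user and item in their:
            sim = cosine(ratings[user], their)
            num += sim * their[item]
            den += sim
    return num / den if den else None  # None: nobody comparable rated the item

prediction = predict("ann", "m4")
print(prediction)
```

The prediction for "ann" leans toward "bob"'s rating because their rating vectors are more similar. An item that nobody has rated returns `None`, which is the cold-start problem listed above.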

Question 3:
A. Name and describe the main features/elements/steps of Genetic Algorithms (GA).

Answer: Genetic Algorithms (GA) use principles of natural evolution. There are five important features of GA:
GA Features:
• Fitness Function:
The fitness function is a measure of the objective to be optimized (maximized or minimized). It ranks a candidate
"representation": the function calculates and returns the fitness of an individual solution, usually as
a real number.
• Selection:
Replicates the most successful solutions found in a population at a rate proportional to their relative quality
so that they can be used in reproduction. There are many strategies for this (e.g. roulette-wheel
(fitness-proportionate) selection, rank selection, etc.).
• Recombination (Crossover):
Decomposes two distinct solutions and then randomly mixes their parts to form novel solutions.
Crossover means choosing a random position in the string (say, after 2 digits) and exchanging the segments
either to the right or to the left of this point with another string partitioned similarly to produce two new
offspring.
• Mutation:
A random change in a candidate solution. It is sometimes used to prevent the algorithm from getting
stuck. In a binary encoding, the procedure changes a 1 to a 0 or a 0 to a 1 instead of copying the gene unchanged.
• Encoding:
The way an individual is represented, i.e. the "genetic structure" of a possible
solution.
-Binary Encoding: the most common; a string of bits, 0 or 1 (gives many possibilities). => Chrom: 1011001011
Example Problem: Knapsack problem
-Permutation Encoding: Used in "ordering problems" => Chrom: 153264798
Example: Travelling salesman problem
-Value Encoding: Used for complicated values (real numbers) and when binary coding would be difficult.
Each chromosome is a string of some values. => Chrom: 1.2323 5.3243 0.4556
Example: Finding weights for neural nets.
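The crossover and mutation operators described above can be sketched on binary-encoded chromosomes as follows. The second parent string, the cut position, and the mutation rate are arbitrary choices for illustration; the first parent is the binary example chromosome from the text.

```python
import random

def one_point_crossover(p1, p2, point):
    """Exchange the segments to the right of `point` between two strings."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def bit_flip_mutation(chrom, rate, rng):
    """Flip each bit (1 -> 0 or 0 -> 1) independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in chrom)

rng = random.Random(0)  # seeded only to make the run repeatable

# Cut after 2 digits, as in the description of crossover above.
child_a, child_b = one_point_crossover("1011001011", "0000011111", 2)
mutant = bit_flip_mutation(child_a, 0.1, rng)

print(child_a, child_b, mutant)
```

Crossover only rearranges genes that already exist in the parents, whereas mutation can introduce a value that no parent carries at that position.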

GA Elements:
• Population: a set of individuals, each representing a possible solution to a given problem.
• Gene: a solution to the problem is represented as a set of parameters; these parameters are known as genes.
• Chromosome: genes joined together to form a string of values called a chromosome.
• Fitness score (value): every chromosome has a fitness score, which is computed from the chromosome itself by
using the fitness function.

GA Steps:
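One common way to arrange the features above into a loop is: initialize a random population, evaluate fitness, select parents, recombine, mutate, replace the population, and repeat. The sketch below is a generic illustration under stated assumptions, not the exercise's own answer: the toy objective (count the 1-bits), the population size, the mutation rate, and the generational replacement scheme are all arbitrary choices.

```python
import random

def fitness(chrom):
    """Toy objective (an assumption): the number of 1-bits in the chromosome."""
    return sum(chrom)

def select(population, rng):
    """Roulette-wheel (fitness-proportionate) choice of one parent."""
    weights = [fitness(c) + 1 for c in population]  # +1 avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]

def run_ga(length=12, pop_size=20, generations=40, rate=0.02, seed=1):
    rng = random.Random(seed)
    # Step 1: initialize a random population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < pop_size:
            # Steps 2-3: evaluate fitness and select two parents.
            p1, p2 = select(population, rng), select(population, rng)
            # Step 4: one-point crossover at a random cut.
            cut = rng.randint(1, length - 1)
            child = p1[:cut] + p2[cut:]
            # Step 5: bit-flip mutation.
            child = [1 - g if rng.random() < rate else g for g in child]
            next_gen.append(child)
        # Step 6: replace the old generation and repeat.
        population = next_gen
    return max(population, key=fitness)

best = run_ga()
print(best, fitness(best))
```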

B. Suppose a Genetic Algorithm uses chromosomes of the form x=abcdefgh with a fixed length of eight genes, each gene
can be any digit between 0 and 9. Let the fitness of individual x be calculated as:

f(x) = (a +b) − (c +d) + (e + f) − (g + h),

and let the initial population consist of four individuals with the following chromosomes:

x1 = 6 5 4 1 3 5 3 2

x2 = 8 7 1 2 6 6 0 1

x3 = 2 3 9 2 1 2 8 5

x4 = 4 1 8 5 2 0 9 4

I. Evaluate the fitness of each individual, showing all your workings, and arrange them in order with the fittest first and
the least fit last.

f(x1) = (6 + 5) - (4 + 1) + (3 + 5) - (3 + 2) = 9

f(x2) = (8 + 7) - (1 + 2) + (6 + 6) - (0 + 1) = 23 (the fittest individual)

f(x3) = (2 + 3) - (9 + 2) + (1 + 2) - (8 + 5) = -16

f(x4) = (4 + 1) - (8 + 5) + (2 + 0) - (9 + 4) = -19 (least fit individual)

The order is x2, x1, x3 and x4.
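The same evaluation can be checked mechanically. This short sketch encodes the fitness function and the four initial chromosomes from the question and sorts them from fittest to least fit.

```python
def fitness(chrom):
    """f(x) = (a + b) - (c + d) + (e + f) - (g + h) for an 8-gene chromosome."""
    a, b, c, d, e, f, g, h = chrom
    return (a + b) - (c + d) + (e + f) - (g + h)

population = {
    "x1": (6, 5, 4, 1, 3, 5, 3, 2),
    "x2": (8, 7, 1, 2, 6, 6, 0, 1),
    "x3": (2, 3, 9, 2, 1, 2, 8, 5),
    "x4": (4, 1, 8, 5, 2, 0, 9, 4),
}

scores = {name: fitness(chrom) for name, chrom in population.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```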

II. Perform the following crossover operations:


i. Cross the fittest two individuals using one–point crossover at the middle point.

ii. Cross the second and third fittest individuals using a two–point crossover (points b,f).

iii. Cross the first and third fittest individuals (ranked 1st and 3rd) using a uniform crossover.
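The three crossover operations can be sketched as below. Two points are assumptions: the two-point cuts are taken to fall after gene b and after gene f (so genes c to f are exchanged), and the uniform-crossover mask is one particular choice, picked so that the offspring agree with the population listed in part III. A real uniform crossover would draw the mask at random.

```python
def one_point(p1, p2, cut):
    """Swap everything after index `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, lo, hi):
    """Swap the middle segment [lo, hi) between the parents."""
    return p1[:lo] + p2[lo:hi] + p1[hi:], p2[:lo] + p1[lo:hi] + p2[hi:]

def uniform(p1, p2, mask):
    """mask[i] == 1 keeps p1's gene in the first child; 0 takes p2's gene."""
    c1 = tuple(a if m else b for a, b, m in zip(p1, p2, mask))
    c2 = tuple(b if m else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

x1 = (6, 5, 4, 1, 3, 5, 3, 2)  # 2nd fittest
x2 = (8, 7, 1, 2, 6, 6, 0, 1)  # fittest
x3 = (2, 3, 9, 2, 1, 2, 8, 5)  # 3rd fittest

o1, o2 = one_point(x2, x1, 4)        # i.  middle point
o3, o4 = two_point(x1, x3, 2, 6)     # ii. cuts after b and after f
mask = (0, 1, 1, 1, 1, 0, 1, 1)      # assumed mask: genes a and f are exchanged
o5, o6 = uniform(x2, x3, mask)       # iii. uniform crossover

for child in (o1, o2, o3, o4, o5, o6):
    print(child)
```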

III. Suppose the new population consists of the six offspring individuals received by the crossover operations in
the above question. Evaluate the fitness of the new population, showing all your workings. Has the overall
fitness improved?
Answer: The new population is:
O1 = 8 7 1 2 3 5 3 2 => f(O1) = (8 + 7) - (1 + 2) + (3 + 5) - (3 + 2) = 15
O2 = 6 5 4 1 6 6 0 1 => f(O2) = (6 + 5) - (4 + 1) + (6 + 6) - (0 + 1) = 17
O3 = 6 5 9 2 1 2 3 2 => f(O3) = (6 + 5) - (9 + 2) + (1 + 2) - (3 + 2) = -2
O4 = 2 3 4 1 3 5 8 5 => f(O4) = (2 + 3) - (4 + 1) + (3 + 5) - (8 + 5) = -5
O5 = 2 7 1 2 6 2 0 1 => f(O5) = (2 + 7) - (1 + 2) + (6 + 2) - (0 + 1) = 13
O6 = 8 3 9 2 1 6 8 5 => f(O6) = (8 + 3) - (9 + 2) + (1 + 6) - (8 + 5) = -6
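Average fitness = (15 + 17 + (-2) + (-5) + 13 + (-6)) / 6 = 32 / 6 ≈ 5.33
The average fitness of the initial population was (9 + 23 + (-16) + (-19)) / 4 = -0.75, so the overall fitness has
improved: the average has increased from -0.75 to about 5.33. Note, however, that the best single fitness fell from 23 to 17.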
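A quick check of the comparison, using the initial chromosomes from the question and the six offspring listed above:

```python
def fitness(chrom):
    """f(x) = (a + b) - (c + d) + (e + f) - (g + h)."""
    a, b, c, d, e, f, g, h = chrom
    return (a + b) - (c + d) + (e + f) - (g + h)

initial = [
    (6, 5, 4, 1, 3, 5, 3, 2),
    (8, 7, 1, 2, 6, 6, 0, 1),
    (2, 3, 9, 2, 1, 2, 8, 5),
    (4, 1, 8, 5, 2, 0, 9, 4),
]
offspring = [
    (8, 7, 1, 2, 3, 5, 3, 2),
    (6, 5, 4, 1, 6, 6, 0, 1),
    (6, 5, 9, 2, 1, 2, 3, 2),
    (2, 3, 4, 1, 3, 5, 8, 5),
    (2, 7, 1, 2, 6, 2, 0, 1),
    (8, 3, 9, 2, 1, 6, 8, 5),
]

def average_fitness(population):
    return sum(fitness(c) for c in population) / len(population)

avg_before = average_fitness(initial)
avg_after = average_fitness(offspring)
print(avg_before, round(avg_after, 2))
```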

IV. By looking at the fitness function and considering that genes can only be digits between 0 and 9 find the
chromosome representing the optimal solution. Find the value of the maximum fitness.

Answer: The optimal solution should have a chromosome that gives the maximum of the fitness
function
max f(x) = max [(a + b) - (c + d) + (e + f) - (g + h)].
Because genes can only be digits from 0 to 9, the optimal solution should be:
Xoptimal = 9 9 0 0 9 9 0 0
and the maximum fitness is:
f(Xoptimal) = (9 + 9) - (0 + 0) + (9 + 9) - (0 + 0) = 36
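Because f(x) is a sum in which each gene appears once with coefficient +1 or -1, each gene can be optimized independently: take 9 where the coefficient is positive and 0 where it is negative. A small sketch of that argument (the name `signs` is introduced here for illustration):

```python
# Coefficient of each gene a..h in f(x) = (a + b) - (c + d) + (e + f) - (g + h)
signs = (+1, +1, -1, -1, +1, +1, -1, -1)

# Largest digit where the gene adds to f, smallest digit where it subtracts.
optimal = tuple(9 if s > 0 else 0 for s in signs)
max_fitness = sum(s * g for s, g in zip(signs, optimal))

print(optimal, max_fitness)
```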

V. By looking at the initial population of the algorithm can you say whether it will be able to reach the optimal
solution without the mutation operator?

Answer: No, the algorithm will never reach the optimal solution without mutation. The optimal solution is
Xoptimal = 9 9 0 0 9 9 0 0. If mutation does not occur, then the only way to change genes is by
applying the crossover operator. Regardless of the way crossover is performed, its only outcome is
an exchange of genes of parents at certain positions in the chromosome. This means that the first
gene in the chromosomes of children can only be either 6, 8, 2 or 4 (i.e. first genes of x1, x2, x3 and
x4), and because none of the individuals in the initial population begins with gene 9, the crossover
operator alone will never be able to produce an offspring with gene 9 in the beginning. One can
easily check that a similar problem is present at several other positions. Thus, without mutation, this
GA will not be able to reach the optimal solution.
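The argument can be checked position by position. The sketch below collects the genes available at each position in the initial population and lists the positions where the optimal gene is absent. It also computes an upper bound on the fitness that position-preserving crossover could reach. The bound is an addition for illustration, not part of the original answer.

```python
initial = [
    (6, 5, 4, 1, 3, 5, 3, 2),  # x1
    (8, 7, 1, 2, 6, 6, 0, 1),  # x2
    (2, 3, 9, 2, 1, 2, 8, 5),  # x3
    (4, 1, 8, 5, 2, 0, 9, 4),  # x4
]
optimal = (9, 9, 0, 0, 9, 9, 0, 0)
signs = (+1, +1, -1, -1, +1, +1, -1, -1)  # coefficient of each gene in f(x)

# Genes that crossover alone can ever place at each position a..h.
available = [set(column) for column in zip(*initial)]

# Positions (0 = a, ..., 7 = h) whose optimal gene is not in the population.
missing = [i for i, (genes, target) in enumerate(zip(available, optimal))
           if target not in genes]

# Best value obtainable when every position picks its best available gene.
upper_bound = sum(s * (max(g) if s > 0 else min(g)) for s, g in zip(signs, available))

print(missing, upper_bound)
```

Only position g already contains its optimal gene (0), so without mutation the fitness cannot exceed this bound, which is well below 36.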
