
Ain Shams University

Faculty of Engineering
Computer and Systems Engineering Department
CSE 412: Selected Topics in Computer Engineering
4th Year CSE – 2nd Semester 2017/2018

SHEET 1

1. “The data value conflicts problem is one of the major problems facing data integration.” What is
meant by data value conflicts, and how can such conflicts be resolved?

2. Table 1 represents the collected data for a set of car models:


Table 1: Data that describe a set of car models
Car Model  Manufacturing Year  Cylinders  CC  Price (L.E.)  Maximum Speed (Km/hour)  Horse Power  Used/New  Usage Duration (Years)
Toyota 2009 1600 150000 195 1400 New 3
Jeep 2009 6 3700 320000 200 210 New 0
Mercury MWVR09 6 4000 2500000 210 0
Opel 2008 4 1600 180000 192 105 New 10
Mitsubishi 2006 4 –1600 170 106 Used 3
Honda 2009 4 1500 120000 180 92 New 0
Mazda 2003 4 1600 85000 180 Used 6

a. Fill in the incomplete data in the “Cylinders” and “Used/New” fields using appropriate default
values, then fill them in again using the mean/mode method.
b. For the “Horse Power” field, apply the interquartile range (IQR) method to find any outliers in this
field.
c. Identify all the noisy data in the table and state the reasons for considering them noisy.
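
As a rough illustration of the mean/mode method in part (a), the Python sketch below fills missing entries with the mean or the mode of the known entries. The helper name `fill_missing` and the use of `None` for an empty cell are assumptions made for this sketch, and the sample lists assume that the Cylinders and Used/New columns of Table 1 each have one empty cell (in the Toyota and Mercury rows respectively).

```python
from statistics import mean, mode


def fill_missing(values, strategy):
    """Replace None entries with the mean or mode of the known entries.

    `strategy` is "mean" (numeric fields) or "mode" (any field).
    """
    known = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(known)
    elif strategy == "mode":
        fill = mode(known)
    else:
        raise ValueError("strategy must be 'mean' or 'mode'")
    return [fill if v is None else v for v in values]


# Assumed reading of Table 1, rows in table order; None marks an empty cell.
cylinders = [None, 6, 6, 4, 4, 4, 4]
used_new = ["New", "New", None, "New", "Used", "New", "Used"]

print(fill_missing(cylinders, "mean"))
print(fill_missing(cylinders, "mode"))
print(fill_missing(used_new, "mode"))
```

For a numeric field the mean need not be a valid value (a cylinder count is a whole number), which is one reason to state which method was chosen for each field and why.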

3. Table 2 shows the characteristics of a set of planets in the solar system.


Table 2: Characteristics of a set of planets in the solar system
Planet   Average Surface   Approximate Solar  Approximate Solar  Approximate    Average Distance
         Temperature (°C)  Day (Earth Days)   Year (Earth Days)  Diameter (Km)  to the Sun (Km)
Mercury  179               58.65              88                 4880           58000000
Venus    482               243                225                12102          108000000
Mars     –60               1                  687                6792           230000000
Saturn   –153              0.4263             10760              120000         1427000000
Uranus   –218              0.7458             30681              50800          2870000000
Pluto    –222              6.3900             90545              2320           5900000000

For the data shown in Table 2, use the following normalization methods to normalize the “Average
Surface Temperature” and “Average Distance to the Sun” fields:
i. Min-Max, use [0, 1] as the normalization interval
ii. Z-Score
iii. Decimal Scaling
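
The three methods can be sketched in Python as follows, using the Average Surface Temperature column as input. The function names are hypothetical, and two conventions are assumed here: the z-score uses the population standard deviation (`statistics.pstdev`), and decimal scaling divides by the smallest power of ten that brings every absolute value below 1.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Average Surface Temperature column, rows in table order.
temperatures = [179, 482, -60, -153, -218, -222]

print(min_max(temperatures))
print(z_score(temperatures))
print(decimal_scaling(temperatures))
```

If the sample standard deviation is preferred, `statistics.stdev` can be swapped in for `pstdev`; the choice should be stated with the answer.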

Assignment: Due Next Week

1. “Validation and standardization are two strategies that are commonly used to smooth noisy
data.” Explain briefly, with the aid of examples, the meaning of each of these terms.

2. Consider the data shown in Table 1:


a. Find any outliers in the “Price” field.
b. Fill in the incomplete data in the “Price” and “Maximum Speed” fields using appropriate
default values, then fill them in again using the mean/mode method.
c. Are there any inconsistencies in the data values? If so, identify them and state the reasons
behind these inconsistencies.
Note: State all the assumptions you used.
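
One way to carry out the outlier check in part (a) is the interquartile range rule, sketched below in Python. The 1.5 × IQR fences and the "inclusive" quartile convention of `statistics.quantiles` are assumptions of this sketch; other quartile conventions give slightly different fences, so the convention used should be listed among the stated assumptions. The list holds the six Price values that appear in Table 1, with the row that has no Price left out.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) using the IQR rule."""
    # Quartiles computed with the "inclusive" method (an assumed convention).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Known Price (L.E.) values from Table 1; the missing entry is skipped.
prices = [150000, 320000, 2500000, 180000, 120000, 85000]

lower, upper, outliers = iqr_outliers(prices)
print(lower, upper, outliers)
```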

3. For the data shown in Table 2, use the following normalization methods to normalize
“Approximate Solar Day”, “Approximate Solar Year”, and “Approximate Diameter” fields:
a. Min-Max, use [0, 1] as the normalization interval
b. Z-Score
c. Decimal Scaling

