Sheet1 2018
Faculty of Engineering
Computer and Systems Engineering Department
CSE 412: Selected Topics in Computer Engineering
4th Year CSE – 2nd Semester 2017/2018
SHEET 1
1. “The data value conflicts problem is one of the major problems facing data integration.” What is
meant by data value conflicts, and how can such conflicts be resolved?
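As an illustration of the idea in question 1, a data value conflict arises when two sources record different values for the same attribute of the same real-world entity, for example because they use different units or representations. The sketch below is only a minimal example under assumed data: the records, field names (`weight_kg`, `weight_lb`), and tolerance are hypothetical and do not come from this sheet. It resolves a unit conflict by converting both values to a common unit before comparing them.

```python
import math

# Hypothetical records describing the same car in two different sources.
source_a = {"id": 17, "weight_kg": 1200.0}
source_b = {"id": 17, "weight_lb": 2645.5}

LB_PER_KG = 2.20462  # pounds per kilogram


def to_kg(record):
    """Return the weight in kilograms, whichever unit the source used."""
    if "weight_kg" in record:
        return record["weight_kg"]
    return record["weight_lb"] / LB_PER_KG


def values_agree(a, b, rel_tol=0.01):
    """True if the two records agree once expressed in the same unit."""
    return math.isclose(to_kg(a), to_kg(b), rel_tol=rel_tol)


print(values_agree(source_a, source_b))
```

Conversion to a common unit is only one possible resolution strategy; conflicts caused by different encodings or levels of abstraction need their own mapping rules.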
a. Fill in the missing values in the “Cylinders” and “Used/New” fields using appropriate default
values, then fill them in again using the mean/mode method.
b. For the “Horse Power” field, apply the interquartile range (IQR) method to find any outliers in this
field.
c. Identify all the noisy data in the table and state the reasons for considering them noisy.
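The techniques named in parts a and b can be sketched as follows. This is a minimal sketch, not a solution: the table for this question is not reproduced here, so all values below are hypothetical. It also assumes the common 1.5 × IQR fence and the "inclusive" quartile convention of Python's `statistics.quantiles`; other quartile conventions can give slightly different fences.

```python
from statistics import mean, mode, quantiles

# Hypothetical values; None marks a missing entry.
cylinders = [4, 6, None, 4, 8, 4]
used_new = ["Used", None, "New", "Used", "Used", None]
horse_power = [110, 95, 120, 105, 400, 115, 100]


def fill_with(values, filler):
    """Replace every missing (None) entry with a fixed filler value."""
    return [filler if v is None else v for v in values]


def fill_mode(values):
    """Fill missing entries with the most frequent known value."""
    known = [v for v in values if v is not None]
    return fill_with(values, mode(known))


def fill_mean(values):
    """Fill missing entries with the mean of the known numeric values."""
    known = [v for v in values if v is not None]
    return fill_with(values, mean(known))


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(fill_with(used_new, "Unknown"))  # default-value fill
print(fill_mode(used_new))             # mode fill for a categorical field
print(fill_mean(cylinders))            # mean fill for a numeric field
print(iqr_outliers(horse_power))
```

The mode is used for the categorical field because a mean is not defined for it; for a numeric field such as the cylinder count, either the mean or the mode can be applied, and the two generally give different fills.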
For the data shown in Table 1, use the following normalization methods to normalize the “Average
Surface Temperature” and “Average Distance to the Sun” fields:
i. Min-Max, use [0, 1] as the normalization interval
ii. Z-Score
iii. Decimal Scaling
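The three normalization methods listed above can be sketched as below. This is an illustrative sketch under stated assumptions: Table 1 is not reproduced here, so the sample values are hypothetical, and the Z-score uses the population standard deviation (`pstdev`); a course that uses the sample standard deviation would substitute `stdev`.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max] (assumes max != min)."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer making max|v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Hypothetical temperatures, not taken from Table 1.
sample = [-100, 0, 100, 300]
print(min_max(sample))
print(z_score(sample))
print(decimal_scaling(sample))
```

For the sample above, min-max maps the smallest value to 0 and the largest to 1, the Z-scores have zero mean, and decimal scaling divides by 1000 because the largest absolute value is 300.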
Assignment: Due Next Week
1. “Validation and standardization are two strategies that are commonly used to smooth noisy
data.” Explain briefly, with the aid of examples, the meaning of each of these terms.
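To make the two terms concrete, the sketch below treats validation as checking a value against a rule and standardization as rewriting a value into one agreed format. It is only an illustrative sketch: the rule (1 to 16 cylinders), the synonym table, and the function names are all hypothetical assumptions, not definitions given in this sheet.

```python
# Hypothetical mapping of spelling variants onto one canonical form.
SYNONYMS = {"u": "used", "n": "new", "second hand": "used", "brand new": "new"}
ALLOWED_CONDITIONS = {"used", "new"}


def standardize_condition(raw):
    """Standardization: trim, lowercase, and map variants to one spelling."""
    key = raw.strip().lower()
    return SYNONYMS.get(key, key)


def is_valid_condition(value):
    """Validation: accept only values from the allowed set."""
    return value in ALLOWED_CONDITIONS


def is_valid_cylinders(value):
    """Validation: an assumed plausibility range for a cylinder count."""
    return isinstance(value, int) and 1 <= value <= 16


for raw in ["  NEW ", "U", "Brand new", "broken"]:
    cleaned = standardize_condition(raw)
    print(repr(raw), "->", cleaned, is_valid_condition(cleaned))
```

Standardizing before validating lets harmless spelling variants pass while values that match no known form, such as the last one in the loop, are still flagged as noisy.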
3. For the data shown in Table 2, use the following normalization methods to normalize
“Approximate Solar Day”, “Approximate Solar Year”, and “Approximate Diameter” fields:
a. Min-Max, use [0, 1] as the normalization interval
b. Z-Score
c. Decimal Scaling