1) In real-world data, tuples with missing values for some attributes are a common
occurrence. Describe various methods for handling this problem.
2) Given the following data for the attribute age: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22,
25, 25, 25, 25, 25, 31, 34, 34, 35, 35, 35, 36, 39, 45, 46, 52, 70.
a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
b) Determine the outliers in the data.
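As a starting point for question 2, the procedure can be sketched in Python as follows. This is a minimal sketch, not a model answer: it assumes equal-depth (equal-frequency) partitioning of the sorted values for part (a) and, for part (b), the common 1.5 x IQR rule, which is only one of several reasonable outlier criteria. The function names are illustrative and not part of the assignment.

```python
from statistics import mean, quantiles

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22,
        25, 25, 25, 25, 25, 31, 34, 34, 35, 35,
        35, 36, 39, 45, 46, 52, 70]


def smooth_by_bin_means(values, depth):
    """Sort the values, split them into consecutive bins of `depth`
    values each, and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3.
    (Assumption: the 1.5 * IQR rule is an acceptable outlier criterion.)"""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


smoothed = smooth_by_bin_means(ages, depth=3)
outliers = iqr_outliers(ages)
print(smoothed)
print(outliers)
```

With 27 values and a depth of 3 this produces 9 bins; comparing `smoothed` with `ages` shows how much local variation the technique removes.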
3) Using the data for age given in Q2, answer the following:
a) Use min-max normalization to transform the value 35 for age onto the range
[0.0, 1.0].
b) Use z-score normalization to transform the value 35 for age.
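The two transformations in question 3 can be sketched as below. This is a hedged illustration: it assumes the population standard deviation (`pstdev`) for the z-score, so using the sample standard deviation instead would give a slightly different result. The helper names `min_max` and `z_score` are illustrative.

```python
from statistics import mean, pstdev

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22,
        25, 25, 25, 25, 25, 31, 34, 34, 35, 35,
        35, 36, 39, 45, 46, 52, 70]


def min_max(v, values, new_min=0.0, new_max=1.0):
    """Map v linearly from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


def z_score(v, values):
    """Express v as a number of standard deviations from the mean.
    (Assumption: population standard deviation.)"""
    return (v - mean(values)) / pstdev(values)


age_min_max = min_max(35, ages)
age_z = z_score(35, ages)
print(round(age_min_max, 4), round(age_z, 4))
```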
4)
§ If your student number ends with 0, 1, 2, 3 or 4, analyze the 32nd, 33rd and 34th
numerical attributes and the class variable of the dataset.
§ If your student number ends with 5, 6, 7, 8 or 9, analyze the 7th, 32nd and 33rd
numerical attributes and the class variable of the dataset.
Apply the following exploratory data analysis techniques to your dataset:
a. Compute the covariance matrix for the three numerical attributes you are analyzing;
also compute the correlation for each of the three pairs of attributes. Interpret the
statistical findings!
b. Create a scatter plot for the last two numerical attributes of your dataset. Interpret the
scatter plot!
c. Create histograms for each of the 3 attributes. Interpret the histograms!
d. Analyze the spread and distribution of the 33rd numerical attribute of the original
dataset.
e. Use one more display of your own choosing to visualize the 33rd numerical attribute;
compare it with the histogram you created in part c.
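For part a, the computation can be sketched in plain Python as follows. The three columns below are hypothetical toy values, not taken from the assignment dataset, and the sketch assumes the sample covariance (division by n - 1); substitute your own three attributes.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance of two equally long columns (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def covariance_matrix(columns):
    """Symmetric matrix whose (i, j) entry is cov(columns[i], columns[j])."""
    return [[covariance(a, b) for b in columns] for a in columns]


# Hypothetical toy columns standing in for the three numerical attributes.
attr_a = [1.0, 2.0, 3.0, 4.0, 5.0]
attr_b = [2.0, 4.1, 5.9, 8.2, 9.8]
attr_c = [5.0, 3.0, 4.0, 1.0, 2.0]

cov = covariance_matrix([attr_a, attr_b, attr_c])
corr_ab = correlation(attr_a, attr_b)
corr_ac = correlation(attr_a, attr_c)
corr_bc = correlation(attr_b, attr_c)
print(cov)
print(corr_ab, corr_ac, corr_bc)
```

The diagonal of the covariance matrix holds the variances; the sign of each off-diagonal entry matches the sign of the corresponding correlation, while the correlation additionally removes the effect of scale.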
5)
6)
Use a suitable dataset of your own choice, then apply a data reduction algorithm (PCA, for
example) to reduce the data dimensionality to a suitable size.
Explain your work in detail.
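If PCA is chosen, one possible outline is sketched below. It assumes NumPy is available and uses synthetic data generated on the spot (four observed columns driven by two hidden factors), purely for illustration; the function `pca` is a hand-written sketch via eigendecomposition of the covariance matrix, not a library routine.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components.
    Returns the reduced data and the explained-variance ratio per component."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # PCA requires mean-centered data
    cov = np.cov(centered, rowvar=False)     # columns are the attributes
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return centered @ components, explained


# Synthetic example data (an assumption, not a real dataset):
# 200 rows, 4 columns generated from 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0, 0.5],
                   [0.0, 1.0, 1.0, -0.5]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 4))

reduced, explained = pca(X, n_components=2)
print(reduced.shape)
print(explained)
```

A "suitable size" can be argued from the explained-variance ratios: keep the smallest number of components whose ratios sum to a threshold you justify, such as 90% or 95%.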
That’s all
Best wishes