
2014 Fall, INE5008 Data Mining, by Kichun Lee

HW 2
Due 5 pm, Oct. 3, 2014

PART #1
Refer to the attached HWtrain.data.txt and HWtest.data.txt, both in comma-separated form. The file
HWtrain.data.txt contains 173 observations (rows) of wine quality. The first value in each row
is the wine quality of that row, coded as 1, 2, or 3. The values in the 2nd through 14th
columns are descriptive values for the wine, as discussed in class. The file
HWtest.data.txt contains five future observations whose wine qualities (marked by X) we would
like to estimate.
We would like to apply principal component analysis after normalizing the descriptive values in
HWtrain.data.txt.
(a) When we use the first two principal components, how much of the total information do
we use?
PC1 and PC2 together explain 55% of the total information. However, PCA usually calls for at
least 60%, or at least 80%, so it is reasonable to keep components up to PC5.
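
The share of information quoted above is the cumulative proportion of variance explained. The following is a minimal sketch of how it could be computed, assuming NumPy and a synthetic stand-in matrix `X_demo` in place of the 13 descriptive columns of HWtrain.data.txt (the real data is not reproduced here, so the printed numbers will not match the 55% figure). The function name `explained_variance_ratio` is illustrative, not part of the assignment.

```python
import numpy as np

def explained_variance_ratio(X):
    """Return the fraction of total variance carried by each principal component."""
    # Normalize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # The eigenvalues of the covariance of Z (the correlation matrix of X)
    # are the variances along the principal components.
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))
    eigvals = eigvals[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()

# Synthetic stand-in data: 173 rows, 13 columns (NOT the real wine data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(173, 13)) @ rng.normal(size=(13, 13))

ratio = explained_variance_ratio(X_demo)
cumulative = np.cumsum(ratio)
print("PC1 + PC2 share:", cumulative[1])
# Smallest number of components whose cumulative share reaches 80%.
print("components for 80%:", int(np.searchsorted(cumulative, 0.80) + 1))
```

With the real training data, `cumulative[1]` would be the quantity asked for in (a), and the last line gives the component count needed to reach a chosen threshold.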

(b) Find the top-three original variables among the 13 that contributed to the construction
of the first principal component.
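To pick the three most influential variables in PC1, look for the coefficients with the largest
absolute values. Accordingly, v2, v12, and v7 have the greatest influence, in that order.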
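
The ranking by absolute coefficient can be sketched as follows. This assumes NumPy and synthetic data in which three columns are deliberately driven by one shared factor, so that the result is easy to check. The column names `v1` to `v13` and the helper `top_loadings` are hypothetical and may not match the naming used in the answer above.

```python
import numpy as np

def top_loadings(X, names, component=0, k=3):
    """Return the k variables with the largest absolute coefficient on one PC."""
    # Normalize each column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]        # components by descending variance
    coeffs = eigvecs[:, order[component]]    # sign of an eigenvector is arbitrary
    top = np.argsort(np.abs(coeffs))[::-1][:k]
    return [(names[i], float(coeffs[i])) for i in top]

# Synthetic stand-in data (NOT the real wine data): the first three columns
# follow one shared latent factor, the other ten are independent noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(173, 1))
X_demo = rng.normal(size=(173, 13))
X_demo[:, :3] = latent + 0.1 * rng.normal(size=(173, 3))

names = [f"v{i}" for i in range(1, 14)]  # hypothetical column names
top3 = top_loadings(X_demo, names)
for name, coeff in top3:
    print(name, round(coeff, 3))
```

Only the absolute value matters for the ranking, because an eigenvector and its negation describe the same component.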
(c) Draw a plot of the 1st scores (on the X axis) and the 2nd scores (on the Y axis) of the
descriptive values in HWtrain.data.txt together with the wine qualities and the similarly
transformed score values of the five future observations in HWtest.data.txt. Then try to
predict the wine quality of each of the five future observations.

Of the five observations, two fall in region A and three fall in region B.
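
The answer above reads the regions off the score plot by eye. As a rough numerical counterpart, the sketch below projects new observations with the mean, standard deviation, and components fitted on the training data, then assigns each one to the nearest class centroid in the PC1-PC2 plane. Nearest-centroid assignment is a substitute for visual inspection, not the method used in the answer. All data here is synthetic, and the helper names `fit_pca2`, `scores`, and `nearest_centroid` are hypothetical.

```python
import numpy as np

def fit_pca2(X_train):
    """Fit normalization statistics and the first two principal directions."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # columns: PC1, PC2
    return mean, std, W

def scores(X, mean, std, W):
    """Transform rows of X with the TRAINING mean/std, then project onto W."""
    return ((X - mean) / std) @ W

def nearest_centroid(train_scores, labels, test_scores):
    """Assign each test score to the class with the closest mean score."""
    classes = np.unique(labels)
    centroids = np.array([train_scores[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_scores[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic stand-in data (NOT the real wine data): three well-separated classes.
rng = np.random.default_rng(2)
offsets = np.zeros((3, 13))
offsets[0, :6] = 3.0
offsets[1, :6] = -3.0
offsets[2, 6:] = 3.0
labels = rng.integers(1, 4, size=173)            # qualities 1, 2, 3
X_train = offsets[labels - 1] + rng.normal(size=(173, 13))

true_quality = np.array([1, 2, 3, 1, 3])         # five made-up future observations
X_test = offsets[true_quality - 1] + 0.5 * rng.normal(size=(5, 13))

mean, std, W = fit_pca2(X_train)
train_scores = scores(X_train, mean, std, W)
test_scores = scores(X_test, mean, std, W)
pred = nearest_centroid(train_scores, labels, test_scores)
print(pred)
```

The columns of `train_scores` and `test_scores` are the X and Y coordinates for the plot requested in (c). The test rows must reuse the training statistics so that both sets live in the same coordinate system.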
