
Problem 1

Statistical learning is a set of tools for understanding data. These tools can be
classified as supervised or unsupervised. Supervised statistical learning generally
involves building a statistical model for predicting, or estimating, an output based
on one or more inputs. Unsupervised statistical learning, by contrast, involves inputs but
no supervising output; nevertheless, we can learn relationships and structure from
such data. Statistical learning generally deals with variables, which are
attributes that can assume different values depending on the instance. These
variables can be qualitative, where the values are non-numerical, or quantitative,
where the values are numerical. Data generated from the variables are fitted to some
function that relates the dependent variable to the independent variables; this
function can then be used for prediction (estimating the dependent variable's
value for given inputs) or inference (understanding properties of the underlying
relationship). If the response is continuous, the task is called a regression problem; if it is
categorical, it is a classification problem. The observed set of data points we
start with to estimate the function is called the training data, and the
set of data that was not used in estimating the function but is applied to evaluate it is
the test data. The estimation of the unknown function can be parametric
(model-based), where the training data are fitted to an already established
functional form, or non-parametric, where no assumptions are made about the form of
the function and we instead seek an estimate that gets as close to the data points as
possible without being too rough.
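
As a concrete illustration, below is a minimal supervised-learning sketch in Python; the synthetic data, the linear form, and the 80/20 split are assumptions made for illustration, not part of the original assignment.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a quantitative response generated from one input plus noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=100)

# Training data (used to estimate the function) and test data (held out).
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Parametric modelling: assume a linear form y = b0 + b1*x and estimate
# the coefficients from the training data by least squares.
b1, b0 = np.polyfit(x_train, y_train, deg=1)

# Prediction: estimate the response values for the unseen test inputs.
y_pred = b0 + b1 * x_test
print(f"fitted: y = {b0:.2f} + {b1:.2f}x, test MSE = {np.mean((y_test - y_pred) ** 2):.3f}")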
Problem 2
Show that

    E[Y] = \arg\min_c E[(Y - c)^2],

i.e. that the minimum of E[(Y - c)^2] is attained when c = E[Y].

From

    Y - c = (Y - E[Y]) + (E[Y] - c),

by linearity of expectation,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + E[(E[Y] - c)^2] + E[2(Y - E[Y])(E[Y] - c)].

But also

    E[2(Y - E[Y])(E[Y] - c)] = 2 E[Y - E[Y]] (E[Y] - c) = 2 (E[Y] - E[E[Y]]) (E[Y] - c) = 0,

since, letting E[Y] = y_0, we have E[E[Y]] = E[y_0] = y_0, so E[Y] - E[E[Y]] = y_0 - y_0 = 0.

Therefore, noting that E[(E[Y] - c)^2] = (E[Y] - c)^2 because E[Y] - c is a constant,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + (E[Y] - c)^2.

But (E[Y] - c)^2 \ge 0, with equality only when c = E[Y]. Therefore

    E[(Y - c)^2] \ge E[(Y - E[Y])^2],

and the minimum is attained at c = E[Y].
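
As a quick numerical check (a sketch, not part of the original solution; the sample and the grid of candidates are assumptions), the empirical MSE of a constant predictor c is minimized at the sample mean:

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=10_000)  # a sample from an arbitrary distribution

# Empirical MSE of the constant predictor c over a grid of candidates.
cs = np.linspace(0.0, 10.0, 1001)
mse = [np.mean((y - c) ** 2) for c in cs]

best_c = cs[int(np.argmin(mse))]
print(f"minimizing c = {best_c:.3f}, sample mean = {y.mean():.3f}")  # nearly equal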

Problem 3
The practical benefit of this result is that the mean is the summary statistic of
our data that minimizes the mean squared error (MSE); in other words,
E[(Y - E[Y])^2] is the lower bound for the MSE of any constant predictor.

Show that

    E[(y_0 - \hat{f}(x_0))^2] = E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2 + \mathrm{Var}(\varepsilon).

Given y_0 = f(x_0) + \varepsilon, write

    y_0 - \hat{f}(x_0) = (y_0 - f(x_0)) + (f(x_0) - \hat{f}(x_0)) = \varepsilon + (f(x_0) - \hat{f}(x_0)).

Then

    E[(y_0 - \hat{f}(x_0))^2] = E[\varepsilon^2] + E[(f(x_0) - \hat{f}(x_0))^2] + 2 E[\varepsilon (f(x_0) - \hat{f}(x_0))],

but E[\varepsilon] = 0, and \varepsilon (the noise in the new observation) is independent of \hat{f}(x_0), so the cross term vanishes:

    E[(y_0 - \hat{f}(x_0))^2] = E[\varepsilon^2] + E[(f(x_0) - \hat{f}(x_0))^2].

Also, from Problem 2,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + (E[Y] - c)^2;

replacing Y with \hat{f}(x_0) and c with f(x_0) and applying the same principle,

    E[(f(x_0) - \hat{f}(x_0))^2] = E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2.

Since f(x_0) is deterministic (E[f(x_0)] = f(x_0)) and \mathrm{Var}(\varepsilon) = E[\varepsilon^2] (because E[\varepsilon] = 0), we obtain

    E[(y_0 - \hat{f}(x_0))^2] = \mathrm{Var}(\varepsilon) + E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2,

i.e. the expected test error decomposes into the irreducible error, the variance of \hat{f}(x_0), and the squared bias, as was to be proved.
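
The decomposition can also be verified by simulation; the sketch below uses an assumed true function, noise level, and estimator purely for illustration.

import numpy as np

rng = np.random.default_rng(2)
f = np.sin                      # the "true" regression function (assumed)
x0, sigma = 1.0, 0.5            # query point and noise standard deviation
n_train, n_reps = 50, 5000

# Refit the estimator on many independent training sets and predict at x0.
preds = np.empty(n_reps)
for r in range(n_reps):
    x = rng.uniform(0, 3, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    b1, b0 = np.polyfit(x, y, deg=1)   # a simple (biased) linear estimator
    preds[r] = b0 + b1 * x0

# Left-hand side: expected squared error on fresh test responses at x0.
y0 = f(x0) + rng.normal(0, sigma, n_reps)
lhs = np.mean((y0 - preds) ** 2)

# Right-hand side: variance + squared bias + irreducible error.
rhs = preds.var() + (preds.mean() - f(x0)) ** 2 + sigma ** 2
print(f"E[(y0 - fhat(x0))^2] = {lhs:.4f} vs Var + Bias^2 + Var(eps) = {rhs:.4f}")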

Problem 4
(b) The training set contains 80 observations and the test set 31; in total we have
111 observations.

(c) There are four variables, with values as shown in the table below:

Variable      Range (max - min)   Mean    Standard deviation
Ozone         167                  42.1   33.274
Radiation     327                 184.8   91.152
Temperature    40                  77.8    9.530
Wind          18.4                  9.9    3.559

(d) The plot below shows scatterplots for all pairs of variables.

Pearson correlation coefficients:

                 ozone    radiation  temperature        wind
ozone        1.0000000    0.3483417    0.6985414  -0.6129508
radiation    0.3483417    1.0000000    0.2940876  -0.1273656
temperature  0.6985414    0.2940876    1.0000000  -0.4971459
wind        -0.6129508   -0.1273656   -0.4971459   1.0000000

The range of the Pearson correlation coefficients observed here is

    1.00000 - (-0.61295) = 1.61295.
A correlation of zero implies that there is no linear relationship between the
variables involved (it does not by itself imply that they are independent).
The correlation between wind and each of the other variables is negative; hence, as
wind increases, the other variables tend to decrease. All other pairs of variables
have positive correlations, so they tend to increase and decrease together.
This can also be seen visually in the scatterplots.
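
For reference, these summaries could be computed in Python as follows; the file name and column names are assumptions, since the original tooling is not shown.

import pandas as pd

# Hypothetical layout: one row per observation, one column per variable.
env = pd.read_csv("environmental.csv")  # columns: ozone, radiation, temperature, wind

print(env.max() - env.min())  # range (max - min) of each variable
print(env.mean())             # means
print(env.std())              # standard deviations
print(env.corr())             # Pearson correlation matrix (the default method)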
(f) Below is a scatter plot of the true responses against the predicted responses.

The residual sum of squares,

    RSS = \sum_i (y_i - \hat{y}_i)^2,

is 8208.509.
And the correlation of the true responses with the predicted values is:

              actual value  predicted
actual value     1.0000000  0.8268958
predicted        0.8268958  1.0000000
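
For completeness, a sketch of how the RSS and the actual-vs-predicted correlation could be computed; the vectors below are illustrative stand-ins, not the real test data.

import numpy as np

# Stand-ins for the test-set responses and the model's predictions.
y_actual = np.array([41.0, 36.0, 12.0, 18.0, 23.0])
y_pred = np.array([38.5, 30.2, 20.1, 16.7, 28.4])

rss = np.sum((y_actual - y_pred) ** 2)       # RSS = sum_i (y_i - yhat_i)^2
corr = np.corrcoef(y_actual, y_pred)[0, 1]   # Pearson correlation
print(f"RSS = {rss:.3f}, correlation = {corr:.4f}")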

(g)
Below is the plot of RSS against k for the test set.

The most suitable value of k is 5, as it has the lowest RSS value.
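
A sketch of selecting k by test-set RSS is shown below, using scikit-learn's KNN regressor; the synthetic data and the 80/31 split are assumptions, and the original solution may have used a different KNN implementation (it appears to treat the response as categorical).

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in for the data: predict a response from three inputs.
X = rng.uniform(0, 1, size=(111, 3))
y = 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(0, 5, size=111)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

# Fit KNN for a range of k and record the test-set RSS for each.
rss_by_k = {}
for k in range(1, 21):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    rss_by_k[k] = np.sum((y_test - knn.predict(X_test)) ** 2)

best_k = min(rss_by_k, key=rss_by_k.get)
print(f"best k = {best_k}, test RSS = {rss_by_k[best_k]:.1f}")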


Note that KNN classification assumes the response is categorical.
(h) I would select the linear model, which has the better RSS and correlation
values. This is to be expected, since the response here takes far too many distinct
values to be treated sensibly as categories by KNN.
