
Problem 1

Statistical learning is a set of tools for understanding data. These tools can be
classified as supervised or unsupervised. Supervised statistical learning generally
involves building a statistical model for predicting, or estimating, an output based
on one or more inputs. Unsupervised statistical learning, by contrast, involves inputs but
no supervising output; nevertheless, we can learn relationships and structure from
such data. Statistical learning generally deals with variables, which are
attributes that can assume different values depending on the instance. These
variables can be qualitative, where the values are non-numerical, or quantitative,
where the values are numerical. Data generated from the variables are fitted to some
function that relates the dependent variable to the independent variables; this
function can then be used for prediction (estimating the dependent variable's
value for given inputs) or inference (understanding properties of the underlying
relationship). If the response is continuous, the task is called a regression problem; if it is
categorical, it is a classification problem. The observed set of data points we
start with to estimate the function is called the training data, and the
set of data that was not used in estimating the function but is applied to evaluate it is
the test data. The estimation of the unknown function can be parametric
(model-based), where the training data are fitted to an already established
functional form, or non-parametric, where no assumptions are made about the form of
the function and we instead seek an estimate that gets as close to the data points as
possible without being too rough.
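
As a concrete illustration, below is a minimal supervised-learning sketch in Python; the synthetic data, the linear form, and the 80/20 split are assumptions made for illustration, not part of the original assignment.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a quantitative response generated from one input plus noise.
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=100)

# Training data (used to estimate the function) and test data (held out).
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Parametric modelling: assume a linear form y = b0 + b1*x and estimate
# the coefficients from the training data by least squares.
b1, b0 = np.polyfit(x_train, y_train, deg=1)

# Prediction: estimate the response values for the unseen test inputs.
y_pred = b0 + b1 * x_test
print(f"fitted: y = {b0:.2f} + {b1:.2f}x, test MSE = {np.mean((y_test - y_pred) ** 2):.3f}")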
Problem 2
Show that

    E[Y] = \arg\min_c E[(Y - c)^2],

i.e. that the minimum of E[(Y - c)^2] is attained when c = E[Y].

From

    Y - c = (Y - E[Y]) + (E[Y] - c),

by linearity of expectation,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + E[(E[Y] - c)^2] + E[2(Y - E[Y])(E[Y] - c)].

But also

    E[2(Y - E[Y])(E[Y] - c)] = 2 E[Y - E[Y]] (E[Y] - c) = 2 (E[Y] - E[E[Y]]) (E[Y] - c) = 0,

since, letting E[Y] = y_0, we have E[E[Y]] = E[y_0] = y_0, so E[Y] - E[E[Y]] = y_0 - y_0 = 0.

Therefore, noting that E[(E[Y] - c)^2] = (E[Y] - c)^2 because E[Y] - c is a constant,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + (E[Y] - c)^2.

But (E[Y] - c)^2 \ge 0, with equality only when c = E[Y]. Therefore

    E[(Y - c)^2] \ge E[(Y - E[Y])^2],

and the minimum is attained at c = E[Y].
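
As a quick numerical check (a sketch, not part of the original solution; the sample and the grid of candidates are assumptions), the empirical MSE of a constant predictor c is minimized at the sample mean:

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=10_000)  # a sample from an arbitrary distribution

# Empirical MSE of the constant predictor c over a grid of candidates.
cs = np.linspace(0.0, 10.0, 1001)
mse = [np.mean((y - c) ** 2) for c in cs]

best_c = cs[int(np.argmin(mse))]
print(f"minimizing c = {best_c:.3f}, sample mean = {y.mean():.3f}")  # nearly equal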

Problem 3
The practical benefit of this result is that the mean is the summary statistic of
our data that minimizes the mean squared error (MSE); in other words,
E[(Y - E[Y])^2] is the lower bound for the MSE of any constant predictor.

Show that

    E[(y_0 - \hat{f}(x_0))^2] = E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2 + \mathrm{Var}(\varepsilon).

Given y_0 = f(x_0) + \varepsilon, write

    y_0 - \hat{f}(x_0) = (y_0 - f(x_0)) + (f(x_0) - \hat{f}(x_0)) = \varepsilon + (f(x_0) - \hat{f}(x_0)).

Then

    E[(y_0 - \hat{f}(x_0))^2] = E[\varepsilon^2] + E[(f(x_0) - \hat{f}(x_0))^2] + 2 E[\varepsilon (f(x_0) - \hat{f}(x_0))],

but E[\varepsilon] = 0, and \varepsilon (the noise in the new observation) is independent of \hat{f}(x_0), so the cross term vanishes:

    E[(y_0 - \hat{f}(x_0))^2] = E[\varepsilon^2] + E[(f(x_0) - \hat{f}(x_0))^2].

Also, from Problem 2,

    E[(Y - c)^2] = E[(Y - E[Y])^2] + (E[Y] - c)^2;

replacing Y with \hat{f}(x_0) and c with f(x_0) and applying the same principle,

    E[(f(x_0) - \hat{f}(x_0))^2] = E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2.

Since f(x_0) is deterministic (E[f(x_0)] = f(x_0)) and \mathrm{Var}(\varepsilon) = E[\varepsilon^2] (because E[\varepsilon] = 0), we obtain

    E[(y_0 - \hat{f}(x_0))^2] = \mathrm{Var}(\varepsilon) + E[(\hat{f}(x_0) - E[\hat{f}(x_0)])^2] + (E[\hat{f}(x_0)] - f(x_0))^2,

i.e. the expected test error decomposes into the irreducible error, the variance of \hat{f}(x_0), and the squared bias, as was to be proved.
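
The decomposition can also be verified by simulation; the sketch below uses an assumed true function, noise level, and estimator purely for illustration.

import numpy as np

rng = np.random.default_rng(2)
f = np.sin                      # the "true" regression function (assumed)
x0, sigma = 1.0, 0.5            # query point and noise standard deviation
n_train, n_reps = 50, 5000

# Refit the estimator on many independent training sets and predict at x0.
preds = np.empty(n_reps)
for r in range(n_reps):
    x = rng.uniform(0, 3, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    b1, b0 = np.polyfit(x, y, deg=1)   # a simple (biased) linear estimator
    preds[r] = b0 + b1 * x0

# Left-hand side: expected squared error on fresh test responses at x0.
y0 = f(x0) + rng.normal(0, sigma, n_reps)
lhs = np.mean((y0 - preds) ** 2)

# Right-hand side: variance + squared bias + irreducible error.
rhs = preds.var() + (preds.mean() - f(x0)) ** 2 + sigma ** 2
print(f"E[(y0 - fhat(x0))^2] = {lhs:.4f} vs Var + Bias^2 + Var(eps) = {rhs:.4f}")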

Problem 4
(b) The training set contains 80 observations and the test set 31; in total we have
111 observations.

(c) There are four variables, with values as shown in the table below:

Variable      Range (max - min)   Mean    Standard deviation
Ozone         167                  42.1   33.274
Radiation     327                 184.8   91.152
Temperature    40                  77.8    9.530
Wind          18.4                  9.9    3.559

(d) The plot below shows scatterplots for all pairs of variables.

Pearson correlation coefficients:

                 ozone    radiation  temperature        wind
ozone        1.0000000    0.3483417    0.6985414  -0.6129508
radiation    0.3483417    1.0000000    0.2940876  -0.1273656
temperature  0.6985414    0.2940876    1.0000000  -0.4971459
wind        -0.6129508   -0.1273656   -0.4971459   1.0000000

The range of the Pearson correlation coefficients observed here is

    1.00000 - (-0.61295) = 1.61295.
A correlation of zero implies that there is no linear relationship between the
variables involved (it does not by itself imply that they are independent).
The correlation between wind and each of the other variables is negative; hence, as
wind increases, the other variables tend to decrease. All other pairs of variables
have positive correlations, so they tend to increase and decrease together.
This can also be seen visually in the scatterplots.
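
For reference, these summaries could be computed in Python as follows; the file name and column names are assumptions, since the original tooling is not shown.

import pandas as pd

# Hypothetical layout: one row per observation, one column per variable.
env = pd.read_csv("environmental.csv")  # columns: ozone, radiation, temperature, wind

print(env.max() - env.min())  # range (max - min) of each variable
print(env.mean())             # means
print(env.std())              # standard deviations
print(env.corr())             # Pearson correlation matrix (the default method)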
(f) Below is a scatter plot of the true responses against the predicted responses.

The residual sum of squares,

    RSS = \sum_i (y_i - \hat{y}_i)^2,

is 8208.509.
And the correlation of the true responses with the predicted values is:

              actual value  predicted
actual value     1.0000000  0.8268958
predicted        0.8268958  1.0000000
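
For completeness, a sketch of how the RSS and the actual-vs-predicted correlation could be computed; the vectors below are illustrative stand-ins, not the real test data.

import numpy as np

# Stand-ins for the test-set responses and the model's predictions.
y_actual = np.array([41.0, 36.0, 12.0, 18.0, 23.0])
y_pred = np.array([38.5, 30.2, 20.1, 16.7, 28.4])

rss = np.sum((y_actual - y_pred) ** 2)       # RSS = sum_i (y_i - yhat_i)^2
corr = np.corrcoef(y_actual, y_pred)[0, 1]   # Pearson correlation
print(f"RSS = {rss:.3f}, correlation = {corr:.4f}")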

(g)
Below is the plot of RSS against k for the test set.

The most suitable value of k is 5, as it has the lowest RSS value.
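
A sketch of selecting k by test-set RSS is shown below, using scikit-learn's KNN regressor; the synthetic data and the 80/31 split are assumptions, and the original solution may have used a different KNN implementation (it appears to treat the response as categorical).

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)

# Synthetic stand-in for the data: predict a response from three inputs.
X = rng.uniform(0, 1, size=(111, 3))
y = 40 * X[:, 0] - 25 * X[:, 1] + rng.normal(0, 5, size=111)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

# Fit KNN for a range of k and record the test-set RSS for each.
rss_by_k = {}
for k in range(1, 21):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    rss_by_k[k] = np.sum((y_test - knn.predict(X_test)) ** 2)

best_k = min(rss_by_k, key=rss_by_k.get)
print(f"best k = {best_k}, test RSS = {rss_by_k[best_k]:.1f}")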


Note that KNN classification assumes the response is categorical.
(h) I would select the linear model, which has the better RSS and correlation
values. This is to be expected, since the response here takes far too many distinct
values to be treated sensibly as categories by KNN.
