ML Solutions

1. For a given loss function L, the risk R of a prediction function f is given by the expected loss, R(f) = E[L(Y, f(X))].

(a) Consider a regression problem and the squared error loss L(y, f(x)) = (y - f(x))^2. Which f minimizes the risk?

Answer: We have

    R(f) = E[(Y - f(X))^2] = ∫ E[(Y - f(X))^2 | X = x] g_X(x) dx,

where g_X is the marginal density of X, so it suffices to minimize the conditional risk pointwise in x. Expanding,

    E[(Y - f(X))^2 | X = x] = E[Y^2 | X = x] - 2 f(x) E[Y | X = x] + f(x)^2
                            = Var[Y | X = x] + (E[Y | X = x] - f(x))^2.

Only the second term depends on f, and it is minimized by

    f(x) = E[Y | X = x].

(b) For the absolute error loss L(y, f(x)) = |y - f(x)|, the conditional risk is E[|Y - f(x)| | X = x]. Setting its derivative with respect to f(x) to zero requires

    E[sign(Y - f(x)) | X = x] = -P(Y < f(x) | X = x) + P(Y > f(x) | X = x) = 0,

which occurs when P(Y < f(x) | X = x) = P(Y > f(x) | X = x) = 1/2, i.e. at the conditional median f(x) = med[Y | X = x].
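As a quick numerical illustration of both results (a sketch added here, not part of the original solution; the exponential distribution below is an arbitrary stand-in for the conditional distribution of Y given X = x), the following R code approximates the two risks by Monte Carlo:

    # Monte Carlo check: the mean minimizes expected squared error,
    # the median minimizes expected absolute error.
    set.seed(1)
    y <- rexp(1e5, rate = 1)                 # stand-in for Y | X = x (mean 1, median log 2)
    f <- seq(0, 3, by = 0.01)                # candidate values of f(x)
    risk_sq  <- sapply(f, function(a) mean((y - a)^2))
    risk_abs <- sapply(f, function(a) mean(abs(y - a)))
    c(minimizer_sq  = f[which.min(risk_sq)],  mean_y   = mean(y))    # both close to 1
    c(minimizer_abs = f[which.min(risk_abs)], median_y = median(y))  # both close to 0.69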

2. Let (x_i, y_i)_{i=1}^n be our dataset, with x_i ∈ R^p and y_i ∈ R. Linear regression can be formulated as empirical risk minimization, where the model is to predict y as x^T β, and we use the squared loss:

    R_emp(β) = Σ_{i=1}^n (1/2) (y_i - x_i^T β)^2.

(a) Show that the optimal parameter is β̂ = (X^T X)^{-1} X^T Y, where X is an n × p matrix with ith row x_i^T, and Y is an n × 1 matrix with ith entry y_i.

Answer: In matrix form the empirical risk is

    R_emp(β) = (1/2) ||Y - Xβ||_2^2.

Differentiating with respect to β and setting to 0,

    (Xβ - Y)^T X = 0
    β^T (X^T X) - Y^T X = 0
    β̂ = (X^T X)^{-1} X^T Y.
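As a check of this closed form (a sketch added here, not part of the original sheet; the simulated dataset is an arbitrary assumption), β̂ can be computed directly and compared with R's lm():

    # Closed-form OLS estimate beta_hat = (X^T X)^{-1} X^T Y, checked against lm().
    set.seed(2)
    n <- 50; p <- 3
    X <- matrix(rnorm(n * p), n, p)                 # arbitrary simulated design matrix
    Y <- X %*% c(1, -2, 0.5) + rnorm(n, sd = 0.3)
    beta_hat <- solve(t(X) %*% X, t(X) %*% Y)       # solve() instead of forming the inverse
    cbind(closed_form = drop(beta_hat),
          lm_fit      = coef(lm(Y ~ X - 1)))        # "- 1": no intercept, as in the derivation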

(b) Consider regularizing our empirical risk by incorporating an L2 regularizer. That is, find the β minimizing

    (λ/2) ||β||_2^2 + Σ_{i=1}^n (1/2) (y_i - x_i^T β)^2.

Show that the optimal parameter is given by the ridge regression estimator β̂ = (λI + X^T X)^{-1} X^T Y.

Answer: In matrix form the objective is

    (1/2) ||Y - Xβ||_2^2 + (λ/2) ||β||_2^2.

Again differentiating and setting the derivative to 0,

    (Xβ - Y)^T X + λ β^T = 0
    β^T (λI + X^T X) - Y^T X = 0
    β̂ = (λI + X^T X)^{-1} X^T Y.
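The ridge estimator has the same structure in code; a minimal sketch (reusing the simulated X and Y from the previous code block, with an arbitrary grid of λ values):

    # Ridge estimate beta_hat = (lambda I + X^T X)^{-1} X^T Y; lambda = 0 recovers OLS.
    ridge_estimator <- function(X, Y, lambda) {
      solve(lambda * diag(ncol(X)) + t(X) %*% X, t(X) %*% Y)
    }
    # The L2 norm of the estimate shrinks as lambda grows:
    sapply(c(0, 0.1, 1, 10), function(l) sqrt(sum(ridge_estimator(X, Y, l)^2)))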

(c) Compare the regularized and unregularized estimators of β for n = 10 samples generated from the following model:

    y ~ N(Xβ, σ^2)    (1)

    beta  = c(-.1, .3, -.5, .2, -.5, .1, .3, 7)
    sigma = 0.5
    p     = 8
    n     = 10
    # the design matrix is not specified in the original snippet;
    # standard-normal entries are assumed here so the code runs
    X     = matrix(rnorm(n * p), n, p)
    y     = rnorm(n, mean = X %*% beta, sd = sigma)

For the regularized estimator use λ = 0.1. Which estimator has smaller L2 norm? Investigate the bias / variance tradeoff across values of λ ∈ [0, 10] as follows. Generate new datasets for training and testing and make a learning curve plot comparing training and testing error like the one shown in lecture on slide 13 of Lecture 2. Put λ on the x-axis and mean square error (prediction error) on the y-axis. Label the points on the plot corresponding to unregularized linear regression.

Answer: See the R markdown document: http://www.stats.ox.ac.uk/flaxman/sml17/problem_2c_on_regularization.html
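The linked R markdown is the reference solution; the following is only a minimal sketch of the λ sweep under the same model, with an assumed standard-normal design matrix and an assumed grid of λ values:

    # Sketch of the lambda sweep: train/test MSE as a function of lambda.
    set.seed(3)
    beta  <- c(-.1, .3, -.5, .2, -.5, .1, .3, 7)
    sigma <- 0.5; p <- 8; n <- 10
    make_data <- function() {
      X <- matrix(rnorm(n * p), n, p)                # assumed design matrix
      list(X = X, y = rnorm(n, mean = X %*% beta, sd = sigma))
    }
    train <- make_data(); test <- make_data()
    mse <- function(d, b) mean((d$y - d$X %*% b)^2)
    lambdas <- seq(0, 10, by = 0.25)
    err <- t(sapply(lambdas, function(l) {
      b <- solve(l * diag(p) + t(train$X) %*% train$X, t(train$X) %*% train$y)
      c(train = mse(train, b), test = mse(test, b))
    }))
    matplot(lambdas, err, type = "l", lty = 1, col = 1:2,
            xlab = "lambda", ylab = "mean square error")
    legend("topright", c("training error", "test error"), col = 1:2, lty = 1)
    points(0, err[1, "test"], pch = 19)              # unregularized linear regression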

3. Consider two univariate normal distributions N(μ, σ^2) with known parameters μ_A = 10 and σ_A = 5 for class A and μ_B = 20 and σ_B = 5 for class B. Suppose class A represents the random score X of a medical test of normal patients and class B represents the score of patients with a certain disease. A priori there are 100 times more healthy patients than patients carrying the disease.

(a) Find the optimal decision rule in terms of misclassification error (0-1 loss) for allocating a new observation x to either class A or B.

Answer: The optimal decision for X = x is to allocate to class

    argmax_{k ∈ {A,B}} π_k g_k(x),

where π_k is the prior probability of class k and g_k its density. We hence allocate x to class B iff

    π_A (2π σ_A^2)^{-1/2} exp(-(x - μ_A)^2 / (2σ_A^2)) ≤ π_B (2π σ_B^2)^{-1/2} exp(-(x - μ_B)^2 / (2σ_B^2)),

that is, using σ_A = σ_B, iff

    2σ_A^2 log(π_A/π_B) ≤ (x - μ_A)^2 - (x - μ_B)^2.

The decision boundary is therefore at the x fulfilling

    2x(μ_B - μ_A) + μ_A^2 - μ_B^2 - 2σ_A^2 log(π_A/π_B) = 0.

For the given values (π_A/π_B = 100), this implies that the decision boundary is at

    x = (μ_B^2 - μ_A^2 + 2σ_A^2 log(π_A/π_B)) / (2(μ_B - μ_A)) = (400 - 100 + 50 log 100) / 20 ≈ 26.51,

that is, all patients with a test score above 26.51 are classified as having the disease.
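A short numeric check of this boundary (a sketch; the priors are normalized so that π_A/π_B = 100, as stated in the problem):

    # Numeric check of the 0-1-loss decision boundary from part (a).
    mu_A <- 10; mu_B <- 20; sigma <- 5
    pi_A <- 100 / 101; pi_B <- 1 / 101              # 100 times more healthy patients
    boundary <- (mu_B^2 - mu_A^2 + 2 * sigma^2 * log(pi_A / pi_B)) / (2 * (mu_B - mu_A))
    boundary                                         # approximately 26.51
    pi_A * dnorm(boundary, mu_A, sigma) -            # the two weighted densities agree
      pi_B * dnorm(boundary, mu_B, sigma)            # approximately 0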

(b) Repeat (a) if the cost of a false negative (allocating a sick patient to group A) is c > 1 times that of a false positive (allocating a healthy person to group B). Describe how the rule changes as c increases. For which value of c are 84.1% of all patients with disease correctly classified?

Answer: The optimal decision minimizes the conditional expected loss E[L(Y, f(x)) | X = x]. It is hence optimal to choose class A (healthy) over class B if and only if c P(B | X = x) ≤ P(A | X = x); that is, the patient should be classified as healthy now iff (ignoring again the common denominator Σ_{k ∈ {A,B}} π_k g_k(x))

    c π_B (2π σ_B^2)^{-1/2} exp(-(x - μ_B)^2 / (2σ_B^2)) ≤ π_A (2π σ_A^2)^{-1/2} exp(-(x - μ_A)^2 / (2σ_A^2)).

The decision boundary is now attained if x fulfills

    2x(μ_B - μ_A) + μ_A^2 - μ_B^2 - 2σ_A^2 log(π_A/(c π_B)) = 0.

For increasing values of c the boundary moves to the left, so patients with ever smaller test scores are classified as having the disease.

84.1% of all patients carrying the disease are correctly classified if the decision boundary is at the 15.9%-quantile of the N(μ_B, σ_B^2) distribution, which is at q = 20 + 5 Φ^{-1}(0.159) ≈ 15. Solving the boundary equation for c at q = 15 gives

    c = (π_A/π_B) exp(-(2q(μ_B - μ_A) + μ_A^2 - μ_B^2) / (2σ_A^2)) = 100 exp(-(20q - 300)/50) = 100,

so for c = 100 approximately 84.1% of all patients are correctly classified as carrying the disease.
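A corresponding sketch for the cost-weighted rule (reusing μ_A, μ_B, σ and the priors from the previous code block):

    # Boundary as a function of the false-negative cost c, and the c giving 84.1% sensitivity.
    boundary_c <- function(cost) {
      (mu_B^2 - mu_A^2 + 2 * sigma^2 * log(pi_A / (cost * pi_B))) / (2 * (mu_B - mu_A))
    }
    boundary_c(c(1, 10, 100))                        # boundary decreases as c increases
    q <- qnorm(0.159, mu_B, sigma)                   # ~15, the 15.9% quantile of N(mu_B, sigma^2)
    100 * exp(-(20 * q - 300) / 50)                  # implied cost c, approximately 100
    1 - pnorm(boundary_c(100), mu_B, sigma)          # sensitivity at c = 100, approximately 0.841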

4. (Exercise 2.8 in Elements of Statistical Learning) Compare the classification performance of logistic regression, regularized logistic regression (and optionally k-nearest neighbor classification) on the ZIP code digit image dataset, restricting to only the 2s and 3s. Investigate L1 and L2 regularization alone and in combination (the elastic net). Show both training and testing error for each choice. The ZIP code data are available from https://statweb.stanford.edu/tibs/ElemStatLearn/data.html. You can read more about the data in Section 11.7 of Elements of Statistical Learning.

Answer: See the R markdown document: http://www.stats.ox.ac.uk/flaxman/sml17/zip.html
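The linked R markdown is the reference solution. As a rough sketch of one possible approach (assuming the zip.train and zip.test files from the URL above have been downloaded and decompressed into the working directory, and using the glmnet package; the file names and the choice α = 0.5 for the elastic net are assumptions):

    # Sketch: L1-, L2- and elastic-net-regularized logistic regression on 2s vs 3s.
    library(glmnet)
    train <- read.table("zip.train")                 # column 1: digit label, columns 2-257: pixels
    test  <- read.table("zip.test")
    keep_tr <- train[, 1] %in% c(2, 3); keep_te <- test[, 1] %in% c(2, 3)
    x_tr <- as.matrix(train[keep_tr, -1]); y_tr <- factor(train[keep_tr, 1])
    x_te <- as.matrix(test[keep_te, -1]);  y_te <- factor(test[keep_te, 1])
    err <- function(fit) {
      c(train = mean(predict(fit, x_tr, type = "class", s = "lambda.min") != y_tr),
        test  = mean(predict(fit, x_te, type = "class", s = "lambda.min") != y_te))
    }
    # alpha = 1: lasso (L1); alpha = 0: ridge (L2); 0 < alpha < 1: elastic net
    fits <- lapply(c(lasso = 1, ridge = 0, enet = 0.5), function(a)
      cv.glmnet(x_tr, y_tr, family = "binomial", alpha = a))
    sapply(fits, err)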

5. OPTIONAL: (Exercise 2.9 in Elements of Statistical Learning) Consider a linear regression model with p parameters, fit with unregularized linear regression (sometimes called least squares or ordinary least squares or even just OLS) to a set of training data (x_i, y_i)_{1≤i≤N} drawn at random from a population. Let β̂ be the estimator. Suppose we have some test data (x̃_i, ỹ_i)_{1≤i≤M} drawn at random from the same population as the training data.

If R_tr(β) = (1/N) Σ_{i=1}^N (y_i - β^T x_i)^2 and R_te(β) = (1/M) Σ_{i=1}^M (ỹ_i - β^T x̃_i)^2, prove that

    E[R_tr(β̂)] ≤ E[R_te(β̂)],

where the expectations are over everything that is random in each expression.

Answer: For a given set of training data, the corresponding estimator β̂ is found by minimizing the empirical risk:

    β̂ = argmin_β R_tr(β) = argmin_β (1/N) Σ_{i=1}^N (y_i - x_i^T β)^2.

We could also consider the estimator β̃ corresponding to the test data:

    β̃ = argmin_β R_te(β) = argmin_β (1/M) Σ_{i=1}^M (ỹ_i - x̃_i^T β)^2.

From this definition we see that R_te(β̃) ≤ R_te(β̂), for any particular test dataset (x̃_i, ỹ_i)_{1≤i≤M}. Taking the expectation over randomly drawn test datasets gives E[R_te(β̃)] ≤ E[R_te(β̂)] for any β̂, and hence, taking the expectation over training datasets as well,

    E[R_te(β̃)] ≤ E[R_te(β̂)].    (2)

Since train and test datasets come from the same distribution, we know that

    E[R_te(β̃)] = E[R_tr(β̂)].

Thus we see

    E[R_tr(β̂)] = E[R_te(β̃)] ≤ E[R_te(β̂)]

by Eq. 2.
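A small simulation illustrating the result (a sketch; the population below, with Gaussian features and noise, is an arbitrary assumption):

    # Average training error vs average test error for OLS over repeated datasets.
    set.seed(4)
    p <- 5; N <- 30; M <- 30; beta <- rnorm(p)
    draw <- function(n) {
      X <- matrix(rnorm(n * p), n, p)
      list(X = X, y = drop(X %*% beta) + rnorm(n))
    }
    one_rep <- function() {
      tr <- draw(N); te <- draw(M)
      b  <- solve(t(tr$X) %*% tr$X, t(tr$X) %*% tr$y)   # OLS fit on the training data
      c(train = mean((tr$y - tr$X %*% b)^2),
        test  = mean((te$y - te$X %*% b)^2))
    }
    rowMeans(replicate(5000, one_rep()))                # the training average is smaller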
