Ingmar Schuster

Patrick Jähnichen

using slides by Andrew Ng

Institut für Informatik

Linear Regression

Hypothesis formulation, hypothesis space

Optimizing cost with Gradient Descent

Using multiple input features with Linear Regression

Feature Scaling

Nonlinear Regression

Optimizing cost using derivatives

Linear Regression


Regression Problem

Linear regression w. gradient descent

x is the input (predictor) variable, called features in ML-speak

y is the output (response) variable

Notation

Square meters (x) | Price in thousands (y)
73 | 174
146 | 367
38 | 69
124 | 257
... | ...

Learning procedure

Training data → Learning Algorithm → hypothesis $h$ (mapping between input and output)

Size of flat ($x$) → $h$ → Estimated price ($y$)

Linear regression with one input variable (univariate):

$h_\theta(x) = \theta_0 + \theta_1 x$, with hypothesis parameters $\theta_0, \theta_1$


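Optimization objective

Choose $\theta_0, \theta_1$ so that $h_\theta(x)$ is close to $y$ for the $(x, y)$ in the training data.

Cost function (often named $J$):

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

where $m$ is the number of training instances. Squaring makes every error count positively and penalizes large errors more strongly.

Objective: minimize $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$.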
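A minimal Python sketch of the univariate hypothesis and the squared-error cost, assuming the $\frac{1}{2m}$ convention above. The data are the rows of the square meters / price table; the parameter values $\theta_0 = 0$, $\theta_1 = 2$ are arbitrary and only chosen for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Univariate hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(xs)  # number of training instances
    squared_errors = ((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


# Rows of the table: square meters (x) and price in thousands (y)
xs = [73, 146, 38, 124]
ys = [174, 367, 69, 257]

# theta0 = 0, theta1 = 2 are arbitrary illustrative parameters
print(cost(0.0, 2.0, xs, ys))  # 6539 / 8 = 817.375
```

Gradient descent (next slides) changes $\theta_0, \theta_1$ step by step so that this value goes down.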


Optimizing Cost with Gradient Descent

Want to minimize $J(\theta_0, \theta_1)$.

Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we end up at a minimum.


(Figure: stepwise descent towards the minimum)

Finding the minimum directly via derivatives works only for few parameters.

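Gradient descent

repeat until convergence:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for $j = 0$ and $j = 1$

($\alpha$: learning rate; $\frac{\partial}{\partial \theta_j} J$: partial derivative)

Update $\theta_0$ and $\theta_1$ simultaneously. Beware: an incremental update, in which the already updated $\theta_0$ is used to compute the derivative for $\theta_1$, is incorrect!

Close to a minimum the derivative shrinks, so the steps become smaller without changing the learning rate.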
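For the squared-error cost of univariate linear regression the partial derivatives are $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})$ with respect to $\theta_0$ and $\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)}$ with respect to $\theta_1$. The sketch below contrasts the correct simultaneous update with the incorrect incremental one; the function names and the tiny data set are hypothetical.

```python
def partial_derivatives(theta0, theta1, xs, ys):
    """Return (dJ/dtheta0, dJ/dtheta1) for J = 1/(2m) * sum (h(x) - y)^2."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d0 = sum(errors) / m
    d1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d0, d1


def step_simultaneous(theta0, theta1, xs, ys, alpha):
    """Correct: both derivatives are evaluated at the OLD parameters."""
    d0, d1 = partial_derivatives(theta0, theta1, xs, ys)
    return theta0 - alpha * d0, theta1 - alpha * d1


def step_incremental(theta0, theta1, xs, ys, alpha):
    """Incorrect: the derivative for theta1 already sees the NEW theta0."""
    d0, _ = partial_derivatives(theta0, theta1, xs, ys)
    theta0 = theta0 - alpha * d0
    _, d1 = partial_derivatives(theta0, theta1, xs, ys)
    return theta0, theta1 - alpha * d1


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(step_simultaneous(0.0, 0.0, xs, ys, alpha=0.1))
print(step_incremental(0.0, 0.0, xs, ys, alpha=0.1))  # same theta0, different theta1
```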


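Learning rate $\alpha$: if $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, gradient descent can overshoot the minimum and may not lead to convergence, or may even lead to divergence.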
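A toy illustration of the three regimes, assuming the one-parameter cost $J(\theta) = \theta^2$ (chosen for simplicity, not the regression cost) with derivative $2\theta$. Each step multiplies $\theta$ by $1 - 2\alpha$, so gradient descent converges only if $|1 - 2\alpha| < 1$.

```python
def run_gradient_descent(alpha, theta=1.0, steps=20):
    """Gradient descent on the toy cost J(theta) = theta**2, where dJ/dtheta = 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # equivalent to theta *= (1 - 2*alpha)
    return theta


for alpha in (0.1, 1.0, 1.1):
    print(alpha, run_gradient_descent(alpha))
# alpha = 0.1: theta shrinks towards the minimum at 0
# alpha = 1.0: theta jumps between +1 and -1, no convergence
# alpha = 1.1: |theta| grows with every step, i.e. divergence
```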

Often


Checking convergence

Gradient descent works correctly if $J(\theta)$ decreases with every step.

Possible convergence criterion: converged if $J(\theta)$ decreases by less than a small constant $\varepsilon$ in one step.
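A minimal sketch of the complete loop for univariate linear regression with both checks: it stops when $J$ decreases by less than $\varepsilon$ in one step and reports an error if $J$ ever increases. The defaults `alpha=0.05`, `eps=1e-9` and `max_iter` as well as the noise-free data on $y = 1 + 2x$ are assumptions made for illustration.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost J = 1/(2m) * sum (h(x) - y)^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, eps=1e-9, max_iter=100_000):
    """Run until J decreases by less than eps in one step; fail if J ever increases."""
    m = len(xs)
    theta0 = theta1 = 0.0
    j_old = cost(theta0, theta1, xs, ys)
    for iteration in range(1, max_iter + 1):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        d0 = sum(errors) / m
        d1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1  # simultaneous update
        j_new = cost(theta0, theta1, xs, ys)
        if j_new > j_old:
            raise RuntimeError(f"J increased at iteration {iteration}; alpha too large?")
        if j_old - j_new < eps:
            return theta0, theta1, iteration  # converged
        j_old = j_new
    return theta0, theta1, max_iter  # no convergence within max_iter


# Hypothetical noise-free data on the line y = 1 + 2x
print(gradient_descent([1, 2, 3, 4], [3, 5, 7, 9]))
```

With `alpha=0.5` on the same data, $J$ already grows in the first step and the function raises, which is the "check that $J$ decreases" test in action.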


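Local Minima

If $J$ has several local minima (e.g. when $J$ is not the squared error for regression with only one variable, which is convex and has a single minimum), gradient descent may end up in a local minimum that is not the global one.

Remedy: random restarts with different initial parameter(s).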
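A sketch of random restarts on a hypothetical non-convex one-parameter cost $J(\theta) = \theta^4 - 3\theta^2 + \theta$, which has a local minimum near $\theta \approx 1.13$ and the global minimum near $\theta \approx -1.30$. The learning rate, number of steps, number of restarts, start interval and seed are illustrative assumptions.

```python
import random


def J(theta):
    """Hypothetical non-convex toy cost with two local minima."""
    return theta ** 4 - 3 * theta ** 2 + theta


def dJ(theta):
    """Derivative of the toy cost."""
    return 4 * theta ** 3 - 6 * theta + 1


def descend(theta, alpha=0.01, steps=2000):
    """Plain gradient descent from a given starting point."""
    for _ in range(steps):
        theta -= alpha * dJ(theta)
    return theta


def random_restarts(n_restarts=20, seed=0):
    """Run gradient descent from random starting points and keep the lowest J."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        theta = descend(rng.uniform(-2.0, 2.0))
        if best is None or J(theta) < J(best):
            best = theta
    return best


print(descend(1.5))       # stuck in the local minimum near theta = 1.13
print(random_restarts())  # global minimum near theta = -1.30
```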


Multiple features

Square meters (x1) | Bedrooms (x2) | Floors (x3) | Age of building in years (x4) | Price in thousands (y)
200 | ... | ... | 45 | 460
131 | ... | ... | 40 | 232
142 | ... | ... | 30 | 315
756 | ... | ... | 36 | 178

Notation


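Hypothesis representation (for $n$ features)

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

More compact with the definition $x_0 = 1$:

$h_\theta(x) = \theta^T x = \sum_{j=0}^{n} \theta_j x_j$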
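A minimal sketch of the compact form: prepending $x_0 = 1$ turns the hypothesis into a plain dot product $\theta^T x$. The parameter and feature values are hypothetical.

```python
def h(theta, x):
    """Multivariate hypothesis h_theta(x) = theta^T x = sum_j theta_j * x_j (j = 0..n)."""
    x = [1.0] + list(x)  # prepend x_0 = 1 so theta_0 acts as the intercept
    if len(theta) != len(x):
        raise ValueError("expected n + 1 parameters for n features")
    return sum(t_j * x_j for t_j, x_j in zip(theta, x))


# Hypothetical parameters theta_0..theta_2 and a feature vector with n = 2
print(h([1.0, 2.0, 3.0], [4.0, 5.0]))  # 1*1 + 2*4 + 3*5 = 24.0
```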


Feature scales can vary strongly:

Square meters: 30 - 400

Bedrooms: 1 - 10

Price: 80 000 - 2 000 000


Feature Scaling


Feature scaling: scale all features to a similar range, e.g. by Z-score conversion.


Z-Score conversion

Center data on 0 and scale it to unit standard deviation:

$x_j := \frac{x_j - \mu_j}{\sigma_j}$

where $\mu_j$ is the empirical mean (expected value) of feature $j$ and $\sigma_j$ its empirical standard deviation.
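A minimal sketch of Z-score conversion with Python's `statistics` module, applied to the square meters column of the first table. Using the population standard deviation (`pstdev`) rather than the sample version (`stdev`) is an assumption; the slide only says "empirical standard deviation". The new input value 100 is hypothetical.

```python
import statistics


def zscore(values):
    """Return (scaled values, mu, sigma) with scaled = (x - mu) / sigma."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population form; statistics.stdev gives the 1/(m-1) form
    if sigma == 0:
        raise ValueError("constant feature cannot be scaled")
    return [(v - mu) / sigma for v in values], mu, sigma


square_meters = [73, 146, 38, 124]  # first column of the square meters / price table
scaled, mu, sigma = zscore(square_meters)
print(mu, sigma, scaled)

# New inputs must be scaled with the SAME mu and sigma as the training data
print((100 - mu) / sigma)
```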


Nonlinear Regression

(by cheap trickery)
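A sketch of one common "cheap trick", assuming it is the one meant here: keep the linear model but feed it the powers $x, x^2, \dots$ as additional features, Z-scored because their scales differ strongly. The data lying on $y = 1 + 2x - x^2$ as well as `alpha` and `iters` are hypothetical choices.

```python
import statistics


def polynomial_features(xs, degree):
    """Map each scalar x to the feature vector [x, x**2, ..., x**degree]."""
    return [[float(x) ** d for d in range(1, degree + 1)] for x in xs]


def zscore_columns(X):
    """Z-score every feature column (powers of x have very different scales)."""
    cols = list(zip(*X))
    mus = [statistics.mean(c) for c in cols]
    sigmas = [statistics.pstdev(c) for c in cols]
    return [[(v - mu) / s for v, mu, s in zip(row, mus, sigmas)] for row in X]


def fit_linear(X, ys, alpha=0.1, iters=2000):
    """Batch gradient descent for h(x) = theta^T x with x_0 = 1 (simultaneous update)."""
    rows = [[1.0] + list(row) for row in X]
    m, n_params = len(rows), len(rows[0])
    theta = [0.0] * n_params
    for _ in range(iters):
        errors = [sum(t * v for t, v in zip(theta, r)) - y for r, y in zip(rows, ys)]
        grad = [sum(e * r[j] for e, r in zip(errors, rows)) / m for j in range(n_params)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta, rows


def max_abs_error(theta, rows, ys):
    """Largest absolute training error of the fitted hypothesis."""
    return max(abs(sum(t * v for t, v in zip(theta, r)) - y) for r, y in zip(rows, ys))


# Hypothetical toy data lying on the curve y = 1 + 2x - x^2
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x - x ** 2 for x in xs]

for degree in (1, 2):
    theta, rows = fit_linear(zscore_columns(polynomial_features(xs, degree)), ys)
    print(degree, max_abs_error(theta, rows, ys))
# degree 1 (straight line) leaves a large error; degree 2 fits the curve almost exactly
```

The model stays linear in $\theta$, so gradient descent and the cost function are unchanged; only the feature vector is nonlinear in $x$.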


Optimizing cost

using derivatives


Set the partial derivatives to zero and solve

$\frac{\partial}{\partial \theta_i} J(\theta) = 0$ for all $i$.
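For univariate linear regression, setting both partial derivatives of $J(\theta_0, \theta_1)$ to zero gives the well-known closed form $\theta_1 = \frac{\sum_i (x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_i (x^{(i)} - \bar{x})^2}$ and $\theta_0 = \bar{y} - \theta_1 \bar{x}$, where $\bar{x}, \bar{y}$ are the means. A minimal sketch follows; with many features one would instead solve the linear system $X^T X \theta = X^T y$ (the normal equation), which is not shown here.

```python
def fit_closed_form(xs, ys):
    """Solve dJ/dtheta0 = 0 and dJ/dtheta1 = 0 for univariate linear regression."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal; theta1 is not determined")
    theta1 = s_xy / s_xx
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


# Square meters / price rows from the first table
xs, ys = [73, 146, 38, 124], [174, 367, 69, 257]
theta0, theta1 = fit_closed_form(xs, ys)
print(theta0, theta1)

# Sanity check: both partial derivatives vanish (up to rounding) at the solution
errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
print(sum(errors) / len(xs), sum(e * x for e, x in zip(errors, xs)) / len(xs))
```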


Gradient Descent:
- Need to choose the learning rate $\alpha$
- Needs many iterations, random restarts etc.
- Works well for many features

Derivation:
- No need to choose $\alpha$
- No iterations


Pictures: en.wikipedia.org and de.wikipedia.org


