Overview of Supervised Learning



DD3364

March 9, 2012

Problem 1: Regression

[Figure: regression. Training pairs (x, y) are plotted; given a new x, predict its y value.]

Problem 2: Classification

Is it a bike or a face?

Some Terminology

In Machine Learning, outputs are predicted from measured inputs.

In supervised learning the aim is to predict the output(s) given an input, using lots of labelled training examples

    {(input_1, output_1), (input_2, output_2), …, (input_n, output_n)}

Variable types

Outputs can be
- discrete (categorical, qualitative),
- continuous (quantitative), or
- ordered categorical (the order is important).

Predicting a continuous output is referred to as regression; predicting a discrete output is referred to as classification.

Notation

Denote an input variable by X. If X is a vector, its components are denoted by X_j.

Quantitative (continuous) outputs are denoted by Y.

Qualitative (discrete) outputs are denoted by G.

Observed values are written in lower case: x_i is the i-th observed value of X. If X is a vector then x_i is a vector in R^p.

Matrices are represented by bold uppercase letters.

More Notation

The prediction of the output for a given value of the input vector X is denoted by Ŷ.

It is presumed that we have labelled training data. For regression problems

    T = {(x_1, y_1), …, (x_n, y_n)}  with each x_i ∈ R^p and y_i ∈ R

and for classification problems

    T = {(x_1, g_1), …, (x_n, g_n)}  with each x_i ∈ R^p and g_i ∈ {1, …, G}

Two Simple Approaches: Least Squares and Nearest Neighbours

Linear Model

Have an input vector X = (X_1, …, X_p)^t.

A linear model predicts the output Y as

    Ŷ = β̂_0 + Σ_{j=1}^p X_j β̂_j

Let X = (1, X_1, …, X_p)^t and β̂ = (β̂_0, …, β̂_p)^t; then

    Ŷ = X^t β̂

How is a linear model fit to a set of training data?

The most popular approach is Least Squares: β is chosen to minimize

    RSS(β) = Σ_{i=1}^n (y_i − x_i^t β)²
           = (y − Xβ)^t (y − Xβ)

where X ∈ R^{n×p} is a matrix with each row being an input vector and y = (y_1, …, y_n)^t. RSS(β) is a quadratic function of β, so a minimum always exists, though it may not be unique.

The solution to the normal equations

    X^t (y − Xβ) = 0

is given by

    β̂ = (X^t X)^{−1} X^t y

if X^t X is non-singular. This is easy to show by differentiating RSS(β). This model has p + 1 parameters.
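The closed-form solution is easy to check numerically. A minimal NumPy sketch (the synthetic data and the name `beta_hat` are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: n = 50 points, p = 2 inputs, no noise.
n, p = 50, 2
X = np.hstack([np.ones((n, 1)), rng.uniform(-1, 1, (n, p))])  # prepend the constant 1
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true

# beta_hat = (X^t X)^{-1} X^t y; solve the normal equations rather
# than forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_hat, beta_true))  # True
```

Since the data are noise-free and X^t X is non-singular here, least squares recovers the true coefficients exactly (up to floating point).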

Linear regression for classification

Assume one has training data {(x_i, y_i)}_{i=1}^n with each y_i ∈ {0, 1}. Fit a linear model to the data and classify according to

    Ĝ(x) = 0  if x^t β̂ ≤ 0.5
    Ĝ(x) = 1  if x^t β̂ > 0.5

The linear classifier misclassifies some of the training examples: it is too rigid when the classes cannot be separated by a line.

[Figure: two-class data generated from 10 Gaussian mixtures per class, shown with the linear decision boundary and the k = 1 nearest-neighbour boundary.]

Nearest Neighbour Methods

The k-nearest neighbour fit for Y is

    Ŷ(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i

where N_k(x) is the neighbourhood of x defined by the k closest points x_i in the training data. Closeness is defined by some metric; for this lecture assume it is the Euclidean distance.

k-nearest neighbours in words: find the k training points x_i closest to x and average their responses.
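The fit above can be sketched in a few lines of NumPy (the helper `knn_regress` and the toy data are illustrative):

```python
import numpy as np

def knn_regress(x, X_train, y_train, k):
    """k-nn fit: average the y values of the k training points
    closest to x in Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest x_i
    return y_train[nearest].mean()

# Tiny 1-d illustration.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])

print(knn_regress(np.array([1.1]), X_train, y_train, k=2))  # 2.5, the average of y at x=1 and x=2
```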

Training data: {(x_i, g_i)} with each g_i ∈ {0, 1}. The k-nearest neighbour estimate for G is

    Ĝ(x) = 0  if (1/k) Σ_{x_i ∈ N_k(x)} g_i ≤ 0.5
    Ĝ(x) = 1  otherwise

where N_k(x) is the neighbourhood of x defined by the k closest points x_i in the training data.

k-nearest neighbours in words: assign the label of x as the majority class amongst the k nearest neighbours.
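The classification rule follows the same pattern; a small sketch with made-up two-class data (helper name illustrative):

```python
import numpy as np

def knn_classify(x, X_train, g_train, k):
    """Binary k-nn rule: predict 1 iff the mean label among the
    k nearest neighbours exceeds 0.5 (i.e. majority vote)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(g_train[nearest].mean() > 0.5)

# Illustrative data: class 0 near the origin, class 1 near (2, 2).
X_train = np.array([[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]], dtype=float)
g_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_classify(np.array([0.2, 0.2]), X_train, g_train, k=3))  # 0
print(knn_classify(np.array([2.1, 2.1]), X_train, g_train, k=3))  # 1
```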

[Figures: k = 15 and k = 1 nearest-neighbour decision boundaries on the same training data.]

With k = 1 every training example is correctly classified. But how well will the classifier perform on new data drawn from the same distribution?

There are two parameters that control the behaviour of k-nn: k and n, the number of training samples. The effective number of parameters of k-nn is n/k.

Intuitively: say the neighbourhoods were non-overlapping. There would then be n/k neighbourhoods, and we need to fit one parameter (a mean) to each neighbourhood.

The linear decision boundary is
- smooth,
- stable to fit,
- but assumes a linear decision boundary is suitable.

The k-nn decision boundary
- can adapt to any shape of the data,
- is unstable to fit (for small k),
- is not smooth but wiggly (for small k).

[Figure: training and test error for the k-nn and linear classifiers plotted against log(n/k). The errors can be computed here because the pdfs for the two classes are known.]

How do we measure how well f(X) predicts Y? Statisticians would compute the Expected Prediction Error, e.g. under squared error loss

    EPE(f) = E[(Y − f(X))²] = ∫∫ (y − f(x))² p(x, y) dx dy

By conditioning on X we can write

    EPE(f) = E_X E_{Y|X}[(Y − f(X))² | X]

At a point x we can minimize the EPE pointwise to get the best prediction of y:

    f(x) = argmin_c E_{Y|X}[(Y − c)² | X = x]

The solution is

    f(x) = E[Y | X = x]

This is known as the regression function.

Only one problem with this: one rarely knows the pdf p(Y | X).

Example:

Training data {(x_i, y_i)}_{i=1}^n where x_i ∈ X ⊂ R^p and y_i ∈ R. The k-nn fit approximates the regression function f(x) = E[Y | X = x] directly by local averaging.

[Figures: for X = [−1, 1]², the neighbourhood N_k(x) of a query point x for increasing k.]

Therefore intuition says: with lots of training data, the accuracy of ŷ increases.

More formally: as n, k → ∞ such that k/n → 0,

    ŷ = (1/k) Σ_{x_i ∈ N_k(x)} y_i  →  E[y | x]

The Curse of Dimensionality (Bellman, 1961)

The k-nearest neighbour averaging approach, and our intuition, break down in high dimensions. For large p:
- nearest neighbours are not so close!
- the k-nn of x are closer to the boundary of X,
- a prohibitive number of training samples is needed to densely sample X ⊂ R^p.

Scenario:
Estimate a regression function, f : X → R, using a k-nn regressor, where X = [0, 1]^p (the unit hyper-cube).

Question:
Let k = r·n, where r ∈ [0, 1], and x = 0. What is the expected edge length of the smallest hyper-cube containing the k nearest neighbours of x?

Solution:
The volume of a hyper-cube of side a is a^p. We are looking for a such that a^p equals a fraction r of the unit hyper-cube's volume. Therefore

    a^p = r  ⇒  a = r^{1/p}

To recap: the expected edge length of the hyper-cube containing a fraction r of the training data is

    e_p(r) = r^{1/p}

[Figure: e_p(r) = r^{1/p} plotted against r for p = 1, 2, 3, 10.]

Let p = 10; then e_p(.01) ≈ .63 and e_p(.1) ≈ .80. Therefore in this case the 1% and 10% nearest-neighbour estimates are not local estimates.
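A quick numerical check of the p = 10 figures quoted above (the function name is illustrative):

```python
# Edge length e_p(r) = r**(1/p) of the sub-cube of [0, 1]^p that
# captures a fraction r of uniformly distributed training data.
def edge_length(r, p):
    return r ** (1.0 / p)

print(round(edge_length(0.01, 10), 2))  # 0.63
print(round(edge_length(0.1, 10), 2))   # 0.79 (the slides round this to .80)
```

So even to capture 1% of the data at p = 10, the neighbourhood must span 63% of each input's range.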

Scenario:
Estimate a regression function, f : X → R, using a k-nearest neighbour regressor, where X is the unit hyper-sphere (ball) in R^p centred at the origin and the n training inputs are uniformly distributed in X.

Question:
Let k = 1 and x = 0. What is the median distance of the nearest neighbour to x?

Solution:
This median distance is given by the expression

    d(p, n) = (1 − (1/2)^{1/n})^{1/p}

[Figure: d(p, n) for n = 500 plotted against p. By p = 10 the median distance exceeds 0.5, i.e. the nearest neighbour is typically more than halfway to the boundary of X.]
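The formula is easy to evaluate (helper name illustrative):

```python
def median_nn_distance(p, n):
    """Median distance from the origin to the nearest of n points
    uniformly distributed in the unit ball in R^p."""
    return (1 - 0.5 ** (1.0 / n)) ** (1.0 / p)

print(round(median_nn_distance(10, 500), 2))  # 0.52, more than halfway to the boundary
```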

Consequence

For large p most of the training data points are closer to the boundary of X than to x. This is bad because, to make a prediction at x, one must use training samples near the boundary and extrapolate rather than interpolate.

Explanation:

Say n_1 = 100 samples represents a dense sampling for a single-input problem. Then n_10 = 100^10 samples are needed for the same sampling density with p = 10 inputs. In high dimensions it is therefore infeasible to densely sample the input space.

Simulated Example

The Set-up: training inputs x_i drawn uniformly from [−1, 1]^p, with the noise-free relationship

    Y = f(X) = e^{−8‖X‖²}

Use the 1-nearest neighbour to estimate y_0 = f(x_0) at x_0 = 0. The true value is y_0 = 1.

[Figures, p = 1, n = 20: the curve e^{−8x²}; a histogram of the distance x_(1) from x_0 to its nearest neighbour; and a histogram of the 1-nn estimate of y_0 over repeated training sets.]

[Figures, p = 2: the surface Y = f(X) = e^{−8‖X‖²}; the nearest neighbour x_(1) of x_0 in the plane; and histograms of the 1-nn estimate of y_0 over repeated training sets. The true value is y_0 = 1.]

As p increases:

[Figures: average distance to the nearest neighbour, and average value of the 1-nn estimate ŷ_0, plotted against p.]

The average distance to the nearest neighbour increases rapidly with p; thus the average estimate of y_0 also rapidly degrades.

Bias-Variance Decomposition

For the simulation experiment we have a completely deterministic relationship, Y = f(X) = e^{−8‖X‖²}, so the mean squared error of the estimate ŷ_0 over training sets T decomposes as

    MSE(x_0) = E_T[(ŷ_0 − f(x_0))²]
             = E_T[(ŷ_0 − E_T[ŷ_0])²] + (E_T[ŷ_0] − f(x_0))²
             = Var_T(ŷ_0) + Bias²(ŷ_0)

[Figure: MSE, squared bias and variance of the 1-nn estimate of y_0 versus p; the squared bias dominates the MSE as p grows.]

Why? As p increases the nearest neighbour is never close to x_0 = 0, hence the estimate ŷ_0 tends to 0 and the bias tends to 1. (One can also construct examples where variance dominates the MSE.)
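The degradation of the 1-nn estimate and its bias-variance decomposition can be reproduced with a small simulation. A sketch assuming the same set-up as above (uniform inputs on [−1, 1]^p, Y = e^{−8‖X‖²}, estimates at x_0 = 0; the sample sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def one_nn_at_origin(p, n):
    """1-nn estimate of y0 = f(0) = 1 for one training set of n points
    drawn uniformly from [-1, 1]^p, with Y = exp(-8 ||X||^2)."""
    X = rng.uniform(-1, 1, (n, p))
    i = np.argmin(np.linalg.norm(X, axis=1))
    return np.exp(-8 * np.sum(X[i] ** 2))

for p in (1, 10):
    est = np.array([one_nn_at_origin(p, 1000) for _ in range(200)])
    bias2 = (est.mean() - 1.0) ** 2
    var = est.var()
    print(f"p={p}: bias^2={bias2:.3f}, var={var:.3f}, mse={bias2 + var:.3f}")
```

At p = 1 the nearest neighbour is essentially at the origin and the estimate is nearly unbiased; at p = 10 the squared bias is large and dominates the MSE.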

The Set-up

The relationship between the inputs and the output is defined by

    Y = f(X) = (1/2)(X_1 + 1)³

which involves only the first coordinate of X.

[Figures: the curve (1/2)(x + 1)³; the 1-nn estimate ŷ_0 of y_0 at x_0 is formed from the nearest neighbour x_(1).]

[Figure: MSE, squared bias and variance of the 1-nn estimate versus p; here the variance dominates the MSE.]

Why? As the deterministic function only involves one dimension, the bias doesn't explode as p increases!

Case 1

    Y = 0.5 (X_1 + 1)³ + ε,  ε ∼ N(0, 1)

Case 2

    Y = X_1 + ε,  ε ∼ N(0, 1)

[Figures: samples from each model plotted against x_1.]

[Figure: EPE versus p for three combinations: f(x) linear with a 1-nn predictor, f(x) cubic with a 1-nn predictor, and f(x) cubic with a linear predictor.]

The linear predictor has a biased estimate of the cubic function, but it fits well even in the presence of noise and high dimension when f is linear. In that case the linear model beats the curse of dimensionality.

Words of Caution

In the previous example the linear predictor out-performed the 1-nn predictor because its rigid form matched the structure of f. But one could easily manufacture an example where the bias of the linear predictor exceeds the variance of the 1-nn predictor.

There are a whole host of models in between the rigid linear model and the extremely flexible 1-nn regressor. Many are specifically designed to avoid the exponential growth in complexity of functions in high dimensions.

Statistical Models, Supervised Learning and Function Approximation

Goal

We know there is a function f(x) relating inputs to outputs, Y ≈ f(X). We want to find an estimate f̂(x) of f(x) from labelled training data, incorporating special structure that
- reduces the bias and variance of the estimates, and
- helps combat the curse of dimensionality.

[Figure: noisy training points y and the underlying function f(x).]

Additive error model:

    Y = f(X) + ε

where the random variable ε has E[ε] = 0 and is independent of the input X. Then f(x) = E[Y | X = x], and any departures from the deterministic relationship are mopped up by ε.

For a binary output G ∈ {0, 1}, let p(x) = p(G = 1 | X = x). Therefore E[G | X = x] = p(x) and Var(G | X = x) = p(x)(1 − p(x)).

[Figure: p(x) plotted as a function of x.]

Have training data

    T = {(x_1, y_1), …, (x_n, y_n)}

where each x_i ∈ R^p and y_i ∈ R.

[Figure: scatter plot of the training data.]

In the book, Supervised Learning is viewed as a problem in function approximation.

Common approach

Decide on a parametric form for f, e.g. a linear basis expansion

    f_θ(x) = Σ_{m=1}^M h_m(x) θ_m

Use least squares to estimate θ by minimizing

    RSS(θ) = Σ_{i=1}^n (y_i − f_θ(x_i))²

[ESL Figure 2.10: least squares fitting of a function of two inputs; the parameters are chosen to minimize the sum of squared vertical errors.]

Can find θ by optimizing other criteria: another option is Maximum Likelihood Estimation. For the additive model Y = f_θ(X) + ε with ε ∼ N(0, σ²), we have

    P(Y | X, θ) = N(f_θ(X), σ²)

The log-likelihood of the training data is

    L(θ) = Σ_{i=1}^n log P(Y = y_i | X = x_i, θ)
         = Σ_{i=1}^n log N(y_i; f_θ(x_i), σ²)

Maximizing L(θ) in this Gaussian case is equivalent to minimizing RSS(θ).
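Least squares for a linear basis expansion reduces to an ordinary linear solve. A sketch using a polynomial basis h_m(x) = x^m, m = 0, …, 3 (the basis, data and sample size are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Basis expansion f_theta(x) = sum_m theta_m * h_m(x) with the
# polynomial basis h_m(x) = x**m, m = 0..3 (an illustrative choice).
def design(x, M=4):
    return np.vstack([x ** m for m in range(M)]).T

x = rng.uniform(-1, 1, 100)
y = 0.5 * (x + 1) ** 3 + rng.normal(0, 0.1, x.size)  # cubic truth plus N(0, 0.1^2) noise

theta = np.linalg.lstsq(design(x), y, rcond=None)[0]

# 0.5 (x + 1)^3 = 0.5 + 1.5 x + 1.5 x^2 + 0.5 x^3, so theta should be
# close to (0.5, 1.5, 1.5, 0.5).
print(np.round(theta, 2))
```

Under the Gaussian noise model this least-squares fit is also the maximum likelihood estimate of θ.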

Consider the Residual Sum of Squares for a function f:

    RSS(f) = Σ_{i=1}^n (y_i − f(x_i))²

Any f̂ that passes through every training point achieves RSS(f̂) = 0, and there are infinitely many such functions, so minimizing RSS over arbitrary f does not have a unique solution.

Don't consider arbitrary functions f; instead restrict ourselves to f ∈ F and minimize RSS over this class. The initial ambiguity in choosing f has just been transferred to the choice of the constraint set F.

One option: have a parametric representation of f, e.g.

    Linear model: f(x) = β_1^t x + β_0
    Quadratic model: f(x) = x^t B x + β_1^t x + β_0

Another option: require that f has some regular behaviour in small neighbourhoods of the input space. But then:
- What size should the neighbourhood be?
- What form should f have in the neighbourhood?

Large neighbourhood ⇒ strong constraint; small neighbourhood ⇒ weak constraint.

The techniques used to restrict the class of regression or classification functions are typically of the following kinds.

Note: it is assumed we have training examples {(x_i, y_i)}_{i=1}^n; we present the energy functions (or functionals) which are minimized to fit f.

Penalized RSS: the data term ensures f predicts the training values, while a penalty J(f), weighted by a penalty parameter λ, discourages complexity:

    PRSS(f, λ) = Σ_{i=1}^n (y_i − f(x_i))² + λ J(f)

A common choice is

    J(f) = ∫ [f''(x)]² dx

For wiggly f's this functional will have a large value, while for linear f's it is zero. Regularization methods express our belief that the f we are looking for exhibits a certain type of smooth behaviour.

Kernel Methods and Local Regression: estimate the regression or classification function in a local neighbourhood. Need to specify
- the nature of the local neighbourhood,
- the class of functions used in the local fit.

Can define a local regression estimate of f(x_0), from training data, as f̂(x_0) = f_θ̂(x_0), where θ̂ minimizes

    RSS(f_θ, x_0) = Σ_{i=1}^n K_λ(x_0, x_i) (y_i − f_θ(x_i))²

where the kernel function K_λ(x_0, x_i) assigns a weight to x_i depending on its closeness to x_0, e.g. the Gaussian kernel

    K_λ(x_0, x) = (1/λ) exp(−‖x_0 − x‖² / (2λ))
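The simplest instance of this local fit takes f_θ constant on the neighbourhood, in which case the minimizer of the kernel-weighted RSS is just the kernel-weighted average of the y_i (a Nadaraya-Watson-style estimate; helper names and data are illustrative):

```python
import numpy as np

def gaussian_kernel(x0, x, lam):
    # K_lambda(x0, x) = (1/lambda) exp(-(x0 - x)^2 / (2 lambda))
    return np.exp(-((x0 - x) ** 2) / (2 * lam)) / lam

def local_constant_fit(x0, x_train, y_train, lam):
    """Minimize sum_i K_lambda(x0, x_i) (y_i - theta)^2 over a constant
    theta: the minimizer is the kernel-weighted average of the y_i."""
    w = gaussian_kernel(x0, x_train, lam)
    return np.sum(w * y_train) / np.sum(w)

x_train = np.linspace(-1, 1, 201)
y_train = x_train ** 2

# Near x0 = 0 the kernel-weighted average of x^2 comes out close to
# lam = 0.01, the variance of the Gaussian weights.
print(round(local_constant_fit(0.0, x_train, y_train, lam=0.01), 3))
```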

Basis functions: f is modelled as a linear expansion of basis functions

    f_θ(x) = Σ_{m=1}^M θ_m h_m(x)

"Linear" refers to the action of the parameters θ.

Radial basis functions:

    f_θ(x) = Σ_{m=1}^M K_{λ_m}(μ_m, x) θ_m

where K_{λ_m}(μ_m, x) is a symmetric kernel centred at location μ_m; the Gaussian kernel is a popular choice. If the μ_m's and λ_m's are pre-defined, then estimating θ is a linear problem.
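With fixed centres and widths, the RBF fit is indeed an ordinary least squares problem. A sketch (the centres, width and target function are illustrative choices; the 1/λ normalization is absorbed into θ):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pre-defined centres mu_m and a common width lam (illustrative).
mus = np.linspace(-1, 1, 9)
lam = 0.1

def rbf_design(x):
    return np.exp(-((x[:, None] - mus[None, :]) ** 2) / (2 * lam))

x = rng.uniform(-1, 1, 200)
y = np.sin(np.pi * x)  # smooth noise-free target

# With mus and lam fixed, fitting theta is linear least squares.
theta = np.linalg.lstsq(rbf_design(x), y, rcond=None)[0]

x_test = np.linspace(-0.9, 0.9, 50)
err = np.max(np.abs(rbf_design(x_test) @ theta - np.sin(np.pi * x_test)))
print("max abs error:", round(err, 4))
```

Nine fixed Gaussians suffice to approximate this smooth target closely; had the μ_m's been free parameters, the problem would be non-linear.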

Feed-forward neural network with one hidden layer:

    f_θ(x) = Σ_{m=1}^M β_m σ(α_m^t x + b_m)

where

    θ = (β_1, …, β_M, α_1, …, α_M, b_1, …, b_M)^t

and σ(z) = 1/(1 + e^{−z}) is the activation function. The directions α_m and the bias terms b_m have to be determined, so fitting θ is a non-linear problem.

Dictionary methods

Adaptively chosen basis function methods, aka dictionary methods: one has a dictionary of candidate basis functions (possibly infinite), and the model is built up using some search mechanism.

Model Complexity and the Bias-Variance Trade-off

Many models have a parameter which controls their complexity. We have seen examples of this:
- k, the number of nearest neighbours (nearest neighbour methods)
- λ, the width of the kernel (radial basis functions)
- M, the number of basis functions (dictionary methods)
- λ, the weight of the penalty term (spline fitting)

How does the complexity of a model affect its predictive behaviour?

[Figures, k = 1 and noise level σ = .1: a single k-nn estimate f̂(x) plotted against the true f(x); and the average estimate E[f̂(x)] over many training sets. At each x one standard deviation of the estimate is shown; note its magnitude.]

[Figures, k = 15 and σ = .1: a single estimate f̂(x) against f(x); and the average estimate E[f̂(x)].]

Compare the peak of f(x) and E[f̂_15(x)]! Note the variance of the estimate is much smaller than when k = 1.

[Figures side by side: E[f̂(x)] for high complexity (k = 1) and for lower complexity (k = 15).]

What not to do:

We want to choose the model complexity which minimizes test error, and training error is one estimate of the test error. Could we choose the model complexity that produces the smallest training error?

Why not? Training error decreases as model complexity increases, so this rule would always pick the most complex model.
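This is easy to demonstrate with k-nn, where small k means high complexity. A sketch (the data-generating model and sample sizes are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def knn_predict(x_query, x_train, y_train, k):
    """1-d k-nn regression: average the k nearest responses."""
    preds = []
    for x in x_query:
        nearest = np.argsort(np.abs(x_train - x))[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(np.pi * x) + rng.normal(0, 0.3, n)

x_tr, y_tr = sample(100)   # training set
x_te, y_te = sample(1000)  # held-out test set

for k in (1, 15):
    train_mse = np.mean((knn_predict(x_tr, x_tr, y_tr, k) - y_tr) ** 2)
    test_mse = np.mean((knn_predict(x_te, x_tr, y_tr, k) - y_te) ** 2)
    print(f"k={k}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
# Expect: k = 1 reproduces the training data exactly (zero training
# error) yet has a larger test error than the smoother k = 15 fit.
```

Picking k by training error would always select k = 1, the overfit model.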

Overfitting

[ESL Figure 2.11: test and training error as a function of model complexity. Training error decreases steadily with complexity; test error is U-shaped, with high bias and low variance at low complexity, and low bias and high variance at high complexity.]

A predictor with very high complexity (e.g. k = 1) fits the training data exactly but has high variance: anything can happen on new data. This scenario is termed overfitting; in such cases the predictor loses the ability to generalize.

The variance term of the k-nn estimate is simply the variance of an average, and decreases as the inverse of k, so as k varies there is a bias-variance trade-off. More generally, as the model complexity of our procedure is increased, the variance tends to increase and the squared bias tends to decrease; the opposite behaviour occurs as the model complexity is decreased. For k-nearest neighbours, the model complexity is controlled by k.

Underfitting

A predictor with too little complexity has high bias and low variance; it too has poor generalization.

Later on in the course we will discuss how to overcome these problems.
