QM 1 Uni melb notes

© All Rights Reserved

0 views

QM 1 Uni melb notes

© All Rights Reserved

- sjpdf
- Curso - Six Sigma, Minitab
- EXERCICES.doc
- Final Questions for Icfc Elims[1]
- Common Misconceptions About Data Analysis and Statistics
- hypothesis testing
- ReviewForTest3Answers Population Mean
- Measuring Security Price Performance
- Biology Lab Report 2
- Test 1 Prep
- 26321-28991teewqf-1-PB
- _Chapter 12 Tests of Goodness of Fit and Independence
- EFFECTIVENESS_OF_THE_TEACHER-MADE_SCIENC.doc
- Misconceptions of R2
- 366
- Cases
- PMBSB Sample Q&A
- Aligning Statistical and Scientific Reasoning
- In Stat Guide
- Paninimania.pdf

You are on page 1of 18

Downloaded by karl hemming (mkwlh1000@gmail.com)

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Lecture One:

Inferential statistics involves constructing estimates of these characteristics, and testing hypotheses

about the world, based on sets of data.

Modelling and analysis combines these to build models that represent relationships and trends in

reality in a systematic way.

Types of Data:

Numerical or quantitative data are real numbers with specific numerical values.

Nominal or qualitative data are non-numerical data sorted into categories on the basis of qualitative

attributes.

Ordinal or ranked data are nominal data that can be ranked.

- The population is the complete set of data that we seek to obtain information about

- The sample is a part of the population that is selected (or sampled) in some way using a

sampling frame

- A characteristic of a population is call a parameter

- A characteristic of a sample is called a statistic

- The difference between our estimate and the true (usually unknown) parameter is the

sampling error

- In a random sample, all population members have an equal chance of being sampled

- individual observations that are similar to the other observations in that strata

- different characteristics from other strata in the population

- stratified sampling can improve accuracy

- may be more costly

In a population, a perfect cluster would be a group with:

- individual observations that are different from the other observations in that cluster

- similar characteristics to other clusters in the population

- can reduce costs

- may be less accurate

This is the cost/accuracy trade off

Lecture Two:

- collected from (across) a number of different entities (such as individuals, households, firms,

regions or countries) at a particular point in time

- usually a random sample (but not always)

- not able to be arranged in any “natural” order (we can sort or rank the data into any order

we choose)

- often (but not only) usefully presented with histograms

- collected over time on one particular ‘entity’

- data with observations which are likely to depend on what has happened in the past

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

- often (but not only) presented as line charts

Lecture Three:

Measures of Centre:

𝚺𝒙

Mean/Average: population: µ sample: x̄ =

𝒏

- easy to calculate

- sensitive to extreme observations

Median: middle number, or average of two middle numbers

- not sensitive to extreme observations

Mode: most frequently occurring number

- only used for finding most common outcome

x̄ – µ = sampling error

- Symmetrical if mean = median = mode

- Right-skewed if mean > median > mode

- Left-skewed if mode > median > mean

Measures of Variation:

Population variance measures an average of the squared deviations between each observation and

1

the population mean: σ2 = ∑(𝑥1 − 𝜇)2

𝑁

1

Population standard deviation is the square root of population variance: σ = √𝑁 ∑(𝑥1 − 𝜇)2

Sample variance measures the average of the squared deviations between each observation and the

1

sample mean: s2: 𝑛−1 ∑(𝑥1 − x̄ )2

1

Sample standard deviation is the square root of sample variance: s = √𝑛−1 ∑(𝑥1 − x̄ )2

Coefficient of variation measures the variation in a sample (given by its standard deviation) relative

to that sample’s mean, it is expressed as a percentage to provide a unit-free measurement, letting us

𝑠

compare difference samples: CV = 100 × %

x̄

Lecture Four:

Measures of Association:

Covariance measures the co-variation between two sets of observations.

With a population size N having observations (xi, yi), (x2, y2), (xN, yN) etc. and having μx, μy, being the

respective means of the xi and yi terms, covariance is calculated as,

𝑁

1

𝐶𝑂𝑉(𝑋, 𝑌) = ∑(𝑥𝑖 − 𝜇𝑥 )(𝑦 − 𝜇𝑦 )

𝑁

𝑖−1

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

If we have a sample of size n, with sample means 𝑥̅ and 𝑦̅, the covariance is calculated as

𝑛

1

𝑐𝑜𝑣(𝑥, 𝑦) = ∑(𝑥𝑖 − 𝑥̅ )(𝑦 − 𝑦̅)

𝑁

𝑖−1

Problems with covariance: it is difficult to interpret the strength of a relationship because covariance

is sensitive to units.

Correlation gives us a measure of association which is not affected by units.

𝑐𝑜𝑣(𝑥,𝑦)

Sample correlation coefficient: 𝑟 = 𝑠𝑥 𝑠𝑦

, sx and sy are sample standard deviations

𝐶𝑂𝑉(𝑋,𝑌)

Population correlation coefficient: 𝜌 = 𝜎𝑥 𝜎𝑦

, σx and σy are population standard deviations

- r=-1, perfect negative linear relationship

- r=0, no linear relationship

Lecture Five:

A random experiment is a procedure that generates outcomes that are not known with certainty

until observed.

A random variable (RV) is a variable with a value that is determined by the outcome of an

experiment.

A discrete random variable has a countable number (K) of possible outcomes with each having a

specific probability associated with each.

Univariate data has one random variable.

Bivariate data has two random variables.

If X is a random variable with K possible outcomes, then an individual value of X is written as xi,

i=1,2,3…K

The probability of observing X is written as P(X=xi) or p(xi) where

- 0 ≤ 𝑝(𝑥𝑖 ) ≤ 1

- ∑ 𝑝( (𝑥𝑖 ) = 1

- That is, all probabilities must lie between 0 and 1 and all added together equal 1 in total

Expected Value/Mean of a random variable: is the value of x one would expect to get on average

over a large/infinite number of repeated trials: µx = E(X) = ∑ 𝑥𝑖 𝑝(𝑥𝑖 )

Variance of a random variable: is the probability-weighted average of all squared deviations

between each possible outcome with the expected value: σ2 = V(X) = ∑(𝑥𝑖 − µ𝑥 )2 𝑝(𝑥𝑖 ) or

∑ 𝑥𝑖 2 𝑝(𝑥𝑖 ) − µ𝑥 2

Lecture Six:

- E(a) = a V(a) = 0

- E(aX) = aE(X) V(aX) = a2V(X)

- E(a + x) = a + E(X) V(a + X) = V(X)

- E(a + bX) = a + b(X) V(a + bX) = b2V(X)

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Binomial Distribution:

- Each experiment is independent

- There are n trials each with two possible outcomes, success = p or failure = q

𝑛!

The number of successes is calculated by; P(X = x) = 𝑥!(𝑛−𝑥)! 𝑝 𝑥 (1 − 𝑝)(𝑛−𝑥)

Binomial distributions are written as, X~b(n,p), n=number of trials, p=probability of success.

Lecture Seven:

Continuous random variable has an uncountable number of values within an interval of two points

Normal Distribution:

Normal distribution is bell-shaped and symmetrical.

The total area under the curve is equal to 1.

𝑋−𝜇

Any X value can be standardised; 𝑍 = 𝜎

The standard normal distribution has a mean of 0 and a standard deviation and variance of 1.

The z score shows the number of standard deviations the corresponding observation of x lies away

from the population mean.

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Lecture Nine:

If we take repeated samples of size n from a population X and record 𝑥̅ for each, the collection of 𝑥̅

can be represented as a random variable 𝑋̅ with its own distribution. This is called a sampling

distribution.

The mean of the sampling distribution is 𝜇𝑋̅

The standard deviation of the sampling distribution is 𝜎𝑋̅ , known as the standard error.

- The standard error is less than standard deviation: 𝜎𝑋̅ < 𝜎

𝜎2

- 𝑉(𝑋̅) = 𝜎𝑋2̅ = 𝑛

𝜎

- √𝑉(𝑋̅) = 𝜎𝑋̅ =

√𝑛

If repeated samples are taken from X (n>30) the sampling distribution 𝑋̅ will be approximately

normal, the larger n is, the more accurate this approximation is.

Standardising 𝑋̅:

𝑋̅ − 𝜇𝑋̅ 𝑋̅ − 𝜇

𝑍= =

𝜎𝑋̅ 𝜎/√𝑛

Lecture Ten:

- 𝜇 = 𝑛𝑝

- 𝜎 2 = 𝑛𝑝𝑞

Population proportion refers to the number of times a specific outcome X occurs within a

𝑋

population: 𝑝 = 𝑁

Samples proportion refers to the number of times a specific outcome X occurs within a sample:

𝑋

𝜌̂ = 𝑛

𝑋

- 𝜇𝑝̂ = 𝐸 (𝑛 ) = 𝑝

𝑋 𝑝𝑞

- 𝜎𝑝2̂ = 𝑉 ( ) =

𝑛 𝑛

𝑋

We can approximate the distribution of 𝜌̂ = by a normal distribution with a mean p and variance

𝑛

𝑝𝑞 𝑝𝑞

pq/n: 𝑝̂ ≈ 𝑁(𝑝, ) with a standard error, √ 𝑛 if n𝜌̂ and n𝑞̂ are both ≥ 5

𝑛

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Lecture Eleven:

Before a sample is drawn from a population, a sample statistic is a random variable.

Once the sample is drawn, the statistic becomes a constant and is no longer random.

Random variables have probability distributions.

The distribution of a statistic is called a sampling distribution.

Exact sampling distributions will depend upon the distribution of the populations from which the

sample is drawn i.e. a normal population will produce a normal sample distribution, however if the

sample size is >30, we can use the CLT to approximate normality.

̅

Estimator: ĉ = min ∑ni=1(X i − c)2 , ieĉ = X

Estimator: 𝑐̂ = 𝑚𝑖𝑛 ∑|𝑋𝑖 − 𝑐|, 𝑖𝑒 𝑐̂ = 𝑚𝑑 (the median)

Principles of Estimation:

2𝜎

Consistency: Probability of estimator being wrong goes to zero as sample size gets big; 𝑉(𝑋̅) = 𝑛 →

0 𝑎𝑠 𝑛 →∝

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

We can construct point estimators to makes a specific guess of the parameter value or interval

estimators to guess a range of value in which the parameter may lie.

Samples statistics such as mean, median, variance etc. are all examples of point estimates of

population parameters.

Confidence Intervals:

Rather than finding the probability content of a given interval, confidence intervals find the interval

on the basis of sample data with a given probability content. These intervals can then be used to

guess the location of population parameters.

This means in repeated samples of size n, the probability of a randomly chosen interval covering the

true populations mean, μ is p.

The interval is random because 𝑋̅ varies from sample to sample, therefore the interval is random but

μ is not.

Interpretation of the confidence interval is that we are x% confident the interval covers the mean.

𝜎

The confidence interval estimator is 𝑋̅ ± 𝑧𝛼 ( 𝑛)

2 √

- As the level of confidence increases, the z score becomes more extreme and the interval

widens

- As the population standard deviation increases, the standard error increases and the interval

widens

- As the sample size increases, the standard error decreases so the interval narrows

Lecture Twelve:

If we do not know μ or σ2 we should replace σ2 with s2 and build our confidence intervals using t-

𝑋̅−𝜇

values; 𝑡 = 𝑠 using n-1 degrees of freedom

( )

√𝑛

If the table does not give the degrees of freedom you want, approximate, and write a note of why,

explaining the approximation used.

𝑠

Confidence interval estimator: 𝑋̅ ± 𝑡𝛼,𝑑𝑓 ( 𝑛) this can only be used if 𝑋̅~𝑁

2 √

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

- A the level of confidence increases, the t score becomes more extreme so the interval

widens

- As the sample standard deviation increases, the standard error increases so the interval

widens

- As the sample size increases, the interval narrows

Let 𝜃̂be the estimator of a parameter 𝜃 with standard error of 𝜃̂ being 𝑠𝜃̂

A (1-α) 100% confident interval for 𝜃 is [𝜃̂ − 𝑐𝑎 𝑠𝜃̂ , 𝜃̂ + 𝑐1−𝑎 𝑠𝜃̂ ] where 𝑐1−𝑎 cuts off an upper tail

2 2 2

probability of α/2.

𝑝̂ 𝑞̂

To find a confidence interval estimation of the proportion we use; 𝑝̂ ± 𝑧𝛼 (√ 𝑛

)

2

- As the level of confidence increases, the z scores become more extreme, widening the

interval

- As 𝑝̂ and 𝑞̂ approach 0.5, the standard error increases so the interval widens

- As the sample size increases, the standard error decreases, so the interval narrows

𝜎

Margin of error is half the width of an interval estimate, which is equal to 𝑧𝑎 ( )

𝑛 2 √

Any specified maximum allowable margin of error is called the error bound, B.

2

𝜎 2 𝜎

𝐵 = 𝑧𝑎 ( ) → 𝐵2 = 𝑧𝑎/2

2 √𝑛 𝑛

𝑧𝑎 𝜎

2 𝜎2

To find the sample size needed for a specified error bound we calculate; 𝑛 = 𝑧𝑎/2 𝐵2

=( 2

𝐵

)2

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

𝑝̂𝑞̂

For proportions, error bound; 𝐵 = 𝑧𝑎 (√ 𝑛 ) however to find n, we set 𝑝̂ = 0.5 which gives a

2

𝑧𝑎 ̂ ̂

√𝑝𝑞

conservative, wide interval estimate and then solve the equation 𝑛 = ( 2

𝐵

)2

Lecture Thirteen:

- Estimators are statistics that are random variables before the samples is drawn

- We use estimators to guess unknown parameter values

- A point estimator gives a single guess of a parameter

- A confidence interval gives a range of parameter values that are consistent with observed

sample data

The null hypothesis is usually an assertion about a specific value of the parameter and always has the

‘=’ sign.

The null is assumed true unless the evidence in the data supports the notion that it is not true.

The alternative hypothesis is the maintained hypothesis, where true lies if the null is not true.

A Type II error occurs when we do not reject H0 when H0 is false.

Values of Z that are evidence in support of H0 are in the acceptance region. All other values of Z are

in the rejection region.

Level of significance is the probability of the test statistic falling into the rejection region given H0 is

true, it is α.

The values of the test statistic that lie on the boundary between the acceptance and rejection

regions are called critical values.

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Method:

2. Nominate test statistic (z, t etc.) and state its distribution under the null

3. State the decision rule

4. Calculate the test statistic

5. Interpret

Eg. At 5% level of significance, there is sufficient evidence in this sample to reject the null hypothesis

that the average width of paper produced by this machine has not changed.

Lecture 14:

Because of this, we have two options when hypothesis testing:

1. Replace p with 𝑝̂

2. Replace p with 𝑝0

One-tailed tests:

If the hypothesis involves < or > then we have a one-tailed tests, the method is the same, however

the shape of the rejection region is different.

With α=0.05 in a one tailed test in the lower tail, the critical value is −𝑧𝑎 = −𝑧0.05 = −1.645

The p-value is the marginal level of significance, this means any larger level of significance than the

p-value will put the calculated test statistic in the rejection region and any smaller level of

significance than the p-value will put the calculated test statistic in the acceptance region

The p-value gives the probability of observing a statistic as extreme as the observed value of the test

statistic given that the null hypothesis is true.

For z-statistics:

- For one-tailed tests in the upper tail, the p-value is P(Z>z|H0 is true)

- For one-tailed tests in the lower tail, the p-value is P(Z<z|H0 is true)

- For a two-tailed test, if z is positive, the p-value is 2P(Z>z|H0 is true) and if it is negative the

p-value is 2P(Z<-z|H0 is true) this can also be written as the p-value being 2(Z> |z||H0 is true)

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

The more extreme the p-value, the more we believe a hypothesis (null or alternative) therefore a

very small p-value suggest the data is unlikely to have come from the population described in the

null, and a big p-value means the observed statistic is very likely to be observed if the null is true

Power of a test:

Therefore β is the probability of making an incorrect decision when H0 is false.

The correct decision when H0 is false is to reject it, the probability of making the correct decision is

P(Rejecting H0 | H0 is false) = 1 – β

1 – β is called the power of the test, and among tests of a given size, we want the ones with the most

power test as these give the greatest probability of making the correct decision.

There is however a trade-off between size and power. If we drive size to 0 so we never make a Type I

error by always accepting the null, we never reject a false null so our power is 0.

Similarly, we can have a power of unity by always rejecting the null, but then we always incorrectly

reject true nulls so our size goes to unity.

Lecture Fifteen:

Independent samples: two samples with observations which are unrelated to each other

- The samples can have different sizes

- Samples from one population have no effect or influence on the sample from the other

population

- If two random variables are independent, COV[X,Y]=0

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

𝜎2

If 𝑋~𝑁 or n is large, 𝑋~𝑁(𝜇, 𝑛

)

If X1 and X2 are both random variables and independent samples are taken from each population,

the sample means will be independent random variables;

- 𝐸[𝑋̅1 − 𝑋̅2 ] = 𝜇1 − 𝜇2

𝜎 𝜎 2 2

- 𝑉[𝑋̅1 − 𝑋̅2 ] = 𝑛1 + 𝑛2

1 2

2 2

𝜎 𝜎

- (𝑋̅1 − 𝑋̅2 )~𝑁[(𝜇1 − 𝜇2 ), (𝑛1 + 𝑛2 )]

1 2

𝜎 𝜎 2 2

(𝑋̅1 − 𝑋̅2 ) ± 𝑧𝑎/2 √(𝑛1 + 2 ) and we can test hypothesis about(𝜇1 − 𝜇2 ) by using;

𝑛 1 2

(𝑋̅1 −𝑋̅2 )−(𝜇1 −𝜇2 )

𝑧=

𝜎 2𝜎 2

√( 1 + 2 )

𝑛1 𝑛2

𝑝𝑞

If np≥5 and nq≥5, 𝑝̂ ~𝑁(𝐸[𝑝̂ ], 𝑛

)

- 𝐸[𝑝̂1 − 𝑝̂2 ] = 𝑝1 − 𝑝2

𝑝 𝑞 𝑝 𝑞

- 𝑉[𝑝̂1 − 𝑝̂2 ] = 𝑛1 1 + 𝑛2 2

1 2

𝑝1 𝑞1 𝑝2 𝑞2

If 𝑛1 𝑝̂1 ≥ 5 and 𝑛2 𝑝̂2 ≥ 5 then; (𝑝̂1 − 𝑝̂2 )~𝑁[(𝑝1 − 𝑝2 ), ( 𝑛1

+ 𝑛2

)]

Then we can construct interval estimates of (𝑝̂1 − 𝑝̂ 2 ) using; (𝑝̂1 − 𝑝̂2 ) ± 𝑧𝑎/2 √( + ) and

𝑛1 𝑛2

(𝑝̂1 −𝑝̂2 ) −(𝑝1 −𝑝2 )

test hypothesis using 𝑧 = ̂ 𝑞

𝑝

̂ 𝑞

𝑝 ̂

√( 1 1 + 2 2̂ )

𝑛1 𝑛2

Observed frequency is the number of times a particular observation occurred relative to the total

number of observations

Expected frequency is the number of times we would expect that observation to occur if the null

hypothesis were true

For nominal data we use the chi-squared distribution: we sum together the squared differences

between the observed and expected frequencies of k outcomes relative to their expected

(𝑜−𝑒)2

frequencies: 𝜒 2 = ∑ 𝑒

if the observed values are close to expected, the test statistic will be small

With univariate data with k outcomes, the test statistic has degrees of freedom = k-1

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

2

The critical value for hypothesis testing will be 𝜒𝑎,𝑘−1

2

We reject H0 if 𝜒 2 > 𝜒𝑎,𝑘−1

Interpretation: We conclude there is/is not sufficient evidence in this sample to reject the null

hypothesis in favour of the alternative hypothesis that the distribution of variable is different from

the distribution of other variable at the x% level of significance.

We can use the previous method to test whether a univariate distribution first the shape we would

expect if it matched the hypothesised distribution where we have assumed the data came from a

normally distributed population – this is a goodness of fit test

Here the H0: 𝑋~𝑁(𝜇, 𝜎 2 ) and HA: X is not distributed as hypothesised in the null

We continue to use the previous method, however as this is bivariate data, we use degrees of

freedom = (r-1)(c-1) i.e. (the number of rows -1) times (the number of columns – 1) for example with

four rows and three columns we would have df=(4-1)(2-1)=3*2=6

Here, H0 is that the two variables are independent and HA is that the two variables are not

independent

If H0 is true, then P(A)=P(AUB) therefore if the null were true, the probability of one variable would

equal the probability of the union of the two for one situation

This means we can calculate the expected frequencies of the data, then we can calculate 𝜒 2 =

(𝑜−𝑒)2 2

∑ and compare it to the critical value. We reject H0 if 𝜒 2 > 𝜒𝑎,𝑘−1

𝑒

1

Covariance: 𝐶𝑂𝑉(𝑋, 𝑌) = 𝑛−1 ∑(𝑥𝑖 − 𝑥̅ )(𝑦𝑖 − 𝑦̅)

𝐶𝑂𝑉(𝑋,𝑌)

Correlation: 𝑟 = 𝑠𝑋 𝑠𝑌

Correlation can be between, [-1, 1] with -1 being a perfectly linear, negative relationship, 1 being a

perfectly linear, positive relationship and 0 being no linear relationship at all.

This term shows that each observation of Y has a random distribution of values for each given value

of X. If 𝜀 is normally distributed, E[𝜀|x]=0 so 𝑦 = 𝛽0 + 𝛽1 𝑥1

We assume errors (𝜀) are random and normally distributed with a mean of zero, so we can construct

estimates of 𝛽0 and 𝛽1 using sample data. Therefore 𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥1 and the residual, e, is 𝑦 − 𝑦̂

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

We want the residual for each observation to be as small as possible so the estimated line is more

likely to fit the real one so we aim to minimise the sum of the squared residuals, ∑ 𝑒 2 .

To find the values of 𝛽0 and 𝛽1 that will minimise ∑ 𝑒 2 we differentiate ∑ 𝑒 2 with respect to each

coefficient and set the results equal to zero.

𝜕 ∑ 𝑒2

= −2 ∑(𝑦 − 𝛽̂0 − 𝛽̂1 𝑥) = 0 → 𝛽̂0 = 𝑦̅ − 𝛽̂1 𝑥

𝜕𝛽̂0

𝜕 ∑ 𝑒2 ∑ 𝑥𝑦 − 𝑛𝑥̅ 𝑦̅ 𝑆𝑆𝑥𝑦

= −2 ∑ 𝑥(𝑦 − 𝛽̂0 − 𝛽̂1 𝑥) = 0 → 𝛽̂1 = =

𝜕𝛽̂1 ∑ 𝑥 2 − 𝑛𝑥̅ 2 𝑆𝑆𝑥

1

Where 𝑆𝑆𝑥𝑦 = ∑ 𝑥𝑦 − (∑ 𝑥)(∑ 𝑦)

𝑛

OLS estimators 𝛽̂0 and 𝛽̂1 give point estimates of 𝛽0 and 𝛽1 , this method will find the best line of fit

possible for the data.

2. It implies the sum of the residuals is zero

3. It must pass through the means of x and y as these points satisfy the equation 𝛽̂0 = 𝑦̅ − 𝛽̂1 𝑥̅

The estimated intercept coefficient 𝛽̂0 is interpreted as; when the independent variable is equal to 0,

the dependent variable is estimated to be 𝛽̂0 on average.

The estimated slope coefficient 𝛽̂1 is interpreted as; for each increase in 1 of the independent

variable, the dependent variable increases (or decreases) by 𝛽̂1 on average.

∑ 𝑒2 ∑(𝑦−𝑦̂)2

With the assumption 𝜀|𝑥~𝑁(0, 𝜎𝜀2 ) we estimate the value of 𝜎𝜀 as; 𝑠𝜀 = √ =√

𝑛−2 𝑛−2

The better the fit of a regression line, the smaller the scatter of observations around the line and

hence the smaller the standard error of the estimate.

Goodness of fit:

The deviation of y from its mean is comprised of two deviations, the explained and unexplained.

R2 measures the proportion of variation in the dependent variable which can be explained by the

variation in the independent variable.

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Lecture Nineteen:

Regression assumptions:

- A linear relationship exists between x and y in the population and the data can be modelled

as 𝑦𝑖 = 𝛽0 − 𝛽1 𝑥𝑖 + 𝜀𝑖 , 𝑖 = 1,2,3 … 𝑛

- 𝐸[𝜀𝑖 |𝑥𝑖 ] = 0 →𝐸[𝑦𝑖 |𝑥𝑖 ] = 𝛽0 + 𝛽1 𝑥𝑖

- 𝑉[𝜀𝑖 |𝑥𝑖 ] = 𝜎𝜀2

- 𝐶𝑜𝑣[𝜀𝑖 , 𝜀𝑗 ] = 0, 𝑖 ≠ 𝑗

Given the assumptions and CLT, 𝛽̂0 ~𝑁(𝛽0 , 𝜎𝛽̂2 ) and 𝛽̂1 ~𝑁(𝛽1 , 𝜎𝛽̂2 )

0 1

̂0 −𝛽0

𝛽 ̂1 −𝛽1

𝛽

From this, we can see that 𝜎𝛽2 ~𝑁(0,1) and 𝜎𝛽2 ~𝑁(0,1)

̂

0 ̂ 1

1 2

Since 𝜎𝛽̂2 and 𝜎𝛽̂2 are unknown, they can be estimated using 𝑠𝜀2 = 𝑛−2 ∑(𝑦𝑖 − 𝛽̂0 − 𝛽̂1 𝑥𝑖 ) and we

0 1

2

0 1

̂0 −𝛽0

𝛽 ̂1 −𝛽1

𝛽

For this, we then use ~𝑡𝑛−2 and ~𝑡𝑛−2

𝑠𝛽

̂ 𝑠𝛽

̂

0 1

𝜎2

𝜎𝛽̂2 = ∑(𝑥 −𝑥̅

𝜀

)2

- bigger when 𝜎𝜀2 gets bigger i.e. the data gets noisier

1 𝑖

- smaller as n increases

- smaller the greater the variability in x

2

The smaller 𝜎𝛽̂ is, the more precise inferences about 𝛽1 are.

1

Confidence intervals:

2 2

Hypothesis testing:

̂𝑗 −𝛽 ∗

𝛽 𝑗

To test a null of the form H0:βj=βj* against an alternative, the test statistic is with either tn-2 or z

𝑠𝛽

̂

𝑗

The estimated regression gives us 𝑦̂ which is our predicted/estimated value of y, E[y|x]

To do this we substitute the y value wanted in our OLS estimate. It measures the y value on average.

Extrapolation: finding a y value for given x outside of the range. The further from the range, the less

accurate our prediction because while our model may be a good fit for our data, the unknown areas

may not fit as well, or not even be linear.

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

1 (𝑥𝑔 −𝑥̅ )2 1 ∑ 𝑒2

ŷ𝑔 ± 𝑡𝑎,𝑛−2 × 𝑠𝑒(𝑦̂𝑔 ), 𝑠𝑒(𝑦̂𝑔 ) = 𝑠𝜀 √𝑛 + 𝑆𝑆𝑥

, 𝑆𝑆𝑥𝑦 = ∑ 𝑥𝑦 − 𝑛 (∑ 𝑥)(∑ 𝑦), 𝑠𝜀 = √𝑛−2

2

This gives the range of values which would cover the average (dependent variable) with a given

independent variable at an x% level of confidence.

1 (𝑥𝑔 − 𝑥̅ )2

ŷ𝑔 ± 𝑡𝑎,𝑛−2 × 𝑠𝑒(𝑦𝑔 − 𝑦̂𝑔 ), √1 + +

2 𝑛 𝑆𝑆𝑥

This gives the range of values we would expect to cover a single (dependent variable) with a given

independent variable at an x% level of confidence.

Proportionate change is (y1-y2)/y2 with percentage change being this multiplied by 100

From calculus, we can determine that proportionate change can be defined using loge, ln

This means we can use ln(𝑦̂) = 𝛽̂0 + 𝛽̂1 𝑥1 to represent a measure of proportionate change in y for a

unit change in x with 100𝛽̂1 being a measure of percentage change in y for a unit change in x.

Similarly, 𝑦̂ = 𝛽̂0 + 𝛽̂1 ln(𝑥1 ) is a measure of change in y for a proportionate change in x with

𝛽̂1 /100 being a measure of change in y for a percentage change in x.

From this, we can see that ln(𝑦̂) = 𝛽̂0 + 𝛽̂1 ln(𝑥1 ) represents a proportionate change in y for a

proportionate change in x which measures elasticity.

Lecture Twenty-one:

Many relationships theorised may be functions of more than one independent variable so we use

multiple regression as a technique to test these theories.

We can either add another independent variable as having its own effect on the dependent variable,

or we can test the relationship of two independent variables’ effects on the dependent variable.

For hypothesis testing, we take 1 degree of freedom for each independent variable, i.e. in single

variable regression we had n-2 degrees of freedom because there was only one independent

variable. In general, degrees of freedom for slope coefficients in regression analysis is n-k-1 where k

is the number of independent variables.

between two or more independent variables in a regression, then we have multicollinearity then if

the relationship is perfect, OLS estimates cannot be calculated, and if the relationships are

imperfect, inferences based on OLS estimates will be imprecise.

We test for multicollinearity by checking the correlation coefficients between each pair of

independent variables.

lOMoARcPSD|3666572

Semester 1, 2013 LECTURE REVISION NOTES]

Dummy variables are binary values incorporated into a regression as either a separate term, or an

effect on another independent variable.

If z=0 there is no effect on the regression, showing the dummy variable does not affect the

dependent variable, however if it is 1 then 𝛽̂2 will be the extra amount added to the dependent

variable, meaning the intercept of the estimated regression will be (𝛽̂0 + 𝛽̂2 ).

Hypothesis testing:

If we want to know whether the effect of the dummy variable is statistically significant or not at x%

̂2 −𝛽2

𝛽

level of significance, we test H0: β2=0 or not using 𝑡𝑎,𝑛−𝑘−1 with a test statistic of rejecting if

2 𝑠𝛽

̂

2

the test statistic is < -𝑡𝑎,𝑛−𝑘−1 or >𝑡𝑎,𝑛−𝑘−1

2 2

Interpretation: This will conclude that there is/is not sufficient evidence in the sample to suggest

that the dummy variable has an effect on the dependent variable at an x% level of significance.

This measures whether or not the dummy variable has an effect of the slope of a regression. If z=0

there is no effect on the regression, showing the dummy variable does not affect the dependent

variable, however if it is 1 then 𝛽̂2 will be the extra amount added to the dependent variable, per

increase in xi, meaning the slope of the estimated regression will be (𝛽̂1 + 𝛽̂2 )𝑥𝑖 .

We can also test a dummy variable’s effect on slope and intercept by adding the two together, or

test numerous variables by adding more 𝛽̂s.

- sjpdfUploaded byhubik38
- Curso - Six Sigma, MinitabUploaded byStefan Belibou
- EXERCICES.docUploaded byabdoul7
- Final Questions for Icfc Elims[1]Uploaded byAlyza C Arellano
- Common Misconceptions About Data Analysis and StatisticsUploaded bytalita_pollyanna
- hypothesis testingUploaded byapi-339611548
- ReviewForTest3Answers Population MeanUploaded byMohamad El Arab
- Measuring Security Price PerformanceUploaded byeka ryan
- Biology Lab Report 2Uploaded byWaiHongLam
- Test 1 PrepUploaded bySharif M Mizanur Rahman
- 26321-28991teewqf-1-PBUploaded bysigirya
- _Chapter 12 Tests of Goodness of Fit and IndependenceUploaded byKennethInui
- EFFECTIVENESS_OF_THE_TEACHER-MADE_SCIENC.docUploaded byMagno Maria Elena
- Misconceptions of R2Uploaded byapi-3744914
- 366Uploaded byTony
- CasesUploaded byAfshin
- PMBSB Sample Q&AUploaded bysivaramankm
- Aligning Statistical and Scientific ReasoningUploaded bycaejt44
- In Stat GuideUploaded byAdrian Chivu
- Paninimania.pdfUploaded bySantiago Ibarra
- Sample Questions for FinalUploaded byUsman Ahmed
- Econo g ArchUploaded byAjay Jaglan
- Comparing ’Ex-Cathedra’ and IT-Supported Teaching Methods and Techniques Policy of Teaching PracticeUploaded byMightyTy
- 2.MalanguArticle in JFCMedJune2008 About Poisoning in UgandaUploaded byGustav
- 7. Novita SariUploaded byIcha Nuri
- Statistical Inference WebUploaded bygogasingh
- Performance ManagementUploaded byasmitha priyanka vanapalli
- 0 SeminarUploaded byMuslim MD Isa
- term project 2Uploaded byapi-253502872
- Analysis of Prevalence of Malaria and Anemia UsingUploaded byNour Al-Rahman

- A Glossary of HSC Key WordsUploaded byDarth Kingofv
- QM1 Handwritten NotesUploaded bykarl hemming
- Economics Syllabus Dot PointsUploaded byDaniel Mooney
- Pre CommunicatesUploaded bykarl hemming
- Edu 2017 01 p Percents 1d6paeUploaded bykarl hemming
- 2017 Fees by TermUploaded bykarl hemming
- Policy 22 Fees PolicyUploaded bykarl hemming
- rawmarkdatabase v1.01Uploaded bynightweaver066
- Prelim HSC ResultsUploaded bykarl hemming
- From Ideas to Implementation - Syllabus Notes - Daniel WilsonUploaded byFrederik Nielssen
- Space - Syllabus Notes - Daniel Wilson.pdfUploaded byJana

- New Syllabi of Stat 1Uploaded byJeff Jeremiah Perea
- Factors to Consider in Writing ImsUploaded byRhynzel Macadaeg Bahiwag
- 7.LearningCCMM.pdfUploaded byArah Marie U. Viterbo
- 330349348-A-Summary-of-Max-Schulman-doc.txtUploaded byRhan Deguzman
- Short Term Memory in Consecutive Interpreting- PaperUploaded byAumnia Jamal
- A Study on Effect of Perceived Academic Achievement on the Level of Inferiority and Insecurity FeelingUploaded byManoranjan Tripathy
- 1 Units and MeasurementsUploaded byMarian Jesica Ballon Ares
- 2008_8702Uploaded byCamila Diaz
- Bayesian Statistics for Beginners a step-by-step approach.pdfUploaded byGeorge Karamalis
- stach9Uploaded bySunil Vatas
- Gilles Deleuze_ Constantine V. Boundas - The Deleuze Reader-Columbia University Press (1993).pdfUploaded byUn Mar De Copas
- Theoritical Sources for a Theory of Safety and SecurityUploaded bydenmas dipo
- Introduction to RM -2Uploaded byDIO
- Rel Lecture2Uploaded byzcap excel
- deltronic pin-gage-plug-gage-guide.pdfUploaded byfreudmind
- [AMALEAKS.BLOGSPOT.COM] Research in Daily Life 2 (RSCH-121) Week 1-20.docxUploaded byKristine Andrea Ang
- LESSON MANUSCRIPT Criminal InvestigationUploaded byRpsmu RegionEight
- 1-physics1Uploaded byMaria Regina Santos
- EJ1066768.pdfUploaded bychristian magdaraog
- HANDOUT1 VariablesUploaded byRosita C. Falible
- Desi Pramudya Wardani (210617094) Analisis Nilai Sosial dan Moral dalam Novel Hujan Karya Tere Liye.docxUploaded bydesy pramudya15
- PR2_2Uploaded byNenita Ganer
- mba301.pdfUploaded bySwati Mohapatra
- RESEARCH DESIGN.pptxUploaded byUIBS Delhi
- COMPARATIVE STUDY TO ASSESS THE EFFECTIVENESS OF OBJECTIVE STRUCTURED CLINICAL EXAMINATION (OSCE) VERSES TRADITIONAL METHOD IN TERMS OF KNOWLEDGE REGARDING ANTENATAL EXAMINATION AMONG B.SC. NURSING 3RD YEAR STUDENTS.Uploaded byAnonymous HXZige
- Teodor Ognean 1809.0446v1Uploaded byarg0naut
- Ivaylo Dagnev Metaphors and Shells in Anatomy - Bulgarian-English Cross-Linguistic Cognitive StudyUploaded byStefan Stefanov
- (Lord & Novick, 2008). Statistical Theories of Mental Test Scores.pdfUploaded byEileen García
- ISO_Guide_35_2017E-Reference_materials_--_Guidance_for_characterization_and_assessment_of_homogeneity_ (1).pdfUploaded byAndres Felipe Porras Rios
- Quantitative Research DesignsUploaded byElle