This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

### Categories

Editors' Picks Books

Hand-picked favorites from

our editors

our editors

Editors' Picks Audiobooks

Hand-picked favorites from

our editors

our editors

Editors' Picks Comics

Hand-picked favorites from

our editors

our editors

Editors' Picks Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

Dr. Gayatri V Singh, PhD, AMITY University, INDIA

• • • • • •

Time-Series Analysis Probability Probability Distribution Sampling Sampling Distribution Hypothesis Testing

Today’s Highlights

**Time Series Analysis
**

Understand

time-series

forecasts

techniques Understand four possible components Understand how to use regression models for trend analysis Understand nature of autocorrelation

• Time series data are composed of four elements Trend Cyclicality Seasonality irregularity • Trend : Long term general direction of data • Cycles: highs and lows through which data move over time periods usually of more than a year • Seasonal: shorter cycles, which usually occur in time periods of less than one year • Irregularity: rapid changes in the data, which occur in even shorter time frames than seasonal effects Stationary: data that contain no trend, cyclical or seasonal effects

• The response variable ‘Y’, is being forecast • The independent variable ‘X’, the time periods • Linear model Yi = β0 + β1Xti + εi Yi = Data value for period i Xti = ith time period

Linear Regression Trend Analysis

a. Checking for dependence

We will NOT assume that Yt-1 is independent of Yt Example: Is tomorrow’s temperature independent of today’s? Suppose y1 ...yT are the temperatures measured daily for several years. Which of the following two predictors would work better:

**i. the average of the temperatures from the previous year ii. the temperature on the previous day?
**

If the readings are iid N(µ ,σ 2), what would be your prediction for YT+ ? 1

**b. Checking for Independence
**

Independence:

Knowing Yt does not help you in predicting Yt+1

It is not always easy just to look at the data and decide whether a time series is independent. So how can we tell? Plot Yt vs. Yt-1 to check for a relationship or Plot Yt vs. Yt-s for s = 1, 2, …

How do we do this in Minitab? – Use the “lag” command

MTB > lag c2 c3 MTB > lag c3 c4

C1 t 1 2 3 4 5 6

C2 Y(t) 5 8 1 3 9 4

Y

C3 Y(t-1) * 5 8 1 3 9

C4 Y(t-2) * * 5 8 1 3

Y lagged twice Now each row has Y at time t, Y one period ago, and Y two periods ago

Y lagged once

**Each point is a pair ofLevel ) years. e.g. (Level , adjacent
**

12 99 13 90

First, let’s plot Levelt vs. Levelt-1 Corr = .794 Now, let’s plot Levelt vs. Levelt-2 Corr = .531

Moving Averages

• A moving average is an average that is updated or recomputed for every new time period being considered. • The most recent information is utilized in each new moving average. • Shown here are shipments (in millions of dollars) for electric lighting and wiring equipment over a 12-month period. Use these data to compute a 4-month moving average for all available months.

c. Autocorrelation

Time series is about dependence. We use correlation as a measure of dependence. Although we have only one variable, we can compute the correlation between Yt and Yt-1 or between Yt and Yt-2 . The correlations between Y’s at different times are called autocorrelations (serial correlation). However, we must assume that all the Y’s have:

– same mean (no upward or downward trends) – same variances

**We will assume what is known as stationarity. Roughly speaking this means:
**

– The time series varies about a fixed mean and has constant variance – The dependence between successive observations does not change over time

**Let’s define the autocorrelations for a stationary time series.
**

ρs = Var ( Yt ) × Var ( Yt −s ) cov ( Yt ,Yt −s ) = cov ( Yt ,Yt −s ) Var ( Yt )

Note that the autocorrelation does not depend on t because we have assumed stationarity

We estimate the theoretical quantities by using sample averages (as always). The estimated or sample autocorrelations are:

rs =

∑(Y − Y)(Y

t t =s T t t =1

T

t −s

− Y)

∑(Y − Y)

2

c. Autocorrelation

There is a strong dependence between observations spaced close together in time (e.g only one or two years apart). As time passes, the dependence diminishes in strength.

**Autocorrelation Function for ran
**

(with 5% significance limits for the autocorrelations) 1.0 0.8 0.6 Autocorrelation 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0 2 4 6 8 10 12 14 Lag 16 18 20 22 24

In contrast to the ACF for the ‘level’ series, the sample autocorrelations are much smaller.

How do we know if the sample autocorrelations are good estimates of the underlying theoretical autocorrelations?

and

How do we know if we have enough sample information to reach definitive conclusions?

If all the true autocorrelations are 0, then the standard deviation of the sample autocorrelations is about 1/sqrt(T).

1 Std Err ( rs ) = T

T = Total Number of observations or time periods

Probability

• Structure of Probability Experiment : Process that produces outcome Event : an outcome of an experiment Elementary Events : Events that can not be decomposed or broken down into other events Sample Space : a complete roaster or listing of all elementary events of an experiment Mutually Exclusive Events : if the occurrence of one event prevents the occurrence of other event. Independent Events : If occurrence and nonoccurrence of one event does not affect the occurrence of the other.

Unions (∪ ) and Intersections (∩ ) x = {1,4,7,9} and y = {2,3,4,5,6} x ∪ y = {1,2,3,4,5,6,7,9} x ∩ y = {4} Selecting a thing without replacement method: the possibilities are N Cn = N!/n!*(N-n)! Selecting a thing with replacement method : the possibilities are Nn

No. of favourable outcome Probability = Total possible No. of outcome

Probabilities

• Addition Theorem P(X ∪ Y) = P(X) + P(Y) – P(X ∩ Y) What is the probability of getting an ace or red card from a pack of cards? P(X) = 4/52 P(Y) = 26/52 P(X ∩ Y) = 1/52 P(X ∪ Y) = 4/52 +26/52 – 1/52 = 29/52

• Multiplicative Theorem P(X ∩ Y) = P(X).P(Y/X) = P(Y).P(X/Y) X ∩ Y : both the events must happen Since 46% of the labour force is women, P(W) = 0.46. P(T/W) is a probability that a worker is a part-time worker given that the worker is a woman i.e. P(T/W) = 0.25 Probability that labour force are women and work part time = P(W ∩ T) = P(W).P(T/W) = (0.46).(0.25) = 0.115

Probability Distributions

Three days forecast • You have determined that there is 40% probability of rain on each of three days. • What is the probability that it will rain on 0,1, 2 or 3 of the days?

Day 1

Day 2

Day 3

Probability

Days of Rain

p( p( p( p( p( p( p( p(

, , , , , , , ,

, , , , , , , ,

) = 0.216 ) = 0.144 ) = 0.144 ) = 0.096 ) = 0.144 ) = 0.096 ) = 0.096 ) = 0.064

0 1 1 2 1 2 2 3

Probability Distribution

Days of Rain 0 1 2 3 Probability 0.216 0.432 0.288 0.064

DISCRETE PROBABILITY DISTRIBUTION

Scores, x Frequency, f Scores, Probability, P(x) x = f/n 1 2 3 4 5 24 33 42 30 21 1 2 3 4 5 24/150 =0.16 33/150 =0.22 42/150 =0.28 30/150 =0.20 21/150 =0.14

Total (n) = 150

• Mean (µ ) = Σ x P(x) • Variance (σ 2) = Σ (x - µ )2 P(x) = [Σ x2 P(x)] - µ 2

Scores Probability P(x) x 1 2 3 4 5 0.16 0.22 0.28 0.20 0.14 Σ P(x) = 1

x-µ -1.94 -0.94 0.06 1.06 2.06

(x - µ )2 3.764 0.884 0.004 1.124 4.244

**P(x)(x - µ )2 0.602 0.194 0.001 0.225 0.594
**

Σ P(x)(x - µ )2 = 1.616

Mean (µ ) = 2.94 Variance (σ 2) = √ 1.616 =1.27

BINOMIAL DISTRIBUTION

• In a binomial experiment, the probability of exactly x successes in n trials is P(x)= nCr px qn-x n = number of time a trial is repeated p = Probability of success in a single trial q = Probability of failure in a single trial x = number of success in n trials

EXAMPLE

• • • • • • From a standard deck of cards, pick a card Note whether it is club or not and replace it Repeat experiment 5 times, n = 5 S = getting a club, F = getting another suit p = ¼, q = ¾ Random variable x = 0, 1, 2, 3, 4, 5

P(0) = 5C0 p0 q5 = 0.237 P(1) = 5C1 p1 q4 = 0.396 P(2) = 5C2 p2 q3 = 0.188 P(3) = 5C3 p3 q2 = 0.088 P(4) = 5C4 p4 q1 = 0.015 P(5) = 5C5 p5 q0 = 0.001

x 0 1 2 3 4 5

MEAN = np VARIANCE = npq

P(x) 0.237 0.396 0.188 0.088 0.015 0.001

Poisson Distribution

• The probability of exactly x occurrences in an interval is P(x) = µ x e-µ / x! Specific number of success within a given unit of time. Number of accidents per months with an average of 3. what is the probability that in any given month 4 accidents will occur. P(4) = 34 e-3 / 4! = 0.168

• Number of cases of a rare blood disease per 100,000 people • Number of hazardous waste sites per county • Number of printing error in a book with pages 1000 • Number of times a tire blows on a commercial airplane per week

NORMAL DISTRIBUTION

• • • • It is a symmetrical distribution Normal curve is bell shaped Mean, median and mode are equal Total area under the normal curve = 1 • Ratio of

Probability density function (pdf) =

68%

95%

99.7%

68% of the observations fall within 1 standard deviation of the mean, that is, between and 95% of the observations fall within 2 standard deviations of the mean, that is, between and 99.7% of the observations fall within 3 standard deviations of the mean, that is, between and .

The distribution of heights of women aged 18 to 24 is approximately normally distributed with mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that 68% of these women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches, 95% of these women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 61 and 71 inches. Again, you can try this out with the example below.

Example

Sampling

• Reason for sampling

The sample can save money The sample can save time For given resources, the sample can broaden the scope of the study Because the research process is sometimes destructive, the sample an save product. If accessing the population is impossible, the sample is the only option

**Steps in Sampling process
**

• • • • • Define the population Frame the population Chose a sample design Draw the sample Execute the research

Designs

• A probability sample is one in which each element of the population has a known non-zero probability of selection. • Not a probability sample of some elements of population cannot be selected (have zero probability) • Not a probability sample if probabilities of selection are not known.

**Simple Random Sampling • Each element in the population
**

has an equal probability of selection AND each combination of elements has an equal probability of selection • Names drawn out of a hat • Random numbers to select elements from an ordered list

47

Stratified Sampling

• Divide population into groups that differ in important ways • Basis for grouping must be known before sampling • Select random sample from within each group • For a given sample size, reduces error compared to simple random sampling IF the groups are different from each other • Tradeoff between the cost of doing the stratification and smaller sample size needed for same error • Probabilities of selection may be different for different groups, as long as they are known

•

**Systematic Random Sampling Each element has an equal
**

probability of selection, but combinations of elements have different probabilities. Population size N, desired sample size n, sampling interval k=N/n. Randomly select a number j between 1 and k, sample element j and then every kth element thereafter, j+k, j+2k, etc. Example: N=60, n=15, k=60/15=4. Random number j=3.

• •

• •

Cluster Sampling

• This is a form of random sampling • Population is divided into groups, usually geographic or organizational (heterogeneous) • Some of the groups are randomly chosen • Population is divided into groups • Some of the groups are randomly selected • For given sample size, a cluster sample has more error than a simple random sample • Cost savings of clustering may permit larger sample • Error is smaller if the clusters are similar to each other

Difference Between Cluster and Stratified Sampling

Population of L strata, stratum l contains nl units

Population of C clusters

Take simple random sample in every stratum

Take srs of clusters, sample every unit in51 chosen clusters

•

of the researcher • Quota samples are based on selecting objects until you have a certain number (the quota) of each type

– Appeals to idea of a “representative” sample – Can produce substantial bias – Still widely used (especially for telephone surveys with high non-response levels)

Non-Probability Sampling Methods Judgment sample are chosen by the judgment of the

**• Convenience samples are obtained by choosing the easiest objects available
**

– E.g. the first ten people to walk out of a store

52

**Sampling Distributions about some Typically, we are interested in learning
**

numerical feature of the population, such as • the proportion possessing a stated characteristic; • the mean and the standard deviation. A numerical feature of a population is called a parameter. A statistic is a numerical valued function of the sample observations. Sample mean is an example of a statistic.

**The sampling distribution of a statistic
**

Three important points about a statistic: • the numerical value of a statistic cannot be expected to give us the exact value of the parameter; • the observed value of a statistic depends on the particular sample that happens to be selected; • there will be some variability in the values of a statistic over different occasions of sampling. Because any statistic varies from sample to sample, it is a random variable and has its own probability distribution. The probability distribution of a statistic is called its sampling distribution.

•

**population • Random variable, X, is Age (years) of the individuals of individuals • Values of X: 18, 20, 22, 24
**

• If you want to select two person from this population, what are the possible samples?

Sampling Suppose there is a Distributions

Summary Measure N

µ = i =1

∑Xi N

P(X)

.3 .2

Population Distribution

=

18 + 20 + 22 + 24 = 21 4

.1 0

σ=

i =1

∑( X i − µ ) N

N

2

A (18)

B (20)

C (22)

D (24)

X

= 2.236

Uniform Distribution

1st Obs

**All Possible Samples of Size n =2
**

2nd Observation 18 20 22 24

18 18,18 18,20 18,22 18,24 20 20,18 20,20 20,22 20,24 22 22,18 22,20 22,22 22,24 24 24,18 24,20 24,22 24,24

1st 2nd Observation O s 18 20 22 24 b

16 Sample Means

18 18 19 20 21

16 Samples

Samples Taken with Replacement

20 19 20 21 22 22 20 21 22 23 24 21 22 23 24

**Sampling Distribution of All Sample Means 16 Sample Means
**

1st 2nd Observation O s 18 20 22 24 b

P(X) .3 .2 .1 0

Sample Means Distribution

18 18 19 20 21 20 19 20 21 22 22 20 21 22 23 24 21 22 23 24

# in sample = 2,

18 19

20 21 22 23

24

X

_

# in Sampling Distribution = 16

Statistical inference about the population mean is of prime practical importance. Inferences about this parameter are based on the sample mean and its sampling distribution.

Distribution of the sample mean

**Comparing the Population with its Sampling Distribution
**

Population

µ

P(X) .3 .2 .1 0

= 21, σ 2.236

=

.2 .1 X (24) 0

µx

P(X) .3

**Sample Means Distribution = 21n =σ2 = 1.58
**

x

A (18)

B (20)

C

D (22)

18 19

20 21 22 23

24

_

X

Hypothesis Testing

• Make statement(s) regarding unknown population parameter values based on sample data • Elements of a hypothesis test:

– Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality) – Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality) – Test statistic - Quantity based on sample data and null hypothesis used to test between null and alternative hypotheses – Rejection region - Values of the test statistic for which we reject the null in favor of the alternative hypothesis

**The Null Hypothesis, H0
**

• States the Assumption (numerical) to be tested

e.g. The grade point average of juniors is 3.0 (H0: µ = 3.0)

• Begin with the assumption that the null hypothesis is TRUE. •Always contains the ‘ = ‘ sign

The Alternative Hypothesis, H•1 Is the opposite of the null hypothesis

e.g. The grade point average of juniors is not equal to or less than 3.0 (H1: µ ≠ 3.0) or (H1: µ < 3.0)

• Never contains the ‘=‘ sign • The Alternative Hypothesis may or may not be accepted • Is generally the hypothesis that is believed to be true by the researcher

Hypothesis Testing

Test Result – True State H0 True H0 False H0 True Correct Decision Type II Error H0 False Type I Error Correct Decision

α = P (Type I Error ) β = P(Type II Error )

Level of Significance, α

• Defines Unlikely Values of Sample Statistic if Null Hypothesis Is True

–

Called Rejection Region of Sampling

**Distribution • Designated α (alpha)
**

–

Typical values are 0.01, 0.05, 0.10

• Selected by the Researcher at the Start • Provides the Critical Value(s) of the Test

• • • • • • • •

**Steps of Hypothesis Identify data testing
**

Formulation of Hypothesis Determine Degrees of freedom Specify level of significance Calculate test statistics Find Table Value of test statistic Result Interpretation

If n ≥ 30 then it is a large sample and you have to apply Z test If n < 30 then it is a small sample and you have to apply t test

X − μ t = S n

( X − X 2 ) − ( µ1 − µ 2 ) t= 1 s X1 − X 2

t = S d

d n

s X1 − X 2 =

s s + n n

2 1

2 2

**Sample Hypothesized Population Data
**

Parameter

Sample Variance

Estimated Standard Error

t-statistic

One sample t-statistic

X

µ

S S s = d f

2

s2 n s2 n

s2 p n1 s2 p n2

X−µ t= s x D − µD t= sD

Paired samples tstatistic

D

µD

SS s = D df

2

Independent samples X X −2 1 t-statistic

µµ −2 1

S +S S S2 s= 1 d +f2 f1 d

2 p

+

(1 X (1 µ X 2− − ) −) µ 2 t = s− xx 1 2

**Steps for Calculating a Test Statistic
**

One-Sample T 1. Calculate sample mean 2. Calculate standard error 3. Calculate T and d.f. 4. Use Table D

**Steps for Calculating a Test Statistic
**

Independent Samples T 1. 2. Calculate X1-X2

**S +S S S2 2 s= 1 Calculate pooled variance p d +f2 f1 d
**

Calculate standard error Calculate T and d.f. Use Table E.6

( − ) (1 µ XX µ 2 − −) t 1 2 = s− xx 1 2

s2 p n1

+

s2 p n2

3. 4. 5.

d.f. = (n1 - 1) + (n2 - 1)

Illustration

A developmental psychologist would like to examine the difference in verbal skills for 8-year-old boys versus 8year-old girls. A sample of 10 boys and 10 girls is obtained, and each child is given a standardized verbal abilities test. The data for this experiment are as follows:

Girls

n1 = 10 X = 37

1

Boys

n2 = 10 = 31 SS2 = 210

X2

SS1 = 150

Illustration

Girls n1 = 10 X = 37

1

Boys n2 = 10 X = 31

2

SS1 = 150

SS2 = 210

STEP 1: get mean difference

X1 − X 2 = 6

Girls n1 = 10 X = 37

1

Boys n2 = 10 X = 31

2

Illustration

SS1 = 150

SS2 = 210

STEP 2: Compute Pooled Variance

SS 1 2 SS 5 1 + 00 3 + 6 0 1 2 s = = == 2 0 d 2 (−1) 1 f+ 1) 0 8 d 0 +1 1 − ( 1f

2 p

Illustration

Girls n1 = 10 X = 37

1

Boys n2 = 10 X = 31

2

SS1 = 150

SS2 = 210

STEP 3: Compute Standard Error

s s 22 00 S E = += + = = 42 nn 1 1 00 1 2

2 p

2 p

Illustration

Girls n1 = 10 X = 37

1

Boys n2 = 10 X = 31

2

SS1 = 150

SS2 = 210

STEP 4: Compute T statistic and df

(−− µ( −− X2 (− 310 ) 1 2 3 1 Xµ ) 7 ) t = = = 3 s2 2 x − x 1

d.f. = (n1 - 1) + (n2 - 1) = (10-1) + (10-1) = 18

Illustration

STEP 5: Use table E.6 T = 3 with 18 degrees of freedom For alpha = .01, critical value of t is 2.552 T calculated > t tabulated Our T is more extreme, so we reject the null There is a significant difference between boys and girls

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd