# Statistics Formulae

08/06/2013

# BATCH: 2009 - 2011

STATISTICS FORMULAE

PREPARED BY: KEYUR SAVALIA, DIPA SHAH, KRISHNA RAJPUT, NIKITA SANGHVI, MITESH SHAH

Index

1. Grouping and Displaying Data to Convey Meaning: Tables and Graphs
2. Measures of Central Tendency and Dispersion in Frequency Distributions
3. Probability 1: Introduction and Ideas
4. Probability Distribution
5. Sampling and Sampling Distributions
6. Estimation
7. Testing Hypotheses: One-Sample Tests
8. Testing Hypotheses: Two-Sample Tests
9. Chi-Square and Analysis of Variance
10. Simple Regression and Correlation
11. Index Numbers


Chapter – 1

Grouping and displaying data to convey Meaning: Tables and Graphs


Width of the class intervals:

$$\text{Width} = \frac{\text{next unit value after largest value in data} - \text{smallest value in data}}{\text{total number of class intervals}}$$

Note: 1) To arrange raw data, decide the number of classes into which you will divide the data. 2) The total number of class intervals is normally between 6 and 15.
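The class-width rule above can be sketched in a few lines; the data and the choice of five classes here are hypothetical.

```python
# Sketch of the class-width rule (hypothetical data).
data = [12, 47, 33, 25, 18, 41, 29, 36, 22, 15]

num_classes = 5                   # chosen by the analyst (6-15 is typical)
next_unit = max(data) + 1         # next unit value after the largest value
width = (next_unit - min(data)) / num_classes

print(width)  # (48 - 12) / 5 = 7.2
```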


Chapter 2

Measure of Central Tendency and Dispersion in Frequency Distribution


STEVENS BUSINESS SCHOOL

2.1 Population Arithmetic Mean

$$\mu = \frac{\sum x}{N}$$

∑x = sum of the values of all the elements in the population
N = number of elements in the population

2.2 Sample Arithmetic Mean

$$\bar{x} = \frac{\sum x}{n}$$

∑x = sum of the values of all the elements in the sample
n = number of elements in the sample

2.3 Sample Arithmetic Mean of Grouped Data

$$\bar{x} = \frac{\sum (f \times x)}{n}$$

∑(f × x): calculate the midpoint (x) of each class in the sample, multiply each midpoint by the frequency (f) of observations in the class, and sum (∑) all these products.
n = total number of observations in the sample = ∑f
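A minimal sketch of the grouped-data mean, using hypothetical class midpoints and frequencies:

```python
# Grouped-data mean: class midpoints weighted by frequencies (hypothetical table).
midpoints = [5, 15, 25, 35]   # x: class midpoints
freqs     = [2, 5, 8, 5]      # f: class frequencies

n = sum(freqs)                                        # n = sum of f
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
print(mean)  # 460 / 20 = 23.0
```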


2.4 Weighted Average Mean

$$\bar{x}_w = \frac{\sum (w \times x)}{\sum w}$$

∑(w × x): multiply the weight (w) of each element by the value (x) of that element and sum (∑) all these products.
∑w = sum of all the weights

2.5 Sample Arithmetic Mean of Grouped Data Using Codes

$$\bar{x} = x_0 + c \cdot \frac{\sum (u \times f)}{n}$$

x̄ = mean of the sample
x0 = value of the midpoint assigned the code 0 (often the midpoint of the class with the highest frequency)
c = numerical width of the class interval
u = code assigned to each class
f = frequency (number of observations) in each class
n = total number of observations in the sample

2.6 Geometric Mean

$$GM = \sqrt[n]{\text{product of all } x \text{ values}}$$

2.7 Median

$$\text{Median} = \left(\frac{n+1}{2}\right)\text{th item in the data array}$$

n = number of items in the data array


2.8 Sample Median of Grouped Data

$$\tilde{m} = \left(\frac{\frac{n+1}{2} - (F+1)}{f_m}\right) w + L_m$$

m̃ = sample median
n = total number of items in the distribution
F = sum of all class frequencies up to, but not including, the median class
fm = frequency of the median class
w = class-interval width
Lm = lower limit of the median-class interval
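A sketch of the grouped-data median rule above, on a hypothetical frequency table:

```python
# Grouped-data median (hypothetical frequency table).
lowers = [0, 10, 20, 30]   # lower class limits
freqs  = [2, 5, 8, 5]      # class frequencies
w = 10                     # class-interval width

n = sum(freqs)
pos = (n + 1) / 2          # position of the median item
cum = 0
for i, f in enumerate(freqs):
    if cum + f >= pos:     # median class found
        F, fm, Lm = cum, f, lowers[i]
        break
    cum += f

median = (pos - (F + 1)) / fm * w + Lm
print(median)  # (10.5 - 8) / 8 * 10 + 20 = 23.125
```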

2.9 Mode

$$Mo = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w$$

LMo = lower limit of the modal class
d1 = frequency of the modal class minus the frequency of the class directly below it
d2 = frequency of the modal class minus the frequency of the class directly above it
w = width of the modal-class interval

2.10 Range

Range = value of highest observation − value of lowest observation

2.11 Interquartile Range

Interquartile range = Q3 − Q1
Q1 = value of first quartile = P25
Q3 = value of third quartile = P75


2.12 Population Variance

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{\sum x^2}{N} - \mu^2$$

σ² = population variance
x = item or observation
μ = population mean
N = total number of items in the population

2.13 Population Standard Deviation

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x-\mu)^2}{N}} = \sqrt{\frac{\sum x^2}{N} - \mu^2}$$

x = observation
μ = population mean
N = total number of elements in the population
∑ = sum of all the values (x − μ)², or all the values x²
σ = population standard deviation
σ² = population variance
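Both forms of the population variance give the same answer; a sketch with hypothetical data:

```python
import math

# Population variance and standard deviation, both forms (hypothetical data).
x = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(x)
mu = sum(x) / N

var_def   = sum((xi - mu) ** 2 for xi in x) / N      # Σ(x-μ)²/N
var_short = sum(xi ** 2 for xi in x) / N - mu ** 2   # Σx²/N − μ²
sigma = math.sqrt(var_def)

print(var_def, var_short, sigma)  # 4.0 4.0 2.0
```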

2.14 Population Standard Score

$$z = \frac{x - \mu}{\sigma}$$

x = observation from the population
μ = population mean
σ = population standard deviation

2.15 Sample Variance

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{\sum x^2}{n-1} - \frac{n\bar{x}^2}{n-1}$$


2.16 Sample Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} = \sqrt{\frac{\sum x^2}{n-1} - \frac{n\bar{x}^2}{n-1}}$$

s² = sample variance
s = sample standard deviation
x = value of each of the n observations
x̄ = mean of the sample
n − 1 = number of observations in the sample minus 1

2.17 Sample Standard Score

$$z = \frac{x - \bar{x}}{s}$$

x = observation from the sample
x̄ = sample mean
s = sample standard deviation

2.18 Population Coefficient of Variation

$$\text{Coefficient of variation} = \frac{\sigma}{\mu} \times 100\%$$

σ = standard deviation of the population
μ = mean of the population


Chapter – 3

Probability 1 Introduction and Ideas


3.1 Probability of an Event

$$P(\text{event}) = \frac{\text{number of outcomes where the event occurs}}{\text{total number of possible outcomes}}$$

This is the classical definition of the probability that an event will occur.
P(A) = probability of event A happening.
A single probability refers to the probability of one particular event occurring; it is called a marginal probability.
P(A or B) = probability of either A or B happening. This notation represents the probability that one event or the other will occur.

3.2 P(A or B) = P(A) + P(B)

The probability of either A or B happening, when A and B are mutually exclusive, equals the sum of the probability of event A happening and the probability of event B happening. This is the addition rule for mutually exclusive events.

3.3 P(A or B) = P(A) + P(B) − P(AB)

The addition rule for events that are not mutually exclusive: the probability of A or B happening equals the probability of event A happening plus the probability of event B happening minus the probability of A and B happening together, P(AB).

3.4 P(AB) = P(A) × P(B)

P(AB) = joint probability of events A and B occurring together or in succession
P(A) = marginal probability of event A happening
P(B) = marginal probability of event B happening
The joint probability of two or more independent events occurring together or in succession is the product of their marginal probabilities.

3.5 P(B│A) = P(B)

For statistically independent events, the conditional probability of event B, given that event A has occurred, is simply the probability of event B. Independent events are those whose probabilities are in no way affected by the occurrence of each other.

3.6 Conditional Probability for Dependent Events

$$P(B \mid A) = \frac{P(BA)}{P(A)} \qquad \text{and} \qquad P(A \mid B) = \frac{P(AB)}{P(B)}$$

For statistically dependent events, the conditional probability of event B, given that event A has occurred, is equal to the joint probability of events A and B divided by the marginal probability of event A.

3.7 LAW OF MULTIPLICATION

P(AB) = P(A│B) × P(B)
P(BA) = P(B│A) × P(A)

For statistically dependent events, the joint probability of events A and B happening together or in succession is equal to the probability of event A, given that event B has already happened, multiplied by the probability that event B will occur.


Chapter – 4

Probability Distribution


4.1 Binomial Formula

$$P(r \text{ successes in } n \text{ trials}) = \frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$$

r = number of successes desired {r = 0, 1, 2, …, n}
n = number of trials undertaken
p = probability of success {0 ≤ p ≤ 1}
q = probability of failure = 1 − p
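The binomial formula translates directly to code; the numbers below are hypothetical.

```python
from math import comb

# Binomial probability P(r) = nCr * p^r * q^(n-r).
def binomial_pmf(r, n, p):
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

# Probability of exactly 2 successes in 4 trials with p = 0.5:
print(binomial_pmf(2, 4, 0.5))  # 6 * 0.25 * 0.25 = 0.375
```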

4.2 Mean of a Binomial Distribution

$$\mu = np$$

n = number of trials
p = probability of success

4.3 Standard Deviation of a Binomial Distribution

$$\sigma = \sqrt{npq}$$

n = number of trials
p = probability of success
q = probability of failure = 1 − p

4.4 Calculating Poisson Probability

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{if } x = 0, 1, 2, \ldots;\ \lambda \ge 0$$

Otherwise P(x) = 0.

P(x) = probability of exactly x occurrences
λ^x = lambda (the mean number of occurrences per interval of time) raised to the x power
e^(−λ) = e (≈ 2.71828) raised to the negative-lambda power
x! = x factorial
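A sketch of the Poisson probability formula, with a hypothetical rate:

```python
from math import exp, factorial

# Poisson probability P(x) = λ^x e^{-λ} / x!.
def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# e.g. mean of 3 occurrences per interval, probability of exactly 2:
p = poisson_pmf(2, 3.0)
print(round(p, 4))  # ≈ 0.224
```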


4.5 Poisson Distribution as an Approximation of the Binomial

$$P(x) = \frac{(np)^x e^{-np}}{x!}$$

n = number of trials
p = probability of success
x! = x factorial

The rule most often used by statisticians is that the Poisson is a good approximation of the binomial when n is greater than or equal to 20 and p is less than or equal to 0.05.

4.6 Standardizing a Normal Random Variable

$$z = \frac{x - \mu}{\sigma}$$

x = value of the random variable with which we are concerned
z = number of standard deviations from x to the mean of this distribution
μ = mean of the distribution of this random variable
σ = standard deviation of this distribution

Note: the normal distribution can be used as an approximation of the binomial distribution when np > 5 and nq > 5, where
n = number of trials
p = probability of success
q = probability of failure = 1 − p
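The standardization step can be sketched as follows (distribution parameters are hypothetical):

```python
# Standardizing a value: z counts how many standard deviations x lies from μ.
mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation
x = 130.0
z = (x - mu) / sigma
print(z)  # (130 - 100) / 15 = 2.0
```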


Chapter – 5 Sampling and Sampling Distribution


5.1 Mean of the Sampling Distribution of the Mean

The sampling distribution of the mean has a mean equal to the population mean:

$$\mu_{\bar{x}} = \mu$$
5.2 Standard Error of the Mean for Infinite Populations

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

σx̄ = standard error of the mean
σ = population standard deviation
n = sample size

This equation states that the sampling distribution has a standard deviation equal to the population standard deviation divided by the square root of the sample size.

5.3 Standardizing the Sample Mean

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

x̄ = sample mean
μ = population mean
σx̄ = standard error of the mean = σ/√n


5.4 Standard Error of the Mean for Finite Populations

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$$

N = size of the population
n = size of the sample

$\sqrt{\frac{N-n}{N-1}}$ is the finite population multiplier.

Note: use the above equation for calculating the standard error of the mean only when n/N > 0.05.
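The finite population multiplier shrinks the standard error when the sample is a sizable fraction of the population; a sketch with hypothetical sizes:

```python
import math

# Standard error of the mean with the finite population multiplier.
sigma, N, n = 12.0, 500, 100   # hypothetical population sd and sizes

fpm = math.sqrt((N - n) / (N - 1))    # finite population multiplier
se = sigma / math.sqrt(n) * fpm

print(round(fpm, 4), round(se, 4))
```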


Chapter – 6 Estimation


6.1 Estimate of the Population Standard Deviation

$$\hat{\sigma} = s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$$

Note: this formula indicates that the sample standard deviation can be used to estimate the population standard deviation. Use it when the population standard deviation is unknown.

6.2 Estimated Standard Error of the Mean of a Finite Population

$$\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}} \times \sqrt{\frac{N-n}{N-1}}$$

ˆ (hat) = symbol that indicates an estimated value
σ̂ = estimate of the population standard deviation

Note: use the above formula only when n/N > 0.05.

6.3 Mean of the Sampling Distribution of the Proportion

$$\mu_{\bar{p}} = p$$

p̄ = proportion of successes in the sample
p = proportion of successes in the population


6.4 Standard Error of the Proportion

$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}$$

σp̄ = standard error of the proportion
p = proportion of successes in the population
q = proportion of failures in the population
n = sample size

6.5 Estimated Standard Error of the Proportion

$$\hat{\sigma}_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$$

σ̂p̄ = estimated standard error of the proportion
p̄ = proportion of successes in the sample
q̄ = proportion of failures in the sample
n = sample size

Note: when using the t-distribution, degrees of freedom = n − 1, where n = sample size.

6.6 Estimated Standard Error of the Mean of an Infinite Population

$$\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}}$$


Chapter – 7

Testing Hypotheses: One-Sample Tests


Conditions for Using the Normal and t-Distributions in Testing Hypotheses about Means

| Sample size | When the population standard deviation is known | When the population standard deviation is unknown |
|---|---|---|
| n > 30 | Normal distribution, normal-area table | Normal distribution, normal-area table |
| n ≤ 30 | Normal distribution, normal-area table | t-distribution, t-table |

Hypothesis testing of means:

When the population standard deviation is known,

$$z = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

When the population standard deviation is unknown,

$$t = \frac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}}, \qquad \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$

σx̄ = standard error of the mean
σ = standard deviation of the population
s = standard deviation of the sample
n = sample size
x̄ = sample mean
μH0 = hypothesized population mean


Standard error of the proportion:

$$\sigma_{\bar{p}} = \sqrt{\frac{p_{H_0} \times q_{H_0}}{n}}$$

σp̄ = standard error of the proportion
pH0 = hypothesized value of the population proportion of successes
qH0 = 1 − pH0 = hypothesized value of the population proportion of failures
p̄ = sample proportion of successes
q̄ = 1 − p̄ = sample proportion of failures
n = sample size

$$z = \frac{\bar{p} - p_{H_0}}{\sigma_{\bar{p}}}$$

Note: use the above formulas when np > 5 and nq > 5.
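A one-sample z-test for a proportion, using hypothetical numbers:

```python
import math

# One-sample z-test for a proportion.
# H0: p = 0.5; a hypothetical sample of n = 100 with 60 successes.
p_h0, n, successes = 0.5, 100, 60

p_bar = successes / n
se = math.sqrt(p_h0 * (1 - p_h0) / n)   # standard error under H0
z = (p_bar - p_h0) / se

print(round(z, 2))  # (0.6 - 0.5) / 0.05 = 2.0
```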


Chapter – 8

Testing Hypotheses: Two – Sample Tests


8.1 Standard Error of the Difference Between Two Means (n > 30)

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

σ1² = variance of population 1
σ2² = variance of population 2
n1 = size of sample 1
n2 = size of sample 2

8.2 Estimated Standard Error of the Difference Between Two Means (n > 30)

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_2}}$$

σ̂1² = estimated variance of population 1
σ̂2² = estimated variance of population 2


8.2.1 Z-Score Value (n > 30)

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}}$$

z = z-score
x̄1 = mean of sample 1
x̄2 = mean of sample 2
(μ1 − μ2)H0 = hypothesized difference between the two population means
σ̂x̄1−x̄2 = estimated standard error of the difference between the two means

Note: only use the above three equations when n > 30, i.e., for large samples.

8.3 Pooled Estimate of σ²

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

sp² = pooled estimate of σ²
n1 = size of sample 1
n2 = size of sample 2
s1 = standard deviation of sample 1
s2 = standard deviation of sample 2

Note: here n1 and n2 < 30.
Note: degrees of freedom for the t-distribution = n1 + n2 − 2.
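The pooled estimate feeds the small-sample t statistic of section 8.4; a sketch with hypothetical samples, testing H0: μ1 − μ2 = 0:

```python
import math

# Pooled two-sample t statistic (hypothetical small samples).
x1 = [21.0, 24.0, 23.0, 25.0, 22.0]
x2 = [19.0, 20.0, 22.0, 21.0]

def mean(v):
    return sum(v) / len(v)

def svar(v):                           # sample variance, n-1 denominator
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

n1, n2 = len(x1), len(x2)
sp2 = ((n1 - 1) * svar(x1) + (n2 - 1) * svar(x2)) / (n1 + n2 - 2)
se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
t = (mean(x1) - mean(x2)) / se         # df = n1 + n2 - 2 = 7

print(round(t, 3))
```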


8.4 Estimated Standard Error of the Difference Between Two Sample Means with Small Samples and Equal Population Variances

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where sp² = pooled estimate of σ².

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}}$$

x̄1 = mean of sample 1
x̄2 = mean of sample 2
(μ1 − μ2)H0 = hypothesized value of the difference between the two population means

8.5 Standard Error of the Difference Between Two Proportions

$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$

8.6 Estimated Standard Error of the Difference Between Two Proportions

$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}$$


8.7 Estimated Overall Proportion of Successes in Two Populations

$$\hat{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2}$$

8.8 Estimated Standard Error of the Difference Between Two Proportions Using Combined Estimates from Both Samples

$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}}$$

$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}}$$

p̂ = estimate of the overall proportion of successes in the populations, using combined proportions from both samples
n1 = size of sample 1
n2 = size of sample 2
p̄1 = sample proportion of successes in sample 1
p̄2 = sample proportion of successes in sample 2
q̂ = 1 − p̂
σ̂p̄1−p̄2 = estimated standard error of the difference between the two proportions using combined estimates from both samples


Chapter – 9 CHI-SQUARE AND ANALYSIS OF VARIANCE


9.1 Chi-Square as a Test of Independence

9.1.1 Chi-Square Statistic

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$

f0 = an observed frequency
fe = an expected frequency
χ² = chi-square

9.1.2 Number of Degrees of Freedom

No. of degrees of freedom = (no. of rows − 1) × (no. of columns − 1)

9.1.3 Expected Frequency for Any Cell

$$f_e = \frac{RT \times CT}{n}$$

RT = row total for the row containing that cell
CT = column total for the column containing that cell
n = total number of observations (grand total of frequencies)
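The expected-frequency rule and the χ² statistic combine as follows; the 2×2 counts are hypothetical.

```python
# Chi-square test of independence on a small 2x2 table (hypothetical counts).
table = [[10, 20],
         [20, 10]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency RT*CT/n
        chi2 += (fo - fe) ** 2 / fe

df = (len(table) - 1) * (len(table[0]) - 1)      # (rows-1)(cols-1)
print(chi2, df)
```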

9.1.4 Using the F-Distribution: Degrees of Freedom

No. of degrees of freedom in the numerator of the F ratio = (no. of samples − 1)
No. of degrees of freedom in the denominator of the F ratio = ∑(nj − 1) = nT − k

nj = size of the jth sample
nT = total sample size
k = no. of samples

The F-table is given in Appendix Table 6(B) of the book Statistics for Management.


9.1.5 F Ratio

$$F = \frac{(s_1)^2}{(s_2)^2}$$

where s² = variance of a sample

9.1.6a Upper-Tail Value of F for a Two-Tailed Test

$$F(n, d, \alpha) = F_\alpha(n, d)$$

n = numerator degrees of freedom
d = denominator degrees of freedom
α = area in the upper tail

9.1.6b Lower-Tail Value of F for a Two-Tailed Test

$$F(n, d, 1-\alpha) = F_{1-\alpha}(n, d) = \frac{1}{F_\alpha(d, n)}$$


9.2 ANALYSIS OF VARIANCE

9.2.1 ANOVA TABLE FOR ONE-WAY CLASSIFIED DATA

| Source of variation | Degrees of freedom (df) | Sum of squares (SS) | Mean squares (MS) |
|---|---|---|---|
| Between the levels of the factor (treatments) | k − 1 | TrSS: $Q_T = \sum_i \frac{T_i^2}{n_i} - \frac{G^2}{N}$ | $M_T = \frac{Q_T}{k-1}$ (TrSS/df) |
| Within the levels of the factor (errors) | N − k | ESS: $Q_E = \mathrm{TSS} - \mathrm{TrSS}$ (by subtraction) | $M_E = \frac{Q_E}{N-k}$ (ESS/df) |
| Total | N − 1 | TSS: $Q = \sum_{ij} y_{ij}^2 - \frac{G^2}{N}$ | − |

$$F = \frac{\text{greater variance}}{\text{smaller variance}} = \frac{\mathrm{TrSS}/df}{\mathrm{ESS}/df} = F_{k-1,\,N-k}$$


Working Rule for an Example

We have to consider three quantities G, N and the correction factor (CF), defined as follows:

- G = sum of all the values for all the treatments
- N = sum of the number of times each treatment is applied
- CF = G²/N

E.g. suppose there are 3 treatments A, B, and C. Suppose the number of times each treatment is applied is n1 in the case of A, n2 in the case of B, and n3 in the case of C, and the sums of the values of the three treatments are denoted by T1, T2, and T3.

| Treatment | | | | |
|---|---|---|---|---|
| A | 1 | 2 | 3 | 4 |
| B | 2 | 5 | 6 | 7 |
| C | 9 | 2 | 3 | 5 |

- n1 = 4; T1 = 1 + 2 + 3 + 4
- n2 = 4; T2 = 2 + 5 + 6 + 7
- n3 = 4; T3 = 9 + 2 + 3 + 5
- N = n1 + n2 + n3

1. G = T1 + T2 + T3
2. CF = G²/N = (T1 + T2 + T3)²/(n1 + n2 + n3)
3. TSS = sum of the squares of the observed values − CF
4. TrSS = (T1)²/n1 + (T2)²/n2 + (T3)²/n3 − CF
5. ESS = TSS − TrSS

CALCULATION OF DEGREES OF FREEDOM

6. df for treatments = no. of treatments − 1 = k − 1
7. df for the total = total no. of times all the treatments have been applied − 1 = N − 1 = n1 + n2 + n3 − 1
8. df for error = N − k = total no. of times all the treatments have been applied − no. of treatments

CALCULATION OF MEAN SQUARES

9. MT = TrSS/df
10. ME = ESS/df

CALCULATION OF THE F VALUE (VARIANCE RATIO)

11. F = MT/ME ; Fk−1, N−k

12. Inference: if the observed value of F is less than the expected (table) value of F, i.e. F0 < Fe, for a given level of significance α, then the null hypothesis of equal treatment effects is accepted.
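Steps 1-11 of the working rule can be sketched directly in code, using the A/B/C example table above:

```python
# One-way ANOVA following steps 1-11 above, on the example table (A, B, C).
groups = {"A": [1, 2, 3, 4], "B": [2, 5, 6, 7], "C": [9, 2, 3, 5]}

T = {g: sum(v) for g, v in groups.items()}   # treatment totals
n = {g: len(v) for g, v in groups.items()}
N = sum(n.values())
G = sum(T.values())                          # grand total
CF = G ** 2 / N                              # correction factor

TSS  = sum(x ** 2 for v in groups.values() for x in v) - CF
TrSS = sum(T[g] ** 2 / n[g] for g in groups) - CF
ESS  = TSS - TrSS

k = len(groups)
MT = TrSS / (k - 1)
ME = ESS / (N - k)
F = MT / ME
print(round(F, 3))
```

Comparing F against the tabled F with (k − 1, N − k) = (2, 9) degrees of freedom then gives the inference of step 12.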


9.2.2 ANOVA TABLE FOR CRD

| Source of variation | Degrees of freedom (df) | Sum of squares (SS) | Mean squares (MS) | Variance ratio F |
|---|---|---|---|---|
| Between the levels of the factor (treatments) | k − 1 | $Q_T = \sum_i \frac{T_i^2}{n_i} - \frac{G^2}{N}$ | $M_T = \frac{Q_T}{k-1}$ | $F_T = \frac{M_T}{M_E}$ : $F_{k-1,\,N-k}$ |
| Within the levels of the factor (errors) | N − k | $Q_E = Q - Q_T$ | $M_E = \frac{Q_E}{N-k}$ | − |
| Total | N − 1 | $Q = \sum_{ij} y_{ij}^2 - \frac{G^2}{N}$ | − | − |


9.2.3 ANOVA TABLE FOR RBD

| Source of variation | Degrees of freedom (df) | Sum of squares (SS) | Mean squares (MS) | Variance ratio F |
|---|---|---|---|---|
| Between the levels of the factor (treatments) | k − 1 | TrSS: $Q_T = \sum_i \frac{T_i^2}{r} - \frac{G^2}{rk}$ | $M_T = \frac{Q_T}{k-1}$ | $F_T = \frac{M_T}{M_E}$ : $F_{k-1,\,(k-1)(r-1)}$ |
| Blocks | r − 1 | BSS: $Q_B = \sum_j \frac{B_j^2}{k} - \frac{G^2}{rk}$ | $M_B = \frac{Q_B}{r-1}$ | $F_B = \frac{M_B}{M_E}$ : $F_{r-1,\,(k-1)(r-1)}$ |
| Within the levels of the factor (errors) | (k − 1)(r − 1) | $Q_E = Q - (Q_T + Q_B)$ | $M_E = \frac{Q_E}{(k-1)(r-1)}$ | − |
| Total | rk − 1 = N − 1 | $Q = \sum_{ij} y_{ij}^2 - \frac{G^2}{rk}$ | − | − |

Example

| Treatment | Block 1 | Block 2 | Block 3 | Block 4 |
|---|---|---|---|---|
| A | 72 | 68 | 70 | 56 |
| B | 55 | 60 | 62 | 55 |
| C | 65 | 70 | 70 | 60 |

Here, k = 3 and r = 4.

- T1 = 72 + 68 + 70 + 56 = 266
- T2 = 55 + 60 + 62 + 55 = 232
- T3 = 65 + 70 + 70 + 60 = 265
- B1 = 72 + 55 + 65 = 192
- B2 = 68 + 60 + 70 = 198
- B3 = 70 + 62 + 70 = 202
- B4 = 56 + 55 + 60 = 171
- G = grand total of all the rk observations
- ∑y² = (72)² + (68)² + (70)² + (56)² + (55)² + (60)² + (62)² + (55)² + (65)² + (70)² + (70)² + (60)² (sum of the squares of all observations)
- CF = G²/rk = G²/N = G²/(3 × 4)
- TrSS = (T1)²/b + (T2)²/b + (T3)²/b − CF, where b = no. of blocks = 4
- BSS = (B1)²/t + (B2)²/t + (B3)²/t + (B4)²/t − CF, where t = no. of treatments = 3
- TSS = ∑y² − CF
- ESS = TSS − (TrSS + BSS)


Chapter – 10 Simple Regression and Correlation


10.1 CORRELATION

(Scatter diagrams in the original illustrate positive correlation and negative correlation.)

10.1.1 Karl Pearson's Coefficient of Sample Correlation

$$r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \times \sqrt{N \sum Y^2 - (\sum Y)^2}}$$

- The value of r ranges from −1 to +1.
- r = 0: no correlation between X and Y.
- Intermediate values of r show some degree of correlation between X and Y.
- r = +1: perfect positive correlation; r = −1: perfect negative correlation.
- 0.75 < r < 1.0: high degree of positive correlation.
- 0.6 < r < 0.75: moderate degree of correlation.
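Pearson's raw-sums formula can be computed directly; the paired data below are hypothetical.

```python
import math

# Karl Pearson's r from the raw-sums formula (hypothetical paired data).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

N = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)
sy2 = sum(y * y for y in Y)

r = (N * sxy - sx * sy) / (
    math.sqrt(N * sx2 - sx ** 2) * math.sqrt(N * sy2 - sy ** 2))
print(round(r, 3))
```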


10.1.2 Spearman's Rank Correlation Coefficient

$$\rho = 1 - \frac{6 \sum D^2}{N^3 - N}$$

D = difference between the corresponding ranks of X and Y = RX − RY
N = total number of pairs of observations on X and Y

10.1.3 Correction Term for ρ When 'm' Items Are Tied

If there is a tie involving 'm' items, we have to add

$$\frac{m^3 - m}{12}$$

to the term ∑D², giving

$$\rho = 1 - \frac{6\left(\sum D^2 + \frac{m^3 - m}{12}\right)}{N^3 - N}$$
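For untied ranks, Spearman's formula is a one-liner; the two rankings below are hypothetical.

```python
# Spearman's rho for untied ranks (hypothetical rankings of 5 items by two judges).
Rx = [1, 2, 3, 4, 5]
Ry = [2, 1, 4, 3, 5]

N = len(Rx)
D2 = sum((a - b) ** 2 for a, b in zip(Rx, Ry))
rho = 1 - 6 * D2 / (N ** 3 - N)
print(rho)  # 1 - 24/120 = 0.8
```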


10.2 REGRESSION

10.2.1 Regression of Y on X

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$

or Y = a + bX, with

$$b = b_{yx} = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum X^2 - (\sum X)^2}, \qquad a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n \sum X^2 - (\sum X)^2}$$

10.2.2 Regression of X on Y

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$

or X = a + bY, with

$$b = b_{xy} = \frac{n \sum XY - (\sum X)(\sum Y)}{n \sum Y^2 - (\sum Y)^2}, \qquad a = \frac{\sum X \sum Y^2 - \sum Y \sum XY}{n \sum Y^2 - (\sum Y)^2}$$

Let σx, σy denote the standard deviations of x and y respectively. Then

$$b_{yx} = r \frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r \frac{\sigma_x}{\sigma_y}$$

so

$$r^2 = b_{yx} \times b_{xy}, \qquad r = \sqrt{b_{yx} \times b_{xy}}$$

This method of regression is very useful for business forecasting.
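The regression of Y on X can be sketched with the raw-sums formulas above, reusing the same hypothetical data as the correlation example:

```python
# Least-squares line Y = a + bX via the raw-sums formulas (hypothetical data).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)

b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)    # slope b_yx
a = (sy * sx2 - sx * sxy) / (n * sx2 - sx ** 2)  # intercept
print(a, b)  # 2.2 0.6
```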


10.3 PARTIAL AND MULTIPLE CORRELATION

10.3.1 PARTIAL CORRELATION

$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1-r_{13}^2)(1-r_{23}^2)}}$$

$$r_{13.2} = \frac{r_{13} - r_{12} r_{32}}{\sqrt{(1-r_{12}^2)(1-r_{32}^2)}}$$

$$r_{23.1} = \frac{r_{23} - r_{21} r_{13}}{\sqrt{(1-r_{21}^2)(1-r_{13}^2)}}$$

10.3.2 MULTIPLE CORRELATION

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$$

$$R_{2.13} = \sqrt{\frac{r_{21}^2 + r_{23}^2 - 2 r_{21} r_{23} r_{13}}{1 - r_{13}^2}}$$

$$R_{3.12} = \sqrt{\frac{r_{31}^2 + r_{32}^2 - 2 r_{31} r_{32} r_{12}}{1 - r_{12}^2}}$$

- R1.23 = R1.32
- The coefficient of multiple correlation lies between 0 and 1.


Chapter – 11 INDEX NUMBERS


METHODS OF CONSTRUCTION OF INDEX NUMBERS

1. UNWEIGHTED
   a. Simple Aggregate
   b. Simple Average of Price Relatives
2. WEIGHTED
   a. Weighted Aggregate
   b. Weighted Average of Price Relatives

1. UNWEIGHTED

a. Simple Aggregate Method

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

P01 = price index number for the current year with reference to the base year
∑P1 = aggregate of prices for the current year
∑P0 = aggregate of prices for the base year

b. Simple Average of Price Relatives Method

$$P_{01} = \frac{\sum P}{N}, \qquad \text{where } P = \frac{P_1}{P_0} \times 100$$

N = no. of items


c. Geometric Mean Method

$$P_{01} = \text{antilog}\left(\frac{\sum \log P}{N}\right), \qquad \text{where } P = \frac{P_1}{P_0} \times 100$$

2. WEIGHTED INDEX NUMBERS

p0 = price in the base year; q0 = weight / quantity in the base year
p1 = price in the current year; q1 = quantity in the current year

a) Laspeyres' Method

$$P_{01}(La) = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

b) Paasche's Method

$$P_{01}(Pa) = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$
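The two weighted indices differ only in which quantities are used as weights; a sketch with hypothetical prices and quantities (Fisher's ideal index from section d below is the geometric mean of the two):

```python
# Laspeyres and Paasche price indices (hypothetical prices and quantities).
p0 = [10.0, 20.0]   # base-year prices
p1 = [12.0, 22.0]   # current-year prices
q0 = [5.0, 3.0]     # base-year quantities
q1 = [4.0, 4.0]     # current-year quantities

def agg(p, q):      # Σ p*q
    return sum(pi * qi for pi, qi in zip(p, q))

laspeyres = agg(p1, q0) / agg(p0, q0) * 100   # base-year weights
paasche   = agg(p1, q1) / agg(p0, q1) * 100   # current-year weights
fisher    = (laspeyres * paasche) ** 0.5      # Fisher's ideal index
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```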


c) Bowley's Method

$$P_{01}(B) = \frac{\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}}{2} \times 100 = \frac{P_{01}(La) + P_{01}(Pa)}{2}$$

d) Fisher's Ideal Formula

$$P_{01}(F) = \sqrt{L \times P} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

e) Marshall-Edgeworth Method

$$P_{01}(Ma) = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$

f) Kelly's Method

$$P_{01}(K) = \frac{\sum p_1 q}{\sum p_0 q} \times 100, \qquad \text{where } q = \frac{q_0 + q_1}{2}$$

DISCRIMINANT ANALYSIS

Matrix addition:

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} + \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} \\ a_{21}+b_{21} & a_{22}+b_{22} \end{pmatrix}$$

Matrix multiplication:

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \times \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} a_{11}b_{11}+a_{12}b_{21} & a_{11}b_{12}+a_{12}b_{22} \\ a_{21}b_{11}+a_{22}b_{21} & a_{21}b_{12}+a_{22}b_{22} \end{pmatrix}$$

If

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

then the determinant of A is |A| = ad − bc.

If |A| = 0, then A is a singular matrix; if |A| ≠ 0, then A is a non-singular matrix.

When |A| ≠ 0,

$$A^{-1} = \frac{1}{|A|} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

If

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

then A A⁻¹ = A⁻¹ A = I.

When a matrix A of type m×n and a matrix B of type p×q are multiplied, we obtain a matrix C of type m×q, with the condition that n = p.


POOLED COVARIANCE MATRIX, S

| Condition I | | Condition II | |
|---|---|---|---|
| Variable 1 | Variable 2 | Variable 1 | Variable 2 |
| p1 | q1 | α1 | β1 |
| p2 | q2 | α2 | β2 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| pm | qm | αn | βn |

Let p̄ be the mean of the values of variable 1 under condition I, q̄ the mean of the values of variable 2 under condition I, ᾱ the mean of the values of variable 1 under condition II, and β̄ the mean of the values of variable 2 under condition II.

Let

$$\bar{y}_1 = \begin{pmatrix} \bar{p} \\ \bar{q} \end{pmatrix}, \qquad \bar{y}_2 = \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix}, \qquad \bar{y}_1 - \bar{y}_2 = \begin{pmatrix} \bar{p} - \bar{\alpha} \\ \bar{q} - \bar{\beta} \end{pmatrix}$$

Pooled covariance matrix:

$$S = \frac{1}{m+n-2} \begin{pmatrix} \sum_{i=1}^{m} (p_i - \bar{p})^2 + \sum_{j=1}^{n} (\alpha_j - \bar{\alpha})^2 & \sum_{i=1}^{m} (p_i - \bar{p})(q_i - \bar{q}) + \sum_{j=1}^{n} (\alpha_j - \bar{\alpha})(\beta_j - \bar{\beta}) \\ \sum_{i=1}^{m} (p_i - \bar{p})(q_i - \bar{q}) + \sum_{j=1}^{n} (\alpha_j - \bar{\alpha})(\beta_j - \bar{\beta}) & \sum_{i=1}^{m} (q_i - \bar{q})^2 + \sum_{j=1}^{n} (\beta_j - \bar{\beta})^2 \end{pmatrix}$$

The discriminant coefficients are then

$$\begin{pmatrix} \lambda \\ \mu \end{pmatrix} = S^{-1} \begin{pmatrix} \bar{p} - \bar{\alpha} \\ \bar{q} - \bar{\beta} \end{pmatrix}$$


FISHER DISCRIMINANT FUNCTION, Z

$$Z = \lambda y_1 + \mu y_2$$

| Condition I | | Condition II | |
|---|---|---|---|
| y1 | y2 | y1 | y2 |
| p1 | q1 | α1 | β1 |
| p2 | q2 | α2 | β2 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| pm | qm | αn | βn |

Here, y1 and y2 are two different variables, both taking values under the two different conditions.

$$Z_{\text{cut-off}} = \frac{n \bar{Z}_A + m \bar{Z}_B}{m + n}$$

Z̄A = mean discriminant function score under condition I
Z̄B = mean discriminant function score under condition II


CLUSTER ANALYSIS

For two binary attributes P and Q, count the matches and mismatches over all items:

| | Q = 1 | Q = 0 |
|---|---|---|
| P = 1 | a | b |
| P = 0 | c | d |

Example:

| | Q = 1 | Q = 0 |
|---|---|---|
| P = 1 | a = 6 | b = 1 |
| P = 0 | c = 1 | d = 2 |

a = (1, 1); b = (1, 0); c = (0, 1); d = (0, 0)

Similarity coefficient:

$$C(P, Q) = \frac{a+d}{a+b+c+d} = \frac{6+2}{6+1+1+2} = \frac{8}{10} = 0.8$$

Inference: there is 80% similarity between the two points P and Q.
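The simple matching coefficient and the two corrected coefficients from the next section can be computed side by side on the example counts above:

```python
# Binary-attribute similarity coefficients on the example counts
# (a = both 1, b = P only, c = Q only, d = both 0).
a, b, c, d = 6, 1, 1, 2

simple = (a + d) / (a + b + c + d)            # simple matching coefficient
rogers = (a + d) / (a + d + 2 * (b + c))      # Rogers and Tanimoto
sokal  = 2 * (a + d) / (2 * (a + d) + b + c)  # Sokal and Sneath

print(simple, round(rogers, 2), round(sokal, 2))  # 0.8 0.67 0.89
```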


MATCHING COEFFICIENTS WITH CORRECTION TERM

1. Rogers and Tanimoto coefficient of matching

By giving double weight to unmatched pairs of attributes, the matching coefficient with correction term is defined as

$$C(P, Q) = \frac{a+d}{a+d+2(b+c)}$$

Perfect similarity between P and Q occurs when b = c = 0, giving C(P, Q) = 1. Maximum dissimilarity occurs when a = d = 0, giving C(P, Q) = 0.

E.g. C(P, Q) = (6+2)/(6+2+2(1+1)) = 8/12 = 0.67, so the estimate of similarity between P and Q is 0.67 or 67%.

2. Sokal and Sneath coefficient of matching

By giving double weight to matched pairs of attributes, the matching coefficient with correction term is defined as

$$C(P, Q) = \frac{2(a+d)}{2(a+d)+b+c}$$

Perfect similarity between P and Q occurs when b = c = 0, giving C(P, Q) = 1. Maximum dissimilarity occurs when a = d = 0, giving C(P, Q) = 0.

E.g. C(P, Q) = 2(6+2)/(2(6+2)+1+1) = 16/18 = 0.89. Thus, the similarity between P and Q is estimated as 89%.


COMPARISON OF THE THREE COEFFICIENTS OF SIMILARITY

$$\frac{a+d}{a+d+2(b+c)} \;\le\; \frac{a+d}{a+b+c+d} \;\le\; \frac{2(a+d)}{2(a+d)+b+c}$$

| Rogers and Tanimoto coefficient | Simple matching coefficient | Sokal and Sneath coefficient |
|---|---|---|
| Pessimistic estimate of similarity | Moderate estimate of similarity | Optimistic estimate of similarity |
