STATISTICS FORMULAE
PREPARED BY: KEYUR SAVALIA DIPA SHAH KRISHNA RAJPUT NIKITA SANGHVI MITESH SHAH BATCH: 2009 2011
STEVENS BUSINESS SCHOOL
BATCH 2009  2011
Index

1. Grouping and Displaying Data to Convey Meaning: Tables and Graphs
2. Measures of Central Tendency and Dispersion in Frequency Distributions
3. Probability I: Introduction and Ideas
4. Probability Distributions
5. Sampling and Sampling Distributions
6. Estimation
7. Testing Hypotheses: One-Sample Tests
8. Testing Hypotheses: Two-Sample Tests
9. Chi-Square and Analysis of Variance
10. Simple Regression and Correlation
11. Index Numbers
Chapter – 1
Grouping and Displaying Data to Convey Meaning: Tables and Graphs
Width of the class intervals = (Next unit value after largest value in data − Smallest value in data) / (Total number of class intervals)

Note:
1) To arrange raw data, decide the number of classes into which you will divide the data.
2) Normally the total number of class intervals is between 6 and 15.
Chapter 2
Measures of Central Tendency and Dispersion in Frequency Distributions
2.1 Population Arithmetic Mean

µ = Σx / N

Σx = sum of the values of all the elements in the population
N = number of elements in the population

2.2 Sample Arithmetic Mean

x̄ = Σx / n

Σx = sum of the values of all the elements in the sample
n = number of elements in the sample

2.3 Sample Arithmetic Mean of Grouped Data

x̄ = Σ(f × x) / n

x = midpoint of each class in the sample; f = frequency of observations in the class. Multiply each midpoint by its class frequency and sum (Σ) all these products; n = total number of observations in the sample = Σf
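The grouped-data mean in Section 2.3 can be sketched in Python; the class midpoints and frequencies below are made-up numbers for illustration.

```python
# Sample arithmetic mean of grouped data: x-bar = sum(f * x) / n, n = sum(f)
def grouped_mean(midpoints, freqs):
    n = sum(freqs)  # total number of observations in the sample
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

midpoints = [5, 15, 25, 35]   # hypothetical class midpoints
freqs = [2, 4, 3, 1]          # hypothetical class frequencies
print(grouped_mean(midpoints, freqs))
```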
2.4 Weighted Mean

x̄_w = Σ(w × x) / Σw

Σ(w × x): multiply the weight (w) of each element by the value of that element (x) and sum (Σ) all these products; Σw = sum of all the weights.

2.5 Sample Arithmetic Mean of Grouped Data Using Codes

x̄ = x₀ + c × Σ(u × f) / n

x̄ = mean of the sample
x₀ = value of the midpoint assigned the code 0 (this can be the midpoint of the class with the highest frequency)
c = numerical width of the class interval
u = code assigned to each class
f = frequency or number of observations in each class
n = total number of observations in the sample
2.6 Geometric Mean

GM = ⁿ√(product of all x values)
2.7 Median

Median = value of the ((n + 1)/2)th item in the data array

n = number of items in the data array
2.8 Sample Median of Grouped Data

m̃ = (((n + 1)/2 − (F + 1)) / f_m) × w + L_m

m̃ = sample median
n = total number of items in the distribution
F = sum of all the class frequencies up to, but not including, the median class
f_m = frequency of the median class
w = class-interval width
L_m = lower limit of the median class interval
2.9 Mode

Mo = L_Mo + (d₁ / (d₁ + d₂)) × w

L_Mo = lower limit of the modal class
d₁ = frequency of the modal class minus the frequency of the class directly below it
d₂ = frequency of the modal class minus the frequency of the class directly above it
w = width of the modal class interval
2.10 Range
Range = value of highest observation – value of lowest observation
2.11 Interquartile Range

Interquartile range = Q₃ − Q₁

Q₁ = value of the first quartile = P₂₅
Q₃ = value of the third quartile = P₇₅
2.12 Population Variance

σ² = Σ(x − µ)² / N = Σx²/N − µ²

σ² = population variance
x = item or observation
µ = population mean
N = total number of items in the population

2.13 Population Standard Deviation

σ = √σ² = √(Σ(x − µ)² / N) = √(Σx²/N − µ²)

x = observation
µ = population mean
N = total number of elements in the population
Σ = sum of all the values (x − µ)², or all the values x²
σ = population standard deviation
σ² = population variance
2.14 Population Standard Score

Population standard score = (x − µ) / σ

x = observation from the population
µ = population mean
σ = population standard deviation
2.15 Sample Variance

s² = Σ(x − x̄)² / (n − 1) = Σx²/(n − 1) − n x̄²/(n − 1)
2.16 Sample Standard Deviation

s = √s² = √(Σ(x − x̄)² / (n − 1)) = √(Σx²/(n − 1) − n x̄²/(n − 1))

s² = sample variance
s = sample standard deviation
x = value of each of the n observations
x̄ = mean of the sample
n − 1 = number of observations in the sample minus 1
2.17 Sample Standard Score

Sample standard score = (x − x̄) / s

x = observation from the sample
x̄ = sample mean
s = sample standard deviation
2.18 Population Coefficient of Variation

Population coefficient of variation = (σ / µ) × 100%

σ = standard deviation of the population
µ = mean of the population
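The sample-statistics formulas of Sections 2.15 to 2.17 can be sketched as follows; the observations are hypothetical.

```python
import math

# Sample variance s^2 = sum((x - x-bar)^2) / (n - 1), sample standard
# deviation s = sqrt(s^2), and standard score z = (x - x-bar) / s.
def sample_variance(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    return math.sqrt(sample_variance(xs))

def standard_score(x, mean, s):
    return (x - mean) / s

xs = [4, 8, 6, 5, 7]            # hypothetical sample observations
s2 = sample_variance(xs)
s = sample_std(xs)
z = standard_score(8, sum(xs) / len(xs), s)
```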
Chapter – 3
Probability I: Introduction and Ideas
3.1 Probability of an event = (number of outcomes where the event occurs) / (total number of possible outcomes)

This is the classical definition of the probability that an event will occur.
P(A) = probability of event A happening. A single probability refers to the probability of one particular event occurring, and it is called a marginal probability.
P(A or B) = probability of either A or B happening. This notation represents the probability that one event or the other will occur.
3.2 P(A or B) = P(A) + P(B)

The probability of either A or B happening when A and B are mutually exclusive equals the sum of the probability of event A happening and the probability of event B happening. This is the addition rule for mutually exclusive events.
3.3 P(A or B) = P(A) + P(B) − P(AB)

The addition rule for events that are not mutually exclusive shows that the probability of A or B happening is equal to the probability of event A happening plus the probability of event B happening minus the probability of A and B happening together, P(AB).
3.4 P(AB) = P(A) × P(B)

P(AB) = joint probability of events A and B occurring together or in succession
P(A) = marginal probability of event A happening
P(B) = marginal probability of event B happening
The joint probability of two or more independent events occurring together or in succession is the product of their marginal probabilities.
3.5 P(B│A) = P(B)

For statistically independent events, the conditional probability of event B, given that event A has occurred, is simply the probability of event B. Independent events are those whose probabilities are in no way affected by the occurrence of each other.
3.6 P(B│A) = P(AB) / P(A)   and   P(A│B) = P(AB) / P(B)

For statistically dependent events, the conditional probability of event B, given that event A has occurred, is equal to the joint probability of events A and B divided by the marginal probability of event A.
3.7 LAW OF MULTIPLICATION P (AB) = P (A│B) × P (B) P (BA) = P (B│A) × P (A)
For statistically dependent events, the joint probability of events A & B happening together or in succession is equal to the probability of event A, given that event B has already happened, multiplied by the probability that event B will occur.
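The addition and multiplication rules above can be checked by direct counting; the deck-of-cards events below are illustrative choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# A 52-card deck; P(event) counts favourable outcomes over total outcomes,
# i.e. the classical probability of Section 3.1.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]
P = lambda event: Fraction(sum(1 for c in deck if event(c)), len(deck))

ace = lambda c: c[0] == 1       # event A: card is an ace
heart = lambda c: c[1] == "H"   # event B: card is a heart

p_a, p_b = P(ace), P(heart)
p_ab = P(lambda c: ace(c) and heart(c))       # joint probability P(AB)
p_a_or_b = P(lambda c: ace(c) or heart(c))

assert p_a_or_b == p_a + p_b - p_ab           # addition rule (3.3)
assert p_ab == p_a * p_b                      # multiplication rule (3.4), A and B independent
```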
Chapter – 4
Probability Distribution
4.1 Binomial Formula

Probability of r successes in n Bernoulli trials = [n! / (r!(n − r)!)] × pʳ × qⁿ⁻ʳ

r = number of successes desired {r = 0, 1, 2, …, n}
n = number of trials undertaken
p = probability of success {0 ≤ p ≤ 1}
q = probability of failure = 1 − p
4.2 Mean of a Binomial Distribution

µ = np

n = number of trials
p = probability of success

4.3 Standard Deviation of a Binomial Distribution

σ = √(npq)

n = number of trials
p = probability of success
q = probability of failure = 1 − p
4.4 Calculating Poisson Probability

P(x) = (λˣ × e⁻λ) / x!   for x = 0, 1, 2, …; λ ≥ 0
Otherwise P(x) = 0

P(x) = probability of exactly x occurrences
λˣ = lambda (the mean number of occurrences per interval of time) raised to the x power
e = 2.71828
e⁻λ = e, or 2.71828, raised to the negative-lambda power
x! = x factorial
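The binomial formula (4.1) and the Poisson probability (4.4) translate directly into Python:

```python
import math

# Binomial: P(r successes in n trials) = C(n, r) * p^r * q^(n-r), q = 1 - p
def binomial_pmf(r, n, p):
    q = 1 - p
    return math.comb(n, r) * p**r * q**(n - r)

# Poisson: P(x) = lambda^x * e^(-lambda) / x!
def poisson_pmf(x, lam):
    return (lam**x * math.exp(-lam)) / math.factorial(x)
```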
4.5 Poisson Probability Distribution as an Approximation of the Binomial

P(x) = ((np)ˣ × e⁻ⁿᵖ) / x!

n = number of trials
p = probability of success
x! = x factorial
The rule most often used by statisticians is that the Poisson is a good approximation of the binomial when n is greater than or equal to 20 and p is less than or equal to 0.05.
4.6 Standardizing a Normal Random Variable

z = (x − µ) / σ

x = value of the random variable with which we are concerned
z = number of standard deviations from x to the mean of this distribution
µ = mean of the distribution of this random variable
σ = standard deviation of this distribution

Note: the normal distribution can be used as an approximation of the binomial distribution when np > 5 and nq > 5.

n = number of trials
p = probability of success
q = probability of failure = 1 − p
Chapter – 5 Sampling and Sampling Distribution
5.1 The sampling distribution has a mean equal to the population mean:

µ_x̄ = µ

5.2 Standard Error of the Mean for Infinite Populations

σ_x̄ = σ / √n

σ_x̄ = standard error of the mean
σ = population standard deviation
n = sample size
This equation states that the sampling distribution has a standard deviation equal to the population standard deviation divided by the square root of the sample size.
5.3 Standardizing the Sample Mean

z = (x̄ − µ) / σ_x̄

x̄ = sample mean
µ = population mean
σ_x̄ = standard error of the mean = σ / √n
5.4 Standard Error of the Mean for Finite Populations

σ_x̄ = (σ / √n) × √((N − n) / (N − 1))

N = size of the population
n = size of the sample
√((N − n) / (N − 1)) is the finite population multiplier.

Note: use the above equation for calculating the standard error of the mean only when n/N > 0.05.
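A sketch of the standard error of the mean (5.2), with the finite population multiplier (5.4) applied when the sampling fraction exceeds 0.05; the numbers used in the example call are hypothetical.

```python
import math

# Standard error of the mean: sigma / sqrt(n), times the finite
# population multiplier sqrt((N - n) / (N - 1)) when n/N > 0.05.
def standard_error(sigma, n, N=None):
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))   # finite population multiplier
    return se

se_infinite = standard_error(10, 25)          # infinite population
se_finite = standard_error(10, 25, N=100)     # n/N = 0.25 > 0.05
```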
Chapter – 6 Estimation
6.1 Estimate of the Population Standard Deviation

σ̂ = s = √(Σ(x − x̄)² / (n − 1))

Note: this formula indicates that the sample standard deviation can be used to estimate the population standard deviation. When the population standard deviation is unknown, use the above equation.
6.2 Estimated Standard Error of the Mean of a Finite Population

σ̂_x̄ = (σ̂ / √n) × √((N − n) / (N − 1))

ˆ (hat) = symbol that indicates an estimated value
σ̂ = estimate of the population standard deviation

Note: use the above formula only when n/N > 0.05.
6.3 Mean of the Sampling Distribution of the Proportion

µ_p̄ = p

p̄ = the proportion of successes in the sample
p = the proportion of successes in the population
6.4 Standard Error of the Proportion

σ_p̄ = √(pq / n)

σ_p̄ = standard error of the proportion
p = the proportion of successes in the population
q = the proportion of failures in the population
n = sample size
6.5 Estimated Standard Error of the Proportion

σ̂_p̄ = √(p̄q̄ / n)

σ̂_p̄ = estimated standard error of the proportion
p̄ = proportion of successes in the sample
q̄ = proportion of failures in the sample
n = sample size

Note: when using the t-distribution, degrees of freedom = n − 1, where n = sample size.
6.6 Estimated Standard Error of the Mean of an Infinite Population

σ̂_x̄ = σ̂ / √n
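The estimated standard error of the proportion (6.5) can be sketched as below; the 95% interval step uses z = 1.96, and the sample figures are assumptions for illustration.

```python
import math

# Estimated standard error of the proportion: sqrt(p-bar * q-bar / n)
def se_proportion(p_bar, n):
    q_bar = 1 - p_bar
    return math.sqrt(p_bar * q_bar / n)

p_bar, n = 0.4, 100                   # hypothetical sample proportion and size
se = se_proportion(p_bar, n)
lo, hi = p_bar - 1.96 * se, p_bar + 1.96 * se   # rough 95% interval sketch
```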
Chapter – 7
Testing Hypotheses: One-Sample Tests
Conditions for Using the Normal and t-Distributions in Testing Hypotheses about Means

Sample size n > 30: population standard deviation known → normal distribution (normal table); unknown → normal distribution (normal table)
Sample size n ≤ 30: population standard deviation known → normal distribution (normal table); unknown → t-distribution (t table)

When the population standard deviation is known:

σ_x̄ = standard error of the mean = σ / √n

z = (x̄ − µ_H0) / σ_x̄

When the population standard deviation is unknown:

t = (x̄ − µ_H0) / (s / √n)

σ_x̄ = standard error of the mean
σ = standard deviation of the population
n = sample size
x̄ = sample mean
µ_H0 = hypothesized population mean
Standard error of the proportion:

σ_p̄ = √(p_H0 × q_H0 / n)

σ_p̄ = standard error of the proportion
p_H0 = hypothesized value of the population proportion of successes
q_H0 = hypothesized value of the population proportion of failures = 1 − p_H0
p̄ = sample proportion of successes
q̄ = sample proportion of failures = 1 − p̄
n = sample size

z = (p̄ − p_H0) / σ_p̄

Note: use the above formula when np > 5 and nq > 5.
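The one-sample test of a proportion above can be sketched as follows; the hypothesized value and sample figures are made up for illustration.

```python
import math

# z = (p-bar - p_H0) / se, with se computed under H0 as sqrt(p_H0 * q_H0 / n)
def z_for_proportion(p_bar, p0, n):
    q0 = 1 - p0
    se = math.sqrt(p0 * q0 / n)     # standard error of the proportion under H0
    return (p_bar - p0) / se

# n * p0 = 50 and n * q0 = 50 are both > 5, so the normal approximation applies
z = z_for_proportion(p_bar=0.55, p0=0.5, n=100)
```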
Chapter – 8
Testing Hypotheses: Two – Sample Tests
8.1 Standard Error of the Difference Between Two Means When n > 30

σ_x̄₁₋x̄₂ = √(σ₁²/n₁ + σ₂²/n₂)

σ₁² = variance of population 1
σ₂² = variance of population 2
n₁ = size of sample 1
n₂ = size of sample 2
8.2 Estimated Standard Error of the Difference Between Two Means When n > 30

σ̂_x̄₁₋x̄₂ = √(σ̂₁²/n₁ + σ̂₂²/n₂)

σ̂₁² = estimated variance of population 1
σ̂₂² = estimated variance of population 2
8.2.1 z-Score Value (when n > 30)

z = ((x̄₁ − x̄₂) − (µ₁ − µ₂)_H0) / σ̂_x̄₁₋x̄₂

z = z-score
x̄₁ = mean of the sample from population 1
x̄₂ = mean of the sample from population 2
(µ₁ − µ₂)_H0 = hypothesized difference between the two population means
σ̂_x̄₁₋x̄₂ = estimated standard error of the difference between two means

Note: only use the above three equations when n > 30, i.e. for large samples only.
8.3 Pooled Estimate of σ²

s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)

s_p² = pooled estimate of σ²
n₁ = size of sample 1
n₂ = size of sample 2
s₁ = standard deviation of the sample from population 1
s₂ = standard deviation of the sample from population 2

Note: here, n₁ and n₂ < 30.
Note: degrees of freedom for the t-distribution = n₁ + n₂ − 2.
8.4 Estimated Standard Error of the Difference Between Two Sample Means with Small Samples and Equal Population Variances

σ̂_x̄₁₋x̄₂ = s_p × √(1/n₁ + 1/n₂)

where s_p² = pooled estimate of σ²
x̄₁ = mean of the sample from population 1
x̄₂ = mean of the sample from population 2
(µ₁ − µ₂)_H0 = hypothesized value of the difference between the two population means

t = ((x̄₁ − x̄₂) − (µ₁ − µ₂)_H0) / σ̂_x̄₁₋x̄₂
8.5 Standard Error of the Difference Between Two Proportions

σ_p̄₁₋p̄₂ = √(p₁q₁/n₁ + p₂q₂/n₂)

8.6 Estimated Standard Error of the Difference Between Two Proportions

σ̂_p̄₁₋p̄₂ = √(p̄₁q̄₁/n₁ + p̄₂q̄₂/n₂)
8.7 Estimate of the Overall Proportion of Successes in Two Populations

p̂ = (n₁p̄₁ + n₂p̄₂) / (n₁ + n₂)

8.8 Estimated Standard Error of the Difference Between Two Proportions Using Combined Estimates from Both Samples

σ̂_p̄₁₋p̄₂ = √(p̂q̂/n₁ + p̂q̂/n₂)

z = ((p̄₁ − p̄₂) − (p₁ − p₂)_H0) / σ̂_p̄₁₋p̄₂

p̂ = estimate of the overall proportion of successes in the populations, using combined proportions from both samples
n₁ = size of sample 1
n₂ = size of sample 2
p̄₁ = sample proportion of successes in sample 1
p̄₂ = sample proportion of successes in sample 2
q̂ = 1 − p̂
σ̂_p̄₁₋p̄₂ = estimated standard error of the difference between two proportions using combined estimates from both samples
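The two-sample test of proportions with the pooled estimate p̂ can be sketched as below; the sample counts are illustrative and H0 assumes no difference between the population proportions.

```python
import math

# Pooled overall proportion p-hat, its standard error, and the z score
# for H0: p1 - p2 = 0.
def two_proportion_z(p1, n1, p2, n2):
    p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)        # overall proportion
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat / n1 + p_hat * q_hat / n2)
    return (p1 - p2) / se

z = two_proportion_z(0.6, 100, 0.5, 100)
```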
Chapter – 9 CHI-SQUARE AND ANALYSIS OF VARIANCE
9.1 Chi-Square as a Test of Independence

9.1.1 Chi-Square Statistic

χ² = Σ (f₀ − f_e)² / f_e

f₀ = an observed frequency
f_e = an expected frequency
χ² = chi-square
9.1.2 Number of Degrees of Freedom

Number of degrees of freedom = (number of rows − 1) × (number of columns − 1)
9.1.3 Expected Frequency for Any Cell

f_e = (RT × CT) / n

RT = row total for the row containing that cell
CT = column total for the column containing that cell
n = total number of observations (grand total of the frequencies)
9.1.4 Using the F-Distribution: Degrees of Freedom

Number of degrees of freedom in the numerator of the F ratio = (number of samples − 1)
Number of degrees of freedom in the denominator of the F ratio = Σ(n_j − 1) = n_T − k

n_j = size of the jth sample
n_T = total sample size
k = number of samples
The F table is given in Appendix Table 6(b) of the book Statistics for Management.
9.1.5 F-Ratio for Inferences about Two Variances

F = s₁² / s₂²

where s² = variance of the sample
9.1.6a Upper-tail value of F for a two-tailed test

F(n, d, α), also written F_α(n, d)

n = degrees of freedom in the numerator
d = degrees of freedom in the denominator
α = area in the upper tail

9.1.6b Lower-tail value of F for a two-tailed test

F(n, d, 1 − α) = F₁₋α(n, d) = 1 / F_α(d, n)
9.2 ANALYSIS OF VARIANCE

9.2.1 ANOVA TABLE FOR ONE-WAY CLASSIFIED DATA

Between the levels of the factor (treatments): df = k − 1; SS (TrSS): Q_T = Σᵢ Tᵢ²/nᵢ − G²/N; MS: M_T = Q_T / (k − 1)
Within the levels of the factor (error): df = N − k; SS (ESS), by subtraction: Q_E = Q − Q_T; MS: M_E = Q_E / (N − k)
Total: df = N − 1; SS (TSS): Q = Σᵢⱼ yᵢⱼ² − G²/N

F = greater variance / smaller variance = (TrSS / df) / (ESS / df) ~ F(k−1, N−k)
Working Rule for an Example

We have to consider three quantities G, N, and the correction factor (CF), defined as follows:
G = sum of all the values for all the treatments
N = sum of the number of times each treatment is applied
CF = G² / N
E.g. suppose there are 3 treatments A, B, and C. Suppose the number of times each treatment is applied is n₁ in the case of A, n₂ in the case of B, and n₃ in the case of C, and the sums of the values of the three treatments are denoted by T₁, T₂, and T₃.

A: 1, 2, 3, 4    n₁ = 4; T₁ = 1 + 2 + 3 + 4
B: 2, 5, 6, 7    n₂ = 4; T₂ = 2 + 5 + 6 + 7
C: 9, 2, 3, 5    n₃ = 4; T₃ = 9 + 2 + 3 + 5

N = n₁ + n₂ + n₃
1. G = T₁ + T₂ + T₃
2. CF = G²/N = (T₁ + T₂ + T₃)² / (n₁ + n₂ + n₃)
3. TSS = sum of the squares of the observed values − CF
4. TrSS = T₁²/n₁ + T₂²/n₂ + T₃²/n₃ − CF
5. ESS = TSS − TrSS
CALCULATION OF DEGREES OF FREEDOM
6. df for treatments = number of treatments − 1 = k − 1
7. df for the total = total number of times all the treatments have been applied − 1 = N − 1 = n₁ + n₂ + n₃ − 1
8. df for error = N − k = total number of times all the treatments have been applied − number of treatments
CALCULATION OF MEAN SQUARES
9. M_T = TrSS / df
10. M_E = ESS / df

CALCULATION OF F VALUE (VARIANCE RATIO)
11. F = M_T / M_E ~ F(k−1, N−k)

12. Inference: if the observed value of F is less than the expected (table) value of F, i.e. F₀ < F_e, for a given level of significance α, then the null hypothesis of equal treatment effects is accepted.
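The working rule above can be sketched as a one-way ANOVA function, applied here to the three-treatment example data from this section:

```python
# One-way ANOVA: returns F = M_T / M_E with (k-1, N-k) degrees of freedom,
# computed via the correction-factor working rule (steps 1-11 above).
def one_way_anova(groups):
    N = sum(len(g) for g in groups)
    k = len(groups)
    G = sum(sum(g) for g in groups)                      # grand total
    CF = G * G / N                                       # correction factor
    TSS = sum(x * x for g in groups for x in g) - CF     # total SS
    TrSS = sum(sum(g) ** 2 / len(g) for g in groups) - CF  # treatment SS
    ESS = TSS - TrSS                                     # error SS
    MT = TrSS / (k - 1)
    ME = ESS / (N - k)
    return MT / ME

F = one_way_anova([[1, 2, 3, 4], [2, 5, 6, 7], [9, 2, 3, 5]])
```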
9.2.2 ANOVA TABLE FOR CRD

Between the levels of the factor (treatments): df = k − 1; SS (TrSS): Q_T = Σᵢ Tᵢ²/nᵢ − G²/N; MS: M_T = Q_T / (k − 1); variance ratio: F_T = M_T / M_E ~ F(k−1, N−k)
Within the levels of the factor (error): df = N − k; SS: Q_E = Q − Q_T; MS: M_E = Q_E / (N − k)
Total: df = N − 1; SS: Q = Σᵢⱼ yᵢⱼ² − G²/N
9.2.3 ANOVA TABLE FOR RBD

Between the levels of the factor (treatments): df = k − 1; SS (TrSS): Q_T = Σᵢ Tᵢ²/r − G²/(rk); MS: M_T = Q_T / (k − 1); variance ratio: F_T = M_T / M_E ~ F(k−1, (k−1)(r−1))
Blocks: df = r − 1; SS (BSS): Q_B = Σⱼ Bⱼ²/k − G²/(rk); MS: M_B = Q_B / (r − 1); variance ratio: F_B = M_B / M_E ~ F(r−1, (k−1)(r−1))
Within the levels of the factor (error): df = (k − 1)(r − 1); SS: Q_E = Q − (Q_T + Q_B); MS: M_E = Q_E / ((k − 1)(r − 1))
Total: df = rk − 1 = N − 1; SS: Q = Σᵢⱼ yᵢⱼ² − G²/(rk)
Example

Treatment A: Block 1 = 72, Block 2 = 68, Block 3 = 70, Block 4 = 56
Treatment B: Block 1 = 55, Block 2 = 60, Block 3 = 62, Block 4 = 55
Treatment C: Block 1 = 65, Block 2 = 70, Block 3 = 70, Block 4 = 60

Here, k = 3 and r = 4.

T₁ = 72 + 68 + 70 + 56 = 266
T₂ = 55 + 60 + 62 + 55 = 232
T₃ = 65 + 70 + 70 + 60 = 265
B₁ = 72 + 55 + 65 = 192
B₂ = 68 + 60 + 70 = 198
B₃ = 70 + 62 + 70 = 202
B₄ = 56 + 55 + 60 = 171

G = grand total of all rk observations
Σy² = (72)² + (68)² + (70)² + (56)² + (55)² + (60)² + (62)² + (55)² + (65)² + (70)² + (70)² + (60)²
CF = G² / (rk) = G² / N = G² / (3 × 4)

TrSS = T₁²/b + T₂²/b + T₃²/b − CF, where b = number of blocks = 4
BSS = B₁²/t + B₂²/t + B₃²/t + B₄²/t − CF, where t = number of treatments = 3
TSS = Σy² − CF
ESS = TSS − (TrSS + BSS)
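The RBD sums of squares for this example can be computed directly:

```python
# Sums of squares for the RBD example above (3 treatments x 4 blocks).
data = [[72, 68, 70, 56],   # treatment A across blocks 1-4
        [55, 60, 62, 55],   # treatment B
        [65, 70, 70, 60]]   # treatment C

k, r = len(data), len(data[0])
G = sum(sum(row) for row in data)                       # grand total
CF = G * G / (r * k)                                    # correction factor
TSS = sum(x * x for row in data for x in row) - CF
TrSS = sum(sum(row) ** 2 / r for row in data) - CF      # treatments
BSS = sum(sum(row[j] for row in data) ** 2 / k for j in range(r)) - CF  # blocks
ESS = TSS - (TrSS + BSS)
```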
Chapter – 10 Simple Regression and Correlation
10.1 CORRELATION

(Scatter diagrams showing positive and negative correlation.)

10.1.1 KARL PEARSON'S Coefficient of Sample Correlation

r = (NΣXY − (ΣX)(ΣY)) / [√(NΣX² − (ΣX)²) × √(NΣY² − (ΣY)²)]

The value of r ranges from −1 to +1:
r = 0: no correlation between X and Y
r = +1: perfect positive correlation
r = −1: perfect negative correlation
0.75 < r < 1.0: high degree of positive correlation
0.6 < r < 0.75: moderate degree of correlation
10.1.2 SPEARMAN'S Rank Correlation Coefficient

ρ = 1 − 6ΣD² / (N³ − N)

D = difference between the corresponding ranks of X and Y = R_X − R_Y
N = total number of pairs of observations X and Y
10.1.3 Correction Term for ρ When Ranks Tie over m Items

If there is a tie involving m items, we have to add (m³ − m)/12 to the term ΣD²:

ρ = 1 − 6(ΣD² + (m³ − m)/12) / (N³ − N)
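Pearson's r (10.1.1) and Spearman's ρ without ties (10.1.2) can be sketched as follows; the paired data in the tests are hypothetical.

```python
import math

# Pearson's r = (N*SXY - SX*SY) / sqrt((N*SXX - SX^2)(N*SYY - SY^2))
def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx, syy = sum(x * x for x in xs), sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# Spearman's rho = 1 - 6*sum(D^2) / (N^3 - N), assuming no tied ranks
def spearman_rho(xs, ys):
    rank = lambda v: [sorted(v).index(x) + 1 for x in v]
    D2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(xs), rank(ys)))
    n = len(xs)
    return 1 - 6 * D2 / (n**3 - n)
```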
10.2 REGRESSION

10.2.1 Regression of Y on X

Y − Ȳ = b_yx (X − X̄)
Y = a + bX

b = b_yx = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²)
a = Ȳ − b X̄

10.2.2 Regression of X on Y

X − X̄ = b_xy (Y − Ȳ)
X = a + bY

b = b_xy = (nΣXY − ΣXΣY) / (nΣY² − (ΣY)²)
a = X̄ − b Ȳ

Let σ_x, σ_y denote the standard deviations of x and y respectively. Then

b_yx = r σ_y / σ_x
b_xy = r σ_x / σ_y

so r² = b_yx × b_xy and r = √(b_yx × b_xy).
This method of regression is very useful for business forecasting.
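The regression of Y on X (10.2.1) can be sketched via the formulas above; the data points chosen are made up so that they lie exactly on a line.

```python
# Least-squares regression of Y on X: slope b_yx and intercept a = Y-bar - b*X-bar
def regress_y_on_x(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope b_yx
    a = sy / n - b * sx / n                       # intercept
    return a, b

a, b = regress_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])   # data lie on y = 1 + 2x
```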
10.3 PARTIAL AND MULTIPLE CORRELATION

10.3.1 PARTIAL CORRELATION

r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²))

r₁₃.₂ = (r₁₃ − r₁₂r₃₂) / √((1 − r₁₂²)(1 − r₃₂²))

r₂₃.₁ = (r₂₃ − r₂₁r₁₃) / √((1 − r₂₁²)(1 − r₁₃²))
10.3.2 MULTIPLE CORRELATION

R₁.₂₃ = √((r₁₂² + r₁₃² − 2r₁₂r₁₃r₂₃) / (1 − r₂₃²))

R₂.₁₃ = √((r₂₁² + r₂₃² − 2r₂₁r₂₃r₁₃) / (1 − r₁₃²))

R₃.₁₂ = √((r₃₁² + r₃₂² − 2r₃₁r₃₂r₁₂) / (1 − r₁₂²))

R₁.₂₃ = R₁.₃₂; the coefficient of multiple correlation lies between 0 and 1.
Chapter – 11 INDEX NUMBERS
METHODS OF CONSTRUCTION OF INDEX NUMBER
1. UNWEIGHTED a. Simple aggregate b. Simple average of Price Relative
2. WEIGHTED a. Weighted Aggregate b. Weighted average of Price Relative
1. UNWEIGHTED
a. Simple Aggregate Method

P₀₁ = (ΣP₁ / ΣP₀) × 100

P₀₁ = price index number for the current year with reference to the base year
ΣP₁ = aggregate of prices for the current year
ΣP₀ = aggregate of prices for the base year
b. Simple Average of Price Relatives Method

P₀₁ = ΣP / N

where P = (P₁ / P₀) × 100
N = number of items
c. Geometric Mean Method

P₀₁ = antilog(Σ log P / N)

where P = (P₁ / P₀) × 100
2. WEIGHTED INDEX NUMBERS

p₀ = price in the base year; q₀ = weight/quantity in the base year
p₁ = price in the current year; q₁ = quantity in the current year
a) Laspeyres' Method

P₀₁(La) = (Σp₁q₀ / Σp₀q₀) × 100

b) Paasche's Method

P₀₁(Pa) = (Σp₁q₁ / Σp₀q₁) × 100
c) Bowley's Method

P₀₁(B) = [(Σp₁q₀/Σp₀q₀ + Σp₁q₁/Σp₀q₁) / 2] × 100 = [P₀₁(La) + P₀₁(Pa)] / 2

d) Fisher's Ideal Formula

P₀₁(F) = √(L × P) = √[(Σp₁q₀/Σp₀q₀) × (Σp₁q₁/Σp₀q₁)] × 100
e) Marshall-Edgeworth Method

P₀₁(Ma) = [Σp₁(q₀ + q₁) / Σp₀(q₀ + q₁)] × 100 = [(Σp₁q₀ + Σp₁q₁) / (Σp₀q₀ + Σp₀q₁)] × 100

f) Kelley's Method

P₀₁(K) = (Σp₁q / Σp₀q) × 100

where q = (q₀ + q₁) / 2
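The Laspeyres, Paasche, and Fisher indices above can be sketched as follows; the two-item basket of prices and quantities is hypothetical.

```python
# Weighted price indices: Laspeyres uses base-year quantities, Paasche
# current-year quantities, and Fisher is their geometric mean.
def laspeyres(p0, p1, q0):
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100

def paasche(p0, p1, q1):
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1)) * 100

def fisher(p0, p1, q0, q1):
    return (laspeyres(p0, p1, q0) * paasche(p0, p1, q1)) ** 0.5

p0, p1 = [10, 20], [12, 22]     # hypothetical base- and current-year prices
q0, q1 = [5, 3], [6, 4]         # hypothetical base- and current-year quantities
```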
DISCRIMINANT ANALYSIS

Matrix addition:

[a₁₁ a₁₂]   [b₁₁ b₁₂]   [a₁₁ + b₁₁   a₁₂ + b₁₂]
[a₂₁ a₂₂] + [b₂₁ b₂₂] = [a₂₁ + b₂₁   a₂₂ + b₂₂]

Matrix multiplication:

[a₁₁ a₁₂]   [b₁₁ b₁₂]   [a₁₁b₁₁ + a₁₂b₂₁   a₁₁b₁₂ + a₁₂b₂₂]
[a₂₁ a₂₂] × [b₂₁ b₂₂] = [a₂₁b₁₁ + a₂₂b₂₁   a₂₁b₁₂ + a₂₂b₂₂]
If A = [a b; c d], then the determinant of A is |A| = ad − bc.
If |A| = 0, then A is a singular matrix; if |A| ≠ 0, then A is a non-singular matrix.

When |A| ≠ 0:

A⁻¹ = (1/|A|) [d −b; −c a] = (1/(ad − bc)) [d −b; −c a]

If I = [1 0; 0 1], then A A⁻¹ = A⁻¹ A = I.

When a matrix A of type m×n and a matrix B of type p×q are multiplied, we obtain a matrix C of type m×q, with the condition that n = p.
POOLED COVARIANCE MATRIX, S

Condition I: variable 1 takes the values p₁, p₂, …, p_m and variable 2 takes the values q₁, q₂, …, q_m.
Condition II: variable 1 takes the values α₁, α₂, …, α_n and variable 2 takes the values β₁, β₂, …, β_n.

Let p̄ be the mean of the values of variable 1 under condition I.
Let q̄ be the mean of the values of variable 2 under condition I.
Let ᾱ be the mean of the values of variable 1 under condition II.
Let β̄ be the mean of the values of variable 2 under condition II.

Let y₁ = (p̄, q̄) and y₂ = (ᾱ, β̄). Then

y₁ − y₂ = (p̄ − ᾱ, q̄ − β̄)
Pooled covariance matrix, S:

S = [1/(m + n − 2)] ×
[ Σᵢ(pᵢ − p̄)² + Σⱼ(αⱼ − ᾱ)²                    Σᵢ(pᵢ − p̄)(qᵢ − q̄) + Σⱼ(αⱼ − ᾱ)(βⱼ − β̄) ]
[ Σᵢ(pᵢ − p̄)(qᵢ − q̄) + Σⱼ(αⱼ − ᾱ)(βⱼ − β̄)    Σᵢ(qᵢ − q̄)² + Σⱼ(βⱼ − β̄)²               ]

(the sums run over i = 1, …, m and j = 1, …, n)

The discriminant coefficients are then obtained as

(λ, µ) = S⁻¹ (p̄ − ᾱ, q̄ − β̄)
FISHER DISCRIMINANT FUNCTION, Z

Z = λy₁ + µy₂

Here, y₁ and y₂ are two different variables, both taking values under the two conditions: under condition I, y₁ takes the values p₁, …, p_m and y₂ the values q₁, …, q_m; under condition II, y₁ takes the values α₁, …, α_n and y₂ the values β₁, …, β_n.

Z_cutoff = (n Z_A + m Z_B) / (m + n)

where Z_A = mean discriminant function under condition I
Z_B = mean discriminant function under condition II
CLUSTER ANALYSIS

For two points P and Q described by binary attributes, count the matches and mismatches:

          Q = 1   Q = 0
P = 1       a       b
P = 0       c       d

a = number of (1, 1) pairs; b = (1, 0); c = (0, 1); d = (0, 0)

Example: a = 6, b = 1, c = 1, d = 2

Simple matching coefficient:

C(P, Q) = (a + d) / (a + b + c + d) = (6 + 2) / (6 + 1 + 1 + 2) = 8/10 = 0.8

Inference: there is 80% similarity between the two points P and Q.
MATCHING COEFFICIENTS WITH CORRECTION TERM

1. Rogers and Tanimoto coefficient of matching

By giving double weight to unmatched pairs of attributes, the matching coefficient with correction term is defined as

C(P, Q) = (a + d) / [(a + d) + 2(b + c)]

Perfect similarity between P and Q occurs when b = c = 0: C(P, Q) = 1.
Maximum dissimilarity between P and Q occurs when a = d = 0: C(P, Q) = 0.
E.g. C(P, Q) = (6 + 2) / [(6 + 2) + 2(1 + 1)] = 8/12 = 0.67, so the estimate of similarity between P and Q is 0.67 or 67%.

2. Sokal and Sneath coefficient of matching

By giving double weight to matched pairs of attributes, the matching coefficient with correction term is defined as

C(P, Q) = 2(a + d) / [2(a + d) + (b + c)]

Perfect similarity between P and Q occurs when b = c = 0: C(P, Q) = 1.
Maximum dissimilarity occurs when a = d = 0: C(P, Q) = 0.
E.g. C(P, Q) = 2(6 + 2) / [2(6 + 2) + 1 + 1] = 16/18 = 0.89. Thus, the similarity between P and Q is estimated as 89%.
COMPARISON OF THE THREE COEFFICIENTS OF SIMILARITY

(a + d) / [(a + d) + 2(b + c)]  ≤  (a + d) / (a + b + c + d)  ≤  2(a + d) / [2(a + d) + (b + c)]

Rogers and Tanimoto coefficient: pessimistic estimate of similarity
Simple matching coefficient: moderate estimate of similarity
Sokal and Sneath coefficient: optimistic estimate of similarity
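The three similarity coefficients and their ordering can be checked on the worked example (a = 6, b = 1, c = 1, d = 2):

```python
# Simple matching, Rogers-Tanimoto, and Sokal-Sneath coefficients
# for binary-attribute counts a, b, c, d.
def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)

def rogers_tanimoto(a, b, c, d):
    return (a + d) / ((a + d) + 2 * (b + c))

def sokal_sneath(a, b, c, d):
    return 2 * (a + d) / (2 * (a + d) + (b + c))

a, b, c, d = 6, 1, 1, 2
# pessimistic <= moderate <= optimistic
assert rogers_tanimoto(a, b, c, d) <= simple_matching(a, b, c, d) <= sokal_sneath(a, b, c, d)
```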