# MEASURES OF CENTRAL TENDENCY Formula

1. Mean for grouped data, using assumed mean with step deviation method

   $\text{Mean} = A + \dfrac{\sum fd}{n} \times c$

   Where –
   A is the assumed mean
   d is the deviation from the assumed mean divided by the common interval
   $\sum fd$ is the summation of frequency × deviations
   c is the class interval
   n is the total frequency

2. Median for grouped data

   $\text{Median} = l + \dfrac{n/2 - cf}{f} \times c$

   Where –
   l is the lower limit of the median class
   n is the total frequency
   cf is the cumulative frequency before the median class
   f is the frequency of the median class
   c is the class interval

3. Mode for grouped data

   $\text{Mode} = l + \dfrac{\Delta_1}{\Delta_1 + \Delta_2} \times c$

   Where –
   l is the lower limit of the modal class
   f is the frequency of the modal class interval
   $\Delta_1$ is f minus the frequency of the pre-modal class
   $\Delta_2$ is f minus the frequency of the post-modal class
   c is the class interval

4. Five number summary comprises –
   a. Smallest observation
   b. First quartile (lower quartile)
   c. Median
   d. Third quartile (upper quartile)
   e. Largest observation
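The three grouped-data formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original handout: the function names and the class frequencies are hypothetical, the classes are assumed to be exclusive and of equal width, and the modal class is assumed to have a higher frequency than at least one neighbour.

```python
from itertools import accumulate


def grouped_mean(lowers, freqs, width, assumed_mean):
    """Step-deviation mean: A + (sum(f*d) / n) * c, where d = (midpoint - A) / c."""
    n = sum(freqs)
    deviations = [(lo + width / 2 - assumed_mean) / width for lo in lowers]
    sum_fd = sum(f * d for f, d in zip(freqs, deviations))
    return assumed_mean + sum_fd / n * width


def grouped_median(lowers, freqs, width):
    """Median = l + ((n/2 - cf) / f) * c, using the first class whose
    cumulative frequency reaches n/2 as the median class."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))
    i = next(k for k, cf in enumerate(cumulative) if cf >= n / 2)
    cf_before = cumulative[i - 1] if i > 0 else 0
    return lowers[i] + (n / 2 - cf_before) / freqs[i] * width


def grouped_mode(lowers, freqs, width):
    """Mode = l + (d1 / (d1 + d2)) * c for the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before   # modal frequency minus pre-modal frequency
    d2 = freqs[i] - after    # modal frequency minus post-modal frequency
    return lowers[i] + d1 / (d1 + d2) * width


# Hypothetical classes 0-10, 10-20, ..., 40-50 (not taken from the problems).
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 15, 16, 6]

mean = grouped_mean(lowers, freqs, width=10, assumed_mean=25)
median = grouped_median(lowers, freqs, width=10)
mode = grouped_mode(lowers, freqs, width=10)
print(mean, median, mode)
```

Any class midpoint can serve as the assumed mean; the result does not depend on the choice, which is a convenient check when working by hand.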

Statistics

Page 1

OBJECTIVE QUESTIONS
Choose the best answer / Fill in the blanks / True or False –

1. If the classes are of the form 0 – 10, 10 – 20, etc. they are called _______________ classes
2. If the classes are of the form 1 – 10, 11 – 20, etc. they are called _________________ classes
3. If the classes are of the form 0 – 10, 10 – 20, etc. an item of value 10 will be entered in –
   a. Class 0 – 10
   b. Class 10 – 20
   c. Either of the above
   d. None of the above
4. If the classes are of the form 0 – 10, 10 – 20, etc. the class interval is ____________
5. If the classes are of the form 0 – 10, etc. the mid point of class is ____________
6. Number of observations falling within a class is called – Class _____________
7. Ogive means –
   a. Cumulative frequency curve
   b. Frequency Curve
   c. Mathematical Average
   d. Arithmetic Mean
8. Data can be in ________________________ or _____________________ form.
9. The measures of central tendency are ______________, ______________ & _________________
10. Mean, Median and Mode are known as –
   a. Measures of Central Tendency
   b. Measures of Dispersion
   c. Measures of Middle Values
   d. Measures of Mathematical Averages
11. If all the items in a distribution are of the same value, then –
   a. Mean = Median = Mode
   b. Mean > Median > Mode
   c. Mean < Median < Mode
   d. Mean + Median = Mode
12. The sum of deviations of all observations from the Arithmetic Mean is ____________
13. In a symmetrical distribution –
   a. Mean = Median = Mode
   b. Mean > Median > Mode
   c. Mean < Median < Mode
   d. Mean + Median = Mode


14. Empirical formula about measures of central tendency given by Karl Pearson for an asymmetrical distribution is –
   a. Mean – Mode = 3 (Mean – Median)
   b. 2 Mode = (Mean + Median)
   c. 2 Mean = (Mode + Median)
   d. 2 Median = (Mode + Mean)
15. Quartiles are _____________________
16. Percentiles are ____________________
17. Deciles are _____________________
18. True or False
   a. The following measures are affected when the highest value in a set of observations is altered
   b. The following measures are affected when the lowest value in a set of observations is altered
   c. The following measures are affected when the highest value and the lowest value in a set of observations are altered
   d. The following measures are affected when each value in a set of observations is increased or decreased by a constant value
   e. The following measures are affected when each value in a set of observations is multiplied or divided by a constant value

| Measure | a | b | c | d | e |
|---|---|---|---|---|---|
| Mean | | | | | |
| Median | | | | | |
| Mode | | | | | |


PROBLEMS
CALCULATE THE MEASURES OF CENTRAL TENDENCY AND THE FIVE NUMBER SUMMARY FOR THE FOLLOWING DATA

1. Data pertaining to marks of students and ages of people is given below
   a. Marks of students in a test are 48, 60, 59, 67, 66, 78
   b. Ages of people in a group are 70, 72, 63, 56, 37, 82, 55, 85, 63

2. Cycle test marks of students are given below –

| Class A | 55 | 58 | 64 | 70 | 75 | 72 |
|---|---|---|---|---|---|---|
| Class B | 45 | 35 | 64 | 60 | 58 | |

3. Data pertaining to workers and their wages is given below

| Wages (Rs) | 35 | 45 | 55 | 65 | 75 |
|---|---|---|---|---|---|
| No. of Workers | 19 | 12 | 15 | 10 | 14 |

4. Monthly income of 100 families is given below –

| Monthly Income (Rs) | No. of Families |
|---|---|
| Less than 10 | 5 |
| Less than 20 | 12 |
| Less than 30 | 26 |
| Less than 40 | 44 |
| Less than 50 | 64 |
| Less than 60 | 78 |
| Less than 70 | 87 |
| Less than 80 | 94 |
| Less than 90 | 100 |

5. Data pertaining to students and their marks is given below

| Marks | 0 – 9 | 10 – 19 | 20 – 29 | 30 – 39 | 40 – 49 | 50 – 59 |
|---|---|---|---|---|---|---|
| No. of Students | 1 | 3 | 19 | 10 | 15 | 2 |


# MEASURES OF DISPERSION

Range and Coefficient of Range (Individual Data, Discrete Data and Grouped Data)

Range $= L - S$

Coefficient of Range $= \dfrac{L - S}{L + S}$

Explanation –
Individual Data: L is Maximum Value, S is Minimum Value
Discrete Data: L is Maximum Value, S is Minimum Value
Grouped Data: L is mid-value of largest class, S is mid-value of smallest class. No class should be open-ended. If it is an inclusive class, it should be converted to exclusive classes.

Quartile Deviation and Coefficient of Quartile Deviation (Individual Data, Discrete Data and Grouped Data)

Quartile Deviation $= \dfrac{Q_3 - Q_1}{2}$

Coefficient of Quartile Deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

Explanation –
Individual Data: $Q_1 = \dfrac{N + 1}{4}$th item and $Q_3 = \dfrac{3(N + 1)}{4}$th item
Discrete Data: $Q_1$ is the x value of the $\dfrac{N + 1}{4}$th item and $Q_3$ is the x value of the $\dfrac{3(N + 1)}{4}$th item
Grouped Data:

$Q_1 = l + \dfrac{\frac{n}{4} - cf}{f} \times c$

Where l = lower limit of Q1 class, n = total no. of observations, cf = cumulative frequency till class preceding Q1 class, c = class size, f = frequency of Q1 class

$Q_3 = l + \dfrac{\frac{3n}{4} - cf}{f} \times c$

Where l = lower limit of Q3 class, n = total no. of observations, cf = cumulative frequency till class preceding Q3 class, c = class size, f = frequency of Q3 class
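The range and quartile formulas above can be sketched in Python as below. The function names and data are hypothetical, and linear interpolation between neighbouring items for a fractional position such as the 2.75th item is an assumption, since the formulas above only name the item.

```python
from itertools import accumulate


def item_at(sorted_values, position):
    """Value of the position-th item (1-based); a fractional position is
    interpolated linearly between its two neighbours (assumed convention)."""
    position = min(max(position, 1), len(sorted_values))
    whole = int(position)
    fraction = position - whole
    value = sorted_values[whole - 1]
    if fraction:
        value += fraction * (sorted_values[whole] - value)
    return value


def dispersion_individual(values):
    """Range, quartile deviation and their coefficients for individual data."""
    data = sorted(values)
    n = len(data)
    largest, smallest = data[-1], data[0]
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return {
        "range": largest - smallest,
        "coefficient_of_range": (largest - smallest) / (largest + smallest),
        "quartile_deviation": (q3 - q1) / 2,
        "coefficient_of_qd": (q3 - q1) / (q3 + q1),
    }


def grouped_quartile(lowers, freqs, width, k):
    """Q_k = l + ((k*n/4 - cf) / f) * c for k = 1 or 3, exclusive classes."""
    n = sum(freqs)
    target = k * n / 4
    cumulative = list(accumulate(freqs))
    i = next(j for j, cf in enumerate(cumulative) if cf >= target)
    cf_before = cumulative[i - 1] if i > 0 else 0
    return lowers[i] + (target - cf_before) / freqs[i] * width


# Hypothetical data, not taken from the problems.
summary = dispersion_individual([12, 15, 20, 28, 30, 40, 50])
q1 = grouped_quartile([0, 10, 20, 30, 40], [5, 8, 15, 16, 6], 10, k=1)
q3 = grouped_quartile([0, 10, 20, 30, 40], [5, 8, 15, 16, 6], 10, k=3)
print(summary, q1, q3)
```

With seven observations the quartile positions are the 2nd and 6th items, so no interpolation is needed in this example.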

OBJECTIVE QUESTIONS
Choose the best answer / Fill in the blanks / True or False

1. The measure of degree of scatter of the data from the central value is
   a. Dispersion
   b. Skewness
   c. Average
   d. Mean
2. ______________ is the difference between the largest and the smallest value of the variable
3. Quartile deviation is otherwise called as –
   a. Quartile Range
   b. Inter quartile range
   c. Intra quartile range
   d. Semi inter quartile range
4. Mean deviation is otherwise called as –
   a. Average deviation
   b. Dispersion
   c. Difference
   d. Zero sum
5. The relative measure of standard deviation is called ___________________________________
6. Square of standard deviation is called _______________________________
7. Sum of squares of deviation is minimum when taken from ___________________
8. Sum of absolute deviation is minimum when taken from _____________
9. Inter quartile range is
   a. Q3 – Q1
   b. Q1 – Q2
   c. Q2 – Q1
   d. Q3 – Q2
10. True or False
   a. The following measures are affected when the highest value in a set of observations is altered
   b. The following measures are affected when the lowest value in a set of observations is altered
   c. The following measures are affected when the highest value and the lowest value in a set of observations are altered
   d. The following measures are affected when each value in a set of observations is increased or decreased by a constant value
   e. The following measures are affected when each value in a set of observations is multiplied or divided by a constant value

| Measure | a | b | c | d | e |
|---|---|---|---|---|---|
| Range | | | | | |
| Mean Deviation | | | | | |
| Quartile Deviation | | | | | |
| Standard Deviation | | | | | |
| Variance | | | | | |

PROBLEMS
CALCULATE THE MEASURES OF DISPERSION FOR THE FOLLOWING DATA

1. The following are the runs scored by two cricketers in 10 innings.
   a. Find which batsman is a better player
   b. Find out which batsman is more consistent (more reliable)

| Batsman I | 16 | 8 | 24 | 56 | 90 | 104 | 48 | 32 | 8 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|
| Batsman II | 42 | 56 | 43 | 37 | 31 | 45 | 50 | 29 | 30 | 27 |

2. Heights of 60 students in a class are as below.

| Height (in cms) | 152.5 | 153 | 153.5 | 154 | 155 | 155.5 | 157.5 | 158 | 159.5 |
|---|---|---|---|---|---|---|---|---|---|
| No. of Students | 3 | 9 | 7 | 13 | 8 | 6 | 7 | 5 | 2 |

3. A factory produced two types of electric bulbs A and B. In a study about the life of bulbs, the following results were obtained
   a. Find which type of bulb is long lasting
   b. Find out which type of bulb is more variable

| Length of Life (in hours) | 60 – 80 | 80 – 100 | 100 – 120 | 120 – 140 | 140 – 160 |
|---|---|---|---|---|---|
| A (no. of bulbs) | 10 | 22 | 52 | 20 | 16 |
| B (no. of bulbs) | 8 | 60 | 24 | 16 | 12 |

# CORRELATION AND REGRESSION

1. Correlation measures the degree of relationship between two or more variables
   a. The symbol for measuring correlation is ‘r’
   b. ‘r’ lies between -1 and +1
   c. Correlation is independent of origin and scale
   d. Correlation is symmetric with respect to the variables
   e. It is independent of units
   f. Correlation means relationship and not causation
2. Understanding why association exists
   a. Dependency
   b. Nature and strength of association
   c. Causation
   d. Coincidental relationship
   e. Influence of other variables
3. Important types of correlation are –
   a. Positive and negative correlation
   b. Linear and non-linear correlation
   c. Simple, partial and multiple correlation

• Lag and lead in correlation
   a. Difference in periods for cause and effect relationship to be established is known as lag and lead
   b. Advertisement and marketing expenses may lead to sales with a lag
   c. Additional supply of materials today may lead to reduction in prices after some time
   d. Effect of increase in income may lead to increase in expenditure and savings after a period
   e. Boom in agricultural produce may lead to increase in industrial output after a gap of time

• Regression
   a. Regression is a functional relationship between the values of 2 variables
   b. With the help of regression lines we can predict the most likely value of one variable given the other
   c. If x and y are two variables, then y can be represented as equal to ax + b or x is equal to cy + d where a, b, c, and d are constants. These are known as linear regression equations
   d. Rate of change of one variable to unit change in other variable is called regression coefficient
   e. The regression lines intersect at $(\bar{x}, \bar{y})$ where $\bar{x}$ and $\bar{y}$ are mean of x and y respectively
   f. If r = 0, then the regression lines will be perpendicular to each other
   g. If r = ± 1, then the regression lines will coincide
   h. r is the geometric mean of the regression coefficients
   i. Both the regression coefficients are either positive or negative
   j. At least 1 regression coefficient must be numerically less than unity
   k. Regression coefficients are independent of origin but not scale

Formula

1. Methods of Correlation

a. Karl Pearson’s Coefficient of Correlation

Assumed mean method

$r = \dfrac{N \sum dx\,dy - \sum dx \sum dy}{\sqrt{N \sum dx^2 - (\sum dx)^2}\,\sqrt{N \sum dy^2 - (\sum dy)^2}}$

Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations

Direct method

$r = \dfrac{N \sum xy - \sum x \sum y}{\sqrt{N \sum x^2 - (\sum x)^2}\,\sqrt{N \sum y^2 - (\sum y)^2}}$

Where x is all values of x and y is all values of y and N is the number of observations
(Note: Karl Pearson’s coefficient of correlation is also called product moment correlation)

b. Spearman’s Rank Correlation

WHEN RANKS ARE NOT GIVEN OR UNEQUAL RANKS GIVEN

$R = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$

Where d is difference of ranks of x and y variable and n is number of observations

WHEN RANKS ARE EQUAL

$R = 1 - \dfrac{6\left(\sum d^2 + \sum \frac{1}{12}(m_i^3 - m_i)\right)}{n(n^2 - 1)}$

Where d is difference of ranks of x and y variable and n is number of observations and $m_i$ is number of times a rank is repeated in the first or second variable

c. Two way Frequency Table

$r = \dfrac{N \sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{N \sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N \sum f\,dy^2 - (\sum f\,dy)^2}}$

Steps –
Take step-deviations of x and y from assumed mean and denote them dx and dy
Multiply dx and dy and the frequency of each cell and note the figure in upper right hand corner of each cell
Add all values of fdxdy and obtain ∑fdxdy
Multiply frequencies of variable x by deviations of variable x and obtain ∑fdx
Take square of deviations from variable x and multiply by frequencies to obtain ∑fdx²
Multiply frequencies of variable y by deviations of variable y and obtain ∑fdy
Take square of deviations from variable y and multiply by frequencies to obtain ∑fdy²
Substitute the values in the formula to obtain r

d. Concurrent Deviation Method

$R = \pm\sqrt{\pm\dfrac{2C - n}{n}}$

Where C is the number of concurrent deviations (pairs in which the sign of change from the previous value is the same for x and y) and n is the number of pairs of deviations compared (one less than the number of pairs observed). The sign inside and outside the root is the sign of (2C − n).

4. Probable Error

$PE = 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$

Where r is correlation and n is number of pairs observed

$SE = \dfrac{1 - r^2}{\sqrt{n}}$

Where r is correlation and n is number of pairs observed

ρ (Rho) is r ± PE

5. Calculation of Regression Equation

a. $(x - \bar{x}) = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

Where $\bar{x}$ and $\bar{y}$ are means of x and y respectively and $r\,\dfrac{\sigma_x}{\sigma_y}$ is called the regression coefficient of x on y

b. $(y - \bar{y}) = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

Where $\bar{x}$ and $\bar{y}$ are means of x and y respectively and $r\,\dfrac{\sigma_y}{\sigma_x}$ is called the regression coefficient of y on x

c. Fitting a straight line y on x –

Equation is Y = a + bX

$\sum y = na + b \sum x$
$\sum xy = a \sum x + b \sum x^2$

Where if we solve for ‘a’ and equate the 2 equations, we will get the value of b as mentioned below

$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{N \sum dx\,dy - \sum dx \sum dy}{N \sum dx^2 - (\sum dx)^2}$

Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations

d. Fitting a straight line x on y –

$b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{N \sum dx\,dy - \sum dx \sum dy}{N \sum dy^2 - (\sum dy)^2}$

Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations

e. Fitting a parabolic curve or a second degree equation –

Equation is Y = a + bX + cX²

$\sum y = na + b \sum x + c \sum x^2$
$\sum xy = a \sum x + b \sum x^2 + c \sum x^3$
$\sum x^2 y = a \sum x^2 + b \sum x^3 + c \sum x^4$

f. Multiple Regression Equations

For 3 variables, equation is X = a + bY + cZ

$\sum x = na + b \sum y + c \sum z$
$\sum xy = a \sum y + b \sum y^2 + c \sum yz$
$\sum xz = a \sum z + b \sum yz + c \sum z^2$

Similarly, it can be done for N variables.
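The direct-method correlation formula, the normal equations for a straight line, and the rank correlation formula without ties can be sketched in Python as below. The function names and the five data pairs are hypothetical; the sketch also checks the property noted in this chapter that r is the geometric mean of the two regression coefficients.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Direct method: (N*Sxy - Sx*Sy) / (sqrt(N*Sxx - Sx^2) * sqrt(N*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


def regression_line(xs, ys):
    """Solve the normal equations of Y = a + bX; returns (a, b) for ys on xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


def spearman_no_ties(rank_x, rank_y):
    """R = 1 - 6*sum(d^2) / (n*(n^2 - 1)) when no rank is repeated."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data, not taken from the problems.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
a, b_yx = regression_line(x, y)    # y on x
_, b_xy = regression_line(y, x)    # swapping the roles gives x on y
rank_r = spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, a, b_yx, b_xy, rank_r)
```

Because the correlation coefficient is independent of origin, the same `pearson_r` function gives the same answer when deviations from an assumed mean are passed in place of the raw values.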

OBJECTIVE QUESTIONS
CHOOSE THE BEST ANSWER / FILL IN THE BLANKS / TRUE OR FALSE

1. An analysis of the relationship among two or more variables is called
   a. Correlation
   b. Skewness
   c. Dispersion
   d. Kurtosis
2. Perfect negative correlation is when r = _________
3. Perfect positive correlation is when r = _________
4. Completely no correlation is when r = ________
5. If x and y are independent, then correlation between them is _________
6. If the decrease in one variable influences the decrease in the other, it is called _______________ correlation
7. If the decrease in one variable influences the increase in the other, it is called _______________ correlation
8. If the ratio between two sets of variables is same, then it is called _____________________ correlation
9. Curvilinear correlation is
   a. Linear correlation
   b. Non-linear correlation
   c. Simple correlation
   d. Special correlation
10. Change of scale in value of x or y series will –
   a. Affect the value of ‘r’ very much
   b. Not affect the value of ‘r’
   c. Affect the value of ‘r’ slightly
   d. Increase or decrease the value of r proportional to the change of scale
11. State the nature of correlation that exists between the following variables –
   a. The amount of rainfall and the yield of crops
   b. The color of an employee’s dress and the employee’s salary
   c. Age of applicants for life insurance and the annual premium payable
   d. Sale of raincoats and the sale of umbrellas
12. Correlation value lies between ____________ and ________
13. Coefficient of determination is _________ and coefficient of non-determination is ____________
14. State true or false
   a. Correlation coefficient is unaffected by shift in origin
   b. Covariance between 2 variables is always positive
   c. Rank correlation lies between 0 and 1
   d. If one set of values is removed, then coefficient of correlation for the remaining pairs remains unchanged
   e. If correlation between 2 variables is 0, then the variables are independent
15. Do the following items have positive, negative or zero correlation
   a. Price and demand
   b. Age and life expectancy
   c. Age of husband and wife
   d. Income and savings of a person

PROBLEMS
CALCULATE CORRELATION FOR THE FOLLOWING DATA

1. Find the correlation and also regression equations between advertisement expenses and sales of a particular brand of icecream Dippy-Dip

| Month | Jan | Feb | Mar | Apr | May | Jun |
|---|---|---|---|---|---|---|
| Advt. Exp (Rs 000s) | 20 | 25 | 28 | 32 | 36 | 34 |
| Sales (Rs lakhs) | 30 | 36 | 40 | 42 | 45 | 40 |

2. Find correlation and also regression equations between marks in statistics and accounting of a particular group of students

| Roll No of student | 101 | 102 | 103 | 104 | 105 |
|---|---|---|---|---|---|
| Statistics marks | 45 | 66 | 58 | 74 | 81 |
| Accounting marks | 79 | 56 | 61 | 48 | 40 |

3. Find correlation and regression equations between age of cars and annual maintenance cost

| Age of cars | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| Annual maintenance cost | 1600 | 1500 | 1800 | 1700 | 2100 |

4. Find rank correlation between marks in test and marks in interview of a group of candidates in a job selection procedure

| Marks in Test | 24 | 33 | 33 | 42 | 53 | 60 | 60 | 60 | 71 | 75 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in Interview | 38 | 40 | 44 | 50 | 49 | 45 | 52 | 50 | 55 | 68 |

5. Find correlation between percentage score given by 2 judges
   X – Percentage score by Judge A
   Y – Percentage score by Judge B

   Y \ X classes: 50 – 60, 60 – 70, 70 – 80, 80 – 90, 90 – 100
   Y = 60 – 70: 4 3
   Y = 70 – 80: 2 5 3 3
   Y = 80 – 90: 2 3 3 5 5
   Y = 90 – 100: 3 6 3

6. Excel Pharma has launched a new preventive medicine for the treatment of Swine Flu. Excel Pharma is claiming a very high success rate on use of their medicine. The data below is the effect on 100 patients who have taken the medicine against 100 patients who have not taken the medicine and are being admitted to the hospital with viral infection. 98% are free from Swine Flu in the first case vs. 21% who are infected with Swine Flu in the second case. Comment

7. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be the sensex value in Oct 2010, if the gold price will increase by 10% for diwali purchase season?

| MONTH | JAN 10 | FEB 10 | MAR 10 | APR 10 | MAY 10 | JUN 10 | JUL 10 | AUG 10 | SEP 10 |
|---|---|---|---|---|---|---|---|---|---|
| 24 Ct Gold Price/gm | 1500 | 1550 | 1600 | 1620 | 1700 | 1750 | 1800 | 1850 | 1900 |
| Sensex | 14000 | 15000 | 1550 | 15500 | 16000 | 17000 | 17500 | 18000 | 18500 |

8. Find the multiple linear regression equation of X on Y and Z from the data given below –

| X | 2 | 4 | 6 | 8 |
|---|---|---|---|---|
| Y | 3 | 5 | 7 | 9 |
| Z | 4 | 6 | 8 | 10 |

9. (Please find below an article printed in the front page of Chennai Times)

Chennai
During our recent investigations, it was found that five Chennai cricket players, Sairam, Sandeep, Sankar, Sundar, and Suresh are deeply involved with the betting syndicate. It has been confirmed by our sources that these players willfully underperformed in the recently concluded ODI series against the Bangalore team. In the table below are the batting scores of these five players along with the team score and the result of the matches in the recently concluded Friendship series.

| Player | Career Batting Average | 1st ODI | 2nd ODI | 3rd ODI | 4th ODI | 5th ODI |
|---|---|---|---|---|---|---|
| Sairam | 28 | 41 | 19 | 12 | 33 | 30 |
| Sandeep | 26 | 17 | 19 | 17 | 71 | 10 |
| Sankar | 41 | 33 | 42 | 39 | 36 | 45 |
| Sundar | 85 | 89 | 112 | 58 | 90 | 67 |
| Suresh | 34 | 0 | 3 | 2 | 1 | 1 |
| Team Chennai | 224 | 272 | 212 | 171 | 265 | 178 |
| Result | 60% WON | WON | LOST | LOST | WON | LOST |

Further, it was predicted by the paper in a letter to the board that the players will under perform in their matches against Mumbai also and the prediction factor was given to the Chennai Police much in advance before the actual matches were played. The table contains scores calculated by the prediction factor vs. actual scores for the five Chennai players in the one off ODI match against Mumbai

| Player | Sairam | Sandeep | Sankar | Sundar | Suresh |
|---|---|---|---|---|---|
| Predicted score | 36 | 74 | 41 | 87 | 4 |
| Actual score | 35 | 73 | 40 | 90 | 3 |

Please give your comments about these investigations and the truth in the allegations against the players.

# TIME SERIES

Time Series - It is arrangement of data according to time of occurrence in chronological order. Any series of measurement that is variable over time is called Time series.

Utility of Time Series
• Analysis
   o Past behavior
   o Effect of Factors
   o Help predict future behavior
• Forecasting
   o Help make future plan of action
• Evaluation
   o Evaluation of current achievements
• Comparison
   o Scientific basis for making comparisons
   o Isolating effects of various components

Components of Time Series
• Long term
   o Secular Trend (T) - General Trend to increase or decrease over a period of time
   o Cyclic Variations (C) - Oscillatory movements with periods greater than 1 year. Usually may last 7-9 years
• Short Term
   o Seasonal Variation (S) - Movements due to forces which are usually rhythmic in nature and within a year
   o Irregular Variations (I) - No regular period of occurrence and accidental changes, purely random, unforeseen and unpredictable

Mathematical Models
• Additive Model Y = T + S + C + I
   o Components are independent of each other
   o Different components are expressed in original units and are residuals
   o S, C & I are expressed as deviations from T
• Multiplicative Model Y = T * S * C * I
   o S, C & I are expressed as ratios or in percentages
   o Components may be dependent on each other
   o Mostly used in real life practice

• Preliminary adjustments before Analyzing Time Series
   o Time Variation - Adjusting for no. of days in a month
   o Population Variation - Adjust for variables affected by population like per capita income
   o Price Changes - Use real values rather than nominal values
   o Comparability - Make data homogeneous and comparable
   o Miscellaneous Changes

Measurement of Trend

Freehand or Graphic Method
• Simplest and Most Flexible Method
• First step to plot points on a paper
• Then, draw a freehand smooth curve through points
• Number of points above curve and below curve should be equal
• Total deviations should be zero
• Sum of square of deviations should be the minimum possible

Merits and Demerits of Graphic Method
Merits
• Simple and time saving
• No mathematical calculation required
• Very flexible
Demerits
• Highly subjective
• Hence, not suitable for forecasting and decision making

Method of Semi Averages
• Semi averages are the averages of two halves of a series
• Whole data is classified into two equal parts with respect to time

Merits and Demerits of method of Semi Averages
Merits
• Simple method
• Trend figures are objective
• Line can be extended to obtain future estimates
Demerits
• Assumption of linear trend
• Affected by extreme values and use of arithmetic mean
• Obtained and predicted values are not precise and reliable
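The method of semi averages described above can be sketched in Python as below. The function name and the six yearly figures are hypothetical, and two conventions are assumed: each semi average is plotted at the middle of its half, and with an odd number of periods the middle observation is left out.

```python
def semi_average_trend(times, values):
    """Return (slope, trend_function) for the straight line joining the
    averages of the first and second halves of the series."""
    n = len(values)
    half = n // 2
    # With an odd number of periods the middle observation is left out (assumed convention).
    t1 = sum(times[:half]) / half          # centre of the first half
    v1 = sum(values[:half]) / half         # first semi average
    t2 = sum(times[n - half:]) / half      # centre of the second half
    v2 = sum(values[n - half:]) / half     # second semi average
    slope = (v2 - v1) / (t2 - t1)

    def trend(t):
        return v1 + slope * (t - t1)

    return slope, trend


# Hypothetical yearly output figures, not taken from the problems.
years = [2001, 2002, 2003, 2004, 2005, 2006]
output = [10, 12, 14, 16, 20, 24]

slope, trend = semi_average_trend(years, output)
print(slope, trend(2007))   # extending the line gives a future estimate
```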

Method of Moving Averages
• Method helps to reduce fluctuations and obtain trend values with fair degree of accuracy
• Method consists of taking arithmetic mean of the values for a certain time span and placing at the centre of time span
• In case of an even period, the centered moving average has to be found
• In some cases, weights may be given to the moving averages called weighted moving average

Merits and Demerits of Method of Moving Averages
Merits
• Simple and Objective method
• Flexible to add additional data without affecting calculations
• If period of moving average coincides with period of cyclical fluctuations, then they are automatically eliminated
Demerits
• No trend values for some initial and end periods
• No functional relationship between value and time
• Difficulty in selecting period of moving average
• Bias in case the trend is non-linear

Method of Least squares
• As sum of deviations from mean is zero, sum of deviations from line of best fit is zero
• The sum of squares of these deviations is the least possible; hence, called as method of least squares or best fit
• Y = a + bX where ‘a’ and ‘b’ are constants

Merits and demerits of Method of least squares
Merits
• Trend line for entire period
• Functional relationship between time and value
• Objective method
Demerits
• Requires many calculations and is complicated
• Seasonal, cyclical or irregular variations are ignored
• If even a single data pair is added, a new equation has to be formed

Other Methods of obtaining trends
• Fitting a Second Degree Trend or a parabolic trend Y = a + bX + cX² where a, b, and c are constants
• Fitting an exponential trend Y = a b^X where a and b are constants
• Exponential smoothing average
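The moving average and least squares methods above can be sketched in Python as below. The function names and the six-period series are hypothetical; the series is deliberately linear so that both methods should reproduce it exactly, which makes the output easy to check by hand.

```python
def moving_average(values, period):
    """Return (position, average) pairs, position being the 0-based index of
    the period the average is placed against. For an even period the averages
    are centered by averaging each adjacent pair of simple moving averages."""
    simple = [sum(values[i:i + period]) / period
              for i in range(len(values) - period + 1)]
    if period % 2 == 0:
        simple = [(a + b) / 2 for a, b in zip(simple, simple[1:])]
    offset = period // 2
    return [(offset + i, avg) for i, avg in enumerate(simple)]


def least_squares_line(xs, ys):
    """Solve the normal equations sum(y) = n*a + b*sum(x) and
    sum(xy) = a*sum(x) + b*sum(x^2) for the trend line Y = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


# Hypothetical series, not taken from the problems.
values = [2, 4, 6, 8, 10, 12]

three_period = moving_average(values, 3)
four_period = moving_average(values, 4)      # centered moving average
a, b = least_squares_line(list(range(len(values))), values)
print(three_period, four_period, a, b)
```

The printed positions show the demerit listed above: the 4-period centered average has no trend value for the first two and last two periods.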

Selection of type of trend
• If first differences are constant, use linear method
• If second differences are constant, use quadratic method
• If first differences of logarithm are constant, use exponential curve
• If first differences tend to decrease by a constant percentage, use modified exponential curve

Methods of measuring Seasonal Variations

Method of Simple Averages
• Arrange seasonal data across given periods
• Find average of data for same season
• Find average of averages
• Get percentage weights for various seasons
• It is simple to find but there is an assumption that there is almost no cyclical or irregular variation or that it is of negligible value

Ratio to Trend Method
• Arrange seasonal data across given periods
• Using a suitable method, find seasonal trend values for annual data and then seasonal data
• Get percentage for actual seasonal data by dividing actual data / trend values
• Find Seasonal Index which is average of percentages
• If total of seasonal index is more or less than 1200 (monthly data) or 400 (quarterly data), adjustment correction factor = 1200 or 400 / (Total SI)

Ratio to Moving Average method
• First take a centered moving average
• Get percentage for actual seasonal data by dividing actual data / centered moving average
• Arrange percentage data seasonally and take average
• If total of seasonal index is more or less than 1200 (monthly data) or 400 (quarterly data), adjustment correction factor = 1200 or 400 / (Total SI)

De-Seasonalisation of Data
• Elimination of seasonal variation is called as de-seasonalisation of data
• Either additive or multiplicative models are used

Measurement of cyclical variations
Residual Method
• Eliminate Trends and Seasonal Variations from the original data using additive or multiplicative models
• Irregular variations are removed from this data by using the method of moving averages of appropriate period
• Cyclical variations are the only variations left and can be measured now

Measurement of Irregular variations
• Using additive or multiplicative models by removing trend, seasonal or cyclical variations
• They are found to be of small magnitude

Forecasting of Data

Qualitative Forecasting
• When historical data are not available

Quantitative Forecasting
• When historical data available
• Causal forecasting methods
• Time Series forecasting methods

Forecasting methods using time series
• Mean forecast
• Naive forecast
• Linear Trend Forecast
• Non-Linear Trend Forecast
• Forecasting with Exponential Smoothing
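Two of the procedures listed above can be sketched in Python as below: seasonal indices by the method of simple averages (with the 400 or 1200 correction step used by the ratio methods) and a forecast by exponential smoothing. The function names and data are hypothetical. The smoothing recurrence, new forecast = alpha × actual + (1 − alpha) × previous forecast, started from the first observation, is the common textbook form and is an assumption here, since the notes above only name the method.

```python
def seasonal_indices_simple_average(table):
    """table holds one row per year and one column per season.
    Index = (season average / average of the season averages) * 100."""
    seasons = len(table[0])
    season_avg = [sum(row[s] for row in table) / len(table) for s in range(seasons)]
    grand_avg = sum(season_avg) / seasons
    return [avg / grand_avg * 100 for avg in season_avg]


def adjust_indices(indices, target):
    """Scale the indices by the correction factor target / (Total SI);
    target is 400 for quarterly data and 1200 for monthly data."""
    factor = target / sum(indices)
    return [index * factor for index in indices]


def exponential_smoothing_forecast(values, alpha):
    """Forecast for the next period (assumed form: the first forecast equals
    the first observation, then F = alpha*actual + (1 - alpha)*F)."""
    forecast = values[0]
    for actual in values:
        forecast = alpha * actual + (1 - alpha) * forecast
    return forecast


# Hypothetical quarterly data for two years, not taken from the problems.
quarterly = [[10, 20, 30, 40],
             [20, 30, 40, 50]]

indices = seasonal_indices_simple_average(quarterly)
adjusted = adjust_indices([98, 102, 95, 101], target=400)
next_value = exponential_smoothing_forecast([10, 12, 14], alpha=0.5)
print(indices, adjusted, next_value)
```

Indices from the method of simple averages always total 400 (or 1200) by construction, so the correction step matters only for the ratio to trend and ratio to moving average methods.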

Objective Questions
CHOOSE THE BEST ANSWER / FILL IN THE BLANKS / TRUE OR FALSE

1. With which form of time series would you associate the following –
   a. A fire in the factory delaying production for three weeks
   b. Need for increased wheat production due to rise in the population
   c. Change in day temperature from winter to summer
   d. Increase in employment during harvest time
   e. Price hike in petroleum products due to Gulf war
2. Fill in the blanks
   a. An overall rise or fall in a time series is called ____________
   b. A time series consists of data arranged in _________________ order
   c. The additive model is expressed as Y = ________________________
   d. The multiplicative model is expressed as Y = ________________________
   e. The trend line obtained by the method of least squares is known as line of __________
   f. The component of time series useful for long-term forecasting is _____________
   g. For the annual data _______________________ component of time series is missing
   h. If growth rate is constant, the trend line is _____________
   i. A polynomial of the form Y = a + bX + cX² is called _______________________
   j. Trend is the overall tendency of the time series data to _____________ or _______________ over a long period of time
   k. Seasonal variations are variations with periods of _________________ and are mostly caused by _________________
3. Choose the correct answer
   a. Trend refers to a long term tendency to
      i. Increase only
      ii. Decrease only
      iii. Increase or Decrease
      iv. None of the above
   b. If trend is absent in a time series, seasonal indices are obtained by using
      i. Method of simple averages
      ii. Ratio to trend method
      iii. Ratio to moving average method
      iv. Method of least squares
   c. The most widely used method of measuring seasonal variations is
      i. Method of simple averages
      ii. Ratio to trend method
      iii. Ratio to moving average method
      iv. Link relative method
   d. The method used in the study of cyclical variations is
      i. Ratio to trend method
      ii. Ratio to moving average method
      iii. Link relative method
      iv. Residual method

PROBLEMS

Find trend lines for the following data by
   a. Semi Averages method
   b. Moving Averages method
   c. Weighted Moving Averages method
   d. Least Squares method

1. Assume a 4 yearly cycle with equal weights

| Year | 1970 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 53 | 79 | 76 | 66 | 69 | 94 | 105 | 87 | 79 | 104 | 97 | 92 | 101 | 105 |

2. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be the sensex value in Oct 2010, if the gold price will increase by 10% for diwali purchase season?

| Month | Jan 10 | Feb 10 | Mar 10 | Apr 10 | May 10 | Jun 10 | Jul 10 | Aug 10 | Sep 10 |
|---|---|---|---|---|---|---|---|---|---|
| 24 Ct Gold Price/gm | 1500 | 1550 | 1600 | 1620 | 1700 | 1750 | 1800 | 1850 | 1900 |
| Sensex | 14000 | 15000 | 1550 | 15500 | 16000 | 17000 | 17500 | 18000 | 18500 |

Find seasonal indices for the following data by
   a. Method of simple averages
   b. Ratio to trend method
   c. Ratio to moving average method
   d. Link Relative method

3. Output of Coal in Million Tonnes

| Year | 2005 | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|---|
| Q1 | 73 | 70 | 73 | 75 | 65 |
| Q2 | 67 | 63 | 68 | 64 | 60 |
| Q3 | 66 | 61 | 68 | 61 | 56 |
| Q4 | 68 | 66 | 72 | 67 | 63 |

4. Monthly data pertaining to rice production in lakhs of tonnes for the period of Jan 2007 to Dec 2009

| Month | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 | 16 | 15 | 14 | 18 | 17 | 19 | 20 | 17 | 16 | 14 | 16 | 19 |
| 2008 | 25 | 23 | 25 | 27 | 24 | 25 | 26 | 22 | 22 | 22 | 22 | 23 |
| 2009 | 21 | 20 | 21 | 19 | 18 | 17 | 19 | 20 | 21 | 20 | 18 | 16 |

5. Calculate the seasonal variations by ratio to trend method for the following data from 2005 to 2009

| Year | 2005 | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|---|
| I Q | 30 | 34 | 40 | 54 | 80 |
| II Q | 40 | 52 | 58 | 76 | 92 |
| III Q | 36 | 50 | 54 | 68 | 86 |
| IV Q | 34 | 44 | 48 | 62 | 82 |

6. Calculate the seasonal variations by ratio to moving average method for the following data from 2007 to 2009

| Year | 2007 | 2008 | 2009 |
|---|---|---|---|
| I Q | 68 | 65 | 68 |
| II Q | 62 | 58 | 63 |
| III Q | 61 | 66 | 63 |
| IV Q | 63 | 61 | 67 |

PROBABILITY

Concepts

Probability is the mathematics of chance.

A probability experiment is a chance process that leads to well defined outcomes or results. An outcome of a probability experiment is the result of a single trial of a probability experiment. A trial means tossing a coin once, rolling a die or drawing a single card from the deck. The set of all outcomes of a probability experiment is called a sample space. Each outcome of a probability experiment occurs at random. Each outcome of the experiment is equally likely.

An event is one or more outcomes of a sample space. An event with a single outcome is called a simple event and an event with two or more outcomes is called a compound event.

Rules –
1. The probability of any event will always be from 0 to 1
2. When an event cannot occur (impossible event), the probability will be 0
3. When an event is certain to occur, the probability is 1
4. The sum of the probabilities of all the outcomes in the sample space is 1
5. The probability that an event will not occur = (1 – probability that event will occur)

Sample space can be represented in two ways: tree diagrams and tables. A tree diagram can be used to determine the outcome of a probability experiment. A tree diagram consists of branches corresponding to the outcomes of two or more probability experiments that are done in sequence. Sample spaces can also be represented using tables. For example, the outcomes when selecting a card from an ordinary deck can be represented by a table. When two dice are rolled, the 36 outcomes can be represented by using a table. Once a sample space is found, probabilities can be computed for specific events.

Addition Rules
Many times in probability, it is necessary to find the probability of two or more events occurring. In these cases, the addition rules are used. The key word in these problems is “Or”, and it means add or union.
When the events are mutually exclusive, they have no outcome in common.
P (A or B) = P (A) + P (B)
When the two events are not mutually exclusive, they have some common outcomes.
P (A or B) = P (A) + P (B) – P (A and B)

Multiplication Rules
When two events occur in sequence, the probability that both events occur can be found by using multiplication rules. When two events are independent, the probability that the first event occurs does not affect or change the probability of the second event occurring.

P (A and B) = P (A) · P (B)
If the events are dependent, the probability of the second event occurring is changed after the first event occurs.
P (A and B) = P (A) · P (B|A), where P (B|A) = P (A and B) / P (A)
P (B|A) is also known as conditional probability. The key word for multiplication rule is “and” and it means intersection.

Conditional Probability – Conditional probability is used when additional information is known about the probability of an event.

There are three types of probability:
Classical probability uses sample spaces. Classical probability is defined as the number of ways (outcomes) the event can occur divided by the total number of outcomes in the sample space. A sample space is the set of outcomes of a probability experiment.
Empirical probability uses frequency distributions, and it is defined as the frequency of an event divided by the total number of frequencies.
Subjective probability is made by a person’s knowledge of the situation and is basically an educated guess as to the chance of the event occurring.

In order to determine the number of outcomes or events, the fundamental counting rule, the permutation rules, and the combination rule can be used. The difference between a permutation and a combination is that for a permutation, the order or arrangement of the objects is important. For example, order is important in phone numbers, identification tags, social security numbers, license plates, dictionary etc. Order is not important when selecting objects from a group.

Odds and Expectations – Odds are used to determine the payoffs in gambling games. Odds are computed from probabilities; however, probabilities can be computed from odds if the true odds are known.
Odds in favor = P (E) / (1 – P (E))
Odds against = (1 – P (E)) / P (E)

Expected Value
Mathematical expectation can be thought of as a long term average. If the game is played many times, the average of the outcomes or the payouts can be computed using mathematical expectation.
E(x) = Σ x · P (x)

Bayes’ theorem –
P (A|B) = P (B|A) · P (A) / P (B)
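The addition rule, conditional probability and expected value described above can be checked by enumerating a small sample space. The sketch below is only an illustration: it assumes two fair dice, and the events "double" and "sum is 10" as well as all function names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

def is_double(o):      # hypothetical event A: both dice show the same face
    return o[0] == o[1]

def sum_is_ten(o):     # hypothetical event B: the faces add up to 10
    return o[0] + o[1] == 10

p_a = prob(is_double)
p_b = prob(sum_is_ten)
p_a_and_b = prob(lambda o: is_double(o) and sum_is_ten(o))
p_a_or_b = prob(lambda o: is_double(o) or sum_is_ten(o))

# Addition rule for events that are not mutually exclusive.
addition_rule_holds = p_a_or_b == p_a + p_b - p_a_and_b

# Conditional probability P(B|A) = P(A and B) / P(A).
p_b_given_a = p_a_and_b / p_a

# Expected value E(x) = sum of x * P(x), here for the sum of the two faces.
expected_sum = sum(Fraction(a + b, len(sample_space)) for a, b in sample_space)
```

Exact fractions are used instead of floats so that the rule can be compared with `==` without rounding issues.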

Probability Distributions –

1. Uniform Distribution – A distribution is said to be uniform if the probability of the variable is equal for all values in the given interval. For example – If people come to a railway station in a uniform distribution and a train leaves every 5 minutes, what is the probability that a person arriving at the station will have to wait for less than a minute? Arrivals are uniform, so one in five persons arrives in any given minute, and hence probability = 1/5 = 0.2

2. Binomial Distribution –
• Each trial can only have two outcomes
• There are a fixed number of trials
• The outcome of each trial is independent of each other
• The probability for an outcome must be same for each trial
• $P(r) = {}^{n}C_{r}\, p^{r} (1-p)^{n-r}$ where n is number of trials, r is number of successes, p is probability of success

3. Poisson Distribution –
• It is used when the variable occurs over a period of time, or over an area or volume
• $P(x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}$ where e is mathematical constant, λ is mean or expected value and x is number of successes, where mean and variance = np
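The binomial and Poisson probabilities can be computed directly from the standard formulas P(r) = nCr · p^r · (1 − p)^(n − r) and P(x) = e^(−λ) · λ^x / x!. The following is a minimal sketch; the function names and the numeric inputs are illustrative assumptions, not values from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """P(r successes in n trials) = nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def poisson_pmf(x, lam):
    """P(x occurrences) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

# Illustrative inputs only (hypothetical, not from the text).
p_binomial = binomial_pmf(r=2, n=10, p=0.2)
p_poisson = poisson_pmf(x=3, lam=2.0)

# Sanity check: binomial probabilities over r = 0..n add up to 1.
binomial_total = sum(binomial_pmf(r, 10, 0.2) for r in range(11))
```

When a Poisson distribution is used to approximate a binomial one, `lam` would be taken as n · p, as the notes state.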

4. mean is 0 and variance is 1. Normal Distribution – • • • • • • • In a standard normal distribution. If The standard normal values are called z scores It is bell shaped and symmetric about the mean and continuous and asymptotic to the axis Area under the curve is 1 The mean. median and mode are at the centre of the distribution Statistics Page 30 .

Problems
1. When a die is rolled, what is the probability of getting a number greater than 4?
2. Two dice are rolled. The probability that the sum of spots on the faces will be ‘8’ is?
3. When two coins are tossed, the probability of getting two tails is?
4. When a card is selected from a standard pack, the probability that it is a ‘9’ is?
5. When a card is selected from a standard pack, the probability that it is a diamond or a number card is?
6. In a survey of 180 people, 7s are over 60. If a person is selected at random, what is the probability that the person is over 60?
7. If a letter is selected at random from the word “PROBABILITY”, the probability that it is a vowel is?
8. In a box, there are 6 white marbles, 3 blue marbles and 1 red marble. If a marble is selected at random what is the probability that it is not white?
9. In a sample of 10 pieces, 4 are defective. If 3 are selected at random and tested, what is the probability that they are not defective?
10. How many different 3 digit codes can be made?
11. If 30% of commuters ride to work on a bus, find the probability that if 8 workers are selected at random, 3 will ride the bus.
12. A survey found that 10% of older people have given up driving. If a sample of 1000 persons is taken, the standard deviation of the sample will be?
13. A board of directors consists of 7 women and 5 men. If 4 directors are selected at random, the probability that exactly 2 directors are men is?
14. The probability that there will be a car accident in a particular road is 0.01. The number of accidents follows Poisson distribution. If there are 500 cars on the road on a particular day, find the probability that there will be exactly 4 accidents?
15. About 5% of rabbits are brown in color. If the distribution is Poisson, find the probability that in 100 randomly selected rabbits, 7 rabbits are brown in color?
16. In an exam (which is approximately normally distributed), the average marks were 200 and variance was 400. If a person who took the exam was selected at random, find the probability that the person scores above 230.
17. The average height for adult kangaroos is 64 inches with a variance of 4 inches. Assume normal distribution. If a kangaroo is selected at random, find the probability that its height is between 62 and 66.8 inches.
18. Box 1 contains 2 red balls and 1 blue ball. Box 2 contains 1 red ball and 3 blue balls. One of the two boxes is selected at random and a ball is selected from that box at random. If the ball is red, find the probability it came from box 1?
19. Two manufacturers supply paper cups to a certain catering service. ‘A’ supplied 100 cups and 5 were damaged. ‘B’ supplied 50 cups and 3 were damaged. If a cup is damaged, find the probability that it came from ‘A’?
20. A street vendor, if caught by the city inspector, must pay a fine of Rs 50. Otherwise, the vendor can make Rs 100 at Main Road or Rs 75 at Cross Road. Construct a payoff table, determine the optimal strategy for both locations, and find the value of the game.

HYPOTHESIS TESTING

Procedure in Hypothesis Testing-
1. Formulate a Hypothesis
2. Set up a suitable significance level
3. Select test criterion
4. Compute the statistic
5. Make the decision

               H0 Accepted          H0 Rejected
H0 is True     Correct decision     Type I error (α)
H0 is False    Type II error (β)    Correct decision

Explanations-
• Parameter – Statistical measure based on all units of a population
• Statistic – Statistical measure based on all units of a sample
• Sampling distribution – Distribution of a statistic
• Standard error – Standard deviation of the sampling distribution of the statistic
• Confidence interval – An interval that is expected to include the true value of the parameter with the desired level of confidence
• Significance level (α) – It indicates the percentage of sample data outside certain limits. It is also the probability of committing a type I error
• Acceptance region – Complementary region
• Critical Region – Rejection region
• One tail test – A hypothesis with one rejection region.
  o Right tail test - H0: µ = µ0 and H1: µ > µ0, or H0: µ ≤ µ0 and H1: µ > µ0
  o Left tail test - H0: µ = µ0 and H1: µ < µ0, or H0: µ ≥ µ0 and H1: µ < µ0
• Two tail test – A hypothesis with two rejection regions. H0: µ = µ0 and H1: µ ≠ µ0
• Null hypothesis (H0) – The hypothesis which is tested for possible rejection under the assumption that it is true. It is also known as the hypothesis of no difference.
• Alternate hypothesis (H1) – A hypothesis which contradicts the null hypothesis. It decides whether the test has to be a one tailed test or two tailed test
• Type I error – Rejecting a hypothesis when it is true. It is also known as rejecting a good lot or producer’s risk
• Type II error – Accepting a hypothesis when it is false. It is also known as accepting a bad lot or consumer’s risk
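The link between the significance level, the number of tails and the rejection region can be sketched for a z statistic. This is an illustration under the assumption that the test statistic follows the standard normal distribution; the function name `reject_h0` and the `tail` labels are made up for the example.

```python
from statistics import NormalDist

def reject_h0(z, alpha=0.05, tail="two"):
    """Return True when z falls in the critical (rejection) region."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # alpha is split equally between the two rejection regions
        return abs(z) > std.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return z > std.inv_cdf(1 - alpha)
    if tail == "left":
        return z < std.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

# The same statistic can lead to different decisions depending on H1.
two_tail_decision = reject_h0(1.8, alpha=0.05, tail="two")
right_tail_decision = reject_h0(1.8, alpha=0.05, tail="right")
```

At α = 0.05 the two tail critical value is about 1.96 while the right tail value is about 1.645, which is why z = 1.8 is rejected only in the one tail case.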

Non-Parametric Tests

• K-S test for goodness of fit of one sample (Kolmogorov-Smirnov)
  o Sum cumulative frequency of observed values
  o Convert to percentage
  o Find the expected values and convert to percentage
  o Find the difference of observed and expected values
  o The maximum difference value is called D value
  o Degree of freedom is the number of observations
  o Compare with table value of D at degrees of freedom

• U Test (Mann-Whitney Test for Equality of two means)

$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \quad \text{or} \quad U = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2, \text{ whichever is lesser}$$

$$Z = \frac{U - \mu}{\sigma}, \qquad \mu = \frac{n_1 n_2}{2}, \qquad \sigma^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$

Where $R_i$ is sum of ranks of each group and $n_i$ = number of observations in each group

• H Test (Kruskal Wallis Rank Sum Test for Equality of several means)

$$H = \frac{12}{n(n + 1)} \sum \frac{R_i^2}{n_i} - 3(n + 1)$$

Where n = total number of observations, $R_i$ = group sum of ranks
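The three statistics can be computed as follows. This is a sketch, not a full test procedure: table look-ups for critical values are left out, tied values are given the average of their ranks (an assumption, since the notes do not say how ties are handled), and every function name and data value is hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def ks_d(observed, expected):
    """K-S D: largest gap between cumulative observed and expected proportions."""
    n_obs, n_exp = sum(observed), sum(expected)
    cum_obs = cum_exp = d = 0.0
    for o, e in zip(observed, expected):
        cum_obs += o / n_obs
        cum_exp += e / n_exp
        d = max(d, abs(cum_obs - cum_exp))
    return d

def mann_whitney(a, b):
    """Return (U, Z) using the lesser of the two U values."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u = min(n1 * n2 + n1 * (n1 + 1) / 2 - r1,
            n1 * n2 + n2 * (n2 + 1) / 2 - r2)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu) / sigma

def kruskal_wallis(groups):
    """H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n, start, total = len(pooled), 0, 0.0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data for illustration.
d_value = ks_d([30, 20, 20, 15, 15], [20, 20, 20, 20, 20])
u_value, z_value = mann_whitney([1, 2, 3], [4, 5, 6])
h_value = kruskal_wallis([[1, 2], [3, 4], [5, 6]])
```

The computed D, Z and H values would then be compared with the tabulated critical values at the chosen significance level.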

PROBLEMS-

1. A company surveyed 100 respondents to know about the importance of computers in their life. The respondents indicated as follows. Use Kolmogorov-Smirnov test (K-S test) to test the hypothesis that there is no difference in ratings amongst the respondents.

Total Respondents                     100
Very Important                        25
Somewhat Important                    30
Neither Important nor Unimportant     10
Somewhat Unimportant                  20
Very Unimportant                      15

2. The following data indicates the lifetime (in hours) of samples of two kinds of light bulbs in continuous use. Use Mann-Whitney U test to compare the life time of brands A and B light bulbs.

Brand A    Brand B
603        620
625        640
641        646
622        620
585        652
593        639
660        590
600        646
633        631
580        669
615        610
648        619

3. A company used three different methods of advertising its product in three cities. It found out the increased sales in identical retail outlets in three cities as follows. Use Kruskal-Wallis method (H test) to test the hypothesis that the increase in sales using different methods in different cities is the same at 5% level of significance.

Chennai Mumbai Kolkata
70 65 53 58 57 59 60 48 71 45 55 70 55 75 63 62 68 60 89 45 58 72 52 75 63

Chi-Square Test

• Chi square distribution for goodness of fit-

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

Where $F_o$ = Observed Frequency, $F_e$ = Expected Frequency
DF (degrees of freedom) = (k-1) where k is number of classes

• Chi square distribution for independence of attributes-

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}, \qquad F_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

Where $F_o$ = Observed Frequency, $F_e$ = Expected Frequency
DF (degrees of freedom) = (r-1)(c-1) where r is number of rows and c is number of columns
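Both chi-square statistics follow the same sum of (Fo − Fe)² / Fe; only the way the expected frequencies are obtained differs. A minimal sketch with hypothetical frequencies and made-up function names:

```python
def chi_square_goodness_of_fit(observed, expected):
    """Return (chi-square, degrees of freedom) with DF = k - 1."""
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi2, len(observed) - 1

def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # expected frequency = row total * column total / grand total
            fe = row_totals[i] * col_totals[j] / grand_total
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical data: 100 observations over 5 classes, expected to be uniform.
gof_chi2, gof_df = chi_square_goodness_of_fit([18, 22, 20, 25, 15], [20] * 5)

# Hypothetical 2 x 2 contingency table (totals are computed, not included).
ind_chi2, ind_df = chi_square_independence([[10, 20], [20, 10]])
```

The table passed to `chi_square_independence` should hold only the cell counts; the "Total" row and column of a printed table must be left out.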

PROBLEMS-

Test for goodness of fit-
1. The following table gives the average number of calls received by an operator on various days of the week in a call centre. Find out whether the calls are uniformly distributed over the week.

Days         Number of calls
Monday       124
Tuesday      120
Wednesday    126
Thursday     134
Friday       146

Test for independence of attributes-
2. The following information is obtained concerning 50 randomly selected students. Can it be inferred that availing of loans is more common among boys?

Educational Loan    Taken    Not taken    Total
Boys                14       16           30
Girls               8        12           20
Total               22       28           50

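• Z test for one sample mean-

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Where $\sigma / \sqrt{n}$ is the standard error. If ‘σ’ is not given, we can use ‘s’

• Z test for difference between means-

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Where $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ is standard error and H0: µ1 - µ2 = 0. If σ is not known, we can estimate σ by the formula

$$\sigma = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$$

• T Test for One sample mean-

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Where standard deviation is given directly, use formula

$$t = \frac{\bar{x} - \mu}{SD / \sqrt{n - 1}}$$

Degrees of freedom = n-1

• T test for difference between means-

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{Where } s = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}}$$

and n1+n2-2 = degrees of freedom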
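The Z and t statistics can be written as small functions. The sketch below follows the usual textbook forms Z = (x̄ − µ) / (σ/√n) and t = (x̄ − µ) / (s/√n), and for two samples the pooled form with n1·s1² + n2·s2² in the numerator, which assumes s1 and s2 were computed with divisor n. Function names and inputs are hypothetical.

```python
from math import sqrt

def z_one_sample(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

def z_two_sample(x1, x2, sigma1, sigma2, n1, n2):
    """Z for the difference between two means under H0: mu1 - mu2 = 0."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (x1 - x2) / standard_error

def t_one_sample(xbar, mu, s, n):
    """Return (t, degrees of freedom) with DF = n - 1."""
    return (xbar - mu) / (s / sqrt(n)), n - 1

def t_two_sample(x1, x2, s1, s2, n1, n2):
    """Return (t, degrees of freedom) using the pooled standard deviation."""
    # Assumes s1 and s2 are sample standard deviations with divisor n.
    s = sqrt((n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2))
    t = (x1 - x2) / (s * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical inputs for illustration.
z1 = z_one_sample(xbar=52, mu=50, sigma=4, n=16)
z2 = z_two_sample(10, 8, sigma1=3, sigma2=4, n1=9, n2=16)
t1, df1 = t_one_sample(xbar=52, mu=50, s=4, n=16)
t2, df2 = t_two_sample(10, 8, s1=2, s2=2, n1=5, n2=5)
```

Each statistic is then compared with the table value of Z, or of t at the stated degrees of freedom, for the chosen significance level.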

ANOVA

1. The following table gives the retail prices of a certain commodity in some selected shops in four cities as below. Can we say the prices of the commodities differ in the four cities?

City      Chennai   Mumbai   Delhi   Kolkata
Prices    11        7        9       8
          7         9        4       12
          10        11       7       12
          3         8        2       8

2. The sales of 4 salesmen - A, B, C & D of the Company Sellers in three seasons are given below. Can we conclude that overall sales are dependent on seasons? Are the four salesmen equally effective?

Season/Salesman   Summer   Winter   Monsoon
A                 6        7        8
B                 4        6        5
C                 8        6        10
D                 6        9        9


DECISION THEORY

DECISION UNDER UNCERTAINTY

1. A newspaper vendor can stock up to 10 newspapers in his store. There is a guaranteed demand for 5 newspapers. Each newspaper costs Rs 2 per unit and is sold for Rs 4. Unsold newspapers are disposed of for Rs 1 per unit. Construct a payoff and opportunity loss table.

2. A retailer has space for up to 4 Kgs of tomato in his store. He sells in Kgs only. The cost per Kg is Rs 30 and the selling price per Kg is Rs 50. Any units not sold at the end of the day are wasted. Construct a payoff and opportunity loss table.

3. A food product company is contemplating the introduction of a new product to replace an existing product at a higher price (S1), modifying the existing product at a moderately increased price (S2), and continuing the same product with new packaging at a nominally increased price (S3). Sales may increase (E1), not change at all (E2) or decrease (E3) with respect to these strategies. The profits given by the marketing department for each of these strategies are given below-

        E1         E2         E3
S1      700,000    300,000    150,000
S2      500,000    450,000    0
S3      300,000    300,000    300,000

What strategy should the company choose on the basis of - Maximin criterion, Maximax criterion, Minimax Regret criterion, Laplace criterion and Hurwicz criterion (α=0.8)?

DECISION UNDER RISK

4. A milk producer needs to determine how many litres of milk are to be produced on a daily basis to meet demand. Past records of 200 days show the following demand pattern

Milk (Litres)   15   20   25   30   35   40   45
No. of days     4    16   20   80   40   30   10

Milk costs Rs 14 per litre and is sold at Rs 20 per litre. Milk is sold in multiples of 5 litres only and there is an assured demand for 15 litres every day. Unsold milk is disposed of. Construct a conditional profit table. Identify the best course of action for maximum expected profits and calculate EVPI.
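The decision criteria named in these problems can be sketched on a small payoff table. Everything here is illustrative: the payoffs and probabilities are hypothetical, the function names are made up, and α is assumed to be the weight placed on the best payoff in the Hurwicz criterion.

```python
def decision_under_uncertainty(payoffs, alpha):
    """payoffs maps each strategy to its list of profits, one per event."""
    events = range(len(next(iter(payoffs.values()))))
    best = [max(row[j] for row in payoffs.values()) for j in events]
    # Regret (opportunity loss) = best payoff for the event - actual payoff.
    regret = {s: max(best[j] - row[j] for j in events)
              for s, row in payoffs.items()}
    return {
        "maximin": max(payoffs, key=lambda s: min(payoffs[s])),
        "maximax": max(payoffs, key=lambda s: max(payoffs[s])),
        "laplace": max(payoffs, key=lambda s: sum(payoffs[s]) / len(events)),
        "hurwicz": max(payoffs, key=lambda s: alpha * max(payoffs[s])
                       + (1 - alpha) * min(payoffs[s])),
        "minimax_regret": min(regret, key=regret.get),
    }

def decision_under_risk(payoffs, probs):
    """Return (best strategy, its expected profit, EVPI)."""
    emv = {s: sum(p * x for p, x in zip(probs, row))
           for s, row in payoffs.items()}
    best_strategy = max(emv, key=emv.get)
    # Expected profit with perfect information: best payoff for each event.
    best = [max(row[j] for row in payoffs.values()) for j in range(len(probs))]
    eppi = sum(p * b for p, b in zip(probs, best))
    return best_strategy, emv[best_strategy], eppi - emv[best_strategy]

# Hypothetical payoff table: three strategies, three events.
payoffs = {"S1": [50, 20, -10], "S2": [30, 25, 10], "S3": [15, 15, 15]}

choices = decision_under_uncertainty(payoffs, alpha=0.6)
best_strategy, best_emv, evpi = decision_under_risk(payoffs, [0.3, 0.5, 0.2])
```

EVPI is the gap between the expected profit with perfect information and the best expected profit without it, so it is never negative.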
