# Welcome to Business Statistics Lecture 1 & 2

Contents: Basic Statistical Concepts • Summarisation of Data • Frequency Distribution • Measures of Central Tendency • Measures of Dispersion • Relative Dispersion, Skewness.

## Using Statistics


Malcolm Forbes, a businessman and keen hot-air balloon enthusiast, lost his way and landed in the middle of a cornfield. He saw a man running towards him, and they had the following conversation:

- Forbes: "Sir, can you tell me where I am?"
- Man: "Certainly, you are in a basket in a field of corn."
- Forbes: "Sir, you must be a Statistician."
- Man: "That's amazing. How did you know?"
- Forbes: "Easy. Your information is concise, precise and absolutely useless!!!"

A GOOD STUDENT of Statistics should ensure that the information resulting from a good statistical analysis is always CONCISE, often PRECISE and never USELESS.


## Types of Variables

1. Qualitative or Attribute variable: the characteristic being studied is generally nonnumeric. EXAMPLES: gender, religious affiliation, type of automobile owned, state of birth, eye color, etc.
2. Qualitative variables could also be described by numbers, although the description might be arbitrary. EXAMPLES: car registration number.
3. Quantitative variable: can be described by a number for which arithmetic operations such as averaging make sense. EXAMPLES: balance in your mobile account, minutes remaining in class, or number of children in a family.
4. A quantitative variable can be either Discrete or Continuous.

## Summary of Types of Variables

## Four Scales of Measurement

The scales are numbered from the weakest (1) to the strongest (4).

1. Nominal scale: data that is classified into categories and cannot be arranged in any particular order. Numbers are just labels for groups or classes. Nominal stands for NAME. EXAMPLES: eye color, gender, religious affiliation, platform number.
2. Ordinal scale: involves data arranged in some order according to their relative size or quality. The differences between data values cannot be determined or are meaningless: we know one is better than the other, but how much better is not known. EXAMPLE: During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Sprite number 2, Sevenup number 3, and Orange Mirinda number 4.
3. Interval scale: similar to the ordinal scale, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point. EXAMPLE: time of a day. The interval between 00:00 and 10:00 a.m. is twice the interval between 00:00 and 5:00 a.m., but 10:00 a.m. is not twice 5:00 a.m.
4. Ratio scale: the interval scale with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement. EXAMPLES: monthly income of surgeons, or distance traveled by manufacturer's representatives per month.

## Population versus Sample

A population is a collection of all possible individuals, objects, or measurements of interest. The population is also called the UNIVERSE. A sample is a portion, or part, or subset of measurements selected from the population of interest.

Greek letters, like µ or σ, are used for the population and are termed Population Parameters. Roman letters, like x̄ or s, are used for describing Sample Statistics.
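A minimal sketch of this notation, assuming Python 3's standard `statistics` module and made-up numbers: the population parameters µ and σ are computed from every member of the population, while the sample statistics x̄ and s are computed only from the selected subset. Note that `statistics.stdev` divides by n − 1 (the usual sample convention), whereas `statistics.pstdev` divides by N for a whole population.

```python
import statistics

# Hypothetical population: ages of all 10 employees of a small firm (made-up data).
population = [22, 25, 27, 30, 31, 35, 38, 41, 45, 56]

# Hypothetical sample: a subset of 4 of those employees.
sample = [25, 31, 41, 56]

# Population parameters (Greek letters): computed from every member.
mu = statistics.mean(population)        # µ
sigma = statistics.pstdev(population)   # σ, divides by N

# Sample statistics (Roman letters): computed from the subset only.
x_bar = statistics.mean(sample)         # x̄
s = statistics.stdev(sample)            # s, divides by n - 1

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar}, s = {s:.2f}")
```

The sample statistics differ from the parameters they estimate; that gap is what inferential statistics has to reason about.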

## Types of Statistics – Descriptive Statistics

Data and Data Collection: a set of measurements obtained on some variable is called a data set. Generally, when the entire population space is considered, tabulating and presenting the data is a challenge.

Descriptive Statistics: methods of organizing, summarizing, and presenting data in an informative way.

Inferential Statistics: arriving at a decision, estimate, prediction, or generalization about a population, based on a sample.

## Problems to be Solved

- Measures of Central Tendency
  - Mean: Arithmetic, Geometric, Harmonic
  - Mean for individual, discrete, continuous distribution
  - Mean from Assumed mean
  - Median for individual, discrete, continuous distribution
  - Mode for individual, discrete, continuous distribution
  - Percentiles & Quartiles
- Measures of Dispersion
  - Range
  - Mean Deviation
  - Standard Deviation
  - Combined Standard Deviation
  - Coefficient of Variation
- Skewness
  - Test for Skewness

## Requisites of a Good Measure of Central Tendency

- It should be rigidly defined, which means that it should be calculated and interpreted in the same way by everyone.
- It should be simple to compute.
- It should be based on all values of the data.
- It should not be unduly affected by extreme values.
- It should be amenable to further algebraic treatment.
- It should be amenable to sampling, by which we mean that the results obtained from various samples should be similar.

## Some Measures of Central Tendency

- Arithmetic Mean: a mathematical average, obtained by dividing the sum of the observations by the number of observations.
- Geometric Mean: a specialised average, applicable when the quantities being averaged are drawn from situations following an exponential law of growth or decline.
- Harmonic Mean: used to average rates.
- Median: the VALUE of the middle observation of the array; it is a positional average.
- MODE: the value of the data that occurs most frequently.
- Quartiles, Deciles, Percentiles: these are also positional averages, and divide the series into four, ten and 100 parts respectively.
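As a rough sketch, each of these measures can be computed with Python's standard `statistics` module (Python 3.8 or later). The data, the growth factors and the speeds below are made up for illustration. `statistics.quantiles(data, n=4)` returns the three quartile cut points; `n=10` and `n=100` give deciles and percentiles in the same way. Its default interpolation rule ("exclusive") is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical observations (made-up data).
data = [2, 4, 4, 4, 5, 5, 7, 9]

am = statistics.mean(data)      # sum of observations / number of observations
med = statistics.median(data)   # middle of the sorted array (mean of the two middle values here)
mo = statistics.mode(data)      # most frequently occurring value

# Quartiles: three cut points dividing the series into four parts.
quartiles = statistics.quantiles(data, n=4)

# Geometric mean for growth: +25% in one period, then +80% in the next
# (hypothetical). Growing 50% in each period gives the same total growth.
avg_growth_factor = statistics.geometric_mean([1.25, 1.80])   # about 1.5

# Harmonic mean for rates: equal distances driven at 40 km/h and 60 km/h.
avg_speed = statistics.harmonic_mean([40, 60])                # about 48 km/h

print(am, med, mo, quartiles)
print(avg_growth_factor, avg_speed)
```

The arithmetic mean of the two speeds would be 50 km/h, which overstates the true average speed because more time is spent at the slower rate; this is why rates call for the harmonic mean.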

## Arithmetic Mean

Merits:

- Easy to understand and simple to calculate.
- Rigidly defined by a mathematical formula.
- It is based on all items of the series.
- Arrangement of items is not required.
- It is capable of further algebraic treatment.
- It has sampling stability and is least affected by sampling fluctuations.
- Mean averages out the positive and negative deviations.

Demerits:

- It is affected by extreme values, and thus for distributions where the concentration is on small or big values the mean is not an ideal representative.
- For open-ended distributions the mean cannot be calculated with accuracy.
- Mean is not useful for studying qualitative phenomena like beauty, honesty, intelligence, etc.
- Mean does not have a life of its own: the statement "Average number of children in India is 3.6" is meaningless, since no family can actually have 3.6 children.
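Two of the points above can be checked numerically. The sketch below, using made-up income figures and Python's `statistics` module, shows that deviations from the mean sum to zero, and that a single extreme value shifts the mean far more than it shifts the median.

```python
import statistics

# Hypothetical monthly incomes in thousands (made-up data).
incomes = [20, 22, 25, 27, 31]

mean_before = statistics.mean(incomes)
deviations = [x - mean_before for x in incomes]
# Positive and negative deviations from the mean cancel out exactly.
total_deviation = sum(deviations)

# Add one extreme value and compare how far the mean and the median move.
with_outlier = incomes + [250]
mean_after = statistics.mean(with_outlier)      # mean goes from 25 to 62.5
median_before = statistics.median(incomes)
median_after = statistics.median(with_outlier)  # median goes from 25 to 26

print(total_deviation)
print(mean_before, mean_after)
print(median_before, median_after)
```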

## Median

Merits:

- Useful in open-ended series, as it is based on position and not on the values.
- It is not affected by extreme values.
- Easier to compute than the mean in the case of unequal class intervals.
- Suitable in the case of qualitative data, provided the items can be ranked.
- It minimises the total absolute deviation: the sum of absolute deviations from the median is smaller than from any other value.

Demerits:

- Requires arrangement of data.
- It is not based on all the items of the series.
- Incapable of any algebraic treatment, and combined medians cannot be obtained.
- For grouped data, it assumes that items are uniformly distributed within the median class, which is not always true.
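The claim that the median minimises the total absolute deviation can be checked by brute force. This is a small sketch on made-up data: it tries every whole number between the smallest and largest observation and keeps the one with the smallest sum of absolute deviations. The helper name `total_abs_deviation` is illustrative only.

```python
import statistics

# Hypothetical observations (made-up data) with one large value.
data = [3, 7, 8, 12, 30]

def total_abs_deviation(centre):
    """Sum of |x - centre| over all observations."""
    return sum(abs(x - centre) for x in data)

med = statistics.median(data)    # 8
mean = statistics.mean(data)     # 12

# Search whole-number candidates from min(data) to max(data).
best = min(range(min(data), max(data) + 1), key=total_abs_deviation)

# The total is 32 around the median but 36 around the mean,
# and the search lands on the median.
print(med, total_abs_deviation(med))
print(mean, total_abs_deviation(mean))
print(best)
```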

## MODE

Merits:

- In certain situations mode is the only suitable average, e.g. size of shoes, garments, wages, etc.
- It is not affected by extreme values.
- It can be used for qualitative phenomena.
- It indicates the point of maximum concentration in the case of highly skewed distributions.

Limitations:

- In the case of a bimodal or multimodal series, the mode cannot be uniquely defined.
- It is not rigidly defined, because different formulae will give different answers.
- It is not based on all the items of the series.
- It is incapable of further algebraic treatment.
- Its value is affected by the size of the class interval.
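A short sketch with Python's `statistics` module (Python 3.8 or later) illustrates three of these points on made-up data: the mode of shoe sizes, the mode of a qualitative variable, and a bimodal series. In Python 3.8+, `statistics.mode` silently returns the first mode it meets in a multimodal series, so `statistics.multimode` is the safer call when more than one mode is possible.

```python
import statistics

# Hypothetical shoe sizes sold in a day (made-up data).
shoe_sizes = [7, 8, 8, 9, 9, 9, 10]
modal_size = statistics.mode(shoe_sizes)       # 9, the most frequent size

# Mode also works for qualitative (non-numeric) data.
eye_colours = ["brown", "blue", "brown", "green", "brown"]
modal_colour = statistics.mode(eye_colours)    # 'brown'

# A bimodal series has no single mode; multimode lists every mode.
bimodal = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(bimodal)          # [2, 3]

print(modal_size, modal_colour, modes)
```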

## Case Study – Descriptive Statistics

Ms. Kathryn Ball of AutoUSA wants to develop tables, charts, and graphs to show the typical selling price on various dealer lots. The table on the right reports only the price of the 80 vehicles sold last month at Whitner Autoplex.

## Constructing a Frequency Table – Example

Step 1: Decide on the number of classes. A useful recipe to determine the number of classes ($k$) is the "2 to the $k$ rule", such that $2^k > n$. There were 80 vehicles sold, so $n = 80$. If we try $k = 6$, which means we would use 6 classes, then $2^6 = 64$, somewhat less than 80. Hence, 6 is not enough classes. If we let $k = 7$, then $2^7 = 128$, which is greater than 80. So the recommended number of classes is 7.

Step 2: Determine the class interval or width. The formula is:

$$i \ge \frac{H - L}{k}$$

where $i$ is the class interval, $H$ is the highest observed value, $L$ is the lowest observed value, and $k$ is the number of classes. Here:

$$\frac{\$35{,}925 - \$15{,}546}{7} \approx \$2{,}911$$

Round up to some convenient number, such as a multiple of 10 or 100. Use a class width of \$3,000.
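Steps 1 and 2 can be checked with a short Python sketch. The function names are illustrative, and the rounding step assumes that "convenient number" means the next multiple of 100.

```python
import math

def number_of_classes(n):
    """Smallest k such that 2**k > n (the '2 to the k rule')."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(highest, lowest, k, round_to=100):
    """Raw width (H - L) / k, and that width rounded up to a multiple of round_to."""
    raw = (highest - lowest) / k
    return raw, math.ceil(raw / round_to) * round_to

n = 80
k = number_of_classes(n)                      # 2**6 = 64 <= 80 < 128 = 2**7, so k = 7
raw, width = class_width(35_925, 15_546, k)
print(k, round(raw, 2), width)                # 7 2911.29 3000
```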

Step 3: Set the individual class limits.
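One way to set the limits is sketched below. The starting point is an assumption: the lowest price, \$15,546, is rounded down to a multiple of \$1,000, giving \$15,000, and each class is read as "lower limit up to (but not including) upper limit". The helper `class_limits` is hypothetical.

```python
def class_limits(lowest, width, k, start_multiple=1_000):
    """Return k (lower, upper) pairs; each class includes its lower limit
    and excludes its upper limit ("lower up to upper")."""
    # Assumption: start the first class at the lowest value rounded down
    # to a convenient multiple, e.g. 15,546 -> 15,000.
    start = (lowest // start_multiple) * start_multiple
    return [(start + j * width, start + (j + 1) * width) for j in range(k)]

limits = class_limits(15_546, 3_000, 7)
for lower, upper in limits:
    print(f"${lower:,} up to ${upper:,}")
```

Under these assumptions the classes run from \$15,000 up to \$18,000 through \$33,000 up to \$36,000, so the highest price, \$35,925, falls in the last class.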

Step 4: Tally the vehicle selling prices into the classes.

Step 5: Count the number of items in each class.
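Steps 4 and 5 amount to finding the class of each value and counting. This sketch assumes seven classes of width \$3,000 starting at \$15,000, and uses a handful of hypothetical prices (only \$15,546 and \$35,925 come from the case; the 80 actual sales are not listed here).

```python
# Assumed classes: start at $15,000, width $3,000, 7 classes.
START, WIDTH, K = 15_000, 3_000, 7

def tally(prices, start=START, width=WIDTH, k=K):
    """Count how many prices fall in each class (lower limit included)."""
    counts = [0] * k
    for price in prices:
        index = (price - start) // width
        if not 0 <= index < k:
            raise ValueError(f"{price} lies outside the classes")
        counts[index] += 1
    return counts

# Hypothetical prices, NOT the 80 Whitner Autoplex sales.
prices = [15_546, 17_900, 18_000, 21_450, 23_000, 23_999, 27_500, 35_925]
counts = tally(prices)

# Print a tally mark per item, then the class frequency.
for j, count in enumerate(counts):
    lower = START + j * WIDTH
    print(f"${lower:,} up to ${lower + WIDTH:,}: {'|' * count} ({count})")
```

Note that \$18,000 is counted in the second class, because each class excludes its upper limit.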

## Relative Frequency Distribution

To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.
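The conversion is a single division per class. A minimal sketch, using made-up class frequencies:

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]

# Hypothetical class frequencies (made-up, for illustration only).
counts = [2, 1, 3, 0, 1, 0, 1]
relative = relative_frequencies(counts)
print(relative)   # [0.25, 0.125, 0.375, 0.0, 0.125, 0.0, 0.125]
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the arithmetic.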

## Graphic Presentation of a Frequency Distribution

The three commonly used graphic forms are:

- Histograms
- Frequency polygons
- Cumulative frequency distributions

## Histogram

A histogram for a frequency distribution based on quantitative data is very similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.

## Histogram Using Excel

## Frequency Polygon

A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies.
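The points of a frequency polygon can be computed as (class midpoint, class frequency) pairs. This sketch assumes seven \$3,000-wide classes starting at \$15,000 and uses made-up frequencies. Joining the printed points with straight lines gives the polygon.

```python
# Assumed classes: seven $3,000-wide classes starting at $15,000.
limits = [(15_000 + j * 3_000, 18_000 + j * 3_000) for j in range(7)]
# Hypothetical class frequencies (made-up, for illustration only).
counts = [2, 1, 3, 0, 1, 0, 1]

# Each polygon vertex is (class midpoint, class frequency).
points = [((lower + upper) // 2, f) for (lower, upper), f in zip(limits, counts)]
for x, y in points:
    print(x, y)
```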

## Cumulative Frequency Distribution
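A "less than" cumulative frequency distribution records, for each class, how many observations fall below its upper limit; it is a running total of the class frequencies. A minimal sketch with `itertools.accumulate`, assuming seven \$3,000-wide classes starting at \$15,000 and made-up frequencies:

```python
from itertools import accumulate

# Assumed upper class limits: seven $3,000-wide classes starting at $15,000.
upper_limits = [18_000 + j * 3_000 for j in range(7)]
# Hypothetical class frequencies (made-up, for illustration only).
counts = [2, 1, 3, 0, 1, 0, 1]

# Running total: number of observations less than each upper limit.
cumulative = list(accumulate(counts))
for upper, cum in zip(upper_limits, cumulative):
    print(f"less than ${upper:,}: {cum}")
```

The last cumulative frequency always equals the total number of observations.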


## Standard Deviation, σ

Merits:

- It is based on all items of the distribution.
- It is amenable to algebraic treatment.
- It is least affected by fluctuations in sampling.
- It facilitates the calculation of the combined standard deviation of two or more groups.
- It provides a unit of measurement for the normal distribution.

Demerits:

- It is difficult to compute as compared with other measures of dispersion.
- It is very much affected by extreme values: because deviations are squared, values far from the mean carry more weight than values near it.
- It cannot be used for comparing the variability of two or more series of observations given in different units.
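The combined standard deviation of two groups can be found from each group's size $n_i$, mean $\bar{x}_i$ and standard deviation $\sigma_i$ alone, using

$$\sigma_{12} = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}, \qquad d_i = \bar{x}_i - \bar{x}_{12}, \qquad \bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$

The sketch below applies this formula to two made-up groups, assuming σ is the population standard deviation (dividing by n, as `statistics.pstdev` does), and cross-checks it against pooling the raw data.

```python
import math
import statistics

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined standard deviation of two groups from their sizes,
    means and (population) standard deviations."""
    n = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return math.sqrt((n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n)

# Two hypothetical groups (made-up data).
group_a = [10, 12, 14, 16, 18]
group_b = [20, 25, 30]

sd_combined = combined_sd(
    len(group_a), statistics.mean(group_a), statistics.pstdev(group_a),
    len(group_b), statistics.mean(group_b), statistics.pstdev(group_b),
)
# Cross-check: pool the raw data and compute sigma directly.
sd_direct = statistics.pstdev(group_a + group_b)
print(round(sd_combined, 4), round(sd_direct, 4))   # the two values agree
```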