
# HANDOUTS IN MATHED 2

INTRODUCTION
Definitions of Statistics:
• A branch of science which deals with the collection, presentation, analysis, and interpretation of data.
• Recorded data such as the number of business permits issued, the number of customers eating at a restaurant, the size of enrollment at USLS, and so on.
• Numerical characteristics calculated for a set of data (e.g., mean, median, mode)
• The backbone of research
Two Branches of Statistics
1. Descriptive Statistics
- deals with organizing and summarizing observations so that they are easier to comprehend
- used to describe the basic features of the data in a study
- provides simple summaries about the sample and the measures
2. Inferential Statistics
- deals with the formulation of inferences about conditions that exist in a population from the study of a sample drawn from that population
- makes inferences from the data to more general conditions
The Research Process:
Why do research?
• Formulate the problem. The problem should be SMART:
  o S – Specific
  o M – Measurable
  o A – Attainable
  o R – Realistic
  o T – Time-bound
• Define the population of the study.
  o Population – all subjects under investigation; the set of all elements of interest in a particular study
  o Sample – a subset of the population
• Identify the variable/s of the study.
  o Variable – a measurable characteristic of the subject; any entity that can take on different values
Example:
Problem: What is the average weekly allowance of a USLS BMath 2 student for the first semester of AY 2012–2013?
• Population of study:
  o All USLS BMath 2 students for the first semester, AY 2012–2013
• Variable/s:
  o Weekly allowance of a BMath 2 student
• (Anticipated) Conclusion:
  o The average weekly allowance of a USLS BMath 2 student for the first semester of AY 2012–2013 is ________.
Types of Variables:
1. Qualitative/Categorical
• Attributes are in terms of categories
Examples:
a. sex: Male / Female
b. religious affiliation: Roman Catholic / INC / Baptist / Islam / etc.
2. Quantitative/Numerical
• Attributes are in terms of counts or measurements

Distinctions:
a. Discrete Variable
• uses the process of counting to generate data
• values of attributes are in terms of whole numbers only
Examples: a. Number of students b. Number of cars
b. Continuous Variable
• uses the process of measuring to generate data
• values of attributes may have fractional or decimal parts
Examples: a. Weight of a package b. Volume of water

Functions of variables: Important if the investigation is about cause and effect
Distinctions:
a. Independent Variable – what the researcher (or nature) manipulates; a treatment or program or cause
b. Dependent Variable – what is affected by the independent variable; the effects or outcomes
Example:
Study/Problem: the effects of a new educational program on student achievement
Independent variable – the program
Dependent variables – measures of achievement

Defn: Measurement – the process of assigning numbers to observations
Levels of Measurement
1. Nominal Level
• Consists of numbers which indicate categories for purely classification or identification purposes
• The categories are mutually exclusive (an observation cannot fall into more than one category)
• The categories are exhaustive (there must be enough categories for all the observations)
Examples: gender, religious affiliation, citizenship
2. Ordinal Level
• Possesses rank-order characteristics
• The categories must still be mutually exclusive and exhaustive, but they also indicate the order of magnitude of some variable
Examples: military rank, size of T-shirts (small, medium, large)
3. Interval Level
• Has all the properties of the ordinal scale
• A given interval (distance) between scores has the same meaning anywhere on the scale
• Intervals provide information about how much better one value is compared with another
• Has no absolute zero
Examples: temperature measured on Celsius or Fahrenheit, test scores
4. Ratio Level
• Possesses all the characteristics of the interval scale
• Has a true or absolute zero point
• The ratio of two values is meaningful
Examples: distance, height, weight, time, cost of an automobile

EXERCISES
1. Indicate whether each of the following examples refers to a population or to a sample.

a. A group of 25 customers selected to taste a new soft drink
b. Salaries of all CEOs in the pharmaceutical industry
c. Customer satisfaction ratings of all clients of a local bank
d. Monthly phone expenses of selected Globe subscribers
2. Indicate whether the following are qualitative (QL), quantitative discrete (QD), or quantitative continuous (QC) variables.
a. Brand of jeans you prefer
b. Shoe size
c. Number of text messages received per day
d. Family monthly income
e. Age
f. Place of birth
g. Number of children in the family
h. Grade in Math 1
i. Height (in cm.)
j. Time required to complete a Sudoku puzzle, measured in minutes
k. Number of banks in the municipalities and cities of Negros Occidental
l. Number of leaves
m. Weekly allowance
n. Distance of the student's house from school
o. Color of the hair
p. Zip code
q. Number of sacks of rice
3. Identify the level of measurement of the following variables.
a. Favorite TV show
b. Ratio of current assets to current liabilities
c. High school GPA
d. Rating of the management skills of a company president
e. Effectiveness of a drug for headache
f. Ranking of professional tennis players
g. Scores of freshmen college students on an attitude towards math scale
h. Earnings per share
i. Age
j. Travel time (in minutes) from USLS to residence
4. A researcher measures two individuals and then uses the resulting scores to make a statement comparing the two individuals. For each of the following statements, identify the scale of measurement (nominal, ordinal, interval, ratio) that the researcher used.
a. I can say that one individual scored 6 points higher than the other.
b. I can say that one individual scored higher than the other, but I cannot specify how much higher.
c. I can say that the score for one individual is twice as large as the score for the other individual.
d. I can only say that the two individuals are different.
5. A firm is interested in testing the advertising effectiveness of a new television commercial. As part of the test, the commercial is shown on a 6:30 PM local news program in Bacolod City. Two days later, a market research firm conducts a telephone survey to obtain information on recall rates (percentage of viewers who recall seeing the commercial) and impressions of the commercial.
a. What is the population for this study?
b. What is the sample for this study?
c. Why would a sample be used in this situation? Explain.

SAMPLING TECHNIQUES
Defn: Sampling – the process of selecting the subjects of the population to be included in the sample
Types of Sampling:
A. Probability sampling
• each element of the population is given a chance of being included in the sample
• minimizes, if not eliminates, selection bias
1. Simple Random
• Each element of the population is given an equal chance of being included in the sample
• Most basic probability sampling procedure
• Foundation of all probability sampling procedures

• When to use:
  – The population is homogeneous
  – A sampling frame is available
• Procedure:
  – Lottery
  – Use of random number generators
2. Systematic Random
• Selecting every kth element of the population
• When to use:
  – When the population is homogeneous and there is no suspicion of a trend or pattern in the frame or geographical layout
  – A sampling frame is available
• Procedure:
  i. Determine the sampling interval, k.
  ii. Identify the random start: 1 ≤ rs ≤ k.
  iii. Determine the positions of the elements to be included in the sample: rs, rs + k, rs + 2k, …
3. Stratified Random
• Selecting random samples from mutually exclusive subpopulations, or strata, of the population
• When to use:
  – When the population is heterogeneous but can be subdivided into homogeneous subgroups or strata
  – A sampling frame is available for each stratum
• Procedure:
  i. Determine the proportion of each stratum relative to the population.
  ii. Identify the stratum sample sizes using proportional allocation.
  iii. Select the samples from each stratum using either simple or systematic random sampling.
Example: Among the 250 employees of the local office of an international insurance company, 182 are Filipinos, 51 are Chinese, and 17 are Americans. If we use proportional allocation to select a stratified random grievance committee of 15 employees, how many employees must we take from each race?
Solution:

| Race (i) | Ni | % | ni |
|---|---|---|---|
| Filipino | 182 | | |
| Chinese | 51 | | |
| American | 17 | | |
| Total | 250 | 100 | 15 |

4. Cluster Random
• Selecting clusters of elements rather than individual elements
• When to use:
  – when "natural" groupings are evident in a statistical population
  – a sampling frame is not available
• Procedure:
  i. Divide the population into clusters (M = total number of clusters).
  ii. Randomly select m clusters.
  iii. Include all elements within the selected clusters to form the resulting sample.
5. Multi-stage Random Sampling
• Repeated cluster sampling
B. Non-probability sampling
• not all elements of the population are given a chance of being included in the sample
• prone to selection bias
1. Convenience / Voluntary / Haphazard / Accidental
• Sample elements are selected because they are available
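The proportional allocation in the stratified example, and the systematic procedure (interval k, random start rs, then positions rs, rs + k, rs + 2k, …), can be sketched in Python as below. This is a minimal illustration rather than part of the handout: the function names and the numbered frame of 250 employees are hypothetical, the frame is assumed to be a list in listing order, and stratum sizes are simply rounded, so in other problems the rounded sizes may not add up exactly to n.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Every element has an equal chance of selection (a computerized 'lottery')."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng=random):
    """Select every kth element after a random start rs, where 1 <= rs <= k."""
    N = len(population)
    k = N // n                                  # sampling interval (assumes n <= N)
    rs = rng.randint(1, k)                      # random start, as a 1-based position
    positions = [rs + j * k for j in range(n)]  # rs, rs + k, rs + 2k, ...
    return [population[p - 1] for p in positions]

def proportional_allocation(stratum_sizes, n):
    """Stratum sample size n_i = (N_i / N) * n, rounded to a whole number."""
    N = sum(stratum_sizes.values())
    return {name: round(size / N * n) for name, size in stratum_sizes.items()}

employees = {"Filipino": 182, "Chinese": 51, "American": 17}
print(proportional_allocation(employees, 15))
# {'Filipino': 11, 'Chinese': 3, 'American': 1}

rng = random.Random(2012)      # seeded only so that a run can be repeated
frame = list(range(1, 251))    # hypothetical frame: employee numbers 1..250
print(systematic_sample(frame, 15, rng))
```

Here 182/250 × 15 = 10.92, 51/250 × 15 = 3.06, and 17/250 × 15 = 1.02, which round to 11, 3, and 1 and happen to total 15.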

Frequency Distribution – a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes.
Example: The following data were obtained from a sample of 50 soft drink purchases. Construct a frequency distribution to summarize the data.
Coke, Coke Zero, Pepsi, Pepsi Max, Pepsi Max, Sprite, Mountain Dew, Mountain Dew, Coke, Coke
Coke Zero, Coke Zero, Coke Zero, Sprite, Coke, Coke, Coke, Pepsi, Pepsi, Pepsi
Pepsi Max, Sprite, Pepsi Max, Sprite, Coke, Coke, Pepsi Max, Pepsi Max, Coke, Coke
Pepsi, Coke, Coke Zero, Coke Zero, Pepsi, Mountain Dew, Coke, Mountain Dew, Pepsi Max, Sprite
Pepsi, Coke, Pepsi Max, Pepsi Max, Coke, Mountain Dew, Pepsi, Pepsi Max, Sprite, Mountain Dew

Table 1. Frequency Distribution of Soft Drink Purchases

| Soft Drink | Frequency (f) |
|---|---|
| Coke | |
| Coke Zero | |
| Pepsi | |
| Pepsi Max | |
| Sprite | |
| Mountain Dew | |
| Total (n) | 50 |

Relative Frequency – the fraction or proportion of items belonging to a class: rf = f / n
Percent = relative frequency × 100

Table 2. Relative Frequency and Percent Distribution of Soft Drink Purchases

| Soft Drink | Relative Frequency | Percent Frequency |
|---|---|---|
| Coke | | |
| Coke Zero | | |
| Pepsi | | |
| Pepsi Max | | |
| Sprite | | |
| Mountain Dew | | |
| Total | | |

Graphical presentations of qualitative data:
1. Bar graph – a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent distribution
2. Pie chart – a graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class
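As a check on Tables 1 and 2, the frequency, relative frequency (rf = f / n), and percent columns can be computed with Python's standard `collections.Counter`. This is a sketch, not part of the handout; the list below is transcribed from the 50 purchases in the example, and the class order simply follows the tables.

```python
from collections import Counter

# The 50 soft drink purchases from the example, ten per line.
purchases = [
    "Coke", "Coke Zero", "Pepsi", "Pepsi Max", "Pepsi Max",
    "Sprite", "Mountain Dew", "Mountain Dew", "Coke", "Coke",
    "Coke Zero", "Coke Zero", "Coke Zero", "Sprite", "Coke",
    "Coke", "Coke", "Pepsi", "Pepsi", "Pepsi",
    "Pepsi Max", "Sprite", "Pepsi Max", "Sprite", "Coke",
    "Coke", "Pepsi Max", "Pepsi Max", "Coke", "Coke",
    "Pepsi", "Coke", "Coke Zero", "Coke Zero", "Pepsi",
    "Mountain Dew", "Coke", "Mountain Dew", "Pepsi Max", "Sprite",
    "Pepsi", "Coke", "Pepsi Max", "Pepsi Max", "Coke",
    "Mountain Dew", "Pepsi", "Pepsi Max", "Sprite", "Mountain Dew",
]

n = len(purchases)
freq = Counter(purchases)                    # class -> frequency f
classes = ["Coke", "Coke Zero", "Pepsi", "Pepsi Max", "Sprite", "Mountain Dew"]

print(f"{'Soft Drink':<14}{'f':>4}{'rf':>8}{'%':>8}")
for drink in classes:
    f = freq[drink]
    rf = f / n                               # relative frequency = f / n
    print(f"{drink:<14}{f:>4}{rf:>8.2f}{rf * 100:>8.1f}")
print(f"{'Total':<14}{n:>4}{1:>8.2f}{100:>8.1f}")
```

The same counts are what a bar graph or pie chart of these purchases would display.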

USING EXCEL: Watch "Excel Statistics 15: Category Frequency Distribution w Pivot Table & Pie Chart" by ExcelIsFun at http://www.youtube.com/watch?v=-ERARVSfeuw

SUMMARIZING QUANTITATIVE DATA
Constructing a Frequency Distribution for Quantitative Data
1. Determine the number of non-overlapping classes.
• Use enough classes to show the variation in the data, but not so many that some contain only a few items.
• Use between 5 and 20 classes.
2. Determine the width of each class (also called the interval size).
Class width (i) = range / no. of classes, where Range = highest value – lowest value
3. Determine the class limits.
Lower class limit – identifies the smallest possible data value assigned to the class
Upper class limit – identifies the largest possible data value assigned to the class
4. Count the number of data values belonging to each class.

Example: These data show the time in days required to complete year-end audits for a sample of 30 clients of a small accounting firm. Develop a frequency distribution for the data.
12 15 20 22 14 14 15 27 21 18
19 18 22 33 16 18 17 23 28 13
16 21 15 14 27 30 31 25 22 18
Steps in Constructing a Frequency Distribution:
Step 1: Number of classes = 6
Step 2: Range = ________
Class width = range / no. of classes = __________
Step 3: Lower class limit of first interval = lowest value in the data set = _______
Lower class limit of second interval = lower class limit of 1st interval + class width = ___________
What is the upper class limit of the first interval?

Table 4. Frequency Distribution of Audit Times

| Audit Time (in days) | Tally | Frequency |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| Total | | |
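The four steps for the audit-time example can be sketched in Python as follows. Two choices here are assumptions, since the handout leaves those steps blank: the class width 21 / 6 = 3.5 is rounded up to the whole number 4, and class boundaries use a 0.5 gap because the audit times are whole days. The sketch also computes the class boundaries, class marks, and relative and cumulative frequencies that the handout defines after Table 4.

```python
import math
from itertools import accumulate

audit_days = [12, 15, 20, 22, 14, 14, 15, 27, 21, 18,
              19, 18, 22, 33, 16, 18, 17, 23, 28, 13,
              16, 21, 15, 14, 27, 30, 31, 25, 22, 18]

num_classes = 6                                  # Step 1
lowest, highest = min(audit_days), max(audit_days)
data_range = highest - lowest                    # Step 2: range = 33 - 12
width = math.ceil(data_range / num_classes)      # 3.5 rounded up to 4 (assumption)
# With whole-number data, k classes of width w cover lowest .. lowest + k*w - 1;
# widen by one unit if that would leave the highest value out.
if lowest + num_classes * width - 1 < highest:
    width += 1

limits = []                                      # Step 3: (lower, upper) class limits
for c in range(num_classes):
    lower = lowest + c * width
    limits.append((lower, lower + width - 1))

# Step 4: count the data values in each class.
freqs = [sum(lo <= x <= hi for x in audit_days) for lo, hi in limits]

# Other components (boundaries assume whole-number data, i.e. a 0.5 gap).
n = len(audit_days)
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]
marks = [(lo + hi) / 2 for lo, hi in limits]     # class midpoints
rel = [f / n for f in freqs]                     # relative frequencies
cum = list(accumulate(freqs))                    # cumulative frequencies
cum_pct = [c / n * 100 for c in cum]             # cumulative percentages

for (lo, hi), freq, (lb, ub), mark, rf, cf, cp in zip(
        limits, freqs, boundaries, marks, rel, cum, cum_pct):
    print(f"{lo}-{hi}  f={freq}  {lb}-{ub}  mark={mark}  rf={rf:.4f}  "
          f"%={rf * 100:.2f}  cf={cf}  c%={cp:.2f}")
```

With these choices the classes are 12–15, 16–19, 20–23, 24–27, 28–31, and 32–35; a different rounding of the class width would give a different, equally valid, table.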

In two to three sentences, describe how the audit time data is distributed.
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________

Other Components of a Frequency Distribution
1. Class Boundaries – the true or real limits of an interval
• the specific points that serve to separate adjoining classes along a measurement scale for continuous variables
• can be determined by identifying the points that are halfway between the upper and lower stated class limits, respectively, of adjoining classes
2. Class Marks or Class Midpoints – the value halfway between the lower and upper class limits
3. Relative frequencies – obtained by dividing the class frequency by the total frequency
4. Percentages – obtained by multiplying the relative frequencies by 100%
5. Cumulative frequencies – the number of data items with values less than or equal to the upper class limit of each class; obtained by summing the frequencies
6. Cumulative percentages – obtained by dividing the cumulative frequencies by the total number of cases and then multiplying the result by 100. Cumulative percentages provide information on the percentage of values less than or equal to a specified value.

Example: Using the audit time data, complete the following table.

| Audit Time | Frequency | Class Boundaries | Class Marks | Relative Frequency | Percentage | Cumulative Frequency | Cumulative Percentage |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |

Graphical Representations of Quantitative Frequency Distributions:
1. Histogram – a graph consisting of a series of vertical columns or rectangles with no gaps between bars
• each bar is drawn with a base equal to the class boundaries and a height corresponding to the class frequency
• a suitable graph for representing data obtained from continuous variables
2. Frequency Polygon – constructed by plotting class marks (X) against class frequencies (Y) and connecting the consecutive points by straight lines
• to close the frequency polygon, an additional class interval is added to both ends of the distribution, each with zero frequency
3. Ogive – a graph of a cumulative frequency distribution, plotting the upper class boundaries (X) against the cumulative frequencies (Y)
• the lower end of the graph is connected to the X-axis by adding another interval

USING EXCEL: Watch the following videos:
A. Two Ways to Create a Frequency Distribution Report in Excel by DannyRocksExcels: http://www.youtube.com/watch?v=nh5ObAKfj1o&feature=fvsr

B. Videos by ExcelIsFun:
1. Excel Statistics 20: P1 Quantitative Freq. Dist. w Formulas
2. Excel Statistics 21: P2 Quantitative Freq. Dist. w Formulas
3. Excel Statistics 22: Histogram & Ogive Charts & % Cumulative Frequency
Links:
http://www.youtube.com/watch?v=vCUMqHKwFn8&feature=BFa&list=ULx8ePdM9LquM
http://www.youtube.com/watch?v=x8ePdM9LquM&feature=BFa&list=ULvCUMqHKwFn8
http://www.youtube.com/watch?v=ERARVSfeuw

Stem and Leaf Plots
• a type of graph that is similar to a histogram but shows more information
• summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values
• the data are arranged by place value:
  o Stems – the digits in the largest place
  o Leaves – the digits in the smallest place
Example: The following data are the results of a 150-question aptitude test given to 50 individuals who were interviewed for a position at a manufacturing company.
112 73 126 82 92 115 95 84 68 100
72 92 128 104 108 76 141 119 98 85
69 76 118 132 96 91 81 113 115 94
97 86 127 134 100 102 80 98 106 106
107 73 124 83 92 81 106 75 95 119
Procedure:
1. Arrange the leading digits of each data value to the left of a vertical line.
2. To the right of the vertical line, record the last digit of each data value on the line of its leading digits.
3. Sort the digits on each line in rank order to obtain a stem-and-leaf display.
Stem and Leaf Plot
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |

Shapes of Distributions
1. Symmetric – the shape of the left side of the distribution is a mirror image of the right side
2. Skewed – the two sides of the distribution are not mirror images of each other

a. Positively skewed (skewed to the right) – scores tend to cluster toward the lower end of the scale (i.e., the smaller numbers) with increasingly fewer scores at the upper end of the scale (the larger numbers)
b. Negatively skewed (skewed to the left) – most of the scores tend to occur toward the upper end of the scale, with increasingly fewer scores toward the lower end

EXERCISES
1. Mari's Steakhouse uses a questionnaire to ask customers how they rate the server, food quality, cocktails, prices, and atmosphere at the restaurant. Each characteristic is rated on a scale of outstanding (O), very good (V), good (G), average (A), and poor (P). Construct a frequency distribution, bar graph, and pie chart to summarize the following data collected on food quality. What is your feeling about the food quality ratings at the restaurant?
G O V G A O V O V G
O V A V O P V O G A
O O O G O V V A G O
V P V O O G O O V O
G A O V O O G V A G
2. The following are the final examination test scores of 50 statistics students.
38 50 37 42 53 52 54 57 49 63
54 38 46 49 33 43 40 29 43 60
69 54 64 41 63 44 55 58 55 41
52 51 53 49 48 64 55 37 47 50
68 55 65 42 64 45 56 59 56 42
a. Construct a frequency distribution using 7 classes.
b. Construct a histogram and an ogive.
c. What do these descriptive statistics tell you about the performance of the students in the exam?
3. The following data are the scores of 50 individuals who answered a 150-item aptitude test as a requirement for a job application.
112 73 126 82 92 115 95 107 73 124
83 92 81 106 97 86 127 134 100 102
80 69 76 118 132 96 91 81 72 92
128 104 108 76 141 100 119 106 94 85
68 95 115 98 84 75 98 113 119 106
a. Construct a frequency distribution for this data set using 8 classes.
b. Develop a histogram and an ogive for the frequency distribution you constructed.
c. Make a stem-and-leaf plot for the above data set.
d. What can you say about the performance of the 50 job applicants who took the aptitude test? Use the graphs to explain your answer.
4. The numbers of friend requests confirmed during a week by 37 Facebook users were:
15 12 18 11 23 10 13 17 8 20
18 13 16 15 0 15 14 15 13 3
15 7 23 10 15 6 14 22 17 25
13 0 13 9 7 14 17
Present this set of data in the form of a frequency distribution. Use 7 classes.
a. Plot a frequency polygon of the distribution.
b. What is the shape of the distribution?
c. In not more than 5 sentences, describe the frequency distribution and polygon that you made.

BASIC SUMMATION NOTATION
In Statistics, it is frequently necessary to work with sums of numerical values. We use the symbol $\Sigma$ (capital Greek letter sigma) to represent the sum of a set of numbers. Given a set of n observations represented by $X_1$ as the first value, $X_2$ as the second value, and so on up to $X_n$, the sum can be expressed as

$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$
When we are summing over all the values of $X_i$ that are available, the limits of summation are often omitted and we simply write $\sum X_i$. In fact, some authors even drop the subscript and let $\sum X$ represent the sum of all available data.

Example 1. If $x_1 = 3$, $x_2 = 5$, and $x_3 = 7$, find
a) $\sum x_i = x_1 + x_2 + x_3 = 3 + 5 + 7 = 15$
b) $\sum x_i^2 =$ ________
c) $\sum (x_i - 2)^2 =$ ________

Example 2. Given $x_1 = 2$, $x_2 = 3$, $x_3 = 1$, $y_1 = 4$, $y_2 = 2$, and $y_3 = 5$, evaluate
a) $\sum x_i y_i$
b) $\left(\sum x_i\right)\left(\sum y_i\right)$
c) $\sum x_i^2 y_i$

DATA ANALYSIS
Measure – a number that summarizes a particular characteristic of a given data set
• Parameter – a measure of the population, usually represented by lowercase Greek letters
• Statistic – a measure of the sample, usually represented by lowercase letters of the English alphabet

MEASURES FOR QUALITATIVE DATA
Summarized using the following measures:
• proportions (relative frequencies)
• percentages
Example: gender coded as M – 0, F – 1
It is not appropriate to get the "average gender", but we can report the "percentage of females in the group" or the "proportion of males".

MEASURES FOR QUANTITATIVE DATA
MEASURES OF CENTRAL TENDENCY
ARITHMETIC MEAN (or simply, mean)
• computed by summing all the observations and dividing the sum by the number of observations
Population Mean: $\mu = \dfrac{\sum x_i}{N}$, where $x_i$ = ith score or observation and N = population size
Sample Mean: $\bar{x} = \dfrac{\sum x_i}{n}$, where $x_i$ = ith score or observation and n = sample size
Example 1: During a particular summer month, the eight salespeople in an appliance store sold the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11. Considering this month as the statistical population of interest, the mean number of units sold is

$$\mu = \frac{\sum x_i}{N}$$
Note: For reporting purposes, one generally reports the measures of location to one additional digit beyond the original level of measurement.

WEIGHTED MEAN
• also called the weighted average
• an arithmetic mean in which each value is weighted according to its importance in the overall group
• the formulas for the population and sample weighted means are identical:
$$\mu_w \text{ or } \bar{x}_w = \frac{\sum wX}{\sum w}$$
• each value in the group (X) is multiplied by the appropriate weight factor (w), and the products are then summed and divided by the sum of the weights
Example 2: In a multiproduct company, the profit margins for the company's four product lines during the past fiscal year were: line A, 4.2 percent; line B, 5.5 percent; line C, 7.4 percent; and line D, 10.1 percent. The unweighted mean profit margin is $\mu = \dfrac{\sum x}{N}$. However, unless the four products are equal in sales, this unweighted average is incorrect. Assuming the sales totals in the following table, the weighted mean correctly describes the overall average.

| Product Line | Profit Margin, X (%) | Sales, in Php (w) | wX |
|---|---|---|---|
| A | 4.2 | 30,000,000 | 126,000,000 |
| B | 5.5 | 20,000,000 | 110,000,000 |
| C | 7.4 | 5,000,000 | 37,000,000 |
| D | 10.1 | 3,000,000 | 30,300,000 |
| Total | | Php58,000,000 | Php303,300,000 |

MEDIAN
• the value of the middle item of an array (an arrangement of the values in either ascending or descending order)
• If N or n is odd, the median is the middle value of the array.
• If N or n is even, the median is the mean of the two middle values.
• When N or n is large, the following procedure is used:
  o Find the position of the median value in the array: $\frac{N+1}{2}$ or $\frac{n+1}{2}$
Population Median: $\tilde{\mu} = x_{(N+1)/2}$
Sample Median: $\tilde{x} = x_{(n+1)/2}$
Example 3: The eight salespeople described in Example 1 sold the following number of central air-conditioning units, in ascending order: 5, 8, 8, 11, 11, 11, 14, 16. The value of the median is $\tilde{x} = x_{(n+1)/2} = x_{4.5}$.

Remark: The value of the median is between the fourth and fifth values in the ordered group. Since both these values equal 11 in this case, the median equals 11.0.

MODE
• the observation that occurs most frequently; in a frequency polygon, the value corresponding to the highest peak
• not necessarily unique, unlike the mean and the median
  o does not always exist: in a rectangular distribution, where all the frequencies are equal, there is no mode
  o may correspond to multiple values: there may be two or more scores with the same highest frequency
• Unimodal – the distribution has a single mode
• Bimodal – the distribution has two modes
• Polymodal – the distribution has multiple modes
Example 4: The eight salespeople described in Example 1 sold the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, and 11. The mode for this group of values is the value with the greatest frequency, or mode = ________

RELATIONSHIP BETWEEN THE MEAN AND THE MEDIAN
• symmetrical distribution: mean = median = mode
• positively skewed distribution: mean > median
• negatively skewed distribution: mean < median
REMARK: The latter two relationships are always true, regardless of whether or not the distribution is unimodal.

USE OF THE MEAN, MEDIAN, AND MODE
• For representing population data:
  o The Mode: indicates where most of the observed values, such as hourly wage rates in a company, are located. It can be useful as a descriptive measure for a population group, but only if there is one clear mode.
  o The Median: always an excellent measure by which to represent the "typical" level of observed values, such as wage rates, in a population. This is true regardless of whether there is more than one mode or whether the population distribution is skewed or symmetrical. The lack of symmetry is no special problem because the median wage rate, for example, is always the wage rate of the "middle person" when the wage rates are listed in order of magnitude.
  o The Mean: also an excellent representative value for a population, but only if the population is fairly symmetrical. For nonsymmetrical data, the extreme values (for instance, a few very high wage rates for technical specialists) will distort the value of the mean as a representative value.
  o Thus, the median is generally the best measure of data location for describing population data.
• For representing sample data:
Recall: the purpose of statistical inference with sample data is to make generalizations about the population from which the sample was selected.
  o The mode is not a good measure of location with respect to sample data because its value can vary greatly from sample to sample.
  o The median is better than the mode because its value is more stable from sample to sample.
  o However, the value of the mean is the most stable of the three measures.
  o Thus, for sample data, the best measure of location generally is the arithmetic mean.
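The worked examples on the mean, weighted mean, median, and mode can be checked with the sketch below. `statistics.median` is from Python's standard library; `modes` is a hypothetical helper written to follow the handout's convention that a rectangular distribution (all frequencies equal) has no mode, whereas the standard `statistics.multimode` would instead return every value in that case.

```python
import statistics
from collections import Counter

units = [8, 11, 5, 14, 8, 11, 16, 11]   # Examples 1, 3, and 4

mean = sum(units) / len(units)          # mu = sum(x_i) / N
median = statistics.median(units)       # n is even: mean of the two middle values

# Example 2: profit margins X (in %) weighted by sales w (in Php).
margins = {"A": 4.2, "B": 5.5, "C": 7.4, "D": 10.1}
sales = {"A": 30_000_000, "B": 20_000_000, "C": 5_000_000, "D": 3_000_000}
unweighted = sum(margins.values()) / len(margins)
weighted = sum(sales[k] * margins[k] for k in margins) / sum(sales.values())

def modes(data):
    """Most frequent value(s); empty list when all frequencies are equal."""
    counts = Counter(data)
    if len(set(counts.values())) == 1:  # rectangular distribution: no mode
        return []
    top = max(counts.values())
    return sorted(value for value, c in counts.items() if c == top)

print(f"mean = {mean:.2f}, median = {median:.1f}, mode(s) = {modes(units)}")
print(f"unweighted margin = {unweighted:.2f}%, weighted margin = {weighted:.2f}%")
```

The weighted margin, 303,300,000 / 58,000,000 ≈ 5.23 percent, is lower than the unweighted 6.8 percent because the high-margin line D has the smallest sales.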

25. Shift 1 2 3 Percentage defective 1. For a sample of 15 students at an elementary-school snack bar. 22. median. If 9 of the students have IQs of 101. Compute the median.0 85. 43. The average IQ of 10 students in a mathematics course is 114. 4.3 Number of Items. and mode for these sales amounts. How would you describe the distribution from the standpoint of skewness? 6. 118. 55. 99.9 17. 40. 45. Find the mean. 11.3 99. 50.6 a.0 81.5 56.6 34.6 65. b. and 109. and mode.0 98. median. According to a survey.1 1. 10. 27. the daily number of cars rented of a car rental company are as follows: 7 5 9 10 5 10 6 7 4 7 8 7 9 4 5 4 6 9 7 9 8 9 7 9 9 12 5 8 7 7 a. Do these data appear to be consistent with the average reported by the newspaper? Explain your answer.4 7. Justify your choice. Determine the mean. 60. 88.0 28.9 7. The following table shows the percentage of defective items in an assembly department. b. 15. 115. the following sales amounts arranged in ascending order of magnitude are observed: Php10. 25.8 67. in thousands 210 120 50 7.4 52. Find the preferred measure of central location for the sample whose observations18. and mode.9 4. 112 73 126 82 92 a.7 45. 107 73 124 83 92 97 86 127 134 100 69 76 118 132 96 72 92 128 104 108 115 95 84 68 100 81 106 75 95 119 102 80 98 106 106 91 81 113 115 94 76 141 119 98 85 Find the mean. Compute the mean. what must be the other IQ? . If the break-even point for the company is 8 cars per day.0 94.5 0. 125. The following are scores of 50 high school students in a 150-item achievement test in Mathematics. 98.3 64. 25. 106. Determine the overall percentage defective of all items assembled during the sampled week.4 29. b.9 145. the average person spends 45 minutes a day listening to recorded music. and 17 represent the number of automobiles sold during this past month by 9 different automobile agencies. 5.5 2. is the company doing well? Explain. Between the mean and the median.1 9.2 70. 
The following data were obtained for the number of minutes spent listening to recorded music for a sample of 30 individuals.3 0.2 0. 10. 128.1 4. 33. median. What is the shape of the distribution? 2.6 63. 45. 11.6 4.0 53. b. which measure do you think is more appropriate to use for this data set? Why? 3. 118.2 0. a. During a 30-day period.14 EXERCISES 1. 35. 30.

Deciles are found in exactly the same way that we found percentiles Example: Use the data on car battery lives to find D 7. 50% falls below Q2.7 4.9 3.0 3. If k is a fractional value.0 3. P99. 1. the ith percentile is the (k+1)th observation. and 90% falls below D9.2 2.8 3. the ith percentile is the average of the kth observation and the (k+1)th observation. DECILES    values that divide a set of observations into 10 equal parts denoted by D1.6 1. P2.5 3.4 3.3 3.3 3. 2% falls below P2.5 2. 2.1 4.6 2.6 2. Example: Use the data on car battery lives to find Q 3. MEASURES OF VARIATION .7 3.9 4.7 Find P85.4 3.3 4. 76. If k is a whole number.2 4.1 4.4 4. 4. and 75% falls below Q3 also found in exactly the same way that we solved for percentiles and deciles. such that 1% of the data falls below P 1. 3.5 4. … and 99% falls below P99. Q2. …. and Q3. Example: The following are the lives of 40 car batteries (in years). QUARTILES    values that divide a set of observations into 4 equal parts denoted by Q1. are such that 10% of the data falls below D1. ….1 31. Steps in Finding Percentiles: 1. are such that 25% of the data falls below Q1.9 2.2 3.6 3. D2.3 3. D9.15 8. 20% falls below D2.2 3.9 3. where k = the position of the ith percentile in the ordered data set.4 3.7 3.2 3. Find the position of the ith percentile:  i n  100 .1 3. ….8 3.1 3.7 3. Rank the given data in increasing order of magnitude. and 82 on 3 tests and a 79 on the final examination in a certain course if the final examination counts three times as much as each of the 3 tests? MEASURES OF NON-CENTRAL POSITION    describe or locate the position of certain noncentral pieces of data relative to the entire set of data often referred to as fractiles or quantiles values below which a specific fraction or percentage of the observations in a given set must fall PERCENTILES values that divide a set of observations into 100 equal parts denoted by P1.5 3. 
k  i = the ith percentile n = the number of observations in the data set 3. What is the average for a student who received grades of 85.
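The steps for finding percentiles (rank the data, find the position k = i·n/100, then use the average of the kth and (k+1)th observations when k is a whole number, or the next whole-number position when k is fractional) can be sketched in Python as below. Reading the fractional case as "round k up" is an interpretation, and the ten-value data set is hypothetical rather than the car-battery data. Deciles and quartiles are obtained from the same rule through $D_j = P_{10j}$ and $Q_j = P_{25j}$.

```python
import math

def percentile(data, i):
    """ith percentile (1 <= i <= 99) using the position k = i * n / 100."""
    x = sorted(data)                    # Step 1: rank in increasing order
    n = len(x)
    if (i * n) % 100 == 0:              # k is a whole number
        k = i * n // 100
        return (x[k - 1] + x[k]) / 2    # mean of the kth and (k+1)th values (1-based)
    k = math.ceil(i * n / 100)          # k is fractional: next whole-number position
    return x[k - 1]

def decile(data, j):
    return percentile(data, 10 * j)     # D_j = P_(10j)

def quartile(data, j):
    return percentile(data, 25 * j)     # Q_j = P_(25j)

sample = [9, 4, 12, 7, 15, 4, 10, 2, 8, 5]   # hypothetical data, n = 10
print(percentile(sample, 85), decile(sample, 7), quartile(sample, 3))
```

For P85 here, k = 85(10)/100 = 8.5 is fractional, so the 9th ranked value, 12, is used; for D7, k = 7 is whole, so the 7th and 8th values (9 and 10) are averaged to give 9.5.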

Remarks:
• The measures of central location do not give an adequate description of a given distribution.
• These measures only describe the typical or representative values; they do not describe how the observations spread out from the average.
Given the following data sets:

| Set A | Set B |
|---|---|
| 3 | 3 |
| 4 | 7 |
| 5 | 7 |
| 6 | 7 |
| 8 | 8 |
| 9 | 8 |
| 10 | 8 |
| 12 | 9 |
| 15 | 15 |

Find the mean and median values.

Measures of variation describe the degree of dispersion, scatter, or spread of scores in a distribution.
RANGE
• the difference in value between the highest (maximum) and the lowest (minimum) observation
• can be computed very quickly but is not very useful
• considers only the extremes and does not take into consideration the bulk of the observations
The range is used when:
1. the data are too scant or too scattered to justify the computation of a more precise measure of variability.
2. a knowledge of extreme scores or the total spread is all that is wanted.
VARIANCE
• a measure of variability that is based on the difference between the value of each observation ($x_i$) and the mean
• deviation about the mean = the difference between each $x_i$ and the mean
Population Variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$
Sample Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
STANDARD DEVIATION
• defined to be the positive square root of the variance
Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
Sample Standard Deviation: $s = \sqrt{s^2}$
REMARKS:
• The sample variance may be thought of as the average of the squared deviations from the mean.
• The greater the deviations, the greater the variance.
• The variance is of little use in descriptive statistics because its calculated value is expressed in square units of measurement.
• The standard deviation is more widely used; it has the same unit of measurement as the raw data.
Calculation of the Variance and Standard Deviation:
Raw Score Method
$$s^2 = \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)} \quad \text{(raw score formula)}$$

Example:
xi     xi²
32     1,024
71     5,041
64     4,096
50     2,500
48     2,304
63     3,969
38     1,444
41     1,681
47     2,209
52     2,704
Σxi = 506     Σxi² = 26,972
$$s^2 = \frac{10(26{,}972) - (506)^2}{10(9)} = \frac{269{,}720 - 256{,}036}{90} = \frac{13{,}684}{90} \approx 152.04$$
$$s = \sqrt{152.04} \approx 12.33$$
The standard deviation is used when:
1. the statistic having the greatest stability is desired.
2. coefficients of correlation and other statistics are to be computed later.
3. the mean is the preferred measure of central tendency.
APPLICATIONS OF THE STANDARD DEVIATION
COEFFICIENT OF VARIATION
 a measure of relative variability
 expresses the standard deviation as a percentage of the mean
 expressed in percent
 can be used to compare the variability of two or more distributions even when the observations are expressed in different units of measurement: the smaller the CV, the less variable the values of a given set compared to another data set
 formula:
$$CV = \frac{s}{\bar{X}} \times 100\%$$
Remarks: In the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.
Example: Consider two investment proposals, A and B. Proposal A has a mean return of \$230 with a standard deviation of \$107.70; proposal B has a mean return of \$250 with a standard deviation of \$208.57. The coefficient of variation for each proposal is:
For A: \$107.70/\$230 × 100% = 47%
For B: \$208.57/\$250 × 100% = 83%
Since the coefficient of variation is a relative measure of risk, B, with the larger CV, is considered more risky than A.
STANDARD SCORE
 tells the relative location of a particular raw score with regard to the mean of all the scores in a series
 is a transformed raw score, represented by z, expressed in terms of standard deviation units from the mean
 has a mean of zero
o a positive standard score indicates that the transformed raw score is above or higher than the mean
o a negative standard score shows that the given raw score is below or lower than the mean
The formula for transforming a raw score to a standard score is:
$$z = \frac{x - \bar{X}}{s}$$
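The two applications of the standard deviation above can be checked with a short Python sketch. It reproduces the raw score computation for the 10 observations in the worked example, computes the coefficient of variation for proposals A and B from their stated standard deviations and mean returns, and converts the worked-example data to standard scores to confirm that they average to zero. The helpers `coefficient_of_variation` and `z_score` are hypothetical names chosen for illustration.

```python
import math
import statistics

# Data of the raw score worked example (10 sample observations)
scores = [32, 71, 64, 50, 48, 63, 38, 41, 47, 52]
n = len(scores)
sum_x = sum(scores)                              # 506
sum_x2 = sum(x * x for x in scores)              # 26,972
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))   # raw score formula
s = math.sqrt(s2)
print(f"s^2 = {s2:.2f}, s = {s:.2f}")            # s^2 = 152.04, s = 12.33

def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100, expressed in percent."""
    return sd / mean * 100

# Investment proposals: (standard deviation, mean return) in dollars
proposals = {"A": (107.70, 230.0), "B": (208.57, 250.0)}
for name, (sd, mean_return) in proposals.items():
    print(f"CV for {name}: {coefficient_of_variation(sd, mean_return):.0f}%")  # 47%, then 83%

def z_score(x, mean, sd):
    """z = (x - mean) / s: distance from the mean in standard deviation units."""
    return (x - mean) / sd

x_bar = statistics.mean(scores)                  # 50.6
zs = [z_score(x, x_bar, s) for x in scores]
print([round(z, 2) for z in zs])                 # positive above the mean, negative below
print(abs(statistics.mean(zs)) < 1e-9)           # True: standard scores have a mean of zero
```

The zero mean of the z values holds for any data set, because the deviations from the mean always sum to zero; only floating-point round-off keeps the computed mean from being exactly 0.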

Standard scores are usually used to compare observations in two or more different distributions of raw scores which have different means and/or different standard deviations.
Example: Ruben got a final grade of 85 in both English and Physics. The mean final grades of his class in these two courses are 80 in English and 75 in Physics, with standard deviations of 12 and 10, respectively. In which subject was his academic performance better in relation to his class?
EMPIRICAL RULE
When the data are believed to approximate a bell-shaped distribution, the empirical rule can be used to determine the approximate percentage of data values that lie within a specified number of standard deviations of the mean:
 Approximately 68% of the data values will be within 1 standard deviation of the mean.
 Approximately 95% of the data values will be within 2 standard deviations of the mean.
 Approximately 99.7% of the data values will be within 3 standard deviations of the mean.
Example: Liquid detergent cartons are filled automatically on a production line. Filling weights frequently have a bell-shaped distribution. If the mean filling weight is 16 ounces and the standard deviation is 0.25 ounces, use the empirical rule to draw conclusions about the distribution of filling weights.
EXERCISES
1. A goal of management is to help their company earn as much as possible relative to the capital invested. One measure of success is return on equity: the ratio of net income to stockholder’s equity. Shown here are return on equity percentages for 25 companies.
[Data table: return on equity percentages for 25 companies]
Find the range, variance, and standard deviation.
2. During a 30-day period, the daily numbers of cars rented by a car rental company are as follows:
7 10 6 7 9 4 7 5 5 7
8 4 6 9 9 10 4 7 5 9
8 9 7 9 9 12 5 8 7 7
Find the range, variance, and standard deviation.
3. Many national academic achievement and aptitude tests, such as the SAT, report standardized test scores with the mean for the normative group used to establish scoring standards converted to 500 with a standard deviation of 100. Suppose that the distribution of scores for such a test is known to be approximately normally distributed. Determine the approximate percentage of reported scores that would be
a. between 400 and 600
b. between 500 and 700
c. greater than 700
d. less than 200
4. A manufacturing firm regularly places orders with two different suppliers, A and B. The following data are the number of days required to fill orders for these suppliers.

51 3.39 3. Unfortunately they have left it rather late to book and there are only two resorts. Medlena and Bistry. why? .43 3. and if so.43 3.48 3.41 3. When they ask about the ages of the holidaymakers at these resorts their travel agent says the only thing he can tell them is that that the mean age of people going to Medlena is 19 whereas the mean age of visitors to Bistry is 22. Two friends want to take a summer holiday before going to college in the autumn.48 3.50 Should the production line be shut down? Why or why not? 6. the production line must be shut down for repairs. the standard deviation of the ages of visitors to Medlena is 8 and the standard deviation of the ages of visitors to Bistry is 2’. Just as they are about to book holidays in Medlena because it seems to attract the sort of young crowd they want to be with the travel agent says.005.45 3.52 3. available within their budget.49 3.19 Supplier A: 11 10 9 10 11 11 10 11 10 10 Supplier B: 8 10 13 7 10 11 10 7 15 12 Use the range and standard deviation to determine which supplier provides the more consistent and reliable delivery times.45 3. Suppose the following data have been collected: 3.50 3. Should they change their minds on the basis of this new information.38 3. 5. The department employs the following decision rule at an inspection station: If a sample of 14 items has a variance of more than . They are looking for somewhere with plenty of clubs where they can party all night. ‘I’ve got some more figures. A production department uses a sampling procedure to test the quality of newly produced items.