AAMT 105 Engineering Road, Araneta University Village, Potrero, Malabon City
Tel. Nos.: (02) 365.9405, 365.3239; Fax No.: (02) 448.1114

Roland S. Zorills • Beda H. Esller • Fe G. Partible • Violeta C. Mendoza • Milna K. Cabrera

ALL RIGHTS RESERVED. No part of this publication may be reproduced or stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the authors and/or publisher.

Statistics: Basic Concepts and Applications is the third edition of Basic Statistics, first published in 2005. This third edition has two main objectives: 1) to introduce students to the fundamental principles of various statistical concepts, methods, procedures, and applications; and 2) to prepare students for higher statistical analyses and inferences. The use and application of advanced mathematical models demand a higher degree of critical judgment. Attaining these objectives shall provide the insights and inputs necessary for research work. The topics for discussion are those relevant to students majoring in various academic disciplines such as economics, education, business administration, psychology, sociology, or any one of the sciences.

Statistics: Basic Concepts and Applications is the product of a careful and thorough review, by a group of experienced mathematics professors, of the various statistics textbooks used in the past, and of consultation with experts in the field. This edition is an improvement of the principles, methods, and procedures in the study of statistics to meet the various changes of modern society. Formulas are necessarily presented without derivations and proofs. However, the applications of the formulas are presented with sufficient explanations, illustrative exercises and examples, and procedures shown step by step.
While some classical examples are included to illustrate certain important concepts, real-life situations in the Philippine setting are used as examples and exercises for better appreciation. This will hopefully arouse interest among students to continue to higher statistical studies. Statistical analysis as presented here is based on basic mathematics, and all that is required is a knowledge of basic algebra and the ability to perform the four fundamental operations of mathematics. Mathematical explanations and notations are kept as basic and simple as possible. The use of the same symbol to represent two different values, for instance, is avoided.

The material is divided into three parts, namely: descriptive statistics, probability theories and related topics, and inferential statistics. Descriptive statistics covers the collection, organization, presentation and analysis of data. Analysis of data uses the various measures of averages and dispersion. Probability theories and related topics include counting techniques, permutations and combinations, simple probability theories, and the concepts and applications of the normal distribution. Inferential statistics deals with hypothesis testing, non-parametric tests, and simple regression and correlation analysis.

We trust that you will find the book interesting and enjoyable to read. We welcome your comments, suggestions and recommendations to improve the book further.

The Authors

Contents
Chapter 1: Concepts and Nature of Statistics
  Introduction and Information
  Descriptive and Inferential Statistics
  Variables and Types of Data
  Parameter and Statistic
  Measurement Levels
  Summation Notation
  Exercises
Chapter 2: Collection, Organization and Presentation of Data
  Data Collection
  Determining the Sample Size
  Sampling Techniques
  Non-Probability Sampling
  Presentation of Data
  Array Form Method
  Textual Form Method
  Stem and Leaf Presentation
  Tabular Form Method
  Frequency Distribution Table
  Graphs of Frequency Distribution
  The Frequency Polygon
  Histogram
  Less Than Ogive (<Ogive) Graph
  Pie Chart or Circle Graph
  Some Other Forms of Graph
  Exercises
Chapter 3: Measures of Central Tendency
  Mean for Ungrouped Data
  Weighted Mean
  Mean for Grouped Data
  Calculating the Weighted Arithmetic Mean of Grouped Data
  Median
  Median for Ungrouped Data
  Median for Grouped Data
  Mode
  Mode for Ungrouped Data
  Mode for Grouped Data
  Properties of the Mean, the Median and the Mode
  Graphical Relationship of the Mean, the Median and the Mode
  Quantiles
  Quartile
  Computing Quartiles of a Frequency Distribution
  Percentiles
  Computation of Percentiles for Grouped Data
  Deciles
  Exercises
Chapter 4: Measures of Variation
  Introduction and Information
  The Range
  The Inter-Quartile Range
  The Semi-Interquartile Range or Quartile Deviation
  The Mean Deviation or Average Deviation
  The Variance
  The Variance for Ungrouped Data
  The Variance for Grouped Data
  The Standard Deviation
  Exercises
Chapter 5: Measures of Relative Dispersion, Skewness and Kurtosis
  Introduction and Information
  The Coefficient of Variation
  The Coefficient of Quartile Deviation
  The Measures of Shape
  The Measure of Skewness
  The Measure of Kurtosis
  Exercises
Chapter 6
  Introduction
  The Fundamental Principle of Counting
  Factorial
  Permutation
  Circular Permutation
  Permutation of n with Alike Objects
  Combination
  Exercises
Chapter 7: Probability Concepts
  Introduction and Overview
  Definitions and Concepts of Probability
  Approaches of Probability
  Subjective Probability or Non-frequentist Subjective Approach
  Probability of the Relative Frequency
  Classical Probability
  Addition Rule
  Conditional Probability
  Multiplication Rule
  Independent and Dependent Events
  Exercises
Chapter 8: Normal Distribution
  Concepts and Properties
  Normal Curve
  Areas Under the Normal Curve
  Standard Score (Z-Score)
  Applications of the Normal Curve
  Exercises
Chapter 9: Hypothesis Testing
  Introduction and Information
  Type I and Type II Errors
  Hypothesis Tests Terminologies and Assumptions
  Significance Level
  Test Statistic
  Critical or Tabular Value(s)
  Critical Region
  Z-Test and t-Test
  Z-test Formulas
  Steps and Procedures to be Followed in Using the Z-test
  Applications of Z-test
  The t-Test Formulas
  Steps/Procedures in Hypothesis Testing Using the t-test
  Applications of t-test
  Exercises
Chapter 10: Non-Parametric Tests
  Chi-Square Test
  One Variable Chi-Square With Unequal Expected Frequencies
  Chi-Square Test of Independence
  Analysis of Variance (ANOVA)
  Exercises
Chapter 11: Simple Regression and Correlation Analysis
  Introduction and Information
  Scatter Diagram
  Correlation Analysis
  Correlation Between Interval Variables
  Pearson's Product-Moment Correlation Coefficient
  Correlation Between Ordinal Variables
  Spearman Rank Order Correlation Coefficient
  Regression
  Regression Method
  Linear Regression Analysis
  Coefficient of Determination
  Exercises
Appendices: Tables
Bibliography

Chapter 1: Concepts and Nature of Statistics
Introduction and Information • Descriptive and Inferential Statistics • Variables and Types of Data • Parameter and Statistic • Measurement Levels • Summation Notation • Exercises

1.1 Introduction and Information
The word statistics seems to be a familiar household word nowadays. A young mother may be aware that her vital statistics are not proportional. A father who is a basketball fan is anxious to find out the statistics of his team's game.

Statistics is the branch of science that deals with the collection, organization, presentation, analysis, and interpretation of data, and with drawing conclusions from the data.

There are several reasons why we should study statistics. Among the most important are the following:
1. We must be able to read and understand the various statistical studies performed around us.
To understand these studies, we must be knowledgeable of the different concepts, principles, symbols, techniques and procedures used in statistical analysis.
2. We may conduct research on growing phenomena around us. To do this, we must be able to perform experiments and to gather, organize, summarize, and analyze numerical data.
3. We may have to make decisions based on the data and information of statistical studies, such as what product to purchase based on consumer studies, how much budget a company should allot for advertising expense, or how much salaries and benefits should be given to employees.

1.2 Descriptive and Inferential Statistics
To conduct research, we have to collect data about the variables we intend to investigate. Data are any pieces of information useful to the researcher. A collection of values forms a data set. Each value in the set is called a datum.

The two main branches of statistics are descriptive statistics and inferential statistics. Descriptive statistics consists of the collection, organization, presentation, and analysis of data. The origin of descriptive statistics can be traced to the early days of Babylonia, between 4500 BC and 3000 BC. The Babylonians collected data on male citizens, arms, and farm and wheat products to find out whether they had enough of them to wage war. According to the Bible, Roman Emperor Augustus (27 BC to AD 14) conducted surveys on births and deaths of the citizens of the empire.

The second branch, inferential statistics, consists of a higher degree of analysis and the drawing of inferences. The area of inferential statistics called hypothesis testing is a decision-making procedure for finding out whether there is a significant difference between a claim about a population and other information obtained about that population. For example, a researcher may wish to find out whether exposure to pollution may reduce life span.
For this study, two groups of senior citizens aged 75 years and above were selected and exposed to two kinds of air: one group was exposed to the air in the polluted center of a metropolis, and the other group to the air in a remote area of Palawan. Five years later, the number of deaths per group was counted and a decision was made on the effect of pollution on man's life span. Inferential statistics can be traced to the 1600s. John Graunt, in his book on population, made some tests of significance on the differences of mortality among groups of people. Later in the same century, the mathematician Edmund Halley published a mortality table to determine life insurance rates.

1.3 Variables and Types of Data
Statisticians gain information on a particular situation by collecting data for a variable. A variable is a characteristic of a population or sample which makes one member different from another. Population refers to a large collection of objects, persons, or things. Data are gathered by the researcher from a population or a sample. If, for example, a researcher wants to find out the average age of the 3,100 students of a certain school, then all the students comprise the population, whose size is N = 3,100. A sample is a part of a population that has the same characteristics as the given population.

Data can be classified as qualitative or quantitative. Qualitative data are data that can be placed into categories according to their characteristics or attributes. Gender, civil status, and nationality are qualitative data. Data under this category cannot be added, subtracted, multiplied, or divided. Quantitative data are data which are numerical in nature. These data can be ordered or ranked. Age, height, test scores, and weight are examples of quantitative data.

Variables may be classified as discrete, continuous, dependent, or independent. A discrete variable assumes values that can be counted, and its values are represented by counting numbers only.
An example of a discrete variable is the number of days in a week or the number of children in a family. A continuous variable is one which can assume all values between any two specific values or within an interval. Its values are obtained through measurement. If the average number of hours of sleep of an infant is reported as 12 hours a day, the actual value may be 1 or 2 minutes less or more than 12 hours. A dependent variable is one that is affected or influenced by another variable, the independent variable. For example, in a study on "The Side Effects of Staggered Food Supply to a Sick Secluded Patient," the independent variable is the staggered food supply and the dependent variable is the side effects on the sick secluded patient.

1.4 Parameter and Statistic
A parameter is a value or measure obtained from a population. If one uses the mean, median and standard deviation to differentiate the achievement of a class from that of another class, then these measures are parameters. A statistic is any value or measurement obtained from a sample. It is an estimate of the parameter. In a popularity survey of a certain program, if 10% of the respondents are senior citizens with an average age of 65, then this average age is a statistic.

1.5 Measurement Levels
Variables are counted or measured using four types of scales, namely nominal, ordinal, interval, and ratio.
1. The nominal level of measurement classifies data into non-overlapping categories. This scale distinguishes one object from another for identification purposes only. Subjects taught, such as English, Filipino, Science, or Mathematics, are on a nominal scale. There is no ranking or meaningful order among the categories.
2. The ordinal level of measurement classifies data into some specified order or rank. However, we cannot tell how much less or how much more one rank has over the other. Examples are the ranking of honor students in a class and the ranking of candidates in a beauty contest.
3. The interval level of measurement specifies the precise difference between or among the values or ranks. If, for instance, the ticket sales on the first day of showing of a film are 2 million; on the second day, 1.75 million; and on the third day, 1 million, the differences among the ticket sales for the 3 days can be determined.
4. The ratio level of measurement has the same characteristics as the interval level; the only difference is that the ratio level always starts from zero. In addition, the ratio level always has units of measure. Hence, we can say that one object is so many times larger or smaller than another. If, for example, car A starts from Luneta and travels 90 km in one hour, and car B also starts from Luneta and travels 120 km in one hour, we can conclude that car B is 1 1/3 times as fast as car A, since 120/90 = 4/3.

1.6 Summation Notation
The most commonly used notation in statistics is the summation notation ($\sum$), which reads "the sum of" or "the summation of." It is simply a shorter way of writing $X_1 + X_2 + X_3 + \cdots + X_{20}$ as

$$\sum_{i=1}^{20} X_i$$

This notation is read: the summation of X sub i, as i ranges from 1 to 20.

To expand a summation notation, we do the reverse.

Examples:
A. Expand the notation below.

$$\sum_{i=1}^{4} (7 + A_i) = (7 + A_1) + (7 + A_2) + (7 + A_3) + (7 + A_4) = 4(7) + A_1 + A_2 + A_3 + A_4 = 28 + A_1 + A_2 + A_3 + A_4$$

B. The table below shows the ages of 5 pairs of boys and girls.

Pair (i)             1    2    3    4    5
Age of boy (x_i)    12   12   14   13   11
Age of girl (y_i)   11   13   12   14   10

Expand and find the values of the following:
1. $\sum_{i=1}^{5} x_i$
2. $\sum_{i=1}^{5} y_i$
3. $\sum_{i=1}^{5} (x_i - y_i)^2$
4. Average age of boys
5. Average age of girls

Solutions:
1. $\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 12 + 12 + 14 + 13 + 11 = 62$
2. $\sum_{i=1}^{5} y_i = y_1 + y_2 + y_3 + y_4 + y_5 = 11 + 13 + 12 + 14 + 10 = 60$
3. $\sum_{i=1}^{5} (x_i - y_i)^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + (x_4 - y_4)^2 + (x_5 - y_5)^2$
   $= (12 - 11)^2 + (12 - 13)^2 + (14 - 12)^2 + (13 - 14)^2 + (11 - 10)^2 = 1 + 1 + 4 + 1 + 1 = 8$
4. Average age of boys $= \frac{\sum_{i=1}^{5} x_i}{5} = \frac{62}{5} = 12.4$
5. Average age of girls $= \frac{\sum_{i=1}^{5} y_i}{5} = \frac{60}{5} = 12$

C. Using the values $x_1 = 3$, $x_2 = 5$, $x_3 = 8$, $x_4 = 10$ and $y_1 = 5$, $y_2 = 6$, $y_3 = 4$, $y_4 = 14$, find the values of the following notations.

Solutions:
1. $\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 3 + 5 + 8 + 10 = 26$
2. $\sum_{i=1}^{4} y_i = y_1 + y_2 + y_3 + y_4 = 5 + 6 + 4 + 14 = 29$
3. $\sum_{i=1}^{4} (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + (x_3 + y_3) + (x_4 + y_4) = (3 + 5) + (5 + 6) + (8 + 4) + (10 + 14) = 8 + 11 + 12 + 24 = 55$
4. $\sum_{i=1}^{4} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4 = 3(5) + 5(6) + 8(4) + 10(14) = 15 + 30 + 32 + 140 = 217$
5. $\sum_{i=1}^{4} (x_i - y_i) = (x_1 - y_1) + (x_2 - y_2) + (x_3 - y_3) + (x_4 - y_4) = (3 - 5) + (5 - 6) + (8 - 4) + (10 - 14) = (-2) + (-1) + 4 + (-4) = -3$

Chapter 2: Collection, Organization and Presentation of Data
Data Collection • Determining the Sample Size • Sampling Techniques • Presentation of Data • Graphs of Frequency Distribution

2.1 Data Collection
One of the reasons why we study statistics is to help us make wise decisions. We cannot avoid making decisions every minute of our lives. For instance, in our choice of a career in college or of a partner in life, we make decisions based on the data and information that we have gathered.

Data may be gathered from primary and secondary sources. Primary sources of data include government offices, private organizations, establishments and recognized individuals who have first-hand information about an event. Secondary data may be obtained from sources like newspapers, journals and magazines. For instance, a reporter who goes directly to a crime scene to interview the victim and the witnesses has gathered primary data, while the readers of the resulting news item have received secondary data.

The following are some methods of collecting data:
1. Interview Method. The researcher makes direct and personal contact with the interviewee and gathers data by asking the interviewee a series of questions.
2. Questionnaire Method.
The researcher distributes the questionnaires either personally or by mail and collects them by the same process. The researcher can save a lot of time and energy since the questionnaires can be distributed and answered simultaneously. The researcher, however, cannot expect all the mailed questionnaires to be retrieved, since some respondents might ignore them.
3. Registration Method. This method of collecting data is governed by existing laws. The researcher gathers data from the offices concerned, e.g., the National Statistics Office (NSO), the Commission on Elections (COMELEC), Municipal/City Halls or Barangay Offices. The NSO keeps complete records of births and deaths of the population. The COMELEC keeps the list of registered voters.
4. Experimental Method. This method of collecting data is used to find out cause-and-effect relationships of certain phenomena under controlled conditions. For example, a chemist may be interested in finding out the effect of a pesticide sprayed on vegetables which are then eaten by people.

2.2 Determining the Sample Size
For a population of size N and a margin of error e, the sample size n is computed as

$$n = \frac{N}{1 + Ne^2}$$

For example, with N = 1,000,000 and e = 10%:

$$n = \frac{1{,}000{,}000}{1 + (1{,}000{,}000)(0.10)^2} = \frac{1{,}000{,}000}{10{,}001} \approx 99.99$$

n = 100 (sample size rounded off to a whole number)

2.3 Sampling Techniques
A sample should not be selected in a haphazard way, because the information obtained from the study might be unbelievable and unrealistic. A sampling technique is a method used to determine which elements are to be included in the sample. In order to obtain a genuine or unbiased sample, each member of the population must have an equal chance of being included in the sample. This procedure is called the random sampling technique or probability sampling technique.

It is important that we have a complete listing of the population, so that every member is available and equally likely to be included in the sample. Among the types of random sampling techniques are:
1. Lottery Sampling. This is also known as raffle.
Each member of the population is numbered on a piece of paper. The pieces of paper shall be identical (equal in size and weight) and rolled evenly. They are placed in a lottery box and shaken very well. The desired number of samples are drawn, one after the other, without looking into the box.
2. Table of Random Numbers. A table of random numbers, invented by a statistician, is used to draw the numbers for the sample. Suppose that in a class of 30 students, 5 will be selected at random and transferred to another class. If the first 2-digit number read from the table is 26, then student number 26 of the class is the first sample. Numbers greater than 30, or numbers already drawn, are skipped. The process is continued until the 5th number is chosen.
3. Systematic Sampling. This is done by numbering each element of the population. If there are 1,000 elements (N) and 50 samples (n) are needed, we divide 1,000 by 50 and obtain the sampling interval k = 20. We then select one number from 1 to 20 by lottery sampling. If the number 3 happens to come out, then the first sample is element 3. The second sample is 3 + k, or 23, the third is 43, and so on until 50 samples are obtained.
4. Stratified Sampling. We obtain samples by dividing the population into strata (groups). If the desired sample is 50 and there are 10 strata, then we obtain the sample proportionately from each stratum: the bigger the stratum, the more samples are taken from it.
5. Cluster Sampling. This is sometimes called area sampling because it is used for a large population. We divide the population by area and select the areas by lottery sampling; the members of the selected areas form the sample.
6. Multistage Sampling. Here, we use a combination of several random sampling techniques to get the sample from a very large population. This is done by dividing the whole population by area, and then each area into strata. Thereafter, from each stratum, we get the sample by using the simple random sampling technique.
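The sample-size formula and the systematic sampling steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's procedure: the function names are hypothetical, Python's `random` module stands in for the lottery box or the table of random numbers, and the sampling interval k is taken as the whole-number quotient N // n. The formula n = N/(1 + Ne²) is commonly known as Slovin's formula.

```python
import random


def slovin_sample_size(N: int, e: float) -> int:
    """Sample size n = N / (1 + N * e**2), rounded off to a whole number."""
    return round(N / (1 + N * e ** 2))


def systematic_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Systematic sampling of elements numbered 1..N.

    The interval is k = N // n; a random start between 1 and k plays the
    role of the lottery draw, and every k-th element after it is taken.
    """
    k = N // n
    start = rng.randint(1, k)  # randint includes both endpoints
    return [start + i * k for i in range(n)]


def simple_random_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Lottery-style draw: n distinct element numbers from 1..N, no repeats."""
    return sorted(rng.sample(range(1, N + 1), n))


rng = random.Random(2005)  # seeded only so that repeated runs agree

print(slovin_sample_size(1_000_000, 0.10))  # 100
print(systematic_sample(1000, 50, rng)[:5])  # first five of the 50 elements
print(simple_random_sample(30, 5, rng))      # 5 of the 30 students
```

Seeding the generator makes a run reproducible; with an unseeded `random.Random()`, each run draws a different sample, which is closer to an actual lottery.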
2.4 Non-Probability Sampling
There are some sampling techniques which are biased and therefore not reliable, such as samples drawn by researchers based on their own judgment. These methods are classified as non-probability sampling techniques. Some of the commonly noted non-probability sampling techniques that we should refrain from using are:
1. Convenience Sampling. This is used because it is convenient to the researcher. For example, a researcher may find out which detergent is the most popular among households by making phone calls to the numbers found in the telephone directory. While the data may easily be obtained, they may not be reliable, since not all households have telephone connections.
2. Quota Sampling. In this method, the researcher uses the proportions of different strata, and selections from the strata are made using quotas.
3. Purposive Sampling. The researcher gets his sample from respondents purposely related or close to him.

2.5 Presentation of Data
In order to make inferences about events, the researcher must organize data in a meaningful way that will enable him to handle and interpret the data easily. There are two ways of organizing data: as ungrouped data and as grouped data. Ungrouped data are those that are not organized, and may or may not be arranged according to their magnitudes. Grouped data are organized and arranged into class intervals and presented in frequency distribution form. A frequency distribution table shows the scores, or groups of scores, and their corresponding frequencies.

2.5.1 Array Form Method
This method presents data in array form, such as the scores of 40 students in a 20-item quiz in Statistics. While it is difficult to derive conclusions or findings by simply arranging these scores from the lowest to the highest, we can nevertheless determine
some important characteristics of the data, such as the highest and the lowest scores, the middle score or median, and the percentage of students above the mean.

2.5.2 Textual Form Method
The textual method, also called the paragraph method, is used to present purely qualitative data, or data with very few numerical values. This method is desirable and effective when data are presented in paragraph form using narrow columns like those in a newspaper.

2.5.3 Stem and Leaf Presentation
Using the above data, we can also present the scores by the stem and leaf method, in which each score is split into a stem (its leading digit or digits) and a leaf (its last digit), and all the leaves that share a stem are listed in one row. The stem and leaf method orders the data in a concise way.

2.5.4 Tabular Form Method
Statistical tables are effective devices for presenting both qualitative and quantitative data. Tables can be used conveniently to make comparisons and to draw relationships between and among variables. A statistical table shall show the following components: table heading (table number and title), body (contains the quantitative data), stubs (labels that classify values of a variable), box heads (captions above the columns), footnote, and source note.

2.5.5 Frequency Distribution Table
Data in frequency distribution form can be easily analyzed and described. While the identity of an individual score is lost due to grouping, the organized data can be easily handled. For the 40 quiz scores above, the groups of scores 1-3, 4-6, ..., 19-21 are called class intervals or classes. The numbers 1, 4, 7, ..., 19 are the lower limits; 3, 6, 9, ..., 21 are the upper limits; 4, 8, 11, ..., 1 are the frequencies; and 40 is the sum of the frequencies, or the total frequency.

Example 1: A sports editor takes the records of the 25 participants in the PBA annual sportsfest for a 12 ft.
perimeter shooting. Construct a frequency distribution for the data.

Solution:
Step 1: Find the range. The range, R, is defined as

$$R = \text{highest score} - \text{lowest score} = HS - LS = 11 - 4 = 7$$

Since the range is small, single-value data may be used, and they are 4, 5, 6, 7, 8, 9, 10 and 11.
Step 2: Prepare a table with columns for the score, the tally, and the frequency.
Step 3: Make a tally of the data.
Step 4: Complete the frequency column. The total frequency is 25.

For a set of data that is continuous, we construct the boundaries of each class by subtracting 0.5 from and adding 0.5 to each class value; for example, the boundaries of the class value 10 are 9.5-10.5. The basic rule for constructing the column of class boundaries is that the boundaries have one more decimal place than the class values and end in 0.5.

Cumulative frequencies are used to show how many values are accumulated up to a specific class. In the less than cumulative frequency, we accumulate the frequencies from the smallest class interval to the largest class interval, while in the greater than cumulative frequency, we do the reverse.

The class width or class size is found by subtracting any two consecutive lower limits or upper limits. The midpoint or class mark is the numerical location of the center of the class and is computed as follows:

$$\text{Midpoint or class mark } (X) = \frac{\text{lower limit} + \text{upper limit}}{2}$$

The percentage of relative frequency is the percentage composition of each class relative to the total number of frequencies (n). The percentages of relative frequency should total 100%.

$$\%RF = \frac{f}{n} \times 100$$

To construct a frequency distribution table, the following rules shall be followed:
1. The number of classes should be from 5 to 20. The general rule is that the number of classes is $K = 1 + 3.3 \log n$, where n is the total number of elements of the sample.
Example: If n = 100, then

$$K = 1 + 3.3 \log 100 = 1 + 3.3(2) = 7.6 \approx 8 \text{ (rounded off to a whole number)}$$

Please note that while there is a general rule as mentioned above, the determination of the number of classes is subjective. It depends upon the researcher, who decides the number of classes based on the needs, requirements and other factors affecting the results of his study. Few samples mean few classes; e.g., 20 scores may require only 4-5 classes, while 500 scores may need about 9-10 classes.
2. There should be no overlapping of elements in the class intervals. For example, classes written as 5 - 10 and 10 - 15 are wrong, because the same element (10) would belong to two different classes.
3. Show or include all classes. A class interval with no frequency that is located between the first and last class intervals should be included. The exception occurs when the class interval with zero frequency is the first or the last; then it should be omitted.
4. There should be enough classes to accommodate all the data.
5. The classes must be equal in width. This is the general rule, except when the classes are open-ended, such as the classes shown below:
75 and below
76-80
81-85
86-90
91 and above
These groupings are made from the researcher's experience that nobody gets more than 5 points below 75 or more than 5 points above 91; if there is any, the case is exceptionally rare.

Example 2: The IQs of 50 police applicants who passed the 1st screening test were recorded. Construct a frequency distribution table with 7 classes.

Solution:
Step 1: Find the range.

$$R = HS - LS = 134 - 100 = 34$$

Step 2: Select or determine the number of classes desired (usually between 5 and 20), if necessary. In the problem above, the requirement is 7 classes.
Step 3: Find the class width or class size by dividing the range by the number of classes.

$$c = \frac{R}{\text{no. of classes}} = \frac{34}{7} = 4.86 \approx 5$$

The class width has to be rounded up to a whole number. Any decimal number with a remainder, such as 6.6, has to be rounded up to 7.
Step 4: Select a starting point. It must be equal to or lower than the smallest value. It is more convenient if the starting point is a multiple of the class width, such as 105, 110, 115 and 120 (all multiples of 5).
Step 5: Construct the classes or class intervals. The first class interval should contain 5 consecutive numbers starting from 100. Hence, the class intervals are:
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
Step 6: Determine the class boundaries by adding 0.5 to each of the upper class limits and subtracting 0.5 from each of the lower class limits. The class boundaries are:
99.5 - 104.5
104.5 - 109.5
109.5 - 114.5
114.5 - 119.5
119.5 - 124.5
124.5 - 129.5
129.5 - 134.5
Step 7: Tally the data; write the numerical equivalent of each tally in the frequency column; determine the less than and greater than cumulative frequencies; and calculate the percentages of relative frequency.

2.6 Graphs of Frequency Distribution
2.6.1 The Frequency Polygon
One of the most widely used graphs of a frequency distribution is the frequency polygon. Here, we use a line to connect any two consecutive points. In constructing a frequency polygon, we follow the procedures below.
1. Frequencies are placed on the vertical axis (y-axis).
2. The scores or midpoints of classes are placed on the horizontal axis (x-axis).
3. The ratio of the x-axis to the y-axis should be 3:2 or 4:3. If one is using graphing paper with 10 small squares to the inch, using 120 of the small spaces on the x-axis would require using 80 small spaces on the y-axis.
These ratios will lead to conformity with respect to the right shape of the graph.
4. Plot the points and connect consecutive points with straight lines. The curve is anchored at both ends to zero frequency, so that the figure formed is a polygon (closed broken lines).
5. If the first observation is too far from zero (0), a "break line" shall be used to indicate that no value exists between 0 and the first observation.

Using the frequency distribution table of the 100-point test in Statistics, let us now prepare a frequency polygon.

[Figure: Frequency polygon of the Statistics test results; x-axis: Midpoints (32, 37, ..., 92); y-axis: Frequency]

One advantage of the frequency polygon is that several frequency distributions can be plotted on the same axes for comparison, provided the dimensions are the same and the data on the axes vary only a little. When this is done, using pencils of different colors is advised.

2.6.2 Histogram
The histogram is a graph that uses bars of various heights to represent the frequencies. Histograms may be prepared for multiple responses or even for overlapping categorical data. The bars may be drawn vertically or horizontally. The steps in constructing a histogram are as follows:
Step 1: Draw and label the x and y axes. The x-axis is the horizontal axis and the y-axis is the vertical axis.
Step 2: Represent the frequencies as heights on the y-axis and the class boundaries on the x-axis.
Step 3: Using the frequencies as the heights, draw vertical bars for each class.

Figure 2. Histogram for Statistics Test Results (x-axis: Class Boundaries)

2.6.3 Pareto Chart
A Pareto chart is a bar graph of a frequency distribution used for a categorical variable. Like the histogram, the Pareto chart uses bars for frequencies.
Rules for constructing a Pareto chart:
1. Make the bars the same in width.
2.
Arrange the data from the largest to the smallest according to their frequencies.
3. Make the units for frequencies equal in size.

The following table shows the forms of suicide gathered by the police in a particular year in a metropolitan city.

Forms of Suicide
rope hanging
shooting oneself with a gun
jumping from a tall building
jumping into a deep body of water

Figure 3. Pareto Chart Showing the Forms of Suicide (x-axis: forms of suicide; y-axis: frequency)

2.6.4 Less Than Ogive (<Ogive) Graphs

The ogive is a line graph formed by plotting the cumulative frequencies against the class boundaries and connecting all consecutive points with straight lines. This graph is used to show a trend of values or of an activity over a period of time. For example, the cumulative ages of persons by age ranges in a frequency distribution can be presented using an ogive; a salesman's monthly cumulative sales for a period of one year can be shown through a less than ogive; and the monthly decrease in the budget for a period of 6 months can be shown by a greater than ogive.

The steps and procedures in preparing an ogive are as follows:
1. Represent the cumulative frequencies on the y-axis and the class boundaries on the x-axis.
2. For the less than ogive, plot the points of intersection starting from the lowest lower boundary against the lowest cumulative frequency. Continue plotting the points using the next upper boundary and the next higher cumulative frequency until all points of intersection are plotted.
3. For the greater than ogive, plot the points of intersection starting from the highest upper boundary and the highest frequency. Continue plotting the points using the next lower boundary and the next lower frequency until all points of intersection are plotted.

Example: Using Table A below, prepare the graphs for the < ogive and the > ogive.
Table A.
IQ Scores of 50 Police Applicants

Figure 4. < Ogive and > Ogive Graphs Showing the IQ Scores of 50 Police Applicants

... = 330. Therefore, the score on the fifth test should be 415 - 330 = 85.

3.2.2 Weighted Mean

In some cases, certain values are given more importance than others. The mean derived in this case is known as the weighted arithmetic mean. The formula used in the computation of the weighted arithmetic mean is:

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where:
$x$ = each of the item values
$w$ = the weight of each item value

Example 5: Suppose we are interested in computing the weighted mean of a BS Math student in a certain university where he is enrolled in 6 subjects having different unit loads.

Solution: $\bar{x}_w = 2.29$

Example 6: 8,000 Algebra books were sold at ₱320.00 each, 1,500 Business Mathematics books at ₱380.00 each, 1,000 Mathematics of Investment books at ₱300.00 each and 3,500 Statistics books at ₱340.00 each. Find the weighted arithmetic mean of the selling price of the four books.

Solution:

$$\bar{x}_w = \frac{(8{,}000 \times 320) + (1{,}500 \times 380) + (1{,}000 \times 300) + (3{,}500 \times 340)}{14{,}000} = \frac{2{,}560{,}000 + 570{,}000 + 300{,}000 + 1{,}190{,}000}{14{,}000} = \frac{4{,}620{,}000}{14{,}000} = 330.00$$

The weighted mean price is ₱330.00.

3.2.3 Mean for Grouped Data

Data which are arranged in a frequency distribution are called grouped data. It is best to compute the measures of central tendency for grouped data using the frequency distribution, especially when the number of items is very large. There are two methods of computing the mean of grouped data:
1. Midpoint Method
2. Unit Deviation Method

In the midpoint method, the midpoint or class mark of each class interval is multiplied by its corresponding frequency. The sum of the products is then divided by the total number of frequencies. The formula for the mean of grouped data is:

$$\bar{x} = \frac{\sum fx}{n}$$
where:
$f$ = frequency of each class
$x$ = midpoint or class mark
$n$ = total number of frequencies or sample size

Example 7: The results of the I.Q. test of a group of Psychology students in a certain college are presented in the frequency distribution shown in Table 3.1 below. Compute the value of the mean using the midpoint method.

Solutions: To calculate the mean, let us follow the steps and procedures enumerated below.

Step 1: Get the midpoint or class mark ($x$) of each class interval.
Step 2: Multiply each midpoint by the frequency of its class interval. Represent the product by $fx$.
Step 3: Find the sum of $fx$.

Table 3.1. I.Q. Test Results of Psychology Students

Class Interval    f      x       fx
70-75             2     72.5     145.0
76-81             7     78.5     549.5
82-87            20     84.5   1,690.0
88-93            21     90.5   1,900.5
94-99            39     96.5   3,763.5
100-105          27    102.5   2,767.5
106-111          14    108.5   1,519.0
112-117          10    114.5   1,145.0
118-123           5    120.5     602.5
              n = 145        Σfx = 14,082.50

Step 4: Compute the value of the mean using the formula. Round off the result to the nearest hundredth.

$$\bar{x} = \frac{\sum fx}{n} = \frac{14{,}082.50}{145} = 97.12$$

Example 8: The frequency distribution of the test results of 100 BS Math students in Statistical Analysis is as follows:

Table 3.2. Test Results of 100 Students in Statistical Analysis

Class Interval    f      x       fx
15-24             5     19.5      97.5
25-34            10     29.5     295.0
35-44            11     39.5     434.5
45-54            23     49.5   1,138.5
55-64            26     59.5   1,547.0
65-74            14     69.5     973.0
75-84             8     79.5     636.0
85-94             3     89.5     268.5
              n = 100        Σfx = 5,390.0

Find the value of the mean.

$$\bar{x} = \frac{5{,}390}{100} = 53.90$$

The second method of computing the mean for grouped data is the unit deviation method. The formula is:

$$\bar{x} = x_o + c\left(\frac{\sum fd}{n}\right)$$

where:
$x_o$ = assumed mean (the midpoint of the class interval having the highest frequency)
$f$ = frequency of each class
$d$ = unit deviation, the number of classes a class interval lies above (+) or below (-) the class containing the assumed mean
$c$ = class size, or the number of values in each class interval
$n$ = total frequency or sample size
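The two methods can be checked with a short Python sketch. It is a minimal illustration under stated assumptions, not the book's own procedure: the function names are hypothetical, the classes are assumed to have equal widths, and the assumed mean is taken from the class with the highest frequency (the first one if there is a tie), as the text does. It uses the Table 3.1 frequencies from Example 7, and it reuses the weighted mean of 3.2.2, since the midpoint method is simply a weighted mean of the class marks with the frequencies as weights.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def grouped_mean_midpoint(midpoints, freqs):
    """Midpoint method: sum(f * x) / n, i.e. a weighted mean of the class
    marks with the class frequencies as weights."""
    return weighted_mean(midpoints, freqs)


def grouped_mean_unit_deviation(midpoints, freqs, class_size):
    """Unit deviation method: x_o + c * sum(f * d) / n.

    Assumes equal class widths. x_o is the midpoint of the class with the
    highest frequency (the first such class if there is a tie), and d is how
    many classes away from that class each interval lies.
    """
    n = sum(freqs)
    k = freqs.index(max(freqs))  # position of the assumed-mean class
    x_o = midpoints[k]
    sum_fd = sum(f * (i - k) for i, f in enumerate(freqs))
    return x_o + class_size * sum_fd / n


# Table 3.1: I.Q. test results of 145 Psychology students (class size c = 6)
lower_limits = [70, 76, 82, 88, 94, 100, 106, 112, 118]
freqs = [2, 7, 20, 21, 39, 27, 14, 10, 5]
midpoints = [lo + 2.5 for lo in lower_limits]  # e.g. (70 + 75) / 2 = 72.5

print(round(grouped_mean_midpoint(midpoints, freqs), 2))  # 97.12
print(round(grouped_mean_unit_deviation(midpoints, freqs, 6), 2))  # 97.12
```

With equal class widths, every midpoint satisfies $x - x_o = c \cdot d$, so the unit deviation formula is algebraically the same as the midpoint formula. That is why both lines print the same value, and why any class could in principle serve as the assumed mean.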
Example 9: Find the mean of the data in Table 3.1 using the unit deviation method.

Solutions:
Step 1: Identify the assumed mean. The assumed mean is the midpoint of the class interval 94-99, which has the highest frequency. Therefore, $x_o = 96.5$.
Step 2: Find the class size.
c = (99 - 94) + 1 = 6
Step 3: Construct the unit deviation column ($d$).
Step 4: Multiply the data under column $f$ by the data under column $d$ for each class interval and find the sum of $fd$.

Table 3.1. I.Q. Test Results of Psychology Students

Class Interval    f     d     fd
70-75             2    -4     -8
76-81             7    -3    -21
82-87            20    -2    -40
88-93            21    -1    -21
94-99            39     0      0
100-105          27    +1    +27
106-111          14    +2    +28
112-117          10    +3    +30
118-123           5    +4    +20
              n = 145     Σfd = 15

Step 5: Compute the mean using the unit deviation method, as follows:

$$\bar{x} = x_o + c\left(\frac{\sum fd}{n}\right) = 96.5 + 6\left(\frac{15}{145}\right) = 96.5 + 0.62 = 97.12$$

Example 10: Find the mean of the test results of 100 students in Statistical Analysis (Table 3.2) using the unit deviation method.

Table 3.2. Test Results of 100 Students in Statistical Analysis

Class Interval    f      x      d     fd
15-24             5     19.5   -4    -20
25-34            10     29.5   -3    -30
35-44            11     39.5   -2    -22
45-54            23     49.5   -1    -23
55-64            26     59.5    0      0
65-74            14     69.5   +1    +14
75-84             8     79.5   +2    +16
85-94             3     89.5   +3     +9
              n = 100            Σfd = -56

Solution:

$$\bar{x} = 59.5 + 10\left(\frac{-56}{100}\right) = 59.5 - 5.6 = 53.90$$

Hence, the two methods for solving the mean of grouped data yield the same value.

3.2.4 Calculating the Weighted Arithmetic Mean of Grouped Data

The formula for calculating the weighted arithmetic mean of grouped data is:

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

where:
$w$ = weight of $x$
$wx$ = product of the weight and the value of $x$

Example 11: An achievement test in College Algebra was administered to BS Math freshmen in 3 colleges. Each of the college heads computed his own mean using a class size or interval size of 3, as shown below:

College A (n = 27)
$\bar{x}$ = 28.67 (mean of students in College A)

College B (n = 35)

Class Interval    f     d     fd
12-14             1    -5     -5
15-17             2    -4     -8
18-20             2    -3     -6
21-23             4    -2     -8
24-26             3    -1     -3
27-29             8     0      0
30-32             6    +1     +6
33-35             5    +2    +10
36-38             2    +3     +6
39-41             1    +4     +4
42-44             1    +5     +5
              n = 35      Σfd = 1

$$\bar{x} = 28 + 3\left(\frac{1}{35}\right) = 28.09$$ (mean of students in College B)

College C (n = 35)
$\bar{x}$ = 30.40 (mean of students in College C)

Using the formula for the weighted mean for the 3 colleges, we have:

$$\bar{x}_w = \frac{27(28.67) + 35(28.09) + 35(30.40)}{27 + 35 + 35} = \frac{774.09 + 983.15 + 1{,}064.00}{97} = \frac{2{,}821.24}{97} = 29.08$$

(mean of all students for the 3 colleges)

The weighted mean of the 3 colleges can also be solved by following the procedures below.
1. Find the highest and lowest scores of all the students in the 3 colleges: HS = 44, LS = 12.
2. Prepare class intervals using a class size or class width of 3, considering all scores of the students in the 3 colleges.
3. Write the frequencies of the scores of each college in the class intervals of step 2.
4. Find the total frequencies for each of the 3 colleges.
5. Compute the mean using the unit deviation or the midpoint method.

Using the midpoint method, we obtain:

$$\bar{x} = \frac{\sum fx}{n} = \frac{2{,}821}{97} = 29.08$$

Using the unit deviation method, we have:

$$\bar{x} = x_o + c\left(\frac{\sum fd}{n}\right) = 28 + 3\left(\frac{35}{97}\right) = 29.08$$

Hence, the above procedures also yield the same mean of 29.08 for the 3 colleges.

The median is the centermost observation that divides the data, arranged in either ascending or descending order, into halves. Half of the observations belong to the upper 50% of the group, while the other half (the remaining 50%) belong to the lower 50% of the group. It is denoted by $\tilde{x}$ (read "x curl").

3.3.1 Median for Ungrouped Data

To find the value of the median, we arrange the observations in ascending or descending order. The observation found in the middle is the median.

Example 1: Find the median of the heights (in cm) of the 15 basketball players listed in Example 3 of 3.1.1, as shown below.

Solution: Arrange the heights of the 15 players from the shortest to the tallest and identify the height of the middle player.

181, 183, 185, 187, 187, 188, 188, 189, 190, 191, 194, 194, 201, 202, 205

Since there are 15 players in the team, the eighth (8th) observation is the median.
$\tilde{x}$ = 189

In general, if $n$ is odd, the median is the observation in position $\frac{n+1}{2}$:

$$\tilde{x} = x_{\frac{n+1}{2}} \quad \text{if } n \text{ is odd}$$

Note that $\frac{n+1}{2}$ gives the position of the median counted from the lowest score, not the median itself.

If $n$ is even, the median is the average of the $\frac{n}{2}$th and the $\frac{n+2}{2}$th observations, that is, the average of the two middle values:

$$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n+2}{2}}}{2} \quad \text{if } n \text{ is even}$$

Example 2: Find the median of the following values: 36, 40, 12, 16, 23, 25, 50, 38, 45, 17

Solution: Arranging the data from the lowest to the highest value, we have: 12, 16, 17, 23, 25, 36, 38, 40, 45, 50.

Since there are 10 values, the median is the average of the $\frac{10}{2}$ = 5th observation, which is 25, and the $\frac{10+2}{2}$ = 6th observation, which is 36. Hence, the median is:

$$\tilde{x} = \frac{5\text{th score} + 6\text{th score}}{2} = \frac{25 + 36}{2} = 30.5$$

Please note that the median 30.5 is not found in the given set of observations. In a set of observations, the mean and the median may or may not have the same value.

Example 3: Determine the median value for the following sets of data.
a. A = 23, 17, 41, 12, 14, 40 and 36
b. B = 21, 23, 25, 19, 18, 30, 22, 26, 27 and 20

Solution: Arrange the values from the lowest to the highest, as shown below.

a. A = 12, 14, 17, 23, 36, 40 and 41
Since n = 7 (an odd number), the formula to be used is:

$$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{7+1}{2}} = x_4 = 23 \text{ (the fourth score)}$$

b. B = 18, 19, 20, 21, 22, 23, 25, 26, 27 and 30
There are 10 values, and 10 is an even number. Hence, the median is the average of the 5th value (22) and the 6th value (23):

$$\tilde{x} = \frac{22 + 23}{2} = 22.5$$

3.3.2 Median for Grouped Data

To compute the median of grouped data, we have to determine the value which divides the distribution into two equal parts.
Thus, we have to calculate the less than cumulative frequency of the distribution. The median of grouped data is computed using the formula:

$$\tilde{x} = X_{lb} + c\left(\frac{\frac{n}{2} - cfb}{f_m}\right)$$

where:
$X_{lb}$ = lower class boundary of the median class
$c$ = class size or interval size
$cfb$ = less than cumulative frequency before the median class
$n$ = number of observations or total number of frequencies
$f_m$ = frequency of the median class

Example 4: Find the median of the data in Table 3.1.

Table 3.1. I.Q. Test Results of Psychology Students

Class Interval    f    <cf
70-75             2      2
76-81             7      9
82-87            20     29
88-93            21     50
94-99            39     89
100-105          27    116
106-111          14    130
112-117          10    140
118-123           5    145
              n = 145

Solutions: Let us follow the procedures enumerated below.
1. Get the median class. The median class is the class interval where the $\frac{n}{2}$th item is contained.
$\frac{n}{2} = \frac{145}{2} = 72.5$
Therefore, the median class is 94-99.
2. Determine the value of the less than cumulative frequency before the median class: $cfb$ = 50.
3. Determine the lower boundary of the median class, the frequency of the median class, and the size of the class interval. These are: $X_{lb}$ = 93.5, $f_m$ = 39, $c$ = 6.
4. Substitute the values in the formula, as shown below.

$$\tilde{x} = 93.5 + 6\left(\frac{72.5 - 50}{39}\right) = 93.5 + 3.46 = 96.96$$

Example 5: Find the median of the test results of 100 students in Statistical Analysis contained in Table 3.2, as shown below.

Table 3.2. Test Results of 100 Students in Statistical Analysis

Class Interval    f    <cf
15-24             5      5
25-34            10     15
35-44            11     26
45-54            23     49
55-64            26     75
65-74            14     89
75-84             8     97
85-94             3    100
              n = 100

Solutions:
1. $\frac{n}{2} = \frac{100}{2} = 50$
2. Median class = 55-64
3. $X_{lb}$ = 54.5
4. $c$ = 10
5. $cfb$ = 49
6. $f_m$ = 26

Hence, substituting the values in the formula, we have:

$$\tilde{x} = 54.5 + 10\left(\frac{50 - 49}{26}\right) = 54.5 + 0.38 = 54.88$$

Another type of average is known as the mode. The mode is generally associated with nominal data and is denoted by $\hat{x}$ (read "x hat"). Suppose a department store sold 100 blouses in one day with the sales broken down by size: Extra Small, Small, Medium, Large, Extra Large and Double Extra Large. Since the medium size sold the most blouses, it has the highest frequency.
Hence, the mode is the medium-sized blouse. The mode ($\hat{x}$) is the observation or value which appears the most number of times in a set of values.

3.4.1 Mode for Ungrouped Data

The mode is the simplest measure of central tendency. It can be easily identified by inspecting an ungrouped set of data for the score or item which occurs most frequently.

A set of scores or data with only one mode is called unimodal, while a set with two modes is bimodal; with three, trimodal; and with many (more than two) modes, multimodal. In some instances, the mode may not exist at all.

Example 1: Find the mode of each of the sets of observations shown below.
a. A: 2, 3, 5, 7, 8, 9, 11, 15
b. B: 15, 13, 13, 14, 17, 17, 18, 12
c. C: 2, 3, 5, 8, 8, 9, 10, 8
d. D: 3, 5, 7, 7, 8, 5, 5, 8, 8, 9, 10, 9, 9

Solutions:
a. In set A, no value occurs more than once. Therefore, the mode does not exist.
b. In set B, the values 13 and 17 each appear twice. Then, the modes are $\hat{x}$ = 13 and 17 (bimodal).
c. In set C, 8 occurs three times. Therefore, $\hat{x}$ = 8 (unimodal).
d. In set D, 5, 8 and 9 each appear three times. Therefore, $\hat{x}$ = 5, 8 and 9 (trimodal).

3.4.2 Mode for Grouped Data

The mode in a frequency distribution is the value within the class interval having the highest frequency. The mode for grouped data can be calculated using the formula:

$$\hat{x} = X_{lb} + c\left(\frac{f_{m_1}}{f_{m_1} + f_{m_2}}\right)$$

where:
$X_{lb}$ = lower boundary of the modal class
$c$ = class size/width or interval size
$f_{m_1}$ = difference between the frequency of the modal class and the frequency of the class interval preceding it
$f_{m_2}$ = difference between the frequency of the modal class and the frequency of the class interval following it

Example 2: Compute the mode of the frequency distribution in Table 3.1 (I.Q. Test Results of Psychology Students).

Solutions: To find the mode, we follow the steps below.
1. Find the modal class.
The modal class is the class interval with the highest frequency. Therefore, 94-99 is the modal class.
2. Determine the class size: $c$ = 6.
3. Get the values of $f_{m_1}$ and $f_{m_2}$:
$f_{m_1}$ = 39 - 21 = 18
$f_{m_2}$ = 39 - 27 = 12
4. Substitute the values in the formula, as follows:

$$\hat{x} = 93.5 + 6\left(\frac{18}{18 + 12}\right) = 93.5 + 3.6 = 97.1$$

Example 3: Find the mode of the frequency distribution shown in Table 3.2.

Table 3.2. Test Results of 100 Students in Statistical Analysis

Class Interval    f
15-24             5
25-34            10
35-44            11
45-54            23
55-64            26
65-74            14
75-84             8
85-94             3
              n = 100

Solutions:
1. Modal class: 55-64
2. $X_{lb}$ = 54.5
3. $c$ = 10
4. $f_{m_1}$ = 26 - 23 = 3; $f_{m_2}$ = 26 - 14 = 12
5. Substituting in the formula:

$$\hat{x} = 54.5 + 10\left(\frac{3}{3 + 12}\right) = 54.5 + 2 = 56.5$$

3.4.3 Properties of the Mean, the Median and the Mode

Mean
1. The mean is always a unique value in any set of data.
2. The mean is associated with interval/ratio data.
3. The mean is strongly influenced by extreme values in a set of data.
4. The mean is the most reliable measure of central tendency.

Median
1. Like the mean, the median is also a unique value in any set of data.
2. The median is associated with ordinal data.
3. The median is not affected by extreme values.
4. The median depends only on the middle value (when n is odd) or the two middle values (when n is even) of the data arranged from the highest to the lowest value or vice versa.
5. The median is a positional measure.

Mode
1. The mode is not affected by extreme values.
2. It may not exist.
3. If the mode exists, it may not always be unique.
4. In finding the mode, we do not consider all the values in the distribution.
5. The mode is associated with nominal data.

3.4.4 Graphical Relationship of the Mean, the Median and the Mode

1. If the mean is greater than the median and the median is greater than the mode, the data are skewed to the right or positively skewed.
Figure 3.1. Positively Skewed Distribution (y: frequencies; x: values)
2. If the mean is less than the median and the median is less than the mode, the data are skewed to the left or negatively skewed.
Figure 3.2. Negatively Skewed Distribution (y: frequencies; x: values)
3. If the mean, the median and the mode are equal, the distribution is symmetric, as in the normal distribution.
Figure 3.3. Symmetric Distribution (y: frequencies; x: values)

In addition to the mean, the median and the mode, there are other ways of describing a given set of numerical data, known as quantiles. Quantiles are an extension of the median concept in that they are values which divide a set of data into equal parts. While the median divides the distribution into two equal parts, the quantiles divide the data into, for example, four, ten or one hundred equal parts.

3.5.1 Quartile

A quartile divides a set of observations into four equal parts. The upper quartile, known as the third quartile ($Q_3$), is the value of the variable below which 75% of the cases lie. The third quartile also corresponds to the 75th percentile point because it surpasses 75% of the cases. Hence, $Q_3 = P_{75}$. The lower quartile, known as the first quartile ($Q_1$), is the value of the variable below which 25% of the cases lie. It also corresponds to the 25th percentile point. Hence, $Q_1 = P_{25}$. Likewise, the median is equal to the second quartile ($Q_2$), the 50th percentile point.

3.5.2 Computing Quartiles of a Frequency Distribution

Computing the values of the first and third quartiles is similar to computing the value of the median, except that n is multiplied by k (where k = 1, 2, 3) and divided by four instead of two. The formula for computing the quartile is as follows:

$$Q_k = X_{lb} + c\left(\frac{\frac{kn}{4} - cfb}{f_{Q_k}}\right)$$

where:
$X_{lb}$ = lower boundary of the kth quartile class
$cfb$ = cumulative frequency before the kth quartile class
$f_{Q_k}$ = frequency of the kth quartile class
$c$ = class size

The formulas of the quartiles when k = 1, 2 and 3 are as follows:

When k = 1:
$$Q_1 = X_{lb} + c\left(\frac{\frac{n}{4} - cfb}{f_{Q_1}}\right)$$

When k = 2:
$$Q_2 = X_{lb} + c\left(\frac{\frac{2n}{4} - cfb}{f_{Q_2}}\right)$$

When k = 3:
$$Q_3 = X_{lb} + c\left(\frac{\frac{3n}{4} - cfb}{f_{Q_3}}\right)$$

Example 1: Find $Q_1$, $Q_2$ and $Q_3$ of the I.Q. test results of 145 Psychology students in a certain college.

Table 3.1. I.Q. Test Results of Psychology Students

Class Interval    f    <cf
70-75             2      2
76-81             7      9
82-87            20     29
88-93            21     50
94-99            39     89
100-105          27    116
106-111          14    130
112-117          10    140
118-123           5    145
              n = 145

Solutions:
A. To find $Q_1$, let us follow the steps below.
1. Get $\frac{1}{4}$ of the total number of frequencies: $\frac{n}{4} = \frac{145}{4}$ = 36.25th value.
2. Find the first quartile class. The 36.25th value falls in the class whose less than cumulative frequency is 50, that is, 88-93.

... Quartile   c. Decile   d. Median

10. In a class of 48 students, the passing mark for a statistics test is the 3rd quartile. What does this imply?
a. Thirty-six students did not pass the test.
b. Twenty-five percent of the students passed the test.
c. Twenty-five students passed the test.
d. Two of the above are true.
e. All of the above are true.

For question nos. 11 to 13, consider the score distribution of 15 students given below.
83, 72, 87, 79, 75, 82, 76, 77, 73, 86, 81, 79, 82, 79, 74

11. The median score is ____.
a. 75   b. 77   c. 79   d. 74
12. The mean score is ____.
a. 75   b. 77   c. 79   d. 74
13. The mode is ____.
a. 75   b. 77   c. 79   d. 74
14. In a class of 100 students, Roy obtained a score of 60, which is the lowest quartile. This implies that his score is ____.
a. below the 3rd quartile   c. below the 3rd decile
b. above the 60th percentile   d. above the 1st quartile
15. The mean in question no. 14 can also be interpreted so that ____.
a. More than 20 students scored above 60.
b. Only 65 students scored above 60.
c. 87 students scored below 60.
d. 45 students scored below 60.

For question nos. 16 to 19, please refer to Table A below.
Table A

16. In solving for the 60th percentile, the lower boundary to be used is ____.
a. 34   b. 39.5   c. 34.5   d. 39
17. What value of the cumulative frequency is to be used in solving for the 35th percentile?
a. 4   b. 7   c. 12   d. 18
18. The 45th percentile is ____.
a. 33.45   b. 35.60   c.
32.87   d. 31.33
19. The 50th percentile is ____.
a. 36.0   b. 36.5   c. 37.0   d. 37.5
20. The 50th percentile is equivalent to ____.
a. the 5th decile   c. the mean score   e. all of the above
b. the 2nd quartile   d. two of the above

Introduction and Information
The Range
The Interquartile Range
The Semi-Interquartile Range or Quartile Deviation
The Mean Deviation or Average Deviation
The Variance
The Standard Deviation
Exercises

Introduction and Information

In the preceding chapter, we discussed the computation, comparison and uses of the several types of averages of a set of numerical data: the mean, the median and the mode. These averages represent the typical or representative values that tend to be centrally located in a set of data arranged by magnitude in increasing or decreasing order. Some of the more commonly used positional measures that divide a distribution of data into equal parts, known generally as quantiles, were also discussed. These quantiles are the quartile, the decile and the percentile.

The measures of central location and the positions of division of a set of numerical data, however, are not sufficient to give an adequate description of the data. We also have to know how the numerical data tend to spread or scatter about the average value. Two sets of data having the same mean or median may differ considerably in the spread of their data from the average. For instance, 2 sets of data, A and B, have the same mean of 50. Data A has a lowest score of 5 and a highest score of 95, while data B has 20 and 80 for its lowest and highest scores, respectively. While the 2 sets of data have the same mean, they differ considerably in the spread of their data in terms of the range. The range of data A is 95 - 5 or 90, while that of data B is 80 - 20 or 60.
This indicates that the values in data B are less scattered about the mean of 50, and hence more homogeneous than those in data A. If we find the difference of the 2 ranges, 90 for A and 60 for B, we note a big gap of 30. Since the measures of central tendency are not enough to provide complete and useful information, there is a need to support them with other computational measures of description, commonly known as the measures of variation or measures of dispersion.

The measures of variation or dispersion indicate the degree or extent to which numerical values are dispersed or spread out about the average value in a distribution. In this chapter, we will discuss the more popularly used measures of variation. These are the range, the interquartile range, the semi-interquartile range, the mean deviation or average deviation, the variance and the standard deviation.

The range, which is the simplest to compute, is the difference between the largest and the lowest values in a set of numerical data. The range for ungrouped data is obtained by finding the difference between the largest value and the lowest value. For grouped data, the range is determined by subtracting the lower boundary of the lowest class interval from the upper boundary of the highest class interval of a frequency distribution. This is so because the class boundaries are considered the true limits.

For Ungrouped Data
Range (R) = Highest Value (HV) - Lowest Value (LV)

$$R = HV - LV$$

For Grouped Data
Range (R) = Upper Boundary of the Highest Class Interval ($UB_{HCI}$) - Lower Boundary of the Lowest Class Interval ($LB_{LCI}$)

$$R = UB_{HCI} - LB_{LCI}$$

Example 1: The scores obtained by 12 students in a Statistics class are 80, 75, 63, 95, 98, 78, 85, 90, 73, 68, 87 and 81. Find the range.

Solution: The highest score is 98, while the lowest score is 63. Hence,
R = HV - LV
= 98 - 63
R = 35

Example 2: Find the range of a frequency distribution whose highest class interval is 91-95 and lowest class interval is 51-55.

Solution: The upper boundary of the highest class interval 91-95 is 95.5, and the lower boundary of the lowest class interval 51-55 is 50.5. Therefore, the range is obtained as follows:
R = $UB_{HCI} - LB_{LCI}$ = 95.5 - 50.5
R = 45

As you may have observed by now, the range considers only the extreme values (the largest and the lowest values) in a distribution and shows only the difference or distance between these 2 values. It does not consider or tell anything about the other values between these extremes. If there are 100 observations, only 2 are used in calculating the range; the rest are simply ignored. Hence, the range is a poor and unstable measure of variation, particularly for a large number of values. It is the least reliable measure and should be used only when one wants a quick measure of variation. A reliable measure of spread is one which takes into account all the values in a distribution.

We learned in the preceding chapter that the quartiles divide the distribution of numerical data into 4 equal parts. The first or lower quartile lies at 25% of the total number of values, while the third or upper quartile lies at 75%. The interquartile range (IQR) is found by getting the difference between the values of the third or upper quartile ($Q_3$) and the first or lower quartile ($Q_1$). In symbols, we have:

$$IQR = Q_3 - Q_1$$
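A minimal Python sketch of the quartiles behind the IQR is given below, under stated assumptions. For ungrouped data it takes $Q_1$ and $Q_3$ as the medians of the lower and upper halves of the sorted values, which is how Example 1 below splits its 16 values; other quartile conventions, including the interpolation rules built into many software packages, can give slightly different results. For grouped data it applies the quartile formula of the preceding chapter. The function names are hypothetical, and the grouped version assumes equal class widths.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1 and Q3 of ungrouped data as the medians of the lower and upper
    halves of the sorted values (assumes at least two values). For odd n
    the middle value is left out of both halves; this is one common
    convention, and others give slightly different quartiles."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


def grouped_quartile(lower_boundaries, freqs, class_size, k):
    """Q_k = X_lb + c * (k*n/4 - cfb) / f for a frequency distribution
    with equal class widths (k = 1, 2 or 3)."""
    target = k * sum(freqs) / 4  # position of the kth quartile
    cfb = 0  # less than cumulative frequency before the current class
    for x_lb, f in zip(lower_boundaries, freqs):
        if cfb + f >= target:  # this is the kth quartile class
            return x_lb + class_size * (target - cfb) / f
        cfb += f
    raise ValueError("quartile position exceeds the total frequency")


# Daily production data of Example 1 below (ungrouped)
units = [21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, 34]
q1, q3 = quartiles_by_halves(units)
print(q1, q3, q3 - q1)  # 22.5 30.5 8.0

# Table 3.1 (class size 6): Q2 should agree with the grouped median 96.96
boundaries = [69.5 + 6 * i for i in range(9)]
freqs = [2, 7, 20, 21, 39, 27, 14, 10, 5]
print(round(grouped_quartile(boundaries, freqs, 6, 2), 2))  # 96.96
```

Calling `grouped_quartile` with k = 1 and k = 3 gives the two quartiles needed for the IQR of a frequency distribution, as in Example 2 below.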
The semi-interquartile range (SIQR) or quartile deviation (QD) indicates the variation or dispersion of the values covering the middle 50% of the distribution of the data. It is found by getting half of the distance between the third or upper quartile and the first or lower quartile. Thus, we have the equation:

$$SIQR \text{ or } QD = \frac{Q_3 - Q_1}{2}$$

Note that the range, the interquartile range and the semi-interquartile range have the same disadvantage: none of them gives an idea of the density of the observations, and hence each gives only little information on the concentration of the observations about the central values. The semi-interquartile range or quartile deviation is an appropriate measure of variation only if the median is used as the measure of central tendency, especially if the distribution is skewed.

Example 1: A manufacturing company produced the following number of units per day for a given period: 21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26 and 34. Determine the:
a. Range (R)
b. Interquartile range (IQR)
c. Semi-interquartile range (SIQR) or quartile deviation (QD)

Solutions:
a. Range = HV - LV = 34 - 19 = 15

Arrange the data according to magnitude and calculate $Q_1$ and $Q_3$, as shown below.

19 20 21 22 | 23 24 25 26 | 27 28 29 30 | 31 32 33 34

$$Q_1 = \frac{22 + 23}{2} = 22.5 \qquad Q_3 = \frac{30 + 31}{2} = 30.5$$

Note that $Q_1$ marks the point below which the lowest 25% of the data, arranged from lowest to highest, lie, while $Q_3$ marks the point below which the lowest 75% of the data lie. Thus, we apply the formulas as follows:

b. IQR = $Q_3 - Q_1$ = 30.5 - 22.5
IQR = 8

c. SIQR or QD = $\frac{Q_3 - Q_1}{2} = \frac{30.5 - 22.5}{2}$
SIQR or QD = 4

Example 2: Table 1 below shows the average production of 60 employees of a manufacturing company during a given week. Find the:
a. Range
b.
Interquartile range
c. Semi-interquartile range or quartile deviation

Table 1. Average Production of 60 Employees

Solutions:
a. Range (R): R = $UB_{HCI} - LB_{LCI}$ = 35

b. Interquartile Range (IQR). First, we calculate the first and third quartiles. Substituting the values in the formulas we studied in the preceding chapter, with $\frac{n}{4} = \frac{60}{4}$ = 15 and $\frac{3n}{4} = \frac{3(60)}{4}$ = 45, we have:
$Q_1$ = 72
$Q_3$ = 83.68

Therefore,
IQR = $Q_3 - Q_1$ = 83.68 - 72
IQR = 11.68

c. Semi-interquartile Range (SIQR) or Quartile Deviation (QD)
SIQR or QD = $\frac{83.68 - 72}{2}$ = 5.84

A more reliable measure of variation than the range, the interquartile range and the semi-interquartile range or quartile deviation is the mean deviation (MD) or average deviation (AD). The mean deviation or average deviation takes into account the deviations of the individual values from the mean. These deviations can be either negative or positive, and the signed deviations from the mean always sum to zero, so we calculate the mean deviation or average deviation using the absolute values of the deviations. The absolute value of a number is the value of the number regardless of its sign. Hence, the absolute value of -7 or +7 is simply 7. Since it uses the absolute values of the deviations, it is sometimes referred to as the mean absolute deviation. The mean deviation or average deviation is, thus, defined as the average of the absolute deviations of the individual values of a set of numerical data from either the mean, the median or the mode.
Among the 3, however, the mean is the most preferred and commonly used measure of central tendency for computing the mean deviation or average deviation. To determine the mean or average deviation, we use the following formulas:

For Ungrouped Data
$$MD \text{ or } AD = \frac{\sum |x - \bar{x}|}{n}$$

For Grouped Data
$$MD \text{ or } AD = \frac{\sum f|x - \bar{x}|}{n}$$

where:
$x$ = the individual value for ungrouped data, or the midpoint of each class interval for grouped data
$\bar{x}$ = the mean of the data
$n$ = the total number of frequencies
$f$ = the frequency of each class interval

For ungrouped data, the mean deviation or average deviation is determined by following the procedures enumerated below.
1. Arrange the values from the lowest to the highest or vice versa.
2. Compute the value of the mean.
3. Find the absolute value of each deviation from the mean.
4. Find the sum of the absolute values in step 3.
5. Substitute the values in the formula and solve.

... and business concerns if they are influenced directly by deviations from the mean in either direction. For instance, Company A has a mean deviation of ₱3,000 for the salaries of its employees, while Company B has a mean deviation of only ₱1,000 for its employees. A comparison of the mean deviations indicates that the salaries of the employees of Company A are three times as dispersed as the salaries of the employees of Company B. Hence, companies sometimes use this measure for planning exercises, notably when making economic forecasts.

Example 1: The numbers of TV units sold by an appliance store for a 10-day period are 12, 6, 7, 13, 4, 2, 11, 8, 8 and 9. Determine the mean deviation or average deviation.

Solutions: We first arrange the numbers of TV units sold according to magnitude (lowest to highest) and then find the absolute value of each deviation from the mean, as shown below.

x     |x - x̄|
2        6
4        4
6        2
7        1
8        0
8        0
9        1
11       3
12       4
13       5
Σx = 80    Σ|x - x̄| = 26
Mean ($\bar{x}$):

$$\bar{x} = \frac{\sum x}{n} = \frac{80}{10} = 8$$

Mean Deviation (MD) or Average Deviation (AD):

$$MD \text{ or } AD = \frac{\sum |x - \bar{x}|}{n} = \frac{26}{10} = 2.6$$
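The five-step procedure above can be sketched in Python as follows. This is an illustrative sketch with hypothetical function names, not the book's own code; the grouped version assumes that x is the class midpoint, as in the formula for grouped data, and sorting is skipped because the sum of absolute deviations does not depend on the order of the values.

```python
def mean_deviation(values):
    """Mean (average) deviation of ungrouped data: sum of |x - mean| / n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def mean_deviation_grouped(midpoints, freqs):
    """Mean deviation of grouped data: sum of f * |x - mean| / n,
    where x is the class midpoint and f the class frequency."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n


# TV units sold over 10 days (Example 1)
tv_units = [12, 6, 7, 13, 4, 2, 11, 8, 8, 9]
print(round(mean_deviation(tv_units), 2))  # 2.6

# Same data tallied as value/frequency pairs (8 occurs twice)
values = [2, 4, 6, 7, 8, 9, 11, 12, 13]
counts = [1, 1, 1, 1, 2, 1, 1, 1, 1]
print(round(mean_deviation_grouped(values, counts), 2))  # 2.6
```

Treating each distinct value as its own "class" makes the grouped formula reproduce the ungrouped result, which is a quick consistency check on both formulas.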
