Module 3
STATISTICS
What is Statistics?
Statistics is a mathematical science including methods of collecting, organizing and analyzing data in such a way
that meaningful conclusions can be drawn from them. In general, its investigations and analyses fall into two broad
categories called descriptive and inferential statistics.
Descriptive statistics deals with the processing of data without attempting to draw any inferences from it. The data are
presented in the form of tables and graphs. The characteristics of the data are described in simple terms. Events that are
dealt with include everyday happenings such as accidents, prices of goods, business, incomes, epidemics, sports data,
population data.
Inferential statistics is a scientific discipline that uses mathematical tools to make forecasts and projections by analyzing
the given data. This is of use to people employed in such fields as engineering, economics, biology, the social sciences,
business, agriculture and communications.
II. Pre-test
TEACHING FRAME 1
INTRODUCTION TO DATA MANAGEMENT
A. ORGANIZATION OF DATA
When conducting statistical research, an investigation or a study, the researcher must gather data for the particular
variable under investigation. To describe situations, make conclusions, and draw inferences about events, the researcher
must organize the data gathered in some meaningful way. The easiest and most widely used way of organizing data is to
construct a frequency distribution. A frequency distribution is a grouping of the data into categories showing the number
of observations in each of the non-overlapping classes.
After organizing data, the next move of the researcher is to present the data so they can be understood easily by
those who will benefit from reading the study. The most useful method of presenting data is by constructing graphs and
charts.
Before constructing a frequency distribution, we must define some terms that are essential to
understanding the nature of the data displayed in a frequency distribution.
Raw data is the data collected in original form.
Range is the difference between the highest value and the lowest value in a distribution.
Frequency distribution is the organization of data in a tabular form, using mutually exclusive classes showing the
number of observations in each.
Class Limits are the highest and lowest values describing a class.
Class Boundaries are the upper and lower values of a class in a grouped frequency distribution. They have one
more decimal place than the class limits and end with the digit 5.
Interval is the distance between the class lower boundary and the class upper boundary and it is denoted by the
symbol i.
Frequency (f) is the number of values in a specific class of a frequency distribution.
Percentage is obtained by multiplying the relative frequency (the class frequency divided by the total frequency) by 100%.
Cumulative Frequency (cf) is the sum of the frequencies accumulated up to the upper boundary of a class in a
frequency distribution.
The categorical frequency distribution is used to organize nominal-level or ordinal-level data.
Example 1. Twenty applicants were given a performance evaluation appraisal. The data set is
Solution:
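Percentage = (f / n) × 100%, where f is the frequency of the category and n is the total number of observations.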
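A categorical frequency distribution can be sketched in a few lines of Python. The ratings below are hypothetical and are NOT the data set of Example 1; they only illustrate counting each category and converting the count to a percentage, assuming percentage means the frequency divided by the total, times 100.

```python
from collections import Counter

# Hypothetical ratings for illustration only (not the Example 1 data).
ratings = ["High", "Average", "Low", "Average", "High",
           "Average", "Average", "Low", "High", "Average"]

counts = Counter(ratings)   # frequency (f) of each category
n = len(ratings)            # total number of observations

# Percentage of each category: (f / n) * 100
percentages = {category: f / n * 100 for category, f in counts.items()}

for category, f in counts.items():
    print(f"{category:<8} f = {f}   percentage = {percentages[category]:.1f}%")
```

Counter is used here because the classes are labels, not numeric intervals, so no class limits are needed.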
Generally, the number of classes for a frequency distribution table varies from 5 to 20, depending primarily on the
number of observations in the data set. It is preferred to have more classes as the size of the data set increases. The
decision about the number of classes depends on the method used by the researcher.
1. Rule 1. To determine the number of classes, use the smallest positive integer k such that 2^k ≥ n, where n
is the total number of observations.
2. Rule 2. Another way to determine the class interval is by applying the formula below.
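Returning to Rule 1, the search for k can be sketched as a short loop. This assumes the inequality in Rule 1 is the common "2 to the k" rule, 2^k ≥ n; the function name number_of_classes is a hypothetical helper, not something defined in this module.

```python
def number_of_classes(n: int) -> int:
    """Return the smallest positive integer k such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 1
    while 2 ** k < n:   # keep increasing k until 2**k reaches n
        k += 1
    return k


# With n = 80 observations: 2**6 = 64 is too small, 2**7 = 128 is enough.
print(number_of_classes(80))
```

For a data set of 80 values this gives 7 classes, which agrees with the 7 classes used in Example 2.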
Example 2. Suppose a researcher wished to do a study on the monthly salary of young professionals of selected
companies in Cauayan City. The researcher would first have to collect the data by asking each young professional about
his or her monthly salary. The data collected in original form is called raw data.
17,400 29,500 23,500 32,400 27,300 20,200 24,600 21,300 22,750 26,200
14,000 27,500 22,900 30,500 26,500 17,950 23,700 20,250 21,750 24,750
a. Range
b. Interval
c. Class limits
d. Relative frequencies
e. Percentages
f. Cumulative frequencies
Solution:
Step 1. Arrange the raw data in ascending or descending order. In this particular example we will arrange the raw data in
ascending order. This will make it easier for us to tally the data.
14,000 17,950 20,250 21,750 22,900 23,700 24,750 26,500 27,500 30,500
14,300 18,350 20,300 21,800 22,900 23,700 25,000 26,500 27,600 30,650
15,500 18,400 20,400 21,900 23,000 23,850 25,000 26,800 27,800 30,700
15,700 18,700 20,500 21,900 23,200 24,100 25,150 26,900 27,900 30,700
17,000 18,800 20,800 22,000 23,400 24,300 26,000 27,000 27,900 30,750
17,300 20,000 21,000 22,600 23,400 24,500 26,100 27,000 29,300 32,100
17,400 20,200 21,300 22,750 23,500 24,600 26,200 27,300 29,500 32,400
17,800 20,250 21,600 22,800 23,700 24,700 26,300 27,400 30,400 33,500
Generally, the class interval (or width) should be equal for all classes. The classes must cover all the values in the
raw data (that is, from lowest to highest). The class interval is generated using the formula:
Class interval (i) = Range / Number of classes
Note: Round the value of the interval up to the nearest whole number if there is a remainder.
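The arithmetic for this example can be sketched as follows. It assumes the formula referred to above is the usual range divided by the number of classes, with the 7 classes used in this example. The text goes on to use a width of 2,800; rounding up to the next multiple of 100 is one way to arrive at that figure and is an assumption here, not a rule stated in the module.

```python
import math

lowest, highest = 14_000, 33_500   # smallest and largest salaries in the sorted data
k = 7                              # number of classes used in this example

data_range = highest - lowest      # range = highest value - lowest value
raw_width = data_range / k         # interval before rounding

# Rounding up to the nearest whole number, as the note says.
width_whole = math.ceil(raw_width)

# Assumption: round up further to a convenient multiple of 100.
step = 100
width_convenient = math.ceil(raw_width / step) * step

print(data_range, round(raw_width, 2), width_whole, width_convenient)
```

Rounding up rather than down guarantees that k classes of this width reach at least as far as the highest value.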
The starting point can be the smallest data value or any convenient number less than the smallest data value. In our
case, 14,000 is used.
Add the interval (or width) to the lowest score, taken as the starting point, to obtain the lower limit of the
next class. Keep adding until there are 7 lower limits, one for each class: 14,000; 16,800; 19,600; 22,400; 25,200;
28,000; and 30,800.
To obtain the upper class limits, first add the interval to the lower limit of the first class to obtain its upper
limit. That is, 14,000 + 2,800 = 16,800. Then add the interval (or width) to each remaining lower limit to obtain
all the upper limits.
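The limit-building procedure above can be sketched in code, using the starting point 14,000, the width 2,800 and the 7 classes from the text. The variable names are illustrative only.

```python
start, width, k = 14_000, 2_800, 7   # starting point, class width, number of classes

# Lower limits: start, start + width, start + 2*width, ...
lower_limits = [start + j * width for j in range(k)]

# Each upper limit is the class's lower limit plus the width.
upper_limits = [low + width for low in lower_limits]

for low, high in zip(lower_limits, upper_limits):
    print(f"{low:,} < {high:,}")
```

Because each upper limit equals the next lower limit, the classes are read as "lower limit up to, but not including, upper limit", which keeps them non-overlapping.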
Class Limits
14,000 < 16,800
16,800 < 19,600
19,600 < 22,400
22,400 < 25,200
25,200 < 28,000
28,000 < 30,800
30,800 < 33,600
Step 3. Tally the raw data.
Class Limits        Tally
14,000 < 16,800     llll
16,800 < 19,600     lllll – llll
19,600 < 22,400     lllll – lllll – lllll – l
22,400 < 25,200     lllll – lllll – lllll – lllll – lll
25,200 < 28,000     lllll – lllll – lllll – ll
28,000 < 30,800     lllll – lll
30,800 < 33,600     lll
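The tally can be cross-checked in code. This is a minimal sketch that assumes the 80 sorted salaries listed in Step 1 (reading 15.700 and 20.800 as 15,700 and 20,800) and the text's starting point of 14,000, width of 2,800 and 7 classes.

```python
salaries = [
    14000, 17950, 20250, 21750, 22900, 23700, 24750, 26500, 27500, 30500,
    14300, 18350, 20300, 21800, 22900, 23700, 25000, 26500, 27600, 30650,
    15500, 18400, 20400, 21900, 23000, 23850, 25000, 26800, 27800, 30700,
    15700, 18700, 20500, 21900, 23200, 24100, 25150, 26900, 27900, 30700,
    17000, 18800, 20800, 22000, 23400, 24300, 26000, 27000, 27900, 30750,
    17300, 20000, 21000, 22600, 23400, 24500, 26100, 27000, 29300, 32100,
    17400, 20200, 21300, 22750, 23500, 24600, 26200, 27300, 29500, 32400,
    17800, 20250, 21600, 22800, 23700, 24700, 26300, 27400, 30400, 33500,
]

start, width, k = 14_000, 2_800, 7

# A value x belongs to class j when start + j*width <= x < start + (j+1)*width,
# so integer division gives the class index directly.
frequencies = [0] * k
for value in salaries:
    frequencies[(value - start) // width] += 1

for j, f in enumerate(frequencies):
    low = start + j * width
    print(f"{low:,} < {low + width:,}   {f}")
```

The counts sum to 80, the number of observations, which is a quick check that no value fell outside the classes.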
Step 5. Determine the relative frequency. It can be found by dividing each frequency by the total frequency.
Step 6. Determine the percentage. It can be found by multiplying each relative frequency by 100%.
Step 7. Determine the cumulative frequencies. The cumulative frequency can be found by adding the frequency in each
class to the total frequencies of the classes preceding that class.
Step 8. Determine the midpoints. The midpoint can be found by getting the average of the upper limit and lower limit in
each class.
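Steps 5 to 8 can be sketched together. The frequencies below (4, 9, 16, 23, 17, 8, 3) are the counts obtained by tallying the 80 sorted salaries into the seven classes; the variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [14_000, 16_800, 19_600, 22_400, 25_200, 28_000, 30_800]
width = 2_800
frequencies = [4, 9, 16, 23, 17, 8, 3]   # tally counts per class

total = sum(frequencies)                          # total frequency (n)

relative = [f / total for f in frequencies]       # Step 5: f / n
percentages = [100 * r for r in relative]         # Step 6: relative frequency x 100
cumulative = list(accumulate(frequencies))        # Step 7: running total of f
midpoints = [(low + (low + width)) / 2            # Step 8: (lower + upper) / 2
             for low in lower_limits]

for low, f, r, p, cf, m in zip(lower_limits, frequencies, relative,
                               percentages, cumulative, midpoints):
    print(f"{low:,} < {low + width:,}  f={f}  rf={r:.4f}  %={p:.2f}  cf={cf}  midpoint={m:,.0f}")
```

Two checks are worth making on any such table: the relative frequencies should sum to 1 (the percentages to 100%), and the last cumulative frequency should equal the total number of observations.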
Example 3. RAF Travel Agency, a nationwide local travel agency, offers special rates during the summer period. The owner
wants additional information on the ages of the people taking travel tours. A random sample of 50 customers taking
travel tours last summer revealed the following ages.
18 29 42 57 61 67 37 49 53 47
24 34 45 58 63 70 39 51 54 48
28 36 46 60 66 77 40 52 56 49
19 31 44 58 62 68 38 50 54 48
27 36 46 59 64 74 39 51 55 48
Solution:
18 29 37 42 47 49 53 57 61 67
19 31 38 44 48 50 54 58 62 68
24 34 39 45 48 51 54 58 63 70
27 36 39 46 48 51 55 59 64 74
28 36 40 46 49 52 56 60 66 77
Select a starting point for the lowest class limit. The lowest value in the data set is 18, which will also serve as our
starting point.
Set the individual class limits. Add 9 to each lower class limit until the number of classes is reached (18, 27, 36,
45, 54, 63, and 72). To obtain the upper limit of the first class, add 9 to its lower limit. Then add the interval
(or width) to each upper limit to obtain the remaining upper limits (27, 36, 45, 54, 63, 72, and 81).
Class Limits
18 < 27
27 < 36
36 < 45
45 < 54
54 < 63
63 < 72
72 < 81
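As a sketch of how Example 3 could continue, the code below derives the width and counts the ages in each class. It assumes 7 classes (the number of classes listed above) and the same range-divided-by-classes formula rounded up; the resulting frequencies are computed from the 50 ages given, not quoted from the module.

```python
import math

ages = [
    18, 29, 42, 57, 61, 67, 37, 49, 53, 47,
    24, 34, 45, 58, 63, 70, 39, 51, 54, 48,
    28, 36, 46, 60, 66, 77, 40, 52, 56, 49,
    19, 31, 44, 58, 62, 68, 38, 50, 54, 48,
    27, 36, 46, 59, 64, 74, 39, 51, 55, 48,
]

k = 7                                        # number of classes listed in the text
start = min(ages)                            # lowest value, used as the starting point
width = math.ceil((max(ages) - start) / k)   # range / k, rounded up

# Count how many ages fall in each class [low, low + width).
frequencies = [0] * k
for age in ages:
    frequencies[(age - start) // width] += 1

for j, f in enumerate(frequencies):
    low = start + j * width
    print(f"{low} < {low + width}   {f}")
```

Here the range is 77 − 18 = 59, and 59 / 7 rounds up to 9, which matches the interval of 9 used in the class limits.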