Introduction
“Statistical thinking will one day be as necessary as the ability to read and write.”
- H. G. Wells
We live in an age of computerization and are accumulating information at a very fast
rate. However, the data gathered will not make sense unless we know how to use the available
information to make good decisions. Statistics helps with this problem because it deals
with the collection, presentation, analysis and interpretation of a set of data in order to yield
meaningful information.
1. Descriptive Statistics – defined as those statistical methods concerned with the collection,
presentation and characterization of a set of data in order to describe the various features of
that set of data properly.
2. Inferential Statistics – defined as those statistical methods that make possible the estimation
of a characteristic of a population or the making of a decision concerning a population based
only on sample results.
Illustration. Suppose a study will be conducted in order to learn about student perceptions
concerning the imposition of a tuition fee increase in MSU.
Remark: Inferential Statistics has been developed due to the benefits of studying only a sample
instead of a whole population.
Advantages of sampling:
In sampling, only a relatively small number of respondents or experimental units will be
involved; thus, it is better because:
II. Sampling Procedures
A. Non-probability Sampling – is one in which individuals or items are chosen without regard to
their probability of occurrence. This is usually used when the size of the population is unknown.
Examples:
1. Purposive Sampling - making a sample which agrees with the profile of the population based
on some pre-selected characteristics.
2. Quota Sampling - selecting a specified number (quota) of units possessing certain
characteristics.
3. Convenience Sampling - using results that are readily available.
4. Judgment Sampling - selecting a sample in accordance with an expert’s judgment.
B. Probability Sampling – is one in which the elements of the sample are chosen on the basis of
known probabilities. Each element in the population has an equal and independent chance of
being selected as a sample point. This means that the choice of an element is not influenced by
other considerations such as personal preference, and that the choice of one element is not
dependent upon the choice of another element in the sampling.
2. Systematic Random Sampling – selects every kth element in the population, the first unit
being chosen at random.
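The idea of taking every kth element after a random start can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample`, the interval k = N // n, and the population of 100 ID numbers are all hypothetical and do not come from the handout.

```python
import random

def systematic_sample(population, n, rng=None):
    """Take every k-th element, starting at a random position among the first k."""
    rng = rng or random.Random()
    N = len(population)
    k = N // n                    # sampling interval (assumes N >= n)
    start = rng.randrange(k)      # random start: an index from 0 to k - 1
    return [population[start + i * k] for i in range(n)]

students = list(range(1, 101))    # hypothetical population: ID numbers 1 to 100
sample = systematic_sample(students, 10, random.Random(1))
print(sample)                     # 10 IDs, each 10 apart
```

Only the first unit is random; once it is fixed, the rest of the sample is fully determined by the interval k.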
3. Stratified Random Sampling – the population of N units is divided into subpopulations
(called strata) and then a sample is drawn from each stratum.
Procedure: Step 1. Classify the population into at least two homogeneous strata.
Step 2. Using proportional allocation (each stratum's share of the sample equals its
share of the population), draw a sample from each stratum.
Example 2.1. At a small private college, the students may be classified according to the
following scheme:
Classification    Number of Students
Senior            150
Junior            163
Sophomore         195
Freshmen          220
If we use proportional allocation to select a stratified random sample of size n = 40, how large
a sample must be taken from each stratum?
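One possible way to work Example 2.1 is sketched below in Python. Each stratum receives n × (stratum size) ÷ N units. Because these shares are rarely whole numbers, the sketch rounds them with a largest-remainder rule so that the parts add up to n. That rounding rule and the function name `proportional_allocation` are assumptions made for illustration; the handout does not specify how to round.

```python
def proportional_allocation(strata_sizes, n):
    """Split a sample of size n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    exact = {name: n * size / N for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}   # round down first
    leftover = n - sum(alloc.values())
    # Assumed rule: hand the remaining units to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

college = {"Senior": 150, "Junior": 163, "Sophomore": 195, "Freshmen": 220}
allocation = proportional_allocation(college, 40)
print(allocation)   # {'Senior': 8, 'Junior': 9, 'Sophomore': 11, 'Freshmen': 12}
```

Here N = 728, so the exact shares are about 8.24, 8.96, 10.71 and 12.09, which round to 8, 9, 11 and 12 and sum to 40.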
4. Cluster Sampling – selects a sample containing either all, or a random selection, of the
elements from clusters that have themselves been selected randomly from the population.
Procedure: Step 1. Divide the population area into heterogeneous sections or clusters.
Step 2. Select randomly a few from these clusters.
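The two steps above can be sketched as follows for the case where all elements of each selected cluster are kept. The cluster names, the data, and the function `cluster_sample` are hypothetical and serve only as an illustration.

```python
import random

def cluster_sample(clusters, m, rng=None):
    """Pick m clusters at random and keep every element of each chosen cluster."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), m)     # Step 2: select a few clusters at random
    return {name: clusters[name] for name in chosen}

# Step 1 (hypothetical data): a population already divided into four clusters.
blocks = {
    "Block A": ["a1", "a2", "a3"],
    "Block B": ["b1", "b2"],
    "Block C": ["c1", "c2", "c3", "c4"],
    "Block D": ["d1", "d2"],
}
picked = cluster_sample(blocks, 2, random.Random(7))
print(picked)
```

The sample size is not fixed in advance here: it depends on how many elements the selected clusters happen to contain.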
Exercise 2.1. At a university, students are classified according to the following scheme:
Housing              Number of Students
Campus dormitory     2100
Lodging house        720
Private Residence    3400
Use proportional allocation to determine how many students should be taken from each
classification if we are to select a stratified random sample of size 200.
Advantages:
1. Questions can be repeated, rephrased, or modified for better understanding.
2. Answers may be clarified, thus ensuring more precise information.
3. Information can be evaluated since the interviewer can observe the reactions of the
interviewee and, in the case of personal interviews, the interviewee's facial
expressions.
Disadvantages:
1. It is too costly.
2. It can cover only a limited number of individuals in a given period of time.
3. Interviewees may feel pressured for on-the-spot responses.
Advantages:
1. It is less expensive and has a greater scope than the interview method.
2. Respondents have enough time to formulate appropriate responses.
4. Experimentation Method
2. Ordinal level – involves data that may be arranged in some order, but differences between data
values either cannot be determined or are meaningless
3. Interval level - is like the ordinal level, with the additional property that we can determine
meaningful amounts of difference between data values
- measurement units are equal
- lacks an inherent (absolute) zero starting point (absolute zero
means the total absence of the characteristic being measured)
- the starting point is arbitrary
Example: temperature in degrees Fahrenheit or degrees Celsius
4. Ratio level - is the interval level with an inherent zero starting point
- differences and ratios are meaningful
- it is possible to say how many times larger one data value is than another
- the highest level of measurement
₱0.00 means no income.
Suppose Kim earns ₱30,000 a month while Gerald earns ₱15,000 a month. Then we can
say that Kim earns ₱15,000 more than Gerald, and also that Kim earns twice as much as
Gerald does.
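A short Python check may help show why ratios are meaningful at the ratio level but not at the interval level. The temperature values 20 and 10 degrees Celsius are hypothetical; the income figures are those of Kim and Gerald above.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval level: the zero point is arbitrary, so a ratio depends on the scale used.
warm_c, cool_c = 20.0, 10.0                          # hypothetical temperatures
ratio_celsius = warm_c / cool_c                      # 2.0
ratio_fahrenheit = c_to_f(warm_c) / c_to_f(cool_c)   # 68 / 50 = 1.36

# Ratio level: zero means "no income", so the ratio survives a change of unit.
kim, gerald = 30_000, 15_000                         # pesos per month
ratio_pesos = kim / gerald                           # 2.0
ratio_centavos = (kim * 100) / (gerald * 100)        # 2.0

print(ratio_celsius, ratio_fahrenheit)   # 2.0 1.36
print(ratio_pesos, ratio_centavos)       # 2.0 2.0
```

The statement "twice as warm" changes with the temperature scale, while "twice the income" holds in any currency unit.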
Exercise 4.1. Determine which level is most appropriate in measuring each of the following data.
1. student ID number
2. weight of a package
3. inclusive date of employment
4. rating of an instructor (such as excellent, very good, very satisfactory, satisfactory, poor)
5. size of a family
6. class size
7. t-shirt size (such as small, medium, large, extra large)
8. occupation
9. religion
10. rank of 5 contestants in a beauty pageant
11. speed of a car in km/hr
12. number of traffic accidents in a month
13. score in a test
14. zip code
15. home address
16. cellular phone number
17. cellular phone brand
18. highest educational attainment
19. height of a tree
20. civil status
21. age
22. military rank
23. color of the eye
24. nationality
25. dialect spoken
26. birth date
27. Tax Identification Number
28. number of years spent in the Philippines
29. cancer stage (such as stage 1, stage 2, stage 3)
30. IQ score
V. Methods of Presenting Data
Tabular Presentation - information is entered into the appropriate row and column
categories
- may be in the form of a cross-tabulation table or a frequency distribution
table
Additional columns may be built to obtain additional information about the distributional
characteristics of the data. These are:
a) Class Boundaries (CB) - If the data are continuous, the CBs reflect the continuous
property of the data. The CBs are obtained by taking the midpoints of the gaps
between classes. With LL and UL denoting the lower and upper class limits:
LCB = LL - ½ * (one unit of measure)
UCB = UL + ½ * (one unit of measure)
b) Class Mark (xi) - is the midpoint of a class or interval, i.e., xi = ½ (LL + UL)
or xi = ½ (LCB + UCB)
c) Relative Frequency (RF) - is the frequency of a class expressed in proportion to the
total number of observations: RF = frequency ÷ N
RF could also be expressed in percent: RF = (frequency ÷ N) * 100%
d) Cumulative Frequency (Fi) - is the accumulated frequency of a class. It is the total
number of observations whose values do not exceed the upper limit or boundary of
the class.
Example 5.1.
Table 5.2 Frequency Distribution Table of Weights (in kg) of Math 31 Students
Class      Class Boundaries   Frequency   Class Mark, xi   Relative Frequency   Cumulative Frequency, Fi
40 – 46    39.5 – 46.5        6           43               0.15                 6
47 – 53    46.5 – 53.5        14          50               0.35                 20
54 – 60    53.5 – 60.5        10          57               0.25                 30
61 – 67    60.5 – 67.5        6           64               0.15                 36
68 – 74    67.5 – 74.5        2           71               0.05                 38
75 – 81    74.5 – 81.5        2           78               0.05                 40
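The additional columns can be recomputed from the class limits and frequencies alone. The Python sketch below applies the definitions of class boundaries, class mark, relative frequency and cumulative frequency to the Math 31 weights, assuming one unit of measure is 1 kg. The function name `frequency_table` and the dictionary keys are illustrative choices, not part of the handout.

```python
def frequency_table(classes, freqs, unit=1):
    """Build the extra columns of a frequency distribution table."""
    N = sum(freqs)
    rows, cumulative = [], 0
    for (ll, ul), f in zip(classes, freqs):
        cumulative += f
        rows.append({
            "limits": (ll, ul),
            "boundaries": (ll - unit / 2, ul + unit / 2),   # LCB, UCB
            "mark": (ll + ul) / 2,                          # class mark xi
            "rf": f / N,                                    # relative frequency
            "cf": cumulative,                               # cumulative frequency Fi
        })
    return rows

classes = [(40, 46), (47, 53), (54, 60), (61, 67), (68, 74), (75, 81)]   # (LL, UL)
freqs = [6, 14, 10, 6, 2, 2]
rows = frequency_table(classes, freqs)
for row in rows:
    print(row)
```

A quick check on any such table is that the relative frequencies sum to 1 and the last cumulative frequency equals N.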
Bar Chart – is a graph where the different classes are represented by rectangles or bars.
The width of the rectangle is the length of the interval, represented by the
class limits on the horizontal axis, or by categories for nominal data. The length
of the rectangle, corresponding to the class frequency, is measured along the vertical
axis.
[Figure: Bar Chart. Vertical axis: Frequency; horizontal axis: Weights (in kg) of Math 31 Students, by class limits from 40 - 46 to 75 - 81.]
Histogram – closely resembles the bar chart, with the basic difference that the bar chart uses
the class limits for the horizontal axis while the histogram employs the class
boundaries. Using the class boundaries eliminates the spaces between
rectangles, thus giving it a solid appearance.
[Figure: Histogram. Vertical axis: Frequency; horizontal axis: Weights (in kg) of Math 31 Students, by class boundaries from 39.5 - 46.5 to 74.5 - 81.5.]
Frequency Polygon – is constructed by plotting the class marks against the frequency.
Straight lines then connect the set of points formed by the class marks and
their corresponding frequencies, together with additional class marks (each with
zero frequency) at the beginning and end of the distribution.
[Figure: Frequency Polygon. Vertical axis: Frequency; horizontal axis: Weights (in kg) of Math 31 Students, by class marks from 36 to 85.]
[Figure: Frequency Ogive. Vertical axis: Frequency; horizontal axis: class boundaries from 39.5 to 81.5.]
Pie Chart – is a circle divided into pie-shaped sections, which look like slices of a pizza.
The angle of each sector is proportional to the frequency or percentage of its
class; it is advisable to convert the frequency table into percentages first.
[Figure: Pie Chart (Weights of Math 31 Students), one sector per class: 40 - 46, 47 - 53, 54 - 60, 61 - 67, 68 - 74, 75 - 81.]
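Since a full circle is 360 degrees, a sector's angle is its relative frequency times 360. The Python sketch below computes percentages and angles from the class frequencies of Table 5.2. The function name `pie_sectors` is an illustrative choice, and the figures it prints follow from those frequencies only.

```python
def pie_sectors(labels, freqs):
    """Return (label, percent, angle in degrees) for each class."""
    N = sum(freqs)
    return [(label, 100 * f / N, 360 * f / N) for label, f in zip(labels, freqs)]

labels = ["40 - 46", "47 - 53", "54 - 60", "61 - 67", "68 - 74", "75 - 81"]
freqs = [6, 14, 10, 6, 2, 2]      # class frequencies from Table 5.2
sectors = pie_sectors(labels, freqs)
for label, percent, angle in sectors:
    print(f"{label}: {percent:.0f}%  {angle:.0f} degrees")
```

The percentages should total 100 and the angles 360, which gives a simple check before drawing the chart.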