Intro To Statistics

I. Introduction
“Statistical thinking will one day be as necessary as the ability to read and write.”
- H. G. Wells
We have entered the age of computerization and are accumulating information at a very fast rate. However, the data gathered will not make sense unless we know how to use the available information to make good decisions. Statistics helps with this problem because it deals with the collection, presentation, analysis and interpretation of a set of data in order to yield meaningful information.
Some uses of Statistics:
1. To know how to properly present and describe information.
2. To know how to draw conclusions about large populations based only on information obtained from samples.
Population – refers to the totality of the observations with which the study is concerned
Sample – refers to a part or subset of a population
3. To know how to improve processes, such as strategies for improving the sales or the quality of the products or services delivered by an organization.
4. To know how to obtain reliable forecasts.
Two Major Areas of Statistics:
1. Descriptive Statistics – defined as those statistical methods concerned with the collection,
presentation and characterization of a set of data in order to describe the various features of
that set of data properly.
2. Inferential Statistics – defined as those statistical methods that make possible the estimation
of a characteristic of a population or the making of a decision concerning a population based
only on sample results.
Illustration. Suppose a study will be conducted to learn about students' perceptions concerning the imposition of a tuition fee increase at MSU.
Population of the study: All currently enrolled students of MSU.
Main objective of the study: To estimate the various attitudes or characteristics of interest of the entire population.
Application of Inferential Statistics: Select a sample from the population and use the
statistics computed from the sample to draw conclusions about the population parameters or
characteristics.
Remark: Inferential Statistics was developed because of the benefits of studying only a sample instead of the whole population.
Advantages of sampling:
In sampling, only a relatively small number of respondents or experimental units are involved; thus, it is better because:
1. it costs less;
2. it is less time-consuming;
3. it is less cumbersome and more practical to administer; and
4. some experiments are destructive, so it is not possible to involve the whole population. For example, in quality testing of products, a product that has been tested or tasted usually can no longer be sold.
Sampling also has disadvantages, the biggest of which is that the sample may not truly reflect the characteristics of the population, which would lead to wrong conclusions. Hence, care must be taken in choosing a sample.
II. Sampling Procedures
A. Non-probability Sampling – is one in which individuals or items are chosen without regard to
their probability of occurrence. This is usually used when the size of the population is unknown.
Examples:
1. Purposive Sampling - selecting a sample that matches the profile of the population based on some pre-selected characteristics.
2. Quota Sampling - selecting a specified number (quota) of units possessing certain
characteristics.
3. Convenience Sampling - using results that are readily available.
4. Judgment Sampling - selecting a sample in accordance with an expert’s judgment.
B. Probability Sampling – is one in which the elements of the sample are chosen on the basis of known probabilities. In simple random sampling, the most basic design, each element in the population has an equal and independent chance of being selected as a sample point. This means that the choice of an element is not influenced by other considerations such as personal preference, and that the choice of one element does not depend on the choice of another element in the sample.
1. Simple Random Sampling (SRS) – may be done with or without replacement
Procedure: Step 1. Number the elements of the population from 1 to N.
Step 2. Select n numbers from 1 to N using a random process such as raffling or a table of random numbers.
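The two steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample` is hypothetical, the population is assumed to be numbered 1 to N already, and Python's standard `random` module plays the role of the raffle or the table of random numbers.

```python
import random

def simple_random_sample(N, n, replace=False, seed=None):
    """Select n unit numbers from 1..N by simple random sampling.

    Hypothetical helper: Step 1 numbers the units 1 to N,
    Step 2 draws n of those numbers at random.
    """
    rng = random.Random(seed)   # seeded generator so a draw can be reproduced
    units = range(1, N + 1)     # Step 1: units numbered 1 to N
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(units) for _ in range(n)]
    # Without replacement: every drawn unit is distinct.
    return rng.sample(units, n)

print(simple_random_sample(N=120, n=24, seed=1))
```

Passing a `seed` makes a draw reproducible, much like recording the starting point used in a table of random numbers.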
2. Systematic Random Sampling – selects every kth element in the population, the first unit
being chosen at random
Procedure: Step 1. Number the population units from 1 to N.
Step 2. Determine the sampling interval k: k = N/n, where N = population size and n = sample size.
Step 3. Select a random start r, 1 ≤ r ≤ k. The first unit of the sample is the unit corresponding to r; the remaining units are those numbered r + k, r + 2k, and so on.
Illustration: In a population of 120 individuals, choose a systematic random sample of size 24.
Solution: Since k = 120/24 = 5, the random start r may be any of 1, 2, 3, 4, 5. If we choose r = 3, the sample points will be those numbered 3, 8, 13, 18, . . . , 118.
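A short sketch of the systematic procedure, assuming for simplicity that N is a multiple of n so that k = N/n is a whole number (the function name `systematic_sample` is hypothetical). With r = 3 it reproduces the sample in the illustration above.

```python
import random

def systematic_sample(N, n, r=None, seed=None):
    """Systematic random sampling of n units from units numbered 1..N.

    Assumes N is a multiple of n, so the interval k = N/n is a whole number.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n                                  # Step 2: sampling interval
    if r is None:
        r = random.Random(seed).randint(1, k)   # Step 3: random start, 1 <= r <= k
    if not 1 <= r <= k:
        raise ValueError("the random start r must satisfy 1 <= r <= k")
    # Every kth unit from the random start: r, r + k, r + 2k, ...
    return list(range(r, N + 1, k))

sample = systematic_sample(120, 24, r=3)
print(sample[:4], sample[-1], len(sample))  # [3, 8, 13, 18] 118 24
```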
3. Stratified Random Sampling
The population of N units is divided into subpopulations (called strata) and then a sample is drawn from each stratum.
Procedure: Step 1. Classify the population into at least two homogeneous strata.
Step 2. Using proportional allocation, draw a sample from each stratum.
In proportional allocation, the number of units to be taken from each stratum is proportional to the size of the subpopulation; that is, between two strata of different sizes, a bigger sample will be taken from the bigger stratum.
Proportional allocation. If the population of size N is divided into k homogeneous subpopulations or strata of sizes $N_1, N_2, \ldots, N_k$, then the sample size to be taken from each stratum i is obtained using the formula

$$n_i = \frac{N_i}{N} \times n, \quad i = 1, 2, \ldots, k.$$
Example 2.1. At a small private college, the students may be classified according to the following scheme:

Classification | Number of Students
Senior | 150
Junior | 163
Sophomore | 195
Freshmen | 220

If we use proportional allocation to select a stratified random sample of size n = 40, how large a sample must be taken from each stratum?
Solution: Since n = 40 and N = 150 + 163 + 195 + 220 = 728, then

$$n_1 = \frac{150}{728} \times 40 \approx 8, \quad n_2 = \frac{163}{728} \times 40 \approx 9, \quad n_3 = \frac{195}{728} \times 40 \approx 11, \quad n_4 = \frac{220}{728} \times 40 \approx 12.$$
Note: The values computed above for each ni are rounded off to the nearest integer.
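The proportional allocation formula can be checked with a few lines of Python. This sketch (the function name `proportional_allocation` is hypothetical) rounds each stratum's sample size to the nearest integer, as in the note above.

```python
def proportional_allocation(stratum_sizes, n):
    """Sample size (N_i / N) * n for each stratum, rounded to the nearest integer."""
    N = sum(stratum_sizes)
    # Python's round() sends exact halves to the even integer; no value here is a half.
    return [round(N_i / N * n) for N_i in stratum_sizes]

sizes = {"Senior": 150, "Junior": 163, "Sophomore": 195, "Freshmen": 220}
allocation = proportional_allocation(list(sizes.values()), n=40)
print(dict(zip(sizes, allocation)))  # {'Senior': 8, 'Junior': 9, 'Sophomore': 11, 'Freshmen': 12}
```

Because each stratum's size is rounded separately, the rounded sizes are not guaranteed to add up to n in general; here they happen to total 40. The same function can be used to check an answer to Exercise 2.1.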
4. Cluster Sampling – selects a sample containing either all, or a random selection, of the
elements from clusters that have themselves been selected randomly from the population.
Procedure: Step 1. Divide the population area into heterogeneous sections or clusters.
Step 2. Randomly select a few of these clusters.
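A minimal sketch of one-stage cluster sampling, in which every element of each selected cluster is included. The population of six class sections below is entirely hypothetical, as is the function name `cluster_sample`; drawing a random selection of elements within each chosen cluster (the other option in the definition above) would add a second stage.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sampling: randomly select m whole clusters.

    `clusters` maps a cluster label to the list of units in that cluster;
    every unit of each selected cluster goes into the sample.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # Step 2: select clusters at random
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 6 sections (clusters) of 4 students each
clusters = {f"Section {c}": [f"{c}{i}" for i in range(1, 5)] for c in "ABCDEF"}
print(cluster_sample(clusters, m=2, seed=3))
```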
Exercise 2.1. At a university, students are classified according to the following scheme:

Housing | Number of Students
Campus dormitory | 2100
Lodging house | 720
Private residence | 3400

Use proportional allocation to determine how many students should be taken from each classification if we are to select a stratified random sample of size 200.
III. Methods of Collecting Data
1. Interview Method – is a person-to-person encounter between the one soliciting information (also known as the interviewer) and the one supplying the information (also known as the interviewee). It can be conducted in person or through a telephone conversation.
Advantages:
1. Questions can be repeated, rephrased, or modified for better understanding.
2. Answers may be clarified, thus ensuring more precise information.
3. Information can be evaluated, since the interviewer can observe the reaction of the interviewee; in the case of personal interviews, the interviewer can also observe the interviewee's facial expressions.
Disadvantages:
1. It is costly.
2. It can cover only a limited number of individuals in a given period of time.
3. Interviewees may feel pressured for on-the-spot responses.
2. Questionnaire Method – could be mailed or hand-carried (delivered in person)
Advantages:
1. It is less expensive and has a greater scope than the interview method.
2. Respondents have enough time to formulate appropriate responses.
Disadvantage: Low return rate.

3. Observation Method - appropriate in obtaining data pertaining to behavior of an individual or
group of individuals at the time of occurrence of a given situation. Subjects may be observed
individually or collectively.
Limitation: Observation is made only at the time of occurrence of the relevant event(s).
4. Experimentation Method
5. Use of existing data

a) from documents (books and magazines, hospital records, public files, registrations, etc.)
b) from the internet
IV. Levels of Measurement
1. Nominal level - values fall into unordered categories or classes
- data are qualitative and can be used as measures of identity
- data can be coded, but these codes have neither the ordering property nor a mathematical significance
- lowest level of measurement
Example: blood type: 1 – Type A, 2 – Type B, 3 – Type AB, 4 – Type O
• The numbers 1, 2, 3, 4 above have no inherent mathematical properties, i.e., assigning 4 to Type O and 1 to Type A does not mean that Type O is better than Type A. Moreover, the assignment of codes is not unique. For instance, 0 may be assigned to Type A, 1 to Type B, and so on.
• 4 – 1 = 3, but this does not mean that if we subtract a person with blood Type A from a person with blood Type O we get a person with blood Type AB.
• The numbers are used only to facilitate data analysis by computer.
Other examples: color, gender, product brand
2. Ordinal level – involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless
Example: rank of students in a graduating class (1 – valedictorian, 2 – salutatorian, and so on)
• A rank of 5 is better than a rank of 10.
• The difference of 5 between the 5th and 10th ranks is meaningless, i.e., the difference between ranks 5 and 10 is not necessarily the same as the difference between ranks 20 and 25.
3. Interval level - is like the ordinal level with the additional property that we can determine meaningful amounts of difference between data values
- measurement units are equal
- lacks an inherent zero starting point, i.e., lacks an absolute zero (absolute zero means the total absence of the characteristic being measured)
- the starting point is arbitrary
Example: temperature in degrees Fahrenheit or degrees Celsius
• The freezing point of water is 0° in Celsius while in Fahrenheit it is 32°.
• 30°C is hotter than 15°C, but it is wrong to conclude that 30°C is twice as hot as 15°C.
• 0°C does not mean the total absence of heat.
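The sketch below uses the standard conversion F = 9C/5 + 32 (the function name is hypothetical) to show why ratios are not meaningful on an interval scale: 30°C is "twice" 15°C on the Celsius scale, but the same two temperatures in Fahrenheit, 86°F and 59°F, have a ratio of only about 1.46. Differences, by contrast, keep their meaning up to the change of unit.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit: F = 9C/5 + 32."""
    return c * 9 / 5 + 32

hot, cool = 30, 15
print(hot / cool)  # 2.0 on the Celsius scale
print(celsius_to_fahrenheit(hot) / celsius_to_fahrenheit(cool))  # 86/59, about 1.46
# The difference of 15 Celsius degrees is 27 Fahrenheit degrees (9/5 * 15):
print(celsius_to_fahrenheit(hot) - celsius_to_fahrenheit(cool))  # 27.0
```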
4. Ratio level - is the interval level with an inherent zero starting point
- differences and ratios are meaningful
- two data values can be compared by their ratio as well as by their difference
- the highest level of measurement
Example: monthly income
• ₱0.00 means no income.
• Suppose Kim earns ₱30,000 a month while Gerald earns ₱15,000 a month. Then we can say that Kim earns ₱15,000 more than Gerald, and also that Kim earns twice as much as Gerald does.
Exercise 4.1. Determine which level of measurement is most appropriate for each of the following data.
1. student ID number
2. weight of a package
3. inclusive date of employment
4. rating of an instructor (such as excellent, very good, very satisfactory, satisfactory, poor)
5. size of a family
6. class size
7. t-shirt size (such as small, medium, large, extra large)
8. occupation
9. religion
10. rank of 5 contestants in a beauty pageant
11. speed of a car in km/hr
12. number of traffic accidents in a month
13. score in a test
14. zip code
15. home address
16. cellular phone number
17. cellular phone brand
18. highest educational attainment
19. height of a tree
20. civil status
21. age
22. military rank
23. color of the eye
24. nationality
25. dialect spoken
26. birth date
27. Tax Identification Number
28. number of years spent in the Philippines
29. cancer stage (such as stage 1, stage 2, stage 3)
30. IQ score
V. Methods of Presenting Data
Methods of presenting data:
I. Tabular presentation
II. Graphical presentation
Tabular Presentation - information is entered into the appropriate row and column categories
- may be in the form of a cross tabulation table or a frequency distribution table
1. Example of a Cross Tabulation Table:

Table 5.1. Distribution of Ethnic Affiliation by Gender of MSU ILS 1st Year Students (AY 2005-06)
Tribe \ Gender | Male | Female | Total
Maranao | 29 | 55 | 84
Non-Maranao | 12 | 4 | 16
Total | 41 | 59 | 100
Source: MSU-ILS Survey Report AY 2005-06 (Undergraduate Thesis)
2. Frequency Distribution Table (FDT) - a grouping of all the observations into classes or intervals together with a count of the number of observations that fall in each class or interval
Steps in constructing a frequency distribution table:
1. Compute the range R, where R = (highest value) – (lowest value).
2. Determine the number of classes k. You may use either of the formulas for k below or choose your own number of classes.
a) $k = \sqrt{N}$
b) $k = 1 + 3.322 \log_{10} N$, where N = number of observations
Round off k to the nearest whole number.
3. Calculate the class width c (also called class size): $c = R/k$.
Round c up to the nearest value with the same precision as the raw data.
4. Construct the classes as follows. Each class is an interval of values defined by its
lower and upper class limits.
List the lower class limit (LL) of the first class. The starting lower limit could be
the lowest value or any smaller number close to it.
List the lower limits of the succeeding classes by simply adding c (the class
width) to the lower limit of the preceding class.
The upper limit (UL) of the first class can then be obtained by subtracting one
unit of measure from the lower limit of the next class. The upper limits of the rest of
the classes can then be obtained in a similar fashion or by adding c to the upper limit
of the preceding class.
5. Tally the frequencies (fi) for each class constructed.
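Steps 1 to 5 can be sketched in Python. This version is only an illustration under stated assumptions: the data are whole numbers (so one unit of measure is 1), the first lower limit is the lowest value, and k defaults to formula (b); the function name `frequency_distribution` is hypothetical. Applied to the weights in Table 5.1 of Example 5.1, it yields the class limits and frequencies of Table 5.2.

```python
import math

def frequency_distribution(data, k=None):
    """Steps 1-5 for whole-number data (one unit of measure = 1).

    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    lo, hi = min(data), max(data)
    R = hi - lo                                   # Step 1: range
    if k is None:
        # Step 2, formula (b), rounded to the nearest whole number
        k = round(1 + 3.322 * math.log10(len(data)))
    c = math.ceil(R / k)                          # Step 3: round c up to a whole unit
    table = []
    LL = lo                                       # Step 4: first lower limit = lowest value
    while LL <= hi:                               # add classes until the highest value is covered
        UL = LL + c - 1                           # next lower limit minus one unit
        f = sum(LL <= x <= UL for x in data)      # Step 5: tally
        table.append((LL, UL, f))
        LL += c                                   # next lower limit = previous one + c
    return table

# Weights (in kg) of Math 31 Students, from Table 5.1 of Example 5.1
weights = [63, 59, 43, 60, 41, 53, 56, 81,
           50, 66, 62, 52, 49, 48, 52, 40,
           64, 64, 47, 53, 47, 54, 62, 56,
           58, 53, 50, 47, 79, 70, 45, 47,
           46, 58, 56, 55, 56, 45, 73, 49]
for LL, UL, f in frequency_distribution(weights):
    print(f"{LL} - {UL}: {f}")
```

Because c is rounded up, the classes may extend slightly beyond the highest value, and occasionally one class more than k is needed to cover it; the loop simply keeps adding classes until the highest value is included.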
Additional columns may be built to obtain additional information about the distributional
characteristics of the data. These are:
a) Class Boundaries (CB) - If the data are continuous, the CBs reflect the continuous property of the data. The CBs are obtained by taking the midpoints of the gaps between classes.
LCB = LL - ½ * (one unit of measure)
UCB = UL + ½ * (one unit of measure)
b) Class Mark ($x_i$) - is the midpoint of a class or interval, i.e., $x_i = \frac{1}{2}(LL + UL)$ or $x_i = \frac{1}{2}(LCB + UCB)$
c) Relative Frequency (RF) - is the frequency of a class expressed in proportion to the
total number of observations: RF = frequency ÷ N
RF could also be expressed in percent: RF = (frequency ÷ N) * 100%
d) Cumulative Frequency (Fi) - is the accumulated frequency of a class. It is the total
number of observations whose values do not exceed the upper limit or boundary of
the class.
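The additional columns follow directly from the class limits and frequencies. The sketch below (the function name `add_columns` is hypothetical, and one unit of measure is assumed to be 1) computes them from the class limits and frequencies of Table 5.2 in Example 5.1; for the class 47 – 53, for instance, the relative frequency is 14/40 = 0.35.

```python
def add_columns(classes, unit=1):
    """Extend (LL, UL, f) rows with class boundaries, class mark,
    relative frequency and cumulative frequency."""
    N = sum(f for _, _, f in classes)       # total number of observations
    rows, F = [], 0
    for LL, UL, f in classes:
        F += f                              # cumulative frequency F_i
        rows.append((LL, UL,
                     LL - unit / 2,         # lower class boundary
                     UL + unit / 2,         # upper class boundary
                     f,
                     (LL + UL) / 2,         # class mark x_i
                     f / N,                 # relative frequency
                     F))
    return rows

# Class limits and frequencies of Table 5.2 (Example 5.1)
classes = [(40, 46, 6), (47, 53, 14), (54, 60, 10),
           (61, 67, 6), (68, 74, 2), (75, 81, 2)]
for LL, UL, LCB, UCB, f, x, rf, F in add_columns(classes):
    print(f"{LL}-{UL}  {LCB}-{UCB}  f={f}  x={x:g}  RF={rf:.2f}  F={F}")
```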
Example 5.1.
Table 5.1. Weights (in kg) of Math 31 Students

63 59 43 60 41 53 56 81
50 66 62 52 49 48 52 40
64 64 47 53 47 54 62 56
58 53 50 47 79 70 45 47
46 58 56 55 56 45 73 49
Step 1. Compute the range: R = 81 – 40 = 41
Step 2. Estimate the number of classes: $k = \sqrt{40} \approx 6.325 \approx 6$ or $k = 1 + 3.322 \log_{10} 40 \approx 6.322 \approx 6$
Step 3. Compute the class width: $c = 41 \div 6 \approx 6.833$, rounded up to 7
Table 5.2. Frequency Distribution Table of Weights (in kg) of Math 31 Students
Class | Class Boundaries | Frequency | Class Mark, $x_i$ | Relative Frequency | Cumulative Frequency, $F_i$
40 – 46 | 39.5 – 46.5 | 6 | 43 | 0.15 | 6
47 – 53 | 46.5 – 53.5 | 14 | 50 | 0.35 | 20
54 – 60 | 53.5 – 60.5 | 10 | 57 | 0.25 | 30
61 – 67 | 60.5 – 67.5 | 6 | 64 | 0.15 | 36
68 – 74 | 67.5 – 74.5 | 2 | 71 | 0.05 | 38
75 – 81 | 74.5 – 81.5 | 2 | 78 | 0.05 | 40
Graphical Presentation – information is presented graphically by means of a bar chart, histogram, line graph or frequency polygon, frequency ogive, pie chart, pictograph, etc.
Bar Chart – is a graph where the different classes are represented by rectangles or bars. The width of each rectangle is the length of the interval, represented by the class limits on the horizontal axis, or a category for nominal data. The height of each rectangle, corresponding to the class frequency, is measured along the vertical axis.
[Figure: Bar Chart of Weights (in kg) of Math 31 Students; horizontal axis: classes 40 - 46 to 75 - 81; vertical axis: Frequency]
Histogram – closely resembles the bar chart, with the basic difference that a bar chart uses the class limits for the horizontal axis while the histogram employs the class boundaries. Using the class boundaries eliminates the spaces between rectangles, thus giving it a solid appearance.
[Figure: Histogram of the weights; horizontal axis: class boundaries 39.5 - 46.5 to 74.5 - 81.5; vertical axis: Frequency]
Frequency Polygon – is constructed by plotting the class marks against the frequencies. Straight lines then connect the points formed by the class marks and their corresponding frequencies, together with an additional class mark of zero frequency at the beginning and at the end of the distribution.
[Figure: Frequency Polygon of the weights; horizontal axis: class marks 36, 43, 50, 57, 64, 71, 78, 85; vertical axis: Frequency]
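The points of the frequency polygon can be generated from the class marks and frequencies of Table 5.2. This sketch assumes equal class widths, so the extra class marks at each end lie one class width below the first mark and above the last; the function name `polygon_points` is hypothetical.

```python
def polygon_points(marks, freqs):
    """Points of a frequency polygon: (class mark, frequency), plus one extra
    class mark with zero frequency before the first class and after the last."""
    c = marks[1] - marks[0]                 # class width = distance between marks
    xs = [marks[0] - c] + list(marks) + [marks[-1] + c]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

print(polygon_points([43, 50, 57, 64, 71, 78], [6, 14, 10, 6, 2, 2]))
# [(36, 0), (43, 6), (50, 14), (57, 10), (64, 6), (71, 2), (78, 2), (85, 0)]
```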
Frequency Ogive – represents a cumulative frequency distribution. It is constructed by plotting the class boundaries on the horizontal scale and the cumulative frequencies less than the upper class boundaries on the vertical scale.
[Figure: Frequency Ogive of the weights; horizontal axis: class boundaries 39.5 to 81.5; vertical axis: cumulative frequency]
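Similarly, the points of the ogive pair each class boundary with the cumulative frequency below it, starting from 0 at the lowest boundary. The function name `ogive_points` is hypothetical; the boundaries and frequencies are those of Table 5.2.

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """Points of a 'less than' ogive: each upper class boundary paired with the
    cumulative frequency, starting from 0 at the first lower boundary."""
    return list(zip(boundaries, [0] + list(accumulate(freqs))))

bounds = [39.5, 46.5, 53.5, 60.5, 67.5, 74.5, 81.5]
print(ogive_points(bounds, [6, 14, 10, 6, 2, 2]))
# [(39.5, 0), (46.5, 6), (53.5, 20), (60.5, 30), (67.5, 36), (74.5, 38), (81.5, 40)]
```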
Pie Chart – is a circle divided into pie-shaped sections, which look like slices of a pizza. The angle of each sector is proportional to the frequency or percentage of the class it represents, but it is advisable to convert the frequency table into percentages first.
[Figure: Pie Chart (Weights of Math 31 Students); one sector for each class, 40 - 46 through 75 - 81]
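Each sector's angle is its relative frequency times 360°. A minimal sketch using the frequencies of Table 5.2 (the function name `pie_sectors` is hypothetical):

```python
def pie_sectors(freqs):
    """Percentage and central angle (in degrees) of each pie-chart sector."""
    N = sum(freqs)
    return [(100 * f / N, 360 * f / N) for f in freqs]

labels = ["40 - 46", "47 - 53", "54 - 60", "61 - 67", "68 - 74", "75 - 81"]
for label, (pct, angle) in zip(labels, pie_sectors([6, 14, 10, 6, 2, 2])):
    print(f"{label}: {pct:.0f}%  {angle:.0f} degrees")
```

For these data the sectors are 15%, 35%, 25%, 15%, 5% and 5%, with angles of 54°, 126°, 90°, 54°, 18° and 18°, which add up to 360°.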
