
STATISTICS

Graduate School
FLVARGAS
Janet R. Galapon, PhD
Professor
Introduction:
Statistics is a branch of mathematics dealing with the
collection, analysis, interpretation, presentation,
and organization of data.
- It is also described as a branch of science that deals with the
collection, tabulation or presentation, analysis, and
interpretation of numerical or quantitative data.
- Kendall and Stuart state that statistics is the branch of
scientific method which deals with the data
obtained by counting or measuring the properties
of populations.
Statistics deals with all aspects of data, including
the planning of data collection in terms of the
design of surveys, experiments, and investigations.

• Steps in Conducting Research
• 1.) Collection
• 2.) Organization or presentation
• 3.) Analysis
• 4.) Interpretation
1.) Collection
It refers to the gathering of information or
data.
Examples: interviews, questionnaires, and rating scales.
2.) Organization or Presentation
It involves summarizing data or information in
textual, graphical, or tabular form.
3.) Analysis
It involves describing the data by using
statistical methods and procedures.
4.) Interpretation
It refers to the process of drawing
conclusions based on the analyzed data.
• Why do we need to study statistics?
In Business
• Statistics is used to test consumers’ preferences and
to discover what it is about a product that gives it its appeal.
• Statistics can also be used in planning
marketing and advertising strategies and in
making changes in a product’s quality to
increase sales.
In Education
• The performance ratings of students in
national examinations are monitored to
improve the quality of education.
• Enrollment rates are also used in developing
programs that reach out to out-of-school children.
In Medicine
• Statistics is used in medical research,
providing healthcare professionals with new
knowledge and technology for better
diagnosis, treatment, and prevention of
certain diseases.
In Sports
• Numerical measures of the performance of
individual players and teams can be calculated
using statistical formulas.
In Politics and Government
• Statistics is used to obtain information about
voters’ attitudes toward certain issues and
candidates. This helps candidates plan their
campaign strategies.
In Entertainment
• The most popular actresses and actors can be
determined by using surveys.
• Ratings given by the members of the board of judges
in a beauty contest are statistically analyzed.
In Agriculture
• Through statistical tools, an agriculturist can
determine the effectiveness of a new fertilizer
on the growth of plants or crops.
Two Fields of Statistics
1. Descriptive Statistics is concerned with the
collection, classification, presentation, analysis,
and interpretation of data, and with describing
the characteristics of a group of data through
summary values such as:
a. Measures of central tendency
b. Measures of dispersion
c. Skewness
d. Kurtosis, and others
2. Inferential Statistics aims to give information about large groups
of data (populations) without examining each and every
element of these groups. It uses only a small but representative
portion (sample) of the total set of data in order to draw
conclusions or judgments about the entire set of data.
- Inferential statistics consists of methods that use sample results to
help make predictions.
Examples: sampling and sampling distributions, estimation, and
hypothesis testing using:
- z-test
- Chi-square test
- F-test
Quick Check!
• Tell whether each of the following situations will
make use of descriptive statistics or
inferential statistics.
1. A teacher computes the average grade of her
students and then determines the top ten
students.
2. A manager of a business firm predicts
future sales of the company based on the
present sales.
3. A psychologist investigates whether there is a
significant relationship between mental age
and chronological age.
4. A researcher studies the effectiveness of a
new fertilizer in increasing food production.
5. A janitor counts the pieces of furniture of
various kinds inside the school.
6. A sports journalist determines the most
popular basketball player for this year.
7. A school administrator forecasts the future
expansion of a school.
8. A market vendor investigates the most
popular brand of vinegar.
9. An engineer calculates the average height of
the buildings along Taft Avenue.
10. A dermatologist tests the relative
effectiveness of a new brand of medicine in
curing skin diseases.
• Terminologies in Statistics
1. Population
A population consists of all elements (individuals,
items, or objects) whose characteristics are
being studied. The population being studied is
called the target population.
• It refers to a large collection of objects,
persons, places, or things.
• The size of a population is usually denoted by N.
2. Sample
A portion of the population selected for study is
referred to as a sample.
• It can also be defined as a subgroup, subset,
or representative part of a population. The size of a
sample is usually denoted by n.
3. Parameter
It is any numerical or nominal characteristic
of a population. It is a value or measurement
obtained from a population. It is usually
referred to as the true or actual value.
4. Statistic
• It is an estimate of a parameter. It is any value or
measurement obtained from a sample.
5. Data
• These are facts, or a set of information or
observations under study. More specifically, data are
gathered by the researcher from a population or from
a sample.
6. Variable
It is a characteristic or property of a population or
sample that makes the members different from each
other.
Types of Variables
1. Quantitative Variable – it is a variable that can be
measured numerically. The data collected about a
quantitative variable are called quantitative data.
Examples: weight (100 lb), height (34 inches)
A quantitative variable can be classified as:
a. Discrete variable – a variable whose values are
countable (1, 2, 3, . . .), that is, values obtained through
the process of counting.
b. Continuous variable – a variable that can assume
infinitely many values within a specified interval or intervals. The
values are obtained through measuring.
Example: height of the building
c. Dependent Variable
It is a variable which is affected or influenced
by another variable.
d. Independent Variable
It is one which affects or influences the
dependent variable.
2. Qualitative or Categorical Variable – a
variable that cannot assume a numerical value but
can be classified into two or more
nonnumeric categories. The data collected
about such a variable are called qualitative
data.
Examples: sex – male, female
year level – 1st year, 2nd year
Qualitative or Quantitative?
• number of students in school
• civil status
• nationality
• body mass index
• kinds of poetry
• number of siblings
• number of hours playing online games
• speed of light
7. Constant
It is a property or characteristic of a
population or sample that makes the
members of the group similar to each other.
Scales of Measurement
Measurement – the assignment of symbols or numerals
to objects or events according to certain rules.
The Four Measurement Scales
1. Nominal Scale – comes from the Latin word “nomen,”
which means “name.”
* used to distinguish one object from another for
identification purposes
* cannot be quantified or ranked
Example: gender, nationality
• 2. Ordinal Scale – applies to data that are
divided into different categories that can be
ranked.
• Examples: ratings (poor, good, excellent); birth order of
siblings in the family; rank of honor students in the class.
3. Interval Scale – applies to data that can be
ranked or ordered
* can specify the amount of difference between values
* has no absolute (true) zero point of reference
Example: scores in an examination
To illustrate, suppose Maria got 50 in a Math
examination while Martha got 40. We can say
that Maria scored higher than Martha by 10 points.
4. Ratio Scale
* similar to the interval scale; the only difference is that the
ratio scale always starts from an absolute or true zero point.
* there are always units of measure.
Example: weight
Suppose Mrs. Reyes weighs 50 kg, while her
daughter weighs 25 kg. We can say that Mrs. Reyes
is twice as heavy as her daughter.
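Determining Sample Size
Slovin's formula:
$$n = \frac{N}{1 + Ne^{2}}$$
where n = sample size, N = population size, and e = margin of error.
Example: In a population of size 1000 with e = 5%, what
would be the sample size?
n = ?, N = 1000, e = 0.05
$$n = \frac{1000}{1 + 1000(0.05)^{2}} = \frac{1000}{1 + 2.5} = \frac{1000}{3.5} = 285.71 \approx 286$$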
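The sample size formula with the margin of error squared, n = N/(1 + Ne²) (commonly called Slovin's formula), is easy to get wrong by hand. A minimal Python sketch follows; rounding the result up to the next whole respondent is an assumption that mirrors the rounding up in the example, and `slovin_sample_size` is an illustrative name.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from n = N / (1 + N * e**2), rounded up to a whole number."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # a fraction of a respondent is rounded up


print(slovin_sample_size(1000, 0.05))  # 286  (1000 / 3.5 = 285.71, rounded up)
```

For instance, `slovin_sample_size(16500, 0.05)` gives 391 for the student population in the stratified sampling example.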
SAMPLING TECHNIQUES
A. Probability sampling
B. Non-probability sampling
Probability Sampling – a sampling technique
wherein each member or element of the
population has a known, nonzero chance of being
selected as a member of the sample; in random
sampling, every member has an equal chance.
Types of Probability Sampling
A. Random Sampling
B. Systematic Sampling
C. Stratified Random Sampling
D. Cluster Sampling
E. Multi-stage sampling
Random Sampling
A. Lottery Method
B. Table of random numbers
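The lottery method can be simulated with Python's `random` module: `Random.sample` draws distinct members without replacement, so every member has the same chance of selection. The population of 1000 numbered students and the sample size of 50 are hypothetical; the seed only makes the draw repeatable.

```python
import random


def lottery_sample(population, n, seed=None):
    """Draw n distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)         # seeded so the same draw can be repeated
    return rng.sample(population, n)  # without replacement, like drawing names from a box


# Hypothetical population: 1000 students numbered 1 to 1000.
students = list(range(1, 1001))
sample = lottery_sample(students, 50, seed=2014)
print(sorted(sample)[:10])
```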
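• Systematic Sampling
- select a random starting point among the first k elements, then
draw every kth element from the population, where
k = N/n
Example: The population is 45 and the sample size is
14.
k = 45/14 = 3.21, so every 3rd element is selected. (Rounding k down
ensures that 14 elements can be drawn; with k = 4, at most 12 elements
could be drawn from 45.)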
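A minimal Python sketch of systematic sampling for the example above (elements numbered 1 to 45, n = 14). It assumes k is rounded down, so that the list taken from the random start always has at least n elements; `systematic_sample` is an illustrative name.

```python
import random


def systematic_sample(population, n, seed=None):
    """Every k-th element after a random start, with k = N // n (k rounded down)."""
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first k elements
    return population[start::k][:n]           # keep exactly n elements


population = list(range(1, 46))  # elements numbered 1 to 45
sample = systematic_sample(population, 14, seed=7)
print(len(population) // 14, sample)
```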
• Stratified Random Sampling
- dividing the elements of a population into
different categories or subpopulations (strata), then
drawing a random sample from each stratum.
Example: A group of students wanted to
conduct an experiment in the university about
the use of cellular phones.
STRATA         NUMBER OF STUDENTS   PERCENTAGE   SAMPLE TO BE TAKEN
Elementary     2500
High school    9500
College        4500
               N = 16500
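The PERCENTAGE and SAMPLE TO BE TAKEN columns are commonly filled by proportional allocation: each stratum's share of N is applied to the total sample size n. The sketch below assumes n = 391, the value n = N/(1 + Ne²) gives for N = 16500 and e = 5%; both the allocation rule and that n are assumptions for illustration, not part of the original example.

```python
def proportional_allocation(strata, n):
    """Percentage of N in each stratum, and that share of n rounded to whole members."""
    N = sum(strata.values())
    percentages = {name: 100 * size / N for name, size in strata.items()}
    allocation = {name: round(n * size / N) for name, size in strata.items()}
    return percentages, allocation


strata = {"Elementary": 2500, "High school": 9500, "College": 4500}
pct, alloc = proportional_allocation(strata, 391)  # assumed total sample size
for name in strata:
    print(f"{name:12} {strata[name]:6} {pct[name]:6.2f}% {alloc[name]:4}")
```

Rounding each stratum separately happens to total 391 here, but in general the rounded counts can differ from n by one or two and need a small manual adjustment.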
• Cluster Sampling
- sampling wherein groups or clusters, instead
of individuals, are randomly chosen
- example: area sampling
• Non-Probability Sampling
- is a sampling technique wherein members
of the sample are drawn from the population
based on the judgment of the researcher.
- is also called subjective sampling
* Convenience sampling
* Quota sampling
* Purposive sampling
• Convenience Sampling
- samples are determined based on
convenience, availability, proximity, or
accessibility to the researcher.
• Quota Sampling
- samples are determined with the same
percentages as in stratified sampling, but not
randomly.
• Snowball Sampling
- members of the sample are chosen
through referrals from other members of the
sample.
• Modal Instance Sampling
- members of the sample are selected
based on the typical, most frequent
observations or modal cases.
Summation Notation
Summation notation is used to denote the sum of values. In the
expression ∑x, the symbol ∑ represents the sum of all the
values of x.
Example 1. The ages of four managers are 35, 47, 28, and 60
years. Find ∑x.
∑x = 35 + 47 + 28 + 60 = 170
Example 2. Table 1 lists four pairs of A and B values:
A = 12, 17, 22, 32
B = 7, 11, 12, 18
Compute the following sums:
a. ∑A   b. ∑B   c. ∑AB   d. ∑A²B   e. ∑(A - 5)²B
Table 1
A      B      AB      A²B       (A-5)²B
12     7      84      1008      343
17     11     187     3179      1584
22     12     264     5808      3468
32     18     576     18432     13122
∑A = 83   ∑B = 48   ∑AB = 1111   ∑A²B = 28427   ∑(A-5)²B = 18517
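The sums in Table 1 can be checked by coding each summation directly; `zip` pairs every A value with its B value, so each generator expression mirrors one column of the table.

```python
A = [12, 17, 22, 32]
B = [7, 11, 12, 18]

sum_A = sum(A)                                          # ∑A
sum_B = sum(B)                                          # ∑B
sum_AB = sum(a * b for a, b in zip(A, B))               # ∑AB
sum_A2B = sum(a ** 2 * b for a, b in zip(A, B))         # ∑A²B
sum_Am5_2B = sum((a - 5) ** 2 * b for a, b in zip(A, B))  # ∑(A - 5)²B

print(sum_A, sum_B, sum_AB, sum_A2B, sum_Am5_2B)  # 83 48 1111 28427 18517
```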
• Quiz no. 1. Complete Table 2 below:
X      Y      Y²     XY     X²Y
10     3
13     7
18     8
28     14
Chapter 1
Frequency Distribution and Graphical Presentation
After collecting the data, the researcher should know
how to classify, tabulate, compute, analyze, and
interpret the results.
A common procedure is to put the data in systematic
order by grouping them into classes in the form of a
frequency distribution. Classifying the data in this way
may help researchers understand important features
of the data. This chapter presents the arrangement of
data in the form of a frequency distribution, the tallying
of frequencies, and the graphical representation of the
frequency distribution.
Frequency Distribution
Generally, a frequency distribution is any
arrangement of the data that shows the
frequency of different values or groups of
values of a variable. It can be constructed directly
from the raw data.
• A frequency distribution is a table that
displays the frequency of various outcomes in
a sample.
• Each entry in the table contains the frequency
or count of the occurrences of values within a
particular group or interval, and in this way,
the table summarizes the distribution of
values in the sample.
A frequency distribution is applicable when the
number of cases (N) is equal to or greater than 30
(N ≥ 30). Table 1.1 below shows the research scores
of 72 teacher education students
in certain state colleges.
100 70 77 80 120 88 93 94 95 81 75 88 110 89 95
96 90 78 88 75 76 90 97 98 125 72 74 81 83 91 100
101 84 73 78 86 87 92 102 103 104 105 106
107 108 109 110 111 112 113 71 72 73 74
75 76 77 78 79 80 81 82 83 84 94 88 85 86 85 87 91 93
The steps in arranging the data in the form of a
frequency distribution:
1. Find the range.
The range is equal to the highest score (HS)
minus the lowest score (LS), or R = HS - LS. In
Table 1.1, the highest score is 125 and the
lowest score is 70. Hence, R = 125 - 70 = 55.
2. Find the class interval.
To find the class interval, divide the range by 20 and by 10 so
that the number of classes will be not less than 10 and not more
than 20. The range of the Table 1.1 data is 55; thus,
$$\frac{55}{20} = 2.75 \qquad \frac{55}{10} = 5.5$$
The class interval therefore ranges from 3 to 6 (3, 4, 5, or 6). In choosing
the class interval, it is preferable to choose an odd number. There are two
odd numbers, 3 and 5; for example, let 5 be the class interval. Using 5 as
the class interval, the Table 1.1 data give 12 classes. The ideal number of
classes is 12 to 15.
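Steps 1 and 2 can be checked with a short Python sketch. Rounding both quotients up to whole numbers is our reading of how the candidate intervals 3 to 6 are obtained; it is an assumption, not a rule stated in the text.

```python
import math

# The 72 research scores of Table 1.1.
scores = [
    100, 70, 77, 80, 120, 88, 93, 94, 95, 81, 75, 88, 110, 89, 95,
    96, 90, 78, 88, 75, 76, 90, 97, 98, 125, 72, 74, 81, 83, 91, 100,
    101, 84, 73, 78, 86, 87, 92, 102, 103, 104, 105, 106,
    107, 108, 109, 110, 111, 112, 113, 71, 72, 73, 74,
    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 94, 88, 85, 86, 85, 87, 91, 93,
]

R = max(scores) - min(scores)           # step 1: range = HS - LS
smallest, largest = R / 20, R / 10      # step 2: intervals for 20 and for 10 classes
candidates = list(range(math.ceil(smallest), math.ceil(largest) + 1))
odd_candidates = [c for c in candidates if c % 2 == 1]  # odd intervals are preferred

print(R, smallest, largest, candidates, odd_candidates)  # 55 2.75 5.5 [3, 4, 5, 6] [3, 5]
```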
3. Set up the classes.
To set up the classes, add one-half of the class interval (C/2)
to the highest score to get the upper real limit of the top class, and
subtract C/2 from the highest score to get its lower real limit. The
highest score is 125 and the class interval is 5; hence, 125 + 2.5 =
127.5 and 125 - 2.5 = 122.5. Then subtract 5 (the class interval)
from both limits, class after class, until the class containing the
lowest score is reached.
There are two ways of setting up classes, namely, real limits
and integral limits. To get the integral limits, add 0.5 to the lower
real limit and subtract 0.5 from the upper real limit. Table
1.2 presents the classes in real limits and integral limits.
Table 1.2
Real Limit       Integral Limit
122.5-127.5      123-127
117.5-122.5      118-122
112.5-117.5      113-117
107.5-112.5      108-112
102.5-107.5      103-107
97.5-102.5       98-102
92.5-97.5        93-97
87.5-92.5        88-92
82.5-87.5        83-87
77.5-82.5        78-82
72.5-77.5        73-77
67.5-72.5        68-72
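Step 3 can be sketched as a loop that starts with the class centred on the highest score and moves down one class interval at a time until the class containing the lowest score has been added. The function name `build_classes` is illustrative, and the sketch assumes an odd class interval, as recommended above, so that every real limit ends in .5.

```python
def build_classes(highest, lowest, c):
    """List (lower real, upper real, lower integral, upper integral), top class first."""
    if c % 2 == 0:
        raise ValueError("this sketch assumes an odd class interval")
    lower_real, upper_real = highest - c / 2, highest + c / 2
    classes = []
    while True:
        # Integral limits: add 0.5 to the lower real limit, subtract 0.5 from the upper.
        classes.append((lower_real, upper_real, int(lower_real + 0.5), int(upper_real - 0.5)))
        if lower_real <= lowest:  # this class already contains the lowest score
            break
        lower_real -= c
        upper_real -= c
    return classes


classes = build_classes(125, 70, 5)
for row in classes:
    print("{:.1f}-{:.1f}   {}-{}".format(*row))
```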
4. Tally the data.
To tally the data, locate each score in its proper class
and make a tally mark. After tallying, count the number of
tallies and write it under Frequency (f), column 4
of Table 1.3. The tallies must be carefully checked to
determine whether their total is equal to the total number
of cases, N. If the total is not equal to N, the tallying is
reviewed or repeated until it gives exactly N, the total
number of cases. At the bottom of column 4 (f), the
symbol N or ∑f ("sum of f") is equal to 72, the total number
of cases. Column 1 is the real limit; column 2, the integral
limit; column 3, the tally; and column 4, the frequency f.
Table 1.3
Real Limit     Integral Limit    Tally            Frequency (f)
(1)            (2)               (3)              (4)
122.5-127.5    123-127           I                 1
117.5-122.5    118-122           I                 1
112.5-117.5    113-117           I                 1
107.5-112.5    108-112           IIIII I           6
102.5-107.5    103-107           IIIII             5
97.5-102.5     98-102            IIIII             5
92.5-97.5      93-97             IIIII III         8
87.5-92.5      88-92             IIIII IIIII      10
82.5-87.5      83-87             IIIII IIIII      10
77.5-82.5      78-82             IIIII IIIII      10
72.5-77.5      73-77             IIIII IIIII I    11
67.5-72.5      68-72             IIII              4
TOTAL                                         N = ∑f = 72
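Tallying amounts to counting, for each class, the scores that fall between its integral limits. The sketch below computes the frequency column of Table 1.3 from the 72 scores of Table 1.1 and checks that the frequencies add up to N, the check the text asks for.

```python
# The 72 research scores of Table 1.1.
scores = [
    100, 70, 77, 80, 120, 88, 93, 94, 95, 81, 75, 88, 110, 89, 95,
    96, 90, 78, 88, 75, 76, 90, 97, 98, 125, 72, 74, 81, 83, 91, 100,
    101, 84, 73, 78, 86, 87, 92, 102, 103, 104, 105, 106,
    107, 108, 109, 110, 111, 112, 113, 71, 72, 73, 74,
    75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 94, 88, 85, 86, 85, 87, 91, 93,
]

# Integral limits of the twelve classes, from 123-127 down to 68-72.
classes = [(123 - 5 * i, 127 - 5 * i) for i in range(12)]
frequencies = [sum(1 for s in scores if lo <= s <= hi) for lo, hi in classes]
N = sum(frequencies)  # must equal the number of scores

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {'I' * f}  f = {f}")  # one I per score in the class
print("N =", N)
```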
Cumulative Frequency Distribution
Most often, a cumulative frequency distribution is desired
to determine the number or percentage of values "greater
than or equal to (≥)" or "less than or equal to (≤)" a
specified value. The cumulative frequencies are obtained
by cumulating, or successively adding, the individual
frequencies, starting from the bottom class for "less than"
and from the top class for "greater than."
For instance, in Table 1.4, the cumulative frequency less
than (CF≤) in column 5 starts with the frequency of the
bottom class, 68-72, which is 4; the next class, 73-77, has a
frequency of 11, so its CF≤ is 4 + 11 = 15. This process
continues until the topmost class, 123-127, is reached,
where the cumulative frequency less than is equal to the
number of cases N, or 72. For CF≥ (greater than) in
column 6, start from the topmost class, 123-127, which has
a frequency of 1; the next class, 118-122, has a frequency
of 1, so its CF≥ is 1 + 1 = 2, and so on until the bottom
class, 68-72, is reached, where CF≥ is equal to 72, the total
number of cases.
Table 1.4. Cumulative Frequency Distribution of
Less Than and Greater Than
Real Limit    Integral Limit   Tally            f     CF≤    CF≥
(1)           (2)              (3)              (4)   (5)    (6)
122.5-127.5   123-127          I                 1     72      1
117.5-122.5   118-122          I                 1     71      2
112.5-117.5   113-117          I                 1     70      3
107.5-112.5   108-112          IIIII I           6     69      9
102.5-107.5   103-107          IIIII             5     63     14
97.5-102.5    98-102           IIIII             5     58     19
92.5-97.5     93-97            IIIII III         8     53     27
87.5-92.5     88-92            IIIII IIIII      10     45     37
82.5-87.5     83-87            IIIII IIIII      10     35     47
77.5-82.5     78-82            IIIII IIIII      10     25     57
72.5-77.5     73-77            IIIII IIIII I    11     15     68
67.5-72.5     68-72            IIII              4      4     72
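The two running totals described above map onto `itertools.accumulate`: adding from the top class gives CF≥, and adding from the bottom class (reverse the list, accumulate, then reverse back) gives CF≤. The frequencies are those obtained by tallying the 72 scores, listed from 123-127 down to 68-72.

```python
from itertools import accumulate

# Class frequencies, top class (123-127) first, bottom class (68-72) last.
f = [1, 1, 1, 6, 5, 5, 8, 10, 10, 10, 11, 4]

cf_ge = list(accumulate(f))                  # "greater than": add from the top down
cf_le = list(accumulate(reversed(f)))[::-1]  # "less than": add from the bottom up

for cf_lt, cf_gt in zip(cf_le, cf_ge):
    print(cf_lt, cf_gt)
```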
Cumulative Percentage Frequency
The cumulative percentage frequency is obtained by
dividing the cumulative frequency by the total
number of cases N and multiplying by 100. The formula is as
follows:
$$CPF = \frac{CF}{N} \times 100$$
where: CPF = cumulative percentage frequency
CF = cumulative frequency
N = total number of cases
Given: highest class 123-127, CF≤ = 72, N = 72
$$CPF_{\le} = \frac{CF_{\le}}{N} \times 100 = \frac{72}{72} \times 100 = 100\%$$
The cumulative percentage frequency less than (CPF≤) of the
second highest class, 118-122, which has a CF≤ of 71, is
71/72 = 0.98611 × 100 = 98.61 percent.
This means that 98.61 percent of the students
have scores of 122 and below. The third
highest class, 113-117, has a CF≤ of 70;
hence, its CPF≤ is 70/72 = 0.97222 × 100 = 97.22
percent. This means that 97.22 percent of the
students have scores of 117 and below. This
process continues until the lowest class,
68-72, is reached.
• For the cumulative percentage frequency greater
than (CPF≥), the highest class, 123-127, has a
CF≥ of 1; dividing by 72 gives
1/72 = 0.0138889 × 100 = 1.39 percent. The second
highest class, 118-122, has a cumulative
frequency greater than (CF≥) of 2; hence, its
cumulative percentage frequency greater than
is 2/72 = 0.0277778 × 100 = 2.78 percent.
This means that 2.78 percent of the students
have scores of 118 and above.
This process continues until the lowest class
is reached.
Table 6.5 presents the distribution of cumulative
percentage frequency "less than" and
"greater than".
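Integral Limit (CL)   Absolute (f)   CF≥    CF≤    CPF≥      CPF≤
123-127                1              1     72      1.39    100.00
118-122                1              2     71      2.78     98.61
113-117                1              3     70      4.17     97.22
108-112                6              9     69     12.50     95.83
103-107                5             14     63     19.44     87.50
98-102                 5             19     58     26.39     80.56
93-97                  8             27     53     37.50     73.61
88-92                 10             37     45     51.39     62.50
83-87                 10             47     35     65.28     48.61
78-82                 10             57     25     79.17     34.72
73-77                 11             68     15     94.44     20.83
68-72                  4             72      4    100.00      5.56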
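Applying CPF = (CF / N) × 100 to every class at once, rounded to two decimal places, gives both percentage columns. The CF values below are the ones described in the text (CF≤ counted from the bottom, CF≥ from the top), listed from 123-127 down to 68-72.

```python
N = 72
cf_le = [72, 71, 70, 69, 63, 58, 53, 45, 35, 25, 15, 4]
cf_ge = [1, 2, 3, 9, 14, 19, 27, 37, 47, 57, 68, 72]

cpf_le = [round(100 * cf / N, 2) for cf in cf_le]  # percent with scores at or below the class
cpf_ge = [round(100 * cf / N, 2) for cf in cf_ge]  # percent with scores at or above the class

for lt, gt in zip(cpf_le, cpf_ge):
    print(f"{gt:6.2f}  {lt:6.2f}")
```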
Seatwork no. 1
Compute the range, frequency, cumulative
frequency greater than and less than, and the
cumulative percentage frequency greater than and less
than of the following scores of 30 students in
Statistics.

75 50 76 42 75 88
76 58 86 48 79 105
85 98 79 52 89 100
86 110 83 55 91 68
90 67 93 65 93 78
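• R = HS - LS
= 110 - 42
= 68
68/10 = 6.8
68/20 = 3.4
Possible class intervals (from 3.4 to 6.8): 3---4---5---6---7; choose 7.
C/2 = 7/2 = 3.5
110 + 3.5 = 113.5 (upper real limit of the top class)
110 - 3.5 = 106.5 (lower real limit of the top class)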
REAL LIMIT    INTEGRAL LIMIT   TALLY      FREQUENCY   CF≤    CF≥    CPF≤   CPF≥
106.5-113.5   107-113          I           1          30      1
99.5-106.5    100-106          II          2          29      3
92.5-99.5     93-99            III         3          27      6
85.5-92.5     86-92            IIIII I     6          24     12
78.5-85.5     79-85            IIII        4          18     16
71.5-78.5     72-78            IIIII       5          14     21
64.5-71.5     65-71            III         3           9     24
57.5-64.5     58-64            I           1           6     25
50.5-57.5     51-57            II          2           5     27
43.5-50.5     44-50            II          2           3     29
36.5-43.5     37-43            I           1           1     30
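The seatwork solution can be checked end to end. This sketch works with integral limits directly: centring the top class on the highest score means its upper integral limit is 110 + 3 = 113 for a class interval of 7 (an odd interval is assumed). The CPF columns it prints are the values the seatwork asks students to compute.

```python
from itertools import accumulate

scores = [75, 50, 76, 42, 75, 88, 76, 58, 86, 48, 79, 105, 85, 98, 79,
          52, 89, 100, 86, 110, 83, 55, 91, 68, 90, 67, 93, 65, 93, 78]
c = 7                       # class interval chosen in the solution
N = len(scores)             # 30 students
hs, ls = max(scores), min(scores)

# Integral limits, top class first; stop once the class holding the lowest score is added.
classes = []
upper = hs + c // 2         # 110 + 3 = 113
while True:
    lower = upper - c + 1   # 107 for the top class
    classes.append((lower, upper))
    if lower <= ls:
        break
    upper -= c

f = [sum(1 for s in scores if lo <= s <= hi) for lo, hi in classes]
cf_ge = list(accumulate(f))                  # added from the top
cf_le = list(accumulate(reversed(f)))[::-1]  # added from the bottom
cpf_le = [round(100 * x / N, 2) for x in cf_le]
cpf_ge = [round(100 * x / N, 2) for x in cf_ge]

for (lo, hi), fi, a, b, p, q in zip(classes, f, cf_le, cf_ge, cpf_le, cpf_ge):
    print(f"{lo}-{hi}", fi, a, b, p, q)
```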
• Quiz no. 2
• Given below are the scores of 44 students in
Science.
2 17 8 7 18 11 2 18 13
3 22 11 23 16 30 10 8 5
19 22 19 12 26 9 13 17 8
21 6 4 8 8 18 22 12 18
2 21 12 8 15 8 13 8
Real Limit   Integral Limit   Midpoint   Tally   Frequency   CF≤   CF≥   CPF≤   CPF≥
1. Construct a frequency distribution with the
following columns:
a. real limit
b. integral limit
c. midpoint
d. tally
e. frequency (f)
f. cumulative frequency less than (CF≤)
g. cumulative frequency greater than (CF≥)
h. cumulative percentage frequency (CPF≤
and CPF≥)
Graphic Representation of Frequency
Distribution
A graph is a geometric image or a mathematical
picture of a set of data. For this purpose, line
graphs and bar graphs are commonly used in
theses, dissertations, and research papers to
present the data.
Frequency distributions are often presented
graphically to give researchers and readers an
understandable picture of the essential features
of the form of the distribution and to compare one
frequency distribution with another. The graphical
representation is based on tables.
Tabular form is a systematic arrangement of
research data in rows and columns. Each
category in the table is placed in a row or column,
and the data are assigned to suitable cells.
A row in a table is a horizontal line of cells, while
a column is a vertical line of cells.
A good research data table consists of four parts:
1. Table caption
2. Stub
3. Box heads
4. Body
1. Table caption – includes the table number and
heading. The caption briefly explains the contents of
the table. The table caption is written above the body.
2. Stub – refers to the row labels found at the left side
of the table, for example, the class limits 123-127
and so on.
3. Box heads – these are the column headings within the
box of the table under which the data are placed. In a
frequency table, the box heads may be midpoint, frequency,
CF, and CPF.
4. Body of the table – refers to the main part of the
table, consisting of figures placed in columns aligned
with the box heads (the data under each box head).
• Table 1.1. (table caption)
Frequency Distribution with Midpoints and
Cumulative Frequencies Less Than and Greater Than
of Research Scores Taken by 72 Teacher Education
Students in a Certain State University
[Figure: sample table layout. The stubs (row labels) run down the left side, the box heads (column headings) run across the top, and the body cells are arranged in rows and columns.]
4 Kinds of Graphs
1. Line graph
2. Bar graph
3. Pictograph
4. Circle (pie) graph
* Line Graph – this is made by plotting the data
as dots and connecting the plotted points with
straight lines. The points are plotted on a grid
formed by the X-axis and the Y-axis, which
intersect each other.
• The graph has 4 essential parts:
1. Caption – placed below the graph; it gives the
figure or graph number and the title.
2. Stub or Y-axis
3. Reasonable proportion of data or X-axis
4. Body
(For example: f on the Y-axis against the midpoints on the X-axis;
CF against the upper limits; CPF against the upper limits.)
* Bar Graph – another way of representing
data in graphical form. It represents research
data as vertical rectangles or bars. The bars are
drawn with bases of equal width.
* Pictograph – a kind of graph that uses
pictures or symbols to present information.
* Circle Graph or Pie Graph – another way of
presenting research data in circular form. The
circle is divided into parts, and the data are shown
in percent (%) or as actual figures.
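To draw a circle graph by hand, each category's percentage is converted into a central angle (percent × 3.6°, since the whole circle is 360°). The sketch below uses the student strata from the stratified sampling example purely as illustrative data; `pie_sectors` is a hypothetical helper name.

```python
def pie_sectors(parts):
    """Return {category: (percent, central angle in degrees)} for a circle graph."""
    total = sum(parts.values())
    return {name: (100 * v / total, 360 * v / total) for name, v in parts.items()}


strata = {"Elementary": 2500, "High school": 9500, "College": 4500}
sectors = pie_sectors(strata)
for name, (pct, deg) in sectors.items():
    print(f"{name:<12} {pct:6.2f}%  {deg:7.2f} degrees")
```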
• Examples
1. Draw a line graph and a simple bar graph to
represent the profits of a bank for 5 years.
Years                  1989   1990   1991   1992   1993
Profit (thousands)     10     12     18     25     42
2. Draw a multiple bar chart to represent the
imports and exports of Canada (value in $) for
the years 1991 to 1995.
YEARS    IMPORTS    EXPORTS
1991     7930       4260
1992     8850       5225
1993     9780       6150
1994     11720      7340
1995     12150      8145
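In practice these graphs are usually drawn with a spreadsheet or a plotting library such as matplotlib. The dependency-free sketch below only illustrates the idea behind the simple bar graph in Example 1: bar length is proportional to the value. The scale of one mark per 2 thousand and the helper name `text_bar_graph` are assumptions chosen for illustration.

```python
def text_bar_graph(labels, values, unit=1, mark="#"):
    """One line per category; each mark stands for `unit` of the value (rounded)."""
    width = max(len(str(label)) for label in labels)
    return [f"{str(label):>{width}} | {mark * round(v / unit)} {v}"
            for label, v in zip(labels, values)]


years = [1989, 1990, 1991, 1992, 1993]
profit = [10, 12, 18, 25, 42]  # profit in thousands
lines = text_bar_graph(years, profit, unit=2)
print("\n".join(lines))
```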


QUIZ # 3
A. Make a table from the data given below.
B. Prepare a pictograph, bar graph, line graph, and circle
graph of the following milkfish-bone burger sales from
August to December 2014 of Mr. X's research project.
Each P stands for 10,000.00.
August - 20,000.00
September - 24,000.00
October - 36,000.00
November - 30,000.00
December - 50,000.00
C. Using the example of the monthly income of 150
households, illustrate the distribution of the monthly
income in a line graph and in a bar graph or histogram.
