
Basic Statistical Concepts

Definition of Terms
Statistics refers to the science that deals with the
collection, tabulation or presentation, analysis, and
interpretation of numerical or quantitative data.
Collection of data refers to the process of obtaining
numerical measurements. Tabulation or presentation of
data refers to the organization of data into tables,
graphs, or charts, so that logical and statistical
conclusions can be derived from the collected
measurements. Analysis of data pertains to the process
of extracting from the given data relevant information from
which numerical descriptions can be formulated.
Interpretation of data refers to the task of drawing
conclusions from the analyzed data. It also normally
involves the formulation of forecasts or predictions about
larger groups based on the data collected from a small
group.

Population and Sample


Population, or the universe, refers to the collection of
all elements or traits under study or under consideration. A small part of
this big group is called a sample.
Example of population and sample: the graduate students
of NEUST are an example of a population, while the students in
the MPA program are a sample.
Using the language of mathematics, the universal set
is the population while a subset is a sample.
Hence, if the universal set is the set of counting numbers,
the set of even numbers is a subset, and so is the set of odd
numbers.
A population can be finite or infinite. The population of
a certain school in a particular term is finite, while the
population consisting of all possible outcomes (heads, tails)
in successive tosses of a coin is infinite.

Parameter and Statistics


A parameter refers to a numerical characteristic of the
population, like the population mean, population standard
deviation, population variance, and many more. It is
usually unknown and is estimated only by a corresponding
statistic computed from the sample data. Thus, the
population mean is estimated by the sample mean, the
population standard deviation by the sample
standard deviation, the population variance by the sample
variance, etc. The mean weight of a sample of 100
sophomore students selected from the entire population
of sophomore students in a certain high school is a
statistic. The mean weight of all sophomore students comprising the
population is a parameter, which is estimated by the
sample mean weight of the sophomore students.
Generally, the characteristics of a population are called
parameters, while the characteristics of a sample are
called statistics.
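The difference between a parameter and a statistic can be illustrated with a short Python sketch. The weights below are made-up values for a hypothetical population of ten students, not data from the text, and the variable names are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical weights (kg) of an entire population of ten students.
population_weights = [45.3, 50.5, 52.0, 48.7, 61.2, 55.4, 47.9, 58.1, 49.6, 53.3]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population_weights)

# Statistic: computed from a random sample of size n = 4.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_weights = rng.sample(population_weights, 4)
sample_mean = statistics.mean(sample_weights)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The sample mean changes from sample to sample, while the population mean stays fixed; this is why the statistic is only an estimate of the parameter.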

Below are the different symbols for parameters and statistics in most statistical writing:

Characteristic              Parameter     Statistic
Mean                        μ (mu)        x̄ (x-bar)
Standard Deviation          σ (sigma)     s
Variance                    σ²            s²
Proportion                  P             p
Pearson Correlation Coef.   R             r
Number of Cases             N             n

Variables
A variable is one of the basic concepts in
statistics. It refers to an observable
characteristic or phenomenon of a person
or object whereby the members of the
group or set vary or differ from one
another. A variable is denoted by a symbol such as X,
Y, Z, a, b, c, etc., which can assume any
of a prescribed set of values, called the
domain of the variable. If the variable can
assume only one value, it is called a
constant. (e.g. - )

Discrete and Continuous Variables
A variable which can theoretically assume
any value between two given values is called a
continuous variable; otherwise it is called a
discrete variable.
Example: the number of houses in a
community is a discrete variable - it can take
any of the values 0, 1, 2, 3, etc. but cannot be 1.5,
3.34, 4.624, etc.
The weight of an individual, which can be 45.3
kg, 50.50 kg, 70.345 kg, etc., depending on the
accuracy of measurement, is a continuous variable.
In general, measurement gives rise to
continuous data while enumeration or counting
gives rise to discrete data.

Dependent and Independent Variables
Variables can be grouped into dependent and
independent variables with respect to their use.
An independent variable is used as a predictor if the
objective is to predict the value of one variable on the
basis of the other. In contrast, the dependent variable
is the variable whose value is predicted. To
illustrate, if we want to predict the students'
academic achievement in mathematics, we may
analyze different factors such as gender, study
habits, intelligence quotient, interest, attitudes, socioeconomic status, and many more. Hence, the
independent variables are gender, study habits,
intelligence quotient, interest, attitudes, and socioeconomic status. On the other hand, the dependent
variable is the students' academic achievement in
mathematics.

Uses of Statistics
According to Ary and Jacobs (1976), statistics is a
body of scientific methods for analyzing quantitative
data. Statistics serves two functions: (1) it aids the
scientist in organizing, summarizing, interpreting, and
communicating quantitative information obtained from
observations, and (2) it allows the scientist to extrapolate
from the data to reach tentative conclusions about the larger
group from which the smaller group was derived. The
statistical procedures dealing with the first function are
generally called descriptive statistics (gathering,
classification, and presentation of data, and computation of
summarizing values), while the procedures dealing with
the second function are called inferential statistics
(critical judgement and mathematical methods).

Types of Data
The choice of statistical tools depends on the types of data that are collected.
The different types are as follows:
Primary and Secondary Data
Primary data refer to information which is gathered directly
from the original source or which is based on direct or first-hand
experience (e.g. - autobiographies, diaries, etc.). Secondary data
refer to information which is taken from published or unpublished
data previously gathered by other individuals or agencies
(e.g. - books, magazines, newspapers, etc.).
Qualitative and Quantitative Data
Qualitative data are categorical data, which take the form of
categories or attributes (e.g. - sex, year level, religion, etc.). On the
other hand, quantitative data or numerical data are obtained from
measurements (e.g. - height, weight, age, scores, etc.).

Measurement Scales
Qualitative data can be converted to quantitative data through the
process called measurement. In measurement, numbers are used to
code objects so that they can be treated statistically. There are four types
of measurement. They are as follows:
Nominal Measurements. Nominal measurements are used only for
identification or classification purposes. Example: student numbers, names of
books, vehicle plate numbers, etc.
Ordinal Measurements. Ordinal measurements do not only classify items.
They also give the order of classes, items, or objects. Example: first runner-up,
second runner-up, third runner-up, etc.
Interval Measurements. In interval measurements, the numbers assigned to
the items or objects measure the degree of difference between any two
classes, but the zero point is arbitrary. Example: temperature in degrees Celsius, IQ, test scores, etc.
Ratio Measurements. For ratio measurements, the ratio of the numbers
assigned in the measurement shows the ratio in the amount of the property being
measured. Multiplication and division are meaningful in ratio measurements.
Example: weight, height, and age. If Boris is 40 years old and Morgana is 20 years old, then their ages
may be expressed in the ratio 2:1 (two is to one).
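A short Python sketch may help show why ratios are meaningful for ratio measurements but not for interval measurements. It reuses the ages of 40 and 20 years from the example above; the temperatures of 40, 30, and 20 degrees Celsius and the function names are assumptions made for illustration.

```python
def years_to_months(years):
    """Change of unit for a ratio scale (true zero is preserved)."""
    return years * 12


def celsius_to_fahrenheit(celsius):
    """Change of unit for an interval scale (the zero point shifts)."""
    return celsius * 9 / 5 + 32


# Ratio scale: the 2:1 ratio of the ages survives a change of unit.
age_ratio_years = 40 / 20
age_ratio_months = years_to_months(40) / years_to_months(20)

# Interval scale: the ratio of two temperatures depends on the unit used.
temp_ratio_celsius = 40 / 20
temp_ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)

# What an interval scale does preserve is the ratio of differences.
diff_ratio_celsius = (40 - 20) / (30 - 20)
diff_ratio_fahrenheit = (
    (celsius_to_fahrenheit(40) - celsius_to_fahrenheit(20))
    / (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20))
)

print(age_ratio_years, age_ratio_months)          # same ratio in both units
print(temp_ratio_celsius, temp_ratio_fahrenheit)  # ratio changes with the unit
print(diff_ratio_celsius, diff_ratio_fahrenheit)  # ratio of differences is kept
```

Because 40 degrees Celsius is not "twice as hot" once the same temperatures are written in Fahrenheit, only statements about differences are safe on an interval scale.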

Sampling Techniques
It is not necessary for the researcher to examine every member of the
population to get data or information about the population. Cost and time
constraints usually prohibit one from undertaking a study of the entire population.
Sampling techniques are used so that valid conclusions or
inferences about the population can be drawn from the sample.
Random Sampling. What is random sampling? Random sampling is a
method of selecting a sample of size n from a population or universe such that each
member of the population has an equal chance of being selected in the sample
and all possible combinations of size n have an equal chance of being selected as
the sample.
Stratified Random Sampling. In this method the population is first divided into
groups (strata) based on homogeneity, and a random sample is then drawn from
each stratum, in order to avoid the possibility of drawing
samples whose members come only from one stratum.
Cluster Sampling. It is an advantageous procedure when the population is
spread out over a wide geographical area. It is also a practical
sampling technique when a complete list of the members of the population is
not available. A cluster refers to an intact group whose members have common
characteristics.
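The three sampling techniques above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the population of 12 students, the `year` and `section` fields, and the function names are all hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 12 students, each tagged with a year level
# (used as the stratum) and a section (used as the cluster).
population = [
    {"id": i, "year": "first" if i < 6 else "second", "section": "ABC"[i % 3]}
    for i in range(12)
]


def simple_random_sample(members, n):
    """Every member has an equal chance of being selected."""
    return rng.sample(members, n)


def stratified_random_sample(members, key, n_per_stratum):
    """Divide the population into strata, then sample randomly in each one."""
    strata = {}
    for member in members:
        strata.setdefault(member[key], []).append(member)
    chosen = []
    for group in strata.values():
        chosen.extend(rng.sample(group, n_per_stratum))
    return chosen


def cluster_sample(members, key, n_clusters):
    """Select whole clusters at random and keep every member of them."""
    clusters = sorted({member[key] for member in members})
    picked = set(rng.sample(clusters, n_clusters))
    return [member for member in members if member[key] in picked]


srs = simple_random_sample(population, 4)
stratified = stratified_random_sample(population, "year", 2)
clustered = cluster_sample(population, "section", 1)

print([m["id"] for m in srs])
print([(m["id"], m["year"]) for m in stratified])
print([(m["id"], m["section"]) for m in clustered])
```

Note that the stratified sample always contains members from both year levels, while the cluster sample contains all members of one intact section.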

Methods Used in the Collection of Data

1. Direct or Interview Method


This is a method of person-to-person exchange between
the interviewer and the interviewee.
The following is an advantage of the direct or interview
method:
1. It can give the complete information needed in the study.
The following are its disadvantages:
1. It can yield inaccurate information, since the interviewer can
influence the respondent's answers through facial
expressions, tone of voice, or wording of the questions.
2. The interviewer may cheat by turning in dishonest
responses if the expected or desired responses are not
obtained.

2. Indirect or Questionnaire Method


The questionnaire method is one of the easiest
methods of data gathering. In this method, written
responses are given to prepared questions. A
questionnaire is a list of questions which are
intended to elicit answers to the problems of a study.
It should be attractive and include illustrations,
pictures, and sketches. Its contents, especially the
directions, must be precise, clear, and self-explanatory.
3. Registration Method
This method of gathering information is enforced by
certain laws. Examples are the registration of births,
deaths, motor vehicles, marriages, and licenses.
The advantage of this method is that information is
kept systematized and made available to all
because of the requirement of the law.

4. Observation Method
The observation method is used to gather
data regarding the attitudes, behavior, values,
and cultural patterns of the sample under
study. It is usually used when the subjects
cannot talk or write.
5. Experiment Method
An experiment is applied to collect data if
the investigator wants to control the factors
affecting the variable being studied.

Methods of Presenting Data
Collected data are of little use if they are
not presented effectively for analysis and
interpretation. Data are presented in
four general methods: [1] textual
method, [2] tabular method, [3] semi-tabular method, and [4] graphical
method or presentation.

Frequency Distribution
When the researcher gathers all
the needed data, the next task is to
organize and present them with the
use of appropriate tables and graphs.
Frequency distribution is one system
used to facilitate the description of
important features of the data.

Class Interval or Class Limits - refers to the grouping defined by
a lower limit and an upper limit.
Class Boundaries - if heights are recorded to the nearest inch,
the class interval 60-62 theoretically includes all measurements
from 59.5000 to 62.5000 in. These numbers, indicated briefly by
the exact numbers 59.5 and 62.5, are the class boundaries, or the
true class limits; the smaller number [59.5] is the lower class
boundary, and the larger number [62.5] is the upper class
boundary.
Class Mark - is the midpoint or middle of a class interval.
It is obtained by finding the average of the lower class
limit and the upper class limit. Example: the class mark of the class interval
5-9 is [5 + 9]/2 or 7.
Class Size - refers to the difference between the upper class
boundary and the lower class boundary of a class interval.
Class Frequency - means the number of observations belonging
to a class interval.
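The terms above can be tied together with a small Python sketch that builds a frequency distribution. The list of heights and the class limits other than 60-62 are made-up values, and `describe_class` is a hypothetical helper; the `unit` parameter assumes measurements recorded to the nearest whole inch, as in the class boundary example.

```python
# Hypothetical heights recorded to the nearest inch.
heights = [60, 61, 61, 62, 63, 64, 64, 64, 65, 66, 67, 67, 68, 70, 71]

# Class limits as (lower, upper); 60-62 matches the example in the text.
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71)]


def describe_class(lower, upper, data, unit=1):
    """Return the boundaries, class mark, class size, and frequency of a class."""
    lower_boundary = lower - unit / 2
    upper_boundary = upper + unit / 2
    return {
        "limits": (lower, upper),
        "boundaries": (lower_boundary, upper_boundary),
        "class_mark": (lower + upper) / 2,
        "class_size": upper_boundary - lower_boundary,
        "frequency": sum(lower <= x <= upper for x in data),
    }


table = [describe_class(lower, upper, heights) for lower, upper in class_limits]
for row in table:
    print(row)
```

For the class 60-62 this gives the boundaries 59.5 and 62.5, a class mark of 61, and a class size of 3, in line with the definitions above.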

Graphical Presentation of Data
Histogram - is made up of vertical bars that are joined together, making
it an appropriate graph for continuous data. The base of each bar or
rectangle spans the class boundaries, and its height
corresponds to the class frequency.
Frequency Polygon - is commonly called a line graph. It is a very useful
device for showing changes in values over successive periods of time.
An advantage of the frequency polygon is that it can be used to
compare two or more distributions graphically on one pair of axes.
Bar Graph - is used to represent discrete data, where the bars are
separated. The length of each bar corresponds to the value it represents.
The width of the bars is arbitrary; however, the bars
must be of the same width. Thus, the bar graph is similar to the
histogram; the only difference is that the bars of the histogram are
joined.
Pie Diagram or Pie Chart - is used to show percentage distribution. It is
made up of a circle subdivided into sectors proportional in size to the
quantities or percentages they represent.
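As a small worked example of how the sectors of a pie chart are sized, the Python sketch below converts counts into percentages and central angles. The three categories and their counts are hypothetical values chosen for illustration.

```python
# Hypothetical counts of students per program (not from the text).
counts = {"Program A": 30, "Program B": 45, "Program C": 25}

total = sum(counts.values())

# Each sector is proportional to its share of the total:
# percentage = 100 * count / total, central angle = 360 * count / total.
sectors = {
    label: (100 * count / total, 360 * count / total)
    for label, count in counts.items()
}

for label, (percentage, angle) in sectors.items():
    print(f"{label}: {percentage:.1f}% of the circle, {angle:.1f} degrees")
```

The percentages add up to 100 and the central angles add up to 360 degrees, so the sectors fill the circle exactly.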

Types of Frequency Curves


1. Symmetrical or bell-shaped frequency curves
are characterized by the fact that observations equidistant from the
central maximum have the same frequency. An important example is
the normal curve.
2. In J-shaped and reversed J-shaped frequency curves, a maximum
occurs at one end.
3. In the moderately asymmetrical or skewed frequency curves, the tail
of the curve to one side of the central maximum is longer than that to
the other. If the longer tail occurs to the right, the curve is said to be
skewed to the right or have positive skewness, while if the reverse is
true, the curve is said to be skewed to the left or have negative
skewness.
4. A U-shaped frequency curve has maxima at both ends.
5. A bimodal frequency curve has two maxima.
6. A multi-modal frequency curve has more than two maxima.
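The direction of skewness described in item 3 can be checked numerically. The Python sketch below uses the moment coefficient of skewness, which is one common measure and is an assumption here, since the text does not name a formula; the three data sets are made up. A value near zero is consistent with a symmetrical curve but does not by itself prove symmetry.

```python
def skewness(data):
    """Moment coefficient of skewness: third central moment over m2 ** 1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


def describe_shape(data, tolerance=1e-9):
    """Label a data set by the sign of its skewness coefficient."""
    g = skewness(data)
    if g > tolerance:
        return "skewed to the right (positive skewness)"
    if g < -tolerance:
        return "skewed to the left (negative skewness)"
    return "approximately symmetrical"


# Hypothetical data sets chosen to show each case.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 6]  # longer tail toward large values
left_tailed = [6, 6, 6, 5, 5, 4, 1]   # mirror image of right_tailed

for name, data in [
    ("symmetric", symmetric),
    ("right_tailed", right_tailed),
    ("left_tailed", left_tailed),
]:
    print(name, "->", describe_shape(data))
```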

Illustration: [Figure: sketches of the frequency curve types: symmetrical or bell-shaped, skewed to the right (positive skewness), skewed to the left (negative skewness), J-shaped, reversed J-shaped, U-shaped, bimodal, and multi-modal.]