
INTRODUCTION TO STATISTICS
OBJECTIVE OF THE SESSION
• At the end of the session, participants will be able to
appreciate the basic statistical concepts
and their applicability
• Descriptive Statistics
• Inferential Statistics
• Statistical Sampling
WHAT IS STATISTICS?
• Statistics is defined as the
collection, compilation, analysis,
and interpretation of
numerical data
• Statistics is the science of data
TYPES OF STATISTICS
Descriptive Statistics
• Used to describe basic features of the data
• Just describes and summarizes data
Types of Descriptive Statistics
• Measures of Central Tendency
• Measures of Dispersion
Measures of Central Tendency
• Also known as Measures of Location
• Gives an overview of the entire data set
• Method to describe what is typical for a group
of data
Measures of Central Tendency
• MEAN - the average (sum of the values divided by the number of values)
• MEDIAN - the middle value when the data are sorted
• MODE - the value that occurs most often
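As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical values chosen only for illustration; they are not data from this session.

```python
import statistics

# Hypothetical exam scores (illustrative only).
scores = [70, 75, 75, 80, 90]

mean_score = statistics.mean(scores)      # sum of the values / number of values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # value that occurs most often

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```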
Measures of Dispersion
• Shows how the data is dispersed (spread out)
• Useful when we want to find how the values in a set of data relate to one another
Measures of Dispersion
• Range – the difference between the largest and smallest
value in a data set
Measures of Dispersion
Standard Deviation - a statistic that measures the dispersion of a
dataset relative to its mean

It is useful in comparing sets of data which may have the same mean but a different
range. For example, the mean of the following two sets is the same (15): 15, 15, 15, 14, 16
and 2, 7, 14, 22, 30. However, the second is clearly more spread out. If a set has a
low standard deviation, its values are close to the mean and not spread out much.
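The comparison above can be checked with a short sketch. It uses the two data sets from the slide and, as an assumption, the population standard deviation (`statistics.pstdev`, which divides by n); the sample version (`statistics.stdev`) divides by n - 1 and gives slightly larger values.

```python
import statistics

# The two data sets quoted above.
set_a = [15, 15, 15, 14, 16]
set_b = [2, 7, 14, 22, 30]


def value_range(data):
    """Range: difference between the largest and smallest value."""
    return max(data) - min(data)


for name, data in (("A", set_a), ("B", set_b)):
    print(
        name,
        "mean:", statistics.mean(data),
        "range:", value_range(data),
        "std dev:", round(statistics.pstdev(data), 2),
    )
```

Both sets have the same mean, but the range and the standard deviation of the second set are much larger.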
Inferential Statistics
helps to suggest explanations for a situation or phenomenon. It allows
you to draw conclusions based on extrapolations, and is in that way
fundamentally different from descriptive statistics that merely
summarize the data that has actually been measured. 
Types of Inferential Statistics
• Confidence Interval.
• Contingency Tables and Chi Square Statistic.
• T-test or ANOVA.
• Pearson Correlation.
• Bi-variate Regression.
• Multi-variate Regression.
Confidence Interval
gives a range of values, computed from a sample, that is likely to
contain the true population parameter (for example, the mean).
Confidence intervals express the degree of uncertainty in an estimate
obtained by sampling. They are most often constructed using
confidence levels of 95% or 99%.
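A minimal sketch of a confidence interval for a mean follows. It assumes a reasonably large sample, so that the normal approximation is acceptable; for small samples a t distribution is normally used instead. The function name and the simulated heights are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: 50 simulated heights in centimetres.
rng = random.Random(42)
heights = [rng.gauss(170, 10) for _ in range(50)]

low_95, high_95 = mean_confidence_interval(heights, 0.95)
low_99, high_99 = mean_confidence_interval(heights, 0.99)
print("95% interval:", round(low_95, 1), "to", round(high_95, 1))
print("99% interval:", round(low_99, 1), "to", round(high_99, 1))
```

A higher confidence level produces a wider interval for the same sample.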
Chi-square test
A chi-square (χ²) statistic is a test that measures how a model compares
to actual observed data. The data used in calculating a chi-square
statistic must be random, raw, mutually exclusive, drawn from
independent variables, and drawn from a large enough sample. For
example, the results of tossing a fair coin meet these criteria.
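As an illustration of the coin example, the sketch below computes the chi-square statistic for a hypothetical experiment of 100 tosses. The observed counts are invented; 3.841 is the standard critical value for 1 degree of freedom at the 5% significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical result of 100 coin tosses: [heads, tails].
observed = [56, 44]
expected = [50, 50]  # what a fair coin would give on average

chi2 = chi_square_statistic(observed, expected)

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_VALUE = 3.841
reject_fair_coin = chi2 > CRITICAL_VALUE

print("chi-square:", round(chi2, 2))
print("reject the fair-coin model:", reject_fair_coin)
```

Here the statistic is below the critical value, so these counts are consistent with a fair coin.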
T-test or ANOVA
A t-test is a method that determines whether the means of two populations
are statistically different from each other, whereas ANOVA determines
whether the means of three or more populations are statistically different
from each other. Both of them look at the difference in means and the
spread of the distributions (i.e., variance) across groups; however, the
ways that they determine the statistical significance are different.
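The idea of comparing the difference in means with the spread can be sketched as below. This computes only the test statistic of one common variant, Welch's t-test, which is an assumption since the slide does not name a variant; deciding significance also requires the t distribution, which is not shown. The group values are hypothetical.

```python
import math
import statistics


def welch_t_statistic(sample_1, sample_2):
    """Difference in means divided by its estimated standard error."""
    mean_difference = statistics.mean(sample_1) - statistics.mean(sample_2)
    standard_error = math.sqrt(
        statistics.variance(sample_1) / len(sample_1)
        + statistics.variance(sample_2) / len(sample_2)
    )
    return mean_difference / standard_error


# Hypothetical measurements from two groups.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 3.0, 4.0, 5.0, 6.0]

t_value = welch_t_statistic(group_1, group_2)
print("t statistic:", t_value)
```

A t statistic close to zero means the difference in means is small relative to the spread within the groups.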
Pearson correlation
is the test statistic that measures the strength and direction of the
linear relationship, or association, between two continuous variables.
Its value ranges from -1 to +1. It is widely used for measuring the
association between variables of interest because it is based on the
method of covariance.
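A minimal sketch of the calculation: the coefficient is the covariance of the two variables divided by the product of their standard deviations. The function name and the data (study hours and test scores) are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    # The common 1/n factors cancel, so they are omitted.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical data: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

r = pearson_r(hours, scores)
print("Pearson r:", round(r, 3))
```

A value near +1 indicates a strong positive linear relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.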
Bi-variate Regression
is a simple linear regression model which is used to predict one variable
(referred to as the outcome, criterion, or dependent variable) from one
other variable (referred to as the predictor or independent variable)
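As a sketch, a simple linear regression line y = intercept + slope * x can be fitted by least squares as follows. The function name and the data are hypothetical; the data were chosen to lie exactly on the line y = 1 + 2x.

```python
def fit_simple_regression(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 1 + 2x.
predictor = [1, 2, 3, 4, 5]
outcome = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_regression(predictor, outcome)
predicted_at_6 = intercept + slope * 6  # prediction for a new value x = 6

print("slope:", slope, "intercept:", intercept)
print("prediction at x = 6:", predicted_at_6)
```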
Multi-variate Regression
is an extension of simple linear regression with one dependent variable
and multiple independent variables (strictly speaking, this is called
multiple regression). Using the values of the independent variables, we
try to predict the output.
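A rough sketch of fitting a regression with two independent variables is shown below. It builds the least-squares normal equations and solves them by Gaussian elimination using only the standard library; in practice a numerical library would normally be used. All function names and data are hypothetical, and the data were generated from y = 1 + 2*x1 + 3*x2 so the result is easy to check.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution


def fit_multiple_regression(rows, ys):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
outcomes = [9, 8, 19, 18, 29]

coefficients = fit_multiple_regression(predictors, outcomes)
print("intercept, b1, b2:", [round(c, 6) for c in coefficients])
```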
BASIC TERMS
• Measurement: assignment of numbers to
something
• Data: a collection of measurements
• Sample: the data actually collected
• Population: all possible data
• Variable: a property with respect to which data
from a sample differ in some measurable way
POPULATION &
SAMPLE
POPULATION
• The set of data (numerical or otherwise)
corresponding to the entire collection of units about
which information is sought.
• The collection of all outcomes, responses,
measurements, or counts that are of interest.
• The totality of all the elements or persons in which
one has an interest at a particular time. It is denoted
by N.
EXAMPLE OF POPULATION

• The faculty members of LPU-Batangas
• The athletes of LPU-Batangas
• Facebook users worldwide
SAMPLE
• It is a subset of a
population. It is denoted by
n.
WHY SAMPLING?
• In most studies it is difficult to obtain
information from the entire population
for various reasons. We rely on
samples to make estimates or inferences
related to the population.
EXAMPLE OF SAMPLE

• Selected students of LPU-Batangas
• Selected employees of LPU-Batangas
• 10 babies selected from those born in Batangas every
day
EXAMPLE OF POPULATION & SAMPLE

• Identify whether each statement refers to a population
(P) or a sample (S).
1. A group of 25 students selected to test a new teaching
technique.
2. The total machines produced by a factory in one
week.
PARAMETER &
STATISTIC
PARAMETER
• Any statistical information or attribute taken
from a population.
• Measured characteristic of a population
• A numerical measurement describing
some characteristic of a population
STATISTIC
• Any estimate of statistical attributes taken from a
sample.
• Measured characteristic of a sample
• A numerical measurement describing some
characteristic of a sample
EXAMPLE OF PARAMETER & STATISTIC

• WHICH IS A PARAMETER AND WHICH IS A STATISTIC?
1. In 1936, Literary Digest polled 2.3 million adults in the United
States, and 57% said that they would vote for Alf Landon for
the presidency.
2. There are exactly 100 senators in the 109th Congress of the United
States, and 55 of them are Republicans.
POPULATIONS AND PARAMETERS

• A population is all elements (people or things)
whose characteristics are being studied. Also called the
“Target Population”.
• A value (average, total, etc.) that is calculated from
the entire population is called a parameter.
Samples and Statistics
• A sample is a subset of the population
that is selected for study.
• A value that is calculated from a sample is
called a statistic.
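The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of 1,000 heights is randomly generated and hypothetical, and simple random sampling is assumed.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of N = 1000 people.
population = [rng.randint(150, 190) for _ in range(1000)]

# Simple random sample of n = 50 drawn without replacement.
sample = rng.sample(population, 50)

parameter = statistics.mean(population)  # calculated from the entire population
statistic = statistics.mean(sample)      # calculated from the sample only

print("N =", len(population), "population mean (parameter):", parameter)
print("n =", len(sample), "sample mean (statistic):", statistic)
```

The sample mean is usually close to, but not exactly equal to, the population mean; it is used as an estimate of it.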
STATISTICAL INFERENCE

• Drawing conclusions (inferences) about
a population based on an examination of
a sample taken from the population
Classification of Variables
• Qualitative Variables
• Use categorical or qualitative responses. They refer to the
attributes or characteristics of the samples.
• Quantitative Variables
• Yield numerical responses representing an amount or quantity. Examples: height, weight,
number of children. Numerical data gathered are either discrete or continuous.
• Discrete quantitative variables assume finite or countably infinite values such as
0, 1, 2, 3, 4.
• Continuous quantitative variables are not limited to a countable set of values; their values
correspond to points on an interval of the real line.
Levels of Measurement
• 1. Nominal Level - The crudest form of measurement. Numbers
or symbols are used for the purpose of categorizing observations into groups.
The categories are mutually exclusive; that is, being in one category
automatically excludes being in another.
• 2. Ordinal Level - An improvement on the nominal level. Data are
ranked in a “bottom to top” or “low to high” manner. Statements of
the kind “greater than” or “less than” may be made here.
Levels of Measurement
• 3. Interval Level - Possesses the properties of the nominal and ordinal
levels. The distances between any two numbers on the scale are known,
but it does not have a stable starting point (an absolute zero).
• 4. Ratio Level - Possesses all the properties of the nominal, ordinal, and
interval levels. In addition, it has an absolute zero point. Data can be
classified and placed in a proper order, and we can compare the magnitudes
of these data.
Thank you
• Group I
• Gerarld Alphonsus Malabanan
• Lizel C. Morcilla
• KingRoy L. Hernandez
