STATISTICS
OBJECTIVE OF THE SESSION
• At the end of the session you will be able to
appreciate the basic statistical concepts
and their applicability
• Descriptive Statistics
• Inferential Statistics
• Statistical Sampling
WHAT IS STATISTICS?
• Statistics is defined as the
collection, compilation, analysis,
and interpretation of
numerical data
• Statistics is the science of data
TYPES OF STATISTICS
Descriptive Statistics
• Used to describe basic features of the data
• Just describes and summarizes data
Types of Descriptive Statistics
Measures of Dispersion
Measures of Central Tendency
• Also known as Measures of Location
• Gives an overview of the entire data set
• Method to describe what is typical for a group
of data
Measures of Central Tendency
• MEAN - the average of the values
• MEDIAN - the middle value when the data is sorted
• MODE - the value that occurs most often
Measures of Dispersion
• Shows how the data is dispersed
The standard deviation is useful in comparing sets of data which may have the same
mean but a different range. For example, the mean of the following two sets is the
same: 15, 15, 15, 14, 16 and 2, 7, 14, 22, 30. However, the second is clearly more
spread out. If a set has a low standard deviation, the values are not spread out too much.
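The comparison above can be checked with a short Python sketch. It uses only the standard `statistics` module; the helper name `summarize` is made up for this illustration, and the population standard deviation (`pstdev`) is an assumed choice, since the slides do not say which variant is meant.

```python
import statistics

# The two data sets quoted in the text above.
set_a = [15, 15, 15, 14, 16]
set_b = [2, 7, 14, 22, 30]

def summarize(data):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        # Population standard deviation (an assumption; see lead-in).
        "stdev": statistics.pstdev(data),
    }

summary_a = summarize(set_a)
summary_b = summarize(set_b)

# The mode is only meaningful for set_a, where 15 repeats.
mode_a = statistics.mode(set_a)

print(summary_a)
print(summary_b)
print("mode of set_a:", mode_a)
```

Both sets have a mean of 15, but the range and the standard deviation of the second set are much larger, which is what "more spread out" means numerically.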
Inferential Statistics
helps to suggest explanations for a situation or phenomenon. It allows
you to draw conclusions based on extrapolations, and is in that way
fundamentally different from descriptive statistics that merely
summarize the data that has actually been measured.
Types of Inferential Statistics
• Confidence Interval.
• Contingency Tables and Chi Square Statistic.
• T-test or ANOVA.
• Pearson Correlation.
• Bi-variate Regression.
• Multi-variate Regression.
Confidence Interval
gives a range of values around a sample estimate (such as the mean) that
is expected to contain the true population parameter with a stated level
of confidence. Confidence intervals express the degree of uncertainty in
an estimate obtained by sampling. They are most often constructed using
confidence levels of 95% or 99%.
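As a rough illustration, a confidence interval for a mean can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the normal approximation (for small samples a t-distribution quantile is usually preferred), the function name `mean_confidence_interval` is made up here, and the sample values are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Standard-normal quantile, roughly 1.96 for a 95% confidence level.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

# Hypothetical measurements, made up for illustration only.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 14]

low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"95% interval: {low:.2f} to {high:.2f}")
```

Raising the confidence level to 99% widens the interval, because more certainty about containing the true mean requires a larger range.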
Chi-square test
A chi-square (χ²) statistic is a test that measures how a model compares
to actual observed data. The data used in calculating a chi-square
statistic must be random, raw, mutually exclusive, drawn from
independent variables, and drawn from a large enough sample. For
example, the results of tossing a fair coin meet these criteria.
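To make the coin example concrete, the sketch below computes the chi-square statistic as the sum of (observed - expected)² / expected over the categories. The toss counts are hypothetical, and the function name `chi_square_statistic` is made up for this illustration; 3.841 is the standard critical value for one degree of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical result of 100 coin tosses: 45 heads, 55 tails.
observed = [45, 55]
# A fair coin (the model) predicts 50 heads and 50 tails.
expected = [50, 50]

chi2 = chi_square_statistic(observed, expected)

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_VALUE = 3.841
reject_fair_coin = chi2 > CRITICAL_VALUE

print("chi-square:", chi2)
print("reject fair-coin model:", reject_fair_coin)
```

Here the statistic is 1.0, well below the critical value, so these hypothetical counts give no reason to doubt that the coin is fair.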
T-test or ANOVA
A t-test is a method that determines whether the means of two populations
are statistically different from each other, whereas ANOVA determines
whether the means of three or more populations are statistically different
from each other. Both of them look at the difference in means and the
spread of the distributions (i.e., variance) across groups; however, the
ways that they determine the statistical significance are different.
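The idea of comparing a difference in means against the spread can be sketched with a t statistic. The example below uses Welch's form of the two-sample t statistic, which is an assumed choice since the slides do not name a variant; the function name `welch_t` and the group values are hypothetical, and the sketch stops at the statistic without computing a p-value.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Each group's sample variance is scaled by its own sample size.
    std_err = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / std_err

# Hypothetical measurements from two groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]

t_stat = welch_t(group_a, group_b)
print("t statistic:", round(t_stat, 3))
```

A t statistic far from zero suggests the group means differ by more than the spread within the groups would explain; two groups with identical means give a statistic of exactly zero.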
Pearson correlation
is the test statistic that measures the strength of the linear
relationship, or association, between two continuous variables. It is
widely used for measuring the association between variables of interest
because it is based on the method of covariance.
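The link to covariance can be seen in a direct computation: the coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below is a plain-Python illustration; the function name `pearson_r` and the data values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    # The 1/n factors cancel between numerator and denominator.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores_up = [2, 4, 6, 8, 10]    # rises exactly in step with hours
scores_down = [10, 8, 6, 4, 2]  # falls exactly in step with hours

r_up = pearson_r(hours, scores_up)
r_down = pearson_r(hours, scores_down)
print(r_up, r_down)
```

The coefficient always lies between -1 and 1: values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little or no linear relationship.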
Bi-variate Regression
is a simple linear regression model which is used to predict one variable
(referred to as the outcome, criterion, or dependent variable) from one
other variable (referred to as the predictor or independent variable).
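A simple linear regression of this kind can be sketched with ordinary least squares, where the fitted line is y = intercept + slope * x. The function name `fit_line` and the data points are hypothetical and chosen so that the fit is easy to verify by hand.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: the predictor x and the outcome y.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

slope, intercept = fit_line(x, y)
predicted = intercept + slope * 5  # prediction for a new value x = 5
print(slope, intercept, predicted)
```

For these points the outcome rises by 2 for every unit of the predictor, so the fitted slope is 2, the intercept is 1, and the prediction at x = 5 is 11.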
Multi-variate Regression
is an extension of bi-variate regression with one dependent variable and
multiple independent variables. Using the values of the independent
variables, we try to predict the output.
BASIC TERMS
• Measurement: assignment of numbers to
something
• Data: collection of measurements
• Sample: the data actually collected, a subset of the population
• Population: all possible data
• Variable: property with respect to which data
from a sample differ in some measurable way
POPULATION & SAMPLE
POPULATION
• The set of data (numerical or otherwise)
corresponding to the entire collection of units about
which information is sought.
• The collection of all outcomes, responses,
measurements, or counts that are of interest.
• The totality of all the elements or persons in which
one has an interest at a particular time. It is denoted
by N.
EXAMPLE OF POPULATION