
Introduction to Statistics

STATISTICAL ANALYSIS


Objectives
• To define statistics
• To discuss the wide range of applications of statistics
• To discuss key statistical concepts
• To understand the branches of statistics
• To describe the levels of measurement of data

What is Statistics?

• Science of collecting, organizing, presenting, analyzing, and interpreting data for the purpose of assisting in making more effective decisions
• Branch of mathematics
• Facts and figures
• A subject or discipline
• Collections of data

YHJ@SOM, USM


Applications of Statistics
Statistical techniques are used across a wide range of scientific and social research, including biostatistics, computational biology, computational sociology, network biology, social science, sociology, and social research.

Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:
Actuarial science
Applied information economics
Biostatistics
Business statistics
Chemometrics (for analysis of data from chemistry)
Data mining
Demography
Econometrics
Energy statistics
Engineering statistics
Epidemiology
Geography and Geographic Information Systems, specifically in Spatial analysis
Image processing
Psychological statistics
Reliability engineering
Social statistics

Engineering statistics is a branch of statistics that has several subtopics which are particular to engineering:
• Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems.
• Quality control and process control use statistics as a tool to manage conformance to specifications of manufacturing processes and their products.
• Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
• Reliability engineering measures the ability of a system to perform its intended function (for its intended time) and has tools for improving performance.
• Probabilistic design involves the use of probability in product and system design.

Applications of Statistics in Business

• Accounting - auditing and cost estimation
• Finance - investments and portfolio management
• Human resources - compensation, job satisfaction, performance measurement
• Operations - quality management, forecasting, MIS, capacity planning, materials control
• Marketing - market analysis, consumer research, pricing
• Economics - regional, national, and international economic performance
• International business - market and demographic analysis

Key Statistical Concepts




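Population
A population is the complete set of members (people, items, or events) of interest in a study.

Sample
A sample is a subset of the population, drawn from it for study.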

Key Statistical Concepts




Parameter

Statistic


Key Statistical Concepts

[Diagram: a Sample drawn as a Subset of the Population, with Statistic attached to the Sample and Parameter attached to the Population.]

Populations have Parameters; Samples have Statistics.

Branches of Statistics

Statistics
• Descriptive Statistics
• Inferential Statistics
• Parametric Statistics
• Non-Parametric Statistics

Descriptive Statistics

Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
• Graphical techniques
• Numerical techniques

The actual method used depends on what information we would like to extract. Are we interested in:
• measure(s) of central location? and/or
• measure(s) of variability (dispersion)?
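As a rough illustration of numerical techniques, the sketch below computes a few measures of central location and variability with Python's standard `statistics` module. The exam scores are hypothetical values invented for this example; they do not come from the course material.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [62, 70, 70, 75, 81, 88, 95]

# Measures of central location
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

# Measures of variability (dispersion)
value_range = max(scores) - min(scores)   # largest minus smallest value
sample_stdev = statistics.stdev(scores)   # sample standard deviation (n - 1 divisor)

print("mean:", round(mean_score, 2))
print("median:", median_score)
print("mode:", mode_score)
print("range:", value_range)
print("standard deviation:", round(sample_stdev, 2))
```

Which of these summaries is worth reporting depends on the question being asked, as noted above.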


Inferential Statistics


Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.

Statistical Inference
Statistical inference is the process of making an
estimate, prediction, or decision about a
population based on a sample.
[Diagram: a Sample is drawn from the Population; Inference runs from the Sample's Statistic to the Population's Parameter.]

What can we infer about a Population's Parameters based on a Sample's Statistics?

Statistical Inference
We use statistics to make inferences about
parameters.
Therefore, we can make an estimate,
prediction, or decision about a population
based on sample data.
Thus, we can apply what we know about a
sample to the larger population from which
it was drawn!

Statistical Inference


Rationale:
• Large populations make investigating each member impractical and expensive.
• It is easier and cheaper to take a sample and make estimates about the population from the sample.

However:
• Such conclusions and estimates are not always going to be correct.
• For this reason, we build measures of reliability into statistical inference, namely the confidence level and the significance level.

Confidence & Significance Levels


The confidence level is the proportion of times that an estimating procedure will be correct. E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.

When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run. E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
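The "correct 95% of the time" reading can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a known standard deviation, the interval is the textbook form x̄ ± 1.96·σ/√n, and the function name and all parameter values are hypothetical.

```python
import math
import random
import statistics

def coverage_of_95_percent_interval(trials=2000, n=30, mu=50.0, sigma=10.0, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean mu.

    Assumes a normal population with known sigma, so each interval is
    sample mean +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        # The estimate is "correct" when the interval covers the true mean.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

coverage = coverage_of_95_percent_interval()
print(f"Intervals containing the true mean: {coverage:.1%}")
print(f"Intervals missing the true mean:    {1 - coverage:.1%}")
```

Over many repetitions the first figure should settle near 95% and the second near 5%, which is the long-run behaviour the two levels describe.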

Process of Inferential Statistics


[Diagram: select a random sample from the Population; calculate x̄ (statistic) from the Sample to estimate the corresponding Population value (parameter).]
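The process can be sketched in a few lines of Python. Everything here is assumed for illustration: the population is simulated (10,000 made-up heights), the sample size of 100 is arbitrary, and in real work the population mean would usually be unknown rather than computable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (made-up values).
population = [rng.gauss(170, 8) for _ in range(10_000)]
mu = statistics.fmean(population)   # parameter: describes the population

# Select a random sample, then calculate the sample mean.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)    # statistic: describes the sample

print(f"population mean (parameter) = {mu:.2f}")
print(f"sample mean (statistic)     = {x_bar:.2f}")
```

The sample mean is close to, but not exactly equal to, the population mean, which is why inference needs a measure of reliability.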


Parametric Statistics
Parametric statistics is a branch of
statistics that assumes data come
from a type of probability
distribution and makes inferences
about the parameters of the
distribution. Most well-known
elementary statistical methods are
parametric.

Non-parametric statistics
• Distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution.
• A non-parametric statistic can refer to a statistic (a function on a sample) whose interpretation does not depend on the population fitting any parametrized distribution.
• Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data.
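One way to see what "distribution free" means in practice is to compute a rank-based statistic and check that it depends only on the ordering of the values. The sketch below computes the Mann-Whitney U statistic by direct pair counting; the helper name `mann_whitney_u` and the two groups of measurements are hypothetical, and the sketch stops at the statistic itself (no p-value is computed).

```python
import math

def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys: count the pairs where x beats y (ties count 0.5)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical measurements for two groups (illustrative values only).
group_a = [1.2, 3.4, 5.6, 7.8]
group_b = [0.9, 2.5, 3.0]

u_raw = mann_whitney_u(group_a, group_b)

# A monotone transformation (here exp) changes the values but not their order,
# so the rank-based statistic is unchanged.
u_transformed = mann_whitney_u([math.exp(x) for x in group_a],
                               [math.exp(y) for y in group_b])

print(u_raw, u_transformed)
```

A statistic such as the difference of two means would change under the same transformation, because it depends on the actual values and not only on their order.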

The most frequently used tests include:
• Anderson-Darling test
• Cliff's delta
• Cochran's Q
• Cohen's kappa
• Efron-Petrosian test
• Friedman two-way analysis of variance by ranks
• Kendall's tau
• Kendall's W
• Kolmogorov-Smirnov test
• Kruskal-Wallis one-way analysis of variance by ranks
• Kuiper's test
• Mann-Whitney U or Wilcoxon rank sum test
• median test
• Pitman's permutation test
• Rank products
• Siegel-Tukey test
• Spearman's rank correlation coefficient
• Van Elteren stratified Wilcoxon rank sum test
• Wald-Wolfowitz runs test
• Wilcoxon signed-rank test

Types of Data & Information


Data (at least for purposes of Statistics) fall into three main groups:
• Interval Data
• Nominal Data
• Ordinal Data

Interval Data
Interval data are real numbers, e.g. heights, weights, prices, etc.
They are also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it's meaningful to talk about 2*Height, or Price + $1, and so on.


Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!).
Nominal data are also called qualitative or categorical.
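For nominal data the meaningful summary is a frequency count per category. The sketch below reuses the marital status coding from the example; the list of responses is made up for illustration.

```python
from collections import Counter

# Codes follow the marital status example; the responses themselves are made up.
labels = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}
responses = [2, 1, 2, 4, 3, 2, 1, 2]

# Counting observations per category is the valid calculation for nominal data.
counts = Counter(labels[code] for code in responses)
most_common_label, most_common_count = counts.most_common(1)[0]

print(dict(counts))
print("most frequent category:", most_common_label)
```

Averaging the raw codes would also produce a number, but that number would change with any relabeling of the categories and so carries no meaning.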

Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them.
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
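The point about order can be illustrated by recoding the same ratings two ways. In the sketch below, `coding_b` is an arbitrary alternative coding invented for this example, and the list of ratings is hypothetical; both codings keep the order poor < fair < good < very good < excellent.

```python
import statistics

ratings = ["fair", "good", "good", "excellent", "poor", "very good", "good"]

# Two codings that both preserve the order of the categories.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 10, "fair": 20, "good": 35, "very good": 70, "excellent": 100}

def median_label(values, coding):
    """Return the category at the median position under the given coding."""
    decode = {number: label for label, number in coding.items()}
    codes = [coding[v] for v in values]
    return decode[statistics.median_low(codes)]

# A ranking-based summary gives the same category under either coding.
median_a = median_label(ratings, coding_a)
median_b = median_label(ratings, coding_b)

# An arithmetic summary depends on the arbitrary numbers chosen.
mean_a = statistics.fmean([coding_a[r] for r in ratings])
mean_b = statistics.fmean([coding_b[r] for r in ratings])

print("median category:", median_a, "/", median_b)
print("mean of codes:", round(mean_a, 2), "/", round(mean_b, 2))
```

The median category is the same under both codings, while the mean of the codes is not, which is why only ranking-based calculations are appropriate here.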

Calculations for Types of Data


As mentioned above:
• All calculations are permitted on interval data.
• Only calculations involving a ranking process are allowed for ordinal data.
• No calculations are allowed for nominal data; only counting the number of observations in each category is possible.

This lends itself to the following hierarchy of data.

Hierarchy of Data
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent
categories.
Only calculations based on the frequencies of
occurrence are valid.
Data may not be treated as ordinal or interval.

End of discussion
