Introduction to Statistics
STATISTICAL ANALYSIS
Objectives
To define statistics
To discuss the wide range of applications of statistics
To discuss key statistical concepts
To understand the branches of statistics
To describe the levels of measurement of data
What is Statistics?
Science of collecting, organizing, presenting, analyzing, and interpreting data for the purpose of assisting in making more effective decisions
Branch of mathematics
Facts and figures
A subject or discipline
Collections of data
YHJ@SOM, USM
Applications of Statistics
Statistical techniques are used in a wide range of scientific and social research, including: Biostatistics, Computational biology, Computational sociology, Network biology, Social science, Sociology and Social research.
Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:
Actuarial science
Applied information economics
Biostatistics
Business statistics
Chemometrics (for analysis of data from chemistry)
Data mining
Demography
Econometrics
Energy statistics
Engineering statistics
Epidemiology
Geography and Geographic Information Systems, specifically in Spatial analysis
Image processing
Psychological statistics
Reliability engineering
Social statistics
Engineering statistics is a branch of statistics that has several subtopics which are particular to engineering:
Design of Experiments (DOE) uses statistical techniques to test and construct models of engineering components and systems.
Quality control and process control use statistics as a tool to manage conformance to specifications of manufacturing processes and their products.
Time and methods engineering uses statistics to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.
Reliability engineering measures the ability of a system to perform its intended function (for its intended time) and has tools for improving performance.
Probabilistic design involves the use of probability in product and system design.
Applications of Statistics in Business

Accounting - auditing and cost estimation
Finance - investments and portfolio management
Human resources - compensation, job satisfaction, performance measures
Operations - quality management, forecasting, MIS, capacity planning, materials control
Marketing - market analysis, consumer research, pricing
Economics - regional, national, and international economic performance
International Business - market and demographic analysis
Key Statistical Concepts

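Population
A population is the entire group about which we want to draw conclusions.
Sample
A sample is a subset of the population, drawn from it for study.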

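Parameter
A parameter is a descriptive measure of a population.
Statistic
A statistic is a descriptive measure of a sample.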

[Figure: a sample shown as a subset of a population; the parameter belongs to the population and the statistic to the sample.]
Populations have Parameters, Samples have Statistics.
Branches of Statistics
[Figure: diagram of the branches of statistics, showing Descriptive Statistics, Inferential Statistics, Parametric Statistics, and Non-Parametric Statistics.]
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical Techniques
Numerical Techniques
The actual method used depends on what information we would like to extract. Are we interested in
measure(s) of central location? and/or
measure(s) of variability (dispersion)?
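As a rough illustration of the numerical techniques, the sketch below computes two measures of central location and three measures of variability with Python's standard `statistics` module. The exam scores and all variable names are made up for the example; they do not come from the slides.

```python
import statistics

# Hypothetical sample: exam scores for eight students (made-up numbers).
scores = [62, 71, 71, 75, 80, 84, 90, 99]

# Measures of central location
mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data

# Measures of variability (dispersion)
score_range = max(scores) - min(scores)        # largest minus smallest value
sample_variance = statistics.variance(scores)  # sample variance (divides by n - 1)
sample_stdev = statistics.stdev(scores)        # square root of the sample variance

print("central location:", mean_score, median_score)
print("variability:", score_range, sample_variance, round(sample_stdev, 2))
```

Which of these summaries to report depends, as the slide says, on what information we want to extract from the data.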
Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Figure: a sample is drawn from a population; inference runs from the sample statistic to the population parameter.]
What can we infer about a population's parameters based on a sample's statistics?
We use statistics to make inferences about parameters.
Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
Thus, we can apply what we know about a sample to the larger population from which it was drawn!
Rationale:
Large populations make investigating each member impractical and expensive.
It is easier and cheaper to take a sample and make estimates about the population from the sample.
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build measures of reliability into the statistical inference, namely the confidence level and the significance level.
Confidence & Significance Levels

The confidence level is the proportion of times that an estimating procedure will be correct.
E.g. a confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.
When the purpose of the statistical inference is to draw a conclusion about a population, the significance level measures how frequently the conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.
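The "long run" reading of a confidence level can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is simulated as normal with a mean and standard deviation chosen arbitrarily, and the interval uses the large-sample normal approximation (critical value 1.96). None of these numbers or names come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 100.0   # population parameter, known here only because we simulate
TRUE_SD = 15.0
SAMPLE_SIZE = 50
TRIALS = 2_000
Z_95 = 1.96         # standard normal critical value for a 95% interval


def interval_covers_true_mean() -> bool:
    """Draw one sample, build an approximate 95% interval, report whether it is 'correct'."""
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    centre = statistics.mean(sample)
    margin = Z_95 * statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    return centre - margin <= TRUE_MEAN <= centre + margin


# Repeat the estimating procedure many times and count how often it is correct.
hits = sum(interval_covers_true_mean() for _ in range(TRIALS))
coverage = hits / TRIALS

print(f"{coverage:.1%} of the intervals contained the true mean")
```

The observed share should land near 95% rather than exactly on it, because the run is finite and the normal critical value is itself an approximation at this sample size.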
Process of Inferential Statistics

[Figure: the process of inferential statistics.]
Select a random sample from the population.
Calculate x̄ (statistic) from the sample to estimate the corresponding population (parameter).
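The two steps of this process can be sketched in a few lines. The population below is simulated, so its mean can be computed directly for comparison; in practice the parameter is unknown, which is the reason for sampling. The income figures, sample size, and variable names are all hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated household incomes (made-up numbers).
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# The parameter: available here only because the population is simulated.
population_mean = statistics.mean(population)

# Step 1: select a random sample (drawn without replacement).
sample = random.sample(population, 200)

# Step 2: calculate the statistic and use it as the estimate of the parameter.
sample_mean = statistics.mean(sample)

estimation_error = sample_mean - population_mean
print(f"parameter = {population_mean:.0f}, statistic = {sample_mean:.0f}")
```

The estimate will generally be close to the parameter but not equal to it, which is why measures of reliability accompany an inference.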
Branches of Statistics
Parametric statistics is a branch of statistics that assumes data come from a type of probability distribution and makes inferences about the parameters of the distribution. Most well-known elementary statistical methods are parametric.
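A minimal sketch of the parametric idea, assuming (purely for illustration) that the data come from a normal distribution: the two parameters of that distribution are estimated from the sample, and a further statement is then derived from the fitted distribution. The height values are made up; `NormalDist` is part of Python's standard `statistics` module.

```python
from statistics import NormalDist

# Hypothetical sample of adult heights in centimetres (made-up numbers).
heights_cm = [158.0, 162.5, 165.0, 167.5, 170.0, 171.0, 174.5, 176.0, 180.0, 185.5]

# Parametric step: assume a normal distribution and estimate its two parameters.
fitted = NormalDist.from_samples(heights_cm)

# An inference that relies entirely on the assumed distribution.
share_above_180 = 1 - fitted.cdf(180.0)

print(f"mean = {fitted.mean:.1f}, sd = {fitted.stdev:.1f}")
print(f"estimated P(height > 180) = {share_above_180:.2f}")
```

If the normality assumption is wrong, the last figure can be misleading, which is the motivation for methods that do not rely on such an assumption.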
Non-parametric statistics are distribution-free methods which do not rely on assumptions that the data are drawn from a given probability distribution.
A non-parametric statistic can also refer to a statistic (a function on a sample) whose interpretation does not depend on the population fitting any parametrized distribution.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data.
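One way to see what "distribution free" means is to compute a rank-based statistic by hand. The sketch below implements Spearman's rank correlation as Pearson's correlation applied to ranks, and compares the two on data that rise steadily but not linearly. The helper names and the data are hypothetical, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson's correlation coefficient, computed from the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # strictly increasing, but not linear

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because the rank-based coefficient uses only the ordering of the values, it reports a perfect monotone relationship here, while the value-based coefficient is pulled below 1 by the curvature.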
The most frequently used non-parametric tests include:
Anderson-Darling test
Cliff's delta
Cochran's Q
Cohen's kappa
Efron-Petrosian test
Friedman two-way analysis of variance by ranks
Kendall's tau
Kendall's W
Kolmogorov-Smirnov test
Kruskal-Wallis one-way analysis of variance by ranks
Kuiper's test
Mann-Whitney U or Wilcoxon rank sum test
Median test
Pitman's permutation test
Rank products
Siegel-Tukey test
Spearman's rank correlation coefficient
Van Elteren stratified Wilcoxon rank sum test
Wald-Wolfowitz runs test
Wilcoxon signed-rank test
Types of Data & Information

Data (at least for purposes of Statistics) fall into three main groups:
Interval Data
Nominal Data
Ordinal Data
Interval Data
Interval data are real numbers, e.g. heights, weights, prices, etc.
Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it's meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
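For nominal data the meaningful summaries are counts per category. The sketch below uses the marital-status coding from the slide; the list of survey responses is made up for the example.

```python
from collections import Counter

# Coding from the slide: the numbers are labels, not quantities.
MARITAL_STATUS = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical survey responses (made-up numbers).
responses = [2, 1, 2, 3, 1, 2, 4, 2, 1, 3]

# Count the observations in each category.
counts = Counter(MARITAL_STATUS[code] for code in responses)

# The most frequent category (the mode) and the share of each category.
mode_label, mode_count = counts.most_common(1)[0]
proportions = {label: n / len(responses) for label, n in counts.items()}

print(counts)
print("most frequent category:", mode_label)
```

Averaging the raw codes would produce a number, but it would change with any relabelling of the categories and so carries no information.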
Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
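The point that only the order matters can be illustrated by coding the same ratings in two ways. In the sketch below the first coding is the one from the slide; the second coding, the list of ratings, and the helper names are hypothetical. A rank-based summary (the median category) is the same under both codings, while the arithmetic mean of the codes is not tied to any category.

```python
import statistics

# Two order-preserving codings of the same rating scale.
# CODING_B is hypothetical; it only shows that the specific numbers are arbitrary.
CODING_A = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
CODING_B = {"poor": 10, "fair": 20, "good": 25, "very good": 60, "excellent": 100}

# Hypothetical course ratings (made-up data).
ratings = ["fair", "good", "good", "very good", "excellent", "poor", "good"]


def median_label(coding):
    """Median category: depends only on the ordering of the codes."""
    codes = [coding[r] for r in ratings]
    middle_code = statistics.median_low(codes)
    label_for = {code: label for label, code in coding.items()}
    return label_for[middle_code]


def mean_code(coding):
    """Arithmetic mean of the codes: depends on the arbitrary numbers chosen."""
    return statistics.mean([coding[r] for r in ratings])


print(median_label(CODING_A), median_label(CODING_B))
print(mean_code(CODING_A), mean_code(CODING_B))
```

Under the second coding the mean falls between the codes for two categories, so it cannot be read back as a rating, whereas the median category is unaffected by the recoding.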
Calculations for Types of Data

As mentioned above,
All calculations are permitted on interval data.
Only calculations involving a ranking process are allowed for ordinal data.
No calculations are allowed for nominal data; only counting the number of observations in each category is possible.
This lends itself to the following hierarchy of data.
Hierarchy of Data
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Ordinal
Values must represent the ranked order of the data.
Calculations based on an ordering process are valid.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent
categories.
Only calculations based on the frequencies of
occurrence are valid.
Data may not be treated as ordinal or interval.
End of discussion