
Chapter 2

Basic Concepts in Statistics

Chapter Goals

Create an initial image of the field of statistics.
Review a brief history of statistics.
Know the branches of statistics.
Introduce several basic vocabulary words
used in studying statistics: population,
variable, statistic.

What is Statistics?

STATISTICS
The science of collecting, describing, and interpreting
data.
It can give an instant overall picture of data, through
graphical presentation or numerical summarization,
irrespective of the number of data points.
It is the methodology that scientists and
mathematicians have developed for interpreting and
drawing conclusions from collected data.

It is clear that statistics is much more than
just the tabulation of numbers and the
graphical presentation of these tabulated
numbers. Statistics is also the science of
gaining information from numerical and
categorical data. Statistical methods can be
used to find answers to questions such as:

What kind of data, and how much, needs to be collected?
How should we organize and summarize the data?
How can we analyze the data and draw conclusions
from it?
How can we assess the strength of the conclusions
and evaluate their uncertainty?

That is, statistics provides methods for:

1. Design: Planning and carrying out research
studies.
2. Description: Summarizing and exploring data.
3. Inference: Making predictions and generalizing
about phenomena represented by the data.

Brief History of Statistics

Keywords: ancient chief, trained warriors, taxes, kingdom

17th to 18th century

Gamblers asked mathematicians to develop
principles that would improve their chances of
winning.
Mathematicians: Bernoulli and De Moivre
> probability
De Moivre developed the equation for the normal
curve.

19th century
Laplace and Gauss
> applied probability principles to astronomy
Early 19th century
Quetelet, a Belgian statistician
> applied statistics to the investigation of social and educational
problems
> developed statistical theory as a general method of
research for the sciences

Social sciences
Heredity
Eugenics
Psychology
Anthropometry
Statistics
Measurement of the relationship between two variables
Centiles and percentiles

Francis Galton

Correlation and regression

PEARSON - GALTON

psychologists
Europe in 1880
applied agriculture and biological setting
(E. L. Thorndike)

JAMES MCKEEN CATTELL

20th century
R. A. Fisher
> applied statistics in agricultural and biological
settings
Data can be classified into two
types:

* Continuous data
measurements that can be made to varying degrees of
precision
(e.g. length: 1 yard equals 3 feet)
* Discontinuous (discrete) data
measurements expressed in whole units
(e.g. number of objects)
Types of Measurement

Scales of Measurement
According to Stevens, there are four scales of measurement:
1. Nominal scale - used as a measure of identity,
e.g. sorting individuals into categories: yes or no, M or F.
2. Ordinal scale - used to rank or order
individuals or objects,
e.g. harder or softer, colder or hotter.
3. Interval scale - numbers that reflect the size of the differences among
items, with no true zero point,
e.g. scores on a test, calendar dates, temperature in degrees Fahrenheit.
4. Ratio scale - measures with a true zero point, such as length, weight, loudness,
softness, width, age, and so on.
This is the highest type of scale.

Terminologies
Concepts of Statistics

Population - the collection of all
individuals or items under consideration in a
statistical study. There are two kinds of populations:
finite (countable) and infinite (uncountable).
Sample - the part of the population from
which information is collected.

Data - the facts and figures collected,
summarized, analyzed, and interpreted.
The data collected in a particular study
are referred to as the data set.

Variable - any characteristic of an individual or entity.
A variable can take different values for different
individuals. Numeric variables can be measured on an interval or a ratio scale:

Interval - Values of the variable are ordered as in
Ordinal, and additionally, differences between values are
meaningful; however, the scale is not absolutely anchored (its zero point is arbitrary).
Calendar dates and temperatures on the Fahrenheit scale are
examples. Addition and subtraction, but not multiplication
and division, are meaningful operations.
Ratio - Variables with all the properties of Interval plus an
absolute, non-arbitrary zero point, e.g. age, weight,
temperature (Kelvin). Addition, subtraction, multiplication,
and division are all meaningful operations.
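
The difference between the interval and ratio scales can be checked numerically. The following is a minimal sketch, not part of the original slides: the temperature and weight values are made up for illustration, and the helper name fahrenheit_to_celsius is hypothetical. It shows that a ratio of two Fahrenheit readings changes when the same readings are converted to Celsius, while a ratio of two weights stays the same when the unit changes.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

KG_TO_LB = 2.20462  # approximate conversion factor, kilograms to pounds

# Interval scale: two made-up temperature readings.
temp_f = (40.0, 80.0)
temp_c = tuple(fahrenheit_to_celsius(t) for t in temp_f)

ratio_f = temp_f[1] / temp_f[0]  # "twice as hot" in Fahrenheit...
ratio_c = temp_c[1] / temp_c[0]  # ...but a different ratio in Celsius

# Ratio scale: two made-up weights.
weight_kg = (40.0, 80.0)
weight_lb = tuple(w * KG_TO_LB for w in weight_kg)

ratio_kg = weight_kg[1] / weight_kg[0]
ratio_lb = weight_lb[1] / weight_lb[0]  # same ratio in either unit

print("temperature ratios:", round(ratio_f, 2), round(ratio_c, 2))
print("weight ratios:", round(ratio_kg, 2), round(ratio_lb, 2))
```

Because the zero of the Fahrenheit and Celsius scales is arbitrary, the statement "80 degrees is twice as hot as 40 degrees" depends on the unit; the statement "80 kg is twice as heavy as 40 kg" does not.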

TYPES OF STATISTICS

Descriptive Statistics

Descriptive statistics is concerned with summary calculations,
graphs, charts, and tables. It is a set of
methods for describing the data that we have collected.

Of 350 randomly selected people in the town of Luserna,
Italy, 280 people had the last name Nicolussi. An example
of descriptive statistics is the following statement:
"80% of these people have the last name Nicolussi."
This is a descriptive statement because it can actually be
verified from the information provided.
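
The 80% figure can be verified directly from the two counts given in the example. A minimal sketch of that check (the variable names are chosen here for illustration):

```python
sample_size = 350      # people selected in Luserna
nicolussi_count = 280  # of those, people named Nicolussi

# A descriptive statistic: the proportion observed in the sample itself.
proportion = nicolussi_count / sample_size

print(f"{proportion:.0%} of these people have the last name Nicolussi")
```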

Inferential Statistics
Inferential statistics is a method used to generalize from a sample to a
population. For example, the average income of all
families (the population) in India can be estimated from
figures obtained from a few hundred families (the sample).
It is also a set of methods used to make a
generalization, estimate, prediction, or decision.
The major use of inferential statistics is to use
information from a sample to infer something about
a population.
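
The idea of estimating a population average from a sample can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population below is artificial (100,000 made-up incomes spread uniformly between 10,000 and 50,000), not real income data, and the sample size of 500 is an arbitrary choice.

```python
import random

random.seed(0)  # fixed seed so the sketch gives the same result on each run

# Hypothetical population: made-up incomes, for illustration only.
population = [random.uniform(10_000, 50_000) for _ in range(100_000)]

# Draw a simple random sample of a few hundred families.
sample = random.sample(population, 500)

# In practice the population mean is unknown; it is computed here
# only to show how close the sample estimate comes.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {population_mean:,.0f}")
print(f"sample estimate: {sample_mean:,.0f}")
```

The sample mean is the inferential estimate; how far it can be expected to fall from the population mean depends on the sample size and on the spread of the data.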

Of 350 randomly selected people in the town of Luserna,
Italy, 280 people had the last name Nicolussi. An example
of inferential statistics is the following statement:
"80% of all people living in Italy have the last name
Nicolussi."
We have no information about all people living in Italy, just
about the 350 living in Luserna. We have taken that
information and generalized it to talk about all people living
in Italy. The easiest way to tell that this statement is not
descriptive is to try to verify it from the
information provided: it cannot be verified.

VARIABLES

Variable - any characteristic of an individual or entity. A
variable can take different values for different individuals.
It is any characteristic, number, or quantity that can be
measured or counted.
Variables can be:
Categorical variables - have values that describe a quality or
characteristic of a data unit, like what type or which
category.
Nominal - Categorical variables with no inherent order or ranking
sequence, such as names or classes (e.g., gender). Values may be
numerals, but they carry no numerical meaning (e.g., I, II, III). The only
operation that can be applied to Nominal variables is enumeration (counting).
Ordinal - Variables with an inherent rank or order, e.g. mild,
moderate, severe. Values can be compared for equality, or for greater or less,
but not for how much greater or less.

Numeric or quantitative variables - have values that
describe a measurable quantity as a number, like how
many or how much.
Continuous variable - an observation can take any value
within a certain range of real numbers.
Discrete variable - an observation can take a value based on
a count from a set of distinct whole values.
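
The variable types above determine which summaries make sense. The sketch below uses a handful of hypothetical records (the field names and values are invented for illustration): the nominal variable is only counted, the ordinal variable is sorted by an explicit rank, and the discrete and continuous variables are averaged.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, one per individual.
records = [
    {"gender": "F", "severity": "mild", "visits": 2, "height_cm": 162.5},
    {"gender": "M", "severity": "severe", "visits": 5, "height_cm": 175.0},
    {"gender": "F", "severity": "moderate", "visits": 1, "height_cm": 158.2},
    {"gender": "F", "severity": "mild", "visits": 3, "height_cm": 170.1},
]

# Nominal: categories have no order, so we can only count them.
gender_counts = Counter(r["gender"] for r in records)

# Ordinal: categories have an order, given here by an explicit rank.
severity_rank = {"mild": 0, "moderate": 1, "severe": 2}
ordered_severity = sorted((r["severity"] for r in records), key=severity_rank.get)

# Discrete (a count) and continuous (a measurement): arithmetic is meaningful.
mean_visits = mean(r["visits"] for r in records)
mean_height = mean(r["height_cm"] for r in records)

print(gender_counts)
print(ordered_severity)
print(mean_visits, round(mean_height, 2))
```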

DATA

There are two types of statistical presentation of data: graphical
and numerical.
Graphical Presentation: We look for the overall pattern
and for striking deviations from that pattern. The overall
pattern is usually described by the shape, center, and spread of
the data.
Bar diagrams and pie charts are used for categorical
variables.
Histograms are used for numerical variables.

Graphical Presentation

Bar Diagram: Lists the categories and presents the
percent or count of individuals who fall in each category.

Categorical Variable

Pie Chart: Lists the categories and presents the percent or
count of individuals who fall in each category.
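
Both charts are built from the same table of counts and percents per category. A minimal sketch of that table, drawn as a text bar diagram (the yes/no responses are hypothetical):

```python
from collections import Counter

# Hypothetical values of a categorical variable.
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]

counts = Counter(responses)
total = len(responses)
percents = {category: 100 * count / total for category, count in counts.items()}

# One bar per category, with its count and percent.
for category, count in counts.most_common():
    print(f"{category:<4} {'#' * count} {count} ({percents[category]:.0f}%)")
```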

A fundamental concept in summary statistics is that of a
central value for a set of observations and the extent to
which the central value characterizes the whole set of
data. Measures of central value such as the mean or
median must be coupled with measures of data
dispersion (e.g., average distance from the mean) to
indicate how well the central value characterizes the
data as a whole.
To understand how well a central value characterizes a set of observations,
let us consider the following two sets of data:
A: 30, 50, 70
B: 40, 50, 60
The mean of both data sets is 50. But the distance of the observations
from the mean is larger in data set A than in data set B. Thus, the mean
of data set B is a better representation of its data set than is the case for
set A.
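
The comparison of sets A and B can be worked out numerically. The sketch below uses the average distance from the mean (the mean absolute deviation) as the measure of dispersion; the function name mean_abs_deviation is chosen here for illustration.

```python
from statistics import mean

def mean_abs_deviation(values):
    """Average distance of the observations from their mean."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

set_a = [30, 50, 70]
set_b = [40, 50, 60]

mean_a, mean_b = mean(set_a), mean(set_b)
mad_a = mean_abs_deviation(set_a)  # (20 + 0 + 20) / 3
mad_b = mean_abs_deviation(set_b)  # (10 + 0 + 10) / 3

print(mean_a, round(mad_a, 2))
print(mean_b, round(mad_b, 2))
```

Both sets have mean 50, but the average distance from the mean is about 13.33 for A and about 6.67 for B, which is why the mean describes set B better.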

Numerical Presentation

Histogram: The overall pattern can be described by its shape,
center, and spread. The following age distribution is right
skewed. The center lies between 80 and 100. There are no outliers.

(Figure: histogram of a numerical variable, the age distribution.)
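
A histogram groups a numerical variable into intervals of equal width and counts the observations in each. The following is a minimal sketch with hypothetical ages (not the data behind the slide's figure) and an assumed bin width of 10:

```python
from collections import Counter

# Hypothetical ages, for illustration only.
ages = [81, 84, 85, 88, 90, 91, 92, 95, 97, 99, 103, 108, 115, 124]
bin_width = 10  # assumed width of each interval

# Map each age to the lower edge of its bin, then count per bin.
bin_counts = Counter((age // bin_width) * bin_width for age in ages)

# Text histogram: one row per bin, one mark per observation.
for start in sorted(bin_counts):
    print(f"{start:3d}-{start + bin_width - 1:3d} | {'#' * bin_counts[start]}")
```

In this made-up data most values fall between 80 and 100 and the counts tail off toward higher ages, which is the shape described as right skewed.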

THANK YOU
