Introduction to Statistics

Outlines
• What is Statistics
• Types of Statistics
• Application of Statistics
• Importance of Statistics to researchers and engineers
• Difference between Parameters and Statistics
• Descriptive and Inferential Statistics
• Qualitative and Quantitative data
• Collection and Presentation of Data
• Sources of Data
• Means of Collecting Data
• Methods of Presentation of Data
• Population and Sampling
• Kinds of Population
• Sampling
• When is Sampling appropriate
• Basic Principle governing random sampling
Meaning of Statistics:
Statistics is concerned with scientific methods for
collecting, organising, summarising, presenting and analysing data
as well as deriving valid conclusions and making reasonable
decisions on the basis of this analysis. Statistics is concerned with
the systematic collection of numerical data and its interpretation.
Application of Statistics:
• Mathematics
• Business
• Economics
• Country’s Administration
• Astronomy
• Banking
• Accounting and Auditing
• Natural and Social Science
• Actuarial science
• Astrostatistics
• Biostatistics
• Business analytics
• Chemometrics
• Demography
• Econometrics
• Environmental statistics
• Epidemiology
• Geostatistics
• Machine learning
• Operations research
• Population ecology
• Psychometrics
• Quality control
• Quantitative psychology
• Reliability engineering
• Statistical finance
• Statistical mechanics
• Statistical physics
• Statistical signal processing
• Statistical thermodynamics
Importance of Statistics to researchers and
engineers
• Statistics guides researchers in the proper characterization,
summarization, presentation and interpretation of research
results.
• It is used to communicate research findings, to support
hypotheses, and to give credibility to research methodology and
conclusions.
• In a designed experiment, the engineer makes changes in the
controllable variables of a process, observes the resulting system
output data, and then makes a decision about which variables are
responsible for the observed changes in output performance.
• Monitoring the outcomes of a process and adjusting the process
(tampering) to bring the results closer to what is desired.
Two Major Types of Statistics.
The branch of statistics devoted to the collection, organization,
summarization and presentation of data is called descriptive
statistics. The branch concerned with generalizing from samples
to populations, that is, with making inferences about a
population based on information obtained from a sample of it,
is called inferential statistics.
It is important to distinguish between a sample and a
population.

A population consists of all subjects (human or otherwise) that
are being studied.

A sample is a group of subjects selected from a population.
When is Sampling appropriate
• Sampling is used any time data is to be gathered.
Data cannot be collected until the sample size (how much) and
sample frequency (how often) have been determined.
• Sampling should be periodically reviewed.
When data is being collected on a regular basis to monitor a
system or process, the frequency and size of the sample should
be reviewed periodically to ensure that they are still appropriate.
• Sampling helps to minimize the errors that arise from handling
a large number of respondents in the population.
Parameters and Statistics
We can describe samples and populations by using measures
such as the mean, median, mode and standard deviation. When
these terms describe the characteristics of a population, they
are called parameters. When they describe the characteristics
of a sample, they are called statistics.
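The distinction can be illustrated with a small sketch. The population below is made-up data (random ages), and the sample size of 50 is an arbitrary choice for illustration only. The same measures are computed twice: once on the whole population (parameters) and once on a sample drawn from it (statistics).

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of some group (made-up data).
random.seed(42)
population = [random.randint(18, 65) for _ in range(1000)]

# Parameters describe the whole population.
population_mean = statistics.mean(population)
population_sd = statistics.pstdev(population)   # population standard deviation

# A simple random sample of 50 subjects, drawn without replacement.
sample = random.sample(population, 50)

# Statistics describe the sample and serve as estimates of the parameters.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)            # sample standard deviation (n - 1)

print(f"parameter: mean = {population_mean:.2f}, sd = {population_sd:.2f}")
print(f"statistic: mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
```

The sample values will usually be close to, but not equal to, the population values. That gap is what inferential statistics has to account for.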
What is a variable
A variable is a characteristic or attribute that can assume
different values.

Data are the values (measurements or observations) that the
variables can assume. Variables whose values are determined
by chance are called random variables.
Variables and Types of Data:
Qualitative variables/Categorical variables are variables that
can be placed into distinct categories, according to some
characteristic or attribute. For example, if subjects are
classified according to gender (male or female), then the
variable gender is qualitative.
Quantitative variables/Numerical Variables are numerical and
can be ordered or ranked. For example, the variable age is
numerical, and people can be ranked in order according to the
value of their ages.
Quantitative Variables:
Discrete variables assume values that can be counted.

Continuous variables, by comparison, can assume an infinite
number of values in an interval between any two specific
values.
The classification of variables can be
summarized as follows:
• Qualitative (categorical) variables
• Quantitative (numerical) variables: discrete or continuous
Scales
Scales for Qualitative Variables.

Besides being classified as either qualitative or quantitative,
variables can be described according to the scale on which
they are defined. The scale of the variable gives certain
structure to the variable and also defines the meaning of the
variable.
If the categories of a qualitative variable are unordered, or
have no natural ordering, then the qualitative variable is said
to be defined on a nominal scale, the word nominal referring
to the fact that the categories are merely names. If the
categories can be put in order, the scale is called an ordinal
scale.
Scales for Quantitative Variables.

If one can compare the differences between measurements of
the variable meaningfully, but not the ratio of the
measurements, then the quantitative variable is defined on an
interval scale. If, on the other hand, one can compare both the
differences between measurements of the variable and the
ratio of the measurements meaningfully, then the quantitative
variable is defined on a ratio scale.
In order for the ratio of the measurements to be meaningful,
the variable must have a natural, meaningful absolute zero point.
For example, temperature measured on the Centigrade (Celsius)
scale is an interval variable and the height of a person is a ratio
variable.
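A short worked example may make the role of the zero point clearer. The temperatures and heights below are made-up values. The sketch shows that a ratio of Celsius readings changes when the same temperatures are expressed in Fahrenheit, while a ratio of heights is the same in any unit.

```python
# Temperature in degrees Celsius: interval scale (0 °C is an arbitrary zero).
t1_c, t2_c = 10.0, 20.0
difference_c = t2_c - t1_c          # meaningful: 10 degrees warmer
ratio_c = t2_c / t1_c               # 2.0, but this does NOT mean "twice as hot"

# The same two temperatures in degrees Fahrenheit.
t1_f = t1_c * 9 / 5 + 32            # 50.0
t2_f = t2_c * 9 / 5 + 32            # 68.0
ratio_f = t2_f / t1_f               # 1.36: the ratio depends on the unit

# Height: ratio scale (0 cm really means no height).
h1_cm, h2_cm = 90.0, 180.0
ratio_cm = h2_cm / h1_cm                    # 2.0
ratio_m = (h2_cm / 100) / (h1_cm / 100)     # still 2.0 in metres

print(ratio_c, ratio_f, ratio_cm, ratio_m)
```

Because the Celsius ratio is not preserved under a change of unit, only differences are meaningful on that scale. The height ratio is preserved, so "twice as tall" is a meaningful statement.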
Examples of Measurement Scales
Data Collection and Sampling Techniques
Sometimes it is possible and practical to examine every person
or item in the population we wish to describe. We call this a
complete enumeration, or census. We use sampling when it is
not possible to measure every item in the population.
Data can be collected in a variety of ways. One of the most
common methods is through the use of surveys. Surveys can
be done by using a variety of methods.
1. Telephone surveys
2. Mailed questionnaire surveys
3. Personal interview surveys
Data can also be collected in other ways, such as
surveying records or direct observation of situations.
Sources of Data
• Primary data are raw data (data that have not been processed
or tailored) that have just been collected from the source and
have not undergone any kind of statistical treatment such as
sorting and tabulation. The term primary data may
sometimes be used to refer to first-hand information.
• Secondary data are data that have already been collected by
someone else and may have been sorted, tabulated or otherwise
statistically treated. They are processed or tailored data.
Sources of Primary Data
• Personal Investigation
• Through Investigators
• Through Questionnaire
• Through Local Sources
• Through Telephone
• Through Internet
Sources of Secondary Data
• Government Organizations
• Semi-Government Organizations
• Teaching and Research Organizations
• Research Journals and Newspapers
• Internet
Observational and Experimental Studies
There are several different ways to classify statistical studies.
This section explains two types of studies: observational
studies and experimental studies.

In an observational study, the researcher merely observes what
is happening or what has happened in the past and tries to
draw conclusions based on these observations.
For example, data from the Motorcycle Industry Council (USA
TODAY) stated that
“Motorcycle owners are getting older and richer.” Data were
collected on the ages and incomes of motorcycle owners for
the years 1980 and 1998 and then compared. The findings
showed considerable differences in the ages and incomes of
motorcycle owners for the two years. In this study, the
researcher merely observed what had happened to the
motorcycle owners over a period of time. There was no type of
research intervention.
In an experimental study, the researcher manipulates one of the
variables and tries to determine how the manipulation
influences other variables.
For example, a study conducted at Virginia Polytechnic Institute
and presented in Psychology Today divided female undergraduate
students into two groups and had the students perform as many sit-
ups as possible in 90 sec. The first group was told only to “Do your
best,” while the second group was told to try to increase the actual
number of sit-ups done each day by 10%. After four days, the
subjects in the group who were given the vague instructions to “Do
your best” averaged 43 sit-ups, while the group that was given the
more specific instructions to increase the number of sit-ups by 10%
averaged 56 sit-ups by the last day’s session. The conclusion then
was that athletes who were given specific goals performed better
than those who were not given specific goals.
Treatment/Control Group
In the sit-up study, there were two groups. The group that
received the special instruction is called the treatment group
while the other is called the control group. The treatment
group receives a specific treatment (in this case, instructions
for improvement) while the control group does not.
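The source does not say how the students were divided into the two groups. As a minimal sketch, assuming random assignment (a common choice in experimental studies), subjects could be split as follows. The subject identifiers and the group size of 20 are hypothetical.

```python
import random

# Hypothetical subject identifiers (not taken from the study described above).
subjects = [f"S{i:02d}" for i in range(1, 21)]

rng = random.Random(0)        # fixed seed so the split is reproducible
shuffled = subjects[:]        # copy, so the original list is left unchanged
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment_group = shuffled[:half]   # would receive the specific instructions
control_group = shuffled[half:]     # would receive no special instructions

print("treatment:", sorted(treatment_group))
print("control:  ", sorted(control_group))
```

Shuffling before splitting gives every subject the same chance of landing in either group, so the groups should differ only by chance before the treatment is applied.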
Independent/Dependent Variables
The independent variable in an experimental study is the one
that is being manipulated by the researcher. The independent
variable is also called the explanatory variable. The resultant
variable is called the dependent variable or the outcome
variable.
Advantages/Disadvantages (Experimental)
Advantages:
• The researcher can decide how to select subjects and how to
assign them to specific groups.
• The researcher can also control or manipulate the independent
variable.
Disadvantages:
• They may occur in unnatural settings, such as laboratories and
special classrooms.
• The Hawthorne effect: subjects who know they are taking part
in an experiment may change their behavior.
• Confounding variables: a confounding variable is one that
influences the dependent or outcome variable but was not
separated from the independent variable.
Advantages/Disadvantages (Observational)
Advantages:
• It usually occurs in a natural setting.
• It can be done in situations where it would be unethical or
downright dangerous to conduct an experiment.
• It can be done using variables that cannot be manipulated by
the researcher.
Disadvantages:
• A definite cause-and-effect relationship cannot be shown.
• It can be expensive and time-consuming.
• It often studies events that occurred in the past, so the
researcher must rely on data recorded by others.
Data Presentation Methods
1. Textual Method
2. Tabular Method
3. Semi-tabular Method
4. Graphical Method
Textual Method
The most common way of presentation of data is in the form of
statements. This works best for simple observations, such as:
"When viewed by light microscopy, all of the cells appeared
dead." When data are more quantitative, such as "7 out of 10
cells were dead", a table is the preferred form.
Tabular Method
Provides a more precise, systematic and orderly presentation
of data in rows or columns. Should be used for small datasets
for comparison.
Semi-tabular Method
Combines the textual and tabular methods: a few figures are set
out in table form within the body of the text.
Graphical Method
The use of graphs is the most effective method of visually
presenting statistical results or findings. Graphs are commonly
used scientific illustrations, but there should be a good reason
for using a graph rather than a table. Usually they are employed
to show the functional relationship between dependent and
independent continuous variables.
Types of Data Display
1. Bar Charts
2. Histograms/Frequency Polygon/Ogive
3. Line Graphs
4. Scatter Graphs
5. Box Plots
6. Pie Charts
7. Pictograms
1. Bar Chart
The Bar Chart (or Bar Graph) is one of the most common
ways of displaying qualitative and discrete data. Bar Graphs
consist of 2 variables, one independent and one dependent,
arranged on the horizontal and vertical axes of a graph.

The graph may be orientated in several ways. There is
some space between the individual bars in each of the graphs.
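As a rough illustration of what a bar chart summarizes, the sketch below counts the frequency of each category and prints a text-only bar for each. The blood-type data are made up, and the "#" bars merely stand in for a drawn chart.

```python
from collections import Counter

# Hypothetical qualitative data: blood types of 20 subjects (made-up values).
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "O", "A", "B",
               "O", "A", "O", "AB", "B", "O", "A", "O", "A", "O"]

# Frequency of each category: this is the height of each bar.
frequencies = Counter(blood_types)

# One bar per category, separated from its neighbours.
for category in sorted(frequencies):
    count = frequencies[category]
    print(f"{category:>2} | {'#' * count} {count}")
```

The categories form one axis and the counts the other. The order of the bars is arbitrary because the categories have no natural ordering.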
2. Histogram
A histogram is similar to a bar chart. However, histograms
are used for continuous data and have no gaps between
columns.
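The sketch below shows how continuous data might be grouped into adjoining classes before a histogram is drawn. The heights are made-up values, and the class width of 10 cm and the starting point of 150 cm are arbitrary assumptions.

```python
# Hypothetical continuous data: heights in cm (made-up values).
heights = [151.2, 158.4, 160.0, 162.3, 164.9, 165.0, 167.8, 169.5,
           170.1, 171.6, 173.3, 174.8, 176.2, 179.9, 183.5]

# Classes [150, 160), [160, 170), [170, 180), [180, 190).
low, width, n_bins = 150.0, 10.0, 4

counts = [0] * n_bins
for h in heights:
    index = int((h - low) // width)   # which class the value falls into
    counts[index] += 1

for i, count in enumerate(counts):
    start = low + i * width
    print(f"[{start:.0f}, {start + width:.0f}) | {'#' * count} {count}")
```

Each class ends exactly where the next begins, which is why the columns of a histogram touch.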
3. Line Graph/Time Series
A graph of ordered pairs, (x,y), where the points are
connected, in order, by a line segment. Good for comparing
one set of values to another and for displaying trends.
4. Scatter Plot/Scattergram
A scatter plot is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable x and the
dependent variable y.
It provides a visual representation of the data, allows us to
look for any trends or patterns, and is used to represent
numerical data graphically.
5. Box Plot/Box-and-Whisker Plot

It is a standardized way of displaying the distribution of data
based on the five-number summary: minimum, first quartile,
median, third quartile, and maximum.
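The five-number summary can be computed with Python's standard library, as sketched below on a made-up data set. Several conventions exist for computing quartiles. The "inclusive" method used here is only one of them, and others may give slightly different values.

```python
import statistics

# Hypothetical data set (made-up values), listed in ascending order.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is an assumption; other quartile conventions exist.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)   # (2, 4.0, 7.0, 9.0, 12)
```

In a box plot, the box spans the first to the third quartile, a line marks the median, and the whiskers extend toward the minimum and maximum.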
6. Pie Chart

It is a circle which is divided into segments called sectors
that each represent a proportion of the whole.
A pie chart shows data as a percentage of the whole.
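The arithmetic behind a pie chart can be sketched as follows. The category counts are made up. Each sector's percentage is its share of the total, and its angle is the same share of 360 degrees.

```python
# Hypothetical category counts: how 50 students travel to class (made-up data).
counts = {"Bus": 20, "Car": 15, "Walk": 10, "Bicycle": 5}
total = sum(counts.values())

sectors = {}
for category, count in counts.items():
    percentage = 100 * count / total   # share of the whole, in percent
    angle = 360 * count / total        # size of the sector, in degrees
    sectors[category] = (percentage, angle)
    print(f"{category:<8} {percentage:5.1f}%  {angle:6.1f} degrees")
```

The percentages always add up to 100 and the angles to 360, which is a quick check that no category has been left out.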
Figure: Percentages of U.S. Population by Race, 2000
Figure: Numbers of native English speakers in the major
English-speaking countries of the world
7. Pictogram

A pictogram is simply a picture that
conveys some statistical information.
Misleading Graphs
QUESTIONS??
