Statistical Analysis using SPSS Lesson 1

Lesson 1

Introduction to Statistics

What is statistics?
Why is statistics needed?
Population and sample
Introduction to SPSS Windows
Basic steps in data analysis using SPSS

What is Statistics?

Statistics is the science of learning from data. It includes everything from planning for the
collection of data and subsequent data management to end-of-the-line activities such as
drawing conclusions from numerical facts and presenting the results. Statistics combines
mathematical theories with prevailing knowledge in different areas of sciences resulting
in the advancement of broader scientific understanding. Today, statistics has become an
important tool in the work of many disciplines such as medicine, psychology, education,
sociology, engineering and physics, just to name a few.

A famous statistician once remarked, "Statistics is concerned with one of the most basic
human needs: the need to know about the world and how it operates in the face of
variation and uncertainty." We can say the whole field of statistics revolves around this
concept: variation.

Variation is a concept that is used to quantify the differences in a characteristic of interest
between different members in a population. For example, birth weights of full-term
newborn babies are never the same. Some will be heavier than others, some will be
lighter than others, but the majority will weigh around 3.5 kg. In this example, the
characteristic of interest is the weight of newborn babies and it is a variable. The
quantification of the differences in birth weight is known as variation. A researcher may
want to know why newborns differ in weight from one another. Are the differences due to
differences in the mother's age, weight, habits, education, etc.? These are some
of the questions that need to be answered. In this case, statistical methods can be used to
design the study and to collect and analyze the data. The statistical principles can then be used
to understand more about the differences in the birth weights.
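
As a rough illustration of how variation can be quantified, the sketch below computes the mean, range and standard deviation of a small set of birth weights. The weights are made-up values chosen for illustration, not data from this lesson, and the standard deviation is only one of several possible measures of variation.

```python
import statistics

# Hypothetical birth weights in kg (illustrative values only).
birth_weights_kg = [3.1, 3.4, 3.5, 3.6, 3.9, 3.5, 2.8, 4.0]

# A typical value for the group.
mean_weight = statistics.mean(birth_weights_kg)

# Two simple ways to quantify how much the babies differ from one another.
weight_range = max(birth_weights_kg) - min(birth_weights_kg)
std_dev = statistics.stdev(birth_weights_kg)  # sample standard deviation

print(f"mean = {mean_weight:.2f} kg")
print(f"range = {weight_range:.2f} kg")
print(f"standard deviation = {std_dev:.2f} kg")
```

If every baby weighed exactly the same, both the range and the standard deviation would be zero, and weight would be a constant rather than a variable.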

Why is Statistics Needed?

Data can be captured and stored in many ways. However, application of proper statistical
procedures is vital in the planning and data collection process to ensure the data collected
are reflective of the population at large. Once data are collected, we study the data. It is
important to remember that data are just crude information and not knowledge. When the
data are explored and the numbers are crunched, we obtain information. This division or
area of statistics is known as descriptive statistics. Many procedures are available to help
describe the data. The deeper we explore the data, the more we understand about the data
at hand. Care must be taken to understand the type of data that is being analyzed so that
appropriate descriptive procedures can be used.

The information gained through performing descriptive analyses on a particular variable
may give some insight into the characteristic of interest. Note that this is only a sample
characteristic, also known as a statistic (without the ending "s" in the word "statistics").
But the main purpose of any study is not just to describe the sample, but to make
inferences about the population at large. The term parameter is used to refer to the
characteristic of interest in the population. Again, information obtained from a single
sample or at a single time point may or may not be a true reflection of the universe. Some
logical linkage must be made between the sample statistic and the population parameter.
This division or area of statistics is known as inferential statistics, which is discussed in
chapter 5. This is a wide area and involves the application of probability and statistical
theory, especially in the underlying statistical distributions.

Information that is tested and retested with different samples and at different time points
becomes fact if the data consistently support it. Finally, facts become knowledge
when they are used in the successful completion of the decision process. The whole
sequence is known as the data-driven decision-making process. The figure below shows that the level
of statistical methods or procedures needed for a study depends on the desired level of
improvement in decision making.

[Figure: Data-driven decision-making process. Labels: Level of Knowledge; Data; Level of improvement in decision making.]

Population and Sample

Population: A set of things or objects in which we have an interest at the particular time.
Examples: Workers at a factory, students in a college, in-patients at a hospital.

Sample: A subset of the population
Examples: A group of workers at the factory, a selection of students from the college.

Several sampling methods can be used to obtain a sample from the population. They can
be classified under two broad categories: probability sampling and non-probability
sampling. In probability sampling, we require a sample frame, that is, a list of all the items or
objects in the population. Within probability sampling there are a few types of sampling
procedures; the most basic one is simple random sampling. All analyses in SPSS assume that
the data are collected using this simple random sampling procedure.
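
To make the link between a sample frame, a simple random sample, a statistic and a parameter concrete, here is a minimal sketch in Python (not SPSS). The population is simulated, so its size, mean and spread are assumptions made purely for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical sample frame: one simulated birth weight (kg) per newborn.
population = [round(random.gauss(3.5, 0.5), 2) for _ in range(1000)]

# Simple random sample without replacement: every item in the frame
# has the same chance of being selected.
sample = random.sample(population, k=50)

population_mean = statistics.mean(population)  # parameter (usually unknown)
sample_mean = statistics.mean(sample)          # statistic (computed from the sample)

print(f"population mean (parameter): {population_mean:.3f}")
print(f"sample mean (statistic):     {sample_mean:.3f}")
```

In a real study the population mean is not available; the sample mean is used to make an inference about it.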


A variable can be defined as a characteristic of things or objects that takes different values
in different items that are tested. The opposite of a variable is a constant.

For example, the weight of newborn babies varies from one to another. So, the weight of
newborn babies is considered to be a variable. The gender of the babies also differs from
one to another. So, gender is a variable too. The whole field of statistics revolves around
this term variable and the hundreds of statistical tools we have are concerned with
describing these variables and finding associations between them.

There are two types of variables:

Qualitative variable: This is a phrase used to describe characteristics that cannot
be measured or counted, but merely categorized like race, sex, colour, exam
grades and blood group.

Quantitative variable: This is a phrase used to describe measurable characteristics
like height, weight, age and exam marks, and counts like number of passes,
number of students and number of accidents.


Measurement is the assignment of numbers to represent a characteristic. It is useful to
clarify what is being measured and what it measures. For example, the clinical
thermometer measures the body temperature. But what does the body temperature
measure or indicate? Perhaps, the body temperature is an indicator of the presence of
bacterial or viral infection.

The units of measurement are equally important for computation and inference
purposes. For example, consider an increase in body temperature. An increase of 3 °F
may not be a cause for concern, but an increase of 3 °C may be critical.
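
The difference between the two units can be checked with a little arithmetic. A temperature change (as opposed to a thermometer reading) converts with the factor 5/9 only, without the 32-degree offset. The helper names below are hypothetical, introduced just for this sketch.

```python
def fahrenheit_change_to_celsius(delta_f: float) -> float:
    """Convert a temperature difference (not a reading) from °F to °C."""
    return delta_f * 5.0 / 9.0


def celsius_change_to_fahrenheit(delta_c: float) -> float:
    """Convert a temperature difference (not a reading) from °C to °F."""
    return delta_c * 9.0 / 5.0


# The same number, 3, means very different things in the two units.
print(f"A rise of 3 °F is a rise of {fahrenheit_change_to_celsius(3):.2f} °C")
print(f"A rise of 3 °C is a rise of {celsius_change_to_fahrenheit(3):.2f} °F")
```

A rise of 3 °C is nearly twice as large as a rise of 3 °F, which is why the unit must always be recorded with the value.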

Concepts and Indicators

A concept is what we are hoping to capture and indicators are what we use to capture it.
Say, a doctor wants to establish the health status of a group of workers. The health status
is a concept. Since health status varies from person to person, it can be considered to be a
variable too. However, health status is not directly measurable: it is an
unobserved measure, often called a latent variable. Then, how do we measure the
unobserved? First, we need to identify some reliable indicators of health status. In
healthcare, variables like weight, blood pressure, cholesterol and blood sugar levels are often
used as indicators of health status. These are measurable variables and their units of
measurement are different too. If a person's blood pressure is always high, that person is said to be
in poor health. Blood pressure is also known to have high levels of association with
cholesterol level, blood sugar level and weight. A person who has values in the normal
range for all of these measures is said to be healthy.
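
The rule in the last sentence can be sketched in a few lines of Python. The indicator names, units and "normal" ranges below are placeholders invented for illustration; they are not clinical reference values and do not come from this lesson.

```python
# Hypothetical normal ranges (low, high) for each indicator.
# Placeholder numbers for illustration only, not clinical guidance.
NORMAL_RANGES = {
    "weight_kg": (50.0, 80.0),
    "systolic_bp_mmhg": (90.0, 120.0),
    "cholesterol_mmol_l": (3.0, 5.2),
    "blood_sugar_mmol_l": (4.0, 6.0),
}


def is_healthy(indicators: dict) -> bool:
    """Return True only if every indicator lies inside its normal range."""
    return all(
        low <= indicators[name] <= high
        for name, (low, high) in NORMAL_RANGES.items()
    )


worker_a = {"weight_kg": 68.0, "systolic_bp_mmhg": 115.0,
            "cholesterol_mmol_l": 4.8, "blood_sugar_mmol_l": 5.1}
worker_b = {"weight_kg": 72.0, "systolic_bp_mmhg": 150.0,
            "cholesterol_mmol_l": 4.9, "blood_sugar_mmol_l": 5.3}

print("worker A healthy:", is_healthy(worker_a))
print("worker B healthy:", is_healthy(worker_b))
```

The latent variable (health status) is never measured directly; it is inferred from the observed indicators.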


Data can be considered as the raw material of statistics. The information gathered, the facts
tested and ultimately the knowledge gained depend heavily on the quality of the data
collected. Therefore, considerable attention must be paid to the data collection stage.

Data can be obtained from either primary or secondary sources. Data compiled from
sources like records, journals and archives are called secondary data, while data collected
directly by the researcher through designed experiments or surveys are called primary data.

Types of data

1. Qualitative data can be classified further into nominal data and ordinal data.

Nominal data are categorical characteristics that you can name.
Examples: Gender: Male or female based on physical traits.
Blood group: A, B, AB or O based on allele types.
Of course, it is not true that group A is better than group B.
They are just names given based on particular characteristics.

Ordinal data are categorical characteristics that you can name and rank as well.
Examples: Socio-economic status: Low, middle or high.
Exam grades: A, B, C, D or E based on level of achievement.
Of course, grade A is better than grade B and so on.

2. Quantitative data can be classified into discrete data and continuous data.

Discrete data are numerical characteristics that are countable (whole numbers).
Examples: Number of males and number of females
Number of patients waiting for surgery
Number of students sitting for an exam

Continuous data are numerical characteristics that are measurable.
Examples: Marks obtained by students
Body mass index (BMI) of patients
Time taken by athletes to complete a road race
Since continuous data are measurable, they can be measured in decimals.

It is very important to understand the different types of data so that they can be described
and presented in an appropriate manner. For example, it does not make sense to compute an
average of gender for a group of males and females. In this case the information is best stated in the
form of percentages. Variables like weight and height are best described using averages
and percentiles. For visual data presentations, bar charts should be used for qualitative
data and histograms should be used for quantitative data. The underlying distributions
also differ for different data types. In making inferences, the choice of statistical tests
depends on the type of data.
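
The point about matching the summary to the data type can be sketched as follows, using Python's standard library rather than SPSS. The gender codes and weights are made-up values for illustration.

```python
from collections import Counter
import statistics

# Hypothetical data for eight people (illustrative values only).
genders = ["M", "F", "F", "M", "F", "F", "M", "F"]             # qualitative (nominal)
weights_kg = [70.5, 58.2, 61.0, 82.3, 55.4, 64.8, 77.1, 60.0]  # quantitative (continuous)

# Qualitative data: summarize with counts and percentages.
counts = Counter(genders)
percentages = {category: 100 * n / len(genders) for category, n in counts.items()}

# Quantitative data: summarize with an average and percentiles.
mean_weight = statistics.mean(weights_kg)
quartiles = statistics.quantiles(weights_kg, n=4)  # 25th, 50th and 75th percentiles

print("gender percentages:", percentages)
print(f"mean weight: {mean_weight:.2f} kg")
print("weight quartiles:", [round(q, 2) for q in quartiles])
```

Taking the mean of the gender column would be meaningless, while reporting weight only as percentages of categories would throw information away.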

Introduction to SPSS Windows

Statistical Package for the Social Sciences (SPSS) for Windows provides a powerful
statistical analysis and data management system in a graphical environment, using
descriptive menus and simple dialog boxes to do most of the work for you. Most tasks
can be accomplished simply by pointing and clicking the mouse.

In addition to the simple point-and-click interface for statistical analysis, SPSS for
Windows provides:

Data Editor. A versatile spreadsheet-like system for defining, entering, editing, and
displaying data.

Viewer. The Viewer makes it easy to browse your results, selectively show and hide
output, change the display order of results, and move presentation-quality tables and charts
between SPSS and other applications.

Multidimensional pivot tables. Results come alive with multidimensional pivot tables.
Explore your tables by rearranging rows, columns, and layers. Uncover important
findings that can get lost in standard reports. Compare groups easily by splitting your
table so that only one group is displayed at a time.

High-resolution graphics. High-resolution, full-color pie charts, bar charts, histograms,
scatterplots, 3-D graphics, and more are included as standard features in SPSS.

Database access. Retrieve information from databases by using the Database Wizard
instead of complicated SQL queries.

Data transformations. Transformation features help get your data ready for analysis. You
can easily subset data, combine categories, add, aggregate, merge, split, and transpose
files, and more.

Electronic distribution. Send e-mail reports to others with the click of a button, or export
tables and charts in HTML format for Internet and Intranet distribution.

Online Help. Detailed tutorials provide a comprehensive overview; context-sensitive Help
topics in dialog boxes guide you through specific tasks; pop-up definitions in pivot table
results explain statistical terms; the Statistics Coach helps you find the procedures that
you need; and Case Studies provide hands-on examples of how to use statistical
procedures and interpret the results.

Command language. Although most tasks can be accomplished with simple point-and-
click gestures, SPSS also provides a powerful command language that allows you to save
and automate many common tasks. The command language also provides some
functionality not found in the menus and dialog boxes.

SPSS Windows

SPSS for Windows is organized around the following windows and tools:

SPSS Data Editor. A versatile spreadsheet-like system for defining, entering, editing,
and displaying data.

SPSS Viewer. The new Output Navigator makes it easy to browse your results,
selectively show and hide output, change the display order of results, and move
presentation-quality tables and charts between SPSS and other applications.

SPSS Chart Editor. Helps you edit charts. You can change the pattern, color, style, and
labels of the graphs. You can also modify, rotate or swap the axes.

SPSS Syntax Editor. This can be used to save, view, modify and rewrite the syntax.

Help. A comprehensive overview of SPSS basics is also available in the online tutorial
under the Help menu. The meanings of the statistical terms can also be obtained by
double-clicking on the terms themselves.

Basic steps in Statistical Data Analysis Using SPSS

The four basic steps in data analysis using SPSS are summarized below.

Step 1: Get your data into the SPSS Data Editor
This can be done by:
   directly entering the data in the Data Editor;
   opening a previously saved SPSS file; or
   reading a spreadsheet or text data file.

Step 2: Select a procedure from the menu
This depends on the objective of the study:
   a Graph procedure to create a chart;
   an Analyze procedure to perform statistical analysis.

Step 3: Select a variable for the analysis
Make sure the procedure is appropriate for the ...
All the variables in the data file are displayed in a dialog box.
Just highlight the variable(s) and click them into the respective boxes.

Step 4: Run and examine the results
Run the procedure by clicking OK.
Results are displayed in the Output Viewer. The output can be:
   a chart,
   descriptive statistics, or
   inferential statistics.

Based on the output, draw conclusions accordingly.

The four steps in data analysis
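
For readers who think in code, the same four steps can be mirrored in a short Python sketch. This is an analogy only: it is not SPSS syntax, the inline data stand in for a spreadsheet or text file, and the `describe` helper is a hypothetical name introduced here.

```python
import csv
import io
import statistics

# Step 1: get the data in. The inline text stands in for a
# spreadsheet/text data file (made-up values for illustration).
RAW_DATA = """id,gender,weight_kg
1,F,3.2
2,M,3.6
3,F,3.4
4,M,3.8
"""
rows = list(csv.DictReader(io.StringIO(RAW_DATA)))


# Step 2: select a procedure. Here, a simple descriptive summary.
def describe(values):
    """Return the count, mean and sample standard deviation of the values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Step 3: select the variable for the analysis.
weights = [float(row["weight_kg"]) for row in rows]

# Step 4: run the procedure and examine the results.
results = describe(weights)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In SPSS each of these steps is carried out through the menus and dialog boxes rather than by writing code.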