
BIOSTATISTICS

USING (IBM-SPSS)
Dr. Mahmoud Gabr

mahgabr@alexu.edu.eg
mgabr@zewailcity.edu.eg

(2016)

Biostatistics Using SPSS Prof Mahmoud Gabr
The origins of SPSS
In 1968, Nie, N.H., Hull, H., and
Bent D.H., three young men from
disparate professional backgrounds,
developed a software system based
on the idea of using statistics to
turn raw data into information
essential to decision-making.
Nie, a social scientist and Stanford
doctoral candidate, represented the target
audience and set the requirements;
Bent, a Stanford University doctoral
candidate in operations research,
had the analysis expertise and designed the
SPSS system file structure; and
Hull, who had recently graduated
from Stanford with a master of
business administration degree,
programmed.

This revolutionary statistical
software system was called SPSS,
which stood for the Statistical
Package for the Social Sciences.
The name was later also expanded as
Statistical Product and Service
Solutions.

♥Today: SPSS is recognized as
a leader in the predictive
analytics market space.
♥Predictive analytics combines
advanced analytics and decision
optimization.

♥Most Important features:
•SPSS 10 (1999) included two views:
•“Data View” and “Variable View”.

♥Most Important features:
•SPSS 13 added the ability to open more
than one data file at the same time.
•SPSS 14 added guided construction
of graphs using the “Chart Builder”.

♥Most Important features:
•SPSS 16 (2007) added “Neural
Networks” and removed “Maps”.

♥Most Important features:
•SPSS 17 (2008) added the
“Dictionary” and “Nearest
Neighbor”, and replaced “Time
Series” with “Forecasting”.

♥Most Important features:
•IBM took over SPSS in 2009, and
SPSS 18 was released under the name PASW
(Predictive Analytics SoftWare) 18.
•PASW 18 added “Direct Marketing”,
“Artificial Intelligence for using
Nonparametric” and the “Bootstrap”.

♥Most Important features:
•In 2010 IBM renamed SPSS as
IBM-SPSS 19.
•IBM-SPSS 19 added “Automatic
Linear Modeling”.

Descriptive Statistics: Methods of collecting,
organizing, summarizing and presenting
data in an informative way.
Inferential Statistics: Methods of reaching a decision, estimate,
prediction, or generalization about a
population based on a sample.

A population is the complete collection of all
possible elements (individuals, objects, scores,
people, measurements, and so on) to be studied.
A census is the collection of data from every
element in a population.
A sample is a portion or part of the population of
interest.
For example, a public health center might commission a
survey of 1,000 people to estimate the proportion of
the population smoking for medical purposes. The
1,000 people constitute the sample, and all of us
constitute the population. Every 10 years the
government tries to obtain a census, but fails because it
is impossible to reach everyone.

A parameter is a numerical measurement describing
some characteristic of a population.
A statistic is a numerical measurement describing
some characteristic of a sample.
Population → Parameter
Sample → Statistic
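The difference between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population of 100,000 people and the 20% rate are made-up assumptions, not data from the course.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = has the characteristic, 0 = does not.
# The 20% rate is invented purely for illustration.
population = [1 if random.random() < 0.20 else 0 for _ in range(100_000)]

# Parameter: computed from every element of the population (a census).
parameter = sum(population) / len(population)

# Statistic: computed from a sample of 1,000 people drawn without replacement.
sample = random.sample(population, 1_000)
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

The two numbers are close but not identical; the statistic changes from sample to sample, while the parameter is fixed.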

Variables

Quantitative data consist of numbers representing
counts or measurements (information is reported
numerically).
Examples: number of patients in a hospital (discrete),
the temperature of a patient (continuous)

Qualitative (or categorical or attribute) data
can be separated into different categories that are
distinguished by some nonnumeric characteristic
(the characteristic being studied is nonnumeric).
Examples: gender, religious affiliation, governorate of
birth, eye color, ...

Variables & levels of measurement

Discrete data can take only specific values (the number of possible values
is either finite or countable), e.g. 0, 1, 2, 3, . . .
Continuous (numerical) data can take any value in a given range: there are
infinitely many possible values, corresponding to a continuous scale that
covers a range of values without gaps, interruptions, or jumps. For example,
the amount of milk that a cow produces could be 2.3415 gallons a day.

Sampling Techniques
1. Simple random sampling: A sample is selected such
that every element in the population has an equal
chance of being chosen. No particular subject is
systematically excluded from the study, or more
likely to be included than others.
2. Systematic sampling is one of the most practical
methods of sampling, in which every kth item in the
sampling frame is selected. For instance, every 20th
name on a list, every 10th house on one side of a
street, or every 55th item of a production lot is
selected. BE AWARE of hidden periodicity! For example,
every 10th house could be a corner house.
3. Cluster sampling: The population is divided into
clusters using naturally occurring geographic or
other boundaries. Then clusters are randomly
selected, and a sample is collected by randomly
selecting from each chosen cluster.

4. Stratified sampling is a procedure that consists
of stratifying (or dividing) the population into a
number of non-overlapping sub-populations, or
strata, and then taking a sample from each stratum.
If the samples selected from each stratum are
simple random samples, the procedure is
called stratified (simple) random sampling.
Another way of selecting samples from strata is by
proportional allocation, which means that the sizes of
the samples from the different strata are proportional to
the sizes of the strata. For example, suppose a population of size
N=1000 is stratified according to social class
into three strata of sizes N1=200, N2=500, and N3=300, and a
sample of size n=100 is to be selected using proportional
allocation. Then the sample size to be selected from the
first stratum is n1=(N1/N)×n=(200/1000)×100=20.
Similarly, the sample sizes for the second and third
strata are, respectively, n2=50 and n3=30. Note that
n1+n2+n3=n. Other types of allocation are discussed
in many textbooks on statistical sampling.
Stratum            1     2     3     Total
Population size    200   500   300   1000
Sample size        20    50    30    100
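Systematic sampling and proportional allocation can be sketched in a few lines of Python. This is an illustrative sketch, not SPSS functionality: the function names and the population of unit IDs 0 to 999 are assumptions made for the example.

```python
import random

def systematic_sample(frame, k, start=None):
    """Select every k-th item from the sampling frame, beginning at a random start."""
    if start is None:
        start = random.randrange(k)  # random start in 0..k-1
    return frame[start::k]

def proportional_allocation(stratum_sizes, n):
    """Return n_i = (N_i / N) * n for each stratum, rounded to the nearest integer.

    With awkward sizes the rounded values may not add up to n exactly.
    """
    total = sum(stratum_sizes)
    return [round(size / total * n) for size in stratum_sizes]

def stratified_sample(strata, n):
    """Draw a simple random sample from each stratum using proportional allocation."""
    sizes = proportional_allocation([len(s) for s in strata], n)
    return [random.sample(stratum, size) for stratum, size in zip(strata, sizes)]

random.seed(1)
# Hypothetical population of N = 1000 unit IDs split into strata of 200, 500 and 300.
units = list(range(1000))
strata = [units[:200], units[200:700], units[700:]]

print(proportional_allocation([200, 500, 300], 100))    # [20, 50, 30]
print([len(s) for s in stratified_sample(strata, 100)])
print(len(systematic_sample(units, 20)))                 # every 20th unit
```

The first printed line reproduces the sample sizes in the table above.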

Getting started with SPSS
The first time you run SPSS, the following window will
appear. You can choose one of the options given
there, e.g. “Type in data”, but in the meantime click

To start data entry

You can open another type of file, such as SPSS Output
or Syntax

The main window in the SPSS program has two views:

Data View
This is used for data entry. It consists of columns
(variables) and rows (cases).

Variable View
This is used for variable attributes. It consists of
columns (variable characteristics) and rows (variables).
It’s better to begin with this view to define your
variables and then start data entry.

Variable View
In Variable View there are 10 columns that characterize the variables.
Column1: Name
“Name” stands for the variable name. Suppose we have 3 variables: age,
gender, and education. In Variable View, in the “Name” column, write “age”,
“gender”, and “edu”.
There are some constraints on variable names in SPSS:
1. A variable name can’t be more than 64 characters long.
2. The name should begin with a letter and shouldn’t end with a
period “.”.
3. The name can’t contain special characters such as (*, +, -, \, (,
%, ^, or &), but it can contain (@, #, $ or .).
4. The name can’t contain a space, as in ‘age 1’, but
you can use the underscore, as in ‘age_1’.
5. The variable name shouldn’t be a reserved word such as (EQ, ALL,
WITH, …).
6. Each variable name must be unique; duplication is not allowed.
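The naming rules above can be expressed as a small checker. This is a sketch only, not part of SPSS: the function name is hypothetical, the full reserved-word list is recalled from the SPSS documentation and should be treated as an assumption (the slide names only EQ, ALL and WITH), and comparing names without regard to case is also an assumption.

```python
import re

# Assumed reserved keywords; only EQ, ALL and WITH are named in the slide.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, or the characters @ # $ _ .
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#$_.]*")

def check_variable_name(name, existing=()):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > 64:
        problems.append("longer than 64 characters")
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only letters, digits, @ # $ _ .")
    if name.endswith("."):
        problems.append("ends with a period")
    if name.upper() in RESERVED:
        problems.append("reserved word")
    if name.lower() in {e.lower() for e in existing}:
        problems.append("duplicate name")
    return problems

for name in ["age", "age_1", "age 1", "edu.", "With"]:
    print(repr(name), check_variable_name(name, existing=["gender"]))
```

The pattern accepts only unaccented Latin letters, which is a simplification made for this sketch.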

Column2: Type
The variable type indicates the way you will enter the data. In Variable
View, click on “Numeric” to get the following dialog box.
Most of the time we select “Numeric”; you would do that
for quantitative variables. However, numeric codes can also be
used for qualitative variables, e.g. you
would use 1 for male and 2 for female.

We use “String” for company names, people’s names, or any
variable with unrepeated categories.

Column3 and 4: Width and Decimals
“Width” is the number of digits reserved for the variable; you can choose a
number from 1 to 40. “Decimals” should be less than “Width”.

Column5: Label
The variable label allows you to give a full description of the
variable in up to 256 characters, without the restrictions
that apply to variable names. It is the
name that will appear in the output.

Column6: Values
“Value labels” define a label for the value of each
category. In Variable View, click on “None” in the “Values” column
to get the following dialog box.

To give a label to a value, especially for
qualitative data: write the value and its label,
and then click Add.
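Conceptually, value labels are a lookup table from stored codes to display text. A minimal sketch, assuming the coding 1 = Male, 2 = Female mentioned under “Type” and a made-up list of responses:

```python
from collections import Counter

# Value labels for a numerically coded qualitative variable.
gender_labels = {1: "Male", 2: "Female"}

# Hypothetical coded responses, as they would be typed into Data View.
gender = [1, 2, 2, 1, 2]

# Output tables show the labels rather than the raw codes.
counts = Counter(gender_labels[code] for code in gender)
for label, count in counts.items():
    print(label, count)
```

The data stay numeric; only the presentation changes.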

Column7: Missing
Sometimes a question in a questionnaire gets no
answer. For example, some people don’t give their age. In this case we
can choose a number, such as -9, to indicate the missing value. You can
define up to 3 different values.

In Variable View, click on “None” in the “Missing” column to get the
following dialog box, and type “-9”
as the missing value. Note: “-9” will not be
included in the analysis.
Sometimes you can give a range of
missing values, for example if you want to
exclude people with ages
between 10 and 18.
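The effect of declaring -9 as a missing value can be seen in a short sketch. The ages below are hypothetical, and the code only mimics what SPSS does when it leaves user-missing values out of an analysis.

```python
from statistics import mean

MISSING_CODES = {-9}  # user-defined missing value, as in the example above

# Hypothetical ages; -9 marks respondents who did not give their age.
ages = [25, 31, -9, 47, -9, 38]

# Keep only the valid cases before analysing.
valid_ages = [a for a in ages if a not in MISSING_CODES]

print(len(ages) - len(valid_ages), "missing")
print(mean(valid_ages))  # mean of the valid cases only
print(mean(ages))        # misleading: treats -9 as a real age
```

If the code were not declared as missing, it would be averaged in as if it were a genuine age and pull the mean down.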

Column8, 9, and 10: Columns, Align, and Measure

Columns: Indicates the display width of the column.
Align: Indicates the alignment of the text entered (left, center or right).
For example, you can set “Columns” for the variable age to “20” and
“Align” to “Center”, and the result will be as follows:

Look at the column size and text alignment for the variable “age”.
Measure: This refers to the level of measurement of the variable, and
can be one of the following:
Scale: Quantitative data
Nominal: Qualitative data
Ordinal: Qualitative data with some ranking.

Reading data files using SPSS
1. SPSS data files
Select File → Open → Data and then choose the file name
example1.
Note that the file extension is ‘sav’, which denotes
SPSS data files.
2. Spreadsheet data files

The Excel data file name.

The variable names are in the first row.

Select File → Open → Data and then choose the file name
example1. Note that the file extension is ‘xlsx’, which denotes
Excel files.

Describing Quantitative Data
Frequency Distribution Table

Suppose we want to construct a frequency distribution table for
“BMI” (Body Mass Index data).
Assume that the number of intervals is 7.
Interval width = 3
The first interval will be [18 – 21), where 21 is not included.
To begin, we will create a new variable with values {1, 2,
…, 7}, where 1 means the first interval, 2 means the second interval,
and so on.
Select Transform → Visual Binning (or Visual Bander in older
versions of SPSS)

Click on the variable ‘BMI’, and click on the black arrow to move
the variable into the “Variable to Band” box, and then click

Write down a new variable name
“BMI_Classes”

Select this option if upper bounds are excluded

Enter the name and label into the banded variable box (BMI_Classes).
Select “Excluded (<)” to exclude the upper bound, and then click on
‘Make Cutpoints’

To construct equal-width intervals, enter the required information into the
empty boxes, starting with the upper bound of the first interval, which is 21 in this
example.
Enter the number of cutpoints, which is 7, and then click in the third box.
SPSS will calculate the interval width (‘345.2’); overwrite it with 3, and
finally click Apply.

These values represent the upper bounds of the intervals.
Click ‘Make Labels’ in the Visual Binning (Visual Bander) dialog box. Click OK
twice in the ‘Visual Binning’ dialog box so that the binned variable is
created.

To find the frequency distribution table for the new variable
Select Analyze → Descriptive Statistics → Frequencies
Select the variable “BMI_Classes” and click
Frequency table of the Binned variable ' BMI_Classes '
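The binning and counting that SPSS performs here can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the BMI values are invented for illustration, and `bmi_class` is a hypothetical helper, not an SPSS function. It uses the same scheme as above: seven classes of width 3 starting at 18, with upper bounds excluded.

```python
from collections import Counter

def bmi_class(value, first_lower=18, width=3, n_classes=7):
    """Return the class number 1..n_classes for [lower, upper) intervals, or None if outside."""
    index = int((value - first_lower) // width) + 1
    return index if 1 <= index <= n_classes else None

# Hypothetical BMI values, for illustration only.
bmi = [18.0, 19.5, 20.9, 21.0, 23.4, 26.1, 29.9, 30.0, 35.2, 38.9]

classes = [bmi_class(v) for v in bmi]
table = Counter(c for c in classes if c is not None)

# Print the class number, its interval and its frequency.
for k in sorted(table):
    lower = 18 + 3 * (k - 1)
    print(f"{k}  [{lower} - {lower + 3})  {table[k]}")
```

Note that 21.0 falls in class 2, not class 1, because the upper bound of each interval is excluded.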

Histogram
The second way to describe a quantitative variable is with
graphs. There are 5 suitable graphs for quantitative data:
‘Histogram’, ‘Stem & leaf’, ‘Polygon’, ‘Ogive’ and
‘Boxplot’.
To draw the histogram for the variable ‘BMI’:
Select Graphs → Legacy Dialogs → Histogram

Click on the variable ‘BMI’, click the arrow for ‘Variable’, and then click OK

This histogram has too many classes. To decrease them to a smaller
number e.g. 7 classes, or to choose the class interval width = 3, double
click on the chart to open ‘Chart Editor’ window.

From the menu bar select ‘Binning’, and then change the number of
intervals to 7, or the interval width to 3, and then click Apply.
Close the ‘Chart Editor’ window to go back to the output window.
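The two binning options, a number of intervals or an interval width, describe the same class boundaries once the data range is fixed. A small sketch, assuming a BMI range of 18 to 39 (consistent with seven classes of width 3 starting at 18); `bin_edges` is a hypothetical helper, not an SPSS function:

```python
def bin_edges(lowest, highest, n_bins=None, width=None):
    """Return class boundaries given either the number of intervals or the interval width."""
    if (n_bins is None) == (width is None):
        raise ValueError("give exactly one of n_bins or width")
    if width is None:
        width = (highest - lowest) / n_bins
    else:
        # Ceiling division: enough intervals to cover the whole range.
        n_bins = int(-(-(highest - lowest) // width))
    return [lowest + i * width for i in range(n_bins + 1)]

# Assumed range 18 to 39: both calls give the boundaries 18, 21, ..., 39.
print(bin_edges(18, 39, n_bins=7))
print(bin_edges(18, 39, width=3))
```

When the range is not an exact multiple of the width, the last interval extends past the largest value.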
