
1 INTRODUCTION TO BIO-STATISTICS AND DATA ANALYSIS USING SPSS
AYESHA USMANI
BIO-STATISTICIAN

ISLAMABAD DENTAL HOSPITAL, ISLAMABAD


DATE: 12-13 DEC, 2018
2 Statistics

• Mathematical Statistics
• Applied Statistics
- Psychology
- Economics
- Medical


3 BIOSTATISTICS

• Biostatistics is the branch of applied statistics that applies statistical methods to medical and biological problems.
• For example, a researcher wants to explore the association of sugar intake, brushing frequency, and previous dental history with the prevalence of dental caries.
• To conduct this study, we have to answer the following questions.
4 CONTI…

• What data should we collect? ➢ Selection of variables
• How will the data be collected? ➢ Method of data collection
• How many patients/participants should we examine? ➢ Sample size
• How should we record the data to facilitate computerization later?
• Which statistical technique will be used to explore the association? ➢ Data analysis
A variable is a characteristic that can take a different value each time it is measured or observed.

Consider a cup of coffee. You can observe the following.

- Taste
- Color
- Temperature
- Size of cup
- Price

How will you report these variables?


6 VARIABLE: TYPES

QUANTITATIVE VARIABLES
• That can be counted or measured.
• It has numerical value associated with it.
• Further divided into discrete and continuous variables.
• Heights, cholesterol level, heart rate, no. of decayed teeth per child

QUALITATIVE VARIABLES
• That can not be counted or measured but observed.
• It has no numerical value.
• Further divided into nominal and ordinal variables.
• Satisfaction level, eye color, effectiveness of a drug
7 QUANTITATIVE VARIABLE: TYPES

DISCRETE VARIABLE
• It is counted.
• It can assume only certain values.
• There are gaps between values.
• No. of children in a family, units of drug sold today, no. of patients admitted in a hospital

CONTINUOUS VARIABLE
• It is measured.
• It can take on any value within a range.
• The values can be in decimals.
• Temperature, skull circumference,
8 QUALITATIVE VARIABLE: TYPES

NOMINAL VARIABLE
• It consists of various mutually exclusive categories.
• Nationality with three categories: German, Spanish, English.
• Gender with two categories: male and female.

ORDINAL VARIABLE
• It consists of ordered/ranked categories.
• Daily sugar intake with three categories: low, medium, high.
• Rating of a restaurant from 0 to 4 stars.

Mutually exclusive means that every subject/item belongs to only one category.
9 LEVEL OF MEASUREMENT
EXERCISE-1

• The level of measurement of the data guides how we summarize and present the data.
• It also determines the statistical test that should be performed.
• SPSS uses three levels of measurement.
1. Nominal (categorical variables)
2. Ordinal (categories with ranking or order)
3. Scale (quantitative/numerical variables)
10 HERE COMES SPSS

DATA HAS BEEN COLLECTED. NOW WE NEED SPSS FOR THE DATA ANALYSIS
11 SPSS
STATISTICAL PACKAGE FOR THE SOCIAL SCIENCES
12 SPSS: HISTORY

• SPSS was first developed in the late 1960s, in the early days of computing.
• Norman H. Nie, Dale H. Bent, and Hadlai Hull developed this software in 1968.
• In 1975, SPSS officially became an independent company.
• In 2009, IBM acquired SPSS.
• Since 2010, the official name has been IBM SPSS Statistics, and it is part of IBM’s analytics portfolio.
13 GENERAL CAPABILITIES

• Comprehensive software for data management and analysis.


• Tabulated reports.
• Produce charts.
• Plot distributions and trends.
• Conduct descriptive statistics.
• Perform complex statistical data analysis.
14 SPSS DATA EDITOR WINDOW

• The main SPSS window is the Data Editor.
• This is the only window that opens when we start SPSS.
• This window is used for defining the variables, data entry, and analysis.
• It has two main tabs in the bottom-left corner.
1. Data View: where we inspect our actual data.
2. Variable View: where we define the variables and give all the related information about them.
Go to SPSS and view the menus and options.
15 RULES FOR VARIABLE NAMES

• In Variable View, we can set the properties of each variable, such as name, type, label, values, etc.
• Each variable name MUST be unique. Duplication is not allowed.
• A variable name should NOT start with a number or special character.
• The underscore, period, and the characters $, #, @ can be used within variable names.
• SPSS gives an error message if a variable name is in an illegal format.
Visit www.ibm.com for more details.
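
The naming rules on this slide can be checked before names are typed into Variable View. The following is a minimal Python sketch that models only the rules listed here; SPSS itself enforces further rules that are not modelled. The function name check_variable_names, the restriction to unaccented letters, and the case-insensitive duplicate check are assumptions made for illustration.

```python
import re

# Pattern for the rules on this slide only: the first character is a letter,
# the rest may be letters, digits, underscore, period, $, # or @.
# Assumption: unaccented letters only; this is not the complete SPSS rule set.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.$#@]*")


def check_variable_names(names):
    """Return a list of (name, problem) pairs; an empty list means every name passed."""
    problems = []
    seen = set()
    for name in names:
        key = name.lower()  # assumption: duplicates are compared case-insensitively
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "illegal format"))
        elif key in seen:
            problems.append((name, "duplicate"))
        seen.add(key)
    return problems


# Hypothetical list of candidate names.
print(check_variable_names(["ID", "Anxiety_Level", "2nd_visit", "id", "Anxiety Level"]))
```

In this sketch a name is flagged either for its format or for duplicating an earlier name, which mirrors the two kinds of problem described on the slide.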
16 LET’S ENTER A REAL-LIFE DATA SET
• EXERCISE-2 (ANXIETY LEVEL)
• EXPLORE SOME DESCRIPTIVE STATISTICS
17 ENTER THE DATA FOR VARIABLES “ID” AND “ANXIETY LEVEL”
• Define the variables in Variable View.
• Enter the data in Data View.
• Analyze > Descriptive Statistics > Frequencies
• Transfer the variable “Anxiety Level” to the Variable(s) box.
• Go to Statistics and choose the options.
• Go to Charts and choose the options as desired/required.
• Click Continue and OK. You will get the output in the output window.
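
To see what the Frequencies procedure summarizes, here is a rough Python sketch using only the standard library. The anxiety scores are hypothetical placeholders, not the workshop's exercise data, and the function name frequency_table is an assumption for illustration. It is not a reproduction of the SPSS output layout.

```python
from collections import Counter
import statistics


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent
        rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))
    return rows


# Hypothetical anxiety scores, for illustration only.
anxiety_level = [40, 45, 45, 50, 38, 45, 50, 42]

for row in frequency_table(anxiety_level):
    print(row)

# A few of the descriptive measures offered under the Statistics button.
print("Mean:", statistics.mean(anxiety_level))
print("Median:", statistics.median(anxiety_level))
print("Mode:", statistics.mode(anxiety_level))
print("Std. deviation:", round(statistics.stdev(anxiety_level), 2))
```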
18 SAVE THE DATA & OUTPUT FILE

• Don’t forget to save your files before closing.


• Save the data file (file extension .sav)
Ctrl + S > Give file name > Give file location > ok
• Save the output file (file extension .spv)
Ctrl + S > Give file name > Give file location > ok
19 DATA ANALYSIS & INTERPRETATION
20 WHICH TECHNIQUE SHOULD BE USED?

• The main task is to decide on the appropriate statistical technique to analyze the data in a study.
• We may be plotting charts, computing descriptive measures, or testing a hypothesis.
• This decision depends on:
1. What do we want to do?
2. What is the objective of our study?
3. What is the nature of the data?
21 CONTI…

Relationship between two or more variables
• Draw scatter plot
• Pearson’s correlation coefficient
• Test the significance

Relationship between two categorical variables
• Pearson’s chi-square test for association

Relationship between three or more categorical variables
• Log linear analysis
22

Compare two independent groups
• Independent samples t-test

Compare two dependent/related groups
• Dependent/Paired samples t-test

Compare more than two independent groups
• Analysis of variance (ANOVA)
23 PARAMETRIC OR NON-PARAMETRIC

• Some tests require certain assumptions to be fulfilled before they can be applied.
• Such assumption-based tests are called parametric tests.
• If the assumptions are not met, then we go for their non-parametric counterparts (if any exist).
• For example, the independent t-test (parametric) and the Mann-Whitney test (non-parametric).
24 BASIC STATISTICAL TESTS

NOW WE CONSIDER SOME BASIC TESTS TO ANALYZE THE DATA
25 SOME BASIC STEPS

• Construct the hypothesis.
• Calculate the test statistic and obtain the p-value.
• Decide whether or not to reject the null hypothesis on the basis of the following decision rule:
P-value ≥ 0.05: results are not significant / do not reject the null hypothesis
P-value < 0.05: results are significant / reject the null hypothesis
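
The decision rule is a simple comparison of the p-value with the 0.05 cut-off. The following is a minimal Python sketch; the function name decide and the wording of the returned messages are assumptions for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: the result is significant only when p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "significant: reject the null hypothesis"
    return "not significant: do not reject the null hypothesis"


print(decide(0.03))
print(decide(0.485))
```

Note that a p-value exactly equal to 0.05 falls on the "not significant" side of the rule.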
26 T-TEST: INDEPENDENT SAMPLES

• The independent t-test is used in situations where two conditions are to be compared and different participants are used for each condition.
• It compares two independent/unrelated groups or, to be more specific, two means.
• Independence means that there must be different participants/subjects in each group.
27 T-TEST: INDEPENDENT SAMPLES
HYPOTHESIS

• H0 (Null hypothesis) = Two group means are equal
• H1 (Alternative hypothesis) = Two group means are not equal
• Decision rule:
P-value ≥ 0.05: results are not significant / do not reject the null hypothesis
P-value < 0.05: results are significant / reject the null hypothesis
28 T-TEST: INDEPENDENT SAMPLES
EXAMPLE

• A researcher decided to investigate whether an exercise intervention is more effective in lowering anxiety level than a mind relaxing talk program. He recruited a sample of 24 males and randomly divided them into two equal groups.
• Group 1 undertook an exercise training and group 2 attended the mind relaxing talk program.
• At the end of the treatment program, their anxiety levels were measured.
29 T-TEST: INDEPENDENT SAMPLES
DATA ENTRY

• The independent variable is Treatment, with two categorical groups, i.e. talk and exercise.
• The dependent variable is Anxiety_Level, measured on a numerical scale.
• While entering the data in SPSS, we need three columns, i.e. for ID, Treatment, and Anxiety_Level.
• So, there will be 3 columns and 24 rows in our Data View.
• Here is how the variables and data look in SPSS (add the variable Treatment to the previous data file).
30 To run the t-test in SPSS, we use the following menu options:
Analyze > Compare Means > Independent-Samples T Test
31 Transfer the dependent variable “Anxiety_Level” into the Test Variable(s) box.
Transfer the independent variable “Treatment” into the Grouping Variable box and click Define Groups.
32 In Define Groups, enter the values that were used to code the groups when defining the variable. Here, enter “0” into the Group 1 box and “1” into the Group 2 box, then click Continue.
33 T-TEST: INDEPENDENT SAMPLES
REPORTING THE OUTPUT

• SPSS Statistics generates two main tables of output for the independent t-test.
• The output will be reported in the following manner:

On average, talk participants experienced greater anxiety (M = 47.00, SE = 3.18) than the exercise participants (M = 40.00, SE = 2.68). The difference was not significant, t(22) = -1.68, p ≥ .05. So, we cannot conclude that the exercise training is better than the talk program.
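
The reported t value can be reproduced from the group means and standard errors. The following is a small Python sketch, assuming equal group sizes (12 per group, as in the example), in which case the standard error of the difference is the square root of the sum of the squared standard errors. The critical value 2.074 is the two-tailed 5% point of the t distribution with 22 degrees of freedom, taken from standard t tables; the function name t_from_summary is hypothetical.

```python
import math


def t_from_summary(mean_1, se_1, mean_2, se_2):
    """t statistic for two independent groups of equal size, from means and standard errors."""
    se_difference = math.sqrt(se_1 ** 2 + se_2 ** 2)
    return (mean_1 - mean_2) / se_difference


# Values from the report: exercise group first, talk group second.
t_value = t_from_summary(40.00, 2.68, 47.00, 3.18)
degrees_of_freedom = 12 + 12 - 2

# Two-tailed critical value for alpha = 0.05 and 22 degrees of freedom (from t tables).
T_CRITICAL_22 = 2.074

print(f"t({degrees_of_freedom}) = {t_value:.2f}")  # t(22) = -1.68
print("significant" if abs(t_value) > T_CRITICAL_22 else "not significant")  # not significant
```

Because the absolute t value is smaller than the critical value, the difference between the two group means is not statistically significant at the 5% level.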
34 T-TEST: DEPENDENT/PAIRED SAMPLES

• The dependent t-test compares the means of two related groups.
• Related groups means that the same participants/subjects are used in both groups.
• The dependent variable is measured at the scale level.
• Referring to the previous example, suppose that we had collected the data using the same participants.
• All participants had their Anxiety_Level measured before and after the exercise, so the effectiveness of the exercise program can be evaluated.
35 T-TEST: DEPENDENT/PAIRED SAMPLES
DATA ENTRY AND ANALYSIS

• Now the data will be entered in SPSS differently.


• We’ll have three variables, ID, Before_Score, After_Score.
• In the Data View, we’ll have 3 columns and 12 rows.
36 To run the t-test in SPSS, we use the following menu options.
Analyze > Compare Means > Paired-Samples T Test
37 CONTI…

• Transfer the variables into Paired Variables box.


• Transfer the variables one by one or use the shift key from the keyboard.
• If you need to change the confidence level limits, use the Options button.
• Click Continue and then OK.
• The output will be shown in output window.
38 T-TEST: DEPENDENT/PAIRED SAMPLES
REPORTING THE OUTPUT

• The results will be reported in the following way.

On average, participants experienced significantly greater anxiety after the exercise (M = 47.00, SE = 3.18) than before the exercise (M = 40.00, SE = 2.68), t(11) = -2.47, p < .05.

• So, we conclude that the exercise program did not work well to reduce the anxiety level of the participants.
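
The paired t statistic is the mean of the within-participant differences divided by the standard error of those differences. The following is a minimal Python sketch of that calculation. The before and after scores are hypothetical, not the workshop data, the function name paired_t is an assumption, and 2.201 is the two-tailed 5% critical value for 11 degrees of freedom from standard t tables.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on the before - after differences."""
    if len(before) != len(after):
        raise ValueError("each participant needs both a before and an after score")
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    se_difference = statistics.stdev(differences) / math.sqrt(n)
    return mean_difference / se_difference, n - 1


# Hypothetical scores for 12 participants, for illustration only.
before_score = [38, 42, 35, 45, 40, 37, 44, 39, 41, 36, 43, 40]
after_score = [45, 44, 40, 52, 41, 48, 46, 50, 47, 38, 55, 49]

t_value, df = paired_t(before_score, after_score)

# Two-tailed critical value for alpha = 0.05 and 11 degrees of freedom (from t tables).
T_CRITICAL_11 = 2.201

print(f"t({df}) = {t_value:.2f}")
print("significant" if abs(t_value) > T_CRITICAL_11 else "not significant")
```

A negative t here means the after scores are higher on average than the before scores, which matches the sign convention of the report when the pair is entered as Before_Score, After_Score.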
39 CHI-SQUARE TEST FOR ASSOCIATION

• When we deal with categorical variables (ordinal or nominal), e.g. gender or education level, the data we get are frequencies, not measurements.
• When we want to see the relationship between two categorical variables, we use Pearson’s chi-square test.
• Both variables should have two or more groups.
• All the groups should be mutually exclusive.
40 CHI-SQUARE TEST FOR ASSOCIATION
HYPOTHESIS

• H0 (Null hypothesis) = Two variables are not associated
• H1 (Alternative hypothesis) = Two variables are associated
• Decision rule:
P-value ≥ 0.05: results are not significant / do not reject the null hypothesis
P-value < 0.05: results are significant / reject the null hypothesis
41 CHI-SQUARE TEST FOR ASSOCIATION
DATA ENTRY

• For this test, the data can be entered in two ways in SPSS.
1. Enter the raw data.
Every row of the data editor represents one subject/item about which we have data. For example, if we have data about 200 patients, there will be 200 rows of data in the data editor.
2. Enter the frequencies and weight the cases.
We input the frequency data, i.e. the number of cases falling in each category, and weight the cases by these counts.
42 CHI-SQUARE TEST FOR ASSOCIATION
EXAMPLE

• An educator would like to know whether gender is associated with the preferred type of learning medium.
• We have two nominal variables, Gender (male/female) and Preferred learning medium (online/books). The data are collected for 80 students.
• So, in SPSS we’ll have two columns (for variables) and 80 rows (for number of students).
• Go to SPSS, create a new data file, and define the variables.
43 To run the chi-square test in SPSS, we will use the following options.
Analyze > Descriptive Statistics > Crosstabs
44 Transfer one of the variables into the Row(s) box and the other variable into the Column(s) box.
45 CONTI…

• Go to Statistics button, check box for chi-square, and Continue.


• Click the Cells button and select Observed from Counts area, Row,
Column, and Total from Percentages area.
• Click Continue and then OK.
• The output will be shown in output window.
48 CHI-SQUARE TEST FOR ASSOCIATION
REPORTING THE OUTPUT

• The results will be reported in the following way.

There is no statistically significant association between Gender and Preferred Learning Medium, χ²(1) = 0.487, p = .485. So, we conclude that the preference for online learning versus books does not differ significantly between males and females.
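
For a 2 x 2 table the chi-square test has 1 degree of freedom, and the p-value then has a closed form through the complementary error function. The following is a short Python sketch of the Pearson statistic (without continuity correction) and that p-value. The cell counts are hypothetical, since the observed table is not given here, and the function names are assumptions for illustration.

```python
import math


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic, p_value_df1(statistic)


# The p-value that corresponds to the reported statistic.
print(round(p_value_df1(0.487), 3))  # 0.485

# Hypothetical counts for 80 students: rows = male, female; columns = online, books.
statistic, p_value = chi_square_2x2([[22, 18], [19, 21]])
print(f"chi-square(1) = {statistic:.3f}, p = {p_value:.3f}")
```

The first printed value agrees with the p = .485 in the report, which is a useful check that the statistic and p-value were copied correctly from the SPSS output.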
49 ANALYSIS OF VARIANCE (ANOVA)

• ANOVA is used when we compare more than two independent groups simultaneously.
• One-way ANOVA
The data are classified into more than two groups on the basis of a single criterion.
• Two-way ANOVA
Each observation is classified according to two criteria of classification simultaneously.
50 ANALYSIS OF VARIANCE
HYPOTHESIS

• H0 (Null hypothesis) = All the group means are equal
• H1 (Alternative hypothesis) = Not all the group means are equal
• Decision rule:
P-value < 0.05: results are significant.
P-value ≥ 0.05: results are not significant.
51 ANALYSIS OF VARIANCE
EXAMPLE

• A principal wants to improve the data analysis skills of his research students. He employs an external agency which provides training in software. They offer four courses: beginner, intermediate, advanced, and professional. He is unsure which course is needed for the students. So, he sends 4 students to each course. When they all return from the training, he gives them a problem to solve, and the time taken by every student to solve it is recorded in minutes. He then compares the four courses.
52 ANALYSIS OF VARIANCE
DATA ENTRY

• Here, the independent variable is Course, with four unrelated groups: beginner, intermediate, advanced, and professional.
• The dependent variable is Time taken to solve the problem.
• Data will be entered in the same way as we did for the independent t-test (but for 4 groups).
• So, in SPSS we’ll have two columns (for variables) and 16 rows (for total number of students).
• Go to SPSS, create a new data file, and define the variables.
53 To run the one-way ANOVA in SPSS, we will use the following options.
Analyze > Compare Means > One-Way ANOVA
54 Transfer the dependent variable Time into Dependent List and the independent variable Course into Factor.
55 When we have no specific hypothesis before the experiment, we use post hoc tests. Click the Post Hoc button and check Tukey.
56 Click the Options button and check Descriptive and Homogeneity of variance test > Continue > OK
57 ANALYSIS OF VARIANCE
REPORTING THE OUTPUT

• The results will be reported in the following way.

The average time taken by the students to solve the problem was significantly different across the four courses, F(3, 12) = 5.563, p < 0.05. The post hoc Tukey’s test revealed that the Professional course is significantly different from all three other courses.
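
The F statistic in the report is the ratio of the between-groups mean square to the within-groups mean square. The following is a minimal Python sketch of that calculation. The solution times are hypothetical, not the workshop data, the function name one_way_anova is an assumption, and 3.49 is the 5% critical value of F with 3 and 12 degrees of freedom from standard F tables. The Tukey post hoc comparisons are not sketched here.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - statistics.mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical solution times in minutes, 4 students per course, for illustration only.
times_by_course = [
    [32, 30, 35, 31],  # beginner
    [29, 27, 30, 28],  # intermediate
    [26, 27, 25, 28],  # advanced
    [20, 22, 19, 23],  # professional
]

f_value, df_between, df_within = one_way_anova(times_by_course)

# Critical value of F(3, 12) for alpha = 0.05 (from F tables).
F_CRITICAL_3_12 = 3.49

print(f"F({df_between}, {df_within}) = {f_value:.3f}")
print("significant" if f_value > F_CRITICAL_3_12 else "not significant")
```

With four courses and 16 students the degrees of freedom are 3 and 12, which is where the F(3, 12) in the report comes from.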
