
INTRODUCTION TO DATA ANALYSIS USING SPSS

Types of variables Mean, median and mode

Learning sequence
Univariate analysis
Describing: Distribution, Measures of central tendency, Measure of spread

Bivariate
Describing relationships: Sampling distribution, Hypothesis testing (e.g. Chi square)

Multivariate
Describing complex relationships: Regression

Types of variables

Dependent Variables and Independent Variables


An independent (experimental or predictor) variable is a variable that is being manipulated in an experiment in order to observe the effect on a dependent (outcome) variable.

Scenario

Situation:

A tutor asks 100 students to complete a maths test.

Research question:

The tutor wants to know why some students perform better than others.

Hypothesis:

The tutor thinks that it might be because of two reasons:


some students spend more time revising for their test; and
some students are naturally more intelligent than others.

Procedure:

The tutor decides to investigate the effect of revision time and intelligence on the test performance of the 100 students.

The dependent and independent variables for the study are:


Dependent Variable: Test Mark (measured from 0 to 100)
Independent Variables: Revision time (measured in hours); Intelligence (measured using IQ score)

Types of variables

Nominal

Allows only qualitative (categorical) classification. A nominal variable is measured only in terms of whether individual items belong to certain distinct categories; the categories cannot be quantified or even rank-ordered.

Ordinal

Categorical, like a nominal variable, but its different states are ordered in a meaningful sequence. Ordinal data have order, but the intervals between scale points may be uneven.

Scale/ratio/interval

Can be measured along a continuum and has a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit).

Summary
Type       Categorical?   Can rank?   Can measure the actual distance between data points?
Nominal    YES            NO          NO
Ordinal    YES            YES         NO
Interval   YES            YES         YES

Descriptive Statistics

Descriptive statistics is the analysis of data that helps describe, show or summarize data in a meaningful way so that, for example, patterns might emerge from the data. Descriptive statistics DO NOT allow us to draw conclusions beyond the data we have analysed or to reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.

Distributions

Bar charts
Histogram
Normal distribution
Positive skew
Negative skew

Measure of Central tendency

Ways of describing the central position of a frequency distribution for a group of data. The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others.

Summary
Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
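As a rough illustration of why the recommended measure changes with the type of variable, the sketch below uses Python's standard `statistics` module rather than SPSS. All three data sets are invented for illustration; they are not taken from the course material.

```python
from statistics import mean, median, mode

# Hypothetical data, invented for illustration only.
eye_colour = ["brown", "blue", "brown", "green", "brown"]  # nominal
test_marks = [55, 60, 65, 70, 75]                          # interval/ratio, not skewed
salaries = [20, 22, 24, 26, 28, 30, 200]                   # interval/ratio, positively skewed

nominal_centre = mode(eye_colour)    # most frequent category
symmetric_centre = mean(test_marks)  # the mean works well for symmetric data
skewed_mean = mean(salaries)         # pulled upwards by the single value 200
skewed_median = median(salaries)     # stays with the bulk of the data

print("mode of eye colour:", nominal_centre)
print("mean of test marks:", symmetric_centre)
print("mean of salaries:", skewed_mean)
print("median of salaries:", skewed_median)
```

In the skewed data set one extreme value pulls the mean up to 50, although six of the seven values are 30 or below; the median of 26 describes the typical value better.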

Significant?
Imagine this situation: You are in a class with just four other students, and the five of you took a 5-point pop quiz. Today your instructor is walking around the room, handing back the quizzes. She stops at your desk and hands you your paper. Written in bold black ink on the front is 3/5. How do you react? Are you happy with your score?

Measuring the spread of data

The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean. Suppose that we are studying waiting times at the checkout line for customers at supermarket A and supermarket B; the average wait time at both markets is 5 minutes. At market A, the standard deviation for the waiting time is 2 minutes; at market B, the standard deviation for the waiting time is 4 minutes. Because market B has a higher standard deviation, we know that there is more variation in the waiting times at market B. Overall, wait times at market B are more spread out from the mean.
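The supermarket comparison can be reproduced with a small sketch using Python's standard `statistics` module. The waiting times below are invented so that both markets average 5 minutes with standard deviations of 2 and 4 minutes. The sketch uses the population formula (`pstdev`) to keep the numbers tidy; this is an assumption of the example, and the sample formula (`stdev`, which divides by n - 1) would give slightly larger values.

```python
from statistics import mean, pstdev

# Hypothetical waiting times in minutes, invented for illustration only.
market_a = [3, 7, 3, 7, 3, 7]
market_b = [1, 9, 1, 9, 1, 9]

mean_a, mean_b = mean(market_a), mean(market_b)
sd_a, sd_b = pstdev(market_a), pstdev(market_b)  # population standard deviation

print(f"Market A: mean = {mean_a}, standard deviation = {sd_a}")
print(f"Market B: mean = {mean_b}, standard deviation = {sd_b}")
```

Both markets have the same mean, so the mean alone cannot distinguish them; only the standard deviation shows that waits at market B vary more.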

Standard deviation

For data having a distribution that is MOUND-SHAPED and SYMMETRIC:

Approximately 68% of the data is within 1 standard deviation of the mean.
Approximately 95% of the data is within 2 standard deviations of the mean.
More than 99% of the data is within 3 standard deviations of the mean.
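These percentages can be checked against an exactly normal distribution, which is a stronger assumption than "mound-shaped and symmetric". The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data that are only roughly mound-shaped will match these figures approximately, not exactly.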

Activity 1

Data entry
Recoding variables

Hypothesis testing

Define the research hypothesis and set the parameters for the study.
Set out the null and alternative hypothesis (or more than one hypothesis; in other words, a number of hypotheses).
Explain how you are going to operationalise (that is, measure or operationally define) what you are studying and set out the variables to be studied.
Determine whether the distribution that you are studying is normal (this has implications for the types of statistical tests that you can run on your data).
Select an appropriate statistical test based on the variables you have defined and whether the distribution is normal or not.
Run the statistical tests on your data and interpret the output.
Reject or fail to reject the null hypothesis.

Parametric and non-parametric tests

Chi Square Test for Association

Used to discover whether there is a relationship between two categorical variables.
Null hypothesis (H0): there is no relationship between X and Y (χ² = 0).
Alternative hypothesis (H1): there is a relationship between X and Y (χ² > 0).
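To make the test concrete, here is a minimal sketch in plain Python (not SPSS) of the Pearson chi-square statistic for a 2 x 2 table, without a continuity correction. The counts are invented for illustration, and the function names are made up for this example. The p-value shortcut relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it is valid only for 2 x 2 tables.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    return chi2

def p_value_one_df(chi2):
    """Upper-tail p-value for 1 degree of freedom (2 x 2 tables only)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: rows = gender, columns = employed / not employed.
observed = [[30, 20],
            [20, 30]]

chi2 = chi_square_statistic(observed)
p = p_value_one_df(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

With these invented counts every expected cell count is 25, the statistic is 4.0, and p is about 0.046, so the null hypothesis of no relationship would be rejected at the 5% level.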

P values

Indicate the extent to which the sample results can be generalized to the population. When p < 0.05, the result is statistically significant at the 5% level, and we conclude, for example:

I am 95% confident that there is an association between gender and employment in the population.

Activity
