# INTRODUCTION TO DATA ANALYSIS USING SPSS

Types of variables Mean, median and mode

## Learning sequence

- Univariate analysis: describing distributions, measures of central tendency, measures of spread
- Bivariate analysis: describing relationships
- Multivariate analysis: describing complex relationships
- Sampling distribution
- Hypothesis testing, e.g. Chi square, regression

## Types of variables

Dependent Variables and Independent Variables

- An independent (experimental or predictor) variable is a variable that is manipulated in an experiment in order to observe the effect on a dependent (outcome) variable.

## Scenario

- Situation: A tutor asks 100 students to complete a maths test.
- Research question: The tutor wants to know why some students perform better than others.
- Hypothesis: The tutor thinks that it might be because of two reasons: some students spend more time revising for their test, and some students are naturally more intelligent than others.
- Procedure: The tutor decides to investigate the effect of revision time and intelligence on the test performance of the 100 students.

The dependent and independent variables for the study are:

- Dependent variable: test mark (measured from 0 to 100)
- Independent variables: revision time (measured in hours) and intelligence (measured using IQ score)

## Types of variables

- Nominal: allows only qualitative (categorical) classification. Items are measured only in terms of whether they belong to certain distinct categories; we cannot quantify or even rank order the categories.
- Ordinal: a nominal variable whose different states are ordered in a meaningful sequence. Ordinal data has order, but the intervals between scale points may be uneven.
- Scale/ratio/interval: can be measured along a continuum and has a numerical value (for example, temperature measured in degrees Celsius or Fahrenheit).

## Summary

| Type of variable | Categorical? | Can rank? | Can measure the actual distance between data points? |
|---|---|---|---|
| Nominal | YES | NO | NO |
| Ordinal | YES | YES | NO |
| Interval | YES | YES | YES |

## Descriptive Statistics

- The analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data.
- Descriptive statistics DO NOT allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made.
- They are simply a way to describe our data.

## Distributions

- Bar charts
- Histogram
- Normal distribution
- Positive skew
- Negative skew

## Measures of central tendency

- Ways of describing the central position of a frequency distribution for a group of data.
- The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others.

## Summary

| Type of variable | Best measure of central tendency |
|---|---|
| Nominal | Mode |
| Ordinal | Median |
| Interval/Ratio (not skewed) | Mean |
| Interval/Ratio (skewed) | Median |
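The summary above can be illustrated with a minimal sketch using Python's standard `statistics` module. The marks and eye colours are made-up values, not data from the course; they are chosen only to show how one extreme value pulls the mean upward while the median stays put, and that only the mode applies to nominal data.

```python
import statistics

# Hypothetical test marks; the single 100 makes the distribution positively skewed.
marks = [40, 45, 50, 50, 55, 60, 100]

mean_mark = statistics.mean(marks)      # pulled upward by the extreme value
median_mark = statistics.median(marks)  # middle value, unaffected by the extreme value
mode_mark = statistics.mode(marks)      # most frequent value

# Hypothetical nominal data: only the mode is meaningful here.
eye_colours = ["brown", "blue", "brown", "green", "brown"]
mode_colour = statistics.mode(eye_colours)

print(f"mean={mean_mark:.2f} median={median_mark} mode={mode_mark}")
print(f"most common eye colour: {mode_colour}")
```

With these values the mean is larger than the median, which is why the median is the safer summary for skewed interval/ratio data.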

## Significant?

Imagine this situation: You are in a class with just four other students, and the five of you took a 5-point pop quiz. Today your instructor is walking around the room, handing back the quizzes. She stops at your desk and hands you your paper. Written in bold black ink on the front is 3/5.

- How do you react?
- Are you happy with your score?

## Measuring the spread of data

- The most common measure of variation, or spread, is the standard deviation.
- The standard deviation is a number that measures how far data values are from their mean.
- Suppose that we are studying waiting times at the checkout line for customers at supermarket A and supermarket B.
- The average wait time at both markets is 5 minutes.
- At market A, the standard deviation for the waiting time is 2 minutes.
- At market B, the standard deviation for the waiting time is 4 minutes.
- Because market B has a higher standard deviation, we know that there is more variation in the waiting times at market B.
- Overall, wait times at market B are more spread out from the average.
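As a rough illustration of the supermarket example, the sketch below uses two tiny made-up samples of waiting times. They are hypothetical, picked so that both have a mean of 5 minutes and population standard deviations of 2 and 4 minutes. It uses `statistics.pstdev` (population standard deviation); for a sample drawn from a larger population, `statistics.stdev` would be the usual choice.

```python
import statistics

# Hypothetical waiting times in minutes (not real data).
market_a = [3, 3, 7, 7]
market_b = [1, 1, 9, 9]

mean_a = statistics.mean(market_a)
mean_b = statistics.mean(market_b)

# Population standard deviation: root of the mean squared distance from the mean.
sd_a = statistics.pstdev(market_a)
sd_b = statistics.pstdev(market_b)

print(f"Market A: mean={mean_a}, sd={sd_a}")
print(f"Market B: mean={mean_b}, sd={sd_b}")
```

Both markets have the same average wait, but the values for market B sit twice as far from that average.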

## Standard deviation

For data having a distribution that is mound-shaped and symmetric:

- Approximately 68% of the data is within 1 standard deviation of the mean.
- Approximately 95% of the data is within 2 standard deviations of the mean.
- More than 99% of the data is within 3 standard deviations of the mean.
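These percentages can be checked for the idealised case of an exactly normal distribution. The sketch below is an illustration, not part of the original slides. It relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / √2); the helper name `share_within` is hypothetical.

```python
import math

def share_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

Real data that is only roughly mound-shaped will match these figures approximately, not exactly.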

## Activity 1

- Data entry
- Recoding variables

## Hypothesis testing

- Define the research hypothesis and set the parameters for the study.
- Set out the null and alternative hypothesis (or more than one hypothesis; in other words, a number of hypotheses).
- Explain how you are going to operationalise (that is, measure or operationally define) what you are studying and set out the variables to be studied.
- Determine whether the distribution that you are studying is normal (this has implications for the types of statistical tests that you can run on your data).
- Select an appropriate statistical test based on the variables you have defined and whether the distribution is normal or not.
- Run the statistical tests on your data and interpret the output.
- Accept or reject the null hypothesis.

## Parametric and non-parametric tests

## Chi Square Test for Association

- Used to discover if there is a relationship between two categorical variables.
- Null hypothesis: there is no relationship between X and Y, $H_0: \chi^2 = 0$.
- Alternative hypothesis: there is a relationship between X and Y, $H_1: \chi^2 > 0$.
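To make the test statistic concrete, here is a minimal sketch that computes the Pearson chi-square statistic for a 2x2 table by hand. The counts and the function name `chi_square_statistic` are hypothetical, and no continuity correction is applied, so this is an illustration of the idea and not a reproduction of SPSS output.

```python
# Hypothetical 2x2 table of counts for two categorical variables X and Y.
observed = [
    [30, 20],  # X = category 1: counts for Y = yes, Y = no
    [20, 30],  # X = category 2: counts for Y = yes, Y = no
]

def chi_square_statistic(table):
    """Pearson chi-square: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if X and Y were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom
print(f"chi-square = {chi2:.2f}, df = {df}")
```

If the two variables were unrelated, the observed counts would sit close to the expected counts and the statistic would be near 0; larger values point towards a relationship.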

05. conclude: I  am 95% confident that there is an association between gender and employment in the population .P values  Indicate the extend to which the sample results can be generalized to the population When p <0.

## Activity