University of Pretoria Faculty of Economic & Management Science 10, 11, 14 & 15 September 2009
Presented by Sumari O’Neil firstname.lastname@example.org
Table of Contents
1. Statistics and all that jazz
2. Descriptive statistics
2.1 Frequencies
2.2 Central tendency
2.3 Statistics for variability
2.4 Working with percentages
3. Parametric and non-parametric statistics
3.1 Testing the assumption of normality
3.2 Equality of variances
4. From questionnaire to dataset
5. Screening and cleaning your data
6. Manipulating your data
6.1 Calculating the total scores of scales or indexes
6.2 Reversing negatively worded items
6.3 Collapsing a continuous variable into groups
7. Correlation analysis
7.1 Statistics to test relations between variables
7.2 How to interpret the results of the correlations
7.3 The coefficient of determination (r2)
7.4 How to write up the results of a correlation analysis in a research report
7.5 Graphically representing the relationship between variables
7.6 Other analyses that are grounded in correlation analysis
8. Testing differences between groups (causal relationships)
8.1 What does "testing for differences between groups" mean?
8.2 Testing differences between two independent groups: t-test for independent groups
8.3 The non-parametric alternative for the t-test for independent samples: Mann-Whitney U test
8.4 Testing differences between two dependent / related samples
8.4 The non-parametric alternative to the t-test for dependent/related samples: Wilcoxon Signed-Rank Test
8.5 Testing differences between more than 2 groups on one variable: One-way Analysis of Variance (One-way ANOVA)
8.6 The non-parametric alternatives for the One-way ANOVA
References
1. Statistics and all that jazz

Statistics is used in quantitative research to analyse and interpret the data collected during the data collection process. Some research questions are better answered by qualitative research and others by quantitative research. Qualitative research gives a better depiction of the depth of a problem, e.g. the experience of cancer survivors, while quantitative research gives more answers in terms of the breadth of a problem, for instance the prevalence of HIV/Aids in South Africa. Generally, research topics for explorative research (topics not yet explored in great depth) are better answered through qualitative research, while questions about prevalence, validity (e.g. the validity of a questionnaire), questions that revolve around determination (such as the prediction of one event by means of another) and causal relationships between variables (e.g. whether gender is the cause of a negative attitude) are all better answered through quantitative methods. (Although very elementary statistics such as frequency counts are sometimes used in qualitative research, most hard-core qualitative researchers will RATHER DIE than use any form of statistics!)

In short, quantitative research implies that you collected data from the "real world" by means of a questionnaire (most commonly used) and that you now want to tell the story of the "real world" by using statistics. Field (2009) explains this by saying that we are actually building statistical "models" of reality. When you look at the model, you would like to be able to say: "this is what reality looks like!" Of course, when you build a model you want to use the best material to depict the reality as accurately as possible. In terms of statistics, this material refers to, firstly, your data and, secondly, the statistics used.

In terms of the data, the garbage-in, garbage-out principle applies: the data comes first. Make sure before the study that (1) your data is good, (2) the data comes from a representative sample, and (3) your data answers the research question. Also make sure that the data meets the parameters of the statistics you want to use.

In terms of the statistics, there is probably more than one option to consider when selecting a statistic. Make sure you choose the best one to increase the accuracy of your results, that is, the statistic that best depicts the "reality of your research question". Every statistic has a set of criteria that must be met for optimal usage. Should your data not meet the criteria for the statistic needed to answer your research question, you will do a statistic with very little power and little validity, or you may not be able to do the statistic at all.

It should be clear by now that although statistics is used for the analysis of the data, it should actually be considered from the start of the research process. After finding a topic, a research question is stated, and out of the question flows the purpose of the research. When stating the research question, you should already have an idea of what type of analysis you can possibly use. Most statistical analyses have some data requirements, for instance requirements of sample size and level of measurement (e.g. most parametric statistics require data to be at least on interval scale of measurement).

[Fig. 1: The research process. The figure shows the flow from finding a topic (Has the topic been explored in depth and breadth?), through the research question (What methodology would answer the research question best?), to the design (plan for measurement; sampling plan and procedures; data analysis; Does the design fit the criteria for the statistical analysis? Is the sample big enough for the statistical analysis?), data analysis and interpretation of results, and conclusion and recommendations.]
[BOX 1: Different approaches to research]

2. Descriptive statistics

Descriptive statistics tells you what your data looks like. You can use it to describe the sample, to check whether the data is fit for a specific analysis, or to answer a specific descriptive or exploratory research question. The first step of statistical analysis usually involves descriptive statistics.

Say, for instance, you used a questionnaire to gather data from managers. Let's say the questionnaire asked biographical questions about the managers that completed it (e.g. age, years' experience, gender), as well as questions with regard to their management style. By doing descriptive statistics you will be able to draw a profile of the managers that took part in your research. You would also be able to get an idea of the management styles they use.
For different types of data, different descriptive statistics are used. Descriptive statistics include frequencies/frequency counts, statistics of central tendency, and statistics that indicate variability/dispersion. Nominal and ordinal data are henceforth referred to as categorical data/variables, since these two levels of measurement indicate different categorical answers in your data set. For instance, the variable gender represents the categories male and female. A special type of categorical variable is the dichotomous variable, a variable that represents only 2 categories. Interval and ratio level data, on the other hand, are referred to as scale data/continuous variables, because they indicate respondent answers on a scale from 0/1 through to x.

2.1 Frequencies

Frequencies indicate the number of cases (respondents) which fall into each of the available categories. Frequencies can be displayed in terms of counts or percentages. They are usually displayed by means of frequency tables, but can also be displayed graphically in graphs and charts. Suitable graphs to display frequencies for categorical data are bar charts or pie charts.

Example of a frequency table: here I wanted to see the frequency of people that voted for each one of the three candidates in the US presidential elections, in 1994.

VOTE FOR CLINTON, BUSH, PEROT
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid  Bush        661        35.8        35.8             35.8
       Perot       278        15.0        15.0             50.8
       Clinton     908        49.2        49.2            100.0
       Total      1847       100.0       100.0

In this example, it is very obvious that most of the voters (908) voted for Clinton.

Example of a bar chart:
[Bar chart of the frequency distribution in the above-mentioned example, with frequency on the y-axis: Bush 661, Perot 278, Clinton 908.]

Example of a pie chart: here is a pie chart displaying the percentages of the frequencies.

[Pie chart for VOTE FOR CLINTON, BUSH, PEROT: Bush 35.79%, Perot 15.05%, Clinton 49.16%.]

Another option is to graphically represent a frequency distribution by means of a frequency polygon. Although a frequency polygon is appropriate for ordinal data as well as other scale data, it is not appropriate for nominal data. For a scale variable, one would display a frequency distribution graphically by means of a histogram and not a bar chart or pie chart. In SPSS you also have the option of adding a normal curve to the histogram to get an idea of the normality of the distribution.
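The same output can be requested without the menus. A minimal syntax sketch, assuming a categorical variable named vote94 (the name is hypothetical; use whatever your codebook specifies):

* Frequency table plus a bar chart for a categorical variable.
FREQUENCIES VARIABLES=vote94
  /BARCHART FREQ
  /ORDER=ANALYSIS.
* For a pie chart instead, replace /BARCHART FREQ with /PIECHART.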
2.2 Central tendency

For variables measured on nominal scale, the statistic for central tendency is the mode. The mode indicates the category with the greatest number of cases. Say, for instance, your question asked people's occupation from a list: if most of the people indicated they were medical doctors, that would be the mode of the dataset for that question. As an example of the mode, look at the following (from Glosser (2004), http://www.mathgoodies.com/lessons/vol8/mode.html):

Example 1: The following is the number of problems that Ms. Matty assigned for homework on 10 different days: 8, 11, 9, 14, 9, 15, 18, 6, 9, 10. Solution: Ordering the data from least to greatest, we get: 6, 8, 9, 9, 9, 10, 11, 14, 15, 18. Answer: The mode is 9.

For ordinal level data the best indicator of central tendency is the median. The median is the exact middle point of the data set: it indicates the value above and below which half of the cases fall. Thus, with categorical data, the mode and median have the same function as a mean. (From http://www.uwsp.edu/psych/stat/5/CT-Var.htm)

For interval and ratio data, one uses the mean (average score) as indicator of central tendency. The mean is not used, however, if the distribution is skewed (not normal); in this case you will use the median.

2.3 Statistics for variability

As mentioned above, another type of measure that can be used to summarise a data set is the measures of dispersion or variability. These measures refer to summaries of the size of the differences between each score and every other score. The greater the differences, the more the mean fails to represent the data set. There are three measures of variability:

Range: the difference between the largest and smallest score. The range takes only the largest and smallest score into account.
Variance: the extent of the differences among scores. The variance takes every score into account; the higher the variance, the higher the variability.
Standard deviation: the standard deviation of the scores from the mean, in the same measurement unit as the original scores. The higher the standard deviation, the higher the variability. Take note that if the standard deviation = 0, all the scores are the same.

For scale data, the standard deviation is used. Since categorical variables have a restricted range (it will always be bound to the number of categories), variability is often not used as a description for them; one can rather look at the minimum and maximum scores in the data set, or the range.

2.4 Working with percentages

In order to compare frequencies, most researchers work out the percentage of the frequency in each category. Percentages represent the proportion of responses within each category in your dataset, and serve two purposes: 1) they simplify the data by reducing the numbers to a range from 1 – 100, and 2) they translate the data into a standard form for relative comparison. To calculate a percentage, you need to know the number of observations in the category and the total number of observations in the data set. The formula for percentages is:

Percentage = f / N * 100

where f = the number of observations in the category and N = the total number of observations in the data set. N can also be described as the "base", "total" or universe.
For example: The total number of people living in Johannesburg is 132 000 000. The number of people in poverty in Johannesburg is 400 000. What is the percentage of people living in poverty? Of the total number of poor people, 260 000 are women. What, then, is the percentage of poor women living in Johannesburg?

There are some rules when it comes to interpreting percentages:
1. When a very small base is used (say a percentage out of 5), it is easy to overestimate the percentage: 60% would seem like a huge difference, while it may only indicate 3/5.
2. Percentages cannot be averaged unless each is weighted by the size of the group from which it is computed. This is referred to as a weighted average.

3. Parametric and non-parametric statistics

When we need to use inferential statistics, the optimal is to use parametric statistics. To use parametric tests, however, the data should meet a number of assumptions. Specific statistics have specific assumptions, and it is extremely important that you test the assumptions of a specific statistic before you continue with the analysis; if the data does not meet the assumptions, the results will be inaccurate. The assumptions of parametric tests generally include:

• Normally distributed data: It is assumed that the data comes from a normally distributed population. If you remember that inferential statistics is done to prove that some or other result is applicable to an entire population, you should also understand why the population's distribution should be normal.
• Equal variances / homogeneity of variances: If two or more groups are compared, they should have equal variances or spread of scores.
• Independence: There must be independence of observations, except when the data are paired (paired data refers to data related to the same respondents over more than one measurement, like in pre- and post-measurements, or respondents that are in some way related to each other). How do we know if there exists independence of observations? You have to look at the design of the research. Where did the data come from? Was it observations of two entirely different groups, or was it a pre-post measurement of the same group? How do we prove that this assumption was met? Easy! By describing and explaining the research design. This assumption is thus tested by common sense and not through a statistical analysis. (There are statistical ways to prove independence of observations; however, they are not used for the type of statistics we will go through in this course.)
• Interval data: The variables (specifically the dependent variable) should be on at least interval level of measurement (or, if categorical, a variable should have a minimum of 7 categories).

When the assumptions of parametric tests are not met, we should look at the non-parametric alternative to the parametric test (we will also look at non-parametric alternatives with each type of statistic in the following tasks). Although non-parametric statistics also have some assumptions, there are fewer restrictions on the data that can be used. The general assumptions of non-parametric statistics are:
• Independence of observations, except when paired
• Few assumptions concerning the population's distribution
• The scale of measurement of the dependent variable may be categorical or ordinal
• The primary focus is either the rank ordering or the frequencies of the data
• Sample size requirements are less stringent than for parametric tests.

If we look at the assumptions above, it is clear why non-parametric statistics are often referred to as statistics for small samples and distribution-free tests.

3.1 Testing the assumption of normality

What is a normal distribution? The normal distribution has 4 characteristics:
• It is unimodal: it has only one hump in the middle of the distribution, with the mode in the middle, and the mean, mode and median are equal.
• It is symmetrical (not skewed).
• It is asymptotic (the extreme scores never touch the x-axis).
• It is neither too peaked nor too flat: the kurtosis is equal to 0.

[An illustration of the normal distribution]

The statistics to look at when you check for normality of the distribution include:
• Skewness
• Kurtosis
• Kolmogorov-Smirnov (or K-S from now on) (the vodka statistic)
• Shapiro-Wilk test
• Q-Q plots
• Box-and-whiskers plots
• Histogram

Skewness refers to the lack of symmetry. A distribution with a long tail to the right is positively skewed, and vice versa. Kurtosis, on the other hand, measures the flatness or peakedness of the distribution. A perfect normal distribution has kurtosis = 0; very peaked distributions have positive kurtosis and very flat curves have negative kurtosis.

How to see the skewness of a distribution with SPSS: From the menu, choose: Analyse > Descriptive statistics > Descriptives > From the Options box, select "skewness". The output will give you a number, e.g. -5.845. The +/- in front of the number indicates in what direction the skewness tends, and the number itself how skew the distribution is: the higher the number, the more skewed the distribution. To check the kurtosis, you can follow the same procedure as for skewness, but instead of selecting "skewness", select "kurtosis". (Both skewness and kurtosis can also be computed by SPSS under the "Frequencies" option.)

To use skewness and kurtosis to see if the distribution is normal, you have to convert the given skewness and kurtosis scores to z-scores. Use the following formulas:

z(skewness) = (S - 0) / SE(skewness)
z(kurtosis) = (K - 0) / SE(kurtosis)

where S = skewness, K = kurtosis and SE = standard error (of skewness or kurtosis respectively). If the absolute value is smaller than 1.96, the distribution is normal. In larger samples this criterion value should be increased to 2.58, and in very large samples to 3.29. Significance tests of skewness and kurtosis should not be used with large samples, because they are likely to be significant even when skew and kurtosis are not too different from normal (Field, 2009, p. 139).

Skewness and kurtosis give us a numerical value by which we can judge whether a distribution is normal or not. When you draw up a histogram, you can also graphically see whether the distribution is skewed, flat or peaked. When a sample is larger than 200, one should look at the shape of the distribution in the histogram rather than use significance testing.

[Example of SPSS output: a statistics table for the variable "Counselling for Mental Problems" (N valid = 1013, missing = 504), showing the skewness and kurtosis with their standard errors, and a histogram of the same variable with a normal curve superimposed (Mean = 1.94).]

How to draw a histogram with SPSS: From the menu, choose: Analyse > Descriptive statistics > Frequencies > From the Charts options box, select "histogram" and tick the box "with normal curve".

Another plot that can be used is the P-P plot (probability-probability plot). A normal distribution on a P-P plot should be a diagonal straight line.
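A minimal syntax sketch of the same checks, assuming a variable named counsel (the name is hypothetical):

* Skewness and kurtosis with their standard errors, plus a histogram with
* a normal curve (menu: Analyse > Descriptive statistics > Frequencies).
FREQUENCIES VARIABLES=counsel
  /FORMAT=NOTABLE
  /STATISTICS=SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL.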
Drawing a P-P plot with SPSS: From the menu, choose: Analyse > Descriptive statistics > P-P Plots.

Box 3: Describing the different groups in your sample: using the split file command
Most of the time there are different subpopulations represented in the sample, and you would most likely want to explore each of the subpopulations in turn. One of the functions in SPSS that can help you do this is the split file function. The split file function allows you to identify a grouping variable (a variable that is used to specify categories of people). When you select the split file function, any subsequent procedure that you run in SPSS will be carried out on each category specified by the grouping variable. This splits the file so that your computations are done for the different subgroups (e.g. for males and females).

To select the split file command: From the menu, choose > Data > Split file. The split file dialogue box will open. Select "Organise output by groups", then select the grouping variable (e.g. sex), and click OK. It is important to turn the split file function off after you have completed the computations you wanted done in that way. (To switch it off, follow the same path and click on the Reset button.)
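In syntax, a minimal sketch of the same, assuming a grouping variable named sex:

* Run subsequent procedures separately per group (menu: Data > Split File).
SORT CASES BY sex.
SPLIT FILE SEPARATE BY sex.
* Any analyses run here are repeated for each group.
SPLIT FILE OFF.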
Another way in which normality can be tested is by means of the Kolmogorov-Smirnov (K-S) and Shapiro-Wilk tests. These tests compare the distribution of the sample with a comparable normal distribution. In both these tests we are actually testing a hypothesis, namely that the distribution of the sample is the same as the distribution of a population with the same mean and standard deviation. The null hypothesis in this case will be: there is no difference between the distributions of the sample and the population (thus they are equal). Remember that we always test statistically to reject or accept the null hypothesis; if we accept the null hypothesis here, it means that the sample distribution is normally distributed. The Shapiro-Wilk test is used for small sample sizes (less than 50); otherwise use the K-S test. The limitation of these tests is similar to that of the skewness and kurtosis significance tests: if the sample size is large, they will easily show significant differences (non-normality). For this reason one would always plot the data and use the graphs in collaboration with any other test used.

Kolmogorov-Smirnov and Shapiro-Wilk in SPSS: How to do this? Well, it is actually easy with SPSS. From the menu, choose Analyse > Explore… In the Dependent list, put all the variables of interest to you (that you want to test). If any of the variables is a grouping variable, you can put it in the Factor list. If you click on Statistics, select Descriptives. If you click on Plots, select, under Box plots, "Factor levels together"; under Descriptive, select "Stem-and-leaf"; also select "Normality plots with tests" and click on Continue. Then OK.

There is a lot of output, but only some of it is of importance specifically for the K-S and Shapiro-Wilk. (You may look at the descriptives per variable if you haven't drawn them up already.) The important statistics are the tests for normality:

Tests of Normality
                                              Kolmogorov-Smirnov(a)       Shapiro-Wilk
                             Respondent's Sex Statistic  df    Sig.       Statistic  df    Sig.
To Be Well Liked or Popular  Male             .227       408   .000       .865       408   .000
                             Female           .266       574   .000       .857       574   .000
To Obey                      Male             .398       408   .000       .643       408   .000
                             Female           .444       574   .000       .548       574   .000
a. Lilliefors Significance Correction

Example of normality tests output

How do you interpret these tests? The Statistic is the actual K-S statistic and the df is the degrees of freedom (which should be the same as the sample size). The one we look at to judge whether to accept or reject the null hypothesis is the Sig., or significance value. If the sig. is less than 0.05, there is a significant difference between the sample and the normal distribution; therefore we reject the null hypothesis and say that the distribution is not normal.
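The same Explore run in syntax form, a minimal sketch (the variable and factor names popular, obey and sex are hypothetical):

* Normality tests, descriptives, stem-and-leaf and box plots
* (menu: Analyse > Descriptive statistics > Explore).
EXAMINE VARIABLES=popular obey BY sex
  /PLOT BOXPLOT STEMLEAF NPPLOT
  /STATISTICS DESCRIPTIVES.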
A significance larger than 0.05, on the other hand, indicates that the distribution is normal. In the case of the table shown above, I will report: the Kolmogorov-Smirnov statistic was significant (p < 0.05) and therefore the distribution is not normal. You will see that the normality output also includes Q-Q plots, stem-and-leaf plots and even box plots (box-and-whiskers plots).

3.2 Equality of variances

You can see the variances by using the Descriptives and Frequencies commands in SPSS; these give you an indication of the variance of the different groups, but you do not know whether the differences you see on face value are statistically significant. There are statistics that tell us to what extent there are significant differences between the variances of different samples. The most common of these are the Levene's test of homogeneity of variance and the Bartlett's test for homogeneity of variance.

The Levene's test in SPSS Explore: Go to Analyse > Descriptive statistics > Explore… Put the dependent variable in the "Dependent List"; the grouping variable should be in the "Factor List". Under the "Plots" options select Histograms with normality plots, and "Untransformed" under "Spread vs Level with Levene's test".

Test of Homogeneity of Variance
                                           Levene Statistic  df1   df2      Sig.
Age  Based on Mean                         .070              1     53       .792
     Based on Median                       .033              1     53       .856
     Based on Median and with adjusted df  .033              1     52.457   .856
     Based on trimmed mean                 .052              1     53       .820

Read the statistics based on the mean. A significance larger than 0.05 indicates that the variances are equal; if the significance is smaller than 0.05, it indicates that the variances are not equal.

To report the results of the Levene's test: Levene's test is denoted by the letter F. The F value as well as the degrees of freedom (df) should be mentioned in the report. The general form of reporting is: F(df1, df2) = value, sig. For instance: F(1, 53) = 0.070, p = 0.792.
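In syntax, a minimal sketch (the variable names age and gender are hypothetical; as far as I am aware, the homogeneity table is produced by the spread-versus-level request, with power 1 corresponding to the "untransformed" option):

* Levene's test via Explore (menu: Analyse > Descriptive statistics > Explore).
EXAMINE VARIABLES=age BY gender
  /PLOT HISTOGRAM NPPLOT SPREADLEVEL(1).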
4. From questionnaire to dataset

The data collected during the research needs to be coded and entered into SPSS to create a dataset with which you can work. For the purposes of using statistical programmes, you have to define and label the variables you measured during data collection. For instance, if I measured gender in a questionnaire (in other words, a question asking each respondent's gender), in the SPSS data file I will refer to gender with the variable name "SEX", and the codes that identify each respondent's gender will be 1 or 2, where 1 indicates "Female" and 2 indicates "Male".

When you are measuring a lot of variables, it is very easy to become confused with codes and labels. For this reason, researchers create codebooks. The codebook lists all the variables included for the statistics, as well as their labels and the codes ascribed to each answer category given. In my codebook, I will illustrate this as:

Variable   SPSS Variable Name   Coding Instruction
Gender     SEX                  1 = Female
                                2 = Male

As another example, if I measured level of statistical knowledge, the variable name may be STATKNOW, and the levels of that variable are measured on a 5-point scale where 1 = no knowledge, 2 = some knowledge, 3 = average knowledge, 4 = above expected and 5 = exceeding knowledge. The levels, then, are the codes that I will use to indicate the levels of statistical knowledge.

The codebook can be created as soon as your data analysis tool is finalised, provided it contains only closed answer categories. In the case where you want to use a qualitative data collection tool, such as an open-ended questionnaire, you will have to wait until after you have collected your data.

Variable names should:
• Be unique
• Begin with a letter (not a number)
• Not include full stops, blanks or other special characters
• Not include words used as commands by SPSS (ALL, BY, EQ, GE, GT, LT, NE, NOT, OR, TO, WITH)
• Not exceed 64 characters.

The responses must all be coded with numbers. Even open-ended questions should be transformed to numerical codes to use them in SPSS; otherwise you would not be able to do any statistics with them.

Before you can analyse data with a statistics programme like SPSS, you will need to create some form of data set for it to work on. For this course we are using SPSS (Statistical Package for the Social Sciences). You may, however, decide to use MS Excel for the data analysis, or SAS (Statistical Analysis System), in which case you would have to read the data you collected into the chosen programme. When you are working with raw data (the answers of the respondents are on the questionnaires only), you need to create a template and enter the data into the SPSS spreadsheet. If your data is already in electronic form, it can be opened in SPSS. (Note that data should be in an Excel spreadsheet or a text file to open with SPSS.)

5. Screening and cleaning your data

Sally did research on managers' stress levels and blood pressure. She collected the data on stress using the General Stress Inventory, and a registered nurse took the blood pressure levels. As soon as Sally had all the data, she read it into SPSS and started the analysis. To her amazement she found inconsistent results. Lucky for her, she went back and checked her data before she started writing the report. It turned out that Sally had made a lot of mistakes while reading in the data, and that caused the inconsistent results!

Like with Sally, it often happens that mistakes are made when capturing data. When the dataset is faulty, it can lead to wrong conclusions and therefore invalid and unreliable research! For this reason, the first step after capturing data is to screen and clean the dataset. To screen data means that you explore the dataset for any errors, find the errors and correct them. To identify errors, you have to know what the correct data should look like.
So how do you screen for errors in SPSS? Basically, you want SPSS to describe the data. You know what the data should look like, since a codebook is available that shows you what the range of the data should be for each variable. For instance, if you measured the variable home language in South Africa with a closed-ended question with 11 answer options (one for each language), you know that for the variable of language there is a range of 1 – 11. Anything outside this range will be a mistake. See, easy!

Now, what if you find that a variable is not in the range you expected: how will you know which one of the cases is wrong? You can either search the variable or do more detailed descriptive statistics. Your choice! As soon as you have identified the error, you can replace it with the correct value by going back to the raw data (the questionnaires). If you do not know what the correct value is, you need to delete the value and replace it with a missing value (or just keep the cell empty).

6. Manipulating your data

With SPSS one can add up scores, for instance adding the scores on the individual items of a questionnaire to get a scale score. Continuous scores may need to be collapsed into categories to create a categorical variable, or, if too few responses of a specific category are present, the number of categories on a questionnaire item can be reduced. Skewed distributions can also be transformed if needed.

6.1 Calculating the total scores of scales or indexes

In some questionnaires, a number of questions (items) measure a specific construct. If this is the case, you will not look at the single items alone; we use a scale, in which case we want to add the responses on all the items together to obtain a scale score, in other words a total for each person. To do this in SPSS go to > Transform > Compute variable.

6.2 Reversing negatively worded items

In some scales, the wording of particular items has been reversed to help prevent response bias. Using the "Transform" function in SPSS, such an item can be recoded positively.
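A minimal syntax sketch of both operations (the item names are hypothetical, for a five-item scale with one reversed 5-point item):

* Reverse a negatively worded 5-point item
* (menu: Transform > Recode into Different Variables).
RECODE item3 (1=5) (2=4) (3=3) (4=2) (5=1) INTO item3_rev.
* Total score for the scale (menu: Transform > Compute Variable).
COMPUTE scale_total = SUM(item1, item2, item3_rev, item4, item5).
EXECUTE.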
6.3 Collapsing a continuous variable into groups

Sometimes you will need to divide your sample according to scores to create groups. For instance, you may have a continuous variable of income (if the question on the questionnaire asked respondents to write in their income), but say you want to compare three different income groups on, for instance, the variable of hope. In such a case you would want to create categories of low income, middle income and high income: you will transform the continuous variable into a categorical variable.

You may ask: why not use categories from the beginning? Well, using an interval or ratio level of measurement gives you much more detail to work with. If you ask age in categories, every person in your sample will just fall into a category, but if you ask the specific age, you have much more detail on your sample's age. It also gives you a wider variety of analyses to work with since, if needed, you can always collapse the continuous variable into a categorical one. To do this in SPSS go to > Transform > Recode into different variable.
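A minimal syntax sketch (the variable names and cut-points are hypothetical; choose boundaries that suit your own data):

* Collapse a continuous income variable into three groups
* (menu: Transform > Recode into Different Variables).
RECODE income (LO THRU 4999=1) (5000 THRU 19999=2) (20000 THRU HI=3)
  INTO incgroup.
VALUE LABELS incgroup 1 'Low' 2 'Middle' 3 'High'.
EXECUTE.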
7. Correlation analysis

When we talk about relationships between variables, we imply that the variables influence each other. Relationships between variables are also referred to as associations between variables. Take note: influence does not imply a causal relationship! If ice cream sales in Bloemfontein are very high this month, and the number of drownings is very high, there will be a correlation or relationship between ice cream sales and drowning. Does this mean that ice cream sales cause drowning? Or does it maybe mean that drowning causes ice cream sales? Of course not! There is no logical or theoretical link between these two events. A relationship simply implies that at a given time, in a given context, the rate or frequency of occurrence of two variables (in this case ice cream sales and drownings) increases together.

The nature of a relationship/association implies its strength and its direction. The strength of a relationship is indicated by a correlation coefficient (the symbol r is used to indicate the correlation coefficient in statistics output). The correlation coefficient is a number between 0 and 1 that indicates how strong the relationship between the variables is: a coefficient of 0 indicates no relationship and 1 indicates a perfect relationship.

The direction refers to whether the relationship is positive or negative. A positive relationship implies that if the properties of the one variable increase, the properties of the other will also increase; or, if the properties of the one decrease, the properties of the other will also decrease. In other words, a positive relationship means that the variables co-vary in the same direction. A positive relationship is also referred to as a direct relationship. A negative relationship means that if the scores on one variable increase, the scores on the other variable decrease: the variables co-vary in different directions. A negative correlation is also referred to as an indirect relationship.

In statistics, relationships are therefore also referred to as correlations: positive and negative correlations. Positive and negative correlations refer to linear relationships; in other words, both are fitted on a straight diagonal line (see the scattergram examples below: you will see that a positive and a negative correlation both fit on a straight diagonal line).

Questions about relationships between variables are usually descriptive research. As an example, if I want to know whether students with higher-order thinking skills understand statistics better, I will ask: is there a positive relationship between higher-order thinking skills and students' understanding of statistics? For relationship questions, I will conduct a correlation analysis; the aim of the research when you are using correlations is to describe the relationship that exists between a and b. If the analysis is significant, it will tell me that the better the higher-order thinking skills, the better students understand statistics. It does, however, not tell me that higher-order thinking causes statistics understanding! There is a difference.

7.1 Statistics to test relations between variables

Different statistics are used to test the relationship between variables. They are all referred to as types of correlation analysis, but are used for different types of data. They include:
• Pearson / product-moment
• Spearman
• Point-biserial
• Phi coefficient, and so forth.

7.1.1 The Pearson / product-moment correlation

A Pearson correlation coefficient is used when you are working with continuous data, in other words data on the interval or ratio level of measurement. The Pearson correlation is a parametric test or parametric statistic. Parametric indicates that there are certain assumptions or parameters (borders) that the data should adhere to in order for it to qualify for parametric statistics. In short, in statistics we have two legs or two kinds of statistics: those that are parametric and those that are non-parametric. Should the data not adhere to the parameters or assumptions, the equivalent but NON-parametric alternative should be used.

To use the Pearson product-moment correlation, your data should adhere to the following assumptions or parameters:
• Data must be on interval level
• A linear relationship must exist (this can be indicated by means of a scatter plot)
• The distributions must be similar, but preferably normal (thus, if they are skewed, they must be skewed in the same direction)
• Outliers must be identified and omitted from the computation (please note: if you delete an outlier, delete only the cell with the outlier value).

How do I know if there are outliers? To see if there are any outliers, we draw up a box-and-whiskers plot and a stem-and-leaf plot. Both of these can be drawn up under Analyse > Descriptive statistics > Explore: under Statistics select "Outliers", and under Plots select "Stem-and-leaf". The box plot gives you a good idea of the outliers and of the identity of the outliers; in other words, it does not only show you the outlier, but also which case in the data set has that particular value. [A box plot shows: the box, which is the middle of the distribution (the bell part of a normal distribution); the median; the 1st – 3rd quartiles of the distribution; the minimum and maximum values which are not outliers; and the outliers or extreme values that do not fit with the rest of the distribution.] To read stem-and-leaf plots, use the following link: http://www.cmh.edu/stats/definitions/stem.htm.

Outliers cannot be included in the analysis. There are different ways to deal with outliers:
1. Outliers can be removed.
2. Data can be transformed: outliers skew distributions, and the skewness can be reduced somewhat by transformations of the dataset. (See Field (2009), p. 155, for a short and understandable description of the different transformation options.)
3. Change the score: should the transformation fail, the value can be replaced by:
a. The next highest score in the dataset plus 1
b. The mean plus two standard deviations.
To conduct a Pearson correlation, the following steps should be used in SPSS: From the menu bar select Analyze > Correlate. The options that you can choose from at this stage are: Bivariate, Partial and Distance. A bivariate correlation is a correlation between 2 variables; we will mostly work with this one. In the following box, select the variables that you want to correlate. Select "Pearson" under Correlation coefficients. Under Test of significance, "two-tailed" means that there is no specification of the direction of the correlation in the stated hypothesis; "one-tailed" is only chosen when you have specified the direction of the effect (relationship), i.e. a directional hypothesis. In the bottom left-hand corner you can select "Flag significant correlations"; this tells SPSS to mark the significant correlations on the output.

7.1.2 The Spearman rank-order correlation / Spearman's rho

Spearman's rho is the non-parametric alternative to the Pearson correlation coefficient. It is used when one or both of the variables are measured on ordinal scale (if only one, the other should be at least on interval scale). Spearman's rho is indicated as rs. To do this in SPSS, use the same procedure as with the Pearson correlation, but select the Spearman option instead.

7.1.3 Kendall's tau

Kendall's tau is another non-parametric correlation, and it should be used rather than Spearman's coefficient when you have a small data set (50 cases or less). It is stricter, and if you compute both the tau and the rho, you will probably find that the tau is a bit lower than the rho. To do this in SPSS, use the same procedure as with the Pearson correlation, but select the Kendall's tau option instead.
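The same analyses in syntax form, a minimal sketch (the variable names thinking and statund are hypothetical):

* Pearson correlation (menu: Analyze > Correlate > Bivariate).
CORRELATIONS /VARIABLES=thinking statund
  /PRINT=TWOTAIL NOSIG.
* Spearman's rho and Kendall's tau-b for the same pair.
NONPAR CORR /VARIABLES=thinking statund
  /PRINT=BOTH TWOTAIL NOSIG.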
7.1.4 The point-biserial correlation

This statistic is computed when you want to see the relationship between a continuous variable and a dichotomous variable. For example, females and males report the total number of years of education they have had, and we want to know whether there is any correlation between gender and years of education. The point-biserial correlation is indicated by rpb.

The assumptions that your data must meet to compute a point-biserial correlation are:
• The dichotomous variable has mutually exclusive groups whose values have been coded 1 and 0
• The two groups created by the dichotomous variable are normally distributed
• The two groups created by the dichotomous variable have equal variances
• The continuous variable has equal variances across each level of the dichotomous variable.

To compute rpb, you use the normal Pearson correlation procedure.

How to test for equality of variances in SPSS? An easy way is to select from the menu bar: Analyse > Compare means > Independent-Samples T-Test. The grouping variable will obviously be the dichotomous variable, and the test variable the continuous one for which you want to test differences. Then click OK. This procedure will give you a table in the output that looks like this:

[Independent Samples Test output for HIGHEST YEAR OF SCHOOL COMPLETED and RS HIGHEST DEGREE, each with a row for "equal variances assumed" and one for "equal variances not assumed": Levene's Test for Equality of Variances (F, Sig.), followed by the t-test for Equality of Means (t, df, Sig. 2-tailed, Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference).]

Look under Levene's test for equality of variances: if the significance value is more than 0.05, it means that the two groups have equal variances.

7.1.5 The phi coefficient

When both variables are dichotomous, the phi coefficient is used (indicated as rphi). The assumptions that the data must meet to utilise the phi coefficient are:
• Variables must be dichotomous
• Observations are independent
• The observations are in the form of frequencies and not scores
• There must be at least 5 counts in each category of each variable.
To compute the phi coefficient with SPSS: From the menu bar select Analyze > Descriptive statistics > Crosstabs. Go through the same process as you would with cross tabulations, then go to the Statistics option, select "Phi and Cramer's V", and click Continue and OK. You look at the Phi statistic only.

7.1.6 Cramer's V coefficient

When you want to test the association between two categorical variables (not dichotomous), you use the Cramer's V statistic. Obtain it by the same steps as above. The output should give you a table like this:

Symmetric Measures
                                   Value    Approx. Sig.
Nominal by Nominal   Phi           .208     .136
                     Cramer's V    .208     .136
N of Valid Cases                   1847
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

If the significance value is less than 0.05, there is a significant relationship between the two variables.

7.2 How to interpret the results of the correlations

A correlation coefficient tells you two things: 1) the strength of the relationship between the variables and 2) the direction of that relationship. It does not tell you whether that relationship is statistically significant or not, and you can only interpret the correlation in terms of strength if the correlation is statistically significant. Here is a rough guide to interpreting correlation coefficients in terms of the strength of the relationship:

Correlation coefficient (r)   Strength of relationship
0.0 – 0.2                     Very weak, negligible
0.2 – 0.4                     Weak, low
0.4 – 0.7                     Moderate
0.7 – 0.9                     Strong, high, marked
0.9 – 1.0                     Very strong, very high

You have to remember to look at the direction of the correlation as well.
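A syntax sketch of the crosstabs request (the variable names gender and employed are hypothetical; the PHI keyword produces both Phi and Cramer's V):

* Phi and Cramer's V (menu: Analyze > Descriptive Statistics > Crosstabs).
CROSSTABS /TABLES=gender BY employed
  /STATISTICS=PHI
  /CELLS=COUNT.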
What is statistical significance? It is a statistical concept indicating that the result is very unlikely to be due to chance and therefore likely represents a true relationship between the variables. Statistical significance is usually indicated by the alpha value (or probability value), which should be smaller than a chosen significance level. For most research studies a significance level of 0.05 or 0.01 is used, indicating that the results have only a 5% or 1% chance of being due to chance alone. In SPSS, we look at the p-value to tell us whether results are statistically significant or not. If the p-value is smaller than 0.05, we know the results are statistically significant at the 0.05 level.

7.3 The coefficient of determination (r2)

When the correlation coefficient is squared, it gives us an indication of the amount of variability in the one variable that is explained by the other. This r2 is also called the coefficient of determination (see http://www2.chass.ncsu.edu/garson/pa765/correl.htm). For example, if the correlation coefficient between age and social intelligence is 0.78 (p < 0.05), then r2 = 0.6084. This can be interpreted as: the amount of variability in social intelligence that can be explained by means of age is 61%.

7.4 How to write up the results of a correlation analysis in a research report

Mostly you will write something like: "The results of the chi-square analysis indicated a significant but weak association between group membership and post-intervention fear (chi-square = 0.40, p = 0.03)", depending on which analysis you used. Remember to interpret it in terms of the practical value of the research.

7.5 Graphically representing the relationship between variables

It is probably easiest to see whether a relationship exists by drawing up scatter plots of the different variables that you would like to test. A scatter plot shows how the scores on the variables co-vary (go together). Since the scatter plot gives you such a good picture of what to expect from a correlation coefficient, the first step of a correlation analysis is to first draw up a scatter plot.
Examples of scatter plots:
[Three example scatter plots: a positive relationship, a negative relationship, and no relationship.]

The different types of scatter plots that can be drawn up are:
• The simple scatter plot: looks at the relationship between two variables.
• The overlay scatter plot: with this option you can display the covariance between several variables on the same axis/diagram.
• The matrix scatter plot: does the same, but rather than drawing everything on the same diagram, it is drawn up in a matrix.
• The 3-D scatter plot: used to draw a diagram of the relationship between 3 variables.
We will use the simple scatter plot for now.

To draw up a scatter plot in SPSS: From the menu bar select Graphs > Scatter. A box with the different scatter plot options should appear. Click on Define > select the variables for the analysis and place them on the x- and y-axes > if there is a grouping variable that defines different categories, you may place it in the "Set markers by" block > select the Titles option below to give headings to the plot.
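In syntax, a minimal sketch (the variable names are hypothetical):

* Simple scatter plot (menu: Graphs > Scatter > Simple).
GRAPH /SCATTERPLOT(BIVAR)=thinking WITH statund
  /TITLE='Thinking skills vs understanding of statistics'.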
7.6 Other analyses that are grounded in correlation analysis

A lot of multivariate statistics are grounded in the logic of correlation analysis. They include factor analysis, regression analysis, cluster analysis and reliability analysis, to name but a few commonly used ones.

While correlation analysis tests whether a relationship exists and the strength of that relationship, regression analysis assesses the predictive ability of an independent variable on a continuous dependent variable. Simple regression assesses the functional relationship between one dependent (criterion/outcome measure) and one independent (predictor) variable, while multiple regression is used when you want to test the predictive value of a number of predictors on a single criterion (outcome measure). The criterion should be a scale variable (a continuous variable on at least interval level of measurement); when the criterion is not on interval level of measurement, logistic regression should be used. For instance, if we take high school achievement and university achievement, one can use a regression analysis to determine the extent to which high school achievement can be used to predict achievement at university.

For more information on correlations and regression, see:
• http://bmj.bmjjournals.com/collections/statsbk/11.shtml (correlation)
• Sykes, A.O. (ND). An Introduction to Regression Analysis. Retrieved from: http://www2.law.uchicago.edu/Lawecon/WkngPprs_01-25/20.Sykes.Regression.pdf
• http://www.blackwellpublishing.com/specialarticles/jcn_10_462.pdf
• http://www.bto.ed.ac.uk/bto/statistics/tress11.html
• http://www.valuebasedmanagement.net/methods_regression_analysis.html
• http://www.investorwords.com/4136/regression_analysis.html
• http://www.sjsu.edu/faculty/gerstman/StatPrimer/regression.pdf
• Regression analysis for curve fitting @ http://helios.csuhayward.edu/~esuess/Links/Software/RegressionExplained/regression_explained.doc
• DAU Stats Refresher @ http://www.cne.gmu.edu/modules/dau/stat/dau2_frm.html
• Dallal, G.E. (2004). The Little Handbook of Statistical Analysis @ http://www.tufts.edu/~gdallal/LHSP.HTM (select the Regression pages on the menu)
Another procedure which is based on the logic of correlations is the factor analysis. With a factor analysis you can determine the underlying structure of a large data set. In other words, when you have 10 000 variables in your dataset and want to look at how these variables fall together, you can use a factor analysis. On such a dataset, the factor analysis will group the variables that fall together with each other, indicating the underlying structure (or reduced number of latent variables) present in the data set.

One of the main principles of selecting a data collection instrument is that it should measure what you need it to measure, and that it should be a reliable indicator of whatever it is you are measuring; thus, the validity and reliability of your data collection instrument are important. One analysis which is very important when questionnaires are used in research is the reliability analysis. While a factor analysis can assess the construct validity of an instrument, the Cronbach's alpha is one way to assess the reliability of a questionnaire. This method tests the internal consistency of the items that are supposed to measure the same thing. All of the reliability analysis options are under Analyse > Scale…
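A minimal syntax sketch (the item names are hypothetical, for a five-item scale):

* Cronbach's alpha (menu: Analyse > Scale > Reliability Analysis).
RELIABILITY /VARIABLES=item1 item2 item3 item4 item5
  /MODEL=ALPHA.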
8. Testing differences between groups (causal relationships)

Sometimes we hypothesise that one variable (the independent variable) may cause a change in another variable (the dependent variable). For instance, we think that gender can influence vocational interest; in other words, if you are a male, you will have certain interests that differ from the interests of females. To prove this hypothesis, you have to prove that the career interests of males and females differ from each other; thus, you have to compare groups (in this case males and females).

8.1 What does "testing for differences between groups" mean?

Researchers often want to test the similarities or differences of the properties or characteristics between groups. Take the following example:

Example 1: A researcher wants to know whether there is a difference in the personalities of sales consultants and sales managers. This would give important information for the recruitment of both groups. The research question for this study would be: Is there a difference between the personality profiles of sales consultants and sales managers?

Of course the researcher has to define each of the variables included in the study. They are: type of post, which is the independent variable or grouping variable (either sales consultant or sales manager), and personality profile, which is the dependent variable. The researcher defines a sales consultant as a person who is responsible for the sales of a specific product of a company; he is directly involved with the prospective buyer. The sales manager is a person who is responsible for the sales of the sales consultants within a specific division of an organisation; he is not directly involved with the prospective buyer, but rather with the management of the sales consultants. A personality profile is a profile that defines the personality dimensions important for a specific group.

The above-mentioned are the conceptualisations, or conceptual definitions, of the variables. However, the researcher needs to measure these concepts and will therefore specify operational definitions (operationalise the variables). For instance, he will define the groupings as follows: for a person to fall into the category of sales consultant, he or she has to have been in a sales consultant post for at least 1 year, and a sales manager has to have been in a post specified as sales manager for at least 1 year. Personality profiles are measured by means of the 16 Personality Factor questionnaire; the researcher will of course specify here what this instrument measures and how.

The hypotheses will be set out as follows:
H0 (null hypothesis): There is no difference in the personality profiles of sales consultants and sales managers.
H1 (alternative hypothesis): There is a difference between the personality profiles of sales consultants and sales managers.

The researcher can go so far as to set specific sub-hypotheses for H1. These sub-hypotheses will specify how the personality profiles will differ. For instance, the researcher can say that:
H1 (a): A sales consultant will score high on dimensions A, F and Q4, and low on Q2.
H1 (b): A sales manager will score high on dimensions C, D and E, and lower on A and F.

For this, the researcher will have to substantiate, from previous research, why and how he arrived at these hypotheses.

8.2 Testing differences between two independent groups: t-test for independent groups

When a researcher wants to see whether statistically significant differences exist between two different groups with regard to a dependent variable, he will use the t-test for independent groups. For instance, if you want to test whether there is a difference in the level of language skills between a group of matriculants from Gauteng and one from Limpopo, you will use the t-test for independent groups. The t-test uses the means of the two groups to compare for differences.

The t-test is a parametric statistic. The following assumptions must be met:
1. The data for the dependent variable must be on at least interval scale.
2. The data for each of the two groups must be distributed normally. This can be tested by means of the descriptions of skewness and kurtosis or the Q-Q plots (or any other test for normality).
3. The sample size should be at least 30 per group. It is not essential for this procedure that the sample sizes of the two groups are the same.
4. Equal variances are assumed. For the t-test, the Levene's test for homogeneity of variances is used; this is given with the t-test output and will indicate whether the variances are equal. You will remember from the previous sections that the significance value of the Levene's test should be more than 0.05.

To do an independent samples t-test in SPSS, select ANALYSE > COMPARE MEANS > INDEPENDENT SAMPLES T-TEST. Select the dependent variable for the test variable list and the grouping variable under "Grouping variable". You have to define the groups: use the codes of the data set, e.g. group 1 = 0, group 2 = 1. Run the analysis.
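In syntax, a minimal sketch (the variable names and group codes are hypothetical):

* Independent-samples t-test
* (menu: Analyse > Compare Means > Independent-Samples T Test).
T-TEST GROUPS=gender(0 1)
  /VARIABLES=langskill
  /CRITERIA=CI(.95).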
The output will typically look like this:

Group Statistics
Income before the program   Gender   N     Mean     Std. Deviation   Std. Error Mean
                            Male     493   8.9939   1.68866          .07605
                            Female   507   8.9152   1.58510          .07040

The first table shows the number of cases (N) for each group, the mean score for each group, and the standard error of the mean.

[Independent Samples Test table: Levene's Test for Equality of Variances (F, Sig.), followed by the t-test for Equality of Means (t, df, Sig. 2-tailed, Mean Difference, Std. Error Difference, 95% Confidence Interval of the Difference), with one row for "equal variances assumed" (t = .760, df = 998) and one for "equal variances not assumed" (t = .760, df = 989.421). The mean difference (.0787) is obtained when the mean of one group is subtracted from that of the other.]

In the second table, use the row of output corresponding to the outcome of the Levene's test: if Levene's test indicates homogeneity of variances, as in this case, use the upper row of the t-test output. In this example Levene's test showed that homogeneity of variances could be assumed, i.e. the variances are equal, F(1, 998) = 0.517, p > 0.05. The t-value is in this case 0.760, with the degrees of freedom (N1 + N2 - 2) equal to 998, and the significance value should be less than 0.05 to indicate a significant difference. From the results it is thus evident that there are no differences between the groups (t(998) = 0.760, p = 0.447); for this example, no differences exist.

When you write up the results of a t-test you must indicate the t-value as well as the significance value (p). Lately it is also required to report the effect sizes of statistical results.
Box 3: Practical and statistical significance
What is statistical significance? It is a statistical concept indicating that a result is very unlikely to be due to chance and, therefore, likely represents a true relationship between the variables. We test the significance of our statistics by looking at the probability that our results may be due to chance alone. Statistical significance is usually indicated by the alpha value (or probability value), which should be smaller than a chosen significance level. For most research studies a significance level of 0.05 or 0.01 is used, indicating that the results have only a 5% or 1% chance of occurring by chance alone. In SPSS, we look at the p-value to tell us whether results are statistically significant or not. If this probability is larger than 5% we generally do not accept the result as "significant"; when it is smaller than 5%, we do accept it as "significant". Thus, if the p-value is smaller than 0.05, we know the results are statistically significant at the 0.05 level.

However, statistical significance does not necessarily mean that the result is important: statistical significance can sometimes be due to large samples. For this reason we also calculate the effect sizes of significant statistics. There are different methods for calculating effect sizes; the most common is, however, using r and Cohen's d. As an effect size, Pearson's r can vary in magnitude from −1 to 1, with −1 indicating a perfect negative linear relation, 1 indicating a perfect positive linear relation, and 0 indicating no linear relation between two variables. Cohen gives the following guidelines for the social sciences: small effect size, r = 0.10–0.23; medium, r = 0.24–0.36; large, r = 0.37 or larger.

8.3 The nonparametric alternative for the t-test for independent samples: Mann-Whitney U test
Used if the assumptions of the t-test for independent samples are not met:
• Data is not normally distributed
• The dependent variable is measured on an ordinal scale
• Sample sizes are small (smaller than 30, but larger than 5 per group)

The hypotheses for a Mann-Whitney test will look like:
H0: There is no difference between the two samples (median1 = median2)
H1: There is a difference between the two samples (median1 ≠ median2)
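A minimal sketch of the same test in Python with scipy is shown below, again on made-up ordinal scores rather than the data from the output that follows.

from scipy import stats

# Hypothetical ordinal scores (e.g. level of education coded 1-5)
unmarried = [3, 4, 2, 5, 3, 4, 3, 2, 4, 5]
married   = [2, 3, 3, 4, 2, 3, 4, 2, 3, 3]

u_stat, p_value = stats.mannwhitneyu(unmarried, married, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")   # compare p with 0.05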
The output will typically look like this:

Mann-Whitney Test
Ranks
Level of education
Marital status   N      Mean Rank   Sum of Ranks
Unmarried        504    512.13      258114.00
Married          496    488.68      242386.00
Total            1000

Test Statistics(a)
Level of education
Mann-Whitney U           119130.00
Wilcoxon W               242386.00
Z                        -1.389
Asymp. Sig. (2-tailed)   .165
a. Grouping Variable: Marital status

When you report the results you must mention the Z score and the significance level. In the example above, the difference between married and unmarried employees with regard to level of education is not significant (z = −1.389, p = 0.165).

8.4 Testing differences between two dependent / related samples
In some research designs, a researcher has two measurements of the same group taken at two different points in time, for instance a pre- and post-measurement. In such cases the researcher would like to see if there is a difference between the two measurements. A good example of such a design is when a researcher wants to test the effectiveness of a communication skills training programme. If the training programme is effective, the logical deduction would be that the scores on a second measurement (after the training programme) will be higher than on the first (before the training programme). In a case like this, the researcher will use the t-test for related/dependent samples. The assumptions are the same as for the t-test for independent samples, except for the independence of observations.

Take the following example: We compared the mean test scores before (pre-test) and after (post-test) the subjects completed a test preparation course. We want to see if our test preparation course improved people's scores on the test.
First, we see the descriptive statistics for both variables. The post-test mean scores are higher. However, this is just face value – we still do not know if this difference is statistically significant. Next, we see the correlation between the two variables. There is a strong positive correlation: people who did well on the pre-test also did well on the post-test. Remember, the groups are paired / the same, and therefore we assume that there is a correlation between the first and the second measurement. Finally, we see the results of the Paired Samples T Test. Under "Paired Differences" we see the descriptive statistics for the difference between the two variables. Remember, this test is based on the difference between the two variables. To the right of the Paired Differences, we see the t-value, the degrees of freedom, and the significance.
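The same comparison can be sketched in Python with scipy; the twelve pre/post pairs below are invented so that the degrees of freedom (n − 1 = 11) match the example discussed next.

from scipy import stats

# Hypothetical pre- and post-test scores for the same 12 subjects
pre  = [54, 61, 58, 65, 57, 60, 63, 55, 59, 62, 56, 60]
post = [58, 63, 57, 68, 60, 62, 66, 56, 61, 65, 58, 63]

t_stat, p_value = stats.ttest_rel(pre, post)   # df = n - 1 = 11
print(f"t = {t_stat:.3f}, df = {len(pre) - 1}, p = {p_value:.3f}")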
In this example the t-value = −2.171, we have 11 degrees of freedom, and our significance is .053. If the significance value is less than .05, there is a significant difference; if the significance value is greater than .05, there is no significant difference. Here, we see that the significance value is approaching significance, but it is not a significant difference. There is no difference between pre- and post-test scores. The test preparation course did not help!

To conduct a t-test for related samples on SPSS you follow the same route as with the t-test for unrelated samples. Here, select the paired samples option / dependent samples t-test.

8.4 The non-parametric alternative to the t-test for dependent/related samples: Wilcoxon Signed-Rank Test
When the level of measurement for a one-group pre-post test design is on ordinal scale, the data is not normally distributed, or the sample sizes are small, the Wilcoxon Signed-Rank Test is used to test differences. Where the t-test uses the mean to test for differences, the Wilcoxon Signed-Rank test uses the median. For more information on the Wilcoxon Signed-Rank procedure see: http://learn.lboro.ac.uk/sci/ma/mlsc/documents/wsrt.pdf
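A minimal sketch of the Wilcoxon Signed-Rank test in Python, on made-up ordinal pre/post ratings, assuming the same kind of one-group design:

from scipy import stats

# Hypothetical ordinal pre/post ratings for the same 10 subjects
pre  = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]
post = [4, 3, 4, 4, 3, 3, 5, 3, 4, 4]

w_stat, p_value = stats.wilcoxon(pre, post)
print(f"W = {w_stat:.1f}, p = {p_value:.3f}")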
8.5 Testing differences between more than 2 groups on one variable: One-way Analysis of Variance (One-way ANOVA)
Sometimes a researcher wants to compare the differences and similarities between more than two groups. Note the use of "one-way" in this type of statistic. It indicates that there is only one independent variable (grouping variable) or factor involved. This is important because there are also two- or three-way ANOVAs, or factorial ANOVAs, which are computed when more than one factor or grouping variable is used in the comparison. That is however beyond the scope of this module. As the name indicates, the Analysis of Variance looks at variances: it compares the variation between the different groups with the variation within the groups.

Example: A researcher thinks that students' research skills are influenced by their time management skills. The research question here is: Do time management skills influence students' research skills? For this study time management skills is the independent variable. She measures time management skills by means of the Kubic Time-management questionnaire. This questionnaire categorises a person's time management skills as Low, Low-to-moderate, Moderate-to-high, or High. This will therefore be the grouping variable. Research skills are the dependent variable, measured by means of the outcome/score of a student's performance on a masters-level dissertation.

The hypotheses are as follows:
H0: There is no difference in performance on the masters dissertation between students with low, low-to-moderate, moderate-to-high and high time management abilities.
H1: Students with high time management ability will perform significantly better in their masters dissertations than students with moderate-to-high, low-to-moderate and low time management skills.
H2: Students with moderate-to-high time management skills will perform better on their masters dissertations than students with low-to-moderate and low time management skills, but worse than students with high time management skills.
(H3 and H4 will follow in the same pattern.)

To test these hypotheses, a one-way ANOVA can be performed.
If differences exist, we assume that there are differences somewhere between the means of the different groups. Thus, at least two group means – or all the group means – differ significantly from each other. The ANOVA output itself only tells you that there is a difference somewhere; it does not tell you between which groups these differences lie. To see between which groups differences exist, post-hoc tests are used. There are different post-hoc tests. The most commonly used is Tukey's Honestly Significant Difference or HSD test. The Bonferroni test is also used, since it controls for the TYPE I error (finding significant differences when there are none); the chance of a TYPE I error is enhanced because repeated comparisons are made between the groups. Both these tests are conducted when equal variances of the groups are assumed (parametric assumption). When equal variances are not assumed, but all other assumptions are met, you may select the Tamhane, Dunnett or Games-Howell post-hoc tests, which adapt for the differences between group variances. SPSS gives you a choice of post-hoc tests.

The one-way ANOVA as explained above is a parametric test. The assumptions or requirements for the data are the same as for the t-test for independent groups:
1. All observations must be independent of each other.
2. The dependent variable must be measured on an interval or ratio scale.
3. The dependent variable must be normally distributed in the population – for each group being compared.
4. The variances of all the groups must be the same (homogeneity of variances).
5. Sample sizes need not be equal, but should preferably be larger than 30 for each group.

SPSS output will typically give you the following:
a. Descriptive statistics: the Descriptives table reports, for each group (Did not complete high school, N = 459; High school degree, N = 348; Some college, N = 193; Total, N = 1000), the mean of income before the program, the standard deviation, the standard error, the 95% confidence interval for the mean, and the minimum and maximum scores.
The descriptive statistics include the number of respondents per group (N), the mean or average score per group, the standard deviation, the standard error of the mean, and the minimum and maximum scores.

b. Test for homogeneity of variance:
Test of Homogeneity of Variances
Income before the program
Levene Statistic   df1   df2   Sig.
18.420             2     997   .000

As with the t-test, the ANOVA output, if you choose, should give you the results of the Levene's test for homogeneity of variances. If the Levene's test is significant (sig./p < 0.05) the null hypothesis (which states that the variances are equal) is rejected. THUS, the variances between the groups are not equal. This gives you a fair idea of which post-hoc tests should be interpreted.

c. The ANOVA / F-test:
ANOVA
Income before the program
                 Sum of Squares   df    Mean Square   F          Sig.
Between Groups   1986.479         2     993.240       1436.399   .000
Within Groups    689.405          997   .691
Total            2675.884         999

This is what you look at to decide whether there are any differences between the groups. The ANOVA uses the F-test/F-distribution to test for differences between groups. In the first column you will see that the output shows you the location of the variation – either between groups (the differences in variance between the groups) or within groups (the amount of variation that exists within each of the groups). The amount of variation for each of these is computed by means of the SUM OF SQUARES and the DEGREES OF FREEDOM. The df stands for DEGREES OF FREEDOM: for the between-groups row it is the number of groups minus 1, for the within-groups row it is N minus the number of groups, and for the total it is N − 1. MS is the MEAN SQUARE (the variance), which is computed by SS/df. The F is the F-ratio, computed by between MS / within MS. For differences to be significant, the between MS should be much larger than the within MS. If the grouping variable has an effect (in other words, when there is a difference between groups) the F-ratio should be larger than 1.
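The omnibus F-test can also be sketched in Python with scipy; the three small groups below are invented stand-ins for the income-by-education example, not the real data.

from scipy import stats

# Hypothetical incomes for three independent education groups
no_school   = [7.1, 7.4, 7.2, 7.3, 7.0, 7.5]
high_school = [9.3, 9.6, 9.4, 9.5, 9.2, 9.7]
college     = [11.6, 11.9, 11.7, 11.8, 11.5, 12.0]

f_stat, p_value = stats.f_oneway(no_school, high_school, college)
print(f"F = {f_stat:.3f}, p = {p_value:.5f}")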
d. Results of the post-hoc tests:
The Multiple Comparisons table lists, for each pair of groups and for each selected post-hoc test (here Tukey HSD, Bonferroni, Tamhane and Games-Howell), the mean difference between the groups (flagged with * when it is significant at the .05 level), its standard error, the significance value, and the 95% confidence interval of the difference. In this example every pairwise mean difference is flagged, and all the significance values are .000.

The post-hoc tests compare the specific groups with each other. To see if the differences are statistically significant, you need to look at the sig. (significance value). The significance value for a group comparison should also be smaller than 0.05: if the sig. < 0.05, it indicates that there is a significant difference between those two groups. Groups that differ significantly will usually be flagged by means of an *. The interpretation of this table will be written as: A significant difference exists between the groups (F(2, 997) = 1436.399, p < 0.001).
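If you want to reproduce a Tukey HSD post-hoc comparison outside SPSS, the statsmodels library offers one; the sketch below reuses the invented income-by-education data from the ANOVA sketch above.

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical incomes with their group labels
incomes = [7.1, 7.4, 7.2, 7.3, 9.3, 9.6, 9.4, 9.5, 11.6, 11.9, 11.7, 11.8]
groups  = ["no school"] * 4 + ["high school"] * 4 + ["college"] * 4

result = pairwise_tukeyhsd(endog=incomes, groups=groups, alpha=0.05)
print(result)   # one row per pairwise comparison, with a reject=True/False flag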
For this example I selected two post-hoc tests that assume homogeneity of variances (Tukey HSD and Bonferroni) and two that do not (Tamhane and Games-Howell). From the Levene's statistic, I can say that the assumption of homogeneity of variances has not been met. Therefore, I need to look at the results of the Tamhane or Games-Howell post-hoc tests. Both these tests indicate that there are statistically significant differences between all the groups. I can now say that the ANOVA showed that significant differences exist. The hypothesis that I tested in this example was that significant differences exist in the income level of people who did not complete school, people who did complete school, and people with post-matric training. From the means plot I can see which of the groups has the highest income. (Means plot: mean of income before the program by level of education – the "Some college" group shows the highest mean income, followed by "High school degree", with "Did not complete high school" the lowest.)

8.6 The non-parametric alternatives for the One-way ANOVA
When the data does not meet the requirements/assumptions of the parametric one-way ANOVA, the Kruskal-Wallis H test, the Median test or the Jonckheere-Terpstra test can be used. For the purpose of this module, we will only look at the Kruskal-Wallis H Test. The Kruskal-Wallis H Test is an extension of the Mann-Whitney U test, and it is the more powerful and preferable non-parametric alternative to use. Basically it compares the medians of the samples/groups. Where the ANOVA uses the F-ratio, the Kruskal-Wallis uses the H statistic to assess whether differences exist. Data used for the Kruskal-Wallis should meet the following requirements:
• The groups must be independent
• More than 5 respondents per group (preferably 10)
• Sample sizes should be equal, or as equal as possible
• The distributions need not be normal and the variances need not be equal
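A minimal sketch of the Kruskal-Wallis H test in Python, on three small made-up groups of ordinal scores:

from scipy import stats

# Hypothetical ordinal scores for three independent groups
group_a = [2, 3, 3, 4, 2, 3]
group_b = [3, 4, 4, 5, 3, 4]
group_c = [4, 5, 4, 5, 5, 4]

h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}")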
In some situations you would not want to compare more than two groups on one independent variable alone; you would like to see if there are differences based on more than one variable. In other words: do Black, Asian and White South Africans differ in terms of geographical location, number of children, and number of people living within one household? ANOVAs in which more than one "factor" is tested for an effect can also be called factorial ANOVAs. When two independent variables are included we make use of the two-way ANOVA; when three independent variables are included we make use of the three-way ANOVA. See more at:
o http://davidmlane.com/hyperstat/A134930.html
o http://pluto.fss.buffalo.edu/classes/psy/segal/2072001/anova2/ANOVA2.html
o http://arts.uwaterloo.ca/~djbrown/psych391/Test2/Factorial-Variance1.pdf

When more than one dependent variable is included, the Multivariate Analysis of Variance or MANOVA is used. See:
o http://userwww.sfsu.edu/~efc/classes/biol710/manova/manovanew.htm
o http://www.utexas.edu/cc/docs/stat38.htm

Remember when to use a partial correlation? You want to keep the effect of a variable constant, to see what the relationship between two other variables is without its interference. Sometimes when we want to test differences with an ANOVA, we may likewise want to control for the effect of another variable. In such cases we use the Analysis of Covariance (ANCOVA). See also:
o http://www-users.cs.umn.edu/~ludford/Stat_Guide/ANCOVA.html

References:
Field, A. (2009). Discovering statistics using SPSS. London: SAGE Publications.
Huysamen, G.K. (1998). Descriptive statistics for the social and behavioral sciences. Pretoria: JL van Schaik Academic.
Pallant, J. (2003). SPSS survival manual. Maidenhead: Open University Press.