# Paired-samples test

Use this test as an alternative to the t-test for cases where the data can be paired, to reduce incidental variation - i.e. variation that you expect to be present but that is irrelevant to the hypothesis you want to test. As background, consider exactly what we do in a conventional t-test to compare two samples: we compare the size of the difference between two means in relation to the amount of inherent variability (the random error, not related to treatment differences) in the data. If the random error is large, we are unlikely to find a significant difference between means unless that difference is also very large. Consider the data in the table below, which shows the number of years' remission from symptoms (of cancer, AIDS, etc.) in two groups of patients: group A, who received a new drug, and group B, who received a placebo (the controls). There were 10 patients in each group, and we will first analyse the data by a conventional t-test (see Student's t-test if you are not familiar with this).
| Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---------|---|---|---|---|---|---|----|---|---|---|
| Drug    | 7 | 5 | 2 | 8 | 3 | 4 | 10 | 7 | 4 | 9 |
| Placebo | 4 | 3 | 1 | 6 | 2 | 4 | 9  | 5 | 3 | 8 |
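As a cross-check (not part of the original page), the conventional unpaired analysis described above can be sketched in a few lines of Python; the helper name `unpaired_t` is ours:

```python
import math

# Remission times (years) from the table above.
drug    = [7, 5, 2, 8, 3, 4, 10, 7, 4, 9]
placebo = [4, 3, 1, 6, 2, 4, 9, 5, 3, 8]

def unpaired_t(a, b):
    """Student's t for two independent samples, using a pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((x - ma) ** 2 for x in a)          # sum of squares, sample a
    ssb = sum((x - mb) ** 2 for x in b)          # sum of squares, sample b
    pooled = (ssa + ssb) / (na + nb - 2)         # pooled variance
    se = math.sqrt(pooled * (1 / na + 1 / nb))   # SE of the difference of means
    return (ma - mb) / se

t = unpaired_t(drug, placebo)
print(round(t, 2))  # ~1.2, far below any tabulated significance threshold
```

Run as an ordinary script, this reproduces the "no significant difference" verdict of the unpaired analysis discussed in the text.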

For the conventional analysis we need, for each group, the sum of values (Σx), the number of values (n), the mean, the sum of squares (Σx²) and the correction term (Σx)²/n:

|               | Drug  | Placebo |
|---------------|-------|---------|
| Σx            | 59    | 45      |
| n             | 10    | 10      |
| Mean          | 5.9   | 4.5     |
| Σx²           | 413   | 261     |
| (Σx)²/n       | 348.1 | 202.5   |
| Σx² − (Σx)²/n | 64.9  | 58.5    |

The pooled variance is (64.9 + 58.5)/18 = 6.86, so the standard error of the difference between the means is √[6.86 × (1/10 + 1/10)] = 1.17, and t = (5.9 − 4.5)/1.17 = 1.2. Clearly there is no significant difference between the means: even the smallest tabulated t value for a significant difference at p = 0.05 is 1.96 (and that applies only at infinite degrees of freedom).

But drug trials are never done as randomly as this. There is every reason to suspect that age, sex, social factors and so on could influence the course of a disease, and it would be foolish not to exclude this variation if the purpose of the trial is to see whether the drug actually has an overall effect. So the patients are matched as nearly as possible to exclude the effects of extraneous variation: for example, patient 1 in each group (drug or placebo) might be a Caucasian male aged 20-25, patient 2 in each group might be an Asian female aged 40-50, and so on. In other words, we are not dealing with random groups but with purposefully paired observations. The same approach could be used if, for example, we wanted to test the effects of a fungicide against a disease on 10 farms, or to test whether a range of different bacteria are sensitive to an antibiotic. Now we will analyse the data as paired samples.

Procedure (see worked example later)

1. Subtract each control value from the corresponding treatment value and call the difference z. (NB Always subtract in the same "direction", recording negative values where they occur.)
2. Calculate Σz, Σz² and (Σz)²/n, where n is the number of pairs (z values).
3. Construct a null hypothesis. In this case it would be appropriate to "expect" no difference between the groups (drug treatment versus controls). If this were true then the observed values of z would have a mean close to zero, with variation about this mean.

4. Calculate the sum of squares of the differences: Σz² − (Σz)²/n.
5. Calculate the variance of the differences by dividing by n − 1.
6. Calculate the standard error of the mean difference: σd̄ = √(variance/n).
7. Find t from the equation t = d̄ / σd̄, where d̄ = Σz/n is the mean difference.
8. Consult a t table at n − 1 degrees of freedom, where n is the number of pairs (number of z values).

Worked example:

| Patient        | 1 | 2 | 3 | 4 | 5 | 6 | 7  | 8 | 9 | 10 |
|----------------|---|---|---|---|---|---|----|---|---|----|
| Drug           | 7 | 5 | 2 | 8 | 3 | 4 | 10 | 7 | 4 | 9  |
| Placebo        | 4 | 3 | 1 | 6 | 2 | 4 | 9  | 5 | 3 | 8  |
| Difference (z) | 3 | 2 | 1 | 2 | 1 | 0 | 1  | 2 | 1 | 1  |

Here Σz = 14 and n = 10, so the mean difference d̄ = 1.4. Σz² = 26 and (Σz)²/n = 19.6, so the sum of squares is 26 − 19.6 = 6.4 and the variance is 6.4/9 = 0.711. The standard error of the mean difference is √(0.711/10) = 0.267, and:

t = 1.4 / 0.267 = 5.24

The tabulated t value for 9 degrees of freedom is 2.26 (p = 0.05), so our result is significant; in fact it exceeds the tabulated t value for a probability (p) of 0.001, which is very highly significant - we would expect such a result to occur by chance only once in a thousand times. So the drug is effective: it gives remission of symptoms for 1.4 ± 0.6 years (mean ± t × σd̄; the confidence limits are 1.4 ± 2.26 × 0.267, where 0.267 is the standard error of the mean difference).

It is instructive to consider what we have done in this analysis. We calculated the mean difference between the pairs of patients (treatments), calculated the standard error of this mean difference, and tested the mean difference to see whether it is significantly different from zero (no difference).
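The eight-step procedure above translates directly into code. The following sketch (ours, not from the original page) carries the worked example through steps 1-7:

```python
import math

drug    = [7, 5, 2, 8, 3, 4, 10, 7, 4, 9]
placebo = [4, 3, 1, 6, 2, 4, 9, 5, 3, 8]

# Step 1: differences z, always subtracting in the same direction.
z = [d - p for d, p in zip(drug, placebo)]
n = len(z)

# Step 2: the three sums used by the procedure.
sum_z  = sum(z)                   # Σz  = 14
sum_z2 = sum(x * x for x in z)    # Σz² = 26
corr   = sum_z ** 2 / n           # (Σz)²/n = 19.6

# Steps 4-6: variance and standard error of the mean difference.
var   = (sum_z2 - corr) / (n - 1)   # 0.711
se    = math.sqrt(var / n)          # 0.267
d_bar = sum_z / n                   # 1.4

# Step 7: t, to be compared with the tabulated value at n - 1 = 9 df.
t = d_bar / se
print(sum_z, sum_z2, round(se, 3), round(t, 2))  # 14 26 0.267 5.25
```

Note that carrying full precision gives t = 5.25 rather than 5.24, matching the Excel print-out discussed below.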

Paired-samples t-test: print-out from "Excel". The example above was run in "Excel": enter the data as before (see Student's t-test), but select t-Test: Paired Two Sample for Means from the analysis tools package, select the whole data set (cells B2-C11) for the Input variable range and a clear cell for the Output range, then click OK.

| Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---------|---|---|---|---|---|---|----|---|---|---|
| Drug    | 7 | 5 | 2 | 8 | 3 | 4 | 10 | 7 | 4 | 9 |
| Placebo | 4 | 3 | 1 | 6 | 2 | 4 | 9  | 5 | 3 | 8 |

t-Test: Paired Two Sample for Means

|                              | Variable 1 | Variable 2 |
|------------------------------|------------|------------|
| Mean                         | 5.9        | 4.5        |
| Variance                     | 7.211111   | 6.5        |
| Observations                 | 10         | 10         |
| Pearson Correlation          | 0.949414   |            |
| Hypothesized Mean Difference | 0          |            |
| df                           | 9          |            |
| t Stat                       | 5.25       |            |
| P(T<=t) one-tail             | 0.000264   |            |
| t Critical one-tail          | 1.833114   |            |
| P(T<=t) two-tail             | 0.000528   |            |
| t Critical two-tail          | 2.262159   |            |

[The calculated t (5.25) differs slightly from the worked example (5.24) because the computer did not round the decimals during the calculations.]

Note that the two-tailed test shows the drug and placebo to be significantly different at p = 0.000528 - a probability of roughly 5 in 10,000 that we would get this result by chance alone. But in this case we would be justified in using a one-tailed test (p = 0.000264), because we are testing whether the mean difference (1.4 years) is significantly greater than zero - above zero - not merely different from it. Because we test for a difference in one direction only, we can double the normal probability of 0.05, so the critical t for a one-tailed test is the value given for p = 0.1 in a t-table. Look at the 't Critical one-tail' entry in the print-out (1.833): we should have used this value, not the value for p = 0.05, in our testing for significance. See Student's t-test for an explanation of the other relevant entries in the print-out.

Pairwise Input: T-test | Wilcoxon | McNemar

Pairwise tests concern the comparison of the same group of individuals, or matched pairs, measured twice, usually before and after an 'intervention'. Using this methodology the respondents, or their matched 'partners', function as their own control, lowering the level of unexplained variance or 'error'. The three procedures take the following input:

- T-test: the number of cases, the mean difference, and the standard deviation of the difference (or, alternatively, the number of cases and the t-value of the difference).
- Wilcoxon Ranks Test: the number of cases and the sum of the negative ranks of the differences.
- McNemar test: the two integer numbers of changers from the diagonal of a two-by-two table - the number of positive changers and the number of negative changers - in the top two boxes.

Matched or pairwise data can be presented as in the following table:

|       | Before | After | D(ifference) | Sq(D − Mean) | Ranked-D |
|-------|--------|-------|--------------|--------------|----------|
| John  | 18     | 28    | 10           | 17.64        | 8        |
| Steve | 37     | 34    | −3           | 77.44        | −2       |
| Liz   | 12     | 17    | 5            | 0.64         | 4.5      |
| Mary  | 42     | 40    | −2           | 60.84        | −1       |
| Paul  | 7      | 20    | 13           | 51.84        | 9.5      |
| Joy   | 31     | 35    | 4            | 3.24         | 3        |
| Mike  | 59     | 66    | 7            | 1.44         | 7        |
| Nick  | 21     | 27    | 6            | 0.04         | 6        |
| Linda | 8      | 21    | 13           | 51.84        | 9.5      |
| Peter | 56     | 61    | 5            | 0.64         | 4.5      |
| Total | 291    | 349   | 58           | 265.6        | Signed = 3 |

(Mean difference = 5.8; sd of the difference = 5.43. For the separate columns: mean1 = 29.1, sd1 = 19.05; mean2 = 34.9, sd2 = 16.74.)

The first three columns show the data; the following columns contain a number of calculations which we will now discuss. As the data show, the 10 respondents improved 5.8 'points' on average after the intervention (349/10 = 34.9 against 291/10 = 29.1).

One possible way to test whether the difference between the respondents before and after the intervention is statistically significant is to apply the procedure t-test for two independent samples as implemented in SISA. Giving the values above {mean1 = 29.1, sd1 = 19.05, n1 = 10; mean2 = 34.9, sd2 = 16.74, n2 = 10}, the t-test procedure produces a t-value of 0.72, with an associated single-sided ('tailed') p-value of 0.2376: the intervention does not produce statistically significant improvements in the respondents' scores. However, doing it this way does not do justice to the data. As it concerns paired observations, much of the difference within the two sets of data which the usual t-test (for two independent samples) considers might already have been taken out. In that case the t-test is too conservative: differences between the two sets of paired observations are not declared statistically significant as quickly as they should be.

T-test for paired observations (also known as the t-test for two correlated samples). This t-test tests whether the mean of the changes between the two measurements differs statistically significantly from zero. How the standard deviation of the difference is calculated is shown in the fifth column of the table above: take the sum of the squares of the differences between each observation's change and the mean change, divide this sum by the number of cases minus one, and take the square root. In the case of the example: √(265.6/(10 − 1)) = 5.43. One can also use a calculator with statistical functions to calculate the standard deviation - take the sample standard deviation, the one with the "s" symbol, not the one with the "sigma" (σ) symbol. The standard deviation is only used for the t-test procedure.
Enter the scores as follows. For the t-test procedure you have to give the number of cases - an integer number, which is 10 in the case of the example above - in the top box. In the second box the mean difference is given (5.8 in the above example). The standard deviation of the difference must be given as a positive number, with or without decimals, in the third box (here 5.43).
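The summary statistics SISA asks for, and the resulting t-value, can be reproduced from the difference column with a short sketch (ours, not part of the SISA documentation):

```python
import math

# After - Before differences for the 10 respondents in the table above.
d = [10, -3, 5, -2, 13, 4, 7, 6, 13, 5]
n = len(d)

mean_d = sum(d) / n                                            # 5.8
sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))  # sample sd, 5.43

# t against the nil hypothesis of zero change:
t = (mean_d - 0) / (sd_d / math.sqrt(n))
print(round(mean_d, 1), round(sd_d, 2), round(t, 2))  # 5.8 5.43 3.38
```

With full precision this gives t ≈ 3.38, in line with the 3.378 quoted below (the small difference comes from rounding the standard deviation to 5.43).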

The t-value is calculated like this: (mean − nil hypothesis)/(standard deviation/√n) = (5.8 − 0)/(5.43/√10) = 3.378. The nil hypothesis in this case equals zero (0): we expect no, or zero, change, given the standard error of the measurement (the standard deviation divided by √n is commonly referred to as the standard error). The program gives you the p-value of 0.00407, so the improvement is statistically significant at the 0.05 level.

Alternatively to giving a mean and a standard deviation for the t-test, one can give the t-value of the difference and leave the value zero (0) in the standard deviation box; in that case no standard deviation is required.

The t-test is critical with regard to the characteristics of the data. It is considered that the differences between the two observations are at the interval level: someone who improved four points improved twice as much as someone who improved two points. It is further considered that the differences are normally distributed. If, for example, there is a ceiling on the maximum size of the differences, a number of smaller differences are counterbalanced by one very large difference, or the differences are all of more or less the same size, the t-test would not be valid.

The following two tests are presented for cases in which the assumptions of interval level and normal distribution are not met.

Wilcoxon Matched Pairs Signed Ranks Test. This test considers that the data are at an ordinal-metric level: that the original data can be validly ordered, that the data after the intervention can be ordered, and that the differences between the two sets of data can be validly ordered. The assumption of a normal distribution does not have to be met; this assumption is slightly less critical than the interval-level assumption necessary for the t-test. The test is particularly practical if the maximum change is somehow limited. A positive aspect of the Wilcoxon test is that it is a very powerful test: if all the assumptions for the t-test are met, the Wilcoxon has about 95% of the power of the t-test.

The Wilcoxon test requires as input the number of changed cases, as an integer value, in the top box, and the sum of the negative ranks, as a positive number, in the second box. (An individual whose difference between the first and second measurement is zero is excluded from the analysis: cases which did not change are not present in the data, the table, or the analysis.)

The sum of the negative ranks is calculated in the sixth column of the table above, which shows the rank numbers of the differences in the fourth column. This might not be immediately obvious, so we repeat the calculation here. In column four we have the following differences: 10, −3, 5, −2, 13, 4, 7, 6, 13, 5, which reordered from small to large (by absolute size) give: −2, −3, 4, 5, 5, 6, 7, 10, 13, 13. As we have 10 observations, these should be apportioned the rank numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. In the case of ties (two or more similar values) we average the rank numbers; we have to do this for the values 5 and 13, and doing so results in the following ranking: 1, 2, 3, 4.5, 4.5, 6, 7, 8, 9.5, 9.5. Two of the rank numbers refer to negative values - the differences −2 and −3, with rank numbers 1 and 2 respectively - and the sum of these values equals 3, the sum of the negative ranks.

Giving this value in the appropriate box shows that the expected value of the sum of signed ranks equals 27.5, with a standard deviation of 9.81. The associated z value of the difference between the observed and expected sum of signed ranks equals 2.497, with a p-value of 0.00625. According to the Wilcoxon test, the difference between the respondents' scores before and after the intervention is statistically significant.
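The ranking just described, including the tie-averaging, can be checked with a small sketch (ours, using only the standard library):

```python
import math

# After - Before differences from the table; zero changes would be dropped.
d = [10, -3, 5, -2, 13, 4, 7, 6, 13, 5]
d = [x for x in d if x != 0]
n = len(d)

# Rank the absolute differences, averaging the ranks of tied values.
order = sorted(range(n), key=lambda i: abs(d[i]))
ranks = [0.0] * n
i = 0
while i < n:
    j = i
    while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
        j += 1
    avg = (i + j) / 2 + 1           # average 1-based rank for the tied block
    for k in range(i, j + 1):
        ranks[order[k]] = avg
    i = j + 1

w_neg = sum(r for r, x in zip(ranks, d) if x < 0)  # sum of negative ranks
expected = n * (n + 1) / 4                         # 27.5
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)     # 9.81
z = (expected - w_neg) / sd
print(w_neg, expected, round(sd, 2), round(z, 3))  # 3.0 27.5 9.81 2.497
```

This reproduces the sum of negative ranks (3), the expected value (27.5), its standard deviation (9.81), and the z value of 2.497 quoted in the text.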

McNemar Change Test. This test studies the change in a group of respondents measured twice on a dichotomous variable. The following example from Siegel illustrates how the test works.

Preference for candidate before and after a TV debate:

| Preference before debate | After: Reagan | After: Carter | Total |
|--------------------------|---------------|---------------|-------|
| Reagan                   | 27            | 7             | 34    |
| Carter                   | 13            | 28            | 41    |
| Total                    | 40            | 35            | 75    |

In this table one group of voters is asked twice about their voting intention, before and after a television debate. Before the TV debate Carter had the support of 41 of the 75 voters; after the debate this decreased to 35, so the net difference between the voters who changed in favour of Carter and those who changed in favour of Reagan is a loss of 6 voters for Carter. 13 respondents changed their preference from Carter to Reagan, while 7 respondents changed their preference from Reagan to Carter; the question is whether there is a statistically significant difference between the 7 and the 13. This presumes that the likelihood of changing from Carter to Reagan, or from Reagan to Carter, is independent of being a Reagan or a Carter supporter in the first place.

Fill in the two change values (13 and 7) in the top two boxes (leave a zero in the bottom box), click the McNemar button, and the program provides you with the answers. With Yates's continuity correction the chi-square equals (|13 − 7| − 1)²/20 = 1.25, with a probability value of about 0.26: the two candidates were not statistically significantly different in changing the voters' preference.

The chi-square is a double-sided test - there is, mostly, no prior expectation with regard to direction. The program also gives you a Binomial alternative to the McNemar test: the probability of getting this many or more changers in one direction out of the total number of changers, while the expectation was 50% one way and 50% the other. Use the Binomial if the number of cases is small or if you have a cell with fewer than 5 observations. Note that the Binomial is a single-sided test: use it if you want to test whether the change goes in a particular direction and whether it is statistically significant in that regard; double the p-value of the Binomial test to get a double-sided test. The doubled Binomial p-value will be very close to the p-value of the Yates chi-square.

One of the problems with the McNemar is that it requires knowledge of the inside of the table, while what is interesting is mostly not what happens inside the table but what happens in the marginals. Machin and Campbell propose, in the case that the inside of the table is not known, to estimate the inside of the table from the marginals. SISA allows you to do this less preferred way of doing the McNemar also.
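Both the Yates-corrected chi-square and the exact Binomial alternative for the debate example can be verified with a short sketch (ours; the values below follow from the 13 and 7 changers, not from SISA's print-out):

```python
import math

# Changers from the Siegel TV-debate table: 13 Carter->Reagan, 7 Reagan->Carter.
b, c = 13, 7
n = b + c

# McNemar chi-square with Yates's continuity correction (1 df).
chi2_yates = (abs(b - c) - 1) ** 2 / n            # 1.25

# Exact binomial alternative: P(X >= 13) with X ~ Binomial(20, 0.5).
p_one_sided = sum(math.comb(n, k) for k in range(b, n + 1)) / 2 ** n
p_two_sided = 2 * p_one_sided                      # compare with the chi-square p
print(round(chi2_yates, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```

The doubled Binomial p (about 0.26) is indeed very close to the p-value of the Yates chi-square, as the text states.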

For the marginals version, fill in the two numbers of cases, one for each marginal, in the top two boxes, and the total number of cases in the bottom box, then click the McNemar button.

The McNemar can also be used as an alternative for the signs test, for example to see whether after some treatment more diseased people got cured (+1) than healthy people became diseased (−1). Place the number of positive changers in the top N+ box and the number of negative changers in the second N− box; disregard the number of non-changers; then click the McNemar button. In the top table above we would have 7 positive and 3 negative changers; the McNemar test gives a chi-square for equal numbers of changers in both groups of 1.6, with a probability of 0.2059. The positive direction of the change might well have been caused by chance fluctuation. (Agresti discusses this same topic in his example 10.1.) The signs test, including this McNemar version, has the disadvantage that it is not a very powerful test. However, it is also a test with few assumptions and does not require that the data have particular characteristics: it can be used on interval, ordinal and nominal data.

A further worked example of the paired-difference t-test. A farmer decides to try out a new fertilizer on a test plot containing 10 stalks of corn. Before applying the fertilizer, he measures the height of each stalk. Two weeks later, he measures the stalks again, being careful to match each stalk's new height to its previous one. The stalks would have grown an average of 6 inches during that time even without the fertilizer. Did the fertilizer help? Use a significance level of 0.05.

null hypothesis: H0: μ = 6
alternative hypothesis: Ha: μ > 6

Hypothesis test formula: t = (x̄ − Δ) / (s / √n), where x̄ is the mean of the change scores, Δ is the hypothesized difference (0 if testing for equal means), s is the sample standard deviation of the differences, and n is the sample size. The number of degrees of freedom for the problem is n − 1.

| Stalk         | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   |
|---------------|------|------|------|------|------|------|------|------|------|------|
| Before height | 35.5 | 31.7 | 31.2 | 36.3 | 22.8 | 28.0 | 24.6 | 26.1 | 34.5 | 27.7 |
| After height  | 45.3 | 36.0 | 38.6 | 44.7 | 31.4 | 33.5 | 28.8 | 35.8 | 42.9 | 35.0 |
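The formula above can be applied to the stalk data directly; this sketch (ours) carries the calculation through at full precision:

```python
import math

before = [35.5, 31.7, 31.2, 36.3, 22.8, 28.0, 24.6, 26.1, 34.5, 27.7]
after  = [45.3, 36.0, 38.6, 44.7, 31.4, 33.5, 28.8, 35.8, 42.9, 35.0]

change = [a - b for a, b in zip(after, before)]   # per-stalk growth (inches)
n = len(change)
mean_c = sum(change) / n                          # x-bar = 7.36
s = math.sqrt(sum((c - mean_c) ** 2 for c in change) / (n - 1))  # sample sd

delta = 6.0                                       # hypothesized normal growth
t = (mean_c - delta) / (s / math.sqrt(n))
print(round(mean_c, 2), round(t, 2))
```

At full precision t ≈ 2.09; the 2.098 quoted in the text comes from rounding the standard deviation to 2.05 before dividing. Either way t exceeds the one-tailed critical value of 1.833 at 9 degrees of freedom, so the conclusion is unchanged.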

Subtract each stalk's "before" height from its "after" height to get the change score for each stalk, then compute the mean and standard deviation of the change scores and insert these into the formula. The computed t-value is 2.098, and the problem has n − 1 = 10 − 1 = 9 degrees of freedom. The test is one-tailed because you are asking only whether the fertilizer increases growth, not whether it reduces it. The critical value from the t-table for t.05 at 9 degrees of freedom is 1.833. Because the computed t-value of 2.098 is larger than 1.833, the null hypothesis can be rejected: the test has provided evidence that the fertilizer caused the corn to grow more than if it had not been fertilized. The amount of actual increase was not large (1.36 inches over normal growth), but it was statistically significant.