INTRODUCTION TO SIMPLE STATISTICS
THE SQUIRREL, THE NUT AND THE NONPARAMETRIC SOLUTION

Version 1.1 (August 1993)
Carlos Drews
University of Cambridge
Dept. of Zoology, Downing Street
Cambridge CB2 3EJ, U.K.

Aim of the seminar: The aim of the seminar is to provide an overview of basic concepts in statistics and to familiarize the audience with the value and applications of nonparametric statistical procedures. The presentation is restricted to the essential facts and leaves out formulae and the mathematical derivations of the central notions. The seminar will unavoidably suffer from oversimplification. Nonetheless, it should encourage the audience to start reading and learning about statistics, which can be a fascinating subject in its own right as well as a powerful and necessary tool in research.

Recommended reading:
Siegel, S. & N.J. Castellan, Jr. (1988) Nonparametric statistics, 2nd edition. McGraw-Hill, Singapore.
Huff, D. (1954) How to lie with statistics. Penguin Books, London.

PART I. BASIC CONCEPTS

1 Why statistics?
Chance is a major determinant of the outcome of a sampling procedure. Statistical analysis of data quantifies the possible effect of chance. It enables an objective distinction between what could be attributed to chance and a "true" effect, such as a difference, a similarity or an association. Descriptive statistics, which describe e.g. the central tendency or the spread of a data set, characterize a sample in a way that enables comparisons with other samples or with the values expected under specific theoretical premises.

2 Characterizing distributions
The distribution of data is described by how often each value occurs within the possible range of values of a scale.

Example 1: Distribution of nut sizes from a tropical tree grown in a botanical garden in London and in a Brazilian forest patch.
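The idea that a distribution describes how often each value occurs can be sketched as a frequency table in Python. The nut sizes below are hypothetical values made up for illustration, not measurements from the seminar. The helper name `frequency_table` is also our own and does not come from any library.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count) pairs sorted by value: the empirical distribution."""
    counts = Counter(values)
    return sorted(counts.items())

# Hypothetical nut sizes in mm (invented for illustration only).
garden_nuts = [21, 22, 22, 23, 23, 23, 24, 24, 25]

for size, count in frequency_table(garden_nuts):
    # A crude text histogram: one '*' per nut of this size.
    print(f"{size} mm | {'*' * count}")
```

Reading the bars from top to bottom gives the shape of the distribution. For these invented values it is symmetric, with a single peak at 23 mm.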
C.Drews INTRODUCTION TO STATISTICS

2.1 Various distributions
(Note: Figures 1-7 are to be drawn on the board.)
Figure 1. Nut size in the botanical garden (normal)
Figure 2. Nut size in Brazil, where squirrels are common (left skew, right skew, bimodal, even, I Don’t Know)
Figure 3. Nut sizes of two tree species in the botanical garden (normal distributions with different variance and central tendency)

2.2 Describing distributions in statistical terms
- "Average" is a neutral term for the central tendency. The specific statistic used depends strictly on the distribution.
Figure 4. The normal distribution, the mean and the standard deviation
Figure 5. Other measures of central tendency and spread: the mode, the median and the interquartile range

3 Levels of measurement

3.1 Nominal or categorical
E.g. the kinds of seeds eaten by the squirrel (even if numeric codes are used for the seed kinds), or the sex of the squirrel. Values cannot be ordered or ranked.

3.2 Ordinal or ranking
There is a scale underlying the data, so values can be ranked.
Continuous scale: weather conditions during foraging
Discontinuous scale: age classes infant, juvenile, adult
Use the median (or mode) to describe central tendency, because it is insensitive to changes in the numerical values assigned to the scores. Under any reassignment that preserves the order, the median always remains in the middle of the distribution, while the mean does not.

3.3 Interval and ratio scales
E.g. temperature in the squirrel's roost, foraging height of the squirrel on trees in metres, weight of the squirrel.
The distances and differences between any two numbers on the scale have meaning. On an interval scale, the ratio of any two intervals is independent of the unit used and of the zero point, both of which are arbitrary. If a true zero point exists, then it is a ratio scale.

4 Parametric vs. Nonparametric (distribution-free) Statistics
Conditions to apply parametric tests:
- normal distributions
- equal variances
- at least interval level of measurement

Advantages and disadvantages of nonparametric statistics:
Advantages:
- deal with nominal and ordinal levels of measurement
- the only alternative when sample sizes are small
- no assumptions about the shape of the distribution
Disadvantages:
- slightly less powerful than parametric statistics
- some limitations in multivariate procedures

If interval data and large samples are available, then check for deviation from normality. You can then use a transformation to normalize the distribution, or use a (conservative) nonparametric test.

5 Conclusions from statistical tests: the rejection region and p
p is the probability that a value equal to or more extreme than the observed one occurs purely by chance, i.e. if the null hypothesis were true.
Figure 6. Rejection region in a normal distribution
- p<0.05 is a convention, and it remains a statistical probability.
- Type I error: the null hypothesis is rejected when it is, in fact, true (an effect is reported when there isn't one). The smaller p is, the less likely this error.
- Type II error: failure to reject the null hypothesis when, in fact, it was false (the true effect was not detected). A large, representative sample makes this error less likely.
- One-tailed hypothesis: used if the direction of the effect is predicted from a sound theoretical argument or from previous results of similar analyses.
- Two-tailed hypothesis: used when the direction of the effect is not predicted in advance (the test is more conservative).
- If p<0.05, then the result is significant regardless of the sample size.
- Make sure that the sample is representative of the population to which your generalization should apply.
- Report p for non-significant results as well.

6 Pseudoreplication or the "Pooling fallacy"
For all statistical tests, whether parametric or not, data points have to be independent.
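The definition of p, and the difference between one-tailed and two-tailed hypotheses, can be made concrete with an exact permutation test. This is a distribution-free procedure chosen here for illustration; it is not a test the seminar itself names. The logic is as follows. If the group labels were irrelevant (the null hypothesis), every way of reassigning the pooled values to two groups of the original sizes would be equally likely. p is then the fraction of reassignments that give a difference at least as extreme as the observed one.

This is a minimal sketch with several assumptions:
- The test statistic is the difference in means; other choices, such as rank sums, are possible.
- The data are integers, so that `Fraction` keeps the arithmetic exact.
- The function names and the squirrel data are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def mean_difference(x, y):
    """Exact difference in means, assuming integer (or Fraction) data."""
    return Fraction(sum(x), len(x)) - Fraction(sum(y), len(y))

def permutation_p(a, b, tails=2):
    """Exact permutation p-value for the difference in means of a and b.

    tails=1: one-tailed, the predicted direction is mean(a) > mean(b).
    tails=2: two-tailed, extreme differences in either direction count.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    pooled = list(a) + list(b)
    observed = mean_difference(a, b)
    extreme = total = 0
    # Under the null hypothesis each choice of len(a) pooled data points as
    # "group a" is equally likely; this requires independent data points.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [v for i, v in enumerate(pooled) if i not in chosen]
        diff = mean_difference(group_a, group_b)
        if tails == 1:
            extreme += diff >= observed
        else:
            extreme += abs(diff) >= abs(observed)
        total += 1
    return Fraction(extreme, total)

# Hypothetical nuts eaten per hour by three squirrels at each of two sites.
site_a = [6, 7, 8]
site_b = [1, 2, 3]
print(permutation_p(site_a, site_b, tails=1))  # one-tailed
print(permutation_p(site_a, site_b, tails=2))  # two-tailed
```

With only three squirrels per site, the most extreme split possible gives a one-tailed p of 1/20 and a two-tailed p of 1/10. Neither falls below 0.05. The enumeration also relies on the independence requirement above: if several values came from the same squirrel, the reassignments would no longer be equally likely.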