Contents
An Overview of Statistics in Education
Analysis and Interpretation of Multivariate Data
Analysis of Covariance
Analysis of Extreme Values in Education
Analysis of Variance
Bayesian Statistical Analysis
Bootstrap Method
Canonical Correlation
Categorical Data Analysis
Causal Inference
Cluster Analysis: Overview
Cognitive Psychology and Educational Statistics
Computational Statistics
Continuous Probability Distributions
Correspondence Analysis
Data Mining
Decision Theory
Design of Experiments
Discrete Probability Distributions
Discrimination and Classification
Empirical Bayes Methods
Evaluation Research
Exploratory Data Analysis
Factor Analysis: An Overview and Some Contemporary Advances
Generalized Linear Mixed Models
Generalized Linear Models
Generating Random Numbers
Goodness-of-Fit Testing
Graphical Models
Growth Modeling
Hierarchical Linear Models
Hypothesis Testing and Confidence Intervals
Instrumental Variables
Jackknife Methods
Large-sample Statistical Methods
Latent Class Models
Markov Chain Monte Carlo
Matrix Algebra
Measure of Association
Measures of Central Tendency
Measures of Dispersion, Skewness and Kurtosis
Meta Analysis
Missing Data
Model Selection
Monte Carlo Methods
Multidimensional Scaling
Multiple Comparisons
Multivariate Analysis of Variance
Multivariate Linear Regression
2 Statistics
Introduction

This article intends to provide an overview of the application of statistics to the field of education. Statistics is a vast field, with new topics such as proteomics, ensemble sampling, and statistics in ophthalmology being introduced every now and then. The Statistics section of the encyclopedia did not attempt to provide a summary of all possible topics under the subject. Instead, this section focuses on the topics in statistics that have found applications in the field of education. Most applications of statistics to education are found in the area of educational measurement for the simple reason that statistics, the science that deals with quantitative analysis of data, is inherently related to measurement. (Several such applications, such as automated scoring, differential item functioning, generalizability theory, and item response theory (IRT), are covered in the educational measurement section of the encyclopedia and will not be repeated here.) Hence, one could argue that some of the topics included in the statistics section, or that several examples included in the articles in this section, deal mostly with educational measurement and should have belonged in the Educational measurement section of this encyclopedia. In addition, some of the topics may overlap to a small extent with other topics in this section (e.g., generalized linear models are tools in categorical data analysis, but this section has two separate articles on the two topics). In any case, the number of applications of statistics to education is on the rise. This is because of (1) increases in computing power that have led people to ask questions that could not have been answered 20 years ago in a timely manner, and (2) increases in the number of educational tests, partially due to the No Child Left Behind (NCLB) Act of 2001 in the USA that requires annual testing in the schools and produces a lot of data. Practitioners in education should find this section, together with the Educational measurement section, helpful as these two provide a comprehensive overview of the statistical methods used in education.
An Overview of Statistics in Education 3
After data are collected, often the first step in analyzing the data is exploratory data analysis (EDA), which consists of looking at data to see what they seem to say (Tukey, 1977) while relying on simple arithmetic and easy-to-draw pictures or plots. The techniques used in EDA include the following:

• Plotting the data in bar charts, pie charts, histograms, Youden plots, etc.,
• Plotting simple statistics in plots such as mean plots, standard deviation (SD) plots, box plots, etc., and
• Positioning such plots so as to extract the maximum information possible from them.

Consider Figure 1, which shows responses of 325 examinees to 15 items regarding mixed-number subtraction (Tatsuoka, 1984). An example item is 4 5/7 − 1 4/7. The items, sorted according to decreasing proportion correct (i.e., increasing difficulty), are shown along the x-axis in the figure; the examinees, sorted according to increasing raw scores, are shown along the y-axis. A short black horizontal line for an examinee and an item indicates a correct response. Some patterns are immediately visible from the figure. For example, several examinees (24 out of 325) at the top answer all items correctly (clear from the top of the plot being completely black). The examinees with the lowest scores could answer only two items correctly. Interestingly, these items (3/4 − 3/4 and 3 7/8 − 2) are not the two easiest items and can be solved without any knowledge of mixed-number subtraction. Further, the lower half of the examinees rarely answered any difficult items correctly. Wainer (2000, 2005) provided several applications of EDA to education.

Figure 1 A plot of the responses of 325 examinees to 15 mixed-number subtraction items (x-axis: Items; y-axis: Examinees).

Simple Summary Measures

It is often important to examine simple summary measures such as the mean and standard deviation (SD) of numerical information. For example, the Digest of Education Statistics, an annual publication of the National Center for Education Statistics (NCES) in the United States of America, includes the number of schools and colleges, teachers, enrolments, and graduates, in addition to other information on education. Further, the 2007 Digest reports that the average salary for teachers in 2005–06 was US$49,109, about 1% higher than in 1995–96, after adjustment for inflation, and that the average reading scores at ages 9 and 13 were higher in 2004 than in 1971 and the average score for 17-year-olds in 2004 was similar to that in 1971. Measures of central tendency and measures of dispersion, skewness, and kurtosis are discussed in the statistics section of the encyclopedia.

Measures of Association

In education, it is often of interest to examine the amount of association between a group of variables. For example, test administrators administering several tests simultaneously to students would prefer that scores on different tests (e.g., in reading, writing, mathematics, and science) do not correlate highly with each other. High correlations between such scores may raise questions about the need for so many tests. Choice of the appropriate measures of association depends on whether the variables of interest are continuous, discrete ordinal, or discrete nominal. For example, if the scores on the abovementioned tests are given on a scale of 1 to 100 in 1-point increments, a correlation coefficient may be the appropriate measure of association. On the other hand, if scores are given as 0 or 1 (where a score of 1 means that a student is good in a subject and 0 otherwise), an odds ratio or Kendall's tau (see, e.g., Agresti, 2002) may be the appropriate measure. It is often of interest to examine the association between two groups of variables, $X = (X_1, X_2, \ldots, X_k)$ and $Y = (Y_1, Y_2, \ldots, Y_l)$. Canonical correlation analysis attempts to find linear combinations of the two groups with high correlations.
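
The two measures of association mentioned for test scores can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis: the function names `pearson_r` and `odds_ratio` and all score values are assumptions invented for the example, and the odds ratio is the plain sample cross-product ratio with no continuity correction.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def odds_ratio(u, v):
    """Sample odds ratio from the 2x2 table of two 0/1 sequences.

    Raises ZeroDivisionError if either off-diagonal cell is empty.
    """
    pairs = list(zip(u, v))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    return (n11 * n00) / (n10 * n01)


# Hypothetical scores on a 1-100 scale (invented for illustration only).
reading = [62, 75, 80, 55, 91, 68, 73, 84]
mathematics = [58, 70, 85, 50, 88, 72, 65, 90]
r = pearson_r(reading, mathematics)

# Hypothetical 0/1 scores (1 = student is good in the subject).
good_reading = [1, 1, 1, 0, 0, 0, 1, 0]
good_math = [1, 1, 0, 0, 0, 1, 1, 0]
oratio = odds_ratio(good_reading, good_math)

print(f"correlation = {r:.3f}, odds ratio = {oratio:.2f}")
```

An odds ratio well above 1, like a correlation near 1, would indicate that the two tests largely rank students the same way.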
ANCOVA may be used to compare the average SAT critical reading scores of several schools where the Preliminary Scholastic Aptitude Test/National Merit Scholarship Qualifying Test (PSAT/NMSQT) critical reading score of each examinee is available in addition to the SAT critical reading score. (The PSAT/NMSQT is supposed to provide firsthand practice for the SAT.)

Multivariate analysis of variance (MANOVA) is used to compare means of several variables simultaneously across several groups of individuals. For example, one could apply MANOVA to simultaneously compare the average scores on several subjects across several schools. Longford (1990) provides such an example.

Design of Experiments

An experiment is a test in which purposeful changes are made to the input variables of a process or system so that one may observe and identify the reasons for changes that may be observed in the output response. Design of experiments is the science of planning and conducting experiments and analyzing the resulting data so that valid and objective conclusions can be drawn. For example, the education ministry of a country may be interested in conducting an experiment to find out if a particular style of teaching mathematics helps fourth-grade children learn the subject better than the existing style. In designing an experiment, the ministry has to make sure that any difference that they might observe in the outcome for students who were taught using the new style and those who were taught using the existing style cannot be attributed to a factor other than the teaching style (e.g., if they assign all students from rural areas to the new style and all students from urban areas to the existing style, then a difference can be attributed to the rural vs. urban difference). There are three basic principles in design of experiments: randomization (which means that the assignment of the experimental material and the order in which the individuals receive the experimental material are randomly determined), replication (which refers to repeats of each experimental condition), and blocking (which is the grouping of individuals to create several homogeneous groups before assigning the experimental material).

Observational Studies

In some situations, it is not possible (for reasons such as budget constraints and ethical issues) to design an experiment to answer a question or to test a hypothesis. For example, suppose that the interest is in finding whether watching television affects the class grades of students. As watching television may have adverse effects, it would be unethical to design an experiment and randomly assign students to watch television for different numbers of hours per day. In such situations, often the only way is to conduct an observational study, that is, to collect data on a sample of individuals, and apply an appropriate method to draw conclusions. In the example above, one could record the number of hours of television watched by the students and examine its association with their grades. Only a limited number of conclusions can be drawn from an observational study. Any observed difference or association has several reasonable alternative explanations. For example, an observed lower score of students watching television longer can be caused by such students having less appropriate home and school inputs, such as fewer books at home or parents who read less to them. Huang and Lee (2009) investigated whether television watching at ages 6–7 and 8–9 affects cognitive development measured by math and reading scores at age 8–9 using a rich childhood longitudinal sample.

Causal Inference and Instrumental Variables

Suppose that an investigator is interested in testing a hypothesis, for example, about the comparison of a new educational program versus the existing program. Whether the investigator performs a randomized experiment or an observational study, the investigator faces the question of how to draw inferences about the causal effects of the new program. In other words, if there is a performance difference between the students who were administered the new educational program and those who were administered the existing program, the investigator would like an answer to the question "Is the difference in performance of the two groups caused by the difference in the educational program?" An article in the statistics section discusses how causal inferences can be made. Instrumental variables (IVs) are used to estimate causal relationships when controlled experiments are not feasible. An overview of instrumental variables and of their possible applications to education is provided in the statistics section of the encyclopedia.

Sampling

There is a growing importance of survey information on individuals, households, institutions, businesses, and environmental resources. Typically, one wants to gather information on a large group of individuals. However, time and cost usually do not allow obtaining information from each individual in the group. In such cases, one usually gathers information on only a sample, which is a small part of the large group. Sampling plays an essential role in drawing conclusions about the large group (which is called the population) from the information contained in the sample. An example of an application of sampling is the National Assessment of Educational Progress (NAEP), an educational sampling survey (Allen et al., 2001). NAEP is the only ongoing measure of what students in the USA know and can do in a variety of subject areas and it reports scores for different demographic groups based on
gender, ethnicity, school type, school location, etc. NAEP draws a sample of students that is representative of the whole student population, applies several statistical techniques, and draws conclusions (an example of a conclusion is that between 1992 and 2000, the percentage of fourth-graders at or above the proficient-achievement level in reading increased by a small, but statistically significant amount) on the whole population based on information contained in the sample.

Bayesian and Empirical Bayes Methods

In a traditional or frequentist statistical analysis, the parameter of a probability model is considered an unknown but nonrandom quantity and only the information contained in the observed data is relevant for any inference. In contrast, a Bayesian analysis (see, e.g., Gelman et al., 2003) assumes that the parameter is a random variable with a certain probability distribution, referred to as the prior distribution. The prior distribution quantifies the experimenter's beliefs about the parameter before observing the data. The next step in a Bayesian approach is to update the prior distribution on the basis of the likelihood function of the observed data through Bayes' theorem (Bayes, 1763). The resulting distribution is referred to as the posterior distribution of the parameter and summarizes the information in both the prior distribution and the data. The influence of the prior distribution on the posterior distribution becomes weaker as the size of the observed data sample increases. The variant of Bayesian methods in which the parameters of the prior distribution are estimated from the observed data is called empirical Bayes methods. Sinharay (2006) provided a review of the applications of Bayesian methods to educational measurement. Novick and Jackson (1974) included several applications of Bayesian methods to educational measurement. Other examples of applications of Bayesian methods to education are Rubin (1983), who applied Bayesian methods to three problems in educational measurement, Zwick et al. (1999), who applied an empirical Bayes method to differential item functioning, and Sinharay (2005), who applied Bayesian model-checking methods to assess the goodness of fit of IRT models. Further details on Bayesian methods and empirical Bayes methods (see, for example, Carlin and Louis, 1996) are discussed in the statistics section of the encyclopedia. Decision theory, which is a Bayesian approach, is concerned with identifying the values, uncertainties, and other issues relevant in a given decision and the resulting optimal decision. An article in the statistics section provides more details on decision theory.

Resampling Methods

Resampling methods (see, e.g., Efron, 1982) draw samples from the observed data to draw certain conclusions about the population of interest. Two of the most popular resampling methods are the jackknife and the bootstrap. Both of these are examples of nonparametric statistical methods.

The jackknife is used in statistical inference to estimate the bias and standard error of a test statistic. The basic idea behind the jackknife lies in systematically recomputing the statistic a large number of times, leaving out one observation or a group of observations at a time from the sample. Estimates of the bias and variance of the statistic can be calculated from this set of jackknife replications of the statistic. The jackknife finds several applications in complex sampling schemes, such as multistage sampling with varying sampling weights; an example of such an application is NAEP, where the jackknife method is employed to compute standard errors of estimates.

The bootstrap is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter such as a mean, median, or correlation coefficient. It is often used as a robust alternative to procedures based on parametric assumptions, especially when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors. See, for example, Hanson et al. (1993), who applied the bootstrap method to compute the standard error of an equating method.

Nonparametric Inference

Nonparametric methods, or distribution-free methods, are statistical methods that do not rely on assumptions that the data are drawn from a given probability distribution. Nonparametric methods are often applied when less is known about the data (so that a probability distribution cannot be assumed). Due to the reliance on fewer assumptions, nonparametric methods are more robust (i.e., less vulnerable to violations of assumptions). They are also often applied because of their simplicity. Examples of nonparametric methods are Pearson's $\chi^2$ test for assessing independence in a contingency table, jackknife and bootstrap methods for estimating the bias and variance of an estimator, the Wilcoxon–Mann–Whitney rank-sum test, the permutation test, the Kolmogorov–Smirnov test for assessing whether two distributions are the same, and spline regression for estimating regression curves of a dependent variable on several independent variables.

Multiple Linear Regression Models

Multiple linear regression models have been extensively used in education (see, e.g., Hsu, 2005). Interestingly, the name regression, borrowed from the title of the first article on this subject (Galton, 1885), does not reflect
either the importance or breadth of application of this method. Multiple regression is the statistical procedure to predict the values of a response (dependent) variable from a collection of predictor (independent) variable values. For example, if scores on multiple predictors and one criterion are available, multiple regression may be used to develop a single equation to predict criterion performance from the set of predictors. Several applications of multiple regression models can be found in the prediction of first-year grade-point average in college from the SAT scores and high school grade-point average (see, e.g., Kobrin et al., 2008). Multiple regression and multivariate multiple regression, the case when there is more than one dependent variable of interest and the interest is in predicting them simultaneously from a set of predictor variables, are discussed in the statistics section of the encyclopedia.

Hierarchical Linear Models and Growth Models

In an application of linear regression, the observations are assumed to be independent. When the assumption of independence is likely to be violated, for example, in an application in which one has data on several students who belong to a few schools (so that the responses of the students within each school are dependent), a popular option is to employ hierarchical linear models (HLMs). These models are also referred to as multilevel models and random-effects regression models. The students constitute the lower level while the schools constitute the higher level in the example. Note that HLMs can also be applied to repeated measures designs or longitudinal studies, where individuals are followed and their responses recorded several times over a certain period of time; the repeated measures constitute the lower level and the individuals constitute the higher level. Examples of such models are growth models, where, for example, the investigator measures the cognitive growth of students by giving them several tests over a certain period of time. Growth models are increasingly popular in the US due to the NCLB Act of 2001 that puts special emphasis on the cognitive growth of students. For example, in December 2007, the US Secretary of Education Margaret Spellings invited all eligible US states to submit a growth model proposal for the 2007–08 school year. Growth models and HLMs are discussed in the statistics section of the encyclopedia.

Value-Added Models

The NCLB Act of 2001 in the US requires states to ensure that there are quality teachers in every classroom, with quality defined in terms of traditional criteria such as academic training and fully meeting the state's licensure requirements (Braun, 2005). Interestingly, in this respect, some states have taken the lead by seeking a quantitative evaluation of teachers based on an analysis of the test-score gains of their students. Such evaluations employ a class of models called value-added models (VAMs). These models require data that track individual students' academic growth over several years in different subjects in order to estimate the contributions that teachers make to that growth. Thus, VAMs can be viewed as a special case of growth models and, hence, of HLMs. Given their current state of development, VAMs can be used to identify a group of teachers who may reasonably be assumed to require targeted professional development. These are the teachers with the lowest estimates of relative effectiveness. Despite the enthusiasm these models have generated among many policymakers, several technical reviews of VAMs have revealed a number of serious concerns and it is important that such concerns be properly addressed before VAMs are used to make important decisions.

Generalized Linear Models and Generalized Linear Mixed Models

Linear regression models apply when the response variable can be assumed to be a continuous variable or to be normally distributed. However, in several applications in education, the response does not belong to either of those types. Suppose the interest is in finding out how the socioeconomic status and average parents' education for a class of students affect their performance on a test. If we have the scores on the test for each student, we can employ a linear regression model regressing the test scores on the socioeconomic status and average parents' education. However, if we do not have the scores, but only know who passed the test and who did not (which is a binary response), we cannot employ linear regression. Generalized linear models (GLMs) can be used in situations like this. GLMs are extensions of the linear regression model to a wider class of response types such as binary or count data. A GLM requires the specification of two defining characteristics: the distribution of the response and the link function that describes how the mean of the response is linked to a linear combination of the predictors. Generalized linear mixed models (GLMMs) are extensions of GLMs to the case when the individuals are clustered (e.g., students belonging to different schools). The statistics section of the encyclopedia includes two articles, one each on GLMs and GLMMs.

Nonlinear Regression Methods

One is often interested in studying how a set of independent variables affect a dependent variable, but the relationship between them cannot be assumed linear. So
the abovementioned models, all of which assume a linear relationship, cannot be applied. Nonlinear regression methods, which may be applicable in such situations to predict the dependent variable from the independent variables, and recursive partitioning (also known as the classification and regression trees method), which is another method that may be applicable in such situations, are discussed in the statistics section of the encyclopedia.

IRT Models

These models, with numerous applications in education, are discussed in an article in the educational measurement section and are not covered here.

Latent Class Models

A latent class model (LCM) relates a set of observed discrete multivariate variables to a set of latent variables (latent variables are not directly observed but are rather inferred, mostly through a mathematical model, from other variables that are observed; e.g., quality of life or intelligence of a person is a latent variable). It is called an LCM because the latent variable is discrete and divides the population into several classes. A class is characterized by a pattern of conditional probabilities that indicate the chance that variables take certain values. For example, Dayton and Macready (2006) discuss an application in which the observed variables are the responses to ten questions on matrix algebra on a test, the latent variable refers to the knowledge of matrix algebra of students, and the latent classes refer to masters and nonmasters on matrix algebra. Given class membership, the conditional probabilities specify the chance certain answers are chosen. Within each latent class, the observed variables are statistically independent (this is often called local independence). This is an important aspect of LCMs. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that variables are independent within classes. The association between the observed variables is thus explained by the latent classes.

Time-Series Analysis

A time series is a sequence of data points, measured typically at successive time points. Time-series analysis comprises methods that attempt to understand such time series, often either to understand the underlying context of the data points, or to make forecasts (predictions). Forecasting using a time-series analysis consists of the use of a model to forecast future events based on known past events. An example in education is the prediction of the number of students who will take a test (e.g., SAT) at an administration based on the numbers from the previous administrations of the test. A time-series model generally reflects the fact that observations close together in time will be more closely related than observations further apart. Three broad classes of time-series models of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. There are models, such as the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA), that are combinations of the above three.

Model Fit and Model Selection

Model fit analysis refers to an examination of whether the statistical model employed in an application adequately explains the important features of the data set at hand. Model selection refers to the choice of the statistical model that describes the data best among several competing models. Model fit and model selection analysis for the linear models employed in education do not pose any problems and proceed in a similar manner as in any other statistics field, for example, by using residual analysis, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) (see, e.g., Draper and Smith, 1998). However, model fit and model selection analysis for the nonlinear models, especially for the IRT models, are not trivial, primarily because the computations are not straightforward with these models, the response variable is discrete so that normality of the response cannot be assumed, and the number of possible responses is huge so that there is sparseness in the data. Fortunately, with the advent of faster computers, there has been substantial work in these areas. Swaminathan et al. (2006) provided a detailed review of the literature on model fit of IRT models. Several model fit statistics have been suggested for testing different aspects of an IRT model: statistics for testing the unidimensionality assumption of the IRT model, item fit statistics, person fit statistics, and overall model fit statistics. In applications of IRT models, it is important to employ the appropriate model fit statistics depending on the intended use of the model, and to evaluate not only statistical significance, but also practical significance. For example, it may happen that the value of a fit statistic is statistically significant so that the model does not predict an aspect of the data, but the misfit has negligible consequences operationally.

Kang and Cohen (2007) provided a detailed review of model selection methods for IRT models. More techniques such as the penalty criteria and the theoretical background behind several techniques are discussed in the statistics section of the encyclopedia.
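
For linear models, the use of AIC and BIC for model selection can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: the data (hours of study and test scores) are invented, the helper names are hypothetical, and the criteria are the usual Gaussian-error forms, $n \ln(\mathrm{RSS}/n) + 2k$ and $n \ln(\mathrm{RSS}/n) + k \ln n$, which omit an additive constant that is the same for every model fitted to the same data. Here $k$ counts the regression coefficients plus one for the error variance.

```python
import math


def rss_intercept_only(y):
    """Residual sum of squares for the model y = b0 + error."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def rss_simple_linear(x, y):
    """Residual sum of squares for the least-squares fit y = b0 + b1*x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def aic_bic(rss, n, k):
    """Gaussian AIC and BIC, up to a constant shared by all candidate models."""
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 79]
n = len(score)

# k = 2 for the intercept-only model (mean, variance); k = 3 with a slope.
aic_null, bic_null = aic_bic(rss_intercept_only(score), n, k=2)
aic_lin, bic_lin = aic_bic(rss_simple_linear(hours, score), n, k=3)

# Smaller values indicate the preferred model under each criterion.
print(f"intercept only: AIC={aic_null:.2f}, BIC={bic_null:.2f}")
print(f"with slope:     AIC={aic_lin:.2f}, BIC={bic_lin:.2f}")
```

Both criteria trade goodness of fit against the number of parameters; BIC penalizes extra parameters more heavily than AIC once $\ln n > 2$.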
Bibliography

Agresti, A. (2002). Categorical Data Analysis, 2nd edn. New York: Wiley.
Allen, N. L., Donoghue, J. R., and Schoeps, T. L. (2001). The NAEP 1998 Technical Report (NCES 2001-509). Washington, DC: National Center for Education Statistics, U.S. Department of Education.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London 61.53, 370–418. [Reprinted with biographical note by Barnard, G. A. (1958) Biometrika 45, 293–315.]
Braun, H. (2005). Using Student Progress to Evaluate Teaching: A Primer on Value-Added Models. Princeton, NJ: Policy Information Center, Educational Testing Service.
Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical Bayes Methods for Data Analysis. London: Chapman and Hall.
Cronbach, L. J., Nageswari, R., and Gleser, G. C. (1963). Theory of generalizability: A liberation of reliability theory. British Journal of Statistical Psychology 16, 137–163.
Dayton, C. M. and Macready, G. B. (2006). Latent class analysis in psychometrics. In Rao, C. R. and Sinharay, S. (eds.) Handbook of Statistics, vol. 26, pp 421–446. Amsterdam: North-Holland/Elsevier.
Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, 3rd edn. New York: Wiley.
Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Everitt, B. S. (1990). Cluster analysis. In Husen, T. and Postlethwaite, N. (eds.) International Encyclopedia of Education, 2nd edn., pp 825–831. Oxford: Pergamon.
Galton, F. (1885). Regression towards mediocrity in heredity stature. Journal of the Anthropological Institute 15, 246–263.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis. New York: Chapman and Hall.
Haller, H. and Kraus, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research 7, 1–20.
Hanson, B. A., Zeng, L., and Kolen, M. J. (1993). Standard errors of Levine linear equating. Applied Psychological Measurement 17, 225–237.
Hsu, T. (2005). Research methods and data analysis procedures used by educational researchers. International Journal of Research and Method in Education 28(2), 109–133.
Huang, F. and Lee, M. (2009). Dynamic treatment effect analysis of TV effects on child cognitive development. No 0906, Discussion Paper Series, Institute of Economic Research, Korea University. http://econpapers.repec.org/RePEc:iek:wpaper:0906.
Jenkins, F., Kaplan, B., and Lim, Y. (2001). Data analysis for the national writing samples. In Allen, N. (ed.) The NAEP 1998 Technology Report, 359–370. Washington, DC: National Center for Education Statistics.
Johnson, R. A. and Wichern, D. W. (1998). Applied Multivariate Statistical Analysis, 4th edn. Upper Saddle River, NJ: Prentice-Hall.
Kang, T. and Cohen, A. S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement 31(4), 331–358.
Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., and Barbuti, S. M. (2008). Validity of the SAT for predicting first-year college grade point average. College Board Research Report No. 2008-5. New York: College Board.
Kramer, W. and Gigerenzer, G. (2005). How to confuse with statistics or: The use and misuse of conditional probabilities. Statistical Science 20, 223–230.
Longford, N. T. (1990). Multivariate variance component analysis: An application in test development. Journal of Educational Statistics 15(2), 91–112.
Novick, M. R. and Jackson, P. H. (1974). Statistical Methods for Educational and Psychological Research. New York: McGraw-Hill.
Rubin, D. B. (1983). Some applications of Bayesian statistics to educational data. Statistician 32, 55–68.
Sinharay, S. (2005). Assessing fit of unidimensional item response theory models using a Bayesian approach. Journal of Educational Measurement 42, 375–394.
Sinharay, S. (2006). Bayesian methods in educational measurement. In Upadhyay, S. K., Singh, U., and Dey, D. K. (eds.) Bayesian Statistics and Its Applications, pp 422–437. New Delhi: Anamaya.
Stigler, S. M. (2005). Correlation and causation: A comment. Perspective in Biology and Medicine 48(1), 88–94.
Swaminathan, H., Hambleton, R. K., and Rogers, H. J. (2006). Assessing the fit of item response theory models. In Rao, C. R. and Sinharay, S. (eds.) Handbook of Statistics, vol. 26, pp 683–718. Amsterdam: North-Holland/Elsevier.
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika 49, 95–110.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, MA: Addison-Wesley.
von Davier, M., Sinharay, S., Oranje, A., and Beaton, A. (2007). The statistical procedures used in national assessment of educational progress: Recent developments and future directions. In Rao, C. R. and Sinharay, S. (eds.) Handbook of Statistics, vol. 26, pp 1039–1055. Amsterdam: Elsevier.
Wainer, H. (2000). Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot, 2nd edn. Hillsdale, NJ: Erlbaum.
Wainer, H. (2005). Graphic Discovery: A Trout in the Milk and Other Visual Adventures. Princeton, NJ: Princeton University Press.
Zwick, R., Thayer, D. T., and Lewis, C. (1999). An empirical Bayes approach to Mantel–Haenszel DIF analysis. Journal of Educational Measurement 36, 1–28.

Further Reading

Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Harcourt Brace Jovanovich College Publishers.

Relevant Website

http://www.ed.gov (US Department of Education).