The use of theory in prediction and explanatory models

Mathematical models have a wide variety of uses and are applied in some form or another at all levels of the sciences, from the impressive models that describe quantum mechanics and the movements of subatomic particles, to the classical and expansive models that describe the movements of heavenly bodies across the cosmos. Here on Earth, meteorologists model weather patterns with the goal of forecasting future conditions and providing ample warning of severe weather; political scientists use models to analyze voter behavior and develop election strategies based upon models predicting the likelihood of future outcomes; engineers develop applied models to test and simulate products and prototypes that might otherwise be logistically impossible to test. In the social sciences, data models are built for measuring complex social phenomena and developing theories of human behavior, such as human development and personality. In each of these cases, modeling serves as an attempt to idealize a complex, empirical situation (Estes, 1991). In a few instances, such as the formal models that are predominant in the physical sciences, the idealized situation of the model is so similar to the empirical situation that the model may be interpreted as literal. However, in the softer sciences, including the social sciences, it is often the case that the model is not derived from some formalized natural law. In these instances, the model appeals to some form of analysis of the relationships between the empirically observed data. Inferences about the phenomena in question may be derived from patterns within the data. Statistical models, therefore, while not entirely identical to the empirical situation, are constructed with the goal of best fitting the empirical observations. The hope is that the discrepancy between model and phenomenon is inconsequential enough for the model to have applied efficacy.
In general, the ultimate goal of any statistical model falls into one of two categories: prediction or explanation. Superficially, these goals may seem one and the same; however, the distinction between them is not a trivial one. The choice of goal not only affects how the model is applied, but also has implications for which methods are appropriate when deriving the model, as well as for the limitations that are inherent to the model. Moreover, the question of how scientists choose to apply their models is a debate that touches upon two major philosophical arguments at the heart of science: the debate between prediction and explanation, and that between theoretical and atheoretical derivations.
Prediction and Explanation
There are two general applications for which models are ultimately constructed: prediction of a future outcome given a particular set of circumstances, and explanation of a given phenomenon by offering an exhaustive description of its properties, including its causes (Osborne, 2000). These applications approximate the two goals that define the scientific endeavor. Science is an attempt to gain both predictive and explanatory knowledge of the external world (Keat & Urry, 1982).

Intuitively, each of these goals may seem complementary. The explanation of a phenomenon is facilitated by the prediction of future situations and outcomes with regard to it, and vice versa. However, historically, the relationship between these two ends has been a topic of much contention. In particular, the question of how these two are related, if at all, has been up for debate between those who are proponents of a positivistic, standard view of science and those who believe that the development of accurate explanatory constructs is an attainable and desirable goal. Continuing with their definition of science, Keat and Urry write,
"[In doing science] one must construct theories which consist of highly general statements, expressing the regular relationships that are found to exist in the world. These general statements, or laws, enable us both to predict and explain the phenomena that we discover by means of systematic observation and experiment. To explain something is to show that it is an instance of these regularities, and we can make predictions on the same basis ... beyond this, no understanding is possible." (p. 4)
In this description, prediction and explanation appear two sides of the same coin. Following in the positivist tradition, explanation is conceptualized as the description of empirical regularities. In this instance, the emphasis is placed on the observation of these empirical regularities. Advocates of this view have argued that genuine, or causal, explanatory knowledge of a phenomenon is an impossible proposition. The argument follows that the phenomena under observation may be too complex for true causal explanation to be determined. This position has several implications that have shaped the course of scientific practice. First, it limits the domain of science to the prediction and description of a given phenomenon. Second, it implies that the practice of research need not be constrained by theoretical considerations. As Hume argued centuries prior, if prediction and explanation are truly symmetrical, then it is necessary only that there is a constant relationship between a dependent and an independent variable (Manicas, 1987).
This bias towards prediction, even for those that may not subscribe to the extreme position that explanation is impossible, has been an ingrained part of the scientific paradigm. In this context, predictions of new phenomena are regarded as more powerful evidence for a theory than explanations of old ones (Achinstein, 1994). This can be attributed to the relative rarity of successful prediction. Further, a theory that predicts phenomena that did not prompt the initial formation of that theory is better supported by those phenomena than is a theory by the known phenomena that generated the theory in the first place. Consequently, even in a theoretical context, the major purpose for prediction is the confirmation of explanatory theories (Reiss, 2006).
However, the prevailing winds have shifted the emphasis to providing causal explanations of the phenomena under observation. In the current paradigm, scientists do not necessarily view prediction, even totally novel prediction, as providing better evidential support for a theory than that which is provided by the already known facts explained by the theory.

In this new context, explanation and prediction are two different abstractions. The two progress independently of one another. For example, one might be able to make a successful prediction using information that is not causally relevant, for instance, predicting that a storm is approaching one's location based upon the information provided by Doppler radar. Conversely, one might be able to explain what one is unable to predict. It follows that though there are instances in which the two complement one another, this is more coincidental than necessary (Scriven, 1959). This concession has been important in the rise of explanatory, but non-predictive sciences (Scriven, 1959), and has been extremely beneficial to the social sciences, including psychology. Two major thriving examples of this, evolutionary theory and Einstein's relativity, are seen as explanatory marvels, and have become bedrocks of biology and physics. However, evolutionary theory has very little predictive value (Scriven, 1959) and relativity is a prominent example of a theory in which very few novel predictions were derived after the fact (Brush, 1989).
Returning to the topic of modeling, one's position on the relative nature of explanation and prediction in scientific practice has major implications for how one chooses to model empirical data. If one adopts a strictly positivist viewpoint, then explanation is only attainable in the context of prediction. As such, the model may be developed with limited theoretical considerations with the explicit intention of predicting some future outcome given the data. However, if one were to believe that the major purpose for prediction is the confirmation of explanatory theories, then one may adopt a research strategy that focuses on explanatory models while only considering prediction in terms of its ultimate aim; another individual with the same belief may choose instead to focus on the derivation of an accurate prediction model, appealing to the standard view's bias towards predictive power. Hypothetically, if one subscribes to the idea that good models ought to represent natural laws, and that deriving these natural laws allows for the explanation of past events alongside the prediction of future events, then one might be able to focus solely on achieving one of the two, as the other will simply fall out (Reiss, 2006). However, if one views explanation and prediction as distinct, exclusive abstractions, then things become more complicated. The prevailing tendency may be to focus on one of these two ends at the exclusion of the other. In both instances, one may be forced to accept the limitations of the derived model. Even when prediction of the future is possible with the help of a model, this does not mean that the model has provided a sufficient explanation of what has occurred. Conversely, a model adept at explaining a phenomenon may still not be an accurate forecast of future outcomes. Modern science is rife with examples of this, including the two previously mentioned.
The Role of Theory in Regression Analyses
The science of psychology often involves the observation of complex, multi-faceted phenomena.

Given this directive, as well as the limitations of the experimental methods, regression techniques are extremely attractive for the modeling of psychological data. Multiple regression (MR) analysis is a method of quantifying and standardizing the relationship between several independent predictor variables and some dependent criterion variable. MR and other multivariate techniques have become so fundamental to psychology because they "best honor the reality to which the researcher is purportedly trying to generalize" (Thompson, 1994).
Regardless of one's philosophical persuasions, it should be apparent, in light of the discussion in the previous section, that prediction models and explanatory models face different sets of constraints when considering their development and application. A major distinction between the two types of applications is the necessity of accounting for theoretical concerns. As briefly touched upon before, prediction models need not be limited by theoretical considerations. With prediction models, the goal is not advancing some understanding, but developing the best calculus for predicting future outcomes. For this, a model is only bound to providing a claim that at a certain time, given a certain set of conditions, an event or state of affairs will occur. When the goal of the model is simply prediction, it may be the case that all that is necessary is a strong correlation between the predictor and criterion variables, regardless of the theoretical feasibility of the model. It may be the case that some of these predictor variables are established a priori to the analysis, but it is not entirely necessary. In a few cases the model may be purely exploratory, establishing possible predictor variables based upon trends in the data.
The development of atheoretical models is a fairly controversial topic, particularly in psychology. The current paradigm is a trend against atheoretical modeling, with the assertion that theory, rather than some arbitrary method, is the best guide when selecting variables (Pedhazur, 1997). However, a strong case could be made for atheoretical practices, citing a few of the major models used in psychology today (personality trait, intelligence).
The impact of the computer on the development of models cannot be overstated. To suggest the development of a purely atheoretical statistical model a few decades ago might have been considered lunacy; however, modern researchers have at their disposal a wide range of powerful statistical software packages that are able to perform the large number of calculations necessary in relatively little time. Gone are the days when the a priori decision of predictor variables was necessary.
Atheoretical models are most often specified using one of a few automatic selection procedures. Stepwise selection procedures are one class of automatic selection procedures that are used in specifying a given model. The basic procedure involves (1) identifying an initial model, (2) iteratively stepping, repeatedly altering the model at the previous step by adding or removing a predictor variable in accordance with the "stepping criteria," and (3) terminating the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached, thus arriving at a final model. The initial model may either be inclusive of all candidate variables included by the design of the analysis, or it may only include those variables that are specified to be forced into the model. In general, the statistical properties of the predictor variables determine the order of entry.
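The three-step stepwise procedure described above (identify an initial model, step iteratively, terminate) can be sketched in code. What follows is only an illustration under stated assumptions, not the routine of any particular statistical package: it implements a forward-entry variant that starts from an intercept-only model, uses a partial F statistic as the stepping criterion, and takes 4.0 as an arbitrary F-to-enter threshold. The function names (`solve`, `rss`, `forward_stepwise`) and the simulated data are hypothetical.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus columns `cols` of X."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def forward_stepwise(X, y, f_to_enter=4.0, max_steps=None):
    """Return the indices of the selected predictors, in order of entry."""
    n, p = len(X), len(X[0])
    max_steps = p if max_steps is None else max_steps
    selected = []                          # (1) initial model: intercept only
    current_rss = rss(X, y, selected)
    for _ in range(max_steps):             # (3) stop when the step limit is reached ...
        df_resid = n - len(selected) - 2   # residual df once one more predictor enters
        if df_resid <= 0:
            break
        best = None
        for c in range(p):                 # (2) evaluate each remaining candidate
            if c in selected:
                continue
            new_rss = rss(X, y, selected + [c])
            f_stat = (current_rss - new_rss) / (new_rss / df_resid)
            if best is None or f_stat > best[0]:
                best = (f_stat, c, new_rss)
        if best is None or best[0] < f_to_enter:
            break                          # ... or when no candidate meets the criterion
        selected.append(best[1])
        current_rss = best[2]
    return selected

# Simulated data (an assumption for illustration): only columns 0 and 2 affect y.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [2 * row[0] - 3 * row[2] + rng.gauss(0, 1) for row in X]
selected = forward_stepwise(X, y)
print("order of entry:", selected)
```

Note that nothing in the loop consults theory: the order of entry is determined entirely by the statistical properties of the candidate columns, which is the property the surrounding discussion turns on.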

The selection criteria for adding or removing candidate variables are generally based upon some critical statistical probability value (1 minus the p-value), derived from test statistics such as F-values, t-values, or adjusted r². In cases when the initial model is all-inclusive, the researcher works backward, removing candidate variables one by one. The value of the removal statistic is calculated at each step. If any candidate variable has a value less than the critical value, then it is removed from the model. If this is the case with multiple variables, then the variable with the smallest value is removed from the model. This process is repeated until no variable has a value that is less than the critical value for removal from the model. Stepping is also terminated if the maximum number of steps is reached. Conversely, in forward entry methods, the initial model starts with few, if any, of the candidate variables. In this case, entry to the model is determined by the variable's value relative to some critical statistic. Variables are added into the model one by one until no variable has a value that exceeds this entry statistic. These two methods, backward removal and forward entry, may be combined in such a manner that both are used in the specification of the model, and selected candidate variables are entered and removed from the model until neither is possible.
In principle, when using stepwise regression techniques, all one needs in order to specify a model is to input some set of candidate variables, select a critical criterion value, and allow the computer to do the rest. The computer will then specify a model based upon a prediction equation derived solely from the empirical data.
This method does have its detractors. From the standpoint of statistical validity, there are several criticisms of the practice of using stepwise regression. These include: the observation that many of the software packages on the market use incorrect degrees of freedom in their stepwise computations, thus resulting in an artificially inflated likelihood of obtaining statistical significance in the inclusion and removal criteria; that the procedure suffers in cases of collinearity (redundant predictors); and that stepwise methods tend to capitalize on sampling error and thus tend to yield results that are not replicable (Thompson, 1995).
However, for many, the most egregious sin of using automated stepwise procedures can be summed up in the following statement by Singer & Willett (2003).
"Never let a computer select predictors mechanically. The computer does not know your research questions nor the literature upon which they rest. It cannot distinguish predictors of direct substantive interest from those whose effects you want to control."
Since they are entirely atheoretical, fully automated specification techniques have the possibility to produce models that have little to no explanatory power whatsoever. Because the inclusion or exclusion of a variable may be decided simply on the basis of some arbitrary measure of the amount of variance it explains, it is entirely possible that the resulting model may be theoretically absurd. The model might include some variable that has little theoretical merit, such as hair texture in a model predicting performance on a college entrance exam for graduating high school seniors, and exclude some variable that is thought to have theoretical importance, such as high school grade point average, race, and socio-economic status.
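The criticism that stepwise methods capitalize on sampling error (Thompson, 1995) can be illustrated with a small simulation. This is a sketch under assumptions of my own choosing rather than anything taken from the cited work: every candidate predictor is pure noise, the "selection" rule simply keeps whichever candidate correlates best with the criterion in a sample of 30 cases, and the same candidate is then re-tested on a fresh sample. The names `r_squared`, `noise`, and `one_trial`, and the sample sizes, are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def noise(rng, n):
    """n independent standard-normal draws."""
    return [rng.gauss(0, 1) for _ in range(n)]

def one_trial(rng, n=30, n_candidates=50):
    """Pick the best of many pure-noise predictors, then re-test it on fresh data."""
    y_fit, y_new = noise(rng, n), noise(rng, n)
    fit = [noise(rng, n) for _ in range(n_candidates)]   # selection sample
    new = [noise(rng, n) for _ in range(n_candidates)]   # replication sample
    best = max(range(n_candidates), key=lambda j: r_squared(fit[j], y_fit))
    return r_squared(fit[best], y_fit), r_squared(new[best], y_new)

rng = random.Random(0)
trials = [one_trial(rng) for _ in range(100)]
mean_selected = sum(t[0] for t in trials) / len(trials)
mean_replication = sum(t[1] for t in trials) / len(trials)
print(f"mean r^2 in the selection sample:   {mean_selected:.3f}")
print(f"mean r^2 in the replication sample: {mean_replication:.3f}")
```

Because the winning predictor is chosen for its fit to one sample, its apparent strength there overstates what a new sample shows, even though no candidate has any real relationship to the criterion.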

Thus the resulting model may help one to predict a future outcome without offering any reasonable explanation of why it occurred.
Most current researchers, particularly psychologists, believing in the primacy of explanation, cite the above conclusion as a prime example against the use (or rather misuse) of automated selection in specifying models. Singer and Willett's cautionary advice is taken to heart. However, this bias may be somewhat unfounded. The fact remains that scientists do pursue a number of non-explanatory goals. The development of models for measures of intelligence and personality speaks volumes to this fact. Consider current personality trait theories, in particular the Five Factor Model as developed by Costa and McCrae (1992). The Five Factor Model provides an apt example of an atheoretical, purely predictive model that is prominently used in psychology. This particular model was specified atheoretically using factor analyses, resulting in a description of personality as parsed into five major traits. The scores on any of the five dimensions of the model do well to predict future behavior of an individual in a given situation, and the model is often used in both clinical and evaluation practices (Matthews, Saklofske, Costa, Deary, & Zeidner, 1998). However, there is some doubt as to whether these traits are actually explanatory, and whether the model itself has any explanatory merit for actually understanding personality (Pervin, 1994). Taken by itself, the Five Factor Model offers few clues as to the underlying mechanisms for the traits it describes. However, on its own merits, it has been shown to be an effective tool in applied areas.
In considering what a theoretically absurd model with strong predictive power may say about the theoretical assumptions that it is compared against, I return to the example involving the prediction of performance of graduating high school seniors in Lucas County on a college entry exam. Rather than label the model as absurd, it may be reasonable for one to conclude that a model including hair texture but discounting performance in high school may suggest that the underlying theoretical assumptions are incomplete. A theoretically absurd model may serve as the impetus for some type of explanatory research. A more in-depth investigation, for instance, may reveal major discrepancies in the quality of high school education throughout Lucas County. It may be that the schools in more affluent districts are both more rigorous and better preparatory for college admissions, while those in the poorer districts are less rigorous and do little to provide their students with the tools necessary to pass an entrance exam. This same investigation may find a strong correlation between hair texture and socio-economic status. In this scenario, the model, or rather the inclusion of hair texture, seems less absurd, and brings to light perhaps a major social issue involving the county's schools that may have otherwise gone unnoticed.
Some may conclude that the above example is overly optimistic, and to be fair, perhaps it is. But it is also indicative of some of the research practices that are current in psychology.
This is not to diminish the role of theory in the specification of models, nor the value of explanatory models. Theory can be a useful and effective guide in determining which candidate variables should be selected and removed from a model. The fact remains that atheoretical models fail to account for the vast wealth of accumulated knowledge that precedes them. When designing a method of observing a phenomenon it is rarely advisable to disregard the literature surrounding the topic.

Moreover, undoubtedly coupled with the use of atheoretical modeling techniques is the tendency to misuse the specified model. This may take the form of attempting to rectify the model with theory a posteriori. One should in no way seek to make an atheoretical model into an explanatory one. Theory should play the role of a guide, rather than some deus ex machina.
Explanatory models
Ultimately, though, the most damning criticism against atheoretical models may be that they fail to capture the zeitgeist of scientific discovery. While description and prediction of a future outcome may be attainable, pragmatic goals, both seem unsatisfactory. Neither answers the really important questions of science: why? And how? It is perhaps for this reason that many psychologists are biased towards explanatory models (the romantic researcher), and that most models are ultimately judged not on their ability to match the data acquired from a real system (replicative validity) nor their ability to match the data before it was actually acquired from the real system (predictive validity); rather, the standards by which models are measured are their structural validity, their ability to not only reproduce the observed real system behavior, but truly reflect the manner in which the real system operates to produce this behavior, as well as their construct validity, the validity of inferences about unobserved variables that they provide on the basis of observed variables (Zeigler, 1985).
Every stage in the development of a theoretically driven model, from the specification of its predictor variables to its application, should appeal to the use of some sort of fundamental knowledge about the phenomenon. There are caveats, of course. When considering explanatory models, one major obstacle comes to light that has little effect on their predictive cousins: the problem of multicollinearity. When specifying a model solely for the purpose of prediction, one has little constraint with regard to the number and type of predictor variables that one chooses to include. In principle, the best model is the one that makes the most accurate prediction of a future outcome. Whether or not any of these predictor variables have an inordinately high correlation with one another is, in principle, inconsequential. This is not the case, however, when one wishes to use the model to make some explanatory claim about the phenomena in question. Multicollinearity, though undesirable, is not a fatal violation of the underlying assumptions of regression models. However, multicollinearity leads to increases in the standard error of the sampling distribution, leading to a greater risk of sample-to-sample divergence in the predicted values. This may result in a high variability of the parameter estimates from one sample to another. Further, multicollinearity may cause a decrease in the statistical power of the model. For an explanatory model constructed with the goal of ascribing explanatory power to each variable, the presence of two or more highly correlated variables may make it difficult to separate out their unique or independent effects. Specifying an explanatory model, then, is a series of compromises.
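The claim that multicollinearity inflates the sample-to-sample variability of parameter estimates can be checked with a brief simulation. This is a minimal sketch under assumptions that are not in the text: two standard-normal predictors with a population correlation `rho`, a criterion equal to their sum plus unit-variance noise, samples of 50 cases, and 300 replications. The names `fit_two_predictors` and `slope_spread` are hypothetical. For this two-predictor setup, standard regression theory puts the variance of a slope estimate in proportion to 1 / (1 - rho²), so a correlation of 0.95 should roughly triple its standard deviation.

```python
import random
import statistics

def fit_two_predictors(x1, x2, y):
    """OLS slopes for y ~ intercept + x1 + x2, from centered sums of squares."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

def slope_spread(rho, rng, n=50, reps=300):
    """Standard deviation of the x1 slope estimate across repeated samples."""
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has unit variance and population correlation rho with x1.
        x2 = [rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for a in x1]
        y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(fit_two_predictors(x1, x2, y)[0])
    return statistics.stdev(estimates)

sd_uncorrelated = slope_spread(0.0, random.Random(1))
sd_collinear = slope_spread(0.95, random.Random(1))
print(f"SD of slope estimate, rho = 0.00: {sd_uncorrelated:.3f}")
print(f"SD of slope estimate, rho = 0.95: {sd_collinear:.3f}")
```

The true slope is the same in both conditions; only the correlation between the predictors changes, which is why the wider spread matters for explanatory claims about individual variables.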

The researcher must find a balance of what constitutes a model that best fits the data: demonstrating relatively high replicative reliability, minimizing multicollinearity, and fitting with the spirit of the theory. It is important to consider that theory might be useful in specifying what variables one should be looking for, but it is not the end-all. Moreover, care must be taken that the model itself has some degree of independence from its theoretical motivations. Rather, the model should be made to fit the data, and not vice versa.
On the question of which types of models psychologists should be using (predictive versus explanatory, atheoretical versus theoretical), the answer is regrettably vanilla. All of these types of models continue to have a place in psychological science. Though the current biases lean toward theoretically driven, explanatory models, the reality of the discipline is that atheoretical and purely predictive models have had an important impact on the development of the field. Perhaps the more interesting and relevant question that psychologists should be asking with regard to their field as a science is whether these statistical models could aim at replacing formalized theoretical models. If so, much might be said about psychology's stigma as a soft science.
References
Achinstein, P. (1994). Explanation v. prediction: Which carries more weight? PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2, 156-165.
Brush, S. G. (1989). Prediction and theory evaluation: The case of light bending. Science, 246, 1124-1129.
Estes, W. K. (1991). Statistical Models in Behavioral Research. Lawrence Erlbaum Associates.
Keat, R., & Urry, J. (1982). Social Theory as Science (2nd ed.). London: Routledge and Kegan Paul.
Manicas, P. T. (1987). A history and philosophy of the social sciences. Oxford: Blackwell.
Matthews, G., Saklofske, D. H., Costa, P. T., Deary, I. J., & Zeidner, M. (1998). Dimensional models of personality: A framework for systematic clinical assessment. European Journal of Psychological Assessment, 14(1), 35-48.
Osborne, J. W. (2000). Prediction in multiple regression. Practical Assessment, Research, & Evaluation. Retrieved May 13, 2007.
Pedhazur, E. J. (1997). Multiple Regression in Behavioral Research: Explanation and Prediction (3rd ed.). New York: Thomson Learning.
Pervin, L. A. (1994). A Critical Analysis of Current Trait Theory. Psychological Inquiry, 5(2), 103-113.
Reiss, J. (2006). Do We Need Mechanisms in Social Science? Philosophy of the Social Sciences.
Scriven, M. (1959). Explanation and prediction in evolutionary theory. Science, 130, 447-482.
Thompson, B. (1994). Why Multivariate Methods Are Usually Vital in Research: Some Basic Concepts. Paper presented at the Biennial Meeting of the Southwestern Society for Research in Human Development, Austin, TX.
Thompson, B. (1995). Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply here: A Guidelines Editorial. Educational and Psychological Measurement, 55, 525-534.
Zeigler, B. P. (1985). Theory of Modelling and Simulation. Malabar: Krieger.
