Turning Numbers into Knowledge Page 1 of 7

**Two Important Statistics: The R2 and the t-value**

Notes by Edward Leamer, January 26, 1999

- What is an R2?
- What is a t-value?
- Examples of the effect of sample size
- Questions about these numbers that you might be asking yourself:
  - Why is the difference between the adjusted R2 and the R2 hardly noticeable, except when the sample size is very small?
  - Why is the adjusted R2 a U-shaped function of sample size?
  - Why does the standard error fall with sample size and the t-value increase?
- How big should my R2 be?
  - Example: Two Multiple Regressions that Explain GDP
  - Which of these equations is actually better?
- How big should my t-value be? I’m a t=1 kind of guy.
- Summary Table

**What is an R2?**

The R2 measures the percent of variation in the “dependent” variable that can be accounted for or “explained” by your “independent” variables. For example, you can account for 97% of the variation in adult heights using wingspan (how far they can stretch their arms horizontally) as a predictor. Using the same wingspan variable, you can account for only 40% of the variability in their weights.

**What is an estimate, a standard error and a t-value?**

I am glad that you asked me about all three of these items at the same time since it reveals that you understand that they are all related. Indeed they come together in a neat little package. When you run a regression, for each predictor (independent/explanatory) variable you will get all three: an estimated coefficient, a standard error, and a t-value.

The estimate represents the computer’s attempt to find the equation that best summarizes the data you are studying. The standard error measures how sure the computer feels about its choice of estimate. The t-value is the ratio of the estimate divided by the standard error. It is another way of measuring how sure the computer feels. Unlike the standard error, the t-value doesn’t depend on the units. Unlike the standard error, the t-value can be compared across all variables. The standard error for one variable cannot be compared effortlessly with the standard error of another variable. One reason is that units in which the variables are measured affect the standard errors.

Remember that all those numbers that come jumping out of the computer when you ask for a regression are only answering four questions:

1. What are the data that you are examining?
2. What is the best line for summarizing these data?
3. How much noise is there around that line?
4. How well is that line determined?

Items 1, 2 and 3 are entirely straightforward, and you should be able to see them in this scatter diagram which compares growth of US real GDP in one quarter against the growth in the previous quarter. You can see that:

1. We are working with the growth of real GDP.
2. The best line summarizing this scatter has a slight upward slope.
3. There is a lot of noise around that line.

[Figure: Is there much momentum in US real GDP? Quarterly Growth at Annualized Rates, 1947-1998. Scatter of Growth against Growth(-1).]

But what about item 4: “How well is that line determined?” Maybe you are thinking that you can see the answer to that question also in this picture. You could put a line with a much steeper slope into that picture and not distort the relationship very much. Must be that the line is not well determined by the data. That’s treating questions 3 and 4 as if they were identical: if there is a lot of noise, you cannot tell where the line is.
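The way the estimate, the standard error, the t-value and the R2 hang together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up with a random number generator (they are not the GDP series), the function name `simple_regression` is hypothetical, and the formulas are the usual textbook ones for a regression with a single predictor. The second fit measures the same predictor in units 100 times smaller, to show that the estimate and the standard error move with the units while the t-value and the R2 do not.

```python
import math
import random


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx                      # the estimate
    intercept = mean_y - slope * mean_x
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    noise_var = resid_ss / (n - 2)         # noise around the line
    std_err = math.sqrt(noise_var / sxx)   # how well the slope is determined
    return {
        "estimate": slope,
        "std_err": std_err,
        "t_value": slope / std_err,
        "r2": 1.0 - resid_ss / syy,
    }


# Made-up data (an assumption): a weak upward relationship buried in noise.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [0.3 * xi + random.gauss(0.0, 1.0) for xi in x]

fit_original = simple_regression(x, y)
# Same predictor measured in units 100 times smaller (say, cm instead of m).
fit_rescaled = simple_regression([100.0 * xi for xi in x], y)

for label, fit in (("original units", fit_original), ("rescaled units", fit_rescaled)):
    print(label, {name: round(value, 4) for name, value in fit.items()})
```

Comparing the two printed lines, the estimate and the standard error both shrink by a factor of 100 while their ratio, the t-value, is unchanged, which is why the t-value is the number to compare across variables.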

But there is something else that affects the answer to question 4: the sample size. Noisy experiments don’t provide the best information, but if you run a lot of them, you can average out the noise to find the truth. To illustrate that point I have successively trimmed more and more data from the sample and computed the corresponding regression. If all the quarterly data are included there are 204 observations. If a little less than a decade of data are trimmed, the number drops to 170, then to 130, to 90, to 50 and last to only 10. The numbers are reported in Table 1 below. Take a look at the numbers in the tables and the figures below that illustrate the numbers. Think of some questions that these numbers suggest. We’ll answer those questions next.

**Examples of the effect of sample size**

Table 1. The Effect of Sample Size: R-squared and Slope of Regression of Growth on Growth(-1)

Columns: Period, N. of Obs., R-squared, Adj. R-sq, Estimate, Std. Error, t-value

| Period | N. of Obs. |
|---|---|
| 1947:3 to 1998:2 | 204 |
| 1956:1 to 1998:2 | 170 |
| 1966:1 to 1998:2 | 130 |
| 1976:1 to 1998:2 | 90 |
| 1986:1 to 1998:2 | 50 |
| 1996:1 to 1998:2 | 10 |

[Figure: Effect of Sample Size on R-Squared. R-squared and Adj. R-sq plotted against Number of Observations (ending in 1998:2).]

[Figure: Effect of Sample Size on Estimates. Estimate, Std. Error and absolute t-value plotted against Number of Observations (before 1998:2).]
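The trimming exercise described above can be imitated on artificial data. The sketch below is an assumption-laden stand-in, not a reproduction of Table 1: the series is simulated so that growth equals 0.3 times the previous quarter's growth plus noise (both the 0.3 and the noise level of 4.0 are hypothetical choices), and `slope_and_std_err` is a hypothetical helper using the textbook one-predictor formulas. Each pass keeps only the most recent observations and regresses growth on lagged growth.

```python
import math
import random


def slope_and_std_err(x, y):
    """Return the least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(resid_ss / (n - 2) / sxx)


# Hypothetical series: growth = 0.3 * previous growth + noise (not the GDP data).
random.seed(1)
growth = [0.0]
for _ in range(2000):
    growth.append(0.3 * growth[-1] + random.gauss(0.0, 4.0))

results = {}
for n_obs in (2000, 500, 200, 50):
    recent = growth[-(n_obs + 1):]           # keep only the most recent observations
    lagged, current = recent[:-1], recent[1:]
    estimate, std_err = slope_and_std_err(lagged, current)
    results[n_obs] = (estimate, std_err, estimate / std_err)
    print(f"N={n_obs:5d}  estimate={estimate:6.3f}  "
          f"std_err={std_err:5.3f}  t={estimate / std_err:6.2f}")
```

Because this artificial relationship is stable by construction, the standard error should shrink steadily as observations are added. Real data, like the GDP series, need not behave so neatly.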

**Questions about these numbers that you might be asking yourself:**

**Why is the difference between the adjusted R2 and the R2 hardly noticeable, except when the sample size is very small?**

Keep in mind that the R2 goes up each time you add another variable to the equation. If the number of variables is a significant fraction of the number of observations, you are probably “over-fitting” – asking too much of the data. The adjusted R2 is a measure of the noise around the regression line, correcting for this overfitting problem. This correction doesn’t matter much if the number of observations is a large multiple of the number of variables in the equation.

**Why is the adjusted R2 a U-shaped function of sample size?**

This is not a general property but instead is a function of some special features of the GDP growth series. If the relationship were stable over time, the adjusted R2 should not depend on the sample size. It is not difficult to form some conjectures regarding the U-shaped function that we see here. The older data, prior to 1985, is more variable and the intertemporal correlation at that time was larger. But the most recent data is anomalous – suggesting a negative relationship – you can see the negative estimate. It doesn’t work well to combine that most recent data with any of the less anomalous data.

**Why does the standard error fall with sample size and the t-value increase?**

As you add more and more data, the choice of the best fitting regression line becomes more and more clear. This means that the standard error declines. Technically, the standard error of a coefficient declines like 1/square root of the number of observations. The t-value is the ratio of the estimate divided by the standard error. If the estimate is stable, the t-value thus grows like √n. Can you see that the t-value does indeed grow like the square root of the sample size?

**How big should my R2 be?**

A highly trained professional statistician can with confidence tell you that an R2 of one is great and an R2 of zero is terrible, but can’t really say anything more. What is big or small between those extremes is a matter of judgement and depends on the context and the variable that is being explained. There is no absolute standard. It’s best to think of it as a horse race. It depends on how fast the other horses run. But make sure the horses are on the same track. If one equation is predicting the growth of GDP, don’t compare that R2 with another equation that is predicting some other variable or the same variable for a different sample period.

To see how the context matters, it will help to take a look at some examples. Below are two regressions estimated from US real GDP data from 1946 second quarter to 1998 second quarter. Each is meant to help to answer similar questions: Do unemployment and/or federal defense expenditures have a positive or a negative effect on real GDP? The first equation explains the rate of growth of GDP. The second explains the level of GDP. In the first equation, the predictor variables are the growth rate in the previous quarter, the percentage of the workforce that was unemployed in the previous quarter, the change in the percent unemployed, and the growth in real defense expenditures. In the second equation, the predictor variables are the GDP in the previous quarter, the number of unemployed in the previous quarter, the change in the number of unemployed, and the level of real defense expenditures in the previous quarter.

The second equation that explains the level of real GDP looks a lot better, doesn’t it. It has an R2 nearly equal to one while the other equation has that miserable little R2 of only 0.145. The second equation has that great predictor variable, the lagged value of real GDP, with a t-value of 394. The best t-value the first equation can find is only –2.799. Equation 2 is the best choice, that’s for sure.

In fact, these two equations are saying pretty much the same thing:

1. GDP growth is relatively high following periods with high but falling unemployment.
2. The direction that unemployment is moving is ten times more important than the level of unemployment.
3. It is very difficult to detect any effect of defense expenditures on growth.
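The answers about the adjusted R2 and about the standard error can be written compactly. The symbols here are assumptions introduced for illustration: n is the number of observations, k the number of predictor variables (not counting the constant), and c a constant that depends on the noise level and on the spread of the predictor. The adjusted R2 formula is the usual textbook one, which may differ in detail from what a given regression package prints.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
\qquad\text{so}\qquad
R^2 - \bar{R}^2 = (1 - R^2)\,\frac{k}{n - k - 1}

\operatorname{se}(\hat{\beta}) \approx \frac{c}{\sqrt{n}}
\quad\Longrightarrow\quad
t = \frac{\hat{\beta}}{\operatorname{se}(\hat{\beta})} \approx \frac{\hat{\beta}}{c}\,\sqrt{n}
```

The first line shows why the gap between the R2 and the adjusted R2 is small whenever n is a large multiple of k. The second shows why a stable estimate yields a t-value that grows like the square root of the sample size.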

**Example: Two Multiple Regressions that Explain GDP**

Equation 1
Dependent Variable: Growth of Real GDP
Method: Least Squares
Date: 01/16/99 Time: 09:50
Sample(adjusted): 1961:1 1998:2
Included observations: 150 after adjusting endpoints
Variables: C, Growth(-1), Unemploy %(-1), Change in Unemploy %(-1), Growth in Real Defense(-1)
Reported for each variable: Coefficient, Std. Error, t-Statistic, Prob.
R-squared: 0.145
Other summary statistics reported: Adjusted R-squared, S.E. of regression, Sum squared resid, Log likelihood, Durbin-Watson stat, Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, F-statistic, Prob(F-statistic)

Equation 2
Dependent Variable: Real GDP ($b 1992)
Method: Least Squares
Date: 01/24/99 Time: 17:37
Sample(adjusted): 1948:3 1998:2
Included observations: 200 after adjusting endpoints
Variables: C, Real GDP(-1), Unemployed(-1) (billion), Change in unemployed(-1), Fed Defense $b 1992
Reported for each variable: Coefficient, Std. Error, t-Statistic, Prob.
R-squared: 0.9996
Other summary statistics reported: Adjusted R-squared, S.E. of regression, Sum squared resid, Log likelihood, Durbin-Watson stat, Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, F-statistic, Prob(F-statistic)

Take another look at Equation 1. The previous quarter’s growth of GDP, the unemployment rate and the growth in defense expenditures together account for 14.5% of the variability of the rate of growth of real GDP across quarters. That looks pretty bad. Equation 2 looks a lot better with an R2 of 0.9996. But think again. The R2 of the first equation is an answer to a different question than the R2 of the second regression. The first is asking what percentage of the historical variation in the growth rate can be explained by these variables. The second is asking what percentage of the historical variability of real GDP can be explained. Most of the historical variability of real GDP is just the time trend. Any variable with a strong time trend can account for a lot of the so-called variability in the level of real GDP. The one that is doing the job here is the previous quarter’s GDP. It gets that astronomical t-value of 394, which it should be. But to say that a good prediction of next quarter’s GDP is this quarter’s GDP plus $4810 for each unemployed person may or may not give you a very accurate predictor of the growth of GDP.

**Which of these equations is actually better?**

Although these two equations are saying about the same thing regarding the determinants of GDP growth, I prefer the one you don’t like – equation 1 that explains the growth in real GDP with the low R2. I like this equation because I think it is capable of tracking the behavior of the economy over time better than the other one. I think that over time, with a growing economy, it doesn’t make sense to think of each unemployed worker making a constant contribution to the level of GDP. I think that it does make sense to think of there being a constant effect of the rate of unemployment on the rate of growth. To decide, you’d have to compute the R2 differently.

**How big should my t-value be? I’m a t=1 kind of guy.**

The t-value is an answer to the question: “Based on these data, can we tell what is the sign of the coefficient?” If the t-value is large, the data are suggesting that the sign conforms with the sign of the estimate. If the t-value is small, the data are saying: “I cannot tell the sign of the effect.” The t-value is the estimate divided by the standard error. If the absolute t-value is one, then a one-standard-error interval around the estimate just includes the zero value.

A statistician can give you reason to believe that there is a 68% chance that the true coefficient is within one standard error of its estimate, and a 95% chance that it is within 2 standard errors. Half of the remaining probability is on one side and half on the other. Thus if the t-value is one, there is a (1.0 – 0.68)/2 = 16% chance that the true value is different in sign from the estimate. If the t-value is equal to two in absolute value, there is only a (1.0 – 0.95)/2 = 2.5% chance of an opposite sign. It is entirely arbitrary whether you choose one or two or some other number as the critical value of the t-statistic. It depends on the odds that you like to bet on. Is 1 – 0.16 = 84% virtual certainty, or do you need 1.0 – 0.025 = 97.5% for virtual certainty?

The choice of a critical value equal to one is not entirely arbitrary, for another reason. The t-value is also an answer to the question: “What would happen to the adjusted R2 if I omitted this variable from the equation?” If the t-value were greater than one in absolute value, then the adjusted R2 would fall. If the t-value were less than one, then the adjusted R2 would increase. If the adjusted R2 is one target of your analysis, variables with t-values less than one in absolute value are asking to be discarded.

There is another property of a t-value that I probably shouldn’t be revealing. If you drop a variable from your equation, the estimated coefficient of any variable with a bigger t-value than the one you omitted cannot change in sign. A big t-value thus means a sturdy inference. What is big? Big compared with the other variables in the equation. (You can do a lot of damage with this information, so don’t tell anyone else.)
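Both arguments for a critical value of one can be checked with a little arithmetic. The sketch below is illustrative and rests on assumptions: `prob_opposite_sign` uses a normal approximation, which gives roughly 15.9% and 2.3% where the text rounds to 16% and 2.5%, and `adjusted_r2_with_and_without` relies on the textbook adjusted R2 formula together with the least-squares identity linking a variable's t-value to the drop in R2 when that one variable is omitted. Both function names are hypothetical. The R2, sample size and variable count are those of Equation 1, and the t-values are made up.

```python
import math


def prob_opposite_sign(t_value):
    """Chance the true coefficient has the opposite sign, under a normal approximation."""
    return 0.5 * (1.0 - math.erf(abs(t_value) / math.sqrt(2.0)))


def adjusted_r2_with_and_without(r2_full, n_obs, n_vars, t_value):
    """Adjusted R2 of the full equation and of the equation with one variable dropped.

    Uses the identity t^2 = (R2_full - R2_dropped) / (1 - R2_full) * (n - k - 1),
    which holds for least squares when a single variable is omitted.
    """
    dof = n_obs - n_vars - 1
    adj_full = 1.0 - (1.0 - r2_full) * (n_obs - 1) / dof
    unexplained_dropped = (1.0 - r2_full) * (1.0 + t_value ** 2 / dof)
    adj_dropped = 1.0 - unexplained_dropped * (n_obs - 1) / (dof + 1)
    return adj_full, adj_dropped


for t in (1.0, 2.0):
    print(f"|t| = {t}: chance of opposite sign = {prob_opposite_sign(t):.3f}")

# R2, sample size and variable count taken from Equation 1; the t-values are illustrative.
for t in (0.5, 1.0, 2.0):
    adj_full, adj_dropped = adjusted_r2_with_and_without(0.145, 150, 4, t)
    print(f"|t| = {t}: adjusted R2 {adj_full:.4f} with the variable, "
          f"{adj_dropped:.4f} without it")
```

At |t| = 1 the two adjusted R2 values coincide. Below one, dropping the variable raises the adjusted R2, and above one it lowers it, which matches the rule stated in the text.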

**Summary Table**

Low R2, low t-value: Absolute Misery – the effect of your favorite variable isn’t detectable using these data, and your equation overall sucks.

Low R2, high t-value: Oh Joy. You have found a way to measure the effect of your favorite variable using these data. Maybe you are worried that your equation overall is not explaining much of the variability of the dependent variable. Don’t be upset. This is the normal state of affairs for variables defined in a way that makes them hard to predict, like growth rates or changes over time. Think about medicine as an example. Maybe they can find out that coffee drinking has a statistically significant and even important effect on the probability of heart disease, but still not be able to predict whether or not you will suffer from heart disease, regardless of your coffee-drinking habits.

High R2, low t-value: Forget that variable. Another variable is doing the work and giving you a high R2. The ones that are correlated with your favorite might be causing the problem. Maybe if you discard one of the other variables, then your favorite one will look better.

High R2, high t-value: Bliss – the best it can get. The other variables in your equation are not crowding out your favorite. But be alert – your enemies are trying to find just the right variable to include to knock yours out of the equation. And think again how you defined your dependent variable. What it means to have a high R2 depends on the variable. It is easier to predict your daughter’s height at age 16 than how much she will grow from birthday 16 to birthday 17. An R2 of 0.9 in an equation explaining quarterly GDP is no big deal. You can do that using the previous quarter’s GDP. An R2 of 0.5 for explaining the growth of GDP using only lagged variables is terrific.
