CHAPTER 24  Forecasting Models
24.4  Winter's Method: Exponential Smoothing with Seasonality

Thus, to forecast the value of the series during period t + k, we multiply our estimate of the period t + k base (L_t + kT_t) by our most recent estimate of month (t + k)'s seasonality factor (s_{t+k-12}).

Initialization of Winter's Method

To obtain good forecasts with Winter's method, we must obtain good initial estimates of the base, the trend, and all seasonal factors. Let

L_0 = estimate of base at beginning of month 1
T_0 = estimate of trend at beginning of month 1
s_{-11} = estimate of January seasonal factor at beginning of month 1     (16)
s_{-10} = estimate of February seasonal factor at beginning of month 1
  ...
s_0 = estimate of December seasonal factor at beginning of month 1

A variety of methods are available to estimate the parameters in (16). We choose a simple method that requires two years of data. Suppose that the last two years of sales (by month) were as follows:

Year -2: 4, 3, 10, 14, 25, 26, 38, 40, 28, 17, 16, 13
Year -1: 9, 6, 18, 27, 48, 50, 75, 77, 52, 33, 31, 24

Total sales during year -2 = 234
Total sales during year -1 = 450

We estimate T_0 by

T_0 = [(Average monthly sales during year -1) - (Average monthly sales during year -2)]/12 = (37.5 - 19.5)/12 = 1.5

To estimate L_0, we first determine the average monthly demand during year -1 (37.5). This estimates the base at the middle of year -1 (month 6.5 of year -1). To bring this estimate to the end of month 12 of year -1, we add (12 - 6.5)T_0 = 5.5T_0. Thus, our estimate of L_0 is 37.5 + 5.5(1.5) = 45.75.

To estimate the seasonality factor for a given month (say, January = s_{-11}), we take an estimate of January seasonality for year -2 and for year -1 and average them. In year -2, average monthly demand was 234/12 = 19.5; in January of year -2, 4 air conditioners were sold. Therefore,

Year -2 estimate of January seasonality = 4/19.5 = 0.205

Similarly,

Year -1 estimate of January seasonality = 9/37.5 = 0.240

Finally, we obtain s_{-11} = (0.205 + 0.240)/2 = 0.22.
In similar fashion, we obtain s_{-10} = 0.16, ..., s_{-6} = 1.33, ..., s_{-1} = 0.82, s_0 = 0.65. As a check, the initial seasonal factor estimates should average to 1.

Before showing how (12)-(14) are used, we demonstrate how to use (15) for forecasting. At the beginning of month 1, our forecast for month 1 air conditioner sales is

f_{0,1} = (L_0 + T_0)s_{-11} = (45.75 + 1.5)(0.22) = 10.4

At the beginning of month 1, our forecast for month 7 air conditioner sales is

f_{0,7} = (L_0 + 7T_0)s_{-5}

For alpha = 0.5, beta = 0.4, gamma = 0.6, applying Winter's method to the first 12 months of air conditioner sales data yields the results in Table 7. We illustrate the computations by computing L_1, T_1, and s_1. Month 1 sales were x_1 = 13, so

L_1 = 0.5(13/0.22) + 0.5(45.75 + 1.5) = 53.17

T_1 = 0.4(L_1 - L_0) + 0.6T_0 = 0.4(53.17 - 45.75) + 0.6(1.5) = 3.87

s_1 = 0.6(13/53.17) + 0.4(0.22) = 0.23

Thus, at the end of month 1, our forecast for (say) month 7 air conditioner sales is

f_{1,6} = (L_1 + 6T_1)s_{-5} = [53.17 + 6(3.87)](1.97) = 150.49

Our forecast for month 7 at the end of month 1 exceeds the forecast for month 7 made at the beginning of month 1, because month 1 sales were higher than predicted. For all 24 months of data, spreadsheet calculations show that MAD = 10.48.

[Table 7: Winter's method applied to the air conditioner sales data (alpha = 0.5, beta = 0.4, gamma = 0.6).]

[Figure 5: Actual and forecasted air conditioner sales.]

Since Winter's method uses three smoothing constants, it is quite a chore to find the combination of alpha, beta, and gamma values that yields the smallest MAD. The use of a spreadsheet to do Winter's method is discussed in Review Problem 3. As with the Holt method, some experimentation is needed to find good values of alpha, beta, and gamma.
REMARKS  1 A spreadsheet Data Table command can aid in finding good values of alpha, beta, and gamma.

2 Although the values of alpha and beta that minimize MAD should not exceed 0.5 (as in the Holt method), it is not uncommon for the best value of gamma to exceed 0.5. This is because for monthly data, each monthly seasonal factor is updated during only 1/12 of all periods. Since the seasonality factors are updated so infrequently, we may need to give more weight to each observation, so gamma >= 0.5 is not out of the question.

3 Figure 5 shows how well forecasts of air conditioner sales (for alpha = 0.5, beta = 0.4, and gamma = 0.6) compare to actual air conditioner sales. The agreement between predicted and actual sales is quite good except during months 15 and 17. During these months, our forecasts are much too high. Perhaps new salespeople were hired during these two months, causing sales to be less than anticipated.

Forecasting Accuracy

For any forecasting model in which forecast errors are normally distributed, we may use MAD to estimate s_e = standard deviation of our forecast errors. The relationship between MAD and s_e is given in formula (17):

s_e = 1.25 MAD     (17)

Assuming that errors are normally distributed, we know that approximately 68% of our predictions should be within s_e of the actual value, and approximately 95% of our predictions should be within 2s_e of the actual value. Thus, for our air conditioner sales predictions, we find that s_e = 1.25(10.48) = 13.10. So we would expect that for about 0.68(24) = 16 of 24 months, our predictions for sales would be off by at most 13.10 air conditioners, and for 0.95(24) = 23 of 24 months, our predictions would be off by at most 2(13.10) = 26.2 air conditioners. Actually, our predictions for air conditioner sales are accurate within 13.10 during 17 months and accurate within 26.2 during 22 months.
[Figure 5: Actual versus forecasted air conditioner sales by month.]

We note that in most situations where a forecast is required, knowing something about the probable accuracy of the forecast is almost as important as the forecast itself. Thus, this short subsection is very important!

Problems

Group A

1  Simple exponential smoothing (with alpha = 0.2) is being used to forecast monthly beer sales at Gordon's Liquor Store. After observing April's demand, the predicted demand for May is 4,000 cans of beer.
a  At the beginning of May, what is the prediction for July's beer sales?
b  Actual demand during May and June is as follows: May, 4,500 cans of beer; June, 3,500 cans of beer. After observing June's demand, what is the forecast for July's demand?
c  The demand during May and June averages out to 4,000 cans per month. This is the same as the forecast for monthly sales before we observed the May and June data. Yet after observing the May and June demands for beer, our forecast for July demand has decreased from what it was at the end of April. Why?

2  We are predicting quarterly sales of soda at Gordon's Liquor Store using Winter's method. We are given the following information:

Seasonality factors: fall = 0.8, winter = 0.7, spring = 1.2, summer = 1.3
Current base estimate = 400 cases per quarter
Current trend estimate = 40 cases per quarter
alpha = 0.2, beta = 0.3, gamma = 0.5

a  New sales of 650 cases during the summer quarter are observed. Use this information to update the estimates of base, trend, and seasonality.
b  After observing the summer demand, forecast demand for the fall quarter and the winter quarter.

3  We are using Winter's method and monthly data to forecast the GNP. (All numbers are in billions of dollars.) At the end of January 1992, L = 600 and T = 5. We are given the following seasonalities: January, 0.80; February, 0.85; December, 1.2. During February 1992, the GNP is at a level of 630.
At the end of February, what is the forecast for the December 1992 level of the GNP? Use alpha = beta = gamma = 0.5.

4  We are using the Holt method to predict monthly VCR sales at Highland Appliance. At the end of October 1992, L = 200 and T = 10. During November 1992, 230 VCRs are sold. At the end of November, MAD = 35, and we are 95% sure that VCR sales for December 1992 will be between ___ and ___. Use alpha = beta = 0.5.

5  We are using simple exponential smoothing to predict monthly electric shaver sales at Hook's Drug Store. At the end of October 1992, our forecast for December 1992 sales was 40. In November, 50 shavers were sold, and during December, 45 shavers were sold. Suppose alpha = 0.50. At the end of December 1992, what is our prediction for the total number of shavers that will be sold during March and April of 1993?

6  We are using simple exponential smoothing to predict monthly auto sales at Bloomington Ford. The company believes that sales do not exhibit trend or seasonality, so simple exponential smoothing has yielded satisfactory forecasts for the most part. Each March, however, Bloomington Ford has observed that sales tend to exceed the simple exponential smoothing forecast (A_t) by 200. Suppose that at the end of February 1993, A_t = 600. During March 1993, 900 cars are sold.
a  Using alpha = 0.3, determine (at the end of March 1993) a forecast for April 1993 car sales.
b  Assume that at the end of March, MAD = 60. We are 95% sure that April sales will be between ___ and ___.

7  The University Credit Union is open Monday through Saturday. Winter's method is being used (with alpha = beta = gamma = 0.5) to predict the number of customers entering the bank each day. After incorporating the arrivals of October 16, 1993, L = 200 customers, T = 1 customer, and the seasonalities are as follows: Monday, 0.90; Tuesday, 0.70; Wednesday, 0.80; Thursday, 1.1; Friday, 1.2; Saturday, 1.3. For example, this means that on a typical Monday, the number of customers entering the bank is 90% of the number entering on an average day.
On Tuesday, October 17, 1993, 182 customers enter the bank. At the close of business on October 17, 1993, make a prediction for the number of customers who will enter the bank on October 25, 1993.

8  The Holt method (exponential smoothing with trend and without seasonality) is being used to forecast weekly car sales at TOD Ford. Currently, the base is estimated to be 50 cars per week, and the trend is estimated to be 6 cars per week. During the current week, 30 cars are sold. After observing the current week's sales, forecast the number of cars to be sold during the week that begins three weeks after the conclusion of the current week. Use alpha = beta = 0.3.

9  Winter's method (with alpha = 0.2, beta = 0.1, and gamma = 0.5) is being used to forecast the number of customers served each day by Last National Bank. The bank is open Monday through Friday. At present, the following seasonalities have been estimated: Monday, 0.80; Tuesday, 0.90; Wednesday, 0.95; Thursday, 1.10; Friday, 1.25. A seasonality of 0.80 for Monday means that on a Monday, the number of customers served by the bank tends to be 80% of average. Currently, the base is estimated to be 20 customers, and the trend is estimated to equal 1 customer. After observing that on Monday, 30 customers are served by the bank, predict the number of customers to be served by the bank on Wednesday.

10  We have been assigned to forecast the number of aircraft engines ordered each month by Engine Company. At the end of February, the forecast is that 100 engines will be ordered during April. During March, 120 engines are ordered.
a  Using alpha = 0.3, determine (at the end of March) a forecast for the number of orders placed during April.
b  Answer the same question for May.
c  Suppose at the end of March, MAD = 16. At the end of March, we are 68% sure that April orders will be between ___ and ___.

11  Winter's method is being used to forecast quarterly U.S. retail sales (in billions of dollars). At the end of the first quarter of 1992, L = 300, T =
30, and the seasonal indexes are as follows: quarter 1, 0.90; quarter 2, 0.95; quarter 3, 0.95; quarter 4, 1.20. During the second quarter of 1992, retail sales are $360 billion. Assume alpha = 0.2, beta = 0.4, and gamma = 0.3.
a  At the end of the second quarter of 1992, develop a forecast for retail sales during the fourth quarter of 1992.
b  At the end of the second quarter of 1992, develop a forecast for retail sales during the second quarter of 1993.

Group B

12  Simple exponential smoothing with alpha = 0.3 is being used to predict sales of radios at Lowland Appliance. Predictions are made on a monthly basis. After observing August radio sales, the forecast for September is 100 radios.
a  During September, 120 radios are sold. After observing September sales, what is the prediction for October radio sales? For November radio sales?
b  It turns out that June sales were recorded as 10 radios. Actually, however, 100 radios were sold in June. After correcting for this error, what would be the prediction for October radio sales?

13  In our discussion of Winter's method, a monthly seasonality of (say) 0.80 for January means that during January, air conditioner sales are expected to be 80% of the sales during an average month. An alternative approach to modeling seasonality is to let the seasonality factor for each month represent how far above or below average air conditioner sales will be during the current month. For instance, if s_Jan = -50, then air conditioner sales during January are expected to be 50 less than air conditioner sales during an average month. If s_July = 90, then air conditioner sales during July are expected to be 90 more than air conditioner sales during an average month. Let

s_t = the seasonality for month t after month t demand is observed
L_t = estimate of the base after month t demand is observed
T_t = estimate of the trend after month t demand is observed

Then the Winter's method equations given in the text are modified to be as follows (* indicates multiplication):

L_t = alpha*[I] + (1 - alpha)*(L_{t-1} + T_{t-1})
T_t = beta*(L_t - L_{t-1}) + (1 - beta)*T_{t-1}
s_t = gamma*[II] + (1 - gamma)*s_{t-12}

a  What should [I] and [II] be?
b  Suppose that month 13 is a January, with L_12 = 30, T_12 = 3, s_1 = -8, and s_6 = 20. Let alpha = beta = gamma = 0.5. Suppose 12 air conditioners are sold during month 13. At the end of month 13, what is the prediction for air conditioner sales during month 18?

14  Winter's method assumes a multiplicative seasonality but an additive trend. For example, a trend of 5 means that the base will increase by 5 units per period. Suppose there is actually a multiplicative trend. Then (ignoring seasonality), if the current estimate of the base is 50 and the current estimate of the trend is 1.2, we would predict demand to increase by 20% per period. Ignoring seasonality, we would thus forecast the next period's demand to be 50(1.2) and forecast the demand two periods in the future to be 50(1.2)^2. If we want to use a multiplicative trend in Winter's method, we should use the following equations:

L_t = alpha*(x_t/s_{t-12}) + (1 - alpha)*[I]
T_t = beta*[II] + (1 - beta)*T_{t-1}
s_t = gamma*(x_t/L_t) + (1 - gamma)*s_{t-12}

a  Determine what [I] and [II] should be.
b  Suppose we are working with monthly data and month 12 is a December, month 13 a January, and so on. Also suppose that L_12 = 100, T_12 = 1.2, s_1 = 0.90, s_2 = 0.70, and s_3 = 0.95. Suppose x_13 = 200. At the end of month 13, what is the prediction for x_15? Assume alpha = beta = gamma = 0.5.

15  Holt's method assumes an additive trend. For example, a trend of 5 means that the base will increase by 5 units per period. Suppose there is actually a multiplicative trend. Thus, if the current estimate of the base is 50 and the current estimate of the trend is 1.2, we would predict demand to increase by 20% per period. So we would forecast the next period's demand to be 50(1.2) and forecast the demand two periods in the future to be 50(1.2)^2. If we want to use a multiplicative trend in Holt's method, we should use the following equations:

L_t = alpha*x_t + (1 - alpha)*[I]
T_t = beta*[II] + (1 - beta)*T_{t-1}

a  Determine what [I] and [II] should be.
b  Suppose we are working with monthly data and month 12 is a December, month 13 a January, and so on. Also suppose that L_12 = 100 and T_12 = 1.2.
Suppose x_13 = 200. At the end of month 13, what is the prediction for x_15? Assume alpha = beta = 0.5.

16  A version of simple exponential smoothing can be used to predict the outcome of sporting events. To illustrate, consider pro football. We first assume that all games are played on a neutral field. Before each day of play, we assume that each team has a rating. For example, if the Bears' rating is +10 and the Bengals' rating is +6, we would predict the Bears to beat the Bengals by 10 - 6 = 4 points. Suppose the Bears play the Bengals and win by 20 points. For this observation, we "underpredicted" the Bears' performance by 20 - 4 = 16 points. The best alpha for pro football is 0.10. After the game, we therefore increase the Bears' rating by 16(0.1) = 1.6 and decrease the Bengals' rating by 1.6 points. In a rematch, the Bears would be favored by (10 + 1.6) - (6 - 1.6) = 7.2 points.
a  How does this approach relate to the equation A_t = A_{t-1} + alpha*e_t?
b  Suppose the home field advantage in pro football is 3 points; that is, home teams tend to outscore visiting teams by an average of 3 points a game. How could the home field advantage be incorporated into this system?
c  How could we determine the best alpha for pro football?
d  How might we determine ratings for each team at the beginning of the season?
e  Suppose we tried to apply the above method to predict pro football (16-game schedule), college football (11-game schedule), college basketball (30-game schedule), and pro basketball (82-game schedule). Which sport would have the smallest optimal alpha? Which sport would have the largest optimal alpha?
f  Why would this approach probably yield poor forecasts for major league baseball?

24.5  Ad Hoc Forecasting

Suppose we want to determine how many tellers a bank must have working each day to provide adequate service. In order to use the queuing models of Chapter 22 to answer this question, we need to be able to predict the number of customers who will enter the bank each day.
The bank manager believes that the month of the year and the day of the week influence the number of customers entering the bank. (The bank is open Monday through Saturday, except for holidays.) Can we develop a simple forecasting model to help the bank predict the number of customers who will enter each day?

The number of customers entering the bank each day during the last year is given in Table 8. We have used 1 = Monday, 2 = Tuesday, ..., 6 = Saturday, and 7 = Sunday to denote the days of the week. A "Y" in the AFH column means that the day is the day after the bank was closed for a holiday. Let x_t = number of customers entering the bank on day t. We postulate that x_t = B x DW_t x M_t x e_t, where

B = base level of customer traffic corresponding to an average day
DW_t = day-of-the-week factor corresponding to the day of the week on which day t falls
M_t = monthly factor corresponding to the month during which day t occurs
e_t = random error term whose average value equals 1

To begin, we estimate B = average number of arrivals per day the bank is open = 438.33. We illustrate the estimation of the DW_t by

DW for Monday = (average number of arrivals on Mondays the bank is open)/B = 492.07/438.33 = 1.12

For a Saturday in February (say, February 8 of next year), we would obtain a forecast of B x (DW for Saturday) x (M for February) = 438.33(0.809)(1.025) = 363.47 customers.

For the data given in Table 8, our simple model yielded a MAD of 79.1. If this method were used to generate forecasts for the coming year, however, the MAD would probably exceed 79.1. This is because we have fit our parameters to past data; there is no guarantee that future data will "know" that they should follow the same pattern as past data. We have also neglected to consider whether or not an upward trend in the data is present (see Problem 3).
Suppose the bank manager observes that on the day after a holiday, bank traffic is much higher than the model predicts. The data in Table 9 indicate that this is indeed the case. How can we use this information to obtain more accurate customer forecasts for days after holidays? From Table 9, we find that the average value of Actual/Forecasted for days after a holiday is 1.15. Thus, for any day after a holiday, we obtain a new forecast simply by multiplying our previous forecast by 1.15.

[Table 9: Actual (rounded) and forecasted bank traffic on days after holidays; the Actual/Forecasted ratios average about 1.15.]

Problems

Group A

1  Suppose the bank is a college credit union and that on days when the college's professors get paid, bank traffic is much higher than usual. Assuming that college professors are paid on the first weekday of each month, how could we incorporate this fact into the forecasting procedure described in this section?

2  Suppose again that the bank is a college credit union. Now suppose that the college's staff is paid every other Friday and that bank traffic is much higher than usual on staff paydays. How could we incorporate this fact into the forecasting procedure described in this section?

3  Suppose that the number of customers entering the bank is growing at around 20% per year. How could we incorporate this fact into the forecasting procedure described in this section?

24.6  Simple Linear Regression

Often, we try to predict the value of one variable (called the dependent variable) from the value of another variable (the independent variable). Some examples follow:

Dependent Variable         Independent Variable
Sales of product           Price of product
Automobile sales           Interest rate
Total production cost      Units produced

If the dependent variable and the independent variable are related in a linear fashion, simple linear regression can be used to estimate this relationship.
In Section 24.7, we will discuss how to estimate nonlinear relationships.

To illustrate simple linear regression, let's recall the Giapetto problem (Example 1 of Chapter 3). To set up this problem, we needed to determine the cost of producing a soldier and the cost of producing a train. Let's suppose that we want to determine the cost of producing a train. To estimate this cost, we have observed, for ten weeks, the number of trains produced each week and the total cost of producing those trains. This information is given in Table 10.

TABLE 10  Weekly Cost Data for Producing Trains

Week i   Trains Produced x_i   Cost of Producing Trains y_i
1        10                    $  257.40
2        20                    $  601.60
3        30                    $  782.00
4        40                    $  765.40
5        45                    $  895.50
6        50                    $1,133.00
7        60                    $1,152.80
8        55                    $1,132.70
9        70                    $1,459.20
10       40                    $  970.10

The data from Table 10 are plotted in Figure 6. Observe that there appears to be a strong linear relationship between x_i (number of trains produced during week i) and y_i (cost of producing the trains made during week i). The line plotted in Figure 6 appears, in a sense to be made precise later, to come close to capturing the linear relationship between units produced and production cost. We will soon see how this line was chosen.

[Figure 6: Scatterplot of cost of producing trains versus number of trains produced.]

To begin, we model the linear relationship between x_i and y_i by the following equation:

y_i = B0 + B1*x_i + e_i

where e_i is an error term representing the fact that during a week in which x_i trains are produced, the production cost might not always equal B0 + B1*x_i. If e_i > 0, the cost of producing x_i trains during week i will exceed B0 + B1*x_i, whereas if e_i < 0, the cost of producing x_i trains during week i will be less than B0 + B1*x_i. However, we expect e_i to average out to 0, so the expected cost during a week in which x_i trains are produced is B0 + B1*x_i.

The true values of B0 and B1 are unknown. Suppose we estimate B0 by b0 and B1 by b1. Then our prediction for y_i (since the average value of e_i is 0) is yhat_i = b0 + b1*x_i. Suppose we have n data points of the form (x_1, y_1),
(x_2, y_2), ..., (x_n, y_n). How should we choose values of b0 and b1 that yield good estimates of B0 and B1? We select the values of b0 and b1 that make our predictions yhat_i = b0 + b1*x_i close to the actual data points (x_i, y_i). To formalize this idea, define

e_i = error or residual for data point i = (actual cost) - (predicted cost) = y_i - b0 - b1*x_i

We now choose b0 and b1 to minimize

F(b0, b1) = sum of e_i^2 = sum of (y_i - b0 - b1*x_i)^2

The values of b0 and b1 minimizing F(b0, b1) are called the least squares estimates of B0 and B1. As described in Example 19 of Chapter 12, we find the least squares estimates by setting

dF/db0 = dF/db1 = 0

The resulting values of b0 and b1 are given by

b1 = [sum of (x_i - xbar)(y_i - ybar)] / [sum of (x_i - xbar)^2]   and   b0 = ybar - b1*xbar     (18)

where xbar = average value of all the x_i's and ybar = average value of all the y_i's. We call yhat = b0 + b1*x the least squares regression line. Essentially, if the least squares line fits the points well (in a sense to be made more precise later), we will use b0 + b1*x as our prediction for y.

Usually, the least squares line is determined by computer; Lotus, Excel, Minitab, and many other popular packages will provide b0 and b1. For the sake of completeness, however, the computations needed to determine b0 and b1 for the data in Table 10 are given in Table 11, where we have used xbar = 42 and ybar = 914.97.

TABLE 11  Computations Needed to Fit the Least Squares Line to the Train Cost Data

x_i     y_i       x_i - xbar   y_i - ybar   (x_i - xbar)(y_i - ybar)   (x_i - xbar)^2
10        257.4   -32          -657.57            21,042.24                 1,024
20        601.6   -22          -313.37             6,894.14                   484
30        782.0   -12          -132.97             1,595.64                   144
40        765.4    -2          -149.57               299.14                     4
45        895.5     3           -19.47               -58.41                     9
50      1,133.0     8           218.03             1,744.24                    64
60      1,152.8    18           237.83             4,280.94                   324
55      1,132.7    13           217.73             2,830.49                   169
70      1,459.2    28           544.23            15,238.44                   784
40        970.1    -2            55.13              -110.26                     4

From Table 11 (which can easily be implemented on a spreadsheet), we find that the sum of (x_i - xbar)(y_i - ybar) = 53,756.6 and the sum of (x_i - xbar)^2 = 3,010. From (18), we now find that

b1 = 53,756.6/3,010 = 17.86   and   b0 = 914.97 - (17.86)(42) = 164.88

Our least squares line is yhat = 164.88 + 17.86x. Thus, we estimate that each extra train incurs a variable cost of b1 = $17.86.

Our predictions and errors for all ten weeks are given in Table 12. To illustrate the computations, consider the first point, (10, 257.4). The predicted cost is yhat_1 = 164.88 + 17.86(10) = 343.5, and the error is given by e_1 = 257.4 - 343.5 = -86.1.

Every least squares line has two properties:
1  It passes through the point (xbar, ybar). Thus, during a week in which Giapetto produced xbar = 42 trains, we would predict that these trains would cost ybar = $914.97 to produce.

2  The sum of the e_i equals 0. The least squares line "splits" the data points, in the sense that the sum of the vertical distances from the points above the least squares line to the least squares line equals the sum of the vertical distances from the points below the least squares line to the least squares line.

TABLE 12  Predictions and Errors for the Train Cost Data

x_i     y_i       yhat_i     e_i
10        257.4     343.5    -86.1
20        601.6     522.1     79.5
30        782.0     700.7     81.3
40        765.4     879.3   -113.9
45        895.5     968.5    -73.0
50      1,133.0   1,057.8     75.2
60      1,152.8   1,236.4    -83.6
55      1,132.7   1,147.1    -14.4
70      1,459.2   1,415.0     44.2
40        970.1     879.3     90.8

How Good a Fit?

How do we determine how well the least squares line fits our data points? To answer this question, we need to discuss three components of variation: sum of squares total (SST), sum of squares error (SSE), and sum of squares regression (SSR). Sum of squares total is given by

SST = sum of (y_i - ybar)^2

SST measures the total variation of the y_i about their mean ybar. Sum of squares error is given by

SSE = sum of e_i^2

If the least squares line passes through all the data points, SSE = 0; a small SSE indicates that the least squares line fits the data well. We define the sum of squares regression to be

SSR = sum of (yhat_i - ybar)^2

It can be shown that

SST = SSR + SSE     (19)

Note that SST is a function only of the values of y. For a good fit, SSE will be small, so (19) shows that SSR will be large for a good fit. More formally, we may define the coefficient of determination (R^2) for y by

R^2 = SSR/SST = percentage of variation in y explained by x

Equivalently, (19) allows us to write

1 - R^2 = SSE/SST = percentage of variation in y not explained by x

From computer output, we find that SST = 1,021,762 and SSE = 61,705. Then (19) yields SSR = SST - SSE = 960,057. Thus, we find that R^2 = 960,057/1,021,762 = 0.94. This means that the number of trains produced during a week explains 94% of the variation in the weekly cost of producing trains.
All other factors combined can explain at most 6% of the variation in weekly cost, so we can be quite sure that the linear relationship between x and y is a strong one.

A measure of the linear association between x and y is the sample linear correlation r_xy. A sample correlation near +1 indicates a strong positive linear relationship between x and y; a sample correlation near -1 indicates a strong negative linear relationship between x and y; and a sample correlation near 0 indicates a weak linear relationship between x and y. By the way, if b1 > 0, then r_xy = +sqrt(R^2), whereas if b1 < 0, the sample correlation between x and y is given by -sqrt(R^2). Thus, in our cost example, r_xy = sqrt(0.94) = 0.97, indicating a strong positive linear relationship between x and y.

Forecasting Accuracy

A measure of the accuracy of predictions derived from regression is given by the standard error of the estimate (s_e). If we let n = number of observations, s_e is given by

s_e = sqrt[SSE/(n - 2)]

For our example,

s_e = sqrt(61,705/8) = 87.80

It is usually true that approximately 68% of the values of y will be within s_e of the predicted value yhat, and 95% of the values of y will be within 2s_e of the predicted value yhat. In the current example, we expect that 68% of our cost estimates will be within $87.80 of the true cost, and 95% will be within $175.60. In actuality, for 80% of our data points, actual cost is within s_e of the predicted cost, and for 100% of our data points, actual cost is within 2s_e of the predicted cost.

Any observation for which y is not within 2s_e of yhat is called an outlier. Outliers represent unusual data points and should be carefully examined. Of course, if an outlier is the result of a data entry error, it should be corrected. If an outlier is in some way uncharacteristic of the remaining data points, it may be better to omit the outlier and re-estimate the least squares line. Since all the errors are smaller than 2s_e in absolute value, there are no outliers in our cost example.
t Tests in Regression

Using a t test, we can test the significance of a linear relationship. To test H0: B1 = 0 (no significant linear relationship between x and y) against H1: B1 != 0 (significant linear relationship between x and y) at a level of significance alpha, we compute the t-statistic

t = b1/StdErr(b1)

StdErr(b1) measures the uncertainty in our estimate of B1; it can usually be found on a computer printout. We reject H0 if |t| >= t(alpha/2, n - 2), obtained from Table 13. For our cost example, StdErr(b1) = 1.60 (found from a computer printout), so t = 17.86/1.60 = 11.16. Using alpha = 0.05, we find t(0.025, 8) = 2.306, so we reject H0 and again conclude that there is a strong linear relationship between x and y.

Assumptions Underlying the Simple Linear Regression Model

Statistical analysis of the simple linear regression model requires that the following assumptions hold.

Assumption 1  The variance of the error term should not depend on the value of the independent variable x. This assumption is called homoscedasticity. If the variance of the error term depends on x, then we say that heteroscedasticity is present. To see whether the homoscedasticity assumption is satisfied, we plot the errors on the y-axis and the values of x on the x-axis. Figure 7 illustrates a situation where the homoscedasticity assumption is satisfied: the figure indicates no tendency for the size of the errors to depend on x. In Figure 8, however, the magnitude of the errors tends to increase as x increases. This is an example of heteroscedasticity. Using ln y or y^(1/2) as the dependent variable will often eliminate heteroscedasticity.

Assumption 2  The errors are normally distributed. This assumption is not of vital importance, so we will not discuss it further.

Assumption 3  The errors should be independent. This assumption is often violated when data are collected (as in our example) over time. Independence of the errors implies that knowing the value of one error should tell us nothing about the value of the next (or any other) error.
The validity of this assumption can be checked by plotting the errors in time-series sequence.

[Table 13: Percentage points t(alpha, df) of the t distribution, for alpha = 0.10, 0.05, 0.025, 0.01, 0.005 and df = 1 to 30 plus selected larger values.]

[Figure 7: Residual plot exhibiting homoscedasticity. Figure 8: Residual plot exhibiting heteroscedasticity.]

In Figure 9, we find that the errors had signs forming long runs, such as + + + + + - - - - + + +. This sequence of errors exhibits the following pattern: a positive error (corresponding to underprediction of the actual value of y) is usually followed by another positive error, and a negative error (corresponding to overprediction of the actual value of y) is usually followed by another negative error. This pattern indicates that successive errors are not independent; it is referred to as positive autocorrelation. In other words, positive autocorrelation indicates that successive errors have a positive linear relationship and are not linearly independent. If the sequence of errors in time sequence resembles Figure 10, we have negative autocorrelation. Here, the sequence of errors is + - + - + - + - + -. This indicates that a positive error tends to be followed by a negative error, and vice versa.
This indicates that successive errors have a negative linear relationship and are not independent. In Figure 11, the sequence of errors shows no obvious pattern, and the independence assumption appears to be satisfied. Observe that the errors "average out" to 0, so we would expect about half of our errors to be positive and half to be negative. Thus, if there is no pattern in the errors, we would expect the errors to change sign about half the time. This observation enables us to formalize the preceding discussion as follows:

1 If the errors change sign very rarely (much less than half the time), they probably violate the independence assumption, and positive autocorrelation is probably present.

2 If the errors change sign very often (much more than half the time), they probably violate the independence assumption, and negative autocorrelation is probably present.

3 If the errors change sign about half the time, they probably satisfy the independence assumption, for the autocorrelation will be near 0.

If positive or negative autocorrelation is present, correcting for it will often result in much more accurate forecasts. See pages 215– of Pindyck and Rubinfeld (1989) for details.

FIGURE 9 Positive Autocorrelation

FIGURE 10 Negative Autocorrelation

Running Regressions on a Spreadsheet

Figure 12 (file Cost.wk) illustrates how to run a regression with Lotus 1-2-3. We have input the data from Table 10 in the cell range A2..B11 and then invoked the DATA REGRESSION (/DR) command. Our X range is A2..A11, and our Y range is B2..B11. Our output range is A13. Let's briefly explain what each number in the output means.

FIGURE 12

         A         B         C           D
    1    TRAINS    COST      YHAT        ERROR
    2    10         257.4     343.471     -86.071
    3    20         601.6     522.065      79.535
    4    30         782.0     700.658      81.342
    5    40         765.4     879.251    -113.851
    6    45         895.5     968.548     -73.048
    7    50        1133.0    1057.845      75.155
    8    60        1152.8    1236.438     -83.638
    9    55        1132.7    1147.141     -14.441
    10   70        1459.2    1415.031      44.169
    11   40         970.1     879.251      90.849
    12
    13   Regression Output:
    14   Constant                        164.8779
    15   Std Err of Y Est                 87.824643
    16   R Squared                         0.939609
    17   No. of Observations              10
    18   Degrees of Freedom                8
    19
    20   X Coefficient(s)    17.859336
    21   Std Err of Coef.     1.6007855

CONSTANT This is β̂0 = 164.8779.

STD ERR OF Y EST This is s_e = 87.82.

R SQUARED This is r² = 0.939609.

NO. OF OBSERVATIONS This is the number of data points (10).

DEGREES OF FREEDOM This is the degrees of freedom (n − 2 = 8) used for the t-test of H0: β1 = 0 against H1: β1 ≠ 0.

X COEFFICIENT(S) This is β̂1 = 17.859336.

STD ERR OF COEF. This is StdErr(β̂1) = 1.6007855. The X Coefficient divided by the Std Err of Coef. yields the t-statistic for testing H0: β1 = 0 against H1: β1 ≠ 0.

In cell C2 we obtained ŷ by inputting the formula +D$14+A2*C$20. In cell D2 we obtained the error by inputting the formula +B2−C2. Copying from the range C2..D2 to C2..D11 creates predictions and errors for all observations.

Obtaining a Scatterplot with Lotus 1-2-3

To obtain a scatterplot with Lotus 1-2-3, let the range where your independent variable is be the X range, and let the range where your dependent variable is be the A range. Then invoke the command sequence GRAPH, TYPE, X-Y, OPTIONS, FORMAT, GRAPH, SYMBOLS.

Fitting Nonlinear Relationships

Often, a plot of points of the form (x_i, y_i) indicates that y is not a linear function of x. In such cases, however, the plot may indicate that there is a nonlinear relationship between x and y. For example, if the plot of the (x_i, y_i) looks like any of parts (a)-(i) of Figure 13, a nonlinear relationship between x and y is indicated. The following procedure may be used to estimate a nonlinear relationship:

Step 1 Plot the points and find which part of Figure 13 best fits the data. For illustrative purposes, suppose the data look like part (c).
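The Figure 12 printout can be reproduced by hand with the standard least-squares formulas. The sketch below uses the ten (x, cost) pairs read off the spreadsheet; the results should match the Lotus output up to rounding:

```python
# Reproduce the Lotus 1-2-3 regression output for the cost data of Figure 12
# using the usual least-squares formulas (data read off the spreadsheet).
x = [10, 20, 30, 40, 45, 50, 60, 55, 70, 40]
y = [257.4, 601.6, 782.0, 765.4, 895.5, 1133.0, 1152.8, 1132.7, 1459.2, 970.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                # X Coefficient
b0 = ybar - b1 * xbar         # Constant
yhat = [b0 + b1 * xi for xi in x]
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
sst = sum((yi - ybar) ** 2 for yi in y)

se = (sse / (n - 2)) ** 0.5   # Std Err of Y Est
r2 = 1 - sse / sst            # R Squared
se_b1 = se / sxx ** 0.5       # Std Err of Coef.

print(round(b0, 2), round(b1, 2), round(r2, 4), round(se, 2), round(se_b1, 4))
# 164.88 17.86 0.9396 87.82 1.6008
```

Dividing the slope by its standard error (17.86/1.60 = 11.16) recovers the t-statistic used earlier in the chapter.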
Step 2 The second column of Table 14 gives the functional relationship between x and y. For part (c), this would be y = β0 exp(β1 x).

Step 3 Transform each data point according to the rules in the third column of Table 14. Thus, if part (c) of the figure is relevant, we transform each value of y into ln y and leave each value of x unchanged. Given the relationship in the second column of Table 14, the transformed data should, if plotted, indicate a straight-line relationship. For part (c), for example, if y = β0 exp(β1 x), then taking natural logarithms of both sides yields ln y = ln β0 + β1 x, so there is indeed a linear relationship between x and ln y.

Step 4 Estimate the least-squares regression line for the transformed data. If β̂0 is the intercept of the least-squares line (for the transformed data), β̂1 is the slope of the least-squares line (for the transformed data), and s_e is the standard error of the regression estimate, then we read the estimated relationship from the final column of Table 14. Thus, if part (c) were relevant, we would estimate that ŷ = exp(β̂0 + β̂1 x + s_e²/2).

FIGURE 13 Graphs of Nonlinear Functions (parts (a)-(i) include y = β0 x^β1 for β1 > 0 and for x > 0, β1 < 0, among others)
