Fundamental Concepts in Data Analysis
The Great Chase of the All-Time Single Season RBI Baseball Record of 1930
Summary
Detroit Tigers superstar Miguel Cabrera is believed to be on track this season to break Hack Wilson’s single-season RBI (Runs Batted In) record of 191, established in 1930. I have analyzed Cabrera’s batting stats for the 2011, 2012, and 2013 seasons using both a linear and a nonlinear model to fit the AB (At Bats)-RBI data. The discussion of baseball stats here serves as a useful proxy for understanding many fundamental concepts of data analysis that, sadly, are NOT used in the so-called “soft” sciences. For example, the ratio y/x (where x and y are our empirical observations on any problem of interest) is often taken to be the rate of change (examples: the unemployment rate and various fatality rates). However, as we see here in the discussion of baseball statistics, the ratio y/x is not the same as the “rate” of change. The true measure of the rate of change is the ratio ∆y/∆x, as we learn in our elementary calculus courses, and the ratio y/x is rarely equal to the ratio ∆y/∆x. The study of baseball statistics helps to illustrate these basic concepts; see also the companion article, http://www.scribd.com/doc/143727444/Trust-Me-TheFinancial-World-will-Change-Forever-if-Wall-Street-Starts-AnalyzingFinancial-Data-like-we-do-Baseball-Stats-Miguel-Cabrera

************************************************************


The following comment was posted ~ 5:50 AM on May 28, 2013

Vj Laxmanan: OK, here's good news for those who want to see the Wilson record broken. I took a second, harder look and used a nonlinear model. In the linear model, we take the data for April and May to make the end-of-season projection. This is called Method I in my earlier analysis (already uploaded). If I apply the same method to the full 2012 stats, we see a clear deviation from the linear model. Cabrera was operating on a nonlinear power curve in 2012. Statistical analysis gives a very good fit to the year-end data using the nonlinear model for 2012. We can do the same now with the 2013 data. Even so, the AB needed is too high and may not be achieved. I will update the earlier post by adding the nonlinear analysis soon. The linear model may be found here, see link http://www.scribd.com/doc/144083838/Is-Miguel-Cabrera-on-Pace-to-Break-Hack-Wilson-sSingle-Season-RBI-Record-No-Way-Read-On-Now

The following comment was posted at ~ 7:26 PM on May 27, 2013

http://www.mlive.com/tigers/index.ssf/2013/05/what_theyre_saying_miguel_cabr.html#comments
Vj Laxmanan: I did a statistical analysis of AB-RBI for Cabrera and so this is not just my opinion. It is based on mathematical analysis. Cabrera can beat his own record. He will need about 490 to 500 AB to cross 140 RBI, assuming the present pace continues. But there is just NO WAY, it is just IMPOSSIBLE, for him to beat the all-time record and do better than 190 or 191 RBIs. He just will not have the AB needed to beat the record. Whoever made the prediction that Miguel is on pace had better check the math. We just cannot use ratios, such as the current ratio RBI/AB = 57/199 = 0.286, to make such predictions. One must use the rate of change of RBI with increasing AB. That will give the more accurate prediction. I ran the statistics using just the 2013 data (two ways of doing it: just the end of April and May data, and 16 points spread out through April and May) and also considering the full 2011 and 2012 seasons along with the 2013 season to date. There is just no way he could beat the all-time RBI record. At least not in this season. I have recently posted a more detailed article on Cabrera’s batting average, see my Instablog here http://seekingalpha.com/instablog/958073vlaxmanan/1894301-trust-me-the-financial-world-will-change-forever-ifwall-street-starts-analyzing-financial-data-like-we-do-baseball-statisticsmiguel-cabrera I will follow up later with my analysis of the RBI for all to review. Of course, you can rip me….and the predictions can be tested, for that matter even by the end of June 2013, or July 2013. We don’t have to wait for the season to end. Cheers! (See update with the nonlinear model presented after Method III.)

************************************************************
Miguel Cabrera’s batting stats caught my attention recently as I was trying to explain the meaning of a “work function” (an idea first conceived by Einstein in 1905) in the context of my discussion of the US traffic fatalities (which went up in 2012 after six straight years of decline, hence the discussion; see the references cited). The “work function” is the term used by Einstein to describe the nonzero intercept c in the linear law y = hx + c, describing the empirical observations on photoelectricity; see references cited. Likewise, there is a work function in baseball, i.e., a nonzero c in the At Bats-Hits relation, the At Bats-RBI relation, and so on, and we find a work function in many other problems of interest to us. Instead of using my earlier analysis of the legendary baseball player Babe Ruth’s batting stats (to explain the work function), I decided to use Cabrera’s recent four-game stretch to discuss the work function; see Refs. [1, 2]. Cabrera was making the MLB sports headlines. During this stretch, Cabrera had six home runs in four consecutive games with a home run in each game, and an incredible batting average of 0.600 over this stretch. I also came across another intriguing stat that was being discussed: that he is on pace to break the single-season RBI record set by Hack Wilson in 1930.

The results of my analysis of Cabrera’s batting average can be found in the references cited, and I have also called attention to it in my Instablog at Seeking Alpha; see the link in the comments that I posted. I have now followed up my earlier analysis (i.e., as described in Ref. [2]) by checking the AB-RBI stats as well to see if Cabrera can break Wilson’s record. The brief answer is: NO WAY. This is not my personal opinion. It is based on sound mathematical analysis. So, here it is. (Since the first posting of this article, I have added the nonlinear analysis, which may be found at the end, after Method III. I have also analyzed Hack Wilson’s 1930 AB-RBI data, see Appendix 1, and am willing to change my mind from NO WAY to YES, Can.)

There are at least three different approaches (I have now added a fourth one, a nonlinear model) to making the prediction of interest to us. I will now discuss each one of them. The relevant 2013 season data and the other data that I have used are presented in the form of tables for ready review and can also be found at the ESPN MLB website.

As discussed in Ref. [2], in the discussion of Cabrera’s batting stats, the ratio y/x is not the same as the ratio ∆y/∆x. When we talk about the incredible batting average of 0.600 during the four-game stretch from May 19 to May 23, we are using the ratio ∆y/∆x, not the ratio y/x, where ∆y is the additional Hits made in the additional At Bats ∆x. Here x and y are the cumulative values of these two quantities; see Table 1.

Readers, especially sophisticated baseball fans, should note that regardless of the language sometimes used in discussing the RBI, in the context of “Cabrera breaking” Wilson’s record, I recognize the criticism of this statistic as NOT being an individual statistic, reflecting the skill of an individual player. Some critics think of it rather as a team statistic; see Refs. [19, 20] and also remarks in Appendix 1.


Table 1: Miguel Cabrera’s AB-RBI batting stats for the 2013 season

2013 season only (monthly totals)
Month    AB    RBI
April    102    28
May       97    29

Game-by-game data (Cum AB and Cum RBI are season-to-date totals)
        Month of April 2013                Month of May 2013
Game    AB   RBI   Cum AB   Cum RBI        AB   RBI   Cum AB   Cum RBI
  1      5    1       5        1            3    1     105       29
  2      4    2       9        3            5    1     110       30
  3      4    1      13        4            4    0     114       30
  4      2    1      15        5            4    6     118       36
  5      4    1      19        6            4    0     122       36
  6      4    0      23        6            4    1     126       37
  7      5    4      28       10            5    0     131       37
  8      5    0      33       10            5    3     136       40
  9      4    1      37       11            5    0     141       40
 10      5    0      42       11            4    0     145       40
 11      4    0      46       11            4    0     149       40
 12      4    2      50       13            3    1     152       41
 13      5    4      55       17            5    0     157       41
 14      6    0      61       17            4    0     161       41
 15      4    0      65       17            4    1     165       42
 16      4    1      69       18            4    0     169       42
 17      3    0      72       18            4    5     173       47
 18      4    0      76       18            4    2     177       49
 19      3    1      79       19            4    3     181       52
 20      4    1      83       20            3    3     184       55
 21      4    1      87       21            4    2     188       57
 22      5    2      92       23            4    0     192       57
 23      4    3      96       26            3    0     195       57
 24      2    0      98       26            4    0     199       57
 25      4    2     102       28

Data sources: Miguel Cabrera Game-by-Game Stats, ESPN MLB, http://espn.go.com/mlb/player/gamelog/_/id/5544/miguel-cabrera

The cumulative At Bats going into the game on May 19, when the four-game stretch began, was x = 169 and the cumulative Hits was y = 63. At the end of the four-game stretch these had increased to x = 184 and y = 72, giving ∆x = 15 and ∆y = 9 and the four-game batting average BA = ∆y/∆x = 9/15 = 0.600. If this had continued, with 10 more AB, Cabrera would have made an additional 6 hits. However, as we know, that pace could NOT be maintained in the remaining three games of May 2013. Nonetheless, the average rate of change over a significant number of games will always yield much better predictions than using the y/x ratio, which was 0.373 on May 18, increased to 0.391 on May 23, and dropped to 0.385 on May 26. The reason for this approach has been discussed in detail in Refs. [3-8].
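The distinction between the cumulative ratio y/x and the incremental rate ∆y/∆x can be checked with a few lines of Python. This is only an illustrative sketch using the cumulative At Bats and Hits quoted in the paragraph above; the function names are hypothetical and not part of the original analysis.

```python
# Cumulative (At Bats, Hits) quoted in the text for the four-game stretch.
ab_before, hits_before = 169, 63   # going into the stretch
ab_after, hits_after = 184, 72     # at the end of the stretch


def ratio(y, x):
    """Cumulative ratio y/x (the season batting average)."""
    return y / x


def rate_of_change(x1, y1, x2, y2):
    """Incremental rate Δy/Δx between two cumulative observations."""
    return (y2 - y1) / (x2 - x1)


season_before = ratio(hits_before, ab_before)
season_after = ratio(hits_after, ab_after)
stretch_rate = rate_of_change(ab_before, hits_before, ab_after, hits_after)

print(f"y/x before the stretch: {season_before:.3f}")
print(f"y/x after the stretch:  {season_after:.3f}")
print(f"Δy/Δx over the stretch: {stretch_rate:.3f}")
```

The cumulative ratio moves only slightly, while the incremental rate captures what actually happened during the stretch.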

Method I: Month end April and May 2013 data
The easiest way to make a prediction is to consider the data for the current season, the month-end April and May data. This is the lazy way. We have two (x, y) pairs, where x is the At Bats and y now is the RBI. These two points can always be joined by a straight line, see Figure 1. The mathematical equation of this line is y = hx + c, where the slope h and intercept c are given by:

Slope h = (y2 – y1)/(x2 – x1) = ∆y/∆x ……..(1)
Intercept c = (y1 – hx1) = (y2 – hx2) ……..(2)

We first determine the slope h using equation 1 and then the intercept c using either one of the two equalities in equation 2. Both will yield the same answer. The equation of the line, labeled A in Figure 1, is y = 0.299x – 2.495. Now, we can substitute the value of RBI (y) desired to find the AB (x) needed. Or,

AB (x) = 3.34y + 8.345 to predict AB for desired y ……..(3)

Because of the nonzero intercept c, the ratio y/x = RBI/AB will keep on increasing as the AB increases. This is the reason why using the ratio y/x, or some “averaged” value of this ratio, will lead to erroneous predictions. Using the slope h, or the rate of change of RBI with increasing AB, will yield a more accurate prediction.
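Method I can be sketched in Python as follows. The two points are the month-end cumulative values from Table 1; the helper names (`line_through`, `rbi_at`, `ab_needed`) are hypothetical. Results may differ slightly from the figures quoted in the text because of rounding of the coefficients.

```python
def line_through(p1, p2):
    """Return slope h and intercept c of the line y = h*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    h = (y2 - y1) / (x2 - x1)   # equation (1): h = Δy/Δx
    c = y1 - h * x1             # equation (2): c = y1 - h*x1
    return h, c


# Month-end cumulative (AB, RBI) for April and May 2013, from Table 1.
april_end = (102, 28)
may_end = (199, 57)
h, c = line_through(april_end, may_end)


def rbi_at(ab):
    """Projected cumulative RBI at a given cumulative AB."""
    return h * ab + c


def ab_needed(rbi):
    """Invert the line: cumulative AB needed to reach a given RBI total."""
    return (rbi - c) / h


print(f"y = {h:.3f}x + ({c:.3f})")
print(f"AB needed for RBI = 141: {ab_needed(141):.0f}")

# Because c is nonzero, the ratio y/x = h + c/x changes with x,
# while the slope h stays fixed.
for ab in (100, 200, 400, 600):
    print(ab, round(rbi_at(ab) / ab, 3))
```

The loop at the end shows the ratio y/x creeping upward toward h as AB grows, which is why the ratio by itself is a poor predictor.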

[Figure 1 plot: Number of RBI, y, versus Number of At Bats, x, showing the line y = hx + c = 0.299x – 2.495 (April & May 2013).]
Figure 1: The AB-RBI diagram for the 2013 season, at its simplest, with just the month end data for April and May (two diamonds). The two solid blue dots are the predicted values of the At Bats for RBI = 141 and RBI = 192.

If we want to predict the future location of a car, we use its instantaneous speed, or velocity, v = ∆x/∆t, not the average speed x/t, where x is the total distance traveled since the journey began and t is the total time elapsed for the journey. The speed v is the ratio of the “additional” distance ∆x traveled in the additional time ∆t. If the car is traveling at 60 mph, it will be found 1 mile farther along after one more minute. The speed v, not your average speed x/t, is used by the traffic cop to determine if you should get a speeding ticket. Likewise, we must use the slope h = ∆y/∆x, not the ratio y/x, to make the predictions for the likely breaking of the single-season Wilson record or Cabrera’s own record.


To better his own record, and cross RBI = 141, Cabrera will need to get to AB = 480, which is well within the realm of possibility if he continues to remain healthy and there are no other unforeseen conditions. However, to cross RBI = 191 and break Wilson’s record, Cabrera will need AB = 652, which is clearly impossible. The AB for the 2011 and 2012 seasons were 572 and 622. The requirement of AB = 652 is thus difficult to foresee. Cabrera’s pace would have to increase significantly to break Wilson’s record within 600 to 620 AB, in line with the last two seasons. So, right now, the prediction is NO, Cabrera cannot break Wilson’s record. It is just not possible. Like a golfer running out of holes while making a late charge up the leaderboard, Cabrera will run out of ABs before he can break Wilson’s record. Of course, this rather pessimistic prediction can be revisited and checked at the end of June. With an additional AB of 110, which would agree with the June 2011 and June 2012 AB values, Cabrera would be at RBI = 90 if the current pace continues into June. If Cabrera exceeds the prediction of RBI = 90 and reaches, say, RBI = 100 at the end of June, then he would definitely be in contention to break the Wilson record. Now, let us see what other methods tell us.

Method II: Detailed April and May 2013 data
As the number of games played increases, the At Bats (x) increase and the RBI also increases. A small selection of the cumulative tally from Table 1 is considered in Table 2. The x-y graph, see Figure 2, reveals a remarkably linear relationship between AB and RBI. Notice that the RBI remains constant and does not increase for several games. In such cases, the value of AB chosen for the analysis is always the highest value for the RBI considered. The mathematical equation relating AB and RBI can be determined using the method of least squares (linear regression analysis). This yields

RBI (y) = 0.2887x – 0.4433 with r² = 0.9906

Based on this linear regression equation, for AB = 490, RBI = 141 and for AB = 667, RBI = 192.

Once again, we see that the AB needed to cross Cabrera’s own record is around 490, which is eminently feasible. However, the AB needed to break the Wilson record is too high and is not attainable in a single season. Hence, even if Cabrera were to continue the current pace, it is unlikely that he can break the single season RBI record because of a lack of sufficient ABs.
[Figure 2 plot: Number of RBI, y, versus Number of At Bats, x, with the best-fit line y = hx + c = 0.289x – 0.443, r² = 0.991 (April & May 2013).]
Figure 2: A selection of the AB-RBI data for Cabrera from April and May 2013.

Table 2: Small selection of AB-RBI data from April and May 2013
Month    Game No   Cumulative AB, x   Cumulative RBI, y
April       1             5                   1
April       4            15                   5
April       8            33                  10
April      11            46                  11
April      12            50                  13
April      18            76                  18
April      20            83                  20
April      23            96                  26
April      25           102                  28
May         3           114                  30
May         5           122                  36
May        11           149                  40
May        17           173                  47
May        19           181                  52
May        20           184                  55
May        24           199                  57
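The least-squares fit of Method II can be sketched in plain Python using the sixteen (AB, RBI) points of Table 2. This is a minimal illustration, with a hypothetical helper `least_squares`; depending on exactly which points are selected, the coefficients obtained may differ somewhat from those quoted in the text.

```python
# Cumulative (AB, RBI) pairs from Table 2 (April and May 2013).
table2 = [
    (5, 1), (15, 5), (33, 10), (46, 11), (50, 13), (76, 18),
    (83, 20), (96, 26), (102, 28), (114, 30), (122, 36),
    (149, 40), (173, 47), (181, 52), (184, 55), (199, 57),
]


def least_squares(points):
    """Return slope h, intercept c, and r^2 for the best-fit line y = h*x + c."""
    n = len(points)
    xm = sum(x for x, _ in points) / n
    ym = sum(y for _, y in points) / n
    sxy = sum((x - xm) * (y - ym) for x, y in points)
    sxx = sum((x - xm) ** 2 for x, _ in points)
    h = sxy / sxx
    c = ym - h * xm                      # the line passes through (xm, ym)
    ss_res = sum((y - (h * x + c)) ** 2 for x, y in points)
    ss_tot = sum((y - ym) ** 2 for _, y in points)
    return h, c, 1 - ss_res / ss_tot


h, c, r2 = least_squares(table2)
print(f"h = {h:.4f}, c = {c:.4f}, r^2 = {r2:.4f}")

# AB needed for Cabrera's own record (141) and the Wilson record (192).
for target in (141, 192):
    print(target, round((target - c) / h))
```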

Method III: Merging Data Sets for April 2011-May 2013
Finally, we consider the overall performance of Cabrera in the past two seasons along with the current season to see the effects of any “averaging” of the performance. Recall that the 2011 season is Cabrera’s best season (highest BA) to date. The x-y graph, see Figure 3, again reveals a nice upward trend. All of the data points can be seen to fall below the straight line joining the (x, y) pairs for April 2011 and May 2013, the two extreme points in the data set. However, notice that the data for the immediate past, from September 2012 to May 2013, reveal a higher rate of change of RBI with increasing AB. The slope of this (dashed) line, with the equation y = 0.281x – 1.97, is therefore used to predict the AB needed to achieve the desired RBI values.

October 2012: AB = 1192 and RBI = 244. This is the baseline value.
April 2013: AB = 1285 and RBI = 273.
May 2013: AB = 1387 and RBI = 301.

• For AB = 1675, RBI = 385
• Additional 141 RBI in 2013 season with AB = 483
• For AB = 1850, RBI = 436
• Additional 192 RBI in 2013 season with AB = 658

To summarize, all three methods considered lead to the same conclusions. Cabrera can best his own RBI record before reaching AB = 500 in this season, assuming current performance continues. However, it is impossible to beat the single-season Wilson record because he will not be able to achieve the needed AB values. If higher AB values are attained, he might have a shot at it; lacking the AB, the Wilson record seems secure.

Table 3: Merged monthly data for the 2011, 2012, and 2013 seasons
Month-Year    AB    RBI   Cum AB   Cum RBI
Mar-11          2     1       2        1
Apr-11         94    18      96       19
May-11         91    18     187       37
Jun-11         93    19     280       56
Jul-11         96    12     376       68
Aug-11        105    16     481       84
Sep-11         91    21     572      105
Apr-12         84    20     656      125
May-12        124    22     780      147
Jun-12        106    20     886      167
Jul-12         96    23     982      190
Aug-12         96    24    1078      214
Sep-12        104    27    1182      241
Oct-12         10     3    1192      244
Apr-13         93    29    1285      273
May-13        102    28    1387      301
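The bookkeeping behind Method III can be sketched as follows: build the cumulative columns of Table 3 from the monthly values, join the September 2012 and May 2013 cumulative points with a straight line, and extrapolate from the October 2012 baseline. This is only an illustration with hypothetical names; the slope computed directly from the tabulated endpoints may differ from the rounded coefficients quoted in the text.

```python
from itertools import accumulate

months = ["Mar-11", "Apr-11", "May-11", "Jun-11", "Jul-11", "Aug-11", "Sep-11",
          "Apr-12", "May-12", "Jun-12", "Jul-12", "Aug-12", "Sep-12", "Oct-12",
          "Apr-13", "May-13"]
ab = [2, 94, 91, 93, 96, 105, 91, 84, 124, 106, 96, 96, 104, 10, 93, 102]
rbi = [1, 18, 18, 19, 12, 16, 21, 20, 22, 20, 23, 24, 27, 3, 29, 28]

# Running totals reproduce the "Cum AB" and "Cum RBI" columns of Table 3.
cum_ab = list(accumulate(ab))
cum_rbi = list(accumulate(rbi))

# Straight line through the Sep-12 and May-13 cumulative points.
i, j = months.index("Sep-12"), months.index("May-13")
h = (cum_rbi[j] - cum_rbi[i]) / (cum_ab[j] - cum_ab[i])
c = cum_rbi[j] - h * cum_ab[j]

base = months.index("Oct-12")   # baseline: end of the 2012 season


def extra_ab_for(extra_rbi):
    """Additional AB beyond the baseline needed to add `extra_rbi` RBI."""
    target_rbi = cum_rbi[base] + extra_rbi
    return (target_rbi - c) / h - cum_ab[base]


for goal in (141, 192):
    print(goal, round(extra_ab_for(goal)))
```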

Finally, it is also of interest to note that the slopes h predicted by all three methods are comparable. The intercept c is negative in all cases, and the differences in c are not large enough to change the predictions significantly.

Method I: y = 0.299x – 2.495
Method II: y = 0.289x – 0.443
Method III: y = 0.281x – 1.97

[Figure 3 plot: Number of RBI, y, versus Number of At Bats, x, with the dashed line y = hx + c = 0.281x – 1.97 joining Sep 2012 to May 2013.]
Figure 3: Merged 2011, 2012, and 2013 data used for analysis. The dashed line with the steeper slope joins the September 2012 and May 2013 data. This is used for extrapolation to predict the AB required to best Cabrera’s own record and the all-time single-season Wilson record. In this case, linear regression analysis is NOT very helpful and will actually yield a more conservative and lower slope. We are looking, instead, for the best performance to beat the Wilson record. This is revealed by the slope of the dashed line joining the data, as indicated.

Nonlinear Model for Cabrera’s 2012 Performance
If we apply Method I described here to the 2012 batting stats, we see clear evidence of nonlinearity. This is illustrated by the calculations presented in Figure 4 (see dashed straight line), where I have applied Method I to the 2012 data, with the benefit of already having all the scores for the rest of the season. Now it is clear that Cabrera was operating on a nonlinear curve and that the RBI was increasing at an accelerating pace as the 2012 season progressed.

The simplest nonlinear model that can be applied to describe such an accelerating pace is the power law model, which can be written as

y = Ax^n + B ……..(4)

If the exponent n = 1, we recover the linear law as before. For n < 1, the graph is a rising curve but the slope decreases as x, the number of At Bats, increases. In other words, the pace is decelerating. For n > 1, the graph is a rising curve but shows an acceleration in the pace. I have added the nonzero B, but for the moment let us take B = 0, i.e., the curve is assumed to pass through the origin. The constants “A” and “n” in this power law can be determined by first preparing a double logarithmic plot of log y versus log x, since after taking logarithms, equation 4 becomes log y = log A + n log x; see the plots provided in Appendix 1, where I have discussed Hack Wilson’s 1930 season. Hence, if the power law applies, the graph of log y versus log x is a straight line with the slope equal to the exponent “n”. The intercept of the graph equals log A and we can determine A from A = exp (log A). I will skip the presentation of the log-log plot and go straight to the power law. To get a good fit, I had to delete the first data point from the regression analysis. The values determined are A = 0.11 and n = 1.11.
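The log-log procedure just described can be sketched in a few lines of Python. Since the month-by-month data for Hack Wilson’s 1930 season are fully tabulated in Appendix 1, those cumulative (AB, RBI) values are used here as the worked example rather than the 2012 data; the helper name `fit_power_law` is hypothetical, and B = 0 is assumed as in the text.

```python
import math

# Cumulative (AB, RBI) at the end of each month of Wilson's 1930 season,
# obtained by summing the monthly values listed in Appendix 1.
wilson = [(57, 11), (150, 44), (258, 73), (370, 104), (483, 157), (585, 191)]


def fit_power_law(points):
    """Fit y = A * x**n by least squares on ln y = ln A + n * ln x."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    count = len(points)
    mx = sum(lx) / count
    my = sum(ly) / count
    n = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
         / sum((a - mx) ** 2 for a in lx))
    log_a = my - n * mx          # intercept of the log-log line
    return math.exp(log_a), n


A, n = fit_power_law(wilson)
print(f"A = {A:.4f}, n = {n:.3f}, ln A = {math.log(A):.4f}")
```

An exponent n greater than 1 from such a fit indicates an accelerating pace; n less than 1 indicates a decelerating one.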

[Figure 4 plot: Number of RBI, y, versus Number of At Bats, x, showing the needed (x, y) point, the power-law curve y = Ax^n = 0.11x^1.11, and the dashed line y = hx + c = 0.177x + 5.1 joining April and May 2012.]
Figure 4: Nonlinear analysis with the 2012 AB-RBI stats for Miguel Cabrera. We use Method I to project the season-end batting stats with the benefit of knowing the results for the rest of the season. The dashed straight line with the equation y = hx + c = 0.1774x + 5.097 describes this early season performance. Cabrera was clearly operating on a nonlinear curve for the rest of the season. The mathematical law is the power law, y = Ax^n = 0.11x^1.11. The constants A and n were determined using linear regression analysis (a log y versus log x plot, as described in the text). The nonlinear law, with an acceleration in the pace of RBI, is clearly a good fit to the data. However, even with this acceleration, Cabrera would need AB = 833 for RBI = 192 to break the Wilson record. This is clearly impossible. The same nonlinear model is also applied to the 2011 batting stats, see Figure 5.

[Figure 5 plot: Number of RBI, y, versus Number of At Bats, x, showing the line y = hx + c = 0.198x + 0.011 joining April and May 2011 and the power-law curve y = Ax^n = 0.274x^0.934.]
Figure 5: Nonlinear analysis with the 2011 AB-RBI stats for Miguel Cabrera. Although the 2011 season is Cabrera’s best season to date (highest BA), it was not a sterling performance from the standpoint of the RBI. As we see here, the linear model, using the early season stats through the end of May 2011, actually provides a reasonably good prediction. Unlike the case in 2012, Cabrera shows a deviation from the linear law but now the pace has decelerated. In other words, the exponent n < 1. Using the log-log plots, we deduce y = 0.274x^0.934. This power law curve can be seen to provide a good fit to the AB-RBI data for the latter part of the 2011 season. Cabrera needs to improve on the accelerating pace witnessed in 2012 to break the single-season Wilson RBI record. This is illustrated again in Figure 6 with a new power law curve added to the 2012 data.

[Figure 6 plot: Number of RBI, y, versus Number of At Bats, x, for the 2012 season with the added power-law curve y = Ax^n = 0.153x^1.11.]
Figure 6: A second power-law curve with a higher value of the constant A = 0.153 with n = 1.11, as in Figure 4, is added to the 2012 season plot. With the higher value of A, it is now possible to arrive at RBI = 192 with AB = 620, as for the 2012 season. The increase in “A” with constant n means that the RBI y = Ax^n has increased for the same At Bats, x. The slope of the curve is dy/dx = n(y/x), and this means the ratio y/x must be considerably higher than in the 2012 season at all values of the At Bats to break the Wilson single-season record. The two solid blue dots on the power law curves represent RBI = 192 and the corresponding AB can be read off the curves using the gridlines added for convenience.

Finally, readers should note that the same analytical methods can also be applied to critically analyze many (x, y) observations, such as the profits-revenues data for companies, traffic fatality statistics, and so on. This is not just about baseball. That is the reason I am willing to “risk” going out on a limb here with these “predictions”. It is the methodology, not the specific prediction, that is much more important and, I hope, is appreciated and also more widely applied outside the “hard sciences”.
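Reading the AB off a power-law curve amounts to inverting y = Ax^n, i.e., x = (y/A)^(1/n). The short sketch below, with hypothetical function names and B = 0 assumed, applies this to the two curves discussed for the 2012 season and also checks the slope relation dy/dx = n(y/x).

```python
import math


def ab_needed(rbi, A, n):
    """Invert y = A * x**n to get the At Bats x needed for a given RBI y."""
    return (rbi / A) ** (1.0 / n)


def slope(x, A, n):
    """Derivative dy/dx of y = A * x**n."""
    return n * A * x ** (n - 1)


n = 1.11
for A in (0.11, 0.153):
    x = ab_needed(192, A, n)
    y = A * x ** n
    print(f"A = {A}: AB needed for RBI = 192 is about {x:.0f}")
    # For a pure power law the slope equals n times the ratio y/x.
    assert math.isclose(slope(x, A, n), n * y / x)
```

With the exponent held fixed, raising A shifts the whole curve upward, so the same RBI total is reached at a smaller number of At Bats.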

Appendix 1: Hack Wilson’s 1930 season batting stats
Month (1930)       AB, x   RBI, y   Hits    HR
April/Mar            57      11      15      4
May                  93      33      34     10
June                108      29      39      8
July                112      31      37     11
Aug                 113      53      45     13
Sep/Oct             102      34      38     10
Totals              585     191     208     56
Ratios (season)             0.326   0.356   0.096

The ratios are relative to total AB at end of season.
Data source: http://www.baseballreference.com/players/split.cgi?id=wilsoha01&year=1930&t=b
[Figure 7 plot: Number of RBI, y, versus Number of At Bats, x, showing Wilson’s line y = hx + c = 0.354x – 9.226 (April & May 1930) and Cabrera’s early season line y = hx + c = 0.299x – 2.495 (April & May 2013).]
Figure 7: Hack Wilson’s 1930 AB-RBI diagram. The linear law is a good predictor here. Wilson’s RBI dropped in June and July. It was the huge August RBIs that made all the difference for the still unbroken single-season RBI record.

[Figure 8 plot: Number of RBI, y, versus Number of At Bats, x, showing Wilson’s line y = hx + c = 0.354x – 9.226 (April & May 1930) and Cabrera’s early season line y = hx + c = 0.299x – 2.495 (April & May 2013).]
Figure 8: Folks! I was skeptical at first but I have now changed my mind. Yes, Cabrera may indeed be on track to beat Hack Wilson’s 1930 single-season RBI record, based on the above graphical comparison of the two players. The two solid dots are Cabrera’s 2013 month-end April and May data. The dashed line is the projection based on the linear law (Method I discussed earlier). Now, although the slope h is lower for Cabrera compared to Wilson, notice that his current rate of increase of RBI (measured by the slope h) will allow Cabrera to match Wilson in June and July. But then Cabrera will need the August that Wilson had. The best strategy for Cabrera would be to move to a steeper slope in June and July and match Wilson’s initial line established in Mar-April-May 1930, and then coast to the record in August and September to match Wilson. Is it possible? Now, having analyzed Wilson’s record, I think it is possible. But Cabrera has his work cut out for him starting in June. Now, do RBIs really matter compared to homers? Does Cabrera care for this record, see Ref. [19]? That might just be the million dollar question! Cheers! Suddenly, why do I feel like an expert now on this topic?

For completeness, let us now consider the nonlinear model for Wilson’s 1930 AB-RBI stats.

[Figure 9 plot: Number of RBI, y, versus Number of At Bats, x, showing the line y = hx + c = 0.308x – 6.582 (Mar/April to June 1930), Wilson’s early season.]
Figure 9: Instead of the Mar/April and May data, we consider here the initial slope established by averaging through the month of June 1930. The straight line joining the cumulative points (57, 11) at the end of Mar/April and (258, 73) at the end of June 1930 has a lower slope, h = 0.308, than the April-May line (compare Cabrera’s slope h = 0.299). Wilson was essentially following this line through July and then accelerated to a higher rate in August and maintained it in September to set the single-season record that remains unbeaten. The nonlinearity evident here can be described by the power-law model as illustrated by the plots in Figures 10 and 11. The double logarithmic plot reveals a remarkable linearity. The slope of the log y versus log x plot (natural logarithms were used) yields n = 1.201 and the intercept log A = -2.3851, from which A = exp(-2.385) = 0.092.
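The two straight lines used for Wilson’s early season can be reproduced from the monthly values in Appendix 1 by first forming the cumulative totals and then joining the chosen month-end points. The sketch below is illustrative only; the names `line_through`, `cum`, and the others are hypothetical.

```python
from itertools import accumulate

# Monthly AB and RBI for Wilson's 1930 season (Appendix 1).
months = ["Mar/Apr", "May", "June", "July", "Aug", "Sep/Oct"]
ab = [57, 93, 108, 112, 113, 102]
rbi = [11, 33, 29, 31, 53, 34]

# Cumulative (AB, RBI) at the end of each month.
cum = list(zip(accumulate(ab), accumulate(rbi)))


def line_through(p1, p2):
    """Slope h and intercept c of the line y = h*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    h = (y2 - y1) / (x2 - x1)
    return h, y1 - h * x1


h_may, c_may = line_through(cum[0], cum[1])     # Mar/April to May
h_june, c_june = line_through(cum[0], cum[2])   # Mar/April to June

print(f"Through May:  y = {h_may:.3f}x + ({c_may:.3f})")
print(f"Through June: y = {h_june:.3f}x + ({c_june:.3f})")

# Monthly rate ΔRBI/ΔAB, which shows how much steeper August was.
for name, d_ab, d_rbi in zip(months, ab, rbi):
    print(name, round(d_rbi / d_ab, 3))
```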


[Figure 10 plot: Natural logarithm, y (RBI), versus Natural logarithm, x (At Bats), with the line ln y = 1.201 ln x – 2.3851.]
Figure 10: Double logarithmic plot log y = n log x + log A to deduce the constants A and n in the power law y = Ax^n. The slope gives the value of the exponent “n” and the intercept gives the value of the pre-factor “A” after taking anti-logarithms.

[Figure 11 plot: Number of RBIs, y, versus Number of At Bats, x, with the power-law curve y = Ax^n = 0.0921x^1.201.]
Figure 11: Near PERFECT power-law fit to Wilson’s 1930 AB-RBI data. The exponent “n” deduced for Wilson is quite close to n = 1.11 for Cabrera, Figure 6.

Finally, let us consider the following projection based on RBI and the number of games played. The following data can be obtained from http://www.baseballreference.com/players/gl.cgi?id=cabremi01&t=b&year= which gives the game number along with the relevant stats; see comments attached with Ref. [22].

Game No, x   RBI, y   Best-fit line projection
     0          0          -0.627
     5          6           5.172
    10         11          10.972
    15         17          16.771
    20         20          22.571
    25         28          28.370
    30         36          34.170
    35         40          39.969
    40         42          45.769
    42         47          48.088
    45         55          51.568
    49         57          56.208
    60          -          69
    75          -          86
   101          -         117
   129          -         149
   160          -         185
   166          -         192

The linear regression equation y = 1.16x – 0.627, see graph, is used to make the projections in the last column. Cabrera must clearly increase the slope h = 1.16 even further in order to break the Wilson record to avoid running out of At Bats (like a golfer running out of holes).

In his discussion, Rymer projects about 180 RBI for Cabrera, based on the observation of 47 RBI after 42 games (or about 1.12 RBI per game). After considering a number of additional data points, the linear regression analysis above yields a projection of 185 RBI at 160 games. Cabrera will need 166 games to break the Wilson record if the slope h, which is equal to the rate of increase of RBI with increasing games, is sustained. Hence, Cabrera must actually achieve an even higher rate, similar to that attained by Wilson in August 1930.
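The games-based projection can be checked with the regression line quoted above, y = 1.16x – 0.627. The sketch below uses hypothetical function names; the 162-game regular season length is an assumption added here for illustration and does not appear in the original analysis.

```python
# Coefficients of the regression line quoted in the text: y = 1.16x - 0.627.
h, c = 1.16, -0.627

SEASON_GAMES = 162   # assumption: length of an MLB regular season


def rbi_at_game(game):
    """Projected cumulative RBI after a given number of games."""
    return h * game + c


def games_needed(rbi):
    """Number of games needed to reach a given RBI total at the current pace."""
    return (rbi - c) / h


print(f"Projected RBI at game 160: {rbi_at_game(160):.0f}")
print(f"Games needed for RBI = 192: {games_needed(192):.0f}")

# Pace required from the current position (game 49, 57 RBI) to reach
# 192 RBI by the assumed end of the season.
required_pace = (192 - 57) / (SEASON_GAMES - 49)
print(f"Required RBI per game from here on: {required_pace:.3f}")
```

Under that assumption, the required pace over the remaining games comes out higher than the current slope h, consistent with the conclusion that the rate must increase.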

[Figure 12 plot: Number of RBIs, y, versus Number of Games, x, with the best-fit line y = hx + c = 1.16x – 0.627, r² = 0.990 (April & May 2013).]
Figure 12: The RBI data is plotted versus number of games played in April and May 2013. The best-fit line through the data has the equation y = 1.16x – 0.627 with r² = 0.990. At game 160, this linear law yields RBI = 185. Only at game 166 will Cabrera be able to break the Wilson record. In other words, the rate of increase of RBI with increasing games, the slope of the graph, must be increased in June and July to successfully break the Wilson record. So, the conclusion is that breaking the Wilson record IS DOABLE but will require significant improvement in performance.

Reference List

1. Instablog at Seeking Alpha, Trust Me, The Financial World will Change forever if Wall Street Starts Analyzing Financial Data Like we do Baseball Statistics: Miguel Cabrera, Posted May 26, 2013, http://seekingalpha.com/instablog/958073-vlaxmanan/1894301-trustme-the-financial-world-will-change-forever-if-wall-street-starts-analyzingfinancial-data-like-we-do-baseball-statistics-miguel-cabrera 2. Trust Me, The Financial World will Change forever if Wall Street Starts Analyzing Financial Data Like we do Baseball Statistics: Miguel Cabrera, Posted May 26, 2013, http://www.scribd.com/doc/143727444/Trust-Me-The-Financial-Worldwill-Change-Forever-if-Wall-Street-Starts-Analyzing-Financial-Data-likewe-do-Baseball-Stats-Miguel-Cabrera In the following articles on US Traffic fatalities, which should be of general interest to all, I have used baseball statistics (Miguel Cabrera as an example) to illustrate the meaning of a “work function”. This was first conceived by Einstein, in 1905, to explain the photoelectric effect. Baseball stats provide another example of a “work function”. Baseball fans might also like my analysis of the batting stats of the legendary Babe Ruth, and the recent analysis about Hamilton’s present “slump”. 3. Early Estimates of Motor Vehicle Traffic Fatalities in 2012, Traffic Safety Facts, May 2013, NHTSA DOT HS 811 741 http://wwwnrd.nhtsa.dot.gov/Pubs/811741.pdf 4. Is the Vehicle Miles Traveled (VMT) Even the Proper Metric to Determine US Traffic Fatality Rates? Published May 23, 2013, http://www.scribd.com/doc/143168641/Is-Vehicle-Miles-Traveled-VMTEven-the-Proper-Metric-to-Determine-Traffic-Fatality-Rates ; see also posted using Facebook, http://www.scribd.com/doc/143156075/Is-

Page | 23

5.

6.

7.

8.

9. 10.

11.

12. 13.

Vehicle-Miles-Traveled-VMT-Even-the-Proper-Metric-to-DetermineTraffic-Fatality-Rates
The Correlation Between Highway Deaths and the US Economy, Published May 20, 2013, http://www.scribd.com/doc/142526685/TheCorrelation-Between-Highway-Deaths-and-the-US-Economy
Highway Fatalities Trend Shows Its First Uptick in Six Years: Predicting the Crossover with Firearms Deaths, Published May 18, 2013, http://www.scribd.com/doc/142199172/Highway-Fatalities-TrendShows-Its-First-Uptick-in-Six-Years-Predicting-Crossover-with-FirearmsDeaths
Does Speed Kill? Forgotten US Highway Deaths in the 1950s and 1960s, August 3, 2012, http://www.scribd.com/doc/101982715/DoesSpeed-Kill-Forgotten-US-Highway-Deaths-in-1950s-and-1960s
The Effect of Speed Limits on Fatalities and Texas Proofing of Vehicles, August 3, 2012, http://www.scribd.com/doc/101983375/Effect-of-Speed-Limits-onFatalities-Texas-Proofing-of-Vehicles
Babe Ruth’s 1923 Batting Statistics and Einstein’s Work Function, Published April 17, 2013, http://www.scribd.com/doc/136489156/BabeRuth-s-1923-Batting-Statistics-and-Einstein-s-Work-Function
Babe Ruth Batting Statistics and Einstein’s Work Function, To be Published April 17, 2013, http://www.scribd.com/doc/136556738/BabeRuth-Batting-Statistics-and-Einstein-s-Work-Function
The Method of Least Squares: Predicting the Batting Average of a Baseball Player (Hamilton in 2013), Published May 7, 2013, http://www.scribd.com/doc/139924317/The-Method-of-Least-SquaresPredicting-the-Batting-Average-of-a-Baseball-Player-Hamilton-in-2013
Legendre, On Least Squares, English Translation of the original paper http://www.york.ac.uk/depts/maths/histstat/legendre.pdf
Line of Best-Fit, Least Squares Method, see worked example given http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
The formula for h used in this example is actually an approximate one and was used, before the advent of modern computers, since it only requires computing $x^2$ and $xy$ and the sums of all the values of $x$, $y$, $x^2$ and $xy$.
The exact formula is given below, with $x_m$ and $y_m$ denoting the “mean” or “average” values of x and y in the data set, and $y_m = h x_m + c$ since the “best-fit” line always passes through the point $(x_m, y_m)$.

$h = \sum (x - x_m)(y - y_m) / \sum (x - x_m)^2$

Determine the deviations of the individual x and y values from the “mean”, or “average”, $(x - x_m)$ and $(y - y_m)$. Determine the products $(x - x_m)(y - y_m)$ and their sum. This gives the numerator in the expression for h. Determine the squares $(x - x_m)^2$ and their sum. This gives the denominator in the expression for h. This also fixes the intercept c via $y_m = h x_m + c$. Then, using the regression equation, determine the predicted value $y_b$ on the best-fit line, the vertical deviation $(y - y_b)$ and the squares $(y - y_b)^2$. The sum of these squares is a minimum. This can be checked by assigning other values for h (using any two points) and allowing the graph to pivot around $(x_m, y_m)$. The regression coefficient $r^2 = 1 - \{ \sum (y - y_b)^2 / \sum (y - y_m)^2 \}$ is a measure of the strength of the correlation between x and y (or y/x versus x). For a perfect correlation, when all points lie exactly on the graph, $r^2 = +1.000$.
14. The Method of Least Squares: The Debt-GDP Relation for the Trillionaire Club of Nations, Published May 4, 2013, http://www.scribd.com/doc/139348541/The-Method-of-Least-SquaresThe-GDP-Debt-Relation-for-the-Trillionaires-Club-of-Nations
15. Bibliography, Articles on Extension of Planck’s Ideas and Einstein’s Ideas beyond physics, Compiled on April 16, 2013, http://www.scribd.com/doc/136492067/Bibliography-Articles-on-theExtension-of-Planck-s-Ideas-and-Einstein-s-Ideas-on-Energy-Quantum-totopics-Outside-Physics-by-V-Laxmanan
16. Money in Economics is Just like Energy in Physics: Extending Planck’s Law Beyond Physics, Published Jan 14, 2013, Introduction to the generalized statement of Planck’s radiation law and application to describe the maximum point on the profits-revenues graph of a company (the “old”, GM, Ford, Yahoo), http://www.scribd.com/doc/120324960/Money-inEconomics-is-Just-like-Energy-in-Physics-Extending-Planck-s-law-beyondPhysics
17. What they are saying: Miguel Cabrera on pace to break Hack Wilson’s single season RBI record, by James Schmehl, May 6, 2013, http://www.mlive.com/tigers/index.ssf/2013/05/what_theyre_saying_miguel_cabr.html#comments
18. Miguel Cabrera: Tigers’ Superstar on Track to Smash “Untouchable” MLB Record, by James Morisette, May 25, 2013. http://bleacherreport.com/articles/1651288-miguel-cabrera-tigerssuperstar-on-track-to-smash-untouchable-mlb-record Paul Goode, Basebook Baseball Magazine writer, called attention to the possibility of breaking the single-season Wilson record.
19. Runs Batted In (RBI) http://www.sportingcharts.com/dictionary/mlb/runs-batted-in-rbi.aspx
While RBI is generally tracked as an individual statistic and gives an indication as to the ability of a player to generate offense for a team, there are some critics of the metric that suggest that the quality of the team is also a significant input into the number of RBIs a player is able to generate.

20. Why the RBI is Obsolete and Why We Can Do Better? by Zachary D Rymer, November 19, 2012, http://bleacherreport.com/articles/1414978why-the-rbi-is-obsolete-and-how-we-can-do-better
21. Despite zero RBI Saturday, Miguel Cabrera on pace to break record, by Matt Snyder, May 25, 2013, http://www.cbssports.com/mlb/blog/eye-onbaseball/22302415/despite-zero-rbi-saturday-miguel-cabrera-on-pace-tobreak-record
He took a zero in the RBI category on Saturday after knocking in 15 in the previous five games. Cabrera's pace for the season is a whopping 196. The major-league record for RBI in a season is 191, held by Hack Wilson (Cubs, 1930). The American League record is held by Lou Gehrig (Yankees, 1931), as he drove home 184 in 1931.

22. Why Hack Wilson’s RBI Record is Impossible to Break in Today’s MLB, by Zachary D. Rymer, May 20, 2013, http://bleacherreport.com/articles/1645610-why-hack-wilsons-rbirecord-is-impossible-to-break-in-todays-mlb 58 Comments

Vj Laxmanan posted 1 minute ago Contributor I (~11:10 PM on May 28, 2013)

“And even an RBI-hating nerd like myself has to admit that Cabrera's RBI total is particularly impressive, as 47 ribbies through 42 games puts him on pace for roughly 180.”

Zach, can you explain how you get 180? I am guessing you are taking a simple proportion, 47 RBI in 42 games, or 1.12 RBI per game, and giving him about 160 games to get your projection. Is that correct? But now consider this: after 49 games he is at 57 RBI (end of May stat). This means in 7 games he has added 10 RBI. At this rate of increase of 10/7 = 1.43 RBI per game, in 70 games he could add 100 RBI and in 98 games (14 times 7) he would add 140 RBI. Now add the 57 he has now and we get 197 RBI in (98+49) = 147 games. That is breaking the Wilson record for sure. We have both used proportions in different ways, that's all. We can revisit this type of calculation at the end of June, and even July and August.

In case you are wondering, I am kinda obsessing on this now and have a detailed article posted on this topic of projecting if MC could break the RBI record. http://www.scribd.com/doc/144083838/Is-Miguel-Cabrera-on-Pace-to-Break-HackWilson-s-Single-Season-RBI-Record-YES-Can-I-Changed-My-Mind-on-This-Read-OnNow I have also studied Wilson's 1930 season AB-RBI data and am totally amazed by how well the math seems to work, as you can see from the calculations in Appendix 1. But I would still like to know how this “on pace to break Wilson's record” talk started and what the basis of the projection is. Forget all the nonlinear models I am talking about; I just want to understand what the baseball pros here are thinking. Thanks.

If we follow the above logic and use Method I, with games instead of At Bats, the equation of the line joining (42, 47) and (49, 57) is y = 1.429x – 13, where x is the number of games played and y is the RBI. Hence, after 65 games, which can be observed in June 2013, Cabrera should be at y = RBI = 80. Also, he will be at y = RBI = 193 at game 144 and is therefore well within the possibility of breaking Wilson’s record.
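The projection above can be checked with a short Python sketch. It applies the exact least-squares formulas from the note on the best-fit line (slope h from the deviations about the means, intercept c from the mean point) to the two (games, RBI) points used in the comment. With only two points the best-fit line is simply the line joining them, so this should reproduce y = 1.429x – 13. The function names `fit_line` and `project` are illustrative choices made here, not part of the original analysis.

```python
# Minimal sketch: exact least-squares slope/intercept, applied to the two
# (games, RBI) points from the comment. Names are illustrative assumptions.

def fit_line(xs, ys):
    """Return slope h, intercept c and r^2 for the best-fit line y = h*x + c."""
    n = len(xs)
    xm = sum(xs) / n                      # mean of x
    ym = sum(ys) / n                      # mean of y
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))  # numerator of h
    sxx = sum((x - xm) ** 2 for x in xs)                    # denominator of h
    h = sxy / sxx
    c = ym - h * xm                       # line passes through (xm, ym)
    ss_res = sum((y - (h * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ym) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return h, c, r2


games = [42, 49]   # games played
rbi = [47, 57]     # RBI at those points in the season

h, c, r2 = fit_line(games, rbi)


def project(x):
    """Projected RBI after x games on the fitted line."""
    return h * x + c


print(f"h = {h:.3f}, c = {c:.1f}, r2 = {r2:.3f}")
for x in (65, 144, 147):
    print(f"game {x}: projected RBI = {project(x):.1f}")

# The simple proportion attributed to the quoted article, for comparison:
# 47 RBI in 42 games, scaled to about 160 games.
print(f"simple proportion over 160 games: {47 / 42 * 160:.0f}")
```

The difference between the two projections comes entirely from the choice of rate: the simple proportion uses y/x (47/42), while the two-point line uses ∆y/∆x (10/7).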

About the author V. Laxmanan, Sc. D.
The author obtained his Bachelor’s degree (B. E.) in Mechanical Engineering from the University of Poona and his Master’s degree (M. E.), also in Mechanical Engineering, from the Indian Institute of Science, Bangalore, followed by Master’s (S. M.) and Doctoral (Sc. D.) degrees in Materials Engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA. He then spent his entire professional career at leading US research institutions (MIT; Allied Chemical Corporate R & D, now part of Honeywell; NASA; Case Western Reserve University (CWRU); and General Motors Research and Development Center in Warren, MI). He holds four patents in materials processing, has co-authored two books and published several scientific papers in leading peer-reviewed international journals. His expertise includes developing simple mathematical models to explain the behavior of complex systems. While at NASA and CWRU, he was responsible for developing material processing experiments to be performed aboard the space shuttle and developed a simple mathematical model to explain the growth of Christmas-tree-like, or snowflake-like, structures (called dendrites) widely observed in many types of liquid-to-solid phase transformations (e.g., freezing of all commercial metals and alloys, freezing of water, and, yes, production of snowflakes!). This led to a simple model to explain the growth of dendritic structures in both the ground-based experiments and in the space shuttle experiments. More recently, he has been interested in the analysis of the large volumes of data from financial and economic systems and has developed what may be called the Quantum Business Model (QBM). This extends (to financial and economic systems) the mathematical arguments used by Max Planck to develop quantum physics using the analogy Energy = Money, i.e., energy in physics is like money in economics. 
Einstein applied Planck’s ideas to describe the photoelectric effect (by treating light as being composed of particles called photons, each with the fixed quantum of energy conceived by Planck). The mathematical law deduced by Planck, referred to here as the generalized power-exponential law, might
actually have many applications far beyond blackbody radiation studies where it was first conceived. Einstein’s photoelectric law is a simple linear law and was deduced from Planck’s non-linear law for describing blackbody radiation. It appears that financial and economic systems can be modeled using a similar approach. Finance, business, economics and management sciences now essentially seem to operate like astronomy and physics before the advent of Kepler and Newton. Finally, during my professional career, I twice had the opportunity and great honor to make presentations to Nobel laureates: first at NASA to Prof. Robert Schrieffer (1972 Physics Nobel Prize), who was the Chairman of the Schrieffer Committee appointed to review NASA’s space flight experiments (following the loss of the space shuttle Challenger on January 28, 1986) and second at GM Research Labs to Prof. Robert Solow (1987 Nobel Prize in economics), who was Chairman of the Corporate Research Review Committee, appointed by GM corporate management.

Cover page of AirTran 2000 Annual Report
Can you see that plane flying above the tall tree tops that make a nearly perfect circle? It requires a great deal of imagination to see and to photograph it.
