# The Method of Least Squares: Predicting the Batting Average of a Baseball Player (Hamilton in 2013)

Summary

A graphical representation of a baseball player's batting stats, the At Bats-Hits diagram, is shown to reveal a simple linear relation y = hx + c relating the number of AB (x) to the Hits (y), where the constant c is related to the number of missing hits and the constant h is the rate of increase of hits as the AB increases (comparable to the marginal tax rate in the tax problem). For Josh Hamilton, who is now believed to be in a slump, the analysis of the game-by-game 2013 stats to date indicates that his BA can be improved to a value between 0.256 and 0.263. His best performance in 2013 is a hitting rate h = ∆y/∆x = 4/12 = 0.333 between game 25 (April 29, 2013 with stat 104, 21) and game 28 (May 2, 2013 with stat 116, 25). A much higher BA can only be achieved if the slope h (the marginal hitting rate), the intercept c (the baseball work function), or both can be further improved.

*********************************************************

In this article we will discuss the application of the Method of Least Squares to analyze the batting average (BA) of a baseball player. We will also see how it can be used to predict the end-of-season performance of a baseball player based on the first month's stats. We will use the example of the Angels' player Josh Hamilton for no reason other than the fact that he is now going through a slump, is in the news, and is at the center of the Angels' first-month woes, see Refs. [1,2]. The methodology used here is similar to that described in detail in my earlier analysis of the career batting stats of baseball legend Babe Ruth, published recently, see Refs. [3, 4].

The batting stats being analyzed here can be obtained from Yahoo Sports (or some other website of choice), see Ref. [5]. The game-by-game, seasonal, and split-season (i.e., month-by-month) stats are all available. If we analyze the game-by-game stats for Josh Hamilton, or any other baseball player (like the legendary Babe Ruth), we will find (x, y) scores such as (0, 0), (1, 1), (2, 2), (3, 3), (4, 4), where the first number x is the number of At Bats (AB) and the second number y is the number of Hits (H) in the game. The same scores were achieved in more than one game. However, the game-by-game performance can be described by the mathematical law y = hx + c where the slope h = 1 and the constant c = 0, -1, -2, etc. is related to the number of missing hits. With h = 1, the BA y/x = h + (c/x) = 1 + (c/x) deviates from the ideal value of y/x = 1 as the AB increases and the missed hits increase correspondingly.

Figure 1: Josh Hamilton's game-by-game batting performance in his best season to date (2010). The (x, y) pairs illustrate the (AB, Hits) scores for each game. At the end of the season AB = x = 518 and H = y = 186 and the batting average BA = y/x = 186/518 = 0.359, Hamilton's best BA to date. (Plot: Josh Hamilton (2010), game-by-game log; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

In games with scores such as (1, 1), (2, 2) and (3, 3), the baseball player has the theoretically PERFECT batting average BA = H/AB = 1/1 = 2/2 = 3/3 = 1.000. In other words, y = x + c where the ratio y/x = BA = 1 and the constant c = 0. We will also find scores such as (1, 0), (2, 1), (3, 2), (4, 3), (5, 4) and (6, 5). For these games, the law is y = x + c = x – 1 where the constant c = -1 is related to the number of missing hits. The batting average BA = y/x = 1 – (1/x) is less than the perfect value of 1 and, for this fixed value of c, moves toward 1 as the number of AB = x increases. If we continue the analysis, we also find scores like (2, 0), (3, 1), (4, 2) and (5, 3), which means y = x + c = x – 2; again the constant c = -2 is the number of missing hits and the batting average y/x = 1 – (2/x) deviates even further from the perfect value of 1 at the same AB.

Table 1: Josh Hamilton's 2010 Batting Stats

| Month or other description | At Bats, AB = x | Hits, H = y | Batting Average, BA = y/x |
|---|---|---|---|
| April | 83 | 22 | 0.265 |
| May | 109 | 32 | 0.294 |
| June | 108 | 49 | 0.454 |
| July | 98 | 41 | 0.418 |
| August | 104 | 37 | 0.356 |
| September | 5 | 2 | 0.400 |
| October | 11 | 3 | 0.273 |
| Home | 264 | 103 | 0.390 |
| Away | 254 | 83 | 0.327 |
| Day | 133 | 38 | 0.286 |
| Night | 385 | 148 | 0.384 |
| Indoors | 34 | 11 | 0.324 |
| Outdoors | 484 | 175 | 0.362 |

Data source for the 2010 season: http://sports.yahoo.com/mlb/players/6679/splits._ylt=AnBC4BTb9Y_nwJSf_wgJCnKFCLcF?year=2010&type=Batting

For Babe Ruth, we find all these scores. For Hamilton, I analyzed his best season to date (2010) and found only some of these scores in that season; the remaining ones, which we find for Babe Ruth, were missing. Regardless, Josh Hamilton's 2010 season game-by-game performance is illustrated in Figure 1. The batting stats can be described by a series of parallel lines with the general equation y = hx + c where the slope h = 1 and the intercept c = 0, -1, -2, etc. is the number of missing hits.
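The family of parallel lines described here can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `batting_average` and the sample at-bat counts are assumptions introduced for the example, not part of the original analysis.

```python
def batting_average(at_bats, h, c):
    """Return BA = y/x for the line y = h*x + c (at_bats must be positive)."""
    hits = h * at_bats + c
    return hits / at_bats


# Parallel lines with slope h = 1 and intercepts c = 0, -1, -2 (missing hits).
for c in (0, -1, -2):
    row = [round(batting_average(x, 1.0, c), 3) for x in (2, 3, 4, 5)]
    print(f"c = {c:2d}: BA at x = 2..5 -> {row}")
```

For c = 0 every value is 1. For a fixed negative c the ratio starts below 1 and moves toward the slope h as x grows, which is the behavior that BA = h + (c/x) implies.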

The batting stats for the season are summarized in Table 1 for convenience. Now, if we start aggregating the batting stats on a monthly basis, and by season, and look at the performance of the same player, we find the more general law y = hx + c where the slope h < 1 and the constant c is related to the skill of the baseball player (how many missing hits on average). The constant c can be thought of as the baseball work function and is analogous to the idea of a work function used in physics to describe photoelectricity (the production of free electrons from within a metal, using energy in the form of a light source, see Refs. [3,4]). This is illustrated in Figures 2 and 3 for Hamilton. In Figure 2 we consider the 2010 season. In Figure 3 we consider the career stats to date (2007-2013, through May 5, 2013, when AB = 125 and H = 26); see Table 2 for the data.

Figure 2: Josh Hamilton's aggregated monthly batting stats for his best season to date (2010). (Plot: Josh Hamilton (2010), monthly stats; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

The (x, y) pairs in Figure 2 are the (AB, Hits) scores for each month. At the end of the season AB = x = 518 and H = y = 186 and the batting average BA = y/x = 186/518 = 0.359, Hamilton's best BA to date. The data shows an upward trend and a best-fit line can be determined using the method of least squares, see Refs. [6, 7] for more details and a worked example. The equation of the best-fit line is y = 0.375x – 1.181, with the linear regression coefficient r² = 0.8779. The slope h and the intercept c are related to the skill of the baseball player and the scatter is related to the "consistency" achieved by the player on a month-to-month basis. The BA y/x = 0.375 – (1.181/x) is less than the slope h of the line and increases with increasing AB. The slope h of the best-fit line is the maximum BA that can be achieved by a player, see also the discussion of Babe Ruth's batting stats in Refs. [3,4].

The same trends are also observed when we aggregate the data on a seasonal basis. Each (x, y) pair in Figure 3 represents the data for a single season (with a partial season for 2013). The same upward trend is again observed and the best-fit line through the data has the equation y = hx + c = 0.337x – 15.805. The BA y/x = 0.337 – 15.805/x increases with increasing At Bats x and will approach the limiting value of h = 0.337. The individual (x, y) pair for the 2010 season falls above the best-fit line and thus gives the highest, to date, career BA = 0.359 for a season.

Table 2: Hamilton's Career Batting Stats (through 5/5/13)

| Season (year) | At Bats, AB = x | Hits, H = y | Batting Average, BA = y/x |
|---|---|---|---|
| 2007 | 298 | 87 | 0.292 |
| 2008 | 624 | 190 | 0.304 |
| 2009 | 336 | 90 | 0.268 |
| 2010 | 518 | 186 | 0.359 |
| 2011 | 487 | 145 | 0.298 |
| 2012 | 562 | 160 | 0.285 |
| 2013 | 125 | 26 | 0.208 |

Data source: http://sports.yahoo.com/mlb/players/6679/career._ylt=AonDH5cy3IrM_WMi_1w0IwKFCLcF

The BA for the 2013 season is only 0.208 to date. However, statistically speaking, the BA that Hamilton can achieve is the limiting value of h = 0.337, the slope of the line, if the current performance continues.
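The two best-fit lines quoted for Figures 2 and 3 can be checked with a short, dependency-free sketch of ordinary least squares. The helper name `least_squares` and the variable names are assumptions made for this illustration; the data are the April to October rows of Table 1 and the season rows of Table 2.

```python
def least_squares(xs, ys):
    """Fit y = h*x + c by ordinary least squares; return (h, c, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    h = sxy / sxx                 # slope: marginal hitting rate
    c = mean_y - h * mean_x       # intercept: the "baseball work function"
    return h, c, sxy ** 2 / (sxx * syy)


# Monthly 2010 stats, April through October (Table 1).
month_ab = [83, 109, 108, 98, 104, 5, 11]
month_hits = [22, 32, 49, 41, 37, 2, 3]
# Season totals 2007-2013 (Table 2; 2013 is a partial season).
career_ab = [298, 624, 336, 518, 487, 562, 125]
career_hits = [87, 190, 90, 186, 145, 160, 26]

h_m, c_m, r2_m = least_squares(month_ab, month_hits)
h_c, c_c, r2_c = least_squares(career_ab, career_hits)
print(f"monthly: y = {h_m:.3f}x + ({c_m:.3f}), r^2 = {r2_m:.4f}")
print(f"career:  y = {h_c:.3f}x + ({c_c:.3f}), r^2 = {r2_c:.4f}")
```

With these inputs the slopes and intercepts agree, to three decimals, with the values 0.375 and –1.181 (monthly) and 0.337 and –15.805 (career) given in the text.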

This also means that Hamilton is capable of improving his batting performance during the remaining months of the 2013 season, based on the performance to date. This is illustrated by the game-by-game stats analysis in Figures 4 and 5.

Figure 3: Josh Hamilton's aggregated career batting stats to date (through the game on May 5, 2013). (Plot: Josh Hamilton (2007-2013), career stats; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

With this background, let us now consider the batting stats for the 2013 season in a bit more detail to see the batting average that could potentially be achieved. Considering Figure 4, the (AB, Hits) values through game 1 and game 12 of the season are (4, 0) and (47, 11). These (x, y) pairs represent the cumulative AB and Hits as the season has progressed through the game on May 5, 2013. The straight line joining these two (x, y) pairs has the highest slope, as seen here with all the other (x, y) pairs falling below this line.

The limiting performance is described by the equation y = 0.256x – 1.023 where the slope h = ∆y/∆x = (11 – 0)/(47 – 4) = 11/43 = 0.256, with ∆y being the change in the number of hits (11 – 0) between game 1 and game 12 and ∆x being the change in the number of AB (47 – 4). The slope h is the rate of increase of Hits as the At Bats increase. For Hamilton, the constant c = -1.023. We also know that c is related to the number of missing hits. As discussed already, this slope h is higher than the batting average BA = y/x since y/x = h + (c/x) is less than h when c < 0.

Figure 4: Josh Hamilton's game-by-game batting stats to date for the 2013 season (through the game on May 5, 2013). (Plot: Josh Hamilton (2013), with the line y = 0.256x – 1.023; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

If we examine the data as the season progressed we find another line joining the (x, y) pairs for game 5 and game 24, with very nearly the same slope as the solid line but with an intercept c which has become more negative (the missing hits have increased, statistically speaking), see Figure 5. This means that Hamilton is capable of improving his performance and achieving a batting average (BA) as high as 0.256 to 0.263, based on his performance to date.
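The two-point lines used for Figures 4 and 5 follow directly from h = ∆y/∆x and c = y – hx. The sketch below is illustrative only; `line_through` is a hypothetical helper name, and the (AB, Hits) pairs are the cumulative values quoted in the text for games 1 and 12, games 5 and 24, and games 25 and 28.

```python
def line_through(p1, p2):
    """Return slope h and intercept c of the line through two (AB, Hits) points."""
    (x1, y1), (x2, y2) = p1, p2
    h = (y2 - y1) / (x2 - x1)   # rate of increase of Hits per At Bat
    c = y1 - h * x1             # intercept, related to the missing hits
    return h, c


# Cumulative (AB, Hits) pairs quoted in the text for the 2013 season.
solid = line_through((4, 0), (47, 11))      # games 1 and 12 (Figure 4)
dashed = line_through((20, 1), (96, 21))    # games 5 and 24 (Figure 5)
best = line_through((104, 21), (116, 25))   # games 25 and 28 (best stretch)

for name, (h, c) in [("solid", solid), ("dashed", dashed), ("best", best)]:
    print(f"{name}: h = {h:.3f}, c = {c:.3f}")
```

The first two lines have nearly the same slope (about 0.256 and 0.263) but different intercepts, while the short stretch between games 25 and 28 gives the steeper rate of 4/12.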

Figure 5: Josh Hamilton's game-by-game batting stats to date for the 2013 season (through the game on May 5, 2013). The dashed line joins the (x, y) pairs at the end of game 5 (20, 1) and game 24 (96, 21). (Plot: Josh Hamilton (2013), with the line y = 0.263x – 4.26; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

Examination of Figure 5 shows that Hamilton's best performance in the 2013 season (highest slope of the x-y graph) is a hitting rate h = ∆y/∆x = 4/12 = 0.333 between game 25 (April 29, 2013 with stat 104, 21) and game 28 (May 2, 2013 with stat 116, 25). An even higher BA requires a very significant change in the slope h. This will also change the constant c, which has been called the work function for a baseball player, see the discussion of Babe Ruth's stats. The baseball work function tells us something about the difficulty of producing Hits (or Home Runs if HR stats are being analyzed) in baseball. This is similar to the work function conceived by Einstein to explain the photoelectric effect, which tells us something about the "difficulty" of producing electrons from a metal when light (a stream of photons, each having the energy ε) shines on its surface.

The maximum energy of the electron is K = ε – W, where "W" is the energy that must be given up to do the work necessary to overcome the forces binding the electron to the metal. The English translations of Einstein's original 1905 paper (written in German) refer to "W" as the work function. This is exactly like the missing hits in baseball. If every AB produces a Hit, the player will have the PERFECT Batting Average BA = 1.000. However, because of various factors (associated with the pitcher, pitching speed, difference in the stadium, indoor versus outdoors, night versus day, even wind speed, etc.) some of the AB do not produce Hits. Hence, the BA is less than the perfect value and the slope h < 1. This depends on the skill of the baseball player. However, if c < 0, the BA is always less than the slope h. As an example, in the 1927 season (when Ruth won the home run race and set the single-season record of 60 home runs), for Lou Gehrig, the nonzero intercept c > 0 whereas for Babe Ruth c < 0, see Ref. [4]. This difference means that for Ruth, the BA increased with increasing AB whereas for Gehrig, the BA was decreasing with increasing AB.

Table 3: Hamilton's 2013 Season Stats (through 5/5/13)

| Date (2013 season) | Games (2013 season) | At Bats, AB = x | Hits, H = y | Batting Average, BA = y/x |
|---|---|---|---|---|
| April 1 | 1 | 4 | 0 | 0.000 |
| April 7 | 6 | 25 | 4 | 0.160 |
| April 10 | 8 | 32 | 5 | 0.156 |
| April 11 | 9 | 35 | 7 | 0.200 |
| April 14 | 12 | 47 | 11 | 0.234 |
| April 22 | 18 | 72 | 16 | 0.222 |
| April 24 | 20 | 80 | 18 | 0.225 |
| April 27 | 23 | 92 | 21 | 0.228 |
| April 30 | 26 | 108 | 22 | 0.204 |
| May 5 | 31 | 125 | 26 | 0.208 |

Data source: http://sports.yahoo.com/mlb/players/6679/career._ylt=AonDH5cy3IrM_WMi_1w0IwKFCLcF

For the month of May, through 5/5/13, Hamilton's rate is h = 4/17 = 0.235.

Figure 6: Josh Hamilton's game-by-game batting stats through May 5, 2013, see Table 3, are aggregated to yield the ten (At Bats, Hits), or (x, y), pairs plotted here. Again we see a nice linear relation. (Plot: Josh Hamilton (2013), with the line y = 0.222x – 0.786, r² = 0.986; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

The best-fit equation is y = 0.222x – 0.786, with r² = 0.986, so the BA = y/x = 0.222 – (0.786/x) increases as the AB increases. The "fluctuations" in the data observed here, above and below the best-fit line, while showing a generally upward trend, also explain the "fluctuations" in the BA values, which are seen to decrease and then increase again. If there is no significant improvement in performance, the highest BA that can be attained equals the slope h = ∆y/∆x = 0.222, the currently observed hitting rate. This is lower than the limiting value suggested by the cumulative game-by-game stats considered in Figures 4 and 5.

A more detailed discussion of the work function may also be found within the context of other problems such as the airline On-Time arrivals problem, see Refs. [8-10], and the Debt-GDP problem, Refs. [11-14], both of which are described by the same mathematical law y = hx + c. In the Airline Quality Rating (AQR) problem, an On-Time arrival is like a hit in baseball and the number of flights operated by the airline is like the number of At Bats.

The Debt/GDP ratio, a matter of great concern now, and the subject of one of the great economic debates of our times (following the discovery of coding errors in a paper written by two Harvard economists), can also be understood in terms of the idea of a work function.

Appendix 1: Hamilton's Batting Update on June 19, 2013

This is being prompted mainly by a recent piece by Yahoo sportswriter Tim Brown entitled "Hamilton isn't hitting and he can't quite figure out why: weird man", see the link given below.

http://sports.yahoo.com/news/hamilton-still-isn%E2%80%99t-hitting--andhe-can%E2%80%99t-quite-figure-out-why044424016.html._ylt=A2KJ2UhiCsJRwGsAoWrQtDMD

I sent Brown the following email.

Dear Mr. Brown:

I read your recent piece on Josh Hamilton's struggles with great interest. I like to analyze baseball stats and have shown that there is a simple relation between At Bats (x) and Hits (y), which can be deduced by considering the game-by-game batting stats of a baseball player. I have shown this rigorously using Babe Ruth's game logs. It is almost like a universal law. The law can be written mathematically as y = hx + c where h is the slope of the AB-Hits graph and c is the nonzero intercept. The Batting Average BA = y/x = h + (c/x). The nonzero intercept c is very important and I call it the "baseball work function" and it can be related to the "missing hits". The number of missing hits, or the nonzero intercept c, depends, of course, on a number of complex factors, including the skill of the baseball player. The "work function" lumps together all of this complexity in a simple number. And so, I considered Babe Ruth's best seasons (1923 and 1927) of his career. Of course, we want to consider the best if we want to find "universal" laws.

This is described in several articles (I have provided some links below). Notice that Babe Ruth had a negative c, which means that the more the AB the more the hits, and Ruth's BA just keeps increasing the more he plays. For his Yankee teammate, Lou Gehrig, in the same season, the nonzero intercept c was positive, which means the more the AB the lower the BA. No wonder Gehrig lost the home run race in the 1927 season. With this background, do take a look at my analysis of Josh Hamilton as well.

Before sending this email, I checked Hamilton's April, May and June stats.

| 2013 | AB | Hits (H) | BA | Cum AB | Cum H | Cum BA |
|---|---|---|---|---|---|---|
| April | 108 | 22 | 0.204 | 108 | 22 | 0.204 |
| May | 97 | 23 | 0.237 | 205 | 45 | 0.220 |
| June | 63 | 12 | 0.190 | 268 | 57 | 0.213 |

Figure A1: Josh Hamilton's cumulative AB and Hits through June 19, 2013. (Plot: the line y = 0.219x - 1.625; x-axis: Number of At Bats (AB), x; y-axis: Number of Hits (H), y.)

If we consider the end of April (AB, H) and the to-date stats in June 2013, the equation y = 0.219x - 1.625 describes this season so far. The nonzero intercept c is negative (c = -1.625).

But Hamilton's problem is the slope h = 0.219, which is equal to the rate of increase of Hits as AB increases. The slope of the AB-Hits graph is the theoretical maximum BA, since BA = y/x = 0.219 - (1.625/x) will increase as At Bats x increase (because of the negative c) but can never exceed h = 0.219. Hamilton already had a BA of 0.220 at the end of May. (This is a minor statistical fluctuation.) It also appears that he has improved his BA in May and June compared to April. In other words, the batting data reveals that Hamilton is all MAXED OUT. There is no way his BA will improve this season unless there is a fundamental change in what he is doing with the bat. Any improvement will require a major change in the nonzero intercept c (work function has to improve) or the slope h (hitting rate has to improve, more hits for the same AB).

When I hear someone say "Weird man", I see other commentators mention alcoholism and even drugs; that seems to be a problem too. Hamilton needs some help, some real professional help, and must get off that booze, if it is true. If possible, please pass this message on to Hamilton and urge him to take a hard look in the mirror and think about what he is doing with his personal life off the field.

This is coming from a baseball fan who is also a geek as far as baseball stats. If you do read the Babe Ruth articles, you will get the hang of it. It is quite simple, really. There are lots of graphs. And a lot better than WAR, WPA etc. Sabermetrics stuff.

Thanks and regards.

Very sincerely
V. Laxmanan

Some articles that you might find of interest (I hope).

About the author

V. Laxmanan, Sc. D.

The author obtained his Bachelor's degree (B. E.) in Mechanical Engineering from the University of Poona and his Master's degree (M. E.), also in Mechanical Engineering, from the Indian Institute of Science, Bangalore, followed by a Master's (S. M.) and Doctoral (Sc. D.) degrees in Materials Engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA. He then spent his entire professional career at leading US research institutions (MIT; Allied Chemical Corporate R & D, now part of Honeywell; NASA; Case Western Reserve University (CWRU); and General Motors Research and Development Center in Warren, MI). He holds four patents in materials processing, has co-authored two books and has published several scientific papers in leading peer-reviewed international journals. His expertise includes developing simple mathematical models to explain the behavior of complex systems.

While at NASA and CWRU, he was responsible for developing material processing experiments to be performed aboard the space shuttle and developed a simple mathematical model to explain the growth of Christmas-tree, or snowflake, like structures (called dendrites) widely observed in many types of liquid-to-solid phase transformations (e.g., freezing of all commercial metals and alloys, freezing of water, and, yes, production of snowflakes!). This led to a simple model to explain the growth of dendritic structures in both the ground-based experiments and in the space shuttle experiments.

More recently, he has been interested in the analysis of the large volumes of data from financial and economic systems and has developed what may be called the Quantum Business Model (QBM). This extends (to financial and economic systems) the mathematical arguments used by Max Planck to develop quantum physics using the analogy Energy = Money, i.e., energy in physics is like money in economics. Einstein applied Planck's ideas to describe the photoelectric effect (by treating light as being composed of particles called photons, each with the fixed quantum of energy conceived by Planck). The mathematical law deduced by Planck, referred to here as the generalized power-exponential law, might [...]

Finally, I also twice had the opportunity and great honor to make presentations to two Nobel laureates during my professional career: first at NASA to Prof. Robert Schrieffer (1972 Physics Nobel Prize), who was the Chairman of the Schrieffer Committee appointed to review NASA's space flight experiments (following the loss of the space shuttle Challenger on January 28, 1986), and second at GM Research Labs to Prof. Robert Solow (1987 Nobel Prize in economics), who was Chairman of the Corporate Research Review Committee appointed by GM corporate management.