# Fundamental Concepts in Data Analysis

Detroit Tigers superstar Miguel Cabrera is believed to be on track this season to break Hack Wilson’s single-season RBI (Runs Batted In) record of 191, established in 1930. I have analyzed Cabrera’s batting stats for the 2011, 2012 and 2013 seasons using both a linear and a nonlinear model to fit the AB (At Bats)-RBI data. The discussion of baseball stats here thus serves as a useful proxy for understanding many fundamental concepts of data analysis that, sadly, are NOT used in the so-called “soft” sciences. For example, the ratio y/x (where x and y are our empirical observations on any problem of interest) is often taken to be the rate of change. However, as we see here in the discussion of baseball statistics, the ratio y/x is not the same as the “rate” of change (for example, the unemployment rate or various fatality rates). The true measure of the rate of change is the ratio ∆y/∆x, as we learn in our elementary calculus courses. The ratio y/x is rarely equal to the ratio ∆y/∆x. The study of baseball statistics helps to illustrate these basic concepts.
06/15/2013

## The Great Chase of the All-Time Single-Season RBI Baseball Record of 1930
See also the companion document, Financial-World-will-Change-Forever-if-Wall-Street-Starts-Analyzing-Financial-Data-like-we-do-Baseball-Stats-Miguel-Cabrera.
The following comment was posted at ~5:50 AM on May 28, 2013:
Vj Laxmanan: Ok, here's good news for those who want to see the Wilson record broken. I took a second, harder look and used a nonlinear model. In the linear model, we take the data for April and May to make the end-of-season projection. This is called Method I in my earlier analysis (already uploaded). If I apply the same to the full 2012 stats, we see a clear deviation from the linear model. Cabrera was operating on a nonlinear power curve in 2012. Statistical analysis gives a very good fit to the year-end data using the nonlinear model for 2012. We can do the same now with the 2013 data. Even so, the AB needed is too high and may not be achieved. I will update the earlier post by adding the nonlinear analysis soon. The linear model may be found here: http://www.scribd.com/doc/144083838/Is-Miguel-Cabrera-on-Pace-to-Break-Hack-Wilson-s-Single-Season-RBI-Record-No-Way-Read-On-Now
The following comment was posted at ~7:26 PM on May 27, 2013, at http://www.mlive.com/tigers/index.ssf/2013/05/what_theyre_saying_migue

I did a statistical analysis of AB-RBI for Cabrera, so this is not just my opinion. It is based on mathematical analysis. Cabrera can beat his own record. He will need about 490 to 500 AB to cross 140 RBI, assuming the present pace continues. But there is just NO WAY, it is just IMPOSSIBLE, for him to beat the all-time record and do better than 190 or 191 RBIs. He just will not have the AB needed to beat the record.

Whoever made the prediction that Miguel is on pace had better check the math. We just cannot use ratios, such as the current ratio RBI/AB = 57/199 = 0.286, to make such predictions. One must use the rate of change of RBI with increasing AB. That will give the more accurate prediction.

I ran the statistics using just the 2013 data (two ways of doing it: just the end of April and May data, and 16 points spread out through April and May) and also considering the full 2011 and 2012 seasons along with the 2013 season to date. There is just no way he could beat the all-time RBI record. At least not in this season.

I have recently posted a more detailed article on Cabrera's batting average; see my Instablog here: http://seekingalpha.com/instablog/958073-vlaxmanan/1894301-trust-me-the-financial-world-will-change-forever-if-wall-street-starts-analyzing-financial-data-like-we-do-baseball-statistics-miguel-cabrera I will follow up later with my analysis of the RBI for all to review.

Of course, you can rip me... and the predictions can be tested, for that matter, even by the end of June 2013 or July 2013. We don’t have to wait for the season's end. Cheers!

(See the update with the nonlinear model presented after Method III.)
Miguel Cabrera’s batting stats caught my attention recently as I was trying to explain the meaning of a “work function” (an idea first conceived by Einstein in 1905) in the context of my discussion of the US traffic fatalities (which went up in 2012 after six straight years of decline, hence the discussion; see the references cited). The “work function” is the term used by Einstein to describe the nonzero intercept c in the linear law y = hx + c, describing the empirical observations on photoelectricity; see the references cited. Likewise, there is a work function in baseball, i.e., a nonzero c in the At Bats-Hit relation, or the At Bats-RBI relation, and so on. Likewise, we find a work function in many other problems of interest to us.

Instead of using my earlier analysis of the legendary baseball player Babe Ruth’s batting stats (to explain the work function), I decided to use Cabrera’s recent four-game stretch to discuss the work function; see Refs. [1, 2]. Cabrera was making the MLB sports headlines. During this stretch, Cabrera had six home runs in four consecutive games, with a home run in each game, and an incredible batting average of 0.600 over this stretch. I also came across another intriguing stat that was being discussed: that he is on pace to break the single-season RBI record set by Hack Wilson in 1930.
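
The difference between the ratio y/x and the rate of change ∆y/∆x under the linear law y = hx + c can be illustrated with a minimal sketch. The values of h and c below are hypothetical, chosen only for illustration and not taken from Cabrera's stats, and the function names are likewise assumptions.

```python
def linear_law(x, h, c):
    """y = h*x + c, with c playing the role of the nonzero intercept ("work function")."""
    return h * x + c


def ratio(x, h, c):
    """The simple ratio y/x at a single observation."""
    return linear_law(x, h, c) / x


def rate_of_change(x1, x2, h, c):
    """The finite-difference rate (y2 - y1) / (x2 - x1) between two observations."""
    return (linear_law(x2, h, c) - linear_law(x1, h, c)) / (x2 - x1)


H = 0.25   # hypothetical slope h
C = -5.0   # hypothetical nonzero intercept c

for x in (50, 200, 600):
    print(
        f"x = {x:3d}  y/x = {ratio(x, H, C):.4f}  "
        f"dy/dx = {rate_of_change(x, x + 100, H, C):.4f}"
    )
```

With a nonzero c, the ratio y/x = h + c/x changes with x and only approaches h as x grows, whereas ∆y/∆x stays equal to h. The two coincide only when c = 0.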