
As data grows more complex, it becomes nearly impossible to frame and plot a data set properly without a tool like R. R offers a wide range of built-in functions to build and visualize data.

We can take the example of air passengers over a given period of time. Line charts are
commonly preferred when we want to analyse a trend spread over a time period.
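As a minimal sketch of such a line chart, the snippet below uses `AirPassengers`, the monthly airline passenger time series that ships with R's built-in `datasets` package. That this is the data set the text has in mind is an assumption, and the axis labels and title are illustrative choices.

```r
# AirPassengers is a monthly time series (1949-1960) from R's built-in datasets.
data(AirPassengers)

# plot() on a time series draws the values against time; type = "l" asks for a line.
plot(AirPassengers,
     type = "l",
     xlab = "Year",
     ylab = "Passengers (thousands)",
     main = "Monthly airline passengers")
```

Because the object is already a time series, the x axis is taken from its time index, so no separate date column is needed.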

The merged batting dataset used below, merged.all, includes pitchers, who might never go up to bat in a whole season:
## playerID yearID teamID lgID stint G G_batting AB R H X2B
## 1: aardsda01 2006 CHN NL 1 45 43 2 0 0 0
## 2: aardsda01 2008 BOS AL 1 47 5 1 0 0 0
## 3: aaronha01 1954 ML1 NL 1 122 122 468 58 131 27
## 4: aaronha01 1955 ML1 NL 1 153 153 602 105 189 37
## 5: aaronha01 1956 ML1 NL 1 153 153 609 106 200 34
## ---
## X3B HR RBI SB CS BB SO IBB HBP SH SF GIDP G_old salary birthYear
## 1: 0 0 0 0 0 0 0 0 0 1 0 0 45 NA 1981
## 2: 0 0 0 0 0 0 1 0 0 0 0 0 5 403250 1981
## 3: 6 13 69 2 2 28 39 NA 3 6 4 13 122 NA 1934
## 4: 9 27 106 3 1 49 61 5 3 7 4 20 153 NA 1934
## 5: 14 26 92 2 4 37 54 6 2 5 7 21 153 NA 1934
## ---
## 84365: 1 0 2 0 0 1 2 0 0 0 0 0 45 NA 1924
## 84366: 0 0 5 1 NA 11 NA NA 1 1 NA NA 27 NA 1888
## 84367: 8 16 95 21 NA 46 68 NA 1 10 NA NA 154 NA 1888
## 84368: 7 13 94 24 NA 67 65 NA 2 18 NA NA 150 NA 1888
## 84369: 0 1 8 0 NA 4 6 NA 0 2 NA NA 35 NA 1888
## birthMonth birthDay birthCountry birthState birthCity deathYear
## 1: 12 27 USA CO Denver NA
## 2: 12 27 USA CO Denver NA
## 3: 2 5 USA AL Mobile NA
## 4: 2 5 USA AL Mobile NA
## 5: 2 5 USA AL Mobile NA
## ---
## 84365: 8 20 USA MI Holland NA
## 84366: 11 2 USA MO St. Louis 1978
## 84367: 11 2 USA MO St. Louis 1978
## 84368: 11 2 USA MO St. Louis 1978
## 84369: 11 2 USA MO St. Louis 1978
## deathMonth deathDay deathCountry deathState deathCity nameFirst
## 1: NA NA David
## 2: NA NA David
## 3: NA NA Hank
## 4: NA NA Hank
## 5: NA NA Hank
## ---
## 84365: NA NA George
## 84366: 3 27 USA CA La Crescenta Dutch
## 84367: 3 27 USA CA La Crescenta Dutch
## 84368: 3 27 USA CA La Crescenta Dutch
## 84369: 3 27 USA CA La Crescenta Dutch
## nameLast nameGiven weight height bats throws debut
## 1: Aardsma David Allan 205 75 R R 2004-04-06
## 2: Aardsma David Allan 205 75 R R 2004-04-06
## 3: Aaron Henry Louis 180 72 R R 1954-04-13
## 4: Aaron Henry Louis 180 72 R R 1954-04-13
## 5: Aaron Henry Louis 180 72 R R 1954-04-13
## ---
## finalGame retroID bbrefID name
## 1: 2013-09-28 aardd001 aardsda01 David Aardsma
## 2: 2013-09-28 aardd001 aardsda01 David Aardsma
## 3: 1976-10-03 aaroh101 aaronha01 Hank Aaron
## 4: 1976-10-03 aaroh101 aaronha01 Hank Aaron
## 5: 1976-10-03 aaroh101 aaronha01 Hank Aaron

One example is David Aardsma, a pitcher who in many seasons never had a single at bat (AB is 0).

## playerID yearID teamID lgID stint G G_batting AB R H X2B
## 1: aardsda01 2006 CHN NL 1 45 43 2 0 0 0
## 2: aardsda01 2008 BOS AL 1 47 5 1 0 0 0
## 3: aaronha01 1954 ML1 NL 1 122 122 468 58 131 27
## 4: aaronha01 1955 ML1 NL 1 153 153 602 105 189 37
## 5: aaronha01 1956 ML1 NL 1 153 153 609 106 200 34

summarized.batters = merged.all[, list(Total.HR=sum(HR)), by="playerID"]

This creates a new column, Total.HR, holding the sum of each player's home runs.


summarized.batters

## playerID Total.HR
## 1: aardsda01 0
## 2: aaronha01 755
## 3: aaronto01 13
## 4: aasedo01 0
## 5: abadan01 0

We can see that we've created a new data.table that contains each player's ID and
their total career home runs.
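The grouped sum can be tried on a small table first. The sketch below assumes the `data.table` package is installed; the `toy` table and its values are hypothetical and only illustrate the `j` and `by` arguments.

```r
library(data.table)

# Hypothetical toy rows: one row per player per season.
toy <- data.table(
  playerID = c("a01", "a01", "b02", "b02", "b02"),
  yearID   = c(2001, 2002, 2001, 2002, 2003),
  HR       = c(3, 5, 10, 0, 7)
)

# The list() in j builds the summary column; by defines the groups.
toy.summary <- toy[, list(Total.HR = sum(HR)), by = "playerID"]
toy.summary
```

If a column contains NA values, as several columns in the printed data do, `sum()` returns NA for that group unless `na.rm = TRUE` is passed.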

merged.all[, name:=paste(nameFirst, nameLast)]

## playerID yearID teamID lgID stint G G_batting AB R H X2B
## 1: aardsda01 2006 CHN NL 1 45 43 2 0 0 0
## 2: aardsda01 2008 BOS AL 1 47 5 1 0 0 0
## 3: aaronha01 1954 ML1 NL 1 122 122 468 58 131 27
## 4: aaronha01 1955 ML1 NL 1 153 153 602 105 189 37
## 5: aaronha01 1956 ML1 NL 1 153 153 609 106 200 34

The name column is now part of merged.all, the merged data:

## finalGame retroID bbrefID name
## 1: 2013-09-28 aardd001 aardsda01 David Aardsma
## 2: 2013-09-28 aardd001 aardsda01 David Aardsma
## 3: 1976-10-03 aaroh101 aaronha01 Hank Aaron
## 4: 1976-10-03 aaroh101 aaronha01 Hank Aaron
## 5: 1976-10-03 aaroh101 aaronha01 Hank Aaron
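The `:=` operator used for the name column modifies the table in place rather than returning a copy. A minimal sketch on a hypothetical two-row table (the `players` table and its names are made up for illustration) shows the effect:

```r
library(data.table)

# Hypothetical toy table with the two name columns used in the text.
players <- data.table(
  nameFirst = c("Ann", "Bo"),
  nameLast  = c("Lee", "Diaz")
)

# := adds the column to the existing table by reference; no reassignment is needed.
players[, name := paste(nameFirst, nameLast)]

# The trailing [] forces printing, because := returns its result invisibly.
players[]
```

Since nothing is copied, this stays cheap even on a table with tens of thousands of rows like the one shown here.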

At last, we can summarize the data and see what it looks like.

## playerID name Total.HR
## 1: aardsda01 David Aardsma 0
## 2: aaronha01 Hank Aaron 755
## 3: aaronto01 Tommie Aaron 13
## 4: aasedo01 Don Aase 0
## 5: abadan01 Andy Abad 0
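The summary printed here carries both playerID and name. One reason to group by both, rather than by name alone, is that two different players can share a name. The sketch below uses hypothetical rows to show the difference; the IDs, names, and values are invented for illustration.

```r
library(data.table)

# Hypothetical rows: two different players who share the same display name.
toy <- data.table(
  playerID = c("smithjo01", "smithjo01", "smithjo02"),
  name     = c("John Smith", "John Smith", "John Smith"),
  HR       = c(4, 6, 1)
)

# Grouping by name alone merges the two players into one row.
by.name <- toy[, list(Total.HR = sum(HR)), by = "name"]

# Grouping by ID and name keeps them apart while still showing the name.
by.both <- toy[, list(Total.HR = sum(HR)), by = c("playerID", "name")]

by.name
by.both
```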

summarized.batters = merged.all[, list(Total.HR=sum(HR), Total.R=sum(R),
                                       Total.H=sum(H)), by=c("playerID", "name")]
summarized.batters

In the same way, we can summarize by other statistics, like the total number of hits or
runs.

## playerID name Total.HR Total.R Total.H
## 1: aardsda01 David Aardsma 0 0 0
## 2: aaronha01 Hank Aaron 755 2174 3771
## 3: aaronto01 Tommie Aaron 13 102 216
## 4: aasedo01 Don Aase 0 0 0
## 5: abadan01 Andy Abad 0 1 2

The more hits a player gets in baseball, the more chances they have to actually
score runs.
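One rough way to look at this relationship is to plot career runs against career hits and compute their correlation. The sketch below copies only the five Total.R and Total.H values printed in the summary, so the result is illustrative rather than an estimate for the full data; the `shown` table and the `hits.runs.cor` name are introduced here for the example.

```r
library(data.table)

# The five summary rows printed in the text (Total.R and Total.H only).
shown <- data.table(
  playerID = c("aardsda01", "aaronha01", "aaronto01", "aasedo01", "abadan01"),
  Total.R  = c(0, 2174, 102, 0, 1),
  Total.H  = c(0, 3771, 216, 0, 2)
)

# Pearson correlation between career hits and career runs.
hits.runs.cor <- shown[, cor(Total.H, Total.R)]
hits.runs.cor

# Scatter plot: each point is one player.
plot(shown$Total.H, shown$Total.R,
     xlab = "Career hits (Total.H)",
     ylab = "Career runs (Total.R)")
```

With only five rows, one of them far larger than the rest, the correlation is dominated by that single player; running the same two calls on the full summarized.batters table would give a more meaningful picture.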
