
data=read.csv('NBA_sample.csv')
head(data)

## GAME_ID DATE HOME_TEAM AWAY_TEAM PLAYER_NAME PLAYER_ID LOCATION


## 1 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 2 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 3 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 4 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## 5 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## 6 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## WIN_LOSE SHOT_NUMBER PERIOD SEC_REMAIN SHOT_CLOCK DRIBBLES TOUCH_TIME
## 1 W 1 1 607 11.3 0 0.8
## 2 W 7 2 647 22.6 0 0.3
## 3 W 8 2 361 23.1 0 2.0
## 4 W 2 2 424 12.7 11 10.5
## 5 W 3 2 331 11.5 3 4.4
## 6 W 4 2 229 14.0 11 9.8
## SHOT_DIST PTS_TYPE CLOSEST_DEFENDER CLOSEST_DEFENDER_ID CLOSE_DEF_DIST
## 1 3.6 2 Nikola Vucevic 202696 1.7
## 2 1.2 2 Nikola Vucevic 202696 3.6
## 3 2.3 2 Kyle O'Quinn 203124 2.1
## 4 3.6 2 Elfrid Payton 203901 2.0
## 5 20.6 2 Elfrid Payton 203901 4.0
## 6 1.3 2 Nikola Vucevic 202696 3.3
## SUCCESS
## 1 1
## 2 1
## 3 1
## 4 1
## 5 0
## 6 0

Ten rows from the NBA_sample data set

GAME_ID   DATE          HOME_TEAM  AWAY_TEAM  PLAYER_NAME    PLAYER_ID  LOCATION  WIN_LOSE  SHOT_NUMBER  PERIOD  SEC_REMAIN  SHOT_CLOCK  DRIBBLES  TOUCH_TIME
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         1            1       607         11.3        0         0.8
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         7            2       647         22.6        0         0.3
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         8            2       361         23.1        0         2.0
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         2            2       424         12.7        11        10.5
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         3            2       331         11.5        3         4.4
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         4            2       229         14.0        11        9.8
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         6            3       595         10.7        0         0.9
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         7            3       528         14.4        10        9.4
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         11           4       181         20.2        1         1.8
21400001  OCT 28, 2014  NOP        ORL        Ryan Anderson  201583     H         W         1            1       237         10.1        0         1.3
summary(data)

## GAME_ID DATE HOME_TEAM AWAY_TEAM


## Min. :21400001 Length:50000 Length:50000 Length:50000
## 1st Qu.:21400235 Class :character Class :character Class :character
## Median :21400452 Mode :character Mode :character Mode :character
## Mean :21400454
## 3rd Qu.:21400677
## Max. :21400908
## PLAYER_NAME PLAYER_ID LOCATION WIN_LOSE
## Length:50000 Min. : 708 Length:50000 Length:50000
## Class :character 1st Qu.:101162 Class :character Class :character
## Mode :character Median :201939 Mode :character Mode :character
## Mean :157509
## 3rd Qu.:202704
## Max. :204060
## SHOT_NUMBER PERIOD SEC_REMAIN SHOT_CLOCK
## Min. : 1.00 Min. :1.000 Min. : 0.0 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.:1.000 1st Qu.:174.0 1st Qu.: 7.90
## Median : 5.00 Median :2.000 Median :353.0 Median :12.10
## Mean : 6.48 Mean :2.456 Mean :352.2 Mean :12.21
## 3rd Qu.: 9.00 3rd Qu.:3.000 3rd Qu.:531.0 3rd Qu.:16.50
## Max. :37.00 Max. :4.000 Max. :720.0 Max. :24.00
## DRIBBLES TOUCH_TIME SHOT_DIST PTS_TYPE
## Min. : 0.000 Min. : 0.000 Min. : 0.0 Min. :2.000
## 1st Qu.: 0.000 1st Qu.: 0.900 1st Qu.: 4.7 1st Qu.:2.000
## Median : 1.000 Median : 1.600 Median :13.4 Median :2.000
## Mean : 2.049 Mean : 2.799 Mean :13.5 Mean :2.261
## 3rd Qu.: 3.000 3rd Qu.: 3.700 3rd Qu.:22.5 3rd Qu.:3.000

## Max. :32.000 Max. :23.900 Max. :47.2 Max. :3.000
## CLOSEST_DEFENDER CLOSEST_DEFENDER_ID CLOSE_DEF_DIST SUCCESS
## Length:50000 Min. : 708 Min. : 0.000 Min. :0.0000
## Class :character 1st Qu.:101187 1st Qu.: 2.300 1st Qu.:0.0000
## Mode :character Median :201949 Median : 3.700 Median :0.0000
## Mean :158788 Mean : 4.113 Mean :0.4535
## 3rd Qu.:203078 3rd Qu.: 5.300 3rd Qu.:1.0000
## Max. :530027 Max. :53.200 Max. :1.0000

str(data)

## 'data.frame': 50000 obs. of 20 variables:
## $ GAME_ID : int 21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 ...
## $ DATE : chr "OCT 28, 2014" "OCT 28, 2014" "OCT 28, 2014" "OCT 28, 2014" ...
## $ HOME_TEAM : chr "NOP" "NOP" "NOP" "NOP" ...
## $ AWAY_TEAM : chr "ORL" "ORL" "ORL" "ORL" ...
## $ PLAYER_NAME : chr "Omer Asik" "Omer Asik" "Omer Asik" "Jrue Holiday" ...
## $ PLAYER_ID : int 201600 201600 201600 201950 201950 201950 201950 201950 201950 201583 ...
## $ LOCATION : chr "H" "H" "H" "H" ...
## $ WIN_LOSE : chr "W" "W" "W" "W" ...
## $ SHOT_NUMBER : int 1 7 8 2 3 4 6 7 11 1 ...
## $ PERIOD : int 1 2 2 2 2 2 3 3 4 1 ...
## $ SEC_REMAIN : int 607 647 361 424 331 229 595 528 181 237 ...
## $ SHOT_CLOCK : num 11.3 22.6 23.1 12.7 11.5 14 10.7 14.4 20.2 10.1 ...
## $ DRIBBLES : int 0 0 0 11 3 11 0 10 1 0 ...
## $ TOUCH_TIME : num 0.8 0.3 2 10.5 4.4 9.8 0.9 9.4 1.8 1.3 ...
## $ SHOT_DIST : num 3.6 1.2 2.3 3.6 20.6 1.3 20.9 16 4 25 ...
## $ PTS_TYPE : int 2 2 2 2 2 2 2 2 2 3 ...
## $ CLOSEST_DEFENDER : chr "Nikola Vucevic" "Nikola Vucevic" "Kyle O'Quinn" "Elfrid Payton" ...
## $ CLOSEST_DEFENDER_ID: int 202696 202696 203124 203901 203901 202696 203901 203901 203932 202699 ...
## $ CLOSE_DEF_DIST : num 1.7 3.6 2.1 2 4 3.3 6.7 2.8 5.5 4.2 ...
## $ SUCCESS : int 1 1 1 1 0 0 0 1 1 0 ...

M=cor(data[sapply(data,is.numeric)])
M

## GAME_ID PLAYER_ID SHOT_NUMBER PERIOD


## GAME_ID 1.000000000 0.0255284516 0.007606209 -0.003796700
## PLAYER_ID 0.025528452 1.0000000000 -0.004660915 0.006579311
## SHOT_NUMBER 0.007606209 -0.0046609154 1.000000000 0.646548072
## PERIOD -0.003796700 0.0065793110 0.646548072 1.000000000
## SEC_REMAIN 0.005016585 -0.0067354808 -0.220464233 -0.003534901
## SHOT_CLOCK 0.012335384 0.0339543515 -0.034338761 -0.042171722
## DRIBBLES -0.002495007 0.0230073923 0.135618945 0.051976854
## TOUCH_TIME -0.002803400 0.0014376711 0.141734082 0.042031282
## SHOT_DIST 0.001512646 -0.0235278731 0.011876958 0.027770040
## PTS_TYPE 0.008382307 0.0122510025 0.002240424 0.046885668
## CLOSEST_DEFENDER_ID 0.030784617 -0.0087141881 0.015515154 0.010003636
## CLOSE_DEF_DIST 0.010862689 0.0133460768 -0.035186967 -0.009965387
## SUCCESS -0.009505929 0.0004739182 -0.009933381 -0.017543691
## SEC_REMAIN SHOT_CLOCK DRIBBLES TOUCH_TIME
## GAME_ID 0.005016585 0.012335384 -0.002495007 -0.002803400
## PLAYER_ID -0.006735481 0.033954351 0.023007392 0.001437671
## SHOT_NUMBER -0.220464233 -0.034338761 0.135618945 0.141734082
## PERIOD -0.003534901 -0.042171722 0.051976854 0.042031282

## SEC_REMAIN 1.000000000 0.083547254 -0.119217923 -0.107193663
## SHOT_CLOCK 0.083547254 1.000000000 -0.093604980 -0.152240041
## DRIBBLES -0.119217923 -0.093604980 1.000000000 0.930430405
## TOUCH_TIME -0.107193663 -0.152240041 0.930430405 1.000000000
## SHOT_DIST -0.024088534 -0.186365653 -0.081751017 -0.085041616
## PTS_TYPE -0.048798718 -0.049868136 -0.164934735 -0.181563830
## CLOSEST_DEFENDER_ID -0.010107345 -0.002970448 0.013922407 0.010644263
## CLOSE_DEF_DIST 0.005220037 0.019357294 -0.152893777 -0.167642788
## SUCCESS 0.014607510 0.106144010 -0.035780988 -0.048719861
## SHOT_DIST PTS_TYPE CLOSEST_DEFENDER_ID
## GAME_ID 0.0015126456 0.008382307 0.0307846174
## PLAYER_ID -0.0235278731 0.012251003 -0.0087141881
## SHOT_NUMBER 0.0118769583 0.002240424 0.0155151537
## PERIOD 0.0277700399 0.046885668 0.0100036363
## SEC_REMAIN -0.0240885342 -0.048798718 -0.0101073449
## SHOT_CLOCK -0.1863656530 -0.049868136 -0.0029704478
## DRIBBLES -0.0817510172 -0.164934735 0.0139224075
## TOUCH_TIME -0.0850416163 -0.181563830 0.0106442633
## SHOT_DIST 1.0000000000 0.746107695 0.0004503692
## PTS_TYPE 0.7461076948 1.000000000 0.0054807594
## CLOSEST_DEFENDER_ID 0.0004503692 0.005480759 1.0000000000
## CLOSE_DEF_DIST 0.5250464243 0.418037003 -0.0179132936
## SUCCESS -0.1905261604 -0.123084899 0.0016405956
## CLOSE_DEF_DIST SUCCESS
## GAME_ID 0.010862689 -0.0095059285
## PLAYER_ID 0.013346077 0.0004739182
## SHOT_NUMBER -0.035186967 -0.0099333813
## PERIOD -0.009965387 -0.0175436905
## SEC_REMAIN 0.005220037 0.0146075099
## SHOT_CLOCK 0.019357294 0.1061440103
## DRIBBLES -0.152893777 -0.0357809878
## TOUCH_TIME -0.167642788 -0.0487198609
## SHOT_DIST 0.525046424 -0.1905261604
## PTS_TYPE 0.418037003 -0.1230848986
## CLOSEST_DEFENDER_ID -0.017913294 0.0016405956
## CLOSE_DEF_DIST 1.000000000 0.0017080389
## SUCCESS 0.001708039 1.0000000000

library(DataExplorer)
plot_missing(data)

library(corrplot)

## corrplot 0.90 loaded

corrplot(M)

library(PerformanceAnalytics)

## Loading required package: xts

## Loading required package: zoo

##
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':


##
## as.Date, as.Date.numeric

##
## Attaching package: 'PerformanceAnalytics'

## The following object is masked from 'package:graphics':


##
## legend

# showing histogram
# color grey

chart.Correlation(data[sapply(data,is.numeric)],histogram=TRUE,
col="grey10",
pch=1,
cex.cor.scale=2,
main="Correlation Plot",
cex.labels=20)

# Remove DRIBBLES: it is highly correlated with TOUCH_TIME (r = 0.93), so keeping both adds little information
data=subset(data,select=-c(DRIBBLES))
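
The highly correlated pair was picked out by reading the matrix printed above. As a minimal sketch, the same check can be done programmatically. The helper name `high_cor_pairs` and the 0.9 cutoff are assumptions made for illustration and are not part of the original analysis. The small matrix is a rounded excerpt of `M`.

```r
# Hypothetical helper: list variable pairs whose |correlation| exceeds a cutoff.
high_cor_pairs <- function(cor_mat, cutoff = 0.9) {
  # Use only the upper triangle so each pair is reported once and the diagonal is skipped.
  idx <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
  data.frame(
    var1 = unname(rownames(cor_mat)[idx[, "row"]]),
    var2 = unname(colnames(cor_mat)[idx[, "col"]]),
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
}

# Rounded excerpt of the correlation matrix M shown earlier in the report.
vars <- c("DRIBBLES", "TOUCH_TIME", "SHOT_DIST", "PTS_TYPE")
M_small <- matrix(
  c( 1.000,  0.930, -0.082, -0.165,
     0.930,  1.000, -0.085, -0.182,
    -0.082, -0.085,  1.000,  0.746,
    -0.165, -0.182,  0.746,  1.000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

pairs_found <- high_cor_pairs(M_small, cutoff = 0.9)
print(pairs_found)
```

With this cutoff only the DRIBBLES / TOUCH_TIME pair is flagged. SHOT_DIST and PTS_TYPE (r = 0.75) would appear only with a lower cutoff.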

data$SUCCESS = as.factor(data$SUCCESS)

library(ggplot2)
library(readr)
library(repr)

options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=SHOT_DIST, color=SUCCESS, group=SUCCESS)) +
  geom_density() +
  xlab("Shot Distance") + ylab("") +
  theme_light()

# CLOSE_DEF_DIST
library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:xts':


##
## first, last

## The following objects are masked from 'package:stats':


##
## filter, lag

## The following objects are masked from 'package:base':


##
## intersect, setdiff, setequal, union

summary(data$CLOSE_DEF_DIST)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 0.000 2.300 3.700 4.113 5.300 53.200

options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=CLOSE_DEF_DIST, color=SUCCESS, group=SUCCESS)) +
  geom_density() +
  xlab("CLOSE_DEF_DIST") + ylab("") +
  theme_light()

# TOUCH_TIME
library(dplyr)
summary(data$TOUCH_TIME)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 0.000 0.900 1.600 2.799 3.700 23.900

options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=TOUCH_TIME, color=SUCCESS, group=SUCCESS)) +
  geom_density() +
  xlab("TOUCH_TIME") + ylab("") +
  theme_light()

data$HOME_TEAM = as.factor(data$HOME_TEAM)
data$AWAY_TEAM = as.factor(data$AWAY_TEAM)
data$LOCATION = as.factor(data$LOCATION)
data$WIN_LOSE = as.factor(data$WIN_LOSE)

model <- glm(SUCCESS ~ LOCATION + SHOT_NUMBER + PERIOD + PLAYER_ID + SHOT_DIST + CLOSE_DEF_DIST,
             data = data, family = "binomial")

summary(model)

##
## Call:
## glm(formula = SUCCESS ~ LOCATION + SHOT_NUMBER + PERIOD + PLAYER_ID +
## SHOT_DIST + CLOSE_DEF_DIST, family = "binomial", data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0455 -1.0934 -0.8205 1.1522 2.0879
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.395e-01 3.407e-02 7.030 2.07e-12 ***
## LOCATIONH 3.539e-02 1.845e-02 1.918 0.0551 .
## SHOT_NUMBER 3.285e-03 2.617e-03 1.256 0.2093
## PERIOD -2.494e-02 1.087e-02 -2.294 0.0218 *
## PLAYER_ID -2.097e-07 1.164e-07 -1.802 0.0716 .

## SHOT_DIST -6.418e-02 1.308e-03 -49.072 < 2e-16 ***
## CLOSE_DEF_DIST 1.182e-01 4.360e-03 27.102 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68882 on 49999 degrees of freedom
## Residual deviance: 66245 on 49993 degrees of freedom
## AIC: 66259
##
## Number of Fisher Scoring iterations: 4

Location (home vs. away) has only a very slight effect on the outcome of a shot. In the summary, the LOCATIONH coefficient is small (0.035) and not significant at the 0.05 level (p = 0.0551), whereas SHOT_DIST and CLOSE_DEF_DIST are highly significant.
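
To make the size of these effects easier to read, the log-odds coefficients can be converted to odds ratios. The sketch below copies three estimates and standard errors from the `summary(model)` output and uses the usual approximate 95% Wald interval (estimate ± 1.96 standard errors). The variable names are illustrative and not part of the original analysis.

```r
# Estimates and standard errors copied from the summary(model) output (log-odds scale).
coefs   <- c(LOCATIONH = 3.539e-02, SHOT_DIST = -6.418e-02, CLOSE_DEF_DIST = 1.182e-01)
std_err <- c(LOCATIONH = 1.845e-02, SHOT_DIST = 1.308e-03, CLOSE_DEF_DIST = 4.360e-03)

# exp() turns a log-odds coefficient into an odds ratio per one-unit increase.
odds_ratio <- exp(coefs)

# Approximate 95% Wald confidence limits, also on the odds-ratio scale.
lower <- exp(coefs - 1.96 * std_err)
upper <- exp(coefs + 1.96 * std_err)

print(round(cbind(odds_ratio, lower, upper), 3))
```

The interval for LOCATIONH spans 1, which matches its p-value of 0.0551. The odds ratio for SHOT_DIST is below 1 (odds of success fall as distance grows) and the one for CLOSE_DEF_DIST is above 1.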

barplot(table(data$WIN_LOSE,data$HOME_TEAM),beside = T,legend=c("Lose","Win"),
col=c("#3C6688", "#45A778"), border="white",las=2,main="Team Analysis")

table(data$WIN_LOSE,data$HOME_TEAM)

##
## ATL BKN BOS CHA CHI CLE DAL DEN DET GSW HOU IND LAC LAL MEM
## L 868 753 928 849 931 883 850 838 881 743 857 722 901 808 906
## W 928 771 889 921 918 865 903 896 863 797 821 814 965 699 818
##
## MIA MIL MIN NOP NYK OKC ORL PHI PHX POR SAC SAS TOR UTA WAS

## L 752 666 816 847 733 737 797 667 897 903 1009 734 893 771 891
## W 715 709 809 802 737 701 839 791 868 904 1030 763 877 790 966

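SAC has the highest W count at home (1,030), followed by WAS (966) and LAC (965). These are counts of shot records rather than matches, and each column includes shots by both home and visiting players, so the ranking only loosely reflects home wins.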
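
Because teams differ in how many shot records they have, raw counts can be misleading. As a small sketch, the share of W records per home team can be computed with `prop.table`. The matrix below is an excerpt of four columns copied from the table above, and the object names are illustrative.

```r
# Excerpt of table(data$WIN_LOSE, data$HOME_TEAM): four teams, counts copied from the output.
counts <- matrix(
  c(1009, 891, 901, 667,    # L row
    1030, 966, 965, 791),   # W row
  nrow = 2, byrow = TRUE,
  dimnames = list(WIN_LOSE = c("L", "W"), HOME_TEAM = c("SAC", "WAS", "LAC", "PHI"))
)

# Column-wise proportions: share of L and W records within each home team.
shares <- prop.table(counts, margin = 2)

# Keep the W share and sort teams from highest to lowest.
win_share <- sort(shares["W", ], decreasing = TRUE)
print(round(win_share, 3))
```

In this excerpt PHI has the largest W share even though SAC has the largest W count, so the two rankings do not agree.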

remodel=glm(WIN_LOSE~LOCATION + SEC_REMAIN +PLAYER_ID ,data = data, family = "binomial")


summary(remodel)

##
## Call:
## glm(formula = WIN_LOSE ~ LOCATION + SEC_REMAIN + PLAYER_ID, family = "binomial",
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.4095 -1.1645 0.9642 1.1070 1.3282
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.673e-02 2.685e-02 -2.113 0.03460 *
## LOCATIONH 5.005e-01 1.806e-02 27.712 < 2e-16 ***
## SEC_REMAIN 1.247e-04 4.360e-05 2.860 0.00424 **
## PLAYER_ID -1.428e-06 1.144e-07 -12.482 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 69312 on 49999 degrees of freedom
## Residual deviance: 68382 on 49996 degrees of freedom
## AIC: 68390
##
## Number of Fisher Scoring iterations: 4

LOCATION and PLAYER_ID are highly statistically significant predictors of the match result (p < 2e-16 for both), and SEC_REMAIN is also significant (p = 0.00424); all three p-values are below 0.05.
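
Significance of individual terms does not show how much the model explains overall. A rough check, using only the deviances printed in `summary(remodel)`, is a likelihood-ratio test against the null model together with McFadden's pseudo R-squared. This is a sketch added for illustration; neither quantity is computed in the original analysis.

```r
# Deviances and degrees of freedom copied from the summary(remodel) output.
null_dev  <- 69312; null_df  <- 49999
resid_dev <- 68382; resid_df <- 49996

# Likelihood-ratio statistic: drop in deviance when the three predictors are added.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-squared probability for the deviance drop.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared: proportional reduction in deviance.
pseudo_r2 <- 1 - resid_dev / null_dev

print(c(lr_stat = lr_stat, lr_df = lr_df, p_value = p_value, pseudo_r2 = pseudo_r2))
```

The deviance drop is 930 on 3 degrees of freedom, so the model as a whole is significant, but the pseudo R-squared is under 0.02: with 50,000 rows, even weak predictors reach very small p-values.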
