
data=read.csv('NBA_sample.csv')
head(data)

## GAME_ID DATE HOME_TEAM AWAY_TEAM PLAYER_NAME PLAYER_ID LOCATION


## 1 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 2 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 3 21400001 OCT 28, 2014 NOP ORL Omer Asik 201600 H
## 4 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## 5 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## 6 21400001 OCT 28, 2014 NOP ORL Jrue Holiday 201950 H
## WIN_LOSE SHOT_NUMBER PERIOD SEC_REMAIN SHOT_CLOCK DRIBBLES TOUCH_TIME
## 1 W 1 1 607 11.3 0 0.8
## 2 W 7 2 647 22.6 0 0.3
## 3 W 8 2 361 23.1 0 2.0
## 4 W 2 2 424 12.7 11 10.5
## 5 W 3 2 331 11.5 3 4.4
## 6 W 4 2 229 14.0 11 9.8
## SHOT_DIST PTS_TYPE CLOSEST_DEFENDER CLOSEST_DEFENDER_ID CLOSE_DEF_DIST
## 1 3.6 2 Nikola Vucevic 202696 1.7
## 2 1.2 2 Nikola Vucevic 202696 3.6
## 3 2.3 2 Kyle O'Quinn 203124 2.1
## 4 3.6 2 Elfrid Payton 203901 2.0
## 5 20.6 2 Elfrid Payton 203901 4.0
## 6 1.3 2 Nikola Vucevic 202696 3.3
## SUCCESS
## 1 1
## 2 1
## 3 1
## 4 1
## 5 0
## 6 0

Ten rows from the NBA_sample data set

GAME_ID   DATE          HOME_TEAM  AWAY_TEAM  PLAYER_NAME    PLAYER_ID  LOCATION  WIN_LOSE  SHOT_NUMBER  PERIOD  SEC_REMAIN  SHOT_CLOCK  DRIBBLES
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         1            1       607         11.3        0
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         7            2       647         22.6        0
21400001  OCT 28, 2014  NOP        ORL        Omer Asik      201600     H         W         8            2       361         23.1        0
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         2            2       424         12.7        11
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         3            2       331         11.5        3
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         4            2       229         14.0        11
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         6            3       595         10.7        0
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         7            3       528         14.4        10
21400001  OCT 28, 2014  NOP        ORL        Jrue Holiday   201950     H         W         11           4       181         20.2        1
21400001  OCT 28, 2014  NOP        ORL        Ryan Anderson  201583     H         W         1            1       237         10.1        0

summary(data)

##     GAME_ID             DATE            HOME_TEAM          AWAY_TEAM
##  Min.   :21400001   Length:50000       Length:50000       Length:50000
##  1st Qu.:21400235   Class :character   Class :character   Class :character
##  Median :21400452   Mode  :character   Mode  :character   Mode  :character
##  Mean   :21400454
##  3rd Qu.:21400677
##  Max.   :21400908
##  PLAYER_NAME          PLAYER_ID        LOCATION           WIN_LOSE
##  Length:50000       Min.   :   708   Length:50000       Length:50000
##  Class :character   1st Qu.:101162   Class :character   Class :character
##  Mode  :character   Median :201939   Mode  :character   Mode  :character
##                     Mean   :157509
##                     3rd Qu.:202704
##                     Max.   :204060
##   SHOT_NUMBER        PERIOD        SEC_REMAIN      SHOT_CLOCK
##  Min.   : 1.00   Min.   :1.000   Min.   :  0.0   Min.   : 0.00
##  1st Qu.: 3.00   1st Qu.:1.000   1st Qu.:174.0   1st Qu.: 7.90
##  Median : 5.00   Median :2.000   Median :353.0   Median :12.10
##  Mean   : 6.48   Mean   :2.456   Mean   :352.2   Mean   :12.21
##  3rd Qu.: 9.00   3rd Qu.:3.000   3rd Qu.:531.0   3rd Qu.:16.50
##  Max.   :37.00   Max.   :4.000   Max.   :720.0   Max.   :24.00
##     DRIBBLES        TOUCH_TIME       SHOT_DIST       PTS_TYPE
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.0   Min.   :2.000
##  1st Qu.: 0.000   1st Qu.: 0.900   1st Qu.: 4.7   1st Qu.:2.000
##  Median : 1.000   Median : 1.600   Median :13.4   Median :2.000
##  Mean   : 2.049   Mean   : 2.799   Mean   :13.5   Mean   :2.261
##  3rd Qu.: 3.000   3rd Qu.: 3.700   3rd Qu.:22.5   3rd Qu.:3.000
##  Max.   :32.000   Max.   :23.900   Max.   :47.2   Max.   :3.000
##  CLOSEST_DEFENDER   CLOSEST_DEFENDER_ID CLOSE_DEF_DIST      SUCCESS
##  Length:50000       Min.   :   708      Min.   : 0.000   Min.   :0.0000
##  Class :character   1st Qu.:101187      1st Qu.: 2.300   1st Qu.:0.0000
##  Mode  :character   Median :201949      Median : 3.700   Median :0.0000
##                     Mean   :158788      Mean   : 4.113   Mean   :0.4535
##                     3rd Qu.:203078      3rd Qu.: 5.300   3rd Qu.:1.0000
##                     Max.   :530027      Max.   :53.200   Max.   :1.0000

str(data)

## 'data.frame':    50000 obs. of  20 variables:
##  $ GAME_ID            : int  21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 21400001 ...
##  $ DATE               : chr  "OCT 28, 2014" "OCT 28, 2014" "OCT 28, 2014" "OCT 28, 2014" ...
##  $ HOME_TEAM          : chr  "NOP" "NOP" "NOP" "NOP" ...
##  $ AWAY_TEAM          : chr  "ORL" "ORL" "ORL" "ORL" ...
##  $ PLAYER_NAME        : chr  "Omer Asik" "Omer Asik" "Omer Asik" "Jrue Holiday" ...
##  $ PLAYER_ID          : int  201600 201600 201600 201950 201950 201950 201950 201950 201950 201583 ...
##  $ LOCATION           : chr  "H" "H" "H" "H" ...
##  $ WIN_LOSE           : chr  "W" "W" "W" "W" ...
##  $ SHOT_NUMBER        : int  1 7 8 2 3 4 6 7 11 1 ...
##  $ PERIOD             : int  1 2 2 2 2 2 3 3 4 1 ...
##  $ SEC_REMAIN         : int  607 647 361 424 331 229 595 528 181 237 ...
##  $ SHOT_CLOCK         : num  11.3 22.6 23.1 12.7 11.5 14 10.7 14.4 20.2 10.1 ...
##  $ DRIBBLES           : int  0 0 0 11 3 11 0 10 1 0 ...
##  $ TOUCH_TIME         : num  0.8 0.3 2 10.5 4.4 9.8 0.9 9.4 1.8 1.3 ...
##  $ SHOT_DIST          : num  3.6 1.2 2.3 3.6 20.6 1.3 20.9 16 4 25 ...
##  $ PTS_TYPE           : int  2 2 2 2 2 2 2 2 2 3 ...
##  $ CLOSEST_DEFENDER   : chr  "Nikola Vucevic" "Nikola Vucevic" "Kyle O'Quinn" "Elfrid Payton" ...
##  $ CLOSEST_DEFENDER_ID: int  202696 202696 203124 203901 203901 202696 203901 203901 203932 202699 ...
##  $ CLOSE_DEF_DIST     : num  1.7 3.6 2.1 2 4 3.3 6.7 2.8 5.5 4.2 ...
##  $ SUCCESS            : int  1 1 1 1 0 0 0 1 1 0 ...

M=cor(data[sapply(data,is.numeric)])
M

## GAME_ID PLAYER_ID SHOT_NUMBER PERIOD


## GAME_ID 1.000000000 0.0255284516 0.007606209 -0.003796700
## PLAYER_ID 0.025528452 1.0000000000 -0.004660915 0.006579311
## SHOT_NUMBER 0.007606209 -0.0046609154 1.000000000 0.646548072
## PERIOD -0.003796700 0.0065793110 0.646548072 1.000000000
## SEC_REMAIN 0.005016585 -0.0067354808 -0.220464233 -0.003534901
## SHOT_CLOCK 0.012335384 0.0339543515 -0.034338761 -0.042171722
## DRIBBLES -0.002495007 0.0230073923 0.135618945 0.051976854
## TOUCH_TIME -0.002803400 0.0014376711 0.141734082 0.042031282
## SHOT_DIST 0.001512646 -0.0235278731 0.011876958 0.027770040
## PTS_TYPE 0.008382307 0.0122510025 0.002240424 0.046885668
## CLOSEST_DEFENDER_ID 0.030784617 -0.0087141881 0.015515154 0.010003636
## CLOSE_DEF_DIST 0.010862689 0.0133460768 -0.035186967 -0.009965387
## SUCCESS -0.009505929 0.0004739182 -0.009933381 -0.017543691
## SEC_REMAIN SHOT_CLOCK DRIBBLES TOUCH_TIME
## GAME_ID 0.005016585 0.012335384 -0.002495007 -0.002803400
## PLAYER_ID -0.006735481 0.033954351 0.023007392 0.001437671
## SHOT_NUMBER -0.220464233 -0.034338761 0.135618945 0.141734082
## PERIOD -0.003534901 -0.042171722 0.051976854 0.042031282
## SEC_REMAIN 1.000000000 0.083547254 -0.119217923 -0.107193663
## SHOT_CLOCK 0.083547254 1.000000000 -0.093604980 -0.152240041
## DRIBBLES -0.119217923 -0.093604980 1.000000000 0.930430405
## TOUCH_TIME -0.107193663 -0.152240041 0.930430405 1.000000000
## SHOT_DIST -0.024088534 -0.186365653 -0.081751017 -0.085041616
## PTS_TYPE -0.048798718 -0.049868136 -0.164934735 -0.181563830
## CLOSEST_DEFENDER_ID -0.010107345 -0.002970448 0.013922407 0.010644263
## CLOSE_DEF_DIST 0.005220037 0.019357294 -0.152893777 -0.167642788
## SUCCESS 0.014607510 0.106144010 -0.035780988 -0.048719861
## SHOT_DIST PTS_TYPE CLOSEST_DEFENDER_ID
## GAME_ID 0.0015126456 0.008382307 0.0307846174
## PLAYER_ID -0.0235278731 0.012251003 -0.0087141881
## SHOT_NUMBER 0.0118769583 0.002240424 0.0155151537
## PERIOD 0.0277700399 0.046885668 0.0100036363
## SEC_REMAIN -0.0240885342 -0.048798718 -0.0101073449
## SHOT_CLOCK -0.1863656530 -0.049868136 -0.0029704478
## DRIBBLES -0.0817510172 -0.164934735 0.0139224075
## TOUCH_TIME -0.0850416163 -0.181563830 0.0106442633
## SHOT_DIST 1.0000000000 0.746107695 0.0004503692
## PTS_TYPE 0.7461076948 1.000000000 0.0054807594
## CLOSEST_DEFENDER_ID 0.0004503692 0.005480759 1.0000000000
## CLOSE_DEF_DIST 0.5250464243 0.418037003 -0.0179132936
## SUCCESS -0.1905261604 -0.123084899 0.0016405956
## CLOSE_DEF_DIST SUCCESS
## GAME_ID 0.010862689 -0.0095059285
## PLAYER_ID 0.013346077 0.0004739182
## SHOT_NUMBER -0.035186967 -0.0099333813
## PERIOD -0.009965387 -0.0175436905
## SEC_REMAIN 0.005220037 0.0146075099
## SHOT_CLOCK 0.019357294 0.1061440103
## DRIBBLES -0.152893777 -0.0357809878
## TOUCH_TIME -0.167642788 -0.0487198609
## SHOT_DIST 0.525046424 -0.1905261604
## PTS_TYPE 0.418037003 -0.1230848986
## CLOSEST_DEFENDER_ID -0.017913294 0.0016405956
## CLOSE_DEF_DIST 1.000000000 0.0017080389
## SUCCESS 0.001708039 1.0000000000

library(DataExplorer)
plot_missing(data)
library(corrplot)

## corrplot 0.90 loaded

corrplot(M)
library(PerformanceAnalytics)

## Loading required package: xts

## Loading required package: zoo

##
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
##
##     as.Date, as.Date.numeric

##
## Attaching package: 'PerformanceAnalytics'

## The following object is masked from 'package:graphics':
##
##     legend

# showing histogram
# color grey
chart.Correlation(data[sapply(data,is.numeric)], histogram=TRUE, col="grey10",
                  pch=1, cex.cor.scale=2,
                  main="Correlation Plot", cex.labels=20)

# Remove DRIBBLES: it is highly correlated with TOUCH_TIME (r = 0.93)
data=subset(data,select=-c(DRIBBLES))
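DRIBBLES is dropped because the correlation matrix M shows a 0.93 correlation with TOUCH_TIME. Below is a minimal sketch of how such pairs could be listed automatically. The helper name `high_cor_pairs` and the 0.9 cutoff are assumptions made for illustration, and the toy matrix only copies a few entries from the correlation output.

```r
# Hypothetical helper: list variable pairs whose absolute correlation
# exceeds a cutoff. Expects a symmetric correlation matrix with dimnames.
high_cor_pairs <- function(M, cutoff = 0.9) {
  # Look only above the diagonal so each pair is reported once.
  idx <- which(abs(M) > cutoff & upper.tri(M), arr.ind = TRUE)
  data.frame(
    var1 = rownames(M)[idx[, "row"]],
    var2 = colnames(M)[idx[, "col"]],
    r    = M[idx],
    stringsAsFactors = FALSE
  )
}

# Toy matrix with values copied from the correlation output.
vars <- c("DRIBBLES", "TOUCH_TIME", "SHOT_DIST")
toy <- matrix(c( 1,           0.930430405, -0.081751017,
                 0.930430405, 1,           -0.085041616,
                -0.081751017, -0.085041616, 1),
              nrow = 3, dimnames = list(vars, vars))

pairs_found <- high_cor_pairs(toy)
print(pairs_found)
```

On the full matrix the same call would be `high_cor_pairs(M)`; which member of a flagged pair to drop remains a judgment call.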

data$SUCCESS = as.factor(data$SUCCESS)

library(ggplot2)
library(readr)
library(repr)

options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=SHOT_DIST, color=SUCCESS, group=SUCCESS)) + geom_density() +
  xlab("Shot Distance") + ylab("") + theme_light()

# CLOSE_DEF_DIST
library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:xts':
##
##     first, last

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

summary(data$CLOSE_DEF_DIST)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   2.300   3.700   4.113   5.300  53.200


options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=CLOSE_DEF_DIST, color=SUCCESS, group=SUCCESS)) + geom_density() +
  xlab("CLOSE_DEF_DIST") + ylab("") + theme_light()

# TOUCH_TIME
library(dplyr)
summary(data$TOUCH_TIME)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   0.900   1.600   2.799   3.700  23.900

options(repr.plot.width=6, repr.plot.height=3.5)

ggplot(data, aes(x=TOUCH_TIME, color=SUCCESS, group=SUCCESS)) + geom_density() +
  xlab("TOUCH_TIME") + ylab("") + theme_light()

data$HOME_TEAM = as.factor(data$HOME_TEAM)
data$AWAY_TEAM = as.factor(data$AWAY_TEAM)
data$LOCATION = as.factor(data$LOCATION)
data$WIN_LOSE = as.factor(data$WIN_LOSE)

model <- glm(SUCCESS ~ LOCATION + SHOT_NUMBER + PERIOD + PLAYER_ID + SHOT_DIST + CLOSE_DEF_DIST,
             data = data, family = "binomial")

summary(model)

##
## Call:
## glm(formula = SUCCESS ~ LOCATION + SHOT_NUMBER + PERIOD + PLAYER_ID +
##     SHOT_DIST + CLOSE_DEF_DIST, family = "binomial", data = data)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.0455  -1.0934  -0.8205   1.1522   2.0879
##
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)     2.395e-01  3.407e-02   7.030 2.07e-12 ***
## LOCATIONH       3.539e-02  1.845e-02   1.918   0.0551 .
## SHOT_NUMBER     3.285e-03  2.617e-03   1.256   0.2093
## PERIOD         -2.494e-02  1.087e-02  -2.294   0.0218 *
## PLAYER_ID      -2.097e-07  1.164e-07  -1.802   0.0716 .
## SHOT_DIST      -6.418e-02  1.308e-03 -49.072  < 2e-16 ***
## CLOSE_DEF_DIST  1.182e-01  4.360e-03  27.102  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 68882  on 49999  degrees of freedom
## Residual deviance: 66245  on 49993  degrees of freedom
## AIC: 66259
##
## Number of Fisher Scoring iterations: 4

The location of the team has only a very slight effect on the result of a shot. The summary shows that LOCATION is not a significant predictor of the result at the 0.05 level (p = 0.0551).
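Logistic-regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which can make the size of each effect easier to read. Here is a small sketch using estimates copied from the summary above. The vector name `est` is only illustrative; with the fitted object available, `exp(coef(model))` would give the same quantities for every term.

```r
# Estimates copied from the coefficient table above (log-odds scale).
est <- c(LOCATIONH      =  3.539e-02,
         SHOT_DIST      = -6.418e-02,
         CLOSE_DEF_DIST =  1.182e-01)

# Odds ratio for a one-unit increase in each predictor,
# holding the other predictors fixed.
odds_ratio <- exp(est)
print(round(odds_ratio, 3))
```

Read this way, each extra unit of SHOT_DIST multiplies the odds of a made shot by roughly 0.94, and each extra unit of CLOSE_DEF_DIST by roughly 1.13. The LOCATIONH ratio is close to 1, in line with its weak effect.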
barplot(table(data$WIN_LOSE,data$HOME_TEAM),beside = T,legend=c("Lose","Win"), col=c("#3C6688", "#45A778"), border="white",las=2,main="Team Analysis")

table(data$WIN_LOSE,data$HOME_TEAM)

##
##       ATL  BKN  BOS  CHA  CHI  CLE  DAL  DEN  DET  GSW  HOU  IND  LAC  LAL  MEM
##   L   868  753  928  849  931  883  850  838  881  743  857  722  901  808  906
##   W   928  771  889  921  918  865  903  896  863  797  821  814  965  699  818
##
##       MIA  MIL  MIN  NOP  NYK  OKC  ORL  PHI  PHX  POR  SAC  SAS  TOR  UTA  WAS
##   L   752  666  816  847  733  737  797  667  897  903 1009  734  893  771  891
##   W   715  709  809  802  737  701  839  791  868  904 1030  763  877  790  966

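Each row of the data is a shot, so these counts are shots rather than matches. SAC has the most W rows for games played at its home venue (1030), followed by WAS (966) and LAC (965).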
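Because each row of the data is a shot, the table above tallies shots, not games. A hedged sketch of counting distinct home games per result instead is given here. It assumes that WIN_LOSE describes the shooter's team and that LOCATION == "H" marks the home side; both are a reading of the column names rather than something the source states. The toy data frame is invented for illustration.

```r
# Toy shot log: invented rows that reuse the data set's column names.
shots <- data.frame(
  GAME_ID   = c(1, 1, 1, 2, 2, 3, 3, 3),
  HOME_TEAM = c("SAC", "SAC", "SAC", "SAC", "SAC", "WAS", "WAS", "WAS"),
  LOCATION  = c("H", "H", "A", "H", "A", "H", "H", "A"),
  WIN_LOSE  = c("W", "W", "L", "L", "W", "W", "W", "L"),
  stringsAsFactors = FALSE
)

# Keep shots taken by the home side, then collapse to one row per game.
home  <- shots[shots$LOCATION == "H", ]
games <- unique(home[, c("GAME_ID", "HOME_TEAM", "WIN_LOSE")])

# Rows: result; columns: home team. Each game is counted once.
home_record <- table(games$WIN_LOSE, games$HOME_TEAM)
print(home_record)
```

Applied to the real data, the same three steps with `data` in place of `shots` would give a per-team home record in games, under the assumptions stated above.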
remodel=glm(WIN_LOSE~LOCATION + SEC_REMAIN +PLAYER_ID ,data = data, family = "binomial")
summary(remodel)

##
## Call:
## glm(formula = WIN_LOSE ~ LOCATION + SEC_REMAIN + PLAYER_ID, family = "binomial",
##     data = data)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.4095  -1.1645   0.9642   1.1070   1.3282
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.673e-02  2.685e-02  -2.113  0.03460 *
## LOCATIONH    5.005e-01  1.806e-02  27.712  < 2e-16 ***
## SEC_REMAIN   1.247e-04  4.360e-05   2.860  0.00424 **
## PLAYER_ID   -1.428e-06  1.144e-07 -12.482  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 69312  on 49999  degrees of freedom
## Residual deviance: 68382  on 49996  degrees of freedom
## AIC: 68390
##
## Number of Fisher Scoring iterations: 4

LOCATION and PLAYER_ID are highly statistically significant predictors of the match result (p < 2e-16), and SEC_REMAIN is also significant (p = 0.00424); all three p-values are below 0.05.
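The p-values in a `glm` summary come from Wald z-tests: each estimate is divided by its standard error and the ratio is compared with a standard normal distribution. A brief sketch reproducing the SEC_REMAIN row from the figures printed in the summary follows; the variable names are illustrative only.

```r
# Estimate and standard error for SEC_REMAIN, copied from the summary above.
estimate  <- 1.247e-04
std_error <- 4.360e-05

# Wald z statistic and its two-sided p-value under a standard normal.
z <- estimate / std_error
p <- 2 * pnorm(-abs(z))

print(round(z, 3))
print(signif(p, 3))
```

The result agrees with the printed z value of 2.860 and p-value of 0.00424 up to the rounding of the copied inputs.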
