Final Project cst383
About:
In this project, we aim to determine the value and effect that a player trade has on a given team. The data covers the 2018-2019 and 2019-2020 NBA seasons and is provided by sports-reference.com.
Goals:
Explore the relation between player effectiveness and team win percentage
Observe any possible trends between the top-rated players and overall team performance
Dataset:
Name:
2018-2019 NBA Team/Opponent Per Game Stats, Player Stats, Misc. Stats
2019-2020 NBA Team/Opponent Per Game Stats, Player Stats, Misc. Stats
Owner:
sports-reference.com
Source Link:
Acquired:
import numpy as np
import pandas as pd
import io
import matplotlib.pyplot as plt
from scipy.stats import zscore
from matplotlib import rcParams
import seaborn as sns
https://colab.research.google.com/drive/1E8m0GjmOIZEgLe82p0u0F5Dwf5JB7n52#scrollTo=P5XzSLVg99Cm&uniqifier=1&printMode=true 1/16
2/20/2021 Final_Project_CST383.ipynb - Colaboratory
DATA PREPARATION:
Load each dataset into its own DataFrame, which gives us a starting point for exploring and manipulating the data in later steps. Each DataFrame is then printed to confirm that it was loaded and stored properly.
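The cell that actually reads the files is not shown in this export. Below is a minimal sketch, assuming each table was exported from sports-reference.com as a CSV file. The file names in the comment are hypothetical placeholders, and a tiny in-memory CSV with dummy values stands in for a real file so the example runs on its own. (The notebook's `io` import hints that files may have been uploaded in Colab and wrapped in `io.BytesIO`, but that is a guess.)

```python
import io
import pandas as pd

def load_tables(sources):
    """Read each CSV source (a path or file-like object) into its own DataFrame.

    `sources` maps a descriptive key (e.g. "18_19_team") to a CSV source.
    """
    return {key: pd.read_csv(src) for key, src in sources.items()}

# Hypothetical file names; replace with the actual exported CSVs, e.g.
# sources = {"18_19_team": "nba_18_19_team.csv", "19_20_team": "nba_19_20_team.csv"}
# Here a tiny in-memory CSV with dummy values stands in for a real file.
dummy_csv = io.StringIO("Rk,Team,G,PTS\n1,Team A,82,115.0\n2,Team B,82,110.5\n")
tables = load_tables({"18_19_team": dummy_csv})
df_18_19_team = tables["18_19_team"]
print(df_18_19_team)
```

Keeping all tables in one dictionary makes it easy to loop over them later instead of repeating the same code for each season.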
# Display DataFrames
print("NBA 2018-2019 Team Game Stats:\n\n",df_18_19_team)
print("NBA 2019-2020 Team Game Stats:\n\n",df_19_20_team)
print("NBA 2018-2019 Opponent Game Stats:\n\n",df_18_19_opponent)
print("NBA 2019-2020 Opponent Game Stats:\n\n",df_19_20_opponent)
print("NBA 2018-2019 Player Stats:\n\n",df_18_19_player)
print("NBA 2019-2020 Player Stats:\n\n",df_19_20_player)
Display each DataFrame's info() summary to get a closer look at the data type and the number of non-null entries in each column.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rk 30 non-null float64
1 Team 31 non-null object
2 G 31 non-null int64
3 MP 31 non-null float64
4 FG 31 non-null float64
5 FGA 31 non-null float64
6 FG% 31 non-null float64
7 3P 31 non-null float64
8 3PA 31 non-null float64
9 3P% 31 non-null float64
10 2P 31 non-null float64
11 2PA 31 non-null float64
12 2P% 31 non-null float64
13 FT 31 non-null float64
14 FTA 31 non-null float64
15 FT% 31 non-null float64
16 ORB 31 non-null float64
17 DRB 31 non-null float64
18 TRB 31 non-null float64
19 AST 31 non-null float64
20 STL 31 non-null float64
21 BLK 31 non-null float64
22 TOV 31 non-null float64
23 PF 31 non-null float64
24 PTS 31 non-null float64
dtypes: float64(23), int64(1), object(1)
memory usage: 6.2+ KB
NBA 2018-2019 Team Game Stats:
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rk 30 non-null float64
1 Team 31 non-null object
2 G 31 non-null int64
3 MP 31 non-null float64
4 FG 31 non-null float64
5 FGA 31 non-null float64
6 FG% 31 non-null float64
7 3P 31 non-null float64
8 3PA 31 non-null float64
9 3P% 31 non-null float64
10 2P 31 non-null float64
11 2PA 31 non-null float64
12 2P% 31 non-null float64
13 FT 31 non-null float64
14 FTA 31 non-null float64
15 FT% 31 non-null float64
16 ORB 31 non-null float64
17 DRB 31 non-null float64
18 TRB 31 non-null float64
19 AST 31 non-null float64
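The stray "None" under "NBA 2018-2019 Team Game Stats:" in the output above is most likely the result of passing df.info() to print(). info() writes its summary straight to standard output and returns None, so the summary appears first and print() then emits the title followed by None. A sketch that prints the title first and calls info() on its own line; the DataFrame is a small dummy stand-in and `show_info` is a hypothetical helper name:

```python
import io
import pandas as pd

def show_info(title, df):
    """Print a title, then the DataFrame's info() summary (info() itself returns None)."""
    print(title)
    df.info()

df = pd.DataFrame({"Rk": [1.0, 2.0, None], "Team": ["Team A", "Team B", "League Average"]})
show_info("NBA 2018-2019 Team Game Stats:", df)

# info() can also write into a buffer, which captures the summary as a string
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
```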
Clean up the data by removing null values and by identifying duplicate entries, then eliminating them.
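In the info() output above, every column has 31 non-null values except Rk, which has 30. The one unranked row is likely the "League Average" row that sports-reference.com appends to its team tables (an assumption worth checking against the raw CSV). A minimal sketch that drops rows with a missing rank, drops exact duplicate rows, and casts Rk back to an integer (it was read as float64 only because of the missing value). `clean_team_table` is a hypothetical name and the data is dummy:

```python
import numpy as np
import pandas as pd

def clean_team_table(df):
    """Drop rows with a missing rank (e.g. a league-average summary row),
    drop exact duplicate rows, and cast Rk back to int."""
    cleaned = df.dropna(subset=["Rk"]).drop_duplicates().copy()
    cleaned["Rk"] = cleaned["Rk"].astype(int)
    return cleaned.reset_index(drop=True)

# Dummy stand-in with the same problem: one unranked summary row
df = pd.DataFrame({
    "Rk": [1.0, 2.0, np.nan],
    "Team": ["Team A", "Team B", "League Average"],
    "PTS": [115.0, 110.0, 112.5],
})
clean = clean_team_table(df)
print(clean)
```

Restricting dropna() to Rk is a deliberate choice: a blanket dropna() on the player tables could also discard real players whose shooting percentages are blank (for example, 3P% for a player with no three-point attempts).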
# Plot how many columns of each data type every team/opponent DataFrame has
frames = {
    "18-19 Team": df_18_19_team,
    "19-20 Team": df_19_20_team,
    "18-19 Opponent": df_18_19_opponent,
    "19-20 Opponent": df_19_20_opponent,
}
for label, df in frames.items():
    df.dtypes.value_counts().plot.bar(color="green")
    plt.title(f"{label} Number of columns by data types")
    plt.xlabel("Datatypes")
    plt.ylabel("Number of columns")
    plt.xticks(rotation=-45)
    plt.show()  # plt.show() does not take the Axes object as an argument
NBA 2019-2020 Team Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 71.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
std 9.0 3.0 1.0 1.0 2.0 0.0 ... 2.0 1.0 1.0 1.0 1.0 4.0
min 1.0 64.0 241.0 37.0 84.0 0.0 ... 21.0 6.0 3.0 13.0 18.0 103.0
25% 8.0 69.0 241.0 40.0 88.0 0.0 ... 23.0 7.0 4.0 14.0 20.0 110.0
50% 16.0 72.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
75% 23.0 73.0 242.0 42.0 90.0 0.0 ... 26.0 8.0 5.0 15.0 22.0 114.0
max 30.0 75.0 243.0 43.0 92.0 0.0 ... 27.0 10.0 7.0 16.0 23.0 119.0
[8 rows x 24 columns]
NBA 2018-2019 Opponent Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 111.0
std 9.0 0.0 1.0 1.0 2.0 0.0 ... 1.0 1.0 1.0 1.0 1.0 4.0
min 1.0 82.0 240.0 38.0 83.0 0.0 ... 22.0 7.0 4.0 12.0 19.0 105.0
25% 8.0 82.0 241.0 40.0 88.0 0.0 ... 24.0 7.0 5.0 13.0 20.0 108.0
50% 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 111.0
75% 23.0 82.0 242.0 42.0 91.0 0.0 ... 26.0 8.0 5.0 15.0 22.0 114.0
max 30.0 82.0 244.0 43.0 94.0 0.0 ... 27.0 10.0 6.0 17.0 24.0 119.0
[8 rows x 24 columns]
NBA 2019-2020 Opponent Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 71.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
std 9.0 3.0 1.0 2.0 2.0 0.0 ... 1.0 1.0 1.0 1.0 1.0 4.0
min 1.0 64.0 241.0 38.0 82.0 0.0 ... 22.0 7.0 4.0 12.0 19.0 106.0
25% 8.0 69.0 241.0 40.0 87.0 0.0 ... 24.0 7.0 4.0 14.0 20.0 109.0
50% 16.0 72.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 112.0
75% 23.0 73.0 242.0 42.0 90.0 0.0 ... 25.0 8.0 5.0 15.0 21.0 115.0
max 30.0 75.0 243.0 44.0 94.0 0.0 ... 27.0 9.0 6.0 18.0 23.0 120.0
[8 rows x 24 columns]
NBA 2018-2019 Player Stats:
Rk Age G GS MP ... STL BLK TOV PF PTS
count 708.0 708.0 708.0 708.0 708.0 ... 708.0 708.0 708.0 708.0 708.0
mean 268.0 26.0 43.0 20.0 19.0 ... 1.0 0.0 1.0 2.0 8.0
std 151.0 4.0 26.0 26.0 9.0 ... 0.0 0.0 1.0 1.0 6.0
min 1.0 19.0 1.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0
25% 138.0 23.0 19.0 0.0 12.0 ... 0.0 0.0 0.0 1.0 4.0
50% 270.0 26.0 44.0 6.0 19.0 ... 0.0 0.0 1.0 2.0 7.0
75% 398.0 29.0 68.0 32.0 27.0 ... 1.0 0.0 1.0 2.0 12.0
max 530.0 42.0 82.0 82.0 37.0 ... 2.0 3.0 5.0 4.0 36.0
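Every value in the describe() tables above is a whole number, and FG% shows 0.0 in every row. This suggests the summaries were rounded to zero decimals (for example with .round()), which turns a fraction such as a 0.46 field-goal percentage into 0. The other percentage columns hidden behind "..." are presumably affected the same way. A sketch that rounds percentage columns to more decimals than the rest; `summarize` and its default precisions are hypothetical choices and the data is dummy:

```python
import pandas as pd

def summarize(df, pct_decimals=3, other_decimals=1):
    """Return describe() output, rounding '%' columns to more decimals than the rest."""
    summary = df.describe()
    decimals = {
        col: (pct_decimals if col.endswith("%") else other_decimals)
        for col in summary.columns
    }
    return summary.round(decimals)  # DataFrame.round accepts a per-column dict

# Dummy values only
df = pd.DataFrame({"FG": [41.2, 40.8], "FG%": [0.460, 0.450]})
print(summarize(df))
```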
Investigate further to determine which teams and players are outliers relative to the league average.
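One common way to flag outliers relative to the league average is a z-score, using scipy.stats.zscore, which the notebook already imports. This is a sketch, not necessarily the method used here. It assumes a cutoff of |z| > 2 (a rule of thumb chosen for illustration) and a "Team" label column; `flag_outliers` is a hypothetical name and the data is dummy:

```python
import pandas as pd
from scipy.stats import zscore

def flag_outliers(df, column, label="Team", threshold=2.0):
    """Return rows whose `column` value lies more than `threshold` standard
    deviations from the column mean, together with their z-scores."""
    z = pd.Series(zscore(df[column]), index=df.index)
    mask = z.abs() > threshold
    return df.loc[mask, [label, column]].assign(z=z[mask])

# Dummy values: ten teams near 110 points and one far above
teams = pd.DataFrame({
    "Team": [f"Team {i}" for i in range(11)],
    "PTS": [110, 111, 109, 110, 112, 108, 110, 111, 109, 110, 125],
})
print(flag_outliers(teams, "PTS"))
```

The same call works on a player table by passing label="Player" and a per-game column such as PTS.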
Number of Entries
At first glance, each entry seems to represent a unique value. This is true for the team and opponent game stats, but not for the player stats: on closer inspection, several entries share the same player name.
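Counts like the ones below can be produced with value_counts(). The exact calls are not shown in the export, so this is a sketch on dummy data. The first count gives the number of rows per name. Counting those counts gives the summary tables of the form "1 444 / 3 80 / 4 6", meaning 444 players with one row, 80 with three, and 6 with four:

```python
import pandas as pd

# Dummy player table: "Player B" occupies three rows
players = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player B", "Player B", "Player C"],
})

# How many rows does each player occupy? (most frequent first)
rows_per_player = players["Player"].value_counts()
print(rows_per_player.head(3))

# How many players occupy 1 row, 3 rows, ...?
players_per_count = rows_per_player.value_counts().sort_index()
print(players_per_count)
```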
Phoenix Suns 1
Oklahoma City Thunder* 1
Brooklyn Nets* 1
Name: Team, dtype: int64
Phoenix Suns 1
Toronto Raptors* 1
Brooklyn Nets* 1
Name: Team, dtype: int64
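Several team names above carry a trailing asterisk ("Brooklyn Nets*", "Toronto Raptors*"). On sports-reference.com an asterisk next to a team name usually marks a playoff team; treat that as an assumption to verify. Left in place, it would make the same franchise appear under two spellings when tables from different seasons are joined. A sketch that strips the marker and keeps it as a boolean column (the column name "Playoffs" and the function name are hypothetical):

```python
import pandas as pd

def split_playoff_marker(df, column="Team"):
    """Strip a trailing '*' from team names and record its presence
    in a hypothetical boolean 'Playoffs' column."""
    out = df.copy()
    out["Playoffs"] = out[column].str.endswith("*")
    out[column] = out[column].str.rstrip("*")
    return out

teams = pd.DataFrame({"Team": ["Phoenix Suns", "Toronto Raptors*", "Brooklyn Nets*"]})
print(split_playoff_marker(teams))
```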
https://colab.research.google.com/drive/1E8m0GjmOIZEgLe82p0u0F5Dwf5JB7n52#scrollTo=P5XzSLVg99Cm&uniqifier=1&printMode=true 12/16
2/20/2021 Final_Project_CST383.ipynb - Colaboratory
30.0 1
29.0 1
2.0 1
Name: Rk, dtype: int64
30.0 1
29.0 1
2.0 1
Name: Rk, dtype: int64
Jason Smith\smithja02 4
Isaiah Canaan\canaais01 4
Andrew Harrison\harrian01 4
Name: Player, dtype: int64
Anthony Tolliver\tollian01 4
Jordan McRae\mcraejo01 4
Troy Daniels\danietr01 3
Name: Player, dtype: int64
1 444
3 80
4 6
Name: Player, dtype: int64
1 469
3 58
4 2
Name: Player, dtype: int64
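The row counts above are 1, 3 and 4, never 2. That pattern fits how sports-reference.com lists players who changed teams mid-season: one combined season row (team code "TOT") plus one row per team. A player with two teams therefore gets 3 rows, and a player with three teams gets 4. The 2018-2019 figures are consistent with this. There are 444 + 80 + 6 = 530 distinct players, matching the maximum Rk of 530. There are 444·1 + 80·3 + 6·4 = 708 rows, matching the player row count of 708.

A sketch that keeps one row per player, preferring the TOT row. It assumes the team column is named "Tm" (check your CSV header) and that names have the "Name\playerid" form shown above. `one_row_per_player`, "Name" and "PlayerID" are hypothetical names, and the data is dummy:

```python
import pandas as pd

def one_row_per_player(df, team_col="Tm"):
    """Keep a single row per player: the season-total 'TOT' row when present,
    otherwise the player's only row. Also split 'Name\\id' into two columns."""
    out = df.copy()
    # Sort so that each player's TOT row comes first, then keep the first occurrence
    out["_is_total"] = out[team_col].eq("TOT")
    out = (out.sort_values("_is_total", ascending=False, kind="stable")
              .drop_duplicates(subset="Player", keep="first")
              .drop(columns="_is_total")
              .sort_index())
    # e.g. "Jason Smith\smithja02" -> Name "Jason Smith", PlayerID "smithja02"
    parts = out["Player"].str.split("\\", n=1, expand=True)
    out["Name"], out["PlayerID"] = parts[0], parts[1]
    return out

# Dummy rows: Player B has a TOT row plus two team rows
df = pd.DataFrame({
    "Player": ["Player A\\playera01", "Player B\\playerb01", "Player B\\playerb01",
               "Player B\\playerb01", "Player C\\playerc01"],
    "Tm": ["AAA", "TOT", "BBB", "CCC", "DDD"],
    "PTS": [10.0, 8.0, 6.0, 9.5, 4.0],
})
dedup = one_row_per_player(df)
print(dedup)
```

Keeping the TOT row preserves a player's full-season averages. If the analysis instead needs a player's contribution to one specific team, as a trade evaluation might, keep the team rows and drop TOT instead.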
# For each season, count how many rows each player occupies, then plot
# how many players fall into each row count (1, 3, 4, ...)
for season, df in [("18-19", df_18_19_player), ("19-20", df_19_20_player)]:
    reports = df['Player'].value_counts()
    sns.countplot(x=reports)
    plt.title(f"{season} Number of duplicated players")
    plt.xlabel("Entries per player")
    plt.ylabel("Number of players")
    plt.show()
Data Cleaning:
Predict the value of a trade or player acquisition using team point totals and player point averages.
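The export does not show how the prediction is made. As a deliberately naive baseline, and not the author's method, one could estimate a team's scoring after a trade by subtracting the outgoing players' points per game and adding the incoming players'. This ignores how minutes, pace and usage shift after a trade, so it is only a reference point to compare a fitted model against. The function and argument names are hypothetical:

```python
def naive_trade_points(team_ppg, outgoing_ppg, incoming_ppg):
    """Naive estimate of a team's points per game after a trade:
    team PPG minus outgoing players' PPG plus incoming players' PPG."""
    return team_ppg - sum(outgoing_ppg) + sum(incoming_ppg)

# Dummy numbers: a 110-PPG team trades a 12-PPG player for a 20-PPG player
estimate = naive_trade_points(110.0, outgoing_ppg=[12.0], incoming_ppg=[20.0])
print(estimate)  # 118.0
```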