
2/20/2021 Final_Project_CST383.ipynb - Colaboratory

NBA TRADE PREDICTOR

CST 383 - FINAL PROJECT


BY: Deen Altawil & Jordan Cruz

About:
In this project we aim to determine the value and effect a player trade has on a desired team. The
data is collected from the 2018-2019 and 2019-2020 NBA seasons and provided by sports-reference.com.

Goals:
Explore the relation between player effectiveness and team win percentage
Observe any possible trends between the top rated players and overall team performance

Dataset:
Name:

2018-2019 NBA Team/Opponent Per Game Stats, Player Stats, Misc. Stats
2019-2020 NBA Team/Opponent Per Game Stats, Player Stats, Misc. Stats

Owner:

sports-reference.com

Source Link:

2019-2020 Season, 2018-2019 Season

Acquired:

February 10, 2021 from basketball-reference.com

import numpy as np
import pandas as pd
import io
import matplotlib.pyplot as plt
from scipy.stats import zscore
from matplotlib import rcParams
import seaborn as sns

https://colab.research.google.com/drive/1E8m0GjmOIZEgLe82p0u0F5Dwf5JB7n52#scrollTo=P5XzSLVg99Cm&uniqifier=1&printMode=true 1/16

DATA PREPARATION:

Upload dataset CSV files from drive.

from google.colab import files


uploaded = files.upload()

Choose Files 8 files


NBA_2018_209_Misc_Stat.csv(application/vnd.ms-excel) - 5446 bytes, last modified: 2/19/2021 -
100% done
NBA_2018_2019_Opponent_Per_Game_Stat.csv(application/vnd.ms-excel) - 4245 bytes, last
modified: 2/19/2021 - 100% done
NBA_2018_2019_Player_Stats.csv(application/vnd.ms-excel) - 98857 bytes, last modified:
2/19/2021 - 100% done
NBA_2018_2019_Team_Per_Game_Stats.csv(application/vnd.ms-excel) - 4227 bytes, last
modified: 2/19/2021 - 100% done
NBA_2019_2020_Misc_Stat.csv(application/vnd.ms-excel) - 5458 bytes, last modified: 2/19/2021
- 100% done
NBA_2019_2020_Opponent_Per_Game_Stat.csv(application/vnd.ms-excel) - 4238 bytes, last
modified: 2/19/2021 - 100% done
NBA_2019_2020_Player_Stats.csv(application/vnd.ms-excel) - 91422 bytes, last modified:
2/19/2021 - 100% done
NBA_2019_2020_Team_Per_Game_Stat.csv(application/vnd.ms-excel) - 4255 bytes, last
modified: 2/19/2021 - 100% done
Saving NBA_2018_209_Misc_Stat.csv to NBA_2018_209_Misc_Stat.csv

Load each dataset into its own DataFrame, which gives us a starting point for exploring and
manipulating the data in later operations. Each DataFrame is then printed to confirm that it was
loaded and stored properly.

# Load CSV data files into DataFrames
df_18_19_team = pd.read_csv(io.BytesIO(uploaded['NBA_2018_2019_Team_Per_Game_Stats.csv']))
df_19_20_team = pd.read_csv(io.BytesIO(uploaded['NBA_2019_2020_Team_Per_Game_Stat.csv']))
df_18_19_opponent = pd.read_csv(io.BytesIO(uploaded['NBA_2018_2019_Opponent_Per_Game_Stat.csv']))
df_19_20_opponent = pd.read_csv(io.BytesIO(uploaded['NBA_2019_2020_Opponent_Per_Game_Stat.csv']))
df_18_19_player = pd.read_csv(io.BytesIO(uploaded['NBA_2018_2019_Player_Stats.csv']))
df_19_20_player = pd.read_csv(io.BytesIO(uploaded['NBA_2019_2020_Player_Stats.csv']))

# Display DataFrames
print("NBA 2018-2019 Team Game Stats:\n\n",df_18_19_team)
print("NBA 2019-2020 Team Game Stats:\n\n",df_19_20_team)
print("NBA 2018-2019 Opponent Game Stats:\n\n",df_18_19_opponent)
print("NBA 2019-2020 Opponent Game Stats:\n\n",df_19_20_opponent)
print("NBA 2018-2019 Player Stats:\n\n",df_18_19_player)
print("NBA 2019-2020 Player Stats:\n\n",df_19_20_player)

NBA 2018-2019 Team Game Stats:



Rk Team G MP FG ... STL BLK TOV PF PTS


0 1.0 Milwaukee Bucks* 82 241.2 43.4 ... 7.5 5.9 13.9 19.6 118.1
1 2.0 Golden State Warriors* 82 241.5 44.0 ... 7.6 6.4 14.3 21.4 117.7
2 3.0 New Orleans Pelicans 82 240.9 43.7 ... 7.4 5.4 14.8 21.1 115.4
3 4.0 Philadelphia 76ers* 82 241.5 41.5 ... 7.4 5.3 14.9 21.3 115.2
4 5.0 Los Angeles Clippers* 82 241.8 41.3 ... 6.8 4.7 14.5 23.3 115.1
5 6.0 Portland Trail Blazers* 82 242.1 42.3 ... 6.7 5.0 13.8 20.4 114.7
6 7.0 Oklahoma City Thunder* 82 242.1 42.6 ... 9.3 5.2 14.0 22.4 114.5
7 8.0 Toronto Raptors* 82 242.4 42.2 ... 8.3 5.3 14.0 21.0 114.4
8 9.0 Sacramento Kings 82 240.6 43.2 ... 8.3 4.4 13.4 21.4 114.2
9 10.0 Washington Wizards 82 243.0 42.1 ... 8.3 4.6 14.1 20.7 114.0
10 11.0 Houston Rockets* 82 241.8 39.2 ... 8.5 4.9 13.3 22.0 113.9
11 12.0 Atlanta Hawks 82 242.1 41.4 ... 8.2 5.1 17.0 23.6 113.3
12 13.0 Minnesota Timberwolves 82 241.8 41.6 ... 8.3 5.0 13.1 20.3 112.5
13 14.0 Boston Celtics* 82 241.2 42.1 ... 8.6 5.3 12.8 20.4 112.4
14 15.0 Brooklyn Nets* 82 243.7 40.3 ... 6.6 4.1 15.1 21.5 112.2
15 16.0 Los Angeles Lakers 82 241.2 42.6 ... 7.5 5.4 15.7 20.7 111.8
16 17.0 Utah Jazz* 82 240.9 40.4 ... 8.1 5.9 15.1 21.1 111.7
17 18.0 San Antonio Spurs* 82 241.5 42.3 ... 6.1 4.7 12.1 18.1 111.7
18 19.0 Charlotte Hornets 82 241.8 40.2 ... 7.2 4.9 12.2 18.9 110.7
19 20.0 Denver Nuggets* 82 240.6 41.9 ... 7.7 4.4 13.4 20.0 110.7
20 21.0 Dallas Mavericks 82 241.2 38.8 ... 6.5 4.3 14.2 20.1 108.9
21 22.0 Indiana Pacers* 82 240.3 41.3 ... 8.7 4.9 13.7 19.4 108.0
22 23.0 Phoenix Suns 82 242.4 40.1 ... 9.0 5.1 15.6 23.6 107.5
23 24.0 Orlando Magic* 82 241.2 40.4 ... 6.6 5.4 13.2 18.6 107.3
24 25.0 Detroit Pistons* 82 242.1 38.8 ... 6.9 4.0 13.8 22.1 107.0
25 26.0 Miami Heat 82 240.6 39.6 ... 7.6 5.5 14.7 20.9 105.7
26 27.0 Chicago Bulls 82 242.7 39.8 ... 7.4 4.3 14.1 20.3 104.9
27 28.0 New York Knicks 82 241.2 38.2 ... 6.8 5.1 14.0 20.9 104.6
28 29.0 Cleveland Cavaliers 82 240.9 38.9 ... 6.5 2.4 13.5 20.0 104.5
29 30.0 Memphis Grizzlies 82 242.4 38.0 ... 8.3 5.5 14.0 22.0 103.5
30 NaN League Average 82 241.6 41.1 ... 7.6 5.0 14.1 20.9 111.2

[31 rows x 25 columns]


NBA 2019-2020 Team Game Stats:

Rk Team G MP ... BLK TOV PF PTS


0 1.0 Dallas Mavericks* 75 242.3 ... 4.8 12.7 19.5 117.0
1 2.0 Milwaukee Bucks* 73 241.0 ... 5.9 15.1 19.6 118.7
2 3.0 Portland Trail Blazers* 74 241.0 ... 6.1 12.8 21.7 115.0
3 4.0 Houston Rockets* 72 241.4 ... 5.2 14.7 21.8 117.8
4 5.0 Los Angeles Clippers* 72 241.4 ... 4.7 14.6 22.1 116.3
5 6.0 New Orleans Pelicans 72 242.1 ... 5.0 16.4 21.2 115.8
6 7.0 Phoenix Suns 73 241.0 ... 4.0 14.8 22.0 113.6
7 8.0 Washington Wizards 72 241.0 ... 4.3 14.2 22.7 114.4
8 9.0 Memphis Grizzlies 73 240.7 ... 5.5 15.2 21.2 112.6
9 10.0 Boston Celtics* 72 242.1 ... 5.6 13.8 21.6 113.7
10 11.0 Miami Heat* 73 243.1 ... 4.5 14.9 20.6 112.0
11 12.0 Denver Nuggets* 73 243.1 ... 4.6 13.8 20.3 111.3
12 13.0 Toronto Raptors* 72 241.4 ... 5.0 14.8 21.7 112.8
13 14.0 San Antonio Spurs 71 242.5 ... 5.5 12.6 19.4 114.1
14 15.0 Philadelphia 76ers* 73 241.0 ... 5.3 14.2 20.9 110.7
15 16.0 Los Angeles Lakers* 71 240.7 ... 6.6 15.2 20.7 113.4
16 17.0 Brooklyn Nets* 72 242.8 ... 4.5 15.3 21.0 111.8
17 18.0 Utah Jazz* 72 241.0 ... 4.1 15.1 20.4 111.3
18 19.0 Indiana Pacers* 73 241.4 ... 5.2 13.2 19.8 109.4


19 20.0 Oklahoma City Thunder* 72 242.1 ... 4.9 13.7 19.3 110.4

Display each DataFrame's info for a more detailed look at the data type and the number of
non-null entries in each column.

# DataFrame.info() prints its report itself and returns None, so each label
# below appears after its report, followed by "None".
print("NBA 2018-2019 Team Game Stats:\n",df_18_19_team.info())
print("NBA 2019-2020 Team Game Stats:\n",df_19_20_team.info())
print("NBA 2018-2019 Opponent Game Stats:\n",df_18_19_opponent.info())
print("NBA 2019-2020 Opponent Game Stats:\n",df_19_20_opponent.info())
print("NBA 2018-2019 Player Stats:\n",df_18_19_player.info())
print("NBA 2019-2020 Player Stats:\n",df_19_20_player.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rk 30 non-null float64
1 Team 31 non-null object
2 G 31 non-null int64
3 MP 31 non-null float64
4 FG 31 non-null float64
5 FGA 31 non-null float64
6 FG% 31 non-null float64
7 3P 31 non-null float64
8 3PA 31 non-null float64
9 3P% 31 non-null float64
10 2P 31 non-null float64
11 2PA 31 non-null float64
12 2P% 31 non-null float64
13 FT 31 non-null float64
14 FTA 31 non-null float64
15 FT% 31 non-null float64
16 ORB 31 non-null float64
17 DRB 31 non-null float64
18 TRB 31 non-null float64
19 AST 31 non-null float64
20 STL 31 non-null float64
21 BLK 31 non-null float64
22 TOV 31 non-null float64
23 PF 31 non-null float64
24 PTS 31 non-null float64
dtypes: float64(23), int64(1), object(1)
memory usage: 6.2+ KB
NBA 2018-2019 Team Game Stats:
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31 entries, 0 to 30
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rk 30 non-null float64
1 Team 31 non-null object
2 G 31 non-null int64
3 MP 31 non-null float64

4 FG 31 non-null float64
5 FGA 31 non-null float64
6 FG% 31 non-null float64
7 3P 31 non-null float64
8 3PA 31 non-null float64
9 3P% 31 non-null float64
10 2P 31 non-null float64
11 2PA 31 non-null float64
12 2P% 31 non-null float64
13 FT 31 non-null float64
14 FTA 31 non-null float64
15 FT% 31 non-null float64
16 ORB 31 non-null float64
17 DRB 31 non-null float64
18 TRB 31 non-null float64
19 AST 31 non-null float64

Clean up the data by eliminating all null values and by checking for duplicate entries, then
eliminating them.

# Clean up null

# Check for duplicate values
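The two placeholder comments above are left empty in the notebook. A minimal sketch of what such a step could look like, assuming the only null value is the missing `Rk` on the "League Average" row (as the `info()` output suggests). The function name `clean_team_table` and the demo values are made up for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def clean_team_table(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without a rank, then drop exact duplicate rows."""
    cleaned = df.dropna(subset=["Rk"])   # removes the summary row whose Rk is NaN
    cleaned = cleaned.drop_duplicates()  # removes rows identical in every column
    return cleaned.reset_index(drop=True)

# Made-up table with the same kinds of problems as the team data
demo = pd.DataFrame({
    "Rk": [1.0, 2.0, 2.0, np.nan],
    "Team": ["Team A", "Team B", "Team B", "League Average"],
    "PTS": [118.1, 117.7, 117.7, 111.2],
})
print(demo.isna().sum())        # null count per column
print(demo.duplicated().sum())  # number of fully duplicated rows
cleaned_demo = clean_team_table(demo)
print(cleaned_demo)
```

Dropping on `Rk` only, rather than calling `dropna()` on the whole table, keeps rows that might have a null in an unrelated column.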

Bar plots showing how many columns of each data type each dataset contains.

# plt.show() takes no plot argument; it displays the current figure
team18_19_barplot = df_18_19_team.dtypes.value_counts().plot.bar(color="green")
plt.title("18-19 Team Number of columns by data types")
plt.xlabel("Datatypes")
plt.ylabel("Number of columns")
plt.xticks(rotation=-45)
plt.show()

team19_20_barplot = df_19_20_team.dtypes.value_counts().plot.bar(color="green")
plt.title("19-20 Team Number of columns by data types")
plt.xlabel("Datatypes")
plt.ylabel("Number of columns")
plt.xticks(rotation=-45)
plt.show()

opponent18_19_barplot = df_18_19_opponent.dtypes.value_counts().plot.bar(color="green")
plt.title("18-19 Opponent Number of columns by data types")
plt.xlabel("Datatypes")
plt.ylabel("Number of columns")
plt.xticks(rotation=-45)
plt.show()

opponent19_20_barplot = df_19_20_opponent.dtypes.value_counts().plot.bar(color="green")
plt.title("19-20 Opponent Number of columns by data types")
plt.xlabel("Datatypes")
plt.ylabel("Number of columns")
plt.xticks(rotation=-45)
plt.show()


print("NBA 2018-2019 Team Game Stats:\n",df_18_19_team.describe().round())


print("NBA 2019-2020 Team Game Stats:\n",df_19_20_team.describe().round())
print("NBA 2018-2019 Opponent Game Stats:\n",df_18_19_opponent.describe().round())
print("NBA 2019-2020 Opponent Game Stats:\n",df_19_20_opponent.describe().round())
print("NBA 2018-2019 Player Stats:\n",df_18_19_player.describe().round())
print("NBA 2019-2020 Player Stats:\n",df_19_20_player.describe().round())

NBA 2018-2019 Team Game Stats:


Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 111.0
std 9.0 0.0 1.0 2.0 2.0 0.0 ... 2.0 1.0 1.0 1.0 1.0 4.0
min 1.0 82.0 240.0 38.0 84.0 0.0 ... 20.0 6.0 2.0 12.0 18.0 104.0
25% 8.0 82.0 241.0 40.0 88.0 0.0 ... 23.0 7.0 5.0 13.0 20.0 108.0
50% 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 112.0
75% 23.0 82.0 242.0 42.0 90.0 0.0 ... 26.0 8.0 5.0 15.0 21.0 114.0
max 30.0 82.0 244.0 44.0 94.0 0.0 ... 29.0 9.0 6.0 17.0 24.0 118.0

[8 rows x 24 columns]
NBA 2019-2020 Team Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 71.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
std 9.0 3.0 1.0 1.0 2.0 0.0 ... 2.0 1.0 1.0 1.0 1.0 4.0
min 1.0 64.0 241.0 37.0 84.0 0.0 ... 21.0 6.0 3.0 13.0 18.0 103.0
25% 8.0 69.0 241.0 40.0 88.0 0.0 ... 23.0 7.0 4.0 14.0 20.0 110.0
50% 16.0 72.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
75% 23.0 73.0 242.0 42.0 90.0 0.0 ... 26.0 8.0 5.0 15.0 22.0 114.0
max 30.0 75.0 243.0 43.0 92.0 0.0 ... 27.0 10.0 7.0 16.0 23.0 119.0

[8 rows x 24 columns]
NBA 2018-2019 Opponent Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 111.0
std 9.0 0.0 1.0 1.0 2.0 0.0 ... 1.0 1.0 1.0 1.0 1.0 4.0


min 1.0 82.0 240.0 38.0 83.0 0.0 ... 22.0 7.0 4.0 12.0 19.0 105.0
25% 8.0 82.0 241.0 40.0 88.0 0.0 ... 24.0 7.0 5.0 13.0 20.0 108.0
50% 16.0 82.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 111.0
75% 23.0 82.0 242.0 42.0 91.0 0.0 ... 26.0 8.0 5.0 15.0 22.0 114.0
max 30.0 82.0 244.0 43.0 94.0 0.0 ... 27.0 10.0 6.0 17.0 24.0 119.0

[8 rows x 24 columns]
NBA 2019-2020 Opponent Game Stats:
Rk G MP FG FGA FG% ... AST STL BLK TOV PF PTS
count 30.0 31.0 31.0 31.0 31.0 31.0 ... 31.0 31.0 31.0 31.0 31.0 31.0
mean 16.0 71.0 242.0 41.0 89.0 0.0 ... 24.0 8.0 5.0 15.0 21.0 112.0
std 9.0 3.0 1.0 2.0 2.0 0.0 ... 1.0 1.0 1.0 1.0 1.0 4.0
min 1.0 64.0 241.0 38.0 82.0 0.0 ... 22.0 7.0 4.0 12.0 19.0 106.0
25% 8.0 69.0 241.0 40.0 87.0 0.0 ... 24.0 7.0 4.0 14.0 20.0 109.0
50% 16.0 72.0 242.0 41.0 89.0 0.0 ... 25.0 8.0 5.0 14.0 21.0 112.0
75% 23.0 73.0 242.0 42.0 90.0 0.0 ... 25.0 8.0 5.0 15.0 21.0 115.0
max 30.0 75.0 243.0 44.0 94.0 0.0 ... 27.0 9.0 6.0 18.0 23.0 120.0

[8 rows x 24 columns]
NBA 2018-2019 Player Stats:
Rk Age G GS MP ... STL BLK TOV PF PTS
count 708.0 708.0 708.0 708.0 708.0 ... 708.0 708.0 708.0 708.0 708.0
mean 268.0 26.0 43.0 20.0 19.0 ... 1.0 0.0 1.0 2.0 8.0
std 151.0 4.0 26.0 26.0 9.0 ... 0.0 0.0 1.0 1.0 6.0
min 1.0 19.0 1.0 0.0 1.0 ... 0.0 0.0 0.0 0.0 0.0
25% 138.0 23.0 19.0 0.0 12.0 ... 0.0 0.0 0.0 1.0 4.0
50% 270.0 26.0 44.0 6.0 19.0 ... 0.0 0.0 1.0 2.0 7.0
75% 398.0 29.0 68.0 32.0 27.0 ... 1.0 0.0 1.0 2.0 12.0
max 530.0 42.0 82.0 82.0 37.0 ... 2.0 3.0 5.0 4.0 36.0

Further investigation to determine which teams and players are outliers relative to the league average.

# Parenthesize each comparison: | binds more tightly than > and <
data = df_18_19_team[(df_18_19_team['PTS'] > 119) | (df_18_19_team['PTS'] < 105)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Team'])
plt.title("Points scored per team 2018-2019")
plt.show()

# Parenthesize each comparison: | binds more tightly than > and <
data = df_19_20_team[(df_19_20_team['PTS'] > 120) | (df_19_20_team['PTS'] < 104)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Team'])
plt.title("Points scored per team 2019-2020")
plt.show()

# Parenthesize each comparison: | binds more tightly than > and <
data = df_18_19_opponent[(df_18_19_opponent['PTS'] > 120) | (df_18_19_opponent['PTS'] < 106)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Team'])
plt.title("Points scored per opponent 2018-2019")
plt.show()

# Parenthesize each comparison: | binds more tightly than > and <
data = df_19_20_opponent[(df_19_20_opponent['PTS'] > 121) | (df_19_20_opponent['PTS'] < 107)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Team'])
plt.title("Points scored per opponent 2019-2020")
plt.show()

# Need to clean up, y-axis is too crowded with amount of players
# Parenthesize each comparison: | binds more tightly than > and <
data = df_18_19_player[(df_18_19_player['PTS'] > 40) | (df_18_19_player['PTS'] < 0)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Player'])
plt.title("Points scored per player 2018-2019")
plt.show()

# Need to clean up, y-axis is too crowded with amount of players
# Parenthesize each comparison: | binds more tightly than > and <
data = df_19_20_player[(df_19_20_player['PTS'] > 40) | (df_19_20_player['PTS'] < 0)]
plt.figure(figsize=(10,8))
gg = sns.scatterplot(x=data["PTS"], y=data['Player'])
plt.title("Points scored per player 2019-2020")
plt.show()
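The cells above pick outliers with hand-chosen point thresholds for each table. One alternative, sketched here under assumptions, is to flag rows by how many standard deviations their `PTS` value lies from the column mean. The helper name `pts_outliers`, the 1.5 cutoff, and the demo values are illustrative choices, not part of the original notebook.

```python
import pandas as pd

def pts_outliers(df: pd.DataFrame, column: str = "PTS", threshold: float = 1.5) -> pd.DataFrame:
    """Return rows whose value in `column` lies more than `threshold`
    population standard deviations from the column mean."""
    values = df[column]
    z = (values - values.mean()) / values.std(ddof=0)  # ddof=0: population std
    return df[z.abs() > threshold]

# Made-up per-game scoring values for six teams
demo_pts = pd.DataFrame({
    "Team": ["A", "B", "C", "D", "E", "F"],
    "PTS": [111.0, 112.0, 110.0, 111.5, 119.0, 103.0],
})
outliers = pts_outliers(demo_pts)
print(outliers)
```

Because the cutoff is relative to each table's own spread, the same call could be reused for the team, opponent, and player tables without retuning the numbers by hand.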


DATA EXPLORATION & VISUALIZATION:

Number of Entries

At first glance, each entry seems to represent a unique value. This is true for the team and opponent
game stats, but not for the player stats. Upon further investigation, there are clearly
several entries with the same player name.

print("NBA 2018-2019 Team Game Stats:\n\n", df_18_19_team['Team'].value_counts().head(3))
print("\nNBA 2019-2020 Team Game Stats:\n\n", df_19_20_team['Team'].value_counts().head(3))
print("\nNBA 2018-2019 Opponent Game Stats:\n\n", df_18_19_opponent['Rk'].value_counts().head(3))
print("\nNBA 2019-2020 Opponent Game Stats:\n\n", df_19_20_opponent['Rk'].value_counts().head(3))
print("\nNBA 2018-2019 Player Stats:\n\n", df_18_19_player['Player'].value_counts().head(3))
print("\nNBA 2019-2020 Player Stats:\n\n", df_19_20_player['Player'].value_counts().head(3))

NBA 2018-2019 Team Game Stats:

Phoenix Suns 1
Oklahoma City Thunder* 1
Brooklyn Nets* 1
Name: Team, dtype: int64

NBA 2019-2020 Team Game Stats:

Phoenix Suns 1
Toronto Raptors* 1
Brooklyn Nets* 1
Name: Team, dtype: int64


NBA 2018-2019 Opponent Game Stats:

30.0 1
29.0 1
2.0 1
Name: Rk, dtype: int64

NBA 2019-2020 Opponent Game Stats:

30.0 1
29.0 1
2.0 1
Name: Rk, dtype: int64

NBA 2018-2019 Player Stats:

Jason Smith\smithja02 4
Isaiah Canaan\canaais01 4
Andrew Harrison\harrian01 4
Name: Player, dtype: int64

NBA 2019-2020 Player Stats:

Anthony Tolliver\tollian01 4
Jordan McRae\mcraejo01 4
Troy Daniels\danietr01 3
Name: Player, dtype: int64

Number of Entries with Duplicates:

print("NBA 2018-2019 Player Stats:\n")


print(df_18_19_player['Player'].value_counts().value_counts().sort_index())
print("\nNBA 2019-2020 Player Stats:\n")
print(df_19_20_player['Player'].value_counts().value_counts().sort_index())

NBA 2018-2019 Player Stats:

1 444
3 80
4 6
Name: Player, dtype: int64

NBA 2019-2020 Player Stats:

1 469
3 58
4 2
Name: Player, dtype: int64

# Pass the data as x= (keyword) to avoid seaborn's FutureWarning about positional arguments
reports = df_18_19_player['Player'].value_counts()
reportPlot = sns.countplot(x=reports)
plt.title("Number of duplicated players")
plt.xlabel("Entries per player")
plt.ylabel("# Of Players")
plt.show()

reports = df_19_20_player['Player'].value_counts()
reportPlot = sns.countplot(x=reports)
plt.title("Number of duplicated players")
plt.xlabel("Entries per player")
plt.ylabel("# Of Players")
plt.show()

Data Cleaning:
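One possible cleaning step for the repeated player names found earlier, sketched under assumptions: if the player tables carry a team column (assumed here to be `Tm`) in which players with several entries have one combined-season row labeled `TOT`, keeping only that row leaves one entry per player. The column name, the label, the helper `one_row_per_player`, and the demo rows are assumptions for illustration and should be checked against the actual CSV files.

```python
import pandas as pd

def one_row_per_player(df: pd.DataFrame, team_col: str = "Tm",
                       total_label: str = "TOT") -> pd.DataFrame:
    """Keep the combined-season row for players with several entries,
    and the single row for everyone else."""
    counts = df["Player"].map(df["Player"].value_counts())  # entries per player, per row
    keep = (counts == 1) | (df[team_col] == total_label)
    return df[keep].reset_index(drop=True)

# Made-up rows: "P Two" appears three times (one total row plus two team rows)
demo_players = pd.DataFrame({
    "Player": ["P One", "P Two", "P Two", "P Two"],
    "Tm": ["AAA", "TOT", "BBB", "CCC"],
    "PTS": [10.0, 8.0, 9.0, 7.0],
})
deduped = one_row_per_player(demo_players)
print(deduped)
```

If the analysis instead needs a player's numbers with a specific team, the per-team rows would be the ones to keep and the combined row the one to drop.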

#Drop columns we will not be using
"""
df_18_19_team.drop(columns=[], inplace=True)
df_19_20_team.drop(columns=[], inplace=True)
df_18_19_opponent.drop(columns=[], inplace=True)
df_19_20_opponent.drop(columns=[], inplace=True)
df_18_19_player.drop(columns=[], inplace=True)
df_19_20_player.drop(columns=[], inplace=True)
"""

MACHINE LEARNING & PREDICTIONS

Predict the value of a trade or player acquisition using the team point totals and the player point
averages.

# Determine the teams scoring average


# Does the acquired player add to the team average or decrease it
# Split into test and training sets and scale
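The placeholder comments above describe comparing a team's scoring average with and without an acquired player. A rough sketch of that comparison, assuming the player table has a team column (named `Tm` here) and that the mean of the roster's per-game `PTS` is an acceptable stand-in for "team average". The helper `roster_average_change` and the demo roster are hypothetical and are not the notebook's method.

```python
import pandas as pd

def roster_average_change(players: pd.DataFrame, team: str, new_player_pts: float,
                          team_col: str = "Tm") -> float:
    """Change in a team's mean per-player PTS if one player averaging
    `new_player_pts` points per game joined the roster."""
    roster_pts = players.loc[players[team_col] == team, "PTS"]
    before = roster_pts.mean()
    after = (roster_pts.sum() + new_player_pts) / (len(roster_pts) + 1)
    return after - before

# Made-up roster: team "XXX" has three players averaging 12 PTS between them
demo_roster = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Tm": ["XXX", "XXX", "XXX", "YYY"],
    "PTS": [10.0, 20.0, 6.0, 30.0],
})
change = roster_average_change(demo_roster, "XXX", 24.0)
print(round(change, 2))
```

A positive result means the incoming player scores above the current roster mean. This ignores minutes played and who leaves in the trade, so it is only a first approximation.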

# Compute distance between two n-dimensional points


#def edist(x, y):
#    return np.sqrt(np.sum((x-y)**2))
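The commented-out distance helper above can be written in runnable form as follows. This is a small sketch that keeps the `edist` name from the comment; converting the inputs with `np.asarray` is an addition so that plain lists also work.

```python
import numpy as np

def edist(x, y):
    """Euclidean distance between two n-dimensional points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

print(edist([0, 0], [3, 4]))  # 5.0
```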

# Look at only desired columns


#df = df_18_19_player[['PTS','Player']]

# Scale the data


#df_raw = df
#df = df[['PTS']].apply(zscore)  # scale the numeric column only; 'Player' is text
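The split-and-scale step named in the comments above could be sketched as follows, assuming z-score scaling and a simple random split. The helper `split_and_scale`, the 25% test fraction, the seed, and the demo table are illustrative assumptions. The scaling is computed with pandas directly (population standard deviation, `ddof=0`) rather than calling a library routine.

```python
import pandas as pd

def split_and_scale(df, feature_cols, test_frac=0.25, seed=0):
    """Shuffle rows, split into train/test, and z-score the feature columns
    using the mean and standard deviation of the training rows only."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(round(len(shuffled) * test_frac))
    test = shuffled.iloc[:n_test].copy()
    train = shuffled.iloc[n_test:].copy()
    mean = train[feature_cols].mean()
    std = train[feature_cols].std(ddof=0)
    train[feature_cols] = (train[feature_cols] - mean) / std
    test[feature_cols] = (test[feature_cols] - mean) / std
    return train, test

# Made-up player rows with two numeric features
demo_stats = pd.DataFrame({
    "Player": [f"P{i}" for i in range(8)],
    "PTS": [4.0, 7.0, 12.0, 8.0, 20.0, 15.0, 3.0, 9.0],
    "AST": [1.0, 2.0, 5.0, 3.0, 6.0, 4.0, 1.0, 2.0],
})
train_df, test_df = split_and_scale(demo_stats, ["PTS", "AST"])
print(train_df[["PTS", "AST"]].describe().round(2))
```

Fitting the mean and standard deviation on the training rows only keeps information from the test rows out of the scaling step.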
