
Comparing Neural Network and Linear Regression Models to Predict NBA Team Records

Alan Yao – Mission San Jose High School


STEAMS Approach

Outline and Objectives


• Outline
• Obtain Data
• Build Models
• Linear Regression (LR)
• Neural Network (NN)

• Objectives
• Compare LR and NN models on prediction accuracy
• Apply STEAMS approach
• Add AI and Statistics concepts to STEM => STEAMS
Science, Technology
Obtain Data
• Download NBA data online
• Use Python to improve efficiency
• ~80,000 data points (30 teams for 27 years, ~100 parameters)
Engineering, Statistics
Procedure to Build LR Model
• Choose important parameters
• Backward Elimination
• Remove parameters having
• Low correlation with winning percentage
• Strong multicollinearity

• FLAG: no correlation
• APG (Assists per game): basic parameter
• TPG (Turnovers per game): basic parameter
• PPG Diff (Points per game difference): not a basic parameter
Math, Statistics
Multivariate Correlations of NBA Parameters (partial)
Parameter | Winning Percent | PPG | PPG ALLOW | PPG DIFF | FG% | FG% ALLOW | 3FG% | 3FG% ALLOW | FT% | FT% ALLOW | PPS | OFF | ORPG | ORPG ALLOW | DEF | DRPG | DRPG ALLOW
Winning Percent | 1 | 0.37 | -0.43 | 0.97 | 0.57 | -0.59 | 0.42 | -0.42 | 0.13 | -0.10 | 0.59 | -0.12 | -0.13 | -0.10 | 0.21 | 0.34 | -0.31
PPG | 0.37 | 1 | 0.66 | 0.39 | 0.66 | 0.26 | 0.34 | 0.04 | 0.24 | 0.22 | 0.66 | 0.00 | -0.17 | -0.07 | 0.52 | 0.50 | 0.29
PPG ALLOW | -0.43 | 0.66 | 1 | -0.44 | 0.17 | 0.74 | -0.03 | 0.38 | 0.12 | 0.30 | 0.15 | 0.09 | -0.07 | 0.02 | 0.33 | 0.20 | 0.54
PPG DIFF | 0.97 | 0.39 | -0.44 | 1 | 0.58 | -0.60 | 0.44 | -0.42 | 0.14 | -0.10 | 0.61 | -0.11 | -0.12 | -0.11 | 0.21 | 0.34 | -0.31
FG% | 0.57 | 0.66 | 0.17 | 0.58 | 1 | 0.01 | 0.44 | -0.10 | 0.13 | 0.03 | 0.82 | -0.10 | -0.24 | 0.06 | 0.25 | 0.17 | -0.29
FG% ALLOW | -0.59 | 0.26 | 0.74 | -0.60 | 0.01 | 1 | -0.14 | 0.46 | 0.02 | 0.12 | -0.06 | 0.24 | 0.15 | 0.10 | -0.07 | -0.36 | 0.21
3FG% | 0.42 | 0.34 | -0.03 | 0.44 | 0.44 | -0.14 | 1 | 0.05 | 0.25 | 0.10 | 0.53 | -0.28 | -0.42 | -0.18 | 0.25 | 0.22 | -0.01
3FG% ALLOW | -0.42 | 0.04 | 0.38 | -0.42 | -0.10 | 0.46 | 0.05 | 1 | 0.07 | 0.23 | -0.01 | 0.01 | -0.11 | -0.22 | 0.07 | -0.11 | 0.18
FT% | 0.13 | 0.24 | 0.12 | 0.14 | 0.13 | 0.02 | 0.25 | 0.07 | 1 | 0.25 | 0.23 | -0.26 | -0.39 | -0.22 | 0.24 | 0.22 | 0.14
FT% ALLOW | -0.10 | 0.22 | 0.30 | -0.10 | 0.03 | 0.12 | 0.10 | 0.23 | 0.25 | 1 | 0.05 | -0.14 | -0.33 | -0.47 | 0.38 | 0.30 | 0.36
PPS | 0.59 | 0.66 | 0.15 | 0.61 | 0.82 | -0.06 | 0.53 | -0.01 | 0.23 | 0.05 | 1 | -0.23 | -0.37 | -0.07 | 0.28 | 0.24 | -0.24
OFF | -0.12 | 0.00 | 0.09 | -0.11 | -0.10 | 0.24 | -0.28 | 0.01 | -0.26 | -0.14 | -0.23 | 1 | 0.84 | 0.43 | 0.11 | -0.36 | -0.34
ORPG | -0.13 | -0.17 | -0.07 | -0.12 | -0.24 | 0.15 | -0.42 | -0.11 | -0.39 | -0.33 | -0.37 | 0.84 | 1 | 0.55 | -0.35 | -0.47 | -0.45
ORPG ALLOW | -0.10 | -0.07 | 0.02 | -0.11 | 0.06 | 0.10 | -0.18 | -0.22 | -0.22 | -0.47 | -0.07 | 0.43 | 0.55 | 1 | -0.40 | -0.52 | -0.37
DEF | 0.21 | 0.52 | 0.33 | 0.21 | 0.25 | -0.07 | 0.25 | 0.07 | 0.24 | 0.38 | 0.28 | 0.11 | -0.35 | -0.40 | 1 | 0.71 | 0.43
DRPG | 0.34 | 0.50 | 0.20 | 0.34 | 0.17 | -0.36 | 0.22 | -0.11 | 0.22 | 0.30 | 0.24 | -0.36 | -0.47 | -0.52 | 0.71 | 1 | 0.56
DRPG ALLOW | -0.31 | 0.29 | 0.54 | -0.31 | -0.29 | 0.21 | -0.01 | 0.18 | 0.14 | 0.36 | -0.24 | -0.34 | -0.45 | -0.37 | 0.43 | 0.56 | 1
REB | 0.12 | 0.43 | 0.32 | 0.13 | 0.17 | 0.06 | 0.08 | 0.07 | 0.08 | 0.25 | 0.13 | 0.55 | 0.09 | -0.13 | 0.89 | 0.43 | 0.20
RPG | 0.25 | 0.38 | 0.16 | 0.27 | -0.02 | -0.26 | -0.11 | -0.21 | -0.10 | 0.05 | -0.05 | 0.32 | 0.34 | -0.10 | 0.45 | 0.67 | 0.22
RPG ALLOW | -0.38 | 0.26 | 0.57 | -0.39 | -0.27 | 0.29 | -0.13 | 0.04 | 0.00 | 0.07 | -0.30 | -0.07 | -0.10 | 0.28 | 0.18 | 0.23 | 0.79
FGM | 0.16 | 0.69 | 0.54 | 0.16 | 0.51 | 0.31 | 0.23 | 0.13 | 0.18 | 0.29 | 0.36 | 0.39 | -0.09 | -0.06 | 0.80 | 0.30 | 0.22
FGA | -0.05 | 0.52 | 0.54 | -0.04 | 0.19 | 0.35 | 0.09 | 0.19 | 0.16 | 0.32 | 0.09 | 0.49 | -0.01 | -0.09 | 0.82 | 0.28 | 0.37
FGM/G | 0.29 | 0.90 | 0.63 | 0.30 | 0.68 | 0.30 | 0.22 | -0.02 | 0.15 | 0.19 | 0.43 | 0.11 | -0.04 | 0.03 | 0.45 | 0.41 | 0.27
FGA/G | -0.12 | 0.64 | 0.72 | -0.11 | 0.04 | 0.40 | -0.09 | 0.07 | 0.08 | 0.23 | -0.15 | 0.24 | 0.16 | -0.01 | 0.40 | 0.41 | 0.63

Thresholds: |R| < 0.1 (low correlation), |R| > 0.6 (strong correlation)
• Eliminate parameters with low correlations
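The screening step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `screen`, the column names, and the toy numbers are hypothetical and are not taken from the NBA dataset. The 0.1 and 0.6 cutoffs are the ones shown on the slide. The sketch only flags strongly correlated predictor pairs; choosing which one of a pair to drop remains a judgment call.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def screen(columns, target, low=0.1, high=0.6):
    """Drop predictors with |R| < low against the target, then flag
    pairs of remaining predictors with |R| > high (possible multicollinearity)."""
    y = columns[target]
    names = [c for c in columns if c != target]
    weak = [c for c in names if abs(pearson(columns[c], y)) < low]
    kept = [c for c in names if c not in weak]
    collinear = [(a, b) for i, a in enumerate(kept) for b in kept[i + 1:]
                 if abs(pearson(columns[a], columns[b])) > high]
    return weak, kept, collinear

# Toy, made-up numbers for illustration only (not NBA data).
toy = {
    "WIN_PCT": [0.30, 0.40, 0.50, 0.60, 0.70],
    "FG_PCT":  [0.43, 0.45, 0.46, 0.47, 0.49],
    "PPS":     [1.15, 1.19, 1.21, 1.24, 1.28],
    "NOISE":   [1.0, 0.0, 3.0, 0.0, 1.0],
}

weak, kept, collinear = screen(toy, "WIN_PCT")
print("low correlation with target:", weak)
print("kept:", kept)
print("strongly correlated pairs:", collinear)
```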
Statistics
LR Model
• Based on 1992 – 2018 data

• Least Squares Method

• Finalize LR model with 14 independent parameters
• R Sq. = 0.913, R Sq. Adj. = 0.911, RMSE = 0.047
• Low multicollinearity: variance inflation factors (VIF) < 5
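The fit statistics above follow standard definitions, sketched below in Python. Two assumptions are worth stating: the sample size n = 810 is inferred from "30 teams for 27 years" and is not given on the slide, and `rmse` here divides by n, whereas some statistics packages divide by the residual degrees of freedom. With those caveats, the reported R Sq. of 0.913 and 14 parameters reproduce the reported adjusted value.

```python
import math

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p, given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def rmse(y, y_hat):
    """Root mean squared error (dividing by n; an assumption, see lead-in)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def vif(r2_j):
    """Variance inflation factor of predictor j, where r2_j is the R^2
    obtained by regressing predictor j on all the other predictors."""
    return 1.0 / (1.0 - r2_j)

# Assumed n = 30 teams * 27 seasons = 810 (not stated on the slide).
print(round(adjusted_r_squared(0.913, n=810, p=14), 3))  # 0.911

# VIF < 5 is equivalent to r2_j < 0.8 for every predictor.
print(vif(0.5))  # 2.0
```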
Residual Analysis
• Residuals randomly dispersed around the horizontal axis (top left)
• Residuals show no evidence of autocorrelation (bottom left)
• Approximately normal distribution (bottom right)
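The slides judge autocorrelation from a plot and do not name a numeric test. As one common check (an assumption here, not necessarily what was used), the Durbin-Watson statistic can be computed from the residuals in row order: values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Toy residuals for illustration only.
print(durbin_watson([1, 1, -1, -1]))   # 1.0 (runs of same sign, positive autocorrelation)
print(durbin_watson([1, -1, 1, -1]))   # 3.0 (alternating sign, negative autocorrelation)
```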
ENGINEERING, AI
Neural Network Models
Neural Network Diagram

• Model-1
• Used 14 parameters, same as LR
• Model-2
• Used all available parameters (64)
• Removed redundant parameters such as APG_Diff, since APG_Diff = APG - APG_Allowed
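To make the model structure concrete, here is a minimal sketch of a neural network with one hidden layer, written in plain Python. It is not the JMP model from the slides: the tanh activation, the three hidden nodes, the learning rate, and the synthetic two-input data are all assumptions chosen for illustration. In the slides the inputs would be the 14 or 64 team parameters and the output the winning percentage.

```python
import math
import random

def init_network(n_inputs, n_hidden, rng):
    """Small random weights for one tanh hidden layer and a linear output node."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return (hidden activations, scalar output) for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]
    return hidden, output

def train_step(net, x, y, lr):
    """One stochastic-gradient step on 0.5 * (output - y)^2 for one sample."""
    hidden, output = forward(net, x)
    err = output - y
    for j, h in enumerate(hidden):
        # Gradient flowing back through the tanh node (uses W2 before its update).
        grad_h = err * net["W2"][j] * (1.0 - h * h)
        net["W2"][j] -= lr * err * h
        net["b1"][j] -= lr * grad_h
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * grad_h * xi
    net["b2"] -= lr * err

def mse(net, data):
    """Mean squared error of the network over (x, y) pairs."""
    return sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data)

# Synthetic data for illustration only: a made-up linear target of two inputs.
rng = random.Random(0)
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 0.5 + 0.3 * x[0] - 0.2 * x[1]))

net = init_network(n_inputs=2, n_hidden=3, rng=rng)
mse_before = mse(net, data)
for _ in range(200):            # 200 passes over the toy data
    for x, y in data:
        train_step(net, x, y, lr=0.05)
mse_after = mse(net, data)
print(mse_before, mse_after)    # training error should drop
```

Because the starting weights are random, different seeds give different fitted models, which is why individual fits can vary in quality.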
JMP: enabling technology
[Figure: neural network diagram, from Input Layer to Output Layer]
STATISTICS, AI
NN Model-1
• Comparing with LR model
• Lower RMSE: ~0.043 vs 0.047
• Higher R-sq.: ~0.92 vs 0.91
• Prediction profiles: similar to LR model
[Figures: typical fitting charts (Training, Validation); typical fitting summary; prediction profiles for NN Model-1 and LR Model]
STATISTICS, AI
NN Model-2
• Comparing with NN Model-1
• R-sq. improved, 0.92 -> 0.95
• RMSE reduced, 0.043 -> ~0.034
• Not every model has a good fit (refer to next page)
[Figures: typical fitting charts (Training, Validation); typical fitting summary; partial profiles]


ENGINEERING, AI
Ensemble Learning
• Not all models fit well
• No. 1: higher RMSE, lower R2
• No. 8: bigger RMSE/R2 differences between training and validation
• Ensemble Learning
• Simple average of the models' predictions
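Simple averaging can be sketched as follows. The function name `ensemble_average` and the prediction values are hypothetical; each inner list stands for one fitted model's predicted winning percentages for the same teams, in the same order.

```python
def ensemble_average(predictions_per_model):
    """Average the predictions of several models, team by team."""
    n_models = len(predictions_per_model)
    return [sum(vals) / n_models for vals in zip(*predictions_per_model)]

# Made-up predictions from three models for two teams (illustration only).
preds = [
    [0.60, 0.40],
    [0.64, 0.38],
    [0.62, 0.42],
]
averaged = ensemble_average(preds)
print([round(p, 2) for p in averaged])  # [0.62, 0.4]
```

Averaging tends to damp the effect of any single poorly fitted model, which is the motivation given on this slide.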
ENGINEERING, Statistics, AI
Compare Prediction Accuracy

• Prediction accuracy validated by the 2018-2019 regular season data


• NN Model with all parameters: RMSE reduced ~20%
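A comparison of this kind can be sketched as below: compute RMSE for each model on a season that was held out of the fit, then express the change as a percentage. The numbers are made up for illustration and are not the 2018-2019 results; the names `rmse` and `percent_reduction` are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def percent_reduction(baseline_rmse, new_rmse):
    """Relative RMSE reduction of the new model versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up hold-out winning percentages and predictions (illustration only).
actual = [0.70, 0.55, 0.40, 0.25]
lr_pred = [0.66, 0.59, 0.44, 0.21]   # each off by 0.04
nn_pred = [0.67, 0.58, 0.43, 0.22]   # each off by 0.03

lr_rmse = rmse(actual, lr_pred)
nn_rmse = rmse(actual, nn_pred)
reduction = percent_reduction(lr_rmse, nn_rmse)
print(round(lr_rmse, 3), round(nn_rmse, 3), round(reduction, 1))
```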
STEAMS Approach
Summary
• Both LR and NN models can be used for sports game prediction
• Linear regression: makes physical sense
• Neural network: better prediction accuracy
• Neural networks can implement ensemble learning
STEAMS Methodology
Science: Data Science, Computer Science
Technology: JMP, Python
Engineering: Building Models
AI: Neural Network, Machine Learning
Math: Linear Algebra, Matrices
Statistics: Multiple Linear Regression, VIF, R-squared, RMSE