[Figure: observed value vs. predicted value]
LINEAR REGRESSION FORMULA
"Training" a model means finding the parameters (in this case, the beta coefficients) that minimize the error (for linear regression, the mean squared error, MSE).
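As a rough illustration of what "minimizing the MSE" means, the sketch below fits the betas of a one-variable linear regression by ordinary least squares with NumPy and checks that nudging the fitted betas makes the MSE worse. The synthetic data, the variable names and the helper `mse` are assumptions made for this example; they are not part of the original slides.

```python
import numpy as np

# Hypothetical synthetic data: y = 2 + 3*x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, size=50)

# Design matrix with a column of ones so that beta[0] is the intercept
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: the betas that minimize the MSE on this data
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def mse(beta, X, y):
    """Mean squared error between observed y and predicted X @ beta."""
    residuals = y - X @ beta
    return float(np.mean(residuals ** 2))


fitted_mse = mse(beta, X, y)
# Any other choice of betas should give an equal or larger MSE
perturbed_mse = mse(beta + np.array([0.1, -0.05]), X, y)

print("beta:", beta)
print("MSE at fitted betas:", fitted_mse)
print("MSE at perturbed betas:", perturbed_mse)
```

The fitted betas should land close to the values used to generate the data (2 and 3), and the perturbed betas should never score a lower MSE on the same data.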
LINEAR REGRESSION METRICS
LOGISTIC REGRESSION
[Figure: observed vs. predicted values on a 0 to 1 scale, with classification threshold = 1.0]
FPR = [False Positives] / [Observed Negatives]
AUC = Area Under the Curve
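To make the threshold, FPR and AUC ideas concrete, here is a minimal sketch that sweeps the classification threshold, computes the FPR at each step and integrates the resulting curve with the trapezoidal rule. It assumes the curve in question is the usual ROC curve, with the true positive rate (TPR = [True Positives] / [Observed Positives]) on the vertical axis; the function names `roc_points` and `auc` and the toy data are hypothetical.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (FPR, TPR) arrays, one point per threshold, from the highest threshold down."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    # Start above every score (nothing is predicted positive), then use each distinct score
    thresholds = np.concatenate([[np.inf], np.sort(np.unique(scores))[::-1]])
    positives = np.sum(y_true == 1)
    negatives = np.sum(y_true == 0)
    fpr, tpr = [], []
    for t in thresholds:
        predicted_positive = scores >= t
        # FPR = [False Positives] / [Observed Negatives]
        fpr.append(np.sum(predicted_positive & (y_true == 0)) / negatives)
        # TPR = [True Positives] / [Observed Positives]
        tpr.append(np.sum(predicted_positive & (y_true == 1)) / positives)
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example: observed classes and predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
roc_auc = auc(fpr, tpr)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", roc_auc)  # 0.75 for this toy data
```

At the highest threshold nothing is predicted positive, so both rates are 0; at the lowest threshold everything is predicted positive, so both rates are 1. The AUC summarizes the whole sweep in one number.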
CLASSIFICATION PERFORMANCE METRICS: RECALL, PRECISION, F1
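The three metrics named in this title can be sketched from counts of true positives, false positives and false negatives. The definitions used below are the standard ones (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two); the function name `classification_metrics` and the toy labels are assumptions for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example: 4 observed positives, 5 predicted positives, 3 of them correct
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

precision, recall, f1 = classification_metrics(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 3/5 (three of the five predicted positives are correct) and recall is 3/4 (three of the four observed positives are found), so F1 sits between the two.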
OVERFIT, HOLD-OUT DATA AND CROSS-VALIDATION
Overfitting occurs when a model begins to "memorize" the training data rather than "learning" to generalize from a trend.
In general, for a fixed-size dataset, the more variables you have, the higher the risk of overfitting. So if it doesn't hurt performance, fewer variables are better.
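A small experiment can illustrate the point about variables and overfitting. The sketch below runs a hand-rolled k-fold cross-validation with NumPy on a fixed-size synthetic dataset, once with the single informative variable and once with 25 extra pure-noise variables added. The data, the number of folds and the helper names (`mse`, `kfold_mse`) are all assumptions made for this example.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y - y_hat) ** 2))


def kfold_mse(X, y, k=5, seed=0):
    """Average training MSE and held-out (validation) MSE over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    train_errors, val_errors = [], []
    for i in range(k):
        val = folds[i]  # held-out fold, never seen during fitting
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        train_errors.append(mse(y[train], X[train] @ beta))
        val_errors.append(mse(y[val], X[val] @ beta))
    return float(np.mean(train_errors)), float(np.mean(val_errors))


# Hypothetical fixed-size dataset: one informative variable, 40 rows
rng = np.random.default_rng(42)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)

# Small model: intercept + the informative variable
X_small = np.column_stack([np.ones(n), x])
# Large model: same columns plus 25 variables of pure noise
X_large = np.column_stack([X_small, rng.normal(size=(n, 25))])

small_train, small_val = kfold_mse(X_small, y)
large_train, large_val = kfold_mse(X_large, y)

print("small model  train MSE:", small_train, " validation MSE:", small_val)
print("large model  train MSE:", large_train, " validation MSE:", large_val)
```

The large model always fits the training folds at least as well as the small one, because it has strictly more freedom, but its error on the held-out folds is typically much worse. That gap between training and held-out error is the signature of overfitting, and it is why performance should be judged on data the model was not trained on.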