
Uncertainty

What is a model?
- A model is a theoretical representation of a system. It helps us know,
understand, and simulate reality
- Natural systems
- Natural laws

- World systems
- No natural laws, often based on our perceptions

Statistical & machine learning models


- Based on data
- Generation of predictions and forecasts

Goodness of a model
- “All models are wrong, some are useful” – George Box
- “All models are wrong, some are harmful” – Anonymous
- Transparency is key!

Uncertainties

Why?
- Help us to decide whether a model is useful
- Guide us on how to act on a prediction/forecast/estimation
Sources of uncertainty
- Model
• Structural
• Parameters
• Numerical errors
- Data
• Noise on measurements
• Dataset sizes
• Bias
• Incompleteness

Quantifying uncertainties
- Forward propagation
• Input uncertainties
- Inverse quantification
• Bias correction
• Parameter calibration
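Forward propagation can be illustrated with a minimal Monte Carlo sketch: sample the uncertain input many times and push each sample through the model to see how input uncertainty translates into output uncertainty. The model function and the input distribution below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model: a simple nonlinear function of one input.
def model(x):
    return 0.5 * x**2 + 2.0 * x

# Forward propagation: the input is uncertain (here assumed normally
# distributed around 3.0 with std 0.2); sampling it and evaluating the
# model on each sample yields the output distribution.
x_samples = rng.normal(loc=3.0, scale=0.2, size=10_000)
y_samples = model(x_samples)

print(f"output mean: {y_samples.mean():.2f}")
print(f"output std:  {y_samples.std():.2f}")
```

Because the model is nonlinear, the output mean differs slightly from the model evaluated at the input mean, which is exactly the kind of effect forward propagation reveals.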

Confidence intervals

Bootstrap accuracy confidence intervals


Algorithm
1. Train and evaluate a model on each bootstrap sample
accuracies = []
i = 0
bootstraps = 1000
while i < bootstraps:
    train, test = sample_with_replacement(data, size)
    model = train_model(train)
    accuracy = evaluate_model(model, test)
    accuracies.append(accuracy)
    i += 1
2. Calculate the accuracy confidence interval
alpha = 0.95
p = ((1.0 - alpha) / 2.0) * 100
lower = max(0.0, numpy.percentile(accuracies, p))
p = (alpha + ((1.0 - alpha) / 2.0)) * 100
upper = min(1.0, numpy.percentile(accuracies, p))
Example: with 95% confidence, the classification accuracy falls between 80% and 85%
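The algorithm above can be sketched end to end with NumPy alone. The data and the "model" here are stand-ins (a synthetic binary dataset and a simple threshold classifier) so that `sample_with_replacement`, `train_model`, and `evaluate_model` become concrete; any real classifier would slot into the same loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data: one feature, label is 1 when
# the (noisy) feature exceeds zero.
n = 500
X = rng.normal(size=n)
y = (X + rng.normal(scale=0.5, size=n) > 0).astype(int)
data = np.column_stack([X, y])

def train_model(train):
    # "Training" a threshold classifier: the decision threshold is the
    # midpoint between the feature means of the two classes.
    X_tr, y_tr = train[:, 0], train[:, 1]
    return 0.5 * (X_tr[y_tr == 1].mean() + X_tr[y_tr == 0].mean())

def evaluate_model(model, test):
    X_te, y_te = test[:, 0], test[:, 1]
    return np.mean((X_te > model) == y_te)

accuracies = []
for _ in range(1000):
    # Draw a bootstrap training set with replacement and evaluate on
    # the rows that were not drawn (out-of-bag samples).
    idx = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), idx)
    model = train_model(data[idx])
    accuracies.append(evaluate_model(model, data[oob]))

alpha = 0.95
p_lo = ((1.0 - alpha) / 2.0) * 100
p_hi = (alpha + (1.0 - alpha) / 2.0) * 100
lower = max(0.0, np.percentile(accuracies, p_lo))
upper = min(1.0, np.percentile(accuracies, p_hi))
print(f"{alpha:.0%} CI for accuracy: [{lower:.3f}, {upper:.3f}]")
```

Evaluating on the out-of-bag rows rather than the bootstrap sample itself avoids an optimistic bias, since no test row was seen during training.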
Prediction intervals
- Prediction range with a certain confidence instead of single prediction

Prediction intervals
Algorithm
1. Define models with different loss functions: quantile losses for the interval bounds, and a least-squares (or, alternatively, median-quantile) loss for the central prediction
- lower_model = Regressor(loss="quantile", alpha=0.025)
- mid_model1 = Regressor(loss="ls")
- mid_model2 = Regressor(loss="quantile", alpha=0.5)
- upper_model = Regressor(loss="quantile", alpha=0.975)

2. Fit three different models with the training data


- lower_model.fit(X_train, y_train)
- mid_model1.fit(X_train, y_train)
- upper_model.fit(X_train, y_train)
3. Evaluate the models with the test data
- lower_pred = lower_model.predict(X_test)
- mid_pred = mid_model1.predict(X_test)
- upper_pred = upper_model.predict(X_test)
- error = evaluation(lower_pred, mid_pred, upper_pred)
4. Apply models on new data
- lower_pred = lower_model.predict(X_new)
- mid_pred = mid_model1.predict(X_new)
- upper_pred = upper_model.predict(X_new)
Bayesian methods
Bayesian linear regression

Frequentist linear regression:


- y = β0 + β1X
- Model parameters that minimize error
Bayesian linear regression:
- y ∼ N(β0 + β1X, σ²)
- Posterior distribution for the model parameters
Algorithm
1. Specify priors for the model parameters
– Uniform distribution
– Normal distribution
– Etc
2. Define likelihood where mean is the linear predictor, with variance σ²
– Normal distribution
– Etc
3. Approximate posterior distribution for model parameters
– Markov chain Monte Carlo (MCMC)
– Etc
4. Compute probability distribution for outcome

1. Specify priors for the model parameters


- beta0 = Normal(mu=0, std=10)
- beta1 = Normal(mu=0, std=10)
- sigma = HalfNormal(std=10)
2. Define likelihood where mean is linear predictor with variance σ^2
- mean = beta0 + beta1*X
- likelihood = Normal(mu=mean, std=sigma, observed=y)
3. Approximate posterior distribution for model parameters
- posterior = Sample(steps=1000, ...)
4. Compute probability distribution for outcome
- y_pred = posterior['beta0'] + posterior['beta1']*X_new
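The pseudocode above resembles a probabilistic-programming API such as PyMC; the same four steps can also be sketched self-containedly with a minimal random-walk Metropolis sampler in NumPy (one simple MCMC method). The data, priors, and proposal scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data from a known line: y = 1 + 2x + noise.
X = rng.uniform(-1, 1, size=100)
y_obs = 1.0 + 2.0 * X + rng.normal(scale=0.5, size=100)

def log_posterior(beta0, beta1, log_sigma):
    sigma = np.exp(log_sigma)
    # 1. Priors: Normal(0, 10) on the betas, HalfNormal(10) on sigma
    # (sampled on the log scale; + log_sigma is the Jacobian term).
    lp = -0.5 * (beta0 / 10) ** 2 - 0.5 * (beta1 / 10) ** 2
    lp += -0.5 * (sigma / 10) ** 2 + log_sigma
    # 2. Likelihood: y ~ Normal(beta0 + beta1*X, sigma^2).
    mean = beta0 + beta1 * X
    lp += np.sum(-np.log(sigma) - 0.5 * ((y_obs - mean) / sigma) ** 2)
    return lp

# 3. Approximate the posterior with random-walk Metropolis.
theta = np.zeros(3)  # (beta0, beta1, log_sigma)
lp = log_posterior(*theta)
samples = []
for step in range(5000):
    proposal = theta + rng.normal(scale=0.1, size=3)
    lp_new = log_posterior(*proposal)
    if np.log(rng.uniform()) < lp_new - lp:  # accept/reject
        theta, lp = proposal, lp_new
    if step >= 2000:  # discard burn-in
        samples.append(theta.copy())
posterior = np.array(samples)

# 4. Probability distribution for the outcome at a new input.
X_new = 0.5
y_pred = posterior[:, 0] + posterior[:, 1] * X_new
print(f"beta0 posterior mean: {posterior[:, 0].mean():.2f}")
print(f"beta1 posterior mean: {posterior[:, 1].mean():.2f}")
print(f"y_pred posterior mean: {y_pred.mean():.2f}")
```

The result of step 4 is not a single number but a sample from the distribution of the outcome, from which a mean, credible interval, or full histogram can be read off.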
