
Contents

7. Linear Regression
   PYTHON
8. Linear Regression General Case
   PYTHON
9. Non-linear regression (single variable)
   PYTHON
10. Non-linear regression (multiple variables)
   PYTHON
7. Linear Regression

Purpose: finding an equation that is linear with respect to the parameters, with only
one independent variable x.

Problem: Find the values of β0 and β1 so that y(x) = β0 + β1x best approximates the
experimental observations.

Aim: minimize the SSE (sum of squared errors) – see the formulas below.
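As a reminder, these are the standard least-squares formulas; they match the summations computed in the Python code further below:

SSE = \sum_{i=1}^{m} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2, \qquad
\beta_1 = \frac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1\bar{x}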

Statistical Approach to linear regression


- The “true” means at every xi follow a straight line (µy).
- At each xi, the possible values yi are distributed following a normal distribution
with mean µyi = µβ0 + µβ1·xi and one constant variance σ². This σ² is unknown and
will be estimated by a quantity that we define as the residual variance (s_y²).

Our observations should be thought of as a sample drawn from a normal distribution at
each xi. Moreover, we must think of our parameters β0 and β1 as estimates of their
“true” (unknown) values µβ0 and µβ1.

*The residual variance depends on the value of the SSE, is an estimate of the variance
of the underlying population, and becomes smaller as the number of measurements
increases (assuming the same SSE).
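In symbols (this is the quantity s_y computed in the Python code below, with m observations and 2 fitted parameters):

s_y^2 = \frac{SSE}{m-2}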

Hypothesis testing on the mean


- Do two hypothesis tests, one for β0 and one for β1.

After computing the t statistic, proceed as per normal hypothesis testing: compare it
with t_crit, which is based on the desired confidence level and m − 2 degrees of
freedom. The p-value is computed in the same way.
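For the slope, for example (matching the hypothesis-test code below):

t = \frac{\beta_1 - 0}{s_{\beta_1}} \quad \text{with } m-2 \text{ degrees of freedom;}

the analogous statistic for the intercept uses β0 and s_{β0}.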

Confidence intervals of regression parameters


- The standard errors are proportional to the residual standard deviation s_y (see the formulas below).
- The residual variance is an estimate of the variance of the population around its mean.
- The bigger the sample, the smaller the standard errors.
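The standard errors and the resulting confidence intervals, as used in the Python code below (SS_x = \sum_{i=1}^{m}(x_i-\bar{x})^2, the ssx variable in the code):

s_{\beta_1} = s_y\sqrt{\frac{1}{SS_x}}, \qquad
s_{\beta_0} = s_y\sqrt{\frac{1}{m} + \frac{\bar{x}^2}{SS_x}}, \qquad
\beta_j \pm t_{crit}\, s_{\beta_j}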

Estimation of the mean response µyh at a given xh


Estimation of an individual prediction ypred at a given xpred; this is extrapolation
when xpred is outside the range of x where we collected data.

s_ypred is always greater than s_yh.
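In symbols (the first expression matches the confidence-band code below; the second is the standard prediction form, which adds 1 under the square root and is therefore always larger):

s_{\hat{y}_h} = s_y\sqrt{\frac{1}{m} + \frac{(x_h-\bar{x})^2}{SS_x}}, \qquad
s_{y_{pred}} = s_y\sqrt{1 + \frac{1}{m} + \frac{(x_{pred}-\bar{x})^2}{SS_x}}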

Confidence Interval: Summary

1. Confidence interval for parameters (e.g., 95%)


Suppose I repeat the experiments at all my xi values a large number of times (N) and,
every time, I compute the regression line and the confidence intervals. Of all those N
confidence intervals, 95% will contain the true values of the parameters. I am therefore
95% confident that the interval I computed based on my observations is one of them
and 95% confident it will contain the true values of the parameters.

2. Confidence interval for the mean response at x = xh (e.g., 95%)


Suppose I repeat the experiments at xh a large number of times (N) and, every time, I
compute the regression line and the confidence interval for the mean response. Of all
those N confidence intervals, 95% will contain the true value of the mean response at
x = xh. I am therefore 95% confident that the interval I computed based on my
observations is one of them and will contain the true value of the mean response.

3. Prediction interval for individual prediction at x = xpred (e.g., 95%)


If I were able to repeat the experiment at xpred a large number of times (N), then every
time I would compute the regression line and the prediction interval at x = xpred. Of all
those N prediction intervals, 95% would contain the true value of y at x = xpred. I am
therefore 95% confident that the interval I computed based on my observations is one
of them and will contain the true value of y at x = xpred.
Descriptive parameters in linear regression

Extreme Cases:
1. SSE = 0 (perfect linear fit): r² = 1.
2. SSE = SST (the regression line explains no more of the variation than the mean ȳ does): r² = 0.

Common interpretation: r² is the fraction of the total variation that is explained by the
regression line.
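In symbols (this is how r_square is computed in the Python code below):

r^2 = 1 - \frac{SSE}{SST}, \qquad SST = \sum_{i=1}^{m}(y_i-\bar{y})^2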
How to report results from hypothesis testing
- Rejected null hypothesis:
"Data support a decrease in improvement as age increases. The trend was
statistically significant (p < 0.05)." – the confidence interval for the slope will NOT CONTAIN ZERO.

- Failed to reject null hypothesis:
"Data failed to support any difference in improvement with age."

If the range of the x data increases, the r² value increases (for the same scatter around the line).

Assumptions in linear regression


- The relationship is linear – a nonlinear model may sometimes be better.
- At each xi, the possible yi values are normally distributed with mean on the line and
one constant variance (estimated by the residual variance s_y²) – this may not be true.
- The sample is random and independent – bias would invalidate the analysis.
PYTHON
Compute r-square value
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dbs_data = np.loadtxt('dbs_data.dat');
ages = dbs_data[:,1];   #ages = control variable, x
scores = dbs_data[:,2]; #scores = measurement variable, y
m = len(scores);
y_bar = np.mean(scores);
x_bar = np.mean(ages);
#Compute summations necessary for the slope
sum_1 = 0;
ssx = 0;
for i in range(0,m):  #summation at numerator of beta_1 formula, plus SSx
    sum_1 = sum_1 + (ages[i]-x_bar)*(scores[i]-y_bar);
    ssx = ssx + (ages[i]-x_bar)**2;
#Compute slope and intercept of the line
beta_1 = sum_1/ssx;              #slope of the line
beta_0 = y_bar - beta_1*x_bar;   #intercept of the line
regr_line = np.zeros(m);
for i in range(0,m):
    regr_line[i] = beta_0 + beta_1*ages[i];
#R2 value = 1 - SSE/SST
sse = 0;
sst = 0;
for i in range(0,m):
    sse = sse + (scores[i] - (beta_0 + beta_1*ages[i]))**2;
    sst = sst + (scores[i] - y_bar)**2;
r_square = 1 - sse/sst;

Compute confidence intervals (95%) for slope and intercept


s_y = np.sqrt(sse/(m-2));        #Residual standard deviation (square root of residual variance)
s_beta_1 = s_y*np.sqrt(1/ssx);   #Standard errors
s_beta_0 = s_y*np.sqrt(1/m + x_bar**2/ssx);
t_crit = stats.t.ppf(0.975,m-2); #t crit
#Confidence intervals
upp_slope = beta_1 + t_crit*s_beta_1;
low_slope = beta_1 - t_crit*s_beta_1;
upp_int = beta_0 + t_crit*s_beta_0;
low_int = beta_0 - t_crit*s_beta_0;

Hypothesis test
#Hypothesis testing H0: slope = 0 versus H1: slope is not zero,
#to check whether there is a trend or not (point 5)
t_stat = (beta_1 - 0)/s_beta_1;
if (np.abs(t_stat) > t_crit):
    print('Reject the NULL hypothesis. Data support a change in UPDRS '
          'score associated with a change in age');
    #After looking at the slope, which is negative, I can also conclude
    #that data support a decrease in UPDRS improvement with age
else:
    print('Unable to reject the NULL hypothesis. Data failed to support '
          'any association between age and UPDRS improvement');

p_value = 2.0*(1.0-stats.t.cdf(abs(t_stat),m-2));

Compute confidence bands for mean scores within age range of data + PLOT!!
upp_conf_band = np.zeros(m);
low_conf_band = np.zeros(m);
x_axis = np.sort(ages);  #x axis is a sorted version of ages
for i in range(0,m):
    x_h = x_axis[i];
    y_h = beta_0 + beta_1*x_h;
    s_y_h = s_y*np.sqrt(1/m + (x_h-x_bar)**2/ssx);
    upp_conf_band[i] = y_h + t_crit*s_y_h;
    low_conf_band[i] = y_h - t_crit*s_y_h;
#Plot the data points, the regression line and the confidence bands in one plot
plt.plot(ages,scores,'k.', ages,regr_line,'r-',\
         x_axis,upp_conf_band,'g-', x_axis,low_conf_band,'g-');
8. Linear Regression General Case

Assumption: y(x) is linear WITH RESPECT TO THE PARAMETERS β0, β1, β2, ..., βn−1.
For example, a polynomial y = β0 + β1x + β2x² is still linear in this sense, because each
parameter multiplies a known function of x.

SSE (minimize)

Minimising the SSE
- Using the original method is not ideal, because the partial derivatives must be
computed manually every time the model changes.
- Instead, use a general approach: the matrix-based approach.
Matrix-based Approach
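As a reminder of the key relations (consistent with the K-matrix computation in the Python code below), where A is the m × n matrix with entries A_{ij} = f_j(x_i) and y is the vector of observations:

SSE = \lVert \mathbf{y} - A\boldsymbol{\beta} \rVert^2, \qquad
\boldsymbol{\beta} = (A^T A)^{-1} A^T \mathbf{y} = K\mathbf{y}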

Statistical approach to linear regression

- The parameters we calculated (β0, β1, β2, etc) are estimates of the “true”
values µβ0, µβ1, etc.
- At each xi, the possible y values are normally distributed, with the mean on the µy
line and the same standard deviation for all xi.
- Use the residual variance (s_y²) to estimate σ².
Hypothesis testing on the mean (same as normal linear regression)

- DOF = m − n (m data points, n parameters)

*More parameters give a lower SSE, but may result in overfitting.

K matrix (standard errors of linear regression parameters)
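In symbols (consistent with how the s_beta_sq terms are accumulated in the code below):

K = (A^T A)^{-1} A^T, \qquad
s_{\beta_j}^2 = s_y^2 \sum_{i=1}^{m} K_{ji}^2, \qquad
s_y^2 = \frac{SSE}{m-n}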

The straight line is a particular case of the matrix-based approach we saw in this
lecture (f0(x) = 1,f1(x) = x).

*SST is not necessary for calculation of confidence intervals.


Multiple regression

QUESTION
what are the assumptions made for calculating confidence intervals in linear
regression?

PYTHON
Problem 1: Determine the best-fit polynomial of order 2 using the general matrix-based approach.
#Assemble matrix A
A = np.zeros((m,3));
for i in range(0,m):
    A[i,0] = 1.0;           #Coefficient of beta_0 in my model equation
    A[i,1] = length[i];     #Coefficient of beta_1 in my model equation
    A[i,2] = length[i]**2;  #Coefficient of beta_2 in my model equation

#Compute matrix K = (A^T*A)^-1 * A^T
K_matrix = np.linalg.inv(np.transpose(A).dot(A)).dot(np.transpose(A));
betas = K_matrix.dot(force);

Optimal length
print(-betas[1]/(2*betas[2]));
Confidence intervals @ 95%
t_crit = stats.t.ppf(0.975,m-3);  #Note m-3 dof

#Compute the SSE
sse = 0;
for i in range(0,m):
    sse = sse + (force[i] - (betas[0] + betas[1]*length[i] + betas[2]*length[i]**2))**2;

#Compute sy^2
s_y_sq = sse/(m-3);

#Compute the 3 s_beta_sq
s_beta_sq_0 = 0;
s_beta_sq_1 = 0;
s_beta_sq_2 = 0;
for j in range(0,m):
    s_beta_sq_0 = s_beta_sq_0 + (K_matrix[0,j]**2)*s_y_sq;
    s_beta_sq_1 = s_beta_sq_1 + (K_matrix[1,j]**2)*s_y_sq;
    s_beta_sq_2 = s_beta_sq_2 + (K_matrix[2,j]**2)*s_y_sq;

#Confidence intervals
upp_beta_0 = betas[0] + t_crit*np.sqrt(s_beta_sq_0);
low_beta_0 = betas[0] - t_crit*np.sqrt(s_beta_sq_0);
upp_beta_1 = betas[1] + t_crit*np.sqrt(s_beta_sq_1);
low_beta_1 = betas[1] - t_crit*np.sqrt(s_beta_sq_1);
upp_beta_2 = betas[2] + t_crit*np.sqrt(s_beta_sq_2);
low_beta_2 = betas[2] - t_crit*np.sqrt(s_beta_sq_2);
Plot regression curve
plt.plot(length,force,'k.', x_axis,regr_curve,'r-');

Problem 2: Force = β0·S0 + β1·S1 + β2·S2

Find the values of β0, β1, β2
force = exp_data[:,0];
s_0 = exp_data[:,1];
s_1 = exp_data[:,2];
s_2 = exp_data[:,3];
m = len(s_0);

A = np.zeros((m,3));
for i in range(0,m):
    A[i,0] = s_0[i];
    A[i,1] = s_1[i];
    A[i,2] = s_2[i];

#Compute matrix K = (A^T*A)^-1 * A^T
K_matrix = np.linalg.inv(np.transpose(A).dot(A)).dot(np.transpose(A));
betas = K_matrix.dot(force);  #answers
Hypothesis testing: Determine whether the variable S0 has a significant impact on force in the presence of the other substances. Use a 99% confidence level.
t_crit = stats.t.ppf(0.995,m-3);  #99% confidence required

#Compute residual variance
sse = 0;
for i in range(0,m):
    y_model = betas[0]*s_0[i] + betas[1]*s_1[i] + betas[2]*s_2[i];
    sse = sse + (force[i] - y_model)**2;

s_y_sq = sse/(m-3);

s_beta_0_sq = 0;
for j in range(0,m):
    s_beta_0_sq = s_beta_0_sq + (K_matrix[0,j]**2)*s_y_sq;

s_beta_0 = np.sqrt(s_beta_0_sq);
t_stat = betas[0]/s_beta_0;
Test statements
if (np.abs(t_stat) > t_crit):
    print('Reject NULL hypothesis. Data support that a change in S0 is '
          'associated with a change in force in the presence of S1 and S2');
else:
    print('Unable to reject the NULL hypothesis. Data failed to support '
          'that a change in S0 is associated with a change in force in '
          'the presence of S1 and S2');
9. Non-linear regression (single variable)

Iterative procedures

Bracketing Methods:
Main Idea: start from an initial bracket [a b] that contains the minimum and reduce
the size of the bracket towards the minimum.

1. How do I choose the initial bracket [a b]?


2. Choosing x1 and x2: Golden Ratio Method

r = (√5 − 1)/2 ≈ 0.618; the Golden Ratio is 1/r ≈ 1.618.
Starting from both ends of the bracket, the golden ratio is used to place the two
internal points: x1 = a + (1 − r)(b − a) and x2 = a + r(b − a).

3. How do I decide when the interval is small enough?

- Stopping/termination criterion: stop when |b − a| < tolerance.
The calculated minimum will then be accurate to within ± the tolerance value (the
smaller, the better), limited by how many decimal figures the computer can handle.

Risks
- May converge to a local minimum instead of the global minimum.
- We always start from a guess when choosing [a, b], so the result depends on that choice.

PYTHON
Function to calculate SSE from the given equation
def obj_fun(beta):
    m = len(exp_data[:,0]);  #number of experimental data points
    sse = 0;
    for i in range(0,m):
        x = exp_data[i,0];
        y_model = (100.0/(1.0+(beta/x)**2));
        y_experimental = exp_data[i,1];
        sse = sse + (y_model - y_experimental)**2;
    return sse;

Implement golden sections method
#Initial bracket
a = 0;
b = 100.0;

r = (np.sqrt(5.0)-1)/2.0;  #key number for golden sections
#The first two internal points...
x1 = (1-r)*(b-a) + a;
x2 = r*(b-a) + a;
#...and the function evaluations at these points
f_x1 = obj_fun(x1);
f_x2 = obj_fun(x2);

Initialising the iterations to find the minimum
tolerance = 1e-9;
max_iterations = 10000000;  #A large number. I will not wait for longer than this amount of iterations
for i in range(0,max_iterations):
    if (f_x1 >= f_x2):  #we discard the interval [a x1]
        a = x1;             #a is where x1 used to be
        x1 = x2;            #x1 is where x2 used to be
        x2 = r*(b-a) + a;   #new point
        f_x1 = f_x2;        #Recycle the value of f_x2
        f_x2 = obj_fun(x2);
    else:               #we discard the interval [x2 b]
        b = x2;             #b is where x2 used to be
        x2 = x1;            #x2 is where x1 used to be
        x1 = (1-r)*(b-a) + a;
        f_x2 = f_x1;        #Recycle the value at the previous iteration
        f_x1 = obj_fun(x1);

    if (np.abs(b-a) < tolerance):
        break;  #Bracket is small enough, exit the loop

beta_min = (a+b)/2.0;

Plotting results
x_axis = np.arange(0.01,30.0,0.01);
regr_curve = np.zeros(len(x_axis));
for i in range(0,len(x_axis)):
    regr_curve[i] = 100.0/(1.0+(beta_min/x_axis[i])**2);

plt.semilogx(exp_data[:,0], exp_data[:,1], 'k.',\
             x_axis, regr_curve, 'r-');

Object function
def obj_fun(x):
    return np.exp(x) - 25*x;

Implement golden sections method
#Initial bracket.
#Try other initial brackets to find different roots
a = -1.0;
b = 1.0;

r = (np.sqrt(5.0)-1)/2.0;  #key number for golden sections

#The first two internal points...
x1 = (1-r)*(b-a) + a;
x2 = r*(b-a) + a;
#...and the function evaluations at these points
f_x1 = obj_fun(x1);
f_x2 = obj_fun(x2);

Iterations
tolerance = 1e-9;
max_iterations = 10000000;  #A large number. I will not wait for longer than this amount of iterations
for i in range(0,max_iterations):
    #The following line changes the criterion for discarding intervals:
    #we minimise |f(x)| so that the bracket converges to a root rather than a minimum
    if (np.abs(f_x1) >= np.abs(f_x2)):  #we discard the interval [a x1]
        a = x1;             #a is where x1 used to be
        x1 = x2;            #x1 is where x2 used to be
        x2 = r*(b-a) + a;   #new point
        f_x1 = f_x2;        #Recycle the value of f_x2
        f_x2 = obj_fun(x2);
    else:                   #we discard the interval [x2 b]
        b = x2;             #b is where x2 used to be
        x2 = x1;            #x2 is where x1 used to be
        x1 = (1-r)*(b-a) + a;
        f_x2 = f_x1;        #Recycle the value at the previous iteration
        f_x1 = obj_fun(x1);

    if (np.abs(b-a) < tolerance):
        break;  #Bracket is small enough, exit the loop

root = (a+b)/2.0;
10. Non-linear regression (multiple variables)

Iterative procedures
Instead of using the golden ratio method (bracketing method), we use the Simplex
method for multiple variables.

Creating a simplex model


In 2D (two parameters), the simplex is a triangle with three vertices.
The first guess of (x1, x2) is one vertex (the “tip”) of the simplex.
At each iteration, we calculate the midpoint and the reflection coordinates for the
vertex T (the tip with function value f(T)), as sketched below.
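A minimal sketch of the reflection step, assuming the standard simplex (Nelder–Mead) reflection in which the vertex T with the highest f is reflected through the midpoint M of the other two vertices V1 and V2:

M = \tfrac{1}{2}(V_1 + V_2), \qquad R = M + (M - T) = 2M - T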

Termination criteria
- The simplex is too small
- f(G) has not improved over the last few iterations
- The maximum number of iterations n has been reached

Risks
- The result is highly affected by the initial guess
- May converge to a local minimum instead of the global minimum

PYTHON
Problem 1: running the simplex algorithm with SciPy

#Import the SciPy library to help run simplex
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

#Define a function to evaluate the model equation, given certain parameter values
#Given x (a scalar) and params as a list [a1, a2, mu, sigma]
def eval_model_equation(x,params):
    a1 = params[0];
    a2 = params[1];
    mu = params[2];
    sigma = params[3];
    return (a1/np.sqrt(2*np.pi*sigma**2))*np.exp(-(x-mu)**2/(sigma**2)) + a2;

#Function to be minimised (SSE)
def obj_fun(params):
    sse = 0;
    m = len(exp_data[:,0]);
    for i in range(0,m):
        y_model = eval_model_equation(exp_data[i,0], params);
        sse = sse + (exp_data[i,1] - y_model)**2;
    return sse;

#Try various initial guesses. For example [10,10,10,1] results in a "successful"
#fit, but a completely unsatisfactory result in terms of the model equation
#fitting the data points.
init_guess = [10,13,20,1];
results = minimize(obj_fun, init_guess, method='nelder-mead',
                   options={'xatol':1e-8, 'disp':True});
print(results.x);

#Observe the behaviour obtained with each guess, and finally plot the results
regr_curve = np.zeros(len(exp_data[:,0]));
for i in range(0,len(regr_curve)):
    regr_curve[i] = eval_model_equation(exp_data[i,0], results.x);

#Plot your results
plt.plot(exp_data[:,0], exp_data[:,1],'k.',\
         exp_data[:,0], regr_curve,'b-');

Problem 2: finding the minimum of a given function

Find the minimum of f(x, y) = 3x² − 3y + y² + 30·sin(x) in the domain [−5, 5].


import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from mpl_toolkits import mplot3d  #just for 3D plots (3D plotting is beyond the syllabus!)

#Objective function with params input as a list
def obj_fun(params):
    x = params[0];
    y = params[1];
    return 3*x*x - 3*y + y*y + 30*np.sin(x);

#Initial guess
init_guess = [1,1];
results = minimize(obj_fun, init_guess, method='nelder-mead',
                   options={'xatol':1e-8, 'disp':True});
print(results.x);

#Plotting in 3D -- NON EXAMINABLE!
x = np.arange(-5,5.1,0.1);
y = np.arange(-5,5.1,0.1);
X, Y = np.meshgrid(x,y);
Z = obj_fun([X, Y]);

ax = plt.axes(projection='3d');
ax.plot_surface(X,Y,Z,rstride=2,cstride=2,cmap='viridis',edgecolor='none');
