
"Chemometrics for dummies"

Per Waaben Hansen


Senior Scientist
Team Chemometric Development

Dedicated Analytical Solutions


Outline

• Spectroscopy
– Mid-Infrared spectra and what they contain
– Lambert-Beer’s law

• Calibration
– Univariate calibration
– Multivariate (linear) calibration (PLS)
– Artificial Neural Networks (ANN)
– Chemometrics FAQ

• Accuracy
– Accuracy measures (RMSEP, SEP, SEPcorr)
– How is the accuracy composed?
– Examples



The milk fat spectrum

[Figure: mid-infrared absorbance spectrum of milk fat, plotted as absorbance (-0.05 to 0.25) against wavenumbers (1000 to 3000). Labelled bands: C-H stretch, C=O stretch, C-O stretch, C-H bend. Inset: fatty acid ester structure, R-CH2-O-C(=O)-(CH2)8-CH3.]

The milk protein spectrum

[Figure: mid-infrared absorbance spectrum of milk protein, plotted as absorbance (-0.05 to 0.20) against wavenumbers (1000 to 3000). Labelled bands: C=O stretch, N-H bend, C-H stretch. Inset: polypeptide backbone built from repeating -C(=O)-NH-CHR- units.]

The milk lactose spectrum

[Figure: mid-infrared absorbance spectrum of milk lactose, plotted as absorbance (-0.05 to 0.30) against wavenumbers (1000 to 3000). Labelled bands: C-O stretch, C-H stretch. Inset: structural drawing of lactose showing its CH2OH and OH groups.]

Lambert-Beer’s law

• One constituent:
$\text{Absorbance} = k \cdot \text{concentration}$

• Multiple constituents:
$\text{Absorbance} = k_1 \cdot \text{concentration}_1 + k_2 \cdot \text{concentration}_2 + k_3 \cdot \text{concentration}_3 + \dots$
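
The additive form of Lambert-Beer's law can be illustrated with a few lines of Python. This is only a sketch: the coefficients and the milk composition below are invented for illustration and are not taken from the slides.

```python
def absorbance(coefficients, concentrations):
    """Lambert-Beer for a mixture at one wavenumber: sum of k_i * c_i."""
    if len(coefficients) != len(concentrations):
        raise ValueError("need exactly one coefficient per constituent")
    return sum(k * c for k, c in zip(coefficients, concentrations))


# Hypothetical coefficients (absorbance per % w/w) for fat, protein, lactose.
k = [0.040, 0.010, 0.005]
# Hypothetical milk composition in % w/w (fat, protein, lactose).
c = [3.5, 3.4, 4.7]

total = absorbance(k, c)
# Doubling every concentration doubles the absorbance (proportionality).
doubled = absorbance(k, [2 * value for value in c])
print(round(total, 4), round(doubled, 4))
```

Because each constituent contributes its own term, the absorbance at a single wavenumber usually mixes several constituents, which is the motivation for the multivariate methods later in the deck.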



Milk spectra

[Figure: mid-infrared absorbance spectra of skim, low-fat, and whole milk, plotted as absorbance (-0.05 to 0.40) against wavenumbers (1000 to 3000).]


Conclusion - Spectroscopy

• Mid-infrared (MIR) and near-infrared (NIR) spectra contain absorptions from molecular vibrations

• According to Lambert-Beer's law the signals are proportional to the concentrations




Univariate linear regression

$C = b_0 + b_1 A_1$

[Figure: absorbance plotted against wavelength (850 to 1050).]
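
A univariate calibration of the form C = b0 + b1·A1 can be fitted by ordinary least squares. The sketch below assumes that fitting method (the slide does not name one) and uses invented absorbance and concentration values.

```python
def fit_line(absorbances, concentrations):
    """Ordinary least squares fit of C = b0 + b1 * A1; returns (b0, b1)."""
    n = len(absorbances)
    mean_a = sum(absorbances) / n
    mean_c = sum(concentrations) / n
    sxx = sum((a - mean_a) ** 2 for a in absorbances)
    sxy = sum((a - mean_a) * (c - mean_c)
              for a, c in zip(absorbances, concentrations))
    b1 = sxy / sxx
    b0 = mean_c - b1 * mean_a
    return b0, b1


# Hypothetical calibration samples: absorbance at one wavelength
# and the matching reference concentrations.
cal_absorbance = [0.10, 0.20, 0.30, 0.40]
cal_concentration = [1.0, 2.1, 2.9, 4.0]

b0, b1 = fit_line(cal_absorbance, cal_concentration)

# Prediction: apply the fitted model to the absorbance of a new sample.
predicted = b0 + b1 * 0.25
print(b0, b1, predicted)
```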


PLS - Partial Least Squares Regression

$C = b_0 + b_1 A_1 + b_2 A_2 + \dots + b_n A_n$

[Figure: absorbance plotted against wavelength (850 to 1050).]
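
To make the multivariate equation concrete, here is a minimal sketch of a PLS1 model with a single component, written in plain Python. It is a simplified illustration and not the calibration software behind the slides: real PLS models normally use several components chosen by validation, and the spectra and concentrations here are invented.

```python
def pls1_one_component(spectra, concentrations):
    """Fit a one-component PLS1 model.

    Returns (b0, b) so that C = b0 + sum(b[j] * A[j]).
    """
    n, p = len(spectra), len(spectra[0])
    x_mean = [sum(row[j] for row in spectra) / n for j in range(p)]
    y_mean = sum(concentrations) / n
    # Mean-centre the spectra and the concentrations.
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in spectra]
    yc = [y - y_mean for y in concentrations]
    # Weight vector: the spectral direction with maximal covariance with y.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each centred spectrum onto the weight vector.
    t = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred concentrations on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    b = [wj * q for wj in w]
    b0 = y_mean - sum(bj * mj for bj, mj in zip(b, x_mean))
    return b0, b


def predict(b0, b, spectrum):
    """Apply C = b0 + b1*A1 + ... + bn*An to one spectrum."""
    return b0 + sum(bj * aj for bj, aj in zip(b, spectrum))


# Hypothetical spectra (three wavelengths) that scale with concentration.
cal_spectra = [[0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.3, 0.6, 0.9]]
cal_conc = [1.0, 2.0, 3.0]

b0, b = pls1_one_component(cal_spectra, cal_conc)
new_prediction = predict(b0, b, [0.4, 0.8, 1.2])
print(new_prediction)
```

Unlike the univariate case, every wavelength receives its own coefficient, so overlapping absorptions from several constituents can be weighted against each other.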


ANN - Artificial Neural Networks

[Figure: concentration plotted against instrumental response, comparing the ANN and PLS model fits.]


Chemometrics FAQ

• What is the difference between calibration and prediction?

– During calibration the true concentrations for all samples are known, and the calibration method (PLS or ANN) is used to generate the relationship between the spectra and the concentrations. The result is a prediction model.

– During prediction the true concentrations are unknown, and a previously developed prediction model is applied to new spectra in order to obtain the predicted concentrations.


Chemometrics FAQ

• Why can't I use my milk powder fat prediction model for fat in meat?

– A prediction model is local in the sense that it only produces valid predictions for samples that are similar to the samples originally used for developing the prediction model. A sample that does not match the model is called an outlier.

– Other differences that may result in invalid predictions (outliers) include:
• Temperature changes
• Seasonal sample changes
• Added constituents (e.g. sugar in milk)
• Different instruments


Chemometrics FAQ

• Why do chemometricians always require more samples?

– More samples let the prediction model describe future samples as well as possible. The more sample types (and variations, such as temperature or instrument) included in the calibration samples, the more robust the resulting prediction model will be.

– In principle, we want samples that correspond to every possible sample that we will see in the future (i.e. an infinite number of samples).


Chemometrics FAQ

• What is the difference between precision and accuracy?

– The precision indicates how well a prediction model repeats its own answer (i.e. the repeatability).

– The accuracy is a measure of how close the result from the prediction model is to the truth (i.e. the reference).

– A precise prediction model is not necessarily accurate.

– The precision is easy to determine. That is why customers request a good repeatability (although it does not say anything about the accuracy).

[Figure: three sets of repeated results shown against the reference value: (1) accurate and precise, (2) accurate, not precise, (3) precise, not accurate. Annotations mark the mean of each set, the precision, and the accuracy for sets 1-2 and for set 3.]
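
The distinction can be put into numbers with a small sketch. It assumes that precision is summarised by the standard deviation of repeated results and accuracy by the distance from their mean to the reference value; the reference and the repeated results are invented.

```python
from statistics import mean, stdev


def precision_and_accuracy(results, reference):
    """Return (spread of the repeats, offset of their mean from the reference)."""
    return stdev(results), mean(results) - reference


reference_value = 10.0  # hypothetical reference result

# Repeats agree with each other but sit far from the reference.
precise_not_accurate = [12.0, 12.1, 11.9, 12.0]
# Repeats scatter widely, yet their mean hits the reference.
accurate_not_precise = [8.5, 11.5, 9.0, 11.0]

spread_a, offset_a = precision_and_accuracy(precise_not_accurate, reference_value)
spread_b, offset_b = precision_and_accuracy(accurate_not_precise, reference_value)
print(spread_a, offset_a)
print(spread_b, offset_b)
```

The first set has a small spread and a large offset, the second a large spread and almost no offset, so a good repeatability on its own gives no information about the accuracy.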


Conclusion - Calibration

• Univariate calibration
– can be used when pure signals are available

• Multivariate (linear) calibration (e.g. PLS)
– is useful for generating prediction models for specific products over a limited concentration range
– requires a relatively small set of calibration samples (<100)

• Artificial Neural Networks (ANN)
– handle non-linear situations, such as wide calibration ranges
– can include different sample types in the same model
– require a very large set of calibration samples (>1000)




How do we calculate the accuracy?

[Figure: scatter plot of reference against prediction, both axes 0 to 20.]


RMSEP - Root Mean Square Error of Prediction

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{i,\mathrm{pred}} - x_{i,\mathrm{ref}}\right)^2}$$

• This is an indication of the total error, i.e. it includes
– Bias
– Slope/intercept
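
As a minimal sketch, RMSEP can be computed directly from paired predicted and reference results. The function name and the four sample values are invented for illustration.

```python
from math import sqrt


def rmsep(predicted, reference):
    """Root mean square error of prediction over N paired results."""
    n = len(predicted)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


# Hypothetical predicted and reference results for four samples.
pred = [10.0, 12.0, 9.0, 11.0]
ref = [9.0, 11.0, 10.0, 11.0]

total_error = rmsep(pred, ref)
print(total_error)
```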

SEP - Standard Error of Prediction

$$\mathrm{SEP} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{i,\mathrm{pred}} - x_{i,\mathrm{ref}} - \mathrm{bias}\right)^2}$$

• This is an indication of the bias-corrected error, i.e.
– Without any systematic bias
– Still affected by the slope and intercept

• The SEP should always be quoted with its associated bias
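
The following sketch computes the bias and the SEP together. It assumes the bias is the mean of the differences between predicted and reference results, which is what the formula implies but the slide does not spell out; the data are invented.

```python
from math import sqrt


def bias(predicted, reference):
    """Mean difference between predicted and reference results (assumed definition)."""
    return sum(p - r for p, r in zip(predicted, reference)) / len(predicted)


def sep(predicted, reference):
    """Standard error of prediction: spread of the errors around the bias."""
    n = len(predicted)
    b = bias(predicted, reference)
    return sqrt(sum((p - r - b) ** 2
                    for p, r in zip(predicted, reference)) / (n - 1))


# Hypothetical predicted and reference results for four samples.
pred = [10.0, 12.0, 9.0, 11.0]
ref = [9.0, 11.0, 10.0, 11.0]

# Adding a constant offset to every prediction changes the bias only.
shifted = [p + 2.0 for p in pred]

print(bias(pred, ref), sep(pred, ref))
print(bias(shifted, ref), sep(shifted, ref))
```

A constant offset moves the bias by the same amount and leaves the SEP unchanged, which is why a SEP quoted without its bias can hide a large systematic error.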



SEPcorr - Corrected prediction error

$$\mathrm{SEPcorr} = \sqrt{\frac{1}{N-2}\sum_{i=1}^{N}\left(\mathrm{slope} \cdot x_{i,\mathrm{pred}} + \mathrm{intercept} - x_{i,\mathrm{ref}}\right)^2}$$

• This is an indication of the random error, i.e.
– Without any systematic errors (slope and intercept)

• The SEPcorr should always be quoted with its associated slope and intercept

• SEPcorr is also known as SEC or $s_{y,x}$
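
A sketch of the SEPcorr calculation follows. It assumes that slope and intercept come from an ordinary least squares regression of the reference results on the predicted results, matching the reference-against-prediction plots in the deck; the data are invented.

```python
from math import sqrt


def slope_intercept(predicted, reference):
    """Least squares line: reference = slope * predicted + intercept."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(predicted, reference))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_p


def sepcorr(predicted, reference):
    """Residual standard deviation after slope/intercept correction (N - 2)."""
    n = len(predicted)
    slope, intercept = slope_intercept(predicted, reference)
    residuals = (slope * p + intercept - r
                 for p, r in zip(predicted, reference))
    return sqrt(sum(e * e for e in residuals) / (n - 2))


# Hypothetical results where the predictions are roughly twice the reference.
pred = [2.0, 4.0, 6.0, 8.0]
ref = [1.1, 1.9, 3.1, 3.9]

slope, intercept = slope_intercept(pred, ref)
random_error = sepcorr(pred, ref)
print(slope, intercept, random_error)
```

Even though these predictions are far from the reference in absolute terms, the SEPcorr is small, because the slope and intercept absorb the systematic part of the error.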



R² - correlation

• R² is a number indicating the closeness of agreement between the reference and predicted results

• R² is independent of the measurement unit, bias, or slope/intercept

• R²=1 for the perfect relationship

• R²=0 when no relationship exists
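
The sketch below computes R² as the squared Pearson correlation between predicted and reference results. That definition is an assumption (the slides give no formula), chosen because it has the stated independence from unit, bias, and slope/intercept. The data are invented.

```python
def r_squared(predicted, reference):
    """Squared Pearson correlation between predicted and reference results."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((r - mean_r) ** 2 for r in reference)
    sxy = sum((p - mean_p) * (r - mean_r)
              for p, r in zip(predicted, reference))
    return sxy * sxy / (sxx * syy)


pred = [1.0, 2.0, 3.0, 4.0]

# A perfect straight-line relationship, despite a slope of 2 and an offset of 1.
perfect = r_squared(pred, [3.0, 5.0, 7.0, 9.0])
# A noisier relationship between the same predictions and other references.
noisy = r_squared(pred, [1.0, 3.0, 2.0, 4.0])
print(perfect, noisy)
```

The first case shows why R² alone is not an accuracy measure: it reaches 1 even though every prediction is wrong by a systematic slope and intercept.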



How is the accuracy composed?

[Diagram: factors contributing to accuracy: temperature, signal-to-noise ratio, sampling, repeatability, "transferability", reference method, and operator.]


How is the accuracy composed?

$$\mathrm{Accuracy} \approx \sqrt{\mathrm{reference}^2 + \mathrm{measurement}^2 + \mathrm{temperature}^2 + \mathrm{bias}^2 + \mathrm{sampling}^2 + \dots}$$

• Because the contributions are squared before they are summed, only the largest contributors have a noticeable effect on the accuracy

• For example, if the sampling error is high, the measurement error is insignificant in relation to the accuracy
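
A short numeric sketch shows how strongly the largest contribution dominates. It assumes independent error sources combined as a root sum of squares; the error magnitudes are invented.

```python
from math import sqrt


def combined_error(*sources):
    """Root sum of squares of independent error contributions."""
    return sqrt(sum(s * s for s in sources))


# Hypothetical contributions in the same unit as the result.
sampling_error = 0.50
measurement_error = 0.05

both = combined_error(sampling_error, measurement_error)
sampling_only = combined_error(sampling_error)
print(round(both, 4), round(sampling_only, 4))
```

With these numbers the measurement error raises the combined error by about half a percent, so improving the instrument would be pointless until the sampling error is reduced.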



No systematic errors

             absolute   relative (%Cv)
RMSEP        0.788      7.91%
SEP          0.787      7.90%
SEPcorr      0.739      7.42%
bias         0.045
slope        0.878
intercept    1.173
R²           0.876
Mean         9.962

[Figure: scatter plot of reference against prediction, both axes 0 to 20.]


Bias (intercept) error

             absolute   relative (%Cv)
RMSEP        2.191      22.00%
SEP          0.787      7.90%
SEPcorr      0.739      7.42%
bias         2.045
slope        0.878
intercept    -0.584
R²           0.876
Mean         9.962

[Figure: scatter plot of reference against prediction, both axes 0 to 20.]
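
The numbers for the bias error example can be cross-checked against each other. From the definitions, RMSEP² = bias² + (N−1)/N · SEP², so for a reasonably large N the RMSEP is close to the root of SEP² + bias². The sketch below applies this to the quoted SEP (0.787) and bias (2.045); N is not given in the slides, so the large-N approximation is an assumption. The relative figures also appear to be the absolute error divided by the mean (9.962), which is an inference from the numbers rather than something the slides state.

```python
from math import sqrt

# Values quoted for the "Bias (intercept) error" example.
sep_value = 0.787
bias_value = 2.045
reported_rmsep = 2.191
mean_value = 9.962

# Large-N approximation: RMSEP^2 is roughly SEP^2 + bias^2.
approx_rmsep = sqrt(sep_value ** 2 + bias_value ** 2)

# Relative error as a percentage of the mean (inferred definition of %Cv).
relative_rmsep = 100.0 * reported_rmsep / mean_value

print(round(approx_rmsep, 3), round(relative_rmsep, 1))
```

The bias alone accounts for almost all of the RMSEP here, while the SEP is the same as in the example without systematic errors.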


Slope and intercept error

             absolute   relative (%Cv)
RMSEP        2.291      23.00%
SEP          1.031      10.35%
SEPcorr      0.739      7.42%
bias         2.047
slope        0.732
intercept    1.173
R²           0.876
Mean         9.962

[Figure: scatter plot of reference against prediction, both axes 0 to 20.]


Conclusion - Accuracy

• Accuracy is

– approximately the square root of the sum of squares of all error sources involved in the experiment

– dependent on the way it is calculated (RMSEP, SEP or SEPcorr)