
Manual for MVC2

IMPORTANT

The software, data and manual described here correspond to those discussed in the book
"Practical three-way calibration", by Alejandro C. Olivieri and Graciela M. Escandar,
Elsevier, Waltham, MA, USA, 2014. However, the software has been updated since the
book was published, and the following major changes have been introduced:
1. Non-negativity and unimodality constraints can be applied to all trilinear
decomposition algorithms and not only to PARAFAC.
2. The limits of detection and quantitation for the PLS/RBL models are now
given according to the most recent approach discussed in F. Allegrini and
A.C. Olivieri, Anal. Chem. 86 (2014) 7858-7866.
3. For the MCR-ALS model, new options are available for initialization and for
applying constraints in both data modes, including selectivity/local rank and
closure. Also, rotational ambiguity bands are estimated, requiring the
Optimization Toolbox to be installed in the current MATLAB version. The
compiled version requires neither this toolbox nor a MATLAB installation on
the computer.

GENERAL CONSIDERATIONS
   MATLAB version
   Compiled version
   Program files
   Decompressing and locating files
   Examining example data
   Running the program
   Plotting a sample
   Data types
   List of screen options
   Files created on program execution
THE TLD MODELS
   Trilinear systems
   Non-trilinear systems
MCR-ALS
PARAFAC2
BLLS/RBL
N-PLS/RBL AND U-PLS/RBL
   Calibration
   Prediction in test samples
   Predictions using RBL
U-PCA/RBL
REFERENCES
FINAL REMARKS
ACKNOWLEDGMENTS

General considerations
MATLAB version
The routines work under MATLAB 2012, so MATLAB must be installed on your
computer. The codes are freely available at www.iquir-conicet.gov.ar/descargas/mvc2.zip.

Compiled version
To run the compiled version, you do not need to have MATLAB installed on your
computer, but you need to install the freely available MCR (MATLAB Compiler
Runtime) that matches the MATLAB version used to develop the interface. It is
available at:
https://www.dropbox.com/sh/nruf3lp0ge1gbww/AAAj6r97UBMIhgQmukRGYFPKa?dl=0

Follow the instructions in the corresponding folder (“Common runtime”).

The program is located in the folder “MVC2 Second-order multivariate calibration”
and is started by double-clicking mvc2_32.exe.

Program files
In the MATLAB version, the following folders are available:

Folder           Content
mvc2_program     Main program: a set of MATLAB codes
N_way_toolbox    The N-way toolbox (version 3.31), downloaded from Rasmus Bro's web site:
                 http://www.models.kvl.dk/algorithms
mvc2_manual      Program manual: a PDF document
mvc2_data        Example data: RAR files in separate folders (see below)
In both the MATLAB and compiled versions, the following data folders are available:

Folder   RAR data file    Content
EEFM     EEFM_data        Synthetic excitation-emission fluorescence data matrices
         EEFM_R_data
         EEFM_IF_data
         EEFM_D1_data
         EEFM_D2_data
LCDAD    LCDAD_data       Synthetic liquid chromatographic-DAD data matrices with
         LCDAD_PI_data    shifts in peak positions
Other    pHDAD_data       Synthetic data matrices showing linear dependency
         KinDAD_data

Decompressing and locating files


In the MATLAB version, decompress 'mvc2_program' and 'N_way_toolbox' into two
separate folders, and add both folders to the MATLAB search path. Make sure that
these program folders are at the top of the search path.

In the compiled version, double-click mvc2_32.exe.

To begin working with the example data, decompress each example data folder into a
separate folder, and then make each of these folders, in turn, the 'Current Folder' for
the corresponding study.

Examining example data


We begin by examining the simulated data contained in 'EEFM_data'. Go to the folder
where these data are located, and examine the available files, checking the following
points:

• Calibration spectra ('EEFM_cal1.txt', 'EEFM_cal2.txt', ..., 'EEFM_cal9.txt') are of the
type 'X,Y_vectors', i.e., two-column files (the first column is wavelength). Although the
names are sequentially numbered, this is not a requirement. The second column contains
the calibration matrices (J×K) unfolded into (JK×1) vectors.
• Calibration concentrations for two analytes are in the single column files
'EEFM_y1cal.txt' and 'EEFM_y2cal.txt'.
• Filenames of calibration spectra are in a single column in the file 'EEFM_calfiles.txt', in
the same order as the concentrations in 'EEFM_y1cal.txt', etc.
• Test spectra ('EEFM_test1.txt', 'EEFM_test2.txt', ..., 'EEFM_test10.txt') are of
'X,Y_vectors' type, i.e., two-column.
• Test concentrations are in a single column in the files 'EEFM_y1test.txt' and
'EEFM_y2test.txt'.
• Filenames of test spectra are in a single column in the file 'EEFM_testfiles.txt', in the
same order as the concentrations in 'EEFM_y1test.txt', etc.
• All the above information is contained in the text file 'EEFM_Data_help.txt'.

Check the content of the text file 'EEFM_Data_help.txt':

File name Content


--------------------------------------------------------------------------------------
EEFM_calfiles.txt Names of calibration files
EEFM_testfiles.txt Names of test files
EEFM_cal1.txt ... EEFM_cal9.txt Calibration data files (9)
EEFM_test1.txt ... EEFM_test10.txt Test data files (10)
EEFM_y1cal.txt Calibration concentrations for analyte 1
EEFM_y2cal.txt Calibration concentrations for analyte 2
EEFM_y1test.txt Test concentrations for analyte 1
EEFM_y2test.txt Test concentrations for analyte 2

IMPORTANT
--------------------------------------------------------------------------------------
The example contains 2 analytes and a single interferent
in samples test1.txt to test5.txt,
and 2 analytes and 2 interferents
in samples test6.txt to test10.txt
Sensor data: 321 360 1 261 290 1 (emission-excitation)

Data type: X,Y_vectors
Noise in X: 0.1
Noise in Y: 0.01
----------------------------------------------------------------------------------------

Notice that the file provides information on the calibration and test sets, the excitation and
emission ranges, and the estimated uncertainties in both signal (X) and concentration (Y). The
example data contain two analytes whose excitation and emission spectra are as follows:

Excitation (left) and emission (right) spectral profiles for both analytes: blue, analyte 1;
green, analyte 2.

Running the program


Run mvc2. You will see this screen:

Comments:

• The different fields should be completed according to the problem under study. If
something is changed, and 'SAVE' is clicked, the new screen will be saved in the Current
Folder and can be loaded in the next working session. You can save a particular screen
with a name, and then load it with 'LOAD SCREEN'. If no name is provided, a default
'init2_comp' name is given to the screen file.
• You can search for a screen file already saved, using the browser (‘…’). After selecting a
suitable screen (.mat) file, the path name will be automatically updated.
• The 'Sensor data' field must be filled with the indexes of the data matrices. In the present
example, the numbers are: 321, first sensor in mode B; 360, last sensor in mode B; 1,
separation between sensors in mode B (there are 40 points in mode B); 261, first sensor in
mode C; 290, last sensor in mode C; 1, separation between sensors in mode C (30 points in
mode C). Here the data matrices are emission-excitation matrices recorded from 321 to
360 nm every 1 nm (emission), and from 261 to 290 nm every 1 nm (excitation). They
were unfolded along the emission axis, i.e., all emission spectra were stacked on top of each
other, beginning with the one recorded at an excitation of 261 nm.
• The 'Data type' should be changed to X,Y_vectors, which is the data type corresponding
to the EEFM_data example.
• Select the filename of a sample to plot, e.g., 'EEFM_test1.txt' using the browser.
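To make the 'Sensor data' convention concrete, the following Python sketch (not part of MVC2, which is written in MATLAB) expands the six numbers into the sensor axes of modes B and C and folds a JK×1 vector back into a J×K matrix. It assumes, as described above for EEFM_data, that the vector consists of K consecutive mode-B spectra, the first one recorded at the first mode-C sensor. The function names are hypothetical.

```python
def parse_sensor_data(field):
    """Expand 'a b c d e f' into the mode-B and mode-C sensor axes."""
    a, b, c, d, e, f = (float(v) for v in field.split())

    def axis(first, last, step):
        n = int(round((last - first) / step)) + 1
        return [first + i * step for i in range(n)]

    return axis(a, b, c), axis(d, e, f)


def fold(vector, j, k):
    """Fold a JK-long list into a J x K matrix (a list of J rows).

    Assumption: block kk of J consecutive values is the mode-B spectrum
    measured at the kk-th mode-C sensor (unfolding along mode B).
    """
    if len(vector) != j * k:
        raise ValueError("vector length must equal J*K")
    return [[vector[kk * j + jj] for kk in range(k)] for jj in range(j)]


mode_b, mode_c = parse_sensor_data("321 360 1 261 290 1")
print(len(mode_b), len(mode_c))  # 40 30
```

With these axes, fold(signal, len(mode_b), len(mode_c)) would turn the second column of an EEFM file into a 40 × 30 matrix.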

The screen should therefore look like this:

Plotting a sample
Click 'PLOT SAMPLE' to see a three-dimensional plot and a contour plot:

[Figure: three-dimensional plot (Intensity vs. Mode B and Mode C) and contour plot (Mode B vs. Mode C) of the sample over the full sensor ranges]

Comments:
• Modes B and C (columns and rows) are conveniently labeled.
• The figures can be rotated by going to 'Tools', 'Rotate 3D'.

To select wavelength ranges, fill the 'Selected sensors' space with appropriate values.
For example, if one wishes to restrict the selection to the emission range 330 to 350 and
excitation from 265 to 285, introduce these numbers separated by blank spaces:

Comments:
• The separations (1 and 1 data points respectively) are not required in the field 'Selected
sensors'.

Click 'PLOT SAMPLE' to obtain a sample plot in the selected ranges.

[Figure: three-dimensional and contour plots of the sample (Intensity, Mode B, Mode C) in the selected sensor ranges]

You can also use a different point separation (step) in each data mode. For
example, if you wish to plot (and employ for data processing) the matrices in the sensor
ranges 330 350 265 285, but, additionally, you want to pick every second data point in
both the B and C modes, then input 330 350 265 285 2 2 in the
'Selected sensors' window:
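The 'Selected sensors' field can be read in the same spirit. In the Python sketch below (hypothetical helper names, not MVC2's MATLAB code), the optional steps default to 1, and every step-th sensor inside the selected range is kept, starting at the first selected sensor; that MVC2 starts counting at exactly that sensor is an assumption.

```python
def parse_selected(field):
    """Read 'g h i j [k l]' into (first, last, step) for mode B and mode C."""
    vals = field.split()
    g, h, i, j = (float(v) for v in vals[:4])
    step_b = int(vals[4]) if len(vals) > 4 else 1
    step_c = int(vals[5]) if len(vals) > 5 else 1
    return (g, h, step_b), (i, j, step_c)


def select_indices(axis, first, last, step):
    """Positions of the sensors in [first, last], keeping every step-th one."""
    inside = [n for n, x in enumerate(axis) if first <= x <= last]
    return inside[::step]


axis_b = [321 + n for n in range(40)]   # 321..360 nm, as in EEFM_data
axis_c = [261 + n for n in range(30)]   # 261..290 nm
sel_b, sel_c = parse_selected("330 350 265 285 2 2")
idx_b = select_indices(axis_b, *sel_b)  # positions of 330, 332, ... in axis_b
idx_c = select_indices(axis_c, *sel_c)
```

Working with positions rather than wavelength values keeps the same indices usable for slicing the folded J×K matrices.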

Click in 'PLOT SAMPLE' and you will see the following figures:

[Figure: three-dimensional and contour plots of the sample in the selected ranges, using every second data point in modes B and C]

Data types
Available data types are the following:
• X_vectors: each JK data matrix is unfolded and saved as a JK1 vector.
• X,Y_vectors: similar to X_vectors, but saved as a two column file, where the first column
contains wavelengths and the second one the signals.
• X,Y,Z_vectors: similar to X_vectors, but saved as a three column file, where the first
column contains wavelengths, the second a reference signal and the third the sample
signals.
• X,Y_matrices: data are in a J2K matrix form, because a column of wavelengths is
interpolated between successive columns of instrumental data.
• X_matrices: data are directly as JK matrices, with no wavelength values.
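The data types differ only in where the signal sits in each text file. A Python sketch of extracting it follows. The helper names are hypothetical, and for X,Y_matrices it assumes that a wavelength column comes immediately before each data column (wavelengths in columns 0, 2, 4, ... counting from 0), which this manual does not state explicitly.

```python
def read_numeric(text):
    """Parse whitespace-separated numbers, one row per line."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]


def signal_vector(rows, data_type):
    """Return the JK x 1 signal of a vector-type file as a flat list."""
    column = {"X_vectors": 0, "X,Y_vectors": 1, "X,Y,Z_vectors": 2}[data_type]
    return [r[column] for r in rows]


def signal_matrix(rows, data_type):
    """Return the J x K signal matrix of a matrix-type file."""
    if data_type == "X_matrices":
        return rows
    if data_type == "X,Y_matrices":
        # Assumption: columns 0, 2, 4, ... hold wavelengths; 1, 3, 5, ... hold data.
        return [r[1::2] for r in rows]
    raise ValueError("unknown matrix data type: " + data_type)
```

For X,Y,Z_vectors the sketch keeps only the third (sample) column and ignores the reference signal.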

List of screen options
A complete list of available options in the main screen of mvc2 is (they will be explained
below in connection with the examples):

• Model: used to select the second-order multivariate model to be applied. The first window
gives the model type, the second one the specific algorithm.
• Number of components: number of responsive components or latent variables (LVs),
depending on the model: trilinear decomposition (TLD) models, total components
(number of analytes + number of interferents or total number of analyte species + number
of interferents); residual bilinearization (RBL) models, number of calibration components
and number of interferents.
• Sensor data: give in the form a b c d e f, where a = first sensor in mode B, b = last sensor
in mode B, c = separation of sensors in mode B, d = first sensor in mode C, e = last sensor
in mode C, f = separation of sensors in mode C.
• Selected sensors: give in the form g h i j k l, where g = first selected sensor in mode B, h =
last selected sensor in mode B, i = first selected sensor in mode C, j = last selected sensor
in mode C, k = selected separation of sensors in mode B (optional), l = selected separation
of sensors in mode C (optional).
• Data type: X_vectors, each sample is a one-column file; X,Y_vectors, each sample is a
two-column file; X,Y,Z_vectors, each sample is a three-column file; X,Y_matrices,
matrices with wavelength information in intermediate columns; X_matrices, J×K matrices.
• Filenames: Calibration conc., text file with calibration concentrations; Calibration signals,
text file with filenames of calibration spectra; Test signals: text file with filenames of test
data for a set of samples or file with data for a single sample.
• Plot: give sample file name for plotting and activate Plot.
• Calibration samples excluded: if you wish to exclude samples from calibration, simply
type the sample numbers separated by white spaces. Otherwise, leave a blank.
• Mean center: this option mean centers the data using the mean instrumental values of the
calibration set. It should not be used in combination with non-negativity constraints.
• Uncertainty in signals: used to estimate the standard deviation of predicted concentrations.
If left blank, it is estimated from the regression. Otherwise, enter the desired value.
• Uncertainty in conc.: used to estimate the standard deviation of predicted concentrations.
If left blank, the default value is zero. Otherwise, enter the desired value.
• Constraints in A,B,C: select constraints in each TLD mode. A are the scores or relative
concentrations, B the profiles in the mode B and C the profiles in the mode C.
• Initialization method: select the initial values of scores and loadings to start TLD models.
• Estimate components: runs leave-one-out U-PLS and N-PLS cross-validation, U-PCA
unfolded analysis (explained variance and residual fit), and PCA of the augmented matrix
in MCR-ALS.
• Predict: predict concentration in test samples.
• Save: saves the current screen values for the next mvc2 session.
• Load: loads a saved screen.
• Process cal.: saves PCA scores of calibration data for future ANN training.
• Process test: saves PCA scores of test data for future ANN prediction.
• Augmentation: mode of MCR augmentation: mode B (column-wise) or mode C (row-
wise).
• Non-negativity: selection of non-negativity constraints for MCR-ALS.

• Sample-dependent mode: mode in which profiles change in PARAFAC2.

Files created on program execution


When executing mvc2, the default screen file is loaded, containing the default values of
most screen options and variables. If this file does not exist in the working folder,
MATLAB will search the path looking for a file with this name.

The program will create a folder \temp2 inside the working folder. Within this \temp2
folder, the following files may be created:

Filename                                          Content
temp2.mat                                         Auxiliary variables needed for program execution
A_ + 'algorithm name' + .txt                      Scores (A) and profiles (B and C) retrieved
B_ + 'algorithm name' + .txt                      by TLD algorithms in both data modes
C_ + 'algorithm name' + .txt
Bint_ + 'algorithm name' + '_sample number'.txt   Profiles for interferents retrieved by N-PLS/RBL,
Cint_ + 'algorithm name' + '_sample number'.txt   U-PLS/RBL and U-PCA/RBL for each selected test sample
TCAL.txt                                          Calibration scores for ANN training
TU_ + 'Sample name'.txt                           Test sample scores for ANN prediction
D_ + 'Sample name'.txt                            Augmented matrix for MCR-ALS calculations
B_ + 'algorithm name' + '_sample number'.txt      Profiles for each selected sample, and augmented
C_ + 'algorithm name' + '_sample number'.txt      profiles in MCR-ALS or PARAFAC2, depending on
Baug_ + 'algorithm name' + .txt                   the augmentation or changing mode
Caug_ + 'algorithm name' + .txt

The TLD models
Trilinear systems
Trilinear systems (e.g., fluorescence) do not usually require any special initialization or
constraint for successful TLD processing.
The available algorithms based on trilinear decomposition (TLD) models are listed in
the following table:

Algorithm Description Reference


PARAFAC Parallel factor analysis 1
ATLD Alternating trilinear decomposition 2
APTLD Alternating penalty TLD 3
SWATLD Self-weighted ATLD 4

We continue studying 'EEFM_data'. Return to MENU and change the model to 'TLD'
and 'PARAFAC', setting 3 as the number of components, and leaving the whole spectral
range for data processing.

Comments
• We have set 3 as the number of components, which equals the number of analytes plus
one interferent. This is the case for test samples No. 1 to 5 (test samples No. 6 to 10
contain two interferents). We can then increase (or decrease) it to check for the optimum
number.
• A single test sample is studied, whose filename is 'EEFM_test1.txt'.
• Among the calibration fields, only 'Calibration signals' is needed at this stage, since the
first step in PARAFAC does not use concentration information.
• Additional PARAFAC options appear, such as Constraints and Initialization method
(these are advanced aspects to be commented below).

Click 'PREDICT'. Profiles for the three components appear in a separate figure:

Comments
• Three profiles appear in both modes, and they should be matched with those of the
analytes. The components are not labeled according to our nominal analytes 1 and
2, but in the order of their contribution to the overall spectral variance. In this case, for
example, our analyte 1 is PARAFAC component 3 (we should know this from the spectra
of standards, see above), while our analyte 2 is PARAFAC component 1. Clearly,
component 2 is an interferent.
• Introduce the profile number matching analyte 1, and the filename with calibration
concentrations for analyte 1, and then click in 'GO'. Use the browser if necessary.
• You may save the profiles by clicking in 'SAVE'.

In this case the profile for analyte 1 is the one labeled as 3 in this plot. Introduce this
information:

[Figure: PARAFAC profiles for components 1, 2 and 3 in mode B (325 to 360 nm) and mode C (265 to 290 nm)]

Click in 'GO' and get the pseudo-univariate calibration graph for analyte 1:

[Figure: 'Calibration line: component 3, 3 components', score vs. nominal concentration for calibration and test samples (Fit: s=0.36598, r2=0.99981), with a lower panel of calibration errors vs. sample No.]

Comments
• The test sample is represented by the red cross, interpolated in the pseudo-univariate
graph.
• Try selecting a different profile, i.e., put '1' or '2' as the index for 'Analyte' and keep
'EEFM_y1cal.txt' to see what happens.
• Take note of the regression fit (s) in this graph. It is a quality measure of the pseudo-
univariate calibration graph. The new MVC2 version also gives the correlation
coefficient.
• We are employing external calibration, but you can switch to standard addition if
required.
• The sub-plot at the bottom permits identification of outliers.
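The pseudo-univariate graph is an ordinary least-squares line of the resolved scores against the nominal calibration concentrations; the test concentration is then read off the line. The Python sketch below shows one textbook way to obtain the slope, intercept, fit s (taken here as the standard error of the residuals with n - 2 degrees of freedom) and r2. That MVC2 uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus s and r2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, s, r2


def predict(score, slope, intercept):
    """Interpolate a test-sample score on the calibration line."""
    return (score - intercept) / slope


def calibration_errors(x, y, slope, intercept):
    """Predicted minus nominal concentration for each calibration sample."""
    return [predict(yi, slope, intercept) - xi for xi, yi in zip(x, y)]
```

The lower sub-plot corresponds to the kind of values returned by calibration_errors, which is where an outlying calibration sample would stand out.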

Meanwhile, the MATLAB command window shows this:

Calculating ... please wait

PARAFAC

PRELIMINARY

A 3-component model will be fitted
No constraints on mode 1
No constraints on mode 2
No constraints on mode 3
The convergence criterion is 1e-006
No weights given
No missing values
Line-search acceleration scheme initialized
Using direct trilinear decomposition for initialization

Sum-of-Squares    Iterations    Explained
of residuals                    variation

Tuckers congruence coefficient
1.0000 0.1049 0.1149
0.1049 1.0000 0.4301
0.1149 0.4301 1.0000

112.2771673355        8         99.8457

Components have been normalized in all but the first mode
Components have been ordered according to contribution
Components have been reflected according to convention
The algorithm converged

Core consistency: 99.9975

SD of residuals: 0.098

Comments
• Progression of the convergence of the least-squares algorithm is shown.
• Take note of the core consistency (99.9975). It will be useful for estimating the optimum
number of components.5
• The standard deviation of the least-squares residuals (0.098) is reported. This can
also be useful in assessing the number of components. Notice that it agrees with the
already known noise level in the signals (0.1 units).
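The console output above also lists Tucker's congruence coefficients between the PARAFAC components. For two loading vectors u and v the coefficient is u·v/(|u||v|); for a pair of trilinear components it is commonly taken as the product of the congruences in the three modes. The Python sketch below implements that definition; whether the N-way toolbox computes its matrix in exactly this way is an assumption, and the function names are hypothetical. Off-diagonal values close to 1 would suggest two nearly collinear components.

```python
import math


def congruence(u, v):
    """Tucker's congruence coefficient between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def component_congruence(factors_p, factors_q):
    """Product of per-mode congruences for components p and q.

    factors_p and factors_q are (a, b, c) tuples holding the score, mode-B
    and mode-C loading vectors of each component.
    """
    result = 1.0
    for u, v in zip(factors_p, factors_q):
        result *= congruence(u, v)
    return result
```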

This information also appears in a report screen:

Try repeating the calculations with different number of components, taking note, in
each case, of the core consistency and the SD of residuals. The results are as follows:

Number of components   Core consistency   SD of residuals
2                      100                0.42
3                      100                0.098
4                      99.8               0.097
5                      -0.3               0.097

Comments:
• According to the core consistency test, the number of components is 4, because the core
consistency drops below 50 for N > 4.
• According to the SD of residuals, the number of components is 3, because at N = 3 the SD
stabilizes at the noise level, which is known to be 0.1 units.
• Visually examining the profiles for N = 4 and N = 5 shows that the fourth and fifth
components have profiles which either do not resemble a 'spectrum' or duplicate previous
components.
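The two criteria just discussed can be written as simple decision rules. The Python sketch below applies them to the values in the table above. The threshold of 50 for the core consistency comes from the text, whereas the 10% tolerance used to decide that the SD of residuals has reached the noise level is an arbitrary assumption; the function names are hypothetical.

```python
def by_core_consistency(table, threshold=50.0):
    """Largest N before the core consistency first drops below threshold."""
    best = None
    for n, core, _ in sorted(table):
        if core < threshold:
            break
        best = n
    return best


def by_residual_sd(table, noise, tolerance=0.10):
    """Smallest N whose SD of residuals is within tolerance of the noise level."""
    for n, _, sd in sorted(table):
        if sd <= noise * (1.0 + tolerance):
            return n
    return None


# (number of components, core consistency, SD of residuals) from the table above
table = [(2, 100, 0.42), (3, 100, 0.098), (4, 99.8, 0.097), (5, -0.3, 0.097)]
print(by_core_consistency(table), by_residual_sd(table, noise=0.1))  # 4 3
```

As the comments above explain, the two rules disagree here, so neither should be applied blindly; inspecting the profiles settles the choice.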

Having established that 3 is a good number of components, repeat the 'PREDICT'
procedure using this number of components, return to the corresponding pseudo-univariate
calibration graph for 3 PARAFAC components, and then click 'GO'. The MATLAB command
window shows:

Prediction results:

Concentration for component 3


Sample: 1: 0.31707

Figures of merit

Sensitivity: 18
Selectivity: 0.36
Analytical sensitivity: 1.9e+002
LOD: 0.018
SD of concentration:
Sample 1: 0.0056

Comments:
• The result is the predicted concentration of analyte 1 (component 3 for PARAFAC).
• Analytical figures of merit (sensitivity, selectivity, analytical sensitivity, etc.) are
provided.6,7 The noise level is taken as the standard deviation of the residuals.
• To estimate the LOD and the SD of concentration, the uncertainties in calibration signals and
concentrations are needed. In the present case the program used default values based on the
least-squares fit, because no values were given in the main screen.
• The SD of concentration is given sample by sample; in this case there is a single sample.

This information also appears in a separate report:

If the signal and concentration uncertainties are already known, simply type them in the
main screen of mvc2:

Predicting with this value of signal uncertainty gives these figures of merit:

Figures of merit

Sensitivity: 18
Selectivity: 0.36
Analytical sensitivity: 1.8e+002
LOD: 0.018
SD of concentration:

Sample 1: 0.0059

Comment:
• The results have hardly changed because the input values are similar to the default values
estimated by the program.
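Analytical sensitivity is commonly defined as the sensitivity divided by the instrumental noise level, and its inverse estimates the minimum concentration difference that can be distinguished. As a quick check with the rounded values reported above, here is a Python sketch under that assumed definition (the function name is hypothetical):

```python
def analytical_sensitivity(sensitivity, signal_noise):
    """Sensitivity divided by the signal noise (inverse concentration units)."""
    return sensitivity / signal_noise


gamma = analytical_sensitivity(18, 0.1)   # sensitivity 18, signal uncertainty 0.1
print(round(gamma), round(1 / gamma, 4))  # 180 0.0056
```

This is in line with the reported 1.8e+002. Small discrepancies elsewhere are expected because the printed sensitivities are rounded.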

When studying samples EEFM_test6.txt through EEFM_test10.txt, notice that they
contain two interferents. Hence, set the screen as follows:

Of course, you should check that 4 is the correct number of components, using the
statistical tools discussed above. Clicking 'PREDICT' now leads to:

In this plot we clearly see the profiles for both analytes; however, those for the
interferents are somewhat 'mixed', with some undesirable negative values.
For qualitative purposes, this might in principle be corrected by using constraints in the
least-squares fit. Set non-negativity constraints in all profiles and repeat the procedure:

Now we obtain physically reasonable results; however, on comparing them with the pure
component profiles we see that linear combinations may still be obtained (this is typical
when several interferents are present in the test sample):

[Figure: PARAFAC profiles for components 1 to 4 in mode B (325 to 360 nm) and mode C (265 to 290 nm), obtained with non-negativity constraints]

Inserting the analyte index (3), the calibration y file and clicking in 'GO' leads to:

Prediction results:

Concentration for component 3


Sample: 1: 0.62909

Figures of merit

Sensitivity: 9.9
Selectivity: 0.2
Analytical sensitivity: 1e+002
LOD: 0.032
SD of concentration:
Sample 1: 0.011

[Figure: 'Calibration line: component 3, 4 components', score vs. nominal concentration for calibration and test samples (Fit: s=0.36267, r2=0.99981), with a lower panel of calibration errors vs. sample No.]

Repeat the prediction for all test samples and both analytes. Notice that when using
PARAFAC, the prediction procedure should be repeated for each unknown sample,
using 3 or 4 components depending on the sample.

You should complete the following table using the various options:

Sample             Analyte 1                 Analyte 2
                   Nominal(a)   PARAFAC      Nominal(b)   PARAFAC
EEFM_test1.txt     0.33                      0.42
EEFM_test2.txt     0.42                      0.06
EEFM_test3.txt     0.71                      0.28
EEFM_test4.txt     0.21                      0.22
EEFM_test5.txt     0.46                      0.83
EEFM_test6.txt     0.65                      0.70
EEFM_test7.txt     0.92                      0.51
EEFM_test8.txt     0.89                      0.59
EEFM_test9.txt     0.79                      0.65
EEFM_test10.txt    0.41                      0.29
RMSE
REP%
(a) Contained in the file 'EEFM_y1test.txt'.
(b) Contained in the file 'EEFM_y2test.txt'.

In the above table, RMSE is the root mean square error, and REP% its value relative to
the mean calibration concentration in %. You should explain the results in terms of the
figures of merit for each analyte, i.e., better recoveries should correspond to higher
sensitivity.

Try predicting all test samples using the remaining algorithms based on the TLD model,
which work similarly to PARAFAC. The algorithms APTLD and SWATLD, however,
are claimed to be faster and insensitive to the number of components, provided it is
larger than the actual number. This means that they should always recover good analyte
profiles, with only noisy profiles for the excess components. Check whether this is true, and
compare with the PARAFAC predictive results.

IMPORTANT: you can study multiple samples by introducing, in the 'Test Signals' field,
the name 'EEFM_testfiles.txt', which contains the names of the ten test files. You should
employ 4 components to study this set, since some of the samples contain 3 components
but others contain 4:

The result is the same set of profiles, but analyte 1 is component 4:

[Figure: PARAFAC profiles for components 1 to 4 in mode B (325 to 360 nm) and mode C (265 to 290 nm)]

Analyte prediction may proceed in all samples simultaneously:

[Figure: 'Calibration line: component 4, 4 components', score vs. nominal concentration for calibration and test samples (Fit: s=0.3713, r2=0.9998), with a lower panel of calibration errors vs. sample No.]

And in the MATLAB command window:

Prediction results:

Concentration for component 4


Sample: 1: 0.32147
Sample: 2: 0.40489
Sample: 3: 0.71077
Sample: 4: 0.21791
Sample: 5: 0.44625
Sample: 6: 0.6525

Sample: 7: 0.92795
Sample: 8: 0.8996
Sample: 9: 0.7744
Sample: 10: 0.4016

Figures of merit

Sensitivity: 10
Selectivity: 0.21
Analytical sensitivity: 1.1e+002
LOD: 0.031
SD of concentration:
Sample 1: 0.0096
Sample 2: 0.0098
Sample 3: 0.011
Sample 4: 0.0095
Sample 5: 0.0099
Sample 6: 0.01
Sample 7: 0.011
Sample 8: 0.011
Sample 9: 0.011
Sample 10: 0.0098
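The RMSE and REP% rows of the table of test samples can be computed with a few lines of code. The Python sketch below (hypothetical function names) uses the nominal analyte 1 concentrations from that table and the 4-component PARAFAC predictions listed above. REP% additionally needs the mean calibration concentration, which should be computed from 'EEFM_y1cal.txt' and is not reproduced in this manual.

```python
import math


def rmse(nominal, predicted):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - n) ** 2 for n, p in zip(nominal, predicted)) / len(nominal))


def rep_percent(rmse_value, mean_calibration_conc):
    """Relative error of prediction (%) with respect to the mean calibration concentration."""
    return 100.0 * rmse_value / mean_calibration_conc


nominal_1 = [0.33, 0.42, 0.71, 0.21, 0.46, 0.65, 0.92, 0.89, 0.79, 0.41]
parafac_1 = [0.32147, 0.40489, 0.71077, 0.21791, 0.44625,
             0.6525, 0.92795, 0.8996, 0.7744, 0.4016]
print(round(rmse(nominal_1, parafac_1), 4))  # 0.0101
```

An RMSE of about 0.010 is of the same order as the per-sample SD of concentration reported above (0.0095 to 0.011), which is consistent with an adequate model.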

Non-trilinear systems
When profiles change from sample to sample, as in liquid chromatography, the system
becomes non-trilinear. TLD algorithms are not expected to work in this case, unless the
changes are small.
Examine the example LCDAD_data.rar. You should be able to study it in the same
manner as the EEFM_data example analyzed in detail above.

In the LCDAD_data example, the shifts in chromatographic peaks are small and
random in all calibration samples and in test samples 1 to 5. You should be able to
correctly predict the analyte concentrations in these samples, because of the smallness of
the changes.

However, in samples 6 to 10, you should obtain larger errors in quantitating the
analytes, because the shifts in these test samples are larger than in calibration.

This system is best handled using MCR-ALS (see below).

MCR-ALS
MCR-ALS (multivariate curve resolution - alternating least-squares) can be applied to
second-order calibration.8

After selecting the option 'MCR-ALS' in the screen, you may choose one of the
augmentation modes [i.e., 'column-wise (mode B)' or 'row-wise (mode C)'] and then
click 'PREDICT'. This will implement MCR-ALS with non-negativity, unimodality
and correspondence constraints, and initialization using the purest variables or external
sources.

The mvc2 screen also allows you to generate and save to disk the so-called augmented D
matrix for full MCR-ALS calculations. You should then call a separate MCR-ALS
graphical interface if you wish to implement further calculations with additional
constraints (closure, selectivity, etc.).9
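Column-wise augmentation (along mode B) stacks the J×K matrices on top of each other, giving a matrix with K columns and the rows of all the matrices; row-wise augmentation (along mode C) places them side by side. The Python sketch below (hypothetical helper names) builds both. Following MVC2's own report that the first (top) matrix of its augmented D is the sample, followed by the calibration matrices, the test sample is placed first.

```python
def augment_column_wise(matrices):
    """Stack J x K matrices vertically into a (sum of J) x K matrix (mode B augmented)."""
    k = len(matrices[0][0])
    augmented = []
    for m in matrices:
        if any(len(row) != k for row in m):
            raise ValueError("all matrices must have the same number of columns (mode C)")
        augmented.extend(list(row) for row in m)
    return augmented


def augment_row_wise(matrices):
    """Place J x K matrices side by side into a J x (sum of K) matrix (mode C augmented)."""
    j = len(matrices[0])
    if any(len(m) != j for m in matrices):
        raise ValueError("all matrices must have the same number of rows (mode B)")
    return [[x for m in matrices for x in m[r]] for r in range(j)]


test = [[1, 2], [3, 4]]
calibration = [[[5, 6], [7, 8]], [[9, 10], [11, 12]]]
D = augment_column_wise([test] + calibration)   # test sample on top
```

Column-wise augmentation only requires the matrices to share the non-augmented mode, so peak shifts along the elution (augmented) mode do not break the bilinear model of D.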

To study the LCDAD_data example, set the screen as follows:

First notice that these matrices should be augmented column-wise (along the B mode),
because elution time is in the column direction. The number of components has been set
to 6 in order to estimate the correct number of responsive constituents. Clicking
'ESTIMATE COMPONENTS' leads to the following figure:

[Figure: 'PC Analysis of augmented matrix', explained variance (logarithmic scale) and residual fit vs. component number (1 to 6)]

This information allows one to choose 3 as the correct number of calibration
components, because: 1) the explained variance is small after the third component, and
2) with three components the residual fit becomes comparable to the noise level of 0.005 units.
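Both panels of the 'PC Analysis of augmented matrix' figure can be derived from the singular values of the augmented matrix D. The Python sketch below assumes that all singular values have already been computed (for example with an SVD routine from a numerical library), that no centering is applied, and that the residual fit is the root mean square of the residuals left after N components. These are assumptions rather than a description of MVC2's code, and the function names are hypothetical.

```python
import math


def pca_diagnostics(singular_values, n_rows, n_cols):
    """Per-component explained variance (%) and residual fit for N = 1, 2, ...

    singular_values must contain all singular values of D, largest first;
    otherwise the residual sums of squares are underestimated.
    """
    ss = [s * s for s in singular_values]
    total = sum(ss)
    rows = []
    for n in range(1, len(ss) + 1):
        explained = 100.0 * ss[n - 1] / total
        residual_fit = math.sqrt(sum(ss[n:]) / (n_rows * n_cols))
        rows.append((n, explained, residual_fit))
    return rows


def choose_components(diagnostics, noise, tolerance=0.10):
    """Smallest N whose residual fit is within tolerance of the noise level."""
    for n, _, fit in diagnostics:
        if fit <= noise * (1.0 + tolerance):
            return n
    return None
```

The 10% tolerance in choose_components is arbitrary; in practice the decision also relies on the shape of the explained-variance curve, as described above.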

Set the number of components to 3:

Clicking in 'PREDICT' will lead you to two screens, one with the initial guesses of the
spectra for the 3 components:

Comment:
• You can change the initialization method from: (1) 'PURE non augm 10%' (default 10%
noise level) to other noise levels, (2) PURE in the augmented mode with various noise
levels, (3) external files in the augmented ('X0_augm.TXT') or non-augmented
('X0_nonaugm.TXT') mode, meaning that you have the initial profiles in ASCII files
named either X0_augm.TXT or X0_nonaugm.TXT, in the folder where the data are
located and (4) PARAFAC. Selecting these options will automatically load either file and
use it as initialization guess.
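The 'PURE' initializations search for the purest variables in the data. A common criterion, used in SIMPLISMA-type methods, is the purity p_j = σ_j/(μ_j + α), where μ_j and σ_j are the mean and standard deviation of variable j and α is the noise level expressed as a percentage of the largest mean (10% by default here). The Python sketch below finds only the first purest variable, since subsequent ones require the determinant-based weights of the full method. Whether MVC2 implements exactly this criterion is an assumption, the function name is hypothetical, and which mode plays the role of 'variables' depends on whether augmented or non-augmented profiles are being initialized.

```python
import math


def first_purest_variable(D, noise_percent=10.0):
    """Return (index, purities) of the purest column of D.

    Rows of D are observations and columns are the candidate variables.
    Purity p_j = sd_j / (mean_j + alpha), alpha = noise_percent/100 * max(mean).
    """
    n_rows, n_cols = len(D), len(D[0])
    means, sds = [], []
    for j in range(n_cols):
        column = [D[i][j] for i in range(n_rows)]
        mu = sum(column) / n_rows
        means.append(mu)
        sds.append(math.sqrt(sum((x - mu) ** 2 for x in column) / n_rows))
    alpha = noise_percent / 100.0 * max(means)
    purity = [sds[j] / (means[j] + alpha) for j in range(n_cols)]
    best = max(range(n_cols), key=lambda j: purity[j])
    return best, purity
```

The offset α keeps low-intensity, noise-dominated variables from being selected as "pure"; a larger noise percentage makes the selection more conservative.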

Another screen will also appear, with a summary of constraints:

Comment:
• You can apply non-negativity and unimodality in either of the two modes, and for selected
components. For example, changing unimodality in either mode from 0 0 0 to 1
1 1 will require all components to be unimodal, and changing it to 1 0 1 will require
components 1 and 3 to be unimodal, but not component 2.
• If all components are required to be non-negative, then fast non-negative least-squares
will be applied for the MCR-ALS fit; otherwise, negative values will be forced to zero.
• You can enter the indexes for interferents separated by blank spaces, 2 in our case,
because component No. 2 is not present in the calibration set (it will, of course, be present
in the test sample).
• You can enter a filename having calibration concentrations (number of rows = number of
calibration samples, number of columns = number of calibrated analytes), and also the
indexes of the calibrated analytes (in the same order as the columns of the calibration
concentrations). In this case the filename (provided with the data) is 'LCDAD_ycal.txt'.
Notice that the analyte order (3 1) is the same as the columns in the latter file. You can
select correspondence and/or correlation for calibration concentrations.
• You can apply selectivity constraints in each mode, by loading separate files containing a
matrix of 1s and 0s, with the size of the augmented or non-augmented matrix of profiles
(0s indicate where a component should not respond).
• You can apply closure constraints in each sub-profile in the augmented mode, by loading
a file containing as many columns as closure constraints, having in each column 1s and
0s, which will indicate which species participates in each closure relationship.
• After selecting and reviewing the constraints to be applied during ALS, click in ‘APPLY’.
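Several of the constraint settings listed above reduce to simple operations on the profiles. The Python sketch below (hypothetical helper names, not MVC2's code) reads a per-component flag string such as '1 0 1', forces negative values to zero (the fallback mentioned above when not all components are non-negative), checks whether a profile is unimodal, and applies one closure relationship by rescaling the participating species so that they add up to a given total at each point. The rescaling approach and the value of the total are assumptions.

```python
def parse_flags(field):
    """'1 0 1' -> [True, False, True], one flag per component."""
    return [v == "1" for v in field.split()]


def force_nonnegative(profile):
    """Replace negative values by zero."""
    return [max(0.0, x) for x in profile]


def is_unimodal(profile):
    """True if the profile rises (weakly) to its maximum and then falls (weakly)."""
    peak = max(range(len(profile)), key=lambda i: profile[i])
    rising = all(profile[i] <= profile[i + 1] for i in range(peak))
    falling = all(profile[i] >= profile[i + 1] for i in range(peak, len(profile) - 1))
    return rising and falling


def apply_closure(profiles, members, total=1.0):
    """Rescale, row by row, the species flagged in members so they sum to total.

    profiles is a list of rows (one row per point of a sub-profile, one column
    per species); members is one column of the closure file (1s and 0s).
    """
    closed = []
    for row in profiles:
        current = sum(x for x, m in zip(row, members) if m)
        factor = total / current if current else 1.0
        closed.append([x * factor if m else x for x, m in zip(row, members)])
    return closed
```

Species not flagged in the closure column are left untouched, mirroring the role of the 0s in the closure file.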

Then click in 'GO'. This report will appear:

MVC2
Second-order multivariate calibration toolbox
January 2013
For assistance read the document 'mvc2_gui_manual.pdf'
and Chemom. Intell. Lab. Syst. 96 (2009) 246

MCR-ALS
Multivariate curve resolution-
alternating least-squares
written by Alejandro Olivieri
Department of Analytical Chemistry
University of Rosario
Suipacha 531, Rosario (2000)
Argentina

Augmented matrix was saved to disk in file

\temp2\D_LCDAD_test1.txt

Sum-of-Squares    Iterations    Explained
of residuals                    variation

0.3694            5             99.8822
0.3655            10            99.8834
0.3644            15            99.8838
0.3640            20            99.8839
0.3637            25            99.8840

Convergence was achieved in 26 iterations

SSR = 0.36371
Residual fit = 0.0049242
Explained variance = 99.884
This information also appears in a report screen:

You will obtain the spectra and augmented elution profiles:

[Figure: resolved spectra in the non-augmented mode (255 to 300 nm) and profiles in the augmented mode for components 1, 2 and 3]

If you click on 'SAVE', the profiles and the augmented matrix data are saved to disk:

******* IMPORTANT ****************************************
Files named A_MCR.TXT, Baug_MCR.TXT,
and C_MCR.TXT have been saved to disk
containing the scores A and the profiles Baug and C

Augmented matrix was saved to disk in file
\temp2\D_LCDAD_test1.txt
First top matrix is the sample
followed by the calibration matrices.
**********************************************************
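The message above states the layout of the augmented matrix: the test sample on top, followed by the calibration matrices. Below is a minimal sketch of this column-wise augmentation and of splitting an augmented-mode profile back into per-sample sub-profiles. The helper names are hypothetical, and taking the area under each sub-profile as the sample score is the usual MCR-ALS convention, assumed here rather than confirmed for MVC2.

```python
def augment(matrices):
    """Stack sample matrices (each a list of rows, same number of columns) on top of
    each other: the test sample first, then the calibration samples."""
    return [list(row) for M in matrices for row in M]


def split_profile(profile, n_rows):
    """Split one augmented-mode profile into consecutive per-sample sub-profiles."""
    return [profile[i:i + n_rows] for i in range(0, len(profile), n_rows)]


def sub_profile_areas(profile, n_rows):
    """Area (sum over elution points) under each sample's sub-profile."""
    return [sum(seg) for seg in split_profile(profile, n_rows)]


test = [[0.1, 0.2], [0.3, 0.4]]   # 2 elution times x 2 wavelengths (illustrative)
cal1 = [[0.5, 0.6], [0.7, 0.8]]
D = augment([test, cal1])          # 4 x 2 augmented matrix, test sample on top
print(len(D), sub_profile_areas([0.0, 1.0, 2.0, 3.0], n_rows=2))
```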
A plot of the profiles is also obtained sample by sample, starting with sample No. 1, which
is a test sample:

[Figure: 'Augmented mode: selected sample', intensity of components 1, 2 and 3, x axis 5 to 30.]

Entering the component number and the calibration file with the concentrations of the selected
analyte, and clicking on 'GO', leads to the pseudo-univariate plot:

[Figure: top panel 'Calibration line: component 3, 3 components', score vs. nominal concentration (0 to 1) for calibration and test samples, fit: s = 0.27366, r2 = 0.99936; bottom panel 'Calibration errors', error vs. sample No. (1 to 9).]
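The pseudo-univariate plot is an ordinary least-squares line of the component score against the nominal calibration concentration, which is then inverted to predict the test sample. A minimal sketch follows, assuming that 's' in the plot legend is the standard deviation of the fit residuals with n - 2 degrees of freedom; the numbers below are illustrative, not the example data set.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with residual SD (n - 2 d.o.f.) and r2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, math.sqrt(ss_res / (n - 2)), 1.0 - ss_res / ss_tot


def predict_concentration(score, slope, intercept):
    """Invert the calibration line for a test-sample score."""
    return (score - intercept) / slope


nominal = [0.0, 0.25, 0.5, 0.75, 1.0]    # illustrative calibration concentrations
scores = [0.1, 6.3, 12.4, 18.8, 24.9]    # illustrative component scores
slope, intercept, s, r2 = fit_line(nominal, scores)
print(f"s={s:.4g} r2={r2:.5f}", predict_concentration(8.3, slope, intercept))
```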

The MATLAB Command Window then shows the analytical results, including the figures of merit:

Prediction results

Predicted concentration:
Sample 1: 0.3315

Figures of merit

Sensitivity: 0.69
Selectivity: 0.58

Based on pseudo-univariate graph
Analytical sensitivity: 83
LOD: 0.047
LOQ: 0.14
SD of concentration:
Sample 1: 0.013

Based on theory
Analytical sensitivity: 1.4e+002
LOD: 0.036
LOQ: 0.11
SD of concentration:
Sample 1: 0.0088

You may click on the button 'AFS BANDS' to obtain the extreme AFS bands for the
analyte, together with an estimate of the relative rotational ambiguity error in the analyte
prediction:

The extreme profiles can be saved by clicking on 'SAVE'.

PARAFAC2
PARAFAC2 is a variant of PARAFAC which allows the component profiles to change
along one of the data modes. This permits the analysis of chromatographic-spectral data,
in which the elution profiles may shift or change shape from sample to sample.
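For reference, the standard PARAFAC2 model can be written as follows, using the mode names of this manual (A: samples, B: elution time, the changing mode, C: spectra). The orientation of each sample matrix X_i (elution times x spectral channels) is an assumption made here for clarity.

```latex
\mathbf{X}_i \approx \mathbf{B}_i \,\operatorname{diag}(\mathbf{a}_i)\, \mathbf{C}^{\mathsf{T}},
\qquad
\mathbf{B}_i^{\mathsf{T}}\mathbf{B}_i = \boldsymbol{\Phi} \quad \text{for all samples } i
```

Here a_i is the i-th row of A, B_i holds the elution profiles of sample i, and C holds the spectral profiles shared by all samples. The constant cross-product matrix Φ is the restriction that distinguishes PARAFAC2 from fitting each sample independently; with B_i = B for every i, the model reduces to PARAFAC.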

Set the working folder for LCDAD_data and the mvc2 screen as follows:

Comments
• The changing mode is the column mode, which is the elution time mode (as in MCR-ALS).
• Non-negativity constraints in the A and C modes are necessary because the PARAFAC2
solution is not unique. Notice that non-negativity cannot be applied to the B mode (the
changing elution time mode).
• Initialization is also important; here it has been set to the best of several trials.

After clicking on 'PREDICT', profiles are obtained that allow one to select the analyte
and go to the pseudo-univariate plot. Notice that the profiles are correct for the calibration
samples, but not for the test samples. This cannot be corrected with constraints, since
PARAFAC2 does not allow constraints in the changing mode.

You may inspect other samples, and even predict the analyte in the test samples, to check
whether the prediction is accurate.

[Figure: top panel 'Unique mode', intensity of components 1, 2 and 3, x axis 255 to 300; bottom panel 'Sample-dependent mode', intensity, x axis 50 to 300.]

BLLS/RBL
BLLS/RBL has been temporarily disabled, see U-PLS/RBL or N-PLS/RBL.

N-PLS/RBL and U-PLS/RBL
Calibration
We continue working with the simulated data contained in 'EEFM_data'. These data
contain two calibrated analytes and one or two interferences; hence U-PLS and N-PLS
alone are not ideal for analyzing them, because they do not achieve the second-order
advantage. Therefore, we expect U-PLS and N-PLS to behave well during calibration,
but to show poor predictive ability towards new test samples unless RBL is
implemented. RBL is not new,10 but it was recently rediscovered and applied to complex
systems.11
The first step in U-PLS is the selection of the number of components from calibration
data only, using cross-validation.

Return to the MENU and prepare for leave-one-out cross-validation in U-PLS/RBL:

Comments:
• The 'ESTIMATE COMPONENTS' option is available. These algorithms do not achieve the
second-order advantage,12,13 unless coupled to RBL.
• We have increased the number of components or latent variables (6 in this case) to a
value larger than the number of analytes (a good choice is about half the number of
calibration samples).
• CV works regardless of the number of interferents (0 in this case) and of the test samples.
It only uses the calibration data and the calibration latent variables (6 in this case).
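The leave-one-out loop behind the PRESS and SEP columns can be sketched generically. The printed table is consistent with SEP = (PRESS/I)^1/2 for I = 9 calibration samples; for example, (1.1473/9)^1/2 is about 0.35705. In the minimal Python sketch below, a straight-line model stands in for U-PLS with A latent variables; `fit_line_model` is a hypothetical helper, not part of MVC2.

```python
import math


def loo_press(X, y, fit):
    """Leave-one-out PRESS and SEP = sqrt(PRESS / n).
    `fit(X_train, y_train)` must return a function that predicts y for one row of X."""
    n = len(y)
    press = 0.0
    for i in range(n):
        X_train = [row for j, row in enumerate(X) if j != i]
        y_train = [v for j, v in enumerate(y) if j != i]
        predict = fit(X_train, y_train)
        press += (y[i] - predict(X[i])) ** 2
    return press, math.sqrt(press / n)


def fit_line_model(X, y):
    """Hypothetical stand-in for U-PLS: straight line on a single predictor."""
    x = [row[0] for row in X]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return lambda row: my + slope * (row[0] - mx)


X = [[0.0], [0.25], [0.5], [0.75], [1.0]]   # illustrative data
y = [0.02, 0.24, 0.51, 0.74, 1.01]
press, sep = loo_press(X, y, fit_line_model)
print(f"PRESS = {press:.4e}  SEP = {sep:.4e}")
```

In the real procedure, `fit` would build a U-PLS (or N-PLS) model with a given number of latent variables, and the loop would be repeated for each number of latent variables from 1 to 6.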

Click on 'ESTIMATE COMPONENTS' and the MATLAB Command Window shows:

U-PLS Leave-one-out cross validation

Sample left out = 1 of a total of 9 samples
Sample left out = 2 of a total of 9 samples
Sample left out = 3 of a total of 9 samples
Sample left out = 4 of a total of 9 samples
Sample left out = 5 of a total of 9 samples
Sample left out = 6 of a total of 9 samples
Sample left out = 7 of a total of 9 samples
Sample left out = 8 of a total of 9 samples
Sample left out = 9 of a total of 9 samples

Component PRESS SEP F p
1.0000e+000 1.1473e+000 3.5705e-001 1.1699e+003 9.9900e-001
2.0000e+000 9.8075e-004 1.0439e-002 1.0000e+000 4.9900e-001
3.0000e+000 1.0400e-003 1.0749e-002 0 0
4.0000e+000 1.0390e-003 1.0745e-002 0 0
5.0000e+000 1.0388e-003 1.0743e-002 0 0
6.0000e+000 1.0388e-003 1.0743e-002 0 0

Suggested number of latent variables: 2

CV outliers for A=2

Sample F Outlier
1.0000e+000 8.4138e-002 0
2.0000e+000 3.1763e+000 1.0000e+000
3.0000e+000 2.3433e-002 0
4.0000e+000 7.2660e-002 0
5.0000e+000 4.1490e-002 0
6.0000e+000 1.6887e-001 0
7.0000e+000 5.1659e-002 0
8.0000e+000 2.3131e-002 0
9.0000e+000 9.9043e-002 0

Comments:
• The first table gives the usual parameters for establishing the number of latent variables
according to Haaland's criterion14 (i.e., p < 0.75), suggesting that the optimum number of
U-PLS LVs is 2. The last column is the probability associated with van der Voet's
randomization test:15 p < 0.05 indicates that the PRESS with A LVs is significantly larger
than the minimum PRESS. This also suggests A = 2.
• The second table allows one to detect outliers. Sample No. 2 is flagged as an outlier;
however, its F value is not significantly larger than 1, and hence we decide to retain it.
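Haaland's criterion compares each PRESS with the minimum PRESS through F = PRESS(A)/PRESS(min) and retains the smallest A whose F-distribution probability is below 0.75. A minimal pure-Python sketch follows, assuming I degrees of freedom in both numerator and denominator (I = number of calibration samples), as in Haaland and Thomas;14 the F cumulative distribution is computed through the regularized incomplete beta function. The p column printed by MVC2 may come from a different computation, so the sketch's probabilities can differ slightly from the table (for F = 1 it gives 0.5 rather than 0.499).

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_cdf(f, d1, d2):
    """Cumulative F distribution via the regularized incomplete beta I_x(d1/2, d2/2)."""
    if f <= 0.0:
        return 0.0
    a, b = d1 / 2.0, d2 / 2.0
    x = d1 * f / (d1 * f + d2)
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def haaland(press, n_samples, p_limit=0.75):
    """Return (chosen number of LVs, [(F, p), ...]) following Haaland's F-test rule."""
    i_min = min(range(len(press)), key=lambda i: press[i])
    stats = [(pr / press[i_min], f_cdf(pr / press[i_min], n_samples, n_samples))
             for pr in press]
    for i in range(i_min + 1):
        if stats[i][1] < p_limit:
            return i + 1, stats
    return i_min + 1, stats


press = [1.1473, 9.8075e-4, 1.0400e-3, 1.0390e-3, 1.0388e-3, 1.0388e-3]  # PRESS column above
n_lv, stats = haaland(press, n_samples=9)
print(n_lv, [round(p, 3) for _, p in stats])
```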

This information is also available in a table:

The CV plots are as follows:

Prediction in test samples


Return to the MENU, change the number of LVs to 2 (the optimum from CV), and click
on 'PREDICT'. We will first predict the concentrations in a test sample using 0
interferents, which we know is not the case:

After clicking on 'PREDICT', the MATLAB Command Window shows:

Predicted concentration:

Sample No. 1: 1.2315

Explained calibration variance

A Var(X) Var(y)
1.0000 95.6014 71.9779
2.0000 99.8516 99.9861

Calibration fit residue: 0.086

Sample residues
Sample Residue
1.0000 2.2553

Figures of merit
Sample SEN Anal. SEN
1.0000 39.5994 17.5585

Sample SD LODmin LODmax LOQmin LOQmax
1.0000 0.0698 0.1879 0.2173 0.5638 0.6518

Comments:
• The predicted concentration, SD and fit residue are shown.
• The calibration residue, sensitivity, analytical sensitivity and LOD/LOQ (both as ranges)
are estimated. If you want another noise level, start again, inserting the correct value in
'Uncertainty in signals'. Notice that the calibration residue (0.086) is much smaller than
the sample residue (2.25), owing to the presence of unexpected components in the samples.
• Notice that the selectivity is not available for U-PLS.
• SDs are calculated with error propagation theory, which requires the standard errors in
the instrumental responses to be inserted under 'Uncertainty in signals' in the MENU. If this
field is left blank, the program estimates them automatically on the basis of regression
errors, but it is preferable to introduce them from experimental results. Please note that
uncertainty in predicted concentrations is a subject of current debate in the literature.16,17
• Compare the results for analyte 2.
• Repeat all the calculations (including CV) with N-PLS.
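The numbers in the report above are mutually consistent with simple relations, inferred here from the printed values rather than from the MVC2 source: the analytical sensitivity is γ = SEN/s, with s equal to the sample residue (39.5994/2.2553 is about 17.56), and the lower limits are LODmin = 3.3/γ and LOQmin = 3 × LODmin. The upper limits (LODmax, LOQmax) follow a more elaborate approach and are not reproduced by this minimal sketch; `lower_limits` is a hypothetical helper.

```python
def lower_limits(sen, s_res):
    """Return (analytical sensitivity, LODmin, LOQmin) from sensitivity and noise estimate.
    The relations are inferred from the printed MVC2 numbers (an assumption)."""
    gamma = sen / s_res       # analytical sensitivity, in concentration^-1
    lod_min = 3.3 / gamma
    loq_min = 3.0 * lod_min   # equivalently 9.9 / gamma
    return gamma, lod_min, loq_min


# Values printed above for test sample 1 (no interferents modelled)
gamma, lod, loq = lower_limits(39.5994, 2.2553)
print(f"gamma={gamma:.4g} LODmin={lod:.4f} LOQmin={loq:.4f}")
```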

Predictions using RBL


To predict in this system, which has 2 analytes and interferences, set the screen for
prediction using 2 LVs for calibration and, as a first trial, 1 interferent (we will then
increase this number).

Clicking on 'PREDICT' produces a plot of the profiles of the interferent only (the
calibration profiles are latent vectors with no physical meaning).

[Figure: top panel 'Mode B', intensity of interferent profile 1, x axis 325 to 360; bottom panel 'Mode C', intensity, x axis 265 to 290.]

Comments:
• Only the interference profile is retrieved, in both modes (top and bottom).
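RBL keeps the calibration loadings fixed and models the part of the test-sample signal that they cannot explain as a bilinear term, alternating between the sample scores and the interferent profiles. The minimal Python sketch below illustrates this for one interferent, assuming orthonormal unfolded calibration loadings P and using plain alternating least squares with a power-iteration rank-1 step. MVC2 itself improves the procedure with PSO,18 which is not reproduced here; all function names are hypothetical, and the residues are plain residual norms without MVC2's normalization.

```python
import math


def unfold(M):
    """Row-major vectorization of a J x K matrix given as a list of rows."""
    return [v for row in M for v in row]


def fold(v, J, K):
    """Inverse of unfold: rebuild the J x K matrix."""
    return [v[j * K:(j + 1) * K] for j in range(J)]


def rank1(R, iters=200):
    """Dominant singular triplet of R by power iteration; returns (s*b*c^T, b, c)."""
    J, K = len(R), len(R[0])
    b, c = [0.0] * J, [1.0 / math.sqrt(K)] * K
    for _ in range(iters):
        b = [sum(R[j][k] * c[k] for k in range(K)) for j in range(J)]
        nb = math.sqrt(sum(v * v for v in b)) or 1.0   # guard against a zero residual
        b = [v / nb for v in b]
        c = [sum(R[j][k] * b[j] for j in range(J)) for k in range(K)]
        nc = math.sqrt(sum(v * v for v in c)) or 1.0
        c = [v / nc for v in c]
    s = sum(b[j] * R[j][k] * c[k] for j in range(J) for k in range(K))
    return [[s * b[j] * c[k] for k in range(K)] for j in range(J)], b, c


def rbl_one_interferent(P, X, n_iter=100):
    """P: orthonormal unfolded calibration loadings (each of length J*K);
    X: J x K test-sample matrix. Returns (scores, b, c, residue_before, residue_after)."""
    J, K = len(X), len(X[0])
    x = unfold(X)

    def scores(v):  # least-squares scores, valid because P is orthonormal
        return [sum(p[i] * v[i] for i in range(J * K)) for p in P]

    def resid(v, t):  # v - P t
        return [v[i] - sum(p[i] * ti for p, ti in zip(P, t)) for i in range(J * K)]

    t = scores(x)
    before = math.sqrt(sum(r * r for r in resid(x, t)))
    E, b, c = [[0.0] * K for _ in range(J)], [0.0] * J, [0.0] * K
    for _ in range(n_iter):
        t = scores([xi - ei for xi, ei in zip(x, unfold(E))])  # scores given the interferent
        E, b, c = rank1(fold(resid(x, t), J, K))              # interferent given the scores
    final = resid([xi - ei for xi, ei in zip(x, unfold(E))], t)
    return t, b, c, before, math.sqrt(sum(r * r for r in final))


# Toy example: one calibrated component plus one unexpected (rank-one) interferent
P = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
X = [[2.0, 0.0, 0.0],
     [0.0, 1.0, 1.0],
     [0.0, 1.0, 1.0]]
t, b, c, before, after = rbl_one_interferent(P, X)
print(f"score={t[0]:.4f} residue before RBL={before:.4f} after RBL={after:.2e}")
```

The returned vectors b and c play the role of the interferent profiles in modes B and C.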

The MATLAB Command Window shows the following:

Predicted concentration:

Sample No. 1: 0.31991

Explained calibration variance

A Var(X) Var(y)
1.0000 95.6014 71.9779
2.0000 99.8516 99.9861

Calibration fit residue: 0.086

Sample residues
Sample Before RBL After RBL
1.0000 2.2553 0.0973

Figures of merit
Sample SEN Anal. SEN
1.0000 18.2522 187.4973

Sample SD LODmin LODmax LOQmin LOQmax
1.0000 0.0057 0.0176 0.0244 0.0528 0.0731

Comments
• Figures of merit are given (except the selectivity, since this is not available for U-PLS).
PSO (particle swarm optimization) is employed for improving the RBL procedure.18
• You may notice that for more than one interferent, the recovered profiles are not real
spectra, but linear combinations given by singular value decomposition.
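The reason is that a bilinear term with two or more components is defined only up to an invertible transformation T: the profiles B T and C (T^-1)^T reproduce exactly the same signal as B and C. The short Python check below demonstrates this with arbitrary illustrative numbers (not taken from the example data).

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse_2x2(T):
    (a, b), (c, d) = T
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


B = [[1.0, 0.0], [0.6, 1.0], [0.1, 0.5]]   # illustrative mode-B profiles (columns)
C = [[1.0, 0.2], [0.4, 1.0]]               # illustrative mode-C profiles (columns)
T = [[1.0, 0.7], [-0.4, 1.2]]              # any invertible 2 x 2 matrix

B_rot = matmul(B, T)
C_rot = matmul(C, transpose(inverse_2x2(T)))
E = matmul(B, transpose(C))                # interferent signal from the "true" profiles
E_rot = matmul(B_rot, transpose(C_rot))    # same signal from the mixed profiles
print(max(abs(u - v) for ru, rv in zip(E, E_rot) for u, v in zip(ru, rv)))
```

Since both sets of profiles fit the residual equally well, the decomposition alone cannot tell which one corresponds to the real spectra.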

Try increasing the number of interferents, taking note of the interferent profiles and of
the residues after RBL. You should obtain a table similar to the following:

Number of       Residues      Profiles
interferents    after RBL
1               0.097         Spectrum-like
2               0.096         A spectrum-like and a noisy profile,
                              or a profile resembling an analyte

The above table shows that the correct number of interferences is 1, because the
residues stabilize and no noisy profiles are obtained. If you have an idea of the
underlying instrumental noise, then the residues after RBL should be similar to the
noise level.
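The decision rule described in this paragraph can be written down explicitly. A minimal sketch follows, assuming a hypothetical 5% relative-change threshold for "the residues stabilize" and an optional noise level; residues[0] is the residue before RBL, and the helper name is hypothetical.

```python
def choose_n_interferents(residues, rel_tol=0.05, noise=None):
    """residues[n]: sample residue with n interferents (residues[0] = before RBL).
    Returns the smallest n at which the residue reaches the noise level (if given)
    or stops decreasing by more than rel_tol relative to the next value."""
    for n, r in enumerate(residues):
        if noise is not None and r <= noise:
            return n
        if n + 1 < len(residues) and r - residues[n + 1] < rel_tol * r:
            return n
    return len(residues) - 1


residues = [2.2553, 0.097, 0.096]   # before RBL, then 1 and 2 interferents (table above)
print(choose_n_interferents(residues), choose_n_interferents(residues, noise=0.1))
```

The residue criterion should still be combined with a visual check of the recovered profiles, since a noisy or analyte-like profile also signals too many interferents.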

You can also analyze the example LCDAD_data, with results similar to those of the TLD
models. In this case, U-PLS can cope with the small random changes in the calibration
samples and in test samples 1 to 5, but cannot deal with the large changes in samples 6 to 10.

U-PCA/RBL
U-PCA/RBL has been recently developed to obtain the second-order advantage from
non-linear second-order signals. Set the screen as follows to study the set 'EEFM_data':

U-PCA/RBL is flexible enough to be applied to non-linear systems. If your system is
non-linear, use the following screen:

We have set the number of calibration components to 6 in order to explore the PC
analysis of the calibration matrix from 1 to 6 components. You can generate a report on
the PCA using the button 'ESTIMATE COMPONENTS'. The following report appears on
the screen:

Unfolded PC analysis of calibration data

PC %Expl. Var. Sfit
1.0000e+00 9.6831e+01 4.2196e-01
2.0000e+00 3.0208e+00 9.7654e-02
3.0000e+00 2.4082e-02 9.6582e-02
4.0000e+00 2.3013e-02 9.5547e-02
5.0000e+00 2.2132e-02 9.4476e-02
6.0000e+00 2.0966e-02 9.3578e-02

and also a figure with the corresponding plots:

[Figure: 'PC Analysis of calibration data'; top panel: explained variance (log scale, 10^-2 to 10^2) vs. component (1 to 6); bottom panel: residual fit (0 to 0.5) vs. component (1 to 6).]

This information allows one to choose 2 as the correct number of calibration
components, because 1) the explained variance is small after the second component, and
2) the residual fit with 2 components (0.098) is comparable to the noise level of 0.1 units.
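The two criteria can be sketched as follows. The explained-variance helper assumes the percentages are proportional to the squared singular values of the unfolded calibration matrix, and `choose_n_components` picks the first number of PCs whose residual fit reaches the stated noise level of 0.1; both helpers are hypothetical and illustrate the reasoning, not the MVC2 code.

```python
def explained_variance(singular_values):
    """Percentage of the total sum of squares captured by each PC
    (all singular values of the unfolded matrix must be supplied)."""
    ss = [s * s for s in singular_values]
    total = sum(ss)
    return [100.0 * v / total for v in ss]


def choose_n_components(sfit, noise):
    """Smallest number of PCs whose residual fit is at or below the noise level."""
    for n, s in enumerate(sfit, start=1):
        if s <= noise:
            return n
    return len(sfit)


# Sfit column of the PC table above, for 1 to 6 components
sfit = [4.2196e-01, 9.7654e-02, 9.6582e-02, 9.5547e-02, 9.4476e-02, 9.3578e-02]
print(choose_n_components(sfit, noise=0.1))
```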

Set 2 as the number of components and use the button 'PROCESS CALIBRATION' to
obtain the scores of the calibration samples, which are stored for ANN training:

U-PCA of calibration data

Calibration fit residue: 0.098

********** IMPORTANT ***********************************************
A file named TCAL.TXT has been saved to disk
containing the calibration scores Tcal
in the folder \temp2
********************************************************************

Now set to 1 the number of interferents in the main screen, in order to study a test
sample having unexpected components:

Using the button 'PROCESS TEST', you get the plot of interferent profiles and a report
on the RBL procedure (test scores are also saved to disk):

Calibration fit residue: 0.098

Sample fit residue:
Sample Before RBL After RBL
1.0000 2.2553 0.0973

********** IMPORTANT ***********************************************
A file named TU_EEFM_test1.txt has been saved to disk
containing the sample scores of all test samples in matrix form
(samples x factors) in the folder \temp2
**********************************************************************

[Figure: top panel 'Mode B', intensity of interferent profile 1, x axis 325 to 360; bottom panel 'Mode C', intensity, x axis 265 to 290.]

Comments
• The RBL residue decreases significantly with 1 RBL component, and reaches the level of
the noise. This indicates that 1 is a correct choice for the number of interferents.
• The scores of the PCA analysis are conveniently saved in ASCII files.

Once the numbers of components are known, repeat the process and save the calibration
and test sample scores.

Then employ a separate non-linear methodology, train it with the calibration scores, and
use the test scores for prediction. You can use a typical perceptron neural network.
For alternative non-linear methods see, for example, radial basis function (RBF) programs19
or support vector machine (SVM) programs,20 which are easily available on the internet.
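As one illustration of such a separate non-linear step, the sketch below loads a whitespace-separated ASCII score matrix (the format assumed here for TCAL.TXT and the test-score file) and fits a Gaussian radial basis function model with one centre per calibration sample, i.e., kernel ridge regression. It is a minimal pure-Python stand-in, not a replacement for the perceptron, RBF or SVM programs mentioned above; the width and ridge values are arbitrary assumptions that should be tuned, for example by cross-validation.

```python
import math


def load_ascii_matrix(lines):
    """Parse whitespace-separated numbers, one matrix row per non-empty line."""
    return [[float(v) for v in line.split()] for line in lines if line.strip()]


def gaussian(u, v, width):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * width ** 2))


def solve(A, rhs):
    """Solve A w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_rbf(T_cal, y_cal, width=1.0, ridge=1e-6):
    """Gaussian RBF model with one centre per calibration sample (kernel ridge regression)."""
    n = len(T_cal)
    K = [[gaussian(T_cal[i], T_cal[j], width) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    w = solve(K, y_cal)
    return lambda t: sum(wi * gaussian(ti, t, width) for wi, ti in zip(w, T_cal))


# In practice: with open(r"\temp2\TCAL.TXT") as f: T_cal = load_ascii_matrix(f)
T_cal = load_ascii_matrix(["0.0 0.1", "1.0 0.3", "2.0 0.2", "3.0 0.4"])  # illustrative scores
y_cal = [0.0, 1.0, 4.0, 9.0]                                             # illustrative concentrations
model = fit_rbf(T_cal, y_cal)
print(round(model([2.0, 0.2]), 3))
```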

References

1 Bro, R. PARAFAC. Tutorial and applications. Chemom. Intell. Lab. Syst. 38, 149
(1997).
2 Wu, HL, Shibukawa, M, Oguma, K. An alternating trilinear decomposition algorithm
with application to calibration of HPLC-DAD for simultaneous determination of
overlapped chlorinated aromatic hydrocarbons. J. Chemometrics 12, 1 (1998).
3 Xia, AL, Wu, HL, Fang, DM, Ding, YJ, Hu, LQ, Yu, RQ. Alternating penalty
trilinear decomposition algorithm for second-order calibration with application to
interference-free analysis of excitation-emission matrix fluorescence data. J.
Chemometrics 19, 65 (2005).
4 Chen, Z-P, Wu, H-L, Jiang, J-H, Li, Y, Yu, R-Q. A novel trilinear decomposition
algorithm for second-order linear calibration. Chemom. Intell. Lab. Syst. 52, 75
(2000).
5 Bro, R, Kiers, HAL. A new efficient method for determining the number of
components in PARAFAC models. J. Chemometrics 17, 274 (2003).
6 Olivieri, AC, Faber, NM. Standard error of prediction in parallel factor (PARAFAC)
analysis of three-way data. Chemom. Intell. Lab. Syst. 70, 75 (2004).
7 Olivieri, AC, Computing sensitivity and selectivity in parallel factor analysis and
related multi-way techniques: the need for further developments in net analyte signal
theory, Anal. Chem. 77, 4936 (2005).
8 de Juan, A, Casassas, E, Tauler, R. In: Myers, RA (Ed.), Encyclopedia of
Analytical Chemistry, Vol. 11, Wiley, Chichester, 2002, pp. 9800-9837.
9 http://www.ub.es/gesq/mcr/mcr.htm
10 Öhman, J, Geladi, P, Wold, S. Residual bilinearization. Part I. Theory and algorithms.
J. Chemometrics 4, 79 (1990).
11 Olivieri, AC. On a versatile second-order multivariate calibration method based on
partial least-squares and residual bilinearization. Second-order advantage and
precision properties. J. Chemometrics 19, 253 (2005).
12 Wold, S, Geladi, P, Esbensen, K, Øhman, J. Multiway principal components and PLS
analysis. J. Chemometrics 1, 41 (1987).
13 Bro, R. Multiway calibration. Multilinear PLS. J. Chemometrics 10, 47 (1996).
14 Haaland DM, Thomas EV. Partial Least-Squares Methods for Spectral Analyses. 1.
Relation to Other Quantitative Calibration Methods and the Extraction of Qualitative
Information. Anal. Chem. 60, 1193 (1988).
15 van der Voet, H, Chemom. Intell. Lab. Syst. 25, 313 (1994).
16 Faber, K, Lorber, A, Kowalski, BR, Analytical figures of merit for tensorial
calibration. J. Chemometrics 11, 419 (1997).
17 Faber, NM, Bro, R. Standard error of prediction for multiway PLS. 1. Background and
a simulation study. Chemom. Intell. Lab. Syst. 61, 133 (2002).

18 Bortolato, SA, Arancibia, JA, Escandar, GM, Olivieri, AC. Improvement of residual
bilinearization by particle swarm optimization for achieving the second-order
advantage with unfolded partial least-squares. J. Chemometrics 21, 557 (2007).
19 http://www.anc.ed.ac.uk/rbf/rbf.html.
20 http://www.esat.kuleuven.ac.be/sista/lssvmlab/.

Final remarks
Efforts have been made to avoid conflicts. However, no program is 'bullet-proof'. If any
problem occurs, please contact:

Prof. Alejandro C. Olivieri
Instituto de Química Rosario
Departamento de Quimica Analitica
Facultad de Ciencias Bioquimicas y Farmaceuticas
Universidad Nacional de Rosario
Suipacha 531 Rosario (S2002LRK) Argentina
Tel./Fax: 54-341-4372704
E-mail: olivieri@iquir-conicet.gov.ar or aolivier@fbioyf.unr.edu.ar

Acknowledgments
Parts of the MVC2 code were written by Hai-Long Wu and Ru-Qin Yu, State Key
Laboratory of Chemo/Biosensing & Chemometrics, College of Chemistry and Chemical
Engineering, Hunan University, Changsha 410082, China, by Roma Tauler, IDAEA,
Jordi Girona 18, Barcelona 08034, Spain, and Anna de Juan, Chemometrics and
Solution Equilibria Group, University of Barcelona, Department of Analytical
Chemistry, Diagonal 647, Barcelona 08028, Spain. The N-way toolbox has been
generously made public by Rasmus Bro and Claus Andersson, Copenhagen University,
DK-1958 Frederiksberg, Denmark. This has been properly acknowledged in the
corresponding program codes.
