Instructions for ParLeS


Viscarra Rossel, R.A. 2007. ParLeS: Software for chemometric analysis of spectroscopic data.

Chemometrics and Intelligent Laboratory Systems (in-press) doi: 10.1016/j.chemolab.2007.06.006

Before you can run ParLeS you will need to format your data correctly.

Figure 1 (layout of the data file):

s      y          X
S1     Object1    OD11  OD12  . .  OD1k
.      .          .     .          .
Sn     Objectn    ODn1  . . .      ODnk

s: sample label
y: response variable
X: spectroscopic predictor variables (reflectance measured at each of the k wavelengths)

Each row represents the spectrum of a sample. Each column of X represents the reflectance for all samples at a particular wavelength.

y (bold lower case) represents a vector (array of scalars)

z (lower case) represents a scalar (a single number)

From Figure 1:

a. For the Calibration data:

First row must contain header information, i.e. labels that include those for your samples (S1,

S2, etc), your response variables (e.g. carbon, pH, etc) and labels for your predictor variables

(e.g. the wavelengths/wavenumbers).

First column should contain sample labels

The next column(s) should contain response variables (i.e. the y-variables). Note that ParLeS

accepts more than one response variable (see below).

The columns after that should contain the predictor variables (i.e. the X-data e.g. NIR spectra)

Example of format for calibration file, containing labels, a single response variable (OC) and NIR

spectra (700-2500nm):

S1    2.56    0.35    0.37    0.67
S2    1.35    0.32    0.33    0.62
...
etc.

Note: you can have more than one response variable in your files. You will be asked to select the y-variable you want to model or test in the appropriate sections of the software.

If you have more than one response variable then place them in the second, third, etc. columns, after the sample labels and before the predictor variables (i.e. before the X-data).

Example of format for calibration file, containing labels, three response variables (OC, pH and N) and

NIR spectra (700-2500nm):

S1    2.56    6.5    1.3    0.35    0.37    0.67
S2    1.35    7.3    1.8    0.32    0.33    0.62
...
etc.

b. For the prediction data:

The format is as for the calibration data, but a prediction file requires a column of zeroes in place of the y-variable, as these are the unknowns that you want to predict using your model.

S1    0    0.35    0.37    0.67
S2    0    0.32    0.33    0.62
...
etc.

c. For the independent test data:

If you want to test your models with independent test data then your file format will be as in a. above, i.e. including the response variable data to be used to test the models. Remember to also include headers as in a. above.

Prepare your data files for import into ParLeS and save them as

tab delimited (ASCII) text files.
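The layout described above can be sketched in a few lines of Python. This is only an illustration of the file format, not part of ParLeS: the function name `format_parles_table`, the header label "Label" and the three wavelength values are hypothetical.

```python
def format_parles_table(header, labels, y_columns, spectra):
    """Return tab-delimited text: header row, then label, y value(s), X values."""
    lines = ["\t".join(str(h) for h in header)]
    for label, y_values, spectrum in zip(labels, y_columns, spectra):
        row = [label] + [str(v) for v in y_values] + [str(v) for v in spectrum]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical header: sample label, one response (OC), three wavelengths in nm.
header = ["Label", "OC", 700, 1600, 2500]
labels = ["S1", "S2"]
spectra = [[0.35, 0.37, 0.67], [0.32, 0.33, 0.62]]

# Calibration file: measured response values in the y column.
calibration_text = format_parles_table(header, labels, [[2.56], [1.35]], spectra)

# Prediction file: a column of zeroes replaces the unknown y values.
prediction_text = format_parles_table(header, labels, [[0], [0]], spectra)

print(calibration_text)
```

Writing either string to a file with a .txt extension gives a tab delimited ASCII text file of the kind the instructions ask for.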

2. Importing data into ParLeS

In this tab, you may choose to:

(i) select a file (with the above formatting) for modelling, or,

(ii) merge multiple spectroscopic files in x,y format (e.g. where x is frequency and y is reflectance) into a single file, by checking the box labelled 'Check to join files from a directory'.

- Use the 'Get file for modelling' browse folder button to select the path and your data file.

- Press the IMPORT DATA FOR MODELLING button and you will see the header information of your file.

- Using the numeric control 'Total number of y variables', select the total number of response variables

in your file. For example, if you have 3 response variables as in the example above then you will write

3 in this numeric control. If you only have 1 response variable then write 1.

- Using the numeric control 'Select y variable for modelling', select the response variable you want to

model. Remember that ParLeS uses the PLSR1 algorithm, i.e. models a single y-variable at a time.

Following from the above example, if you want to model pH then you would write 2 in this control,

if you want to model N then you write 3.

- Press the IMPORT DATA FOR MODELLING button and if your data file is correctly formatted

and you have correctly identified the total number of y-variables and the y-variable you want to model

then you should see a sample of your data in the windows labelled 'y variables', 'Labels', 'Selected y', 'X variables' and 'Spectral range'. You will also see a histogram and descriptive statistics of the y-data and a sample of your spectra on the graph.

- If you cannot see the correct data in the windows or you cannot see the spectra then before you

proceed you will need to check your data file and if necessary remake the file by carefully following

the above instructions.

If the file format is incorrect or you have incorrectly identified the total number of y variables in your

data file then you will be able to see this in the sample data windows and more than likely your

spectra will not plot correctly.

In Figure 2, the data file contains 152 samples (size y: rows) and 933 predictor variables (size X: columns). The data file has 25 response (y) variables, the 'Labels' look correct (i.e. GYD), the first y variable was selected for modelling, the 'Spectral range' is in wavenumbers starting from 3992.5 cm^-1 and the spectra are soil MIR spectra.

To join files from a directory, the files to be merged must be:

- in a single directory, which you specify in the 'Directory with files to join' control (the best thing to do is to copy the file path rather than use the browse button), and

- of the same type, i.e. with the same file extension, which you specify in the 'File extension' control. For example, text files will have a .txt extension. Note that the files should be in ASCII format.

You may then run the program using the IMPORT DATA FOR MODELLING button. Once the

software has run, a sample of the merged spectra will be displayed. This may take some time

depending on the number of files that you have. If the sample spectra do not appear to plot properly,

then an error has occurred and you should check that you have the correct directory or that you have

the correct file extension.

The merged file may be saved by checking the SAVE MERGED FILE control or it may be further

analysed in ParLeS (see below).
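The merging step can be pictured with the short sketch below. It is not the ParLeS implementation: the function `merge_xy_files` is hypothetical, and it assumes that each file holds two whitespace-separated columns (x then y) with no header and that all files share the same x axis.

```python
import tempfile
from pathlib import Path


def merge_xy_files(directory, extension=".txt"):
    """Merge two-column x,y text files into one x axis plus one row per file."""
    x_axis, rows = None, []
    for path in sorted(Path(directory).glob("*" + extension)):
        pairs = [line.split() for line in path.read_text().splitlines() if line.strip()]
        x = [float(p[0]) for p in pairs]
        y = [float(p[1]) for p in pairs]
        if x_axis is None:
            x_axis = x
        elif x != x_axis:
            raise ValueError(f"{path.name} has a different x axis")
        rows.append((path.stem, y))        # file name without extension as label
    return x_axis, rows


# Demonstration with two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "S1.txt").write_text("700 0.35\n2500 0.67\n")
    Path(tmp, "S2.txt").write_text("700 0.32\n2500 0.62\n")
    x_axis, merged = merge_xy_files(tmp, ".txt")

print(x_axis, merged)
```

A wrong directory or extension leaves `merged` empty, which corresponds to the case above where no sample spectra are plotted.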

In the import data for prediction tab you may select a file (i) for the prediction of unknowns or (ii) to test your models with independent test data. Refer to the file format instructions in 1 above.

Figure 3. The import data for prediction tab: (a) for prediction of unknowns; (b) for independent testing of models. Note: in the latest version of the software you will also see a histogram and descriptive statistics of the y-data.

For the prediction of unknowns the file requires a column of zeroes replacing the y variable. In this case your 'Total number of y variables' will be 1 and 'Select y variable for prediction' will also be 1. See Figure 3a.

- Using the numeric control 'Total number of y variables', select the total number of response variables

in your test file.

- Using the numeric control 'Select y variable for modelling', select the response variable for which

you want to test your model.

- Press the IMPORT DATA FOR MODELLING button and if your data file is correctly formatted

and you have correctly identified the total number of y-variables and the y-variable you want to test

then you should see a sample of your data in the windows labelled 'y variables' (this is 'All test y variables' in earlier versions of ParLeS), 'Labels', 'Selected y', 'X variables' and 'Spectral range'. The graph will show a sample of the spectra used for the predictions.

- If you cannot see the correct data in the windows or you cannot see the spectra then before you

proceed you will need to check your data file and if necessary remake the file by carefully following

the above instructions.

If the file format is incorrect or you have incorrectly identified the total number of y variables in your

data file then you will be able to see this in the sample data windows and more than likely your

spectra will not plot correctly.

In Figure 3b, the data file contains 76 test samples (size y: rows) and 933 predictor variables (size X: columns). The data file has 25 test response (y) variables, the 'Labels' look correct (i.e. GYD), the first y variable was selected for testing, the 'Spectral range' is in wavenumbers starting from 3992.5 cm^-1 and the spectra are soil MIR spectra.

3. Data transformations, preprocessing and pretreatments

The Data Manipulations tab (called Preprocessing in earlier versions of ParLeS) can be used to

transform, preprocess and pretreat your spectra.

[Figure 4, panels (a) to (d)]

From the drop-down menus select the desired combination of transformation, preprocessing and

pretreatment to apply. You can test any combination of methods as long as you understand what they

do and you carefully follow the instructions.

From Figure 4, using the dropdown menus you can perform the following transformations and

preprocessing:

o Data transformation: transform diffuse reflectance (R) data to Log(1/R) or Kubelka-Munk units, K/S = (1-R)^2/(2R). You may also transform from Log(1/R) to R.

o Light scatter and baseline corrections: correct data for light scattering effects, etc. using Multiplicative Signal Correction (MSC), Standard Normal Variate (SNV), SNV with quadratic detrending, Wavelet de-trending or SNV with wavelet detrending.

The wavelet de-trending level specifies the number of levels of the wavelet decomposition, which is approximately (1 - trend level) * log2(Ls), where Ls is the signal length. When the trend level is zero, the signal trend is equal to zero and the detrended signal is identical to the input signal. Wavelet de-trending may be thought of as a form of baseline correction.

o De-noising/Smoothing: de-noise data using a Median filter, the Savitzky-Golay filter or Wavelet de-noising. For the Median filter, select the rank to be used in the filtering. For the Savitzky-Golay filter, first select the number of data points to fit the curve and then the order of the polynomial you wish to fit. For the Wavelet de-noising, select the desired wavelet scale for de-noising. ParLeS uses a Daubechies wavelet with 4 vanishing moments.

o Differentiation: correct the data for baseline, particle size, etc. using first or second derivatives together with the desired sampling interval.
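Some of the manipulations listed above are simple enough to sketch directly. The Python below is a hedged illustration, not the ParLeS code: it assumes base-10 logarithms for Log(1/R), a sample (n - 1) standard deviation for SNV, and a plain gap difference for the first derivative. All function names are hypothetical.

```python
import math


def reflectance_to_log_inverse(spectrum):
    """Transform reflectance R to log10(1/R); base 10 is an assumption."""
    return [math.log10(1.0 / r) for r in spectrum]


def reflectance_to_kubelka_munk(spectrum):
    """Transform reflectance R to Kubelka-Munk units, K/S = (1 - R)^2 / (2R)."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in spectrum]


def standard_normal_variate(spectrum):
    """Centre one spectrum on its own mean and scale by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def first_derivative(spectrum, interval=1):
    """Gap difference over `interval` points (not a Savitzky-Golay derivative)."""
    return [spectrum[i + interval] - spectrum[i] for i in range(len(spectrum) - interval)]


reflectance = [0.1, 0.5, 1.0]
print(reflectance_to_log_inverse(reflectance))
print(reflectance_to_kubelka_munk(reflectance))
print(standard_normal_variate([1.0, 2.0, 3.0]))
print(first_derivative([1.0, 4.0, 9.0, 16.0]))
```

Note that SNV works on each spectrum (row) separately, whereas the pretreatments described next work on each wavelength (column).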

The software also offers a number of methods for pretreating the predictor data.

From Figure 5, using the drop-down menu you can select which data pretreatment (or enhancement)

to use before you move onto the multivariate modelling. The choices include:

- Mean centre,

- Variance scale,

- Mean centre & variance scale

NOTE: it is common practice, although not imperative, to mean centre your data before PCA and PLSR.
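The three pretreatment choices can be sketched as follows. This is an illustrative assumption about what the options do (column-wise centring, and scaling each column to unit sample standard deviation), not a description of the ParLeS internals; the function names are hypothetical.

```python
import math


def column_stats(X):
    """Return the mean and sample standard deviation of every column of X."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(zip(*X), means)
    ]
    return means, sds


def pretreat(X, centre=True, scale=False):
    """Mean centre and/or variance scale the columns (wavelengths) of X."""
    means, sds = column_stats(X)
    out = []
    for row in X:
        new_row = []
        for v, m, s in zip(row, means, sds):
            if centre:
                v = v - m
            if scale:
                v = v / s
            new_row.append(v)
        out.append(new_row)
    return out


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(pretreat(X, centre=True, scale=False))   # mean centre
print(pretreat(X, centre=False, scale=True))   # variance scale
print(pretreat(X, centre=True, scale=True))    # mean centre & variance scale
```

Variance scaling gives every wavelength the same weight, so it can inflate noisy, low-variance regions of a spectrum; this is one reason to inspect the pretreated spectra before modelling.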

Once the particular combination is selected, press the RUN SELECTION button. The first graph will

show your raw data and the graph on the bottom part of the ParLeS window will show you the

combined transformed, preprocessed and pre-treated spectra. You may investigate the effect of each

algorithm separately by selecting it and then pressing the RUN SELECTION button.

For example if you have diffuse reflectance data you may choose to transform these to Log(1/R);

correct for light scattering effects using the MSC; de-noise your signal using the wavelet de-noising at

scale = 2; take the first derivative and mean centre your data before you perform PCA or PLSR.

You can save the manipulated data to a file using the SAVE MANIPULATED DATA control (called SAVE PREPROCESSED DATA in earlier versions of ParLeS). The saved file will be a tab delimited text file.

4. Principal Components Analysis (PCA)

ParLeS implements an iterative PCA algorithm based on the NIPALS algorithm described in Martens

& Naes (1989).

In the PCA tab, using the numeric control or slide bar you need to select the maximum number of

PCA components to calculate (Figure 6).
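For readers who want to see what an iterative NIPALS PCA does, here is a minimal pure-Python sketch. It follows the general NIPALS scheme and is not taken from ParLeS; the function name, the convergence tolerance and the starting vector are assumptions. It expects mean-centred data and a number of components no larger than the rank of the data.

```python
import math


def nipals_pca(X, n_components, tol=1e-12, max_iter=500):
    """Return scores, loadings and percent variance explained for mean-centred X."""
    E = [row[:] for row in X]                       # residual matrix
    total_ss = sum(v * v for row in E for v in row)
    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        t = list(max(zip(*E), key=lambda c: sum(v * v for v in c)))
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(ti * row[j] for ti, row in zip(t, E)) / tt for j in range(len(E[0]))]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]               # loadings scaled to unit length
            t_new = [sum(pj * xj for pj, xj in zip(p, row)) for row in E]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the component just extracted from the residuals.
        E = [[x - ti * pj for x, pj in zip(row, p)] for ti, row in zip(t, E)]
        scores.append(t)
        loadings.append(p)
        explained.append(100.0 * sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Three mean-centred samples lying on one line: one component explains everything.
X = [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]]
scores, loadings, explained = nipals_pca(X, n_components=1)
print(loadings[0], explained[0])
```

Because each component is extracted from the residuals of the previous ones, asking for fewer components simply stops the loop earlier; the first components are unchanged.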

The results from the PCA are displayed in a number of graphics that include:

- the loadings vs. wavelength/wavenumber plot. Using the numeric controls on this plot you may

select to view the loading for each principal component separately or, as in Figure 6, all loadings

simultaneously

- the scores vs. scores plot. Using the numeric controls on this plot you may select the scores for the

principal component that you want to plot.

- the loadings vs. loadings plot. Using the numeric controls on this plot you may select the loadings

for the principal component that you want to plot.

- the percent variation of the predictor data that is explained by each component

Note that in ParLeS version 3.1 you can interact with the scores vs. scores and loadings vs. loadings

plot. Glide your mouse over the data points and click on the point that you want to identify. The point

will change colour and its label will be briefly displayed on the graph.

The PCA scores and loadings can be saved to tab delimited text files by checking the SAVE PCA

SCORES & LOADINGS check box. Two separate dialogues will appear once you check to save: the

first will ask you to give a name for the scores file and the second will ask you to provide a name for

the loadings file.

5. Jackknife cross validation

The cross validation tab can be used to help determine the optimal number of PLSR factors to model.

The results are shown in a number of graphics showing appropriate assessment statistics.

In the PLSR Cross validation tab, using the numeric slide bar or control select the maximum number

of factors for the leave-one-out cross validation (Figure 7).

With large data sets it may be too computationally expensive to use leave-one-out, so you could for example use leave-ten-out. To do this, type the number of samples n to leave out. To help you decide, the total number of samples in your dataset is given in the numeric indicator 'No. Samples'.

To start the cross validation, press the RUN X-VAL button. The progress bar indicates how much of

the data has been cross validated.
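The difference between leave-one-out and leave-n-out can be sketched as a generator of index splits. How ParLeS groups the left-out samples is not stated here, so the consecutive blocks used below are an assumption, and the function name is hypothetical.

```python
def leave_n_out_splits(n_samples, n_out=1):
    """Yield (training indices, left-out indices) for leave-n-out cross validation."""
    indices = list(range(n_samples))
    for start in range(0, n_samples, n_out):
        left_out = indices[start:start + n_out]
        training = indices[:start] + indices[start + n_out:]
        yield training, left_out


splits = list(leave_n_out_splits(5, n_out=2))
for training, left_out in splits:
    print("fit on", training, "predict", left_out)
```

With 152 samples, leave-one-out fits 152 models for every number of factors, whereas leave-ten-out fits 16, which is where the saving in computation comes from.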

The results of the cross validation are displayed in the following graphics:

- the root mean squared error of cross validation (RMSE) vs. the number of factors

- R2 and Q2 statistics vs. the number of factors

- the Akaike Information Criterion (AIC) vs. the number of factors. Note that the AIC penalises additional factors and so favours parsimonious models.

- the observed vs. cross validation predictions for a selected number of factors, where you may select the cross validation predictions to plot using the numeric control 'Select X-Val model to plot'. The fitted line and equation are also given. For this cross validated model, various assessment statistics are given: R2, R2adjusted, RMSE, mean error (ME), the standard deviation of the error (SDE) and the RPD.
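Several of these assessment statistics can be computed from paired observed and predicted values, as in the sketch below. The definitions are common ones and are assumptions here, since the exact formulas used by ParLeS are not given in these instructions: errors are taken as observed minus predicted, R2 as the squared Pearson correlation, and RPD as the standard deviation of the observed values divided by the RMSE.

```python
import math


def assessment_statistics(observed, predicted):
    """Return RMSE, ME, SDE, R2 and RPD for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]   # observed minus predicted
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    sde = math.sqrt(sum((e - me) ** 2 for e in errors) / (n - 1))
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    r2 = s_op ** 2 / (ss_o * ss_p)              # squared Pearson correlation
    rpd = math.sqrt(ss_o / (n - 1)) / rmse      # SD of observed values / RMSE
    return {"RMSE": rmse, "ME": me, "SDE": sde, "R2": r2, "RPD": rpd}


stats = assessment_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(stats)
```

In this example every prediction is 0.5 too high, so the error is pure bias: the ME equals the RMSE in size, the SDE is zero and the correlation is perfect. Looking at ME and SDE together with RMSE helps separate bias from imprecision.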

The cross validation results can be saved by checking the SAVE X-VAL RESULTS check box. Two

separate dialogs will appear once you check to save: the first will ask you to give a name for the

assessment statistics file and the second will ask you to provide a name for the observed vs. cross

validation predictions for the selected number of factors.

Note: if you do not need to cross-validate, proceed to the PLSR Modelling tab.

6. Partial Least Squares Regression (PLSR)

The orthogonalised PLSR1 algorithm implemented in ParLeS is that described by Martens & Naes

(1989). In the PLSR Modelling tab you may select the optimal number of factors to model, using the

slide bar or numerical indicator (Figure 8).
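As a rough guide to what an orthogonalised PLS1 fit involves, here is a compact pure-Python sketch of the usual scheme (loading weights w, scores t, spectral loadings p, y loading q, followed by deflation). It is a textbook-style illustration under the assumption of mean-centred X and y, not the ParLeS source; function names are hypothetical. Predictions are made by repeating the deflation steps on the new spectrum rather than by forming the regression coefficients explicitly.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_factors):
    """Fit an orthogonalised PLS1 model; returns centring values and (w, p, q) per factor."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]   # centred X residuals
    f = [v - y_mean for v in y]                               # centred y residuals
    factors = []
    for _factor in range(n_factors):
        w = [dot(col, f) for col in zip(*E)]                  # loading weights, X'y
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                        # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]             # spectral loadings
        q = dot(f, t) / tt                                    # y loading
        E = [[x - ti * pj for x, pj in zip(row, p)] for ti, row in zip(t, E)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        factors.append((w, p, q))
    return x_mean, y_mean, factors


def predict_pls1(model, spectrum):
    """Predict y for one spectrum by repeating the deflation steps of the fit."""
    x_mean, y_mean, factors = model
    e = [v - m for v, m in zip(spectrum, x_mean)]
    y_hat = y_mean
    for w, p, q in factors:
        t = dot(e, w)
        e = [x - t * pj for x, pj in zip(e, p)]
        y_hat += q * t
    return y_hat


# Toy data in which y = x1 + 2 * x2 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
model = fit_pls1(X, y, n_factors=2)
prediction = predict_pls1(model, [3.0, 2.0])
print(prediction)
```

With real spectra the number of factors is far smaller than the number of wavelengths, which is why the cross validation step is used to choose it.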

Once the number of factors to model is selected, run the software using the RUN PLSR MODELLING button. Results from the PLSR modelling are shown in a number of graphs:

- Scores vs. scores plot

- Scores vs. y plot

- Regression coefficients (B) vs. wavelength/wavenumber plot

- Spectral loadings (P) and loading weights (W) vs. wavelength/wavenumber plot

- Variable importance for projection (VIP) vs. wavelength/wavenumber plot

- Sorted VIP and wavelength/wavenumber table

- the percent variation of each of the predictor and response data that is explained by each factor in the PLSR model

Note that in ParLeS version 3.1 you can interact with the scores vs. scores; scores vs. y plot;

regression coefficients vs. wavelength/wavenumber plot and the VIP vs. wavelength/wavenumber

plot. Glide your mouse over the data points and click on the point that you want to identify. The point

will change colour and its label will be briefly displayed on the graph.

The PLSR model (scores, regression coefficients (b), the intercept (b0), spectral loadings and loading weights) as well as the VIP results can be saved to tab delimited text files by checking the 'SAVE SCORES; b, b0, p, w; and VIP' check box. Three separate dialogues will appear once you check to save: the first will ask you to give a name for the PLSR scores file, the second will ask you to provide a name for the regression coefficients and the third for the VIP results.

7. Prediction

To make PLSR predictions, press RUN PREDICTIONS. The predictions use the model selected in the PLSR Modelling tab (see 6. above). The program will run and results and assessment statistics will be displayed (Figure 9).

The results from the PLSR predictions are displayed in a number of graphics and assessed using

various statistics:

- a sample of the spectra used for predictions

- the predicted values

- when using a test data set, the residuals (observed - predicted)

- when using a test data set, the observed vs. predicted and the fitted line, also showing its equation

- the following assessment statistics: R2, R2adjusted, RMSE and confidence intervals, mean error (ME), the standard deviation of the error (SDE) and the RPD

- a histogram of the predicted values and their descriptive statistics

The predictions can be saved to a file using the SAVE PREDICTIONS check-box.

8. Bootstrap aggregation-PLSR (bagging-PLSR)

To make the bagging-PLSR predictions first you need to select the number of bootstraps to use for

bagging (the default is 30 bootstraps) as well as the number of PLSR factors to use. Then press the

RUN BAGGING-PLSR button. The program will run and results and assessment statistics will be

displayed (Figure 10).
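The idea behind bagging can be sketched independently of PLSR. In the Python below, each bootstrap draws samples with replacement, a model is fitted to the draw, and the predictions are aggregated; the samples not drawn form the out-of-bag set. Everything here is an illustrative assumption: `fit_line` is a simple stand-in model (a least-squares line, not PLSR), and the normal-approximation interval is one common choice, since how ParLeS derives its confidence intervals is not stated in these instructions.

```python
import random
import statistics


def bagging_predict(fit, X, y, x_new, n_bootstraps=30, seed=0):
    """Aggregate predictions for x_new over models fitted to bootstrap samples."""
    rng = random.Random(seed)
    n = len(y)
    predictions, out_of_bag = [], []
    for _bootstrap in range(n_bootstraps):
        in_bag = [rng.randrange(n) for _draw in range(n)]     # sample with replacement
        out_of_bag.append(sorted(set(range(n)) - set(in_bag)))
        model = fit([X[i] for i in in_bag], [y[i] for i in in_bag])
        predictions.append(model(x_new))
    mean = statistics.mean(predictions)
    sd = statistics.stdev(predictions)
    # Normal-approximation 95% interval (an assumption, see the text above).
    return mean, (mean - 1.96 * sd, mean + 1.96 * sd), out_of_bag


def fit_line(x, y):
    """Stand-in model (NOT PLSR): least-squares line through one predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return lambda x_new: my + slope * (x_new - mx)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
mean, interval, out_of_bag = bagging_predict(fit_line, x, y, x_new=3.5)
print(mean, interval)
```

The out-of-bag samples of each bootstrap were not used to fit that bootstrap's model, so predicting them gives an evaluation that does not need a separate test file.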

The results from bagging-PLSR are displayed in a number of graphics and assessed using various

statistics:

- the observed vs. predicted from the bootstraps

- the out-of-bag statistics, which may also be used to evaluate the models

- a plot of the predicted values and their 95% confidence intervals

- the descriptive statistics of the predictions

- the observed vs. predicted and the fitted line, also showing its equation

- the following assessment statistics: R2, R2adjusted, RMSE and confidence intervals, mean error (ME), the standard deviation of the error (SDE) and the RPD

The bagging-PLSR predictions and confidence intervals can be saved to a file using the SAVE

BAGGED check-box.

Once finished you can exit ParLeS using the EXIT PROGRAM button.

9. Errors

If the file format is incorrect, the software will not run, or will run incorrectly.

Please refer to ParLeS license agreement.

You may not use the software for commercial purposes unless you have obtained permission, in writing, from Raphael VISCARRA ROSSEL (r.viscarra-rossel@usyd.edu.au or tel. +61 413 326 457).

If ParLeS is used in research, you agree to cite the following reference:

Viscarra Rossel, R.A. 2007. ParLeS: Software for chemometric analysis of spectroscopic data.

Chemometrics and Intelligent Laboratory Systems (in-press) doi: 10.1016/j.chemolab.2007.06.006

http://www.usyd.edu.au/su/agric/acpa/people/rvrossel/Publications.htm

I would appreciate comments and suggestions for further improvements to ParLeS. In essence, ParLeS is still under development.

11. Disclaimer

I have taken all care to ensure that ParLeS is operationally sound. However, it is supplied 'as is' and no

warranty is provided or implied. I assume no liability for damages, direct or consequential that may

result from its use.
