Genetic Algorithms to Correct for Instrumental Instabilities in Impurity

Estimation by Spectrochemical Analysis

October 9-11, 2002, Mumbai.

S.V.G. Ravindranath

Spectroscopy Division

svgr@apsara.barc.ernet.in

and

A.P. Tiwari

Reactor Control Division

aptiwari@apsara.barc.ernet.in

Bhabha Atomic Research Centre, Trombay, Mumbai – 400 085

SUMMARY

Genetic algorithms (GAs) are increasingly used to solve complex problems in science,

engineering, business and the social sciences. GAs are population-based parallel search

strategies built on the Darwinian principle of biological evolution. A GA starts with a

randomly generated initial population of solutions, called a generation. To produce the next

generation, the individual solutions are evaluated and selected according to their fitness.

The selected solutions are then transformed with genetically inspired operators such as

crossover and mutation. By repeating this cycle of evaluation, selection, crossover and

mutation, the GA will likely find a solution with a higher fitness value. This paper

describes the application of genetic algorithms to spectrochemical analysis, where

correcting for instrumental instability gives better estimates of impurities. The method

has been implemented in MATLAB.

1.0 INTRODUCTION

Spectrochemical methods using instruments such as inductively coupled plasma

atomic emission spectrometers (ICP-AES) determine trace-level concentrations of

impurities in a given sample. The accuracy of these determinations is limited by two

factors: spectral interference and instrumental instability. Spectral interference occurs

when neighbouring non-analyte lines overlap the analyte line. Chemometric methods such

as Kalman filters are available to tackle interference problems. Spectrometer instability

over time causes a spectral shift between the pure component and sample scans. Shifts

exceeding 0.1 picometre (pm) affect the accuracy of impurity determination. Simplex

methods are available to correct for instrumental instabilities, but they are sensitive to

initial guesses and become more complex as the number of parameters grows. This paper

describes the application of a GA to correct for instrumental instabilities. The approach

has been tested with data available in the literature. A brief introduction to GAs, the

problem of instrumental instability in the estimation of Cd in As, the implementation of

the GA and the results are given in the following sections.

2.0 GENETIC ALGORITHMS

Genetic algorithms, introduced by Holland, combine a randomly generated initial population

of solutions with biologically inspired operators such as selection, crossover and mutation [1].

A typical GA cycle consists of the following steps:

• Creation of population strings

• Evaluation of each string

• Selection of best strings

• Genetic manipulation to create new population of strings
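
The four steps above can be sketched as a generic loop. This is a minimal illustration in Python, not the MATLAB program described in this paper. The names `run_ga` and `one_max`, the toy fitness that counts 1-bits, and the simple "keep the top half, flip one bit" operators are all assumptions chosen only to keep the cycle visible.

```python
import random

def run_ga(fitness, n_bits, pop_size, generations, rng):
    """Generic GA cycle: create, evaluate, select, manipulate."""
    # 1. Creation of population strings (random bit lists).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2. Evaluation of each string (ranking by fitness).
        scored = sorted(population, key=fitness, reverse=True)
        # 3. Selection of best strings (stand-in: keep the top half).
        parents = scored[:pop_size // 2]
        # 4. Genetic manipulation: copy a parent and flip one random bit.
        children = []
        for _ in range(pop_size - len(parents)):
            child = list(rng.choice(parents))
            pos = rng.randrange(n_bits)
            child[pos] = 1 - child[pos]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

def one_max(bits):
    """Toy fitness (assumption): number of 1-bits in the string."""
    return sum(bits)

best = run_ga(one_max, n_bits=16, pop_size=20, generations=50,
              rng=random.Random(0))
print(one_max(best), best)
```

Passing the fitness function as an argument keeps the loop independent of the problem, so the same skeleton could in principle drive a spectral fitness function instead of the toy one.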

The population consists of a group of potential solutions called chromosomes. The initial

population is generated randomly. A chromosome is usually expressed as a string of

variables, each element of which is called a gene. A variable can be represented in

binary, real or other forms, and its range is usually specified by the problem. Bit-string

encoding is the classical approach, although several researchers now use other types of

representation as well.

The fitness function provides the mechanism for evaluating each chromosome as a

potential solution. To obtain the fitness values, every chromosome is decoded and the

fitness function is calculated for the decoded variables, subject to the constraints

imposed by the function.

The selection operator emulates nature’s policy of survival of the fittest. Based on the

fitness values, it selects the parents for the mating process. There are many ways to

achieve effective selection, such as ranking, tournament and roulette wheel selection, but

the essential principle is to give preference to fitter individuals.
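
As an illustration of fitness-proportional selection, the following Python sketch implements a roulette wheel with cumulative probabilities. It is not taken from the authors' MATLAB code; the function name `roulette_select` and the three-member example population are hypothetical, and the sketch assumes non-negative fitness values that are not all zero.

```python
import bisect
import itertools
import random

def roulette_select(population, fitnesses, rng):
    """Draw len(population) parents with probability proportional to fitness.

    Assumes all fitness values are non-negative and not all zero.
    """
    total = sum(fitnesses)
    # Cumulative probabilities q_1 <= q_2 <= ... <= q_n = 1.
    cumulative = list(itertools.accumulate(f / total for f in fitnesses))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    selected = []
    for _ in population:
        r = rng.random()  # spin the wheel: r in [0, 1)
        # First index whose cumulative probability exceeds r.
        selected.append(population[bisect.bisect_right(cumulative, r)])
    return selected

rng = random.Random(1)
pop = ["a", "b", "c"]
fit = [0.0, 1.0, 3.0]
picks = [p for _ in range(1000) for p in roulette_select(pop, fit, rng)]
print({p: picks.count(p) for p in pop})
```

A member with zero fitness occupies no slot on the wheel and is never drawn, while the member with three times the fitness is drawn roughly three times as often.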

Crossover and mutation operators produce a new population of individuals by

manipulating the genetic information, referred to as genes, possessed by members of the

current generation. The crossover operator combines subparts of two parent

chromosomes to produce offspring that contain subparts of both parents’ genetic

material. The length of the subparts is chosen randomly. After crossover, the mutation

operator alters a chromosome by changing the value of the bit at a randomly selected position.

Crossover and mutation operators are applied with probabilities Pc and Pm respectively,

and generally Pm < Pc.
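
The two operators can be sketched in Python as follows. One-point crossover and independent per-bit flipping are common textbook choices and are assumed here; the function names `one_point_crossover` and `mutate` are illustrative and do not come from the authors' program.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    cut = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the string
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def mutate(chromosome, p_m, rng):
    """Flip each bit independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in chromosome]

rng = random.Random(42)
a, b = [0] * 8, [1] * 8
child_1, child_2 = one_point_crossover(a, b, rng)
print(child_1, child_2)
print(mutate(child_1, 0.01, rng))
```

Using all-zero and all-one parents makes the cut point visible in the printed children: each child starts with one parent's bits and ends with the other's.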

3.0 INSTRUMENTAL INSTABILITY

Instruments used in spectrochemical analysis, such as sequential ICP-AES, contain a

scanning monochromator. Scanning monochromators are subject to drift, and drifts above

0.1 pm affect the accuracy of the determination of impurities in a sample. In spectrochemical

analysis the instrument is calibrated with standards of known analyte concentration,

which relate concentration to spectral line intensity. The analyte concentration in a sample

is then determined from the calibration curve and the measured intensity. Because of

spectrometer instability over time, the standard and sample scans, which are recorded

sequentially, may be shifted with respect to each other by an unknown amount.

van Veen et al. [2] took up the classical case of Cd interfered by As and developed

a program that solves the interference problem with the Kalman filter technique and the

drift problem by optimizing the peak distance with a version of the simplex method. The

peak distance between the spectral lines of Cd and As was optimized with reference to

the sample scan in the spectral window at 228.802 nm. The sample contains both Cd and

As. In this paper the peak distance optimization has been carried out with genetic

algorithms, using the spectral data given in reference [2].
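
Because the relative shift between two scans is generally not a whole number of scan steps, trying a candidate peak distance means resampling one scan. The Python sketch below shows one way this could be done. Linear interpolation, the edge handling and the name `shift_scan` are assumptions made for illustration; the paper does not state how the shift is applied.

```python
import math

def shift_scan(intensities, shift_steps):
    """Return the scan displaced by shift_steps (may be fractional).

    Linear interpolation between neighbouring points is an assumption;
    positions outside the recorded window take the nearest edge value.
    Requires at least two data points.
    """
    n = len(intensities)
    shifted = []
    for i in range(n):
        x = min(max(i - shift_steps, 0.0), n - 1.0)  # source position, clamped
        lo = min(int(math.floor(x)), n - 2)
        frac = x - lo
        shifted.append((1.0 - frac) * intensities[lo]
                       + frac * intensities[lo + 1])
    return shifted

peak = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
print(shift_scan(peak, 1.0))   # peak moves one step to the right
print(shift_scan(peak, 0.5))   # peak is split between two neighbouring steps
```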

4.0 IMPLEMENTATION

The two parameters to be optimized are the peak distance between the Cd and As

scans and the peak distance between the Cd and sample scans. These two parameters are

denoted d1 and d2. The scans were recorded with a scan step of 1.5 pm. The

maximum possible drift for each parameter is taken as ± 5 steps. The implementation

follows the steps and details given by Michalewicz [3]. To represent each parameter as

a string of binary digits with a precision of three decimal places, 14 bits are required:

the range of 10 steps must be divided into at least 10 × 1000 = 10000 intervals, and

2^13 = 8192 < 10000 ≤ 2^14 - 1 = 16383. Each chromosome is thus a string of 28 bits. A

population of 30 chromosomes is generated randomly. To determine the fitness of a

chromosome, its genes are decoded into d1 and d2 and passed to a fitness function. First,

the fitness function applies the required shifts and reconstructs the best possible sample

spectrum using Kalman filtering. It then returns the square root of the sum of the squares

of the differences between corresponding data points in the reconstructed and original

sample scans as the fitness value of that chromosome. Because a smaller value indicates

a better fit, the value returned by the fitness function is suitably modified so that the

optimization can be treated as a maximization problem. All members of the population

are evaluated and their cumulative probabilities are computed.
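
The encoding and the residual-based fitness can be illustrated with a short Python sketch. The range of ± 5 steps and the three decimal places come from the text. The helper names, the linear decoding formula and in particular the transform `1 / (1 + residual)` are assumptions: the paper only says the value is "suitably modified". The Kalman filter reconstruction is not reproduced here.

```python
import math

STEP_RANGE = (-5.0, 5.0)   # maximum drift of +/- 5 scan steps (from the text)
DECIMALS = 3               # three decimal places (from the text)

def bits_needed(lo, hi, decimals):
    """Smallest m with (hi - lo) * 10**decimals <= 2**m - 1."""
    return math.ceil(math.log2((hi - lo) * 10 ** decimals + 1))

def decode(bits, lo, hi):
    """Map a bit list linearly onto the interval [lo, hi]."""
    value = int("".join(map(str, bits)), 2)
    return lo + value * (hi - lo) / (2 ** len(bits) - 1)

def residual_norm(reconstructed, observed):
    """Square root of the sum of squared point-wise differences."""
    return math.sqrt(sum((r - o) ** 2
                         for r, o in zip(reconstructed, observed)))

def to_maximization(residual):
    """Assumed transform: a smaller residual gives a larger fitness."""
    return 1.0 / (1.0 + residual)

m = bits_needed(*STEP_RANGE, DECIMALS)
chromosome = [0] * m + [1] * m          # 28-bit string: d1 genes, then d2 genes
d1 = decode(chromosome[:m], *STEP_RANGE)
d2 = decode(chromosome[m:], *STEP_RANGE)
print(m, d1, d2)
```

With 14 bits the decoded values lie on a grid with spacing 10/16383, roughly 0.0006 steps, which is finer than the required 0.001.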

The roulette wheel selection mechanism is applied to select the population for the next

generation. The crossover operator is applied to the new population with Pc = 0.25,

which means that 25% of the chromosomes are expected to undergo crossover. The

population then undergoes mutation with Pm = 0.01, which means that on average 1% of

the bits in the population are mutated. The resulting new generation of chromosomes

is then ready for further evolution. This process is repeated for 100 generations. In each

generation, the concentration values of Cd and As in the sample are computed for the best

chromosome. The best chromosome is allowed to pass to the next generation.
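
To make the operator rates concrete, the Python sketch below computes the expected number of crossovers and mutations per generation from the stated values and shows one possible way to carry the best chromosome forward. Replacing the worst new member is an assumption, as is the name `apply_elitism`; the text says only that the best chromosome passes to the next generation.

```python
def apply_elitism(old_pop, old_fit, new_pop, new_fit):
    """Keep the previous best chromosome if the new generation lost it.

    Replacing the worst new member is an assumption; the text only says
    that the best chromosome is passed on.
    """
    best_old = max(range(len(old_pop)), key=old_fit.__getitem__)
    if old_fit[best_old] > max(new_fit):
        worst_new = min(range(len(new_pop)), key=new_fit.__getitem__)
        new_pop = (new_pop[:worst_new] + [old_pop[best_old]]
                   + new_pop[worst_new + 1:])
        new_fit = (new_fit[:worst_new] + [old_fit[best_old]]
                   + new_fit[worst_new + 1:])
    return new_pop, new_fit

POP_SIZE, N_BITS, P_C, P_M = 30, 28, 0.25, 0.01   # values from the text
expected_crossovers = P_C * POP_SIZE              # chromosomes per generation
expected_mutations = P_M * POP_SIZE * N_BITS      # bits per generation
print(expected_crossovers, expected_mutations)

pop, fit = apply_elitism(["A", "B"], [0.9, 0.2], ["C", "D"], [0.5, 0.1])
print(pop, fit)
```

On average about 7.5 chromosomes are chosen for crossover and about 8.4 of the 840 bits are flipped in each generation, so the elitism step guards against losing the best solution found so far to these random changes.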

5.0 RESULTS

The constructed Cd spectrum with the fitted solution parameters and the observed Cd

spectrum are shown in Fig. 1, which gives a visual indication of how close the two spectra

are. The fitted spectrum is generated by subtracting the As spectrum and the background

information computed by the Kalman filter routine. The evolution of the concentration

values of Cd and As over the generations is shown in Fig. 2. The concentrations

reached the expected values and stabilized after 40 generations. This information could

be used as one of the criteria for terminating the optimization sequence. Owing to the

randomness of the process, deviations can still occur later, but they settle quickly, as in

the case of Cd around the 80th generation. The impurity estimates with and without peak

distance optimization are given in Table 1. These results are very close to those reported

in reference [2].

Fig. 1. Observed and fitted Cd spectra (intensity versus steps).

Fig. 2. Cd and As/100 concentrations versus generations.

Table 1. Impurity estimates before and after peak distance optimization.

Element   Expected value   Before optimization   After optimization
          (mg/ml)          (mg/ml)               (mg/ml)
Cd        0.629            0.785                 0.632
As        51.8             55.0                  51.4

6.0 CONCLUSION

Although the GA approach takes longer to compute, its successful application here

opens up the possibility of using GAs in more complex situations where a larger number

of lines interfere with the analyte line.

REFERENCES

1. K.K. Shukla, Neuro-Computers: Optimization Based Learning, Narosa Publishing

House, New Delhi, 2001.

2. E.H. van Veen, S. Bosch and M.T.C. de Loos-Vollebregt, “The Kalman filter approach

to inductively coupled plasma atomic emission spectrometry”, Spectrochimica Acta,

Vol. 49B, p. 829, 1994.

3. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd Ed.,

Springer-Verlag, Berlin, 1994.

