
CLUSTERING EXCIPIENT NEAR INFRARED SPECTRA USING DIFFERENT CHEMOMETRIC METHODS

Seminar summary
Anna Jørgensen
Pharmaceutical Technology Division, Department of Pharmacy, University of Helsinki
18.9.2000

Table of contents

I Theory
1 Introduction
2 Physicochemical principles of NIR spectroscopy
3 Chemometrics used in NIR spectroscopy
3.1 Pre-processing of data
3.1.1 Derivatives
3.1.2 Standard normal variate (SNV) transformation
3.1.3 Multiplicative scatter correction (MSC)
3.2 Pattern recognition methods
3.2.1 Principal Components Analysis (PCA)
3.2.2 Correlation coefficient
3.2.3 Distances
4 Objectives of the study
II Materials and methods
III Results and discussion
5 Correlation coefficients
6 Euclidean distances
7 Principal components analysis (PCA)
8 Conclusions
References
Appendix

I Theory

1 Introduction

Near infrared (NIR) spectroscopy has attracted considerable interest in the pharmaceutical industry because it is a rapid and non-invasive analytical technique (Blanco et al., 1998). A great advantage of NIR spectroscopy is that no extensive sample preparation is needed. The amount of radiation absorbed in the NIR region is small, and measurements can therefore be made directly on the material itself. This also enables the use of fibre optic probes (Higgins, 1997). NIR spectra contain information on both the chemical and the physical properties of the sample (Blanco et al., 1998; Osborne et al., 1993). Although NIR analysis is fast in use, setting up an NIR method is laborious. The chemometric methods used (i.e. the mathematical procedures for extracting the useful information) are complicated and demand familiarity with statistics (Lavine, 1998).

2 Physicochemical principles of NIR spectroscopy

The near infrared region lies between the visible and middle infrared (MIR) regions of the electromagnetic spectrum. The European Pharmacopoeia (3rd ed.) defines it as the range of about 780-2500 nm (12821-4000 cm⁻¹). Absorption of radiation by molecules in this region is primarily caused by overtones and combinations of the fundamental vibration bands that appear in the MIR region. For infrared radiation to be absorbed, its frequency has to match the fundamental vibrational frequency of the molecule in question. Absorption at overtone frequencies occurs in the NIR region. Polyatomic molecules possess several fundamental frequencies and may therefore exhibit simultaneous changes in the energies of two or more vibrational modes. The observed frequency is then the sum of, or the difference between, these fundamental frequencies, which results in very weak bands called combination and difference bands, respectively. The latter are rarely observed at room temperature (Blanco et al., 1998).
The low molar absorptivity of the absorption bands in the NIR region makes it possible to measure reflectance instead of transmittance. This enables solid samples to be recorded as such and therefore allows rapid determination by NIR spectroscopy. Reflectance spectroscopy measures the light reflected from the sample surface. This reflection consists of two components called specular and diffuse reflectance. Specular reflectance is the mirror-type reflection occurring at the boundary between the sample surface and air, and it contains very little information on the composition. Diffuse reflectance is light reflected from the interior of the sample. It arises through multiple scattering by particles near the surface but inside the material. In the process of diffusion the radiation becomes entirely depolarised, whereas specularly reflected radiation maintains its state of polarisation. Thus, the specular component can be eliminated by adjusting the detector's position relative to the sample.

The particle size of the sample has a significant effect on the NIR spectrum. A change in particle size changes the amount of radiation scattered by the sample (Hruschka, 1987). When the particles are large, the direction of the radiation does not change as often as with small particles, so more radiation is absorbed. This results in a higher absorbance and thereby has an additive effect on the spectra. Particle size also has a multiplicative effect, because strong absorbers show more change with particle size than weak ones.

Relative reflectance, which is the ratio of the intensity of the light reflected by the sample to that reflected by a standard, is used in NIR measurements (Blanco et al., 1998). The standard is usually a stable material with a high absolute reflectance. A widely used alternative, similar to Beer's law, is

$$A = \log\frac{1}{R} = a'c \tag{10}$$

where $A$ is the apparent absorbance, $R$ the relative reflectance, $c$ the concentration and $a'$ a proportionality constant. Although it lacks a functional relationship, it provides satisfactory results in the usual applications of NIR diffuse reflectance spectroscopy (Blanco et al., 1998).

3 Chemometrics used in NIR spectroscopy

Interpretation of near infrared spectra always requires mathematical processing because of their complexity. Chemometrics is the use of mathematical, statistical and computer science methods for improving the extraction of useful information from chemical measurement data (Geladi and Dåbakk, 1995). Chemometrics enables the use of multivariate data. Such data can reveal unexpected patterns because the joint effect of all variables is taken into account, in contrast to traditional chemical relationships, which usually consider only one or very few variables at a time (Wold, 1995). Multivariate data analysis gives an overview of all the data and allows an overall evaluation of the significance of differences between groups and of correlations (Wold, 1991). Information about complicated samples or processes is not associated with a single variable. Interactions and synergisms can be perceived only by analysing all the relevant variables together, i.e. by multivariate analysis. The methods presented here are by no means limited to NIR spectroscopy.

Numerous chemometric methods have been proposed for use in NIR spectroscopy (Lavine, 1998). The analytical signal obtained in NIR spectroscopy is a complex function that depends on both the physical and the chemical properties of the sample (Blanco et al., 1998).
It is non-linear owing to scatter, stray light and inconsistency in the instrument response. Multicollinearity of the variables is typical of spectroscopic data, since the data consist of continuous signals (Candolfi et al., 1999b). The data can be pre-processed in order to minimise these effects. Many analytical techniques used today, such as NIR, produce large data matrices, which are impractical to process or classify by mere inspection. It is therefore necessary to use pattern recognition methods to extract the useful information. If the data are non-linear, multivariate calibration has to be applied in order to achieve a reliable calibration.

3.1 Pre-processing of data

Variation in the particle size of the material affects scattering and is a major source of variation in NIR spectra (Dhanoa et al., 1994). These effects are both additive and multiplicative in nature and vary from sample to sample. Additive effects cause a vertical displacement or shift, whereas multiplicative effects result in a non-unity slope for each spectrum when it is compared to a reference spectrum. If the physical information in the spectra is irrelevant and undesired, it is possible to extract the chemical information. Certain transformation techniques are able to remove baseline shifts, slope changes and curvilinearity of spectra, i.e. they reduce the influence of particle size, scattering and other interfering factors (Candolfi et al., 1999b). Scattering occurs on the surface of a material and depends on the physical nature of the material. Varying particle sizes result in a baseline shift in the spectra, because the particle size defines the spectral pathlength. The transformation techniques generally used include calculating first and
second derivatives (Osborne et al., 1993), offset correction, de-trending (Candolfi et al., 1999b), standard normal variate transformation and multiplicative scatter correction (Barnes et al., 1989).

3.1.1 Derivatives

Derivative spectra make it possible to resolve overlapping peaks and to correct the baseline. The derivative brings overlapping peaks apart, and a linear background becomes a constant level in the first derivative spectrum and zero in the second derivative spectrum (Osborne et al., 1993). In the second derivative the peak maxima turn into troughs, whereas in the first derivative they become zero. Unfortunately, the differencing operation magnifies the noise and increases the complexity of the spectrum. It is therefore necessary to smooth the data beforehand. This can be done by Savitzky-Golay smoothing, a moving window method that fits a polynomial by least squares (Savitzky and Golay, 1964). The use of derivatives can increase the number of unacceptable samples that are accepted (β-errors) by pattern recognition methods (Candolfi et al., 1999b).

3.1.2 Standard normal variate (SNV) transformation

Standard normal variate (SNV) transformation removes the slope variation from spectra caused by scatter and variation in particle size (Barnes et al., 1989; Candolfi et al., 1999b). The transformation is applied to each spectrum individually by subtracting the spectrum mean and scaling with the spectrum standard deviation:

$$x_{ij,\mathrm{SNV}} = \frac{x_{ij} - \bar{x}_i}{\sqrt{\dfrac{\sum_{j=1}^{p} (x_{ij} - \bar{x}_i)^2}{p - 1}}} \tag{13}$$

where $x_{ij,\mathrm{SNV}}$ is the transformed element for the original element $x_{ij}$, $\bar{x}_i$ is the mean of spectrum $i$ and $p$ is the number of variables in the spectrum.

3.1.3 Multiplicative scatter correction (MSC)

MSC is another scatter correction method. It is based on the idea of correcting the scatter level of all spectra of a group of samples to the level of an ideal sample's spectrum, which is usually the average spectrum (Geladi et al., 1985). This is possible because the wavelength dependency of light scatter is different from that of chemically based light absorption. Each spectrum is fitted to the average spectrum as closely as possible by least squares:

$$x_i = a_i + b_i \bar{x}_j + e_i \tag{14}$$

where $x_i$ is an individual spectrum $i$, $\bar{x}_j$ the mean spectrum of the group and $e_i$ the residual spectrum, which ideally represents the chemical information in spectrum $i$. The corrected spectrum $x_{i,\mathrm{MSC}}$ is calculated using the fitted constants $a_i$ (intercept) and $b_i$ (slope):

$$x_{i,\mathrm{MSC}} = \frac{x_i - a_i}{b_i} \tag{15}$$

MSC is a set-dependent transformation (Dhanoa et al., 1994). If the raw data set is modified, the ideal spectrum, i.e. the mean spectrum, is likely to change, and the MSC-corrected spectra then need to be recalculated.
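As an illustration of the two scatter corrections, the following is a minimal sketch in plain Python, assuming each spectrum is a list of absorbance values. It is not the Matlab code used in the study; the function names and the toy data are hypothetical. It shows SNV working on each spectrum on its own (Eq. 13), whereas MSC needs the whole group to form the mean spectrum (Eqs. 14 and 15).

```python
import math


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its
    own standard deviation (p - 1 in the denominator, as in Eq. 13)."""
    p = len(spectrum)
    m = mean(spectrum)
    sd = math.sqrt(sum((x - m) ** 2 for x in spectrum) / (p - 1))
    return [(x - m) / sd for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the group mean spectrum.

    Each spectrum is regressed on the mean spectrum by ordinary least
    squares (Eq. 14) and corrected as (x - a) / b (Eq. 15).
    """
    p = len(spectra[0])
    ref = [mean([s[j] for s in spectra]) for j in range(p)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Slope b and intercept a of the least-squares line s = a + b * ref.
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected


# Hypothetical toy data: one "chemical" profile with a different additive
# offset and multiplicative slope applied to each copy.
base = [0.10, 0.30, 0.80, 0.40, 0.20, 0.60]
spectra = [[offset + slope * x for x in base]
           for offset, slope in [(0.0, 1.0), (0.2, 1.5), (-0.1, 0.7)]]

snv_spectra = [snv(s) for s in spectra]
msc_spectra = msc(spectra)
```

In this toy case the three spectra differ only by offset and slope, so both corrections map them onto a single curve. Adding or removing a spectrum changes the mean spectrum used by `msc`, which is the set dependence noted above.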

3.2 Pattern recognition methods

3.2.1 Principal Components Analysis (PCA)

The aim of PCA is to reduce the quantity of spectral data, and thereby avoid overfitting problems, without discarding useful information (Osborne et al., 1993). PCA uses projections to extract, from a large number of variables, a much smaller number of new variables that account for most of the variability between samples. Each of the new variables (principal components) is a linear combination of the original measurements and therefore contains information from the entire spectrum. PCA fits new axes (variables) in the data space. The first axis is chosen in the direction of maximum variability, so that the amount of information in the first new variable is maximised. The second axis is chosen to be orthogonal to the first, so the second new variable is uncorrelated with the first one. This operation is continued until a sufficient amount of variation is explained by the new variables. The higher the loading of a variable on a principal component, the more the variable has in common with this component. The loadings can be interpreted as correlations between the variables and the components. The score of an object is its value on the principal component axis onto which the object is projected.

3.2.2 Correlation coefficient

The correlation in the wavelength space can be used to compare spectra (Yoon et al., 1999). This simple method has the advantage of depending only on the shape of the spectra and not on the absolute magnitude of the response. The dot product correlation coefficient, $r$, is given by

$$r = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}} \tag{16}$$

where $x_i$ and $y_i$ are the ordinate values of the two spectra being compared at wavelength $i$. The correlation value is the normalised dot product of the two vectors representing the spectra and is equal to the cosine of the angle between the two vectors. The correlation coefficients between absorbance spectra are often very high and not suitable for identification purposes.
The best results are obtained using second derivative spectra.

3.2.3 Distances

Distances can be used to measure the similarity of objects. One of the most popular distances is the Euclidean distance (Massart et al., 1988). Distances can be visualised by dendrograms, which are hierarchical trees. In these, the objects with the smallest distance are linked together to form a new combined object, and this object is in turn linked to the object at the smallest distance from it. This process is repeated until all objects are linked.

4 Objectives of the study

The purpose of the study was to investigate the clustering behaviour of excipient NIR spectra by using different chemometric methods. The effect of spectral pre-treatments was studied. A further goal was to assign the spectral regions that differentiate similar spectra. The methods employed here are not restricted to NIR spectroscopy but can also be utilised in other spectroscopic methods, such as Raman spectroscopy and other regions of the infrared.
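The two similarity measures introduced in Sections 3.2.2 and 3.2.3 can be illustrated with a short sketch in plain Python. The function names and the toy spectra are hypothetical and serve only to show how the measures behave: the dot product correlation of Eq. 16 ignores the absolute magnitude of the response, whereas the Euclidean distance does not.

```python
import math


def dot_product_correlation(x, y):
    """Eq. 16: cosine of the angle between two spectra."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def euclidean_distance(x, y):
    """Straight-line distance between two spectra in wavelength space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical toy spectra.
a = [0.10, 0.30, 0.80, 0.40]
b = [2.0 * v for v in a]       # same shape as a, doubled intensity
c = [0.80, 0.40, 0.10, 0.30]   # same values as a, different shape

r_ab = dot_product_correlation(a, b)   # shape only: effectively 1
r_ac = dot_product_correlation(a, c)   # clearly below 1
d_ab = euclidean_distance(a, b)        # non-zero despite identical shape
```

Because absorbance spectra of different excipients share a broadly similar overall shape and baseline, a measure of this kind tends to give values close to 1 for raw spectra, which is consistent with the remark above that second derivative spectra work better.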

II Materials and methods

The materials were collected to cover a range of the most commonly used excipients in the pharmaceutical industry (Table 1). Materials from different manufacturers and batches were included. The samples were stored in ambient conditions in tightly sealed plastic jars.

The water content of one batch of each excipient group was measured by Karl Fischer titration and by IR-balance measurement. Karl Fischer titration was performed on a Mettler Karl Fischer titrator (model DL35, Mettler Toledo AG, Switzerland) using samples of 500-1000 mg. The titration vial was warmed to 51 °C, except in the case of PVP. The IR-balance (Sartorius Thermo control, Sartorius GmbH, Germany) was set to warm the samples (2.0 g) containing water of crystallisation up to 70 °C and the others up to 105 °C. All determinations were performed in triplicate.

Thermogravimetry (TG) and differential scanning calorimetry (DSC) were performed on one batch of each excipient group. TG was carried out with a Mettler TGA/SDTA analyser (model 851e, Mettler Toledo AG, Switzerland), and SDTA data were collected simultaneously. DSC was performed with a Mettler DSC instrument (model 821e, Mettler Toledo AG, Switzerland). The equipment was run with STARe software (Sun Soft Inc., USA). The temperature scale of the equipment was calibrated with zinc and indium, while the thermobalance was calibrated with calcium carbonate. Samples (10-15 mg in TG and 3-5 mg in DSC) were placed in open aluminium pans under a nitrogen purge (50 ml/min), and the temperature was raised at 10 °C/min. A temperature program of 25-250 °C was used for MCC, SMCC, starch, HPMC, PVP, calcium phosphates, silicon dioxide, Eudragit RL PO and NaCMC, and 25-300 °C was used for lactose, gelatine, Eudragit S100, Eudragit L100 and mannitol. DSC measurements were performed in triplicate.

NIR spectra were collected as apparent absorbance (log 1/R) values with an FTIR spectrometer (Bomem, Hartman & Braun, Canada) using Bomem-GRAMS software (v. 4.04, Galactic Industries, USA) and teflon as reference (99% reflective Spectralon, Labsphere, USA). The reference spectra were taken every morning. Five samples of each batch were taken in glass vials, and an average of 32 scans was recorded over a spectral range of 10000-4000 cm⁻¹ (1000-2500 nm) with 8 cm⁻¹ resolution. The median spectrum of the five determinations was used. Second derivatives of the original absorbance spectra (log 1/R) were calculated with 13-point Savitzky-Golay smoothing, and the averages were used. The median absorbance spectra were pre-treated by the SNV and MSC algorithms. The excipients were divided into groups to obtain the group mean spectra used in MSC (Table 1). Excipients with only one group member could not be MSC corrected. All the pre-treatments were performed using Matlab (v. 5.3, MathWorks Inc., USA).

Correlation coefficients (dot products) and Euclidean distances were calculated using Matlab (v. 5.3, MathWorks Inc., USA) and the Statistics toolbox (v. 2, MathWorks Inc., USA). Dendrograms of the Euclidean distances were plotted with average linkage using the same program. Principal component analysis was performed on the absorbance spectra and the pre-treated spectra using SIMCA (v. 8.0, Umetri AB, Sweden).
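The 13-point Savitzky-Golay second derivative mentioned above can be sketched as follows in plain Python. This is an illustration under stated assumptions and not the Matlab routine used in the study: a quadratic polynomial is assumed (the polynomial order is not given in the text), the data points are assumed to be equally spaced, and the function names are hypothetical. The result is expressed per squared point spacing, and the half-window at each end of the spectrum is dropped.

```python
def savgol_second_derivative_weights(window):
    """Convolution weights for the second derivative at the centre of an
    odd-length window, from a least-squares quadratic fit."""
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    offsets = range(-half, half + 1)
    mean_sq = sum(i * i for i in offsets) / window
    # i**2 - mean_sq is orthogonal to the constant and linear terms, so the
    # quadratic coefficient is a simple projection onto this basis vector.
    basis = [i * i - mean_sq for i in offsets]
    norm = sum(v * v for v in basis)
    # The second derivative of the fitted quadratic is twice its
    # quadratic coefficient.
    return [2.0 * v / norm for v in basis]


def second_derivative(spectrum, window=13):
    """Smoothed second derivative; the output is window - 1 points shorter."""
    weights = savgol_second_derivative_weights(window)
    half = window // 2
    return [sum(w * spectrum[k + j - half] for j, w in enumerate(weights))
            for k in range(half, len(spectrum) - half)]


# Hypothetical test signal: offset + linear baseline + curvature of 1.0.
signal = [3.0 + 2.0 * x + 0.5 * x * x for x in range(20)]
d2 = second_derivative(signal, window=13)
weights13 = savgol_second_derivative_weights(13)
```

For the toy signal the offset and the linear baseline vanish and only the constant curvature remains, which is the baseline-correcting property described in Section 3.1.1. A wider window smooths more strongly but also broadens and flattens narrow bands.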

Table 1. Grouping of excipients for multiplicative scatter correction (MSC).

Excipient group                                      No. of members
Lactose monohydrate                                  10
Anhydrous lactose                                    4
Anhydrous lactose + lactitol                         1
Microcrystalline cellulose (MCC): Emcocel SP 15      1
MCC: Emcocel 50 M                                    14
MCC: Emcocel 90 M                                    12
MCC: Emcocel XLM 90                                  5
MCC: Avicel PH-101                                   11
MCC: Avicel PH-102                                   14
MCC: Avicel PH-200                                   12
Silicified MCC: Prosolv 50                           2
Silicified MCC: Prosolv 90                           10
Anhydrous dibasic calcium phosphate                  1
Dibasic calcium phosphate dihydrate                  1
Polyvinylpyrrolidone (PVP)                           3
Gelatine                                             1
Polymethacrylate (Eudragit)                          3
Sodium croscarmellose (NaCMC)                        1
Hydroxypropylmethyl cellulose (HPMC)                 3
Silicon dioxide                                      1
Mannitol                                             1
MCC: Emcocel LP 200                                  7

If the group contains only one member, it is impossible to perform MSC.

III Results and discussion

5 Correlation coefficients

The correlation coefficients between the absorbance (log 1/R), SNV-treated and MSC-treated spectra were very high (0.991-1.000). Correlation coefficients as high as 1.000 were found between absorbance spectra both within and across the excipient groups. The results for MSC and SNV pre-treated data were similar. The correlation coefficients between second derivative spectra had a broader distribution (0.961-0.996). The excipients could not be identified by correlation, because the lowest correlation coefficients within the excipient groups were lower than the highest ones with excipients outside the groups. When the number of spectra was high, correlation coefficients were rather impractical to use because the correlation matrix became immense. This technique gives no opportunity to visualise the data in order to get a general view.

6 Euclidean distances

In the dendrogram plotted from the absorbance (log 1/R) spectra, the excipients were mixed together. Some small homogeneous groups could be observed at small distances, but at larger distances these groups were mixed with unexpected excipients, e.g. lactose monohydrate with gelatine. Silicon dioxide had the largest distance from the others.

In the second derivative dendrogram, the spectra were divided into nine groups at the distance level 0.004. MCC was mixed in one big group. The other excipients were grouped better, yet starch and NaCMC were grouped together. Mannitol had the largest distance from all other excipients. Lactose monohydrate formed two groups, which was probably due to the grinding used to achieve the smaller particle sizes. The DSC of lactose monohydrate in small particle sizes revealed an exotherm at about 170 °C, which was due to the loss of crystallinity of lactose monohydrate during grinding (Lerk et al., 1984).

In the dendrogram of SNV-corrected spectra (Appendix 1), eight groups could be observed at the distance level 3.2. The groups were similar to those in the dendrogram of second derivative spectra. Here, NaCMC was not grouped with starch. Additionally, the low moisture grade MCC formed a group of its own. Furthermore, the largest distance was between the group of PVP, HPMC and polymethacrylates and the rest of the excipients, which reflected the chemical difference between these and the others. At a lower distance level, the MCC formed four groups. The water peak was a discriminating element (Figure 1). This was confirmed by the moisture content measurements: the MCC samples with the highest moisture content were in the fourth group from the left and the ones with the lowest in the second group from the left.
[Figure 1 plot: log 1/R against wavelength (nm) over 1860-2020 nm, with one curve for each of the four MCC groups (1st to 4th group from left).]

Figure 1. Water peak of the four MCC groups formed in the dendrogram of SNV-corrected spectra.

In the dendrogram calculated from MSC spectra, the excipients formed groups according to the group division (Table 1) made before performing MSC. Here, the MCC samples were grouped by manufacturer and particle size, yet Avicel PH-102 and PH-200 were mixed together. When MSC was computed for MCC as one whole group, the MCC samples were mixed. However, the low moisture grade and the small particle size MCC remained as they were in the original MSC dendrogram. MSC seemed to be a rough pre-treatment method and not suitable for identification purposes, because it is impossible to know what to use as the ideal spectrum for an unknown sample.

The second derivative spectra could be truncated at the lower wavelengths without a significant effect on the distances between the spectra. The wavelengths up to 1300 nm were removed. There was a slight difference between the dendrograms of the original and the truncated second derivative spectra, but the groups formed remained the same.
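How a dendrogram is built from Euclidean distances with average linkage, and how groups are read off at a chosen distance level, can be sketched in plain Python as follows. This is an illustration only and not the Matlab Statistics toolbox routine used in the study; the function names and the five two-variable "spectra" are hypothetical.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def average_linkage(spectra):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram displays.
    """
    clusters = [(i,) for i in range(len(spectra))]
    merges = []
    while len(clusters) > 1:
        best = None
        for ca, cb in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(euclidean_distance(spectra[i], spectra[j])
                    for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[2]:
                best = (ca, cb, d)
        ca, cb, d = best
        clusters = [c for c in clusters if c not in (ca, cb)]
        clusters.append(ca + cb)
        merges.append((ca, cb, d))
    return merges


def cut(merges, n_objects, level):
    """Groups obtained by cutting the dendrogram at a distance level."""
    groups = [{i} for i in range(n_objects)]
    for ca, cb, d in merges:
        if d > level:
            break  # average-linkage merge distances never decrease
        merged = set(ca) | set(cb)
        groups = [g for g in groups if not g <= merged] + [merged]
    return sorted(sorted(g) for g in groups)


# Hypothetical toy data: two well separated clusters of objects.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.3], [5.0, 5.0], [5.2, 5.0]]
merges = average_linkage(points)
groups = cut(merges, len(points), level=1.0)
```

Cutting at a higher level gives fewer, larger groups, which is how the nine groups at level 0.004 and the eight groups at level 3.2 reported above were counted on their respective distance scales.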

7 Principal components analysis (PCA)

The PCA performed on the absorbance spectra did not discriminate the excipients very well. Most of the excipients were located in one big group. Lactose monohydrate and gelatine lay apart from the main group in the upper right quarter. Silicon dioxide was located in the lower right corner, far from the main group. The loadings of PC1 mostly explained the differences in the baselines. The loadings of PC2 distinguished spectra differing in the intensities of the peaks.

The PCA performed on the second derivative spectra discriminated the excipients into several groups. The first two PCs separated lactose monohydrate, anhydrous lactose, PVP, gelatine, polymethacrylates, HPMC and mannitol into different groups. The loadings of the first two PCs were rather evenly spread. When inspecting the loadings of second derivative spectra, it was hard to know whether the extreme values were actual peaks that can be assigned or shoulders created by the differencing operation.

Truncating the second derivative spectra at the lower wavelengths had no remarkable effect on the groups formed. The spectral region was shortened to 1300-2500 nm. The truncated spectra could be used to shorten the processing time. Additionally, the first three PCs explained a larger amount of the variation, which indicates that the model was more accurate than the model created from the entire second derivative spectra.
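To make the terms scores, loadings and explained variation concrete, the following is a minimal PCA sketch in plain Python. It uses a NIPALS-style iteration on mean-centred data, which is an assumption made for illustration; the text only states that SIMCA was used, and the function name and toy data are hypothetical.

```python
import math


def pca_nipals(data, n_components, tol=1e-12, max_iter=1000):
    """PCA of a samples-by-variables table by NIPALS iteration.

    Returns (scores, loadings, explained), where explained[k] is the
    fraction of the total variance captured by component k.
    """
    n, p = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in data]
    total_ss = sum(v * v for row in x for v in row)

    scores, loadings, explained = [], [], []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        start = max(range(p), key=lambda j: sum(row[j] ** 2 for row in x))
        t = [row[start] for row in x]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: regression of every variable on the current scores.
            load = [sum(x[i][j] * t[i] for i in range(n)) / tt
                    for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]
            # Scores: projection of every sample onto the loading vector.
            t_new = [sum(x[i][j] * load[j] for j in range(p))
                     for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the part of the data explained by this component.
        x = [[x[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        scores.append(t)
        loadings.append(load)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Hypothetical toy data: five samples, three strongly correlated variables.
data = [[1.0, 2.0, 0.5], [2.0, 4.1, 1.0], [3.0, 5.9, 1.6],
        [4.0, 8.0, 1.9], [5.0, 10.1, 2.6]]
scores, loadings, explained = pca_nipals(data, n_components=2)
```

Summing the first entries of `explained` gives the share of the variation covered by the first PCs, which is the quantity compared above between the full and the truncated second derivative spectra. Plotting `scores[0]` against `scores[1]` corresponds to a t[1]/t[2] score plot.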
[Figure 2 plot: SIMCA score plot t[1]/t[2] of model SNVabs.M1 (PC), all wavelengths, work set; points labelled with spectrum numbers.]

Figure 2. The first two PCs of the PCA performed on SNV-corrected spectra.

The PCA carried out on absorbance spectra pre-treated by standard normal variate (SNV) transformation separated the spectra better than the PCA performed on absorbance or second derivative spectra (Figure 2). The chemically related spectra were near each other. PVP (72-73), polymethacrylates (76-78), HPMC (80-82) and anhydrous calcium phosphate (70) were furthest from the others. Silicon dioxide (83) was alone in the lower left quadrant. Lactose monohydrate (1-9, 15), mannitol (84) and starch (68-69) were poorly separated from each other. Polymethacrylates were not grouped together as they were in the PCA carried out on the derivatives. All MCC was in one dense group.

The loadings of PC1 (Figure 3) separated the spectra mainly by intermolecular hydrogen bonds located at 1580 nm (variable number 650) (Osborne et al., 1993) and (CH2)n combination bands located at 1150 nm (variable number 1200) (Murray and Williams, 1987). Lactose contains intermolecular hydrogen bonds, whereas polymethacrylates, PVP and HPMC contain (CH2)n groups. The loadings of PC2 (Figure 4) separated the spectra mainly by the regions at 1450 nm (variable number 750) and 1780 nm (variable number 400). The first overtone of the O-H stretching vibration occurs in the region of 1450 nm (Osborne et al., 1993), but the water combination band at around 1940 nm was absent from the discriminating loadings. The region of the first overtone of the C-H stretching vibration lies at 1780 nm (Osborne et al., 1993). Silicon dioxide lacks these bonds, and it was placed at the edge of the PCA space.
[Figure 3 plot: SIMCA loading plot p[1] against variable number (NUM) of model SNVabs.M1 (PC), all wavelengths, work set.]

Figure 3. The loadings of PC1 of the PCA performed on SNV-corrected spectra.


[Figure 4 plot: SIMCA loading plot p[2] against variable number (NUM) of model SNVabs.M1 (PC), all wavelengths, work set.]

Figure 4. The loadings of PC2 of the PCA performed on SNV-corrected spectra.

The PCA performed on absorbance spectra pre-treated by multiplicative scatter correction (MSC) gave the best separation. The groups formed were similar to those in the PCA of SNV-corrected spectra. Silicon dioxide was far away from the others, as in the PCA of SNV-corrected spectra. Additionally, the MCC samples were separated into groups according to their grade. The loadings of PC1 were almost identical to those of the original spectra. The loadings of PC2 separated the spectra in a similar manner to the loadings of PC1 in the PCA of SNV-corrected spectra. When the MCC was MSC treated as a single group, the PCA score plot was similar to the previous one. In this case, the MCC was in one dense group and the different grades were mixed together.

8 Conclusions

The correlation dot products were slow and laborious to go through and could not be used to visualise the data. The correlations of the original, SNV and MSC spectra were too high to give a good separation. The correlations of second derivative spectra seemed more promising, but the variation within the groups was larger than the variation between the groups, which prevented clustering.

It was easier to get an overall picture of the differences in the data from a PCA plot than from dendrograms, yet the latter were handier for surveying small differences. By inspecting the loadings, PCA enables examination of the regions of the data that have a significant role in the clustering behaviour. With NIR spectra, it was possible to assign the loadings to vibrational bands. However, when second derivative spectra were used, this became difficult because the loadings were uniformly spread.

The MSC pre-treatment was able to create dense clusters, but if it is used in identification, the problem of determining the correct ideal spectrum for unknown samples becomes apparent. It is laborious to compute several MSC corrections for an unknown sample using the average spectrum of each substance group in the library. From this point of view, SNV is easier to use. Its performance was as good as that of MSC, if the grouping of MCC is not taken into account.


References

Barnes, R.J., Dhanoa, M.S. and Lister, S.J., 1989: Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra. Appl. Spectrosc., 43 (5), 772-777.
Blanco, M., Coello, J., Iturriaga, H., Maspoch, S. and de la Pezuela, C., 1998: Near-infrared spectroscopy in the pharmaceutical industry. Analyst, 123, 135R-150R.
Candolfi, A., De Maesschalck, R., Jouan-Rimbaud, D., Hailey, P.A. and Massart, D.L., 1999b: The influence of data pre-processing in the pattern recognition of excipients near-infrared spectra. J. Pharm. Biomed. Anal., 21, 115-132.
Dhanoa, M.S., Lister, S.J., Sanderson, R. and Barnes, R.J., 1994: The link between Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV) Transformations of NIR spectra. J. Near Infrared Spectrosc., 2, 43-47.
Dreassi, E., Ceramelli, G., Perruccio, P.L. and Corti, P., 1998: Transfer of calibration in near-infrared reflectance spectrometry. Analyst, 123, 1259-1264.
European Pharmacopoeia, 3rd ed., European Commission, Strasbourg, France, 1997, p. 43-44.
Geladi, P., MacDougall, D. and Martens, H., 1985: Linearization and Scatter-Correction for Near-Infrared Reflectance Spectra of Meat. Appl. Spectrosc., 39 (3), 491-500.
Geladi, P. and Dåbakk, E., 1995: An overview of chemometrics applications in near infrared spectrometry. J. Near Infrared Spectrosc., 3, 119-132.
Higgins, M., 1997: Pinpointing production problems with NIR analysis. Manuf. Chem., 68, 38-39.
Hruschka, W.R., 1987: Data Analysis: Wavelength Selection Methods, in volume: Williams, P. and Norris, K. (eds.): Near-Infrared Technology in the Agricultural and Food Industries. American Association of Cereal Chemists, St. Paul, Minnesota, USA, p. 35.
Lavine, B.K., 1998: Chemometrics. Anal. Chem., 70, 209R-228R.
Lerk, C.F., Andreae, A.C., de Boer, A.H., de Hoog, P., Kussendrager, K. and van Leverink, J., 1984: Transitions of Lactoses by Mechanical and Thermal Treatment. J. Pharm. Sci., 73 (6), 857-859.
Massart, D.L., Vandeginste, B.G.M., Deming, S.N., Michotte, Y. and Kaufman, L., 1988: Chemometrics: a textbook, in series: Data handling in science and technology, volume 2. Elsevier Science Publishers B.V., Netherlands, p. 339-355, 373-378.
Murray, I. and Williams, P.C., 1987: Chemical Principles of Near-Infrared Technology, in volume: Williams, P. and Norris, K. (eds.): Near-Infrared Technology in the Agricultural and Food Industries. American Association of Cereal Chemists, St. Paul, Minnesota, USA, p. 29-31.
Osborne, B.G., Fearn, T. and Hindle, P.H., 1993: Practical NIR Spectroscopy with Applications in Food and Beverage Analysis. 2nd ed., Longman Group, Burnt Mill, Harlow, Essex, England, UK, p. 20-33, 42-43, 106-113, 123-132.
Savitzky, A. and Golay, M.J.E., 1964: Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem., 36 (8), 1627-1639.
Wold, S., 1991: Chemometrics, why, what and where to next? J. Pharm. Biomed. Anal., 9 (8), 589-596.
Wold, S., 1995: Chemometrics; what do we mean with it, and what do we want from it? Chemom. Intell. Lab. Syst., 30, 109-115.
Yoon, W.L., Jee, R.D., Moffat, A.C., Blackler, P.D., Yeung, K. and Lee, D.C., 1999: Construction and transferability of a spectral library for the identification of common solvents by near-infrared transreflectance spectroscopy. Analyst, 124, 1197-1203.


[Dendrogram plot: distance axis from 0 to 20; labelled clusters: anhydrous lactose, lactose monohydrate, Eudragits and MCC; leaves labelled with spectrum numbers.]

Appendix 1. Dendrogram of SNV treated spectra