

Energy & Fuels 2001, 15, 1304-1312

Determination of Saturate, Aromatic, Resin, and Asphaltenic (SARA) Components in Crude Oils by Means of Infrared and Near-Infrared Spectroscopy
Narve Aske,* Harald Kallevik, and Johan Sjöblom
The Norwegian University of Science and Technology, Department of Chemical Engineering, N-7491 Trondheim, Norway, and Statoil R&D Centre, Rotvoll, N-7005 Trondheim, Norway

Received April 11, 2001. Revised Manuscript Received June 29, 2001

Eighteen crude oils and condensates have been investigated by means of infrared (IR) spectroscopy, near-infrared (NIR) spectroscopy, and high-performance liquid chromatography (HPLC). By means of HPLC the samples have been separated into four chemical group classes, namely saturates, aromatics, resins, and asphaltenes, the so-called SARA fractions. Using multivariate analysis techniques such as principal component analysis (PCA) and partial least-squares (PLS) regression, the predictive ability of the spectroscopic techniques with regard to the SARA components has been explored. The results show that the SARA distribution of crude oils and related materials can be determined from both infrared and near-infrared spectra. The uncertainties in the prediction models based on IR spectroscopy were found to be 2.5, 2.2, 1.4, and 1.3 wt % for the saturate, aromatic, resin, and asphaltene fractions, respectively. For NIR the corresponding uncertainties are 2.8, 2.4, 1.4, and 1.0 wt %. These values are in the same range as the reported uncertainty of the direct determination by HPLC. Spectroscopic determination, especially by NIR, therefore offers the possibility of rapid SARA analysis, and the measurements could be made at high pressure and temperature.

Introduction

To characterize the chemical composition of crude oils and related materials, hydrocarbon group type determinations give important parameters. One such separation is the SARA fractionation, which separates the crude oil into saturates, aromatics, resins, and asphaltenes. There exist a number of techniques for the separation and characterization of crude oils and related materials into hydrocarbon group types.¹⁻⁶ A review of these methods is given by Lundanes and Greibrokk.⁷ Many of these techniques are based on chromatographic separation. Among these methods, high-performance liquid chromatography (HPLC) is especially well suited when the separation is to be achieved in minimum time. Generally, deasphalting of the sample is the first step in the SARA separation of crude oils. The subsequent SAR class separation is then achieved by HPLC using bonded-phase columns, or a combination of silica and bonded-phase columns, with an alkane as mobile phase. The
* Author to whom correspondence should be addressed. Fax: +47 73 58 46 28.
(1) Ali, M. A.; Nofal, W. A. Fuel Sci. Technol. Int. 1994, 12, 21-33.
(2) Bollet, C.; Escalier, J.-C.; Souteyrand, C.; Caude, M.; Rosset, R. J. Chromatogr. 1981, 206, 289-300.
(3) Dark, W. A. J. Liq. Chromatogr. 1982, 5, 1645-1652.
(4) Suatoni, J. C.; Swab, R. E. J. Chromatogr. Sci. 1975, 13, 361-366.
(5) Radke, M.; Willsch, H.; Welte, D. H. Anal. Chem. 1980, 52, 406-411.
(6) Grizzle, P. L.; Sablotny, D. M. Anal. Chem. 1986, 58, 2389-2396.
(7) Lundanes, E.; Greibrokk, T. J. High Resolut. Chromatogr. 1994, 17, 197-202.

complexity of the crude oils and the lack of an aromatic/resin-selective column make it impossible to draw a clear distinction between aromatics and resins. A method-specific definition of aromatics and resins is therefore usually employed.⁷ Separation of saturates and aromatics with backflushing of the resins has been achieved on amino columns, cyano columns, and aminocyano columns. Often, however, two coupled columns such as cyano/aminocyano or silica/cyano are employed. In this case the purpose of the first column is to retain the resins, while the second retains the aromatics. The saturate fraction has no retention on such columns and is therefore eluted directly through them. The resins and aromatics have to be eluted from their respective columns with appropriate solvents.

At Statoil R&D Centre, an HPLC method was developed earlier for the semipreparative separation of crude oils and related materials into the four SARA fractions. The separation is performed on a chemically bonded silica-NH2 column. The method is based on a single column and the use of different mobile phases. The saturates and aromatics are eluted with n-hexane as mobile phase, while the more polar resins are eluted by backflushing the column with trichloromethane. After elution of the resins the column must be re-equilibrated with the initial mobile phase, n-hexane.

Recently, 18 samples ranging from light condensates to heavy crude oils have been analyzed by this HPLC procedure. In addition, all the samples have been characterized by infrared (IR) and near-infrared (NIR) spectroscopy. IR and NIR instrumentation allows fast determination of different chemical parameters, which makes it attractive to correlate the spectra with the SARA values of the respective crude oils. In our research we are using a high-pressure NIR cell capable of NIR measurements at several hundred bar. If SARA values can be predicted from the NIR spectra, it is our intention to map phase diagrams of crude oils by simultaneous measurements of asphaltene aggregation state and SARA values at various pressures and temperatures.

Unlike the HPLC method, IR and NIR spectroscopy do not resolve the components in a sample. Hence the chemical information about the components is embedded in multiple absorption bands in the spectra. This information can be extracted by the use of multivariate techniques. In this study partial least-squares (PLS) regression has been used to establish the relationship between the spectral data from IR and NIR and the SARA parameters already obtained from the HPLC procedure. To fully exploit the predictive ability of the IR and NIR spectra, one could concatenate the two spectra and perform a single analysis. This is not done in this study, since we are primarily interested in a situation where only one of these instruments would be available for the SARA analysis. A principal component analysis (PCA) has also been performed in order to visualize the main trends in the sample set and to detect potential outliers.

10.1021/ef010088h CCC: $20.00 © 2001 American Chemical Society. Published on Web 08/11/2001

Determination of SARA Components in Crude Oils
Energy & Fuels, Vol. 15, No. 5, 2001

Theory

Spectroscopic Techniques. The infrared region of practical use lies between 4000 and 400 cm⁻¹. In the IR spectrum certain groups of chemical bonds give rise to bands at or near the same frequency regardless of the structure of the rest of the molecule.⁸ The absorption bands of aliphatic C-H bonds, with additional bands originating from groups containing aromatics, oxygen, sulfur, and nitrogen, usually dominate the spectra of crude oils.
The near-infrared region of the electromagnetic spectrum extends from 780 to 2500 nm (12820 to 4000 cm⁻¹), but the region generally used is between 1100 and 2500 nm (about 9090 to 4000 cm⁻¹). The NIR region is attractive for crude oil analysis because many of the absorption bands observed in this region arise from overtones or combinations of carbon-hydrogen stretching vibrations. Functional groups such as methylenic, olefinic, or aromatic C-H give rise to various C-H stretching vibrations that are largely independent of the rest of the molecule. This makes NIR spectroscopy especially well suited for analysis based on hydrocarbon functional groups.⁹ The broad bands of the NIR spectrum, and the resulting lack of selectivity, make multivariate techniques an important tool in analyzing NIR data.

Data Analysis. Differentiation. Raw spectroscopic data may have a distribution that is not optimal for analysis. Preprocessing of the spectral data is used to reduce effects that make it difficult to extract meaningful information from the data. For NIR and IR
(8) Silverstein, R. M.; Bassler, G. C.; Morrill, T. C. Spectrometric Identification of Organic Compounds, 5th ed.; John Wiley & Sons: New York, 1991.
(9) Kelly, J. J.; Callis, J. B. Anal. Chem. 1990, 62, 1444-1451.

spectra, differentiation is a common pretreatment. A first-order differentiation of the spectra removes additive shifts in the data, thus removing effects from, for instance, baseline shifts. Any relation between the absorption and the concentration of an analyte remains after differentiation of the spectra. The drawback of this pretreatment is a slight decrease in the signal-to-noise ratio.

Principal Component Analysis (PCA). PCA is a projection method that helps visualize the most important information contained in a data set. PCA finds combinations of variables that describe major trends in the data set. Mathematically, PCA is based on an eigenvector decomposition of the covariance matrix of the variables in a data set. Given a data matrix X with m rows of samples and n columns of variables, the covariance matrix of X is defined as

$$\operatorname{cov}(X) = \frac{X^{T}X}{m-1} \qquad (1)$$


The result of the PCA procedure is a decomposition of the data matrix X into principal components called score and loading vectors

$$X_{m \times n} = t_1 p_1^{T} + t_2 p_2^{T} + \dots + t_i p_i^{T} + \dots + t_k p_k^{T} + E_{m \times n} \qquad (2)$$

Here t_i is the score vector, p_i is the loading vector, and E is the residual matrix. The score and loading vectors contain information on how the samples and variables, respectively, relate to each other. The direction of the first principal component (t_1, p_1) is the line in the variable space that best describes the variation in the data matrix X. The direction of the second principal component is given by the straight line that best describes the variation not described by the first principal component, and so on. Thus the original data set can be adequately described using a few orthogonal principal components instead of the original variables, with no significant loss of information. For NIR data, usually more than 95% of the original variation is described by 2-5 principal components. When principal components are plotted against each other, relations between samples are easily detected.¹⁰

Outlier Detection. Outliers are samples that are in some way abnormal. They may contain valuable information, but they may also be nonrepresentative samples that could introduce large errors into a model. In either case, it is important to be able to discover them. One way of doing this is by use of Hotelling T² statistics. With the Hotelling T² test, a 95% confidence ellipse can be included in score plots; potential outliers are then positioned outside the ellipse.¹¹

Partial Least Squares (PLS) Analysis. Regression is used to fit a model to observed data in order to quantify the relationship between two groups of variables. The fitted model may then be used to predict new values. In other words, we are interested in fitting data from a data matrix X to some response vector y. In this case the data matrix consists of spectroscopic data, while the response is the observed SARA values, taken one at a time.
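To make the decomposition in eq 2 concrete, the following is a minimal sketch in plain Python; it is not the software used in this study. It extracts score and loading vectors one at a time with the NIPALS iteration, which is an assumption here, since the text only specifies an eigenvector decomposition of the covariance matrix. The function names `center` and `nipals_pca` and the small data matrix `X_demo` are hypothetical and chosen purely for illustration.

```python
import math

def center(X):
    """Subtract the column means so that every variable has zero mean."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in X]

def nipals_pca(X, k, tol=1e-12, max_iter=1000):
    """Return (scores, loadings, explained) for the first k components."""
    E = center(X)                      # residual matrix, deflated below
    m, n = len(E), len(E[0])
    total = sum(v * v for row in E for v in row)
    scores, loadings, explained = [], [], []
    for _ in range(k):
        # Start from the column with the largest sum of squares.
        j0 = max(range(n), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]  # unit-length loading vector
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Remove t p^T from the residual before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total)
    return scores, loadings, explained

# Hypothetical data: 5 samples (rows) by 3 variables (columns).
X_demo = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 6.2, 9.0],
          [4.0, 8.0, 12.3], [5.0, 9.9, 15.0]]
scores, loadings, explained = nipals_pca(X_demo, k=2)
print("fraction of variance explained per component:", explained)
```

Because the three columns of `X_demo` are nearly proportional, almost all of the variation should be captured by the first component, which mirrors the statement that a few principal components usually suffice for spectral data.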
(10) Wise, B. M.; Gallagher, N. B. J. Proc. Cont. 1996, 6, 329-348.
(11) User Manual, The Unscrambler; CAMO ASA, 1998.


Energy & Fuels, Vol. 15, No. 5, 2001

Aske et al.

Because of the lack of selectivity, especially in the NIR spectra, multivariate techniques have to be employed in the regression, just as for the PCA. Because the information in a spectral data set is strongly correlated and redundant, regression between the X data and the response y is often impossible by ordinary least-squares methods. The solution is to decompose the X matrix as shown in eq 2 and perform the regression between the resulting score vectors and the response. PCA decomposition followed by regression is called principal component regression (PCR). The partial least squares (PLS) technique is another method and the one used in this study. In PCA the scores and loadings are the vectors that best describe the variance of the X matrix. In PLS the scores and loadings (called latent variables) are the vectors that have the highest covariance with the response vector y. The decomposition is followed by a regression between the latent variables and the response.¹²

Because of the danger of overfitting the regression model, the optimum number of latent variables must be determined. One way of doing this is the so-called cross validation technique. Cross validation checks a model by repeatedly taking out different subsets of calibration samples from the model estimation and instead using them as temporary, local sets of secret test samples. If the model parameter estimates are stable against these repeated perturbations, this indicates that the model is reliable.¹³ In the simplest case each subset contains only one sample; this is called full cross validation, and it is the technique employed in all the regression modeling in this study.

Experimental Section
Total Acid Number (TAN). The total acid number of the crude oils was determined by potentiometric titration according to the ASTM D-664 procedure.¹⁴

SARA. Apparatus. The HPLC system consists of a multisolvent delivery system (Waters 600E), a UV detector (Waters 490), a differential refractometer (Waters 410), an injection valve (Rheodyne), and an automated switching valve (Waters). The columns used were two 7.8 × 300 mm µBondapak NH2 columns (Waters) in series, with a particle diameter of 10 µm. The solvent reservoirs were equipped with continuous helium degassing.

Sample Preparation. A weighed amount of approximately 3 g of crude oil was mixed with 120 mL of HPLC-grade n-hexane, and the mixture was left to stand overnight at ambient temperature. The mixture was filtered through a 0.45 µm filter in order to remove precipitated asphaltenes. The filter, containing the asphaltene fraction, was dried, and the asphaltene weight percent was determined gravimetrically. The filtrate was concentrated to 25 mL in a rotavapor unit.

HPLC Separation. The n-hexane-soluble fraction was further fractionated by HPLC into saturates, aromatics, and resins. A 500 µL portion of the filtered sample, containing about 60 mg of crude oil, was injected onto the column. The temperature was kept ambient, and the saturated and aromatic components were eluted with HPLC-grade n-hexane at a flow rate of 8 mL/min. Saturates were first
(12) Kallevik, H. Characterisation of Crude Oil and Model Oil Emulsions by Means of Near Infrared Spectroscopy and Multivariate Analysis; University of Bergen, 1999.
(13) Martens, H.; Martens, M. Multivariate Analysis of Quality, An Introduction; John Wiley & Sons: Chichester, 2001.
(14) ASTM D-664: Standard Test Method for Acid Number of Petroleum Products by Potentiometric Titration.

eluted from the column, as shown by the RI detector response. The aromatics were then collected. After approximately 50-60 min all the aromatic components had eluted from the column, as seen from the decrease in the UV detector response. At this point the column was backflushed with trichloromethane as mobile phase to elute the resin fraction. This was accomplished in a few minutes. The column was then regenerated by flushing it with n-hexane. The RI and UV detector responses are shown in Figure 1. The saturates are clearly seen as a sharp peak on the RI detector, while the aromatic and resin fractions are seen in the UV detector response.

Figure 1. Detector response for a typical SARA fractionation run.

Solvent Removal. The collected aromatics and resins were first concentrated in a rotavapor at 30 °C under reduced pressure. The fractions were then transferred to preweighed 10 mL vials, and the rest of the solvent was removed by carefully drying the fractions with nitrogen. The saturate fraction was dried directly with nitrogen because of the small amount of n-hexane used to elute this fraction. After solvent removal the fractions were weighed, and the mass-based percentage SARA distribution of the sample was determined.

Spectroscopic Techniques. The infrared and near-infrared analyses were performed on the original crude oil samples at 25 °C. A Nicolet 710 FT-IR spectrometer was used for the IR analysis. Wavenumbers from 4000 to 400 cm⁻¹ were scanned. The samples were placed in an attenuated total reflectance (ATR) cell with a KBr crystal. A Brimrose AOTF Luminar 2000 spectrometer was used for the NIR measurements. The spectrometer was equipped with a fiber-optic sampling probe for transflectance measurements. The wavelength region was set to 1100-2200 nm, and the total number of scans per spectrum was set to 64. The total path length was 2 mm. The spectroscopic data were analyzed using The Unscrambler 7.6.¹⁵

Results and Discussion

HPLC. A total of 18 crude oils and condensates were analyzed on the HPLC apparatus in order to determine the SARA distributions. The results are given in Table 1. As can be seen, the typical yield for a crude oil sample lies in the 90-100% range. A few samples show yields in excess of 100%, indicating incomplete solvent removal. The reported total acid number (TAN) values for the crude oil samples have been included for later reference. It is assumed that most of the material lost during the SARA fractionation is due to evaporation losses
(15) The Unscrambler v7.6; CAMO ASA, 2000.


Table 1. Origin, SARA Distribution, Total Acid Number (TAN), and Density for the 18 Crude Oils and Condensates (SARA distributions are normalized to 100 wt %)

crude    origin        saturates  aromatics  resins  asphaltenes  yield  TAN   density
oil no.                (wt %)     (wt %)     (wt %)  (wt %)       (%)          (g/cm³)
1        West Africa   47.9       36.5       15.2     0.4          99.0  1.10  0.914
2        North Sea     48.0       37.5       14.2     0.3         102.6  3.10  0.916
3        West Africa   41.2       36.4       20.4     2.1          91.9  1.50  0.916
4        North Sea     82.7       13.4        3.9     0.0          91.0  0.69  0.839
5        North Sea     62.7       23.6       12.2     1.5          93.6  0.18  0.844
6        North Sea     35.3       36.8       24.5     3.5         106.4  2.30  0.945
7        North Sea     41.8       38.8       18.7     0.6         103.0  3.10  0.914
8        North Sea     50.9       34.6       14.0     0.5          94.4  2.70  0.885
9        West Africa   40.6       32.1       20.6     6.6          97.0  0.49  0.888
10       North Sea     79.8       16.5        3.6     0.1          70.5  0.01  0.796
11       West Africa   57.3       27.9       13.5     1.3          93.9  0.50  0.873
12       North Sea     60.6       30.0        9.2     0.2         103.2  0.04  0.857
13       West Africa   42.4       36.1       20.5     1.0          98.9  3.60  0.921
14       North Sea     65.0       30.7        4.3     0.0          90.1  0.02  0.796
15       North Sea     50.3       31.4       17.5     0.7          94.8  1.20  0.898
16       North Sea     55.4       28.3       12.9     3.4         105.1  0.36  0.840
17       West Africa   54.5       28.8       14.9     1.8          93.7  0.44  0.873
18       France        24.4       43.4       19.9    12.4          93.7  0.20  0.939

during solvent removal. This has also been confirmed by Radke et al.,⁵ who studied the effect of evaporation on the recovered saturate and aromatic fractions using the same solvent evaporation procedure as in this study. They showed that the loss was primarily due to evaporation of saturates and, to some extent, aromatics. The reported SARA values of the low-yield samples in this study are corrected to 100% by adjusting the saturate and aromatic values. This implies that the evaporation loss from the resin fraction is considered negligible. This is also supported by the fact that condensates, with high saturate and correspondingly low resin values, show the lowest yields.

Parallel analyses have been performed on a selection of the crude oils to test the reproducibility of the SARA procedure. The average standard deviations for the S, A, and R fractions were 2.2, 2.5, and 1.4 wt %, respectively. As expected, the uncertainty is largest in the saturate and aromatic fractions. The asphaltene fraction is not separated on the HPLC apparatus but by precipitation from the original sample. The standard deviation for this procedure was found to be 0.2 wt % on average.

Spectroscopic Techniques. Figures 2 and 3 show examples of original and first-order differentiated IR spectra for the crude oil samples. The samples are

chosen to reflect extreme values in saturates, aromatics, resins, asphaltenes, and total acid number. Only the region from 1750 to 650 cm⁻¹ is shown, since this is where the differences between the samples are most clearly seen. The carbonyl peak at around 1710 cm⁻¹ is seen for the high-TAN sample 13. In addition, crude oil 4 displays low absorption compared to the other samples. Only two samples are shown for the differentiated IR spectra in Figure 3. In the differentiated spectra the spectral differences are diminished, but the carbonyl absorption pattern for sample 13 is still easily observed.

Figures 4 and 5 show original and first-order differentiated NIR spectra for crude oil samples. The elevation of the baseline in Figure 4 is due to light scattering caused by asphaltene aggregates. As one would expect, this effect is more prominent for the high-asphaltene samples. In addition, the slope of the absorption curves from 1600 to 1300 nm increases with increasing asphaltene content of the sample. This effect is also reported by Kallevik.¹² The trend has been attributed to π → π* and n → π* electronic transitions of the asphaltene molecules.¹⁶ Most of the light scattering effect is removed in the first-order differentiated spectra in Figure 5, and the

Figure 2. Part of the IR spectra for crude oil samples 4, 13, and 18. The carbonyl peak for the acidic crude oil 13 is clearly seen at around 1710 cm⁻¹.

Figure 3. Part of the first-order differentiated IR spectra for crude oil samples 13 and 18.



Figure 4. NIR spectra for the crude oil samples 4, 6, 13, and 18. The baseline shift caused by light scattering is clearly seen.

Figure 6. Score plot from the PCA of the 18 IR spectra. Two outliers are clearly seen, and this is confirmed by the 95% confidence Hotelling T² ellipse.

Figure 5. First-order differentiated NIR spectra for the crude oil samples 13 and 18. Most of the light scattering effect is removed.

Figure 7. Score plot from the PCA of the IR spectra with two outliers removed. Distinct sample groupings are encircled. Explained variance: PC1 60%, PC2 22%.

differences between the samples are reduced compared to the nondifferentiated spectra.

Principal Component Analysis. Both the IR and NIR data were analyzed with PCA in order to visualize the main trends in the data. A plot of the scores on principal component 1 against the scores on principal component 2 is in our opinion the easiest way to visualize the main trends in the sample set. In addition, any outliers in the data set are easily detected. Both the IR and NIR spectra were first-order differentiated before the principal component analysis. Differentiation of the spectra gave a clearer separation of the samples.

Infrared Spectroscopy. The score plot of the IR spectra is shown in Figure 6. A Hotelling T² ellipse is included in the plot. The ellipse is based on a 95% confidence level, and observations found outside the ellipse are potential outliers. The Hotelling test suggests that samples 15 and 17 may be outliers. These samples differ from the average sample and have a large impact on the model. A close inspection of the IR spectra of these two samples revealed several absorption peaks many times higher than for the other samples. The samples were therefore classified as not representative, and they were left out of the remaining analysis.
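The two steps described here, first-order differentiation and outlier screening with a Hotelling T² limit, can be sketched as follows. This is an illustrative sketch in plain Python rather than the procedure implemented in The Unscrambler. The function names are hypothetical, the T² limit uses one common textbook form, A(n - 1)/(n - A) · F(A, n - A), which the text does not specify, and the F quantile (3.63 for 2 and 16 degrees of freedom at the 95% level) is taken from standard tables. The spectrum and the score values are synthetic.

```python
def first_difference(spectrum):
    """First-order difference between neighbouring spectral channels."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

def hotelling_t2(scores):
    """T^2 for each sample; scores is a list of rows [t1, t2, ...]."""
    n, k = len(scores), len(scores[0])
    means = [sum(row[a] for row in scores) / n for a in range(k)]
    variances = [sum((row[a] - means[a]) ** 2 for row in scores) / (n - 1)
                 for a in range(k)]
    # PCA scores are mutually orthogonal, so each component is scaled
    # by its own variance and the contributions are simply added.
    return [sum((row[a] - means[a]) ** 2 / variances[a] for a in range(k))
            for row in scores]

def t2_limit(n, k, f_crit):
    """Assumed limit: k (n - 1) / (n - k) times the F quantile F(k, n - k)."""
    return k * (n - 1) / (n - k) * f_crit

# An additive baseline shift disappears after differentiation.
raw = [0.25, 0.5, 1.0, 0.75]
shifted = [v + 0.5 for v in raw]          # same spectrum, raised baseline
d_raw, d_shifted = first_difference(raw), first_difference(shifted)

# Synthetic scores for 18 samples on two components; the last sample is extreme.
pc1 = [1.0, -1.0] * 8 + [0.0, 10.0]
pc2 = [1.0, -1.0] * 9
demo_scores = [list(pair) for pair in zip(pc1, pc2)]
t2_values = hotelling_t2(demo_scores)
limit = t2_limit(n=18, k=2, f_crit=3.63)  # 3.63: tabulated F(0.95; 2, 16)
flagged = [i for i, v in enumerate(t2_values) if v > limit]
print("samples outside the 95% limit (0-based index):", flagged)
```

A sample flagged in this way is only a candidate outlier; as in the text, inspecting its spectrum is what justifies leaving it out.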
(16) Mullins, O. C. Anal. Chem. 1990, 62, 508-514.

Figure 7 shows the score plot of the IR spectra with the two abnormal samples removed. The first two principal components explain 82% of the variation in the data set. Samples close to each other in the score plot will have similar properties. One obvious group is the six oils with high TAN values, which have been encircled. The only exception to this trend is crude oil 3 (TAN = 1.50), which is positioned among the samples rich in heavy components. This may be explained by the relatively high resin and asphaltene content of this crude. The crude oils containing high values of the heaviest components, asphaltenes and resins, are generally positioned in the upper part of the score plot. This group contains all the samples with more than 2 wt % asphaltenes, with crude oil 6 (3.5 wt %) as an exception. Crude oil 6 is positioned in the acidic samples group because of its high TAN (2.30). This trend is not absolute, though; one would expect crude oils 11 and 12 to be positioned toward the lighter samples. The condensates and light crude oils, being rich in saturates, are positioned in the lower left part of the plot. All the samples in this group contain more than 60% saturates. One exception to this trend is sample 4, containing nearly 83% saturates according to the SARA fractionation. For some reason this sample is positioned in the lower right corner of the score plot. Its proximity to the high-TAN samples could be explained by its slightly acidic character (TAN = 0.7), but it does not follow the trend of the other light samples. As seen by its high value of saturates and colorless appearance,


Figure 8. First loading vector from PCA of differentiated IR spectra.

Figure 10. Score plot from the PCA of the NIR spectra. Distinct sample groupings are encircled. Explained variance: PC1 63%, PC2 15%.

Figure 9. Second loading vector from PCA of differentiated IR spectra.

sample 4 differs from most of the other crude oil samples, even the other condensates. This is also seen in both the IR and NIR absorption patterns. It is these special characteristics of sample 4 that are reflected in the score plot.

In summary, the IR spectra seem to contain much information about the acidic properties of the samples, since the samples with the highest reported TAN values are closely grouped together. This is well known, and IR spectra are routinely used to determine the TAN of oil samples from the height of the carbonyl peak at approximately 1710 cm⁻¹. It is also clear, however, that the 16 samples in the score plot are to a certain extent separated on the basis of their SARA properties.

Additional information can be obtained by looking at the loading vectors. The loading vectors of principal components 1 and 2 in Figures 8 and 9 can be compared to the spectral features in Figures 2 and 3, remembering that these figures show only parts of the spectra and that the loading vectors are based on the differentiated spectra. Both loading vectors are related to the aliphatic CH2 and CH3 stretching band around 2900 cm⁻¹. Their deformation bands are also seen at around 1370 and 1450 cm⁻¹ for both loading vectors. Different absorption patterns in the low-wavenumber region seen in Figures 2 and 3 are also found in both loading vectors. One distinct difference between the two loading vectors is the pattern around 1710 cm⁻¹ seen in loading vector 2. This is not found in loading vector 1, and explains why

the acidic crude oils are found as a group with low scores on PC2 in the score plot in Figure 7.

Near-Infrared Spectroscopy. No potential outliers were detected in the NIR score plot in Figure 10, and all 18 samples were therefore included in the further analysis. The first two principal components explain 78% of the variation in the data set. As in the IR score plot, the most acidic crude oils can be seen as a group. Heavy crude oils, with high amounts of resins and asphaltenes, are generally found in the left part of the score plot, while lighter samples, rich in saturates, are found at the lower right side of the score plot. Sample 4 is yet again not included in any of the three indicated groups. The high-TAN group now includes crude oil 15 as well (TAN = 1.20).

Some differences in this group compared to the IR score plot should be noted. Crude oil 13, the most acidic sample (TAN = 3.60), seems to be drawn more toward the heavy samples in the NIR score plot. This could be explained by its high resin content (20.5 wt %). Crude oil 6, being both heavy and acidic in character, is now positioned in the heavy samples group, not with the acidic samples as in the IR score plot. Close inspection of the score plot reveals that the relatively light samples 11 and 12 are now positioned inside the light samples group. As mentioned, this was not the case in the IR score plot. Sample 18, containing the least saturates and the most asphaltenes of the 18 samples, is positioned on the far left of principal component 1. Sample 4, containing the most saturates and no asphaltenes, is positioned on the far right of the same principal component. This indicates that the NIR spectra may contain much information with respect to one or more of the SARA parameters.

Partial Least Squares Regression. From the PCA above it appears that both the IR and the NIR spectra contain information about the chemical composition of the samples.
To correlate the spectral data and the SARA parameters a PLS regression was performed on the data set. For the IR data set four PLS regression models were built. All four used the IR spectra for the 18 crude oils as X-data, while the response (Y) in the four models was the saturate, aromatic, resin, and asphaltene weight percent values as determined by precipitation/HPLC. The same was done for the NIR data, using the NIR spectra as X-data instead. This sums up to a total of eight prediction models.


Table 2. PLS Data for IR Calibration

              cal. range   X-expl  Y-expl  LV  R²     RMSEC  RMSEP  excl.
              (wt %)       (%)     (%)
saturates     24.4-82.7     85      98     3   0.987  1.70   2.45   6
aromatics     13.4-43.4     99      98     7   0.997  0.41   2.20   6
resins         3.9-24.5     82      98     3   0.981  0.89   1.37   2
asphaltenes    0.0-12.4     98      99     7   0.989  0.33   1.29   4

Table 3. PLS Data for NIR Calibration

              cal. range   X-expl  Y-expl  LV  R²     RMSEC  RMSEP  excl.
              (wt %)       (%)     (%)
saturates     24.4-82.7     99      98     8   0.997  1.13   2.78   6
aromatics     13.4-43.4     99      99     6   0.994  0.85   2.39   6
resins         3.9-24.5    100     100     7   0.996  0.54   1.41   2
asphaltenes    0.0-12.4    100      98     8   0.995  0.29   0.98   -

The four SARA variables are dependent in the sense that they have to sum to 100 wt % for each sample. However, the contribution from each fraction is determined individually. Although the saturate, aromatic, and resin fractions are all separated by the same HPLC procedure, they are dried individually. The asphaltene content is determined in advance, as explained in the Experimental Section. This means that for a given sample the experimental resin weight percent may be correct while the saturate value is wrong. Consequently, a sample that does not fit the calibration model for, say, the saturates does not necessarily fail to fit the resin calibration model. However, for low-yield samples the saturate and aromatic weight fractions have been adjusted to give a total SARA distribution of 100 wt %. As explained, this is because these fractions are believed to contribute most to the sample loss. This inevitably leads to a closer dependence between these two fractions.

To be able to build robust prediction models, a broad calibration range for the SARA parameters is required. The samples used in this study have been chosen to reflect such a broad distribution: the saturate values range from 24 to 83%, the aromatic values from 13 to 43%, the resin values from 4 to 25%, and the asphaltene values from 0 to 12%. One problem with the asphaltene values, though, is that most samples have an asphaltene content in the relatively narrow 0-2% range, while only a few samples have high asphaltene values. This could make the two asphaltene models less accurate and less mathematically stable. A model valid only for the 0-2% range could have been chosen, but we are primarily interested in making the model as general as possible, and thus the high-asphaltene crudes are included. To validate the prediction models, full cross validation was employed; the cross validation was also used to find the optimum number of latent variables for each model.
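The combination of PLS regression on a single response with full (leave-one-out) cross validation can be sketched as below. This is a minimal illustration in plain Python, not the implementation in The Unscrambler: it assumes the standard NIPALS form of PLS for one response (PLS1), the function names are hypothetical, and the "spectra" are synthetic mixtures of two invented pure-component profiles, `s1` and `s2`, with the first component's concentration as the response.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, [(w, p, q), ...])."""
    m, n = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / m for j in range(n)]
    y_mean = sum(y) / m
    E = [[row[j] - x_mean[j] for j in range(n)] for row in X]   # X residual
    f = [v - y_mean for v in y]                                 # y residual
    comps = []
    for _ in range(n_lv):
        w = [sum(E[i][j] * f[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:      # guard: nothing left in X that covaries with y
            break
        w = [v / norm for v in w]                # weight vector
        t = [dot(E[i], w) for i in range(m)]     # scores (latent variable)
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
        q = dot(f, t) / tt                       # regression of y on t
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        f = [f[i] - q * t[i] for i in range(m)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat

def rmsep_full_cv(X, y, n_lv):
    """Leave each sample out once, refit, and predict the left-out sample."""
    sq = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        sq += (pls1_predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(X))

# Synthetic example: 8 samples, 6 channels, two overlapping components.
s1 = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]      # invented profile of the analyte
s2 = [0.0, 1.0, 1.0, 2.0, 3.0, 1.0]      # invented profile of an interferent
c1 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
c2 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 7.0, 2.0]
X_demo = [[a * u + b * v for u, v in zip(s1, s2)] for a, b in zip(c1, c2)]
y_demo = c1

rmsep = {k: rmsep_full_cv(X_demo, y_demo, k) for k in (1, 2, 3)}
best_lv = min(rmsep, key=rmsep.get)      # fewest LVs with the lowest RMSEP
print("RMSEP by number of latent variables:", rmsep, "chosen:", best_lv)
```

Because the synthetic data contain exactly two independent sources of variation and no noise, the cross-validated error should drop sharply from one to two latent variables; with real spectra the minimum is less clear-cut, which is why the number of latent variables is chosen from the cross-validated error rather than from the calibration fit.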
For the IR spectra the original spectra were used in the modeling, but samples 15 and 17 were not included in the modeling since they were classified as outliers in the PCA. The NIR spectra were first-order differentiated before PLS regression due to the large effect of light scattering in these spectra. The light scattering causes a baseline shift in the spectra, which is more or less removed with the differentiation. Table 2 presents the regression data from the 4 models predicting the SARA parameters from IR spectra. Table 3 presents the corresponding NIR model data. The first column in the tables shows the calibration

Figure 11. PLS prediction model for saturates from NIR spectra. Crude oil 6 not included. Statistics: R2 = 0.997, RMSEP = 2.78.

range for the model, and the next two columns express how much of the information in the data matrix (X) is used to explain how much of the response (Y). LV is the number of latent variables used in the modeling. This is the optimum number as determined by the full cross validation procedure. R2 is the correlation coefficient of the calibration curve, while RMSEC and RMSEP are the root-mean-square errors of calibration and prediction, respectively. The last column indicates the samples that were left out in the modeling. These are the samples that for some reason did not fit the respective calibration models.

Saturates and Aromatics. Figure 11 shows an example calibration curve: the predicted vs measured values for the saturate fraction based on NIR. The prediction models for the saturate and aromatic fractions from the IR and NIR spectra are good, with more or less the same RMSEP values as the given experimental uncertainty. This means that the predictions from the spectra give values as precise as those from the HPLC procedure itself. As seen in Tables 2 and 3, crude oil 6 fit neither the saturate nor the aromatic model, for both IR and NIR. This may indicate that the experimental values of saturates and aromatics for sample 6 are wrong, and that it therefore will not be well predicted by the model. Another explanation is that crude oil 6 possesses some properties not displayed by the other crude oils, meaning that its IR and NIR spectra will not resemble the others. We have no other indications that crude oil 6, a North Sea crude, should be very different from the other crude oils in the sample set, and the deviation is probably due to experimental errors. This assumption is supported by the following remarks: The experimental determination of the saturate and aromatic fractions is somewhat correlated, as explained earlier, meaning that errors in the determination of one of the fractions will influence the other as well. Since crude oil 6 fit neither the saturate nor the aromatic model, this could be yet another indication of experimental errors. The fact that neither the IR nor the NIR models were able to predict the saturate and aromatic values of crude oil 6 accurately may also indicate experimental errors, since both the IR and NIR models use the same saturate and aromatic values in the regression. However, the possibility that sample 6 is different from the other samples in some way cannot be excluded unless replicate SARA determinations are performed.

Resins. The prediction models for resins have the same RMSEP values as the experimental uncertainty (1.4 wt %) for both IR and NIR. The resin value of crude oil 2 did not fit either of the models. On the basis of the same arguments as for the exclusion of crude oil 6 in the saturate and aromatic models, we expect this to indicate that the experimental resin value of sample 2 is incorrect.

Table 4. Comparison of the Experimental and PLS Predicted Uncertainty in the Determination of the Weight Percentage Saturates, Aromatics, Resins, and Asphaltenes in Crude Oils

fraction      expt (wt %)   RMSEP, IR (wt %)   RMSEP, NIR (wt %)
saturates     2.2           2.5                2.8
aromatics     2.5           2.2                2.4
resins        1.4           1.4                1.4
asphaltenes   0.2           1.3                1.0

Figure 12. PLS prediction model for asphaltenes from IR spectra. Crude oil 4 not included. Statistics: R2 = 0.989, RMSEP = 1.29.

Asphaltenes. Figure 12 shows the calibration curve for the asphaltene fraction based on IR. Crude oil 4 did not fit the IR model, while all samples were included in the NIR modeling. Sample 4 is a very light condensate, a fact that could explain the problems of modeling its asphaltene content on the basis of the IR spectra. For the asphaltene fraction the prediction errors are relatively higher than for the other fractions. There are probably two main reasons for this, one being the poorly distributed sample set, with 14 of 18 samples in the narrow 0-2 wt % asphaltene region, and the other being experimental uncertainty. The reported experimental uncertainty for asphaltenes in the SARA fractionation procedure is 0.2 wt %.
Bearing in mind that 10 of the 15 samples have an asphaltene content of 1.0 wt % or less, this naturally gives rise to appreciable uncertainty in the asphaltene prediction of these samples. In addition, the experimental uncertainty in the high-asphaltene samples, such as samples 9 and especially 18, is probably higher than the reported mean value of the uncertainty. The high value of the prediction error (RMSEP 1.2-1.3 wt %) is mainly due to problems with modeling the most asphaltene-rich samples. There are several explanations for this: the experimental uncertainty for these samples has been mentioned, but the lack of more high-asphaltene samples in the sample set also contributes. The inclusion of more heavy crude oils would inevitably produce a more robust asphaltene prediction model. In the present model, high asphaltene properties are mainly based on only one sample, namely crude oil 18. On the other hand, it should be pointed out that the inclusion of more highly asphaltenic oils could degrade the ability to predict resins, since the spectral effects of these two groups are similar.

Table 4 sums up the predictive power of the 8 models developed. The experimental uncertainties are also included for reference.

Conclusion

It has been shown that both infrared and near-infrared spectroscopy can be used to determine the saturate, aromatic, and resin content of a crude oil in a fast and simple manner. Estimated errors of prediction are comparable to the reported experimental uncertainty for the present HPLC fractionation procedure. The uncertainties in the prediction models based on IR spectroscopy are 2.5, 2.2, 1.4, and 1.3 wt % for the saturate, aromatic, resin, and asphaltene fractions, respectively. For NIR the equivalent uncertainties are 2.8, 2.4, 1.4, and 1.0 wt %. These values are in the same range as the reported uncertainty in the direct determination of the SARA parameters by HPLC. The results for asphaltene modeling are less conclusive than for the other fractions. This is probably mainly due to an inadequate sample set, and could be improved by using a better-distributed sample set for calibration. Generally, IR spectroscopy seems to perform slightly better than NIR in predicting the saturate and aromatic fractions of crude oils. For the asphaltene fraction NIR performs slightly better than IR spectroscopy. The differences are small, though, and the sample set is limited: 16 samples for IR and 18 for NIR.

The prediction models could probably be further improved by including more samples in the calibration set. Also, due to the relatively high experimental uncertainty in the SARA determination, the inclusion of well-characterized model oils in the calibration set could contribute to even better and more robust prediction models. The samples used in this study include both light condensates and heavy crude oils, making the prediction models valid over a broad range. The spectroscopic techniques are less complicated and less time-consuming than the present SARA fractionation procedure, which is based on a more tedious HPLC method.
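The comparison summarized in Table 4 can be made concrete with a few lines of arithmetic. The sketch below divides each RMSEP by the reported experimental uncertainty for the same fraction; the numbers are copied from Table 4, while the ratio itself and the names `TABLE_4` and `rmsep_ratios` are only an illustrative device, not something used in the paper.

```python
# Values from Table 4 (wt %): (experimental uncertainty, RMSEP IR, RMSEP NIR)
TABLE_4 = {
    "saturates":   (2.2, 2.5, 2.8),
    "aromatics":   (2.5, 2.2, 2.4),
    "resins":      (1.4, 1.4, 1.4),
    "asphaltenes": (0.2, 1.3, 1.0),
}

def rmsep_ratios(table):
    """Ratio of prediction error to experimental uncertainty, per fraction (IR, NIR)."""
    return {
        fraction: (round(ir / expt, 2), round(nir / expt, 2))
        for fraction, (expt, ir, nir) in table.items()
    }

for fraction, (ir_ratio, nir_ratio) in rmsep_ratios(TABLE_4).items():
    print(f"{fraction:12s} IR {ir_ratio:5.2f}   NIR {nir_ratio:5.2f}")
```

Ratios near 1 for saturates, aromatics, and resins reflect prediction errors on par with the HPLC uncertainty, whereas the asphaltene ratios (about 6.5 for IR and 5 for NIR) show why that fraction is treated as less conclusive.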



Acknowledgment. Narve Aske acknowledges the Flucha II program, financed by The Research Council of Norway (NFR) and the oil industry, for a Ph.D. grant. Harald Kallevik acknowledges Flucha II for a postdoctoral grant. Statoil R&D Centre is thanked for the use of all necessary instrumentation in this research. Harald Ulleberg and Kari Elise Berg are thanked for skillful assistance in the experimental work.