Conference Proceedings1 - SPO2 - Cheran College

MACHINE LEARNING APPROACH FOR THE NONINVASIVE ESTIMATION OF BLOOD OXYGEN
SATURATION FROM PHOTOPLETHYSMOGRAPHY SIGNALS
Jean Effil N (1), Rajeswari R (2)
(1) Department of Computer Science, Government College of Arts and Science, Kadayanallur, India
(2) Department of Computer Applications, Bharathiar University, Coimbatore, India
ABSTRACT
Measuring physiological parameters such as heart rate, blood pressure, blood oxygen saturation and respiration rate
using photoplethysmography (PPG) devices is gaining popularity because such devices are inexpensive, portable and
user friendly. Advances in signal processing and machine learning, together with commercial wearable products that
integrate PPG technology, have motivated the design and development of a methodology for the noninvasive estimation
of blood oxygen saturation (SPO2) from PPG signals. The conventional method, referred to as the ratio of ratios
method, determines the SPO2 level from two PPG signals obtained by passing visible red light and infrared light
through a peripheral body part. In the proposed system, a machine learning approach is used to estimate SPO2 from
PPG signals captured with conventional PPG technology. The Beth Israel Deaconess Medical Center (BIDMC) dataset,
collected from PhysioNet's Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) II online waveform
database, was used for developing the learning models. PPG signals along with reference SPO2 values were retrieved
from the dataset, and signal partitioning and data cleaning were performed. Five regression algorithms, namely
one-dimensional Convolutional Neural Network (CNN), Long Short Term Memory (LSTM), Support Vector Regression (SVR),
Random Forest and K-Nearest Neighbour, were implemented and their performance was analysed. CNN and SVR were found
to produce promising results.
Keywords: BIDMC dataset, machine learning, photoplethysmography signals, oxygen saturation, deep learning
1. INTRODUCTION
Blood oxygen saturation (SPO2) denotes the percentage of hemoglobin in the blood that is carrying oxygen, and it is
helpful in checking how well the lungs are functioning. Oxygen is essential for life: all tissue cells depend on
oxygen for their functioning and survival. If the supply of oxygen is interrupted, the energy-generating process of
the cells is curtailed and eventually ceases, resulting in permanent cell damage and ultimately cell death and organ
failure [1]. In the present pandemic situation, people dread the spread of the COVID-19 virus, which affects the
lungs and can cause life-threatening disease. Low blood oxygen saturation can be an indication of COVID-19
infection. Given this circumstance, there is an urgent need to monitor the oxygen saturation level regularly and to
ensure healthy living conditions.
Oxygen saturation is measured by examining the percentage of oxygen in hemoglobin, the pigment of red blood cells
that carries oxygen and supplies it to the tissues in different parts of the body. Hemoglobin is found in two forms.
The first form, oxygenated hemoglobin, is loaded with oxygen and is denoted as HbO2. The second form, deoxygenated
hemoglobin, is depleted of oxygen and is denoted as Hb. Peripheral blood oxygen saturation (SpO2) is the measure of
the relative concentration of oxygenated hemoglobin molecules in the arterial blood with respect to the total amount
of hemoglobin in the blood [2]. The normal reading of oxygen saturation for a healthy human is 95% to 100%. A person
with a lung condition may have an oxygen saturation between 88% and 92%. A low oxygen level in the blood has to be
considered a serious medical condition because, if it is left unattended and untreated, it may lead to a lack of
oxygen supply to the tissues, which in turn can result in organ failure in the long run. A person infected with
COVID-19 who is monitoring his or her clinical status at home needs to ensure that the SPO2 reading is consistently
at or above 90%; if the reading falls below 90%, he or she needs to seek medical attention immediately.
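The definition of oxygen saturation given above can be written compactly. In the expression below, the symbols
c_HbO2 and c_Hb are notation introduced here for illustration (they are not used elsewhere in this paper) and stand
for the concentrations of oxygenated and deoxygenated hemoglobin in arterial blood.

```latex
\mathrm{SpO_2} = \frac{c_{\mathrm{HbO_2}}}{c_{\mathrm{HbO_2}} + c_{\mathrm{Hb}}} \times 100\%
```

For example, if 97 of every 100 hemoglobin molecules carry oxygen, the saturation is 97%, which lies within the
normal range of 95% to 100%.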
Arterial blood gas evaluation is the globally accepted clinical standard for the accurate, invasive measurement of
SPO2. To perform this test, a health care provider takes the subject's blood sample by puncturing an artery with a
needle. It is therefore painful and can cause bleeding, bruising and soreness at the puncture site. Another widely
used method to measure SPO2 is photoplethysmography (PPG). This is a noninvasive method whose working principle is
based on how the hemoglobin in the subject's blood reflects and absorbs light. The method is safe and convenient,
and it enables continuous monitoring of oxygen saturation without causing bodily discomfort. The most common
measurement sites are the fingertip and the earlobe, where arterial blood runs close to the skin. The most common
medical device that implements photoplethysmography is the pulse oximeter. Nowadays many commercial wearable
products, such as smart watches, smart bracelets and smart phones, measure vital signs using the principle of
photoplethysmography [3]. Ample research has been carried out in this field in recent years. Advances in
computational power, signal processing methods and fast pattern recognition algorithms have led to the emergence of
computer-based measuring systems that can be trained to perform complex tasks in bioinformatics [4].
In this work, machine learning (ML) based estimation of oxygen saturation using photoplethysmography signals is
proposed. Five regression models, namely 1D Convolutional Neural Network (CNN), Long Short Term Memory (LSTM),
Support Vector Regression (SVR), Random Forest and K-Nearest Neighbour, have been used. The regression models were
trained with time series data of PPG signals and reference SPO2 values collected from PhysioNet's Multi-parameter
Intelligent Monitoring in Intensive Care (MIMIC) II online waveform database. Among the five models, CNN and SVR
produced the most promising results. The rest of this paper is organized as follows. Section 2 describes the
underlying concept of photoplethysmography in the measurement of oxygen saturation and the related research in this
area. Section 3 provides an overview of the database used and elaborates the design of the oxygen saturation
estimation models using machine learning. Section 4 discusses the experimental results and Section 5 concludes the
paper.
2. UNDERLYING TECHNOLOGY AND RELATED RESEARCH
2.1 Photoplethysmography
Photoplethysmography (PPG) is a widely accepted, noninvasive technique for measuring vital health parameters
associated with the cardiovascular system. It is used to monitor heart rate, blood pressure, oxygen saturation and
respiration rate, and to assess peripheral vascular disease. PPG detects blood volume changes during a cardiac cycle
at selected body locations. The technique functions by illuminating the skin with penetrating optical radiation,
usually from a light emitting diode, and detecting the signal strength and patterns using a photodetector [5]. There
are two modes of capturing the signal. In transmissive mode, the transmitted light is detected by positioning the
photodetector directly across from the light source. This mode is employed in pulse oximeters and is commonly used
in hospital settings. In reflective mode, the pattern and intensity of the reflected light are detected using a
photodetector placed next to the light source. Reflective mode is applied in wearable products and smart phones [6].
Reflective mode PPG is a more useful configuration than transmissive mode because of the shallow penetration depth
of optical radiation [5]. The most common measurement sites are the fingertip [7], earlobe [8], wrist [6] and
forehead [9], since blood flow can be easily detected in these areas [10][11]. In recent years, PPG technology has
been integrated into commercial wearable products such as smart bracelets, smart watches, wristbands and fitness
trackers for the measurement of vital signs, and has gained the support of customers who are concerned about their
health [11][12].
2.2 Related Research
Ample research is presently being carried out with the goal of achieving accurate vital sign measurements using PPG
technology. The conventional method, referred to as the ratio of ratios method, uses visible red light and infrared
light for measuring the blood oxygen saturation level [2][6][7][10]. It relies on the fact that blood reflects and
absorbs light to different degrees depending on its oxygenation. Oxygenated hemoglobin (HbO2) absorbs more infrared
light and deoxygenated hemoglobin (Hb) absorbs more visible red light. The oxygen saturation is defined as the ratio
of oxygenated hemoglobin to total hemoglobin (oxygenated hemoglobin + deoxygenated hemoglobin). The measurement is
accomplished by illuminating the body part with both a red LED and an infrared LED. In a transmissive PPG device,
the light transmitted through the tissue is measured; in a reflectance PPG device, the light reflected by the tissue
is measured. By comparing the relative intensities of the two signals, SPO2 can be determined. Typical pulse
oximeters use red light in the wavelength range of 600 nm to 750 nm and infrared light in the wavelength range of
850 nm to 1000 nm [13]. Most of the new commercial wearable products use reflectance PPG [13]. Another method of
measurement is non-contact imaging PPG, implemented using broadband lighting and an RGB camera [14][15].
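The conventional two-wavelength approach can be sketched in a few lines of Python. This is only an illustration
under stated assumptions, not the procedure of any cited work: the pulsatile (AC) part of each signal is taken as
its peak-to-peak amplitude, the steady (DC) part as its mean, and the linear calibration SPO2 = a - b * R uses
placeholder coefficients (a = 110, b = 25) that are often quoted as a rough approximation. Real devices rely on
empirically derived calibration curves, and the function names and sample values below are hypothetical.

```python
def ratio_of_ratios(red, infrared):
    """Return R = (AC_red / DC_red) / (AC_ir / DC_ir) for one window of samples."""
    def ac_dc(samples):
        dc = sum(samples) / len(samples)   # mean level (non-pulsatile part)
        ac = max(samples) - min(samples)   # peak-to-peak pulsatile amplitude
        return ac, dc

    ac_red, dc_red = ac_dc(red)
    ac_ir, dc_ir = ac_dc(infrared)
    return (ac_red / dc_red) / (ac_ir / dc_ir)


def spo2_from_ratio(r, a=110.0, b=25.0):
    """Linear calibration SPO2 = a - b * R; a and b are placeholder values (assumption)."""
    return a - b * r


# Synthetic example windows (made-up numbers, not measured data).
red = [1.00, 1.01, 1.02, 1.01, 1.00, 0.99, 0.98, 0.99]
infrared = [2.00, 2.04, 2.08, 2.04, 2.00, 1.96, 1.92, 1.96]
r = ratio_of_ratios(red, infrared)
print(round(r, 3), round(spo2_from_ratio(r), 1))
```

The sketch makes the dependence on two synchronised signals explicit: both a red and an infrared window are needed
before any SPO2 value can be computed.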
In recent years, researchers have proposed machine learning based oxygen saturation estimation. In the methods of
[6] and [9], the PPG signal is extracted from a body part and segmented. A machine learning algorithm is then
applied to classify the segments into reliable and unreliable signals, after which the conventional ratio of ratios
formula is applied to the reliable signals. In the calibration method of [16], independent component analysis is
applied to the preprocessed red and infrared PPG signals to derive the calibration curve from which SPO2 is
predicted. This method was reported to produce 13% better results than the Discrete Saturation Transform [17]. In
[18], reflectance PPG signals and reference SPO2 values were collected from 95 subjects using a custom SPO2 data
acquisition platform. The PPG signals were preprocessed to obtain valid signals. Three machine learning algorithms,
Bagged Trees, K-Nearest Neighbor (KNN) and Quadratic Discriminant Analysis (QDA), were used to derive the optimal
model. The Bagged Trees model was found to outperform the other models with an accuracy of 96% within a +/-2% error
band. The datasets used for machine learning in these prior studies consist of red and infrared PPG signals along
with actual SPO2 values obtained using medical devices.
In this proposed work, the dataset used for machine learning comprises a single time series of PPG signals along
with reference SPO2 values. The model is calibration free: the PPG signal is given directly as input to five
different regression models, and promising results have been obtained. A comparative study was performed to find the
algorithm that produces the best results.
3. PROPOSED METHOD
The proposed method is accomplished in four steps: 1) Data collection 2) Signal partitioning and data cleaning
3) Data partitioning 4) Regression models
3.1 Dataset Collection
The dataset was collected from PhysioNet's Multi-parameter Intelligent Monitoring in Intensive Care (MIMIC) II
online waveform database [19]. It consists of 53 recordings, each of 8 minutes duration, recorded at a sampling rate
of 125 Hz. The recordings were obtained from 53 adults during hospital care at the Beth Israel Deaconess Medical
Center (BIDMC), Boston, MA, USA. Each recording consists of continuous physiological signals such as PPG, ECG and
respiratory signals, and physiological parameters such as respiration rate, heart rate and blood oxygen saturation.
The PPG signals and the reference parameter, blood oxygen saturation, of the 53 records were used for this work.
3.2 Signal Partitioning and Data Cleaning
Each PPG signal, recorded over eight minutes at a sampling rate of 125 Hz, is partitioned in such a way that each
partition consists of 1 second of signal (125 samples). The corresponding reference SPO2 value was noted for each
partition. A few of the reference SPO2 values were found to be NaN. Those values and the corresponding signal blocks
were omitted. The resultant dataset consists of 25312 observations, which was adequate for training the regression
models.
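The partitioning and cleaning step can be sketched as follows. This is a minimal illustration, assuming that one
reference SPO2 value is available per 1 second window; the function name, the data layout and the toy signal are
hypothetical and are not taken from the original implementation.

```python
import math

FS_HZ = 125          # sampling rate stated in Section 3.1
WINDOW_SECONDS = 1   # partition length stated in this section


def partition_and_clean(ppg, spo2_per_window, fs=FS_HZ, seconds=WINDOW_SECONDS):
    """Split one PPG record into fixed-length windows and drop windows whose reference SPO2 is NaN.

    ppg: list of PPG samples for one record.
    spo2_per_window: one reference SPO2 value per window (assumed layout).
    Returns a list of (window, spo2) pairs.
    """
    size = fs * seconds
    n_windows = min(len(ppg) // size, len(spo2_per_window))
    pairs = []
    for i in range(n_windows):
        target = spo2_per_window[i]
        if target is None or math.isnan(target):
            continue  # omit the missing value and its signal block
        pairs.append((ppg[i * size:(i + 1) * size], target))
    return pairs


# Toy record: 3 seconds of synthetic samples, with the middle reference value missing.
ppg = [math.sin(2 * math.pi * 1.2 * t / FS_HZ) for t in range(3 * FS_HZ)]
spo2 = [97.0, float("nan"), 96.0]
pairs = partition_and_clean(ppg, spo2)
print(len(pairs), len(pairs[0][0]))
```

With 53 recordings of 8 minutes each, this windowing yields at most 53 x 480 = 25440 windows before cleaning, which
is consistent with the 25312 observations that remain after the NaN values are omitted.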
3.3 Data Partitioning
Data partitioning was performed for training the regression models. The resultant dataset from the previous step was
partitioned into a training set consisting of the first 22781 pairs of signals and corresponding reference SPO2
values, and a testing set consisting of the last 2531 pairs (approximately 90% and 10% of the 25312 observations).
Each regression model was trained with the training dataset and the performance of the model was evaluated with the
testing dataset.
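The split described above is positional, so it can be illustrated without any library. The sketch below uses a
hypothetical helper name and stand-in data; only the two set sizes come from the text.

```python
def chronological_split(pairs, n_train):
    """Return (train, test): train is the first n_train pairs, test is the remainder."""
    if not 0 <= n_train <= len(pairs):
        raise ValueError("n_train must be between 0 and len(pairs)")
    return pairs[:n_train], pairs[n_train:]


# Stand-in data with the same number of observations as the cleaned dataset.
observations = list(range(25312))
train, test = chronological_split(observations, n_train=22781)
print(len(train), len(test))
```

Because no shuffling is involved, the testing set contains exactly the last observations of the dataset in their
original order.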
3.4 Regression models
3.4.1 Long Short Term Memory(LSTM) regression
LSTM is a type of recurrent neural network that is suitable for analyzing and learning dependencies in time series
data [20]. In this work, the LSTM network was developed in Python using the Keras deep learning library. The network
is constructed with 1 input timestep and 4 input features in the visible layer, followed by a hidden layer with 200
LSTM blocks and a dense layer with 100 neurons. The ReLU activation function is used for the LSTM blocks. The final
layer is a fully connected layer with a single neuron that predicts the SPO2 value. The training is performed for
200 epochs with a mini-batch size of 10 using the adaptive moment estimation (Adam) solver.
3.4.2 Convolutional Neural Network(CNN) regression
The convolutional neural network is a feed forward neural network model that is applied in various image processing
applications. A CNN accepts data in matrix form or as input images and performs feature extraction followed by
classification or regression. The one-dimensional (1D) CNN has been found to handle time series data effectively and
to produce good results [21]. Hence a 1D CNN was used for this work, developed in Python using the Keras deep
learning library. The network is designed with an input layer that accepts the time series data of the PPG signal,
followed by two convolutional layers with 64 and 32 filters respectively, each with a filter size of one element.
The ReLU activation function is used in these layers. This is followed by a max pooling layer with a pool size of 1
element. The output of the pooling layer is flattened into a single vector and sent as input to a fully connected
layer with 100 neurons and ReLU activation, which is followed by another fully connected layer with a single neuron
that produces the predicted SPO2 value. The training is performed for 200 epochs with a mini-batch size of 10 using
the adaptive moment estimation (Adam) solver.
3.4.3 Support Vector Regression(SVR)
Support vector regression is a supervised machine learning algorithm that is well known for prediction problems.
Support vector machines have shown remarkable results in medical signal processing applications [22]. The main goal
of SVR is to find the hyperplane that best fits the data and to determine the decision boundary around the
hyperplane. The kernel is the function that maps the data points into a higher-dimensional space, which helps in
determining the hyperplane. The SVR model is designed to fit as many data points as possible within the margin.
Computation is minimal in SVR when compared to other regression models. The SVR model was developed using the
function fitrsvm in MATLAB.
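The idea of fitting as many points as possible within a margin is commonly expressed through the epsilon-insensitive
loss, in which prediction errors smaller than a tolerance epsilon cost nothing. The sketch below illustrates that
loss only; it is written in Python rather than MATLAB, it is not the fitrsvm implementation, and the epsilon value
and the sample numbers are made up for illustration.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Mean epsilon-insensitive loss: errors within +/- epsilon contribute zero."""
    total = 0.0
    for a, p in zip(actual, predicted):
        total += max(0.0, abs(a - p) - epsilon)
    return total / len(actual)


# Made-up SPO2 values for illustration only.
actual = [97.0, 95.0, 92.0, 98.0]
predicted = [96.8, 96.0, 94.5, 98.1]
print(epsilon_insensitive_loss(actual, predicted, epsilon=0.5))
```

In this toy example the first and last predictions fall inside the margin and are not penalised, while the other
two are penalised only by the amount by which they exceed it.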
3.4.4 Random Forest
This algorithm works on the principle of ensemble learning. Multiple decision trees are combined to build a more
powerful prediction model and to produce accurate results. It is preferable to use this model when there is a large
number of features. The algorithm produces comparatively better results for nonlinear parameters. It is very stable,
robust to outliers and less sensitive to noise. There is no need for feature normalization as it uses a rule based
approach. However, the algorithm consumes more computational resources and requires more training time [23]. In this
work, the Random Forest algorithm was implemented in Python using the Keras deep learning library.
3.4.5 K- Nearest Neighbour
K-NN is a non-parametric algorithm that is easy to implement. The SPO2 value of a given query point is computed as
the mean of the SPO2 values of its k nearest neighbours. In this work, the value of k was chosen as 2. The algorithm
does not require separate training; it learns from the training dataset at the time of making a prediction. It works
much faster than other prediction algorithms, but its performance degrades as the size of the dataset becomes large
[24]. In this work, the Keras deep learning library was used for learning purposes.
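The prediction rule described above, the mean of the reference values of the k nearest neighbours with k = 2, can be
sketched in plain Python. The Euclidean distance is an assumption, since the distance metric is not stated in the
text, and the tiny feature vectors below are made-up stand-ins for PPG windows.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Predict the target of `query` as the mean target of its k nearest training points."""
    def squared_distance(u, v):
        # Squared Euclidean distance (assumed metric); ranking is the same as for the distance itself.
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(range(len(train_x)), key=lambda i: squared_distance(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


# Tiny made-up feature vectors standing in for PPG windows.
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
train_y = [98.0, 96.0, 90.0, 88.0]
print(knn_predict(train_x, train_y, query=[0.4, 0.4]))
```

Every prediction scans the whole training set, which illustrates why the cost of the method grows with the size of
the dataset.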
4. RESULTS AND DISCUSSION
The prediction accuracy of the learned models is evaluated using four metrics, namely mean absolute error (MAE),
standard deviation (SD), relative absolute error (RAE) and root relative squared error (RRSE), which are shown in
Table 1. From the table it is evident that CNN, SVR and LSTM have produced better results than Random Forest and
KNN. Deep learning algorithms such as 1D CNN and LSTM are found to perform well with time series data.
Table 1: Prediction accuracy of various regression algorithms - a comparative study
Model            MAE    SD     RAE    RRSE
CNN              3.99   3.70   0.88   1.39
LSTM             4.10   3.52   0.89   1.39
SVR              3.98   3.97   0.87   1.44
Random Forest    4.16   3.68   0.91   1.42
KNN              4.24   3.80   0.93   1.46
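The four metrics can be computed as in the sketch below, which follows their usual textbook definitions. Two points
are assumptions, since the text does not specify them: SD is taken as the sample standard deviation of the signed
prediction errors, and RAE and RRSE are normalised by a baseline that always predicts the mean of the actual values.
The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, SD of the errors, RAE and RRSE for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    mean_error = sum(errors) / n

    mae = sum(abs(e) for e in errors) / n
    # Sample standard deviation of the signed errors (n - 1 in the denominator).
    sd = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    # RAE and RRSE compare the model with a baseline that always predicts the mean.
    rae = sum(abs(e) for e in errors) / sum(abs(a - mean_actual) for a in actual)
    rrse = math.sqrt(sum(e ** 2 for e in errors) / sum((a - mean_actual) ** 2 for a in actual))
    return {"MAE": mae, "SD": sd, "RAE": rae, "RRSE": rrse}


# Made-up values for illustration only.
actual = [96.0, 98.0, 94.0, 92.0]
predicted = [97.0, 97.0, 95.0, 93.0]
print(regression_metrics(actual, predicted))
```

Under these definitions, a value of 1 for RAE or RRSE corresponds to the baseline that always predicts the mean of
the actual values.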
Fig. 1: Error histograms from regression models (a) SVR (b) 1D CNN
Fig. 1 shows the error histograms obtained from the results produced by the SVR and 1D CNN models. It was found that
74% of the data show an error of less than 4. A Bland Altman plot [25] was drawn to show the agreement between
actual SPO2 and predicted SPO2 with limits of agreement of +/-1.96 SD. Fig. 2 shows the Bland Altman plots
constructed for the 2531 observation pairs obtained from SVR and CNN. It was observed from the Bland Altman plots
that the error was larger for SPO2 values less than 90. This is due to the unequal distribution of SPO2 values in
the data used for training: there were fewer observations within the SPO2 range 80 - 90 in the training dataset. The
prediction accuracy of the model can be improved by training it with a dataset in which all value ranges are equally
represented.
Fig. 2: Bland Altman plots for 2531 observation pairs obtained from regression models (a) SVR (b) CNN
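The quantities behind a Bland Altman plot can be computed as in the sketch below: each observation contributes the
mean of the actual and predicted values (horizontal axis) and their difference (vertical axis), and the limits of
agreement are the mean difference +/-1.96 SD. The use of the sample standard deviation, the function names and the
sample values are assumptions made for illustration.

```python
import math


def bland_altman_limits(actual, predicted):
    """Return (bias, lower limit, upper limit) using bias +/- 1.96 SD of the differences."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    n = len(diffs)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def bland_altman_points(actual, predicted):
    """Return (mean of the pair, difference) for each observation: the plot coordinates."""
    return [((a + p) / 2, p - a) for a, p in zip(actual, predicted)]


# Made-up values for illustration only.
actual = [96.0, 98.0, 94.0, 92.0]
predicted = [97.0, 97.0, 95.0, 93.0]
bias, lower, upper = bland_altman_limits(actual, predicted)
print(bias, lower, upper)
```

Plotting the points against the three horizontal lines (bias and the two limits) gives the kind of figure described
above.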
5. CONCLUSION
In this paper, a machine learning approach is used to predict the blood oxygen saturation level from PPG signals.
The performance of five different regression algorithms in the estimation of SPO2 has been analysed. 1D CNN and SVR
were found to produce good results for the dataset used in this work. The accuracy of the learned models may be
improved by training them with different datasets. Preprocessing techniques that remove noise from the data may also
improve the results.
REFERENCES
[1] Gutierrez J, Theidorou A. Oxygen delivery and oxygen consumption in Pediatric Critical Care. In: Lucking S,
Maffei F et al, eds. Pediatric Critical Care Study Guide. London: Springer 2012, chapter 2
[2] Diab, M. K. (2008). U.S. Patent No. 7,440,787. Washington, DC: U.S. Patent and Trademark Office.
[3] Lamonaca, F., Carnì, D. L., Grimaldi, D., Nastro, A., Riccio, M., & Spagnolo, V. (2015, May). Blood oxygen
saturation measurement by smartphone camera. In 2015
[4] Goldenberg SL, Nir G, Salcudean SE. A new era: artificial intelligence and machine learning in prostate
cancer. Nature Reviews Urology. 2019 Jul;16(7):391-403.
[5] Jonathan, E., & Leahy, M. (2010). Investigating a smartphone imaging unit for
photoplethysmography. Physiological measurement, 31(11), N79.
[6] Phillips, C., Liaqat, D., Gabel, M., & de Lara, E. (2019). Wrist02--Reliable Peripheral Oxygen Saturation
Readings from Wrist-Worn Pulse Oximeters. arXiv preprint arXiv:1906.07545.
[7] Bagha, S., & Shaw, L. (2011). A real time analysis of PPG signal for measurement of SpO2 and pulse
rate. International journal of computer applications, 36(11), 45-50.
[8] Bradke, B., & Everman, B. (2020). Investigation of Photoplethysmography Behind the Ear for Pulse Oximetry
in Hypoxic Conditions with a Novel Device (SPYDR). Biosensors, 10(4), 34.
[9] Liu, S. H., Liu, H. C., Chen, W., & Tan, T. H. (2020). Evaluating Quality of Photoplethymographic Signal on
Wearable Forehead Pulse Oximeter With Supervised Classification Approaches. IEEE Access, 8, 185121-
185135.
[10] Longmore, S. K., Lui, G. Y., Naik, G., Breen, P. P., Jalaludin, B., & Gargiulo, G. D. (2019). A comparison of
reflective photoplethysmography for detection of heart rate, blood oxygen saturation, and respiration rate at
various anatomical locations. Sensors, 19(8), 1874.
[11] Tamura, T. (2019). Current progress of photoplethysmography and SPO 2 for health monitoring. Biomedical
engineering letters, 9(1), 21-36.
[12] Castaneda, D., Esparza, A., Ghamari, M., Soltanpur, C., & Nazeran, H. (2018). A review on wearable
photoplethysmography sensors and their potential future applications in health care. International journal of
biosensors & bioelectronics, 4(4), 195.
[13] Feng, Z., & Smith, M. (2014). Measuring heart rate and blood oxygen levels for portable medical and wearable
devices.
[14] Guazzi, A. R., Villarroel, M., Jorge, J., Daly, J., Frise, M. C., Robbins, P. A., & Tarassenko, L. (2015). Non-
contact measurement of oxygen saturation with an RGB camera. Biomedical optics express, 6(9), 3320-3338.
[15] Van Gastel, M., Stuijk, S., & De Haan, G. (2016). New principle for measuring arterial blood oxygenation,
enabling motion-robust remote monitoring. Scientific reports, 6(1), 1-16.
[16] Jensen, T., Duun, S., Larsen, J., Haahr, R. G., Toft, M. H., Belhage, B., & Thomsen, E. V. (2009, September).
Independent component analysis applied to pulse oximetry in the estimation of the arterial oxygen saturation
(S p O 2)-a comparative study. In 2009 Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (pp. 4039-4044). IEEE.
[17] Goldman, J. M., Petterson, M. T., Kopotic, R. J., & Barker, S. J. (2000). Masimo signal extraction pulse
oximetry. Journal of clinical monitoring and computing, 16(7), 475-483.
[18] Venkat, S., PS, M. T. P. A., Alex, A., Preejith, S. P., Christopher, D. J., Joseph, J., & Sivaprakasam, M. (2019,
July). Machine Learning based SpO 2 Computation Using Reflectance Pulse Oximetry. In 2019 41st Annual
International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 482-485).
IEEE.
[19] A. L. Goldberger et al., "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research
resource for complex physiologic signals," Circulation, vol. 101, no. 23, pp. e215-e220, 2000.
[20] Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with
LSTM,850-855.
[21] Eren, L., Ince, T., & Kiranyaz, S. (2019). A generic intelligent bearing fault diagnosis system using compact
adaptive 1D CNN classifier. Journal of Signal Processing Systems, 91(2), 179-189.
[22] Toledo-Pérez, D. C., Rodríguez-Reséndiz, J., Gómez-Loenzo, R. A., & Jauregui Correa, J. C. (2019). Support
vector machine-based EMG signal classification techniques: A review. Applied Sciences, 9(20), 4402.
[23] Pal, R., Overview of predictive modeling based on genomic characterizations. Predictive Modeling of Drug
Sensitivity, Elsevier, 121-148 (2017)
[24] Zhang, Z. (2016). Introduction to machine learning: k-nearest neighbors. Annals of translational
medicine, 4(11).
[25] Bland, J. M., & Altman, D. G. (2010). Statistical methods for assessing agreement between two methods of
clinical measurement. International journal of nursing studies, 47(8), 931-936.