
AUTOMATIC DETECTION OF OBSTRUCTIVE SLEEP APNEA USING FACIAL IMAGES

Asghar Tabatabaei Balaei¹, Kate Sutherland², Peter A. Cistulli², Philip de Chazal¹

¹Charles Perkins Centre, Faculty of Engineering, University of Sydney, Australia
²Charles Perkins Centre, Faculty of Medicine, University of Sydney, Australia
ABSTRACT

Obstructive sleep apnea (OSA) is a medical condition in which the airway is repeatedly obstructed, resulting in sleep disruption. Previous research has shown that this condition may be the cause or the result of the craniofacial structure, and that specific facial features such as 'face width' or 'eye width' are correlated with the risk of OSA. In this study we developed two automatic image processing systems that processed facial images and determined the likelihood of a subject having OSA, based on a dataset of photographs from 365 apnea and control subjects. In our first approach, an algorithm was developed to calculate craniofacial photographic features that were previously shown to be useful for OSA discrimination. These features were processed with a logistic classifier, and the resulting system achieved an accuracy of 70% in discriminating patients with clinically significant OSA from controls. In our second approach, a neural network was designed to process the frontal and profile photographs directly and classify the subject as normal or OSA. It achieved an accuracy of 62%.

Index Terms— obstructive sleep apnea, face landmark detection, neural networks

1. INTRODUCTION

Daytime sleepiness and fatigue can be caused by reduced quality of sleep at night. One cause of reduced sleep quality is obstructive sleep apnea (OSA), which is associated with frequent breathing disturbances during sleep, each caused by upper airway (UA) obstruction and lasting more than 10 seconds. Certain craniofacial structures can be the cause or the result of UA collapsibility and obstruction during sleep. Previous studies have suggested a correlation between these structures and anatomical risk factors for OSA, such as obesity and the enlargement of UA soft tissue structures [1].

Face landmark detection is a technique developed in computer vision to automatically locate particular landmarks on human faces using machine learning algorithms [2]. Manually annotated facial images are provided to these algorithms as training data, and the algorithms learn to generate the same landmarks for unseen test images. Increasing the accuracy and robustness of the automatic landmarks has been the subject of many research papers [3].

Facial landmark detection techniques have been used by researchers to detect or predict certain medical conditions. These conditions affect particular facial features, which can be calculated from the landmarks detected on the patient's facial image. For example, [4] reviews recent studies investigating the link between genetic abnormalities and facial features, and [1] investigates the relationship between facial dimensions and UA structures.

For the prediction of OSA, MRI imaging has been shown to be applicable for monitoring the excess soft tissue in the UA that results in its blockage and hence sleep apnea [5, 6]. Overnight sleep monitoring in a sleep laboratory is the standard approach to OSA detection. These techniques are expensive and time consuming. As an alternative, researchers have examined the effectiveness of analyzing profile and frontal facial images to detect OSA [1, 7, 8, 9].

The apnea-hypopnea index (AHI) is the recorded number of pauses in breathing per hour of sleep, and is the metric used to define the severity of OSA. In [10], relationships between the AHI and facial features extracted from subjects' frontal and profile images were identified. Using manual markings of frontal and profile facial images, an accuracy of 75.6% in detecting OSA was reported. The features used in [10] have been used in this study. In [7], these features were shown to be effective in detecting OSA with accuracies in the range of 70 ± 5%. A key limitation to the widespread application of the method is the reliance on manual landmark detection.

In this research, we use machine learning techniques to train our system to automatically identify the same facial landmarks that were manually marked in [10]. We then use these landmarks to generate the facial features for the prediction of OSA. Furthermore, using a neural network, we show that it is possible to detect OSA without the step of calculating facial landmarks and features. This is achieved by feeding down-sampled images directly into a feed-forward neural network classifier.

2. DATASETS AND METHODOLOGY

2.1. Datasets

Two sets of images were used in this study, referred to as Dataset 1 and Dataset 2.

978-1-5090-1172-8/17/$31.00 ©2017 IEEE 215


In the first set of images (Dataset 1), measurements were derived from the study conducted by Lee et al. [10], in which subjects were referred to the same hospital as Dataset 2 for polysomnography for the initial investigation of OSA. A total of 169 subjects were included in the analysis, of whom 104 had OSA (AHI ≥ 10/h) and 65 were selected as controls (AHI < 10/h) [10]. Further details on the collection of this dataset, including the calibration method, are given in [10].

The second set of craniofacial photographs (Dataset 2) was obtained from consecutive people attending an overnight sleep study (polysomnography) at a clinical sleep laboratory (Royal North Shore Hospital, Sydney, Australia) from 2013 to 2016. The craniofacial photography was performed as part of a study collecting phenotypic information from people with OSA in order to investigate clinical phenotypes of the disease. Frontal and profile facial photographs were taken by trained researchers using a consumer digital camera (Canon Digital IXUS 75 PC1227). A laser ruler, consisting of two parallel laser beams a fixed distance apart (10 mm) projected onto the face for the photograph, was used to calibrate the facial photo measurements. A skin-safe marking pen was used to mark the bony landmark gonion (the lateral point on the angle of the mandible; see Figure 1, 'go' landmark), which is not readily visible in a photograph. For the photograph, subjects were asked to assume a neutral head position (by imagining looking into their own eyes in a mirror) and to keep a neutral facial expression with lips and teeth lightly touching. Care was taken to ensure that the photos were taken as a true frontal view (both ears equally visible) and a true profile (subject at 90° to the camera). There were 196 subjects, with 77 controls (AHI < 10) and 119 apnea patients (AHI > 10).

Figure 1: Front and side facial images. 14 landmarks are shown on the front image (a) and 21 landmarks on the profile image (b).

2.2. Methods

Two methods have been used for the automatic prediction of OSA from frontal and profile facial images.

In the first method, we followed the work presented in [10] but replaced the manual landmarking step with an automatic landmarking step. This demonstrates the feasibility of an automatic facial landmark detection process for measuring the facial features required to detect OSA.

In the second method, we tried to answer the following questions: If a training dataset of frontal and profile facial images is provided directly to image processing classifiers, can they predict OSA without the intermediate step of measuring the craniofacial features used in the first method? And does that intermediate step boost or limit the classifiers' predictive performance?

3. FACIAL LANDMARKS AND FEATURES

Research in the area of face landmark detection can be divided into two main categories: model fitting [11] and regression [12]. In the first approach, a statistical model of the face is generated from the training dataset and fitted to test images. This approach has limitations in dealing with variations in the images (scale, pose, illumination, etc.). In the second approach, face landmarks are localized by solving a regression problem that relates the features of the training images to their associated landmarks. The estimated parameters are then used to detect the landmarks of the test images. This is the approach we have used to detect both the 21 profile and the 14 frontal face landmarks illustrated in Figure 1. The training data comprised 365 images from the datasets described in Section 2.1. We used the C++ machine learning toolkit DLIB [13]. A Support Vector Machine (SVM) classifier is first used to detect the face, and a cascade regression technique, presented in [14], is then used for landmark detection. The detected landmarks are then used to calculate the craniofacial features described in [10]. There are more than 200 such features, of which four have proved to have the most influence on the prediction of OSA [8]. We used the same four features in our system: "face width" (L27), "eye width" (L62), "cervicomental angle" (A18) and "mandibular length" (L65).

We also considered a "no calibration" approach to feature calculation. The three length features (L27, L62 and L65) were replaced with the ratio features L27/L62 and L65/L62. The two ratio features were combined with A18 and used as the "no calibration" features.

        Auto (mean)   Manual (mean)   Mean deviation from manual,
                                      mean(|Auto-Man|/Man) (%)
L27     7.111 cm      7.009 cm        10.0
L62     14.262 cm     13.252 cm       9.0
L65     2.701 cm      2.614 cm        9.2
A18     155.547 deg   160.169 deg     5.3

Table 1: The measured facial features using manual and automatic landmarks on Dataset 1.

Table 1 compares the automatically and manually determined features on Dataset 1 for the four selected uncalibrated facial features. The table shows that the automatic landmark detection introduced a relative error of approximately 10% in the determination of the features.

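The feature calculation from detected landmark coordinates can be sketched as follows. This is a minimal Python illustration of the length, angle, calibration and ratio computations described above; the landmark names, example pixel coordinates and the vertex chosen for the angle are hypothetical stand-ins, not the actual landmark definitions of [10]:

```python
import math

def length(p, q):
    """Euclidean distance between two landmark points (x, y), in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle_deg(vertex, a, b):
    """Angle at `vertex` between rays vertex->a and vertex->b, in degrees."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

def pixels_to_cm(pixel_len, ruler_p, ruler_q, ruler_mm=10.0):
    """Calibrate a pixel length using the two projected laser points,
    which are a known distance apart (10 mm in this study)."""
    px_per_mm = length(ruler_p, ruler_q) / ruler_mm
    return pixel_len / px_per_mm / 10.0  # centimetres

# Hypothetical landmark coordinates (pixels), for illustration only.
left_cheek, right_cheek = (120.0, 260.0), (420.0, 260.0)  # face width (L27)
left_eye, right_eye = (190.0, 200.0), (350.0, 200.0)      # eye width (L62)
gnathion, gonion = (270.0, 430.0), (380.0, 380.0)         # mandibular length (L65)
neck_point = (300.0, 520.0)                               # for cervicomental angle (A18)
laser_a, laser_b = (60.0, 300.0), (60.0, 330.0)           # laser ruler, 10 mm apart

L27 = length(left_cheek, right_cheek)
L62 = length(left_eye, right_eye)
L65 = length(gnathion, gonion)
A18 = angle_deg(gnathion, gonion, neck_point)  # hypothetical vertex choice

# Calibrated features (cm, via the laser ruler) and the
# "no calibration" ratio features combined with A18:
calibrated = (pixels_to_cm(L27, laser_a, laser_b),
              pixels_to_cm(L62, laser_a, laser_b),
              pixels_to_cm(L65, laser_a, laser_b),
              A18)
no_calibration = (L27 / L62, L65 / L62, A18)
```

Note how the ratio features cancel the pixels-per-millimetre scale factor, which is why they need no laser-ruler calibration.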
Figure 2: ROC for OSA detection using manual and automatic landmarks on Dataset 2 with calibrated features.

In the next step, a logistic regression classifier was used to predict OSA using both the manually and the automatically generated features. We used the "sklearn" module of the Python programming language to train the classifier [15].

Tables 2 and 3 summarize the results of using the manual and automatic features for the prediction of OSA on the two datasets, using the calibrated and uncalibrated facial features respectively.

The manual landmarks achieved the highest accuracy on the training data (Dataset 1), with an accuracy of 76.1% for the calibrated features and 68.8% for the uncalibrated features. However, the manual system did not generalize well: on the test set (Dataset 2) the accuracy was 63.7% with calibrated features and 64.2% with uncalibrated features.

The automatic landmarks achieved a performance that generalized well across the two datasets. The accuracy for the calibrated features was 68.6% on the training set and 69.8% on the test set. For the uncalibrated features, the accuracy was 66.2% on the training set and 67.3% on the test set. These results suggest the possibility of using automatically generated facial features to predict OSA.

Figure 2 shows the Receiver Operating Characteristic (ROC) curves and the areas under the curves for both manual and automatic detection of OSA on Dataset 2, demonstrating that the classification performance of the automatic approach is similar to that of the manual approach.

4. NEURAL NETWORK CLASSIFIER

A summary of this method is shown in Figure 3.

Figure 3: Block diagram of OSA detection by directly processing the images.

In the image preprocessing step, the front and side views of the face were isolated from the full images using the SVM classifier described in Section 3. This was done to eliminate the parts of the image that were not useful for OSA detection, such as the shoulders, chest and clothing. The resulting square images were resized to 50-by-50-pixel color images and used as the input vectors to the neural networks. The input vector size was 15000 per subject, covering the side and front images with their RGB components (total number of pixels per pair of front and profile images: 50 × 50 × 2 × 3 = 15000).

We used a feed-forward neural network with 15000 inputs, 4 hidden neurons and 2 outputs as our classifier, with the outputs predicting the presence or absence of OSA. The classifier was trained with a stochastic gradient descent algorithm using mini-batches of 10 subjects, and training stopped when the error on the validation set began to rise.

For the neural network classifier we used Datasets 1 and 2 as independent sets, and we divided each dataset into training, validation and test sets to estimate the performance. To analyze the effect of dataset size on network performance, three training set sizes were randomly chosen (75, 125 and 165 subjects), 10 subjects were randomly assigned to the validation set, and the remaining subjects were used as test data. This process was repeated 10 times, and the mean and standard deviation of the performance measures were calculated.

Table 4 shows the test-set results on Datasets 1 and 2. Increasing the number of subjects in the training dataset resulted in a modest increase in accuracy, suggesting that further performance gains may be achieved with larger datasets.

In the final experiment, we retrained our neural network using the four selected features calculated from the landmarks. This network had 4 inputs, 4 hidden neurons and 2 outputs. The results of these experiments are shown in Table 5.

As in Table 4, increasing the number of training subjects increased the network performance, and we anticipate further performance gains with larger datasets. Comparing the results in Tables 4 and 5 reveals that the neural network processing the four selected facial features outperformed the network processing the pixels directly.

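The pixel-based classifier described in this section can be sketched as follows: a minimal NumPy feed-forward network with 4 hidden neurons and 2 outputs, trained by mini-batch stochastic gradient descent with early stopping on validation error. The synthetic data, sigmoid activations, learning rate and reduced input size are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyNet:
    """Feed-forward network: n_in inputs -> 4 hidden neurons -> 2 outputs."""
    def __init__(self, n_in, n_hidden=4, n_out=2):
        self.W1 = rng.normal(0, 0.01, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.01, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        return sigmoid(self.h @ self.W2 + self.b2)

    def sgd_step(self, X, Y, lr=0.5):
        out = self.forward(X)                # mini-batch forward pass
        d_out = (out - Y) * out * (1 - out)  # squared-error output gradient
        d_h = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= lr * self.h.T @ d_out / len(X)
        self.b2 -= lr * d_out.mean(axis=0)
        self.W1 -= lr * X.T @ d_h / len(X)
        self.b1 -= lr * d_h.mean(axis=0)

def train(net, X_tr, Y_tr, X_val, Y_val, batch=10, max_epochs=50):
    """Mini-batch SGD; stop when the validation error starts to rise."""
    best_err = np.inf
    for _ in range(max_epochs):
        order = rng.permutation(len(X_tr))
        for i in range(0, len(X_tr), batch):
            idx = order[i:i + batch]
            net.sgd_step(X_tr[idx], Y_tr[idx])
        val_err = np.mean((net.forward(X_val) - Y_val) ** 2)
        if val_err > best_err:  # early stopping on validation error
            break
        best_err = val_err
    return best_err

# Synthetic stand-in for the 50x50x2x3 = 15000-long pixel vectors;
# the input size is reduced here to keep the sketch fast.
n_in = 150
X = rng.normal(size=(120, n_in))
labels = (X[:, 0] > 0).astype(int)  # toy control/OSA labels
Y = np.eye(2)[labels]               # one-hot: [control, OSA]

net = TinyNet(n_in)
err = train(net, X[:100], Y[:100], X[100:], Y[100:])
pred = net.forward(X).argmax(axis=1)
```

With the real data, `n_in` would be 15000 and each row of `X` the concatenated RGB pixels of one subject's front and profile crops.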
          Dataset 1: Training set           Dataset 2: Testing set
          Acc. (%)  Sens. (%)  Spec. (%)    Acc. (%)  Sens. (%)  Spec. (%)
Manual    76.1      78.8       70.1         63.7      67.6       55.0
Auto      68.6      71.1       58.8         69.8      68.5       76.4

Table 2: Performance results of the manually and automatically determined calibrated facial features on the training and testing datasets.

          Dataset 1: Training set           Dataset 2: Testing set
          Acc. (%)  Sens. (%)  Spec. (%)    Acc. (%)  Sens. (%)  Spec. (%)
Manual    68.8      70.4       63.1         64.2      65.4       59.4
Auto      66.2      69.3       53.1         67.3      67.5       67.2

Table 3: Performance results of the manually and automatically determined uncalibrated facial features on the training and testing datasets.

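The logistic-classifier stage whose results appear in Tables 2 and 3 can be sketched with scikit-learn, the library the study cites [15]. The feature matrices below are randomly generated stand-ins; in the study the four columns would be the calibrated features L27, L62, L65 and A18 (or the uncalibrated ratio features), with AHI-based labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

# Stand-in feature matrices: four craniofacial features per subject.
X_train = rng.normal(size=(169, 4))  # Dataset 1 size (training)
y_train = (X_train[:, 0] + X_train[:, 3] > 0).astype(int)  # toy labels
X_test = rng.normal(size=(196, 4))   # Dataset 2 size (testing)
y_test = (X_test[:, 0] + X_test[:, 3] > 0).astype(int)

# Train on Dataset 1, evaluate on Dataset 2, as in the study design.
clf = LogisticRegression().fit(X_train, y_train)
pred = clf.predict(X_test)

# Accuracy, sensitivity and specificity as reported in Tables 2 and 3.
acc = accuracy_score(y_test, pred)
sens = np.mean(pred[y_test == 1] == 1)  # true positive rate
spec = np.mean(pred[y_test == 0] == 0)  # true negative rate
```

Because logistic regression outputs a class probability, the same fitted model also yields the ROC curves of Figure 2 by sweeping a threshold over `clf.predict_proba`.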
Data set     75 subjects   125 subjects   165 subjects
Dataset 1    57.0 ± 5%     61.0 ± 5%      61.8 ± 5%
Dataset 2    59.0 ± 5%     59.0 ± 5%      60.0 ± 5%

Table 4: OSA detection accuracy using preprocessed image pixel values as neural network inputs.

Data set     75 subjects   125 subjects   165 subjects
Dataset 1    61.0 ± 5%     63.0 ± 5%      66.0 ± 5%
Dataset 2    58.0 ± 5%     62.0 ± 5%      64.0 ± 5%

Table 5: OSA detection accuracy using the four selected facial features as the neural network inputs.

Hence, our results suggest that the intermediate step of measuring facial landmark features boosts the classifier's predictive performance.

5. CONCLUSION AND FUTURE DIRECTIONS

In this research, the possibility of automatic detection of OSA using frontal and profile facial images was investigated. In a first step, facial images with manually annotated landmarks were used to train a detector that automatically locates the facial landmarks in new images. These landmarks were then used in a classifier to predict OSA; an accuracy of 70% was achieved at this stage using a logistic classifier. In a second step, the same training images (after pre-processing) were fed directly into a neural network classifier, and an average accuracy of 62% was achieved in the detection of OSA. Although, with these accuracies, our algorithms at the current stage cannot be considered a replacement for standard sleep monitoring procedures for diagnosing OSA, they demonstrate some of the features that can be used to achieve this goal. New facial features could be incorporated into the algorithm to increase the accuracy.

6. REFERENCES

[1] R. W. W. Lee et al., "Relationship between surface facial dimensions and upper airway structures in obstructive sleep apnea," Sleep, vol. 33, no. 9, pp. 1249-1254, 2010.
[2] C. Ding and D. Tao, "A comprehensive survey on pose-invariant face recognition," pp. 1-40, 2015.
[3] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 2879-2886.
[4] M. C. El Rai, N. Werghi, H. Al Muhairi, and H. Alsafar, "Using facial images for the diagnosis of genetic syndromes: A survey," in Proc. Int. Conf. Communications, Signal Processing, and their Applications (ICCSPA), 2015, pp. 1-6.
[5] R. J. Schwab, M. Pasirstein, R. Pierson, A. Mackley, R. Hachadoorian, R. Arens, G. Maislin, and A. I. Pack, "Identification of upper airway anatomic risk factors for obstructive sleep apnea with volumetric magnetic resonance imaging," Am. J. Respir. Crit. Care Med., vol. 168, no. 5, pp. 522-530, 2003.
[6] M. Okubo et al., "Morphologic analyses of mandible and upper airway soft tissue by MRI of patients with obstructive sleep apnea hypopnea syndrome," Sleep, vol. 29, no. 7, pp. 909-915, 2006.
[7] F. Espinoza-Cuadros, R. Fernández-Pozo, D. Toledano, J. Alcázar-Ramírez, E. López-Gonzalo, and L. Hernández-Gómez, "Speech signal and facial image processing for obstructive sleep apnea assessment," Comput. Math. Methods Med., vol. 2015, pp. 1-13, 2015.
[8] R. W. W. Lee, P. Petocz, T. Prvan, A. S. L. Chan, R. R. Grunstein, and P. A. Cistulli, "Prediction of obstructive sleep apnea with craniofacial photographic analysis," Sleep, vol. 32, no. 1, pp. 46-52, 2009.
[9] H. Nosrati and P. de Chazal, "Apnoea-hypopnoea index estimation using craniofacial photographic measurements," in Computing in Cardiology, 2016, in press.
[10] R. Lee, A. Chan, R. Grunstein, and P. Cistulli, "Craniofacial phenotyping in obstructive sleep apnea - a novel quantitative photographic approach," Sleep, vol. 32, no. 1, pp. 37-45, 2009.
[11] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681-685, 2001.
[12] X. P. Burgos-Artizzu, P. Perona, and P. Dollár, "Robust face landmark estimation under occlusion," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 1513-1520.
[13] D. E. King, "Dlib-ml: A machine learning toolkit," J. Mach. Learn. Res., vol. 10, pp. 1755-1758, 2009.
[14] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1867-1874.
[15] Python machine learning module scikit-learn, http://scikit-learn.org/stable/.

