Neuromorphic Seatbelt State Detection For In-Cabin Monitoring With Event Cameras
Paul Kielty¹, Cian Ryan², Mehdi Sefidgar Dilmaghani¹, Waseem Shariff¹, Joe Lemley², and Peter Corcoran¹

¹ University of Galway, Galway, Ireland
² Xperi Corporation, Parkmore Indl. Estate, Galway, Ireland
Abstract
Neuromorphic vision sensors, or event cameras, differ from conventional cameras in that they do
not capture images at a specified rate. Instead, they asynchronously log local brightness changes at each
pixel. As a result, event cameras only record changes in a given scene, and do so with very high temporal
resolution, high dynamic range, and low power requirements. Recent research has demonstrated how
these characteristics make event cameras extremely practical sensors in driver monitoring systems
(DMS), enabling the tracking of high-speed eye motion and blinks. This research provides a proof of
concept to expand event-based DMS techniques to include seatbelt state detection. Using an event
simulator, a dataset of 108,691 synthetic neuromorphic frames of car occupants was generated from a
near-infrared (NIR) dataset, and split into training, validation, and test sets for a seatbelt state detection
algorithm based on a recurrent convolutional neural network (CNN). In addition, a smaller set of real
event data was collected and reserved for testing. In a binary classification task, fastened/unfastened
frames were identified with F1 scores of 0.989 and 0.944 on the simulated and real test sets,
respectively. When the problem was extended to also classify the actions of fastening and unfastening the
seatbelt, respective F1 scores of 0.964 and 0.846 were achieved.
1 Introduction
Neuromorphic vision describes a class of sensors designed to mimic biological perceptual functions. One such
sensor is an event camera, which differs from a conventional camera in that each pixel records data
asynchronously. Whenever one of these pixels detects a relative change in brightness above a set threshold an
‘event’ is logged. Each event is comprised of a timestamp, the coordinate of the pixel that reported the event, and
a polarity to indicate whether an increase or decrease in brightness occurred. The event camera does not output
images, but a list of events generated by motion or lighting changes in the scene. The event data has no intrinsic
framerate, however, its time resolution exceeds that of video captured at 10,000 frames per second. Event cameras
also offer higher dynamic range and lower power consumption than most conventional shutter cameras [Gallego
et al., 2022].
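To make this format concrete, the following Python sketch shows one common way to represent a small event stream and to integrate a slice of it into a 2D "event frame" of the kind typically fed to a CNN. The field names, the -1/+1 polarity encoding, and the polarity-summing scheme are illustrative assumptions, not drawn from this paper or from any specific camera SDK.

import numpy as np

# Each event is a (timestamp, x, y, polarity) record, as described above.
# Timestamps are in microseconds here; polarity is +1 (brighter) or -1 (darker).
events = np.array(
    [(1000, 120, 64, 1), (1002, 121, 64, -1), (1107, 300, 210, 1)],
    dtype=[("t", np.int64), ("x", np.int16), ("y", np.int16), ("p", np.int8)],
)

def accumulate_frame(events, height, width):
    """Sum event polarities per pixel over a time slice to form a 2D frame,
    one simple way to convert the asynchronous stream into CNN input."""
    frame = np.zeros((height, width), dtype=np.float32)
    np.add.at(frame, (events["y"], events["x"]), events["p"])
    return frame

frame = accumulate_frame(events, height=480, width=640)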
A 2018 meta-analysis found that a fastened seatbelt reduces the risk of injury in road collisions by 65%
[Fouda Mbarga et al., 2018], and in the United States, seatbelt use was shown to reduce mortality by 72%
[Crandall et al., 2001]. Existing seatbelt alert systems in modern vehicles rely on pressure sensors in the seat to
determine occupancy and simply detect whether the seatbelt tongue is inserted in the buckle. Such systems can
easily be spoofed by buckling the belt and sitting in front of it, and they have no ability to determine whether a
seatbelt has been fastened correctly. Moreover, they are often implemented only for the front seats of the
vehicle. Camera-based seatbelt detection
systems have the potential to rectify these flaws. With the ever-increasing demand for safer, more intelligent
vehicles, there have been remarkable developments in camera-based DMS, which are now fully
implemented in many modern consumer vehicles. With the camera systems already in place, new DMS
features can be added at minimal additional cost. Recent research has shown that event cameras hold many
advantages over standard shutter cameras for driver monitoring tasks, particularly for face and
eye motion analysis [Ryan et al., 2021, Chen et al., 2020]. In this paper, we demonstrate the viability of another
feature in an event-based DMS by creating the first event-based seatbelt state detector.
4 Network Architecture
It is difficult to distinguish the fastening and unfastening actions
from individual frames, but the state becomes obvious when the
whole sequence of frames is considered. Additionally, for the static classes with unreliable
seatbelt visibility, using a sequence of frames can provide a more
reliable result. For these reasons, we used a recurrent CNN
architecture that takes a frame sequence as the input for each
prediction. Fig. 2 gives a high-level overview of the structure. The
MobileNetV2 network is used as an efficient, lightweight backbone
for initial feature extraction [Sandler et al., 2018]. Recent years have
seen self-attention introduced to many CNN tasks for its ability to
contextualize and apply a weighting to input features, with only a
small computational cost. The self-attention module in our proposed
network is implemented according to [Zhang et al., 2018]. When
attended feature maps have been generated for every frame of the
input sequence, they are stacked and passed to the recurrent head of
the network, which comprises 2 stacked bi-directional LSTM
layers [Hochreiter and Schmidhuber, 1997].
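One plausible arrangement of these components (MobileNetV2 backbone, SAGAN-style self-attention, 2 stacked bi-directional LSTM layers) is sketched below in PyTorch. The hidden size, the pooling step, the 3-channel input, and the choice to classify from the final time step are illustrative assumptions; the paper does not specify these details here.

import torch
import torch.nn as nn
import torchvision.models as models

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial positions [Zhang et al., 2018].
    The channel reduction factor of 8 follows the original paper."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw)
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x, attn

class SeatbeltNet(nn.Module):
    """Recurrent CNN sketch: per-frame MobileNetV2 features, self-attention,
    then 2 stacked bi-directional LSTM layers and a linear classifier."""
    def __init__(self, num_classes=2, hidden=128):
        super().__init__()
        # MobileNetV2 feature extractor outputs 1280-channel maps; event
        # frames are assumed to be replicated/stacked to 3 channels here.
        self.backbone = models.mobilenet_v2(weights=None).features
        self.attn = SelfAttention2d(1280)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(1280, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                              # x: (batch, seq, 3, H, W)
        b, t = x.shape[:2]
        feats = self.backbone(x.flatten(0, 1))         # (b*t, 1280, h, w)
        feats, _ = self.attn(feats)
        feats = self.pool(feats).flatten(1).view(b, t, -1)
        out, _ = self.lstm(feats)                      # (b, t, 2*hidden)
        return self.fc(out[:, -1])                     # classify from last step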
5 Training
In this work, two models were trained. The first was for binary
classification of frame sequences using the static fastened/unfastened
classes only. For the second model, all classes were included to
determine if the 4 states could be reliably identified, as they must all
be handled in a real-world implementation.

Figure 2: Proposed network architecture.

To train the network, the videos were split into single-class sequences of 15 frames, before randomized cropping and downsampling to a resolution of 256×256. Using cross-entropy loss and a batch size of 15 sequences, the network was trained for 30 epochs. The initial learning rate of 1×10⁻⁴ was halved every 5 epochs.
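This schedule can be expressed as in the minimal training-loop sketch below. The loss, batch size, epoch count, and learning-rate schedule follow the text; the Adam optimizer and the dummy data loader are assumptions for illustration, as the optimizer is not stated here.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the event-frame data: sequences of 15 frames at 256x256
# (illustrative only; real batches would come from the datasets described above).
dummy_x = torch.randn(15, 15, 3, 256, 256)
dummy_y = torch.randint(0, 2, (15,))
train_loader = DataLoader(TensorDataset(dummy_x, dummy_y), batch_size=15)

model = SeatbeltNet(num_classes=2)      # from the architecture sketch above
criterion = nn.CrossEntropyLoss()
# Adam is an assumption; lr=1e-4 halved every 5 epochs follows the text.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(30):
    for sequences, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(sequences), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                     # halve the learning rate every 5 epochs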
An added benefit of the self-attention layer is that it lets us
visualize the areas in each frame that are weighted more heavily by
the network. This is helpful for verifying that the network is utilizing
appropriate features. Fig. 3 shows these weighted regions tracking
the seatbelt when visualized on the real event videos in the test set.
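One way to produce such a visualization, assuming the SeatbeltNet sketch from Section 4, is to average the attention each spatial position receives and upsample it to the input resolution; this is a hypothetical reconstruction, not necessarily the method used for Fig. 3.

import torch
import torch.nn.functional as F

# frames: a (t, 3, 256, 256) tensor of event frames (assumed input).
frames = torch.randn(15, 3, 256, 256)
model.eval()
with torch.no_grad():
    feats = model.backbone(frames)          # (t, 1280, h, w)
    _, attn = model.attn(feats)             # (t, h*w, h*w) attention matrix
    saliency = attn.mean(dim=1)             # avg attention received per position
    h = w = int(saliency.shape[-1] ** 0.5)  # recover the (square) spatial grid
    heat = saliency.view(-1, 1, h, w)
    heat = F.interpolate(heat, size=frames.shape[-2:], mode="bilinear",
                         align_corners=False)
    # `heat` can now be normalized and alpha-blended over the input frames.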
References
[Chen et al., 2020] Chen, G., Hong, L., Dong, J., Liu, P., Conradt, J., and Knoll, A. (2020). Eddd: Event-based
drowsiness driving detection through facial motion analysis with neuromorphic vision sensor. IEEE Sensors
Journal, 20(11):6170–6181.
[Crandall et al., 2001] Crandall, C. S., Olson, L. M., and Sklar, D. P. (2001). Mortality reduction with air bag
and seat belt use in head-on passenger car collisions. Am. J. Epidemiol., 153(3):219–224.
[Delbrück et al., 2020] Delbrück, T., Hu, Y., and He, Z. (2020). V2E: from video frames to realistic DVS event
camera streams. CoRR, abs/2006.07722.
[Fouda Mbarga et al., 2018] Fouda Mbarga, N., Abubakari, A.-R., Aminde, L. N., and Morgan, A. R. (2018).
Seatbelt use and risk of major injuries sustained by vehicle occupants during motor-vehicle crashes: a sys-
tematic review and meta-analysis of cohort studies. BMC Public Health, 18(1):1413.
[Gallego et al., 2022] Gallego, G., Delbrück, T., Orchard, G., Bartolozzi, C., Taba, B., Censi, A., Leutenegger,
S., Davison, A. J., Conradt, J., Daniilidis, K., and Scaramuzza, D. (2022). Event-based vision: A survey.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(1):154–180.
[Hochreiter and Schmidhuber, 1997] Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory.
Neural Computation, 9(8):1735–1780.
[Jain, 1989] Jain, A. K. (1989). Fundamentals of Digital Image Processing. Englewood Cliffs NJ: Prentice-
Hall.
[McCarthy, 1960] McCarthy, J. (1960). Recursive functions of symbolic expressions and their computation by
machine, Part I. Communications of the ACM, 3(4):184–195.
[Ryan et al., 2021] Ryan, C., O’Sullivan, B., Elrasad, A., Cahill, A., Lemley, J., Kielty, P., Posch, C., and
Perot, E. (2021). Real-time face eye tracking and blink detection using event cameras. Neural Networks,
141:87–97.
[Sandler et al., 2018] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). Mo-
bilenetv2: Inverted residuals and linear bottlenecks. In 2018 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, pages 4510–4520.
[Zhang et al., 2018] Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. (2018). Self-attention generative
adversarial networks. arXiv preprint arXiv:1805.08318.