
MM-TrafficRisk: A Video-based Fleet Management Application for Traffic Risk Prediction, Prevention, and Querying
Minh-Son Dao, Muhamad Hilmil Muchtar Aditya Pradana, and Koji Zettsu

National Institute of Information and Communications Technology, Tokyo, Japan

{dao, hilmil, zettsu}@nict.go.jp

Abstract—This paper introduces MM-TrafficRisk, an innovative fleet management application that harnesses dashcam video data, environmental data, and physiological data to forecast, mitigate, and investigate traffic-risk events while uncovering traffic-risk patterns. To provide a comprehensive overview of this application, we outline its system architecture, encompassing a database, ETL processes, UI/UX components, and a fine-grained text-video search engine. Additionally, we present a groundbreaking two-stage near-miss accident prediction model designed to identify near-miss incidents within dashcam video databases, and the text-to-video search engine, facilitating rapid searches for traffic-risk events based on textual queries. These models and search engines are rigorously evaluated in a controlled laboratory environment to showcase their advantages. Moreover, we highlight several essential functions of the MM-TrafficRisk application through snapshots, emphasizing the collaborative efforts between a government agency and industrial companies to develop and deploy this application in practical settings. We also delve into our future endeavors, focusing on multi-modal deep learning event prediction and the adaptability of our application to Edge AI environments.

Index Terms—traffic-risk forecasting, multi-modal deep learning models, industry-government collaboration, cross-modal search engine

I. INTRODUCTION

The topmost priority for the industry, businesses, and government alike is ensuring transportation safety. Each of these entities focuses on distinct aspects: the industry strives to create products that enhance traveler safety, businesses are concerned with logistics costs, fleet management, and insurance, while the government's primary objective is to formulate policies that safeguard human life and raise awareness about traffic risks for travelers.

In industry, we are currently witnessing the evolution of autonomous vehicles capable of operating without human drivers. These technologies rely on a suite of sensors both within and outside the vehicle, such as dashcams, stereo cameras, and onboard sensors, as well as remote sensors like satellites, CCTV systems, and loop sensors. Additionally, artificial intelligence techniques are essential for extracting real-time insights from the data collected. This, however, results in a substantial operating cost for autonomous vehicles. While the vision of making autonomous vehicles accessible for all purposes is intriguing, it will require more time before these technologies become widely adopted and commonplace worldwide. While awaiting the maturity of autonomous vehicles, individuals have been exploring alternative, convenient, and cost-effective solutions that offer intelligence to assist with transportation needs, both in direct tasks like driving and indirect roles such as coaching and management. One of these solutions involves the development of video-based systems that utilize what are known as "smart cameras." These cameras are equipped with various techniques and can connect with in-cabin and wearable sensors to help drivers respond promptly and appropriately to traffic hazards. The use of video-based safety systems in commercial motor vehicles can help improve road safety and prevent accidents [1], [2]. These systems use cameras to capture footage of the road and the driver's behavior, which can be analyzed to identify risky driving habits and provide feedback to drivers. The implementation of these systems comes with challenges, such as concerns over driver privacy and the cost of equipment and data analysis. However, video-based safety systems have the potential to significantly improve fleet driver safety and reduce the risk of accidents on the road. Additionally, video recorders could be used to capture data that could help in accident investigations and improve safety.

In the pursuit of enhancing transportation safety, governments cannot remain on the sidelines. They have implemented several policies aimed at supporting the advancement of advanced driver assistance systems (ADAS) in commercial motor vehicles, which also have the potential to improve road safety [3]. The federal government has taken steps to address this issue by passing the Infrastructure Investment and Jobs Act (IIJA), which allocates billions of dollars in funding to transportation initiatives and directs the Secretary of Transportation to conduct various motor vehicle safety-related studies. The IIJA includes provisions aimed at addressing driver distraction and mandates the use of driver monitoring systems to minimize or eliminate driver distraction, driver disengagement, automation complacency by drivers, and foreseeable misuse of ADAS. The directive in the IIJA to incorporate privacy and data security safeguards into any regulations or rules relating to the adoption of ADAS highlights the importance of stakeholders
understanding how these technologies work and what privacy risks may arise.

The use of video telematics in commercial motor vehicles can help fleets get smarter by providing data to improve safety and increase efficiency [4]. This data can be used to implement training and safety programs that help reduce accidents and improve driver safety. The telematics market is set to experience explosive growth due to the rise in the adoption of connected car technologies, advancements in wireless communication technology, and the growth of the IoT. Ultimately, the use of these technologies in commercial motor vehicles is important to improve road safety and prevent accidents. By balancing safety, privacy, and transparency, stakeholders can facilitate the safe and responsible deployment of these technologies in the commercial transportation industry.

As this trend continues, we are motivated to leverage multimedia data analytics and AI technology to create a fleet management system that uses dashcam video data and other data collected from wearable and IoT sensors. Unlike current commercial products, our system places a strong emphasis on predicting near-miss accidents in order to help drivers avoid collisions. A near-miss accident, also known as a near accident or close call, refers to an incident that had the potential to cause harm, injury, or property damage, but was ultimately prevented by chance or corrective action.

In our commitment to offer users, particularly fleet managers and driving coaches, a comprehensive understanding of the attributes of traffic risks for enhancing the effectiveness of management, training, and self-learning, we have introduced a sophisticated text-to-video feature. This feature enables users to search for specific traffic-risk events whose content aligns with their textual queries. It seamlessly integrates with a traffic-risk event browsing and search interface, bridging the gap between users' conceptualizations and real-world occurrences. Moreover, this interface empowers users to visualize correlations and co-occurrences among data associated with the same event, facilitating the development of new models for detecting novel traffic-risk events.

In this paper, we present a collaborative effort between a government agency and industrial companies resulting in a fleet management product designed to enhance transportation safety for both individuals and fleet companies. The government agency has focused on developing algorithms and AI models for detecting and identifying traffic-risk events, while the industrial companies have contributed by providing essential infrastructure, conducting user surveys, gathering feedback, supplying necessary data, and offering testing facilities for system evaluation. This paper's contributions are outlined as follows:

1) We highlight the benefits of multi-sector collaboration in research and development, culminating in the introduction of a new product to the market.
2) We introduce two pivotal functions critical for improving transportation safety: traffic-risk prediction, which includes near-miss accident prediction, and retrieval, specifically text-to-video searching.
3) We present a comprehensive fleet management system encompassing system architecture and user interface/user experience (UI/UX) design, meeting the rigorous criteria set by both government and industry stakeholders, bridging the gap between academia and practical applications.

The structure of this paper is as follows: In Section II, we offer a concise overview of commercial products associated with traffic-risk prevention utilizing dashcams. Section III delves into the MM-TrafficRisk System, a collaborative effort involving multiple sectors. Section IV presents experimental results for the system's two primary functions: near-miss accident prediction and text-to-video searching. Finally, in Section V, we draw conclusions and outline future avenues of work.

II. RELATED WORKS

Several technology companies have developed solutions to enhance safety in the trucking industry using various technologies such as Video-Based Safety (VBS), dash cameras, and smart video technology. These technologies provide numerous benefits, such as real-time feedback to drivers, reduced accident rates, and improved driver behavior. They can also identify high-risk drivers, enhance driver coaching and training, and reduce liability for fleets.

However, there are also disadvantages to consider. One major concern is driver privacy, as drivers may feel uncomfortable being constantly monitored. This can lead to potential pushback from drivers who may perceive it as a violation of their privacy. Additionally, the cost of implementing these technologies, including hardware and software, can be expensive for some fleets.

Omnitracs [5] has developed a safety solution for the trucking industry using Video-Based Safety (VBS) technology. This product offers various advantages of VBS technology, including real-time feedback to drivers, decreased accident rates, and better driver behavior. Furthermore, it demonstrates how VBS technology can aid in driver coaching and training, identifying high-risk drivers, and minimizing liability for fleets.

Wexinc [6] offers a useful fleet management tool that enhances driver efficiency and safety through the use of dashcams. These cameras provide valuable insights into driving behaviors and accident causes, and can serve as evidence in the event of an accident. Furthermore, they can reduce accidents by providing feedback to drivers and serving as a deterrent for risky behavior.

Verizon Connect [7] offers a smart video technology that can improve fleet safety. This technology allows for the monitoring of driver behavior and the detection of risky driving habits. Furthermore, it can help to clear drivers who were not at fault in an accident. The use of this technology can lead to improved driver behavior, reduced insurance costs, and better legal protection. However, there are also disadvantages to using smart video technology, such as concerns about driver privacy and potential pushback from drivers who
are uncomfortable being monitored. In addition, the cost of implementing the technology may be significant, including the need for hardware and software investments.

Netradyne [8] is a tech firm that concentrates on creating advanced and efficient safety solutions for business fleets. Their principal product, Driveri, is a video telematics system that employs artificial intelligence and machine learning to assess driver behavior and provide immediate feedback to enhance safety. Additionally, it provides tools for driver training and coaching, as well as fleet management capabilities like GPS tracking, vehicle diagnostics, and route optimization. Furthermore, Netradyne provides other safety solutions, such as a collision avoidance system and a distracted driving detection system.

Omnitracs, Wexinc, Verizon Connect, and Netradyne are some of the technology companies providing safety solutions to the trucking industry. Each company has unique products that utilize VBS, dash cameras, smart video technology, and AI to provide fleet management features, analyze driver behavior, and offer real-time feedback to drivers. While these technologies have many advantages, it is important for fleets to implement them with a focus on driver buy-in and understanding, as well as ensuring compliance with privacy laws and regulations. By doing so, fleets can improve safety and efficiency, reduce accident rates, and provide legal protection for their drivers.

Regrettably, the aforementioned products lack the capabilities of predicting near-miss accidents and providing a finely detailed text-video search engine within their systems. These functionalities have been exclusively developed and incorporated into our own system.

III. MM-TRAFFICRISK SYSTEM

The transportation industry has been rapidly adopting technology-driven solutions to improve fleet management, reduce operational costs, and enhance passenger safety. In this section, we present our work on developing a comprehensive product architecture for a transportation company that integrates various technologies, including UI/UX design, AI models, and data augmentation techniques. Our product includes a user-friendly UI/UX design that enables seamless interaction with fleet managers and enhances the accuracy of the AI models. We developed a dashcam-based near-miss accident prediction model that uses video data to predict potential accidents and prevent them from happening. Additionally, we designed a cross-modal text-video search engine that allows users to search for specific events in the dashcam database using textual queries. Finally, we propose a data augmentation approach to enrich and diversify the training data and enable self-supervised learning for annotation, which enhances the effectiveness of our AI models. Our product provides a robust and comprehensive solution for transportation companies looking to leverage technology to improve fleet management and enhance passenger safety.

A. System Architecture

In this subsection, we provide a detailed explanation of the significant components of the fleet management system. The system has been designed and deployed for a transportation company, with servers located at the company's headquarters and branches. Clients have been installed in every vehicle of the company, including dashcams, wearable sensors, and in-cabin IoT sensors. By providing the transportation company with reliable and timely information about their fleet's operations, including driver behavior analysis, near-miss accident detection, and abnormal event identification, the system aims to enhance the company's overall performance. Figure 1 illustrates the overview of the system architecture.

The system comprises three main parts: the front-end, back-end, and AI services. The front-end is responsible for displaying data and information, as well as allowing users to interact with the system. It is designed to be user-friendly and intuitive, with a dashboard that provides a real-time overview of the fleet's operations. Users can also access more detailed information about individual vehicles and drivers, as well as run reports and analytics.

The back-end is responsible for managing services and carrying out commands. It receives requests from the front-end and processes them to provide the desired information or action. It also returns feedback to the front-end, allowing users to see the result of their actions.

The AI services run the AI workloads, which focus on detecting and searching events as well as data augmentation and annotation. This includes the dashcam-based near-miss accident prediction model and the cross-modal text-video search engine, which allows users to search for specific events from the dashcam database. The AI services also augment data to enrich and diversify the training data and perform self-supervised learning for annotation, which helps improve the accuracy of the AI models over time.

Finally, the system's data storage and component connections are critical. The data storage is designed to be scalable and secure, allowing the system to store large amounts of data from multiple sources. The component connections ensure that all the different parts of the system work together seamlessly, allowing for real-time data processing and analysis.

B. Dashcam-based near-miss accident prediction

A near-miss accident, also known as a near collision, refers to an incident in which a vehicle or pedestrian almost collided with another object or person but was able to avoid it at the last moment. Although these events may not result in property damage or injury, they can serve as warning signs for future accidents. Identifying and analyzing near-miss accidents can provide valuable insights into potential hazards on the road, helping to prevent more serious accidents from occurring in the future. Therefore, it is essential to include near-miss accident detection in transportation safety measures to improve road safety and reduce the risk of accidents.
Fig. 1. System Architecture: Overview
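The three-tier split shown in Figure 1 (front-end, back-end, AI services) can be sketched as a minimal request flow. The route names, service stubs, and returned payloads below are illustrative assumptions for exposition, not the production API.

```python
# Minimal sketch of the Figure 1 request flow: a front-end request is routed
# by the back-end to an AI service, and the result is returned as feedback.
# All route names, stubs, and payloads are hypothetical illustrations.

def predict_near_miss(video_id: str) -> dict:
    """Stub for the dashcam-based near-miss accident prediction service."""
    return {"video_id": video_id, "near_miss": True, "class": "Hitting/Conflicting"}

def search_text_to_video(query: str) -> dict:
    """Stub for the cross-modal text-video search service."""
    return {"query": query, "results": ["clip_0001", "clip_0042"]}

# Back-end: maps front-end requests to AI services and returns feedback.
ROUTES = {
    "near_miss_prediction": predict_near_miss,
    "text_video_search": search_text_to_video,
}

def handle_request(route: str, payload: str) -> dict:
    service = ROUTES.get(route)
    if service is None:
        return {"error": f"unknown route: {route}"}
    return service(payload)

print(handle_request("text_video_search", "truck braking suddenly on highway"))
```

In this sketch, the dispatcher plays the back-end's role of decoupling the dashboard from the AI services, so new detection models can be registered without changing the front-end.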

In this subsection, we introduce our approach for predicting near-miss accidents using dashcam video data. Our two-stage algorithm is designed to be highly effective: in stage one, we focus on detecting the safety area; in stage two, we take the 30 seconds of video data leading up to the current time and apply our classification models to predict what type of near-miss accident might occur. With this approach, we can proactively alert drivers to potential hazards on the road, ultimately reducing the likelihood of accidents and promoting safer driving habits.

In the first stage, Equation 1, as introduced in [9], is utilized to calculate the minimum stopping distance required by a vehicle, considering the present velocity and a coefficient of friction that depends on various factors such as weather and road conditions. The primary goal of this stage is to convert the 3D first-person-view videos into 2D top-view videos using a meter-based coordinate system. This conversion process accurately identifies moving objects, including cars, bicycles, and pedestrians, and also identifies obstacles and road signs with their precise location, velocity, and trajectory information, which is crucial for near-miss accident detection.

d_stop = v^2 / (2µg)    (1)

where
• d_stop: the stopping distance (m)
• v: the initial velocity of the vehicle (m/s)
• µ: the coefficient of friction [10]
• g: the acceleration due to gravity (9.8 m/s^2)

To accomplish the task of converting the 3D first-person-view videos into 2D top-view videos, we employ a series of techniques. First, we use the Deep Hough Transform [11] and YOLOP [12] to detect semantic lines and determine the vanishing point. With the help of lane and road sign detection, we then calculate the extrinsic camera parameters. Using the known intrinsic camera parameters, the dashcam position, and the extrinsic camera parameters, we can calibrate the 3D first-view frame to a 2D top-view image. To estimate the velocity of all objects that appear in the field of view of the dashcam, we utilize the GPS sensors installed in the vehicle to estimate the ego-vehicle velocity, along with the 2D map. This process allows us to accurately measure the location, velocity, and trajectory of objects, such as cars, bicycles, and pedestrians, that are necessary for detecting near-miss accidents. The overview of the first stage of the near-miss accident prediction algorithm is depicted in Figure 2.

An example of the results from stage one is depicted in Figure 3. The left side of the figure shows the current velocity of the ego-truck, indicating a "NOT SAFE" situation due to the sudden braking of the car in front, which resulted in the violation of the safety distance. On the right side of the figure, a 2D map is presented, depicting the traveling objects, with the ego-truck in the center and a green track.

Once the safety area has been breached, an alarm is activated, and a 30-second risk video is generated, which concludes at the point of activation. Subsequently, the second stage of the algorithm is initiated, wherein the S3D model, originally introduced in [13], is employed with modifications to cater to our near-miss accident dataset. The model is trained to extract pertinent features that are required for constructing the classifier.

Eleven types of near-miss accidents, focusing on the nature of their occurrence as shown in Figure 4, have been defined based on the accident and near-miss accident definitions of Tokyo University of Agriculture and Technology (TUAT) [14] and the National Center for Statistics and Analysis (NCSA)^1.

^1 https://www.nhtsa.gov/data
Fig. 2. Near-miss Accident Prediction: An Overview

To ensure accuracy, we also conducted a survey with a transportation company to refine our definitions. Leveraging these definitions, we annotated data collected by the transportation company and re-annotated other open datasets such as DADA [15] and RetroTruck [16].

Since we have extracted the velocity, position, and trajectory of all the objects that appear within the field of view of the dashcam, these pieces of information can be employed as fine-grained features, together with the S3D-based features extracted from the entire video and the road lanes, to develop the classifier. Ultimately, our system is capable of alerting drivers to critical moments that may pose a threat to their safety, while also providing insight into the potential type of accident that could occur if swift and appropriate action is not taken.

C. Cross-modal text-video search engine

In addition to the primary goal of predicting near-miss accidents, the capability to search and query other traffic-risk events that occurred during travel time from the company's storage is also a crucial task. This task serves multiple purposes, including gaining a better understanding of driver behavior and unexpected accidents, and providing evidence for policies and insurance companies. Moreover, it provides valuable information for the company to train or retrain their drivers to respond effectively to various types of traffic risks. By collecting and analyzing this data, we can extend the number of accidents that our model can predict in the future through annotation and retraining processes, ultimately improving the safety of drivers on the road.

To accomplish our objective, we have developed a cross-modal text-video search engine based on the methodology outlined in [17]. The architectural framework of this model is depicted in Figure 5. Within this design, our aim is to generate cross-features that enhance the overall global and local correlations between the visual and textual attributes of events and objects, respectively. This process begins with the extraction of visual, textual, and object-relation features from both video and text inputs, employing a video encoder (e.g., ViViT), an object-relation encoder (e.g., one trained on Visual Genome), and a text encoder (e.g., BERT). These extracted features are then supplied to the (Q, V, K) inputs of a multi-head attention-transformer backbone, along with additional position encoding.

To improve the similarity between an "entity" in text and an "object" in a video while considering object relations from both visual and textual perspectives, we define a dedicated loss function. Upon completion of the training stage, our system gains the capability to retrieve traffic-risk videos that meet the conditions described in the textual queries. For readers seeking a deeper understanding of the algorithm, we invite you to consult the details provided in [17].

D. Data augmentation and self-supervised annotation

Developing near-miss accident prediction models requires advanced technologies to identify and mitigate potential traffic risks. Current methods for identifying risks typically focus on detecting anomalies frame by frame, rather than identifying which participant could cause a collision. This approach is limited by the availability of annotation datasets, which only allow for the detection of anomalies. Near-miss accidents, which are narrowly avoided collisions, are a type of traffic risk that is not differentiated from actual accidents before the collision occurs. In response, we decided to redefine the definition of an accident and re-annotate the DADA-2000 dataset to include near-miss accidents, by extending the duration of the accident and precisely covering all ego-motions during an incident.

To achieve this, we relied on a method that integrates conditional style translation (CST) and a separable 3-dimensional convolutional neural network (S3D) [18]. The CST architecture is used to augment the re-annotation of the DADA-2000 dataset [15], increasing the number of traffic-risk accident videos and generalizing the performance of the video classification model under different types of conditions. The S3D component is useful for video classification to verify dataset re-annotation consistency.

The evaluation of the proposed method yielded a significant improvement in accuracy in cross-validation analysis, with a 10.25% positive margin over the baseline model. The re-annotation produced by the proposed method is valuable for the computer vision community to train models that deliver better traffic-risk classification. We stress the importance of consistent dataset annotation in improving the performance of the classification model to generalize classification in real applications. These findings highlight the need for increased dataset annotation consistency to improve the performance of
classification models in real-world accident videos, which has implications for the development of more reliable and safe self-driving systems.

Fig. 3. Stage 1 Results: An Example of Near-miss Accident Detection

Fig. 4. Near-miss Accident Definitions

E. UI/UX design

The front-end components depicted in Figure 1 aim to provide a comprehensive UI/UX for fleet management tasks. However, due to space limitations, we present only a few snapshots of the system, focusing on statistical reporting, event browsing, and event searching.

As shown in Figure 6, the system includes a statistical reporting feature that generates reports on a daily, weekly, monthly, and yearly basis.

Fig. 5. Text-2-Video: Overview

Fig. 6. The Statistical Report Screen exhibits various statistical reports aimed at aiding managers in comprehending the fleet's status better. For instance, the top-left shows a combined evaluation; the top-middle, data on safe driving; and the top-right, information regarding economical driving.

Another snapshot (Figure 7) depicts a near-miss accident report, which provides rich information on where and how the accident occurred, as well as physiological and in-cabin environment data. The screen also displays a map and in-cabin/out-cabin video shots, with the "orange" part of the

bar under the in-cabin video indicating the time of the near-miss accident. Users can interact with several components to obtain more specific data, such as the driver's behavior before, during, and after the near-miss accident, or how in-cabin environment factors like temperature and luminance can affect driver alertness.

The UI/UX offers a significant and valuable advantage to both regular users and researchers. It enables them to visually perceive the correlations among various factors, including human activities (such as gestures, heart rate, and stress level), in-cabin environmental conditions (like CO2 levels, PM2.5 particulate matter, illumination, and temperature), and external factors affecting the cabin environment (such as traffic congestion, rain, crowded urban areas, and highway conditions).

This wealth of information provides numerous cues and insights for the development of new models or pattern databases related to traffic risk. Additionally, it facilitates the manual annotation of rare events, enabling the successful training of few-shot learning algorithms.

In the scenario illustrated in Figure 8, the user utilizes both tabular conditions and free-style text queries to perform a video search. When the user wishes to view further details about a specific event, the system directs them to the screen shown in Figure 7.

Overall, these UI/UX features enable fleet managers to gain better insight into their operations and make informed decisions to improve safety and efficiency.

IV. EXPERIMENTAL RESULTS

In this section, we present in-lab experimental findings related to two fundamental functions: near-miss accident prediction and text-to-video retrieval. It is worth noting that while we have published the results of these experiments in academic journals and conferences, we are unable to disclose real-world testing-field results due to company confidentiality considerations.

A. Near-Miss Accident Detection

The results presented in this subsection are primarily derived from our publication in [19]. We have re-conducted the tests with certain parameter adjustments and data refinements in comparison to the original experiments.

We utilized the CST-S3D dataset [18] (mentioned in subsection III-D) and the NHW dataset (provided by our industrial partner) for the training and evaluation of our model. For positive samples, which represent traffic-risk events, we allocated 837 videos for training, 324 for validation, and 132 for testing from the CST-S3D dataset. The NHW dataset was exclusively used for testing and contained 231 videos. Conversely, for negative samples, which serve as non-traffic-risk instances, we designated 612 videos for training, 138 for validation, and 69 for testing from the CST-S3D dataset. The NHW dataset contributed 204 videos for testing purposes.

Figure 9 presents the results of near-miss accident detection in the Hitting/Conflicting and Crossing/Perpendicularly
Fig. 7. Near-miss Accident Report: Detailed Screen

Fig. 8. Text-Video Searching

classes. In this context, Hitting/Conflicting refers to a scenario where the ego-car is on the verge of colliding with an object situated directly in front of it, as illustrated in Figure 4. Meanwhile, Crossing/Perpendicularly denotes a conflict in which the ego-car encounters an object crossing its path at a perpendicular angle. It is worth noting that, in reality, certain classes
Fig. 9. Near-Miss Accident Detection Results: Hitting/Conflicting and Crossing/Perpendicularly Classes (copied from [19])

It's worth noting that, in reality, certain classes have very limited samples, and in some cases no samples at all, as is evident in Figure 9. For instance, the NHW dataset, which represents real-world driving scenarios, lacks samples featuring cyclists, pedestrians, and motorcycles in both the Hitting and Crossing classes. This scarcity poses a considerable challenge in fine-tuning the model and underscores the need to augment the limited original dataset to enrich the variety of available samples. Readers seeking more in-depth insights into the experimental results are encouraged to consult the original paper [19].

B. Text-2-Video Retrieval

The results presented in this subsection are primarily derived from our publication in MMM 2023 [17]. We re-conducted the tests with certain parameter adjustments and data refinements compared to the original experiments. In this experiment, we utilized the same dataset as described in subsection , where each video was segmented into 30-second clips consisting of 20 seconds before the occurrence of a traffic-risk event and 10 seconds thereafter. To enrich the dataset, we created captions manually, employing crowd-sourced labeling to generate five distinct captions for each video, ranging from abstract to detailed descriptions. Subsequently, we applied the algorithm outlined in subsection III-D to augment the dataset, expanding its volume, and indexed and stored the augmented dataset in our database. We then activated the text-to-video search engine and enlisted volunteers to query videos using their own text-based queries. Finally, we evaluated the Precision at K (P@K) of each volunteer's search results and reported the average. The results were notably positive, with P@10 at 8, P@50 at 46, P@100 at 95, and P@200 at 194; that is, on average 8 of the top 10 retrieved videos were relevant, 46 of the top 50, and so on.

V. CONCLUSIONS

This paper presented a comprehensive product architecture that integrates various technologies, including UI/UX design, AI models, and data augmentation techniques. The developed product features a user-friendly UI/UX design, a dashcam-based near-miss accident prediction model, and a cross-modal text-video search engine. Additionally, a data augmentation approach was proposed to enrich and diversify the training data and enable self-supervised learning for annotation. The product offers a robust and comprehensive solution for transportation companies seeking to leverage technology to enhance fleet management and passenger safety.

This paper also underscores the achievements resulting from the effective collaboration between a government agency and industry companies. This collaboration has facilitated the swift transition of knowledge generated in a laboratory setting into user-centric products for widespread societal benefit. It has not only helped reduce costs but also enabled a rapid response to genuine market demands and requirements.

The system is currently undergoing beta testing and will soon be released to the transportation company. We are continuously improving the system by incorporating the latest computer vision and AI technologies to better understand driver behavior and to establish correlations between the driver's mental and physical state, the in-cabin environment, and the surrounding conditions outside the cabin. This will enable the system to recognize the "signs" that may lead to accidents and thus enhance traffic safety.

Furthermore, as an outcome of this collaboration, a new integrated device consolidating all previously separate IoT devices used for system evaluation is currently in development. This device will be a distinctive unit equipped with a CPU/GPU, a video camera, IoT capabilities, and Wi-Fi connectivity. Each of these devices will also function as an edge client, actively participating in edge AI initiatives that hold promise for the future of fleet management and Advanced Driver Assistance Systems (ADAS).

ACKNOWLEDGMENT

This R&D includes the results of "Research and development of optimized AI technology by secure data coordination
(JPMI00316)" by the Ministry of Internal Affairs and Communications (MIC), Japan.

REFERENCES

[1] Work Truck Online, "Can video-based safety systems improve fleet driver safety?," https://www.worktruckonline.com/10186709/can-video-based-safety-systems-improve-fleet-driver-safety.
[2] Trucking Info, "NTSB explains how video recorders could have a major impact on safety," https://www.truckinginfo.com/313449/ntsb-explains-how-video-recorders-could-have-a-major-impact-on-safety.
[3] FPF, "FPF Samsara white paper," https://fpf.org/wp-content/uploads/2022/05/FPF-Samsara-White-Paper.pdf.
[4] Automotive Fleet, "Can video telematics get smarter?," https://www.automotive-fleet.com/345812/can-video-telematics-get-smarter.
[5] Omnitracs, "SD20-029," https://www.omnitracs.com/sites/default/files/files/2021-08/SD20-029-Video-BasedSafetyeBook-US-GW-v12.pdf.
[6] WEX Inc., "Dash cameras: a powerful fleet management tool to improve driver efficiency and safety," https://www.wexinc.com/insights/blog/fleet/dash-cameras-a-powerful-fleet-management-tool-to-improve-driver-efficiency-and-safety/.
[7] Verizon Connect, "Smart video: modern fleet safety," https://www.verizonconnect.com/resources/article/smart-video-modern-fleet-safety/.
[8] Netradyne, https://www.netradyne.com/.
[9] M. Sabri and A. Fauza, "Analysis of vehicle braking behaviour and distance stopping," in IOP Conference Series: Materials Science and Engineering, vol. 309, 2019.
[10] J. Mackenzie and R. Anderson, "The potential effects of electronic stability control interventions on rural road crashes in Australia: simulation of real world crashes," in Australasian Road Safety Research, Policing and Education Conference, 2009.
[11] K. Zhao, Q. Han, C.-B. Zhang, J. Xu, and M.-M. Cheng, "Deep Hough transform for semantic line detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 4793–4806, 2022.
[12] D. Wu, M.-W. Liao, W.-T. Zhang, X.-G. Wang, X. Bai, W.-Q. Cheng, and W.-Y. Liu, "YOLOP: You only look once for panoptic driving perception," Machine Intelligence Research, vol. 19, no. 6, pp. 550–562, Nov. 2022.
[13] S. Xie, C. Sun, J. Huang, Z. Tu, and K. P. Murphy, "Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification," in European Conference on Computer Vision, 2017.
[14] R. Matsumi, P. Raksincharoensak, and M. Nagai, "Study on autonomous intelligent drive system based on potential field with hazard anticipation," Journal of Robotics and Mechatronics, vol. 27, no. 1, pp. 5–11, Feb. 2015. [Online]. Available: https://cir.nii.ac.jp/crid/1390001288150507520
[15] J. Fang, D. Yan, J. Qiao, J. Xue, and H. Yu, "DADA: Driver attention prediction in driving accident scenarios," 2023.
[16] S. Haresh, S. Kumar, M. Z. Zia, and Q.-H. Tran, "Towards anomaly detection in dashcam videos," 2020.
[17] D.-D. Pham, M.-S. Dao, and T.-B. Nguyen, "A cross-modal attention model for fine-grained incident retrieval from dashcam videos," in MMM 2023. Berlin, Heidelberg: Springer-Verlag, 2023.
[18] H. Pradana, M.-S. Dao, and K. Zettsu, "Augmenting ego-vehicle for traffic near-miss and accident classification dataset using manipulating conditional style translation," in 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022, pp. 1–8.
[19] H. Pradana, "An end-to-end online traffic-risk incident prediction in first-person dash camera videos," Big Data and Cognitive Computing, vol. 7, no. 3, 2023. [Online]. Available: https://www.mdpi.com/2504-2289/7/3/129