

5G Core PFCP Intrusion Detection Dataset


George Amponis†‡ , Panagiotis Radoglou-Grammatikis†§ , George Nakas† ,
Sotirios Goudos¶ , Vasileios Argyriou∥ , Thomas Lagkas‡ and Panagiotis Sarigiannidis§

Abstract—The rapid evolution of 5G environments introduces several benefits, such as faster data transfer speeds, lower latency and energy efficiency. However, this situation also brings critical cybersecurity issues, such as the complex and increased attack surface, privacy concerns and the security of the 5G core network functions. Therefore, it is evident that the role of intrusion detection mechanisms empowered with Artificial Intelligence (AI) models is crucial. To this end, in this paper, we introduce a labelled security dataset called 5GC PFCP Intrusion Detection Dataset. This dataset includes a set of network flow statistics that can be used by AI detection models to recognise cyberattacks against the Packet Forwarding Control Protocol (PFCP). PFCP is used for the N4 interface between the Session Management Function (SMF) and the User Plane Function (UPF) in the 5G core. In particular, four PFCP attacks are investigated in this paper, including the relevant network traffic data in terms of pcap files and the Transmission Control Protocol (TCP)/Internet Protocol (IP) and application-layer statistics. This dataset is already publicly available in IEEE Dataport and Zenodo.

Index Terms—5G, Artificial Intelligence, Cybersecurity, Intrusion Detection, PFCP

* This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 952672 (SANCUS).
† G. Amponis, P. Radoglou-Grammatikis and G. Nakas are with K3Y Ltd, Sofia 1612, Bulgaria - E-Mail: {gamponis, pradoglou, gnakas}@k3y.bg
‡ G. Amponis and T. Lagkas are with the Department of Computer Science, International Hellenic University, Kavala Campus, 65404, Greece - E-Mail: {geaboni, tlagkas}@cs.ihu.gr
§ P. Radoglou-Grammatikis and P. Sarigiannidis are with the Department of Electrical and Computer Engineering, University of Western Macedonia, Kozani 50100, Greece - E-Mail: {pradoglou, psarigiannidis}@uowm.gr
¶ S. Goudos is with the Department of Physics, Aristotle University of Thessaloniki, Thessaloniki, 54124, Greece - E-Mail: {sgoudo}@physics.auth.gr
∥ V. Argyriou is with the Department of Networks and Digital Media, Kingston University London, Penrhyn Road, Kingston upon Thames, Surrey KT1 2EE, UK - E-Mail: vasileios.argyriou@kingston.ac.uk

I. INTRODUCTION

In today's communication landscape, there is a growing demand for secure and reliable connections with high-speed, high-throughput capabilities between the User Equipment (UE) and the Data Network (DN). The 5G core architecture, which follows the 3rd Generation Partnership Project (3GPP) network specifications, offers faster connectivity, lower latency, higher bit rates, and improved network reliability. This technology is crucial to support critical applications such as the Internet of Things (IoT) and industrial use cases targeting pivotal infrastructures [1], [2]. However, several components and interfaces of the Next-Generation Radio Access Network (NG-RAN) and the 5G core itself are vulnerable to attacks, which can potentially disrupt end-to-end communication services.

While a particular emphasis has been given to the security of the NG-RAN, there are not many studies investigating the security issues of the 5G core. In this paper, we focus on the security issues of the Packet Forwarding Control Protocol (PFCP), which is utilised in the N4 interface between the Session Management Function (SMF) and the User Plane Function (UPF) in the 5G core. In particular, based on the PFCP attacks investigated in our previous work in [3], in this paper, we introduce a labelled intrusion detection dataset, called 5GC PFCP Intrusion Detection Dataset, which was generated in the context of the H2020 SANCUS project, a collaborative research initiative funded by the European Union (EU) to enhance the security of 5G networks. This security dataset can fully support the development of Artificial Intelligence (AI)-powered Intrusion Detection and Prevention Systems (IDPS) against these attacks.

The proposed dataset is available in IEEE Dataport and Zenodo and can support the development of IDPS that adopt Machine Learning (ML) and Deep Learning (DL) methods. We construct this dataset by following the methodological framework of A. Gharib et al. [4]. Therefore, our dataset is characterised by eleven main attributes: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Anonymity, (i) Heterogeneity, (j) Feature Set and (k) Metadata. Accordingly, the contributions of this paper are summarised as follows:

• 5G Core Testbed and PFCP Attacks: A virtualised 5G environment was implemented in order to investigate PFCP attacks against the 5G core. Four PFCP-related attacks are examined, targeting the communication between SMF and UPF.
• 5GC PFCP Intrusion Detection Dataset: Based on the previous PFCP attacks, a new labelled security dataset is implemented and shared publicly in order to support the development of AI solutions for intrusion detection. This dataset is available in IEEE Dataport¹ and Zenodo².

¹ https://ieee-dataport.org/documents/5gc-pfcp-intrusion-detection-dataset-0
² https://zenodo.org/record/7888347#.ZFejbNJBxhE


Based on the previous remarks, the rest of this paper is organised as follows. Section II discusses the 5G testbed and the relevant attacks used to compose the 5GC PFCP Intrusion Detection Dataset. Section III provides the structure of the dataset. Section IV summarises the features and the balanced files that can be utilised by ML and DL methods. Finally, Section V gives the concluding remarks of the paper.

II. 5G TESTBED AND PFCP ATTACKS

As depicted in Fig. 1, an experimental 5G testbed was used to develop the 5GC PFCP Intrusion Detection Dataset, including the following 5G network functions: the Network Slice Selection Function (NSSF), the Network Exposure Function (NEF), the Network Repository Function (NRF), the Policy Control Function (PCF), the Unified Data Management (UDM), the Application Function (AF), the Authentication Server Function (AUSF), the Access and Mobility Management Function (AMF), the SMF and the UPF. Moreover, a virtualised UE, a virtualised gNodeB (gNB) and an attacker instance impersonating a maliciously instantiated SMF were used to generate the dataset. This testbed was implemented utilising Open5GS as the cellular core [5] and UERANSIM as the NG-RAN. Thus, the following PFCP attacks were emulated in a coordinated manner in order to construct the dataset.

PFCP Session Establishment DoS Attack: The aim of this attack is to exhaust the resources of the UPF by inundating it with genuine Session Establishment Requests and Heartbeat Requests. This could potentially hinder the 5G core's ability to create new Protocol Data Unit (PDU) sessions between clients and the DN. The attack is executed on the N4 interface and could affect the intermediate interfaces as well. To evade detection, a unique Session Endpoint Identifier (SEID) is generated for every session establishment request.

PFCP Session Deletion DoS Attack: The goal of this attack is to disconnect a specific UE from the DN. The script focuses on PDU sessions between clients and the DN in such a way that only the DN is disconnected, while the UE remains connected to the NG-RAN or the core network. The attack is executed on the N4 interface, and its impact is noticeable on the N6 interface. The only way to restore the connection of an affected UE is to re-initiate the session, either by restarting the session or entering the coverage area of another gNB. In such cases, a new SEID is assigned to the UE's PDU session, rendering the attack ineffective.

PFCP Session Modification Flood Attack (DROP Apply Action Field Flags): The objective of this attack is to invalidate packet handling rules for a specific session, leading to the disassociation of a targeted UE from the DN. When successful, the Forwarding Action Rule (FAR) rules that contain the base station's Tunnel Endpoint Identifier (TEID) and IP address are removed from the UPF. This action cuts off the General Packet Radio Service (GPRS) Tunneling Protocol (GTP) tunnel for the subscriber's downlink data, depriving them of internet connectivity. However, the GPRS Tunneling Protocol - User Plane (GTP-U) tunnel can be restored by transmitting the necessary data to the UPF. Similar to other PFCP-based attacks, this script focuses on the PDU sessions between the clients and the DN in such a way that only the DN is disconnected, and the UE remains connected to the 5G RAN or the core network. The attack is executed on the N4 interface and may affect the N6 interface.

PFCP Session Modification Flood Attack (DUPL Apply Action Field Flag): The aim of this attack is to utilise the DUPL flag in the Apply Action field to compel the UPF to replicate rules for the session, generating multiple paths for the same data from a single source. This can result in undefined behaviour in the N6 interface and/or cause traffic to be duplicated during transmission to the DN. Additionally, this attack may be part of a larger scheme to carry out a Distributed Denial of Service (DDoS) attack against hosts within the DN, while also overwhelming the UPF's resources to forward outgoing packets to external hosts outside the 5G core. By amplifying the number of transmitted packets per active user, a malicious entity can create an almost passive attack vector that can be effortlessly scaled to impact the traffic of numerous subscribers, exponentially draining the packet handling resources of the UPF.

The attacks were performed in the following order. First, on Wednesday, October 5, 2022, the PFCP Session Establishment DoS Attack was performed for four hours. Continuing, on Thursday, October 13, 2022, the PFCP Session Deletion DoS Attack was performed for four hours. Then, on Tuesday, November 1, 2022, the PFCP Session Modification DoS Attack with the DROP flag was performed for four hours. Finally, on Tuesday, November 22, 2022, the PFCP Session Modification DoS Attack with the DUPL flag in the Apply Action field was performed for four hours.

III. DATASET STRUCTURE

The previous PFCP attacks were carried out using security tools, such as scapy, pcap-splitter, editcap and CICFlowMeter. Each attack is provided by a 7z/zip file that contains the network traffic and flow statistics for each entity involved. Specifically, each 7z/zip file includes (a) pcap files for each entity, (b) Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics in a Comma-Separated Values (CSV) format, and (c) PFCP flow statistics for each entity, utilising different timeout values (namely 15, 20, 60, 120, and 240 seconds). The TCP/IP network flow statistics were generated using CICFlowMeter, while the PFCP flow statistics were produced using a Custom PFCP Flow Generator.

Based on the aforementioned remarks, the dataset consists of the following 7z/zip files:

• Balanced PFCP APP Layer.7z/zip: It includes the balanced CSV files from the Custom PFCP Flow Generator that may be used to train ML and DL algorithms. Each folder includes a different subfolder for the corresponding flow timeout value used by the Custom PFCP Flow Generator.
• Balanced TCP-IP Layer.7z/zip: It includes the balanced CSV files from CICFlowMeter that may be used to train ML and DL algorithms. Each folder includes a different subfolder for the corresponding flow timeout value used by CICFlowMeter.


Figure 1. 5G Testbed used to generate the 5GC PFCP Intrusion Detection Dataset. (Reference points: N1: UE - AMF, N2: gNB - AMF, N3: gNB - UPF, N4: SMF - UPF, N5: PCF - AF, N6: UPF - DN, N7: SMF - PCF, N8: AMF - UDM, N9: UPF - UPF, N10: SMF - UDM, N11: AMF - SMF, N12: AUSF - AMF, N13: AUSF - UDM, N14: AMF - AMF, N15: AMF - PCF, N22: AMF - NSSF.)

• PFCP Session Deletion DoS Attack.7z/zip: It includes the pcap files and CSV files related to the PFCP Session Deletion Denial of Service (DoS) Attack.
• PFCP Session Establishment DoS Attack.7z/zip: It includes the pcap files and CSV files related to the PFCP Session Establishment Flood DoS Attack.
• PFCP Session Modification DoS Attack.7z/zip: It includes the pcap files and CSV files related to the PFCP Session Modification DoS Attack.

IV. FEATURES AND BALANCED FILES

Table I
APPLICATION LAYER PFCP FLOW STATISTICS – FEATURES

Feature | Description
flow ID | Flow identifier
source IP | Source IP address
destination IP | Destination IP address
source port | Source port number
destination port | Destination port number
protocol | Network layer protocol
duration | Length of time flow was active
fwd packets | Number of forward packets
bwd packets | Number of backward packets
PFCPHeartbeatRequest counter | Number of PFCP Heartbeat Request messages
PFCPHeartbeatResponse counter | Number of PFCP Heartbeat Response messages
PFCPPFDManagementRequest counter | Number of PFCP PFD Management Request messages
PFCPPFDManagementResponse counter | Number of PFCP PFD Management Response messages
PFCPAssociationSetupRequest counter | Number of PFCP Association Setup Request messages
PFCPAssociationSetupResponse counter | Number of PFCP Association Setup Response messages
PFCPAssociationUpdateRequest counter | Number of PFCP Association Update Request messages
PFCPAssociationUpdateResponse counter | Number of PFCP Association Update Response messages
PFCPAssociationReleaseRequest counter | Number of PFCP Association Release Request messages
PFCPAssociationReleaseResponse counter | Number of PFCP Association Release Response messages
PFCPVersionNotSupportedResponse counter | Number of PFCP Version Not Supported Response messages
PFCPNodeReportRequest counter | Number of PFCP Node Report Request messages
PFCPNodeReportResponse counter | Number of PFCP Node Report Response messages
PFCPSessionSetDeletionRequest counter | Number of PFCP Session Set Deletion Request messages
PFCPSessionSetDeletionResponse counter | Number of PFCP Session Set Deletion Response messages
PFCPSessionEstablishmentRequest counter | Number of PFCP Session Establishment Request messages
PFCPSessionEstablishmentResponse counter | Number of PFCP Session Establishment Response messages
PFCPSessionModificationRequest counter | Number of PFCP Session Modification Request messages
PFCPSessionModificationResponse counter | Number of PFCP Session Modification Response messages
PFCPSessionDeletionRequest counter | Number of PFCP Session Deletion Request messages
PFCPSessionDeletionResponse counter | Number of PFCP Session Deletion Response messages
PFCPSessionReportRequest counter | Number of PFCP Session Report Request messages
PFCPSessionReportResponse counter | Number of PFCP Session Report Response messages
Downlink counter | Number of downlink packets
Uplink counter | Number of uplink packets
Bidirectional traffic counter | Number of bidirectional traffic packets
Label | Flow label (e.g., benign or malicious)

Two balanced versions of the dataset have been created for the TCP/IP flow statistics generated by CICFlowMeter (Table II) and the PFCP flow statistics produced by the Custom PFCP Flow Generator (Table I). Each version is balanced; therefore, they contain an equal number of samples for each of the classes. The five classes and their labels are available in Table III. The two balanced versions are summarised by the files Balanced_PFCP_APP_Layer.7z/zip and Balanced_TCP-IP_Layer.7z/zip. The first file contains the PFCP flow statistics, while the second file includes the TCP/IP flow statistics. Each file also contains a set of sub-folders for each flow timeout value. In particular, five values were used: 15s, 20s, 60s, 120s, and 240s. In addition, there are two sub-subfolders, namely Training and Testing. Each of these sub-subfolders contains a .csv file named Training_X.csv and Testing_X.csv, respectively, where X is the flow timeout value. The split ratio is 70%-30% for the training and testing processes, respectively. In addition, the splitting is stratified, meaning that the same percentage of samples of each class is present in each training and testing dataset. The number of flows per flow timeout value for Balanced_TCP-IP_Layer.7z/zip is presented in Table IV, while the number of flows per flow timeout value for Balanced_PFCP_APP_Layer.7z/zip is provided in Table V.

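The application-layer counters of Table I can also be approximated directly from the released pcap files. The sketch below is not the authors' Custom PFCP Flow Generator; it is a simplified illustration that groups PFCP messages (UDP port 8805) by endpoint pair and tallies a handful of message types by reading the message-type octet of the PFCP header, with the type values taken from 3GPP TS 29.244. Flow timeouts and the example file name are omitted or assumed.

```python
# Simplified illustration of deriving per-flow PFCP message counters (in the
# spirit of Table I) from a pcap. Not the authors' Custom PFCP Flow Generator.
from collections import Counter, defaultdict
from scapy.all import rdpcap, IP, UDP

# Subset of PFCP message-type values from 3GPP TS 29.244.
MSG_NAMES = {1: "HeartbeatRequest", 2: "HeartbeatResponse",
             50: "SessionEstablishmentRequest",
             52: "SessionModificationRequest",
             54: "SessionDeletionRequest"}

def pfcp_counters(pcap_path: str) -> dict:
    """Count PFCP message types per (source IP, destination IP) pair."""
    flows = defaultdict(Counter)
    for pkt in rdpcap(pcap_path):
        if IP in pkt and UDP in pkt and 8805 in (pkt[UDP].sport, pkt[UDP].dport):
            payload = bytes(pkt[UDP].payload)
            if len(payload) >= 2:
                msg_type = payload[1]  # second octet of the PFCP header
                key = (pkt[IP].src, pkt[IP].dst)
                flows[key][MSG_NAMES.get(msg_type, f"type_{msg_type}")] += 1
    return flows

# Example usage (hypothetical capture name from one of the attack archives):
# print(pfcp_counters("smf.pcap"))
```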

Table II
TCP/IP NETWORK FLOW STATISTICS – FEATURES

Feature | Description
Flow ID | ID of the flow
Src IP | Source IP address
Src Port | Source TCP/UDP port
Dst IP | Destination IP address
Dst Port | Destination TCP/UDP port
Protocol | Protocol related to the flow
Timestamp | Flow timestamp
Flow Duration | Duration of the flow in microseconds
Tot Fwd Pkts | Total packets in forward direction
Tot Bwd Pkts | Total packets in backward direction
TotLen Fwd Pkts | Total size of packets in forward direction
TotLen Bwd Pkts | Total size of packets in backward direction
Fwd Pkt Len Max | Maximum size of packet in forward direction
Fwd Pkt Len Min | Minimum size of packet in forward direction
Fwd Pkt Len Mean | Mean size of packet in forward direction
Fwd Pkt Len Std | Standard deviation size of packet in forward direction
Bwd Pkt Len Max | Maximum size of packet in backward direction
Bwd Pkt Len Min | Minimum size of packet in backward direction
Bwd Pkt Len Mean | Mean size of packet in backward direction
Bwd Pkt Len Std | Standard deviation size of packet in backward direction
Flow Byts/s | Number of flow bytes per second
Flow Pkts/s | Number of flow packets per second
Flow IAT Mean | Mean time between two packets sent in the flow
Flow IAT Std | Standard deviation time between two packets sent in the flow
Flow IAT Max | Maximum time between two packets sent in the flow
Flow IAT Min | Minimum time between two packets sent in the flow
Fwd IAT Tot | Total time between two packets sent in the forward direction
Fwd IAT Mean | Mean time between two packets sent in the forward direction
Fwd IAT Std | Standard deviation time between two packets sent in the forward direction
Fwd IAT Max | Maximum time between two packets sent in the forward direction
Fwd IAT Min | Minimum time between two packets sent in the forward direction
Bwd IAT Tot | Total time between two packets sent in the backward direction
Bwd IAT Mean | Mean time between two packets sent in the backward direction
Bwd IAT Std | Standard deviation time between two packets sent in the backward direction
Bwd IAT Max | Maximum time between two packets sent in the backward direction
Bwd IAT Min | Minimum time between two packets sent in the backward direction
Fwd PSH Flags | Number of Forward PSH flags
Bwd PSH Flags | Number of Backward PSH flags
Fwd URG Flags | Number of Forward URG flags
Bwd URG Flags | Number of Backward URG flags
Fwd Header Len | Length of Forward header
Bwd Header Len | Length of Backward header
Fwd Pkts/s | Number of Forward packets per second
Bwd Pkts/s | Number of Backward packets per second
Pkt Len Min | Minimum packet length
Pkt Len Max | Maximum packet length
Pkt Len Mean | Mean packet length
Pkt Len Std | Standard deviation of packet length
Pkt Len Var | Variance of packet length
FIN Flag Cnt | Number of FIN flags
SYN Flag Cnt | Number of SYN flags
RST Flag Cnt | Number of RST flags
PSH Flag Cnt | Number of PSH flags
ACK Flag Cnt | Number of ACK flags
URG Flag Cnt | Number of URG flags
CWE Flag Count | Number of CWE flags
ECE Flag Cnt | Number of ECE flags
Down/Up Ratio | Down/Up ratio
Pkt Size Avg | Average packet size
Fwd Seg Size Avg | Average Forward segment size
Bwd Seg Size Avg | Average Backward segment size
Fwd Byts/b Avg | Average Forward bytes per bit
Fwd Pkts/b Avg | Average Forward packets per bit
Fwd Blk Rate Avg | Average Forward block rate
Bwd Byts/b Avg | Average Backward bytes per bit
Bwd Pkts/b Avg | Average Backward packets per bit
Bwd Blk Rate Avg | Average Backward block rate
Subflow Fwd Pkts | Number of Forward subflow packets
Subflow Fwd Byts | Number of Forward subflow bytes
Subflow Bwd Pkts | Number of Backward subflow packets
Subflow Bwd Byts | Number of Backward subflow bytes
Init Fwd Win Byts | Initial Forward window bytes
Init Bwd Win Byts | Initial Backward window bytes
Fwd Act Data Pkts | Number of Forward active data packets
Fwd Seg Size Min | Minimum Forward segment size
Active Mean | Mean active time
Active Std | Standard deviation of active time
Active Max | Maximum active time
Active Min | Minimum active time
Idle Mean | Mean idle time
Idle Std | Standard deviation of idle time
Idle Max | Maximum idle time
Idle Min | Minimum idle time
Label | Label

Table III
CLASSES OF THE 5GC PFCP INTRUSION DETECTION DATASET

Class | Label
Normal flow | Normal
PFCP Session Establishment Flood attack flow | Mal Estab
PFCP Session Deletion Flood attack flow | Mal Del
PFCP Session Modification Flood attack (DROP Apply Action Field Flags) flow | Mal Mod
PFCP Session Modification Flood attack (DUPL Apply Action Field Flag) flow | Mal Mod2

Table IV
NUMBER OF THE TCP/IP FLOWS (GENERATED BY CICFLOWMETER) FOR THE DIFFERENT FLOW TIMEOUT VALUES IN THE BALANCED FILES

Training
Timeout | Normal | Mal Estab | Mal Del | Mal Mod | Mal Mod2
15s | 1439 | 1440 | 1440 | 1440 | 1440
20s | 1439 | 1440 | 1440 | 1440 | 1440
60s | 485 | 485 | 485 | 485 | 485
120s | 260 | 261 | 260 | 260 | 261
240s | 133 | 134 | 133 | 134 | 134

Testing
Timeout | Normal | Mal Estab | Mal Del | Mal Mod | Mal Mod2
15s | 618 | 617 | 617 | 617 | 617
20s | 618 | 617 | 617 | 617 | 617
60s | 208 | 208 | 208 | 208 | 208
120s | 112 | 111 | 112 | 112 | 111
240s | 58 | 57 | 58 | 57 | 57

Table V
NUMBER OF THE PFCP FLOWS (GENERATED BY THE CUSTOM PFCP FLOW GENERATOR) FOR THE DIFFERENT FLOW TIMEOUT VALUES IN THE BALANCED FILES

Training
Timeout | Normal | Mal Estab | Mal Del | Mal Mod | Mal Mod2
15s | 1103 | 1104 | 1104 | 1104 | 1104
20s | 666 | 667 | 666 | 666 | 667
60s | 222 | 223 | 222 | 223 | 223
120s | 110 | 111 | 110 | 111 | 111
240s | 68 | 69 | 68 | 69 | 69

Testing
Timeout | Normal | Mal Estab | Mal Del | Mal Mod | Mal Mod2
15s | 474 | 473 | 473 | 473 | 473
20s | 286 | 285 | 286 | 286 | 285
60s | 96 | 95 | 96 | 95 | 95
120s | 48 | 47 | 48 | 47 | 47
240s | 30 | 29 | 30 | 29 | 29

V. CONCLUSIONS

The communication points in the 5G core can lead to various security weaknesses that are investigated by both academia and industry. In this paper, we present the 5GC PFCP Intrusion Detection Dataset, which was generated in the context of the H2020 SANCUS project. This security dataset is publicly available in IEEE Dataport and Zenodo and can be utilised for the development of AI-powered intrusion detection and prevention mechanisms. It includes the network traffic data (i.e., pcap files) and labelled TCP/IP and PFCP flow statistics related to four PFCP cyberattacks, namely (a) PFCP Session Establishment DoS Attack, (b) PFCP Session Deletion DoS Attack, (c) PFCP Session Modification Flood attack (DROP Apply Action Field Flags) and (d) PFCP Session Modification Flood attack (DUPL Apply Action Field Flag).

REFERENCES

[1] P. Radoglou-Grammatikis, P. Sarigiannidis, C. Dalamagkas, Y. Spyridis, T. Lagkas, G. Efstathopoulos, A. Sesis, I. L. Pavon, R. T. Burgos, R. Diaz, A. Sarigiannidis, D. Papamartzivanos, S. A. Menesidou, G. Ledakis, A. Pasias, T. Kotsiopoulos, A. Drosou, O. Mavropoulos, A. C. Subirachs, P. P. Sola, J. L. Domínguez-García, M. Escalante, M. M. Alberto, B. Caracuel, F. Ramos, V. Gkioulos, S. Katsikas, H. C. Bolstad, D.-E. Archer, N. Paunovic, R. Gallart, T. Rokkas, and A. Arce, "SDN-Based Resilient Smart Grid: The SDN-microSENSE Architecture," Digital, vol. 1, no. 4, pp. 173–187, 2021. [Online]. Available: https://www.mdpi.com/2673-6470/1/4/13
[2] G. Amponis, P. Radoglou-Grammatikis, T. Lagkas, S. Ouzounidis, M. Zevgara, I. Moscholios, S. Goudos, and P. Sarigiannidis, "Towards securing next-generation networks: Attacking 5G core/RAN testbed," in 2022 Panhellenic Conference on Electronics & Telecommunications (PACET), 2022, pp. 1–4.
[3] G. Amponis, P. Radoglou-Grammatikis, T. Lagkas, W. Mallouli, A. Cavalli, D. Klonidis, E. Markakis, and P. Sarigiannidis, "Threatening the 5G core via PFCP DoS attacks: the case of blocking UAV communications," EURASIP Journal on Wireless Communications and Networking, vol. 2022, no. 1, p. 124, Dec 2022. [Online]. Available: https://doi.org/10.1186/s13638-022-02204-5
[4] A. Gharib, I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "An evaluation framework for intrusion detection dataset," in 2016 International Conference on Information Science and Security (ICISS). IEEE, 2016, pp. 1–6.
[5] P. Kiri Taksande, P. Jha, A. Karandikar, and P. Chaporkar, "Open5G: A Software-Defined Networking Protocol for 5G Multi-RAT Wireless Networks," in 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), 2020, pp. 1–6.

A Real-time Chaos-based Audio Encryption Scheme



Ioannis Kafetzis∗ , Christos Volos∗ , Hector E. Nistazakis† , Sotirios Goudos‡ , Nikolaos G. Bardis§
∗: Laboratory of Nonlinear Systems - Circuits & Complexity, Physics Department, Aristotle University of Thessaloniki,
Thessaloniki, Greece. {kafetzis,volos}@physics.auth.gr
† : Section of Electronic Physics and Systems, Department of Physics, National and Kapodistrian University of Athens,
Panepistimiopolis Zografou 15784, Athens, Greece, enistaz@phys.uoa.gr


‡ : Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece, sgoudo@physics.auth.gr
§ : Hellenic Army Academy, Department of Mathematics and Engineering Science, Vari - 16673, Greece, bardis@ieee.org

Abstract—This work introduces a chaos-based audio encryption scheme, along with its implementation, which works in real time. Initially, a novel chaotic map is introduced, and using this map a Pseudo-Random Bit Generator is defined. The encryption scheme is defined based on the above and utilizes parallel computing to encrypt recorded audio "on the fly", rather than collecting the complete recording and encrypting it afterwards, making the method suitable for real-time encryption. The security of the proposed method is evaluated via statistical tests. Finally, an open-source implementation of the proposed method using Python is given.

Index Terms—chaos, encryption, audio, secure communications

I. INTRODUCTION

Wireless voice communication is becoming increasingly frequent. This in turn results in increased requirements for guaranteeing secure communication, which is usually achieved with the use of encryption schemes. Traditional encryption algorithms, such as the Data Encryption Standard (DES) [1] and the Advanced Encryption Standard (AES) [2], struggle with encrypting signals, such as images or audio, due to their long computational time and power consumption [3]. It is thus important that the encryption schemes are not only secure, but also fast and computationally efficient. Chaos-based encryption schemes have become a constant point of research interest, as they can fulfill all requirements.

Chaotic systems are deterministic dynamical systems, both continuous and discrete, that demonstrate extreme sensitivity to initial conditions [4]. This implies that a small alteration of the initial condition results in a completely different trajectory. Hence, chaotic systems serve as computationally efficient sources of deterministic entropy. When used for encryption purposes, the chaotic system is utilized to generate a Pseudo-Random Bit Generator (PRBG). The pseudo-random bits generated this way are used to mask the original signal [5], [6]. Using the above ideas, chaotic systems have been used as the basis for encryption methods for different signals, such as image [7], [8], text [9], video [10] and audio [11], [12]. This idea serves as the basis for most of the chaos-based audio encryption methods [13].

Several chaotic audio encryption methods have been proposed. In [3] and [14], audio encryption schemes based on chaotic maps and DNA encoding are proposed. Synchronization of fractional-order chaotic systems is used in [15] for the same purpose. In [16], discrete chaotic systems are used to create encryption schemes for mono and stereo audio signals. A chaos-based audio encryption system focusing on speech is proposed in [17]. The above works assume that the complete audio signal is available before encryption, which can be a slowing factor for real-time encryption.

In this work, we propose an efficient, chaos-based, audio encryption method that works in real time. Instead of acquiring the complete audio signal and then encrypting it, chaos-based audio encryption is performed on the fly and in parallel, which reduces the execution time. The security of the proposed method is validated through statistical testing. Finally, the proposed method is implemented in Python for encryption of mono audio signals, as a proof of concept. The implementation is open-source and available at https://github.com/iokaf/Chaotic-Audio-Encryption.

II. METHODS

A. Design and Study of the Chaotic Map

The discrete map proposed in this work is

    x_{k+1} = r (sin(π x_k) + rem(1000 x_k, 1))        (1)

where rem is the remainder operation. This map draws inspiration from [18], where it is proved that using the remainder operation results in an increased Lyapunov exponent. The behavior of the map is studied using the standard graphical methods, that is, the bifurcation [19], Lyapunov exponent [20] and Cobweb plots, the data for which are calculated using the "Chaos-Maps" Python library [21].

B. Design and Validation of the PRBG

To obtain bits from each point x of one trajectory of the map (1), the value

    mod(10^10 x, 1)        (2)

is computed and subsequently compared with a threshold of 0.5. The corresponding bit is 1 when (2) is greater than the threshold, and 0 otherwise. The randomness of the proposed method is validated through a set of statistical tests, defined and implemented by the National Institute of Standards and Technology (NIST) [22].

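To make the bit-stream construction concrete, the following short Python sketch, written against the equations above rather than the authors' released code, iterates map (1) and extracts one bit per iterate according to rule (2). The key values x0 = 0.4 and r = 6.5 are the ones reported for the NIST evaluation later in the paper; the handling of possible negative iterates via an absolute value is an assumption of this sketch.

```python
# Sketch of the PRBG of Eqs. (1)-(2): iterate the map and threshold
# mod(1e10 * x, 1) at 0.5 to obtain one pseudo-random bit per step.
# This follows the equations in the text, not the reference implementation.
import math

def chaotic_map(x: float, r: float) -> float:
    """One iteration of Eq. (1): x_{k+1} = r*(sin(pi*x_k) + rem(1000*x_k, 1))."""
    return r * (math.sin(math.pi * x) + math.fmod(1000.0 * x, 1.0))

def prbg_bits(x0: float, r: float, n_bits: int) -> list[int]:
    """Generate n_bits pseudo-random bits from the trajectory starting at x0."""
    bits, x = [], x0
    for _ in range(n_bits):
        x = chaotic_map(x, r)
        # abs() is an assumption for iterates that leave the positive range.
        bits.append(1 if math.fmod(abs(x) * 1e10, 1.0) > 0.5 else 0)
    return bits

# Example usage with the key values used in the paper's NIST evaluation:
print(prbg_bits(x0=0.4, r=6.5, n_bits=16))
```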

For the tests to be executed, a total of 100·10^6 bits are generated using (2) and are used as input to the statistical test suite, as suggested in the documentation.

C. Audio Encryption and Decryption

This work considers mono audio signals, that is, audio signals that consist of only one channel. Such signals can be considered as one-dimensional time-series, with integer values in the interval [−2^15, 2^15], which are converted to their 16-bit binary representation. In order to encrypt such a time-series, the proposed PRBG (2) is used to generate 16 random bits for each point of the original time-series. The encrypted value is obtained by performing the Exclusive OR (XOR) operation element-wise between the bits generated via the PRBG and the binary representation of the original time-series.

For the decryption process, the same random bit sequence is obtained. Then the XOR operation is performed between the encrypted signal and the random sequence. This operation results in the original audio signal.

D. Real-Time Implementation

To perform audio encryption in real time, the audio signal is recorded in batches of a standard frequency, here 4000 Hz, and duration, here chosen as 0.5 s. This way, the time-series recorded in each step has fixed length. In parallel to the recording, the PRBG is used to produce the 16 bits needed to encrypt each point of the recorded time-series. When both the audio recording and the bit generation are complete, XOR is performed element-wise, thus obtaining the encrypted audio signal. This process is repeated until the whole desired audio message is recorded and encrypted. Regarding the initial conditions, for the first recording the user selects the initial condition and the parameter. For each subsequent encryption step, the parameter remains the same, and the last point of the trajectory is used as the initial value for the next encryption step. The process is depicted in Fig. 1.

Fig. 1: Block diagram representation of the encryption process.

For the decryption process, the map is initialized with the same initial conditions and parameter as the ones used for encryption. Subsequently, the PRBG is used to generate enough bits for each of the trajectory points to be decrypted. Finally, the element-wise XOR is performed, resulting in the original message.

III. RESULTS

A. Chaotic Map Analysis

The bifurcation diagram of the proposed map (1) is shown in Fig. 2. Additionally, the Lyapunov exponent diagram of the proposed map is shown in Fig. 3. It can be observed that the Lyapunov exponent is constantly positive, indicating that the proposed map (1) is chaotic.

Fig. 2: Bifurcation diagram of the proposed map (1) for initial condition x0 = 0.4.

Fig. 3: Lyapunov exponent of the proposed map (1) for initial condition x0 = 0.4.

To further highlight the complexity of the proposed system (1), the Cobweb plot is also presented in Fig. 4 with the blue line, whereas the green line is the graph of the map (1).

B. PRBG Results

The proposed PRBG successfully passes all of the statistical tests of the NIST suite. The 100·10^6 points are generated using x0 = 0.4 as the initial condition for the chaotic map (1). The results for each of the suite's tests together with the related p-value are shown in Table I. For multiple occurrences of the same test, the last result is stated.

C. Results of the Proposed Audio Encryption Method

The proposed method is tested with a mono audio signal recorded at a frequency of 4000 Hz, which contains a recording of human speech. The original audio signal is depicted in Fig. 5.

Figure 6 represents the result of applying the proposed method. The encrypted audio-wave is homogeneous and does not give away any information about the original signal.


This becomes even more evident when considering the histograms for the original and encrypted audio signals, shown in Fig. 7a and Fig. 7b, respectively.

Fig. 4: Cobweb diagram for the map (1) with initial condition x0 = 0.4 and r = 3.98.

TABLE I: NIST test results for the proposed PRBG. The initial condition for the map (1) is x0 = 0.4 and r = 6.5.

Test | p-value | Passing sequences
Frequency | 0.319084 | 100/100
BlockFrequency | 0.883171 | 100/100
CumulativeSums | 0.419021 | 100/100
Runs | 0.045675 | 99/100
LongestRun | 0.574903 | 97/100
Rank | 0.304126 | 99/100
FFT | 0.759756 | 99/100
NonOverlappingTemplate | 0.474986 | 99/100
OverlappingTemplate | 0.366918 | 99/100
Universal | 0.171867 | 97/100
ApproximateEntropy | 0.026948 | 100/100
RandomExcursions | 0.484646 | 58/61
RandomExcursionsVariant | 0.005833 | 61/61
Serial | 0.236810 | 100/100
LinearComplexity | 0.851383 | 99/100

Fig. 5: The audio-wave for the original audio signal used for evaluation of the proposed method.

Fig. 6: The audio-wave for the encrypted audio signal.

(a) Original Audio (b) Encrypted Audio
Fig. 7: Histograms for the original and encrypted audio signals.

Subsequently, our attention is turned to the key-space analysis. There are two key values to be selected for the proposed encryption method, namely the parameter r and the initial condition x0 for the chaotic map (1). Assuming a typical precision of 16 decimal digits, 10^(2·16) is an upper bound for the key space. Note that 10^(2·16) = 10^32 = (10^3)^10.66 ≈ (2^10)^10.66 > 2^100, which is usually considered as the threshold for an encryption method to be resistant against brute-force attacks [23].

Additionally, the key sensitivity [13] of the proposed method is considered. The same original audio is encrypted with the same parameter value r = 5 and initial conditions x0^(1) = 0.3 and x0^(2) = 0.3 + 10^(−10). Subsequently, the correlation coefficient between the two encrypted audio signals is calculated. A correlation coefficient close to zero indicates key sensitivity. For the above setup, the correlation coefficient was calculated as 2.8·10^(−3), indicating the key sensitivity of the proposed method.

D. Implementation Details

A Graphical User Interface in which the proposed method is implemented is developed using Python. The interface is shown in Fig. 8 and allows both encrypting and decrypting a message. The user can select the initial condition and parameter values for the chaotic map (1). Additionally, they can select where the encrypted audio file is to be stored, and the filename to be used. Subsequently, the user presses "Start" and the audio recording from the computer's microphone begins. Audio is recorded and encrypted until the user presses "Stop". Once "Stop" is pressed, the encrypted segments are merged and the encrypted audio file is saved.

For the decryption process, the user can select the encrypted file. Additionally, the user gets to select the directory and filename where the decrypted message is to be saved. Finally, the user inputs the initial condition and parameters for the chaotic map (1). Pressing "Decrypt" saves the decrypted audio file in the specified location.
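As a companion to Sections II-C and II-D, the following minimal NumPy sketch shows the per-batch masking step: 16 chaotic bits per sample are packed into a 16-bit keystream and XORed with the recorded int16 samples, so that applying the same function twice with the same key (x0, r) restores the original batch. The bit generator of Eqs. (1)–(2) is re-implemented inline so the snippet is self-contained; the synthetic test batch stands in for a 0.5 s recording and the sketch is an illustration under these assumptions, not the released GUI implementation.

```python
# Minimal sketch of the per-batch XOR masking of Sections II-C and II-D.
# Encryption and decryption are the same operation with the same key (x0, r).
import math
import numpy as np

def chaotic_bits(x0: float, r: float, n_bits: int) -> np.ndarray:
    """Bits from Eqs. (1)-(2): iterate the map, threshold mod(1e10*x, 1) at 0.5."""
    bits, x = np.empty(n_bits, dtype=np.uint16), x0
    for k in range(n_bits):
        x = r * (math.sin(math.pi * x) + math.fmod(1000.0 * x, 1.0))
        bits[k] = 1 if math.fmod(abs(x) * 1e10, 1.0) > 0.5 else 0
    return bits

def xor_batch(samples: np.ndarray, x0: float, r: float) -> np.ndarray:
    """XOR int16 audio samples with a 16-bit chaotic keystream (encrypt/decrypt)."""
    bits = chaotic_bits(x0, r, 16 * samples.size).reshape(samples.size, 16)
    keystream = (bits << np.arange(16, dtype=np.uint16)).sum(axis=1, dtype=np.uint64)
    return (samples.astype(np.uint16) ^ keystream.astype(np.uint16)).astype(np.int16)

# Example: a 0.5 s batch at 4000 Hz is 2000 int16 samples; XOR twice restores it.
batch = (np.sin(2 * np.pi * 440 * np.arange(2000) / 4000) * 20000).astype(np.int16)
encrypted = xor_batch(batch, x0=0.4, r=6.5)
assert np.array_equal(xor_batch(encrypted, x0=0.4, r=6.5), batch)
```

In the real-time scheme, each such batch would be encrypted while the next one is being recorded, with the last trajectory point of one batch used as the initial condition of the next.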


Fig. 8: The Graphical User Interface for using the proposed method.

IV. CONCLUSIONS & DISCUSSION

In this work, a system for real-time chaotic audio encryption was proposed. Initially, a novel chaotic map was proposed and its dynamical behavior was studied. Subsequently, a PRBG was defined using the proposed chaotic map. This PRBG was used to determine a chaos-based audio encryption method. The proposed method encrypts audio as it is recorded, rather than recording the audio signal and encrypting it afterwards. For this method to work, single-channel audio was recorded in batches of a given frequency and duration. In parallel to the recording, the PRBG was used to produce the necessary number of random bits to encrypt the recorded audio segment. Then the XOR operation was performed between the recorded segment and the random bit sequence, resulting in an encrypted segment. This process continues until the desired message is recorded. Then, all recorded segments are joined to construct the encrypted signal. The security of the proposed method was confirmed via real-time tests. Finally, to prove the validity of the proposed method, it was implemented in Python. The implementation can be found at https://github.com/iokaf/Chaotic-Audio-Encryption.

This work can be further extended in several ways. Initially, different chaotic maps and PRBGs can be defined and used. Additionally, the method can be extended to stereo audio signals, that is, audio signals with multiple channels. To do this, multiple parallel threads can be defined, one for each channel of the original audio signal. Furthermore, the segments of the proposed method can be shuffled using the chaotic system before they are merged into the final recorded sound, thus adding extra security. Finally, the proposed method can be easily modified to be applicable to higher recording frequencies. To this end, multiple iterations of the PRBG, potentially with different initial conditions, can run in parallel, thus reducing the time required to produce enough random integers for encrypting the signal. This can also result in an increased key space and thus method security.

V. ACKNOWLEDGEMENTS

The authors acknowledge funding from the National and Kapodistrian University of Athens (NKUA), Special Account for Research Grants (SARG).

REFERENCES

[1] D. E. Standard et al., "Data encryption standard," Federal Information Processing Standards Publication, vol. 112, 1999.
[2] D. Selent, "Advanced encryption standard," Rivier Academic Journal, vol. 6, no. 2, pp. 1–14, 2010.
[3] R. I. Abdelfatah, "Audio encryption scheme using self-adaptive bit scrambling and two multi chaotic-based dynamic DNA computations," IEEE Access, vol. 8, pp. 69894–69907, 2020.
[4] C. H. Skiadas and C. Skiadas, Handbook of Applications of Chaos Theory. CRC Press, 2017.
[5] R. B. Naik and U. Singh, "A review on applications of chaotic maps in pseudo-random number generators and encryption," Annals of Data Science, pp. 1–26, 2022.
[6] M. Sharma, R. K. Ranjan, and V. Bharti, "A pseudo-random bit generator based on chaotic maps enhanced with a bit-XOR operation," Journal of Information Security and Applications, vol. 69, p. 103299, 2022.
[7] I. Kafetzis, L. Moysis, C. Volos, H. Nistazakis, J. M. Munoz-Pacheco, and I. Stouboulos, "Automata-derived chaotic image encryption scheme," in 2022 11th International Conference on Modern Circuits and Systems Technologies (MOCAST). IEEE, 2022, pp. 1–4.
[8] I. Kafetzis and C. Volos, "Modification of the quantum logistic map with application in pseudo-random bit generation and image encryption," in Complex Systems and Their Applications: Second International Conference (EDIESCA 2021). Springer, 2022, pp. 85–110.
[9] M. Lawnik, L. Moysis, and C. Volos, "Chaos-based cryptography: Text encryption using image algorithms," Electronics, vol. 11, no. 19, p. 3156, 2022.
[10] H. Elkamchouchi, W. M. Salama, and Y. Abouelseoud, "New video encryption schemes based on chaotic maps," IET Image Processing, vol. 14, no. 2, pp. 397–406, 2020.
[11] E. A. Albahrani, "A new audio encryption algorithm based on chaotic block cipher," in 2017 Annual Conference on New Trends in Information & Communications Technology Applications (NTICT). IEEE, 2017, pp. 22–27.
[12] F. Farsana, V. Devi, and K. Gopakumar, "An audio encryption scheme based on fast Walsh Hadamard transform and mixed chaotic keystreams," Applied Computing and Informatics, no. ahead-of-print, 2020.
[13] E. A. Albahrani, T. K. Alshekly, and S. H. Lafta, "A review on audio encryption algorithms using chaos maps-based techniques," Journal of Cyber Security and Mobility, pp. 53–82, 2022.
[14] A. Kumar and M. Dua, "Audio encryption using two chaotic map based dynamic diffusion and double DNA encoding," Applied Acoustics, vol. 203, p. 109196, 2023.
[15] P. Muthukumar, "Secure audio signal encryption based on triple compound-combination synchronization of fractional-order dynamical systems," International Journal of Dynamics and Control, vol. 10, no. 6, pp. 2053–2071, 2022.
[16] A. Akgül, S. Kaçar, and İ. Pehlivan, "An audio data encryption with single and double dimension discrete-time chaotic systems," Tojsat, vol. 5, no. 3, pp. 14–23, 2015.
[17] S. N. Al Saad and E. Hato, "A speech encryption based on chaotic maps," International Journal of Computer Applications, vol. 93, no. 4, pp. 19–28, 2014.
[18] L. Moysis, I. Kafetzis, M. S. Baptista, and C. Volos, "Chaotification of one-dimensional maps based on remainder operator addition," Mathematics, vol. 10, no. 15, p. 2801, 2022.
[19] A. Jafari, I. Hussain, F. Nazarimehr, S. M. R. H. Golpayegani, and S. Jafari, "A simple guide for plotting a proper bifurcation diagram," International Journal of Bifurcation and Chaos, vol. 31, no. 01, p. 2150011, 2021.
[20] H. D. Abarbanel, R. Brown, and M. Kennel, "Lyapunov exponents in chaotic systems: their importance and their evaluation using observed data," International Journal of Modern Physics B, vol. 5, no. 09, pp. 1347–1375, 1991.
[21] I. Kafetzis, "Chaos maps," https://github.com/iokaf/chaos-maps, 2022.
[22] L. Bassham, A. Rukhin, J. Soto, J. Nechvatal, M. Smid, S. Leigh, M. Levenson, M. Vangel, N. Heckert, and D. Banks, "A statistical test suite for random and pseudorandom number generators for cryptographic applications," NIST Special Publication 800-22, 2010.
[23] G. Alvarez and S. Li, "Some basic cryptographic requirements for chaos-based cryptosystems," International Journal of Bifurcation and Chaos, vol. 16, no. 08, pp. 2129–2151, 2006.


Construction of Piecewise Chaotic Maps With Tunable Statistical Mean

Lazaros Moysis∗† , Marcin Lawnik‡ , Murilo S. Baptista§ , Sotirios Goudos¶ , Christos Volos∗
∗: Laboratory of Nonlinear Systems - Circuits & Complexity, Physics Department, Aristotle University of Thessaloniki,
Thessaloniki, Greece. {lmousis,volos}@physics.auth.gr
† : Department of Mechanical Engineering, University of Western Macedonia, Kozani, Greece.
‡ : Department of Mathematics Applications and Methods for Artificial Intelligence, Faculty of Applied Mathematics,

Silesian University of Technology, Kaszubska 23, 44-100 Gliwice, Poland. marcin.lawnik@polsl.pl


§ : Institute for Complex Systems and Mathematical Biology, SUPA, University of Aberdeen,

Aberdeen AB24 3UX, UK. Email: murilo.baptista@abdn.ac.uk.


¶ : Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece, 54124. sgoudo@physics.auth.gr

Abstract—In numerous chaos-based applications, it is often important to use chaotic maps that showcase desirable statistical characteristics that can improve the performance of the designed architecture. Yet, it is hard to design maps that can achieve the desired statistical properties. Motivated by this, this work considers a family of piecewise chaotic maps and studies how tuning their parameters can affect the mean value of the generated chaotic trajectories. Such a family of maps can be highly practical in future chaos-based applications.

Index Terms—Chaos, piecewise map, mean value, encryption, optimization

I. INTRODUCTION

The applications of chaos theory are well established in the fields of engineering, electronics, and communications [1]. Examples include pseudo-random number/bit generators [2]–[5], data security [6], secure communications [7], path planning [8]–[10], and optimization [11]–[14], among others. In such applications, a chaotic map is used as a source of unpredictability and fed into the system architecture to improve its performance with respect to a set of measures relevant to the application at hand.

For example, in data encryption and secure communications, a chaotic system is used to mask an information signal through a series of operations like permutation, modulation, and confusion, generating a masked signal that is statistically similar to random noise. Naturally, the chaotic source should not reveal any statistical information to an adversary, in order to resist unmasking attacks. In chaotic path planning, a chaotic map feeds an autonomous agent with navigation commands, to unpredictably explore an area. Thus, the uniformity of the generated trajectory will be heavily affected by the statistical properties of the source map. In chaos optimization, chaotic maps are used for the actions of selection, emigration, and mutation, to improve the performance of evolutionary algorithms. So maps with different statistical behavior will have adverse effects on the algorithm's performance.

From the above examples, it can be understood that knowing how to control the behavior of a map, while maintaining its chaotic behavior, will offer great ease in improving the performance of a design in the aforementioned applications. Motivated by this, the purpose of this work is to study the effects of tuning the parameters of a piecewise map on its statistical measures, like the mean value. Piecewise maps [3]–[5], [15]–[21] are chosen for this study, as they are very common in applications, have a simple structure, and can generate chaotic dynamics for wide parameter ranges. It is also easier to control the effect of parameter tuning on the shape of the map's curve, which affects their behavior, as will be shown in the next section. From the experimental study performed on three maps, it is seen that by appropriate parameter tuning, it is possible to control the mean value of the generated chaotic time series. This work can thus be a starting point towards developing a family of maps with controllable statistical characteristics, for use in relevant chaos-based applications. We take an applied numerical perspective to this problem, in view of the non-linearity involved in two of our analysed maps, focusing on the structure of the mean value appearing as a function of several tunable parameters.

The rest of the work is structured as follows. In Section II, the general form of the considered maps is presented. In Section III, three different maps are studied. Section IV concludes the work with a discussion on future goals.

II. PROPOSED FAMILY OF MAPS

The maps considered in this work are piecewise maps of the form

    x_i = g(x_{i-1}) = { g1(x_{i-1}),   0 ≤ x_{i-1} ≤ b
                         g2(x_{i-1}),   b < x_{i-1} ≤ 1        (1)

that map onto the interval x_i ∈ [0, 1]. The maps have three control parameters a, b, c, and the functions g1, g2 are designed so that they interpolate the following three points: g1(0) = a, g1(b) = 1, g2(1) = c. Tuning each one of the parameters has a different effect on the map's behavior. These are the following.

1) Parameter a ∈ [0, 1]. Increasing a changes the mapping of the first subfunction to g1(x_{i-1}) ∈ [a, 1].

2) Parameter b ∈ [0, 1]. Changing the parameter b skews the piecewise function g left/right-wise, with b = 0.5 being the symmetric case.

3) Parameter c ∈ [0, 1]. Increasing c changes the mapping of the second subfunction to g2(x_{i-1}) ∈ [c, 1].

Depending on the shape of the piecewise functions, tuning the parameter c can change the complete mapping to g(x_{i-1}) ∈ [c, 1], as will be shown in the next section. Hence, we propose setting the parameter c to the default value c = 0, so that (1) always maps inside the complete interval [0, 1].

III. CONSIDERED MAPS

We will illustrate the effect of changing parameters a, b on the mean value for three maps. Due to space limitations, other statistical measures like the standard deviation are omitted, but will be included in future studies. In all cases, the initial condition of the maps is set to x0 = 0.111.

A. Skewed Tent Map

Based on the well-known skewed tent map, we propose the following model for further analysis:

    x_i = g(x_{i-1}) = { ((1 − a)/b) x_{i-1} + a,                          0 ≤ x_{i-1} ≤ b
                         ((1 − c)/(b − 1)) x_{i-1} + (bc − 1)/(b − 1),     b < x_{i-1} ≤ 1        (2)

where a, b, c are the map's parameters, used to control its behavior. The parameters a, c control the two endpoints g(0) = a, g(1) = c, while parameter b controls the skew position between the two subfunctions. In Fig. 1 (top), the function g(x) for several choices of the parameters is shown, where the effects of the parameter values on the shape of the function g are clear. In Fig. 1 (bottom), the corresponding histograms for the time series {x_i} are computed. What can be observed is that in the symmetric case a = 0, b = 0.4999 the histogram is uniform, while by increasing the values of a, b, the histogram becomes right-skewed. So, tuning the values of parameters a, b will affect the mean value of the time series {x_i}. Moreover, as previously noted, changing c will affect the mapping interval of the map, and so in the following we set c = 0. Note that for the symmetric case, the parameter is set to b = 0.4999, since for b = 0.5 the map falls into periodic behavior due to round-off errors.

Fig. 1. (Top) Function g(x) in (2) for different parameter values. (Bottom) Histograms for 10^5 iterations of the skewed tent map (2) for the corresponding parameters.

To study the effect of the parameters a, b on the mean value, the chaotic parameter ranges of the map of Eq. (2) are first computed. The parameter pairs (a, b) that generate chaotic trajectories are shown in Fig. 2. It can be seen that the map exhibits chaos for a wide range of parameter values, which is a desirable property for most applications. For the parameter pairs that generate chaotic trajectories, Fig. 3 depicts the mean value that is obtained for 10^5 iterations of the map. What can be observed is that for lower parameter values, the mean is around 0.5, indicating a uniform histogram. As the parameters increase though, the mean value increases, indicating a right-skewed histogram. This diagram can serve as a guide when applying the map to an application like the ones mentioned in the introduction.

Fig. 2. Diagram depicting the chaotic regions of the skewed tent map of (2) for different parameter values, and c = 0. Black denotes chaotic regions and white non-chaotic. The iteration step size is 0.0025.

B. Cosine Skewed Piecewise Map

The work [16] proposes several piecewise chaotic maps. Here, we consider a modification of a map proposed therein:

    x_i = g(x_{i-1}) = { g1(x_{i-1}),   0 ≤ x_{i-1} < b
                         g2(x_{i-1}),   b < x_{i-1} ≤ 1        (3)

where g1 is obtained from the normalised term (f(x_{i-1}/b) − f(0)) / (f(1) − f(0)), g2 from the term (f((x_{i-1} − 1)/(b − 1)) − f(0)) / (f(1) − f(0)), both rescaled so that g1(0) = a, g1(b) = 1 and g2(1) = c, and f(x) = cos(e^(x−1)). In the original map the edges were mapped to 0 and 1; in the modified map, g(0) = a and g(1) = c. A graph depicting the function g, for a = 0.2, b = 0.6, c = 0.3, is shown in Fig. 4. The map has a form similar to the skewed tent map, but each subfunction is nonlinear. The shape of the subfunctions is convex.

Fig. 5. Diagram depicting the chaotic regions of the map of Eq. (3), with
f (x) = cos(ex−1 ), for different parameter values and c = 0. Black denotes
chaotic regions and white non-chaotic.

Fig. 3. Diagram depicting the changes in the mean value of the skewed tent
map (2) for different parameter values, and c = 0. The iteration step size is
0.0025.

though there are some non-chaotic regions appearing, which


have self-similar characteristics, the so called ”shrimps”. For
the chaotic regions, Fig. 6 depicts the changes in mean value.
It can be observed that both lower and higher values can be
achieved, so the histogram can swift from left to right skewed.

Fig. 6. Diagram depicting the changes in the mean value of the map of Eq.
(3), with f (x) = cos(ex−1 ), for different parameter values and c = 0.

Fig. 4. The function g(x) of the map of Eq. (3), with f (x) = cos(ex−1 ),
for a = 0.2, b = 0.6, c = 0.3.

C. Secant Skewed Piecewise Map


A second variation of Eq. (3) is considered, with f (x) =
sec(ex−1 ). Fig. 7 depicts the shape of g(x) for a = 0.2, b =
0.6, c = 0.3. The shape of the subfunctions is convex, similar
to the previous map.
Fig. 8 depicts the pairs that generate chaotic trajectories, where it is seen that there are wide parametric regions that generate chaos, similar to the Cosine Skewed Piecewise Map. For these regions, in Fig. 9 it is seen that, as with the previous map, the mean value can range from around 0.2 up to 0.8.

Fig. 7. The function g(x) of the map (3), with f(x) = sec(e^{x-1}), for a = 0.2, b = 0.6, c = 0.3.
Fig. 8. Diagram depicting the chaotic regions of the map (3), with f(x) = sec(e^{x-1}), for different parameter values, and c = 0. Black denotes chaotic regions and white non-chaotic.

Fig. 9. Diagram depicting the changes in the mean value of the map (3), with f(x) = sec(e^{x-1}), for different parameter values and c = 0.

IV. CONCLUSIONS

In this work, a generalized family of piecewise chaotic maps was proposed, with three control parameters that correspond to interpolating limit points of each subfunction. By studying three different chaotic maps, we experimentally observed that by tuning the control parameters, it is possible to affect the statistical mean of the map's chaotic time series. This property can be highly effective in many chaos based applications, and specifically chaotic path planning and chaos optimization.

This work is a starting point for addressing several different research questions. Our main future research goal is to provide a theoretical derivation of the expected mean value and standard deviation with respect to parameters a, b. Moreover, a second goal is to apply the derived maps to the application of chaos based optimization, and evaluate the performance for different parameter pairs. By addressing the above, the proposed family of maps can be established as a highly efficient tool for chaos based applications.

As a final remark, it is worth noting that a close family to piecewise maps is the recent development of chaotic maps based on fuzzy numbers [22]-[24]. This is another family of maps that can be explored in the future, with respect to the control of statistical performance.

REFERENCES

[1] G. Grassi, "Chaos in the real world: Recent applications to communications, computing, distributed sensing, robotic motion, bio-impedance modelling and encryption systems," Symmetry, vol. 13, no. 11, p. 2151, 2021.
[2] A. V. Tutueva, E. G. Nepomuceno, A. I. Karimov, V. S. Andreev, and D. N. Butusov, "Adaptive chaotic maps and their application to pseudo-random numbers generation," Chaos, Solitons & Fractals, vol. 133, p. 109615, 2020.
[3] Y. Wang, Z. Liu, J. Ma, and H. He, "A pseudorandom number generator based on piecewise logistic map," Nonlinear Dynamics, vol. 83, no. 4, pp. 2373–2391, 2016.
[4] M. Irfan, A. Ali, M. A. Khan, M. Ehatisham-ul Haq, S. N. Mehmood Shah, A. Saboor, and W. Ahmad, "Pseudorandom number generator (PRNG) design using hyper-chaotic modified robust logistic map (HC-MRLM)," Electronics, vol. 9, no. 1, p. 104, 2020.
[5] D. Lambić, "Security analysis and improvement of the pseudo-random number generator based on piecewise logistic map," Journal of Electronic Testing, vol. 35, no. 4, pp. 519–527, 2019.
[6] F. Özkaynak, "Brief review on application of nonlinear dynamics in image encryption," Nonlinear Dynamics, vol. 92, no. 2, pp. 305–313, 2018.
[7] M. S. Baptista, "Chaos for communication," Nonlinear Dynamics, vol. 105, no. 2, pp. 1821–1841, 2021.
[8] J. J. Cetina-Denis, R. M. Lopéz-Gutiérrez, C. Cruz-Hernández, and A. Arellano-Delgado, "Design of a chaotic trajectory generator algorithm for mobile robots," Applied Sciences, vol. 12, no. 5, p. 2587, 2022.
[9] P. S. Gohari, H. Mohammadi, and S. Taghvaei, "Using chaotic maps for 3D boundary surveillance by quadrotor robot," Applied Soft Computing, vol. 76, pp. 68–77, 2019.
[10] D.-I. Curiac, O. Banias, C. Volosencu, and C.-D. Curiac, "Novel bioinspired approach based on chaotic dynamics for robot patrolling missions with adversaries," Entropy, vol. 20, no. 5, p. 378, 2018.
[11] S. Saremi, S. Mirjalili, and A. Lewis, "Biogeography-based optimisation with chaos," Neural Computing and Applications, vol. 25, no. 5, pp. 1077–1097, 2014.
[12] D. Yang, Z. Liu, and J. Zhou, "Chaos optimization algorithms based on chaotic maps with different probability distribution and search speed for global optimization," Communications in Nonlinear Science and Numerical Simulation, vol. 19, no. 4, pp. 1229–1246, 2014.
[13] M. A. Tawhid and A. M. Ibrahim, "Improved salp swarm algorithm combined with chaos," Mathematics and Computers in Simulation, vol. 202, pp. 113–148, 2022.
[14] A. Saxena, R. Kumar, and S. Das, "β-chaotic map enabled grey wolf optimizer," Applied Soft Computing, vol. 75, pp. 84–105, 2019.
[15] M. Lawnik and M. Berezowski, "New chaotic system: M-map and its application in chaos-based cryptography," Symmetry, vol. 14, no. 5, p. 895, 2022.
[16] H. Zang, Y. Yuan, and X. Wei, "Research on pseudorandom number generator based on several new types of piecewise chaotic maps," Mathematical Problems in Engineering, vol. 2021, 2021.
[17] Z. Zhang, Y. Wang, L. Y. Zhang, and H. Zhu, "A novel chaotic map constructed by geometric operations and its application," Nonlinear Dynamics, vol. 102, pp. 2843–2858, 2020.
[18] N. Nagaraj, "The unreasonable effectiveness of the chaotic tent map in engineering applications," Chaos Theory and Applications, vol. 4, no. 4, pp. 197–204, 2022.
[19] D. P. Chaves, C. E. Souza, and C. Pimentel, "A smooth chaotic map with parameterized shape and symmetry," EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, p. 122, 2016.
[20] S. Zhang and L. Liu, "A novel image encryption algorithm based on SPWLCM and DNA coding," Mathematics and Computers in Simulation, vol. 190, pp. 723–744, 2021.
[21] X. Zhao, H. Zang, and X. Wei, "Construction of a novel nth-order polynomial chaotic map and its application in the pseudorandom number generator," Nonlinear Dynamics, vol. 110, no. 1, pp. 821–839, 2022.
[22] M. Akraam, T. Rashid, and S. Zafar, "An image encryption scheme proposed by modifying chaotic tent map using fuzzy numbers," Multimedia Tools and Applications, pp. 1–19, 2022.
[23] N. Charalampidis, C. Volos, L. Moysis, H. E. Nistazakis, and I. Stouboulos, "Composition of fuzzy numbers with chaotic maps," in Nonlinear Dynamics and Complexity: Mathematical Modelling of Real-World Problems. Springer, 2022, pp. 133–150.
[24] L. Moysis, C. Volos, S. Jafari, J. M. Munoz-Pacheco, J. Kengne, K. Rajagopal, and I. Stouboulos, "Modification of the logistic map using fuzzy numbers with application to pseudorandom number generation and image encryption," Entropy, vol. 22, no. 4, p. 474, 2020.

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Energy Consumption Assessment in Refrigeration


Equipment: The SmartFridge Project
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176514

Aikaterini I. Griva Vasileios P. Rekkas Kyriakos Koritsoglou


ELEDIA@AUTh, School of Physics ELEDIA@AUTh, School of Physics Department of Information and
Aristotle University of Thessaloniki Aristotle University of Thessaloniki Telecommunications
Thessaloniki, Greece Thessaloniki, Greece University of Ioannina
aigriva@physics.gr vrekkas@physics.auth.gr Arta, Greece
kkoritsoglou@uoi.gr

Sotirios P. Sotiroudis Achilles D. Boursianis Maria S. Papadopoulou


ELEDIA@AUTh, School of Physics ELEDIA@AUTh, School of Physics Department of Information and Electronic
Aristotle University of Thessaloniki Aristotle University of Thessaloniki Engineering
Thessaloniki, Greece Thessaloniki, Greece International Hellenic University
ssoti@physics.auth.gr bachi@physics.auth.gr Sindos, Greece
mspapa@ihu.gr

Sotirios K. Goudos
ELEDIA@AUTh, School of Physics
Aristotle University of Thessaloniki
Thessaloniki, Greece
sgoudo@physics.auth.gr

Abstract—Energy consumption is a key factor in environmental sustainability and the economy. In the next 30 years, global food production will need to increase rapidly and, in this context, refrigeration will play a pivotal role. In the SmartFridge project, several aspects of the operating cycle of refrigeration equipment are explored. Energy consumption reduction is one of the most important use cases of the project. For this reason, this work provides an assessment of three scenarios in a refrigerator model, in which the indoor temperature and moisture, as well as the interior temperature, are explored to quantify their impact on energy consumption.

Index Terms—Smart refrigerator, energy consumption, indoor temperature, indoor moisture, interior temperature

I. INTRODUCTION

In the last few years, Europe has been experiencing an energy crisis. The cost of electricity is continuously rising and, at the same time, the amount of power consumed is also increasing. Companies in the food retail sector are ranked second in terms of the energy footprint in developed countries, behind industries. According to the Food and Agriculture Organization (FAO), global food production will need to increase by 70% to feed an additional 2.3 billion people by 2050 and, in this context, refrigeration will play a pivotal role [1]. Manufacturers of refrigeration equipment are consequently placing more emphasis on strategies that will reduce the amount of power required for effective functioning [2].

Many researchers are working in the field of smart equipment and smart devices by trying to improve the performance of the devices while decreasing energy consumption. Mathematical models [3], simulated models [4], [5], as well as experimental ones [6], [7] have been proposed to predict the energy consumption of refrigerators. In the SmartFridge project, several approaches will be adapted to study the energy consumption of refrigeration equipment in food retail stores [8].

During the day, when the food retail stores are open, a considerable increase in power consumption is recorded in refrigeration systems due to the repeated door opening by the customers. As a result, the refrigeration systems of cooling chambers are set to operate at lower temperatures to maintain the interior temperature within desired limits and ensure the quality of the stored food. However, during the night, when the retail stores are closed, the systems could be adjusted to higher temperatures without any risk to the preservation of food quality.

This research was carried out as part of the project "Smart Fridge Using IoT Technology" (Project code: KMP6-0078831) under the framework of the Action "Investment Plans of Innovation" of the Operational Program "Central Macedonia 2014–2020", that is co-funded by the European Regional Development Fund and Greece.


Fig. 1. Schematic of the Simulink refrigerator model [9].

The core idea in the SmartFridge project is the implementation of a new system that combines IoT technology, wireless sensor networking (WSN), and embedded artificial intelligence (AI) to reduce the energy consumption of refrigeration equipment. In [2], the development of an IoT-based device is proposed to reduce the power consumption of a real system. The implementation of the overall system is ongoing. In this work, we focus on the evaluation of the energy consumption in computed scenarios using a refrigerator model. Computed and experimental results will be evaluated and the overall performance of the proposed system will be assessed. Furthermore, a machine-learning model will be developed to extract the desired time periods in which the reduction of energy consumption can take place, based on the experimental data. Finally, the extracted time periods will also be evaluated and compared to the computed results of the model proposed in this work.

The remainder of this paper is structured as follows. In Section II, an overview of the simulation model, which is selected to assess the performance of the refrigeration equipment, is presented. In Section III, the computed scenarios, as well as their results, are analyzed and discussed. The conclusion of the work is outlined in Section IV.

II. MATERIALS AND METHODS

A. Simulation model

The Simulink Residential Refrigerator Model is provided by Simulink, MathWorks [10]. It depicts a simple refrigeration system that transfers heat between the moist air mixture of the environment and the two-phase fluid of the refrigerant. The R134a refrigerant is driven by the compressor through a condenser, a capillary tube, and an evaporator. Pure vapor is returned to the compressor by an accumulator. Two fans circulate moist air over the condenser and evaporator. The evaporator airflow is split between the main compartment and the freezer. Finally, a controller prompts the operation of the compressor to maintain the compartment air temperature at the setpoint temperature [9].

In detail, the model consists of the following modules, as shown in Fig. 1:
• Compressor: The cooling process of the refrigerator begins in the compressor. The motor that powers the compressor increases the pressure and temperature of the refrigerant gas before delivering it to the condenser.
• Condenser: The refrigerant liquefies in the condenser. The hot vapors are collected by the condenser and cooled into a liquid.
• Evaporator: In the evaporator, the cooling process comes to an end and the next cycle starts. It converts the remaining refrigerant liquid into vapor, which is used by the compressor to restart the process.
• Accumulator: This module keeps liquid refrigerant from rushing back into the compressor in vapor compression refrigeration systems.
• Capillary tube: A thin set of copper tubes called the expansion valve drastically reduces the temperature and pressure of the liquid refrigerant, which results in around half of it evaporating. The low temperature inside the refrigerator is produced by the constant evaporation that takes place in the capillary tube.
• Compartment: The compartment is mainly the fridge chamber where the food is stored. The door is also an important part of the compartment.

Details of the utilized model can be found in [9].

B. Model Parameterization

Three main parameters that mostly affect the energy consumption of the model are selected to be evaluated.
• Indoor temperature: A temperature range from 8°C to 32°C is chosen, as those temperature values can be found inside a typical food retail store in Greece (and in continental climates in general) during the year. The computed scenarios take place during the night, when the stores are closed and the heating/cooling system is probably turned off.


• Indoor moisture: The amount of moisture that the air


can hold depends on the temperature of the air. A wide
range of theoretical values of moisture is considered in
the computed scenarios, at an average indoor temperature
of 22°C.
• Interior temperature: The temperature inside the fridge
is the most important parameter to ensure the quality of
the food. The temperature threshold in the refrigeration
equipment is strictly determined by the European Food
Safety Authority (EFSA) [11] and the Hellenic Food
Authority (EFET) [12]. For example, meat and poultry
should be stored at 0–2°C and strictly no higher than 4°C. Milk and cheese are recommended to be stored at 4°C, and fruits, such as melon and oranges, should be stored at around 7°C [13]. Thus, a range from 1°C to 7°C is explored in the computed scenarios.

III. COMPUTED RESULTS

The Simulink refrigeration model is utilized to assess the operation of a fridge during the night. The duration of each simulation is set at 12 hours. During this time, the fridge door is considered to be closed. This assumption is based on the opening hours of many food retail stores in Greece. The usual opening hours in large cities are from 9:00 a.m. to 9:00 p.m. each day.

Scenario 1. In our first scenario, the impact of the indoor temperature on the energy consumption of the fridge model is explored. A wide range of temperature values, from 8°C to 32°C, is selected. The energy consumption of the system is recorded over 12 hours of constant operation without any door opening. The results are shown in Fig. 2. By increasing the temperature of the environment from 18°C to 22°C, we record an increase of 34.96% in energy consumption. Moreover, by increasing the temperature from 22°C to 30°C, an increase of 61.5% in energy consumption is found. The recorded consumption of the model is 8.73 times higher at 32°C than at 8°C. Thus, the indoor temperature greatly impacts the energy consumption of the cooling chamber. For example, if we consider that in Greece the typical electricity cost is €0.159 per kWh, the minimum operation cost of a cooling chamber at 10°C is €8.36 per year, at 18°C is €20.89 per year, and at 32°C is €50.33 per year.

Fig. 3. Energy consumption as a function of the indoor moisture.

Scenario 2. In the next scenario, the impact of indoor moisture is considered in our model. Several values are selected, starting from 25% up to 90%. The indoor temperature is set to 22°C. The computed results are shown in Fig. 3. The energy consumption of the model is recorded to be almost stable in every case. By changing the moisture conditions from 25% to 90%, an increase in consumption by 2.12% is recorded. With a change in the indoor temperature to 18°C and 32°C, an increase of 1.99% and 2.23% in the consumption is respectively obtained. As a result, the effect of the indoor moisture on the energy consumption can be considered of low significance.

Scenario 3. In the last scenario, the impact of the interior temperature of the model is studied. The indoor temperature changes from 8°C to 32°C and the indoor moisture is kept at 50%. A range of interior temperatures, between 1°C and 7°C, is selected. The results are shown in Fig. 4. As expected, the increase in the interior temperature reduces the energy consumption of the model. This is a consequence of the decrease in the difference between the temperature inside the cooling chamber and the indoor temperature. For example, by changing the temperature from 1°C to 7°C at 22°C, the consumption is decreased by 28.96%, and from 1°C to 7°C at 8°C it is decreased by 92.95%. Finally, from 1°C to 7°C at 32°C, the consumption is decreased by 20.81%, as shown in Fig. 4.
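As a quick illustration of how the annual operating cost figures quoted in Scenario 1 follow from a unit electricity price and a daily energy consumption, the short Python sketch below performs the conversion. The daily energy values used here are back-calculated from the paper's annual cost figures for illustration only; they are not reported simulation outputs.

# Annual operating cost from daily energy use (illustrative values only)
PRICE_EUR_PER_KWH = 0.159          # typical electricity cost in Greece quoted in the text

def annual_cost(daily_energy_kwh: float) -> float:
    """Cost in euros of running the cooling chamber for one year."""
    return daily_energy_kwh * 365 * PRICE_EUR_PER_KWH

# Hypothetical daily consumption (kWh) for three indoor temperatures
for temp_c, daily_kwh in [(10, 0.144), (18, 0.360), (32, 0.867)]:
    print(f"{temp_c} degC: ~EUR {annual_cost(daily_kwh):.2f} per year")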

Fig. 2. Energy consumption as a function of the indoor temperature.

Fig. 4. Energy consumption as a function of the interior and indoor temperature.


Fig. 5. Quantitative change of the interior temperature.

Another important result can be drawn from the variations in the rate of the interior temperature in the model. We choose the reference point at 0.5°C and calculate the quantitative variation from 1°C to 7°C. The results are presented for the lowest and highest values of indoor temperature and for an average one (8°C, 32°C, and 22°C), as shown in Fig. 5. By increasing the indoor temperature, the variation in the rate is significantly decreased. The results highlight the importance of the valid configuration of the parameters in the presented model, especially in scenarios with low indoor temperatures.

IV. CONCLUSIONS

In this work, an assessment of a refrigerator model in terms of energy consumption has been presented. Parameters such as indoor temperature and moisture were explored to find their effect on the energy consumption of the model. As the indoor temperature increases, consumption also increases. Regarding the indoor moisture, a small effect on the energy consumption of the fridge was recorded during the simulations. Finally, the impact of the selection of the interior temperature was also studied and its notable effect is highlighted. As future work, the computed results will be compared with the experimental results of the SmartFridge project. In addition, a machine learning model will be designed to predict the energy consumption of refrigeration equipment and its performance will be compared with the findings of this work.

REFERENCES

[1] D. Coulomb, J. L. Dupont, and A. Pichard, "The role of refrigeration in the global economy," in Proc. 29th Informatory Note on Refrigeration Technologies, 2015, pp. 1–16.
[2] K. Koritsoglou, M. Papadopoulou, A. Boursianis, P. Sarigiannidis, S. Nikolaidis, and S. Goudos, "Smart refrigeration equipment based on IoT technology for reducing power consumption," in 11th International Conference on Modern Circuits and Systems Technologies (MOCAST), Bremen, Germany, 2022, pp. 1–4, doi: 10.1109/MOCAST54814.2022.9837760.
[3] S. Ledesma, J. M. Belman-Flores, E. Cabal-Yepez, A. Morales-Fuentes, J. A. Alfaro-Ayala, and M.-A. Ibarra-Manzano, "Mathematical models to predict and analyze the energy consumption of a domestic refrigerator for different position of the shelves," IEEE Access, vol. 6, pp. 68882–68891, 2018, doi: 10.1109/ACCESS.2018.2880653.
[4] D. Villanueva, D. San-Facundo, E. Miguez-García, and A. Fernández-Otero, "Modeling and simulation of household appliances power consumption," Applied Sciences, vol. 12, no. 7, p. 3689, Apr. 2022, doi: 10.3390/app12073689.
[5] C. J. Hermes, C. Melo, F. T. Knabben, and J. M. Gonçalves, "Prediction of the energy consumption of household refrigerators and freezers via steady-state simulation," Applied Energy, vol. 86, no. 7–8, pp. 1311–1319, doi: 10.1016/j.apenergy.2008.10.008.
[6] G. Cortella, P. D'Agaro, M. Franceschi, and O. Saro, "Prediction of the energy consumption of a supermarket refrigeration system," in Proc. 23rd International Congress of Refrigeration, 2011.
[7] H. Nasir, W. B. W. Aziz, F. Ali, K. Kadir, and S. Khan, "The implementation of IoT based smart refrigerator system," in 2018 2nd International Conference on Smart Sensors and Application (ICSSA), 2018, pp. 48–52.
[8] K. Koritsoglou, M. Papadopoulou, A. Boursianis, A. Griva, S. Nikolaidis, and S. Goudos, "Reducing power consumption in refrigeration equipment using IoT technology: The SmartFridge project," presented at the IEEE 7th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference, Ioannina, Greece, Sept. 2022.
[9] "Residential refrigerator." [Online]. Available: https://www.mathworks.com/help/hydro/ug/refrigerator.html
[10] "Simulink Environment, The MathWorks, Inc." [Online]. Available: https://www.mathworks.com/products/simulink.html
[11] "European Food Safety Authority." [Online]. Available: https://www.efsa.europa.eu/en
[12] "Hellenic Food Authority." [Online]. Available: https://www.efet.gr/index.php/en/
[13] "Hygiene declaration." [Online (in Greek)]. Available: https://www.efet.gr/files/F5242odhgap.pdf

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Ensemble Learning Technique for Artificial


Intelligence Assisted IVF Applications
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176690

George Vergos Lazaros Alexios Iliadis Paraskevi Kritopoulou


ELEDIA@AUTH, School of Physics ELEDIA@AUTH, School of Physics Dept. of Applied Informatics
Aristotle University of Thessaloniki Aristotle University of Thessaloniki University of Macedonia
Thessaloniki, Greece Thessaloniki, Greece Thessaloniki, Greece
gvergos@physics.auth.gr liliadis@physics.auth.gr pkritopoulou@uom.edu.gr

Achilleas Papatheodorou Achilles D. Boursianis Konstantinos-Iraklis D. Kokkinidis


Embryolab Fertility Clinic ELEDIA@AUTH, School of Physics Dept. of Applied Informatics
Thessaloniki, Greece Aristotle University of Thessaloniki University of Macedonia
a.papatheodorou@embryolab.eu Thessaloniki, Greece Thessaloniki, Greece
bachi@physics.auth.gr kostas.kokkinidis@uom.edu.gr

Maria S. Papadopoulou Sotirios K. Goudos


Dept. of Information and Electronic Eng. ELEDIA@AUTH, School of Physics
International Hellenic University Aristotle University of Thessaloniki
Sindos, Greece Thessaloniki, Greece
mspapa@ihu.gr sgoudo@physics.auth.gr

Abstract—In-vitro fertilization (IVF) is considered one of the most effective assisted reproductive approaches. However, it involves a series of complex and expensive procedures, that result in approximately 30% success rate. New technological concepts, like deep learning (DL), are required to increase the success rate while simplifying the procedure. DL techniques can automatically extract valuable features from the available data, can be flexible, and can work efficiently on multiple problems. Motivated by these characteristics, in this work, a DL ensemble model is trained on a private dataset to classify blastocysts' images. Further analysis is conducted to provide insights for future research.

Index Terms—Deep learning, IVF, convolutional neural network, ensemble learning

I. INTRODUCTION

Thanks to in-vitro fertilization (IVF) technology, millions of kids have been born. However, only one-third of the couples who resort to IVF manage to have a child, despite the long and expensive procedures. To overcome several challenges, such as the age, the quality of the embryo, and the limitations of the required technology [1], embryologists and researchers search for new tools and methods that will provide better results. Time-lapse imaging incubators (TLI) were introduced as an IVF method more than ten years ago. Setting regular time intervals, TLI takes photographs, which form a video of the embryo's development. TLI enables controlled culture conditions, while the developmental events are annotated in a dynamic procedure. Such events, which are called morphokinetic parameters, include cell divisions, blastocyst formation, and expansion [2].

Deep learning (DL) methods are part of machine learning methodologies and have been established as the most successful set of approaches in the field of computer vision (CV) [3]. Convolutional neural networks (CNNs), after their success in various image classification tasks [4], constitute the dominant approach in CV and have been widely utilized in several CV tasks, including medical imaging [5]. In the last five years, a growing research trend for DL-empowered IVF approaches can be identified [6]. Motivated by the above, in this paper, an ensemble DL model is trained on a private dataset to classify blastocysts' images.

This work is part of the "Smart Embryo" project which aims to develop artificial intelligence (AI) based software to assist embryologists to evaluate the blastocysts' quality before transfer. This research project aims to develop novel AI methods for DL-powered IVF.

This work is structured in the following way: Section II describes the problem, whereas Section III provides the details of our DL model. The experiments and the results of our approach are discussed in Section IV. The conclusions are presented in Section V.

This research was carried out as part of the project "Classification and characterization of fetal images for assisted reproduction using artificial intelligence and computer vision" (Project code: KP6-0079459) under the framework of the Action "Investment Plans of Innovation" of the Operational Program "Central Macedonia 2014–2020", that is co-funded by the European Regional Development Fund and Greece.


Fig. 1: Blastocyst images obtained from our private dataset. Panels: (a) AA, (b) AB, (c) BA, (d) BB, (e) BC, (f) CB, (g) CC, (h) DD.

TABLE I: ICM score
ICM grade | ICM quality
A | Large number of cells that are tightly packed
B | Many cells that are loosely grouped
C | Small number of cells

TABLE II: TE score
TE grade | TE quality
A | Large number of cells, which form a compact layer
B | Several cells, which form a loose epithelium
C | Small number of large cells

II. PROBLEM DESCRIPTION

The different aspects of the blastocysts' grading system are presented in this Section. This morphological evaluation system has been developed to evaluate the quality of a blastocyst before the transfer and can be exploited by DL models. The number of cells inside the embryo begins to outgrow the space inside the zona pellucida. Once this zone breaks, the blastocyst can hatch. Each blastocyst contains two types of cells; those that form the placental tissue, and those that form the fetus. Two key parameters that help embryologists decide which blastocyst has more viability chances [7] are:
• Inner cell mass (ICM) score, (A) through (D)
• Trophectoderm (TE) score, (A) through (D)

In Fig. 1, samples of blastocyst images are depicted. The quality scores are presented in Tables I and II. Since the blastocysts' grading system is based only on morphological characteristics, their classification may be formulated into a CV problem.

In this work, the dataset is constructed in the following way. First, images, accompanied by their respective metadata, are collected from a TLI. Then, the images are categorized into two classes; class 0 and class 1. Class 0 represents the images that have one of the following ICM and TE scores combined: AA, AB, BA. Class 1 represents all the other images.

III. DL METHODOLOGIES

To address the problem of blastocysts' quality classification, three different DL architectures are utilized to construct an ensemble learner. All three models are based on the convolution filters (kernels) that slide along the images to provide the respective feature maps.

A. DL models

AlexNet [8] is the first CNN that outperformed the classical CV algorithms in the ImageNet large-scale visual recognition challenge (ILSVRC) [9]. AlexNet utilizes five convolutional layers, while down-sampled feature maps are created by the three max-pooling layers. The flattening of the data and the outcomes are the results of the three fully-connected layers. The activation function in the convolutional layers is the rectified linear unit (ReLU). ReLU is simply defined as

g(z) = max(0, z)    (1)

Another popular CNN model is the VGG11 [10]. VGG11 has eight convolution layers, five max-pooling layers, and three fully-connected layers. As in the AlexNet architecture, the activation function is taken to be the ReLU function. Both AlexNet and VGG11 use dropout layers, which randomly set input units to 0. In this way, DL models avoid over-fitting. A variant of VGG is constructed for this particular problem. This model has seven convolutional layers, four max-pooling layers, and three fully-connected layers. In contrast to AlexNet and VGG11, no dropout layer is utilized.

B. Ensemble learning

Ensemble learning is frequently utilized to achieve better results. In this work, VGG11, AlexNet, and the variant of VGG constructed an ensemble classifier. By combining the probability outputs of each base model into two new probabilistic outcomes, the ensemble can predict with greater confidence the blastocyst's class. The fact that two of the DL models contain dropout layers results in a large variance. To stabilize and further improve the DL models' performance, five instances


of this learner formed a voting classifier that takes the average


over the predictions of all estimators.
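As an illustration of the soft-voting scheme described above, the following minimal PyTorch-style sketch averages the class probabilities of several trained base models. The class and variable names are ours and the snippet is a simplified stand-in for the authors' pipeline, not their actual code.

import torch
import torch.nn as nn

class SoftVotingEnsemble(nn.Module):
    """Averages the softmax outputs of the base models (e.g. AlexNet, VGG11, VGG variant)."""
    def __init__(self, base_models):
        super().__init__()
        self.base_models = nn.ModuleList(base_models)

    def forward(self, x):
        # Each base model returns logits of shape (batch, 2) for class 0 / class 1
        probs = [torch.softmax(m(x), dim=1) for m in self.base_models]
        return torch.stack(probs, dim=0).mean(dim=0)   # averaged class probabilities

# Usage sketch: wrap three trained CNNs, then vote over five ensemble instances
# ensemble = SoftVotingEnsemble([alexnet, vgg11, vgg_variant])
# voting = SoftVotingEnsemble([ens1, ens2, ens3, ens4, ens5])
# predictions = voting(images).argmax(dim=1)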

IV. E XPERIMENTS AND R ESULTS

In this work, an ensemble learning approach is taken for


the task of blastocysts image classification. The private dataset
consists of 2269 images categorized into two classes; class 0
(suitable for transfer) and class 1 (unsuitable for transfer). The
dataset is split into a training set (80% of the initial set) and
a test set (20% of the initial set).
To update the models’ weights, Adam (adaptive moment es-
timation) algorithm [11] is utilized. The training is performed
on an NVIDIA RTX 3080 Ti GPU. The GPU processor is a
chip with a die area of 628 mm2 and 28,300 million transistors,
making it suitable for DL applications. Each base model is
trained for 40 epochs with a mini-batch size of 8. The initial
learning rate’s value is set to 10−4 and it is divided by 10
when the error reaches a plateau. VGG11 performs the best
with an accuracy of 76% while AlexNet and the VGG Variant
have an accuracy score of 73% and 72% respectively. By
combining the outputs of the base models into an ensemble
the accuracy is further increased to 78%. Finally, to further
improve the model’s predictive capability, a voting classifier
is implemented by taking five instances of the ensemble model
and it is trained for 5 epochs. The whole algorithmic procedure
is depicted in Fig. 2. The final accuracy score is improved to
81%. The loss and accuracy graphs are shown in Fig. 3 and Fig. 4, respectively.

Fig. 2: Model Architecture

Fig. 3: Loss
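The training setup described above (Adam, initial learning rate 10^-4 divided by 10 on a plateau, mini-batches of 8, 40 epochs) can be sketched in PyTorch as follows. The data loaders and the cross-entropy loss are assumptions made for illustration, not details taken from the paper.

import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau

def train(model, train_loader, val_loader, epochs=40, device="cuda"):
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()               # assumed loss for the 2-class problem
    optimizer = Adam(model.parameters(), lr=1e-4)         # initial learning rate 1e-4
    scheduler = ReduceLROnPlateau(optimizer, factor=0.1)  # divide lr by 10 on an error plateau

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:               # mini-batch size 8 set in the DataLoader
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # validation error drives the plateau-based learning-rate schedule
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                val_loss += criterion(model(images), labels).item()
        scheduler.step(val_loss)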
The accuracy score itself cannot provide enough information
about the classifier’s learning capabilities. Considering our
problem (and binary classification problems in general), a data
sample (instance) is classified as either belonging to class 0
or class 1. The following four outcomes are possible:
1) True Positive (Tp ): The data sample belongs to class 0
and it is classified as class 0
2) False Negative (Fn ): The data sample is in class 0 and
it is classified as class 1
3) True Negative (Tn ): The data sample belongs to class 1
and it is classified in class 1
4) False Positive (Fp ): The data sample is in class 1 and it
is classified as part of class 0

Fig. 4: Accuracy
The above outcomes are better formatted in the confusion
matrix configuration. In our case, the voting classifier’s perfor-
mance is illustrated in Fig. 5. Using two other scores, namely Precision and Recall, one can define the f1-score, which is a measure of the proposed algorithm's accuracy. Recall shows how many of the class 0 instances were predicted correctly,

Rec = Tp / (Tp + Fn)    (2)

and Precision is defined as

Prec = Tp / (Tp + Fp)    (3)

The f1-score is the harmonic mean of the above measures, and it is defined as

f1-score = 2 · (Rec · Prec) / (Rec + Prec)    (4)

In our case, the final f1-score = 85.5%, which validates our model's good performance.
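A small, self-contained Python sketch of these metrics (Eqs. (2)-(4)) computed from confusion-matrix counts is given below; the example counts are invented for illustration and are not the paper's results.

def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, Recall and f1-score for the class 0 ('suitable for transfer') label."""
    recall = tp / (tp + fn)                                # Eq. (2)
    precision = tp / (tp + fp)                             # Eq. (3)
    f1 = 2 * recall * precision / (recall + precision)     # Eq. (4)
    return precision, recall, f1

# Hypothetical confusion-matrix counts
prec, rec, f1 = precision_recall_f1(tp=300, fp=40, fn=60)
print(f"precision={prec:.3f}, recall={rec:.3f}, f1={f1:.3f}")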


Fig. 5: Confusion matrix of the final ensemble learner

V. CONCLUSIONS

In this work, an ensemble learner is trained on a private dataset to classify blastocysts' images. Three different DL models were trained from scratch, and then they constructed

an ensemble learner. Five different instances of this learner


formed the final classifier, in a voting framework. This DL
method achieved over 81% prediction accuracy, thus it is
considered quite satisfactory. Future research includes the
extension to a multi-class classification formulation and the
utilization of other DL architectures.

R EFERENCES

[1] S. Dyer, G. Chambers, J. de Mouzon, K. Nygren, F. Zegers-Hochschild,


R. Mansour, O. Ishihara, M. Banker, and G. Adamson, “Interna-
tional Committee for Monitoring Assisted Reproductive Technologies
world report: Assisted Reproductive Technology 2008, 2009 and 2010,”
Human Reproduction, vol. 31, no. 7, pp. 1588–1609, Apr. 2016,
doi:10.1093/humrep/dew082.
[2] C. Pribenszky, A.-M. Nilselid, and M. H. M. Montag, “Time-lapse
culture with morphokinetic embryo selection improves pregnancy and
live birth chances and reduces early pregnancy loss: a meta-analysis.”
Reproductive biomedicine online, vol. 35 5, pp. 511–520, Nov. 2017.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, ser. Adap-
tive computation and machine learning. MIT Press, Nov. 2016.
[4] W. Rawat and Z. Wang, “Deep convolutional neural networks for image
classification: A comprehensive review,” Neural Computation, vol. 29,
pp. 1–98, 06 2017, doi:https://doi.org/10.1162/NECO a 00990.
[5] A. Kumar, J. Kim, D. Lyndon, M. Fulham, and D. D. F.
Feng, “An ensemble of fine-tuned convolutional neural networks
for medical image classification,” IEEE Journal of Biomedi-
cal and Health Informatics, vol. 21, pp. 31–40, 01 2017,
doi:https://doi.org/10.1109/JBHI.2016.2635663.
[6] K. Sfakianoudis, E. Maziotis, S. Grigoriadis, A. Pantou, G. Kokkini,
A. Trypidi, P. Giannelou, A. Zikopoulos, I. Angeli, T. Vaxevanoglou,
K. Pantos, and M. Simopoulou, “Reporting on the value of artificial
intelligence in predicting the optimal embryo for transfer: A systematic
review including data synthesis,” Biomedicines, vol. 10, p. 697, 03 2022,
doi:https://doi.org/10.3390/biomedicines10030697.
[7] T. Baczkowski, R. Kurzawa, and W. Głabowski, “Methods of scoring in
in vitro fertilization,” Reproductive biology, vol. 4, pp. 5–22, 04 2004.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classifica-
tion with deep convolutional neural networks,” in Advances in Neural
Information Processing Systems, F. Pereira, C. Burges, L. Bottou, and
K. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012.
[9] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma,
Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and
L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,”
International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp.
211–252, 2015, doi:https://doi.org/10.1007/s11263-015-0816-y.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations, 2015.
[11] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs:


Attention over Long/Short Memory
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176585

Ilias Siniosoglou∗ , Konstantinos Xouveroudis† , Vasileios Argyriou‡ , Thomas Lagkas§ ,


Sotirios K. Goudos¶ , Konstantinos E. Psannis∥ and Panagiotis Sarigiannidis∗

Abstract—In regards to the field of trend forecasting in time series, many popular Deep Learning (DL) methods such as Long Short Term Memory (LSTM) models have been the gold standard for a long time. However, depending on the domain and application, it has been shown that a new approach can be implemented and possibly be more beneficial, the Transformer deep neural networks. Moreover, one can incorporate Federated Learning (FL) in order to further enhance the prospective utility of the models, enabling multiple data providers to jointly train on a common model, while maintaining the privacy of their data. In this paper, we use an experimental Federated Learning System that employs both Transformer and LSTM models on a variety of datasets. The system receives data from multiple clients and uses federation to create an optimized global model. The potential of Federated Learning in real-time forecasting is explored by comparing the federated approach with conventional local training. Furthermore, a comparison is made between the performance of the Transformer and its equivalent LSTM in order to determine which one is more effective in each given domain, which shows that the Transformer model can produce better results, especially when optimised by the FL process.

Index Terms—Federated Learning, Deep Learning, LSTM, Transformers, Forecasting, Federated Dataset

I. INTRODUCTION

The development process of reliable AI models in the realm of Machine Learning (ML) often requires the gathering of data from multiple sources. This strategy of using varied data can enhance the model's predictive capabilities, as it is trained on a broader range of patterns. Nevertheless, this approach has several drawbacks and risks, the primary concern being privacy. All datasets from each source must be uploaded to a single server, creating a strictly local ML technique that may intentionally or unintentionally expose sensitive information. Federated Learning is a technique aimed at addressing this difficulty, among others. Unfortunately, when talking about federated data it is specified that usually, the system has to process non-Independent and Identically Distributed (non-IID) data [1]. This data is highly volatile and usually leads to lower model quality. This work investigates the effect of volatile federated data on benchmark Deep Neural Architectures in order to specify their efficacy in the federated environment on everyday tasks.

Under the broader category of Deep Learning, Transformers [2] are a particular type of Artificial Intelligence (AI) model that excels in Natural Language Processing (NLP) [3] tasks in particular. These models are unrelated to other popular types, such as Recurrent Neural Networks (RNNs) [4].

Federated Learning is a technique designed to overcome various challenges associated with traditional Machine Learning methods, such as the privacy dangers that come from the need to collect data from multiple sources [5].

This paper introduces a Centralized Federated Learning System (CFLS) that employs state-of-the-art DL techniques to address the challenges posed by each provided use case and also helps in deploying and evaluating the aforementioned DNN architectures. For our purposes, we focus on Transformer models and how they compare to the popular Long Short Term Memory (LSTM) [6] approach. In summary, this work offers the following contributions:
• Creates an experimentation system to evaluate the performance of federated data on benchmark DNN architectures
• Compares the performance of two architectures (LSTM, Transformer) on volatile (non-IID) federated data

The rest of this paper is organized as follows: Section II features the related work. Section III outlines the methodology, while Section IV presents a series of quantitative results on the available data. The paper concludes with Section V.

* I. Siniosoglou and P. Sarigiannidis are with the Department of Electrical and Computer Engineering, University of Western Macedonia, Kozani, Greece - E-Mail: {isiniosoglou, psarigiannidis}@uowm.gr
† K. Xouveroudis is with the R&D Department, MetaMind Innovations P.C., Kozani, Greece - E-Mail: kxouveroudis@metamind.gr
‡ V. Argyriou is with the Department of Networks and Digital Media, Kingston University, Kingston upon Thames, United Kingdom - E-Mail: vasileios.argyriou@kingston.ac.uk
§ T. Lagkas is with the Department of Computer Science, International Hellenic University, Kavala Campus, Greece - E-Mail: tlagkas@cs.ihu.gr
¶ S. K. Goudos is with the Physics Department, Aristotle University of Thessaloniki, Thessaloniki, Greece - E-Mail: sgoudo@physics.auth.gr
∥ K. E. Psannis is with the Department of Applied Informatics, School of Information Sciences, University of Macedonia, Thessaloniki, Greece - E-Mail: kpsannis@uom.edu.gr

II. LITERATURE REVIEW

Transformer models are widely used in the field, with notable examples including BERT [7], a method of pre-training Transformer models that has achieved top-level performance across various NLP tasks. In addition, [8] proposed RoBERTa, which made several optimizations to the BERT pre-training process, resulting in improved performance. Other noteworthy publications consist of XLM-R [9], a variant of RoBERTa that is cross-lingual and has the ability to attain high-quality representations for languages that it was not explicitly trained on, as well as GPT-2 [10], a Transformer-based language model of significant scale that has demonstrated remarkable performance overall.


In the current era of machine learning, a substantial propor-


tion of ML models created to forecast time series data [11]
are still actively applied across various scientific disciplines.
It’s no surprise that there have been attempts to implement
Transformers for time series forecasting as well, such as
Informer [12], The Influenza Prevalence Case [13] and the
proposed Probabilistic Transformer model [14].
In 2016, Google introduced FL as a new sub-field of
ML [15], [16]. By implementing a framework designed around
FL, a model can be trained without the need for independent
providers to share raw data. This addresses several challenges
in the field, such as improving the efficiency of handling
large datasets or facilitating supervised training [17], but most
importantly, removing the possibility of disclosing otherwise
private information to outsiders.
Many published works have already experimented with
FL regarding time series forecasting. In their recent work
[18], the authors introduced a Federated LSTM system for
safeguarding Intelligent Connected Vehicles against network
intrusion threats by predicting message IDs, thereby protecting
all vehicles connected to the specific network. [19] leverages
Fig. 1: Transformer Architecture
FL to analyze time series data and proposes a framework
designed for detecting anomalies in the Industrial Internet of
Things (IIoT) domain. to fit within the range of [0, 1]. This reduces the gap between
Furthermore, attempts to utilize Transformers in FL have data points and enhances the model’s overall performance.
already been published, such as FedTP [20], which concludes
that in the presence of data heterogeneity, FedAvg algorithms B. Transformer Neural Networks
can have a detrimental effect on self-attention, which Trans- Transformer models fall under the category of DL, using
formers rely on. Another application is AnoFed [21], which an encoder-decoder arrangement for their design. While more
concentrates on the challenges of anomaly detection. classic methodologies, such as RNNs, process their input
III. M ETHODOLOGY sequentially, Transformers are designed to concurrently un-
derstand the connections between each element in a sequence.
A. Preprocessing
Transformers can potentially be more effective in capturing
Effective data preprocessing is considered equally important long-term dependencies and connections between various se-
as building the actual neural network model, as the predictive quence pieces, and therefore maintaining context in a DL task
performance of the network is significantly influenced by with no restrictions, as opposed to RNNs and their popular
the format of the data input, placing the task at a similar subcategory, the Long-Short Term Memory (LSTM) models.
importance level as the role of implementation and opti- Typically, models with an encoder-decoder structure are
mization. This highlights the significance of efficient data limited, as the encoder is set to modify the input data into a
preprocessing, as noted in the empirical study by Huang et al. vector of fixed length, which leads to the data processed by the
[22]. Preprocessing encompasses a range of methods utilized decoder being partially incomplete. Transformers, however,
to modify the input data to meet the requirements of the apply an attention mechanism [24], which allows the network
specific task. to produce output by focusing on specific segments of the input
Within the context of this work, the preprocessing steps sequence, boosting performance. The encoder and the decoder
remained consistent for each dataset, beginning with resam- work in parallel, the former being responsible for transforming
pling since all the data fell under the time series category. the input into a vector containing context information for the
Resampling refers to altering the frequency of time series whole sequence, whereas the latter is in charge of deciphering
observations. For instance, a dataset uneven measurements the context and meaningfully resolving the output.
may be sampled at constant intervals. In a DL task, the order of components makes up a sequence.
An additional possiblity that needs to be covered is miss- Models that receive data sequentially don’t face any difficulties
ing values, which are frequently encountered. We employed in this department, however Transformers need to assign each
forward filling [23], which replaces the missing value with component a relative position. This process is called positional
the preceding value. If any missing values persist, they are encoding [2] and it’s carried out through clever use of the sin
replaced with zeros. This enables us to retain the entire row, and cos functions:
even if one feature is absent, as it may contain significant
information. The final step involves scaling the remaining data P E(pos,2i) = sin (pos/100002i/dmodel ) (1)


B. Transformer Neural Networks

Transformer models fall under the category of DL, using an encoder-decoder arrangement for their design. While more classic methodologies, such as RNNs, process their input sequentially, Transformers are designed to concurrently understand the connections between each element in a sequence. Transformers can potentially be more effective in capturing long-term dependencies and connections between various sequence pieces, and therefore maintaining context in a DL task with no restrictions, as opposed to RNNs and their popular subcategory, the Long-Short Term Memory (LSTM) models.

Typically, models with an encoder-decoder structure are limited, as the encoder is set to modify the input data into a vector of fixed length, which leads to the data processed by the decoder being partially incomplete. Transformers, however, apply an attention mechanism [24], which allows the network to produce output by focusing on specific segments of the input sequence, boosting performance. The encoder and the decoder work in parallel, the former being responsible for transforming the input into a vector containing context information for the whole sequence, whereas the latter is in charge of deciphering the context and meaningfully resolving the output.

In a DL task, the order of components makes up a sequence. Models that receive data sequentially don't face any difficulties in this department, however Transformers need to assign each component a relative position. This process is called positional encoding [2] and it is carried out through clever use of the sin and cos functions:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))    (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))    (2)

where pos represents the position of the element within the sequence, i denotes the index of the dimension in the embedding vector, and d_model refers to the embedding's dimensionality. The sin formula (Eq. 1) is used to build a vector for each even index and the cos formula (Eq. 2) for each odd index. Then, those vectors are incorporated into the parts of the input that correspond to them.

We can see the Transformer model's complete structure via the diagram in Fig. 1. The Encoder layer has two sub-modules: multi-headed attention and a completely interconnected network. There are also residual connections encircling each of the two sub-layers, and additionally, a normalization layer.

Fig. 1: Transformer Architecture

Self-attention is applied to the encoder via the multi-headed attention module. This feature allows each input component to be connected to other components. In order to achieve self-attention, we need to feed the input into three distinct and fully connected layers, (i) the Query Q, (ii) the Key K and (iii) the Values V.

The overall process can be seen in Fig. 2. Once the Q, K, and V vectors have been passed through a linear layer each, we compute the attention score, which determines the degree of importance that one component, such as a word, should assign to other components. In order to calculate the attention score, the Q vector is dot-multiplied with all other K vectors within the arrangement of data.

Fig. 2: Self-Attention Process

To achieve greater gradient stability, the scores are scaled down by dividing them with the square root of the dimensions of the query and key. Finally, the attention weights are derived by applying the softmax function to the scaled score, resulting in probability values in [0, 1], with a higher value indicating a greater level of significance for the component, as seen in Eq. 3.

Attention(Q, K, V) = softmax(QK^T / √d_k) V    (3)

Additionally, multi-headed attention can be enabled by paralleling the attention mechanism, enabling multiple processes to utilize it in conjunction. This is achieved by splitting the query, key and value into multiple vectors prior to implementing self-attention. A self-attention process is referred to as a head. Every head generates an output vector, which is concatenated into a singular vector before entering the final linear layer. The goal is for each head to learn information at least partially exclusive to it, further expanding the encoder's capabilities. We can see a representation of multi-head attention in Fig. 3.

Fig. 3: Multiple attention layers operating in parallel.

A residual connection is created by adding the multi-headed attention output vector to the initial positional input embedding. Subsequently, the residual connection's output is normalized via layer normalization and is then projected through a pointwise feed-forward network to be further processed. This network, also seen in Fig. 4, consists of linear layers with a ReLU activation function sandwiched between them. The output of this network is normalized and added to the layer normalization as well. The Decoder, like the Encoder, has two multi-headed attention layers, a feed-forward layer, residual connections, and layer normalization. It generates tokens sequentially using a start token and is auto-regressive. The first attention layer calculates scores for the input using positional embeddings. The final layer is a classifier, with the Encoder providing attention-related information.

Fig. 4: The input and output of the pointwise feed-forward layer, connected via a residual connection.

To prevent the conditioning of future tokens in the first attention layer, a masking process is implemented. This modifies attention scores for future tokens using look-ahead masking, which involves setting them to "-inf". Such is displayed in Eq. 4, where the look-ahead mask M (zero at and below the diagonal, -inf above it) is added to the matrix of scaled scores S to produce the new, masked scores:

S_masked = S + M,  with  M_ij = 0 for j ≤ i  and  M_ij = -inf for j > i    (4)

When utilizing attention, the masking process is executed between the scaler and the softmax layer (Fig. 2). The new scores are then normalized through softmax, making all future token values zero, and eliminating their impact. The Decoder's second attention layer aligns the input of the Encoder with that of the Decoder, emphasizing the most significant information between them. The output of this layer undergoes further processing before being fed into a linear and a softmax layer, and the highest score index is chosen as the prediction. The Decoder can be stacked on multiple layers, resulting in added diversity and enhancing predictive capabilities.
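To make the above concrete, the following short NumPy sketch implements sinusoidal positional encoding (Eqs. (1)-(2)) and scaled dot-product attention with an optional look-ahead mask (Eqs. (3)-(4)). It is an illustrative re-implementation of these standard operations, not code from the authors' system, and all names are ours.

import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]                        # positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even indices: sin, Eq. (1)
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd indices: cos, Eq. (2)
    return pe

def look_ahead_mask(seq_len: int) -> np.ndarray:
    # 0 on and below the diagonal, -inf above it (future tokens), as in Eq. (4)
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # scaled scores
    if mask is not None:
        scores = scores + mask                               # masked scores, Eq. (4)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax, Eq. (3)
    return weights @ V

# Example: a toy sequence of 5 tokens with 8-dimensional embeddings
x = np.random.rand(5, 8) + positional_encoding(5, 8)
out = scaled_dot_product_attention(x, x, x, mask=look_ahead_mask(5))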


C. Federated Learning

Federated Learning is a modern technique for securely training decentralized Machine Learning and Deep Learning models. FL [16] manages the distribution, training, and aggregation of DL models across multiple devices at the edge [25]. In the centralized FL used in this study, a central server selects a population Pf ∈ [1, N] (N ∈ N*) of devices where a global model w^0_Global is locally trained using the device's collected data D_i∈N. The locally trained models w_l^i are collected by the server, and an aggregation algorithm, such as Federated Averaging (Eq. 5), is used to merge them into an updated model w^k_Global. The updated model is then redistributed to remote devices, and this process is repeated over multiple FL iterations.

w_G^k = (1 / Σ_{i∈N} D_i) · Σ_{i=1}^{N} D_i w_i^k    (5)

Above, w_G^k denotes the global model at the kth epoch and w_i^k denotes the ith node's model at the specific epoch. Figure 5 illustrates the underlying mechanism of FL.

Fig. 5: Federated Learning Architecture

IV. EVALUATION

A. Datasets

For the evaluation of the two model architectures, two distinct datasets are utilised. The first dataset pertains to the Smart Farming domain [26], while the last one concerns Product Sales of an industrial ecosystem.

1) Smart Farming Dataset: The first dataset consists of time-based sensor readings collected from sensors located inside distributed animal stables. The dataset includes 80,000 sets of data. The objective is to leverage the available sensor data to anticipate future stable conditions and ensure the well-being of the animals residing in these facilities.

Due to the limited and uneven information collected by the field sensors, the dataset has been augmented to include synthetic data of the same caliber but differently geo-located distributions. The purpose of generating synthetic data is to aid the models in achieving better generalization [27]. Additional information for the dataset is provided in Table I.

TABLE I: Animal Stable Sensor Data
Farm Animal Welfare (Synthetic)
Training Features | Air Humidity, Air Temperature, Ch4, CO2 Average
Target Feature | Ch4
Records | >80K
Nodes | 5

2) Product Sales Dataset: The last category has a distinct goal, which is to predict future product sales using existing information. Sales data spanning three years for seven distinct products were made available, with each one treated as a separate node. The datasets offer various details on the daily unit sales of the products, as depicted in Table II, and compare them to their corresponding figures from the previous year.

TABLE II: Daily Product Sales Dataset Information
Daily Product Sales
Training Features | Daily Sales, Daily Sales (Previous Year), Daily Sales (Percentage Difference), Daily Sales KG, Daily Sales KG (Previous Year), Daily Sales KG (Percentage Difference), Daily Returns KG, Daily Returns KG (Previous Year), Points of Distribution, Points of Distribution (Previous Year)
Target Features | Daily Sales, Daily Returns KG
Records | 7K
Nodes | 7

Both of the described datasets include measurements from a different number of distributed data nodes. The data are non-IID in the sense that the distribution of the information differs in each node. Furthermore, the data are volatile since both of the datasets include multifactor information whose behavioural pattern is affected by external factors that are not depicted in the measured features but are also subject to frequent and unpredictable changes. In particular, in the first case, even though the stable sensors measure the conditions in each stable, the behaviour and number of animals is not known. This is a factor that, of course, changes the measurements, since the number of animals in the stable at a specific moment, the quality of animal feed, etc. are variables that cannot be easily quantified. Conversely, the second dataset, even though it includes accurate measurements of the production and sales of the produced products, cannot depict the consumer behaviour affected, for example, by sales or increased advertisement. In this respect, the employed models have to be able to generalise with the available information, producing quantifiable results.
B. Experimental Results
To validate the performance of the tested models, three
benchmark metrics were used, outlined below in Eq. 6, 7, and
8. Here, Ti represents the true values, Pi represents the model
predictions, and n denotes the total number of data points.
1) Mean Square Error (MSE): This metric calculates the
average of the squared differences between the predicted and

Daily Product Sales


Daily Sales, Daily Sales (Previous Year),
Daily Sales (Percentage Difference), Daily Sales KG,
Daily Sales KG (Previous Year)
Training Features
Daily Sales KG (Percentage Difference)
Daily Returns KG, Daily Returns KG (Previous Year),
Points of Distribution, Points of Distribution (Previous Year)
Target Features Daily Sales, Daily Returns KG
Records 7K
Nodes 7
Fig. 5: Federated Learning Architecture TABLE II: Daily Product Sales Dataset Information
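Before turning to the results, the Federated Averaging step of Eq. 5 can be illustrated with a short, framework-agnostic sketch; the node count, parameter vectors, and dataset sizes below are stand-ins, not values from the paper.

```python
import numpy as np

def federated_average(local_models, data_sizes):
    """Merge locally trained parameter vectors into a global model (Eq. 5).

    local_models: list of 1-D numpy arrays w_i^k, one per participating node.
    data_sizes:   list of dataset sizes D_i used as aggregation weights.
    """
    weights = np.asarray(data_sizes, dtype=float)
    weights /= weights.sum()                      # D_i / sum_j D_j
    stacked = np.stack(local_models)              # shape: (num_nodes, num_params)
    return (weights[:, None] * stacked).sum(axis=0)

# One FL round: each node trains locally, the server aggregates and redistributes.
local_models = [np.random.randn(10) for _ in range(5)]   # stand-ins for w_i^k
data_sizes = [8000, 12000, 20000, 20000, 20000]          # stand-ins for D_i
w_global = federated_average(local_models, data_sizes)
```

Weighting each node's model by its dataset size keeps nodes with more data proportionally more influential in the aggregated global model.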


B. Experimental Results
To validate the performance of the tested models, three benchmark metrics were used, outlined below in Eq. 6, 7, and 8. Here, T_i represents the true values, P_i represents the model predictions, and n denotes the total number of data points.
1) Mean Square Error (MSE): This metric calculates the average of the squared differences between the predicted and actual values. A model that produces no errors would result in an MSE of zero. Eq. (6) displays the formula for MSE:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (T_i - P_i)^2 \qquad (6) \]

2) Root Mean Square Error (RMSE): RMSE is a very useful evaluation metric, as it shares the same units as the response variable; it is obtained by taking the square root of the MSE. Eq. (7) displays the formula for RMSE:

\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (T_i - P_i)^2} \qquad (7) \]

3) Mean Absolute Error (MAE): This metric involves calculating the average of the absolute differences between the predicted and actual values. It is a popular choice for evaluation because it provides a straightforward interpretation of the difference between the two. The mathematical formula for MAE is shown in Eq. (8):

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} |T_i - P_i| \qquad (8) \]
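The three metrics can be computed directly from the predictions; the following is a small NumPy sketch, with T and P standing in for the true and predicted series of a single node (the values are illustrative only).

```python
import numpy as np

def mse(t, p):
    return np.mean((t - p) ** 2)          # Eq. 6

def rmse(t, p):
    return np.sqrt(mse(t, p))             # Eq. 7

def mae(t, p):
    return np.mean(np.abs(t - p))         # Eq. 8

# Example with stand-in values for one node's test set.
T = np.array([0.42, 0.55, 0.61, 0.48])    # true values T_i
P = np.array([0.40, 0.58, 0.59, 0.50])    # model predictions P_i
print(mse(T, P), rmse(T, P), mae(T, P))
```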
The outcomes of the experiments are demonstrated in Tables III and IV, which correspond to the Animal Welfare dataset and the Product Sales dataset, respectively. Each table is divided into three subtables, one for each of the evaluation metrics. In addition, we can see the aforementioned tables' results visualized in Figures 6 and 7.

TABLE III: Results - Animal Welfare Prediction Loss [Synthetic]
Columns: LSTM (Local, Federated), Transformer (Local, Federated)
(a) Mean Square Error
Node0   0.0054  0.0311  0.0081  0.0049
Node1   0.0086  0.0137  0.0096  0.0051
Node2   0.0074  0.0133  0.0078  0.0050
Node3   0.0061  0.0136  0.0074  0.0050
Node4   0.0181  0.0146  0.0089  0.0051
Average 0.0091  0.0173  0.0084  0.0050
(b) Root Mean Square Error
Node0   0.0733  0.1764  0.0901  0.0702
Node2   0.0861  0.1152  0.0883  0.0710
Node3   0.0782  0.1168  0.0859  0.0707
Node4   0.1347  0.1210  0.0942  0.0716
Average 0.0930  0.1293  0.0913  0.0709
(c) Mean Absolute Error
Node0   0.0551  0.0935  0.0654  0.0525
Node1   0.0763  0.0792  0.0715  0.0533
Node2   0.0712  0.0731  0.0643  0.0524
Node3   0.0600  0.0792  0.0608  0.0539
Node4   0.1073  0.0830  0.0718  0.0534
Average 0.0740  0.0816  0.0668  0.0531

TABLE IV: Results - Daily Sales Dataset
Columns: LSTM (Local, Federated), Transformer (Local, Federated)
(a) Mean Square Error
Node0   0.3124  0.3153  0.3247  0.3206
Node1   0.0185  0.0189  0.0157  0.0149
Node2   0.0642  0.0674  0.0655  0.0649
Node3   0.1174  0.1023  0.0909  0.0689
Node4   0.0151  0.0203  0.0094  0.0149
Node5   0.0651  0.0565  0.0689  0.0359
Node6   0.0237  0.0372  0.0359  0.0230
Average 0.0881  0.0883  0.0873  0.0776
(b) Root Mean Square Error
Node0   0.4711  0.4726  0.5699  0.5662
Node1   0.1303  0.1312  0.1252  0.1223
Node2   0.2359  0.2440  0.2559  0.2548
Node3   0.2700  0.2618  0.3015  0.2624
Node4   0.0929  0.1366  0.0971  0.1222
Node5   0.2477  0.2077  0.2626  0.1895
Node6   0.1462  0.1778  0.1896  0.1517
Average 0.2277  0.2331  0.2574  0.2384
(c) Mean Absolute Error
Node0   0.1347  0.1340  0.1336  0.1243
Node1   0.0933  0.1036  0.0836  0.0908
Node2   0.0868  0.1114  0.0857  0.0967
Node3   0.2166  0.2131  0.1758  0.1695
Node4   0.0720  0.1187  0.0549  0.1005
Node5   0.2165  0.1689  0.2169  0.1252
Node6   0.0972  0.1343  0.1521  0.0965
Average 0.1310  0.1406  0.1289  0.1148

Fig. 6: Stable Sensor Data Evaluation
Fig. 7: Product Sales Evaluation

Based on the findings, we can infer that in most cases (nodes) the Transformer models show increased generalisation ability over the LSTM models. This can be attributed to the multi-head attention mechanism, which is able to keep track of multiple key pieces of information in the accumulation process. This is further augmented when the models are federated, since the produced global model contains mutual knowledge from the distributed data nodes.


As can be seen in the results, the Federated Transformer models are superior to both the local and the Federated LSTM models.
A more pronounced distinction is observed in the stable sensor data. In the local training, the LSTM and Transformer models exhibit similar performance. However, when subjected to FL, the Transformer displays substantially lower loss and produces more consistent results across its nodes; unlike the LSTM, the Transformer's FL models outperform their local counterparts.
Lastly, in the forecasting of the Daily Product Sales data, we observe another scenario where the performance of the two models is comparable. Yet, in this instance, the Transformer model also outperforms the LSTM. This pattern holds for both local and Federated Learning of the models.

V. CONCLUSIONS
Federated Learning is a valuable attribute that aids in safeguarding data confidentiality, as well as offering model quality enhancement and quicker processing times for large datasets and real-time predictions. In this paper, we compare the performance of two benchmark DNN architectures, namely, a) LSTMs and b) Transformers, for timeseries forecasting in real-world federated datasets containing non-iid distributions. We further employed FL in conjunction to evaluate the performance of the local models against the federated global model that is trained on knowledge from distributed data nodes. The experimental results showed that the performance of both the local and Federated Transformer models was comparable to the LSTM models. The results indicate that Transformers consistently have the potential to compete with, and even surpass, LSTMs in predicting on federated data intricately affected by external factors that are not always included in the data features.

ACKNOWLEDGEMENT
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 957406 (TERMINET).

REFERENCES
[1] M. Tiwari, I. Maity, and S. Misra, "Fedserv: Federated task service in fog-enabled internet of vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 11, pp. 20943–20952, 2022.
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[3] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, "Natural language processing: an introduction," Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 544–551, 2011.
[4] S. Grossberg, "Recurrent neural networks," Scholarpedia, vol. 8, no. 2, p. 1888, 2013.
[5] I. Siniosoglou, V. Argyriou, S. Bibi, T. Lagkas, and P. Sarigiannidis, "Unsupervised ethical equity evaluation of adversarial federated networks," in Proceedings of the 16th International Conference on Availability, Reliability and Security, ser. ARES 21. New York, NY, USA: Association for Computing Machinery, 2021. [Online]. Available: https://doi.org/10.1145/3465481.3470478
[6] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[8] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A robustly optimized BERT pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[9] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, "Unsupervised cross-lingual representation learning at scale," arXiv preprint arXiv:1911.02116, 2019.
[10] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., "Language models are unsupervised multitask learners," OpenAI Blog, vol. 1, no. 8, p. 9, 2019.
[11] J. G. De Gooijer and R. J. Hyndman, "25 years of time series forecasting," International Journal of Forecasting, vol. 22, no. 3, pp. 443–473, 2006.
[12] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, "Informer: Beyond efficient transformer for long sequence time-series forecasting," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 12, 2021, pp. 11106–11115.
[13] N. Wu, B. Green, X. Ben, and S. O'Banion, "Deep transformer models for time series forecasting: The influenza prevalence case," arXiv preprint arXiv:2001.08317, 2020.
[14] B. Tang and D. S. Matteson, "Probabilistic transformer for time series analysis," Advances in Neural Information Processing Systems, vol. 34, pp. 23592–23608, 2021.
[15] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," arXiv preprint arXiv:1610.05492, 2016.
[16] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[17] A. Taïk and S. Cherkaoui, "Electrical load forecasting using edge computing and federated learning," in ICC 2020 - 2020 IEEE International Conference on Communications (ICC). IEEE, 2020, pp. 1–6.
[18] T. Yu, G. Hua, H. Wang, J. Yang, and J. Hu, "Federated-LSTM based network intrusion detection method for intelligent connected vehicles," in ICC 2022 - IEEE International Conference on Communications. IEEE, 2022, pp. 4324–4329.
[19] Y. Liu, S. Garg, J. Nie, Y. Zhang, Z. Xiong, J. Kang, and M. S. Hossain, "Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach," IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6348–6358, 2020.
[20] H. Li, Z. Cai, J. Wang, J. Tang, W. Ding, C.-T. Lin, and Y. Shi, "FedTP: Federated learning by transformer personalization," arXiv preprint arXiv:2211.01572, 2022.
[21] A. Raza, K. P. Tran, L. Koehl, and S. Li, "AnoFed: Adaptive anomaly detection for digital health using transformer-based federated learning and support vector data description," Engineering Applications of Artificial Intelligence, vol. 121, p. 106051, 2023.
[22] J. Huang, Y.-F. Li, and M. Xie, "An empirical analysis of data preprocessing for machine learning-based software cost estimation," Information and Software Technology, vol. 67, pp. 108–127, 2015.
[23] W. McKinney et al., "pandas: a foundational Python library for data analysis and statistics," Python for High Performance and Scientific Computing, vol. 14, no. 9, pp. 1–9, 2011.
[24] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[25] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, "Federated learning," Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 13, no. 3, pp. 1–207, 2019.
[26] C. Chaschatzis, A. Lytos, S. Bibi, T. Lagkas, C. Petaloti, S. Goudos, I. Moscholios, and P. Sarigiannidis, "Integration of information and communication technologies in agriculture for farm management and knowledge exchange," in 2022 11th International Conference on Modern Circuits and Systems Technologies (MOCAST), 2022, pp. 1–4.
[27] L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," 2017. [Online]. Available: https://arxiv.org/abs/1712.04621

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Greek Orthodox Church Hymns Recognition Using


Deep Learning Techniques
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176857

Nikolaos Tsakatanis Lazaros A. Iliadis Achilles D. Boursianis


ELEDIA@AUTH, School of Physics ELEDIA@AUTH, School of Physics ELEDIA@AUTH, School of Physics
Aristotle University of Thessaloniki Aristotle University of Thessaloniki Aristotle University of Thessaloniki
Thessaloniki, Greece Thessaloniki, Greece Thessaloniki, Greece
ntsakata@physics.auth.gr liliadis@physics.auth.gr bachi@physics.auth.gr

Konstantinos-Iraklis D. Kokkinidis Georgios Patronas Pavlos Serafeim


Dept. of Applied Informatics Dept. of Music Science and Art Dept. of International and European Studies
University of Macedonia University of Macedonia University of Macedonia
Thessaloniki, Greece Thessaloniki, Greece Thessaloniki, Greece
kostas.kokkinidis@uom.edu.gr gpatronas@uom.edu.gr serapavl@uom.edu.gr

Maria S. Papadopoulou Sotirios K. Goudos


Dpt. of Information and Electronic Eng. ELEDIA@AUTH, School of Physics
International Hellenic University Aristotle University of Thessaloniki
Sindos, Greece Thessaloniki, Greece
mspapa@ihu.gr sgoudo@physics.auth.gr

Abstract—Deep learning techniques have found numerous applications in the sector of cultural and religious heritage. Greek Orthodox Church hymns are examples of Byzantine religious tradition. Their performance is governed by stringent rules not observed in other cultures. This work trains a novel shallow convolutional neural network to identify Greek Orthodox Church hymns using a private dataset. For ten of the most frequently chanted hymns, the deep learning model achieves over 98% accuracy. AlexNet and MobileNetV2 are trained on the same data to evaluate their performance. Additional analysis is conducted to validate the obtained results.
Index Terms—Byzantine hymns, deep learning, convolutional neural networks, Mel-spectrograms

I. INTRODUCTION
The cultural and religious heritage sectors require the study of a diverse and large amount of data. Recently, machine learning (ML) approaches have been considered complementary to traditional techniques [1]. In addition, deep learning (DL) based computer vision (CV) algorithms are employed in cases where useful information needs to be extracted from the available data. More specifically, convolutional neural networks (CNNs) have been trained to perform paintings' style identification [2], music feature extraction [3], and optical music classification [4]. The success of CNNs in image classification problems lies in their ability to apply filters for feature extraction, by exploiting images' grid topology.
Greek Orthodox Church hymns are part of the Orthodox Christian tradition and their vocal performance is not accompanied by any musical instrument. The ML-based recognition and classification of the above-mentioned hymns is a challenging problem. This is due to the fact that they are vocal music whose performance depends on the chanting voice. In addition, Byzantine hymns' climaxes differ from the Western ones.
This work collects a corpus of 10 Greek Orthodox Church hymns, with more than 200 samples each. The audio data is transformed into images in the Mel scale (Mel-spectrograms), and then a novel shallow CNN is trained to classify them correctly. To the best of the authors' knowledge, this is the first time that DL techniques are applied to the task of Greek Orthodox Church hymns recognition.
This paper is part of the "Ymnodos" project, which aims to develop software suitable for mobile devices to identify Greek Orthodox Church hymns. The software will be based on artificial intelligence and signal-processing techniques. The ultimate goal is to further promote Byzantine cultural heritage for educational and religious purposes.
The rest of this treatise is organized as follows: Section II provides the architecture of the designed model. The experiments and the results form Section III. Section IV presents the conclusions of this work.

This research was carried out as part of the project «Recognition and direct characterization of cultural items for the education and promotion of Byzantine Music using artificial intelligence» (Project code: KMP6-0078938) under the framework of the Action «Investment Plans of Innovation» of the Operational Program «Central Macedonia 2014-2020», that is co-funded by the European Regional Development Fund and Greece.


Fig. 1. Architecture of the proposed CNN model

II. ALGORITHMS DESCRIPTION
Mobile devices have limited computational resources, hence the trained DL models should be suitable for such configurations. The core elements of the proposed CNN are the two convolution layers. In addition, two max-pooling layers and three fully-connected layers are utilized. The convolution layers apply a (3 × 3) filter to their inputs. The activation function in the convolution layers is the Rectified Linear Unit (ReLU), which is defined as

\[ f(u) = \max(0, u) \qquad (1) \]

Convolution layers perform the feature extraction procedure, exploiting the image's grid topology. Max-pooling layers create a down-sampled feature map after calculating the maximum value over their input feature map patches. The first of the three fully-connected layers performs the flattening operation, which transforms the two-dimensional array into a one-dimensional vector. The model's architecture is depicted in Fig. 1.
To further evaluate our model's performance, two other DL classifiers pre-trained on the ImageNet dataset [5], namely AlexNet [6] and MobileNetV2 [7], are trained on the same dataset. AlexNet is structured with five convolutional layers, while three max-pooling and three fully-connected layers are employed. AlexNet was introduced in the 2012 ImageNet large-scale visual recognition challenge and outperformed the state-of-the-art classical CV algorithms [5], [8].
MobileNetV2 is a DL model designed specifically to perform well on mobile devices. Its key element is the bottleneck depth-separable convolution with residuals. Depth-separable convolution splits a classical convolutional layer into two. The first one, namely depth-wise convolution, applies a convolutional filter per input channel, while the second layer, namely point-wise convolution, is a 1 × 1 convolution that computes linear combinations of the input channels and builds new features. The use of linear bottleneck layers prevents the destruction of valuable information.
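As an illustration of the kind of shallow architecture described in Section II (two convolution layers, two max-pooling layers, and three fully-connected layers), the PyTorch sketch below can be used; the channel counts, hidden sizes, and input resolution are illustrative guesses, not values reported by the authors.

```python
import torch
import torch.nn as nn

class ShallowHymnCNN(nn.Module):
    """Two conv + two max-pool + three fully-connected layers.
    Channel counts, hidden sizes, and input resolution are assumptions."""
    def __init__(self, num_classes=10, in_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # ReLU as in Eq. 1
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        flat = 32 * (in_size // 4) * (in_size // 4)
        self.classifier = nn.Sequential(
            nn.Flatten(),                       # flattening before the dense layers
            nn.Linear(flat, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),         # one logit per hymn class
        )

    def forward(self, x):                       # x: (batch, 1, H, W) Mel-spectrogram
        return self.classifier(self.features(x))

model = ShallowHymnCNN()
logits = model(torch.randn(4, 1, 128, 128))     # dummy batch of Mel-spectrograms
```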
III. EXPERIMENTS AND RESULTS
A. Dataset
In this work, three DL models are trained to recognize religious hymns. The private dataset consists of ten classes with 210 samples each. Each sample represents a different performance of the same hymn. To exploit current CV tools, the audio data is transformed into images that contain valuable information for the listeners, namely Mel-spectrograms. Mel-spectrograms are spectrograms transformed in the Mel scale. The Mel scale can be obtained as follows,

\[ m = 2595 \log_{10}\!\left(1 + \frac{\nu}{700}\right). \qquad (2) \]

Here, ν is the frequency measured in Hz.
The Mel scale is an alternative scale of pitches that results from listeners' judgment that a pair of pitches are an equal distance from one another. A reference point between the two scales is established by setting a perceptual pitch of 1000 Hz in the normal frequency scale equal to 1000 Mels. An example of a Mel-spectrogram from our dataset is depicted in Fig. 2.

Fig. 2. An example of a Mel-spectrogram of our dataset

The dataset is split into three different sets: 70% of the initial data forms the training set, 15% the validation set, and the remaining 15% the test set. No common data points exist among the three sets.
B. DL models' training
AlexNet and MobileNetV2 are pre-trained on the ImageNet dataset, while our model is trained from scratch. The Adam (adaptive moment estimation) algorithm [9] is utilized to update the DL models' weights. The learning rate is constant and equal to 10^-4 for all models. The generalization ability of our model is studied using the 10-fold cross-validation process, while the early stopping technique was also employed. The training was performed on an NVIDIA RTX 3080 Ti GPU. In Table I, the experiments' results are summarized.
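As a sketch of the audio-to-image conversion described in Section III-A, the snippet below computes a Mel-spectrogram with librosa; the file name, sampling rate, and Mel-band count are illustrative assumptions.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load one hymn recording (path and parameters are placeholders).
y, sr = librosa.load("hymn_sample.wav", sr=22050)

# Mel-spectrogram: STFT magnitudes mapped onto the Mel scale (cf. Eq. 2).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)     # log scaling for the image

# Save the image that will be fed to the CNN classifiers.
librosa.display.specshow(mel_db, sr=sr, hop_length=512, x_axis="time", y_axis="mel")
plt.savefig("hymn_sample_mel.png", bbox_inches="tight")
```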


It is evident that both AlexNet and MobileNet exhibit severe overfitting. On the contrary, the shallow CNN achieves great results in all sets, which means that it has learned to extract valuable information from the Mel-spectrograms.
Another way to examine ML models' performance is the utilization of the confusion matrix [10]. In such matrices, the major diagonal depicts the correct decisions made by the classifier and the other cells represent the errors (confusion) among the various classes. The shallow CNN has predicted either each class correctly or has made at most two mistakes. For the shallow CNN model, the confusion matrix is illustrated in Fig. 4, which validates the model's satisfactory performance.

Fig. 3. Prediction accuracy of the proposed CNN
Fig. 4. Confusion matrix of the shallow CNN

TABLE I: Training of DL models on the task of Greek Orthodox Church hymns recognition: Results
Model      Train accuracy   Test accuracy
AlexNet    98.1             54.19
MobileNet  95.25            43.62
This work  98.91            98.38

IV. CONCLUSIONS
This work trains a novel shallow CNN model to recognize Greek Orthodox Church hymns. Hymns from a private dataset are converted to Mel-spectrograms, and then they served as input to the CNN model. To compare its performance, two other DL architectures suitable for mobile devices are trained on the same task. The experiments prove the superiority of our model in the task of Greek Orthodox Church hymns recognition. Future work includes the construction of a larger dataset with more hymns, the utilization of other DL and CV methodologies, and the creation of suitable software for mobile devices.

REFERENCES
[1] M. Fiorucci, M. Khoroshiltseva, M. Pontil, A. Traviglia, A. Del Bue, and S. James, "Machine learning for cultural heritage: A survey," Pattern Recognition Letters, vol. 133, pp. 102–108, 2020, doi: 10.1016/j.patrec.2020.02.017.
[2] G. Castellano and G. Vessio, "Deep learning approaches to pattern extraction and recognition in paintings and drawings: an overview," Neural Computing and Applications, vol. 33, pp. 12263–12282, 2021, doi: 10.1007/s00521-021-05893-z.
[3] Q. Lin and B. Ding, "Music score recognition method based on deep learning," Intell. Neuroscience, vol. 2022, Jan. 2022, doi: 10.1155/2022/3022767.
[4] F. F. De Vega, J. Alvarado, and J. V. Cortez, "Optical music recognition and deep learning: An application to 4-part harmony," in 2022 IEEE Congress on Evolutionary Computation (CEC), 2022, pp. 01–07, doi: 10.1109/CEC55065.2022.9870357.
[5] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015, doi: 10.1007/s11263-015-0816-y.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, ser. NIPS'12. Red Hook, NY, USA: Curran Associates Inc., 2012, pp. 1097–1105.
[7] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[8] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Computation, vol. 29, no. 9, pp. 2352–2449, 2017, doi: 10.1162/neco_a_00990.
[9] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015.
[10] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

Model-Agnostic Meta-Learning Techniques: A


State-of-The-Art Short Review
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176893

Aikaterini I. Griva Achilles D. Boursianis Lazaros A. Iliadis


ELEDIA@AUTh, School of Physics ELEDIA@AUTh, School of Physics ELEDIA@AUTh, School of Physics
Aristotle University of Thessaloniki Aristotle University of Thessaloniki Aristotle University of Thessaloniki
Thessaloniki, Greece Thessaloniki, Greece Thessaloniki, Greece
aigriva@physics.gr bachi@physics.auth.gr liliadis@physics.auth.gr

Panagiotis Sarigiannidis George Karagiannidis Sotirios K. Goudos


Department of Electrical and Computer School of Electrical and Computer ELEDIA@AUTh, School of Physics
Engineering Engineering Aristotle University of Thessaloniki
University of Western Macedonia Aristotle University of Thessaloniki Thessaloniki, Greece
Kozani, Greece Thessaloniki, Greece sgoudo@physics.auth.gr
psarigiannidis@uowm.gr geokarag@auth.gr

Abstract—In the last few years, novel meta-learning techniques how to learn. In contrast to conventional approaches, meta-
have established a new field of research. Great emphasis is given learning intends to enhance the learning algorithm itself, taking
to few-shot learning approaches, where the model is trained by into account the knowledge gained from previous learning
using only a few training examples. In this work, a review of
several model-agnostic meta-learning methodologies (MAML) is episodes [3]. One famous meta-learning method is the model-
presented. Firstly, we identify and discuss the typical characteris- agnostic meta-learning (MAML) algorithm. MAML is char-
tics of the first proposed MAML algorithm. Next, we classify the acterized by its simplicity and the fact that can be applied
model-agnostic approaches into three main categories: Regular to any problem whose solution is approximated with gradient
gradient descent MAML, Hessian-free MAML, and Bayesian descent.
MALM, by presenting their advantages and limitations. Finally,
we conclude this work with further discussion and highlight the The TERMINET project aims at providing a novel next-
future research directions. generation reference architecture based on cutting-edge tech-
Index Terms—Model-agnostic meta-learning, meta-learning, nologies. Within this project, both the multi-task and meta-
deep learning, few-shot learning, one-shot learning learning perspectives will be supported to enable personalized
or device-specific modeling. The Meta-Learning technique
I. I NTRODUCTION will be used to optimize the performance of each device
based on the given results of few-shot adaptation examples
Humans have the great ability to learn quickly. We can
on heterogeneous tasks. Moreover, MAML algorithms will
recognize objects from a few examples, we can learn new tasks
be considered and adopted as guiding approaches that offer
based on prior experience and a small amount of information,
improved personalized models for a large majority of the
thus we are expecting our deep neural networks to perform
Internet of Things (IoT) end devices.
the same. However, large amounts of data are required from
such models to perform effectively. This is the reason why Motivated by the above, the current contribution provides a
we are constantly investigating approaches such as few-shot state-of-the-art review of the existing research from the area
learning techniques to decrease the amount of data and at the of MAML methodologies. In detail, the contribution of this
same time improve our models’ performance. paper is summarized as follows:
Few-shot learning is a challenging approach to training a 1) Three state-of-the-art MAML techniques, namely regu-
learning model by using only a few examples. For multiple lar gradient descent MAML, hessian-free MAML, and
tasks, a common feature representation can be learned using bayesian MAML are identified and thoroughly studied.
prior experience [1]. Recently, a meta-learning approach has 2) The principles, advantages, and limitations of each al-
been used to address the issue of few-shot learning. Meta- gorithm are highlighted.
learning studies how intelligent systems can increase their The rest of this paper is organized as follows: Section II in-
performance through experience [2]. In other words, we troduces the first MAML algorithm that was proposed in 2017.
can describe meta-learning as the mechanism for learning Section III presents the most popular MAML techniques and
classifies them according to their characteristics. Section IV
This work has received funding from the European Union’s Horizon
2020 research and innovation programme under grant agreement No. 957406 explores the future directions, whereas Section V concludes
(TERMINET). this work and summarizes its findings.


Algorithm 1 Pseudocode of the first proposed MAML algorithm
Require: distribution p(T) over tasks
Require: step-size hyperparameters α, β
  Initialize randomly the parameters w of the parameterized model f_w
  while not done do
    Sample tasks {T_i} ∼ p(T)
    for all T_i do
      Compute ∇_w L_{T_i}(f_w) w.r.t. N examples
      w'_i ← w − α ∇_w L_{T_i}(f_w)
    end for
    w ← w − β ∇_w Σ_{T_i ∼ p(T)} L_{T_i}(f_{w'_i})
  end while
Fig. 1. Model-agnostic meta-learning algorithm representation
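As an illustration of the update loop in Algorithm 1, the snippet below sketches its first-order variant (the FOMAML approximation discussed later, which drops the second-order terms); the model, task format, learning rates, and function names are assumptions made for the example, not details from the reviewed papers.

```python
import torch

def fomaml_step(model, loss_fn, tasks, alpha=0.01, beta=0.001, inner_steps=1):
    """One meta-update of first-order MAML (FOMAML).
    `tasks` is a list of ((x_support, y_support), (x_query, y_query)) tuples."""
    theta = [p.detach().clone() for p in model.parameters()]      # shared initialization w
    meta_grad = [torch.zeros_like(p) for p in model.parameters()]
    for (xs, ys), (xq, yq) in tasks:
        # Start every task from the shared initialization w.
        for p, t in zip(model.parameters(), theta):
            p.data.copy_(t)
        # Inner loop: adapt to the task's support set (w'_i <- w - alpha * grad).
        for _ in range(inner_steps):
            grads = torch.autograd.grad(loss_fn(model(xs), ys), model.parameters())
            for p, g in zip(model.parameters(), grads):
                p.data.sub_(alpha * g)
        # Outer gradient, evaluated at the adapted parameters (first-order approximation).
        grads = torch.autograd.grad(loss_fn(model(xq), yq), model.parameters())
        for mg, g in zip(meta_grad, grads):
            mg.add_(g)
    # Meta-update of the shared initialization: w <- w - beta * sum_i grad_i.
    for p, t, mg in zip(model.parameters(), theta, meta_grad):
        p.data.copy_(t - beta * mg)

# Example usage with a toy regressor (shapes and tasks are placeholders):
# net = torch.nn.Linear(1, 1)
# fomaml_step(net, torch.nn.functional.mse_loss, tasks)
```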

II. M ODEL -AGNOSTIC M ETA -L EARNING ALGORITHM • Hessian-free MAML algorithms, and
• Bayesian MAML algorithms.
MAML is an algorithm for meta-learning that was proposed
in 2017 by Chelsea Finn et al. [3]. It is called model agnostic
A. Regular gradient descent MAML algorithms
because it was designed to perform with any model trained
with gradient descent and also to be applicable to a wide range The main characteristic of the methods that can be found
of learning problems mainly in the deep learning framework. in the first technique is that higher-order derivatives are used
The core idea is to train the initial parameters of the model in the backpropagation of the error procedure. The algorithms
to maximize its performance on a new task after updating are proposed to optimize the main MAML algorithm in terms
those parameters, Fig. 1. This can be achieved through one of computational efficiency and stability. Moreover, they can
or more gradient steps using a small amount of data from the be extended to problems with previously unseen classes.
new task. The proposed algorithm neither increases the number • Implicit Model-Agnostic Meta-Learning (iMAML) was
of learned parameters nor imposes restrictions on the model’s proposed by Aravind Rajeswaran et al. [5] in 2019. This
architecture. Furthermore, it can be used with a variety of loss algorithm was developed in order to achieve significant
functions, and the idea is to maximize the sensitivity of those gains in computing and memory efficiency. iMAML gives
loss functions of new tasks with respect to the parameters. the opportunity to select the suitable inner optimization
Training the parameters of the model such that a few or method. In this algorithm, there is no need of differ-
even a single gradient step can lead to better results on a new entiating through the optimization path. The algorithm’s
task can be generalized as building an internal representation goal is to learn a set of parameters and lead to good
that is broadly applicable to a wide range of tasks. Finn generalization for various tasks based only on the solution
et al. [3] experimentally evaluated the algorithm in few- to the inner-level optimization. At the same time an
shot classification, supervised regression, and reinforcement iMAML subvariant, i.e. the Hessian-free subvariant was
learning (RL) problems. However, the MAML algorithm has a also proposed achieving slightly better results in the few-
number of issues that can limit the generalization performance shot classification problem.
of the model, reduce the flexibility of the framework, and • Adaptive Model-Agnostic Meta-Learning (Alpha-
lead to instabilities during training. Moreover, MAML can be MAML) was proposed by Harkirat Singh Behl et al. [6]
computationally expensive during training and inference and in 2019 as an extension to MAML. Alpha-MAML
requires the model to go through a costly hyperparameter tun- increases the algorithm’s resistance to hyperparameter
ing. In order to face those weaknesses, an improved subvariant choices and improves stability. This algorithm was
of the MAML framework called MAML++ was proposed by evaluated on a few-shot image classification problem
Antreas Antoniou et al. in 2019 [4] and was experimentally and the results showed that is more robust than MAML
evaluated in few-shot learning tasks achieving better results. due to the less time that is required for tuning.
III. P OPULAR MAML ALGORITHM SUBVARIANTS • Out-of-distribution (OOD-MAML) is another extension
of the MAML algorithm, introduced by Taewon Jeong
During the last few years, several different subvariants of et al. [7] in 2020. Fake or out-of-distribution samples
the classical MAML algorithm can be found in the literature. that look like in-distribution samples are used to train the
In this work, some popular techniques are briefly presented. model in order to detect those OOD samples in new tasks.
To maintain consistency, the algorithms are classified into the This method achieves learning both the initialization of
following three categories: a model and the initial OOD-samples over the tasks. The
• Regular gradient descent MAML algorithms, experimental evaluation in classification tasks shows that


the proposed algorithm achieves higher accuracy than variational inference method with gradient-based meta-
MAML and, at the same time, detects OOD samples. learning. This variant is effective, precise, and robust.
At the same time, it is simple to implement despite the
B. Hessian-free MAML algorithms fact that training such big networks might be costly. This
Another interesting approach is the Hessian-free MAML algorithm was experimentally evaluated in classification,
methods that ignore second-order derivatives when backprop- regression, and RL problems recording good results.
agating in the meta-learning procedure to save a huge amount • Probabilistic Model-Agnostic Meta-Learning (PLATI-
in the network computation and at the same time maintain the PUS) was introduced by Chelsea Finn et al. [12] in 2018.
performance of the model. It is a probabilistic algorithm that can sample models for a
new task from a model distribution. PLATIPUS also uses
• First Order Model-Agnostic Meta-Learning a variational lower bound to train a parameter distribution.
(FOMAML) [3], [8] is a subvariant of the first proposed Moreover, it is simple and provides comparable results
MAML that ignores second derivatives. In that way, to MAML in experimental evaluation. However, it is less
it is simpler to implement and reduces computational efficient in assessing uncertainty because it provides a
complexity. However, it might lose gradient information. nearly poor estimation of posterior variance.
It is proven that FOMAML works as well as the MAML • Amortized Bayesian Meta-Learning (ABLM) proposed
algorithm in few-shot classification problems without by Ravi and Beatson [13] in 2018 is a technique that
affecting its performance. successfully employs hierarchical variational inference to
• Reptile is a method introduced by Alex Nichol et al. [8] in
learn a prior distribution. ABLM also provides a high-
2018. Reptile is similar to FOMAML and quite simple. It quality approximation posterior. Compared to MAML
is characterized by its ability to learn an initialization for and PLATIPUS, the experimental evaluation shows that
the model’s parameters. After the optimization of these this model achieves better computation in the uncertainty
parameters, the model can be generalized from only a few estimations.
examples. By evaluating the results of the algorithms in
few-shot classification tasks, we obtain that they perform
TABLE I
as well as the main MAML algorithm. E XPERIMENTAL E VALUATION FIELDS
• Hessian-Free (HF-MAML) was proposed by Alireza Fal-
lah et al. [9] in 2020 to achieve low computational Reinforcement
Algorithm Classification Regression
Learning
complexity and, at the same time, to preserve all theoret- MAML x x x
ical guarantees of the basic algorithm without evaluating MAML++ x - -
any Hessian. The authors provide a theoretical analysis iMAML x - -
of HF-MAML and highlight its improvement regarding ALPHA-MAML x - -
OOD-MAML x - -
the convergence properties compared to MAML and FOMAML x - -
FOMAML algorithms. Reptile x - -
HF-MAML - - -
C. Bayesian MAML algorithms LLAMA x x -
BMAML x x x
The Bayesian hierarchical model is also used in meta- PLATIPUS x x -
learning techniques to incorporate uncertainty into the model ABLM x - -
estimation. Based on this, there are several subvariants of the
basic MAML algorithm that improve its performance metrics
and weaknesses.
IV. D ISCUSSION AND F UTURE D IRECTIONS
• Laplace Approximation for Meta-Adaption (LLAMA)
is an extension to MAML based on the hierarchical All the above algorithms are demonstrated on different
Bayesian model and the Laplace approximation that model types and are experimentally evaluated in several dis-
provides a more accurate estimate of the integral. It tinct domains, i.e., classification problems, few-shot regres-
was proposed by Erin Grant et al. [10] in 2018. The sions, and RL scenarios. Table I indicates that the presented
experimental evaluation of the one-shot classification algorithms are mostly designed for classification problems.
problem shows a small improvement in accuracy. How- Some of them are also tested in regression problems and only
ever, skewed distributions do not respond well to this the first MAML algorithm and its Bayesian format (BMAML)
approximation. Furthermore, the extension has limitations are used in RL tasks.
since it uses a point estimate to approximate the predic- In contrast to the prior algorithms, other MAML subvariants
tive distribution over additional data points. To overcome have also been proposed focused only on reinforcement, deep
these limitations, the BMAML was proposed. reinforcement, and meta-reinforcement learning. The Taming
• Bayesian MAML (BMAML) was proposed in 2018 by MAML [14] using the automatic differentiation framework,
Taesup Kim et al [11], in order to learn complex uncer- and the ES-MAML [15], which is based on Evolution Strate-
tainty structures and combine an effective nonparametric gies, are two typical examples of this category.


Fig. 2. MAML algorithms in timescale. Different colors are used for each category (Grey for the first MAML algorithm, blue for regular gradient descent
MAML algorithms, green for the Hessian-free MAML algorithms, and orange for Bayesian MAML algorithms).

Another observation is that most of the works regarding the [4] A. Antoniou, H. Edwards, and A. Storkey, “How to train your MAML,”
MAML algorithm family were proposed between 2017 and in International Conference on Learning Representations, New Orleans,
LA, USA, May 2019, doi: 10.48550/arXiv.1810.09502.
2020 as shown in Fig. 2. At this time, the research interest [5] A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning
has moved to new RL methods, which are a widely developing with implicit gradients,” Advances in neural information processing
field in the deep-learning field. However, the MAML family systems, vol. 32, Vancouver, Canada, 2019.
[6] H. S. Behl, A. G. Baydin, and P. H. Torr, “Alpha maml: Adaptive model-
consists of simple and widely used algorithms and there is a agnostic meta-learning,” in 6th ICML Workshop on Automated Machine
change that in the next years the researchers will continue to Learning, California, USA, 2019, doi: 10.48550/arXiv.1905.07435.
evolve this field. [7] T. Jeong and H. Kim, “Ood-maml: Meta-learning for few-shot out-of-
distribution detection and classification,” Advances in Neural Informa-
tion Processing Systems, vol. 33, pp. 3907–3916, Dec. 2020.
V. C ONCLUSIONS [8] A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning
algorithms,” 2018, arXiv:1803.02999.
In this work, a brief review of the most popular MAML [9] A. Fallah, A. Mokhtari, and A. Ozdaglar, “On the convergence the-
techniques is presented. Various algorithms are proposed as ory of gradient-based model-agnostic meta-learning algorithms,” in
an extension to the first MAML algorithm to overcome the International Conference on Artificial Intelligence and Statistics, pp.
1082–1092, 2020.
weaknesses of the main proposal. The main characteristics [10] E. Grant, C. Finn, S. Levine, T. Darrell, and T. Griffiths, “Recasting
that researchers try to decode are training stability, memory ef- gradient-based meta-learning as hierarchical bayes,” in International
ficiency, flexibility, generalization, computational complexity, Conference on Learning Representations, Vancouver, Canada, 2018, doi:
10.48550/arXiv.1801.08930.
and robustness. Most of these algorithms are experimentally [11] J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn, “Bayesian
evaluated in few-shot classification problems and achieved model-agnostic meta-learning,” in Advances in Neural Information Pro-
promising results. Due to the simplicity and adaptability of cessing Systems, vol. 31, 2018, doi: 10.48550/arXiv.1806.03836.
[12] C. Finn, K. Xu, and S. Levine, “Probabilistic model-agnostic meta-
the MAML algorithm family, they established a field with learning,” Advances in Neural Information Processing Systems, vol. 31,
further potential development. In future work, we will extend 2018, doi: 10.48550/arXiv.1806.02817.
our research to RL techniques. We are planning to make an [13] S. Ravi and A. Beatson, “Amortized bayesian meta-learning,” in Inter-
national Conference on Learning Representations, New Orleans, LA,
extensive review as well as an experimental evaluation of the USA, May, 2018.
algorithms that are used in this field. [14] H. Liu, R. Socher, and C. Xiong, “Taming maml: Efficient unbiased
meta-reinforcement learning,” in International conference on machine
R EFERENCES learning, Long Beach, CA, USA, June, 2019.
[15] X. Song, W. Gao, Y. Yang, K. Choromanski, A. Pacchiano, and
[1] T. Hospedales, A. Antoniou, P. Micaelli and A. Storkey, ”Meta-Learning Y. Tang, “Es-maml: Simple hessian-free meta learning,” in In-
in Neural Networks: A Survey,” in IEEE Transactions on Pattern ternational Conference on Learning Representations, Apr. 2020,
Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149-5169, 1 doi:10.48550/arXiv.1910.01215.
Sept. 2022, doi: 10.1109/TPAMI.2021.3079209.
[2] R. Vilalta, C. Giraud-Carrier, and P. Brazdil, ”Meta-Learning - Concepts
and Techniques”, in Data Mining and Knowledge Discovery Handbook,
Springer, Boston, MA, pp. 717–731., 2010, doi: 10.1007/978-0-387-
09823-4-36.
[3] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for
fast adaptation of deep networks,” in Proc. 34th International Conference
on Machine Learning - Volume 70, Sydney, Australia, Aug. 2017, pp.
1126–1135, doi: 10.48550/arXiv.1703.03400.

2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)

The Challenges of Music Deep Learning for


Traditional Music
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST) | 979-8-3503-2107-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/MOCAST57943.2023.10176775

Lazaros Moysis∗† , Lazaros Alexios Iliadis‡ , Sotirios P. Sotiroudis‡ , Kostas Kokkinidis§


Panagiotis Sarigiannidis¶ , Spiridon Nikolaidis‡ , Christos Volos∗
Achilles D. Boursianis‡ , Dimitrios Babas∥ , Maria S. Papadopoulou‡∗∗ , Sotirios K. Goudos‡
∗: Laboratory of Nonlinear Systems - Circuits & Complexity, School of Physics, Aristotle University of Thessaloniki,
Thessaloniki, Greece. {lmousis,volos}@physics.auth.gr
† : Department of Mechanical Engineering, University of Western Macedonia, Kozani, Greece.
‡ : ELEDIA@AUTH, School of Physics, Aristotle University of Thessaloniki, 54124,

Thessaloniki, Greece. {liliadis,ssoti,snikolaid,bachi, sgoudo}@physics.auth.gr


§ : Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece. kostas.kokkinidis@uom.edu.gr
¶ : Department of Electrical and Computer Engineering,

University of Western Macedonia, 50131, Kozani, Greece. e-mail: psarigiannidis@uowm.gr


∥ : School of Physics, Aristotle University of Thessaloniki, Thessaloniki, Greece. email: babas@auth.gr
∗∗ : Department of Information and Electronic Engineering, International Hellenic University,

57400 Sindos, Greece. email: mspapa@ihu.gr

Abstract—Numerous applications for music listeners, educa- adjacent to it. Examples include music education, which
tors, DJs, and musicians have been created over the past decade encompasses the teaching of music theory, instrument play,
as the field of Music Deep Learning has expanded. Evidently, as well as music history. Another example is cultural tourism,
the majority of works rely on training their architectures using
well-established databases, which consist primarily of Western which refers to traveling in order to visit a cultural event or
popular music genres. This creates a deficiency in applications place, like a music concert or a religious event. Music therapy
focusing on traditional and regional music, which are vital to is another relevant application of MDL, which refers to the use
preserving culture. As an increasing number of research groups of music to treat symptoms related to conditions like stress,
begin to develop works centered on traditional music, several anxiety, depression, and more [10].
practical challenges for applying deep learning arise. This work
discusses these issues and the future of deep learning in the study An essential part of building any MDL architecture is
of traditional music. training, validating, and testing the model, using an available
Index Terms—Deep Learning, Music Information Retrieval, database. Certain Western-oriented music genres, such as pop,
Music Generation, Traditional Music, Regional Music jazz, hip-hop, electronic, rock, metal, blues, and classical,
are widely recognized. Thus, there is an abundance of free
I. I NTRODUCTION databases that contain substantial amounts of music recordings
from the aforementioned genres. Examples include the ISMIR
Deep Learning (DL) has so far expanded to numerous fields
dataset [11], the Million Song Dataset [12], the GTZAN
of the natural sciences [1], such as engineering, computer
dataset [13], MagnaTagATune [14], J.S. Bach Chorales [15],
vision, physics, and medicine, with a plethora of commercial
and more.
applications. Among these, DL for music processing has
The availability of large music recording databases is ex-
gained considerable attention in the last decade or so.
tremely useful in building efficient architectures. Unfortu-
Music Deep Learning (MDL) encompasses a collection of
nately, though, the focus on Western music leads to a big issue,
music-related applications, like music recommendation [2],
that of under-represented music genres, especially traditional
classification [3], music generation [4], instrument recognition
music. Traditional, or regional music, generally refers to music
[5], singing analysis [6], emotion classification [7], transcrip-
that is specific to a given country or region and is a big part of
tion [8] and more [9]. Naturally, considering that the art of
its history and cultural identity. Traditional music genres are
music plays such a big part in society and human interaction,
abundant throughout the world [16], with the roots of many
the above problems find several commercial applications in
genres tracing back hundreds of years. As music listening
fields related to the music industry, as well as disciplines
becomes more and more digitized and is moved to online plat-
This research was carried out as part of the project ’Recognition and forms like Youtube and Spotify, traditional music genres are
direct characterization of cultural items for the education and promotion of also becoming part of the digital online music community. As
Byzantine Music using artificial intelligence (Project code: KMP6-0078938) such, they are also subject to music information retrieval tasks
under the framework of the Action ’Investment Plans of Innovation’ of the
Operational Program ’Central Macedonia 2014 2020’, that is co-funded by like song/artist recommendation, automatic playlist generation,
the European Regional Development Fund and Greece. emotion recognition, transcription, and more.


So, studying traditional music under the scope of MDL, termed Traditional Music Deep Learning (TMDL), will support cultural preservation through the use of state-of-the-art tools for the study and digitization of music. Moreover, building ML architectures focusing on traditional music will also help introduce traditional music to a new generation of listeners, by promoting digital education and cultural tourism.

As more research groups recognize the impact potential of TMDL, they are developing DL architectures for Music Information Retrieval (MIR) [17]–[33] and Music Generation (MG) [34]–[37], specifically for such genres. For these tasks, though, several challenges arise that must be addressed to properly apply DL techniques. These issues stem from the peculiarities of each music genre and, in many cases, the lack of appropriate data for feature tagging and training [38]. Motivated by this, the goal of this work is to outline and discuss several of these issues, as a guide for future research groups that aim to tackle the problem of TMDL. By addressing these problems, the field of MDL will come closer to unified architectures that can be easily adopted and work well for all types of music. The work also aims to gather and highlight recent literature that studies MDL topics for specialized music genres.

The rest of the work is structured as follows: In Section II, the challenges of applying DL to traditional music are discussed, each in a separate subsection. The paper concludes with Section III, which includes a discussion of future research goals.

II. CHALLENGES IN TRADITIONAL MUSIC DL

Here, the challenges arising in TMDL are discussed.

A. Available Datasets

As mentioned in the introduction, there are several public databases containing music tracks of popular music genres, like pop, jazz, hip-hop, electronic, rock, metal, blues, and classical [11]–[15]. Any MDL architecture therefore has access to many, usually thousands, of music tracks for training, testing, and validation. Unfortunately, this is not the case with many traditional music genres. Traditional music datasets are scarce, and those that are available typically contain far fewer samples. Building public, copyright-free datasets is thus the first and most important step needed to help TMDL develop. Administrative authorities, like Ministries of Culture, can greatly help in this direction by working alongside research groups to provide available recordings and settle any copyright issues. In cases where available recordings (for example, recordings by past famous artists) are not enough to equip a dataset with the necessary samples for training, the datasets can be enriched by gathering recordings from local artists and music schools. Many groups and authorities are working towards this [39]–[51]. Here, data augmentation techniques can also be applied to complement the newly created datasets, as sketched below.
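As a minimal sketch of what such augmentation could look like (assuming the librosa and soundfile libraries; the parameter values and file names are illustrative, not taken from the works cited above), a few perturbed copies of each recording can be generated as follows:

```python
import numpy as np
import librosa
import soundfile as sf

def augment(path, out_prefix, sr=22050):
    """Create simple augmented variants of one recording (illustrative only)."""
    y, sr = librosa.load(path, sr=sr)  # load and resample the track

    variants = {
        "pitch_up": librosa.effects.pitch_shift(y, sr=sr, n_steps=2),   # +2 semitones
        "pitch_dn": librosa.effects.pitch_shift(y, sr=sr, n_steps=-2),  # -2 semitones
        "slower":   librosa.effects.time_stretch(y, rate=0.9),          # roughly 10% slower
        "noisy":    y + 0.005 * np.random.randn(len(y)),                # light additive noise
    }
    for name, y_aug in variants.items():
        sf.write(f"{out_prefix}_{name}.wav", y_aug, sr)

# e.g. augment("chant_001.wav", "chant_001_aug")  # hypothetical file name
```

Which perturbations are musically safe (pitch shifting, for instance, may alter a mode or makam) is a decision best left to the domain experts involved in the dataset construction.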
As recent examples, TMDL has been considered in [23], [24], [34] for Turkish music, in [19] for Hindustani classical music, in [20] for Indian classical music, in [21] for Korean traditional music, in [22], [48] for Persian music, in [25], [37] for Maqams, in [26]–[29], [31] for Chinese music, in [30] for Cantonese Opera, in [32] for Croatian, in [33] for Ethiopian, and in [35] for Irish music. For many of the above, data collection involved gathering material from personal collections or online sources, or obtaining new samples and annotation help from music students and experts.

Note that, complementary to music tracks, it is essential that music datasets be synchronized with lyric datasets for traditional music that includes a singing part. Lyric availability is important for tasks like singing voice detection or emotion recognition. Lyrics are particularly important in traditional, and especially religious, music, where singing is the most important part of a song. For example, in Byzantine hymns, the rhythm comes from the chanter alone, without any accompanying instrument. For this type of a cappella music, suitable architectures lie at the intersection of music and speech processing. Hence, MIR tasks need to be able to detect a singing voice and have the lyrics available.

B. Tagging

As mentioned, the existence of available datasets for traditional music is already an issue. It is important to emphasize here that a dataset does not simply involve the collection of music files. For a dataset to find its highest usability in DL, it must be organized and enriched with the appropriate tags. Thus, building upon the issue of lacking datasets is the issue of organizing the datasets, to make them suitable for use [27].

Music tags may refer to a subgenre, the type of instrument involved, the presence of singing, the singer's gender, the emotion associated with the track, an era, and indirect musical information like the musician's popularity, record sales, and more [52]. To fill the dataset with annotations, the best way is to consult culturally aware experts in the field, like historians and musicians, but also music listeners. Many of the aforementioned works' research groups are working on enriching their datasets with appropriate tags. Note that some tags, like the gender of a singer, may be clear, while others, like the underlying emotion of a score or a specific subgenre, may not be. Of course, working on unlabeled data is possible in many DL tasks, but generally speaking, a label-enriched dataset can find a wider usage.
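To make this concrete, a per-track annotation record could be organized along the following lines; the schema is hypothetical and the field names are illustrative assumptions rather than a convention used by the cited datasets:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrackAnnotation:
    """Hypothetical tag record for one track in a traditional-music dataset."""
    track_id: str
    subgenre: Optional[str] = None               # e.g. a regional style
    instruments: List[str] = field(default_factory=list)
    has_vocals: Optional[bool] = None
    singer_gender: Optional[str] = None          # usually unambiguous
    emotion: Optional[str] = None                # subjective; needs expert consensus
    era: Optional[str] = None
    lyrics_available: bool = False

# e.g. TrackAnnotation(track_id="hymn_017", instruments=[], has_vocals=True)
```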
C. Recording Difficulties

For most music genres, audio recordings take place in recording studios, equipped with consoles and appropriate hardware to capture the sound and also perform the mixing and editing. There can be exceptions though, as for some genres this may be hard to achieve. One such example is religious music, where it may not always be possible to record songs in a studio. For example, when it comes to Orthodox psalms, which are sometimes recorded in remote monasteries, the recording can only be done in place, because there is no
nearby studio, and the chanters may not be willing to perform outside the monastery grounds.

So, when gathering tracks for the dataset, many recordings may take place in different environments with varying acoustics. This can result in recordings with different sound characteristics, even for the same song. Thus, to compensate for these technicalities, the recordings must be managed by sound engineers. It is of course worth noting that essential fieldwork has also been performed by dedicated ethnographers and ethnomusicologists. Moreover, certain preprocessing of the dataset may be required, to retrieve musical information and normalize all the features among the recordings. These topics generally fall into the discipline of Audio Engineering. Note that for symbolic music, that is, music saved in a notational format like music sheets or MIDI files, this is not an issue, as it is not accompanied by an audio signal.

D. Local Instruments

Most of the widely popular instruments are played by musicians from all genres. For example, instruments like the guitar, trumpet, and drums are used extensively in many music genres, from blues and pop to metal. Thus, there is an abundance of recordings of these instruments that cover most genres. So, it is possible that architectures trained to perform an application like instrument classification on one dataset can be equally efficient on another genre dataset as well.

This would be much more difficult to achieve for musical instruments that are much less common. There are plenty of regional instruments in each country that are used specifically in local music, making it unlikely that they are ever used for performing other genres. Examples include the Greek Baglamas, the American Banjo, the Eastern Oud, the Chinese Erhu, and many more [53]. So, for TMDL, special datasets must be constructed for regional instruments. These can first be used separately for tasks like classification and MG, and possibly at a later stage, they can be combined with larger databases and tested for the same tasks. Recent examples include the Thai Xylophone [54], the woodwind instrument Sopele [32], a collection of Chinese instruments [27], and Persian instruments [48].
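One plausible way to connect the two settings (a hedged sketch under assumed tooling, not a method proposed in the cited works) is standard transfer learning: start from a classifier trained on abundant recordings of common instruments and retrain only its final layer on the small regional-instrument dataset. Assuming PyTorch and torchvision, with instrument spectrograms rendered as images, the skeleton could look like this:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_REGIONAL_CLASSES = 8  # e.g. baglamas, oud, erhu, ... (hypothetical label set)

# Backbone pre-trained on a large image corpus; in practice one would start from
# a model already trained on spectrograms of widely recorded instruments.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for p in model.parameters():          # freeze the backbone
    p.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, NUM_REGIONAL_CLASSES)  # new trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def training_step(spectrogram_batch, labels):
    """One optimization step on a batch of (N, 3, H, W) spectrogram images."""
    optimizer.zero_grad()
    loss = criterion(model(spectrogram_batch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Freezing the backbone keeps the data requirements modest, which matters when only a few hundred regional-instrument recordings are available.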
E. Musical Symbols

Nowadays, music transcriptions follow a set of modernized notation symbols, which for the most part are unified for all genres. So, DL architectures for tasks like music transcription and symbolic MG can work well for many different types of music. Considering TMDL though, there are many exceptions, as some regional and historic genres have notational differences. Thus, the above architectures must be adapted to the specific datasets for each genre. This, of course, also falls under the issue of dataset creation, as discussed above. Examples include Mensural notation [55], Chinese Gong-che notation [56], Organ tablature [57], and Carnatic notation [58].

F. Matching Performances

In MIR tasks, like song identification, an essential premise is that a given song is associated with a specific performance and a given artist. For example, the song "Whole Lotta Love" is attributed to the band Led Zeppelin, and is associated with their studio album Led Zeppelin II. Although other studio and live versions may be available, the original album track can be considered the main reference to this song. Yet, in traditional music, it is less common to associate a song with a given artist or a specific performance. As such songs are much older, they have been covered by many musicians over the years, so multiple performances can be associated with each song. This is even more prominent in religious music, where there can be numerous recordings of a musical piece, like a hymn, a chant, or a prayer. Such renditions can vary heavily from one another, as the recitation of a chant is heavily dependent on each performer flavoring the track with their own rhythm and emotions.

For these music categories, developing architectures for the task of song identification will be especially challenging, due to the variations in the musical characteristics of each song's (or chant's) performance. For music tracks, earlier works have showcased that entropy can be used to design fingerprints for matching different performances of the same musical piece [59], [60]. Entropy is a measure of a signal's complexity and predictability, which is often considered in physics and signal processing. Thus, entropy-based features, computed for the time and frequency domains, could play a key role in achieving robustness in identifying renditions of the same song or religious chant.
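As a rough illustration of such entropy-based features (a simplified sketch only, not the actual fingerprint construction of [59], [60]), the Shannon entropy of the short-time magnitude spectrum can be computed frame by frame and summarized into a compact, rendition-tolerant descriptor; librosa is assumed here for the STFT:

```python
import numpy as np
import librosa

def spectral_entropy_track(y, sr, n_fft=2048, hop_length=512):
    """Per-frame Shannon entropy of the normalized magnitude spectrum."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))  # (freq, frames)
    P = S / (S.sum(axis=0, keepdims=True) + 1e-12)  # treat each frame as a distribution
    H = -np.sum(P * np.log2(P + 1e-12), axis=0)     # entropy in bits, one value per frame
    return H

def entropy_descriptor(path):
    """Crude descriptor: summary statistics of the entropy curve of one recording."""
    y, sr = librosa.load(path, sr=22050)
    H = spectral_entropy_track(y, sr)
    return np.array([H.mean(), H.std(), np.median(H)])
```

Two renditions of the same chant could then be compared by the distance between their descriptors; a real fingerprinting system would of course use far richer, time-aligned representations.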
G. Public Outreach

The public dissemination of scientific results through journal publications and conference participation is of course a common expectation for all scientific disciplines. These platforms usually ensure that any newly derived scientific results will reach all the interested groups that can be influenced by them. This may not entirely be true for new results in TMDL. Here, apart from the scientists working in the field who are staying up-to-date on new scientific publications, there is a large group of non-academic parties that may be left unaware of the opportunities and benefits offered by DL. The scientific groups should thus be entrusted with the extra task of making sure that the developments in the studies of traditional music through deep learning reach the communities devoted to this music. This can be achieved by reaching out to local authorities, music schools, musicians, and media, and presenting the results through public talks, as well as non-academic platforms, like television talks, newspapers, and social media.

Another approach that could prove highly effective is to improve the ease of use of MDL, and especially music generation, for musicians and educators. This can be achieved by building architectures that are accompanied by front-end interactivity for music generation tasks. Having a Graphical User Interface (GUI) that would allow any user to interact and experiment with the architecture, without requiring any background knowledge of the model structure, would offer immense help to professionals. Moreover, such GUIs can be
of great service in educational activities, for the introduction of younger generations to traditional music.
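To indicate how lightweight such a front end can be (a minimal sketch assuming the Gradio toolkit and a placeholder generation function standing in for a trained model; neither is prescribed by this work), a generation model could be wrapped as follows:

```python
import numpy as np
import gradio as gr

SR = 22050  # output sample rate

def generate(tempo, duration):
    """Placeholder for a trained music-generation model (hypothetical)."""
    t = np.linspace(0, duration, int(SR * duration), endpoint=False)
    audio = 0.2 * np.sin(2 * np.pi * 440.0 * t)  # stand-in tone instead of real model output
    return SR, audio.astype(np.float32)

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Slider(40, 200, value=120, label="Tempo (BPM)"),
        gr.Slider(1, 30, value=10, label="Duration (s)"),
    ],
    outputs=gr.Audio(label="Generated excerpt"),
    title="Traditional music generation demo",
)

if __name__ == "__main__":
    demo.launch()
```

A wrapper of this kind exposes only musically meaningful controls, so musicians and educators never need to touch the model internals.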
H. Accessibility of the Research Results

A final remark can be made regarding the commercialization potential of TMDL studies. As with all works on MIR, the commercial prospects are vast. On the other hand, TMDL studies offer a service to the cultural preservation of each region's identity. Thus, an argument can be made that the deliverables of such studies should be made available to anyone, due to their cultural significance. This could be achieved if TMDL studies are supported by public funding, with the deliverables (datasets and architectures) being made available as open access. This point comes in accordance with the ongoing worldwide discussion on the public availability of research outputs, which is relevant to other research fields as well.

III. CONCLUSIONS

The difficulties of applying deep learning techniques to traditional music are discussed in this paper. Considering the aforementioned issues, one conclusion that can be drawn is that, given sufficient resources, they can all be successfully addressed. Consequently, it is crucial to raise awareness among the Ministries of Culture of each country, so that they provide the financial resources for research. Collaboration with local communities, cultural associations, and conservation clubs must also be extended, in order to aid in database construction and knowledge digitization. It is also essential that the members of the research groups examining TMDL come from a variety of disciplines, including computer engineers, physicists, sound engineers, historians, ethnographers, music composers, music performers, and even musical instrument manufacturers. Without a doubt, future research in the field will resolve the issues listed above.

A natural expansion from TMDL would be to apply DL techniques to the field of traditional dance [61]. Many of the aforementioned issues are shared there as well, like the existence of suitable datasets. Addressing traditional music through DL will certainly complement the study of traditional music and aid in cultural preservation, education, and global representation.

REFERENCES

[1] Ajay Shrestha and Ausif Mahmood, "Review of deep learning algorithms and architectures," IEEE Access, vol. 7, pp. 53040–53065, 2019.
[2] Markus Schedl, "Deep learning in music recommendation systems," Frontiers in Applied Mathematics and Statistics, p. 44, 2019.
[3] Ndiatenda Ndou, Ritesh Ajoodha, and Ashwini Jadhav, "Music genre classification: A review of deep-learning and traditional machine-learning approaches," in 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). IEEE, 2021, pp. 1–6.
[4] Miguel Civit, Javier Civit-Masot, Francisco Cuadrado, and Maria J Escalona, "A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends," Expert Systems with Applications, p. 118190, 2022.
[5] Maciej Blaszke and Bożena Kostek, "Musical instrument identification using deep learning approach," Sensors, vol. 22, no. 8, pp. 3033, 2022.
[6] Chitralekha Gupta, Haizhou Li, and Masataka Goto, "Deep learning approaches in topics of singing information processing," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2422–2451, 2022.
[7] X. Jia, "Music emotion classification method based on deep learning and improved attention mechanism," Computational Intelligence and Neuroscience, 2022.
[8] Carlos Hernandez-Olivan, Ignacio Zay Pinilla, Carlos Hernandez-Lopez, and Jose R Beltran, "A comparison of deep learning methods for timbre analysis in polyphonic automatic music transcription," Electronics, vol. 10, no. 7, pp. 810, 2021.
[9] Lazaros A. Iliadis, Sotirios P Sotiroudis, Kostas Kokkinidis, Panagiotis Sarigiannidis, Spiridon Nikolaidis, and Sotirios K Goudos, "Music deep learning: A survey on deep learning methods for music processing," in 2022 11th International Conference on Modern Circuits and Systems Technologies (MOCAST). IEEE, 2022, pp. 1–4.
[10] Yang Heping and Wang Bin, "Online music-assisted rehabilitation system for depressed people based on deep learning," Progress in Neuro-Psychopharmacology and Biological Psychiatry, p. 110607, 2022.
[11] "Ismir genre dataset," https://ismir2004.ismir.net/, 2004, Accessed: 2022-30-09.
[12] Thierry Bertin-Mahieux, Daniel Ellis, Brian Whitman, and Paul Lamere, "The million song dataset," in Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011, pp. 591–596.
[13] George Tzanetakis and Perry Cook, "Musical genre classification of audio signals," IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, pp. 293–302, 2002.
[14] "The MagnaTagATune Dataset," http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset, 2013, Accessed: 2022-30-09.
[15] "J.S. Bach Chorales Dataset," https://github.com/czhuang/JSB-Chorales-dataset, Accessed: 2022-30-09.
[16] Wikipedia, "List of cultural and regional genres of music," en.wikipedia.org/wiki/List_of_cultural_and_regional_genres_of_music, Accessed: 2022-30-09.
[17] Yandre MG Costa, Luiz S Oliveira, and Carlos N Silla Jr, "An evaluation of convolutional neural networks for music classification using spectrograms," Applied Soft Computing, vol. 52, pp. 28–38, 2017.
[18] Akhilesh Kumar Sharma, Gaurav Aggarwal, Sachit Bhardwaj, Prasun Chakrabarti, Tulika Chakrabarti, Jemal H Abawajy, Siddhartha Bhattacharyya, Richa Mishra, Anirban Das, and Hairulnizam Mahdin, "Classification of Indian classical music with time-series matching deep learning approach," IEEE Access, vol. 9, pp. 102041–102052, 2021.
[19] Vishnu S Pendyala, Nupur Yadav, Chetan Kulkarni, and Lokesh Vadlamudi, "Towards building a Deep Learning based Automated Indian Classical Music Tutor for the Masses," Systems and Soft Computing, vol. 4, pp. 200042, 2022.
[20] Sayan Nag, Medha Basu, Shankha Sanyal, Archi Banerjee, and Dipak Ghosh, "On the application of deep learning and multifractal techniques to classify emotions and instruments using Indian Classical Music," Physica A: Statistical Mechanics and its Applications, vol. 597, pp. 127261, 2022.
[21] JongSeol Lee, MyeongChun Lee, Dalwon Jang, and Kyoungro Yoon, "Korean traditional music genre classification using sample and midi phrases," KSII Transactions on Internet and Information Systems (TIIS), vol. 12, no. 4, pp. 1869–1886, 2018.
[22] Nacer Farajzadeh, Nima Sadeghzadeh, and Mahdi Hashemzadeh, "PMG-Net: Persian Music Genre classification using deep neural Networks," Entertainment Computing, p. 100518, 2022.
[23] Merve Ayyüce Kızrak and Bülent Bolat, "A musical information retrieval system for Classical Turkish Music makams," Simulation, vol. 93, no. 9, pp. 749–757, 2017.
[24] Serhat Hizlisoy, Serdar Yildirim, and Zekeriya Tufekci, "Music emotion recognition using convolutional long short term memory deep neural networks," Engineering Science and Technology, an International Journal, vol. 24, no. 3, pp. 760–767, 2021.
[25] Sakib Shahriar and Usman Tariq, "Classifying maqams of Qur'anic recitations using deep learning," IEEE Access, vol. 9, pp. 117271–117281, 2021.
[26] Y. Yang and X. Huang, "Research based on the application and exploration of artificial intelligence in the field of traditional music," Journal of Sensors, vol. 2022, 2022.
[27] Rongfeng Li and Qin Zhang, "Audio recognition of Chinese traditional instruments based on machine learning," Cognitive Computation and Systems, vol. 4, no. 2, pp. 108–115, 2022.
[28] Ke Xu, "Recognition and classification model of music genres and Chinese traditional musical instruments based on deep neural networks," Scientific Programming, vol. 2021, 2021.
[29] Juan Li, Jing Luo, Jianhang Ding, Xi Zhao, and Xinyu Yang, "Regional classification of Chinese folk songs based on CRF model," Multimedia Tools and Applications, vol. 78, no. 9, pp. 11563–11584, 2019.
[30] Qiao Chen, Wenfeng Zhao, Qin Wang, and Yawen Zhao, "The Sustainable Development of Intangible Cultural Heritage with AI: Cantonese Opera Singing Genre Classification Based on CoGCNet Model in China," Sustainability, vol. 14, no. 5, pp. 2923, 2022.
[31] Zhongkui Xu, "Construction of intelligent recognition and learning education platform of national music genre under deep learning," Frontiers in Psychology, vol. 13, 2022.
[32] Arian Skoki, Sandi Ljubic, Jonatan Lerga, and Ivan Štajduhar, "Automatic music transcription for traditional woodwind instruments sopele," Pattern Recognition Letters, vol. 128, pp. 340–347, 2019.
[33] Ephrem A Retta, Richard Sutcliffe, Eiad Almekhlafi, Yosef Enku, Eyob Alemu, Tigist Gemechu, Michael A Berwo, Mustafa Mhamed, and Jun Feng, "Kinit Classification in Ethiopian Chants, Azmaris and Modern Music: A New Dataset and CNN Benchmark," arXiv:2201.08448, 2022.
[34] Senem Tanberk and Dilek Bilgin Tükel, "Style-Specific Turkish Pop Music Composition with CNN and LSTM Network," in 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI). IEEE, 2021, pp. 000181–000185.
[35] Antonina Kolokolova, Mitchell Billard, Robert Bishop, Moustafa Elsisy, Zachary Northcott, Laura Graves, Vineel Nagisetty, and Heather Patey, "GANs & Reels: Creating Irish Music using a Generative Adversarial Network," arXiv preprint arXiv:2010.15772, 2020.
[36] Bob LT Sturm and Oded Ben-Tal, "Folk the algorithms: (Mis)applying artificial intelligence to folk music," in Handbook of Artificial Intelligence for Music, pp. 423–454. Springer, 2021.
[37] Ismail Hakki Parlak, Yalçin Çebi, Cihan Işikhan, and Derya Birant, "Deep learning for Turkish makam music composition," Turkish Journal of Electrical Engineering and Computer Sciences, vol. 29, no. 7, pp. 3107–3118, 2021.
[38] Xavier Serra, "A multicultural approach in music information research," in Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), Miami, Florida, USA. ISMIR, 2011.
[39] Xiaojing Liang, Zijin Li, Jingyu Liu, Wei Li, Jiaxing Zhu, and Baoqiang Han, "Constructing a multimedia Chinese musical instrument database," in Proceedings of the 6th Conference on Sound and Music Technology (CSMT). Springer, 2019, pp. 53–60.
[40] Xia Gong, Yuxiang Zhu, Haidi Zhu, and Haoran Wei, "Chmusic: a traditional Chinese music dataset for evaluation of instrument recognition," in 2021 4th International Conference on Big Data Technologies, 2021, pp. 184–189.
[41] Carlos Nascimento Silla Jr, Alessandro L Koerich, and Celso AA Kaestner, "The Latin Music Database," in ISMIR, 2008, pp. 451–456.
[42] "Royal Museum for Central-Africa," http://www.africamuseum.be/en, Accessed: 2022-30-09.
[43] Dimos Makris, Ioannis Karydis, and Spyros Sioutas, "The Greek music dataset," in Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), 2015, pp. 1–7.
[44] Charilaos Papaioannou, Ioannis Valiantzas, Theodoros Giannakopoulos, Maximos Kaliakatsos-Papakostas, and Alexandros Potamianos, "A Dataset for Greek Traditional and Folk Music: Lyra," arXiv preprint arXiv:2211.11479, 2022.
[45] "Thrace and Macedonia," http://epth.sfm.gr/, Accessed: 2022-30-09.
[46] "The session," https://thesession.org/, Accessed: 2022-30-09.
[47] M Kemal Karaosmanoğlu, "A Turkish makam music symbolic database for music information retrieval: SymbTr," in Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), 2012, pp. 223–228.
[48] S. M. H. Mousavi, V. S. Prasath, and S. M. H. Mousavi, "Persian classical music instrument recognition (PCMIR) using a novel Persian music database," in 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE, 2019, pp. 122–130.
[49] Lucas Maia, Magdalena Fuentes, Luiz Biscainho, Martín Rocamora, and Slim Essid, "SAMBASET: A dataset of historical samba de enredo recordings for computational music analysis," in The 20th International Society for Music Information Retrieval Conference, 2019.
[50] "RWC pop music dataset," https://staff.aist.go.jp/m.goto/RWC-MDB/, 2012, Accessed: 2022-30-09.
[51] Masataka Goto, Hiroki Hashiguchi, Takuichi Nishimura, and Ryuichi Oka, "RWC Music Database: Popular, Classical and Jazz Music Databases," in ISMIR, 2002, vol. 2, pp. 287–288.
[52] Paul Lamere, "Social tagging and music information retrieval," Journal of New Music Research, vol. 37, no. 2, pp. 101–114, 2008.
[53] Wikipedia, "List of musical instruments," wikipedia.org/wiki/List_of_musical_instruments, Accessed: 2022-30-09.
[54] Apichai Huaysrijan and Sunee Pongpinigpinyo, "Automatic Music Transcription for the Thai Xylophone played with Soft Mallets," in 2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE). IEEE, 2022, pp. 1–6.
[55] Jorge Calvo-Zaragoza, Alejandro H Toselli, and Enrique Vidal, "Handwritten music recognition for mensural notation with convolutional recurrent neural networks," Pattern Recognition Letters, vol. 128, pp. 115–121, 2019.
[56] G.-F. Chen, "Music sheet score recognition of Chinese Gong-che notation based on Deep Learning," in International Conference on Big Data Analysis and Computer Science (BDACS). IEEE, 2021, pp. 183–190.
[57] Daniel Schneider, Nikolaus Korfhage, Markus Mühling, Peter Lüttig, and Bernd Freisleben, "Automatic transcription of organ tablature music notation with deep neural networks," Transactions of the International Society for Music Information Retrieval, vol. 4, no. 1, 2021.
[58] VK Prathyushaa, PS Chandrasekar, and R Anuradha, "A Comparative Study of Image Classification Models for Western Notation to Carnatic Notation: Conversion of Western Music Notation to Carnatic Music Notation," in 2021 Fifth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). IEEE, 2021, pp. 1299–1303.
[59] Antonio Camarena-Ibarrola and Edgar Chávez, "Identifying music by performances using an entropy based audio-fingerprint," in Mexican International Conference on Artificial Intelligence (MICAI), 2006.
[60] Antonio C Ibarrola and Edgar Chávez, "A robust entropy-based audio-fingerprint," in 2006 IEEE International Conference on Multimedia and Expo. IEEE, 2006, pp. 1729–1732.
[61] I. Rallis, A. Voulodimos, N. Bakalos, E. Protopapadakis, N. Doulamis, and A. Doulamis, "Machine learning for intangible cultural heritage: a review of techniques on dance analysis," Visual Computing for Cultural Heritage, pp. 103–119, 2020.