Professional Documents
Culture Documents
Abstract—The rapid evolution of the 5G environments intro- and interfaces of the Next-Generation Radio Access Network
duces several benefits, such as faster data transfer speeds, lower (NG-RAN) and the 5G core itself are vulnerable to attacks,
latency and energy efficiency. However, this situation brings also which can potentially disrupt end-to-end communication ser-
critical cybersecurity issues, such as the complex and increased
attack surface, privacy concerns and the security of the 5G vices.
core network functions. Therefore, it is evident that the role While a particular emphasis has been given to the security
of intrusion detection mechanisms empowered with Artificial of NG-RAN, there are not many studies investigating the
Intelligence (AI) models is crucial. Therefore, in this paper, we security issues of the 5G core. In this paper, we focus on
introduce a labelled security dataset called 5GC PFCP Intrusion the security issues of the Packet Forwarding Control Protocol
Detection Dataset. This dataset includes a set of network flow
statistics that can be used by AI detection models to recognise (PFCP) protocol, which is utilised in the N4 interface between
cyberattacks against the Packet Forwarding Control Protocol the Session Management Function (SMF) and the User Plane
(PFCP). PFCP is used for the N4 interface between the Session Function (UPF) in the 5G core. In particular, based on the
Management Function (SMF) and the User Plane Function PFCP attacks investigated in our previous work in [3], in
(UPF) in the 5G core. In particular, four PFCP attacks are this paper, we introduce a labelled intrusion detection dataset,
investigated in this paper, including the relevant network traffic
data in terms of pcap files and the Transmission Control Protocol called 5GC PFCP Intrusion Detection Dataset, which was
(TCP)/Internet Protocol (IP) and application-layer statistics. This generated in the context of the H2020 SANCUS project,
dataset is already publicly available in IEEE Dataport and a collaborative research initiative funded by the European
Zenodo. Union (EU) to enhance the security of 5G networks. This
Index Terms—5G, Artificial Intelligence, Cybersecurity, Intru- security dataset can fully support the development of Artificial
sion Detection, PFCP
Intelligence (AI)-powered Intrusion Detection and Prevention
Systems (IDPS) against these attacks.
I. I NTRODUCTION
The proposed dataset is available in IEEE Dataport and
In today’s communication landscape, there is a growing Zenodo and can support the development of IDPS that adopt
demand for secure and reliable connections with high-speed, Machine Learning (ML) and Deep Learning (DL) methods.
high-throughput capabilities between the User Equipment We construct this dataset by following the methodological
(UE) and the Data Network (DN). The 5G core architecture, framework of A. Gharib et al. [4]. Therefore, our dataset is
which follows the 3rd Generation Partnership Project (3GPP) characterised by eleven main attributes: (a) Complete Network
network specifications, offers faster connectivity, lower la- Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d)
tency, higher bit rates, and improved network reliability. This Complete Interaction, (e) Complete Capture, (f) Available
technology is crucial to support critical applications such as Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature
the Internet of Things (IoT) and industrial use cases targeting Set and (j) Metadata. Therefore, the contributions of this paper
pivotal infrastructures [1], [2]. However, several components are summarised as follows:
∗ This project has received funding from the European Union’s Horizon • 5G Core Testbed and PFCP Attacks: A virtualised
2020 research and innovation programme under grant agreement 952672 5G environment was implemented in order to investigate
(SANCUS). PFCP attacks against the 5G core. Four PFCP-related at-
† G. Amponis, P. Radoglou-Grammatikis and G. Nakas are with
K3Y Ltd. Sofia 1612, Bulgaria - E-Mail: {gamponis, pradoglou, tacks are examined, targeting the communication between
gnakas}@k3y.bg SMF and UPF.
‡ G. Amponis and T. Lagkas are with the Department of Computer Science,
• 5GC PFCP Intrusion Detection Dataset: Based on the
International Hellenic University, Kavala Campus, 65404, Greece - E-Mail:
{geaboni, tlagkas}@cs.ihu.gr previous PFCP attacks, a new labelled security dataset is
§ P. Radoglou Grammatikis and P. Sarigiannidis are with the De- implemented and shared publicly in order to support the
partment of Electrical and Computer Engineering, University of West- development of AI solutions for intrusion detection. This
ern Macedonia, Kozani 50100, Greece - E-Mail: {pradoglou,
psarigiannidis}@uowm.gr dataset is available in IEEE Dataport1 and Zenodo 2 .
¶ Sotirios Goudos is with the Department of Physics, Aristotle
Based on the previous remarks, the rest of this paper is
University of Thessaloniki, Thessaloniki, 54124, Greece - E-Mail:
{sgoudo}@physics.auth.gr organised as follows. Section II discusses the 5G testbed and
∥ V. Argyriou is with the Department of Networks and Digital Media,
1 https://ieee-dataport.org/documents/5gc-pfcp-intrusion-detection-dataset-0
Kingston University London, Penrhyn Road, Kingston upon Thames, Surrey
KT1 2EE, UK - E-Mail: vasileios.argyriou@kingston.ac.uk 2 https://zenodo.org/record/7888347#.ZFejbNJBxhE
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
the relevant attacks used to compose the 5GC PFCP Intrusion DN is disconnected, and the UE remains connected to the 5G
Detection Dataset. Section III provides the structure of the RAN or the Core network. The attack is executed on the N4
dataset. Section IV summarises the features and the balanced interface and may affect the N6 interface.
files that can be utilised by ML and DL methods. Finally, PFCP Session Modification Flood attack (DUPL Apply
section V gives the concluding remarks of the paper. Action Field Flag): The aim of this attack is to utilise the
DUPL flag in the Apply Action field to compel the UPF
II. 5G T ESTBED AND PFCP ATTACKS to replicate rules for the session, generating multiple paths
As depicted in Fig. 1, an experimental 5G testbed was for the same data from a single source. This can result in
used to develop the 5GC PFCP Intrusion Detection Dataset, undefined behaviour in the N6 interface and/or cause traffic
including the following 5G network functions: Network Slice to be duplicated during transmission to the DN. Additionally,
Selection Function (NSSF), the Network Exposure Function this attack may be part of a larger scheme to carry out a
(NEF), the Network Repository Function (NRF), the Policy Distributed Denial of Service (DDoS) attack against hosts
Control Function (PCF), the User Data Management (UDM), within the DN, while also overwhelming the UPF’s resources
the Access and Mobility Management Function (AF), the to forward outgoing packets to external hosts outside the 5G
Authentication Server Function (AUSF), the Access Manage- core. By amplifying the number of transmitted packets per
ment Function (AMF), SMF and UPF. Moreover, a virtualised active user, a malicious entity can create an almost passive
UE, a virtualised gNodeB (gNb) and an attacker instance attack vector that can be effortlessly scaled to impact the traffic
impersonating a maliciously instantiated SMF were used to of numerous subscribers, exponentially draining the packet
generate the dataset. This testbed was implemented, utilising handling resources of the UPF.
Open5GS as the cellular core [5], and UERANSIM as NG- The attacks were performed in the following order: First, on
RAN. Thus, the following PFCP attacks were emulated in a Wednesday, October 5, 2022, the PFCP Session Establishment
coordinated manner in order to construct the dataset. DoS Attack was performed for four hours. Continuing, on
PFCP Session Establishment DoS Attack: The aim of this Thursday, October 13, 2022, the PFCP Session Deletion DoS
attack is to exhaust the resources of the UPF by inundating Attack was performed for four hours. Then, on Tuesday,
it with genuine Session Establishment Requests and Heartbeat November 01, 2022, the PFCP Session Modification DoS
Requests. This could potentially hinder the 5G core’s ability to Attack with the DROP flag was performed for four hours.
create new Protocol Data Unit (PDU) sessions between clients Finally, on Tuesday, November 22, 2022, the PFCP Session
and DN. The attack is executed on the N4 interface and could Modification DoS Attack with a DUPL flag in the Apply
affect the intermediate interfaces as well. To evade detection, Action Field Flag was performed for four hours.
a unique Session ID (SEID) is generated for every session
establishment request. III. DATASET S TRUCTURE
PFCP Session Deletion DoS Attack: The goal of this attack The previous PFCP attacks were carried out using se-
is to disconnect a specific UE from the DN. The script focuses curity tools, such as scapy,pcap-splitter, editcap
on PDU sessions between clients and DN in such a way that and CICFlowMeter. Each attack is provided by a 7z/zip
only the DN is disconnected, while the UE remains connected file that contains the network traffic and flow statistics for
to the NG-RAN or the Core network. The attack is executed each entity involved. Specifically, each 7z/zip file includes (a)
on the N4 interface, and its impact is noticeable on the N6 pcap files for each entity, (b) Transmission Control Proto-
interface. The only way to restore the connection of an affected col (TCP)/Internet Protocol (IP) network flow statistics in a
UE is to re-initiate the session, either by restarting the session Comma-Separated Values (CSV) format, and (c) PFCP flow
or entering the coverage area of another gNb. In such cases, statistics for each entity, utilising different timeout values (such
a new SEID is assigned to the UE’s PDU session, rendering as 15, 20, 60, 120, and 240 seconds). The TCP/IP network flow
the attack ineffective. statistics were generated using CICFlowMeter, while the
PFCP Session Modification Flood attack (DROP Apply PFCP flow statistics were produced using a Custom PFCP
Action Field Flags): The objective of this attack is to in- Flow Generator.
validate packet handling rules for a specific session, leading Based on the aforementioned remarks, the dataset consists
to the disassociation of a targeted UE from the DN. When of the following 7z/zip files:
successful, the Forwarding Action Rule (FAR) rules that • Balanced PFCP APP Layer.7z/zip: It includes the bal-
contain the base station’s Tunnel Endpoint Identifier (TEID) anced CSV files from CICFlowMeter that may be used
and IP address are removed from the UPF. This action cuts off to train ML and DL algorithms. Each folder includes
the General Packet Radio Service (GPRS) Tunneling Protocol a different subfolder for the corresponding flow timeout
(GTP) tunnel for the subscriber’s downlink data, depriving values used by the Custom PFCP Flow Generator.
them of internet connectivity. However, the GPRS Tunneling • Balanced TCP-IP Layer.7z/zip: It includes the balanced
Protocol - User Plane (GTP-U) tunnel can be restored by CSV files from the Custom PFCP Flow Generator that
transmitting the necessary data to the UPF. Similar to other may be used to train ML and DL algorithms. Each folder
PFCP-based attacks, this script focuses on the PDU sessions includes a different sub-folder for the corresponding flow
between the clients and the DN in such a way that only the timeout values used by CICFlowMeter.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
N1: UE - AMF
N2: gNB - AMF
N3: gNB - UPF
N4: SMF - UPF NSSF NEF NRF PCF UDM AF
N5: PCF - AF N22 N15 N10 N5
N6: UPF - DN
N7: SMF - PCF N13 N14 N8 N11 N7 N9
N8: AMF - UDM N12 N4 N6
N9: UPF - UPF
AUSF AMF SMF UPF DN
N10: SMF -UDM N1 N2 N3
N11: AMF - SMF 5G Core PDU Session
• PFCP Session Deletion DoS Attack.7z/zip: It includes each of the classes. The five classes and their labels are avail-
the pcap files and CSV files related to the PFCP Session able in Table III. The two balanced versions are summarised
Deletion Denial of Service (DoS) Attack. by the files Balanced_PFCP_APP_Layer.7z/zip and
• PFCP Session Establishment DoS Attack.7z/zip: It Balanced_TCP-IP_Layer.7z/zip. The first file con-
includes the pcap files and CSV files related to the PFCP tains the PFCP flow statistics, while the second file includes
Session Establishment Flood DoS Attack. the TCP/IP flow statistics. Each file also contains a set of sub-
• PFCP Session Modification DoS Attack.7z/zip: It in- folders for each flow timeout value. In particular, five values
cludes the pcap files and CSV files related to the PFCP were used: 15s, 20s, 60s, 120s, and 240s. In addition, there
Session Modification DoS Attack. are two sub-subfolders, namely Training and Testing.
Each of these sub-subfolders contains a .csv file named
IV. F EATURES AND BALANCED F ILES Training_X.csv and Testing_X.csv, where X is the
flow timeout value. The split ratio is: 70% - 30%, for the
training and testing processes, respectively. In addition, the
Table I splitting is stratified, meaning that the same percentage of
A PPLICATION L AYER PFCP F LOW S TATISTICS – F EATURES
samples of each class are present in each training and testing
Feature
flow ID
Description
Flow identifier
dataset. The number of flows per each flow timeout value for
source IP Source IP address Balanced_TCP-IP_Layer.7z/zip are presented in Ta-
destination IP Destination IP address
source port Source port number ble IV, while the number of flows for each flow timeout value
destination port Destination port number
protocol Network layer protocol for Balanced_PFCP_APP_Layer.7z/zip are provided
duration Length of time flow was active
fwd packets Number of forward packets in Table V.
bwd packets Number of backward packets
PFCPHeartbeatRequest counter Number of PFCP Heartbeat Request messages
PFCPHeartbeatResponse counter Number of PFCP Heartbeat Response messages V. C ONCLUSIONS
PFCPPFDManagementRequest counter Number of PFCP PFD Management Request messages
PFCPPFDManagementResponse counter
PFCPAssociationSetupRequest counter
Number of PFCP PFD Management Response messages
Number of PFCP Association Setup Request messages
The communication points in the 5G core can lead to
PFCPAssociationSetupResponse counter
PFCPAssociationUpdateRequest counter
Number of PFCP Association Setup Response messages
Number of PFCP Association Update Request messages
various security weaknesses that are investigated by both
PFCPAssociationUpdateResponse counter Number of PFCP Association Update Response messages academia and industry. In this paper, we present the 5GC
PFCPAssociationReleaseRequest counter Number of PFCP Association Release Request messages
PFCPAssociationReleaseResponse counter Number of PFCP Association Release Response messages PFCP Intrusion Detection Dataset, which was generated in the
PFCPVersionNotSupportedResponse counter Number of PFCP Version Not Supported Response messages
PFCPNodeReportRequest counter Number of PFCP Node Report Request messages context of the H2020 SANCUS project. This security dataset
PFCPNodeReportResponse counter Number of PFCP Node Report Response messages
PFCPSessionSetDeletionRequest counter Number of PFCP Session Set Deletion Request messages is publicly available in IEEE Dataport and Zenodo and can be
PFCPSessionSetDeletionResponse counter Number of PFCP Session Set Deletion Response messages
PFCPSessionEstablishmentRequest counter Number of PFCP Session Establishment Request messages utilised for the development of AI-powered intrusion detection
PFCPSessionEstablishmentResponse counter Number of PFCP Session Establishment Response messages
PFCPSessionModificationRequest counter Number of PFCP Session Modification Request messages and prevention mechanisms. It includes the network traffic data
PFCPSessionModificationResponse counter
PFCPSessionDeletionRequest counter
Number of PFCP Session Modification Response messages
Number of PFCP Session Deletion Request messages
(i.e., pcap files) and labelled TCP/IP and PFCP flow statistics
PFCPSessionDeletionResponse counter
PFCPSessionReportRequest counter
Number of PFCP Session Deletion Response messages
Number of PFCP Session Report Request messages
related to four PFCP cyberattacks, namely (a) PFCP Session
PFCPSessionReportResponse counter
Downlink counter
Number of PFCP Session Report Response messages
Number of downlink packets
Establishment DoS Attack, (b) PFCP Session Deletion DoS
Uplink counter Number of uplink packets Attack, (c) PFCP Session Modification Flood attack (DROP
Bidirectional traffic counter Number of bidirectional traffic packets
Label Flow label (e.g. benign or malicious) Apply Action Field Flags) and (d) PFCP Session Modification
Flood attack (DUPL Apply Action Field Flag).
Two balanced versions of the dataset have been created for
R EFERENCES
the TCP/IP flow statistics generated by CICFlowMeter (Ta-
ble II) and the PFCP flow statistics produced by the Custom [1] P. Radoglou-Grammatikis, P. Sarigiannidis, C. Dalamagkas, Y. Spyridis,
T. Lagkas, G. Efstathopoulos, A. Sesis, I. L. Pavon, R. T. Burgos,
PFCP Flow Generator (Table I). Each version is bal- R. Diaz, A. Sarigiannidis, D. Papamartzivanos, S. A. Menesidou,
anced. Therefore, they contain an equal number of samples for G. Ledakis, A. Pasias, T. Kotsiopoulos, A. Drosou, O. Mavropoulos,
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Ioannis Kafetzis∗ , Christos Volos∗ , Hector E. Nistazakis† , Sotirios Goudos‡ , Nikolaos G. Bardis§
∗: Laboratory of Nonlinear Systems - Circuits & Complexity, Physics Department, Aristotle University of Thessaloniki,
Thessaloniki, Greece. {kafetzis,volos}@physics.auth.gr
† : Section of Electronic Physics and Systems, Department of Physcics, National and Kapodistrian University of Athens,
Abstract—This work introduces a chaos based audio encryp- chaotic maps and DNA encoding are proposed. Synchroniza-
tion scheme, along with its implementation, which works in real tion of fractional order chaotic systems is used in [15] for the
time. Initially, a novel chaotic map is introduced, and using this same purpose. In [16] discrete chaotic systems are used to
map a Pseudo-Random Bit generator is defined. The encryption
scheme is defined based on the above and utilizes parallel create encryption schemes for mono and stereo audio signals.
computing to encrypt recorded audio ”on the fly”, rather than A chaos-based audio encryption system focusing on speech is
collecting the complete recording and encrypting afterwards, proposed in [17]. The above works assume that the complete
making the method suitable for real-time encryption. The security audio signal is available before encryption, which can be a
of the proposed method is evaluated via statistical tests. Finally, slowing factor for real-time encryption.
an Open-Source implementation of the proposed method using
Python is given. In this work, we propose an efficient, chaos based, au-
Index Terms—chaos, encryption, audio, secure communica- dio encryption method that works in real-time. Instead of
tions acquiring the complete audio signal and then encrypting it,
audio chaos-based encryption is performed on the fly and in
I. I NTRODUCTION parallel, which reduces the execution time. The security of
Wireless voice communication becomes increasingly fre- the proposed method is validated through statistical testing.
quent. This in turn results into increased requirements in Finally, the proposed method is implemented in Python for
guaranteeing secure communication, which is usually achieved encryption of mono audio signals, as a proof of concept.
with the use of encryption schemes. The traditional encryption The implementation is open-source and available at https:
algorithms, such as the Data Encryption System (DES) [1] //github.com/iokaf/Chaotic-Audio-Encryption.
and Advanced Encryption Standard (AES) [2] struggle with II. M ETHODS
encrypting signals, such as images or audio, due to their long
A. Design and Study of the Chaotic Map
computational time and power consumption [3]. It is thus
important that the encryption schemes are not only secure, The discrete map proposed in this work is
but fast and computationally efficient. Chaos-based encryption xk+1 = r (sin(πxk ) + rem (1000xk , 1)) (1)
schemes have become a constant point of research interest, as
they can fulfill all requirements. where rem is the remainder operation. This map draws in-
Chaotic systems are deterministic dynamical systems, both spiration from [18], where it is proved that using the remain-
continuous and discrete, that demonstrate extreme sensitivity der operation results into increased Lyapunov exponent. The
to initial conditions [4]. This implies, that a small alteration behavior of the map is studied using the standard graphical
of the initial condition results into a completely different methods, that is, the bifurcation [19] and Lyapunov exponent
trajectory. Hence, chaotic systems serve as computationally [20] and Cobweb plots, the data for which are calculated using
efficient sources of deterministic entropy. When used for the ”Chaos-Maps” Python Library [21].
encryption purposes, the chaotic system is utilized to generate B. Design and Validation of the PRBG
a Pseudo-Random Bit Generator (PRBG). The pseudo-random
bits generated this way are used to mask the original signal To obtain bits from each point x of one trajectory of the
[5], [6]. Using the above ideas, chaotic systems have been map (1), the value:
used as the basis for encryption methods for different signals, mod 1010 x, 1
(2)
such as image [7], [8], text [9], video [10] and audio [11],
[12]. This idea serves as the basis for most of the chaos-based is computed and subsequently compared with a threshold of
audio encryption methods [13]. 0.5. The corresponding bit is 1 when (2) is greater than the
Several chaotic audio encryption methods have been pro- threshold, and 0 otherwise. The randomness of the proposed
posed. In [3] and [14], audio encryption schemes based on method is validated through a set of statistical tests, defined
and implemented by the National Institute of Standards and
Technology (NIST) [22]. For the test to be executed, a total
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
of 100 ∗ 106 bits are generated using (2) and are used as input
to the statistical test suite, as suggested in the documentation.
C. Audio Encryption and Decryption
This work considers mono audio signals, that is audio
signals that consist of only one channel. Such signals can be
considered as one-dimensional time-series, with integer values
in the interval [−215 , 215 ] which are converted to their 16-bit
binary representation. In order to encrypt such a time-series,
the proposed PRBG (2) is used to generate 16 random bits
for each point of the original time-series. The encrypted value
is obtained by performing the Exclusive OR (XOR) operation
element-wise between the bits generated via the PRBG and
the binary representation of the original time-series. Fig. 2: Bifurcation diagram of the proposed map (1) for initial
For the decryption process, the same random bit sequence condition x0 = 0.4.
is obtained. Then the XOR operation is performed between
the encrypted signal and the random sequence. This operation
results into the original audio signal. proposed map is shown in Fig. 3. It can be observed that the
Lyapunov exponent is constantly positive, indicating that the
D. Real-Time Implementation proposed map (1) is chaotic.
To perform audio encryption in real time, the audio signal
is recorded in batches of a standard frequency of 4000Hz,
and duration, here chosen as 0.5s. This way, the time-series
recorded in each step has fixed length. In parallel to the
recording, the PRBG is used to produce the 16 bits to encrypt
each point of the recorded time-series. When both the audio
recording and bit generation are complete, XOR is performed
element-wise, thus obtaining the encrypted audio signal. This
process is repeated until the whole desired audio message is
recorded and encrypted. Regarding the initial conditions, for
the first recording the user selected the first initial condition
and parameter. For each subsequent encryption scheme, the
parameter remains the same, and the last point of the trajectory
is used as the initial value for the next encryption step. The
process is depicted in Fig. 1. Fig. 3: Lyapunov exponent of the proposed map (1) for initial
condition x0 = 0.4.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Fig. 4: Cobweb diagram for the map (1) with initial condition
x0 = 0.4 and r = 3.98.
(a) Original Audio (b) Encrypted Audio
TABLE I: NIST test results for the proposed PRBG. The initial Fig. 7: Histograms for the original and encrypted audio signals.
condition for the map (1) is x0 = 0.4 and r = 6.5.
Frequency 0.319084 100/100 Subsequently, our attention is turned to the key-space anal-
BlockFrequency 0.883171 100/100
CumulativeSums 0.419021 100/100
ysis. There are two keys values to be selected for the proposed
Runs 0.045675 99/100 encryption method, namely the parameter r and the initial
LongestRun 0.574903 97/100 condition x0 for the chaotic map (1). Assuming typical 16bit
Rank 0.304126 99/100 precision, then 102∗16 is an upper bound for the keyspace.
FFT 0.759756 99/100
NonOverlappingTemplate 0.474986 99/100 Considering that 102∗16 = (103 )10.66 ≈ (210 )10.66 > 2100
OverlappingTemplate 0.366918 99/100 which is usually considered as a threshold for an encryption
Universal 0.171867 97/100 method to be resistant against brute force attacks [23].
ApproximateEntropy 0.026948 100/100
RandomExcursions 0.484646 58/61 Additionally, the key sensitivity [13] of the proposed method
RandomExcursionsVariant 0.005833 61/61 is considered. The same original audio is encrypted with the
Serial 0.236810 100/100 same parameter value r = 5 and initial conditions x10 = 0.3
LinearComplexity 0.851383 99/100
and x20 = 0.3+10−10 . Subsequently, the correlation coefficient
between the two encrypted audio signals is calculated. A
correlation coefficient close to zero indicates key sensitivity.
For the above setup, the correlation coefficient was calculated
as 2.8 · 10−3 , indicating the key sensitivity of the proposed
method.
D. Implementation Details
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
R EFERENCES
[1] D. E. Standard et al., “Data encryption standard,” Federal Information
Processing Standards Publication, vol. 112, 1999.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Lazaros Moysis∗† , Marcin Lawnik‡ , Murilo S. Baptista§ , Sotirios Goudos¶ , Christos Volos∗
∗: Laboratory of Nonlinear Systems - Circuits & Complexity, Physics Department, Aristotle University of Thessaloniki,
Thessaloniki, Greece. {lmousis,volos}@physics.auth.gr
† : Department of Mechanical Engineering, University of Western Macedonia, Kozani, Greece.
‡ : Department of Mathematics Applications and Methods for Artifiial Intelligence, Faculty of Applied Mathematics,
Abstract—In numerous chaos based applications, it is often performance of a design, in the aforementioned applications.
important to use chaotic maps that showcase desirable statistical Motivated by this, the purpose of this work is to study the
characteristics, that can improve the performance of the designed effects of tuning the parameters of a piecewise map on its
architecture. Yet, it is hard to design maps that can achieve
the desired statistical properties. Motivated by this, this work statistical measures, like the mean value. Piecewise maps
considers a family of piecewise chaotic maps, and studies how [3]–[5], [15]–[21] are chosen for this study, as they are
tuning their parameters can affect the mean value of the very common in applications, have simple structure, and can
generated chaotic trajectories. Such a family of maps can be generate chaotic dynamics for wide parameter ranges. It is
highly practical in future chaos based applications. also easier to control the effect of parameter tuning on the
Index Terms—Chaos, piecewise map, mean value, encryption,
optimization shape of the map’s curve, which affects their behavior, as
will be shown in the next section. From the experimental
study performed on three maps, it is seen that by appropriate
I. I NTRODUCTION
parameter tuning, it is possible to control the mean value
The applications of chaos theory are well established in of the generated chaotic time series. This work can thus be
the fields of engineering, electronics, and communications a starting point towards developing a family of maps with
[1]. Examples include, pseudo random number/bit generators controllable statistical characteristics, for use in relevant chaos
[2]–[5], data security [6], secure communications [7], path based applications. We take an applied numerical perspective
planning [8]–[10], and optimization [11]–[14], among others. to this problem on the view of the non-linearity involved in 2
In such applications, a chaotic map is used as a source of our analysed maps, focusing on the structure of the mean
of unpredictability, and fed into the system architecture, to value appearing as a function of several tunable parameters.
improve its performance, with respect to a set of measures The rest of the work is structured as follows. In Section
relevant to the application at hand. II, the general form of the considered maps is presented.
For example, in data encryption and secure communications, In Section III three different maps are studied. Section IV
a chaotic system is used to mask an information signal concludes the work with a discussion on future goals.
through a series of operations like permutation, modulation,
and confusion, generating a masked signal that is statistically II. P ROPOSED FAMILY OF M APS
similar to random noise. Naturally, the chaotic source should
not reveal any statistical information to an adversary, to resist The maps considered in this work are piecewise maps of
unmasking attacks. In chaotic path planning, a chaotic map the form
feeds an autonomous agent with navigation commands, to (
unpredictably explore an area. Thus, the uniformity of the g1 (xi−1 ) 0 ≤ xi−1 ≤ b
xi = g(xi−1 ) = , (1)
generated trajectory will be heavily affected by the statistical g2 (xi−1 ) b < xi−1 ≤ 1
properties of the source map. In chaos optimization, chaotic
maps are used for the actions of selection, emigration, and that map on the interval xi ∈ [0, 1]. The maps have three
mutation, to improve the performance of evolutionary algo- control parameters a, b, c, and the functions g1 , g2 are designed
rithms. So maps with different statistical behavior will have so that they interpolate to the following three points g1 (0) = a,
adverse effects on the algorithm’s performance. g1 (b) = 1, g2 (1) = c. Tuning each one of the parameters has a
From the above examples, it can be understood that knowing different effect on the map’s behavior. These are the following.
how to control the behavior of a map, while maintaining 1) Parameter a ∈ [0, 1]. Increasing a changes the mapping
its chaotic behavior, will offer great ease in improving the of the first subfunction to g1 (xi−1 ) ∈ [a, 1]).
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:05 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:05 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Fig. 5. Diagram depicting the chaotic regions of the map of Eq. (3), with
f (x) = cos(ex−1 ), for different parameter values and c = 0. Black denotes
chaotic regions and white non-chaotic.
Fig. 3. Diagram depicting the changes in the mean value of the skewed tent
map (2) for different parameter values, and c = 0. The iteration step size is
0.0025.
Fig. 6. Diagram depicting the changes in the mean value of the map of Eq.
(3), with f (x) = cos(ex−1 ), for different parameter values and c = 0.
Fig. 4. The function g(x) of the map of Eq. (3), with f (x) = cos(ex−1 ),
for a = 0.2, b = 0.6, c = 0.3.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:05 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:05 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Sotirios K. Goudos
ELEDIA@AUTh, School of Physics
Aristotle University of Thessaloniki
Thessaloniki, Greece
sgoudo@physics.auth.gr
Abstract—Energy consumption is a key factor in environmental Manufacturers of refrigeration equipment are consequently
sustainability and the economy. In the next 30 years, global food placing more emphasis on strategies that will reduce the
production will need to increase rapidly and, in this context, amount of power required for effective functioning [2].
refrigeration will play a pivotal role. In the SmartFridge project,
several aspects of the operating cycle of refrigeration equipment Many researchers are working in the field of smart equip-
are explored. Energy consumption reduction is one of the most ment and smart devices by trying to improve the perfor-
important use cases of the project. For this reason, this work mance of the devices while decreasing energy consumption.
provides an assessment of three scenarios in a refrigerator model, Mathematical models [3], simulated models [4], [5], as well
in which the indoor temperature and moisture, as well as the as experimental ones [6], [7] have been proposed to predict
interior temperature, are explored to quantify their impact on
energy consumption. the energy consumption of refrigerators. In the SmartFridge
Index Terms—Smart refrigerator, energy consumption, indoor project, several approaches will be adapted to study the energy
temperature, indoor moisture, interior temperature consumption of refrigeration equipment in food retail stores
[8].
I. I NTRODUCTION During the day when the food retail stores are open, a
In the last few years, Europe is experiencing an energy considerable increase in power consumption is recorded in
crisis. The cost of electricity is continuously rising and, at the refrigeration systems due to the repeated door opening by the
same time, the amount of power consumed is also increasing. customers. As a result, the refrigeration systems of cooling
Companies in the food retail sector are ranked second in chambers are set to operate at lower temperatures to maintain
terms of the energy footprint in developed countries, behind the interior temperature within desired limits and insure the
industries. According to the Food and Agriculture Organi- quality of the stored food. However, during the night when
zation (FAO), global food production will need to increase the retail stores are closed, the systems could be adjusted to
by 70% to feed an additional 2.3 billion people by 2050 higher temperatures without any risk to the preservation of
and, in this context, refrigeration will play a pivotal role [1]. food quality.
The core idea in the SmartFridge project is the imple-
This research was carried out as part of the project ”Smart Fridge Using mentation of a new system that combines IoT technology,
IoT Technology” (Project code: KMP6-0078831) under the framework of wireless sensor networking (WSN), and embedded artificial
the Action ”Investment Plans of Innovation” of the Operational Program
”Central Macedonia 2014 2020”, that is co-funded by the European Regional intelligence (AI) to reduce the energy consumption of re-
Development Fund and Greece. frigeration equipment. In [2], the development of an IoT-
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
based device is proposed to reduce the power consumption • Compressor: The cooling process of the refrigerator
of a real system. The implementation of the overall system is begins in the compressor. The motor that powers the
ongoing. In this work, we focus on the evaluation of the energy compressor increases the pressure and temperature of the
consumption in computed scenarios using a refrigerator model. refrigerant gas before delivering it to the condenser.
Computed and experimental results will be evaluated and the • Condenser: The refrigerant liquefies in the condenser. The
overall performance of the proposed system will be assessed. hot vapors are collected by the condenser and cooled into
Furthermore, a machine-learning model will be developed to a liquid.
extract the desired period times where the reduce of energy • Evaporator: In the evaporator, the cooling process comes
consumption can take place based on the experimental data. to an end and the next cycle starts. It converts the
Finally, the extracted period times will also be evaluated remaining refrigerant liquid into vapor, which is used by
compated to the computed results by the proposed model in the compressor to restart the process.
this work. • Accumulator: This module keeps liquid refrigerant from
The remainder of this paper is structured as follows. In Sec- rushing back into the compressor in vapor compression
tion II, an overview of the simulation model, which is selected refrigeration systems.
to assess the performance of the refrigeration equipment is • Capillary tube: A thin set of copper tubes called the
presented. In Section III, the computed scenarios, as well as expansion valve drastically reduces the temperature and
their results, are analyzed and discussed. The conclusion of pressure of the liquid refrigerant, which results in around
the work is outlined in Section IV. half of it evaporating. The low temperature inside the
refrigerator is produced by the constant evaporation that
II. M ATERIALS AND M ETHODS takes place in the capillary tube.
• Compartment: The compartment is mainly the fridge
A. Simulation model chamber where the food is stored in. The door is also
The Simulink Residential Refrigerator Model is provided by an important part of the compartment.
Simulink, Mathworks [10]. It depicts a simple refrigeration Details of the utilized model can be found in [9].
system that transfers heat between the moist air mixture of
the environment and the two-phase fluid of the refrigerant. B. Model Parameterization
The R134a refrigerant is driven by the compressor through
a condenser, a capillary tube, and an evaporator. Pure vapor Three main parameters that mostly affect the energy con-
is returned to the compressor by an accumulator. Two fans sumption of the model are selected to be evaluated.
circulate moist air over the condenser and evaporator. The • Indoor temperature: A temperature range from 8°C to
evaporator airflow is split between the main compartment and 32°C is chosen, as those temperature values can be
the freezer. Finally, a controller prompts the operation of the found inside a typical food retail store in Greece (and
compressor to maintain the compartment air temperature at in continental climates in general) during the year. The
the setpoint temperature [9]. computed scenarios are taken place during the night when
In detail, the model consists of the following modules as the stores are closed and the heating/cooling system is
shown in Fig. 1: probably turned off.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
in the rate of the interior temperature in the model. We choose [3] S. Ledesma, J. M. Belman-Flores, E. Cabal-Yepez, A. Morales-Fuentes,
the reference point at 0.5°C and calculate the quantitative J. A. Alfaro-Ayala, and M.-A. Ibarra-Manzano, “Mathematical models
to predict and analyze the energy consumption of a domestic refrig-
variation from 1°C to 7°C. The results are presented for the erator for different position of the shelves,” IEEE Access, vol. 6, pp.
lowest and highest value of indoor temperature and for an 68882–68891, 2018, doi: 10.1109/ACCESS.2018.2880653.
average one (8°C, 32°C, and 22°C) as shown in Fig. 5. By [4] D. Villanueva, D. San-Facundo, E. Miguez-Garcı́a, and A. Fernández-
Otero, “Modeling and Simulation of Household Appliances Power
increasing the indoor temperature, the variation in the rate is Consumption,” Applied Sciences, vol. 12, no. 7, p. 3689, Apr. 2022,
significantly decreased. The results highlight the importance doi: 10.3390/app12073689.
of the valid configuration of the parameters in the presented [5] C.J. Hermes, C. Melo, F. T. Knabben, and J. M. Gonçalves, ”Prediction
of the energy consumption of household refrigerators and freezers
model, especially in scenarios with low indoor temperatures. via steady-state simulation.” Applied Energy, 86(7-8), 1311-1319, doi:
10.1016/j.apenergy.2008.10.008
[6] G. Cortella, P. D’Agaro, M. Franceschi, and O. Saro, “Prediction of
IV. C ONCLUSIONS the energy consumption of a supermarket refrigeration system,” in Proc.
23rd International Congress of Refrigeration, 2011.
In this work, an assessment of a refrigerator model by terms [7] H. Nasir, W. B. W. Aziz, F. Ali, K. Kadir, and S. Khan, “The
of the energy consumption has been presented. Parameters implementation of iot based smart refrigerator system,” in 2018 2nd
such as indoor temperature and moisture were explored to International Conference on Smart Sensors and Application (ICSSA),
2018, pp. 48–52.
find the effect on the energy consumption of the model. [8] K. Koritsoglou, M. Papadopoulou, A. Boursianis, A. Griva, S. Niko-
As the indoor temperature increases, consumption is also laidis, and S. Goudos, “Reducing power consumption in refrigeration
increased. Regarding the indoor moisture, a small effect on equipment using iot technology: The smartfridge project.” presented at
IEEE – 7th South-East Europe Design Automation, Computer Engi-
the energy consumption of the fridge was recorded during neering, Computer Networks and Social Media Conference, Ioannina,
the simulations. Finally, the impact of the selection of the Greece, Sept. 2022.
interior temperature is also studied and its notable effect is [9] “Residential refrigerator.” [Online]. Available:
https://www.mathworks.com/help/hydro/ug/refrigerator.html.
highlighted. As future work, the computed results will be [10] “Simulink Environment, The MathWorks, Inc.” [online]. Available:
compared with the experimental results of the SmartFridge https://www.mathworks.com/products/simulink.html.
project. In addition, a machine learning model will be designed [11] “European food safety authority.” [Online]. Available:
https://www.efsa.europa.eu/en.
to predict the energy consumption of refrigeration equipment [12] “Hellenic food authority.” [Online]. Available:
and its performance will be compared with the findings of this https://www.efet.gr/index.php/en/.
work. [13] “Hygiene declaration.” [Online (in Greek)]. Available:
https://www.efet.gr/files/F5242odhgap.pdf.
R EFERENCES
[1] D. Coulomb, J. L. Dupont, and A. Pichard, “The role of refrigeration in
the global economy.” in Proc. 29th Informatory Note on Refrigeration
Technologies, 2015, pp. 1-16.
[2] K. Koritsoglou, M. Papadopoulou, A. Boursianis, P. Sarigiannidis,
S. Nikolaidis, and S. Goudos, “Smart refrigeration equipment based
on iot technology for reducing power consumption,” 11th Interna-
tional Conference on Modern Circuits and Systems Technologies
(MOCAST), Bremen, Germany, 2022, pp. 1-4, doi: 10.1109/MO-
CAST54814.2022.9837760.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Abstract—In-vitro fertilization (IVF) is considered one of the of the embryo’s development. TLI enables controlled culture
most effective assisted reproductive approaches. However, it conditions, while the developmental events are annotated in a
involves a series of complex and expensive procedures, that result dynamic procedure. Such events, which are called morphoki-
in approximately 30% success rate. New technological concepts,
like deep learning (DL), are required to increase the success rate netic parameters, include cell divisions, blastocyst formation,
while simplifying the procedure. DL techniques can automatically and expansion [2].
extract valuable features from the available data, can be flexible,
and can work efficiently on multiple problems. Motivated by these Deep learning (DL) methods are part of machine learning
characteristics, in this work, a DL ensemble model is trained on methodologies and have been established as the most success-
a private dataset to classify blastocysts’ images. Further analysis
is conducted to provide insights for future research. ful set of approaches in the field of computer vision (CV) [3].
Index Terms—Deep learning, IVF, convolutional neural net- Convolutional neural networks (CNNs), after their success in
work, ensemble learning various image classification tasks [4], constitute the dominant
approach in CV and have been widely utilized in several CV
I. I NTRODUCTION tasks, including medical imaging [5]. In the last five years,
a growing research trend for DL-empowered IVF approaches
Thanks to in-vitro fertilization (IVF) technology, millions of
can be identified [6]. Motivated by the above, in this paper, an
kids have been born. However, only one-third of the couples,
ensemble DL model is trained on a private dataset to classify
who resort to IVF, manage to have a child, despite the long
blastocysts’ images.
and expensive procedures. To overcome several challenges,
such as the age, the quality of the embryo, and the limitations
This work is part of the ”Smart Embryo” project which
of the required technology [1], embryologists and researchers
aims to develop artificial intelligence (AI) based software
search for new tools and methods that will provide better
to assist embryologists to evaluate the blastocysts’ quality
results. Time-lapse imaging incubators (TLI) were introduced
before transfer. This research project aims to develop novel
as an IVF method more than ten years ago. Setting regular
AI methods for DL-powered IVF.
time intervals, TLI takes photographs, which form a video
This research was carried out as part of the project ”Classification and This work is structured in the following way: Section II
characterization of fetal images for assisted reproduction using artificial describes the problem, whereas Section III provides the details
intelligence and computer vision” (Project code: KP6-0079459) under the of our DL model. The experiments and the results of our
framework of the Action ”Investment Plans of Innovation” of the Operational
Program ”Central Macedonia 2014 2020”, that is co-funded by the European approach are discussed in Section IV. The conclusions are
Regional Development Fund and Greece. presented in Section V.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
III. DL METHODOLOGIES
To address the problem of blastocysts’ quality classification,
three different DL architectures are utilized to construct an
ensemble learner. All three models are based on the convolu-
(e) BC (f) CB tion filters (kernels) that slide along the images to provide the
respective feature maps.
A. DL models
AlexNet [8] is the first CNN that outperformed the classical
CV algorithms in the Imagenet large-scale visual recognition
challenge (ILSVRC) [9]. AlexNet utilizes five convolutional
layers, while down-sampled feature maps are created by the
(g) CC (h) DD three max-pooling layers. The flattening of the data and the
Fig. 1: Blastocyst images obtained from our private dataset. outcomes are the results of the three fully-connected layers.
The activation function in the convolutional layers is the
rectified linear unit (ReLU). ReLU is simply defined as
II. P ROBLEM DESCRIPTION g(z) = max(0, z) (1)
The different aspects of the blastocysts’ grading system are
Another popular CNN model is the VGG11 [10]. VGG11
presented in this Section. This morphological evaluation sys-
has eight convolution layers, five max-pooling layers, and three
tem has been developed to evaluate the quality of a blastocyst
fully-connected layers. As in the AlexNet architecture, the
before the transfer and can be exploited by DL models. The
activation function is taken to be the ReLU function. Both
number of cells inside the embryo begins to outgrow the space
AlexNet and VGG11 use dropout layers, which randomly set
inside the zona pellucida. Once this zone breaks, the blastocyst
input units to 0. In this way, DL models avoid over-fitting.
can hatch. Each blastocyst contains two types of cells; those
A variant of VGG is constructed for this particular problem.
that form the placental tissue, and those that form the fetus.
This model has seven convolutional layers, four max-pooling
Two key parameters that help embryologists decide which
layers, and three fully-connected layers. In contrast to AlexNet
blastocyst has more viability chances [7] are:
and VGG11, no dropout layer is utilized.
• Inner cell mass (ICM) score, (A) through (D)
• Trophectoderm (TE) score, (A) through (D) B. Ensemble learning
In Fig. 1, samples of blastocyst images are depicted. The Ensemble learning is frequently utilized to achieve better
quality scores are presented in Tables I and II. Since the results. In this work, VGG11, AlexNet, and the variant of VGG
blastocysts’ grading system is based only on morphological constructed an ensemble classifier. By combining the proba-
characteristics, their classification may be formulated into a bility outputs of each base model into two new probabilistic
CV problem. outcomes, the ensemble can predict with greater confidence the
In this work, the dataset is constructed in the following way. blastocyst’s class. The fact that two of the DL models contain
First, images, accompanied by their respective metadata, are dropout layers results in a large variance. To stabilize and
collected from a TLI. Then, the images are categorized into further improve the DL models’ performance, five instances
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
R EFERENCES
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:06 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Abstract—In regards to the field of trend forecasting in time process non-Independent and Identically Distributed (non-IID)
series, many popular Deep Learning (DL) methods such as data [1]. This data is highly volatile and usually leads to lower
Long Short Term Memory (LSTM) models have been the gold model quality. This work investigates the effect of volatile
standard for a long time. However, depending on the domain
and application, it has been shown that a new approach can be federated data on benchmark Deep Neural Architectures in
implemented and possibly be more beneficial, the Transformer order to specify their efficacy in the federated environment on
deep neural networks. Moreover, one can incorporate Federated everyday tasks.
Learning (FL) in order to further enhance the prospective utility Under the broader category of Deep Learning, Transformers
of the models, enabling multiple data providers to jointly train [2] are a particular type of Artificial Intelligence (AI) model
on a common model, while maintaining the privacy of their
data. In this paper, we use an experimental Federated Learning that excels in Natural Language Processing (NLP) [3] tasks in
System that employs both Transformer and LSTM models on particular. These models are unrelated to other popular types,
a variety of datasets. The sytem receives data from multiple such as Recurrent Neural Networks (RNNs) [4].
clients and uses federation to create an optimized global model. Federated Learning is a technique designed to overcome
The potential of Federated Learning in real-time forecasting is
various challenges associated with traditional Machine Learn-
explored by comparing the federated approach with conventional
local training. Furthermore, a comparison is made between the ing methods, such as the privacy dangers that come from the
performance of the Transformer and its equivalent LSTM in need to collect data from multiple sources [5].
order to determine which one is more effective in each given This paper introduces a Centralized Federated Learning
domain, which shows that the Transformer model can produce System (CFLS) that employs state-of-the-art DL techniques
better results, especially when optimised by the FL process.
Index Terms—Federated Learning, Deep Learning, LSTM,
to address the challenges posed by each provided use case
Transformers, Forecasting, Federated Dataset and also helps in deploying and evaluating the aforementioned
DNN architectures. For our purposes, we focus on Transformer
I. I NTRODUCTION models and how they compare to the popular Long Short Term
The development process of reliable AI models in the realm Memory (LSTM) [6] approach. In summary, this work offers
of Machine Learning (ML) often requires the gathering of data the following contributions:L
from multiple sources. This strategy of using varied data can • Creates an experimentation system to evaluate the per-
enhance the model’s predictive capabilities, as it is trained formance of Federated data on benchmark DNN archi-
on a broader range of patterns. Nevertheless, this approach tectures
has several drawbacks and risks, the primary concern being • Compares the performance of two architectures (LSTM,
privacy. All datasets from each source must be uploaded to a Transformer) on volatile (non-iid) federated data
single server, creating a strictly local ML technique that may The rest of this paper is organized as follows: Section II
intentionally or unintentionally expose sensitive information. features the related work. Section III, outlines the methodol-
Federated Learning is a technique aimed at addressing this ogy, while Section IV presents a series of quantitative results
difficulty, among others. Unfortunately, when talking about on the available data. The paper concludes with Section V.
federated data it is specified that usually, the system has to
∗ I. Siniosoglou and P. Sarigiannidis are with the Department of Electrical II. L ITERATURE R EVIEW
and Computer Engineering, University of Western Macedonia, Kozani, Greece Transformer models are widely used in the field, with no-
- E-Mail: {isiniosoglou, psarigiannidis}@uowm.gr
† K. Xouveroudis is with the R&D Department, MetaMind Innovations table examples including BERT [7], a method of pre-training
P.C., Kozani, Greece - E-Mail: kxouveroudis@metamind.gr Transformer models that has achieved top-level performance
‡ V. Argyriou is with the Department of Networks and Digital Media,
across various NLP tasks. In addition, [8] proposed RoBERTa,
Kingston University, Kingston upon Thames, United Kingdom - E-Mail:
vasileios.argyriou@kingston.ac.uk which made several optimizations to the BERT pre-training
§ T. Lagkas is with the Department of Computer Science, In- process, resulting in improved performance. Other noteworthy
ternational Hellenic University, Kavala Campus, Greece - E-Mail: publications consist of XLM-R [9], a variant of RoBERTa
tlagkas@cs.ihu.gr
¶ S. K. Goudos is with the Physics Department, Aritostle that is cross-lingual and has the ability to attain high-quality
University of Thessaloniki, Thessaloniki, Greece - E-Mail: representations for languages that it was not explicitly trained
sgoudo@physics.auth.gr on, as well as GPT-2 [10], a Transformer-based language
∥ K. E. Psannis is with the Department of Applied Informatics, School
of Information Sciences, University of Macedonia, Thessaloniki, Greece - model of significant scale that has demonstrated remarkable
E-Mail: kpsannis@uom.edu.gr performance overall.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
P E(pos,2i+1) = cos (pos/100002i/dmodel ) (2) embedding. Subsequently, the residual connection’s output
is normalized via layer normalization and is then projected
Where pos represents the position of the element within
through a pointwise feed-forward network to be further pro-
the sequence, i denotes the index of the dimension in the
cessed. This network, also seen in Fig. 4, consists of linear
embedding vector, and dmodel refers to the embedding’s di-
layers with a ReLu activation function sandwiched between
mensionality. The sin formula (Eq. 1) is used to build a vector
them. The output of this network is normalized and added to
for each even index and the cos formula (Eq. 2) for each odd
the layer normalization as well. The Decoder, like the Encoder,
index. Then, those vectors are incorporated into the parts of
has two multi-headed attention layers, a feed-forward layer,
the input that correspond to them.
residual connections, and layer normalization. It generates
We can see the Transformer model’s complete structure via
tokens sequentially using a start token and is auto-regressive.
the diagram in Fig. 1. The Encoder layer has two sub-modules:
The first attention layer calculates scores for the input using
multi-headed attention and a completely interconnected net-
positional embeddings. The final layer is a classifier, with the
work. There are also residual connections encircling each of
Encoder providing attention-related information.
the two sub-layers, and additionally, a normalization layer. To prevent the conditioning of future tokens in the first at-
Self-attention is applied to the encoder via the multi-headed tention layer, a masking process is implemented. This modifies
attention module. This feature allows each input component attention scores for future tokens using look-ahead masking,
to be connected to other components. In order to achieve self- which involves setting them to ”-inf”. Such is displayed in Eq.
attention, we need to feed the input into three distinct and 4, where a) the first matrix represents the scaled scores, b) the
fully connected layers, (i) the Query Q, (ii) the Key K and second matrix represents the look-ahead mask added to it and
(iii) the Values V . and c) the last matrix represents the new, masked scores.
The overall process can be seen in Fig. 2. Once the Q, K,
and V vectors have been passed through a linear layer each,
0.4 0.1 0.1 0.1 0.2
0 −inf −inf −inf −inf
0.4 −inf −inf −inf −inf
−inf −inf −inf −inf −inf −inf
we compute the attention score, which determines the degree 0.3
0.5
0.9
0.8
0.1
0.5
0.1
0.4
0.2
0.9
0
+
0
0
0 0 −inf
0.3
−inf =
0.5
0.9
0.8 0.5 −inf
−inf
(4)
−inf 0.2 −inf
of importance that one component, such as a word, should 0.2
0.2
0.6
0.1
0.7
0.3
0.9
0.9
0.7 0
0.5 0
0
0
0
0
0
0 0 0.2
0.6
0.1
0.7
0.3
0.9
0.9 0.5
assign to other components. In order to calculate the attention
When utilizing attention, the masking process is executed
score, the Q vector is dot-multiplied with all other K vectors
between the scaler and the softmax layer (Fig. 2). The new
within the arrangement of data.
scores are then normalized through softmax, making all future
To achieve greater gradient stability, the scores are scaled
token values zero, and eliminating their impact. The Decoder’s
down by dividing them with the square root of the dimensions
second attention layer aligns the input of the Encoder with that
of the query and key. Finally, the attention weights are derived
of the Decoder, emphasizing the most significant information
by applying the softmax function to the scaled score, resulting
between them. The output of this layer undergoes further
in probability values [0 − 1], with a higher value indicating a
processing before being fed into a linear and a softmax layer,
greater level of significance for the component, seen in Eq. 3.
and the highest score index is chosen as the prediction. The
QK T Decoder can be stacked on multiple layers, resulting in added
Attention(Q, K, V ) = sof tmax( √ )V (3)
dk diversity and enhancing predictive capabilities.
Additionally, multi-headed attention can be enabled by par- C. Federated Learning
alleling the attention mechanism, enabling multiple processes Federated Learning is a modern technique for securely
to utilize it in conjunction. This is achieved by splitting the training decentralized Machine Learning and Deep Learning
query, key and value into multiple vectors prior to imple- models. FL [16] manages the distribution, training, and ag-
menting self-attention. A self-attention process is referred to gregation of DL models across multiple devices at the edge
as a head. Every head generates an output vector, which is
concatenated into a singular vector before entering the final
linear layer. The goal is for each head to learn information at
least partially exclusive to it, further expanding the encoder’s
capabilities. We can see a representation of multi-head atten-
tion at Fig. 3.
A residual connection is created by adding the multi-
headed attention output vector to the initial positional input
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
B. Experimental Results
To validate the performance of the tested models, three
benchmark metrics were used, outlined below in Eq. 6, 7, and
8. Here, Ti represents the true values, Pi represents the model
predictions, and n denotes the total number of data points.
1) Mean Square Error (MSE): This metric calculates the
average of the squared differences between the predicted and
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
knowledge from the distributed data nodes. As can be seen in [7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training
the results, the Federated Transformer models are superior to of deep bidirectional transformers for language understanding,” arXiv
preprint arXiv:1810.04805, 2018.
both the local and Federated LSTM models. [8] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis,
A more pronounced distinction is observed in the stable L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert
sensor data. In the local training, the LSTM and Transformer pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[9] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek,
models exhibit similar performance. However, when subjected F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, “Unsu-
to FL, the Transformer displays substantially lower loss while pervised cross-lingual representation learning at scale,” arXiv preprint
it produces more consistent results across its nodes overall for arXiv:1911.02116, 2019.
[10] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al.,
both of its models and unlike the LSTM, the Transformer’s “Language models are unsupervised multitask learners,” OpenAI blog,
FL models outperform their local counterparts. vol. 1, no. 8, p. 9, 2019.
Lastly, in the forecasting of Daily Product Sales data, we [11] J. G. De Gooijer and R. J. Hyndman, “25 years of time series forecast-
ing,” International journal of forecasting, vol. 22, no. 3, pp. 443–473,
observe another scenario where the performance of the two 2006.
models is comparable. Yet, in this instance, the Transformer [12] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang,
model also outperforms the LSTM. This pattern holds for both “Informer: Beyond efficient transformer for long sequence time-series
forecasting,” in Proceedings of the AAAI conference on artificial intel-
local and Federated Learning of the models. ligence, vol. 35, no. 12, 2021, pp. 11 106–11 115.
[13] N. Wu, B. Green, X. Ben, and S. O’Banion, “Deep transformer mod-
V. C ONCLUSIONS els for time series forecasting: The influenza prevalence case,” arXiv
preprint arXiv:2001.08317, 2020.
Federated Learning is a valuable attribute that aids in safe- [14] B. Tang and D. S. Matteson, “Probabilistic transformer for time series
guarding data confidentiality, as well as offering model quality analysis,” Advances in Neural Information Processing Systems, vol. 34,
pp. 23 592–23 608, 2021.
enhancement and quicker processing times for large datasets [15] J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and
and real-time predictions. In this paper, we compare the D. Bacon, “Federated learning: Strategies for improving communication
performance of two benchmark DNN architectures, namely, efficiency,” arXiv preprint arXiv:1610.05492, 2016.
[16] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas,
a) LSTMs and b) Transformers, for timeseries forecasting “Communication-efficient learning of deep networks from decentralized
in real-world federated datasets containing non-iid distribu- data,” in Artificial intelligence and statistics. PMLR, 2017, pp. 1273–
tions. We further employed FL in conjunction to evaluate 1282.
[17] A. Taı̈k and S. Cherkaoui, “Electrical load forecasting using edge com-
the performance of the local models against the federated puting and federated learning,” in ICC 2020-2020 IEEE International
global model that is trained on knowledge from distributed Conference on Communications (ICC). IEEE, 2020, pp. 1–6.
data nodes. The experimental results showed that the perfor- [18] T. Yu, G. Hua, H. Wang, J. Yang, and J. Hu, “Federated-lstm based net-
work intrusion detection method for intelligent connected vehicles,” in
mance of both the local and Federated Transformer models ICC 2022-IEEE International Conference on Communications. IEEE,
was comparable to LSTM models. The results indicate that 2022, pp. 4324–4329.
consistently Transformers have the potential to compete with [19] Y. Liu, S. Garg, J. Nie, Y. Zhang, Z. Xiong, J. Kang, and M. S.
Hossain, “Deep anomaly detection for time-series data in industrial
and even surpass LSTMs in terms of performance in predicting iot: A communication-efficient on-device federated learning approach,”
on federated data intricately affected by external factors, not IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6348–6358, 2020.
always included in the data features. [20] H. Li, Z. Cai, J. Wang, J. Tang, W. Ding, C.-T. Lin, and Y. Shi, “Fedtp:
Federated learning by transformer personalization,” arXiv preprint
arXiv:2211.01572, 2022.
ACKNOWLEDGEMENT [21] A. Raza, K. P. Tran, L. Koehl, and S. Li, “Anofed: Adaptive anomaly de-
This project has received funding from the European tection for digital health using transformer-based federated learning and
support vector data description,” Engineering Applications of Artificial
Union’s Horizon 2020 research and innovation programme Intelligence, vol. 121, p. 106051, 2023.
under grant agreement No. 957406 (TERMINET). [22] J. Huang, Y.-F. Li, and M. Xie, “An empirical analysis of data
preprocessing for machine learning-based software cost estimation,”
R EFERENCES Information and software Technology, vol. 67, pp. 108–127, 2015.
[23] W. McKinney et al., “pandas: a foundational python library for data
[1] M. Tiwari, I. Maity, and S. Misra, “Fedserv: Federated task service analysis and statistics,” Python for high performance and scientific
in fog-enabled internet of vehicles,” IEEE Transactions on Intelligent computing, vol. 14, no. 9, pp. 1–9, 2011.
Transportation Systems, vol. 23, no. 11, pp. 20 943–20 952, 2022. [24] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by
[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, jointly learning to align and translate,” arXiv preprint arXiv:1409.0473,
Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in 2014.
neural information processing systems, vol. 30, 2017. [25] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, and H. Yu, “Federated
[3] P. M. Nadkarni, L. Ohno-Machado, and W. W. Chapman, “Natural learning,” Synthesis Lectures on Artificial Intelligence and Machine
language processing: an introduction,” Journal of the American Medical Learning, vol. 13, no. 3, pp. 1–207, 2019.
Informatics Association, vol. 18, no. 5, pp. 544–551, 2011. [26] C. Chaschatzis, A. Lytos, S. Bibi, T. Lagkas, C. Petaloti, S. Goudos,
[4] S. Grossberg, “Recurrent neural networks,” Scholarpedia, vol. 8, no. 2, I. Moscholios, and P. Sarigiannidis, “Integration of information and
p. 1888, 2013. communication technologies in agriculture for farm management and
[5] I. Siniosoglou, V. Argyriou, S. Bibi, T. Lagkas, and P. Sarigiannidis, knowledge exchange,” in 2022 11th International Conference on Modern
“Unsupervised ethical equity evaluation of adversarial federated Circuits and Systems Technologies (MOCAST), 2022, pp. 1–4.
networks,” in Proceedings of the 16th International Conference on [27] L. Perez and J. Wang, “The effectiveness of data augmentation in
Availability, Reliability and Security, ser. ARES 21. New York, NY, image classification using deep learning,” 2017. [Online]. Available:
USA: Association for Computing Machinery, 2021. [Online]. Available: https://arxiv.org/abs/1712.04621
https://doi.org/10.1145/3465481.3470478
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Abstract—Deep learning techniques have found numerous classification problems lies in their ability to apply filters for
applications in the sector of cultural and religious heritage. Greek feature extraction, by exploiting images’ grid topology.
Orthodox Church hymns are examples of Byzantine religious
tradition. Their performance is governed by stringent rules not
observed in other cultures. This work trains a novel shallow Greek Orthodox Church hymns are part of the Orthodox
convolutional neural network to identify Greek Orthodox Church Christian tradition and their vocal performance is not accom-
hymns using a private dataset. For ten of the most frequently panied by any musical instrument. The ML-based recognition
chanted hymns, the deep learning model achieves over 98% and classification of the above-mentioned hymns is a challeng-
accuracy. AlexNet and MobileNetV2 are trained on the same data ing problem. This is due to the fact that they are vocal music
to evaluate their performance. Additional analysis is conducted
to validate the obtained results. whose performance depends on the chanting voice. In addition,
Index Terms—Byzantine hymns, deep learning, convolutional byzantine hymns’ climaxes differ from the Western ones.
neural networks, Mel-spectrograms This work collects a corpus of 10 Greek Orthodox Church
hymns, having more than 200 samples each. The audio data is
I. I NTRODUCTION transformed into images in the Mel scale (Mel-spectrograms),
and then a novel shallow CNN is trained to classify them
The cultural and religious heritage sectors require the study
correctly. To the best of the authors’ knowledge, this is the
of a diverse and large amount of data. Recently, machine learn-
first time that DL techniques are applied to the task of Greek
ing (ML) approaches have been considered complementary
Orthodox Church hymns recognition.
ones to traditional techniques [1]. In addition, deep learning
(DL) based computer vision (CV) algorithms are employed in
This paper is part of the "Ymnodos" project, which aims
cases where useful information needs to be extracted from
to develop software suitable for mobile devices to identify
the available data. More specifically, convolutional neural
Greek Orthodox Church hymns. The software will be based
networks (CNNs) have been trained to perform paintings’
on artificial intelligence and signal-processing techniques. The
style identification [2], the music features extraction [3], and
ultimate goal is to further promote Byzantine cultural heritage
optical music classification [4]. The success of CNNs in image
for educational and religious purposes.
This research was carried out as part of the project «Recognition and
direct characterization of cultural items for the education and promotion of The rest of this treatise is organized as follows: Section II
Byzantine Music using artificial intelligence» (Project code: KMP6-0078938) provides the architecture of the designed model. The exper-
under the framework of the Action «Investment Plans of Innovation» of the
Operational Program «Central Macedonia 2014 2020», that is co-funded by iments and the results form Section III. Section IV presents
the European Regional Development Fund and Greece. the conclusions of this work.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:04 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Abstract—In the last few years, novel meta-learning techniques how to learn. In contrast to conventional approaches, meta-
have established a new field of research. Great emphasis is given learning intends to enhance the learning algorithm itself, taking
to few-shot learning approaches, where the model is trained by into account the knowledge gained from previous learning
using only a few training examples. In this work, a review of
several model-agnostic meta-learning methodologies (MAML) is episodes [3]. One famous meta-learning method is the model-
presented. Firstly, we identify and discuss the typical characteris- agnostic meta-learning (MAML) algorithm. MAML is char-
tics of the first proposed MAML algorithm. Next, we classify the acterized by its simplicity and the fact that can be applied
model-agnostic approaches into three main categories: Regular to any problem whose solution is approximated with gradient
gradient descent MAML, Hessian-free MAML, and Bayesian descent.
MALM, by presenting their advantages and limitations. Finally,
we conclude this work with further discussion and highlight the The TERMINET project aims at providing a novel next-
future research directions. generation reference architecture based on cutting-edge tech-
Index Terms—Model-agnostic meta-learning, meta-learning, nologies. Within this project, both the multi-task and meta-
deep learning, few-shot learning, one-shot learning learning perspectives will be supported to enable personalized
or device-specific modeling. The Meta-Learning technique
I. I NTRODUCTION will be used to optimize the performance of each device
based on the given results of few-shot adaptation examples
Humans have the great ability to learn quickly. We can
on heterogeneous tasks. Moreover, MAML algorithms will
recognize objects from a few examples, we can learn new tasks
be considered and adopted as guiding approaches that offer
based on prior experience and a small amount of information,
improved personalized models for a large majority of the
thus we are expecting our deep neural networks to perform
Internet of Things (IoT) end devices.
the same. However, large amounts of data are required from
such models to perform effectively. This is the reason why Motivated by the above, the current contribution provides a
we are constantly investigating approaches such as few-shot state-of-the-art review of the existing research from the area
learning techniques to decrease the amount of data and at the of MAML methodologies. In detail, the contribution of this
same time improve our models’ performance. paper is summarized as follows:
Few-shot learning is a challenging approach to training a 1) Three state-of-the-art MAML techniques, namely regu-
learning model by using only a few examples. For multiple lar gradient descent MAML, hessian-free MAML, and
tasks, a common feature representation can be learned using bayesian MAML are identified and thoroughly studied.
prior experience [1]. Recently, a meta-learning approach has 2) The principles, advantages, and limitations of each al-
been used to address the issue of few-shot learning. Meta- gorithm are highlighted.
learning studies how intelligent systems can increase their The rest of this paper is organized as follows: Section II in-
performance through experience [2]. In other words, we troduces the first MAML algorithm that was proposed in 2017.
can describe meta-learning as the mechanism for learning Section III presents the most popular MAML techniques and
classifies them according to their characteristics. Section IV
This work has received funding from the European Union’s Horizon
2020 research and innovation programme under grant agreement No. 957406 explores the future directions, whereas Section V concludes
(TERMINET). this work and summarizes its findings.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:08 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
II. M ODEL -AGNOSTIC M ETA -L EARNING ALGORITHM • Hessian-free MAML algorithms, and
• Bayesian MAML algorithms.
MAML is an algorithm for meta-learning that was proposed
in 2017 by Chelsea Finn et al. [3]. It is called model agnostic
A. Regular gradient descent MAML algorithms
because it was designed to perform with any model trained
with gradient descent and also to be applicable to a wide range The main characteristic of the methods that can be found
of learning problems mainly in the deep learning framework. in the first technique is that higher-order derivatives are used
The core idea is to train the initial parameters of the model in the backpropagation of the error procedure. The algorithms
to maximize its performance on a new task after updating are proposed to optimize the main MAML algorithm in terms
those parameters, Fig. 1. This can be achieved through one of computational efficiency and stability. Moreover, they can
or more gradient steps using a small amount of data from the be extended to problems with previously unseen classes.
new task. The proposed algorithm neither increases the number • Implicit Model-Agnostic Meta-Learning (iMAML) was
of learned parameters nor imposes restrictions on the model’s proposed by Aravind Rajeswaran et al. [5] in 2019. This
architecture. Furthermore, it can be used with a variety of loss algorithm was developed in order to achieve significant
functions, and the idea is to maximize the sensitivity of those gains in computing and memory efficiency. iMAML gives
loss functions of new tasks with respect to the parameters. the opportunity to select the suitable inner optimization
Training the parameters of the model such that a few or method. In this algorithm, there is no need of differ-
even a single gradient step can lead to better results on a new entiating through the optimization path. The algorithm’s
task can be generalized as building an internal representation goal is to learn a set of parameters and lead to good
that is broadly applicable to a wide range of tasks. Finn generalization for various tasks based only on the solution
et al. [3] experimentally evaluated the algorithm in few- to the inner-level optimization. At the same time an
shot classification, supervised regression, and reinforcement iMAML subvariant, i.e. the Hessian-free subvariant was
learning (RL) problems. However, the MAML algorithm has a also proposed achieving slightly better results in the few-
number of issues that can limit the generalization performance shot classification problem.
of the model, reduce the flexibility of the framework, and • Adaptive Model-Agnostic Meta-Learning (Alpha-
lead to instabilities during training. Moreover, MAML can be MAML) was proposed by Harkirat Singh Behl et al. [6]
computationally expensive during training and inference and in 2019 as an extension to MAML. Alpha-MAML
requires the model to go through a costly hyperparameter tun- increases the algorithm’s resistance to hyperparameter
ing. In order to face those weaknesses, an improved subvariant choices and improves stability. This algorithm was
of the MAML framework called MAML++ was proposed by evaluated on a few-shot image classification problem
Antreas Antoniou et al. in 2019 [4] and was experimentally and the results showed that is more robust than MAML
evaluated in few-shot learning tasks achieving better results. due to the less time that is required for tuning.
III. P OPULAR MAML ALGORITHM SUBVARIANTS • Out-of-distribution (OOD-MAML) is another extension
of the MAML algorithm, introduced by Taewon Jeong
During the last few years, several different subvariants of et al. [7] in 2020. Fake or out-of-distribution samples
the classical MAML algorithm can be found in the literature. that look like in-distribution samples are used to train the
In this work, some popular techniques are briefly presented. model in order to detect those OOD samples in new tasks.
To maintain consistency, the algorithms are classified into the This method achieves learning both the initialization of
following three categories: a model and the initial OOD-samples over the tasks. The
• Regular gradient descent MAML algorithms, experimental evaluation in classification tasks shows that
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:08 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
the proposed algorithm achieves higher accuracy than variational inference method with gradient-based meta-
MAML and, at the same time, detects OOD samples. learning. This variant is effective, precise, and robust.
At the same time, it is simple to implement despite the
B. Hessian-free MAML algorithms fact that training such big networks might be costly. This
Another interesting approach is the Hessian-free MAML algorithm was experimentally evaluated in classification,
methods that ignore second-order derivatives when backprop- regression, and RL problems recording good results.
agating in the meta-learning procedure to save a huge amount • Probabilistic Model-Agnostic Meta-Learning (PLATI-
in the network computation and at the same time maintain the PUS) was introduced by Chelsea Finn et al. [12] in 2018.
performance of the model. It is a probabilistic algorithm that can sample models for a
new task from a model distribution. PLATIPUS also uses
• First Order Model-Agnostic Meta-Learning a variational lower bound to train a parameter distribution.
(FOMAML) [3], [8] is a subvariant of the first proposed Moreover, it is simple and provides comparable results
MAML that ignores second derivatives. In that way, to MAML in experimental evaluation. However, it is less
it is simpler to implement and reduces computational efficient in assessing uncertainty because it provides a
complexity. However, it might lose gradient information. nearly poor estimation of posterior variance.
It is proven that FOMAML works as well as the MAML • Amortized Bayesian Meta-Learning (ABLM) proposed
algorithm in few-shot classification problems without by Ravi and Beatson [13] in 2018 is a technique that
affecting its performance. successfully employs hierarchical variational inference to
• Reptile is a method introduced by Alex Nichol et al. [8] in
learn a prior distribution. ABLM also provides a high-
2018. Reptile is similar to FOMAML and quite simple. It quality approximation posterior. Compared to MAML
is characterized by its ability to learn an initialization for and PLATIPUS, the experimental evaluation shows that
the model’s parameters. After the optimization of these this model achieves better computation in the uncertainty
parameters, the model can be generalized from only a few estimations.
examples. By evaluating the results of the algorithms in
few-shot classification tasks, we obtain that they perform
TABLE I
as well as the main MAML algorithm. E XPERIMENTAL E VALUATION FIELDS
• Hessian-Free (HF-MAML) was proposed by Alireza Fal-
lah et al. [9] in 2020 to achieve low computational Reinforcement
Algorithm Classification Regression
Learning
complexity and, at the same time, to preserve all theoret- MAML x x x
ical guarantees of the basic algorithm without evaluating MAML++ x - -
any Hessian. The authors provide a theoretical analysis iMAML x - -
of HF-MAML and highlight its improvement regarding ALPHA-MAML x - -
OOD-MAML x - -
the convergence properties compared to MAML and FOMAML x - -
FOMAML algorithms. Reptile x - -
HF-MAML - - -
C. Bayesian MAML algorithms LLAMA x x -
BMAML x x x
The Bayesian hierarchical model is also used in meta- PLATIPUS x x -
learning techniques to incorporate uncertainty into the model ABLM x - -
estimation. Based on this, there are several subvariants of the
basic MAML algorithm that improve its performance metrics
and weaknesses.
IV. D ISCUSSION AND F UTURE D IRECTIONS
• Laplace Approximation for Meta-Adaption (LLAMA)
is an extension to MAML based on the hierarchical All the above algorithms are demonstrated on different
Bayesian model and the Laplace approximation that model types and are experimentally evaluated in several dis-
provides a more accurate estimate of the integral. It tinct domains, i.e., classification problems, few-shot regres-
was proposed by Erin Grant et al. [10] in 2018. The sions, and RL scenarios. Table I indicates that the presented
experimental evaluation of the one-shot classification algorithms are mostly designed for classification problems.
problem shows a small improvement in accuracy. How- Some of them are also tested in regression problems and only
ever, skewed distributions do not respond well to this the first MAML algorithm and its Bayesian format (BMAML)
approximation. Furthermore, the extension has limitations are used in RL tasks.
since it uses a point estimate to approximate the predic- In contrast to the prior algorithms, other MAML subvariants
tive distribution over additional data points. To overcome have also been proposed focused only on reinforcement, deep
these limitations, the BMAML was proposed. reinforcement, and meta-reinforcement learning. The Taming
• Bayesian MAML (BMAML) was proposed in 2018 by MAML [14] using the automatic differentiation framework,
Taesup Kim et al [11], in order to learn complex uncer- and the ES-MAML [15], which is based on Evolution Strate-
tainty structures and combine an effective nonparametric gies, are two typical examples of this category.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:08 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Fig. 2. MAML algorithms in timescale. Different colors are used for each category (Grey for the first MAML algorithm, blue for regular gradient descent
MAML algorithms, green for the Hessian-free MAML algorithms, and orange for Bayesian MAML algorithms).
Another observation is that most of the works regarding the [4] A. Antoniou, H. Edwards, and A. Storkey, “How to train your MAML,”
MAML algorithm family were proposed between 2017 and in International Conference on Learning Representations, New Orleans,
LA, USA, May 2019, doi: 10.48550/arXiv.1810.09502.
2020 as shown in Fig. 2. At this time, the research interest [5] A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning
has moved to new RL methods, which are a widely developing with implicit gradients,” Advances in neural information processing
field in the deep-learning field. However, the MAML family systems, vol. 32, Vancouver, Canada, 2019.
[6] H. S. Behl, A. G. Baydin, and P. H. Torr, “Alpha maml: Adaptive model-
consists of simple and widely used algorithms and there is a agnostic meta-learning,” in 6th ICML Workshop on Automated Machine
change that in the next years the researchers will continue to Learning, California, USA, 2019, doi: 10.48550/arXiv.1905.07435.
evolve this field. [7] T. Jeong and H. Kim, “Ood-maml: Meta-learning for few-shot out-of-
distribution detection and classification,” Advances in Neural Informa-
tion Processing Systems, vol. 33, pp. 3907–3916, Dec. 2020.
V. C ONCLUSIONS [8] A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning
algorithms,” 2018, arXiv:1803.02999.
In this work, a brief review of the most popular MAML [9] A. Fallah, A. Mokhtari, and A. Ozdaglar, “On the convergence the-
techniques is presented. Various algorithms are proposed as ory of gradient-based model-agnostic meta-learning algorithms,” in
an extension to the first MAML algorithm to overcome the International Conference on Artificial Intelligence and Statistics, pp.
1082–1092, 2020.
weaknesses of the main proposal. The main characteristics [10] E. Grant, C. Finn, S. Levine, T. Darrell, and T. Griffiths, “Recasting
that researchers try to decode are training stability, memory ef- gradient-based meta-learning as hierarchical bayes,” in International
ficiency, flexibility, generalization, computational complexity, Conference on Learning Representations, Vancouver, Canada, 2018, doi:
10.48550/arXiv.1801.08930.
and robustness. Most of these algorithms are experimentally [11] J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn, “Bayesian
evaluated in few-shot classification problems and achieved model-agnostic meta-learning,” in Advances in Neural Information Pro-
promising results. Due to the simplicity and adaptability of cessing Systems, vol. 31, 2018, doi: 10.48550/arXiv.1806.03836.
[12] C. Finn, K. Xu, and S. Levine, “Probabilistic model-agnostic meta-
the MAML algorithm family, they established a field with learning,” Advances in Neural Information Processing Systems, vol. 31,
further potential development. In future work, we will extend 2018, doi: 10.48550/arXiv.1806.02817.
our research to RL techniques. We are planning to make an [13] S. Ravi and A. Beatson, “Amortized bayesian meta-learning,” in Inter-
national Conference on Learning Representations, New Orleans, LA,
extensive review as well as an experimental evaluation of the USA, May, 2018.
algorithms that are used in this field. [14] H. Liu, R. Socher, and C. Xiong, “Taming maml: Efficient unbiased
meta-reinforcement learning,” in International conference on machine
R EFERENCES learning, Long Beach, CA, USA, June, 2019.
[15] X. Song, W. Gao, Y. Yang, K. Choromanski, A. Pacchiano, and
[1] T. Hospedales, A. Antoniou, P. Micaelli and A. Storkey, ”Meta-Learning Y. Tang, “Es-maml: Simple hessian-free meta learning,” in In-
in Neural Networks: A Survey,” in IEEE Transactions on Pattern ternational Conference on Learning Representations, Apr. 2020,
Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5149-5169, 1 doi:10.48550/arXiv.1910.01215.
Sept. 2022, doi: 10.1109/TPAMI.2021.3079209.
[2] R. Vilalta, C. Giraud-Carrier, and P. Brazdil, ”Meta-Learning - Concepts
and Techniques”, in Data Mining and Knowledge Discovery Handbook,
Springer, Boston, MA, pp. 717–731., 2010, doi: 10.1007/978-0-387-
09823-4-36.
[3] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for
fast adaptation of deep networks,” in Proc. 34th International Conference
on Machine Learning - Volume 70, Sydney, Australia, Aug. 2017, pp.
1126–1135, doi: 10.48550/arXiv.1703.03400.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:08 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
Abstract—Numerous applications for music listeners, educa- adjacent to it. Examples include music education, which
tors, DJs, and musicians have been created over the past decade encompasses the teaching of music theory, instrument play,
as the field of Music Deep Learning has expanded. Evidently, as well as music history. Another example is cultural tourism,
the majority of works rely on training their architectures using
well-established databases, which consist primarily of Western which refers to traveling in order to visit a cultural event or
popular music genres. This creates a deficiency in applications place, like a music concert or a religious event. Music therapy
focusing on traditional and regional music, which are vital to is another relevant application of MDL, which refers to the use
preserving culture. As an increasing number of research groups of music to treat symptoms related to conditions like stress,
begin to develop works centered on traditional music, several anxiety, depression, and more [10].
practical challenges for applying deep learning arise. This work
discusses these issues and the future of deep learning in the study An essential part of building any MDL architecture is
of traditional music. training, validating, and testing the model, using an available
Index Terms—Deep Learning, Music Information Retrieval, database. Certain Western-oriented music genres, such as pop,
Music Generation, Traditional Music, Regional Music jazz, hip-hop, electronic, rock, metal, blues, and classical,
are widely recognized. Thus, there is an abundance of free
I. I NTRODUCTION databases that contain substantial amounts of music recordings
from the aforementioned genres. Examples include the ISMIR
Deep Learning (DL) has so far expanded to numerous fields
dataset [11], the Million Song Dataset [12], the GTZAN
of the natural sciences [1], such as engineering, computer
dataset [13], MagnaTagATune [14], J.S. Bach Chorales [15],
vision, physics, and medicine, with a plethora of commercial
and more.
applications. Among these, DL for music processing has
The availability of large music recording databases is ex-
gained considerable attention in the last decade or so.
tremely useful in building efficient architectures. Unfortu-
Music Deep Learning (MDL) encompasses a collection of
nately, though, the focus on Western music leads to a big issue,
music-related applications, like music recommendation [2],
that of under-represented music genres, especially traditional
classification [3], music generation [4], instrument recognition
music. Traditional, or regional music, generally refers to music
[5], singing analysis [6], emotion classification [7], transcrip-
that is specific to a given country or region and is a big part of
tion [8] and more [9]. Naturally, considering that the art of
its history and cultural identity. Traditional music genres are
music plays such a big part in society and human interaction,
abundant throughout the world [16], with the roots of many
the above problems find several commercial applications in
genres tracing back hundreds of years. As music listening
fields related to the music industry, as well as disciplines
becomes more and more digitized and is moved to online plat-
This research was carried out as part of the project ’Recognition and forms like Youtube and Spotify, traditional music genres are
direct characterization of cultural items for the education and promotion of also becoming part of the digital online music community. As
Byzantine Music using artificial intelligence (Project code: KMP6-0078938) such, they are also subject to music information retrieval tasks
under the framework of the Action ’Investment Plans of Innovation’ of the
Operational Program ’Central Macedonia 2014 2020’, that is co-funded by like song/artist recommendation, automatic playlist generation,
the European Regional Development Fund and Greece. emotion recognition, transcription, and more.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:07 UTC from IEEE Xplore. Restrictions apply.
979-8-3503-2107-4/23/$31.00 ©2023 IEEE
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
So, studying traditional music under the scope of MDL, music, in [20] for Indian classical music, in [21] for Korean
termed Traditional Music Deep Learning (TMDL), will sup- traditional music, in [22], [48] for Persian, in [25], [37] for
port cultural preservation through the use of state-of-the- Maqams, in [26]–[29], [31] for Chinese music, in [30] for
art tools for the study and digitization of music. Moreover, Cantonese Opera, in [32] for Croatian, in [33] for Ethiopian,
building ML architectures focusing on traditional music will and in [35] for Irish. For many of the above, the data collection
also help introduce traditional music to a new generation of included gathering data either from personal collections, from
listeners, by promoting digital education and cultural tourism. online sources, or obtaining new samples and annotation help
As more research groups recognize the impact potential from music students and experts.
of TMDL, they are developing DL architectures for Music Note that, complementary to music tracks, it is essential
Information Retrieval (MIR) [17]–[33] and Music Generation that music datasets be synchronized with lyric datasets, for
(MG) [34]–[37], specifically for such genres. For these tasks traditional music that includes a singing part. Lyric availability
though, several challenges arise that must be addressed to is important for tasks like singing voice detection or emotion
properly apply DL techniques. These issues have to do with recognition. Also, lyrics are important in traditional music,
the peculiarities of each music genre, and in many cases, the especially religious music. This is because singing is the most
lack of appropriate data for feature tagging and training [38]. important part of a song. For example, in Byzantine hymns,
Motivated by this, the goal of this work is to outline and the music rhythm comes from the chanter alone, without
discuss several of these issues, as a guide for future research any accompanying instrument. For this type of ’a Capella
groups that aim to tackle the problem of TMDL. By addressing music, suitable architectures lie at the intersection of music
these problems, the field of MDL will be closer to making and speech processing. Hence, MIR tasks need to be able to
unified architectures that can be easily adopted and work well detect a singing voice and have the lyrics available.
for all types of music. The work also aims to gather and
highlight recent literature works that study MDL topics for B. Tagging
specialized music genres.
As mentioned, the existence of available datasets for tradi-
The rest of the work is structured as follows: In Section
tional music is already an issue. It is important to emphasize
II, the challenges of applying DL to traditional music are dis-
here that a dataset does not simply involve the collection of
cussed. Each issue is addressed in a separate subsection. The
music files. In order for a dataset to find its highest usability
paper concludes with Section III, which includes a discussion
in DL, it must be organized and enriched with the appropriate
of future research goals.
tags. Thus, building upon the issue of lacking datasets, is the
II. C HALLENGES IN T RADITIONAL M USIC DL issue of organizing the datasets, to make them suitable for use
[27].
Here, the challenges arising in TMDL are discussed.
Music tags may refer to a subgenre, the type of instrument
A. Available Datasets involved, the presence of singing, the singer’s gender, the
emotion associated with the track, an era, and indirect musical
As mentioned in the introduction section, there are several
information like the musician’s popularity, record sales, and
public databases containing music tracks of popular music
more [52]. To fill the dataset with annotations, the best
genres, like pop, jazz, hip-hop, electronic, rock, metal, blues,
way is to consult culturally-aware experts in the field, like
and classical [11]–[15]. So, any MDL architecture has access
historians and musicians, but also music listeners. Many of
to many, usually, thousands, of music tracks to train, test,
the aforementioned works’ research groups are working on
and validate. Unfortunately, this is not the case with many
enriching their datasets with appropriate tags. Note that some
traditional music genres. Traditional music datasets are scarce,
tags, like the gender of a singer, may be clear, while others,
and those that are available typically contain far fewer samples.
like the underlying emotion of a score, or a specific subgenre,
So, making public, copyright-free datasets is the first and most
may not be. Of course, working on unlabeled data is possible
important thing that needs to be done to help TMDL develop.
in many DL tasks, but generally speaking, a label-enriched
Administrative authorities, like Ministries of Culture, can
dataset can find a wider usage.
greatly help in this direction by working alongside research
groups to provide available recordings, and also settle any
C. Recording Difficulties
copyright issues. In cases where available recordings (for
example, recordings by past famous artists) are not enough For most music genres, audio recordings take place in
to equip a dataset with the necessary samples for training, recording studios, equipped with consoles and appropriate
the datasets can be enriched by gathering recordings from hardware to capture the sound, and also perform the mixing
local artists and music schools. Many groups and authorities and editing. There can be exceptions though, as for some
are working towards this [39]–[51]. Here, data augmentation genres this may be hard to achieve. One such example is
techniques can also be applied to complement the newly religious music, where it may not always be possible to record
created datasets. songs in a studio. For example, when it comes to Orthodox
As recent examples, TMDL has been considered in [23], psalms, which are sometimes recorded in remote monasteries,
[24], [34] for Turkish music, in [19] for Hindustani classical the recording can only be done in place, because there is no
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:07 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
nearby studio, and the chanters may not be willing to perform and a given artist. For example, the song ”Whole Lotta Love”
outside the monastery grounds. is attributed to the band Led Zeppelin, and is associated with
So, when gathering tracks for the dataset, many record- their studio album Led Zeppelin II. Although other studio and
ings may take place in different environments with varying live versions may be available, the original album track can be
acoustics. This can result in recordings with different sound considered the main reference to this song. Yet, in traditional
characteristics, even for the same song. Thus, to compensate music, it is less common to associate a song with a given
for these technicalities, the recordings must be managed by artist or a specific performance. As such songs are much older,
sound engineers. It is worth noting of course that essential they have been covered by many musicians over the years, so
fieldwork has also been performed by dedicated ethnographers multiple performances can be associated with each song. This
and ethnomusicologists. Moreover, certain preprocessing of is even more prominent in religious music, where there can
the dataset may be required, to retrieve musical information be numerous recordings of a musical piece, like a hymn, a
and normalize all the features among the recordings. These chant, or a prayer. Such renditions can vary heavily from one
topics generally fall into the discipline of Audio Engineering. another, as the recitation of a chant is heavily dependent on
Note that for Symbolic music, that is, music saved in a each performer flavoring the track with their own rhythm and
notational format, like music sheets or MIDI files, this is not emotions.
an issue, as it is not accompanied by an audio signal. For these music categories, developing architectures for the
D. Local Instruments task of song identification will be especially challenging, due
to the variations in the musical characteristics of each song’s
Most of the widely popular instruments are being played (or chant’s) performance. For music tracks, earlier works have
by musicians from all genres. For example, instruments like showcased that entropy can be used to design fingerprints for
the guitar, trumpet, and drums are used extensively in many matching different performances of the same musical piece
music genres, from blues and pop to metal. Thus, there is an [59], [60]. Entropy is a measure of a signal’s complexity
abundance of recordings of these instruments that cover most and predictability, which is often considered in physics and
genres. So, it is possible that architectures trained to perform signal processing. Thus, entropy-based features, computed for
an application like instrument classification on one dataset can the time and frequency domains, could play a key role in
be equally efficient on another genre dataset as well. achieving robustness in identifying renditions of the same song
This would be much more difficult to achieve for musical or religious chant.
instruments that are much less common. There are plenty of
regional instruments in each country that are used specifically G. Public Outreach
in local music, making it unlikely that they are ever used The public dissemination of scientific results through jour-
for performing other genres. Examples include the Greek nal publications and conference participation is of course a
Baglamas, the American Banjo, the Eastern Oud, the Chinese common expectation for all scientific disciplines. These plat-
Erhu, and many more [53]. So for TMDL, special datasets forms usually ensure that any newly derived scientific results
must be constructed for regional instruments. These can first will reach all the interested groups that can be influenced
be used separately for tasks like classification and MG, and by them. This may not entirely be true for new results in
possibly at a later stage, they can be combined with larger TMDL. Here, apart from the scientists working in the field
databases and tested for the same tasks. Recent examples who are staying up-to-date on new scientific publications,
include the Thai Xylophone [54], the woodwind instrument there is a large group of non-academic parties that may be
Sopele [32], a collection of Chinese instruments [27], and left unaware of the opportunities and benefits offered by DL.
Persian instruments [48]. The scientific groups should thus be entrusted with the extra
E. Musical Symbols task of making sure that the developments in the studies of
Nowadays, music transcriptions follow a set of modernized traditional music through deep learning reach the communities
notation symbols, which for the most part are unified for all devoted to this music. This can be achieved by reaching out
genres. So, DL architectures for tasks like music transcription to local authorities, music schools, musicians, and media,
and symbolic MG can work well for many different types and presenting the results through public talks, as well as
of music. Considering TMDL though, there are many excep- non-academic platforms, like television talks, newspapers, and
tions, as some regional and historic genres have notational social media.
differences. Thus, the above architectures must be adapted Another approach that could prove highly effective is to
to the specific datasets for each genre. This, of course, also improve the ease of use of MDL, and especially music
falls under the issue of dataset creation, as discussed above. generation, for musicians and educators. This can be achieved
Examples include Mensural notation [55], Chinese Gong-che by building architectures that are accompanied by front-end
notation [56], Organ tablature [57], and Carnatic notation [58]. interactivity, for music generation tasks. Having a Graphical
User Interface (GUI) that would allow any user to interact
F. Matching Performances and experiment with the architecture, without requiring any
In MIR tasks, like song identification, an essential premise background knowledge of the model structure, would offer
is that a given song is associated with a specific performance immense help to professionals. Moreover, such GUIs can be
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:07 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
of great service in educational activities, for the introduction [6] Chitralekha Gupta, Haizhou Li, and Masataka Goto, “Deep learning
of younger generations to traditional music. approaches in topics of singing information processing,” IEEE/ACM
Transactions on Audio, Speech, and Language Processing, vol. 30, pp.
2422–2451, 2022.
H. Accessibility of the Research Results [7] X. Jia, “Music emotion classification method based on deep learning
and improved attention mechanism,” Computational Intelligence and
A final remark can be made regarding the commercialization Neuroscience, 2022.
potential of TMDL studies. As with all works on MIR, the [8] Carlos Hernandez-Olivan, Ignacio Zay Pinilla, Carlos Hernandez-Lopez,
commercial prospects are vast. On the other side, TMDL and Jose R Beltran, “A comparison of deep learning methods for timbre
analysis in polyphonic automatic music transcription,” Electronics, vol.
studies offer a service to the cultural preservation of each 10, no. 7, pp. 810, 2021.
region’s identity. Thus, an argument can be made that the [9] Lazaros A. Iliadis, Sotirios P Sotiroudis, Kostas Kokkinidis, Panagiotis
deliverables of such studies should be made available to any- Sarigiannidis, Spiridon Nikolaidis, and Sotirios K Goudos, “Music deep
one, due to their cultural significance. This could be achieved learning: A survey on deep learning methods for music processing,” in
2022 11th International Conference on Modern Circuits and Systems
if TMDL studies are supported by public funding, with the Technologies (MOCAST). IEEE, 2022, pp. 1–4.
deliverables (datasets and architectures) being made available [10] Yang Heping and Wang Bin, “Online music-assisted rehabilitation
as open access. This point of course comes in accordance with system for depressed people based on deep learning,” Progress in Neuro-
Psychopharmacology and Biological Psychiatry, p. 110607, 2022.
the trending worldwide discussion on the public availability of [11] “Ismir genre dataset,” https://ismir2004.ismir.net/, 2004, Access: 2022-
research outputs, which is relevant to other research fields as 30-09.
well. [12] Thierry Bertin-Mahieux, Daniel Ellis, Brian Whitman, and Paul Lamere,
“The million song dataset.,” in Proceedings of the 12th International
Society for Music Information Retrieval Conference (ISMIR 2011), 01
III. C ONCLUSIONS 2011, pp. 591–596.
[13] George Tzanetakis and Perry Cook, “Musical genre classification of
The difficulties of applying deep learning techniques to audio signals,” IEEE Transactions on speech and audio processing, vol.
traditional music are discussed in this paper. Considering the 10, no. 5, pp. 293–302, 2002.
aforementioned issues, one conclusion that can be drawn is [14] “The MagnaTagATune Dataset,” http://mirg.city.ac.uk/codeapps/the-
that, given sufficient resources, they can all be successfully magnatagatune-dataset, 2013, Accessed: 2022-30-09.
[15] “J.S. Bach Chorales Dataset,” https://github.com/czhuang/JSB-Chorales-
addressed. Consequently, it is crucial to raise awareness among dataset, Accessed: 2022-30-09.
the Ministries of Culture of each country, to provide the [16] Wikipedia, “List of cultural and regional genres of music,”
financial resources for research. Collaboration with local com- en.wikipedia.org/wiki/List of cultural and regional genres of music,
Accessed: 2022-30-09.
munities, cultural associations, and conservation clubs must
[17] Yandre MG Costa, Luiz S Oliveira, and Carlos N Silla Jr, “An
also be extended, in order to aid in database construction and evaluation of convolutional neural networks for music classification
knowledge digitization. It is also essential that the members using spectrograms,” Applied soft computing, vol. 52, pp. 28–38, 2017.
of the research groups examining TMDL come from a variety [18] Akhilesh Kumar Sharma, Gaurav Aggarwal, Sachit Bhardwaj, Pra-
sun Chakrabarti, Tulika Chakrabarti, Jemal H Abawajy, Siddhartha
of disciplines, including computer engineers, physicists, sound Bhattacharyya, Richa Mishra, Anirban Das, and Hairulnizam Mahdin,
engineers, historians, ethnographers, music composers, music “Classification of Indian classical music with time-series matching deep
performers, and even musical instrument manufacturers. With- learning approach,” IEEE Access, vol. 9, pp. 102041–102052, 2021.
[19] Vishnu S Pendyala, Nupur Yadav, Chetan Kulkarni, and Lokesh Vad-
out a doubt, future research in the field will resolve the issues lamudi, “Towards building a Deep Learning based Automated Indian
listed above. Classical Music Tutor for the Masses,” Systems and Soft Computing,
A natural expansion from TMDL would be to apply DL vol. 4, pp. 200042, 2022.
techniques to the field of traditional dance [61]. Many of [20] Sayan Nag, Medha Basu, Shankha Sanyal, Archi Banerjee, and Dipak
Ghosh, “On the application of deep learning and multifractal techniques
the aforementioned issues are shared here as well, like the to classify emotions and instruments using Indian Classical Music,”
existence of suitable datasets. Addressing traditional music Physica A: Statistical Mechanics and its Applications, vol. 597, pp.
through DL will certainly complement the study of traditional 127261, 2022.
[21] JongSeol Lee, MyeongChun Lee, Dalwon Jang, and Kyoungro Yoon,
music and aid in cultural preservation, education, and global “Korean traditional music genre classification using sample and midi
representation. phrases,” KSII Transactions on Internet and Information Systems (TIIS),
vol. 12, no. 4, pp. 1869–1886, 2018.
R EFERENCES [22] Nacer Farajzadeh, Nima Sadeghzadeh, and Mahdi Hashemzadeh,
“PMG-Net: Persian Music Genre classification using deep neural Net-
[1] Ajay Shrestha and Ausif Mahmood, “Review of deep learning algo- works,” Entertainment Computing, p. 100518, 2022.
rithms and architectures,” IEEE Access, vol. 7, pp. 53040–53065, 2019. [23] Merve Ayyüce Kızrak and Bülent Bolat, “A musical information
[2] Markus Schedl, “Deep learning in music recommendation systems,” retrieval system for Classical Turkish Music makams,” Simulation, vol.
Frontiers in Applied Mathematics and Statistics, p. 44, 2019. 93, no. 9, pp. 749–757, 2017.
[3] Ndiatenda Ndou, Ritesh Ajoodha, and Ashwini Jadhav, “Music genre [24] Serhat Hizlisoy, Serdar Yildirim, and Zekeriya Tufekci, “Music emotion
classification: A review of deep-learning and traditional machine- recognition using convolutional long short term memory deep neural
learning approaches,” in 2021 IEEE International IOT, Electronics and networks,” Engineering Science and Technology, an International
Mechatronics Conference (IEMTRONICS). IEEE, 2021, pp. 1–6. Journal, vol. 24, no. 3, pp. 760–767, 2021.
[4] Miguel Civit, Javier Civit-Masot, Francisco Cuadrado, and Maria J [25] Sakib Shahriar and Usman Tariq, “Classifying maqams of Qur¢anic
Escalona, “A systematic review of artificial intelligence-based music recitations using deep learning,” IEEE Access, vol. 9, pp. 117271–
generation: Scope, applications, and future trends,” Expert Systems with 117281, 2021.
Applications, p. 118190, 2022. [26] Y. Yang and X. Huang, “Research based on the application and
[5] Maciej Blaszke and Bożena Kostek, “Musical instrument identification exploration of artificial intelligence in the field of traditional music,”
using deep learning approach,” Sensors, vol. 22, no. 8, pp. 3033, 2022. Journal of Sensors, vol. 2022, 2022.
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:07 UTC from IEEE Xplore. Restrictions apply.
2023 12th International Conference on Modern Circuits and Systems Technologies (MOCAST)
[27] Rongfeng Li and Qin Zhang, “Audio recognition of Chinese traditional recordings for computational music analysis,” in The 20th International
instruments based on machine learning,” Cognitive Computation and Society for Music Information Retrieval Conference, 2019.
Systems, vol. 4, no. 2, pp. 108–115, 2022. [50] “RWC pop music dataset,” https://staff.aist.go.jp/m.goto/RWC-MDB/,
[28] Ke Xu, “Recognition and classification model of music genres and 2012, Accessed: 2022-30-09.
Chinese traditional musical instruments based on deep neural networks,” [51] Masataka Goto, Hiroki Hashiguchi, Takuichi Nishimura, and Ryuichi
Scientific Programming, vol. 2021, 2021. Oka, “RWC Music Database: Popular, Classical and Jazz Music
[29] Juan Li, Jing Luo, Jianhang Ding, Xi Zhao, and Xinyu Yang, “Regional Databases.,” in Ismir, 2002, vol. 2, pp. 287–288.
classification of Chinese folk songs based on CRF model,” Multimedia [52] Paul Lamere, “Social tagging and music information retrieval,” Journal
tools and applications, vol. 78, no. 9, pp. 11563–11584, 2019. of new music research, vol. 37, no. 2, pp. 101–114, 2008.
[30] Qiao Chen, Wenfeng Zhao, Qin Wang, and Yawen Zhao, “The Sustain- [53] Wikipedia, “List of musical instruments,”
able Development of Intangible Cultural Heritage with AI: Cantonese wikipedia.org/wiki/List of musical instruments, Access: 2022-30-
Opera Singing Genre Classification Based on CoGCNet Model in 09.
China,” Sustainability, vol. 14, no. 5, pp. 2923, 2022. [54] Apichai Huaysrijan and Sunee Pongpinigpinyo, “Automatic Music
[31] Zhongkui Xu, “Construction of intelligent recognition and learning edu- Transcription for the Thai Xylophone played with Soft Mallets,” in 2022
cation platform of national music genre under deep learning,” Frontiers 19th International Joint Conference on Computer Science and Software
in Psychology, vol. 13, 2022. Engineering (JCSSE). IEEE, 2022, pp. 1–6.
[32] Arian Skoki, Sandi Ljubic, Jonatan Lerga, and Ivan Štajduhar, “Auto- [55] Jorge Calvo-Zaragoza, Alejandro H Toselli, and Enrique Vidal, “Hand-
matic music transcription for traditional woodwind instruments sopele,” written music recognition for mensural notation with convolutional
Pattern Recognition Letters, vol. 128, pp. 340–347, 2019. recurrent neural networks,” Pattern Recognition Letters, vol. 128, pp.
[33] Ephrem A Retta, Richard Sutcliffe, Eiad Almekhlafi, Yosef Enku, Eyob 115–121, 2019.
Alemu, Tigist Gemechu, Michael A Berwo, Mustafa Mhamed, and Jun [56] G.-F. Chen, “Music sheet score recognition of Chinese Gong-che
Feng, “Kinit Classification in Ethiopian Chants, Azmaris and Modern notation based on Deep Learning,” in International Conference on Big
Music: A New Dataset and CNN Benchmark,” arXiv:2201.08448, 2022. Data Analysis and Computer Science (BDACS). IEEE, 2021, pp. 183–
[34] Senem Tanberk and Dilek Bilgin Tükel, “Style-Specific Turkish Pop 190.
Music Composition with CNN and LSTM Network,” in 2021 IEEE [57] Daniel Schneider, Nikolaus Korfhage, Markus Mühling, Peter Lüttig,
19th World Symposium on Applied Machine Intelligence and Informatics and Bernd Freisleben, “Automatic transcription of organ tablature music
(SAMI). IEEE, 2021, pp. 000181–000185. notation with deep neural networks,” Transactions of the International
[35] Antonina Kolokolova, Mitchell Billard, Robert Bishop, Moustafa Elsisy, Society for Music Information Retrieval, vol. 4, no. 1, 2021.
Zachary Northcott, Laura Graves, Vineel Nagisetty, and Heather Patey, [58] VK Prathyushaa, PS Chandrasekar, and R Anuradha, “A Comparative
“GANs & Reels: Creating Irish Music using a Generative Adversarial Study of Image Classification Models for Western Notation to Carnatic
Network,” arXiv preprint arXiv:2010.15772, 2020. Notation: Conversion of Western Music Notation to Carnatic Music
[36] Bob LT Sturm and Oded Ben-Tal, “Folk the algorithms:(Mis) Applying Notation,” in 2021 Fifth International Conference on I-SMAC (IoT in
artificial intelligence to folk music,” in Handbook of Artificial Intelli- Social, Mobile, Analytics and Cloud)(I-SMAC). IEEE, 2021, pp. 1299–
gence for Music, pp. 423–454. Springer, 2021. 1303.
[37] Ismail Hakki Parlak, Yalçin Çebi, Cihan Işikhan, and Derya Birant, [59] Antonio Camarena-Ibarrola and Edgar Chávez, “Identifying music by
“Deep learning for Turkish makam music composition,” Turkish Journal performances using an entropy based audio-fingerprint,” in Mexican
of Electrical Engineering and Computer Sciences, vol. 29, no. 7, pp. International Conference on Artificial Intelligence (MICAI), 2006.
3107–3118, 2021. [60] Antonio C Ibarrola and Edgar Chávez, “A robust entropy-based audio-
[38] Xavier Serra, “A multicultural approach in music information research,” fingerprint,” in 2006 IEEE International Conference on Multimedia and
in ISMIR 2011: Proceedings of the 12th International Society for Music Expo. IEEE, 2006, pp. 1729–1732.
Information Retrieval Conference; Miami, Florida (USA). International [61] I. Rallis, A.s Voulodimos, N. Bakalos, E. Protopapadakis, N. Doulamis,
Society for Music Information Retrieval (ISMIR), 2011. and A. Doulamis, “Machine learning for intangible cultural heritage: a
[39] Xiaojing Liang, Zijin Li, Jingyu Liu, Wei Li, Jiaxing Zhu, and Baoqiang review of techniques on dance analysis,” Visual Computing for Cultural
Han, “Constructing a multimedia Chinese musical instrument database,” Heritage, pp. 103–119, 2020.
in Proceedings of the 6th Conference on Sound and Music Technology
(CSMT). Springer, 2019, pp. 53–60.
[40] Xia Gong, Yuxiang Zhu, Haidi Zhu, and Haoran Wei, “Chmusic: a tra-
ditional Chinese music dataset for evaluation of instrument recognition,”
in 2021 4th International Conference on Big Data Technologies, 2021,
pp. 184–189.
[41] Carlos Nascimento Silla Jr, Alessandro L Koerich, and Celso AA
Kaestner, “The Latin Music Database.,” in ISMIR, 2008, pp. 451–456.
[42] “Royal Museum for Central-Africa,” http://www.africamuseum.be/en,
Accessed: 2022-30-09.
[43] Dimos Makris, Ioannis Karydis, and Spyros Sioutas, “The Greek
music dataset,” in Proceedings of the 16th International Conference on
Engineering Applications of Neural Networks (INNS), 2015, pp. 1–7.
[44] Charilaos Papaioannou, Ioannis Valiantzas, Theodoros Giannakopoulos,
Maximos Kaliakatsos-Papakostas, and Alexandros Potamianos, “A
Dataset for Greek Traditional and Folk Music: Lyra,” arXiv preprint
arXiv:2211.11479, 2022.
[45] “Thrace and Macedonia,” http://epth.sfm.gr/, Accessed: 2022-30-09.
[46] “The session,” https://thesession.org/, Accessed: 2022-30-09.
[47] M Kemal Karaosmanoğlu, “A Turkish makam music symbolic database
for music information retrieval: SymbTr,” in Proceedings of 13th In-
ternational Society for Music Information Retrieval Conference; ISMIR.
International Society for Music Information Retrieval (ISMIR), 2012,
pp. 223–228.
[48] S. M. H. Mousavi and S. M. H. Prasath, V. S.and Mousavi, “Persian
classical music instrument recognition (PCMIR) using a novel Persian
music database,” in 2019 9th International Conference on Computer
and Knowledge Engineering (ICCKE). IEEE, 2019, pp. 122–130.
[49] Lucas Maia, Magdalena Fuentes, Luiz Biscainho, Martı́n Rocamora, and
Slim Essid, “SAMBASET: A dataset of historical samba de enredo
Authorized licensed use limited to: Aristotle University of Thessaloniki. Downloaded on July 18,2023 at 07:48:07 UTC from IEEE Xplore. Restrictions apply.