
Blockchain-Based Attack Detection on Machine Learning Algorithms for IoT-Based e-Health Applications
Thippa Reddy Gadekallu, Manoj M K, Sivarama Krishnan S, Neeraj Kumar, Saqib Hakak, and Sweta Bhattacharya

Abstract
The application of machine learning (ML) algorithms is scaling up massively due to rapid digitization and the emergence of new technologies
like the Internet of Things (IoT). In today’s digital era, ML algorithms are applied in healthcare, IoT, engineering, finance, and more.
However, all these algorithms need to be trained in order to predict or solve a particular problem. There is a high possibility of tampering
with the training datasets and producing biased results. Hence, in this article, we propose a blockchain-based solution to secure the
datasets generated from IoT devices for e-health applications. The proposed blockchain-based solution uses a private cloud to tackle the
aforementioned issue. For evaluation, we have developed a system that can be used by dataset owners to secure their data.

Thippa Reddy Gadekallu, Manoj M K, Sivarama Krishnan S, and Sweta Bhattacharya are with Vellore Institute of Technology.
Neeraj Kumar is with the University of Petroleum and Energy Studies.
Saqib Hakak is with the University of New Brunswick.
Digital Object Identifier: 10.1109/IOTM.1021.2000160

Introduction
There has been extensive usage of machine learning (ML) and related applications. This enormous usage has led to reliance on ML-based predictions, which impacts the decisions made [1]. The authenticity of the datasets used to train the ML algorithms acts as the backbone for such predictions and pertinent decision making. However, if a dataset is tampered with, the training results of the ML algorithms would lead to degraded predictions that skew and bias decisions. As an example, tampering with customer survey research and product review related data could lead to biased product recommendations in e-commerce platforms [2]. There is also a possibility of the ML algorithm itself being tampered with, leading to decisions that favor the attacker, especially in the healthcare sector. The capability of ML algorithms to process data and classify data patterns often increases their susceptibility to various types of attacks. The authors in [3] implemented poisoning and evasion attacks with the objective of maximizing the generalization errors in classification. This resulted in an unreliable model leading to biased values of measurements in classification. Evasion is one of the most commonly practiced attacks, where falsified but normal-appearing inputs are fed into the ML algorithm during the testing phase. These inputs, when processed, invariably end up being erroneously classified by the model. The data thus gets misclassified, or in some cases leads to concept drift where the system continuously gets retrained, deteriorating the performance [4]. In poisoning attacks, training data gets tampered with. This tampered data, when fed into a classifier, negatively impacts the accuracy of the classification model. In some instances of this type of attack, the classifier function gets skewed, producing favorable results for the attacker [5, 6]. With the advancement of several technologies including IoT, the Internet of Medical Things, federated learning, and so on, there is a rapid surge in the digital data in the healthcare sector generated through IoT-based devices. ML algorithms play a vital role in helping doctors diagnose patients in a timely manner. Hence, in this article, a blockchain-based approach is presented to secure medical datasets generated through IoT devices in healthcare applications.
The main contributions of this article are:
• The data generated from the sensors in the IoT system and ML algorithms are stored securely in a private cloud using AES encryption.
• Tampering with datasets and ML algorithms is prevented using blockchain.
The rest of the article is organized as follows. We first present state-of-the-art technologies for securing ML datasets from potential attackers. The proposed framework is then discussed, followed by the experimental results and the conclusion.

Recent Advances
Recent studies highlight the exposure of ML algorithms to adversarial attacks, where non-traceable changes are introduced in the input data, leading to erroneous predictions that deceive the ML algorithm used. The authors in [7] defined and analyzed the various forms of adversarial attacks launched in real-time situations and also proposed plausible defense strategies to combat such attacks. In the case of adversarial images, adversarial noise is introduced, which is used to train ML models subjected to black box attacks. The detectors help to identify adversarial changes incorporated in the original image. The threats relevant to adversarial attacks predominantly exist in classification of image objects captured through cell phone cameras, where even the Google Inception model falls prey to such attacks. The Robust Physical Perturbation algorithm is a case wherein imposters print forged road sign posters and replace real signs with them. Similar discrepant approaches have been identified in the form of cyberspace attacks. Robotic visual images as well as three-dimensional object images are fed to ML algorithms for classifications and predictions [8, 9]. One of the most interesting applications of blockchain is intrusion detection. Intrusion detection combined with blockchain has broad scope for implementation in cryptocurrency and smart contracts [10]. Blockchain has a lot of potential applications in the energy sector, which can be observed in peer-to-peer energy trading, IoT applications incorporating blockchain, decentralized marketplaces, charging of electric vehicles, and e-mobility [11]. Platforms such as Ethereum and Hyperledger support non-financial applications of blockchain. The authors in [12] identified the binary neural network (BNN) as more robust than full precision networks. Hence, input discretization or dimensionality reduction of the input parameters, when combined with BNN, makes the model more robust against adversarial attacks. The challenges of the existing works are summarized in Table 1. The existing solutions against various types of attacks on training data and ML algorithms are given below:

30 2576-3180/21/$25.00 © 2021 IEEE IEEE Internet of Things Magazine • September 2021

Ref. | Methods used | Evaluation metrics | Research challenges
[2] | Blockchain technology to secure e-commerce transactions | MD5, smart contracts, and digital signatures | Scalability, computing resources
[5] | Linear regression | Mean squared error, execution time | Delay/overhead in data processing
[10] | Intrusion detection system on blockchain | Data integrity, transparency | Attack prevention, scalability
[12] | Binary neural networks | Weight decay value, learning rate | Multi-step attacks still occur
[14] | Blockchain system for dApps | Smart contracts | Transaction delay, lacks high throughput
Table 1. Summary of the challenges in existing literature.
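The row for [5] in Table 1 concerns poisoning of regression learning, measured by mean squared error. As a rough illustration of why tampered training data matters, the sketch below fits a least-squares line to a tiny clean dataset and to a copy in which one label has been altered. The numbers and the helper names `fit_line` and `mse` are made up for this example and are not taken from the article's experiments.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the points (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration): the clean labels follow y = 2x.
xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]
# A single tampered label stands in for a poisoning attack.
poisoned_ys = [2, 4, 6, 8, 30]

clean_model = fit_line(xs, clean_ys)
poisoned_model = fit_line(xs, poisoned_ys)

# Both models are scored against the clean labels.
print("clean model:", clean_model, "MSE:", mse(xs, clean_ys, *clean_model))
print("poisoned model:", poisoned_model, "MSE:", mse(xs, clean_ys, *poisoned_model))
```

In this toy case a single altered label triples the fitted slope, which is the kind of skew that an integrity check on the training data is meant to expose before training starts.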

Figure 1. Storage of a fragmented dataset in private cloud.
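The storage scheme named in the Figure 1 caption, together with the hash comparison the article relies on, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `fragment`, `defragment`, and `HashLedger` are hypothetical, SHA-256 is an assumed choice of hash, the ledger is a toy in-memory chain rather than an Ethereum contract, and the AES encryption of each fragment is left out because the Python standard library does not provide AES.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def fragment(data, n_fragments):
    """Split bytes into at most n_fragments contiguous pieces."""
    if n_fragments < 1:
        raise ValueError("n_fragments must be at least 1")
    size = max(1, -(-len(data) // n_fragments))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def defragment(fragments):
    """Concatenate the fragments in order to rebuild the original bytes."""
    return b"".join(fragments)


class HashLedger:
    """Toy append-only chain of blocks holding dataset hashes (not a real blockchain)."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(timestamp, prev_hash, name, dataset_hash):
        payload = "|".join([str(timestamp), prev_hash, name, dataset_hash])
        return sha256_hex(payload.encode("utf-8"))

    def add(self, name, dataset_hash, timestamp):
        """Append a block and return its index, used here as the block id."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({
            "timestamp": timestamp,
            "prev_hash": prev_hash,
            "name": name,
            "dataset_hash": dataset_hash,
            "hash": self._block_hash(timestamp, prev_hash, name, dataset_hash),
        })
        return len(self.blocks) - 1

    def is_consistent(self):
        """Check every block's own hash and its link to the preceding block."""
        prev_hash = "0" * 64
        for block in self.blocks:
            expected = self._block_hash(block["timestamp"], block["prev_hash"],
                                        block["name"], block["dataset_hash"])
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


# Owner side: fragment the dataset (stand-in bytes) and register its hash.
dataset = b"col_a,col_b\n1,2\n3,4\n"
fragments = fragment(dataset, 4)
ledger = HashLedger()
block_id = ledger.add("example.csv", sha256_hex(dataset), timestamp=1)

# User side: rebuild the file and compare its hash with the recorded one.
downloaded = defragment(fragments)
print(sha256_hex(downloaded) == ledger.blocks[block_id]["dataset_hash"])  # True

# Changing one byte of a fragment changes the recomputed hash, so the check fails.
tampered = defragment([b"X" + fragments[0][1:]] + fragments[1:])
print(sha256_hex(tampered) == ledger.blocks[block_id]["dataset_hash"])  # False
```

Hashing the whole file rather than each fragment keeps the check independent of how the file was split, so the fragment count can change without touching the recorded hash.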

1. Adversarial attack on training data: Adversarial training using brute force, data compression as a countermeasure, foveated imaging mechanism, randomization of data
2. Adversarial attack for network model: Deep contractive network, regularization and masking of the gradient, defensive filtration, bioinspired defense mechanism
3. Poisoning attack: Sanitization of data, micromodel-based defense, strong intentional perturbations, human in the loop (HITL) model, TRIM algorithm
Some of the limitations of the existing defense mechanisms include:
1. The existing defense mechanisms deal with specific types of attacks and hence fail to adapt to newer attacks.
2. Defense mechanisms such as brute force consume excessive computational resources.
The present work emphasizes elimination of these limitations using the proposed blockchain-based approach.

Blockchain-Based Fragmentation Approach to Secure Machine Learning Datasets

Background
Blockchain is a technology wherein a list of timestamped immutable data records is stored and managed in blocks by groups of computational entities. The blocks are interconnected with one another through cryptographic hashes of the preceding block. Each block contains three components, namely the timestamp, the hash of the preceding block, and data pertaining to transactions. Hence, if any updates in the transactions need to be incorporated, they have to be uniformly applied in all the blocks constituting the blockchain through a consensus mechanism [13]. This ensures the immutability property of the blockchain, which establishes blockchain as a suitable technology for addressing attacks on ML algorithms. Blockchain technologies have been successfully implemented in cryptocurrencies, supply chain management, asset management, healthcare, maintenance of digital IDs, and many others [14, 15]. With the present world being dependent on data-centric analysis requiring accurate ML algorithms, it becomes extremely necessary to ensure defense against all possible attacks. There is a dire need to build a model that is robust enough to combat all such attacks on datasets and ML algorithms. This acts as the primary motivation behind the present work. In this work, the datasets and ML algorithms are stored in an encrypted format in the private cloud. Any user who intends to use this dataset or ML algorithm is issued a block id along with the hash of the dataset. Upon receipt of this id, the user can apply the ML algorithms on the datasets to perform predictive analysis. Upon completion of this process at the user end, a new hash is generated. This user-generated hash is compared to the hash stored in the blockchain. If the hashes match, it can be concluded that the datasets and the ML algorithms have not been compromised.

Figure 2. Storage of encrypted fragments in blockchain and verification of hash after experimentation.

Proposed Architecture
Figures 1 and 2 represent the architecture diagram of the proposed work. Figure 1 describes the process of data handling and storage in a private cloud. The private cloud holds the encrypted fragments of the datasets and ML algorithms. When a user initiates a request for download, the data is decrypted and defragmented. Figure 2 describes the hybrid blockchain wherein the creation of the blocks is done by the administrator, depicting the private blockchain, and the visibility of and access to the blocks are provided to the user, representing the public blockchain.
The key objective in using a private cloud is to let the owner have full control over the dataset. The owner uses the private cloud to restrict access to approved users and eliminate unauthorized access. This level of access control greatly supports the overall objective of securing the dataset. On the cloud, the dataset is stored as encrypted fragments so as to improve security. Once a download request for the dataset is initiated, the fragments are decrypted and defragmented so as to provide the user the original dataset file.
The user may then use the public blockchain to view the recorded hash of the file and verify it against the computed hash of the downloaded file to ensure file integrity. This helps to establish and justify the integrity of the dataset to any third party.
The admin is responsible for adding the dataset name and hash of the file into the blockchain. This is done with special private blockchain access through the admin private key wherein he/she may add a dataset hash as a block to the blockchain

making it visible publicly, thereby maintaining integrity of the file.
This form of integrity check with a blockchain brings a new flavor to the existing forms of security and can act as a stepping stone for more futuristic ideas of automated security. The hybrid blockchain can act as a means of utilizing the features of both private and public blockchain to get a desired outcome. Here, we bring in the concept of full authority to the owner of the data while not restricting the view of the data to the public.

Experiments and Results
To simulate the experimentation, the following software is used in this work. For fragmentation we have used 7-Zip, an open source file archiver. The private cloud is hosted in Google Cloud Platform. Blockchain is simulated with the help of Remix IDE (Ethereum) through a smart contract developed using Solidity. To conduct this experimentation, the Medical Cost dataset from Kaggle is used. This dataset has 1338 rows of data with 7 attributes. Before the dataset is stored in the private cloud, it is divided into several fragments using 7-Zip. These fragments are then encrypted using AES encryption with 256-bit key size and uploaded to the virtual private cloud (VPC) in Google Cloud. The admin can then compute the hash of the datasets and ML algorithm, and store it in a blockchain. The linear regression algorithm is used for experimentation purposes in the present study. The sample logs created in the blockchain are depicted in Fig. 3. A simulation of the deployed contracts is performed to manage the blocks in the blockchain.

Figure 3. Log of the blocks created in blockchain.

If a user wants to test the accuracy of an ML algorithm on the dataset, he/she can request access from the admin. When the user provides a private key, the dataset will be defragmented, and the user can download the dataset and ML algorithm. The user may compute the hash of the downloaded file and compare it with the hash available through public blockchain access, following which the experimentation of ML algorithms on the dataset can be performed by the user. After experimentation, any third party may verify the originality of the results obtained by comparing the generated hash with the public blockchain hash. If the hashes match, it means that the dataset and ML algorithm have not been compromised.

Conclusion and Future Scope
In this work, we successfully implemented a blockchain-based solution to identify attacks on ML algorithms and medical datasets. Using the same concept to secure the datasets of an organization would mean that the private blockchain requires authentication from a wide range of higher officials awaiting a consensus. A feasibility check on the different consensus mechanisms for such a large scenario, taking into consideration the processing power, time, and resources for data block creation and mining, could be a much needed analysis. A completely decentralized solution could be the use of decentralized storage such as the InterPlanetary File System or Swarm so that the dataset may be kept more secure and not on a single entity. Securing the dataset by decentralized storage may be a stepping stone to the future of decentralization, a peek into Web 3.0.

References
[1] M. S. Mahdavinejad et al., “Machine Learning for Internet of Things Data Analysis: A Survey,” Digital Commun. and Networks, vol. 4, no. 3, 2018, pp. 161–75.
[2] Y. Zhang et al., “A Blockchain Based Secure E-Commerce Transaction System,” Proc. Int’l. Conf. Web Info. Systems and Applications, 2019, pp. 560–66.
[3] H. Kwon et al., “Multi-targeted Adversarial Example in Evasion Attack on Deep Neural Network,” IEEE Access, vol. 6, 2018, pp. 46,084–96.
[4] A. N. Bhagoji et al., “Enhancing Robustness of Machine Learning Systems Via Data Transformations,” Proc. 52nd Annual Conf. Info. Sciences and Systems, 2018, pp. 1–5.
[5] M. Jagielski et al., “Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning,” Proc. 2018 IEEE Symp. Security and Privacy, 2018, pp. 19–35.
[6] O. Suciu et al., “When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks,” Proc. 27th USENIX Security Symp., 2018, pp. 1299–1316.
[7] N. Akhtar and A. Mian, “Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey,” IEEE Access, vol. 6, 2018, pp. 14,410–30.
[8] I. Goodfellow, P. McDaniel, and N. Papernot, “Making Machine Learning Robust Against Adversarial Inputs,” Commun. ACM, vol. 61, no. 7, 2018.
[9] B. D. Rouhani et al., “Safe Machine Learning and Defeating Adversarial Attacks,” IEEE Security & Privacy, vol. 17, no. 2, 2019, pp. 31–38.
[10] W. Meng et al., “When Intrusion Detection Meets Blockchain Technology: A Review,” IEEE Access, vol. 6, 2018, pp. 10,179–88.
[11] M. Andoni et al., “Blockchain Technology in the Energy Sector: A Systematic Review of Challenges and Opportunities,” Renewable and Sustainable Energy Reviews, vol. 100, 2019, pp. 143–74.
[12] P. Panda, I. Chakraborty, and K. Roy, “Discretization Based Solutions for Secure Machine Learning Against Adversarial Attacks,” IEEE Access, 2019.
[13] N. Deepa et al., “A Survey on Blockchain for Big Data: Approaches, Opportunities, and Future Directions,” 2020, arXiv preprint arXiv:2009.00858.
[14] W. Cai et al., “Decentralized Applications: The Blockchain-Empowered Software System,” IEEE Access, vol. 6, 2018, pp. 53,019–33.
[15] G. R. Bojja and J. Liu, “Impact of IT Investment on Hospital Performance: A Longitudinal Data Analysis,” Proc. 53rd Hawaii Int’l. Conf. System Sciences, Jan. 2020.

Biographies
Thippa Reddy Gadekallu is currently working as an associate professor in the School of Information Technology and Engineering, Vellore Institute of Technology, Tamil Nadu, India. He obtained his Bachelor of Technology degree in computer science and engineering from Nagarjuna University, Andhra Pradesh, India, his Master of Engineering in computer science and engineering from Anna University, Chennai, Tamil Nadu, India, and his Ph.D. from Vellore Institute of Technology. He has 14 years of experience in teaching. He has coauthored more than 80 international publications. Currently, his research interests include machine learning, deep learning, computer vision, big data analytics, and blockchain.

Manoj M K (mkmanoj1997@gmail.com) is currently working at Oracle India Pvt. Ltd., India. He completed his Master of Technology in software engineering at Vellore Institute of Technology. He has done various projects on blockchain, cloud security, machine learning, AI, and IoT. He has been awarded the Fast Track Research Initiative G D Naidu Young Scientist Award from VIT. He has published a chapter for a book on blockchain. His interests lie in futuristic technologies.

Sivarama Krishnan S is currently working as an assistant professor at Vellore Institute of Technology. He was a research member at the Centre for Ambient Intelligence and Advanced Networking Research. He has working experience in the Centre for Development and Advanced Computing (C-DAC) (Ministry of Science and Technology, Government of India) as a research intern in data center technologies. He is also certified by EMC Corp. as a proven professional in information storage and management. Currently, he is a member of the EMC academic alliance faculty and played a key role in establishing an MoU between VIT University and EMC. He proposed and developed an intelligent network design framework for building small and large-scale networks. He also developed an efficient and secure framework for an IP storage network for C-DAC. His current interests include e-waste management in India, wireless networks, and cloud computing.

Neeraj Kumar [SM] received his Ph.D. in CSE from Shri Mata Vaishno Devi University, Katra (Jammu and Kashmir), India, in 2009, and was a postdoctoral research fellow at Coventry University, United Kingdom. He is a professor in the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India. He is also with the School of Computer Science, University of Petroleum and Energy Studies, Dehradun, Uttarakhand. He has published more than 500 technical research papers in top-cited journals such as IEEE Network, IEEE Communications Magazine, Computer Networks, Information Sciences, and many others. He has guided many research scholars leading to Ph.D. and M.E./M.Tech degrees. His research is supported by funding from UGC, DST, CSIR, and TCS. He is an Associate Technical Editor of IEEE Communications Magazine. He is an Associate Editor of IJCS, Wiley, JNCA, Elsevier, Elsevier Computer Communications, and Security and Communication, Wiley. He has been a Guest Editor of various international publications of repute such as IEEE Access, IEEE Communications Magazine, IEEE Network, Computer Networks, Elsevier, Future Generation Computer Systems, Elsevier, the Journal of Medical Systems, Springer, Computer and Electrical Engineering, Elsevier, Mobile Information Systems, the International Journal of Ad Hoc and Ubiquitous Computing, Telecommunication Systems, Springer, and the Journal of Supercomputing, Springer. He has been a Workshop Chair at IEEE GLOBECOM 2018 and IEEE ICC 2019, and TPC Chair and member for various international conferences. He has more than 20,000 citations to his credit with a current h-index of 77. He has won best paper awards from the IEEE Systems Journal and ICC 2018, Kansas City, Missouri, in 2018. He is a visiting research fellow at Coventry University and Newcastle University.

Saqib Hakak is currently working as an assistant professor at the Canadian Institute for Cybersecurity, Faculty of Computer Science, University of New Brunswick, Fredericton, Canada. He received his Ph.D. from the University of Malaya, Malaysia, under the Faculty of Computer Science and Information Technology. He received his Bachelor’s degree in computer science engineering from the University of Kashmir, India, in 2010 and his Master’s degree in computer and information engineering from IIUM, Malaysia. His research areas include information security, natural language processing, cyber security, artificial intelligence, and wireless networks.

Sweta Bhattacharya (sweta.b@vit.ac.in) is currently associated with Vellore Institute of Technology as an assistant professor in the School of Information Technology & Engineering. She received her Ph.D. degree from Vellore Institute of Technology and her Master’s degree in industrial and systems engineering from the State University of New York, Binghamton. She has guided various UG and PG projects, and published peer-reviewed research articles. She is also a member of the Computer Society of India and the Indian Science Congress. Her research experience includes working on pill dispensing robotic projects as a fully funded Watson Research Scholar at Innovation Associates at SUNY Binghamton. She has completed six sigma green belt certification from Dartmouth College, Hanover. Her research interests include applications of machine learning algorithms, data mining, simulation and modeling, applied statistics, quality assurance, and project management.
