
HOST PROOF STORAGE OF HEALTHCARE RECORDS ON PUBLIC CLOUD USING MACHINE LEARNING AND PAKE PROTOCOL

Submitted by,
CHANDRU G [613517104004]
GOVARTHANAN S [613517104009]
KAWIN M [613517104018]
SURYA S [613517104704]

Guided By,
Mr Kirupa Shankar K M
Assistant Professor,
Computer Science and Engineering,
Government College of Engineering,
Dharamapuri.
Outline
• Introduction
• Problem Definition
• Objectives
• Literature Survey
• Existing System
• Proposed System
• Results
• Conclusion
Introduction
• With the advancement of information technology, the blood banking system is
becoming more digital, more collaborative, more patient-centric and more
data-driven. It aims at making information accessible anytime, anywhere.
• The traditional technology infrastructure of the blood banking system will not be
able to cater to the massive amount of data being generated.
• Cloud computing is a fast-growing trend that includes several services, all offered
on demand over the internet in a pay-as-you-go model. It promises to increase
the speed with which applications are deployed and to lower costs.
• But the security of data in the cloud is a concern today. Security must be provided
efficiently for a large volume of data.
Problem Definition
• Data generated by blood banks and the healthcare industry is increasing
rapidly and therefore needs to be managed efficiently.
• Additionally, this information tends to be highly sensitive and
confidential and needs to be handled in a secure way.
• The conventional blood banking process in many hospitals and blood
banks involves a great deal of manual work. It takes a considerable
amount of time and effort to find or store donor information, and the
records are at high risk of exposure to outsiders.
• To ensure quality of service, the data needs to be digitized.
Objectives
● To reduce manual work by saving and retrieving donor’s information in a public
cloud.

● To increase security and privacy of confidential donor information.

● To increase efficiency and speed of the proposed security framework.

● To provide a password based authentication method to prove the user’s
identity without sharing the password between entities.
Literature Survey
1) Y. C. Yau, P. Khethavath and J. A. Figueroa, "Secure Pattern-Based Data Sensitivity Framework for
Big Data in Healthcare," :
This research work estimates the sensitivity of data using two features that are
obtained from analysis of the Big Data. The sensitive data fields are then encrypted using the Elliptic
Curve Cryptographic method on Hadoop based distributed systems to speed up the process. It
generates a public/private key pair to transfer the data to the systems securely.

2) Anitha Kumari, K., Sudha Sadasivam, G. “Two-Server 3D ElGamal Diffie-Hellman Password
Authenticated and Key Exchange Protocol Using Geometrical Properties” :
This paper deals with the concept of Zero-Knowledge Password Proof, in which the password is never
revealed to any party in the system, using the 3D ElGamal Diffie-Hellman algorithm. The protocol
is also used to securely establish a shared key between the two systems. This is done based on the
properties of a 3D shape, the tetrahedron.
3) B. Lee, E. K. Dewi and M. F. Wajdi, "Data security in cloud computing using AES
under HEROKU cloud," 2018 27th Wireless and Optical Communication
Conference (WOCC), Hualien, Taiwan, 2018, pp. 1-5, doi:
10.1109/WOCC.2018.8372705. :
This paper demonstrates that AES-256 is the best option for data security on
a public cloud, based on several parameters including delay and performance.
Existing System
• BLOOD BANKING SYSTEMS
• PAPER BASED SYSTEM:
Paper based system uses traditional methods of maintaining data in papers and
ledgers which are recorded manually. These notebooks are stored in cupboards and
whenever any information is needed, the whole notebook is skimmed to get that
particular information, which is a cumbersome process.

• DIGITAL RECORDS BASED SYSTEMS:
Digital systems use digital records such as Excel sheets stored on local computers
or servers. Such a system requires less effort for maintenance and retrieval of data than
a paper based system, but the availability of data and the reliability of the system are of concern.
• DATA ANALYSER
A two feature based Data Analyser tool is present in existing systems to find the
sensitivity of the attributes in the blood donor dataset.

i. Frequency Counter: The frequency of unique values in the attribute is
calculated.
ii. Pattern Matching: Pre-hardcoded strings are compared with the name of the
attribute to find its sensitivity.
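
The two features above can be sketched in a few lines of Python. This is only an illustration, not the existing tool itself: the keyword list `SENSITIVE_KEYWORDS` and both function names are hypothetical, and simple substring matching is assumed for the comparison with the attribute name.

```python
from collections import Counter

# Hypothetical keyword list standing in for the "pre-hardcoded strings".
SENSITIVE_KEYWORDS = ("name", "phone", "address", "email", "dob")


def frequency_of_unique_values(values):
    """Count how often each distinct value occurs in one attribute."""
    return Counter(values)


def name_looks_sensitive(attribute_name, keywords=SENSITIVE_KEYWORDS):
    """Flag an attribute whose name contains any hardcoded keyword."""
    lowered = attribute_name.lower()
    return any(keyword in lowered for keyword in keywords)


blood_groups = ["O+", "A+", "O+", "B-", "O+"]
counts = frequency_of_unique_values(blood_groups)
```

A name-only check of this kind cannot detect sensitive values stored under a neutral column name.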
Proposed System
BLOOD BANKING SYSTEM
The proposed system involves:
• Data Analyser
• Data Preprocessor
• Sensitivity Classifier
• Data Encrypter
• PAKE system
ADVANTAGES:
• Provides an accurate, cost- and time-effective solution to store and retrieve
data.
• Provides Security, Privacy, Integrity and Confidentiality to the data.
• Provides an efficient and secure authentication system.
• Enhances the speed of the encryption and decryption process.
Data Analyser
For each column in the dataset, the following information is analyzed:
i) Datatype: 1 : Int, 2 : Float, 3 : String, 4 : DateTime
ii) Null %: Percentage of NULL values in a column,
$\text{Null \%} = \dfrac{\text{Number of NaN values}}{\text{Total rows}} \times 100$
iii) Unique %: Percentage of unique values in a column,
$\text{Unique \%} = \dfrac{\text{Number of unique values}}{\text{Total rows}} \times 100$
iv) Categorical: Yes (1) if < 2%, otherwise No (0).
v) Correlation: measure of the linear relationship between two quantitative variables,
Yes (1) if > 0.75 or < -0.75, otherwise No (0). A correlation is a single number that
describes the degree of relationship between two variables.
$\rho_{X,Y} = \operatorname{corr}(X, Y) = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$
vi) Sensitive: Pattern based sensitivity analysed using Regular Expression - 0 : No, 1 :
Yes
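
As a rough illustration of the numeric features in this list, the sketch below computes Null %, Unique %, the categorical flag and the correlation flag with the Python standard library only. All function names are hypothetical. It assumes that missing values are represented as `None` and that the 2% rule for "Categorical" is applied to the Unique % figure, which the slide does not state explicitly.

```python
import math


def null_percent(column):
    """Percentage of missing (None) entries in a column."""
    return sum(v is None for v in column) / len(column) * 100


def unique_percent(column):
    """Percentage of distinct non-missing values relative to total rows."""
    distinct = {v for v in column if v is not None}
    return len(distinct) / len(column) * 100


def is_categorical(column, threshold=2.0):
    """Assumption: the 2% rule is applied to the Unique % figure."""
    return 1 if unique_percent(column) < threshold else 0


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_correlated(xs, ys, cutoff=0.75):
    """1 if the correlation is above +cutoff or below -cutoff, else 0."""
    rho = pearson(xs, ys)
    return 1 if rho > cutoff or rho < -cutoff else 0


# Made-up sample columns, for illustration only.
ages = [21, 34, None, 45, 34]
weights = [60.0, 72.5, 80.0, 91.0]
heights = [160.0, 170.0, 176.0, 185.0]
```

The common 1/n factors in the covariance and the standard deviations cancel, so `pearson` works directly with the unnormalised sums.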
Data Preprocessor
STRATIFIED TRAIN-TEST SPLIT:
• It splits the dataset into train and test sets in a way that preserves the same
proportions of examples in each class as observed in the original dataset. This is
called a stratified train-test split.
• The dataset is split into training and testing sets in an 80:20 ratio.

LABEL ENCODER:
• It converts the labels into numeric form so that they are machine-readable.
Machine learning algorithms can then operate on those labels directly.
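
Both preprocessing steps can be sketched with the standard library alone. The code below is a simplified stand-in and not the project's implementation; the function names are hypothetical. Libraries such as scikit-learn offer the same functionality through `train_test_split(..., stratify=y)` and `LabelEncoder`.

```python
import random
from collections import defaultdict


def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_indices, test_indices) preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_ratio)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up, imbalanced label column: 10 sensitive and 40 non-sensitive rows.
labels = ["sensitive"] * 10 + ["non-sensitive"] * 40
encoded, mapping = encode_labels(labels)
train_idx, test_idx = stratified_split(encoded)
```

With an 80:20 split, the 1:4 class ratio of the sample labels is kept in both the training and the testing indices.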
Sensitivity Classifier
• RANDOM FOREST: An ensemble classifier that builds a number of decision trees on
various subsets of the given dataset and combines their outputs (by averaging or
majority vote) to improve the predictive accuracy.
Hyperparameter Tuning
CROSS VALIDATION:
• CV is used to evaluate machine
learning models on a limited data
sample.

RANDOM SEARCH CV:
• Random search CV searches the
specified subset of hyperparameters
randomly instead of exhaustively.
• The major benefit is decreased
processing time.
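
The idea of combining cross validation with a random search can be sketched as follows. This is a toy example under stated assumptions, not the tuning code used for the results: the model is a hand-written 1-D k-NN classifier, the only hyperparameter is `k`, and all function names are hypothetical. In scikit-learn the equivalent tool is `RandomizedSearchCV`.

```python
import random


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        validation = indices[fold::n_folds]
        held_out = set(validation)
        train = [i for i in indices if i not in held_out]
        yield train, validation


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cv_accuracy(xs, ys, k, n_folds=5):
    """Mean validation accuracy of k-NN over all folds."""
    scores = []
    for train, validation in k_fold_indices(len(xs), n_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in validation
        )
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)


def random_search(xs, ys, candidates, n_iter, seed=0):
    """Score n_iter randomly chosen candidates instead of all of them."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates, n_iter)
    scored = [(cv_accuracy(xs, ys, k), k) for k in sampled]
    best_score, best_k = max(scored)
    return best_k, best_score, sampled


# Made-up data: points below 20 belong to class 0, the rest to class 1.
xs = [float(i) for i in range(40)]
ys = [0 if x < 20 else 1 for x in xs]
best_k, best_score, tried = random_search(
    xs, ys, candidates=list(range(1, 16, 2)), n_iter=4
)
```

Only four of the eight candidate values are evaluated here, which is where the saving in processing time comes from. The trade-off is that the best candidate may not be among those sampled.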
Data Encrypter
AES – 256 GCM:
• AES is based on a design principle known as a substitution–permutation
network, and is efficient in both software and hardware.
• AES is a variant of Rijndael, with a fixed block size of 128 bits, and
a key size of 128, 192, or 256 bits.
Galois/Counter Mode (GCM)
• GCM is a mode of operation for
AES which is widely adopted for its
performance. The operation is
an authenticated encryption algorithm
designed to provide both data
authenticity and confidentiality.

ADVANTAGES:
• AES-GCM is not only efficient and
secure, but hardware implementations
can achieve high speeds with low cost
and low latency, because the mode can
be pipelined.
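
AES-256 needs a 256-bit key, and the results section compares a variant that derives this key from a password with scrypt. The sketch below shows only that preparation step, using `hashlib.scrypt` and `os.urandom` from the Python standard library. The function name, the passphrase and the cost parameters (`n`, `r`, `p`) are illustrative assumptions, not values taken from the project. The AES-GCM encryption call itself is not shown because it requires a third-party library.

```python
import hashlib
import os


def derive_aes256_key(password, salt, n=2**14, r=8, p=1):
    """Derive a 32-byte (256-bit) key from a password with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32
    )


salt = os.urandom(16)   # stored alongside the ciphertext; not secret
nonce = os.urandom(12)  # 96-bit nonce; must never repeat under the same key
key = derive_aes256_key("hypothetical donor-db passphrase", salt)
```

The same password and salt always yield the same key, so only the salt and the nonce need to be stored with the encrypted data.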
Password Authenticated Key Exchange (PAKE)
The protocol runs in three phases:
1)Initialization Phase:
A large prime number p is chosen, which defines the multiplicative group Zp, and
two generators g1 and g2 of that group are selected. These are shared between the
user and the cloud server.
2)Registration Phase:
3)Authentication Phase:
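
For the initialization phase (phase 1), a toy sketch of choosing the public parameters is given here. It is not the protocol's actual parameter generation: it assumes p is a safe prime (p = 2q + 1 with q prime), which makes the generator test simple, and it uses a tiny p purely for illustration. The function names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality check; adequate only for toy-sized n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_generators(p, count=2):
    """Return the first `count` generators of Z_p^* for a safe prime p."""
    q = (p - 1) // 2
    if not (is_prime(p) and is_prime(q)):
        raise ValueError("p must be a safe prime")
    generators = []
    for g in range(2, p):
        # The order of g divides p - 1 = 2q, so g generates the whole group
        # exactly when its order is neither 1, 2, nor q.
        if pow(g, 2, p) != 1 and pow(g, q, p) != 1:
            generators.append(g)
            if len(generators) == count:
                break
    return generators


p = 23  # toy safe prime (23 = 2 * 11 + 1); real deployments use very large primes
g1, g2 = find_generators(p)
```

In practice p would be thousands of bits long and trial division would be replaced by a probabilistic primality test.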
Result and Analysis
Algorithm   Cross Validation Accuracy (%)   Training Accuracy (%)   Testing Accuracy (%)
SVM         89                              90.25                   85
MNB         85.99                           95.76                   91.67
DT          94.11                           100                     96.67
RF          91.58                           97.67                   95.34
GB          90.74                           88.14                   86.67
K-NN        90.74                           88.16                   85.33

Table: Accuracy score of various Classifiers

Figure: CLASSIFIER COMPARISON (Cross Validation Accuracy, Training Accuracy and Testing Accuracy per classifier; x-axis: Classifiers, y-axis: Percentage Accuracy)
Algorithm   Precision   Recall   F1 Score
SVM         0.8516      0.8333   0.8399
MNB         0.7803      0.7916   0.7803
DT          0.9736      0.9583   0.9647
RF          0.9285      0.875    0.8901
GB          0.9377      0.9236   0.9294
K-NN        0.9377      0.9236   0.9294

Table: Comparison of various Classifiers based on Precision, Recall and F1 Score

Figure: COMPARISON OF CLASSIFIERS (Precision, Recall and F1 Score per classifier)


Algorithm          Encryption Time (s), 1000 rows   Encryption Time (s), 10000 rows   Secure   Data Integrity   Password based Key derivation
3DES               0.24                             16.12                             ✖        ✖                ✖
AES-GCM            0.19                             12.43                             ✓        ✓                ✖
RSA/ECC            725.3                            8363.2                            ✖        ✖                ✖
AES-GCM + scrypt   31.7                             295.39                            ✓        ✓                ✓

Table: Comparison of Encryption techniques

Figure: ENCRYPTION ALGORITHM COMPARISON (series: AES, 3DES, RSA, AES-GCM + scrypt; x-axis: Number of rows per column, y-axis: Time (in Seconds))
Conclusion
• The proposed system to store and manage donors’ data in a public
cloud is both efficient and secure. This saves a substantial amount of
time and manual work for the blood banks and healthcare centers.
• The data will be available 24 x 7 and the system is automated, so all
critical cases are handled quickly.
• The streamlining of workflow also enables many blood banks and
healthcare centers to work together and share information in case of
emergencies.
References
• [1] Y. C. Yau, P. Khethavath and J. A. Figueroa, "Secure Pattern-Based Data Sensitivity Framework for Big Data in Healthcare," 2019 IEEE
International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD), Honolulu, HI, USA, 2019, pp. 65-70, doi:
10.1109/BCD.2019.8885114.
• [2] Anitha Kumari, K., Sudha Sadasivam, G. Two-Server 3D ElGamal Diffie-Hellman Password Authenticated and Key Exchange Protocol
Using Geometrical Properties. Mobile Netw Appl 24, 1104–1119 (2019). doi: 10.1007/s11036-018-1104-1.
• [3] Sumathi, M., Sangeetha, S. A group-key-based sensitive attribute protection in cloud storage using modified random Fibonacci
cryptography. Complex Intell. Syst. (2020). https://doi.org/10.1007/s40747-020-00162-3.
• [4] N. Shafnamol and K. R. Simi Krishna, "Signature-Based multi-server pake protocol," 2017 International Conference on Networks &
Advances in Computational Technologies (NetACT), Thiruvanthapuram, 2017, pp. 310-313, doi: 10.1109/NETACT.2017.8076786.
• [5] Abdalla M., Barbosa M., Bradley T., Jarecki S., Katz J., Xu J. (2020) Universally Composable Relaxed Password Authenticated Key
Exchange. In: Micciancio D., Ristenpart T. (eds) Advances in Cryptology – CRYPTO 2020. CRYPTO 2020. Lecture Notes in Computer Science,
vol 12170. Springer, Cham. https://doi.org/10.1007/978-3-030-56784-2_10.
• [6] N. A. Patel, "A Survey on Security Techniques used for Confidentiality in Cloud Computing," 2018 International Conference on Circuits
and Systems in Digital Enterprise Technology (ICCSDET), Kottayam, India, 2018, pp. 1-6, doi: 10.1109/ICCSDET.2018.8821135.
• [7] M. A. Zardari, L. T. Jung and N. Zakaria, "K-NN classifier for data confidentiality in cloud computing," 2014 International Conference on
Computer and Information Sciences (ICCOINS), Kuala Lumpur, Malaysia, 2014, pp. 1-6, doi: 10.1109/ICCOINS.2014.6868432.
• [8] Kaur, Kulwinder & Zandu, Vikas. (2016). A Secure Data Classification Model in Cloud Computing Using Machine Learning Approach.
International Journal of Grid and Distributed Computing. 9. 13-22. 10.14257/ijgdc.2016.9.8.02.
• [9] N. Surv, B. Wanve, R. Kamble, S. Patil and J. Katti, "Framework for client side AES encryption technique in cloud computing," 2015 IEEE
International Advance Computing Conference (IACC), Banglore, India, 2015, pp. 525-528, doi: 10.1109/IADCC.2015.7154763.
THANK YOU
