
Measurement: Sensors 25 (2023) 100602

Contents lists available at ScienceDirect

Measurement: Sensors
journal homepage: www.sciencedirect.com/journal/measurement-sensors

Develop the hybrid Adadelta Stochastic Gradient Classifier with optimized feature selection algorithm to predict the heart disease at earlier stage
R. Senthil a,*, B. Narayanan b, K. Velmurugan c
a Department of Computer Science & Engineering, Annamalai University, Chidambaram, Tamilnadu, India
b Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Tamilnadu, India
c Anjalai Ammal-Mahalingam Engineering College, Thiruvarur, Tamilnadu, India

A R T I C L E  I N F O

Keywords: HADSGC-HHBS; Big data; Machine learning; Health care; Performance; False positive rate; Space complexity

A B S T R A C T

Medical big data analysis is the technique of collecting and analyzing a massive quantity of patient data to obtain meaningful information. In many fields, including cloud-based medical systems, big data analysis faces many barriers. The healthcare industry generates a significant amount of heart disease data for its patients. Most recent research focuses on business models based on big data analysis to improve the predictive performance on heart attack data and reduce risk levels for patients. Data storage, however, has been a major challenge; data must be accessed efficiently from multiple locations in a decentralized context. The objective of this work is a Hybrid Adadelta Stochastic Gradient Classifier-based Healthcare Hash Big Data Storage (HADSGC-HHBS) method that stores and manages clinical information from many places in a distributed setting with the least amount of space and in the shortest amount of time. After vast amounts of information have been collected, the HADSGC-HHBS technique categorizes the data based on certain characteristics. The stochastic gradient descent (SGD) classifier categorizes patient information using a non-convex potential risk objective together with the Support Vector Machine (SVM) algorithm. A range of data documents is used to assess the proposed HADSGC-HHBS process. Compared to previous approaches, the proposed HADSGC-HHBS process performed well in terms of classification, false positives, and reduced space complexity.

1. Introduction

Big healthcare data analysis is the technique of gathering and analyzing massive amounts of patient data. Evaluating large volumes of data makes it easier to collect relevant information from a massive database [1]. Big data analysis is widely used in cloud health care applications. Personal information is derived from a better source of health care information [2]. As the biomedical and health care communities grow, accurate assessment of medical information and access to large amounts of data become a critical research challenge [3–5]. Despite its relevance, big data analysis faces barriers in various sectors, including cloud-based medical systems. Recent studies adopt a marketing approach based on big data analytics to improve prediction efficiency and reduce extreme risk [6].

Big data analysis is a technique for extracting useful information from massive quantities of data. To acquire clinical information, large amounts of data are commonly used in cloud health care devices [7]. The efficient storage and availability of a considerable amount of data present a significant challenge to the biological and medical industries [8]. Clinical information management, the big health data center, and model management used NoSQL databases. The problem of space complexity, on the other hand, was unresolved. For big data analysis, a cloud-based MapReduce model was constructed, although data analysis was not done [9–11]. To store and manage information, high-performance computing methods in biomedicine were established, but the storage difficulty was not resolved [12]. Although a decentralized architecture for securing patient records was built, it neglected to categorize the data to retrieve meaningful information [13]. The practice-based approach, which reveals the basic relationships across big data analytics characteristics, was used to construct a big data analytics-enabled process model; however, the methodology did not perform well in data storage [14]. A temporal allocation chart query was used to perform an adequate assessment of healthcare big data, but it has a high time complexity [15]. For medical big data analysis [16], a Cloudlet-based mobile internet

* Corresponding author.
E-mail address: senthil3591@gmail.com (R. Senthil).

https://doi.org/10.1016/j.measen.2022.100602
Received 29 September 2022; Received in revised form 3 November 2022; Accepted 21 November 2022
Available online 25 November 2022
2665-9174/© 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

computing architecture was developed, but it incurred high costs and failed to deliver enhanced medical operations [17]. To improve quality, big data analytics was applied to medical information obtained through various sources [18]. Conventional techniques have several flaws, including a lack of classification, a high level of memory utilization, and concerns about storage and access. The healthcare industry supplies a vast amount of patient information. Healthcare data contain a variety of unstructured text, audio, and graphics [19]. However, data security is the primary concern, and data in a decentralized ledger requires good availability in multiple places [20]. For the examination of associations in patient information, a probabilistic data gathering system was designed [21]. Furthermore, depending on their current health status, a stochastic forecasting model predicts the future medical condition of the most associated individuals. However, it is unable to retain healthcare information. For the storage and retrieval of clinical information, a hybrid Adadelta Classifier and Healthcare Hash Big Data Storage (HADSGC-HHBS) methodology has been devised. The HADSGC-HHBS technique saves and retrieves clinical information from several regions in a decentralized system with low space and time complexity. Initially, a considerable amount of information is gathered [22]. The Adadelta classification technique is then used to divide the supplied large datasets into several classes. On noisy data sets, the standard boosting strategy does not work well, and the boosting algorithm results in misclassification. In the HADSGC-HHBS approach, the SVM classifier serves as the weak learner [23]. The Adadelta transforms a weak learner into a “powerful” predictor by uniformly combining weak hypotheses.

In addition to an SVM classifier, the Adadelta uses a non-convex potential risk objective to categorize the patient information. Finally, the categorized data is passed to the Bloom hashing algorithm. The Bloom filter, a probabilistic data structure, stores patient data with the least possible space complexity [24]. The union and intersection of Bloom filters with identical size and number of hash functions are obtained with the bitwise OR and bitwise AND operations. As a result, hash functions are used to store and access patient records, resulting in low memory utilization and a short information retrieval time. The experimental outcomes are examined in terms of classification performance, classification time, false-positive rate, space complexity, and information retrieval time.

2. Proposed method

The amount of big data in the medical industry is enormous and growing at an accelerating rate. By supplying massive memory, cloud computing supports big data processing. Patient records in the medical industry are complicated and difficult to examine. As a result, the amount of patient data held in health databases continues to grow, and it is now termed big data. In the medical industry, storing and accessing patient data is a major concern. This research proposes an effective HADSGC-HHBS technique to enhance the storage and retrieval efficiency of patient information. Fig. 1 depicts the HADSGC-HHBS methodology for storing and managing healthcare information with the least amount of space and time. Ultimately, the categorized

Fig. 1. Proposed architecture.


data was stored using the Bloom hash algorithm. A Bloom filter is a space-saving and time-saving data structure for storing and managing patient information using hash functions.

2.1. Algorithm

Fig. 2 illustrates the Adadelta Stochastic Gradient Descent classification applied to weak SVM learners to achieve accurate patient record identification. The Adadelta Stochastic Gradient approach iteratively trains SVM-based predictors. To generate a strong classifier, the weak SVM learner is applied to the inputs of the Adadelta Stochastic Gradient classifier, which consist of n training samples (i0, j0), (i1, j1), …, (in, jn), where in is an input patient record and jn ∈ {−1, +1} is its class label. For identifying patient data, SVM is a machine learning algorithm that constructs an optimal hyperplane in a high-dimensional feature space. Fig. 3 shows the identification of patient records using an SVM classifier, given the patient records D = D1, D2, D3, …, Dn spread throughout medical systems.
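The training step described above can be sketched as a linear SVM fitted with per-sample Adadelta updates. This is a minimal illustration, not the authors' implementation: the paper does not give its update rule or loss, so the sketch assumes the standard Adadelta rule and the ordinary (convex) hinge loss in place of the non-convex risk the paper mentions. The names `train_adadelta_svm`, `predict`, `rho`, `eps`, `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def train_adadelta_svm(X, y, epochs=200, rho=0.95, eps=1e-6, lam=1e-3, seed=0):
    """Fit a linear SVM (hinge loss) with per-sample Adadelta updates.

    X: (n, d) float array of records; y: (n,) labels in {-1, +1}.
    Assumption: standard Adadelta rule, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d + 1)        # weights followed by the bias term
    avg_sq_grad = np.zeros(d + 1)  # running average of squared gradients
    avg_sq_step = np.zeros(d + 1)  # running average of squared updates
    for _ in range(epochs):
        for idx in rng.permutation(n):
            x_i, y_i = X[idx], y[idx]
            margin = y_i * (theta[:d] @ x_i + theta[d])
            grad = np.zeros(d + 1)
            grad[:d] = lam * theta[:d]   # L2 penalty on the weights only
            if margin < 1.0:             # hinge loss is active for this sample
                grad[:d] -= y_i * x_i
                grad[d] = -y_i
            # Adadelta: scale the gradient by RMS(past steps) / RMS(past gradients)
            avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
            step = -np.sqrt(avg_sq_step + eps) / np.sqrt(avg_sq_grad + eps) * grad
            avg_sq_step = rho * avg_sq_step + (1 - rho) * step ** 2
            theta += step
    return theta[:d], theta[d]

def predict(X, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical, linearly separable toy records (not the UCI data).
X_toy = np.array([[2.0, 2.0], [3.0, 1.0], [1.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, 1, 1, -1, -1, -1, -1])
w, b = train_adadelta_svm(X_toy, y_toy)
print(predict(X_toy, w, b))
```

Adadelta needs no hand-tuned learning rate because each coordinate's step is scaled by the ratio of two running averages, which is one plausible reason to pair it with an SVM base learner.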

2.1.1. Data storage algorithm


The Bloom hash algorithm maps each input record to a fixed-size hash code. Using a large bit array in the Bloom filter approach effectively reduces the likelihood of false positives.
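A Bloom filter of the kind described here can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's storage algorithm: the class `BloomFilter`, the parameters `num_bits` and `num_hashes`, the salted SHA-256 hashing scheme, and the record strings are hypothetical choices. The `union` and `intersection` methods show the bitwise OR and AND combinations mentioned in the introduction.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with k salted hash functions (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, record):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{record}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, record):
        for pos in self._positions(record):
            self.bits |= 1 << pos

    def might_contain(self, record):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(record))

    def _combine(self, other, bits):
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = bits
        return merged

    def union(self, other):
        return self._combine(other, self.bits | other.bits)   # bitwise OR

    def intersection(self, other):
        return self._combine(other, self.bits & other.bits)   # bitwise AND

# Hypothetical record keys for two sites.
site_a, site_b = BloomFilter(), BloomFilter()
site_a.add("patient-001")
site_a.add("patient-002")
site_b.add("patient-002")
site_b.add("patient-003")
both_sites = site_a.union(site_b)
shared = site_a.intersection(site_b)
print(both_sites.might_contain("patient-003"), shared.might_contain("patient-002"))
```

A larger `num_bits` lowers the false-positive probability at the cost of more space, which is the trade-off the paragraph above refers to.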

Fig. 2. Steps to improve the SVM classifier using Adadelta Stochastic Gradient Descent Classifier.

Fig. 3. After applying ADSGD on SVM.


Bloom filters use hash functions to store the categorized patient data D = D1, D2, D3, …, Dn in an array of ‘wa’ bits. H = h1, h2, h3, …, hn denotes the hash functions applied to each patient’s information. Each piece of data has a unique hash value, which is saved in a bit vector.

Data storage algorithm 2 illustrates the step-by-step procedure for storing information with the hash algorithm in the shortest access time. The categorized information is saved in a fixed-size array of ‘wa’ bits using the Bloom filter and the hash function ‘h’. This aids in reducing the challenge of space complexity. The physician can then access the information through the Bloom filter’s hash function. As a result, the HADSGC-HHBS mechanism improves storage performance while keeping space complexity and bandwidth use low.

The patient information in the document was stored and accessed through an experimental analysis. Medical patient information was gathered from the publicly available UCI machine learning repository for experimental evaluation. Cardiovascular disease patient data sets are examined in the model for storing and managing patient data from the file accurately and effectively. For medical big


Table 1
Classification performance calculation.

File count   Performance measures (accuracy)
             Fuzzy system   RNN model   CNN   Proposed system
20           52             42          62    92
40           62             57          67    87
60           69             62          74    94
80           72             67          79    92
100          80             74          85    95
120          86             78          89    96
140          85             83          90    97
160          84             84          90    95
180          85             82          87    93
200          87             87          90    97

Fig. 5. Reliability of false-positive rate measurement.

Fig. 4. Effectiveness of classification performance measures.

Fig. 6. The effect of computational requirements on the effectiveness.

data analytics, the dataset has 76 attributes. Patient identification number, age, sex, and other attributes are among the variables used in the patient records. The data set contains 303 instances. The Cleveland Clinic Foundation provided the data. Several measures, namely classification performance, false-positive rate, memory requirement, and information retrieval time, are evaluated experimentally using the HADSGC-HHBS technique.

3. Results and discussion

The findings of the HADSGC-HHBS approach are compared with the existing techniques, the large medical data storage and retrieval method and the Map-Reduce method, in terms of classification performance, false-positive rate, space complexity, and information retrieval time. The performance of the HADSGC-HHBS algorithm is assessed against the number of documents.

Table 1 shows the experimental classification performance of HADSGC-HHBS and existing approaches. To conduct the experimental evaluation, patient data documents in the range of 10–200 documents were examined. The HADSGC-HHBS approach was compared to the existing Fuzzy technique [25], Convolutional Neural Network (CNN) [26], and Recurrent Neural Network (RNN) [27] based on Table 1. The validity of the patient information prediction has been enhanced. As a result, the HADSGC-HHBS strategy outperformed other state-of-the-art methods in classification accuracy.

As demonstrated in Fig. 4, the classification performance of the HADSGC-HHBS technique is higher than that of the previous approaches. The input documents initially contain patient records. The SVM base classifier is used to partition the data into several columns. The SVM classifier uses a separating hyperplane to classify the patient data, maximizing the margin on the training data. The data at the edge of the discriminant hyperplane was organized into several columns, including patient ID, age, and gender. The Adadelta Stochastic Gradient Descent is used to aggregate the findings of weak learners to get strong

Table 2
False positive ratio.

File count   Performance measures (False Positive)
             Fuzzy system   RNN   CNN   Proposed system
20           62             52    42    11
40           42             37    32    16
60           35             32    25    13
80           32             30    26    16
100          33             30    27    17
120          34             32    29    19
140          35             31    28    20
160          34             30    27    19
180          32             27    25    20
200          33             29    28    22

Table 3
Performance measures of space complexity.

File count   Performance measures (Space Complexity in Mb)
             Fuzzy system   RNN   CNN   Proposed system
20           16             15    13    9
40           26             22    20    14
60           30             25    22    17
80           35             26    25    19
100          33             30    29    23
120          37             32    30    26
140          37             36    32    27
160          38             40    37    32
180          43             37    36    29
200          45             42    40    32
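The accuracy and false-positive measures reported in Tables 1 and 2 can be computed from confusion-matrix counts, as in the following sketch. The accuracy function follows the paper's definition in terms of true and false positives and negatives. The paper does not state its false-positive formula, so the common definition FP / (FP + TN) is assumed here. The function names and the example counts are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correctly classified records among all classified records."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no classified records")
    return (tp + tn) / total

def false_positive_rate(fp, tn):
    """Assumed definition: FP / (FP + TN); the paper does not give its formula."""
    negatives = fp + tn
    if negatives == 0:
        raise ValueError("no negative records")
    return fp / negatives

# Hypothetical counts, not taken from the paper's experiments.
print(accuracy(tp=40, tn=50, fp=5, fn=5))
print(false_positive_rate(fp=5, tn=45))
```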


classification performance. When the predictor’s output is greater than 0, the Adadelta Stochastic Gradient Descent classification method is used to compute the error of the weak classifier. The final moments and the margin of the patient information are then adjusted. This procedure continues until the patient information has been properly sorted into the appropriate number of classes. Compared with the existing big medical data storage and retrieval method and the Map-Reduce model, classification performance improved by 10% and 25%, respectively.

3.1. False positive rate

Table 2 compares the false positive rate (FPR) against the total number of documents. The table shows the false-positive rate of three main methods: the HADSGC-HHBS method, the big medical data storage and retrieval model, and the Map-Reduce model. When compared to alternative approaches, the proposed HADSGC-HHBS process has a much lower false-positive rate. Fig. 5 depicts an experimental analysis of the false positive rate against the number of input documents. In comparison to existing approaches, the proposed HADSGC-HHBS mechanism classifies patient information with a lower false-positive rate. The Adadelta Stochastic Gradient Descent decreases misclassification by combining the base classifiers in each iteration. If the output of the strong classifier is negative, the patient information has been categorized erroneously. By reducing misclassification, the proposed HADSGC-HHBS mechanism improves classification performance and lowers the false-positive rate. With ten documents as input, the proposed approach has a false positive rate of 20%, whereas the big medical data storage and retrieval model and the Map-Reduce model have false-positive rates of 40% and 31%, respectively, indicating a significant improvement with the HADSGC-HHBS process. Compared with the big medical data storage and retrieval model and the Map-Reduce model, the proposed approach reduces the false positive rate by 42% and 28%, respectively.

Fig. 5 depicts the evaluation of the false positive rate in the different patient record information files. The results of the new HADSGC-HHBS technique were compared to Fuzzy techniques [25], RNN [26] and CNN [27]. The false-positive predictions differed across approaches, as seen in the figure. However, the HADSGC-HHBS technique produced the lowest false positive rate despite the increasing number of data files. Data regression and classification were done using random decision forest ensemble methods. The bivariate correlation was calculated to maintain the link between the dependent and independent variables. Finally, to obtain a reliable disease prediction, a decision tree was developed and a voting mechanism was used. Furthermore, the classification technique calculated the error to decrease the false-positive rate. As a result, compared to the standard fuzzy rule method, the false positive rate was reduced by 51%, 45%, and 39%, respectively.

3.2. Space complexity

Table 3 compares the performance of HADSGC-HHBS and known approaches in terms of space complexity. In the experiment, saved patient information documents in the range of 10–200 documents were examined. The comparison of the HADSGC-HHBS technique with the existing fuzzy rule method is shown in Table 3. Based on the table values, the proposed HADSGC-HHBS technique requires the smallest amount of space for storing multiple patient records.

Accuracy: the ratio of correctly classified records (true positives and true negatives) to the total number of classified records.

$$\text{Accuracy} = \frac{T^{+} + T^{-}}{T^{+} + T^{-} + F^{+} + F^{-}}$$

T+: True Positive; T−: True Negative.
F+: False Positive; F−: False Negative.

Fig. 6 shows how space complexity varies with the number of patient records saved in a document. The above-mentioned result reveals remarkable outcomes for disease prediction assessment.

4. Conclusions

The patient records were initially gathered and organized in a document. A document contains a variety of patient records. The Adadelta Stochastic Gradient Descent is used to sort the patient records into columns, such as patient ID, age, and so on. This improves the accuracy of classification while lowering the rate of false positives. The database for cardiac disorders was acquired from the UCI machine learning repository, and the measures of prediction performance, false-positive rate, parameter settings, and information retrieval time were evaluated experimentally. An efficient mechanism called HADSGC-HHBS was developed for big medical data analytics on the internet. The Bloom hash function is used to save the confidential information in a bit array. The physician can retrieve the information using the hash function. Therefore, the HADSGC-HHBS technique stores patient records efficiently and retrieves them effectively. The HADSGC-HHBS technique outperforms state-of-the-art approaches in classification performance, false-positive rate, space complexity, and information retrieval time. Compared with the existing Fuzzy, RNN and CNN approaches, the proposed HADSGC-HHBS uses a Gramian asymmetric vector to decrease the amount of storage needed to hold patient records. Patient information was gathered and kept in a matrix. A vast amount of data was saved in a matrix format. This helped reduce the storage requirements of big data analysis. In comparison to the existing fuzzy rule method, the space requirement is reduced by 36%, 28%, and 22% using the HADSGC-HHBS method. In future, data will be collected from real-time online medical information sources.

CRediT authorship contribution statement

R. Senthil: Conceptualization, Methodology. B. Narayanan: Validation, Comments evaluated. K. Velmurugan: Training and testing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

[1] P. Saranya, P. Asha, Survey on big data analytics in health care, in: 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), IEEE, 2019, November, pp. 46–51.
[2] K. Abouelmehdi, A. Beni-Hessane, H. Khaloufi, Big healthcare data: preserving security and privacy, Journal of big data 5 (1) (2018) 1–18.
[3] L. Syed, S. Jabeen, S. Manimala, H.A. Elsayed, Data science algorithms and techniques for smart healthcare using IoT and big data analytics, in: Smart Techniques for a Smarter Planet, Springer, Cham, 2019, pp. 211–241.
[4] S.N. Sajedi, M. Maadani, M. Nesari Moghadam, F-LEACH: a fuzzy-based data aggregation scheme for healthcare IoT systems, J. Supercomput. 78 (1) (2022) 1030–1047.
[5] B. Rathore, R. Gupta, A fuzzy-based hybrid decision-making framework to examine the safety risk factors of healthcare workers during the COVID-19 outbreak, J. Decis. Syst. 31 (1–2) (2022) 68–101.
[6] K. Sivakumar, N.S. Nithya, O. Revathy, Phenotype algorithm-based Big Data analytics for cancer diagnosis, J. Med. Syst. 43 (8) (2019) 1–14.
[7] H. Khaloufi, K. Abouelmehdi, A. Beni-hssane, M. Saadi, The security model for big healthcare data lifecycle, Procedia Comput. Sci. 141 (2018) 294–301.
[8] O. Ben-Assuli, T. Heart, N. Shlomo, R. Klempfner, Bringing big data analytics closer to practice: a methodological explanation and demonstration of classification algorithms, Health Policy and Technology 8 (1) (2019) 7–13.


[9] S. Shafqat, S. Kishwer, R.U. Rasool, J. Qadir, T. Amjad, H.F. Ahmad, Big data analytics enhanced healthcare systems: a review, J. Supercomput. 76 (3) (2020) 1754–1799.
[10] Z. Che, S. Purushotham, K. Cho, D. Sontag, Y. Liu, Recurrent neural networks for multivariate time series with missing values, Sci. Rep. 8 (1) (2018) 1–12.
[11] S. Subramaniyan, R. Regan, T. Perumal, K. Venkatachalam, Semi-supervised machine learning algorithm for predicting diabetes using big data analytics, in: Business Intelligence for Enterprise Internet of Things, Springer, Cham, 2020, pp. 139–149.
[12] A. Ismail, S. Abdlerazek, I.M. El-Henawy, Big data analytics in heart disease prediction, J. Theor. Appl. Inf. Technol. 98 (11) (2020) 15–19.
[13] S.K. Pandey, R.R. Janghel, Recent deep learning techniques, challenges and its applications for medical healthcare system: a review, Neural Process. Lett. 50 (2) (2019) 1907–1935.
[14] M. Morrison, G. Lăzăroiu, Cognitive internet of medical things, big healthcare data analytics, and artificial intelligence-based diagnostic algorithms during the COVID-19 pandemic, American Journal of Medical Research 8 (2) (2021) 23–36.
[15] N. Dawar, N. Kehtarnavaz, A convolutional neural network-based sensor fusion system for monitoring transition movements in healthcare applications, in: 2018 IEEE 14th International Conference on Control and Automation (ICCA), IEEE, 2018, June, pp. 482–485.
[16] S. Athmaja, M. Hanumanthappa, V. Kavitha, A survey of machine learning algorithms for big data analytics, in: 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), IEEE, 2017, March, pp. 1–4.
[17] G. Harerimana, B. Jang, J.W. Kim, H.K. Park, Health big data analytics: a technology survey, IEEE Access 6 (2018) 65661–65678.
[18] A. Suresh, R.R. Nair, E.A. Neeba, S.A. Kumar, Recurrent neural network for genome sequencing for personalized cancer treatment in precision healthcare, Neural Process. Lett. (2021) 1–10.
[19] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, J. Gao, Dipole: diagnosis prediction in healthcare via attention-based bidirectional recurrent neural networks, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, August, pp. 1903–1911.
[20] S.A. Syed, K. Sheela Sobana Rani, G.B. Mohammad, K.K. Chennam, R. Jaikumar, Y. Natarajan, V.P. Sundramurthy, Design of resources allocation in 6G cyber twin technology using the fuzzy neuro model in healthcare systems, Journal of Healthcare Engineering 2022 (2022).
[21] S.S. Gill, R. Buyya, Bio-inspired algorithms for big data analytics: a survey, taxonomy, and open challenges, in: Big Data Analytics for Intelligent Healthcare Management, Academic Press, 2019, pp. 1–17.
[22] Y. Sha, M.D. Wang, Interpretable predictions of clinical outcomes with an attention-based recurrent neural network, in: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017, August, pp. 233–240.
[23] B. Ramasamy, A.Z. Hameed, Classification of healthcare data using hybridized fuzzy and convolutional neural networks, Healthcare technology letters 6 (3) (2019) 59–63.
[24] D. Pamucar, A.E. Torkayesh, S. Biswas, Supplier selection in healthcare supply chain management during the COVID-19 pandemic: a novel fuzzy rough decision-making approach, Ann. Oper. Res. (2022) 1–43.
[25] S. Selvarajan, H. Manoharan, T. Hasanin, R. Alsini, M. Uddin, M. Shorfuzzaman, A. Alsufyani, Biomedical signals for healthcare using hadoop infrastructure with artificial intelligence and fuzzy logic interpretation, Appl. Sci. 12 (10) (2022) 5097.
[26] P.S. Reddy, M. Chandrasekar, Distributed file system on medical data using machine learning techniques for healthcare surveillance, in: Proceedings of Third International Conference on Intelligent Computing, Information and Control Systems, Springer, Singapore, 2022, pp. 871–887.
[27] B. Singh, H.K. Verma, Dawn of big data with hadoop and machine learning, Machine Learning and Data Science: Fundamentals and Applications (2022) 47–65.
