
A

Internship Training Report


Entitled

Innovations in Risk Estimation Models,


Automation Solutions, and Chatbot Development
for Healthcare Applications
Submitted to the Department of Electronics Engineering in Partial Fulfilment of the
Requirements for the Degree of

Bachelor of Technology
(Electronics and Communication)

: Presented & Submitted By :

Bhavay Savaliya
Roll No. (U20EC096)
B. TECH. IV(EC), 8th Semester

: Guided By :

Dr. Kishor P. Upla


Associate Professor, DoECE

(Year: 2023-24)

DEPARTMENT OF ELECTRONICS ENGINEERING


SARDAR VALLABHBHAI NATIONAL INSTITUTE OF TECHNOLOGY
Surat-395007, Gujarat, INDIA.
Sardar Vallabhbhai National Institute of Technology
Surat - 395 007, Gujarat, India

DEPARTMENT OF ELECTRONICS ENGINEERING

CERTIFICATE
This is to certify that the Internship Training Report entitled “Innovations in Risk
Estimation Models, Automation Solutions, and Chatbot Development for Health-
care Applications” is presented & submitted by Bhavay Savaliya, bearing Roll No.
U20EC096, of B.Tech. IV, 8th Semester, in partial fulfillment of the requirements
for the award of the B.Tech. Degree in Electronics & Communication Engineering for the
academic year 2023-24.
He has successfully and satisfactorily completed the Internship Training examination in all
respects. We certify that the work is comprehensive, complete, and fit for evaluation.

Dr. Kishor P. Upla


Associate Professor & Guide

Examiners:

Name of Examiners Signature with Date


1.
2.
3.

Head of The Department Seal of The Department


DoECE, SVNIT (May, 2024)
GOOD4HEALTH
Lifecare Technology Pvt. Ltd.
CIN - U72900GJ2022PTC133257
GST - 24AAJCG8683P1ZP

Internship Completion Certificate


April 29, 2024

This is to certify that Bhavay Savaliya has successfully completed an internship as a


Product Development Intern at DQ Care from January 1, 2024 to April 26, 2024.

During the internship period, he demonstrated exceptional skills and dedication in


various projects:

1. Development of Risk Estimator Models: Exhibited proficiency in creating and


implementing various risk estimator models, showcasing a deep understanding of
statistical analysis and risk assessment methodologies.

2. Automation Solution for Medical Data Processing: Spearheaded the development


of an automation solution aimed at extracting medical parameters from patient
reports and efficiently saving them to the database. This initiative showcased his
strong programming skills and ability to solve complex data processing challenges.

3. Chatbot Development for Help and Support: Undertook the task of creating a
chatbot to provide assistance and support by fine-tuning Large Language Models. He
demonstrated creativity and innovation in leveraging cutting-edge technology to
enhance user experience and support services.

Throughout the internship, Bhavay Savaliya exhibited exemplary professionalism,


teamwork, and a willingness to learn. His contributions have significantly added value
to our projects and operations.

We commend Bhavay Savaliya for his outstanding performance and wish him all the
best in his future endeavours.

Dr. Krunal Ramteke


Founder, CEO

Call: +91 6352 967 612, +91 9726 628 880
Office Address: C/O Pooja Dinesh Tandel, Umarsadi, Machhiwad Mandir Street, Pardi Valsad, Valsad, Gujarat, India, 396125
Email: admin@healthdetails.in, krunal.ramteke24@gmail.com

Acknowledgements
I would like to express my profound gratitude and deep regards to my guide Dr. Kishor
P. Upla for his guidance. I am heartily thankful for his suggestions and for the clarity
he brought to the concepts of the topic, which helped me greatly in this work. I would
also like to thank Prof. Dr. Jignesh N. Sarvaiya, Head of the Electronics Engineering
Department, SVNIT, and all the faculty members of DoECE for their cooperation and
suggestions. I am very grateful to all my classmates for their support.

Bhavay Savaliya
Sardar Vallabhbhai National Institute of Technology
Surat

May 2024

Abstract
In the realm of healthcare, efficient utilization of data holds immense potential for im-
proving patient outcomes and streamlining medical processes. This internship report
outlines the development and implementation of several innovative solutions leverag-
ing deep neural networks (DNNs) and advanced natural language processing (NLP)
techniques in the medical domain.

The primary focus of this internship was to harness the power of machine learning
to analyze and interpret medical data. Classification and regression models were con-
structed using deep neural networks, enabling the classification of medical conditions
and the prediction of a patient's likelihood of developing a disease. These models were
trained on comprehensive datasets, providing insights into patterns and trends crucial
for clinical decision-making.

Furthermore, a novel solution was devised to automate the extraction of pertinent in-
formation from patient reports. Integrating optical character recognition (OCR) via
Pytesseract and leveraging the Gemini API, this solution efficiently extracts essential
parameters from textual data, facilitating seamless integration into medical databases.
This streamlined approach enhances data accessibility and accuracy, empowering health-
care professionals with actionable insights derived from patient records.

Additionally, a conversational agent (chatbot) was developed utilizing the Gemini API,
offering an intuitive interface for users to interact with medical data effectively. Through
natural language understanding and generation capabilities, the chatbot provides per-
sonalized assistance, answers queries, and facilitates data retrieval, ultimately enhanc-
ing the user experience and optimizing workflow efficiency.

Overall, this internship report underscores the significance of employing cutting-edge


technologies such as deep learning and NLP in revolutionizing healthcare practices. By
harnessing the potential of machine learning and automation, these solutions contribute
to the advancement of medical research, diagnosis, and patient care, paving the way for
a more data-driven and patient-centric healthcare ecosystem.

Table of Contents
Page
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. xv
List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Chapters
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Internship Insights: Unveiling My Professional Growth . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Industry Demand for Risk Estimation Models . . . . . . . . . . 2
1.2.2 Addressing Manual Processes with Automation Solutions . . . 3
1.2.3 Enhancing Patient Engagement and Support through Chatbot
Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.4 Convergence of Technology and Healthcare . . . . . . . . . . . 3
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Development of Accurate and Robust Risk Estimator Models . 3
1.3.2 Implementation of an Automated Solution for Medical Param-
eter Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.3 Development of a Chatbot for Patient Help and Support . . . . 4
1.3.4 Evaluation, Validation and Deployment of Developed Solutions 4
1.4 Organization of the Project Report . . . . . . . . . . . . . . . . . . . . 5
1.4.1 Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . 5
1.4.2 Chapter 2: Literature Survey . . . . . . . . . . . . . . . . . . . 5
1.4.3 Chapter 3: Proposed Work . . . . . . . . . . . . . . . . . . . . 5
1.4.4 Chapter 4: Experimental Work . . . . . . . . . . . . . . . . . . 5
1.4.5 Chapter 5: Conclusion and Future Scope . . . . . . . . . . . . 6
2 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Linear Regression [1] . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Logistic Regression [2] . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Regression Using Deep Neural Networks . . . . . . . . . . . . . . . . 10
2.4 Classification Using Deep Neural Networks . . . . . . . . . . . . . . . 11
2.5 Normalization (Why Normalize Data?) . . . . . . . . . . . . . . . . . . 12
2.6 Optical Character Recognition (OCR) . . . . . . . . . . . . . . . . . . 14
2.6.1 Basic: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.2 Challenges: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.3 Applications: . . . . . . . . . . . . . . . . . . . . . . . . . . . 15


2.6.4 Future Directions: . . . . . . . . . . . . . . . . . . . . . . . . 15


2.7 Natural Language Processing [3] . . . . . . . . . . . . . . . . . . . . . 16
3 Proposed Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.1 Creation of various estimation and classification models . . . . 19
3.2.2 Creation of a solution for Data Extraction . . . . . . . . . . . . 23
3.2.3 Creation of a chatbot for Help and Support . . . . . . . . . . . 25
4 Experimental Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1 Demo for Data Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Demo for AI Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5 Conclusion & Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

List of Figures

1.1 My learning during internship . . . . . . . . . . . . . . . . . . . . . . 1

3.1 Architecture of Cardiovascular risk estimation model . . . . . . . . . . 19


3.2 Flowchart of Estimation of Cardiovascular risk . . . . . . . . . . . . . 20
3.3 Architecture of Retinopathy risk estimation model . . . . . . . . . . . . 21
3.4 Overview of working of data extraction . . . . . . . . . . . . . . . . . 22
3.5 Overview of preprocessing report . . . . . . . . . . . . . . . . . . . . . 23
3.6 Flowchart of Extracting data from array of images . . . . . . . . . . . . 24

4.1 Patient’s report in pdf format . . . . . . . . . . . . . . . . . . . . . . . 28


4.2 Uploading Patient’s report . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3 Extracted data from patient’s report . . . . . . . . . . . . . . . . . . . 30
4.4 Examples of question answering by the AI chatbot . . . . . . . . . . . 31

List of Abbreviations
API Application Programming Interface
CNN Convolutional Neural Network
LLM Large Language Model
ML Machine Learning
NLP Natural Language Processing
OCR Optical Character Recognition
RNN Recurrent Neural Network

Chapter 1
Introduction
As I look back on my internship at DQCare, I feel deeply grateful and excited to share
what I have learned. DQCare is a company that uses advanced technology, such as AI
and image processing, to make healthcare better.

During my time there, I was part of several exciting projects that are changing the
way healthcare works. I learned a lot and grew considerably, both as a professional and
as a person.

Figure 1.1: My learning during internship

1.1 Internship Insights: Unveiling My Professional Growth


In this report, I'll talk about the big moments, the tough challenges, and the things I
learned while working at DQCare. From working with experts to helping with projects
that make a real difference for the users, every experience taught me something important.
Using Fig. 1.1, I will summarize my learning during the internship:
1. First, I completed a task that required entering values from a given CSV file into
a specific website and saving the results back to the CSV file. Since the file
contained 9,000 entries, I learned Selenium to automate this task (a minimal
sketch appears after this list).

2. After generating and filtering the dataset, risk estimator models were created using
this data.

3. Because the model was a deep neural network, deploying it in Python would have
produced a Docker image larger than 500 MB, since the TensorFlow library alone
is around 450 MB. So I learned the Candle framework, a Rust ML framework,
and recreated the model with it; the final Docker image was only 22 MB.

4. Then I researched OCR, Table Transformers, and other tools that can help extract
data from PDFs and images.

5. I realized that the Gemini API and prompt engineering could automate data
extraction, so I learned how to integrate with the Gemini API.

6. My last task was to build an AI chatbot for Help and Support for the clients,
so I learned to fine-tune open-source Large Language Models using the hugging-
face/transformers library. As the dataset provided by DQCare was not sufficient,
I instead supplied the data in Gemini's context field, set the chatbot's rules in the
system prompt, and answered user queries through the user prompt.

7. Finally, I learned to containerize the back-end as a Docker image and deployed
the solutions on Render.

1.2 Motivation
1.2.1 Industry Demand for Risk Estimation Models
The healthcare industry is increasingly recognizing the importance of predictive an-
alytics and risk estimation models in enhancing patient care and optimizing resource
allocation. With the growing volume of patient data being generated daily, there is a
pressing need for accurate risk assessment tools that can assist healthcare providers in
identifying high-risk patients and implementing proactive intervention strategies. The
creation of various risk estimator models during the internship period stemmed from
this industry demand to improve patient outcomes and operational efficiency within
healthcare organizations.


1.2.2 Addressing Manual Processes with Automation Solutions


One of the prevalent challenges in healthcare institutions is the reliance on manual
processes for data entry and extraction, leading to inefficiencies and potential errors.
Recognizing this need for process optimization, the development of an automation so-
lution aimed to streamline the extraction of medical parameters from patient reports
and facilitate seamless integration into the database. By automating this labor-intensive
task, the solution not only improves data accuracy and timeliness but also frees up valu-
able human resources to focus on more value-added activities, such as patient care and
analysis of extracted information.

1.2.3 Enhancing Patient Engagement and Support through Chat-


bot Technology
In today’s digital era, patients expect convenient access to healthcare information and
support services round the clock. Leveraging advancements in natural language pro-
cessing (NLP) and chatbot technology, the creation of a chatbot for help and support
was motivated by the need to enhance patient engagement and satisfaction. By fine-tuning
Large Language Models such as GPT-2, or by creating a chatbot using the Gemini API, the
chatbot can provide personalized assistance, answer frequently asked questions, and offer
guidance on medical queries, thereby empowering patients to make informed healthcare
decisions and improving their overall experience with the healthcare provider.

1.2.4 Convergence of Technology and Healthcare


The overarching motivation behind the internship project lies in the convergence of
technology and healthcare to drive innovation and address longstanding challenges in
the industry. By harnessing the power of data analytics, automation, and artificial in-
telligence, the internship aimed to contribute towards the transformation of healthcare
delivery, making it more efficient, accessible, and patient-centric. Through the devel-
opment of risk estimator models, automation solutions, and chatbot technology, the
project sought to demonstrate the potential of technology to revolutionize healthcare
practices and improve outcomes for both patients and healthcare providers.

1.3 Objectives
1.3.1 Development of Accurate and Robust Risk Estimator Models
The primary objective of the internship project was to develop a series of risk estimator
models capable of accurately predicting various healthcare outcomes, such as disease


progression, treatment response, and patient prognosis. These models aimed to leverage
machine learning algorithms and statistical techniques to analyze patient data, including
demographic information, medical history, and clinical parameters, and generate risk
scores or probabilities. The goal was to create models that not only demonstrate high
predictive performance but also exhibit robustness and generalizability across diverse
patient populations and healthcare settings.

1.3.2 Implementation of an Automated Solution for Medical Pa-


rameter Extraction
Another key objective of the internship project was to design and implement an au-
tomation solution to streamline the extraction of medical parameters from patient re-
ports. The solution aimed to leverage natural language processing (NLP) techniques
and pattern recognition algorithms to automatically parse unstructured text data and
extract relevant medical information, such as vital signs, laboratory results, and diag-
nostic findings. By automating this process, the objective was to reduce manual effort
and error rates associated with manual data entry, while also improving the speed and
efficiency of data capture for subsequent analysis and decision-making.

1.3.3 Development of a Chatbot for Patient Help and Support


The internship project also sought to develop a chatbot application to provide assis-
tance and support to patients seeking healthcare information and guidance. Using state-
of-the-art language models such as GPT-2, the objective was to finetune the model on
healthcare-related datasets and train it to understand and respond to user queries in
a conversational manner. The chatbot aimed to offer personalized recommendations,
answer frequently asked questions, and provide relevant information on medical condi-
tions, treatments, and healthcare services. Additionally, the objective was to integrate
the chatbot with existing communication channels, such as websites and mobile appli-
cations, to enhance patient engagement and satisfaction.

1.3.4 Evaluation, Validation and Deployment of Developed Solu-


tions
A critical component of the internship project was the evaluation, validation and deploy-
ment of the developed solutions to assess their performance, accuracy, and usability in
real-world healthcare settings. This objective involved conducting rigorous testing and
validation procedures using representative datasets and scenarios to ensure the relia-
bility and effectiveness of the risk estimator models, automation solution, and chatbot


application.

1.4 Organization of the Project Report

1.4.1 Chapter 1: Introduction


The introduction chapter provides an overview of the internship project, including its
motivation, objectives, and organization. It sets the context for the report by discussing
the driving factors behind the project, the specific goals and outcomes to be achieved,
and the structure of the report itself.

1.4.2 Chapter 2: Literature Survey


This chapter presents a comprehensive review of relevant literature and background
information related to the internship project. It explores existing research, methodolo-
gies, and best practices in areas such as risk estimation modeling, automation solutions
in healthcare, and chatbot technology. By synthesizing prior knowledge and insights,
this chapter establishes a theoretical foundation for the internship project and identifies
gaps or opportunities for further exploration.

1.4.3 Chapter 3: Proposed Work


The methodology chapter outlines the research approach, data collection methods, and
analytical techniques employed during the internship project. It provides a detailed
description of the steps taken to develop the risk estimator models, design and im-
plement the automation solution, and train the chatbot application. Additionally, this
chapter discusses any ethical considerations, limitations, and assumptions underlying
the methodology used in the project.

1.4.4 Chapter 4: Experimental Work


In this chapter, the focus is on the practical implementation of the developed solutions
and the presentation of results obtained from testing and validation efforts. It showcases
the functionality and performance of the risk estimator models, automation solution,
and chatbot application through case studies, experiments, and performance metrics.
The chapter also discusses any challenges encountered during implementation and how
they were addressed to achieve the desired outcomes.


1.4.5 Chapter 5: Conclusion and Future Scope


The conclusion chapter summarizes the key findings, contributions, and insights gained
from the internship project. It restates the objectives outlined in the introduction and
discusses the extent to which they were achieved. Furthermore, the chapter suggests
potential avenues for future research and development, building on the work completed
during the internship period, and outlines recommendations for practitioners and poli-
cymakers interested in implementing similar solutions in healthcare settings.

Chapter 2
Literature Survey
Deep learning, a subset of artificial intelligence (AI), has emerged as a transformative
technology in healthcare, revolutionizing various aspects of medical practice and re-
search. Deep learning has enormous potential to improve patient care, diagnosis, and
treatment because of its capacity to automatically discover complex patterns and rep-
resentations from enormous volumes of data. This literature review gives an overview
of the uses, difficulties, and potential directions of deep learning in healthcare.

2.1 Linear Regression [1]


By fitting a linear equation to observed data, linear regression is a basic statistical ap-
proach that models the connection between a dependent variable and one or more inde-
pendent variables. Below is an explanation of the main ideas and theories underlying
linear regression:

1. Basic Idea: The goal of linear regression is to determine which straight line fits a
given set of data points the best. The relationship between the independent vari-
ables (commonly indicated as X) and the dependent variable (typically indicated
as y) is represented by this line.

2. Equation: The equation for a linear regression model with independent variables
is:
y = β0 + βX + ϵ (2.1)

• y is the dependent variable.


• X is the array of independent variables.
• β0 is the intercept (the value of y when X is zero).
• β is the slope of the line (the change in y for a one-unit change in X).
• ϵ is the error term, representing the difference between the observed y and
the predicted y.

3. Evaluation: Metrics like Root Mean Squared Error (RMSE) and R-squared
(the percentage of the dependent variable’s variance that can be predicted from
the independent variable or variables) are utilised in evaluating linear regression
models.
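As a concrete illustration of the ideas above, here is a minimal sketch of fitting and evaluating a linear regression model with scikit-learn; the data is synthetic and purely illustrative.

```python
# Minimal linear-regression sketch with scikit-learn; synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # independent variables
beta = np.array([1.5, -0.7, 0.3])                   # true slopes
y = 2.0 + X @ beta + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)                # estimates beta_0 and beta
y_pred = model.predict(X)

print("RMSE:", mean_squared_error(y, y_pred) ** 0.5)
print("R-squared:", r2_score(y, y_pred))
```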


2.2 Logistic Regression [2]


A statistical method called logistic regression is employed to estimate the probability
of a binary result, or an outcome that might have one of two possible values, usually
denoted by the numbers 0 and 1. This kind of regression analysis uses categorical
dependent variables.

1. Basic Idea: Using a logistic (or sigmoid) function to represent the probability of a
binary result is the fundamental notion underpinning logistic regression. Logistic
regression is specifically developed for binary classification problems, where the
result variable can take only two possible values (usually coded as 0 and 1). This
is in contrast to linear regression, which is used to predict continuous outcomes.

2. Equation: The equation for a logistic regression model with independent variables
is:

p(y = 1|x) = 1 / (1 + e^(−(β0 + βX))) (2.2)
• p(y = 1|x) is the probability of the outcome being 1 (belonging to the
positive class) given the predictor array X.
• X is the array of independent variables.
• β0 is the constant coefficient (parameter) of the model.
• β is an array of coefficients (parameters) of the model.

3. Evaluation: Logistic regression is evaluated using the metrics given below.

(a) Confusion Matrix: A table that summarizes a classification model's performance is
called a confusion matrix. It displays the counts of predictions that are true
positives (TP), true negatives (TN), false positives (FP), and false negatives
(FN). Several evaluation metrics can be obtained from the confusion matrix.
(b) Accuracy: Accuracy measures the proportion of correctly classified obser-
vations out of the total number of observations. It is calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (2.3)

(c) Precision: Out of all the positive predictions the model makes, precision
indicates the percentage of exact positive predictions. It is calculated as:
Precision = TP / (TP + FP) (2.4)


(d) Recall: The ratio of the number of true positive predictions among all actual
positive data cases is known as recall. It is calculated as:
Recall = TP / (TP + FN) (2.5)

(e) F1 Score: The harmonic mean of recall and precision is the F1 score. It is
calculated as follows and offers a combination of recall and precision:
F1 Score = (2 × Precision × Recall) / (Precision + Recall) (2.6)

(f) Receiver Operating Characteristic (ROC) Curve: Plotting the true pos-
itive rate (TPR) or recall versus the false positive rate (FPR) at different
threshold values is known as the ROC curve. It allows the representation of
the sensitivity vs. specificity trade-off.
(g) Area Under the ROC Curve (AUC-ROC): The model’s overall perfor-
mance is summarised by AUC-ROC for all threshold values. It shows the
likelihood that a randomly selected positive instance would be ranked higher
by the model than a randomly selected negative case.
(h) Precision-Recall Curve: The precision-recall curve plots precision against
recall at different threshold levels, much like the ROC curve. It is especially
useful when working with unbalanced datasets.
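The metrics above can be computed directly with scikit-learn; a minimal sketch on synthetic data follows.

```python
# Minimal logistic-regression evaluation sketch; synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
y_pred = clf.predict(X)                    # hard labels at a 0.5 threshold
y_prob = clf.predict_proba(X)[:, 1]        # probabilities for AUC-ROC

print(confusion_matrix(y, y_pred))         # [[TN, FP], [FN, TP]]
print("Accuracy:", accuracy_score(y, y_pred))
print("Precision:", precision_score(y, y_pred))
print("Recall:", recall_score(y, y_pred))
print("F1:", f1_score(y, y_pred))
print("AUC-ROC:", roc_auc_score(y, y_prob))
```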


2.3 Regression Using Deep Neural Networks


It involves employing neural network architectures to predict continuous outcomes and
is well suited for tasks that have non-linear dependencies (a minimal model sketch
appears at the end of this section).

1. Architecture:

(a) Input, hidden, and output layers are among the several layers of intercon-
nected neurons that make up DNNs.
(b) In regression tasks, the output layer typically consists of a single neuron,
and the network predicts continuous values.

2. Loss Function:

(a) For regression assignments, the mean squared error, or MSE, is frequently
utilised as the loss function.
(b) Reducing the discrepancy between the target values and the projected values
is the aim.

3. Activation Functions:

(a) By adding non-linearity to the model, activation functions enable DNNs to


approximate complicated functions.
(b) Rectified Linear Unit (ReLU), hyperbolic tangent (tanh), and sigmoid are
typical activation functions for hidden layers.

4. Optimization:

(a) During training, network parameters are often updated using Gradient Descent-
based optimisation algorithms like Adam or RMSprop.
(b) Mini-batch gradient descent is often employed to train DNNs efficiently on
large datasets.

5. Regularization:

(a) Techniques like L2 regularization (weight decay) or dropout are used to


prevent overfitting in DNNs.
(b) Regularization helps generalize the model to unseen data and improve its
performance.
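Tying the points above together, here is a minimal TensorFlow/Keras sketch of a DNN regressor with ReLU hidden layers, a single output neuron, MSE loss, dropout regularization, and the Adam optimizer; the layer sizes and feature count are illustrative assumptions.

```python
# Minimal DNN-regression sketch in TensorFlow/Keras; sizes are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),            # 10 input features
    tf.keras.layers.Dense(64, activation="relu"),  # non-linear hidden layer
    tf.keras.layers.Dropout(0.2),                  # regularization
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                      # single neuron -> continuous output
])
model.compile(optimizer="adam", loss="mse")        # MSE loss, Adam optimizer
# model.fit(X_train, y_train, batch_size=32, epochs=100)  # mini-batch training
```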


2.4 Classification Using Deep Neural Networks


It involves using neural network architectures to predict the categorical labels of input
data and is well suited for tasks having a non-linear relationship between inputs and
outputs (a minimal sketch appears at the end of this section).

1. Architecture:

(a) Similar to regression, DNNs for classification tasks consist of input, hidden,
and output layers.
(b) The output layer in classification tasks usually comprises multiple neurons,
each of which represents a class label. Softmax activation is used to produce
class probabilities.

2. Loss Function:

(a) The loss function for classification tasks is often Cross-Entropy Loss, also
known as Log Loss.
(b) Reducing the discrepancy between the true class labels and the predicted
class probabilities is the aim.

3. Activation Functions:

(a) The output layer of binary classification tasks frequently uses the sigmoid
activation function to generate probabilities between 0 and 1.
(b) The softmax activation function is utilised to generate probabilities over sev-
eral classes in multi-class classification problems.

4. Optimization:

(a) Gradient Descent-based optimization algorithms are also used for training
classification DNNs.
(b) To further modify the learning rate during training, strategies such as learn-
ing rate scheduling might be used.

5. Evaluation Metrics:

(a) For classification tasks, accuracy, precision, recall, F1 score, and area under
the ROC curve (AUC-ROC) are common evaluation metrics.
(b) These metrics provide insights into the model’s performance in correctly
predicting class labels.
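A companion sketch to the regression model of the previous section: the same structure, but with a softmax output layer and cross-entropy loss; the feature and class counts are assumptions.

```python
# Minimal DNN-classification sketch in TensorFlow/Keras.
import tensorflow as tf

num_classes = 3
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # cross-entropy (log loss)
              metrics=["accuracy"])
```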


2.5 Normalization (Why Normalize Data?)


1. Improves Model Convergence:

• Normalizing data helps algorithms converge faster during training by bring-


ing features onto a similar scale.
• When features are normalised, big weight updates for features with higher
magnitudes are avoided, which speeds up the convergence of algorithms
such as gradient descent.

2. Facilitates Interpretability:

• Normalization makes it easier to interpret the importance of different fea-


tures in the model.
• Without normalization, features with larger scales might dominate the model’s
learning process, potentially masking the importance of other features.

Common Normalization Techniques:

1. Min-Max Scaling:

• Min-Max scaling, sometimes referred to as normalisation, rescales features
to a predetermined range, usually between 0 and 1.
• The formula for Min-Max scaling is:

x_scaled = (x − min(x)) / (max(x) − min(x))

where x is the original feature value, min(x) is the minimum value of the
feature, and max(x) is the maximum value of the feature.

2. Standardization (Z-score normalization):

• Features that undergo standardisation have a mean of 0 and a standard devi-


ation of 1.
• The formula for standardization is:
x_standardized = (x − mean(x)) / std(x)

where x is the original feature value, mean(x) is the mean of the feature,
and std(x) is the standard deviation of the feature.
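Both techniques are available as ready-made scalers in scikit-learn; a minimal sketch on a toy feature matrix:

```python
# Min-Max scaling vs. standardization with scikit-learn; toy data.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

print(MinMaxScaler().fit_transform(X))    # each column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))  # each column to mean 0, std 1
```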


When to Use Each Technique:

• Min-Max Scaling:

– Suitable for algorithms that require features to be in a bounded range, such


as neural networks with input values between 0 and 1 or algorithms like
K-Nearest Neighbors.
– May be sensitive to outliers.

• Standardization:

– Suitable for algorithms that assume data is normally distributed, such as


linear regression or logistic regression.
– Less sensitive to outliers compared to Min-Max scaling.

Considerations:

1. Feature Scaling vs. Normalization:

• Feature scaling refers to the process of transforming numerical features to a


common scale, which includes both normalization and standardization.
• Normalization specifically refers to rescaling features to a fixed range (e.g.,
between 0 and 1) using techniques like Min-Max scaling.

2. Impact on Interpretability:

• Normalization changes the scale of the features but does not affect their
distribution or relationship with the target variable.
• Interpretability of the model remains intact after normalization, as the rela-
tive importance of features is preserved.

Normalization is an essential preprocessing step in machine learning workflows,


especially when dealing with numerical features of varying scales. By bringing fea-
tures onto a common scale, normalization helps improve model performance and inter-
pretability. The choice of normalization technique depends on the characteristics of the
data and the requirements of the machine learning algorithm.


2.6 Optical Character Recognition (OCR)


Through the use of optical character recognition (OCR) technology, text that has been
printed, typewritten, or handwritten can be converted from scanned images into machine-
readable text. Here’s some theory on OCR:

2.6.1 Basic:
1. Image Acquisition: OCR begins with the acquisition of an image containing text.
This image can be obtained from various sources, such as scanned documents,
photographs, or screenshots.

2. Preprocessing: Preprocessing techniques are applied to the image to enhance the


quality of the text and improve OCR accuracy. This may include operations such
as noise reduction, binarization, deskewing, and normalization.

3. Text Localization: Text localization techniques identify the regions in the image
that contain text. This step is crucial for isolating the text from the background
and other non-text elements.

4. Text Segmentation: Text segmentation divides the localized text regions into
individual characters or words. This step is necessary for recognizing and inter-
preting each character or word separately.

5. Feature Extraction: Feature extraction involves extracting relevant features from


each segmented character or word. These features may include shape, size, tex-
ture, and spatial relationships with neighboring characters.

6. Classification: Classification algorithms, such as neural networks or support vec-


tor machines, are trained on the extracted features to recognize the characters or
words. These algorithms assign a label (corresponding to a specific character or
word) to each input image.

7. Postprocessing: These techniques are applied to improve the accuracy of the rec-
ognized text. This may include spell checking, language modeling, and context-
based correction.

8. Text Output: The recognized text is outputted in a machine-readable format,


such as plain text or a structured document format (e.g., PDF). This text can then
be further processed, analyzed, or stored for downstream applications.
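Off-the-shelf OCR engines bundle most of the pipeline above; a minimal sketch using Pillow and pytesseract (the engine used later in this report) is shown below. The file name is a placeholder, and the Tesseract binary must be installed on the system.

```python
# Minimal OCR sketch with Pillow + pytesseract; file name is a placeholder.
from PIL import Image
import pytesseract

image = Image.open("report_page.png").convert("L")  # acquisition + gray-scale
text = pytesseract.image_to_string(image)           # recognized text
data = pytesseract.image_to_data(image)             # words with coordinates

print(text)
```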


2.6.2 Challenges:
1. Variability in Fonts and Styles: OCR systems must handle a wide range of
fonts, styles, sizes, and orientations of text present in images.

2. Noise and Distortions: Images may contain noise, distortions, blurring, or degra-
dation due to scanning artifacts, poor lighting conditions, or low-resolution cam-
eras.

3. Handwriting Recognition: Recognizing handwritten text is more challenging


than printed text due to variability in handwriting styles and individual prefer-
ences.

4. Multi-lingual and Multi-script Text: OCR systems must support multiple lan-
guages and scripts, each with its own unique character sets and writing conven-
tions.

2.6.3 Applications:
1. Document Digitization: OCR is widely used for converting printed documents,
books, and manuscripts into editable and searchable digital formats.

2. Data Entry and Form Processing: OCR automates data entry tasks by extract-
ing information from documents, invoices, forms, and receipts.

3. Automated Number Plate Recognition (ANPR): OCR is used in ANPR sys-


tems for recognizing vehicle registration plates in traffic surveillance, parking
management, and toll collection.

4. Mobile Scanning Apps: OCR is integrated into mobile scanning apps for cap-
turing text from images of business cards, whiteboards, and documents.

5. Accessibility Tools: OCR-based assistive technologies enable visually impaired


individuals to access printed materials and navigate digital content.

2.6.4 Future Directions:


1. Deep Learning: Convolutional neural networks (CNNs), one type of deep learn-
ing approach, have demonstrated encouraging results in increasing OCR accu-
racy, particularly for complex and noisy images.

2. End-to-End OCR Systems: End-to-end OCR systems that directly map input
images to output text without intermediate processing steps are being developed,
offering potential improvements in efficiency and accuracy.


3. Multimodal OCR: Integrating OCR with other modalities such as natural lan-
guage processing (NLP) and computer vision enables more sophisticated docu-
ment understanding and semantic analysis.

4. Real-time OCR: Real-time OCR systems capable of processing text in streaming


video feeds or live camera input are being developed for applications such as
augmented reality and smart glasses.

2.7 Natural Language Processing [3]


Natural language processing (NLP) is a branch of artificial intelligence (AI) that studies
how people and computers communicate using natural language. NLP aims to give
computers the ability to comprehend, interpret, and produce meaningful, contextually
relevant human language. An outline of the concepts and methods used in NLP is
provided below:

1. Tokenization: Tokenization involves breaking down a piece of text into smaller


units, typically words or sentences. This is the first step in many NLP tasks.

2. Part-of-Speech (POS) Tagging: POS tagging involves assigning a grammatical


category (such as noun, verb, adjective, etc.) to each word in a sentence. This
helps in understanding the syntactic structure of the text.

3. Named Entity Recognition (NER): NER involves identifying and classifying


named entities in text into predefined categories such as names of persons, orga-
nizations, locations, dates, etc.

4. Parsing: Parsing involves analyzing the grammatical structure of sentences to


determine their syntactic relationships. This is often done using techniques like
constituency parsing or dependency parsing.

5. Semantic Analysis: Understanding a text's meaning beyond its syntactic structure
is the goal of semantic analysis. Semantic role labelling is one task that falls
under this category; it involves identifying the roles of various entities and events
within a sentence.

6. Machine Translation: Machine translation is the process of automatically trans-


lating text between languages. Techniques include neural machine translation
models and statistical techniques.

7. Text Generation: Text generation involves producing natural language text au-
tomatically. This can include tasks like language modeling, text summarization,
and dialogue generation.


8. Question Answering: The goal of question answering systems is to provide au-


tomated, natural language responses to inquiries. This entails comprehending the
query, looking up pertinent data, and coming up with a suitable response.

9. Word Embeddings: Dense vector representations of words that convey semantic


meaning are called word embeddings. To create word embeddings, methods like
Word2Vec, GloVe, and FastText are frequently employed.

10. Sequence-to-Sequence Models: In natural language processing (NLP), sequence-


to-sequence models—especially those based on transformers or recurrent neural
networks (RNNs)—are extensively employed for tasks including dialogue pro-
duction, machine translation, and summarization.

11. Attention Mechanisms: Attention mechanisms allow models to focus on differ-


ent parts of the input sequence when making predictions. They have become a
fundamental component of many state-of-the-art NLP models, such as transform-
ers.

These are just some of the key theories and techniques in NLP, and the field is
constantly evolving with new advances in deep learning, reinforcement learning, and
other areas of AI.
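As a small concrete example of the first step above, the sketch below tokenizes a sentence with the huggingface/transformers library (used later in this report for fine-tuning); the model name is simply a common public checkpoint.

```python
# Minimal tokenization sketch with huggingface/transformers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
sentence = "The patient's blood pressure was 120/80."

print(tokenizer.tokenize(sentence))   # subword tokens
print(tokenizer.encode(sentence))     # token IDs a model would consume
```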

Chapter 3
Proposed Work

3.1 Overview
The initial phase of the internship involved studying medical parameters and develop-
ing predictive models for cardiovascular risk estimation, retinopathy risk estimation, and
diabetes prediction using deep learning techniques with TensorFlow. Subsequently, a
solution was created to streamline the extraction of essential patient details from reports
in image or PDF format, followed by the development of a chatbot using data provided
by DQCare.

3.2 Methodology
3.2.1 Creation of various estimation and classification models

Figure 3.1: Architecture of Cardiovascular risk estimation model

1. Cardiovascular Risk Estimation Model: This model estimates a patient's chances
of cardiovascular risk within 10 years. The architecture of the model is shown
in Fig. 3.1. This model was created with the following steps:

• Dataset Generation: The dataset was generated using a website provided by
researchers [4], who created a web page where they estimate risk. Consid-
ering their values as ground-truth values, we generated target values using
Selenium.
• Filtering Dataset: Entries containing null or otherwise invalid values were
removed from the dataset.
• Normalization: The dataset was normalized to prevent the vanishing and
exploding gradient problems and to train the model efficiently.
• Training of model: The model was trained for 500 epochs with the Adam
optimizer.
• Evaluation of model: The model achieved a Mean Squared Error of 2.7.
• Model with different frameworks: The first model was created using Tensor-
Flow, but its Docker image at deployment was approximately 500 MB in
size. So, the huggingface/candle framework was used to recreate the model,
and its Docker image was only 22 MB.
• At the application level, data from the database is used for cardiovascular
risk estimation. As shown in Fig. 3.2, the dataset is first filtered, then all
values are converted to float64, and the data is preprocessed using the
standardization method. The preprocessed data is then fed to the model to
produce the estimate (a minimal sketch of this pipeline follows).
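A hypothetical sketch of this application-level pipeline (Fig. 3.2) is given below; the function name, model path, and scaler handling are assumptions for illustration, not the production code.

```python
# Hypothetical sketch of the Fig. 3.2 pipeline; names and paths are placeholders.
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import StandardScaler

def estimate_cvd_risk(df: pd.DataFrame, model_path: str = "cvd_model.keras"):
    df = df.dropna()                           # filter invalid entries
    X = df.astype("float64").to_numpy()        # convert all values to float64
    X = StandardScaler().fit_transform(X)      # standardization (in practice the
                                               # scaler fitted at training is reused)
    model = tf.keras.models.load_model(model_path)
    return model.predict(X)                    # 10-year risk estimates
```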

Figure 3.2: Flowchart of Estimation of Cardiovascular risk


Figure 3.3: Architecture of Retinopathy risk estimation model

2. Retinopathy Risk Estimation Model: This model estimates a patient's chances
of retinopathy risk within 10 years. The architecture of the model is shown in
Fig. 3.3. This model was created with the following steps:

• Dataset Generation: The dataset was generated using a website provided by
researchers [5], who created a web page where they estimate risk. Consid-
ering their values as ground-truth values, we generated target values using
Selenium.
• Filtering Dataset: Entries containing null or otherwise invalid values were
removed from the dataset.
• Normalization: The dataset was normalized using sklearn's StandardScaler.
• Training of model: The model was trained for 200 epochs with the Adam
optimizer using the TensorFlow framework.
• Evaluation of model: The model achieved a Mean Squared Error of 2.4.

3. Diabetes Prediction Model: This model predicts whether a patient is diabetic or
not. It was created with the following steps:

• Dataset Generation: The dataset was taken from Kaggle [6].
• Filtering Dataset: Entries containing null or otherwise invalid values were
removed from the dataset.
• Normalization: The dataset was normalized using sklearn's StandardScaler.
• Training of model: First, a logistic regression model was created using the
sklearn library. Then a deep neural network model was created using
TensorFlow with Adam as the optimizer.
• Evaluation of model: The logistic regression model achieved an accuracy of
0.724; the deep neural network model achieved an accuracy of 0.732.

Figure 3.4: Overview of working of data extraction


3.2.2 Creation of a solution for Data Extraction


• Overview: A solution was created that takes a patient's report as input, extracts
the important medical parameters, and saves the data to the database.
As shown in Fig. 3.4, the report from the front-end is first sent to the Python
back-end, where it is converted to gray-scale; the important data is then extracted
and saved into the database. This process is broken into 4 steps:

1. Check if report exists: As soon as the report reaches the back-end, we
check whether it is already present using its SHA-256 key. If the report
already exists, the user is notified that the file already exists; otherwise, a
SHA-256 key is generated for it.
2. Preprocessing report: As shown in Fig. 3.5:

Figure 3.5: Overview of preprocessing report

(a) After the SHA-256 key is generated, the report is renamed using the
username and a timestamp.
(b) If the report is an image, an array containing a single PIL image is
created. If the report is in PDF format, an array of PIL images is
created from that PDF.
(c) All images are then converted to gray-scale.


3. Extract Data: As shown in Fig. 3.6:

(a) Each image is passed through pytesseract OCR to get the words and
their coordinates.
(b) The words detected by OCR for each image are passed to Gemini
via an API call to extract the important parameters in JSON format.
(c) The JSON responses of all images are merged into a single final JSON
response.

Figure 3.6: Flowchart of Extracting data from array of images

4. Check if any data extracted: The final JSON response is checked for
emptiness. If it is empty, no data was extracted from the report and the
user is notified. Otherwise, the data is saved to the database; the report is
also saved to cloud storage, and the report's original file name and generated
SHA-256 key are saved to the database. (A condensed sketch of this pipeline
is given below.)
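A condensed, hypothetical sketch of steps 1-4 follows; the prompt text, model name, and helper structure are assumptions rather than the production code (it requires pdf2image, Pillow, pytesseract, and google-generativeai, and genai.configure(api_key=...) must be called beforehand).

```python
# Condensed, hypothetical sketch of the extraction pipeline (Figs. 3.4-3.6).
import hashlib
import pytesseract
import google.generativeai as genai
from PIL import Image
from pdf2image import convert_from_path

def extract_report(path: str) -> list[str]:
    sha256 = hashlib.sha256(open(path, "rb").read()).hexdigest()
    # ...return early here if sha256 is already present in the database...

    if path.lower().endswith(".pdf"):
        pages = convert_from_path(path)          # array of PIL images from PDF
    else:
        pages = [Image.open(path)]               # single-element array
    pages = [p.convert("L") for p in pages]      # gray-scale

    model = genai.GenerativeModel("gemini-pro")  # assumed model name
    responses = []
    for page in pages:
        words = pytesseract.image_to_data(page)  # words + coordinates
        prompt = ("Extract the important medical parameters from this OCR "
                  "output as JSON:\n" + words)
        responses.append(model.generate_content(prompt).text)
    return responses                             # merged and checked by the caller
```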


3.2.3 Creation of a chatbot for Help and Support


• Overview: A chatbot was created that helps users clarify their queries. Two
methods were applied to create the chatbot:

1. Finetuning with GPT2 [7]:

(a) A dataset in the form of question-answer pairs was provided by DQCare,
and a final dataset was created in the appropriate format.
(b) The model was then trained for 20 epochs on Google Colab using the
transformers library.
(c) Due to the limited data, the model's performance was very poor, so we
decided to pursue another approach.
2. Chatbot using Gemini API: The dataset provided by DQCare, containing
only 105 questions, was insufficient for fine-tuning a text-generation and
question-answering model. Instead, we provided all the questions and answers
in the context field of Google Gemini Pro and set the rules using the system
prompt; the user then interacts with the chatbot through the user prompt
(a minimal sketch follows the lists below).

Advantages:
(a) Usually a better choice than fine-tuning other Large Language Models
when only a small dataset is available.
(b) The Google Gemini model is trained on a vast amount of data, so if the
user asks a valid question beyond the provided data, the model can still
generate a response from its broader knowledge.
(c) Maintenance is easier, as no manual model deployment is needed.
Disadvantages:
(a) There may be security issues, as the context data is exposed to Google
Gemini.
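A minimal, hypothetical sketch of this second approach is shown below; the rules text, Q&A context, and model name are placeholders rather than DQCare's actual prompts.

```python
# Hypothetical sketch of the Gemini-based help-and-support chatbot.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")        # placeholder key

SYSTEM_RULES = "You are DQ Care's virtual health assistant. ..."  # placeholder rules
QA_CONTEXT = "Q: Is my data secure?\nA: Yes. ..."                  # the 105 Q&A pairs

model = genai.GenerativeModel("gemini-pro")    # assumed model name

def answer(user_query: str) -> str:
    # Rules and Q&A context are prepended to every user prompt, since the
    # dataset was too small for fine-tuning.
    prompt = f"{SYSTEM_RULES}\n\nContext:\n{QA_CONTEXT}\n\nUser: {user_query}"
    return model.generate_content(prompt).text

print(answer("Are there any age restrictions?"))
```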

Chapter 4
Experimental Work
In this chapter, we will see a demo of the solutions built by my team. The first is
a simple front-end created using React, with a file-uploading option, a graph displaying
the cardiovascular risk estimate, and tables mirroring the tables of the database. The
second is the AI chatbot integrated with a Streamlit front-end.

4.1 Demo for Data Extraction


In this section, a demo of data extraction is performed. The steps below are the main
phases of the demo:

1. Sample of Patient’s report from which data is to be extracted.


Fig. 4.1 shows reports of two different tests of a patient: a Hematology report
and a Lipid Profile report. The original report contained many other tests, but
since I was using Gemini's free tier, the number of requests was restricted. So, I
created a demo PDF containing only the two test reports, Lipid Profile and
Hematology.

2. Uploading patient’s report (as shown in Fig. 4.2)


The patient's report is uploaded to the server for data extraction. A loader is
shown in the front-end and stops as soon as the data extraction process is
completed.

3. Displaying extracted data in the tables


After data extraction is completed, refreshing the page updates the tables. As
soon as the Lipid Profile data is extracted, it is used for cardiovascular risk
estimation, and the estimated percentage is displayed in the graph, as shown
in Fig. 4.3. If a certain medical parameter is not found, the default value
displayed in that field is -1.

4.2 Demo for AI Chatbot


In this section, a demo of the AI chatbot, which is integrated with a Streamlit
front-end for question answering, is performed. Fig. 4.4 shows three examples of
question answering performed by the chatbot. The details of the question-answer
pairs shown in Fig. 4.4 are as follows:


Lipid Profile Report

Hematology Report

Figure 4.1: Patient’s report in pdf format

1. Example 1:

• Question: Hello
• Answer: Hello! I’m your virtual health assistant from DQ Care, here to
help you navigate the world of digital healthcare and answer any questions
you have. What can I do for you today?

2. Example 2:

• Question: Is my personal information secure on a digital healthcare plat-


form?
• Answer: Hello! I’m your virtual health assistant from DQ Care, here to
help you navigate the world of digital healthcare and answer any questions
you might have.


Figure 4.2: Uploading Patient’s report

To answer your question, yes, absolutely! Your personal information is se-


cure on our platform. We take data privacy very seriously. You can review
our platform’s privacy policies and terms of service for more details on how
we protect your information.

Do you have any other questions I can help you with today?

3. Example 3:

• Question: Are there any age restrictions for using a digital healthcare plat-
form?
• Answer: Hello! I’m your virtual health assistant from DQ Care, here to
help you navigate the world of digital healthcare.

Regarding your question, individuals under 18 may need parental consent
to use our platform. This ensures young people have proper guidance in
managing their health.

Is there anything else you’d like to know about DQ Care or any other health
related concerns you have?


Results of Lipid Profile report

Results of Hematology Report

Figure 4.3: Extracted data from patient’s report


Example 1

Example 2

Example 3

Figure 4.4: Examples of question answering by the AI chatbot

Chapter 5
Conclusion & Future Scope

5.1 Conclusion
The integration of risk prediction models within the application serves as a proactive
measure to empower users in managing their health effectively. By providing users
with insight into their percentage risk for diabetes complications, these models encour-
age adherence to dietary plans and regular medical check-ups, thereby mitigating po-
tential risks and promoting overall well-being. Furthermore, the implementation of data
extraction techniques from medical reports offers users the convenience of maintaining
organized medical records within the application. This functionality enables users to ac-
cess comprehensive summaries and insights directly from the app, facilitating informed
decision-making and enhancing collaboration between users and healthcare profession-
als. Consequently, both users and healthcare providers benefit from streamlined ac-
cess to essential information, thereby simplifying processes and improving efficiency
in healthcare management. Moreover, the incorporation of an AI chatbot within the
application augments user experience by providing real-time assistance and guidance.
The chatbot’s ability to address queries related to app functionalities and medical ter-
minology not only enhances user engagement but also fosters a greater understanding
of health-related concepts. By offering personalized support and educational resources,
the chatbot contributes to an enriched user experience, ultimately promoting health lit-
eracy and empowerment. In essence, the combination of risk prediction models, data
extraction capabilities, and AI chatbot functionalities within the application underscores
a commitment to leveraging technology for the betterment of healthcare outcomes.

5.2 Future Scope


There are several opportunities for further improvement and enhancement within the
current functionalities of the application:

• Improving Prediction Model Accuracy: While the existing prediction models


provide valuable insights into diabetes complications, there is room for refine-
ment to enhance accuracy. By fine-tuning algorithms and incorporating additional
data sources, such as lifestyle factors and genetic predispositions, the accuracy of
prediction models can be increased. Moreover, expanding the scope to include
prediction models for other diseases can broaden the application’s utility and im-
pact in preventative healthcare. This would require comprehensive research and
collaboration with medical experts to develop robust models tailored to specific
diseases.

• Enhancing Image Processing Accuracy: The accuracy of image processing


techniques for character detection can be further improved to ensure precise ex-
traction of data from medical reports. This could involve refining algorithms,
optimizing image resolution, and implementing machine learning techniques to
enhance pattern recognition capabilities.

• Enhancing AI Chatbot Capabilities: The AI chatbot’s effectiveness can be


enhanced by improving its capacity to answer questions accurately and formu-
late responses coherently. This could involve expanding the chatbot’s knowledge
base through continuous learning and integration with reliable medical databases.
Moreover, enhancing the chatbot’s capacity for proper sentence formation can
improve user engagement and satisfaction, contributing to a more seamless user
experience within the application.

References
[1] P. Schober and T. R. Vetter, “Linear regression in medical research,” Anesthesia &
Analgesia, vol. 132, no. 1, pp. 108–109, Jan 2021.

[2] M. Maalouf, “Logistic regression in data analysis: An overview,” International
Journal of Data Analysis Techniques and Strategies, vol. 3, pp. 281–299, Jul 2011.

[3] D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language processing:
state of the art, current trends and challenges,” Multimedia Tools and
Applications, vol. 82, no. 3, pp. 3713–3744, Jan 2023. [Online]. Available:
https://doi.org/10.1007/s11042-022-13428-4

[4] Białek, “CVD Risk Calculator – Cardiovascular Disease 10-year Risk,” https://
www.omnicalculator.com/health/cvd-risk.

[5] American Diabetes Association, “Diabetes-Related Retinopathy Risk Test,” https://diabetes.org/
retinopathy-risk-test.

[6] National Institute of Diabetes and Digestive and Kidney Diseases, “Diabetes Dataset,” https://www.
kaggle.com/datasets/mathchi/diabetes-data-set.

[7] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language
models are unsupervised multitask learners,” 2019.
