
AI-based chatbot for skin disease prediction

using CNN and ID3 Decision Tree


G-13
Team Members:

Vishnupriya N 2019103599
Ishwarya Rani M 2019103527
Navvya L 2019103548

Under the guidance of


Dr. P. Geetha
Introduction:

• Skin diseases are a common problem that affects people of all ages, genders, and ethnicities.

• They can be caused by a variety of factors such as genetics, environmental factors, and lifestyle habits.

• However, access to dermatologists or medical experts can be limited, leading to delayed or incorrect diagnoses,
which can further worsen the condition.

• With the rapid advancements in technology, the development of a disease prediction chatbot has become an effective
solution to this problem.

• In recent years, chatbots have emerged as a promising tool for providing healthcare services to people in remote or
underserved areas.

• This technology can provide individuals with immediate access to dermatological expertise, especially in areas with
limited healthcare resources.
Introduction:

• The skin disease chatbot system works by allowing users to upload images of their skin condition through a user-
friendly interface.

• The chatbot then analyses the images and provides information on the type of skin disease. The chatbot can also
answer basic questions about the condition and provide advice on prevention.

• Overall, the skin disease chatbot is a promising solution to the problem of limited access to dermatologists and can
significantly improve the accuracy and timeliness of skin disease diagnoses, leading to better health outcomes.

• The main idea is to build a chatbot that takes several inputs regarding the user’s lifestyle together with an image of
the affected skin.

• The chatbot predicts the exact disease by asking the user questions about the symptoms, and then suggests the required remedy.

• Hence the image data is combined with the metadata of the user to predict the skin disease.
Introduction:
• In this project we use the ISIC2019 dataset, where each disease has its own set of images.

• A customized convolutional neural network (CNN) has been developed to shortlist the skin diseases that match the user's image.

• A sigmoid output is used, so each disease receives its own score and several diseases can be shortlisted for one image.

• External factors about the patient, such as lifestyle, are taken into consideration when asking questions through the chatbot.

• A decision tree is built dynamically for each image a user provides in order to model these external factors, and the order
of the questions for the chatbot is determined by this decision tree.

Overall Objective

• To develop a chatbot system that can predict skin diseases based on images and provide basic information about the
disease condition.

• The system should be able to analyse skin images submitted by users and classify them into different categories of skin
diseases. Additionally, the chatbot asks basic questions to narrow the result down to the exact disease.

• To make the system user-friendly and accessible to people with different levels of technical expertise.

• To achieve high accuracy in skin disease prediction.


Literature Survey
1. Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion Classification. Yao, P., Shen, S., Xu, M., Liu, P., Zhang, F., Xing, J., Shao, P., Kaffenberger, B. and Xu, R.X. (2022). IEEE Transactions on Medical Imaging, 41(5), pp.1242–1254.
Methodology:
• Modified DCNNs
• Regularization using DropOut and DropBlock
• Modified RandAugment
• MultiWeighted New Loss
• End-to-end Cumulative Learning Strategy.
Advantages:
• Models with moderate complexity outperform the larger ones.
• Significant performance with less computing resources and shorter time.
• Class imbalance issue dealt.
Disadvantages:
• Totally depends on training. So, if there are symptoms too different than the trained model dataset, then the prescription may not be valid.
Ideas for adoption:
• Class imbalance issue can be dealt.
• Use of regularization techniques such as DropOut

2. An AI-Based Medical Chatbot Model for Infectious Disease Prediction. Chakraborty, S., Paul, H., Ghatak, S., Pandey, S.K., Kumar, A., Singh, K.U. and Shah, M.A. (2022). IEEE Access, 10, pp.128469–128483.
Methodology:
• Uses deep feedforward multilayer perceptron for Covid-19 dataset
• Utilizes DNN architecture.
Advantages:
• All around 24/7 support
• Provides necessary information about the availability of hospital beds in an area where the user wants the patient to be taken.
Disadvantages:
• Low accuracy
Ideas for adoption:
• Provide details regarding hospitals and doctors.
Literature Survey
3. Skin Lesion Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced Shifting. Thurnhofer-Hemsi, K., Lopez-Rubio, E., Dominguez, E. and Elizondo, D.A. (2021). IEEE Access, 9, pp.112193–112205.
Methodology:
• Combines several convolutional deep classifiers which form an ensemble, by merging the class information provided by the classifiers when they are run on shifted versions of the test image.
Advantages:
• The nevi class scores were higher in most of the predictions.
Disadvantages:
• More deep networks and other topologies of the lattice can be tested.
Ideas for adoption:
• This method is different from standard data augmentation techniques, which typically involve randomly transforming the training images during training to improve generalization.

4. Skin Disease Prediction. Sanas, S., Pawale, P., Ghadage, G. and Sahani, M. (2021). 8(4), pp.4344–4347.
Methodology:
• Computer algorithm ResNet152V2 is used.
• Classification of data has been implemented with the help of a classifier such as an artificial neural network (ANN).
Advantages:
• An effective, low-cost solution.
Disadvantages:
• Can be binded with a website or an android app to provide real-time data for skin disease prediction.
• For better performance, designing deep learning network structures, using adaptive learning rates, and training it on clusters of data rather than the whole dataset can be done.
Ideas for adoption:
• Training the algorithm on clusters of data rather than the entire dataset is also a good suggestion.
Literature Survey

5. Targeted Ensemble Machine Classification Approach for Supporting IoT Enabled Skin Disease Detection. Yu, H.Q. and Reiff-Marganiec, S. (2021). IEEE Access, pp.1–1.
Methodology:
• A dynamic AI-model configuration and secured IoT-Fog-Cloud and a new classification process to produce better classification results in skin disease detection is used.
Advantages:
• A new classification process was provided to produce better classification results in skin disease detection.
Disadvantages:
• Classification can be done for a wider skin-related disease classification.
Ideas for adoption:
• Use of a two-phase classification process.

6. Prediction of Skin Diseases Using Machine Learning. Mtende Mkandawire and Dr. Glorindal Selvam (2022). International Journal of Advanced Research in Science, Communication and Technology, pp.54–61.
Methodology:
• Using computer-aided techniques in Machine learning such as Ensemble Algorithm and Data Mining Algorithms to predict skin diseases real-time.
Advantages:
• Application developed is light-weight and can be used in machines with low system specifications.
Disadvantages:
• It can be explored with recent advances in AI and the benefits of diagnosis assisted with AI.
Ideas for adoption:
• Ensuring that the system is trained on diverse and representative data.
Literature Survey

7. Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review. Safi, Z., Abd-Alrazaq, A., Khalifa, M. and Househ, M. (2020). Journal of Medical Internet Research, 22(12), p.e19127.
Methodology:
• It follows a scoping review methodology, the PRISMA extension for scoping reviews.
Advantages:
• Results showed that the common language of communication between the user and chatbot is English.
Disadvantages:
• It is important to conduct more in-depth systematic reviews on the effectiveness of chatbots.
Ideas for adoption:
• Question generation method to be dependent on the answers in a dynamic way.

8. IntelliDoctor – AI based Medical Assistant. Gandhi, M., Kumar Singh, V. and Kumar, V. (2019). In: 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM).
Methodology:
• The app utilizes predictive analytics to generate periodic health reports based on everyday activities and environment.
Advantages:
• All the information is displayed in a graphical interface.
• Periodic health reports are generated for the users to follow.
Disadvantages:
• Not reliable or accurate compared to traditional methods.
Ideas for adoption:
• A user-friendly application.
• Focus on usability and simplicity.
Literature Survey

9. A Convolutional Neural Network Model for Online Medical Guidance. Yao, C., Qu, Y., Jin, B., Guo, L., Li, C., Cui, W. and Feng, L. (2016). IEEE Access, 4, pp.4094–4103.
Methodology:
• CNN is used for text classification
• Perform feature construction and transformation on raw, noisy data.
Advantages:
• The variation of the system accuracy with varying number of returned answers
Disadvantages:
• Accuracy can be improved (70% accuracy).
Ideas for adoption:
• With an addition of chatbot to know the other information.

10. Healthcare Chatbot using Artificial Intelligence. Patil, A. (2022). International Journal for Research in Applied Science and Engineering Technology, 10(8), pp.905–909.
Methodology:
• Uses computer-aided techniques in Machine learning such as Ensemble Algorithm
• Data Mining Algorithms to predict skin diseases real-time.
Advantages:
• Application developed is light-weight and can be used in machines with low system specifications.
• Simple user interface for the convenience of the user.
Disadvantages:
• It can be explored with recent advances in AI and the benefits of diagnosis assisted with AI.
Ideas for adoption:
• It's important to ensure that the limited medical information is reliable and accurate.
Summary of Issues
• One project depended entirely on images, so if the symptoms differ too much from the training dataset, the
prescription may not be valid.
• One project was implemented with low accuracy, so the bot would not be helpful to future users and could cause
problems.
• One project was not integrated with a website or an Android app to provide real-time data for skin
disease prediction.
• Classification was not extended to a wider range of skin-related diseases, which limited the research and
execution to a narrow set of medical information.
• It is important to conduct more in-depth systematic reviews on the effectiveness of chatbots
in supporting and enhancing positive clinical outcomes.
• One project relies on the user's health activity data, such as step counts and sleep tracking, so any
discrepancies in these readings could affect its accuracy.
• One project did not have an online service platform, and its accuracy was as low as 70%.
Proposed System
• In previous works, only image datasets have been used to classify the images. In our project
we will also use other metadata of the user to improve the prediction accuracy.
• Our main contribution is to create a chatbot with a questionnaire model. A questionnaire model
helps to collect relevant information from the user and improve the accuracy of the prediction.
• The questionnaire should include a set of questions related to the user's symptoms, medical
history, and other relevant factors that can help identify the skin disease.
• A set of questions related to symptoms and lifestyle will be hard-coded into the
questionnaire model. The questions will be asked in an order determined by the dynamically
created decision tree.
• The customized (ordered) questionnaire is created from the decision tree built for an
input image.
• Once the questionnaire is created, it guides the chatbot in asking questions, and the user's
responses help it make a more accurate prediction of the skin disease.
Overall Architecture
Detailed Module Design
Module 1: Customized CNN Model Creation on ISIC2019 Dataset

Module 2: Shortlisting of skin diseases for given user image

Module 3: Dynamic ID3 Decision Tree Model

Module 4: Creation of Chatbot and Questionnaire Model

Module 5: Prediction of exact Skin Disease by the user-chatbot interaction


 
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
• A CNN model is created for the ISIC2019 skin disease image dataset.
• This dataset consists of images of six skin diseases: actinic keratosis, basal cell carcinoma, melanoma,
nevus, seborrheic keratosis and squamous cell carcinoma.
• The customized CNN model is created and trained on this image dataset.

INPUT - ISIC2019 skin disease image data set.


OUTPUT - CNN model
Module 2: Shortlisting of skin diseases for given user image
• Skin diseases are shortlisted for a given user image using the CNN model created in Module 1.
• The user image is collected and passed to the CNN model.
• The skin diseases whose predicted score is above the threshold of 0.5 are shortlisted for the next step.

INPUT – CNN model and User Image


OUTPUT – Shortlisted skin diseases
Module 3: Dynamic ID3 Decision Tree Model
• Here another dataset, PAD-UFES-20, which contains the external factors, is used.
• Only the records of the diseases that were shortlisted in the previous module are extracted from the PAD-UFES-20
dataset.
• An ID3 decision tree model is created from the extracted records.

INPUT – Shortlisted skin diseases


OUTPUT – ID3 Decision tree model
Module 4: Creation of Chatbot and Questionnaire Model
• The chatbot is built around the questions that need to be answered by the user.
• A model for generating customized questions and answers is created. Using this model, the chatbot generates the
customized question-answer sequence for the dynamically generated decision tree.
• The chatbot collects the user image for further processing.

INPUT - NLP Model and Question-Answer model


OUTPUT - Chatbot
Module 5: Prediction of exact Skin Disease by the user-chatbot interaction
• The chatbot asks the user targeted questions based on the generated questionnaire.
• Follow-up questions are generated based on the user's responses about the symptoms.
• The exact skin disease of the user is predicted.
• Remedies for the predicted disease are also provided.
INPUT - Chatbot
OUTPUT - Skin disease of the user is predicted
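The interaction described in Module 5 can be pictured as a walk down the decision tree, one question per level. The following is only a minimal sketch under stated assumptions: the tree layout (`question` / `answers` keys), the example questions and the function name `run_questionnaire` are all hypothetical and are not taken from the project code, and the example tree carries no medical meaning.

```python
from typing import Callable, Dict, Union

# A node is either a disease name (leaf) or a dict holding a question
# and one subtree per accepted answer. This layout is an assumption.
Tree = Union[str, Dict]

# Hypothetical hand-written tree; in the project it would be generated per image.
EXAMPLE_TREE: Tree = {
    "question": "Does the lesion itch?",
    "answers": {
        "yes": {
            "question": "Has the lesion grown recently?",
            "answers": {"yes": "melanoma", "no": "nevus"},
        },
        "no": "seborrheic keratosis",
    },
}


def run_questionnaire(tree: Tree, ask: Callable[[str], str]) -> str:
    """Walk the tree from the root, asking one question per level.

    `ask` receives the question text and returns the user's reply, so the
    same function works with input() or with a chat front end.
    """
    node = tree
    while isinstance(node, dict):
        answer = ask(node["question"]).strip().lower()
        if answer not in node["answers"]:
            # Unrecognised reply: ask the same question again.
            continue
        node = node["answers"][answer]
    return node
```

In a console prototype, `run_questionnaire(EXAMPLE_TREE, input)` would ask the questions interactively; passing a different `ask` callable lets a chatbot front end supply the replies instead.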

 
IMPLEMENTATION DETAILS 30%
Module 1: Customized CNN Model Creation on ISIC2019 Dataset
Importing all the required libraries and packages.

Defining the path for reading the images from the ISIC2019 image dataset and displaying the image count.
Listing the class names of the skin diseases and storing them in a list.

Visualising the images using matplotlib.


Visualising the distribution of each disease in the training dataset. The visualisation shows that the
data is unevenly distributed across the classes and that each class contains few images.

Rectifying the class imbalance using Augmentor. A count of 1000 images is maintained in each class. Left and right
rotations, each with a maximum magnitude of 10, are applied to the data at random.
After data augmentation, 1000 images have been created in each class, giving 6000 images in total for the
6 classes.
Visualising the distribution of the augmented data after adding the new images to the training data. All the
classes are now balanced.
Splitting the data for training and validation, with a split of 0.2.
Creating the CNN model by adding customised layers to it.
Summary of the created CNN model.
Compiling the model. The Adam optimizer is used here. The model is trained for 25 epochs.
The accuracy of the CNN model is displayed for each epoch. At the 25th epoch, a training accuracy of 0.89 and
a validation accuracy of 0.87 are obtained.

The model is saved in the path final_model under the name cnn_sigmoid_model.h5.
Module 2: Shortlisting of skin diseases for given user image
Loading the previously created CNN model.

Creating the map list and setting the threshold for shortlisting to 0.5.

Creating the function for finding the diseases which are above the threshold. The predicted value for each of the
six classes is checked against 0.5. If the value is above 0.5, the disease is appended to the shortlisted diseases list.
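The thresholding step can be sketched as follows. This is an illustrative sketch rather than the project's own function: the name `shortlist_diseases` and the class order in `CLASS_NAMES` are assumptions, and the scores are taken to be the six sigmoid outputs of the CNN for one image.

```python
from typing import List, Sequence

# Assumed class order; it must match the order used when the model was trained.
CLASS_NAMES = [
    "actinic keratosis",
    "basal cell carcinoma",
    "melanoma",
    "nevus",
    "seborrheic keratosis",
    "squamous cell carcinoma",
]
THRESHOLD = 0.5


def shortlist_diseases(
    scores: Sequence[float],
    class_names: Sequence[str] = CLASS_NAMES,
    threshold: float = THRESHOLD,
) -> List[str]:
    """Return every class whose score is strictly above the threshold."""
    if len(scores) != len(class_names):
        raise ValueError("expected one score per class")
    return [name for name, score in zip(class_names, scores) if score > threshold]
```

Because each sigmoid output is thresholded independently, the list may contain several diseases, exactly one, or none; the empty case would need its own handling in the chatbot.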
Applying the CNN model to predict the shortlisted diseases for example test image 1.
Applying the CNN model to predict the shortlisted diseases for example test image 2.
Module 3: Dynamic ID3 Decision Tree Model
The ID3 decision tree algorithm is implemented from scratch. Only the data of the shortlisted diseases is
extracted, and the decision tree is built from it.
Building the decision tree for the shortlisted diseases. The PAD-UFES-20 dataset and the list of shortlisted diseases are
passed as the arguments for building the tree.
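A from-scratch ID3 build on the filtered records might look like the sketch below. It is a minimal illustration, not the project's implementation: rows are assumed to be dictionaries of categorical values, the target column name `diagnostic` and the function names are assumptions, and no pruning or handling of unseen answer values is included.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Union

Row = Dict[str, str]
Tree = Union[str, Dict]


def entropy(rows: Sequence[Row], target: str) -> float:
    """Shannon entropy (in bits) of the target labels in `rows`."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows: Sequence[Row], attribute: str, target: str) -> float:
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows: Sequence[Row], attributes: List[str], target: str) -> Tree:
    """Recursively build the tree, splitting on the highest-gain attribute."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # pure node: leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {"attribute": best, "branches": branches}


def build_tree_for_shortlist(
    rows: Sequence[Row],
    shortlisted: Sequence[str],
    attributes: List[str],
    target: str = "diagnostic",  # assumed column name
) -> Tree:
    """Keep only the records of shortlisted diseases, then run ID3 on them."""
    subset = [row for row in rows if row[target] in shortlisted]
    if not subset:
        raise ValueError("no records for the shortlisted diseases")
    return id3(subset, attributes, target)
```

Because the attribute with the highest information gain sits at the root, reading the tree from the top down gives the order in which the chatbot would ask its questions.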
Performance Measures
• For the performance evaluation, we first denote TP, FP, TN and FN as true positives (the number of
instances correctly predicted as positive), false positives (the number of instances incorrectly predicted
as positive), true negatives (the number of instances correctly predicted as negative) and false
negatives (the number of instances incorrectly predicted as negative), respectively. From these, we can
obtain four measurements: accuracy, precision, recall and F1-measure. The F1-measure is the
harmonic mean of precision and recall and represents the overall performance.

• In addition to the aforementioned evaluation criteria, receiver operating characteristic (ROC) curve and
the area under curve (AUC) can be used to evaluate the pros and cons of the classifier. The ROC curve
shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR), where the
TPR and FPR are defined as follows:
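Using the counts TP, FP, TN and FN introduced above, the standard definitions are:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
```

TPR is the same quantity as recall (sensitivity), and FPR equals one minus the specificity.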
Performance Measures

• The closer the ROC curve is to the upper left corner of the graph, the better the model. The AUC is the
area under the curve; the closer the area is to 1, the better the model. With medical data, more
attention should be paid to recall than to accuracy. The higher the recall, the lower the
probability that a patient who is at risk of disease is predicted to have no disease risk.
• Balanced accuracy (BACC) is used as the main evaluation measure. BACC is equivalent to the
average sensitivity or recall, which treats all the classes equally, and is expressed as:
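In its usual form, with the subscript i (notation introduced here) marking the counts for class i:

```latex
\mathrm{BACC} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i}
```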

• Where TP denotes true positives, FN denotes false negatives and C denotes the number of classes.
The averaged specificity and the average area under the receiver operating characteristic curve
(AUC) are also reported for the evaluation of the results of state-of-the-art algorithms.
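The measures described above can be computed directly from the counts. The following is a small sketch for illustration; the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the slides.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from the four confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    if not classes:
        raise ValueError("y_true is empty")
    recalls = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)
```

Averaging the per-class recalls gives every class the same weight regardless of how many images it has, which is why balanced accuracy suits an imbalanced dataset.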
Test Cases

Path to the test dataset, which contains 61 images from different classes:

The trained model is applied to every image and the shortlisted diseases for each image are predicted:
Code to check whether the predicted list is correct:
The result shows that the shortlisted diseases have been predicted correctly for each image
in each class:
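One way to express this check is to test whether each image's true class appears in its shortlist. The sketch below assumes that this is the intended criterion; the function name and the file names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def shortlist_hit_rate(
    true_labels: Dict[str, str],
    shortlists: Dict[str, List[str]],
) -> Tuple[float, List[str]]:
    """Fraction of images whose true class is in the shortlist, plus the misses."""
    if not true_labels:
        raise ValueError("no test images given")
    misses = [
        image
        for image, label in true_labels.items()
        if label not in shortlists.get(image, [])
    ]
    hit_rate = (len(true_labels) - len(misses)) / len(true_labels)
    return hit_rate, misses
```

For example, with true labels `{"img1.jpg": "melanoma", "img2.jpg": "nevus"}` and shortlists `{"img1.jpg": ["melanoma", "nevus"], "img2.jpg": ["melanoma"]}`, the function reports a hit rate of 0.5 and lists `img2.jpg` as the miss.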
Test Case 1:
Image 1 is imported from the test folder:

Importing the image and resizing it to a height of 180 and a width of 180:
Converting the image to an array, passing it to the created CNN model and getting the shortlisted diseases
list.

Generation of the decision tree for the data of the shortlisted diseases:
Test Case 2:
Image 2 is imported from the test folder:

Importing the image and resizing it to a height of 180 and a width of 180:
Converting the image to an array, passing it to the created CNN model and getting the shortlisted diseases
list.

Generation of the decision tree for the data of the shortlisted diseases:
References
[1] Yao, P., Shen, S., Xu, M., Liu, P., Zhang, F., Xing, J., Shao, P., Kaffenberger, B. and Xu, R.X.
(2022). Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion
Classification. IEEE Transactions on Medical Imaging, 41(5), pp.1242–1254.
doi:10.1109/tmi.2021.3136682.

[2] Chakraborty, S., Paul, H., Ghatak, S., Pandey, S.K., Kumar, A., Singh, K.U. and Shah, M.A.
(2022). An AI-Based Medical Chatbot Model for Infectious Disease Prediction. IEEE Access, 10,
pp.128469–128483. doi:10.1109/access.2022.3227208.

[3] Thurnhofer-Hemsi, K., Lopez-Rubio, E., Dominguez, E. and Elizondo, D.A. (2021). Skin
Lesion Classification by Ensembles of Deep Convolutional Networks and Regularly Spaced
Shifting. IEEE Access, 9, pp.112193–112205. doi:10.1109/access.2021.3103410.

[4] Sanas, S., Pawale, P., Ghadage, G. and Sahani, M. (2021). Skin Disease Prediction.
8(4), pp.4344–4347.
References
[5] Yu, H.Q. and Reiff-Marganiec, S. (2021). Targeted Ensemble Machine Classification Approach for
Supporting IoT Enabled Skin Disease Detection. IEEE Access, pp.1–1. doi:10.1109/access.2021.3069024.

[6] Mtende Mkandawire and Dr. Glorindal Selvam (2022). Prediction of Skin Diseases using Machine Learning
Algorithms. International Journal of Advanced Research in Science, Communication and Technology, pp.54–61.
doi:10.48175/ijarsct-7139.

[7] Safi, Z., Abd-Alrazaq, A., Khalifa, M. and Househ, M. (2020). Technical Aspects of Developing Chatbots
for Medical Applications: Scoping Review. Journal of Medical Internet Research, 22(12), p.e19127.
doi:10.2196/19127.
References

[8] Gandhi, M., Kumar Singh, V. and Kumar, V. (2019). IntelliDoctor – AI based Medical Assistant. In: 2019
Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM).

[9] Yao, C., Qu, Y., Jin, B., Guo, L., Li, C., Cui, W. and Feng, L. (2016). A Convolutional Neural Network
Model for Online Medical Guidance. IEEE Access, 4, pp.4094–4103. doi:10.1109/access.2016.2594839.

[10] Patil, A. (2022). Healthcare Chatbot using Artificial Intelligence. International Journal for Research in
Applied Science and Engineering Technology, 10(8), pp.905–909. doi:10.22214/ijraset.2022.46299.
