Final

DISEASE INSPECTION IDENTIFICATION FOR
FOOD USING MACHINE LEARNING ALGORITHMS
Batch ID – 13/B/2021
Under the esteemed guidance of
G. Mahesh Reddy Asst.Prof
PRESENTED BY:
1. P. Venkata Reddy - 178X1A0582
2. S. Bharath Kumar - 178X1A0599
3. T. Mahesh - 178X1A05A7
4. G. Vishnu Teja - 178X1A05B6
CONTENTS
• Abstract
• Introduction
• Literature Survey
• Feasibility Study
• Existing System
• Proposed System
• Architecture
• UML Diagrams
• Methodologies
• Implementation
• Output Screenshots
• System Specifications
• Conclusion
• References
ABSTRACT:
• This study examines the relationship between the nutritional ingredients
identified in food and disease inspection, using machine learning models for
the data analysis. Experiments on a real-life dataset show that our method
improves accuracy. In addition, our system recommends food for some common
diseases such as Anigma, acne, ovarian, oral cancer, kidney stone, liver
disease, etc. It identifies the diseases a person may be affected by when
certain nutritional ingredients are lacking in the body, and recommends food
that can help in the rehabilitation of those diseases.
• To achieve high accuracy and low time complexity, the proposed system is
implemented using random forest, decision tree, Gaussian Naïve Bayes and
KNN models.
INTRODUCTION
• Non-communicable diseases (NCDs) are chronic diseases mainly caused by
occupational and environmental factors, lifestyles and behaviors. They include
heart disease, ovarian, liver disease, acne, kidney stone and other diseases.
• According to the Global Status Report on Non-communicable Diseases issued by
the WHO, the annual death toll from NCDs keeps rising, which places a serious
economic burden on the world. About 41 million people die from NCDs each year,
which is equivalent to 70% of the global death toll.
Cont..
• Suitable nutritional diets play an important role in maintaining health and
preventing the occurrence of NCDs. With the gradual recognition of this
concept, India has also reassessed the impact of food on health. However,
machine learning research on the nutritional ingredients in food that aid the
rehabilitation of diseases is still rare in India. At present, India has only
just begun the IT (Information Technology) construction of smart health-care.
• In India, the study of the relationship between nutritional ingredients and
diseases using machine learning is still immature. Most doctors only recommend
specific foods to patients suffering from NCDs, without giving any relevant
nutrition information, especially about the nutritional ingredients in food.
LITERATURE SURVEY
[1] Retrospective Analysis of Hypertension Screening at a Mass Gathering in India:
Implications for Non-communicable Disease Control Strategies
[2] DIETOS: A Recommender System for Adaptive Diet Monitoring and Personalized
Food Suggestion
[3] Lumping Versus Splitting: the Need for Biological Data Mining in Precision Medicine
[4] Assessment of Aquatic Ecosystem Health Based on Principal Component Analysis with
Entropy Weight: A Case Study of Wanning Reservoir (Hainan Island, China)
[5] A Genome-wide Association Study Identifies GRK5 and RASGRP1 as Type 2 Diabetes
Loci in Chinese Hans
FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some cost
estimates. The feasibility study of the proposed system is carried out during
system analysis, to ensure that the proposed system is not a burden to the
company. Feasibility analysis requires some understanding of the major
requirements for the system.

The three key considerations in the feasibility analysis are:
1. Economic Feasibility
2. Technical Feasibility
3. Social Feasibility

EXISTING SYSTEM
• The existing system estimates the nutritional ingredients of a food item by
analyzing an input image of that item. It relies on different deep learning
techniques and models to estimate the nutritional components accurately.
• However, models that take images as input can be unstable at times and
require advanced techniques to predict the output. This approach is more
complex and more time-consuming.
DISADVANTAGES OF EXISTING SYSTEM
1. Low accuracy.
2. High complexity.
3. Time-consuming compared to other techniques.
PROBLEM STATEMENT
• The existing system used for disease analysis has low accuracy and high
complexity. To avoid these problems, we propose a system that uses different
machine learning models for the analysis and gives results with better
accuracy and efficiency.
PROPOSED SYSTEM
• The proposed system identifies the diseases a person may be affected by due
to a lack of certain ingredients in the body. To avoid this, it recommends
food based on the person's intake: the type of food consumed, the minerals,
and the amount consumed (in grams).
• Earlier studies were mostly carried out through long-term clinical trials,
which only recommend food for certain specific diseases and seldom study the
relationship between nutritional ingredients and disease inspection using
machine learning techniques.
ADVANTAGES OF PROPOSED SYSTEM
1. Low complexity.
2. Continuous improvement and better efficiency.
3. Improved accuracy by using the latest machine learning models.
ARCHITECTURE:
UML DIAGRAMS:
1. Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among the
classes. It explains which class contains information.
[Class diagram: classes User and System; operations: upload dataset(), view dataset(),
take dataset(), find preprocessing data(), view preprocessing data(),
find classification for food disease(), view classification for food disease(),
find graph(), view graph()]
2. Use Case Diagram
A use-case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram defined by
and created from a use-case analysis. Its purpose is to present a graphical overview of the functionality
provided by a system in terms of actors, their goals (represented as use cases), and any dependencies
between those use cases.
[Use case diagram: actors User and System; use cases: upload dataset, view dataset, take dataset,
perform preprocessing, view preprocessing data, find classification for food name,
view classification for food name, find graph, view graph]
3. Sequence Diagram
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of a Message Sequence Chart.
Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams.
[Sequence diagram: lifelines User and System; messages in order: upload dataset, view dataset,
take dataset, find preprocessing data, view preprocessing data, find classification for food disease,
view classification for food disease, find graph, view graph]
4. Collaboration Diagram
In a collaboration diagram, the method call sequence is indicated by a numbering technique: the
number shows the order in which the methods are called. The method calls are similar to those of
a sequence diagram. The difference is that the sequence diagram does not describe the object
organization, whereas the collaboration diagram does.
[Collaboration diagram between User and System: 1: upload dataset, 2: view dataset, 3: take dataset,
4: find preprocessing data, 5: view preprocessing data, 6: find classification for food disease,
7: view classification for food disease, 8: find graph, 9: view graph]
5. Component Diagram
Component diagrams describe the physical artifacts of a system, such as files, executables and
libraries. Their purpose therefore differs from that of the other diagrams: they are used during
the implementation phase of an application, although they are prepared well in advance to
visualize the implementation details. The system is first designed using different UML diagrams,
and once the artifacts are ready, component diagrams give an idea of the implementation.
[Component diagram: User, System]
6. Deployment Diagram
A deployment diagram represents the deployment view of a system. It is related to the component
diagram, because the components are deployed using deployment diagrams. A deployment diagram
consists of nodes, which are the physical hardware used to deploy the application.
[Deployment diagram: nodes User, dataset, System]
METHODOLOGIES:
In our proposed system we performed data exploration and data analysis, and
trained machine learning models on the data. The four models used in our
project are listed below.
• Decision Tree
• K-Nearest Neighbors
• Random Forest
• Gaussian Naïve Bayes
1. DECISION TREE:
• Decision tree analysis is a predictive modeling tool that can be applied across many
areas. Decision trees are constructed by an algorithmic approach that splits the
dataset in different ways based on different conditions. They are among the most
widely used supervised learning algorithms.
Classification decision trees: in this kind of decision tree, the decision variable is
categorical. The tree used in this project is a classification decision tree.
Algorithm Steps and Working:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM);
here, the Gini index.
- Gini score of a sub-node: p^2 + q^2, where p and q are the proportions of the two
classes in that node. The Gini impurity is 1 - (p^2 + q^2), so a split is better when
its sub-nodes have a higher Gini score (lower impurity).
Step-3: Divide S into subsets that contain the possible values of the best attribute
in our food dataset.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively build new decision trees from the subsets of the dataset created
in Step-3. Continue until the nodes cannot be classified further; each such
final node is called a leaf node.
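The Gini calculation in Step-2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the disease labels in the two sub-nodes are made up for the example and are not taken from the project's dataset.

```python
from collections import Counter

def gini_score(labels):
    """Sum of squared class proportions in one node (p^2 + q^2 for two classes)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum((n / total) ** 2 for n in counts.values())

def weighted_gini_score(groups):
    """Size-weighted Gini score of the sub-nodes produced by one candidate split."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_score(group) for group in groups)

# Hypothetical disease labels that ended up in the two sub-nodes of a split.
left = ["acne", "acne", "acne", "kidney stone"]
right = ["kidney stone", "kidney stone"]

print(gini_score(left))                               # 0.625
print(1 - gini_score(left))                           # Gini impurity: 0.375
print(round(weighted_gini_score([left, right]), 3))   # 0.75
```

Among several candidate attributes, the one whose split gives the highest weighted Gini score (equivalently, the lowest weighted impurity) would be chosen as the best attribute.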
2. K-NEAREST NEIGHBORS
• The K-NN algorithm assumes that a new case is similar to the available cases and puts it
into the category it is most similar to.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
The K-NN working can be explained by the algorithm below:
• Step-1: Select the number K of neighbors (a commonly used value is K=5).
• Step-2: Calculate the Euclidean distance from the new data point to each training data point.
• Step-3: Take the K nearest neighbors according to the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category with the largest number of neighbors.
• Step-6: Our model is ready.
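The K-NN steps above can be sketched in plain Python as follows. The function name `knn_predict`, the two numeric features and the sample values are assumptions made for illustration; they are not the project's code or data.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step-2: Euclidean distance from the query to every training point.
    distances = []
    for point, label in zip(train_points, train_labels):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        distances.append((dist, label))
    # Step-3: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step-4 and Step-5: count labels among the neighbors, pick the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (grams, encoded mineral) -> disease label.
points = [(10, 1), (12, 1), (11, 2), (50, 5), (52, 6), (49, 5)]
labels = ["acne", "acne", "acne", "kidney stone", "kidney stone", "kidney stone"]

print(knn_predict(points, labels, (13, 1), k=3))   # acne
print(knn_predict(points, labels, (48, 6), k=3))   # kidney stone
```

An odd K is often preferred for two-class problems because it avoids tied votes.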

3. RANDOM FOREST
Random forest is a supervised learning algorithm used for both classification and regression,
though mainly for classification problems. Just as a forest is made up of trees and more trees
make a more robust forest, the random forest algorithm creates decision trees on data samples,
gets a prediction from each of them, and selects the final answer by voting.
Random forest works in two phases: the first creates the random forest by combining N
decision trees, and the second makes a prediction with each tree created in the first phase.
The working process can be explained in the steps below:
Step-1: Select K random data points from the training set.
Step-2: Build the decision tree associated with the selected data points (subset).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2 until N trees are built.
Step-5: For a new data point, find the prediction of each decision tree, and assign the
new data point to the category that wins the majority vote.
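The sampling and voting mechanics of these steps can be illustrated with a small stdlib-only sketch. To keep it short, each "tree" here is a one-split decision stump on one randomly chosen feature, standing in for a full decision tree; that simplification, the function names and the sample data are all assumptions for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """One-split stand-in for a tree: threshold one feature at its sample mean."""
    threshold = sum(row[feature] for row in rows) / len(rows)
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    fallback = Counter(labels).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else fallback
    right_label = Counter(right).most_common(1)[0][0] if right else fallback
    return feature, threshold, left_label, right_label

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, seed=0):
    """Steps 1-4: draw a random sample (with replacement) and fit one stump per sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        feature = rng.randrange(len(rows[0]))
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest

def predict_forest(forest, row):
    """Step-5: every stump votes, the majority label wins."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated data: (grams, encoded mineral) -> disease label.
rows = [[10, 1], [12, 2], [11, 1], [9, 2], [50, 20], [52, 21], [49, 20], [51, 21]]
labels = ["acne"] * 4 + ["kidney stone"] * 4

forest = train_forest(rows, labels)
print(predict_forest(forest, [10.5, 1]))
print(predict_forest(forest, [50, 20]))
```

In practice a library implementation would grow full trees and also sample features at every split; the sketch only shows why averaging many votes is more robust than trusting a single tree.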
4. GAUSSIAN NAÏVE BAYES
• Naive Bayes classifiers are based on Bayes' theorem, with the assumption of
strong independence between the features.
• Naive Bayes classifiers need only a small amount of training data to estimate
the parameters needed for classification. They are simple to design and
implement, and can be applied to many real-life situations.
- Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
• Gaussian: the Gaussian model assumes that features follow a normal
distribution. If the predictors take continuous values instead of discrete
ones, the model assumes that these values are sampled from a Gaussian
distribution.
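As a rough sketch of how Bayes' theorem and the Gaussian assumption combine, the code below estimates a prior and a per-feature mean and variance for each class, then picks the class with the highest log-posterior. P(B) is the same for every class, so it is left out of the comparison. The function names and the sample values are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate P(class) and a (mean, variance) pair per feature for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density with the given mean and variance at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, row):
    """Return the class maximizing log P(class) + sum of log P(x_i | class)."""
    scores = {}
    for label, (prior, stats) in model.items():
        likelihood = sum(log_gaussian(x, m, v) for x, (m, v) in zip(row, stats))
        scores[label] = math.log(prior) + likelihood
    return max(scores, key=scores.get)

# Hypothetical continuous features: (grams, mineral amount) -> disease label.
rows = [(10, 1), (12, 2), (11, 1), (50, 20), (52, 21), (49, 20)]
labels = ["acne", "acne", "acne", "kidney stone", "kidney stone", "kidney stone"]

model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (11.5, 1.5)))   # acne
print(predict_gaussian_nb(model, (51, 20.5)))    # kidney stone
```

Working with logarithms avoids multiplying many small probabilities together, which would otherwise underflow for datasets with many features.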
IMPLEMENTATION
Step-1: Load the dataset.
Step-2: Preprocess the dataset.
Step-3: Split the dataset into training and testing sets using the “train_test_split” method.
Cont..
Step-4: Perform the EDA.
Step-5: Train the different models on the selected features.
- train each model on the training set
Step-6: Generate a graph for each model comparing its training and validation scores.
Step-7: Compare the accuracy of all the models in a graph.
Step-8: Predict the disease from the input values Food Type, Minerals and Grams.
Step-9: Recommend food to avoid the disease.
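Steps 2, 3 and 7 can be illustrated with a small stdlib-only sketch. The `train_test_split` method named in Step-3 is presumably scikit-learn's; the helper `split_train_test` below is a hypothetical stand-in that only shows the idea of a shuffled split. The records, the placeholder disease labels and the majority-label "model" are made up for the example.

```python
import random

def encode_column(values):
    """Preprocessing: map each distinct category to an integer code."""
    codes = {value: code for code, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes

def split_train_test(rows, labels, test_size=0.25, seed=42):
    """Shuffle the indices once, then cut them into a test part and a training part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical records: (food type, mineral, grams, disease label).
records = [
    ("fruit", "iron", 120.0, "disease_a"), ("fruit", "zinc", 80.0, "disease_b"),
    ("grain", "iron", 200.0, "disease_a"), ("grain", "zinc", 150.0, "disease_b"),
    ("dairy", "iron", 250.0, "disease_a"), ("dairy", "zinc", 90.0, "disease_b"),
    ("fruit", "iron", 60.0, "disease_a"), ("grain", "zinc", 110.0, "disease_b"),
]
food_codes, food_map = encode_column([r[0] for r in records])
mineral_codes, mineral_map = encode_column([r[1] for r in records])
rows = [[f, m, r[2]] for f, m, r in zip(food_codes, mineral_codes, records)]
labels = [r[3] for r in records]

X_train, X_test, y_train, y_test = split_train_test(rows, labels)

# Placeholder "model": always predict the most frequent training label.
majority = max(sorted(set(y_train)), key=y_train.count)
test_accuracy = accuracy(y_test, [majority] * len(y_test))
print(len(X_train), len(X_test), test_accuracy)
```

Replacing the placeholder with each of the four trained models and collecting the resulting accuracies gives the values that the accuracy comparison graph would plot.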

Performing the EDA
• We perform exploratory data analysis (EDA) on the given inputs: Food Type,
Minerals and Grams.
Training Score and Validation Score Comparison Graphs:
Here we plot the validation curve comparing the training and validation
scores of each model.
[Validation curve plots: 1. Decision Tree, 2. KNN Classifier, 3. Gaussian Naïve Bayes, 4. Random Forest]

Accuracy Comparison:
Here we compare the accuracy scores of our models: Decision Tree, Gaussian
Naïve Bayes, Random Forest and KNN.
OUTPUT SCREENSHOTS:
We run our application from the Anaconda Prompt and predict the disease based
on the ingredients lacking in the body.
Cont..
After the inputs are given, the model predicts the disease. The input columns are:
- Food Type
- Minerals
- Grams
Cont..
On clicking the Food Recommendation button, the system recommends certain
foods to the user to avoid the disease and stay healthy.
SYSTEM SPECIFICATIONS:
• Hardware Specifications:
Processor : Intel i3 & above
RAM : 4 GB & above
Hard Disk : 512 GB & above
• Software Specifications:
Operating System : Windows 7 & above
Platform : Anaconda Navigator, Jupyter Notebook
Python 3.7 with supported packages
CONCLUSION
We obtained the dataset and the corresponding recommended foods from
medical and official websites, sorted it out, and used it to predict
the disease. We discussed the relationship between nutritional
ingredients and diseases, mainly aiming to find out which ingredients
play a positive role in the rehabilitation of diseases. We achieved
better accuracy compared to the existing system.
FUTURE SCOPE
Our recommendation system predicts well on our dataset. Although the
project obtained good results, some future enhancements can be applied:
1. The project can be enhanced by using different datasets.
2. The project can also be trained using other machine learning models
to compare results.
3. The project can be made available in multiple languages.
REFERENCES:
1. A. Chaturvedi, C. Waghade, S. Mehta, S. Ghugare and A. Dandekar, “Food Recognition and Nutrition
Estimation Using Deep Learning,” 2020.
2. CNS, “2016 Global Nutrition Report,” in Chinese Nutrition Society, 2016.
3. WHO, “Global Status Report on Noncommunicable Diseases (2014),” in World Health Organization, 2014.
4. S. Balsari, P. Vemulapalli, M. Gofine et al., “A Retrospective Analysis of Hypertension Screening at a Mass
Gathering in India: Implications for Noncommunicable Disease Control Strategies,” Journal of Human
Hypertension, vol. 31, no. 11, pp. 750–753, 2017.
5. DNHFPC of PRC, “Chinese Resident’s Chronic Disease and Nutrition (2015),” in National Health and
Family Planning Commission of the People’s Republic of China, 2015.
6. S. Tellier, A. KiabyLars, P. Nissen et al., “Basic Concepts and Current Challenges of Public Health in
Humanitarian Action,” International Humanitarian Action, pp. 229–317, 2017.
7. F. Ara, F. Saleh, S. J. Mumu, F. Afnan and L. Ali, “Awareness Among Bangladeshi Type 2 Diabetic Subjects
Regarding Diabetes and Risk Factors of Non-communicable Diseases,” Diabetologia, pp. S379, 2011.
DOI:10.1007/s00125-011-2276-4.
THANK YOU
