Professional Documents
Culture Documents
5
TABLE OF CONTENT
02 LITERATURE SURVEY 26
AIM AND SCOPE OF PRESENT 31
03 INVESTIGATION
3.1 Aim 31
3.2 Project goals 31
3.3 Scope of the project 32
3.4 Feasibility study 32
6
04 EXPERIMENTAL OR MATERIAL AND 35
METHODS; ALGORITHMS USED 35
4.1 List of Algorithm 35
4.2 Project Requirements 35
4.2.1 Functional Requirements 36
4.2.2 Non-Functional Requirements 37
4.3 Environmental Requirements 41
4.4 Software Description 47
4.4.1 Anaconda Navigator 59
4.4.2 Jupyter Notebook 60
4.5 Python 61
4.6 System Architecture 62
4.7 Workflow Diagram 63
4.8 Use Diagram 64
4.9 Class Diagram 65
4.10 Activity Diagram 66
4.11 Sequence Diagram 66
4.12 Entity Relationship Diagram 68
4.13 Module Description 70
4.13.1 Data preprocessing 72
4.13.2 Data validation/Cleaning /Preparing 78
process 82
4.13.3 Comparing Algorithms with 84
prediction in the form of best accuracy result 84
4.13.4 Deployment
05 RESULT AND DISCUSSION, 86
PERFORMANCE ANALYSIS
5.1 Module Results 86
5.1.1 Logistic Regression 87
5.1.2 Random Forest Classifier 88
5.1.3 Decision Tree Classifier 89
5.1.4 Naïve Bayes Algorithm 90
5.2 Prediction Result by Accuracy 90
06 SUMMARY AND CONCLUSION 93
6.1 Coding 93
6.2 Conclusion 126
6.3 Future Work 126
7
LIST OF FIGURES
8
LIST OF SYMBOLS
9
S.N NOTATION NOTATION DESCRIPTION
O NAME
Represents a
collection of similar
1. Class
entities grouped
together.
10
8. State State of the process.
11
17. External entity Represents external
entities such as
keyboard,sensors,etc.
CHAPTER 1 : INTRODUCTION
12
based techniques for pulmonary disease prediction results in best accuracy. The
analysis of dataset by supervised machine learning technique(SMLT) to capture
several information’s like, variable identification, uni-variate analysis, bi-variate
and multi-variate analysis, missing value treatments and analyze the data
validation, data cleaning/preparing and data visualization will be done on the
entire given dataset. To propose a machine learning-based method to accurately
predict the pulmonary diseaseIndex value by prediction results in the form of
pulmonary disease classification best accuracy from comparing supervise
classification machine learning algorithms. Additionally, to compare and
discuss the performance of various machine learning algorithms from the given
dataset with evaluation classification report, identify the confusion matrix and
to categorizing data from priority and the result shows that the effectiveness of
the proposed machine learning algorithm technique can be compared with best
accuracy with precision, Recall and F1 Score.
The term "data science" has been traced back to 1974, when Peter
Naur proposed it as an alternative name for computer science. In 1996, the
International Federation of Classification Societies became the first conference
to specifically feature data science as a topic. However, the definition was still
in flux.
13
The term “data science” was first coined in 2008 by D.J. Patil, and Jeff
Hammerbacher, the pioneer leads of data and analytics efforts at LinkedIn and
Facebook. In less than a decade, it has become one of the hottest and most
trending professions in the market.
Data science is the field of study that combines domain expertise, programming
skills, and knowledge of mathematics and statistics to extract meaningful
insights from data.
15