BACHELOR OF
ENGINEERING IN
COMPUTER SCIENCE & ENGINEERING
Submitted to:
Dr Meenu Gupta
Submitted By:
Manish Raj
18BCS2216
During the feature-driven development activity, a software requirement specification document was prepared to capture the requirements, and an ER diagram was designed alongside it. To complete the activity, a domain object model was prepared along with the overall application architecture.
Functional Specifications
Included in this section are the functional and non-functional requirements of the system, along with the use cases and wireframes.
Functional Requirements:
Non-functional Requirements:
- The website should be responsive and render consistently across different screen sizes and resolutions.
- The website should provide the user with information about the different values used during prediction.
Waveform analysis, time-frequency analysis, neuro-fuzzy RBF ANNs, and Total Least Squares-based Prony modeling algorithms are some of the techniques used in the literature to identify heart disease. However, in a study by Marshall et al. (1991), classification accuracy with this technique was modest (up to 79%), leaving considerable room for improvement in selecting an appropriate model. They also demonstrated the efficiency of neural networks in diagnosing heart attacks (acute myocardial infarction) by comparing multiple neural network classifiers, including the multilayer perceptron and the Boltzmann perceptron classifier. Most of these approaches address diagnosis rather than the understanding of fundamental knowledge.
3. Methodology Used:
In this project, the dataset used is the Heart Disease UCI dataset from Kaggle. To perform operations and fit models on the dataset, we first import a few Python libraries.
# Numerical and data-handling libraries
import numpy as np
import pandas as pd
# Plotting libraries
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib.cm import rainbow
%matplotlib inline
# Silence non-critical warnings in the notebook output
import warnings
warnings.filterwarnings('ignore')
In this project we will experiment with three algorithms: KNeighborsClassifier, DecisionTreeClassifier, and RandomForestClassifier.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
Parameter sensitivity
We present a parameter-sensitivity study for the Decision Tree J48 classifier by varying its pruning confidence factor, where a smaller confidence value gives more pruning, and we examine how the accuracy, kappa statistic, MAE and RAE of the J48 classifier respond to these changes. J48 was chosen for the sensitivity analysis because it achieved the highest accuracy of all the classifiers. The training-sample size is also used as a sensitivity parameter for the Naive Bayes classifier: its training-set size is varied and the change in classification accuracy is observed with respect to the proportion of training samples in the total sample. Naive Bayes was selected as an example of a low-accuracy classifier, in order to see how its performance changes with the training-sample size. In the sensitivity analysis, each parameter starts at its default value and is then varied to study the resulting changes in classifier performance.
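The study above was performed with Weka classifiers, and scikit-learn has no direct equivalent of J48's pruning confidence factor. The sketch below is therefore only a rough analogue under stated assumptions: it varies DecisionTreeClassifier's cost-complexity pruning parameter ccp_alpha (larger values prune more, the opposite direction of J48's confidence factor) and the Naive Bayes training fraction, using scikit-learn's built-in breast-cancer data as a stand-in for the HD dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in binary-classification data; the project would use the HD dataset.
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# Pruning sensitivity: larger ccp_alpha means more aggressive pruning.
for alpha in [0.0, 0.005, 0.01, 0.02]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    acc = cross_val_score(tree, X_demo, y_demo, cv=10).mean()
    print(f"ccp_alpha={alpha}: mean CV accuracy={acc:.3f}")

# Training-size sensitivity for Naive Bayes.
for frac in [0.2, 0.4, 0.6, 0.8]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, train_size=frac, random_state=0)
    nb = GaussianNB().fit(X_tr, y_tr)
    print(f"train fraction={frac}: test accuracy={nb.score(X_te, y_te):.3f}")
```

The kappa statistic and MAE mentioned above could be added to the same loops via sklearn.metrics (cohen_kappa_score, mean_absolute_error).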
Feature extraction
Feature extraction was performed using the Classifier Subset Evaluator: training classification data was applied to estimate the accuracy of candidate feature subsets for each of the classifiers on the HD dataset, and the quality of the generated subsets was measured to evaluate classification performance after selecting the relevant attributes for each classification algorithm. The results are shown in Table 5, and a visual representation is given in Fig. 10.
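Weka's Classifier Subset Evaluator has no one-to-one scikit-learn counterpart. As a hedged sketch of the same idea, namely scoring feature subsets using the classifier itself, recursive feature elimination (RFE) can compare a selected subset against the full feature set, again on stand-in data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the project would use the HD dataset.
X_demo, y_demo = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)

# Baseline: mean 10-fold CV accuracy on all features.
base = cross_val_score(tree, X_demo, y_demo, cv=10).mean()

# Keep the 10 features ranked most relevant by recursive elimination.
selector = RFE(tree, n_features_to_select=10).fit(X_demo, y_demo)
reduced = cross_val_score(tree, X_demo[:, selector.support_], y_demo, cv=10).mean()
print(f"all features: {base:.3f}, selected subset: {reduced:.3f}")
```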
dataset = pd.get_dummies(df, columns = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'slope', 'ca', 'thal'])
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
columns_to_scale = ['age', 'trestbps', 'chol', 'thalach', 'oldpeak']
dataset[columns_to_scale] = standardScaler.fit_transform(dataset[columns_to_scale])
dataset.head()
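To make the effect of this preprocessing concrete, here is a minimal sketch on a toy frame with made-up values and only a subset of the heart dataset's columns: get_dummies expands each categorical column into indicator columns, and StandardScaler rescales the numeric columns to zero mean and unit variance.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame (made-up values) mirroring a few heart-dataset columns.
toy = pd.DataFrame({'age': [63, 45, 57], 'sex': [1, 0, 1], 'cp': [3, 1, 0],
                    'chol': [233, 204, 354], 'target': [1, 1, 0]})

# One-hot encode the categorical columns, then z-score the numeric ones.
toy_encoded = pd.get_dummies(toy, columns=['sex', 'cp'])
toy_encoded[['age', 'chol']] = StandardScaler().fit_transform(
    toy_encoded[['age', 'chol']])
print(sorted(toy_encoded.columns))
```

Each distinct value of a categorical column becomes its own indicator column (e.g. cp_0, cp_1, cp_3), while the scaled numeric columns keep their names.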
y = dataset['target']
X = dataset.drop(['target'], axis = 1)
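The plotting code below refers to knn_scores, which is never computed in this excerpt. A plausible reconstruction, assuming the mean 10-fold cross-validation accuracy for K = 1..20 (consistent with the cross-validation used for the random forest), is sketched here on stand-in data; the project would pass the X and y defined above instead.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with the X and y prepared from the heart dataset.
X_demo, y_demo = load_breast_cancer(return_X_y=True)

# knn_scores[k-1] holds the mean 10-fold CV accuracy for n_neighbors=k.
knn_scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn_scores.append(cross_val_score(knn, X_demo, y_demo, cv=10).mean())
```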
plt.plot([k for k in range(1, 21)], knn_scores, color='red')
for i in range(1, 21):
    plt.text(i, knn_scores[i-1], (i, knn_scores[i-1]))
plt.xticks([i for i in range(1, 21)])
plt.xlabel('Number of Neighbors (K)')
plt.ylabel('Scores')
plt.title('K Neighbors Classifier scores for different K values')
Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

randomforest_classifier = RandomForestClassifier(n_estimators=10)
score = cross_val_score(randomforest_classifier, X, y, cv=10)
score.mean()
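Since the project names three algorithms but the excerpt only evaluates KNN and the random forest, a compact comparison of all three under the same 10-fold cross-validation could look like the following sketch, again on stand-in data and with an illustrative K of 12:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the project would use the X and y prepared above.
X_demo, y_demo = load_breast_cancer(return_X_y=True)

models = {
    'KNN (k=12)': KNeighborsClassifier(n_neighbors=12),
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=10, random_state=0),
}
for name, model in models.items():
    print(name, round(cross_val_score(model, X_demo, y_demo, cv=10).mean(), 3))
```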