
HW12 Neural Networks

Q1 Iris Dataset Classification via Neural Networks

https://www.kaggle.com/azzion/iris-data-set-classification-using-neural-network/data

Classify the Iris data set via an Artificial Neural Network (ANN).

a. Compute the Confusion Matrix and Accuracy for the ANN

# testing prediction: compare the predicted class (argmax over the softmax output)
# with the true class, then take the mean of the correct predictions
correct_prediction = tf.equal(tf.argmax(y_softmax), tf.argmax(Y_train_flatten))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print("the Accuracy is: " + str(sess.run(accuracy, feed_dict={X: X_train_flatten, Y: Y_train_flatten})))

Accuracy is 0.94
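Part (a) also asks for a confusion matrix. One way to obtain it from the same session, assuming y_softmax and Y_train_flatten are laid out as in the accuracy computation above (a sketch, not the kernel's exact code):

from sklearn.metrics import confusion_matrix

# predicted and true class indices per example (argmax over the class axis, as in the accuracy op above)
pred_classes, true_classes = sess.run(
    [tf.argmax(y_softmax), tf.argmax(Y_train_flatten)],
    feed_dict={X: X_train_flatten, Y: Y_train_flatten})
print(confusion_matrix(true_classes, pred_classes))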

b. Compare the performance with a Naïve Bayes classifier

Accuracy : 0.78
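A minimal sketch of how such a Naïve Bayes baseline can be fit on the same data with scikit-learn (GaussianNB is one common choice; the variable names and data layout are assumptions):

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# fit a Gaussian Naive Bayes baseline on the same train/test split
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))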

Q2 Titanic Dataset Classification via Neural Networks


The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her
maiden voyage, the Titanic sank after colliding with an iceberg, killing 1,502 out of 2,224 passengers and crew. This
sensational tragedy shocked the international community and led to better safety regulations for ships. One of the
reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and
crew. Although there was some element of luck involved in surviving the sinking, some groups of people were
more likely to survive than others, such as women, children, and the upper class. Classify the people who were
likely to survive using the following classifiers:
a. Artificial Neural Networks

# survival counts by passenger class
sns.countplot(x='Pclass', data=df, palette='hls', hue='Survived')
plt.xticks(rotation=45)
plt.show()

# survival counts by sex
sns.countplot(x='Sex', data=df, palette='hls', hue='Survived')
plt.xticks(rotation=45)
plt.show()

Creating the Neural Network model:

# set random seed for reproducibility
from numpy.random import seed
from tensorflow import set_random_seed
seed(42)
set_random_seed(42)

from keras.models import Sequential
from keras.layers import Dense, Dropout

# model-building function (the signature and its defaults are reconstructed here;
# the layer sizes follow the summary printed below)
def create_model(lyrs=[8], act='relu', opt='adam', dr=0.0):
    model = Sequential()

    # create first hidden layer
    model.add(Dense(lyrs[0], input_dim=X_train.shape[1], activation=act))

    # create additional hidden layers
    for i in range(1, len(lyrs)):
        model.add(Dense(lyrs[i], activation=act))

    # add dropout, default is none
    model.add(Dropout(dr))

    # create output layer
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

    return model

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 8) 136
_________________________________________________________________
dropout_1 (Dropout) (None, 8) 0
_________________________________________________________________
dense_2 (Dense) (None, 1) 9
=================================================================
Total params: 145

Trainable params: 145
Non-trainable params: 0
_________________________________________________________________
None
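The excerpt above only defines the model; training and evaluation would look roughly as follows (create_model refers to the reconstructed function above; the epoch count, batch size, and validation split are illustrative assumptions):

# build, train, and evaluate the network (illustrative hyperparameters)
model = create_model()
history = model.fit(X_train, y_train, epochs=100, batch_size=32,
                    validation_split=0.2, verbose=0)

# accuracy on the training data, as reported by Keras
scores = model.evaluate(X_train, y_train, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))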

b. Compare performance to Logistic Regression or Naïve Bayes

Using Logistic regression:

import pandas as pd
import numpy as np  # used later for np.min / np.where on the ROC thresholds
from sklearn import preprocessing
import matplotlib.pyplot as plt
plt.rc("font", size=14)
import seaborn as sns
sns.set(style="white")  # white background style for seaborn plots
sns.set(style="whitegrid", color_codes=True)

# Read CSV train data file into DataFrame
train_df = pd.read_csv("../input/train.csv")

# Read CSV test data file into DataFrame
test_df = pd.read_csv("../input/test.csv")

# preview train data
train_df.head()
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE

cols = ["Age", "Fare", "TravelAlone", "Pclass_1", "Pclass_2", "Embarked_C", "Embarked_S", "Sex_male", "IsMinor"]
X = final_train[cols]
y = final_train['Survived']

# Build a logistic regression model and compute the feature importances
model = LogisticRegression()

# create the RFE model and select 8 attributes
rfe = RFE(model, n_features_to_select=8)
rfe = rfe.fit(X, y)

# summarize the selection of the attributes
print('Selected features: %s' % list(X.columns[rfe.support_]))

from sklearn.feature_selection import RFECV
# Create the RFE object and compute a cross-validated score.
# The "accuracy" scoring is proportional to the number of correct classifications
rfecv = RFECV(estimator=LogisticRegression(), step=1, cv=10, scoring='accuracy')
rfecv.fit(X, y)

print("Optimal number of features: %d" % rfecv.n_features_)
print('Selected features: %s' % list(X.columns[rfecv.support_]))

# Plot number of features vs. cross-validation scores
plt.figure(figsize=(10, 6))
plt.xlabel("Number of features selected")
plt.ylabel("Cross-validation score (number of correct classifications)")
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()

Selected_features = ['Age', 'TravelAlone', 'Pclass_1', 'Pclass_2', 'Embarked_C',
                     'Embarked_S', 'Sex_male', 'IsMinor']
X = final_train[Selected_features]

plt.subplots(figsize=(8, 5))
sns.heatmap(X.corr(), annot=True, cmap="RdYlGn")
plt.show()

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, classification_report, precision_score, recall_score
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_curve, auc, log_loss

# create X (features) and y (response)
X = final_train[Selected_features]
y = final_train['Survived']

# use a train/test split; changing random_state changes the accuracy score,
# and the scores vary a lot, which is why a single test-set score is a high-variance estimate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

# check classification scores of logistic regression
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)
y_pred_proba = logreg.predict_proba(X_test)[:, 1]
[fpr, tpr, thr] = roc_curve(y_test, y_pred_proba)
print('Train/Test split results:')
print(logreg.__class__.__name__+" accuracy is %2.3f" % accuracy_score(y_test, y_pred))
print(logreg.__class__.__name__+" log_loss is %2.3f" % log_loss(y_test, y_pred_proba))
print(logreg.__class__.__name__+" auc is %2.3f" % auc(fpr, tpr))

idx = np.min(np.where(tpr > 0.95))  # index of the first threshold for which the sensitivity (recall) > 0.95

plt.figure()
plt.plot(fpr, tpr, color='coral', label='ROC curve (area = %0.3f)' % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], 'k--')
plt.plot([0, fpr[idx]], [tpr[idx], tpr[idx]], '--', color='blue')
plt.plot([fpr[idx], fpr[idx]], [0, tpr[idx]], '--', color='blue')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (1 - specificity)', fontsize=14)
plt.ylabel('True Positive Rate (recall)', fontsize=14)
plt.title('Receiver operating characteristic (ROC) curve')
plt.legend(loc="lower right")
plt.show()

print("Using a threshold of %.3f " % thr[idx] + "guarantees a sensitivity of %.3f " % tpr[idx] +
"and a specificity of %.3f" % (1-fpr[idx]) +

1/13/20 8
", i.e. a false positive rate of %.2f%%." % (np.array(fpr[idx])*100))

Train/Test split results:
LogisticRegression accuracy is 0.782
LogisticRegression log_loss is 0.504
LogisticRegression auc is 0.839
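Because a single train/test split is a high-variance estimate (see the comment above), a cross-validated accuracy gives a steadier number for the comparison with the ANN. A sketch using the cross_val_score already imported above (10 folds is an assumption):

# 10-fold cross-validated accuracy of the logistic regression model
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=10, scoring='accuracy')
print("10-fold CV accuracy: %.3f (+/- %.3f)" % (cv_scores.mean(), cv_scores.std()))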

https://www.kaggle.com/jamesleslie/titanic-neural-network-for-beginners
Q3 Review the ANN Algorithm

Review the ANN algorithm in ann.py and apply it to the churn modelling dataset (Churn_Modelling.csv).
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('C:/UCM/DatAnalysys/week12/Q3 ANN/Q3 ANN/Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Encode the categorical columns: column 1 (Geography) and column 2 (Gender)
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

# One-hot encode the Geography column and drop one dummy column to avoid the dummy-variable trap
onehotencoder = OneHotEncoder(categorical_features=[1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]

# Splitting the dataset into the Training set and Test set
# (split added here so that X_train, X_test, and y_test exist below; test_size and random_state are typical values)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Initializing the Neural Network
from keras.models import Sequential
from keras.layers import Dense
classifier = Sequential()
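The hidden layers, compilation, and training step are not shown in the excerpt above; a minimal sketch of a typical setup for this dataset follows (the layer sizes, optimizer, batch size, and epoch count are assumptions, not the graded configuration):

# Adding the input layer and the first hidden layer (11 encoded features in)
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11))

# Adding the second hidden layer
classifier.add(Dense(units=6, kernel_initializer='uniform', activation='relu'))

# Adding the output layer (binary churn prediction)
classifier.add(Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# Compiling the ANN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fitting the ANN to the training set
classifier.fit(X_train, y_train, batch_size=10, epochs=100)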

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Creating the Confusion Matrix


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
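The accuracy reported below follows from the confusion matrix: the correct predictions on the diagonal divided by the total number of test examples.

# accuracy = (true negatives + true positives) / total test examples
print((cm[0, 0] + cm[1, 1]) / cm.sum())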

Accuracy = 0.8625

Q4 Review the CNN Algorithm


Review the CNN algorithm in cnn.py and apply it to the Cats vs Dogs dataset.
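cnn.py is not reproduced here; a minimal sketch of the kind of Keras CNN it builds for the Cats vs Dogs images follows (the layer sizes, image size, and the 'dataset/training_set' / 'dataset/test_set' directory layout are assumptions):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

# Build a small CNN: convolution -> pooling -> convolution -> pooling -> flatten -> dense
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))  # cat vs dog is a binary output
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stream the images from disk with light augmentation
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set', target_size=(64, 64),
                                                 batch_size=32, class_mode='binary')
test_set = test_datagen.flow_from_directory('dataset/test_set', target_size=(64, 64),
                                            batch_size=32, class_mode='binary')

classifier.fit_generator(training_set, steps_per_epoch=250, epochs=25,
                         validation_data=test_set, validation_steps=63)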

