The Breast Cancer Wisconsin (Diagnostic) dataset. The dataset contains measurements
of breast cancer tumors along with their classification labels, i.e.,
malignant or benign.
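The print statements that follow assume the dataset has already been loaded into the variables labels, feature_names and features. A minimal sketch of that step, assuming the copy of the dataset bundled with scikit-learn (load_breast_cancer) is used; the name label_names is only illustrative:

```python
from sklearn.datasets import load_breast_cancer

# Load the copy of the dataset that ships with scikit-learn.
data = load_breast_cancer()

# Organize the data into the variables used in the rest of the walkthrough.
label_names = data["target_names"]      # class names: 'malignant', 'benign'
labels = data["target"]                 # one 0/1 label per tumor
feature_names = data["feature_names"]   # names of the 30 attributes
features = data["data"]                 # numerical values, one row per tumor

print(label_names)
print(features.shape)
```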
print(labels)
# Each label is a binary value, 0 or 1, where 0 represents a malignant
# tumor and 1 represents a benign tumor.
print(feature_names)
# The names of the 30 features (attributes) recorded for each tumor. We will
# use the numerical values of these features to train our model and to
# predict whether a tumor is malignant or benign.
print(features)
# This is a large array containing the numerical values of the 30 attributes
# for all 569 instances of tumor data.
# The train_test_split() function randomly splits the data according to the
# test_size parameter. Here, 33% of the original data is held out as test
# data (test) and the remaining data (train) is used for training. The labels
# are split the same way, giving train_labels for the training data and
# test_labels for the test data.
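The split described above can be sketched as follows. The test_size of 0.33 and the four variable names come from the text; random_state=42 is an assumption, added only so that the split is the same on every run.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
features, labels = data["data"], data["target"]

# Hold out 33% of the rows as test data; the rest is training data.
# random_state=42 is an assumed seed, used only for reproducibility.
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42
)

# Features and labels are split together, so the row counts match.
print(train.shape, train_labels.shape)
print(test.shape, test_labels.shape)
```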
# The predict() function returns an array of 0s and 1s. These values are the
# predicted tumor classes (0 = malignant, 1 = benign) for the test set.
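For context, here is a hedged sketch of the training and prediction step that produces this array. The text only says the classifier is based on Naive Bayes; using scikit-learn's GaussianNB, the variable names gnb, model and preds, and random_state=42 are all assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data["data"], data["target"], test_size=0.33, random_state=42
)

# Fit a Gaussian Naive Bayes classifier on the training data only.
gnb = GaussianNB()
model = gnb.fit(train, train_labels)

# Predict one class (0 or 1) for every row of the test set.
preds = model.predict(test)
print(preds[:10])
```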
# importing the accuracy measuring function
from sklearn.metrics import accuracy_score
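accuracy_score compares the true labels with the predicted ones and returns the fraction that match. A small illustration on made-up labels (the two lists below are hypothetical, not taken from the dataset); in the walkthrough the same call would receive test_labels and the array returned by predict().

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, for illustration only.
true_labels = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]

# Four of the five predictions match the true labels.
accuracy = accuracy_score(true_labels, predicted)
print(accuracy)  # 0.8
```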
On the test data, this machine learning classifier based on the Naive Bayes
algorithm is 94.15% accurate in predicting whether a tumor is malignant or benign.
References
https://www.geeksforgeeks.org/ml-cancer-cell-classification-using-scikit-learn/