
Report CSC 645

Muhd Fakhrullah Bin Mohd Zaini 2020995275 5B

Random Data

Step 1: Open Anaconda Navigator and launch Jupyter Notebook


Save the iris.csv dataset in the same folder as the code
Create a new Python file and start coding

Step 2: Import library pandas in the coding


import pandas as pd
data = pd.read_csv('iris.csv')
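
To confirm the file loaded correctly, a quick check of the data's shape and first rows helps (a minimal sketch using standard pandas calls):

data.shape     # should be (150, 5): 150 samples, 4 features plus the species column
data.head()    # preview the first five rows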

Step 3: Drop the last column


input_data = data.drop(columns = 'species')
input_data

Step 4: Select the target column


target_data = data['species']
target_data


Step 5: Import scikit-learn and randomly split off 20% of the data for testing
from sklearn.model_selection import train_test_split
input_train, input_test, target_train, target_test = train_test_split(input_data, target_data, test_size = 0.2)
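
To verify the split, the shapes of the resulting sets can be printed (a small sketch; with 150 samples, a 20% test size leaves 120 rows for training and 30 for testing):

print(input_train.shape, input_test.shape)    # expect (120, 4) and (30, 4)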

Step 6: After splitting the data, import the SVC model from sklearn.svm and try each of the four kernels
from sklearn.svm import SVC
svcModel = SVC(kernel = 'linear')    # or 'sigmoid', 'rbf', 'poly'
svcModel.fit(input_train, target_train)

Step 7: Make a prediction


output_test = svcModel.predict(input_test)
output_test
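
For a quick side-by-side look at predictions against the true labels, a small pandas sketch (reusing the pd alias imported earlier) can help:

pd.DataFrame({'predicted': output_test, 'actual': target_test.values}).head()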

Step 8: Import sklearn.metrics to get a report


from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(target_test, output_test))
print(confusion_matrix(target_test, output_test))
print(accuracy_score(target_test, output_test))

Step 9: Compare the accuracy of each kernel and choose the one with the highest accuracy
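
Rather than re-running the notebook once per kernel, the whole comparison can be done in one loop (a sketch reusing the train/test variables defined above):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

for kernel in ['linear', 'sigmoid', 'rbf', 'poly']:
    model = SVC(kernel = kernel)                  # try each kernel in turn
    model.fit(input_train, target_train)
    print(kernel, accuracy_score(target_test, model.predict(input_test)))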

Step 10: In this practice the best kernel is poly, as it achieves 100% accuracy

Fixed Data

Step 1: Open Anaconda Navigator and launch Jupyter Notebook


Save the iris.csv dataset in the same folder as the code
Create a new Python file and start coding

Step 2: Split the data into 30 testing rows and 120 training rows in a new Microsoft Excel file using a random generator
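
The same 120/30 split can also be produced directly in pandas instead of Excel (a sketch, not the report's method; it assumes the full iris.csv and writes train and test files with the 'No' running-number column used in the later steps):

import pandas as pd

data = pd.read_csv('iris.csv')
# shuffle the rows with a fixed seed so the split is reproducible
shuffled = data.sample(frac = 1, random_state = 42).reset_index(drop = True)
shuffled.index.name = 'No'                   # becomes the running-number column
shuffled.iloc[:120].to_csv('iris.train.csv')
shuffled.iloc[120:].to_csv('iris.test.csv')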

Step 3: Import library pandas in the coding


import pandas as pd
data_train = pd.read_csv('iris.train.csv')

Step 4: Drop the number column and the last column


input_data_train = data_train.drop(columns = ['No', 'species'])
input_data_train

Step 5: Select the target column


target_data_train = data_train['species']
target_data_train

Step 6: Import the SVC model from sklearn.svm and try each of the four kernels
from sklearn.svm import SVC
svcModel = SVC(kernel = 'linear')    # or 'sigmoid', 'rbf', 'poly'
svcModel.fit(input_data_train, target_data_train)

Step 7: Import the testing data


data_test = pd.read_csv('iris.test.csv')
data_test

Step 8: Drop the number column and the last column from the testing data
input_data_test = data_test.drop(columns = ['No', 'species'])
input_data_test

Step 9: Select the target column of the testing data


target_data_test = data_test['species']
target_data_test

Step 10: Make a prediction on the testing data


output_test = svcModel.predict(input_data_test)
output_test

Step 11: Import sklearn.metrics to get a report


from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
print(classification_report(target_data_test, output_test))
print(confusion_matrix(target_data_test, output_test))
print(accuracy_score(target_data_test, output_test))

Step 12: Compare the accuracy of each kernel
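
The comparison loop from the Random Data section can be reused with the fixed-split variables (SVC and accuracy_score are already imported in the steps above):

for kernel in ['linear', 'sigmoid', 'rbf', 'poly']:
    model = SVC(kernel = kernel)
    model.fit(input_data_train, target_data_train)
    print(kernel, accuracy_score(target_data_test, model.predict(input_data_test)))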

Step 13: In this practice the best kernel is linear, as it achieves 96.7% accuracy
The difference in accuracy is due to the different kernels used; comparing them identifies the most
optimized and efficient kernel technique

Advantages of SVM

• SVM works relatively well when there is a clear margin of separation between classes.
• SVM is effective in high-dimensional spaces.
• SVM is effective in cases where the number of dimensions is greater than the number of
samples.
• SVM is relatively memory efficient.

Disadvantages of SVM

• The SVM algorithm is not suitable for large datasets.
• SVM does not perform well when the dataset is noisy, i.e. when target classes overlap.
• In cases where the number of features for each data point exceeds the number of training
samples, SVM will underperform.
• As the support vector classifier works by placing data points above and below the classifying
hyperplane, there is no probabilistic explanation for the classification (see the sketch after this list).
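
On the last point: scikit-learn can still attach approximate class probabilities to an SVC through Platt scaling by passing probability = True, at extra training cost (a sketch using the Fixed Data variable names; the probabilities are calibrated estimates, not part of the core SVM decision):

probModel = SVC(kernel = 'linear', probability = True)
probModel.fit(input_data_train, target_data_train)
probModel.predict_proba(input_data_test)    # approximate per-class probabilities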
