Classification of Ship Images - Kaggle PDF
Hackathon Timeframe: Friday, May 24, 2019 - Sunday, June 9, 2019
Problem Statement
Ship or vessel detection has a wide range of applications in maritime safety, fisheries management, marine pollution, defence and maritime security, protection from piracy, illegal migration, etc.
With this in mind, a governmental Maritime and Coastguard Agency is planning to deploy a computer-vision-based automated system to identify the ship type solely from images taken by survey boats. You have been hired as a consultant to build an efficient model for this project.
Dataset Description
There are 6252 images in the training set and 2680 images in the test set. The ship categories and their corresponding codes in the dataset are as follows:
1: Cargo
2: Military
3: Carrier
4: Cruise
5: Tankers
Variable Definition
image: Name of the image in the dataset (ID column)
category: Ship category code
Evaluation Metric
The evaluation metric is the weighted F1 score.
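Assuming the metric is the support-weighted F1 score (the validation cell later in this notebook calls f1_score with average="weighted"), each class's F1 = 2TP / (2TP + FP + FN) is averaged with weights proportional to the number of true images in that class. The NumPy sketch below computes it by hand on a toy set of labels; `weighted_f1` is a hypothetical helper, and for these category codes it should agree with scikit-learn's f1_score(..., average="weighted").

```python
import numpy as np

def weighted_f1(y_true, y_pred, labels):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    f1s, supports = [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
        supports.append(np.sum(y_true == c))  # number of true samples of class c
    supports = np.asarray(supports, dtype=float)
    return float(np.dot(f1s, supports) / supports.sum())

# toy labels using the five category codes (1=Cargo ... 5=Tankers)
y_true = [1, 1, 1, 5, 5, 2, 3, 4]
y_pred = [1, 1, 5, 5, 1, 2, 3, 4]
print(round(weighted_f1(y_true, y_pred, labels=[1, 2, 3, 4, 5]), 4))
```

Because the weights follow class support, mistakes on Cargo (about a third of the training images) move the score more than the same number of mistakes on Cruise.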
Misc Rules
Use of external datasets is not allowed; however, transfer learning can be used to build the solution.
https://www.kaggle.com/teeyee314/classification-of-ship-images 1/39
5/15/2020 Classification of Ship Images | Kaggle
In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import cv2
import os
import gc
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, train_test_split
from keras.preprocessing.image import load_img, img_to_array, array_to_img
from keras import callbacks
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.optimizers import Adam
from keras.models import load_model
import warnings
warnings.filterwarnings('ignore')
print(os.listdir("../input"))
Basic EDA
In [2]:
# define the path for loading .jpg images
path = "../input/train/images"
train_files = pd.read_csv('../input/train/train.csv',
dtype={'image': 'object', 'category': 'int8'})
test_files = pd.read_csv('../input/test_ApKoW4T.csv')
In [3]:
train_files.head()
Out[3]:
image category
0 2823080.jpg 1
1 2870024.jpg 1
2 2662125.jpg 2
3 2900420.jpg 3
4 2804883.jpg 2
In [4]:
test_files.head()
Out[4]:
image
0 1007700.jpg
1 1011369.jpg
2 1051155.jpg
3 1062001.jpg
4 1069397.jpg
In [5]:
# display missing categories in train
train_files[train_files.isnull().any(axis=1)]
Out[5]:
image category
No missing values.
In [6]:
# map ship category codes to readable names
ship = {1: 'Cargo',
        2: 'Military',
        3: 'Carrier',
        4: 'Cruise',
        5: 'Tankers'}
In [7]:
# Create readable ship-name labels for the training set (for interpretability)
train_files['ship'] = train_files['category'].map(ship).astype('category')
labels = list(train_files['ship'].unique())
In [8]:
# display count of ship types
plt.title('Count of each ship type')
sns.countplot(y=train_files['ship'].values)
plt.show()
gc.collect()
Out[8]:
2293
In [9]:
train_files['ship'].value_counts(normalize=False)
Out[9]:
Cargo 2120
Tankers 1217
Military 1167
Carrier 916
Cruise 832
Name: ship, dtype: int64
In [10]:
train_files['ship'].value_counts(normalize=True)
Out[10]:
Cargo 0.339091
Tankers 0.194658
Military 0.186660
Carrier 0.146513
Cruise 0.133077
Name: ship, dtype: float64
Since the classes in the training set are imbalanced, we will display a confusion matrix to visualize where the neural network classifier is having trouble. We will also stratify train_test_split so that the training and validation splits keep the same class distribution.
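Passing stratify= to train_test_split splits each class separately, so the validation set keeps roughly the same class proportions as the full training set. The function below is a simplified NumPy sketch of that idea (hypothetical name; scikit-learn's own implementation rounds the per-class counts differently), run on labels rebuilt from the class counts shown above.

```python
import numpy as np

def stratified_split_indices(labels, test_size=0.2, seed=2019):
    """Simplified stratified split: shuffle each class and hold out test_size of it."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))  # shuffled indices of class c
        n_test = int(round(test_size * len(idx)))           # per-class hold-out count
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# class counts from the training set value_counts above
counts = {'Cargo': 2120, 'Tankers': 1217, 'Military': 1167, 'Carrier': 916, 'Cruise': 832}
labels = np.repeat(list(counts), list(counts.values()))
tr, te = stratified_split_indices(labels)
for name in counts:
    # class share in the train split vs. the held-out split
    print(name, round(np.mean(labels[tr] == name), 4), round(np.mean(labels[te] == name), 4))
```

Without stratification, a random 20% split could by chance under-represent the smaller classes (Carrier, Cruise) in validation, making the validation F1 noisier.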
In [11]:
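from sklearn.preprocessing import OneHotEncoder
# one-hot encode the category codes 1-5 (columns follow the sorted codes)
# note: scikit-learn >= 1.2 names this argument sparse_output instead of sparse
ohe = OneHotEncoder(dtype='int8', sparse=False)
y_train = ohe.fit_transform(train_files['category'].values.reshape(-1,1))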
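OneHotEncoder orders its columns by the sorted category codes, so column 0 corresponds to code 1 (Cargo) and column 4 to code 5 (Tankers). Class indices recovered with np.argmax therefore come out as 0-4 and need + 1 to map back to the competition's codes. A minimal NumPy sketch of the same encoding and its inverse (the helpers `one_hot` and `decode` are hypothetical):

```python
import numpy as np

def one_hot(codes, n_classes=5):
    """Encode 1-based category codes as int8 one-hot rows (column k <-> code k+1)."""
    codes = np.asarray(codes)
    out = np.zeros((len(codes), n_classes), dtype='int8')
    out[np.arange(len(codes)), codes - 1] = 1
    return out

def decode(probs):
    """Map one-hot rows or softmax outputs back to 1-based category codes."""
    return np.argmax(probs, axis=1) + 1

codes = np.array([1, 1, 2, 3, 2])  # categories of the first five training rows shown above
Y = one_hot(codes)
print(Y)
print(decode(Y))  # round-trips to the original codes
```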
In [12]:
# Since most pre-trained models have a specific input dimension,
# we will need to set the target_size to match the pre-trained model input shape.
# Increasing the image size requires more RAM.
def load(what='train', target_size=(224,224)):
    array = []
    if what == 'train':
        for file in tqdm(train_files['image'].values):
            img = load_img(os.path.join(path, file), target_size=target_size)
            img = img_to_array(img)/255.  # normalize image tensor to [0, 1]
            array.append(img)
    elif what == 'test':
        # test images are read from the same `path` folder as the train images
        for file in tqdm(test_files['image'].values):
            img = load_img(os.path.join(path, file), target_size=target_size)
            img = img_to_array(img)/255.  # normalize image tensor to [0, 1]
            array.append(img)
    gc.collect()
    return np.asarray(array)
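To put numbers on the RAM comment above: assuming each image is held as float32 (4 bytes per value, the Keras default for img_to_array), one 224×224×3 image takes 602,112 bytes. The small helper below (hypothetical name) multiplies this out for both splits and for a larger input size.

```python
def array_nbytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed to hold n_images in one float32 (4-byte) array."""
    return n_images * height * width * channels * bytes_per_value

train_bytes = array_nbytes(6252)
test_bytes = array_nbytes(2680)
print(f'train: {train_bytes / 1e9:.2f} GB, test: {test_bytes / 1e9:.2f} GB')
# doubling the side length (448x448) quadruples the footprint
print(f'train at 448x448: {array_nbytes(6252, 448, 448) / 1e9:.2f} GB')
```

This prints about 3.76 GB for train and 1.61 GB for test (15.06 GB for train at 448×448), which is why the test array is deleted once it has been visualized. np.asarray also copies the list of per-image arrays, so peak usage while loading is briefly close to double these figures.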
In [13]:
# Load Train and Test
X_train = load()
test = load('test')
print(f'train dtype: {X_train.dtype}')
print(f'test dtype: {test.dtype}')
print(f'train shape: {X_train.shape}')
print(f'test shape: {test.shape}')
In [14]:
# visualize the first 28 train images
plt.figure(figsize=(12,24))
for i in range(28):
    plt.subplot(7, 4, i + 1)
    plt.title(f'{train_files["ship"].values[i]}')
    plt.imshow(X_train[i])
    plt.axis('off')
plt.show()
gc.collect()
Out[14]:
44777
In [15]:
# visualize the first 28 test images
plt.figure(figsize=(12,24))
for i in range(28):
    plt.subplot(7, 4, i + 1)
    plt.imshow(test[i])
    plt.axis('off')
plt.show()
del test  # free up space for training
gc.collect()
Out[15]:
44726
There are black-and-white images mixed in with color images, and some of the photos are old. Some images show steam/smoke coming out of the smokestacks, some contain multiple ships, and some contain clouds or other background scenery. Some show a ship lengthwise (side view), while others show it head-on from the bow. Some images are low contrast.
To address some of these issues (grayscale images, rotation, noise, etc.), we will perform data augmentation to create a more robust training set for the neural network to learn from. This may help the model generalize better to the test set and reduce overfitting to some extent.
Looking only at the first 28 images from train and test, it is not possible to tell how many edge cases there are, but at first glance the train and test images appear fairly similar. Validation scores should therefore reflect test scores, unless the model overfits.
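The training cell handles rotation, flips and shifts with ImageDataGenerator. For the grayscale and noise concerns listed above, one option is a custom per-image function; the NumPy sketch below is illustrative only, and the name `random_gray_noise`, its probability and its noise level are assumptions rather than the author's settings.

```python
import numpy as np

_rng = np.random.default_rng(0)

def random_gray_noise(img, p_gray=0.2, noise_std=0.02, rng=_rng):
    """Hypothetical augmentation for one HxWx3 image scaled to [0, 1]:
    with probability p_gray convert it to 3-channel grayscale, then add Gaussian noise."""
    out = img.astype(np.float32, copy=True)
    if rng.random() < p_gray:
        # ITU-R BT.601 luma weights; the gray value is copied into all 3 channels
        gray = out @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
        out = np.repeat(gray[..., None], 3, axis=-1)
    out += rng.normal(0.0, noise_std, size=out.shape).astype(np.float32)
    return np.clip(out, 0.0, 1.0)  # stay inside the normalized [0, 1] range

img = np.random.default_rng(1).random((224, 224, 3), dtype=np.float32)
aug = random_gray_noise(img, p_gray=1.0)
print(aug.shape, aug.dtype)
```

In Keras such a function could be passed as ImageDataGenerator(..., preprocessing_function=random_gray_noise), which applies it to each image after the built-in transformations; the probability and noise level would need tuning against the validation score.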
In [16]:
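class printf1(callbacks.Callback):
    def __init__(self, X_train, y_train):
        super(printf1, self).__init__()
        self.bestf1 = 0
        self.X_train, self.y_train = X_train, y_train
    def on_epoch_end(self, epoch, logs=None):
        # print weighted F1 on train and validation data (Keras 2.2.x sets self.validation_data)
        X_val, y_val = self.validation_data[0], self.validation_data[1]
        tr = f1_score(np.argmax(self.y_train, 1), np.argmax(self.model.predict(self.X_train), 1), average='weighted')
        va = f1_score(np.argmax(y_val, 1), np.argmax(self.model.predict(X_val), 1), average='weighted')
        self.bestf1 = max(self.bestf1, va)
        print(f'Train F1 Score: {tr:.4f}\nValid F1 Score: {va:.4f}')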
In [17]:
# to plot training/validation history object
def plt_dynamic(x, vy, ty, ax, colors=['b'], title=''):
    ax.plot(x, vy, 'b', label='Validation Loss')
    ax.plot(x, ty, 'r', label='Train Loss')
    plt.legend()
    plt.grid()
    plt.title(title)
    ax.figure.canvas.draw()  # redraw the figure that owns `ax`
    plt.show()
    gc.collect()
In [18]:
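# https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Greens):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'
    # rows: true labels, columns: predicted labels (0-based class indices)
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(classes)))
    if normalize:
        # divide each row by its total, so the diagonal holds per-class recall
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    print(cm)
    fig, ax = plt.subplots(figsize=(6,6))
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')
    # annotate each cell; use white text on dark cells for readability
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    plt.show()
    gc.collect()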
In [19]:
# make sure internet is enabled in the settings tab to the right
# do not include the last Fully Connected(FC) layer
from keras.applications.xception import Xception
model = Xception(include_top=False, input_shape=(224,224,3))
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels_notop.h5
83689472/83683744 [==============================] - 3s 0us/step
In [20]:
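from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam
x = GlobalAveragePooling2D()(model.output)
#x = Dense(6, activation='relu')(x)
output = Dense(5, activation='softmax')(x)
# attach the new 5-class head to the Xception base and compile it
model = Model(inputs=model.input, outputs=output)
model.compile(optimizer=Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])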
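With include_top=False and input_shape=(224,224,3), Xception's last feature map should be 7×7×2048, assuming its overall downsampling factor of 32 (224 / 32 = 7). GlobalAveragePooling2D averages each of the 2048 channels over that grid, and Dense(5, activation='softmax') turns the resulting vector into five class probabilities using 2048 × 5 + 5 = 10,245 parameters. The NumPy sketch below reproduces that head with random stand-in features and weights, not the trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the Xception feature map on a batch of 2 images
features = rng.random((2, 7, 7, 2048), dtype=np.float32)

pooled = features.mean(axis=(1, 2))                          # GlobalAveragePooling2D -> (2, 2048)
W = rng.normal(0, 0.01, size=(2048, 5)).astype(np.float32)   # random stand-in Dense weights
b = np.zeros(5, dtype=np.float32)
logits = pooled @ W + b                                      # Dense(5) before the activation
logits -= logits.max(axis=1, keepdims=True)                  # subtract row max for a stable exp()
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

print(pooled.shape, probs.shape)
print(probs.sum(axis=1))   # each row sums to 1 (up to float rounding)
print(W.size + b.size)     # parameters in the Dense head
```

Averaging instead of flattening keeps the head small: flattening the 7×7×2048 map would feed 100,352 values into the Dense layer rather than 2048.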
In [21]:
# visualize the Xception model architecture
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
SVG(model_to_dot(model).create(prog='dot', format='svg'))
Out[21]:
[SVG model graph: input_1 (InputLayer), followed by the Xception blocks block1 to block14 (Conv2D/SeparableConv2D, BatchNormalization, Activation and MaxPooling2D layers joined by residual Add layers), then the global average pooling layer and the final dense layer]
In [22]:
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train,
stratify=y_train,
random_state=2019,
test_size=0.2)
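With a float test_size, train_test_split rounds the validation count up and gives the remainder to training. For 6,252 images and test_size=0.2 that means 1,251 validation and 5,001 training images; with the batch_size of 8 used in the training cell, 5,001 / 8 = 625.125, so one pass over the training set takes 626 batches, consistent with the 626/625 progress counter in the training log. The arithmetic:

```python
import math

n_samples, test_size, batch_size = 6252, 0.2, 8

# scikit-learn rounds the test share up and assigns the rest to train
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val
steps = math.ceil(n_train / batch_size)   # batches needed to see every training image once

print(n_train, n_val, steps)
```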
In [23]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
gc.collect()
Out[23]:
48967
Because no external data was allowed in this competition, it was crucial to augment (modify) the existing images through a data generator. Data augmentation can either hurt or improve model performance, so each transformation needs to be considered case by case. For example, adding vertical flips when none of the ships appear upside-down will likely reduce the model's ability to generalize to the test data.
Decaying the learning rate over the course of training can also help the model converge more smoothly.
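The LearningRateScheduler in the training cell applies a fixed exponential decay rather than an adaptive rule: Keras calls the schedule with the 0-based epoch index, so training starts at 1e-4 and each epoch multiplies the rate by 0.95. A stand-alone sketch of the same schedule, where `annealer` is simply a named version of the notebook's lambda:

```python
BASE_LR, DECAY = 1e-4, 0.95

def annealer(epoch):
    """lr = 1e-4 * 0.95**epoch, the same schedule as the notebook's lambda."""
    return BASE_LR * DECAY ** epoch

for epoch in (0, 1, 10, 18, 49):
    print(epoch, f'{annealer(epoch):.3e}')
```

After 10 epochs the rate is about 6.0e-5, and by the 50th epoch (index 49) it has fallen to about 8.1e-6, roughly 12 times smaller than at the start.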
In [24]:
# use ImageDataGenerator to augment training data
from keras.preprocessing.image import ImageDataGenerator
batch_size = 8
epochs = 50
datagen = ImageDataGenerator(rotation_range=45,
                             horizontal_flip=True,
                             width_shift_range=0.5,
                             height_shift_range=0.5,
                             dtype='float32')
# yield shuffled, augmented batches from the (already normalized) training arrays
train_generator = datagen.flow(X_train, y_train, batch_size=batch_size)
f1 = printf1(X_train, y_train)
# keep the weights with the lowest validation loss
cp = ModelCheckpoint('best.hdf5', monitor='val_loss', save_best_only=True)
# exponential decay: lr = 1e-4 * 0.95**epoch (epoch is 0-based)
annealer = LearningRateScheduler(lambda x: 1e-4 * 0.95 ** x)
history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=int(np.ceil(len(X_train) / batch_size)),
                              validation_data=(X_test, y_test),
                              callbacks=[cp, f1, annealer],
                              epochs=epochs)
WARNING:tensorflow:From /opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/50
626/625 [==============================] - 113s 181ms/step - loss: 0.7094 - acc: 0.7328 - val_loss: 0.3811 - val_acc: 0.8673
Train F1 Score: 0.8958
Valid F1 Score: 0.8604
Epoch 2/50
626/625 [==============================] - 103s 165ms/step - loss: 0.4452 - acc: 0.8371 - val_loss: 0.3679 - val_acc: 0.8753
Train F1 Score: 0.9256
Valid F1 Score: 0.8718
Epoch 3/50
626/625 [==============================] - 103s 165ms/step - loss: 0.3532 - acc: 0.8704 - val_loss: 0.2425 - val_acc: 0.9169
Train F1 Score: 0.9592
Valid F1 Score: 0.9170
Epoch 4/50
626/625 [==============================] - 104s 166ms/step - loss: 0.3104 - acc: 0.8836 - val_loss: 0.2679 - val_acc: 0.9097
Train F1 Score: 0.9495
Valid F1 Score: 0.9087
Epoch 5/50
626/625 [==============================] - 104s 166ms/step - loss: 0.2693 - acc: 0.9014 - val_loss: 0.2196 - val_acc: 0.9201
Train F1 Score: 0.9692
Valid F1 Score: 0.9204
Epoch 6/50
626/625 [==============================] - 104s 166ms/step - loss: 0.2356 - acc: 0.9155 - val_loss: 0.2140 - val_acc: 0.9329
Train F1 Score: 0.9778
Valid F1 Score: 0.9328
Epoch 7/50
626/625 [==============================] - 104s 166ms/step - loss: 0.2267 - acc: 0.9143 - val_loss: 0.3099 - val_acc: 0.9049
Train F1 Score: 0.9550
Valid F1 Score: 0.8995
Epoch 8/50
626/625 [==============================] - 104s 166ms/step - loss: 0.2019 ...
Epoch 17/50
626/625 [==============================] - 104s 166ms/step - loss: 0.1064 - acc: 0.9609 - val_loss: 0.2312 - val_acc: 0.9353
Train F1 Score: 0.9966
Valid F1 Score: 0.9351
Epoch 18/50
626/625 [==============================] - 104s 166ms/step - loss: 0.1041 - acc: 0.9619 - val_loss: 0.2021 - val_acc: 0.9432
Train F1 Score: 0.9980
Valid F1 Score: 0.9431
Epoch 19/50
421/625 [===================>..........] - ETA: 28s - loss: 0.0886 - acc: 0.9667
In [25]:
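# print the competition metric: weighted F1 score on the validation split
true = np.argmax(y_test, axis=1)
best = load_model('best.hdf5')
valid_pred_best = np.argmax(best.predict(X_test), axis=1)
best_f1_score = f1_score(true, valid_pred_best, average="weighted")
print(f'Best model weighted F1 Score: {best_f1_score:.4f}')
# confusion matrix on the validation split (index k corresponds to category code k+1)
plot_confusion_matrix(true, valid_pred_best,
                      classes=['Cargo', 'Military', 'Carrier', 'Cruise', 'Tankers'])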
As the confusion matrix shows, the neural network has a little trouble differentiating between Cargo ships and Tankers. Xception converges quite nicely; there is some overfitting, but not much (by epoch 18, the train F1 score is 0.9980 versus a validation F1 score of 0.9431).
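Reading a confusion matrix row by row gives per-class recall, and its largest off-diagonal cell shows which pair of classes is confused most often. A NumPy sketch on toy labels (not the notebook's actual predictions), using the same 0-based class indices that np.argmax produces in the validation cell; `confusion` and `CLASSES` are hypothetical names.

```python
import numpy as np

CLASSES = ['Cargo', 'Military', 'Carrier', 'Cruise', 'Tankers']  # index k <-> code k+1

def confusion(y_true, y_pred, n_classes=5):
    """Rows = true class, columns = predicted class (0-based indices)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

# toy labels: two Cargo images predicted as Tankers, one Tanker predicted as Cargo
y_true = np.array([0, 0, 0, 0, 1, 2, 3, 4, 4])
y_pred = np.array([0, 0, 4, 4, 1, 2, 3, 4, 0])
cm = confusion(y_true, y_pred)
recall = cm.diagonal() / cm.sum(axis=1)   # correct predictions / true count, per class
off = cm.copy()
np.fill_diagonal(off, 0)                  # keep only the mistakes
i, j = np.unravel_index(off.argmax(), off.shape)
for name, r in zip(CLASSES, recall):
    print(f'{name}: {r:.2f}')
print(f'most frequent confusion: {CLASSES[i]} -> {CLASSES[j]} ({off[i, j]} images)')
```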
In [26]:
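test = load('test')
sub = pd.read_csv('../input/sample_submission_ns2btKE.csv')
sub['category'] = np.argmax(best.predict(test), axis=1) + 1  # argmax gives 0-4; codes are 1-5
sub.head()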
Out[26]:
image category
0 1007700.jpg 4
1 1011369.jpg 4
2 1051155.jpg 4
3 1062001.jpg 2
4 1069397.jpg 4
In [27]:
sub['category'].map(ship).value_counts(normalize=True)
Out[27]:
Cargo 0.338806
Tankers 0.191045
Military 0.188060
Carrier 0.148507
Cruise 0.133582
Name: category, dtype: float64
End Notes
This concludes my solution to the Analytics Vidhya Computer Vision Hackathon. While this notebook is fairly basic, it is a good template to build on and learn from, especially for beginners in image classification. Much more could be added, such as balancing the classes through oversampling or further data augmentation with external libraries/scripts. Even so, this notebook achieved 95%+ on both the public and private leaderboards, and 5-fold CV added roughly another percentage point. Due to memory constraints, oversampling and 5-fold CV are not shown in this notebook.
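Random oversampling, mentioned above as a possible extension, duplicates images from the smaller classes until every class matches the largest one. A minimal index-level sketch, assuming it is applied only to the training split (after train_test_split) so that duplicated images never leak into validation; `oversample_indices` is a hypothetical helper.

```python
import numpy as np

def oversample_indices(labels, seed=2019):
    """Indices that resample every class (with replacement) up to the majority count."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(labels == c)
        # draw extra copies from this class; the majority class gets none
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    return rng.permutation(np.concatenate(idx))  # shuffle so classes are interleaved

labels = np.array([1] * 6 + [5] * 3 + [4] * 2)
idx = oversample_indices(labels)
print(np.unique(labels[idx], return_counts=True))
```

For the arrays in this notebook one would pass np.argmax(y_train, axis=1) as the labels and index both X_train and y_train with the result; X_train[idx] builds a new, larger copy of the image array, which is exactly the memory cost cited above.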