
CNN IMAGE INPUT SIZE

The field of view (FOV) of an optical system is often expressed as the maximum angular size of the object as seen from the entrance pupil [CITATION JEG04]. The Cubert S185 camera has an angular field of view (AFOV) of 30.75°. With a working distance from the oil palm sample to the front principal plane of the camera of WD = 150 mm, the horizontal field of view (HFOV) is 2 x 150 mm x tan(30.75°/2) = 82.49 mm. This corresponds to an area of 8.249 cm x 8.249 cm on the table top, which is approximately the area of the oil palm sample captured by the Cubert S185 camera. Refer to Equation 1 below.

Equation 1 HFOV Equation

\[
\mathrm{HFOV}\,(\mathrm{mm}) = 2 \times \mathrm{WD}\,(\mathrm{mm}) \times \tan\left(\frac{\mathrm{AFOV}\,(^\circ)}{2}\right)
\]

[CITATION Und19]
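The HFOV calculation can be checked numerically. The following is a minimal sketch using only the values quoted in the text (WD = 150 mm, AFOV = 30.75°); the function name `hfov_mm` is an illustrative choice and not part of any camera software.

```python
import math


def hfov_mm(working_distance_mm: float, afov_deg: float) -> float:
    """Horizontal field of view in mm: 2 * WD * tan(AFOV / 2)."""
    half_angle_rad = math.radians(afov_deg) / 2.0
    return 2.0 * working_distance_mm * math.tan(half_angle_rad)


# Values quoted in the text: WD = 150 mm, AFOV = 30.75 degrees.
hfov = hfov_mm(150.0, 30.75)
print(f"HFOV = {hfov:.2f} mm ({hfov / 10:.3f} cm)")
```

The result agrees with the 82.49 mm (8.249 cm) stated above.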

A typical CNN can handle images of one particular size only, because it usually ends with a fully connected layer for classification or regression, and the shapes of that layer's weights, biases and outputs are derived from the input image shape. The convolutional layers do not have this restriction. Once the kernel size and step size of a convolution are defined, the input and output shapes of each subsequent layer are determined automatically, and the shapes of the weights and biases of these layers are not affected by the shape of the input. For example, when a 3 x 3 kernel moves across an image with a step of 1, it does not matter whether the image is 256 x 256 or 1024 x 1024, as long as the image is larger than the convolutional filter. Fully convolutional networks (FCNs), which contain only convolutional and activation layers, can therefore handle images of various shapes [CITATION Sax17].

Current practice favours a small filter size (3 x 3). With large input images, however, the result is a shallow network, because only a few layers then fit into GPU memory. The rule of thumb is therefore to use images of about 256 x 256 pixels for ImageNet-scale networks and about 96 x 96 pixels for smaller and easier problems [CITATION Shm16].
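The dependence of layer shapes on image size can be illustrated with a short calculation. This is a minimal sketch using the standard output-size formula for a convolution without padding; the channel counts (1 input channel, 8 filters) and the 10 output units are hypothetical values chosen only for illustration.

```python
def conv_output_size(input_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one axis."""
    return (input_size + 2 * padding - kernel) // stride + 1


def conv_param_count(kernel: int, in_channels: int, out_channels: int) -> int:
    """Weights plus biases of a convolutional layer (independent of image size)."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_param_count(feature_map_size: int, channels: int, units: int) -> int:
    """Weights plus biases of a fully connected layer on the flattened feature map."""
    return feature_map_size * feature_map_size * channels * units + units


# Hypothetical layer: 3 x 3 kernel, step 1, 1 input channel, 8 filters, 10 output units.
for image_size in (256, 1024):
    out = conv_output_size(image_size, kernel=3, stride=1)
    print(
        f"{image_size} x {image_size} input: feature map {out} x {out}, "
        f"conv parameters {conv_param_count(3, 1, 8)}, "
        f"dense parameters {dense_param_count(out, 8, 10)}"
    )
```

The convolutional layer has the same number of parameters for both image sizes, whereas the fully connected layer's parameter count changes with the feature map size, which is why a network with a fully connected layer is tied to one input size.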

The figure below shows a raw 1000 x 1000 pixel (1 megapixel) Cubert S185 camera image of an oil palm leaf with bagworm infestation present.

Hence, following the image size rule of thumb of about 256 x 256 pixels, the hyperspectral image (HSI) can be subdivided into 16 sub-sample images (for example, a 4 x 4 grid of 250 x 250 pixel tiles) to serve as the CNN input. As mentioned above, a small 3 x 3 pixel filter will be used as the trial filter to process the images in the network. Refer to the figure below for the segmented image.
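The subdivision into 16 sub-sample images can be sketched as follows. The text does not state the grid layout, so a 4 x 4 grid of equal tiles is assumed here; the function name `tile_boxes` is hypothetical and only the tile boundaries are computed.

```python
def tile_boxes(height: int, width: int, rows: int, cols: int):
    """Return (top, left, bottom, right) pixel boxes of a rows x cols grid of equal tiles."""
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    return [
        (r * tile_h, c * tile_w, (r + 1) * tile_h, (c + 1) * tile_w)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed layout: 1000 x 1000 pixel image split into a 4 x 4 grid.
boxes = tile_boxes(1000, 1000, rows=4, cols=4)
print(len(boxes), "tiles, first:", boxes[0], "last:", boxes[-1])
```

Each box describes one 250 x 250 pixel tile, which is close to the 256 x 256 rule of thumb. If the image is held as an array, a tile could be cut out by slicing rows top to bottom and columns left to right.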
