
Intrusion detection on oil pipeline right of way using

monogenic signal representation


Binu M. Nair, Varun Santhaseelan, Chen Cui, and Vijayan K. Asari
University of Dayton Vision Lab, Dayton, OH 45469, USA
ABSTRACT
We present an object detection algorithm to automatically detect and identify possible intrusions such as construction vehicles and equipment on the regions designated as the pipeline right-of-way (ROW) from high resolution aerial imagery. The pipeline industry has buried millions of miles of oil pipelines throughout the country
and these regions are under constant threat of unauthorized construction activities. We propose a multi-stage
framework which uses a pyramidal template matching scheme in the local phase domain by taking a single high
resolution training image to classify a construction vehicle. The proposed detection algorithm makes use of the
monogenic signal representation to extract the local phase information. Computing the monogenic signal from
a two dimensional object region enables us to separate out the local phase information (structural details) from
the local energy (contrast) thereby achieving illumination invariance. The first stage involves the local phase
based template matching using only a single high resolution training image in a local region at multiple scales.
Then, using local phase histogram matching, the orientation of the detected region is determined and a
voting scheme assigns a weight to each resulting cluster. The final stage involves the selection of clusters
based on the number of votes attained; using the histogram of oriented phase feature descriptor, the object
is located at the correct orientation and scale. The algorithm is successfully tested on four different datasets
containing imagery with varying image resolution and object orientation.
Keywords: Monogenic Signal, Template Matching, Local Phase, Object Detection, Earth Mover's Distance,
Phase Histogram, Histogram of Oriented Phase

1. INTRODUCTION
Research in autonomous detection of machinery threats on the energy pipeline right of way from imagery captured
at altitudes of around 3000 feet is challenging as a multitude of factors come into play. Issues in aerial imagery
such as complex appearances of machinery (belonging to a similar kind), the resolution of the object, the noise
present due to the image capture process and more importantly the height and angle at which the object has been
captured are some of the challenges that an object detection algorithm from aerial imagery needs to address. The
main objectives in this research are to tackle these issues by careful analysis of the imagery and to develop a full-fledged system which can detect potential threats and aid human analysts in threat evaluation and subsequent
actions. To be more specific, the objectives in relation to machinery threat detection are as
follows:
To detect and classify various types of construction machinery on the pipeline ROW (Right of Way): The
characterization of each construction equipment type is according to the following criteria. An illustration
showing the different types of equipment is shown in Figure 1.
- Color, size and type of the machinery or equipment.
- Outer and inner structural details of the machinery which are visible at around 1000-3000 feet above the ground.
Further author information: (Send correspondence to Binu M Nair)
Binu M Nair: E-mail: nairb1@udayton.edu
Dr. Vijayan K Asari: E-mail: vasari1@udayton.edu

Signal Processing, Sensor Fusion, and Target Recognition XXII, edited by Ivan Kadar,
Proc. of SPIE Vol. 8745, 87451U (2013). SPIE CCC code: 0277-786X/13/$18. doi: 10.1117/12.2015640

(a) Excavator

(b) Backhoe

(c) Mini-Excavator

(d) Trencher

(e) SkidSteer

Figure 1: Illustration of different kinds of construction machinery from Vendor 1 Dataset

(a) Low Illumination

(b) Cast Shadows

(c) Different Viewpoint

(d) Different Orientation

(e) Overexposure

(f) High Resolution

(g) Motion Blur

(h) Different Scale

Figure 2: Illustration of the imaging constraints on construction machinery from Vendor 1 Dataset


To have an object description which can uniquely represent a type of construction equipment under different
constraints and challenges: Some of the challenges and constraints are due to the illumination present and
the viewpoint of the object. These constraints are listed below, and an illustration of them is
shown in Figure 2.
- Low or uneven illumination on the construction equipment.
- Cast shadows of machinery on the ground.
- Different viewpoints of machinery captured due to varying height of the aircraft.
- Different image resolution/scale due to varying height of image capture.
- Overexposure of some or all of the construction equipment to lighting.
- Varying spatial resolution of objects due to different properties/characteristics of the optical sensor capturing the image.
To generalize the algorithm parameters in order to have consistent performance across different datasets
(captured by various vendors, where each vendor uses sensors with different characteristics): The sensors
can vary in specifications such as
- The color filter model of the sensor used in capturing the image.
- The ground spatial distance (GSD) of the sensor, or spatial resolution.
- The image resolution and the height at which the sensor captures the data.


2. RELATED WORK AND THEORY


In this section, we give a brief description of monogenic signal analysis for the computation of local phase
information and provide relevant background on object detection from aerial imagery, along with
a few well-known feature descriptors that are used in object detection.

2.1 Local Phase from Monogenic Signal Analysis


The importance of phase information in the representation of structure in an image is illustrated in Gonzalez
and Woods.1 The magnitude information is a measure of the strength of the signal. However, the estimation
of local phase in an image is not trivial. In order to define the local structure in a one dimensional signal, the
analytic signal representation given in Equation 1 is of importance:

f_A(x) = f(x) + i f_H(x)    (1)

Here f(x) is the original signal and f_H(x) is its Hilbert transform, which can be calculated in the frequency domain
as given in Equation 2:

F_H(u) = H_1(u) F(u)    (2)

Here F(u) is the frequency domain representation of f(x) and H_1(u) = -i sgn(u) is the transfer function of the Hilbert
transform. Therefore, from Equation 1, the phase of the signal can be computed as

φ(x) = arctan(f_H(x), f(x))    (3)

There have been multiple techniques that attempt to extend the analytic signal representation to
multiple dimensions, such as the use of steerable filters. However, those techniques are not purely isotropic in
nature. The isotropic extension of the analytic signal representation is given by the monogenic signal2 as in
Equation 4:

f_M(x1, x2) = (f, f_R)(x1, x2)    (4)

where f_R(x1, x2) = (h * f)(x1, x2) and h = (h1, h2) is the Riesz kernel. The spatial and frequency domain
representations of the Riesz kernel are given in Equations 5 and 6 respectively:

(h1, h2)(x1, x2) = ( x1 / (2π|x|³), x2 / (2π|x|³) ),  x = (x1, x2) ∈ R²    (5)

(H1, H2)(u1, u2) = ( i u1 / |u|, i u2 / |u| ),  u = (u1, u2) ∈ R²    (6)

As in Equation 3, the local phase can be computed for the two dimensional signal from the monogenic signal
representation as in Equation 7:

r(x) = (f_R(x) / |f_R(x)|) arctan(|f_R(x)| / f(x)) = φ(x) exp(iθ(x))    (7)

where φ(x) is the local phase and θ(x) is the local orientation. Both are computed from the Riesz transform
of the signal as shown in Equations 8 and 9:

φ = arctan( sqrt(R1²(f) + R2²(f)), f )    (8)

θ = arctan( R2(f) / R1(f) ),  θ ∈ [0, π)    (9)

A measure of the local contrast can be estimated as shown in Equation 10:

A = sqrt( f²(x) + |f_R(x)|² )    (10)

In the monogenic signal representation, the implicit assumption is that any signal, in a local sense, is
intrinsically one dimensional. The local amplitude, local phase and local orientation are computed for this
intrinsic 1D signal. Therefore, the design of the bandpass filter used in constructing the monogenic signal representation
is significant.
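The quantities in Equations 7-10 can be computed entirely in the frequency domain. The sketch below band-passes the image with a log-Gabor filter and applies the Riesz transfer functions of Equation 6; the filter parameters (wavelength, sigma_on_f) are illustrative choices, not values from the paper:

```python
import numpy as np

def monogenic_phase(img, wavelength=8.0, sigma_on_f=0.55):
    """Local phase, orientation and amplitude via the monogenic signal:
    band-pass the image with a log-Gabor filter, apply the Riesz transfer
    functions H = (i*u1/|u|, i*u2/|u|), then use Eqs. (8)-(10)."""
    rows, cols = img.shape
    U1, U2 = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.hypot(U1, U2)
    radius[0, 0] = 1.0                      # avoid log/division at DC
    H1, H2 = 1j * U1 / radius, 1j * U2 / radius

    # Log-Gabor bandpass selecting one frequency scale
    logGabor = np.exp(-np.log(radius * wavelength) ** 2 /
                      (2 * np.log(sigma_on_f) ** 2))
    logGabor[0, 0] = 0.0                    # remove the DC component

    F = np.fft.fft2(img)
    f = np.real(np.fft.ifft2(F * logGabor))          # band-passed image
    fR1 = np.real(np.fft.ifft2(F * logGabor * H1))   # Riesz component 1
    fR2 = np.real(np.fft.ifft2(F * logGabor * H2))   # Riesz component 2

    fR = np.hypot(fR1, fR2)                          # |f_R(x)|
    phase = np.arctan2(fR, f)                        # Eq. (8): local phase
    orientation = np.arctan2(fR2, fR1) % np.pi       # Eq. (9), in [0, pi)
    amplitude = np.hypot(f, fR)                      # Eq. (10): local amplitude
    return phase, orientation, amplitude
```

Because the DC component is removed and the remaining operations are linear, rescaling or offsetting the image intensities leaves the local phase unchanged, which is the illumination invariance exploited later in the paper.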


2.2 Object Detection in Aerial Imagery


Some of the recent works in detecting objects address the complex, large variations in the
appearance of the object. Yao and Zhang3 developed a general approach to detecting objects from aerial
imagery using semi-supervised learning from contextual information. The context based object detection relies
on the fact that objects present in aerial imagery are often surrounded by a homogeneous background
region. The main motivation for using a semi-supervised learning scheme is the absence of a large number of
labelled training samples of objects captured in aerial imagery. From a set of unlabelled training samples,
the semi-supervised classification scheme can adaptively label the unlabelled samples, thereby improving the
classification accuracy. This can form the basis for an ad-hoc training scheme where online training is used
to incorporate new test samples into the training set during the test phase.
Khan et al.4 used a 3D model based object classification scheme to detect vehicles from aerial imagery.
Appearance information of the aerial object is used to compute a 3D model from which a set of salient
location markers is determined. By simulating the scene conditions through 3D model rendering, the various
salient locations are used to create a Histogram of Oriented Gradients (HOG) based feature classifier. By computing a
match score, the Salient Feature Match Distribution Matrix, between the features in the rendered and real
test scenes, the vehicles in the test scene are classified.
Another approach to detecting objects in aerial imagery is to use an extensive global feature description
which is invariant to the viewpoint, scale and orientation of the object present. The texture of objects, which
captures spatial structure in aerial imagery, can be used to represent an object or vehicle. Guo et al.5
proposed a rotation invariant texture classification scheme using the well-known local binary pattern (LBP).6 As
opposed to the original LBP texture descriptor, the local binary pattern variance (LBPV) descriptor retains the
global spatial information along with the local information, where this textural descriptor brings out the local contrast
information. The feature extraction scheme uses globally rotation invariant matching with locally variant LBP
texture features.
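A minimal sketch of the LBPV idea (using P=8 neighbours on a square ring rather than the interpolated circular sampling of Guo et al., which is a simplification): each pixel's 8-neighbour sign code is made rotation invariant by taking the minimum over its circular bit rotations, and the code histogram is weighted by the local variance so that contrast information is retained:

```python
import numpy as np

def lbpv_histogram(img):
    """Simplified rotation-invariant LBP variance (LBPV) histogram."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]     # circular neighbour order
    c = img[1:-1, 1:-1]
    neigh = np.stack([img[1 + dy:img.shape[0] - 1 + dy,
                          1 + dx:img.shape[1] - 1 + dx] for dy, dx in offs])
    bits = (neigh >= c).astype(np.int64)           # 8 sign bits per pixel
    codes = np.full(c.shape, 255, dtype=np.int64)
    for s in range(8):                             # min over circular bit rotations
        rolled = np.roll(bits, s, axis=0)
        codes = np.minimum(codes, sum(rolled[k] << k for k in range(8)))
    weights = neigh.var(axis=0)                    # local contrast (variance)
    hist = np.bincount(codes.ravel(), weights=weights.ravel(), minlength=256)
    return hist / (hist.sum() + 1e-12)
```

Rotating the image by a multiple of 90 degrees circularly shifts every neighbour code, so the resulting histogram is unchanged, which is the rotation invariance the descriptor is built for.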
Mathew and Asari7 proposed a local intensity histogram based descriptor for tracking an object in very low
resolution videos. One of the main challenges is that such imagery has large global camera motion and contains poor
gradient and texture information. Their descriptor uses an intensity histogram which encodes
both spatial and intensity information. While the application was tracking, the feature descriptor used
to represent the object of interest can also be used for object detection and classification from low-resolution aerial
imagery. The algorithm uses a robust feature comparison metric known as the Earth Mover's Distance.8
A rotation and scale invariant object recognition methodology has been proposed by Matungka et al.,9 where
image feature extraction is combined with a log-polar wavelet mapping. Here, the log-polar mapping converts
a rotation in Cartesian coordinates to a translation in log-polar coordinates; a translational shift
is easier to determine than a rotational one. However, changes in the image origin in the Cartesian
domain can greatly influence the log-polar mapping.
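The log-polar idea can be sketched in a few lines: resampling an image on log-spaced radii and uniform angles about its centre turns a rotation about that centre into a circular shift along the angular axis (nearest-neighbour sampling; the grid sizes are illustrative):

```python
import numpy as np

def log_polar(img, n_rho=32, n_theta=64):
    """Resample an image about its centre on log-spaced radii and uniform
    angles, so that a rotation about the centre becomes a circular shift
    along the theta axis."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_rho))   # log radii
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(rho, theta, indexing='ij')
    ys = np.clip(np.round(cy + R * np.sin(T)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + R * np.cos(T)).astype(int), 0, w - 1)
    return img[ys, xs]            # shape (n_rho, n_theta)
```

A 90-degree rotation of the input then corresponds to shifting the output by a quarter of the angular bins, which is why translation-based matching in this domain recovers rotation.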

3. METHODOLOGY
The general approach to object detection from aerial imagery is to use a novel object representation scheme
which uniquely describes an object's shape, structure, color and texture and which is invariant to non-uniform
illumination, viewpoint, orientation and noise. These artifacts are mainly due to the image acquisition
process in the optical sensor onboard the aircraft. One of the main issues in designing an object representation
is invariance to lighting, or invariance to non-uniform illumination. For an object present in a good
lighting condition and another present in a dark lighting condition, the representations of both
objects should ideally be similar. One such representation is
the use of phase information computed from frequency spectrum image analysis. A more localized version of
phase can be computed by monogenic signal analysis, where the local phase represents the local structure of the object
irrespective of the lighting present in the image. This is illustrated in Figure 3. We see that the local phase
brings out the structural details of the backhoe irrespective of the lighting present and helps in distinguishing it
from surrounding objects in the background region such as trees, shrubs, buildings etc. Since the local phase
is illumination invariant, it is not affected by overexposure to lighting or very
low illumination conditions, and it projects the regular edges and corners associated with the description and



(a) Backhoe 1

(b) Local Phase

(c) Backhoe 2

(d) Local Phase 2

Figure 3: Backhoe captured in Flight 8 (a) and the corresponding local phase computed in that region (b);
Backhoe captured in Flight 6 (c) and the corresponding local phase information computed in
that region (d). (Courtesy of Vendor 1)

(a) Excavator1

(b) Excavator2

(c) Excavator3

Figure 4: Images of Excavators illustrating the various constraints that occur in aerial imagery, for both
Vendor 4 (left three) and Vendor 1 (right three)
representation of the object. A feature descriptor extracted from the local phase information tends to
preserve this illumination invariance, and hence this representation is very effective in describing objects captured
by optical sensors at an altitude of 500-3000 feet. A constraint in using the local phase domain is
that the computation of local phase depends on the following factors:
- Size of the object region: The sampling frequency refers to the sampling used to create the monogenic filters used in the computation of the local phase information, which in turn is related to the size of the region of interest containing the object or construction machinery.
- Orientation of the object: The local phase changes with the orientation of the object. Since the local phase inherently depends on the frequency spectrum of the object, a change in the orientation of the object shifts its frequency spectrum, thereby changing the local structure in a square neighborhood region.
- Image resolution: Variation in the resolution of the object captured in the scene can also change the computed local phase. More specifically, the frequency content captured by the monogenic signal analysis shifts to a different band in the frequency spectrum as the resolution of the object changes. So, to extract similar local phase information from two similar objects appearing at different image resolutions, the frequency band at which the local phase operates must be varied.
So, any object descriptor computed from the local phase information needs to be normalized for scale (related
to the size of the object), orientation and image resolution. Illustrations of these constraints are shown
in Figure 4. To counter these constraints, we use a multi-stage approach where at each stage a suitable type of
descriptor is extracted to incorporate rotation, scale and viewpoint invariance.


Figure 5: Block diagram illustrating the detection framework

4. DETECTION FRAMEWORK
The detection framework used to automatically locate construction vehicles on the pipeline right of way follows
a three stage approach. The detection framework is preceded by a training stage where the template for
each construction equipment type is computed. The template is extracted from the local phase information of a high
resolution image and stored in a multi-scale fashion.
- Local phase based template matching: This is a preliminary stage where a possible set of regions for the location of the object is noted by matching a template of the object from the training set to the test image.
- Selection of orientation and cluster voting: Here, the orientation of the object present in the possible regions is determined, and a shortlist of such regions is made through hierarchical clustering.
- Final detection by cluster selection and feature matching using the Histogram of Oriented Phase (HOP): From the final set of clusters, we extract a feature descriptor known as the Histogram of Oriented Phase from the local phase information for feature matching with the original template.

4.1 Training
The training stage involves selection of a suitable image for the creation of the template. We use a high resolution,
nadir-view (top-view) image of the construction vehicle as the training image. The selection of a high resolution image
enables us to create a multi-scale template pyramid, with each level corresponding to a lower resolution. This
scaling enables the algorithm to search for objects at a different resolution than the object images present in
the training set. The local phase information of the training image is computed, and the template
is selected from a closely cropped region containing the local phase of the actual object. This template from
a high resolution image is down-sampled to different smaller sizes to create a local phase template pyramid. An
illustration of the template selection is shown in Figure 6. The steps involved in computing the local
phase template are given below.
- Generation of log-Gabor filters and monogenic filters for local phase computation.
- Computation of the local phase of the training image using the log-Gabor and monogenic filters to create a frequency-scale space representation.
- Selection of the template region in the local phase domain.
- Creation of a template pyramid by down-sampling the template obtained from the local phase of the high-resolution training image.
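The pyramid construction in the last step can be sketched as follows (nearest-neighbour down-sampling, with an illustrative scale step of 0.8 and four levels; the paper does not specify these values):

```python
import numpy as np

def template_pyramid(phase_template, n_levels=4, factor=0.8):
    """Multi-scale pyramid of a cropped local-phase template: each level is
    the previous one resampled to `factor` of its size."""
    pyramid = [phase_template]
    for _ in range(n_levels - 1):
        prev = pyramid[-1]
        h = max(4, int(round(prev.shape[0] * factor)))
        w = max(4, int(round(prev.shape[1] * factor)))
        # Nearest-neighbour index maps for the smaller grid
        ys = (np.arange(h) * prev.shape[0] / h).astype(int)
        xs = (np.arange(w) * prev.shape[1] / w).astype(int)
        pyramid.append(prev[np.ix_(ys, xs)])
    return pyramid
```

Each level is then matched against the test image in turn, so objects smaller than the training instance can still be found.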

4.2 Local Phase-based Template Matching


In this preliminary stage, we locate possible regions where construction equipment can be found by searching
the entire image in a window-based approach. This is done by template matching in a sub-region of the
image (a particular window) in the local phase domain using normalized cross correlation. Normalized cross
correlation based template matching is a fast technique which finds the location with the best match.
Since the object can occur at different orientations, the template matching is performed for every 5 degree
rotation of the sub-image, where for each rotation we get a best match. This template matching scheme is
applied at every scale of the template to obtain matches corresponding to similar objects of smaller size or
lower resolution. An illustration of the local phase based template matching is shown in Figure 7.

Figure 6: Training Phase: Template Selection

Figure 7: Local Phase based Template Matching

4.3 Orientation Selection and Cluster Voting

From the previous stage, applied over sub-regions obtained from overlapping windows, we get clusters of detections at every orientation and at every scale. In this stage, we select the appropriate orientation among the set of detections in each sub-region. The local phase of the detected location at each rotation of the image is compared with the template local phase by phase histogram matching using the Earth Mover's Distance. The rotation of the sub-image which yields the smallest distance to the template is then selected as the correct orientation of the detected object. Some sub-images do not contain enough detections and may have only a few detections corresponding to a particular scale, while other sub-images have a cluster of detections at a particular location and orientation. The latter corresponds to the presence of construction equipment, as the template matching scheme fires at every possible scale for a single orientation; the former corresponds to false detections by the template matching scheme, which may have occurred due to noise in the background at a particular scale. This creates a scenario where clusters of detections exist in some parts of the image while only one or two detections are present in others, which brings the need to remove the isolated detections and retain the clusters. So, a hierarchical clustering scheme is employed which checks whether the cluster of detections in each sub-region satisfies the total number of detections required for a particular orientation. Only the clusters which satisfy a minimum requirement on the number of possible detections are retained, while the rest are discarded. A voting scheme is then applied to the retained clusters, where a particular detection in a cluster is given a certain vote or weight depending on how close the detection is to the template region. The closeness is the matching distance computed between the phase histograms of the template and the detected region. Detections which are closer to the template have a higher probability of being the actual construction equipment, and so are given higher votes. As shown in Figure 8, the color scheme applied to the detections represents the voting mechanism, with green having the highest number of votes, red the least, and yellow a moderate number.

Figure 8: Orientation Selection and Cluster Voting

4.4 Cluster Selection and Detection using Histogram of Oriented Phase

From the previous stage, we obtain a set of clusters containing locations of possible object regions, where each cluster is associated with a certain number of votes depending on how close the detections are to the multi-scale template. A further stage of pruning examines the number of votes each cluster has attained; by setting a specific threshold on the number of votes, we can eliminate certain clusters. The idea behind this elimination is that the clusters with fewer votes are possible regions containing background variation that was projected by the local phase. In short, noise in the background was matched to the training template at a certain scale, but the match distance was too large. After pruning such clusters, a different feature set is extracted from the detections of the remaining clusters and matched with the training template. This feature vector, known as the Histogram of Oriented Phase (HOP), is a weighted histogram over the local orientation (computed from the monogenic signal analysis) with the weights corresponding to the local phase. This dense descriptor can uniquely identify an object and can be matched more closely to the corresponding descriptor of the training template. So, in each cluster, we identify the detections whose HOP descriptor matches the template region within a certain extent and compute the number of HOP hits the cluster has. Clusters with fewer than a certain number of HOP hits are discarded, and the remaining clusters are considered the final object locations. An illustration is shown in Figure 9.

Figure 9: Cluster Selection and Detection using Histogram of Oriented Phase

5. EXPERIMENTAL RESULTS AND ANALYSIS

The construction equipment detection framework has been tested on three different datasets, each containing images captured at around 1000-3000 feet by a different vendor: 1, 2 and 3. The imagery captured by these vendors differs in the type of sensor used and the height at which it was captured. One of the main characteristics in which the sensors differ is the Ground Spatial Distance (GSD), or spatial resolution, of the image, which determines the resolution of the object present in the image. Moreover, the angle at which the image has been captured (which depends on the orientation of the sensor mount on the aircraft) is also an important factor, as different viewing angles of the camera can lead to different viewpoints. So, testing on these three datasets provides a good evaluation of the construction equipment detection framework described in the previous sections. In this section, we evaluate the algorithm by testing it on the three datasets for the detection of the Backhoe and provide statistics on the accuracy, false detection rate and miss rate. The accuracy is given as a percentage. The false detection rate is the number of detections that were incorrectly identified as construction equipment in the final stage. The miss rate is the number of construction equipment instances (Backhoe) which were not located by the automated algorithm. As mentioned earlier, the algorithm has a training stage and three stages in the testing phase:
Stage 1: Local Phase based Template Matching.
Stage 2: Orientation Selection and Cluster Voting.
Stage 3: Cluster Selection and Matching by Histogram of Oriented Phase.

5.1 Test on Vendor 1 Dataset


The algorithm was tested on the dataset provided by Vendor 1; this dataset had fairly decent resolution
imagery and covered a large area of the pipeline right of way. The imagery was captured at a height of around
1000-2000 feet above the pipeline right of way. One of the main challenges in this dataset is the dark bands
or regions that appear at the edges of the images, probably due to the encasing of the sensor used to capture them.
Thus, construction equipment appearing at the edges of the images had very low illumination or lighting
present on the object. Our algorithm tackled this illumination problem by using the local phase. Moreover, a
change in the elevation at which the images were captured during different flights results in a change in the spatial
resolution as well.

5.2 Test on Vendor 2 Dataset


The algorithm was also tested on the dataset provided by Vendor 2; this dataset had higher resolution
imagery, as it was captured at a height of around 500-1000 feet above the pipeline right of way. So, the
construction equipment present in the imagery is much better defined and has more structural details for the object
to be detected. The challenges in this dataset are slight illumination variations, and orientation and
position changes with slight changes in the spatial resolution of the object. Again, we evaluated our algorithm
by running it on the test image containing the Backhoe. An illustration of the detection results is shown in
Figure 11.

5.3 Test on Vendor 3 Dataset


The imagery provided by Vendor 3 was of two different kinds: one set taken at a height of 6000 feet and the other
taken at a height of around 1000-2000 feet. For the evaluation of the algorithm, we use the first set of imagery,
captured at 6000 feet on Flights 1-4, where each flight corresponds to a single pass over the pipeline
right of way at Gary, Indiana. The challenge in this dataset is that the spatial resolution is very poor, which
leaves fewer of the structural details required for detection. Moreover, there are illumination variations such
as overexposure of the object to be detected. So, the algorithm is evaluated on this challenging
set of imagery by applying it to detect the largest type of construction equipment considered, the Backhoe. An
illustration of the detection procedure for each stage is shown in Figure 12.

5.4 Statistics for Detection of Backhoe


The statistics we have computed are the detection accuracy and the number of false positives obtained by
testing on images containing the construction equipment (the Backhoe) using only one training image.
The selection of the training image depends on the resolution of the object present; in the proposed algorithm,
we use the object sample with the highest spatial resolution. The tables below give the detection
accuracy and the false positive count attained for each dataset.


(a) Training Image. (b) Test image with manual annotation. (c) Stage 1: Local Phase Template Matching.
(d) Stage 2: Orientation Selection and Cluster Voting. (e) Stage 3: Cluster Selection and HOP Matching.
(f) Stage 3 Detection for test image from Flight 4.

Figure 10: Detection of the Backhoe at different stages on sample images. Courtesy of Vendor 1


(a) Training Image. (b) Test image with manual annotation. (c) Stage 1: Local Phase Template Matching.
(d) Stage 2: Orientation Selection and Cluster Voting. (e) Stage 3: Cluster Selection and HOP Matching.
(f) Stage 3 Detection for test image from Flight 5.

Figure 11: Detection of the Backhoe at different stages on sample images. Courtesy of Vendor 2


(a) Training Image. (b) Test image with manual annotation. (c) Stage 1: Local Phase Template Matching.
(d) Stage 2: Orientation Selection and Cluster Voting. (e) Stage 3: Cluster Selection and HOP Matching.
(f) Stage 3 Detection for another test image.

Figure 12: Detection of the Backhoe at different stages on sample images. Courtesy of Vendor 3


Table 1: Statistics for Vendor 1 Dataset

Backhoe Instance (Flight No)   Stage 1   Stage 2   Stage 3   False Positives
1 (1)                          Y         Y         Y         0
2 (2)                          Y         Y         Y         0
3 (3)                          Y         X         X         0
4 (4)                          Y         Y         Y         1
5 (5)                          Y         Y         Y         0
6 (6)                          Y         Y         Y         1
7 (7)                          Y         Y         Y         0
True Detection Rate            100%      85.71%    85.71%    Total: 2

Table 2: Statistics for Vendor 2 Dataset

Backhoe Instance (Flight No)   Stage 1   Stage 2   Stage 3   False Positives
1 (1)                          X         X         X         0
2 (2)                          X         X         X         0
3 (3)                          Y         Y         Y         0
4 (4)                          Y         Y         Y         0
5 (5)                          Y         Y         Y         0
6 (6)                          Y         Y         Y         1
7 (7)                          Y         Y         Y         1
8 (8)                          Y         Y         Y         0
True Detection Rate            75%       75%       75%       Total: 2

Table 3: Statistics for Vendor 3 Dataset

Backhoe Instance (Flight No)   Stage 1   Stage 2   Stage 3   False Positives
1 (1)                          Y         Y         Y         0
2 (1)                          Y         Y         Y         0
3 (1)                          Y         Y         Y         2
4 (1)                          Y         Y         Y         4
5 (1)                          X         X         X         0
6 (2)                          Y         Y         Y         0
7 (2)                          Y         Y         Y         2
8 (2)                          Y         Y         Y         4
9 (3)                          Y         X         X         0
10 (3)                         Y         Y         X         0
11 (3)                         Y         X         X         0
True Detection Rate            91%       73%       64%       Total: 12

6. CONCLUSIONS
We have proposed an algorithm which can autonomously detect construction equipment under various lighting
conditions and at different equipment orientations using a multi-stage framework. This framework is based on
feature extraction from the local phase information generated by a monogenic analysis of the image. The local
phase information brings out the spatial structure of the object, projects it from the surrounding homogeneous
background, and is invariant to the illumination present in that region. By computing the histogram of phase
and the Histogram of Oriented Phase (HOP) along with a template matching scheme, we have successfully detected
construction equipment such as the Backhoe in three different datasets provided by Vendors 1, 2 and 3.
Future work will include the detection of other construction equipment, such as the Excavator, Mini-Excavator
and Trencher, on the pipeline right of way (ROW), which is more challenging as their sizes are considerably smaller
than the Backhoe's.

ACKNOWLEDGMENTS
This project has been funded by the Pipeline Research Council International (PRCI), with the test imagery
captured in Gary, Indiana. (Project No: PR-433-133700)

REFERENCES
[1] Gonzalez, R. C. and Woods, R. E., [Digital Image Processing], Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2nd ed. (1992).
[2] Felsberg, M. and Sommer, G., "The monogenic signal," IEEE Transactions on Signal Processing 49(12), 3136-3144 (2001).
[3] Yao, J. and Zhang, Z., "Semi-supervised learning based object detection in aerial imagery," in [IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005)], 1, 1011-1016 (2005).
[4] Khan, S., Cheng, H., Matthies, D., and Sawhney, H., "3D model based vehicle classification in aerial imagery," in [IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010)], 1681-1687 (2010).
[5] Guo, Z., Zhang, L., and Zhang, D., "Rotation invariant texture classification using LBP variance (LBPV) with global matching," Pattern Recognition 43, 706-719 (Mar. 2010).
[6] Pietikäinen, M., Hadid, A., Zhao, G., and Ahonen, T., [Computer Vision Using Local Binary Patterns], Springer (2011).
[7] Mathew, A. and Asari, V., "Local region statistical distance measure for tracking in wide area motion imagery," in [IEEE International Conference on Systems, Man, and Cybernetics (SMC 2012)], 248-253 (2012).
[8] Rubner, Y., Tomasi, C., and Guibas, L. J., "The Earth Mover's Distance as a metric for image retrieval," International Journal of Computer Vision 40(2), 99-121 (2000).
[9] Matungka, R., Zheng, Y., and Ewing, R., "Object recognition using log-polar wavelet mapping," in [20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '08)], 2, 559-563 (2008).
