
CRANFIELD UNIVERSITY

CAPTAIN ANDREW COLPITTS R.M.C.


ROYAL CANADIAN ENGINEERS

AN INVESTIGATION INTO THE EFFECT OF METADATA, FRAME COUNT
AND OTHER FACTORS ON THE QUALITY OF POINT CLOUD DATA
PRODUCED USING STRUCTURE FROM MOTION (SFM) TECHNIQUES

DEFENCE INTELLIGENCE SECURITY CENTRE

ROYAL SCHOOL OF MILITARY SURVEY

GEOSPATIAL INTELLIGENCE

MSC THESIS
2013

SUPERVISORS:
DR. NIKO GALIATSATOS
DR. STEPHEN ROBINSON

SEPTEMBER 2013
PRESCRIBED FORM - DISCLAIMER
This paper was written by a student on the GEOINT MSc course at the Royal
School of Military Survey. It has not been corrected or altered as a result of
assessment and it may contain errors or omissions. The views expressed in it,
together with any recommendations, are those of the author and not those of
the DISC, JFIG, Cranfield University, or individual members of its staff. This
document has been printed on behalf of the author by the JFIG, but it has no
official standing as an MOD, DISC, JFIG or Cranfield University document.

I certify that this is my own work and gives due credit to the contribution of
others through references.

Andrew R. Colpitts
Captain
Royal Canadian Engineers
CRANFIELD UNIVERSITY

ROYAL SCHOOL OF MILITARY SURVEY

MSc THESIS

Academic Year 2012 – 2013

Captain Andrew Colpitts R.M.C.

AN INVESTIGATION INTO THE EFFECT OF METADATA, FRAME COUNT
AND OTHER FACTORS ON THE QUALITY OF POINT CLOUD DATA
PRODUCED USING STRUCTURE FROM MOTION (SFM) TECHNIQUES

Academic Supervisors:
Dr. Niko Galiatsatos
Dr. Stephen Robinson

September 2013

This thesis is submitted in partial fulfilment of the requirements for the Degree of
Master of Science.

© Crown Copyright, 2013. All rights reserved. No part of this publication may be
reproduced without the written permission of the copyright holder.
ABSTRACT
Using a technique called Structure from Motion (SfM), 3D geometry can be
reconstructed from overlapping images of a rigid scene. This technique has
been shown to be useful for military and geospatial purposes in its application
to automated terrain extraction, but the potential of SfM has not yet been fully
exploited in a defence or a defence geospatial context. While the concept of
SfM has been studied since before the 1980s, it is only now coming into
widespread availability due to advances in research, hardware capability and
the proliferation of free and commercial software and hardware such as the
Microsoft® Kinect. This research project summarises the theory of SfM and
investigates the effects of three distinct and pervasive factors on the quality of
3D reconstructions: focal length estimation technique, the number of images
supplied as input and the type of image feature detector. The methodology in
this research project is dependent on the existence of both accurate and dense
reference data and also on a sequence of images which represent the same
scene. Among the conclusions and recommendations drawn from this project is
that the SfM technique for 3D reconstruction lends itself to efficient scaling
between large and small tasks at the tactical to operational levels. In addition,
while there are other 3D model generation techniques available, few are as
tactically viable as SfM. Other findings include the empirical confirmation that
the specification of camera parameters yields consistently better results, and
that there exists a distinct ‘quality vs. effort’ ratio, both in the number of
reconstructed 3D points and their overall accuracy, dependent on the density of
the input data supplied.

ACKNOWLEDGEMENTS
The author would like to gratefully acknowledge the support of the following
individuals and organisations:

My beautiful wife Kristin for her patience, love, faith and resolve in taking care of
our new son David, who was born in July 2013.

Jonathan Fournier at Defence Research and Development Canada (DRDC) for
supplying the basis for the research question as well as the reference data and
input images.

Pierre Simard and David Rowlands at the Mapping and Charting Establishment
(MCE) in Ottawa for effecting the provision of contextual data and support.

Dr. Nikolaos Galiatsatos and Dr. Stephen Robinson for their guidance, patience
and flexibility.

TABLE OF CONTENTS
ABSTRACT I

ACKNOWLEDGEMENTS ........................................................................................... II
TABLE OF CONTENTS ............................................................................................ III
LIST OF FIGURES ...................................................................................................V
1 INTRODUCTION ................................................................................. 1-1
1.1 BACKGROUND .......................................................................... 1-1
1.2 MOTIVATION ............................................................................. 1-1
1.3 PROJECT AIM ........................................................................... 1-2
1.4 PROJECT OBJECTIVES .............................................................. 1-2
1.5 ABOUT THIS REPORT ................................................................ 1-3
2 LITERATURE REVIEW ......................................................................... 2-1
2.1 OVERVIEW ............................................................................... 2-1
2.2 THE CRUX OF SFM: IMAGE-IMAGE CORRESPONDENCES .............. 2-1
2.3 OTHER CRITICAL FACTORS ....................................................... 2-2
2.4 STRUCTURE FROM MOTION WORKFLOW STAGES ........................ 2-3
2.5 STAGE ONE: FEATURE EXTRACTION.......................................... 2-5
2.6 STAGE TWO: FEATURE MATCHING .......................................... 2-16
2.7 STAGE THREE: 3D RECONSTRUCTION ..................................... 2-20
2.8 STAGE FOUR: BUNDLE ADJUSTMENT (BA) ............................... 2-21
2.9 STAGE FIVE: DENSE 3D POINT CLOUD GENERATION ................ 2-23
2.10 SFM SOFTWARE .................................................................... 2-25
2.11 POINT CLOUD QUALITY MEASURES AND COMPARISON
TECHNIQUES .......................................................................... 2-27
2.12 SUMMARY .............................................................................. 2-29
3 METHODOLOGY AND DATA ................................................................. 3-1
3.1 OVERVIEW ............................................................................... 3-1
3.2 PART ONE: METHODOLOGY ....................................................... 3-2
3.3 DATA PREPARATION ................................................................. 3-3
3.4 SURF FEATURES ..................................................................... 3-5
3.5 SIFT FEATURES ....................................................................... 3-7
3.6 RECONSTRUCTION .................................................................... 3-8
3.7 POINT CLOUD EVALUATION ....................................................... 3-9
3.8 PART TWO: DATA .................................................................. 3-12
3.9 SUMMARY .............................................................................. 3-16
4 RESULTS .......................................................................................... 4-1
4.1 OVERVIEW ............................................................................... 4-1
4.2 OVERALL PERFORMANCE STATISTICS ........................................ 4-1
4.3 INITIAL OBSERVATIONS ............................................................. 4-2
4.4 NUMBER OF INPUT IMAGES ........................................................ 4-3
4.5 KEY CHARTS ............................................................................ 4-4
4.6 SUMMARY ................................................................................ 4-9
5 ANALYSIS AND DISCUSSION ............................................................... 5-1
5.1 OVERVIEW ............................................................................... 5-1


5.2 EXAMINATION OF RESULTS ........................................................ 5-1


5.3 SUGGESTED IMPROVEMENTS TO THE METHODOLOGY .................. 5-7
5.4 IMPLICATIONS OF THE FINDINGS ............................................... 5-10
5.5 SUMMARY .............................................................................. 5-11
6 CONCLUSION AND SUGGESTED RESEARCH ......................................... 6-1
6.1 CONCLUSION ........................................................................... 6-1
6.2 SUGGESTED FURTHER RESEARCH ............................................. 6-1
WORD COUNT ....................................................................................................... I
WORKS CITED ...................................................................................................... II
ANNEX A STRUCTURE FROM MOTION THEORY................................................... A-1
A.1 OVERVIEW ............................................................................... A-1
A.2 BASIC CONCEPTS IN STRUCTURE FROM MOTION ......................... A-2
A.3 STRUCTURE AND MOTION ......................................................... A-6
A.4 TRANSFORMATIONS ................................................................ A-12
ANNEX B IMAGE DATABASE VBA CODE ............................................................ B-1
ANNEX C SURF FEATURE EXTRACTION AND MATCHING MATLAB® CODE ............ C-1
ANNEX D POINT CLOUD ANALYSIS MATLAB® CODE ........................................... D-1
ANNEX E AUXILIARY CALCULATIONS ................................................................. E-1
ANNEX F COMPLETE LIST OF CASES ................................................................ F-1
ANNEX G CHARTS .......................................................................................... G-1

LIST OF FIGURES
FIGURE 1 – CORRESPONDENCES EXTRACTED USING THE AFFINE SCALE
INVARIANT FEATURE TRANSFORM (ASIFT).................................................. 2-1
FIGURE 2 – DIAGRAM OF THE SIFT EXTRACTION PROCESS. ................................... 2-8
FIGURE 3 – DOG LAYER OCTAVES (LOWE, 2004). ............................................... 2-8
FIGURE 4 - IDENTIFICATION OF LOCAL EXTREMA IN SCALE AND IMAGE
SPACE (X,Y,Σ) (LOWE, 2004). .................................................................... 2-9
FIGURE 5 - SIFT FEATURE DESCRIPTOR ORIENTATION (VEDALDI AND
FULKERSON, 2008). ................................................................................ 2-10
FIGURE 6 - BASIC SCHEMATIC DIAGRAM OF THE SURF EXTRACTION
PROCESS. ............................................................................................... 2-11

FIGURE 7 - THE DISCRETIZED AND CROPPED GAUSSIAN SECOND ORDER


PARTIAL DERIVATIVE IN THE LYY DIRECTION (LEFT) AND THE LXY
DIRECTION (RIGHT). ................................................................................. 2-12

FIGURE 8 - BOX FILTERS APPROXIMATING THE GAUSSIAN SECOND-ORDER


PARTIAL DERIVATIVE IN DYY (LEFT) AND DXY (RIGHT). ................................... 2-12

FIGURE 9 - SLIDING WINDOW USED TO DETERMINE THE ORIENTATION OF
SURF DESCRIPTORS. ............................................................. 2-13
FIGURE 10 - K-D TREE EXAMPLE FOR A DATASET IN TWO DIMENSIONS (THE
MATHWORKS, INC., 2013). ....................................................................... 2-18
FIGURE 11 - EXAMPLE OF A CMVS-PMVS DENSE RECONSTRUCTION FOR
THE DATASET IN THIS RESEARCH PROJECT. .............................................. 2-23

FIGURE 12 - EXAMPLE OF BUNDLER CAMERA POSE ESTIMATION (SNAVELY,


SEITZ AND SZELISKI, 2006). ..................................................................... 2-26
FIGURE 13 - METHODOLOGY FLOWCHART. .......................................................... 3-2
FIGURE 14 - DATA PREPARATION PHASE. ............................................................ 3-3
FIGURE 15 - SURF FEATURE EXTRACTION AND MATCHING PHASE. ....................... 3-5
FIGURE 16 - SIFT FEATURE EXTRACTION AND MATCHING PHASE. ......................... 3-7
FIGURE 17 - RECONSTRUCTION PHASE. .............................................................. 3-8
FIGURE 18 - POINT CLOUD EVALUATION PHASE. .................................................. 3-9
FIGURE 19 - NRC TWIN OTTER AIRCRAFT WITH WESCAM TURRET FITTED. .......... 3-12
FIGURE 20 - SENSOR SUITE OF THE JMMES SYSTEM. ....................................... 3-13
FIGURE 21 - AERIAL PLATFORM FLIGHT PATH. ................................................... 3-14
FIGURE 22 - SAMPLE INPUT IMAGES. ................................................................. 3-15
FIGURE 23 - REFERENCE DATASET. .................................................................. 3-15
FIGURE 24 - NUMBER OF RECONSTRUCTED POINTS VS. NUMBER OF
IMAGES..................................................................................................... 4-4


FIGURE 25 - LE90ABSMIN (PERCENTILE) VS. NUMBER OF


RECONSTRUCTED POINTS .......................................................................... 4-5
FIGURE 26 - LE90ABSMIN (PERCENTILE) VS. NUMBER OF IMAGES ........................ 4-6
FIGURE 27 - CENTROID VERTICAL ADJUSTMENT VS. NUMBER OF IMAGES. .............. 4-7
FIGURE 28: CENTROID HORIZONTAL ADJUSTMENT VS. NUMBER OF IMAGES............ 4-8
FIGURE 29 - NUMBER OF RECONSTRUCTED POINTS VS. NUMBER OF
MATCHED PAIRS. ...................................................................................... 5-2
FIGURE 30 - RATE OF CHANGE OF NUMBER OF RECONSTRUCTED POINTS
VS. NUMBER OF MATCHED PAIRS................................................................ 5-2

FIGURE 31 - SIDE VIEW OF A FALSE-COLOUR SIFT-SPECIFIED


RECONSTRUCTION USING 1000 IMAGES SHOWING A LARGE NUMBER
OF ERRONEOUSLY RECONSTRUCTED POINTS .............................................. 5-5

FIGURE 32 - SIDE-VIEW OF A FALSE COLOUR SURF-SPECIFIED


RECONSTRUCTION USING 1000 IMAGES SHOWING A LIMITED
NUMBER OF ERRONEOUSLY RECONSTRUCTED POINTS. ................................ 5-6
FIGURE 33 – THE CAMERA AND IMAGE COORDINATE SYSTEM. .............................. A-3
FIGURE 34 – COMPARISON BETWEEN METRIC PHOTOGRAMMETRY AND
SFM. ........................................................................................................ A-7
FIGURE 35 – EPIPOLAR GEOMETRY OF THE IMAGE PAIR. ...................................... A-8


1 INTRODUCTION
1.1 BACKGROUND
McGrath (2010) showed that through second-phase exploitation of full-motion
video, it is possible to extract three-dimensional (3D) information, such as digital
elevation models (DEMs), using Structure from Motion (SfM) techniques and
proprietary commercial software. He also demonstrated a limited attempt to
quantitatively assess these DEMs for accuracy. In the past, methods for
objectively assessing elevation data have typically been limited to “2.5D”
datasets, wherein each unique horizontal location corresponds to a single elevation
value. These surface models are not as useful in complex environments, where
the retention of the full 3D point cloud may be beneficial in permitting visual and
topological layering, and enabling the creation of more complex, true-to-form
structure models. In addition, while there are several recent and ongoing SfM
research projects focusing on the large-scale application of SfM and on the
improvement of SfM time-cost measures, little research has been conducted
focusing on a more holistic approach to SfM and its applications in a defence
context.

1.2 MOTIVATION
1.2.1 SCENARIO
Consider the following scenario: A four-man reconnaissance patrol is tasked to
observe an objective and departs the hide. Upon arrival at the objective, the
patrol commander and another member observe the objective from several
different vantage points using a process called ‘clover-leafing.’ At each
observation point, in addition to the normal notes and observations, the patrol
takes a picture using a basic single-lens reflex (SLR) digital camera. Upon
returning to the headquarters several hours later, the images are turned over to
a geospatial analyst, who uses them to quickly construct a 3D model of the
objective area. The model is cleaned up and given surface colouration directly
from the photos. Annotations are made according to the recollections of the
patrol members, and the 3D model helps them to describe the objective to the
commander.


The end result is a 3D planning product which can both assist the patrol in
conveying their observations and assist the commander to make decisions.
Later in the week, a deliberate operation is planned and executed using the 3D
model.

1.2.2 THE BENEFIT
In total, the application of the SfM technique has improved operational
effectiveness at the tactical level, improved situational awareness and
represented only a small increase in manpower and resource requirements.

Perhaps the most important advantage in the defence context is that a 3D
reconstruction is made possible using the SfM technique without significantly
increasing the ‘time on ground’ factor, thus increasing its tactical viability. Other
3D reconstruction techniques, such as using a laser scanner are
disadvantageous both in the time required to complete the task, and the fact
that laser scanning is an active sensing process. Both of these disadvantages
increase the risk of enemy detection and intervention, and the loss of the
element of surprise.

In theory, such a scenario is possible using currently-available, open-source
software and commercial off-the-shelf (COTS) hardware. There are, however,
a few limitations and general concepts which govern the applicability of SfM in
the military or geospatial context. Some of these limitations will be explored in
this research project.

1.3 PROJECT AIM
The aim of this project is to investigate the effect of image metadata, frame
count and processing algorithms on the quality of 3D point clouds produced
using SfM techniques.

1.4 PROJECT OBJECTIVES
1.4.1 OBJECTIVE ONE
Review current research to identify available open-source software and
techniques for producing point clouds using SfM.


1.4.2 OBJECTIVE TWO
Develop a methodological approach for production and geo-registration of point
clouds with open-source SfM software using three independent variables:
estimation of camera calibration parameters, image count and choice of feature
detector.

1.4.3 OBJECTIVE THREE
Validate the proposed methodology by producing 3D point clouds for logical
combinations of the independent variables.

1.4.4 OBJECTIVE FOUR
Review current research to identify techniques for objectively assessing the
quality of a point cloud using a reference point cloud.

1.4.5 OBJECTIVE FIVE
Assess the effect of each independent variable by evaluating the quality of the
model.

1.5 ABOUT THIS REPORT
This report includes six chapters. Chapter Two includes an overview of the
basic principles of SfM theory, and current research into various SfM
algorithms. As a companion to the literature review, Annex A has been written
to describe the key SfM theory which is required for a complete appreciation of
the findings of this research project. Chapter Three describes the methodology,
independent variables and data used in this research project. Chapter Four is a
short chapter dedicated to the summarization and display of the results from the
experimentation. Initial comments are made based on the results, but the
analysis and discussion of these results is deferred to Chapter Five. In addition,
Chapter Five lists suggested improvements to the methodology for potential
future research projects. The final chapter will consist of concluding remarks
and recommendations for further research focus.

It is assumed that the reader has a basic understanding of the hierarchy of
transformations (Euclidean, Similarity, Affine and Projective) in two and three
dimensions, and is familiar with matrix algebra.


2 LITERATURE REVIEW
2.1 OVERVIEW
This chapter is a review of some of the basic concepts of computer vision and
SfM. A discussion of the current research on the relevant subjects is organized
in terms of a common SfM workflow. The field of computer vision, and SfM
within it, is surprisingly complex, and in no measure is this literature review
intended to be a comprehensive analysis of the current research in every
subject area. It is, however, designed to provide a framework for the
understanding of the methodology, analysis and discussion presented in
subsequent chapters.

2.1.1 READ ANNEX A
Readers who are unfamiliar with the theory behind SfM, or at least with the
theory associated with metric photogrammetry, are strongly encouraged to
browse Annex A, which is written as a conceptual basis for this literature review
and for the research project as a whole, before continuing to read this chapter.

2.2 THE CRUX OF SFM: IMAGE-IMAGE CORRESPONDENCES
Given the theoretical basis for SfM discussed in Annex A, image-image
correspondences are arguably the most significant factor in SfM success, where
both the quantity and quality of correspondences play a major role in
determining the accuracy, density, distribution and reliability of 3D geometry
reconstructed using SfM techniques.

Figure 1 – Correspondences Extracted using the Affine Scale Invariant Feature Transform (ASIFT).


To compound their importance, the quality and quantity of image-image
correspondences have profound effects regardless of the algorithms, software,
data types or level of human intervention in any SfM workflow. Since the
phases of feature extraction and feature matching are conducted early in the
SfM process, the effects of inadequate feature matching propagate to the very
end of any workflow and impair the accuracy and density of the result. The
effects of image-image correspondences are a core concept of this research
project and can clearly be seen from the results of the limited experiments
performed herein.

2.3 OTHER CRITICAL FACTORS
Despite the high importance of image-image correspondences, there are many
other important factors which affect the results of an SfM workflow. There are in
fact too many possible parameters, settings and algorithms to describe every
option in detail, especially since SfM is a field of intense ongoing research.
There are a few factors, however, which relate to a broad range of SfM
applications and are sufficiently generalised to permit the extrapolation of
findings to many future possibilities. Two have been selected for this research
project, which are the estimation of camera focal length and the number of
images supplied as input to an SfM application for sparse reconstruction.

2.3.1 FOCAL LENGTH ESTIMATION
As indicated in paragraph A.3.7, the camera calibration matrix K need not
necessarily be provided to an SfM algorithm prior to reconstruction. Instead,
conservative, logical assumptions can be used to build an a priori estimate of K,
and then any successful reconstruction can be used to refine this estimate at a
later stage using the inverse of the perspective three point (P3P) problem. But
this leads to the question of whether or not the refined estimate of K adequately
represents the true intrinsic camera calibration. More importantly from a
pragmatic point of view, there is a question of what effect an estimated intrinsic
camera calibration might have on the accuracy of any reconstructed scene.


2.3.2 NUMBER OF IMAGES SUPPLIED
In examining the effect of the number of images supplied to an unsupervised
reconstruction, there are several aspects which might be explored, such as the
threshold at which reconstruction becomes reliable, the point density of the
reconstruction, and the effect on the overall accuracy of the reconstruction.

2.3.3 OVERALL POSITIONAL ACCURACY AND LINEAR ERROR
As indicated in Figure 34, SfM applications often make use of an arbitrary
coordinate system based on the initial estimates of the initialization pair, with
rectification of the model to a world coordinate frame done at a later stage, if at
all. Therefore, in this research project, assessment of accuracy was performed
based upon a minimization of the sum of squared nearest point distance from
the model to the reference data over a Euclidean (volume-preserving)
transformation to remove (or distribute) all sources of bias. Hence, the
accuracy statistics generated from the experiments in this research project
represent the minimum 3D linear absolute error, expressed as LE90AbsMin.
Colloquially, this is the minimum absolute error achievable when all sources of
bias and systematic error are eliminated, with the constraint that relative model
scale is unchanged.
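
To illustrate the form of this statistic, the short Matlab fragment below sketches how LE90AbsMin might be computed once the model has been aligned to the reference cloud. It assumes the Statistics Toolbox function knnsearch is available and that modelXYZ and refXYZ are N-by-3 and M-by-3 arrays of 3D points; it is a minimal sketch of the principle rather than the exact implementation used in this project (see Annex D).

% modelXYZ - N-by-3 reconstructed points, already aligned to the reference frame
% refXYZ   - M-by-3 reference (truth) points
[~, nnDist] = knnsearch(refXYZ, modelXYZ);  % nearest reference point for each model point
le90AbsMin  = prctile(nnDist, 90);          % 90th percentile of absolute 3D point error
ssd         = sum(nnDist.^2);               % quantity minimised over the rigid alignment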

2.4 STRUCTURE FROM MOTION WORKFLOW STAGES
In most applications, the SfM workflow can be divided into approximately five
stages as identified by Rea (2013), which are usually present irrespective of the
level of automation. These stages are described in the following sections. A
short description of what is done in the stage is given, followed by general
comments about the types of processes which can be used to increase the
efficiency or reliability of the SfM workflow at this stage. The specific algorithms
which have been developed to function within each stage are discussed in sub-
sections following this general description. This list of algorithms and methods
is by no means exhaustive. As such, an effort has been made towards
describing in moderate detail the algorithms which directly affect the
experiments in this research project.


In his critique of various SfM algorithms, Oliensis (2000a) acknowledges that it
is unlikely that any one SfM algorithm will perform well in all cases. Citing
several of his previous works, Oliensis argues for a study of candidate SfM
algorithms at a theoretical level supplemented by experimentation to promote
understanding and to determine their limits of applicability (Oliensis and
Govindu, 1999), (Oliensis, 2000b), (Oliensis, 1999). There has been much
work in past years concerning the theoretical and experimental performance of
certain algorithms and processes which can be used in SfM applications. There
is a comparative lack, however, of studies which examine all aspects of SfM
holistically. This research project attempts to bridge that gap, using a small
number of key independent variables which each have an effect on several
workflow stages.


2.5 STAGE ONE: FEATURE EXTRACTION
In the feature extraction stage, the objective is to locate, describe and store
image features. Image features are small, distinct objects present on an image
such that the corresponding locations on other images within the dataset can
reliably be found. As mentioned in paragraph 2.2, feature-to-feature
correspondences between multiple viewpoints are the foundation of SfM.
Therefore, feature detection algorithms must be built to reliably find and
precisely describe features in such a way that the same features can be located
on other images of the same rigid or semi-rigid scene captured at different
angles, scales, focal distances and lighting levels. In other words, a good
feature detection algorithm should produce features which are invariant and
stable despite reasonable projective transformations. In practice, these are
often approximated by affine transformations, which is a valid assumption for
small features (Mikolajczyk and Schmid, 2004: 64). It is this challenge which
has pushed the limits of feature detection in 2D images; feature detection
algorithms are used in applications from fingerprint scanners to video
stabilisation software.

Research into feature detection is largely driven by object or image recognition
applications. For example, Nistér and Stewénius (2006) developed a novel
feature detection and indexing scheme which drastically improves the time-cost
of image registration. The de-facto standard for feature detection in the
computer vision community is currently the Scale-Invariant Feature Transform
(SIFT), developed by Lowe (2004), or variations of SIFT. In his 2004 work,
however, Lowe recognizes the contribution of many other researchers into
feature extraction technology, admitting that future feature extraction systems
are likely to combine many types of features for optimum performance.

The detector locates and vets candidate features, while the descriptor attempts
to describe the feature and its surrounding region so that it can be reliably
matched. The reliability of matching of detected features depends as much upon
the accuracy of the detector as on the robustness and distinctiveness of
the descriptor. Robustness is the ability of the descriptor to be reliably matched
despite image deformations and noise. Distinctiveness is the descriptor’s ability

to avoid false matches. Robustness and distinctiveness are often competing
aspects of a descriptor's performance (Robotics Research Group, University of
Oxford, 2007).

In general, feature detectors fall into one of two different levels of complexity:
scale invariance and affine invariance. There are also versions of these which
are invariant only to rotation, such as local greyvalue invariants (Schmid and
Mohr, 1997). In addition, there are largely two different types of features which
can be extracted: blob features and corner features.

Scale invariant detectors aim to extract features whose descriptors do not
change with regard to rotation or isotropic scaling of the image. Scale
invariance is usually achieved through the exploration of scale space, wherein
Gaussian kernels of increasing size are used to smooth input images to mimic
image capture at smaller scales (Lowe, 2004). In practice, however, successful
scale-invariant features are also partially invariant to anisotropic scaling (i.e. 2D
affine transformations). If not, then these features would find little use in SfM
which, as mentioned in paragraph 0, depends upon relative camera centre
translation. The level of anisotropic scaling tolerable by a scale-invariant
feature depends almost entirely on its descriptor’s robustness (Lowe, 2004).

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$$

Equation 1 - The Gaussian kernel used to explore the scale space of an image.

Affine-invariant feature detectors aim to extract features which are invariant to
high levels of anisotropic scaling, or to achieve the same effect using other
methods. The objective of affine-invariant detectors is to permit wide-baseline
matching where the view angle between images is larger than 30, 50 or even 70
degrees. There are several proposed means for achieving this performance,
such as re-scaling detected regions to circular patches (Matas et al., 2002) or
exploring the affine space (Morel and Yu, 2009).

Blob features are local extrema (maxima or minima) of pixel intensity. An
example of an affine-invariant blob detector is the maximally-stable extremal
regions (MSER) detector (Matas et al., 2002). SIFT (Lowe, 2004) is an
example of a scale-invariant blob detector which performs well even under
moderate anisotropic scaling. Corner features represent unique areas where
the curvature of discrete image intensities is large in two directions. Tuytelaars
and Mikolajczyk (2007) note that since corner features and blob features are
largely complementary, there is a potential benefit to using a combination of
these types of features.

In theory, affine-invariant features should exhibit better performance in SfM
applications than their scale-invariant counterparts. Unfortunately, affine-
invariant feature matching tends to exhibit higher computational cost.
Comparing the work of Mikolajczyk (2002) to his own work, Lowe (2004) points
out that some affine-invariant feature detectors exhibit lower repeatability than
scale-invariant feature detectors if the viewpoint angle between the two images
being matched is less than 50 degrees. Mikolajczyk’s (2002) detector,
however, exhibits better repeatability (approximately 40%) than scale-invariant
detectors at viewpoint angles of 50-70 degrees. Since it is important to
maximise the number of image-to-image correspondences in SfM, this contrast
in applicability is important to note: While scale-invariant detectors such as
SIFT would be preferable for datasets with viewpoint shifts of less than 30
degrees, affine-invariant detectors may be preferable for sparse datasets with
viewpoint angles known to be large. The following sub-sections briefly describe
a few of the available feature detectors.


2.5.1 SCALE INVARIANT FEATURE TRANSFORM (SIFT)

Figure 2 – Diagram of the SIFT extraction process.

David Lowe (2004) explains SIFT in great detail, but a short explanation will be
provided here. Figure 2 displays a very basic flowchart for the SIFT extraction
process.

Figure 3 – DoG Layer Octaves (Lowe, 2004).

Firstly, several octaves of Gaussian-smoothed images are produced using the
kernel in Equation 1. Then, adjacent images in scale-space are subtracted to
produce difference of Gaussian (DoG) layers (see Figure 3).
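
As a simple illustration of these first two steps, the following Matlab sketch builds one octave of Gaussian-smoothed images using the kernel of Equation 1 and differences adjacent levels to form DoG layers. The base scale, scale multiplier and number of levels are illustrative assumptions, not Lowe's exact parameters.

I = im2double(rgb2gray(imread('frame0001.jpg')));  % example input frame (assumed filename)
sigma0 = 1.6;                 % base scale (assumed)
k = 2^(1/3);                  % multiplicative scale step between levels (assumed)
numLevels = 5;

smoothed = cell(1, numLevels);
for s = 1:numLevels
    sigma = sigma0 * k^(s-1);
    G = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);  % discretised Gaussian kernel
    smoothed{s} = imfilter(I, G, 'replicate');
end

dog = cell(1, numLevels-1);
for s = 1:numLevels-1
    dog{s} = smoothed{s+1} - smoothed{s};  % difference of Gaussian layer
end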


Figure 4 - Identification of Local Extrema in Scale and Image Space (x,y,σ) (Lowe, 2004).

Next, maxima and minima in the image space and scale space are extracted by
examining each pixel's 26 nearest neighbours in the 3x3 regions of the current
scale and the two adjacent scales (see Figure 4). These candidates are then
subjected to a detailed fit using a quadratic curve in the three dimensions (x,y,σ)
to localise the extrema to sub-pixel accuracy. Next, the extrema with strong
edge responses, those which are poorly localised along image boundaries and
those which exhibit low contrast are eliminated. Each remaining keypoint is
given an orientation assignment based upon the local mean image gradient
direction in the region of the keypoint.

The last step in SIFT is the computation of the keypoint descriptor, which is a
128-dimension vector corresponding to 16 bins in a 4x4 grid centred around the
keypoint and oriented with regard to the keypoint’s assigned orientation. Each
bin contains 8 separate orientations, into which sample gradients are binned.


Figure 5 - SIFT Feature Descriptor Orientation (Vedaldi and Fulkerson, 2008).

SIFT is a fast, well-proven algorithm in SfM applications. Contributing to its
success is the robustness of its feature descriptor, the concept of which has
been copied by the majority of subsequent feature extraction methods. The
diagram in Figure 5, for example, is adapted from the diagram made by Vedaldi
and Fulkerson (2008) to explain the subtle differences between the VLFeat
implementation of SIFT and Lowe’s (2004) original descriptor.
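
For reference, the VLFeat implementation mentioned above exposes the whole SIFT pipeline through a single Matlab call. The fragment below is a sketch only, assuming the VLFeat toolbox is installed and on the Matlab path, and is included to show the shape of the output rather than any parameter tuning.

I = single(rgb2gray(imread('frame0001.jpg')));  % vl_sift expects single-precision greyscale
[f, d] = vl_sift(I);
% f is 4-by-N: each column holds [x; y; scale; orientation] for one keypoint.
% d is 128-by-N: each column is the 128-dimension descriptor described above.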


2.5.2 SPEEDED-UP ROBUST FEATURES (SURF)
SURF was developed by Bay et al. (2008) as a much faster alternative to SIFT
which makes use of several computational shortcuts while retaining – it is
claimed – the accuracy and robustness of SIFT.

Figure 6 - Basic schematic diagram of the SURF extraction process.

By employing integral images and vastly simplified discretized filters, SURF
drastically reduces the computation time necessary to find interest points. The
result is an approximation of Hessian matrix based interest points, at much
improved computational cost. These interest points are local structures where
the determinant of the Hessian matrix is a maximum. The Hessian matrix at a
point (x,y ) and scale σ is defined as follows:

$$H(x, y, \sigma) = \begin{bmatrix} L_{xx}(x, y, \sigma) & L_{xy}(x, y, \sigma) \\ L_{xy}(x, y, \sigma) & L_{yy}(x, y, \sigma) \end{bmatrix}$$

Equation 2 - The Hessian matrix at scale σ and point (x, y) (Bay et al., 2008).

Where:

$$L_{xy}(x, y, \sigma) = I(x, y) * \frac{\partial^2}{\partial x \partial y} G(\sigma)$$

Equation 3 - Lxy is the convolution (*) on the image I(x,y) of the Gaussian second-order
partial derivative over x and y.


The Hessian filter is normally discretized and cropped to permit fast
computation as in Figure 7.

Figure 7 - The discretized and cropped Gaussian second order partial derivative in the Lyy
direction (left) and the Lxy direction (right).

However, to make use of integral images to speed computation time further,
Bay et al. (2008) push the limits of simplification by employing box filters as in
Figure 8.

Figure 8 - Box filters approximating the Gaussian second-order partial derivative in Dyy
(left) and Dxy (right).

The SURF method is extremely fast; in addition to the improvements listed
above, instead of iteratively smoothing an input image with a Gaussian kernel
as in SIFT, the scale space is approximated by altering the size of the box filter,
and always applying it to the same input image. The discrete size of the box
filter means that there must be increased overlap between octaves, and that
scale-space representation is coarse, but the computational advantages appear
to outweigh this need for overlap and relative loss in precision.

In contrast to SIFT, which uses overall image gradients at the interest point to
determine keypoint orientation, SURF uses the magnitude of the Haar wavelet
response – again taking full advantage of the integral images - in the x and y
directions to map a distribution of weighted responses in a circular region of
radius 6σ around the keypoint. A rotating envelope of size π/3 is used to
determine the orientation of the keypoint by taking the orientation with the
greatest sum of wavelet response within the envelope as in Figure 9.

Figure 9 - Sliding Window Used to Determine the Orientation of SURF Descriptors.

The SURF descriptor is extracted from a square region of size 20σ, oriented
according to the keypoint’s orientation. Like in SIFT, this region is further
subdivided into a set of 4x4 squares. Whereas the SIFT descriptor is of length
128, the SURF descriptor is of length 64, where for each of the 16 sub-regions
the Haar wavelet response is recorded in four dimensions:

$$v = \left( \sum d_x, \; \sum d_y, \; \sum |d_x|, \; \sum |d_y| \right)$$

Equation 4 - The Haar wavelet response vector is recorded for each of the 16 sub-regions
for each interest point, making the descriptor half as long as the SIFT descriptor.


According to Bay et al. (2008), the configuration of the SURF descriptor as an
average response signature from a wider region makes it less susceptible to
image noise than SIFT. In addition, the smaller length of the descriptor vector
allows for much faster matching.

There are, however, a number of potential drawbacks to SURF which are not
fully discussed by Bay et al. (2008). Firstly, the implementation of SURF as a
fully discretized approximation means that it cannot be used to determine the
location of an interest point to sub-pixel accuracy. Depending on the resolution
and focal length of the input images, this has the potential to become a serious
hindrance to reconstruction accuracy. Secondly, SURF's use of extremely
large descriptor regions has at least two drawbacks. While a large region may be
more robust for planar surfaces, in more complex 3D scenes with discontinuities,
keypoints located close to a discontinuity (such as the corner of a
building) will tend to mismatch despite being otherwise valid, even for
relatively small view angle shifts. This may make it difficult for SfM
algorithms using SURF to correctly model sharp corners. The second
disadvantage of large descriptor regions is that a much greater percentage
of the image – anything near the edge of the image – becomes unusable for
feature extraction and matching. This effect will be more pronounced for low-
resolution images.
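
The Matlab Computer Vision System Toolbox provides an implementation of SURF, and a workflow of this kind underpins the SURF processing used later in this project (Annex C). The fragment below is a minimal sketch with illustrative filenames and parameter values, not the settings actually used.

I1 = rgb2gray(imread('frame0001.jpg'));    % assumed filenames
I2 = rgb2gray(imread('frame0002.jpg'));

pts1 = detectSURFFeatures(I1, 'MetricThreshold', 1000);  % Hessian-determinant interest points
pts2 = detectSURFFeatures(I2, 'MetricThreshold', 1000);

[desc1, valid1] = extractFeatures(I1, pts1);  % 64-dimension SURF descriptors
[desc2, valid2] = extractFeatures(I2, pts2);

indexPairs = matchFeatures(desc1, desc2);     % putative matches between the two images
matched1 = valid1(indexPairs(:, 1));
matched2 = valid2(indexPairs(:, 2));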

2.5.3 OTHER FEATURE DETECTORS
There are many other prominent feature detectors available to be studied which
will not be discussed in detail in this research project; however there are a few
noteworthy feature detectors which deserve mention:

MSER is a robust, unique, wide-baseline feature extraction and matching
method developed by Matas et al. (2002). MSER are extremal regions defined
solely by the intensity values of the pixels they contain. If all of the pixels in the
image are sorted by intensity value and added to the image in accordance with
a moving threshold, MSER are present when the rate of change of pixels being
added to the image is at a local minimum. The MSER themselves are the
contiguous areas of high or low values, with the MSER centre at the pixel with
the extreme intensity value of the MSER. MSER detection and matching is a
novel concept, but suffers from some of the same weaknesses as SURF.

Affine-SIFT (ASIFT) is one of the few truly affine-invariant detection methods
because it functions by the complete exploration of affine space (Morel and Yu,
2009). As such, it boasts far superior performance under severe anisotropic
scaling of the image. Its major drawback is the prohibitive computational cost
(several seconds per image, vs. several images per second with SIFT) for both
feature detection and feature matching. Nonetheless, further experimentation
with ASIFT could explore the ability to construct 3D models from far fewer
images, which has many implications in the defence context.

Principal Components Analysis (PCA)-SIFT (Ke and Sukthankar, 2004) uses
the same initial stages as SIFT, but in the computation of the feature descriptor,
uses PCA to drastically reduce the dimensionality of the descriptor vector. The
resulting vector can be matched using the same techniques as standard SIFT
algorithms. Not only is the matching then able to be done faster, but Ke and
Sukthankar (2004) show that the matching is in fact more reliable after PCA
reduction than with the full 128-vector.


2.6 STAGE TWO: FEATURE MATCHING
Image-to-image feature matching consists of identifying possible feature
correspondences between pairs of images. Given any two input images, the
features identified and described in Stage 1 are compared from image to image,
and features with adequately similar descriptors are said to be putative1
matches. Since feature detection and feature matching are closely related, the
definition of adequately similar depends upon the type of feature descriptor. For
example, software using the feature descriptor vectors generated by SIFT will
generally use a k-nearest-neighbour algorithm to determine the features which
are mutually nearest between two images (Wu, 2013a). Note that at this point
in the process, no shared three-dimensional geometry is inferred from the
feature correspondences; this is reserved until stage three.

Once a set of putative matches has been obtained, calculation of the
fundamental matrix F can be attempted for each matched image pair. Due to
the various errors and noise inherent in computer vision tasks, automated
matching generally produces a large proportion of mismatches among the
putative matches. To resolve this problem, the most popular solution is to use
the Random Sample Consensus (RANSAC) algorithm (Fischler and Bolles,
1981), which, in contrast to error minimization algorithms such as least squares
with which the geospatial community may be more familiar, was specifically
developed to deal with datasets of up to ~50% outliers. If RANSAC is not used,
then alternatives to this algorithm include least median of squares, which is less
sensitive to outliers than least squares.

2.6.1 TIME-COMPLEXITY OF FEATURE MATCHING
Feature matching is one of the most time-costly stages of an SfM workflow (Wu,
2013a). The simplest – and most time-consuming – method of determining
feature correspondences is to perform full pairwise matching, which attempts to
match the features in each image to those of every other image. In general,

1
Putative: “Generally considered or reputed to be.” (Oxford Dictionaries Online, 2013)


therefore, the time required to conduct full pairwise matching increases with the
square of the number of input images (Wu, 2013a). Clearly, for large sets of
input images – which can contain more than 2000 features each – the time
required for feature matching under this method becomes unmanageable. In
addition, Wu (2013a) points out that for typical large sets of images, between 75%
and 98% of all possible image pairs do not match. For the above reasons, it
becomes prudent to consider alternatives to full pairwise matching when
performing feature matching. Two of these methods, specified
matching and pre-emptive feature matching, are listed in the following sub-
sections.

2.6.2 SPECIFIED MATCHING
One of the most obvious ways to decrease the time-cost of feature matching is
to specify a set of image pairs that are known to have a high probability of good
matches. For images generated from video sequences, this is especially easy
since it is known that adjacent (and closely adjacent) video frames will share
similar views. This fact will be heavily exploited in this project, as the time-cost
of “sequence matching” is effectively proportional to the number of images to be
matched for the same frame overlap factor.
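
As an illustration of how simple the pair list becomes for a video sequence, the following Matlab sketch pairs each frame with its next few neighbours; the overlap window of five frames is an assumption for the example only.

numImages = 1000;   % number of extracted video frames (illustrative)
window    = 5;      % assumed number of following frames paired with each frame

pairs = [];
for i = 1:numImages
    for j = i+1 : min(i+window, numImages)
        pairs(end+1, :) = [i, j];  %#ok<AGROW>  image pair (i, j) to be matched
    end
end
% The pair count grows roughly as window*numImages, i.e. linearly with the number
% of images, rather than quadratically as in full pairwise matching.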

2.6.3 PRE-EMPTIVE FEATURE MATCHING
Wu (2013a) proposes to use in his VisualSFM software a pre-emptive matching
technique designed to eliminate bad pairs for very large collections of images.
In pre-emptive matching, the features are sorted from largest to smallest scale
and the first 100 features from each image are compared. The rest of the
features are then only matched if there are above a certain threshold of putative
matches among the first 100 features in both images. If not, then the image
pair is skipped.

2.6.4 K-NEAREST NEIGHBOUR (KNN) ALGORITHM
Using an exhaustive searching technique, the KNN search algorithm finds, for
every m-dimensional point in dataset P, the k closest m-dimensional points in
dataset Q. The distance measure can be one of many types of distance, such
as the taxicab (Manhattan or L1 Norm) distance or Euclidean distance. For
example, ASIFT uses the Manhattan distance for feature matching (Morel and
Yu, 2009). For data with lower dimensionality, use of a k-d tree for feature
matching can drastically reduce computational cost.

Figure 10 - K-D Tree Example for a dataset in two dimensions (The Mathworks, Inc.,
2013).

In a K-D Tree, the reference data is split into bins such that each bin has a set
maximum number of data points. In this way, the appropriate bin can first be
found, so that an exhaustive search only has to be done on a limited number of
possible points in the reference data. The KNN search algorithm and k-d trees
are important in several aspects of this research project, most notably in the
project's implementation of the iterative closest point (ICP) algorithm, which
uses a k-d tree in three dimensions. When dataset dimensionality exceeds
approximately 10, however, k-d trees offer little improvement over brute force
(exhaustive) searches.
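
A minimal Matlab sketch of the two search strategies is shown below; it assumes the Statistics Toolbox searcher objects are available, with refPts an M-by-3 reference point cloud, queryPts an N-by-3 set of query points, and descriptorsA and descriptorsB assumed matrices of high-dimensional feature descriptors.

tree = KDTreeSearcher(refPts);                        % build the k-d tree once
[idx, dist] = knnsearch(tree, queryPts, 'K', 1);      % nearest reference point per query point

% For high-dimensional data (e.g. 128-dimension SIFT descriptors), a brute force
% search is often just as fast:
bf = ExhaustiveSearcher(descriptorsA);                % descriptorsA assumed N-by-128
[idx2, dist2] = knnsearch(bf, descriptorsB, 'K', 2);  % two nearest neighbours, e.g. for a ratio test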

2.6.5 RANDOM SAMPLE CONSENSUS (RANSAC)
RANSAC is a parameter estimation approach designed to handle a large
proportion of outliers in the input data (Derpanis, 2010). As such, RANSAC has
been invaluable to the computer vision community and the field of SfM. A brief
outline of this algorithm (as it applies to the calculation of the fundamental
matrix F from a set of putative feature matches) is displayed in Table 1.

Table 1 - Overview of the RANSAC Algorithm. Adapted from Derpanis (2010) to reflect the
calculation of the fundamental matrix F for an image pair with at least the minimum
number of putative matches.

Step 1: From the set of putative matches, randomly select the minimum number of
correspondences required to estimate F (7 or 8, depending on the method).

Step 2: Solve for F.

Step 3: Attempt to fit the remaining putative matches to F with the equation:

$$x'^{T} F x = \nu$$

Equation 5 – x' ↔ x is an inlier match if the residual ν is less than some error
tolerance ε, which is a parameter of the RANSAC algorithm.

Step 4: Any putative match which yields a residual ν greater than the error tolerance ε
is an outlier; otherwise, it is considered an inlier match.

Step 5: If the fraction of inliers to putative matches exceeds a threshold, re-compute F
using all of the inlier matches and return F.

Step 6: Otherwise, re-select the minimum number of putative matches and start again, up
to a maximum of N times.

RANSAC is an extremely useful tool due to its ability to generate a viable
solution over several iterations, provided only that a) there are at least as many
putative matches as required to generate a minimum solution and b) there is a
minimum baseline chance of choosing a random sample composed only of
inliers from a dataset which may have up to 50% outliers. The robustness of
RANSAC allows for the execution of SfM tasks without specific human
intervention, and allows the constraints of feature descriptor distinctiveness to
be relaxed enough to permit an increased number of putative matches.
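
By way of example, the Computer Vision System Toolbox wraps the loop of Table 1 in a single function call; the sketch below assumes matched1 and matched2 are M-by-2 matrices of corresponding putative match coordinates, and the trial count and distance tolerance are illustrative values only.

[F, inlierIdx] = estimateFundamentalMatrix(matched1, matched2, ...
    'Method', 'RANSAC', ...
    'NumTrials', 2000, ...        % maximum number of random samples, N (illustrative)
    'DistanceThreshold', 0.01);   % error tolerance epsilon (illustrative)

inliers1 = matched1(inlierIdx, :);  % inlier matches x retained for reconstruction
inliers2 = matched2(inlierIdx, :);  % inlier matches x'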


2.7 STAGE THREE: 3D RECONSTRUCTION
In the reconstruction phase, the set of inlier matches x' ↔ x and fundamental
matrices F obtained from the Feature Matching stage are used to reconstruct
3D points using Equation 29.

2.7.1 PROJECTIVE AMBIGUITY
According to Hartley and Zisserman (2003), the reconstruction is subject to
projective ambiguity until each camera calibration K is known. But if camera
calibration K is unknown, then an estimation of K based on the ambiguous
reconstruction and Equation 31 will also not be correct. This dilemma is
overcome in VisualSFM, the reconstruction software used in this research
project, by at first estimating K based on key logical assumptions, which are
listed in Table 2.

Table 2 - Assumptions made by VisualSFM if no specified camera calibration K is supplied.

Assumption 1: The sensor axes are orthogonal. Therefore the skew factor s = 0.

Assumption 2: The sensor's pixel density is isotropic. Therefore the sensor resolution in
unit length mx = my = m, the focal length cfx = cfy = cf, and the focal length expressed
in pixels φx = φy = φ.

Assumption 3: The centre of radial distortion is the image centre. Therefore the image
centre [x0 y0]T = (m)[px py]T.

2.7.2 FOCAL LENGTH INITIALIZATION
Having made the assumptions in Table 2, the only remaining ambiguity in the
camera calibration is the focal length, which is isotropic. If no focal length can
be extracted from the image’s EXIF tags, then VisualSFM estimates the focal
length based on a scalar factor (specified in the program’s settings file) and the
maximum of image height or width in pixels. The default parameter for this
setting is 1.2, which roughly corresponds to an average medium focal length of
a handheld camera (Wu, 2013b). This estimation strategy, while extremely
crude, has been shown experimentally to allow for initial reconstructions, which
permit the re-estimation of camera focal length as cameras are added to the
incremental reconstruction. It also means that if a rough estimate of initial focal
length is known, and if all of the cameras are assumed to have the same
calibration, then the scalar estimation factor can be specified in the settings file
which will allow for a better estimate of initial calibration.
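
The arithmetic behind this prior is straightforward, as the sketch below shows; the image dimensions, focal length and sensor width are placeholders, not the values of the sensor used in this project.

width  = 1920;                          % image width in pixels (placeholder)
height = 1080;                          % image height in pixels (placeholder)

scalarFactor = 1.2;                     % VisualSFM default scalar factor (Wu, 2013b)
focalPriorPx = scalarFactor * max(width, height);   % default focal length prior in pixels

% If the physical focal length and sensor width are known, a better prior is:
focalLengthMm = 35;                     % placeholder
sensorWidthMm = 23.6;                   % placeholder
focalPx      = focalLengthMm / sensorWidthMm * width;
betterScalar = focalPx / max(width, height);        % value to specify in the settings file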


2.8 STAGE FOUR: BUNDLE ADJUSTMENT (BA)
As mentioned in paragraph A.3.6, the objective of BA is to minimize the sum of
the squared distances between the measured image coordinates of detected
features and the back-projected image coordinates of the reconstructed 3D
features, given in Equation 30. A few of the most prominent algorithms for
performing BA on large datasets – such as those used in SfM applications with
many images – are given in the following sub-sections.

2.8.1 LEVENBERG-MARQUARDT (LM) BA
The LM method is a standard nonlinear least squares curve fitting algorithm
(Gavin, 2011). It has been the method of choice for SfM applications to
minimise the sum of squared error in the normal equations (Byröd and Åström,
2010). The LM algorithm can be summarised as a combination of the gradient
descent method and the Gauss-Newton method (Gavin, 2011).

The gradient descent method works by finding some variation δω in the
parameters ω which reduces the value of the error function f(ω). To do this, a
Taylor expansion of the error function f(ω + δω) is used:

$$f(\omega + \delta\omega) \approx f(\omega) + g^{T} \delta\omega + \tfrac{1}{2} \delta\omega^{T} H \delta\omega$$

Equation 6 - The quadratic Taylor series expansion of the error function in Equation 30
(Triggs et al., 2000).

Where:

$$g \equiv \frac{df}{d\omega}(\omega)$$ is the gradient vector and

$$H \equiv \begin{bmatrix} \dfrac{\partial^2 f}{\partial \omega_1^2}(\omega) & \cdots & \dfrac{\partial^2 f}{\partial \omega_1 \partial \omega_i}(\omega) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial \omega_i \partial \omega_1}(\omega) & \cdots & \dfrac{\partial^2 f}{\partial \omega_i \partial \omega_i}(\omega) \end{bmatrix}$$ is the Hessian matrix.

By taking the derivative of the Taylor expansion and setting it to zero, the
variation in the parameters δω can be obtained:


$$\delta\omega = -H^{-1} g$$

Equation 7 - The Newton step prediction.

Since the determination of the Hessian matrix H is itself computationally
intensive, the LM algorithm makes use of the Gauss-Newton approximation to
the least-squares Hessian H:

$$H \approx J^{T} W J$$

Equation 8 - Gauss-Newton approximation to the least-squares Hessian matrix H (Triggs
et al., 2000).

Where W is a weight matrix and J is the Jacobian or design matrix as in
Equation 9.

$$J = \frac{\partial \hat{x}_{ij}}{\partial \omega}$$

Equation 9 - The Jacobian is the matrix of the change in the back-projected points in
terms of the change in the parameters ω (Triggs et al., 2000).

This version of the LM algorithm is not without its disadvantages. Like most
nonlinear least-squares estimation algorithms, the LM algorithm is susceptible
to local minima in the error function, and can falsely converge on a saddle point
(Triggs et al., 2000). Nonetheless, as long as the outlier control measures in
the RANSAC algorithm are well-calibrated and the view geometry is strong, the
LM algorithm is usually able to converge accurately (Triggs et al., 2000).
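
A single damped update of this kind can be written in a few lines of Matlab. The sketch below assumes the residual vector r, its Jacobian J (Equation 9), a weight matrix W and a damping parameter lambda have already been formed at the current parameters omega; the diagonal damping shown is one common variant, and real implementations adapt lambda from iteration to iteration.

A = J' * W * J;                        % Gauss-Newton approximation to the Hessian (Equation 8)
g = J' * W * r;                        % gradient of the squared-error function

deltaOmega = -(A + lambda * diag(diag(A))) \ g;   % damped step (cf. Equation 7)
omega = omega + deltaOmega;            % tentative update: accepted only if the error decreases;
                                       % lambda is decreased on success, increased on failure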

2.8.2 PRECONDITIONED CONJUGATE GRADIENT (PCG) BA
The conjugate gradient algorithm is an iterative method for solving a symmetric
positive definite system of linear equations (Shewchuck, 1994). Its main
advantage is that in its basic configuration, it requires no matrix-matrix
multiplication, only matrix-vector multiplication, but it has the disadvantage of
slower convergence near the optimum value of ω (Byröd and Åström, 2010).
The term preconditioning refers to the reduction of the condition number of $J^{T}J$
by various methods (Byröd and Åström, 2010).
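
Matlab's built-in pcg function illustrates the idea; the sketch below solves the damped normal equations from the previous sub-section with a simple Jacobi (diagonal) preconditioner, chosen here purely for illustration. J, W, r and lambda are assumed to be defined as in the LM sketch above.

A = J' * W * J + lambda * diag(diag(J' * W * J));  % symmetric positive definite system
b = -(J' * W * r);

M = diag(diag(A));                      % Jacobi preconditioner, reducing the condition number
deltaOmega = pcg(A, b, 1e-6, 100, M);   % tolerance and iteration cap are illustrative
% Only matrix-vector products with A are required, so A may also be supplied as a
% function handle for very large systems.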


2.9 STAGE FIVE: DENSE 3D POINT CLOUD GENERATION
The fifth stage of some SfM workflows involves taking the output of the adjusted
sparse reconstruction and applying additional operations to produce a denser
3D point cloud. If 3D reconstruction and bundle adjustment are analogous to
re-triangulation and bundle adjustment in metric photogrammetry, then dense
3D point cloud generation can be thought of as analogous to the Enhanced
Automated Terrain Extraction (eATE) in the Intergraph Leica Photogrammetry
Suite (LPS) (ERDAS Inc., 2010) or Next Generation Automated Terrain
Extraction (NGATE) in BAE SOCET Set (BAE Systems, 2009).

This process involves using the internal and external camera parameters and
the set of reconstructed points to create image patches around each
reconstructed point. From these image patches, additional matches are found
and triangulated to vastly increase the density of the point cloud. One example
of a software tool that performs dense reconstruction is Furukawa and Ponce's
(2010) Patch-Based Multi-View Stereo Software (PMVS) and Clustering Views
for Multi-view Stereo (CMVS). Figure 11 displays an example of CMVS-PMVS
output for a sparse reconstruction made with 1000 images and the SIFT
detector, centred on the building studied in this research project.

Figure 11 - Example of a CMVS-PMVS Dense Reconstruction for the Dataset in this
Research Project.

Because the accuracy of the densified reconstruction is dependent upon input
from the sparse reconstruction of Stages 3 and 4, it will not be considered


further for experimentation in this research project, and all experimentation and
testing will be conducted with regard to sparse reconstruction.


2.10 S F M S OFTWARE
There are a number of open-source SfM software packages available for use in
SfM applications, and also a number of commercial and closed-source software
packages. This research project concentrates on open-source software, and in
the end only one package, VisualSFM, was selected for experimentation. The
following sub-sections describe a select few of the software packages that can
be used to conduct SfM 3D reconstructions.

2.10.1 V ISUAL SFM


VisualSFM is a graphic user interface (GUI) and command-line application for
fast SfM processing (Wu, 2013a). The GUI system and incremental
reconstruction process allow the user to observe the process as it occurs and to
intervene if the settings need to be changed. In addition, since Wu’s (2013a)
focus is on increasing the time-efficiency of the reconstruction process,
VisualSFM takes advantage of several computational shortcuts such as
specified matching, multi-core PCG BA, pre-emptive matching and a graphics
processing unit (GPU) implementation of SIFT (SIFTGPU) (Wu, 2007).

2.10.2 M ATLAB
The newest releases of Matlab® come with built-in image processing libraries
so Matlab scripts may be written to process a sequence of images and perform
3D reconstruction on them. Matlab can also be used as an application
programming interface (API) for another open-source computer vision library
called VLFeat (Vedaldi and Fulkerson, 2008), which can be used on previous
releases of Matlab which do not have built-in image processing libraries.
Matlab is advantageous in that it is so versatile; visualizations of the output and
a complete understanding of the process can be obtained at each stage.
However, using Matlab in this manner has two distinct disadvantages. Firstly,
the use of Matlab is tantamount to writing a software package nearly from
scratch; in addition to data pre-processing, this would involve an enormous
amount of additional effort which is not in the scope or the focus of this research
project. Secondly, Matlab is, in its basic configuration, unable to take
advantage of certain computational exploits which are embedded into other SfM
implementations, such as the graphics processing unit (GPU) implementation of


SIFT (SIFTGPU) in VisualSFM or vectorized CPU instruction sets used in the


demonstration version of ASIFT (Morel and Yu, 2009).

2.10.3 B UNDLER
Bundler is an SfM system for large, unordered image collections which creates
3D point clouds from input images in much the same way as VisualSFM
(Snavely, Seitz and Szeliski, 2006). Bundler has the disadvantage of being an
entirely command-line software package; no visual output is generated until the
output is opened in a separate viewer. In addition, Bundler does not
automatically assume a starting focal length if focal lengths cannot be obtained
from EXIF tags. Finally, the default behaviour of Bundler is to perform full
pairwise matching on the input data, which, for large sets of images, becomes
time-prohibitive. Figure 12 shows an example of camera pose estimation
output from Bundler (Snavely, Seitz and Szeliski, 2006).

Figure 12 - Example of Bundler Camera Pose Estimation (Snavely, Seitz and Szeliski,
2006).


2.11 P OINT C LOUD Q UALITY M EASURES AND C OMPARISON T ECHNIQUES
A review of related literature revealed a limited number of candidate
techniques for assessing the quality of a point cloud against a reference point
cloud. As discussed in paragraph 2.3.3, since the SfM coordinate system is
arbitrary, discussion of statistical quality measures for the point clouds in this
research project is limited to the linear error (LE) of the modelled points to the
closest reference data point. There are a few quality measures which deserve
mention in this review.

2.11.1 N UMBER OF M ODELLED P OINTS


Since the real-world space under comparison does not change from model-to-
model, a count of the number of modelled points falling within the spatial
bounds of the interest region is sufficient to represent modelled point density. It
must be noted, however, that this statistic does not account for the distribution
of points within the model or whether the model fits the reference data or
models it completely.

2.11.2 H AUSDORFF D ISTANCE


The Hausdorff Distance is a global statistic representing the fit of one dataset to
another. The potential of the Hausdorff distance to provide a model quality
statistic was explored in this research project, and finally rejected as a useful
statistic. Grégoire and Bouillot (1998) quote Rote’s (1991) concise definition of
the Hausdorff distance:

[The] maximum distance of a set to the nearest point in the other set.

The Hausdorff distance is a maximin distance, that is, it is the maximum of the
set of minimum distances between two datasets, as seen in Equation 10.

$$h(A, B) = \max_{a \in A}\left\{ \min_{b \in B}\left\{ d(a, b) \right\} \right\}$$

Equation 10 - The Hausdorff distance (Grégoire and Bouillot, 1998).
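
As a minimal illustration, the directed Hausdorff distance of Equation 10 can be computed in Matlab with a nearest-neighbour search, assuming A and B are N-by-3 and M-by-3 point arrays and that the Statistics Toolbox is available:

    % Directed Hausdorff distance h(A,B): max over A of the min distance to B.
    [~, dMin] = knnsearch(B, A);   % nearest point in B for every point of A
    hAB = max(dMin);               % the maximin value of Equation 10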

The Hausdorff distance has several major disadvantages for use in this
research project. For example, the basic algorithm has no way of knowing


whether or not a point is an outlier. As can be seen in several of the
reconstructed models, points that have been erroneously matched and
reconstructed on clouds would render the unmodified Hausdorff distance
almost meaningless. The same is true for any erroneous points reconstructed
beneath the surface of the soil, or anywhere the model does not represent the
data (for example, if a tree has been felled between the creation of the
reference dataset and the capture of the input images). For the Hausdorff
distance to be used to any effect in this type of analysis, the model and dataset
would need to be discretized into small sections, the Hausdorff distance
calculated for each block, and the distribution of block values analysed. From a
statistical standpoint, if the Hausdorff distance, as a global maximin function, is
considered to be the 100th percentile of the nearest point distance over all
model points, then it is clear that it will give nearly meaningless results.

2.11.3 N EAREST P OINT D ISTANCE (M INIMUM ABSOLUTE L INEAR E RROR )


If the Hausdorff distance is the 100th percentile of the nearest point distance, it
follows that some analysis of the nearest point distance distribution will be more
statistically useful. The nearest point distance is defined in this research project
as the [Euclidean] distance between a point on the trimmed model and the
nearest point on the reference dataset. It has been used by Cignoni, Rocchini
and Scorpigno (1996) and Aspert, Santa-Cruz and Ebrahimi (2002), among
others. Although Girardeau-Montaut et al. (2005) suggest using simplified
surfaces to reduce computational burden, the density of the model data with
respect to the reference data in this research project is very small, and the use
of the standard nearest-distance algorithm should not have a significant effect
or be as computationally-intensive as, for example, comparing two dense
LiDAR datasets. The nearest distance is achieved by using a kNN search
algorithm with k = 1 and a 3-dimensional K-D Tree (Mathworks, 2013). In
addition to providing statistically meaningful results, the 90th percentile of the set
of nearest distances represents the minimum possible value for an absolute
LE90, after the model undergoes a Euclidean transformation which minimizes
the sum of squared nearest distances through ICP registration to the reference


data. The kNN search algorithm runs very quickly due to the fact that the K-D
Tree search object for the reference dataset can be pre-computed once and
then stored. The nearest point distance is the chief quality statistic used during
this research project.
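
A minimal Matlab sketch of this statistic, assuming refPts and modelPts are N-by-3 arrays of reference points and registered, trimmed model points, might look as follows; the project's full evaluation functions are in Annex D.

    % Nearest point distance and its 90th percentile (the minimum absolute LE90).
    kdTree     = KDTreeSearcher(refPts, 'BucketSize', 50);  % pre-computed once and stored
    [~, dNear] = knnsearch(kdTree, modelPts, 'K', 1);       % nearest reference point per model point
    le90AbsMin = prctile(dNear, 90);                        % 90th percentile of the nearest distances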

2.12 S UMMARY
In this chapter, a limited literature and conceptual review were offered as
foundation for the methodology, analysis and discussion found in later chapters.
At the beginning of this chapter, the key factors in SfM which are evaluated in
this research project were described with reference to the important theory in
Annex A. Next the important concepts in SfM were presented with regard to a
typical SfM workflow. Finally, the point cloud evaluation methods considered for
use in this research project were presented.


3 M ETHODOLOGY AND D ATA


3.1 O VERVIEW
This chapter is divided into two parts. In the first part, the methodology used for
this research project is discussed. Figure 13 forms a visual basis for the
methodology and will be frequently referenced. In reality, the relatively linear
methodology as it appears in this chapter is not a full representation of the
processes and procedures that were attempted. Many avenues were initially
explored which either did not yield any meaningful results, or encountered a
serious problem. In general, these fruitless explorations are not discussed herein,
unless they contribute to the analysis or conclusions in subsequent chapters.

In the second part of this chapter, the input, reference and supporting data used
in this research project are described.


3.2 P ART O NE : M ETHODOLOGY

Figure 13 - Methodology Flowchart.


3.3 D ATA P REPARATION

Figure 14 - Data Preparation Phase.

3.3.1 L ITERATURE R EVIEW AND F EASIBILITY C HECKS


The literature review was conducted alongside continuous feasibility checks,
during which familiarisation with the different software options took place.
Since the concepts of SfM were being learned at the same time, this process
was slow and characterized by trial and error.

3.3.2 C OLLECT AND O RGANISE D ATA


Concurrent with the literature review and feasibility checks, the data structure
was refined. An image database was created using Microsoft® Access 2007.
The purpose of the database was to allow the organization and control of image
metadata, and to create the text files which serve as input to the reconstruction
software. The database generates image lists, pair lists for specified matching
and ground control points (GCP) files for input into VisualSFM. It also
generates a master Windows® batch (.bat) file which calls the VisualSFM
software using the command line function with the necessary switches and
options for each case. The Visual Basic for Applications (VBA) code associated
with this database is available in Annex B.

In addition, in this phase the reference data was cleaned up to better represent
the shape of the building that would be seen from the aerial images. For
example, the reference data contained a number of points located in the interior
of the building which would not be seen in the aerial imagery. Were these


points left in the reference data, the iterative closest point (ICP) registration and
nearest distance calculations may have been adversely affected.

3.3.3 S YNTHESIZE AND ASSIGN M ETADATA


The input images were assigned individual EXIF tag metadata indicating the
approximate GPS position and focal length. This was done using Harvey’s
(2013) open-source image metadata editing tool called ExifTool. In order to
assign the correct metadata, the approximate focal length first had to be
determined. This was performed using the information in the provided sensor
specifications (BAE Systems, 2012) along with some logical assumptions. The
calculations associated with this determination are listed in Annex E.
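
As an illustration only, the sketch below shows how ExifTool (Harvey, 2013) can be driven from Matlab to write focal length and position tags to a frame; the file name and tag values here are placeholders rather than the values actually used in this project, and exiftool is assumed to be on the system path.

    % Illustrative call to ExifTool; tag values and file name are placeholders.
    cmd = ['exiftool -overwrite_original ', ...
           '-FocalLength=6.75 ', ...                    % placeholder focal length in mm
           '-GPSLatitude=45.0 -GPSLatitudeRef=N ', ...  % placeholder position
           '-GPSLongitude=75.0 -GPSLongitudeRef=W ', ...
           'frame_0001.jpg'];
    status = system(cmd);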

3.3.4 I DENTIFY I NDEPENDENT V ARIABLES , C REATE C ASES AND W RITE L IST F ILES
Three independent variables were chosen to formulate the cases in this
research project. The variables, as well as the reasoning behind their use, are
listed in Table 3.

Table 3 – Independent Variables and Justification.

1. Independent Variable: Feature Extraction Method
   Range of values used:
   • SIFT with default settings as in Lowe (2004)
   • SURF with default settings as in Bay et al. (2008)
   Justification: see paragraphs A.3.3, A.3.4 and 2.2.

2. Independent Variable: The focal length estimation method used by the software
   Range of values used:
   • Individual Estimation: the software estimates each focal length individually.
   • Shared Calibration: the software estimates each focal length individually, but
     then forces every camera to share the same calibration during the final BA.
   • Specified Calibration: the camera calibration matrix K is specified – a single
     calibration for all cameras.
   Justification: see paragraphs A.3.7 and 2.3.1.

3. Independent Variable: Input Image Count
   Range of values used: 3 to 1000 images per reconstruction, in 46 discrete counts.
   Justification: see paragraph 2.3.2.

The full list of 276 cases used in this research project is available in Annex F.
Supplied with the list of cases, the image database was used to produce the
image lists required as input to VisualSFM.


3.4 SURF F EATURES


The SURF feature extraction and matching phase was conducted using
Matlab® scripts to extract, match and reformat the SURF features in the input
image dataset. The Matlab® code used for this is available in Annex C.

Figure 15 - SURF Feature Extraction and Matching Phase.

3.4.1 B UILD THE M ASTER M ATCH M ATRIX


The master match matrix is a lower triangular, logical (true/false) matrix
describing which image pairs need to be matched by any of the 138 SURF
cases. In total, just over 1.0E6 pairs were required.

3.4.2 B UILD R EQUIRED SURF L IST , E XTRACT SURF F EATURES AND D ESCRIPTORS
Using the master match matrix, a list of all images for which SURF features
needed to be extracted was produced. Then, SURF features and descriptors
were extracted for every image on the list.

3.4.3 W RITE SURF F EATURES IN SIFT F ORMAT


In order to be accepted as input to VisualSFM, the SURF features needed to be
re-written in Lowe’s (2004) format. Because matching was performed outside
of VisualSFM, the SURF descriptors were replaced by 128-vectors of zeros.

3.4.4 M ATCH ALL R EQUIRED P AIRS


Matching the 1.0E6 SURF image pairs using Matlab® took several days. To
ensure the robustness of this process, the script was carefully written and
tested to be restartable and to perform automated backups after every 10k
pairs.
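
A minimal sketch of the extraction and matching of a single SURF image pair, assuming the Computer Vision System Toolbox and placeholder file names, is shown below; the complete scripts are in Annex C.

    % SURF extraction and matching for one image pair (default settings).
    I1 = imread('frame_0001.jpg');  if size(I1,3) == 3, I1 = rgb2gray(I1); end
    I2 = imread('frame_0002.jpg');  if size(I2,3) == 3, I2 = rgb2gray(I2); end
    pts1 = detectSURFFeatures(I1);              % SURF interest points
    pts2 = detectSURFFeatures(I2);
    [f1, vpts1] = extractFeatures(I1, pts1);    % SURF descriptors
    [f2, vpts2] = extractFeatures(I2, pts2);
    idxPairs = matchFeatures(f1, f2);           % putative matches between the pair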


3.4.5 W RITE M ATCH L ISTS


After the matching was completed, match lists were written for every SURF
case in the format specified in the VisualSFM documentation (Wu, 2013b).


3.5 SIFT F EATURES

Figure 16 - SIFT Feature Extraction and Matching Phase.

The SIFT feature extraction phase is built into VisualSFM using Lowe’s (2004)
binary. Every step in this phase is as described in paragraph 0. A visual
depiction is shown in Figure 16. The only noteworthy point in this
implementation is that once a set of SIFT features and descriptors has been
extracted, the file is stored to save time in the event of future reuse; the same is
true for the match indices of matched pairs.


3.6 R ECONSTRUCTION
The reconstruction process using VisualSFM is fairly transparent to the user. In
this research project, reconstructions were performed using either SIFT or
SURF input as dictated by the current case. Internal settings within VisualSFM
were not modified from the original settings throughout the reconstruction
process.

Figure 17 - Reconstruction Phase.

3.6.1 I NTERLEAVED P ARTIAL OR F ULL BA WITH PCG


As described by Wu (2013a), the reconstruction makes use of the PCG
algorithm described in paragraph 2.8.2. Under the default settings, which were
the settings used here, a full BA is performed after the addition of every five new
cameras, and a partial BA (with limited matrix size and number of iterations) is
performed after every new camera is added.

3.6.2 T RANSFORM THE M ODEL


After reconstruction in the arbitrary coordinate system is complete, VisualSFM
uses the camera coordinate information supplied in the GCP list to calculate an
appropriate 3D Similarity transformation to fit the reconstructed camera
locations to their ‘real world’ positions. The same transformation is applied to
the model, thus roughly aligning the model to a real-world coordinate system.
This is as close to actual GCPs as is possible with the current version of the
software, especially in a fully automated sense.
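
For reference, the general form of the 3D similarity transformation applied to each reconstructed point X is given below; this is the standard seven-parameter (scale, rotation, translation) form, not necessarily the exact parameterisation used internally by VisualSFM.

$$\mathbf{X}' = s\,R\,\mathbf{X} + \mathbf{t}$$

where s is a scalar scale factor, R is a 3x3 orthonormal rotation matrix and t is a 3x1 translation vector.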

3.6.3 O UTPUT TO NVM F ILE


Finally, the 3D model is output to a VisualSFM-specific ‘.nvm’ format.


3.7 P OINT C LOUD E VALUATION


The majority of the point cloud evaluation phase is implemented by a set of
comprehensive Matlab® functions and one controlling script, which are included
in Annex D. Nonetheless, a short explanation of the process is given in the
following sub-sections, and a diagram is provided in Figure 18.

Figure 18 - Point Cloud Evaluation Phase.

3.7.1 R EAD IN R EFERENCE PLY F ILE


To read the reference Stanford Polygon (PLY) format file into a Matlab® data
structure, the ‘plyread.m’ routine written by Colin MacDonald (2011) was used.

3.7.2 C REATE K NN S EARCH O BJECT


A 3-dimensional K-D Tree Search Object was created using the reference data
with maximum bin size of 50. This proved to drastically reduce computation
time for the three functions requiring a kNN search: ICP, Horizontal Bracketing
and Nearest Point Distance.

3.7.3 R EAD IN NVM F ILE


A custom script was written to read in NVM files into the same data structure
format as in paragraph 3.7.1.

3.7.4 V ERTICAL C ORRECTION WITH T IE P OINTS


Four vertical tie points were specified, located on the ground outside the four
corners of the modelled building. Inverse distance weighting (IDW) with a
power of 2 was used to generate tie point height values for the tie point
locations on the reference data set and on the model. This was performed


before horizontal alignment because experimentation showed that the vertical
alignment of the model was consistently worse than its horizontal alignment.
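
A minimal sketch of the IDW interpolation at one tie point location, assuming pts is an N-by-3 array of nearby points and tieXY is the 1-by-2 horizontal tie point location, is given below.

    % Inverse distance weighting (power 2) of heights at one tie point location.
    d = sqrt(sum(bsxfun(@minus, pts(:,1:2), tieXY).^2, 2));  % horizontal distances
    w = 1 ./ d.^2;                                           % IDW weights, power 2
    zTie = sum(w .* pts(:,3)) / sum(w);                      % interpolated tie point height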

3.7.5 R OUGH H ORIZONTAL B RACKETING ALIGNMENT


Next, horizontal bracketing was performed to roughly align the model to the
building horizontally. This was done in much the same way as ICP, with the
error function being the sum of squared nearest point distances; however, the
minimization was performed by iteratively alternating alignment in the x and y
directions.

3.7.6 I TERATIVE C LOSEST P OINT (ICP) ALIGNMENT


The ICP algorithm was adapted from Jakob Wilm’s (2010) implementation of
ICP, but simplified to reflect the current dataset and modified to conform to a
least-squares error minimization. In addition, the Euclidean transformation
representation was altered to reflect paragraph A.4.1 so that the results of the
transformation are more readily understood.

3.7.7 N EAREST P OINT D ISTANCE


The nearest point distance is simply the output of Matlab®’s kNN search
function using the 3D K-D Tree obtained in paragraph 3.7.2.

3.7.8 T RIM THE P OINT C LOUD


The point cloud was trimmed horizontally (but not vertically) to the study area.

3.7.9 O UTPUT TO F ALSE C OLOUR AND T RUE C OLOUR PLY F ILES


The true colour and false colour PLY files were produced using MacDonald’s
(2011) ‘plywrite.m’ function. In the false colour PLY file, each point is assigned a
colour on a linear scale from blue (close) to yellow (far) according to its nearest
point distance. The range for the gradient is given in Equation 11.

$$r_{\min} = \max\{0,\; \mu_{d} - 1.96\,\sigma_{d}\} \qquad r_{\max} = \mu_{d} + 1.96\,\sigma_{d}$$
Equation 11 - The range of the colour gradient for the false-colour nearest distance PLY
file.
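
A minimal sketch of the colouring, assuming dNear is the column vector of nearest point distances for the trimmed model, might be:

    % Map nearest point distances onto a blue (close) to yellow (far) gradient.
    rMin = max(0, mean(dNear) - 1.96 * std(dNear));          % Equation 11
    rMax = mean(dNear) + 1.96 * std(dNear);
    t    = min(max((dNear - rMin) / (rMax - rMin), 0), 1);   % clamped position in the range
    rgb  = uint8(255 * [t, t, 1 - t]);                       % [R G B] per point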

3.7.10 ANALYSE D ATA


In terms of data analysis, the output comma-separated values (CSV) file was
imported into Microsoft® Excel and calculations were conducted to determine the


LE90AbsMin and related statistics. The major results of these calculations are given in
Chapter 4; the complete set of charts is in Annex G.


3.8 P ART T WO : D ATA


The following sections briefly discuss the data used in this research project.
Most of the experimental data was provided by Jonathan Fournier at Defence
Research and Development Canada (DRDC), and the contextual data was
provided by Dave Rowlands at the Mapping and Charting Establishment (MCE).

3.8.1 I NPUT I MAGES


The images used as input data for this research project were obtained as
frame-grabs from a 30Hz video sequence. The sequence was shot from a
Wescam turret mounted underneath the National Research Council (NRC) Twin
Otter aircraft (see Figure 19).

Figure 19 - NRC Twin Otter Aircraft with Wescam Turret Fitted.


Figure 20 - Sensor Suite of the JMMES System.

The sensor in use was the medium-wavelength infrared (MWIR) sensor as part
of the BAE Systems joint multi-mission electro-optical sensor (JMMES) suite.
The resolution of each image is 640x480 pixels, and the focal length was determined
to be a constant 6750 pixels, assumed to be isotropic as in Annex E (see Figure
20).


Figure 21 - Aerial Platform Flight Path.

Figure 21 displays the trace of the flight path as it performs a single orbit around
the target building. The flight path information is taken from the supplied image
metadata.


Figure 22 - Sample Input Images.

Figure 22 displays two sample images from the input dataset.

3.8.2 R EFERENCE D ATA


The reference data used in this research project was captured and processed
by Terrapoint, Inc. and is a fused LiDAR dataset from both ground and aerial
sources (Terrapoint, Inc., 2007). Importantly, the quality of the ground and
aerial fusion is within 0.05m (at 1σ, ≈68% confidence). In addition,
the standard deviation σ of absolute vertical accuracy was evaluated at 0.021m,
with an RMSE of 0.041m. Figure 23 displays an oblique view of the untrimmed
reference data in an orthographic projection.

Figure 23 - Reference Dataset.


3.8.3 S UPPORTING D ATA


The supporting data in this research project provides context and includes aerial
imagery, LiDAR-derived elevation models and vector information. It was
provided by MCE.

3.9 S UMMARY
In this chapter, both the methodology and the data used in this research project
were presented. The methodology is a logical, nearly-linear sequence of
operations, but it was developed over several iterations to its present form. The
data used in this project can be divided into the three distinct categories of input
data, reference data and supporting data. The input data is merely a sequence
of images, meaning that the methodology used in this project can be applied to
any set of images for which there also exists similar reference data.


4 R ESULTS
4.1 O VERVIEW
This chapter is designed to convey the most important initial observations and
global results from the experimentation described in Chapter 3. Detailed
discussion of the observations is deferred until Chapter 5. Firstly, the most
important overall performance statistics are described. The second part of this
chapter displays some of the charts in order to introduce the concepts
discussed in Chapter 5.

4.2 O VERALL P ERFORMANCE S TATISTICS


Table 4 displays some of the most prominent overall performance statistics
resulting from the experimentation.

Table 4 - Overall Performance Statistics


Success Rate Focal Length Estimation
Feature Detector/Descriptor Specified Shared Individual
SIFT 63.04% 58.70% 56.52%
SURF 43.48% 39.13% 41.30%

Mean LE90AbsMin (Percentile) Focal Length Estimation


Feature Detector/Descriptor Specified Shared Individual
SIFT 3.600m 10.316m 7.720m
SURF 9.201m 20.654m 18.626m

Variance of LE90AbsMin Focal Length Estimation


Feature Detector/Descriptor Specified Shared Individual
SIFT 20.167m² 65.222m² 52.117m²
SURF 164.101m² 237.311m² 178.594m²

Mean Vertical Centroid Adjustment Focal Length Estimation


Feature Detector/Descriptor Specified Shared Individual
SIFT -55.407m -49.957m -60.523m
SURF -41.217m -68.859m -44.696m

Mean Horizontal Centroid Adjustment Focal Length Estimation


Feature Detector/Descriptor Specified Shared Individual
SIFT 11.860m 67.934m 92.247m
SURF 52.917m 105.240m 83.808m

Mean Number of Reconstructed Points Focal Length Estimation


Feature Detector/Descriptor Specified Shared Individual
SIFT 14379 12523 13334
SURF 4491 2139 2090

In order to be considered a ‘success,’ the model’s total Euclidean centroid
adjustment had to be less than 1000m, and the model must have contained at


least 10 reconstructed points within the trimmed area of interest. Models not
meeting both of these requirements were considered unsuccessful
reconstructions.

4.3 I NITIAL O BSERVATIONS


It can be seen from Table 4 that the SIFT features in general perform better
than SURF, and that the reconstructions performed using the specified
calibration in general also performed much better than the individual or shared
estimation techniques.

In addition, the difference in performance between the individual estimation
strategy and the shared estimation strategy is not immediately clear from this
table.

While Table 4 is an interesting general tool, it does not convey any information
about the distribution of these statistics with regard to the research project’s
third independent variable: the number of input images.


4.4 N UMBER OF I NPUT I MAGES


It must be noted that the number of input images in the experiment is directly
related to a number of factors, which can equivalently be compared to the
quality statistics. These other factors are listed in Table 5.

Table 5 - Relation of the Number of Input Images (n) to a Number of Other Factors.

1. Mean View Angle (θ)
   Determination: θ = 360° / n
   Significance: mean view angle is a common independent variable for tests of
   repeatability against affine transformations in feature detectors and descriptors.

2. Number of Pairs per Image (m)
   Determination: m = 0.5 × (n – 1) if full pairwise matching is used;
   m = 0.1 × n if the standard 10% overlap is used.
   Significance: represents the number of ties each image has – the potential for
   each image to match other images.

3. Total Number of Matched Pairs (p)
   Determination: p = n × m
   Significance: because feature matching is so computationally expensive, the total
   number of matched pairs used to achieve a reconstruction is representative of the
   total computational effort required.

4. Matches per Unit Angle (d)
   Determination: d = p / 360°
   Significance: represents the total number of matches scaled to a common factor.

From Table 5, it can be seen that the number of input images can be used to
directly determine a number of other factors. The charts relating to some of
these alternatives are available in Annex G.


4.5 K EY C HARTS
4.5.1 N UMBER OF R ECONSTRUCTED P OINTS VS . N UMBER OF I MAGES

Figure 24 - Number of Reconstructed Points vs. Number of Images.

Two observations are immediately clear from Figure 24. Firstly, the number of
reconstructed points is nearly linear with respect to the log10 of the number of
input images. Secondly, the SIFT-Specified scenario is not only the most
reliable, but it also yields the highest performance of all the estimation
methods. Another interesting observation is that the other reconstruction
methods only appear to become somewhat reliable at 150+ input images, and
that no reconstruction is successful with fewer than 20 input images.


4.5.2 LE90 ABS M IN (P ERCENTILE ) VS . N UMBER OF R ECONSTRUCTED P OINTS

Figure 25 - LE90AbsMin (Percentile) vs. Number of Reconstructed Points

Figure 25 deliberately displays only the SIFT-Specified scenario to introduce
another important observation which will be discussed in greater detail in
Chapter 5: as the number of reconstructed points increases, so does the
apparent minimum value for the LE90AbsMin.


4.5.3 LE90 ABS M IN (P ERCENTILE ) VS . N UMBER OF I MAGES

Figure 26 - LE90AbsMin (Percentile) vs. Number of Images

It is clear from Figure 26 that the SIFT-Specified scenario is again not only the most
reliable reconstruction method, but it also appears to represent a lower bound
for the value of LE90AbsMin.


4.5.4 C ENTROID V ERTICAL ADJUSTMENT VS . N UMBER OF I MAGES

Figure 27 - Centroid Vertical Adjustment vs. Number of Images.

Figure 27 and Figure 28 represent the vertical and total horizontal adjustments
obtained by combining the centroid translation values for both the coarse
registration and the ICP output. It can be seen that there is a clear bias present
in the vertical coordinates, but also that the SIFT-Specified reconstructions are
consistently adjusted down by approximately 60m.


4.5.5 C ENTROID H ORIZONTAL ADJUSTMENT VS . N UMBER OF I MAGES

Figure 28: Centroid Horizontal Adjustment vs. Number of Images.

In Figure 28 it can be seen that while the other scenarios become more reliable
beyond 120+ images, the SURF-Shared and SURF-Individual scenarios remain
noisy in terms of horizontal adjustments throughout. This could be an indication
of poor model geometry or it could also indicate that these methods yield an
insufficient number of points for the rough registration techniques to converge in
a bracketing least-squares minimization.


4.6 S UMMARY
In this brief chapter, the most important results and the key, initial observations
of the experiments conducted in this research project were shown. It is
immediately clear that the Specified calibration cases outperform the other
estimation techniques and that SIFT outperforms SURF. But there remain
several questions as to the potential causes and implications of these
observations. The basic observations made in this chapter will be discussed in
further detail in Chapter 5.


5 A N ALYSIS AND D ISCUSSION


5.1 O VERVIEW
In this chapter, a more thorough analysis of the results is conducted,
understanding that the results from the experimentation in this research project
are particular to the methodology and dataset used. While the discussion of
results will aim to lead to conclusions of a general nature, it is obvious that
different datasets or methodologies may yield wildly different results. The
chapter starts with a detailed examination of the results, in which hypotheses
are offered for the observed phenomena reported in Chapter 4. Later sections
move on to propose improvements to the methodology of the experimentation
which were either not in the original scope of the project, or were not
implemented due to time constraints. Finally, some of the potential implications
for defence and the wider geospatial and computer vision communities are
discussed.

5.2 E XAMINATION OF R ESULTS


5.2.1 R ECONSTRUCTION D ENSITY
Recalling Figure 24 in Chapter 4, it is clear that the reconstruction density
generally increases as more images are added, providing that the
reconstruction is successful in the first place. The fact that the maximum
reconstruction density is almost proportional to the base-10 logarithm of the
number of input images means that the ‘quality vs. effort’ ratio of the output falls
off drastically with the addition of more images. This has important implications
for the defence community, especially when operational time constraints are
considered. To emphasize this point, the reader is referred to Figure 29, where
it can be seen that the curve of the number of reconstructed points vs. the
number of matched pairs (representing computational effort) has a distinct
falloff, but remains positive.


Figure 29 - Number of Reconstructed Points vs. Number of Matched Pairs.

In addition, it is clear from Figure 30 that the majority of the gains in quality – as
measured purely by the number of reprojected points – occurs below 1000
matched pairs.

Figure 30 - Rate of Change of Number of Reconstructed Points vs. Number of Matched Pairs.


A deeper consideration of what occurs between 0 and 100 input images (or
1000 matched pairs at 10% overlap) may be necessary. Recall also from
Figure 24 that the reliability of the reconstruction from the SIFT-matched pairs is
poor until about 100 images or 1000 matched pairs. This means that in
general, the number of reconstructed points is likely more dependent at this
stage on the success of the model; more reconstructed points means more
points are available to tie images to the bundle of sparse reprojections.

But there is also a simpler explanation: the rate of change of the number of
reconstructed points is related to the amount of new information given by each
image. If there are 50 images, with a mean view angle of 7.2°, then adding one
more image adds more information than if there are already 100 images, with a
mean view angle of 3.6°.

5.2.2 C OMPARISON B ETWEEN SIFT AND SURF


SIFT clearly outperforms SURF in the experimentation in this research project,
although SIFT and SURF exhibit many of the same trends. While SIFT-
Specified reconstruction becomes reliable at about 100 images, SURF-
Specified becomes reliable at about 200 images. The number of reprojected
points in both methods increases as the number of images increases, and the
lower bound of the LE90 appears to exhibit the same relationship with regard to
the number of reconstructed points. The discrepancy between SIFT and SURF
is difficult to measure because they represent two completely different feature
extraction and feature description methodologies. In addition, SIFT and SURF
both come with their own sets of default parameters, which interact with the
single dataset in different ways. One potential cause of SURF’s much poorer
performance relative to SIFT is the initial number of features generated by each
method, which in turn affects the number of possible putative matches per
image pair. During the experimentation, there were approximately 5000-7000
SIFT features extracted from each image, and depending mainly on the
matching pair’s view angle shift, up to 1000 matched features per matched
image pair. SURF, by contrast, yielded up to 1000-1200 features per image
and only 120-400 matched features per image pair, even at very small view


angles. Unfortunately, explicitly recording this information for each case was
not part of the experimental design (see paragraph 5.3.5).

Nonetheless, it can be said that the difference in performance between SIFT
and SURF is likely attributable, at least in part, to the discrepancies in the
numbers of matched features per pair. This certainly is not to say that more
putative matches are always better. An image pair yielding 200 putative matches
of which 90% are inlier matches, agreeing on a solution for the fundamental
matrix F, is far more desirable than an image pair yielding 1000 putative
matches of which 40% are inlier matches. In the latter case, the RANSAC
algorithm will have difficulty in even finding a solution for F (Fischler and Bolles,
1981).


Figure 31 - Side View of a False-Colour SIFT-Specified Reconstruction Using 1000
Images Showing a Large Number of Erroneously Reconstructed Points.

There is an additional reason why more matches are not always better:
increasing the number of matches purely by relaxing the distinctiveness or
increasing the robustness of the descriptor can lead to increased numbers of
putative matches which correspond to undesirable artefacts in the scene. For
large numbers of images, this problem was encountered with the SIFT
reconstructions as several points were matched to clouds and other ephemeral
elements of the scene. See Figure 31 (SIFT) and Figure 32 (SURF) for a
comparison of the effects of this phenomenon.


Figure 32 - Side View of a False-Colour SURF-Specified Reconstruction Using 1000
Images Showing a Limited Number of Erroneously Reconstructed Points.

5.2.3 T HE M INIMUM ABSOLUTE L INEAR E RROR AT 90 P ERCENT C ONFIDENCE (LE90 ABS M IN )
Recall from Figure 25 that there is a small but distinct increase in the minimum
threshold of the LE90AbsMin with increases in the number of reconstructed points.
Also recall from Figure 26 that this threshold is present regardless of the feature
extraction method or the method of focal length estimation. Thomas P. Ager’s
(2004) Percentile method for determining the LE90 was used precisely because
for each reconstruction, there are many measurements and the Percentile
method does not rely on the assumption of a particular error distribution. But as
can be seen in Figure 26 and Figure 25, there is a clear increasing relationship
between the number of projected points and the minimum linear error. The
effects of this relationship across different feature extraction methods can be
seen in Figure 31 and in Figure 32.

This is perhaps due to the increasing difficulty of fitting a more complex model
to the same reference data. If that were the case, however, one would expect
the minimum accuracy to approach and attain a certain limit. A better
explanation is that, as the number of images increases, the view angle
decreases, and therefore the stereo-depth ambiguity increases. While


increasing the number of images increases the density of reconstructed points,
it has the twofold negative effect of increasing the number of inlier matches
which would ordinarily not have been matched due to poor distinctiveness, and
of limiting the geometric strength of the reprojection measurement of those new
matches. One suggested modification to the reconstruction algorithm leading
from this explanation is given in paragraph 5.3.7.

5.2.4 F OCAL L ENGTH E STIMATION


Considering the results, there is very little question about the value of specifying
the camera focal length as an a priori restriction on the camera parameters.
While this was an expected result, it is important to note that the focal length
and location of the principal point in this research project were calculated based
largely on assumptions (see Annex E). This fact has important implications
which are explained in paragraph 5.4.3.

5.3 S UGGESTED I MPROVEMENTS TO THE M ETHODOLOGY


5.3.1 B RACKETING ALIGNMENT
The problem of local minima is well known to the computer vision community
in the context of BA (Byröd and Åström, 2010). A similar problem occurs
during ICP when the initial alignment of the two datasets is poor. The
bracketing alignment implemented in this research dissertation attempts to
improve the initial alignment of the model so that ICP is able to converge
properly. However, it also suffers from the problem of local minima. One suggested
improvement is to perform rough alignment in the X direction by calculating the
set of residual values for a number of small increments around the current
estimate, picking the best alignment, and then doing the same for the Y direction.

5.3.2 ICP WITH A S IMILARITY T RANSFORMATION


The current implementation of ICP was adapted from Wilm (2010) and
estimates the parameters of a Euclidean transformation. The addition of solving
for overall scale may allow for those models which were not successfully
transformed by the reconstruction software to be fully aligned to the reference
data.


5.3.3 C OMBINING B LOB AND C ORNER F EATURE D ETECTION


As suggested in Tuytelaars and Mikolajczyk (2007), the reconstruction
algorithm could easily be changed to take advantage of both blob and corner
features, which are largely complementary. Hypothetically, this could drastically
increase the overall performance of the reconstruction process, and is an
interesting avenue for further research.

5.3.4 E XPLORATION OF AFFINE -I NVARIANT D ETECTORS


The exploration of fully affine invariant feature detectors could be supported in
much the same manner that the implementation of SURF was supported in the
current algorithm. Again, this is a very interesting avenue for further research
on the defence applications of SfM.

5.3.5 R ECORD THE N UMBER OF D ETECTED F EATURES PER I MAGE AND THE
N UMBER OF M ATCHED F EATURES PER P AIR
A simple addition to the current methodology could include the recording of the
number of matched features per pair and the number of detected features per
image. With such precise figures, additional comparisons could be made
regarding the specific performance of SIFT and SURF (or any number of
feature detectors).

5.3.6 C HANGE THE D ISTRIBUTION OF THE N UMBER OF I NPUT I MAGES


Since the representations based on the number of input images in this research
project almost exclusively used a logarithmic scale, it follows that the variation
of the number of input images would be better implemented as a power
function; say, for each case, double the number of input images of the previous
case. In this way, the similar conclusions could likely be drawn with far fewer
cases.

5.3.7 R ESTRICT R ECONSTRUCTIONS OF P OINTS D ETERMINED WITH P OOR G EOMETRY
As postulated in paragraph 5.2.3, a modification to the reconstruction algorithm
could be made to skip the reconstruction of points obtained from images whose
view angle yielded poor stereo geometry. Alternatively, the poor-geometry
points could be restricted only to points reconstructed from 3+ views, etc.


5.3.8 ATTEMPT E ACH C ASE WITH MORE THAN O NE C OMBINATION OF C AMERAS
Instead of only performing the reconstruction once for each case, the same
procedure could be used multiple times for each case, with a different
combination of equally-spaced input images each time. This may improve the
distribution of the data.


5.4 I MPLICATIONS OF THE F INDINGS


5.4.1 T HE ‘Q UALITY VS . E FFORT ’ R ATIO
To ensure operational effectiveness and efficiency, commanders are interested
in conservation of effort, a ubiquitous concept. The
findings that the ‘quality vs. effort’ ratio of SfM reconstructions is much
higher with fewer images, and that the LE90 tends to be lower for
(successful) reconstructions from fewer images, are both indications that a
limited solution based on incomplete information is likely to remain a valuable
decision-making tool, even in a rapidly-changing environment. Large-scale
reconstructions based on significantly larger input datasets will take much
longer to produce, but the product can be stored for later use and will likely
represent a much more static scene. In short, the level of resources required
and the amount of time necessary to produce SfM reconstructions are easily
scalable. SfM can be scaled to the appropriate level of support required, from
quick reconstructions of a small objective at the tactical level, to large
reconstructions of an enemy camp or a city block – or even larger – at the
operational level.

5.4.2 G EOMETRIC D ILUTION OF M EASUREMENTS


It is a well-known concept in the geospatial community that measurement
performance is significantly reliant on good geometry. But in the world of SfM,
there is a competing requirement to ensure that there are an adequate number
of putative matches and, within those putative matches, a high percentage of
inlier matches. With many of the de facto standard feature extraction
techniques, this severely restricts the operational view angle which can be
supported. A balance must therefore be struck between the need to ensure
good matches and the need to avoid stereo depth ambiguity. If the objective is
to increase the reconstructed point density while retaining reconstruction
accuracy, it would be preferable to use fewer images from two orbits of a scene
at different altitudes than to use a higher density of images from a single orbit.

5.4.3 F OCAL L ENGTH


It is apparent from the results of the experimentation that specifying a best-
estimate for the focal length is consistently preferable to providing no


information. Due to Gaussian image noise, image distortions and the


inaccuracies associated with feature extraction, SfM is not an exact science.
Whatever can be done, therefore, to improve the probability of reconstruction
success and the accuracy, should be done, even if the parameters are
estimates.

5.5 S UMMARY
In this chapter, the results in Chapter 4 were discussed in detail, yielding
several general findings which are likely to apply to a large number of SfM
applications. In addition, analysis of the results led to several suggested
improvements to the project methodology, most of which can be implemented
relatively quickly.


6 C ONCLUSION AND S UGGESTED R ESEARCH


6.1 C ONCLUSION
Considering the scenario of the four-man reconnaissance patrol originally
described in the introduction, it is clear that, while possible, the scenario would
only yield favourable results if a few conditions are met. As in any military
action, rehearsals are of paramount importance, and the patrol would have to
be aware that overlapping views are necessary to produce a model. In addition,
the scene must be static, and only objects which remain unchanged across
different views will be properly reconstructed. Furthermore, recording the
focal length of each image is important, as it allows reconstruction to a higher
degree of fidelity. Subject to the type of feature detector used, some maximum
and minimum view angle between the images should be provided as a
guideline.

The usefulness of SfM is not limited to such small-scale scenarios. In fact, it


has been shown that SfM applications can be scaled to a wide variety of
applications across the defence and geospatial communities.

6.2 S UGGESTED F URTHER R ESEARCH


It is clear that the utility of such holistic approaches to SfM research is far from
exhausted. The findings from this research project lead to a few key areas in
which further research may yield valuable results for the geospatial and/or
defence community.

6.2.1 AFFINE -I NVARIANT I NTEREST P OINT D ETECTORS


Since there is competition between the distinctiveness and robustness of
feature descriptors, and additional conflict between the geometric dilution of
measurements and the number of image matches, a holistic-approach
methodology may be useful in exploring the effect that using a fully affine-
invariant interest point detector would have on the reconstruction success rate
and other quality measures.

6.2.2 C OMBINATION OF B LOB AND C ORNER D ETECTORS


Similar to the exploration of affine invariant feature detectors is the interesting
concept of using the complementary feature detection types of blob and corner


features in tandem. For example, an investigation could attempt to answer
whether or not the number of reconstructed points for such a tandem case
would be equal to the sum of the two separate cases. This research could also
be combined with the use of affine-invariant detectors.

6.2.3 F OCAL L ENGTH


The benefits of using an estimated focal length were established in this
research project, but no experimentation was done to determine the effect of
specifying a focal length that is clearly erroneous. A research project could
explore at what level of distortion to the ‘true’ focal length it becomes more
beneficial to allow the software to estimate the focal length.

6.2.4 ADDITIONAL D ATASETS AND S CENARIOS


As suggested by the title, future research could explore whether or not the
findings of this research project are valid against other scenarios with different
data. The challenge will be to either obtain an image dataset that has an
associated reference dataset, or to develop some other quality assessment
technique.

6.2.5 ADDITIONAL Q UALITY ASSESSMENT T ECHNIQUES


Given the rapid rise in the popularity of SfM, there is a considerable amount of
recent research into different quality assessment techniques for 3D data. Not
explored in this research project was the notion of creating sweep surfaces (Wu
et al., 2012) or representative shapes from the reconstructed models. These
techniques, or other techniques, could be explored as part of a future research
project.


W ORD C OUNT
Word Count: 13,301

This word count excludes captions, titles, tables, quotes and annexes.


W ORKS C ITED
Agarwal, S., Snavely, N., Seitz, S.M. and Szeliski, R. (2009) Building Rome in a
Day, ICCV.
Ager, T.P. (2004) An Analysis of Metric Accuracy Definitions and Methods of
Computation, NIMA InnoVision.
Aspert, N., Santa-Cruz, D. and Ebrahimi, T. (2002) MESH: Measuring Error
between Surfaces using the Hausdorff Distance, IEEE International
Conference on Multimedia and Expo, 705-708.
BAE Systems (2009) SOCET SET User's Manual, 55th edition, BAE Systems.
BAE Systems (2012) JMMES Sensors Specifications, BAE Systems Spectral
Solutions LLC.
Bay, H., Ess, A., Tuytelaars, T. and Van Gool, L. (2008) Speeded-Up Robust
Features (SURF), Computer Vision and Image Understanding, vol. 110,
no. 3, pp. 346-359.
Byröd, M. and Åström, K. (2010) Conjugate Gradient Bundle Adjustment,
ECCV, vol. II, pp. 114-127.
Cignoni, P., Rocchini, C. and Scorpigno, R. (1996) Metro: Measuring Error of
Simplified Surfaces (Technical Report), Paris, France: Centre National de
la Recherche Scientifique, Paris.
Derpanis, K.G. (2010) Overview of the RANSAC Algorithm, 12th edition,
Unpublished.
ERDAS Inc. (2010) eATE User's Guide 2010, ERDAS, Inc.
Fischler, M.A. and Bolles, R.C. (1981) Random Sample Consensus: A
Paradigm for Model Fitting with Applications to Image Analysis and
Automated Cartography, Communications of the ACM, vol. 24, no. 6,
June, pp. 381-395.
Furukawa, Y. and Ponce, J. (2010) Accurate, Dense and Robust Multi-view
Stereopsis, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 32, pp. 1362-1376.
Gavin, H. (2011) The Levenberg-Marquardt method for nonlinear least squares
curve-fitting problems, Durham, NC: Duke University.
Girardeau-Montaut, D., Roux, M., Marc, R. and Thibault, G. (2005) Change
Detection on Points Cloud Data acquired with a Ground Laser Scanner,
ISPRS Workshop on Laser Scanning, Enshede, the Netherlands, 12-14.
Grégoire, N. and Bouillot, M. (1998) Hausdorff Distance Between Convex
Polygons, [Online], Available:
http://cgm.cs.mcgill.ca/~godfried/teaching/cg-
projects/98/normand/main.html [12 September 2013].
Hartley, R.I. and Zisserman, A. (2003) Multiple View Geometry in Computer
Vision, 2nd edition, Cambridge: Cambridge University Press.


Harvey, P. (2013) ExifTool by Phil Harvey, 7 September, [Online], Available:


http://www.sno.phy.queensu.ca/~phil/exiftool/ [10 August 2013].
Heikkilä, J. and Silvén, O. (1997) A Four-Step Camera Calibration Procedure
with Implicit Image Correction, IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, San Juan, 1106-1112.
Koenderink, J.J. and van Doorn, A.J. (1991) Affine Structure from Motion,
Optical Society of America, vol. 8, no. 2, February, pp. 377-385.
Ke, Y. and Sukthankar, R. (2004) PCA-SIFT: A More Distinctive Representation
for Local Image Descriptors, IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, Washington, DC, 506-513.
Lowe, D.G. (2004) Distinctive Image Features from Scale-Invariant Keypoints,
International Journal of Computer Vision, vol. 60, pp. 91-110.
MacDonald, C. (2011) plyread.m via GitHub File Repository, [Online], Available:
https://github.com/cbm755/cp_matrices/tree/master/surfaces/readply [12
August 2013].
Matas, J., Chum, O., Martin, U. and Pajdla, T. (2002) Robust Wide-Baseline
Stereo from Maximally-Stable Extremal Regions, British Machine Vision
Conference (BMVC) 2002, Cardiff, UK, 384-393.
Mathworks (2013) Documentation Center, [Online], Available:
http://www.mathworks.co.uk/help/stats/knnsearch.html [10 September
2013].
McGrath, A. (2010) An Investigation into the Feasibility of Exploiting Aerial Full
Motion Video for the Creation of High Resolution Three Dimensional
Digital Elevation Models, Cranfield University: Royal School of Military
Survey MSc Thesis.
Mikolajczyk, K. (2002) Detection of Features Invariant to Affine
Transformations, Ph.D. Thesis, Institut National Polytechnique de Grenoble.
Mikolajczyk, K. and Schmid, C. (2004) Scale & Affine Invariant Interest Point
Detectors, International Journal of Computer Vision, vol. 60, no. 1, pp. 63-
86.
Morel, J.-M. and Yu, G. (2009) ASIFT: A New Framework for Fully Affine
Invariant Image Comparison, SIAM Journal on Imaging Sciences, vol. 2,
no. 2.
Nistér, D. and Stewénius, H. (2006) Scalable Recognition with a Vocabulary
Tree, IEEE Computer Society Conference on Computer Vision and Pattern
Recognition 2006, New York City, 2161-2168.
Oliensis, J. (1999) A Multi-Frame Structure-from-Motion Algorithm under
Perspective Projection, International Journal of Computer Vision, vol. 34,
no. 2/3, pp. 163-192.
Oliensis, J. (2000a) A Critique of Structure-from-Motion Algorithms, Computer
Vision and Image Understanding, vol. 80, pp. 172-214.


Oliensis, J. (2000b) A new structure-from-motion ambiguity, IEEE Transactions


on Pattern Analysis and Machine Intelligence, vol. 22, no. 7, July, pp. 685-
700.
Oliensis, J. and Govindu, V. (1999) An Experimental Study of Projective
Structure from Motion, IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 21, no. 7, pp. 665-671.
Oxford Dictionaries Online (2013) Oxford Dictionaries: The world's most trusted
dictionaries, [Online], Available:
http://oxforddictionaries.com/definition/english/putative?q=putative [19 July
2013].
Rea, T.A. (2013) Software Tools for Video Processing, Engarde! Consulting,
inc.
Robotics Research Group, University of Oxford (2007) Affine Covariant
Features, 15 July, [Online], Available:
http://www.robots.ox.ac.uk/~vgg/research/affine/index.html [12 August
2013].
Rote, G. (1991) Computing the Minimum Hausdorff Distance Between Two
Point Sets on a Line under Translation, Information Processing Letters,
vol. 38, February, pp. 123-127.
Schmid, C. and Mohr, R. (1997) Local Greyvalue invariants for Image Retrieval,
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19,
no. 5, pp. 530-534.
Shewchuck, J.R. (1994) An Introduction to the Conjugate Gradient Method
Without the Agonizing Pain, Pittsburgh, PA: School of Computer Science,
Carnegie Mellon University.
Snavely, N., Seitz, S.M. and Szeliski, R. (2006) Photo Tourism: Exploring photo
collections in 3D, ACM Transactions on Graphics (Special Interest Group
on Graphics and Interactive Techniques (SIGGRAPH)), Boston.
Terrapoint, Inc. (2007) Deliverable Summary: Terrapoint's Static Metadata and
Laser Parameters, Ottawa: Terrapoint, Inc.
The Mathworks, Inc. (2013) Classification Using Nearest Neighbours, [Online],
Available: http://www.mathworks.co.uk/help/stats/classification-using-
nearest-neighbors.html#bsehylk [10 Sep 2013].
Torr, P.H.S. and Zisserman, A. (1997) Robust Parameterization and
Computation of the Trifocal Tensor, Image and Vision Computing, vol. 15,
no. 8, pp. 591-605.
Triggs, B., McLauchlan, P.F., Hartley, R.I. and Fitzgibbon, A.W. (2000) Bundle
Adjustment - A Modern Synthesis, Vision Algorithms 1999, 298-372.
Tuytelaars, T. and Mikolajczyk, K. (2007) Local Invariant Feature Detectors: A
Survey, Foundations and Trends in Computer Graphics and Vision, vol. 3,
no. 3, pp. 177-280.


Vedaldi, A. and Fulkerson, B. (2008) Vision Lab Features Library: An Open and
Portable Library of Computer Vision Algorithms, [Online], Available:
http://www.vlfeat.org/ [11 June 2011].
Wilm, J. (2010) Iterative Closest Point (Matlab Central), 30 May, [Online],
Available: http://www.mathworks.com/matlabcentral/fileexchange/27804-
iterative-closest-point [30 August 2013].
Wu, C. (2007) SiftGPU: A Graphics Processing Unit (GPU) Implementation of
Scale Invariant Feature Transform (SIFT), [Online], Available:
http://cs.unc.edu/~ccwu/siftgpu [11 June 2013].
Wu, C. (2013a) Towards Linear-time Incremental Structure from Motion, 3D
Vision.
Wu, C. (2013b) VisualSFM: A Visual Structure from Motion System, 19 Feb,
[Online], Available:
http://homes.cs.washington.edu/~ccwu/vsfm/doc.html#gui [09 Sep 2013].
Wu, C., Agarwal, S., Curless, B. and Seitz, S.M. (2011) Multicore Bundle
Adjustment, IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, 3057-3064.
Wu, C., Agarwal, S., Curless, B. and Seitz, S.M. (2012) Schematic Surface
Reconstruction, IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, Providence, RI, USA.


Annex A STRUCTURE FROM MOTION THEORY


A.1 OVERVIEW
This annex is intended as a companion to the literature review in Chapter 2. It
is strongly suggested that readers who are unfamiliar with the theory of SfM
and/or metric photogrammetry peruse this annex before reading the main
document. The annex provides the conceptual and theoretical basis for the
research project as a whole. The first part covers the basic concepts and theory
of SfM. The second part covers transformations and how they are handled in
this research project.

A.1.1 REFERENCE
One of the most frequently-cited texts in SfM projects and applications is
Richard Hartley and Andrew Zisserman’s (2003) textbook, Multiple View
Geometry, 2nd Edition. This textbook is the principal source of the theory and
equations displayed in this annex.


A.2 BASIC CONCEPTS IN STRUCTURE FROM MOTION


The ‘structure from motion’ (SfM) problem is to infer the structure of spatial
configurations from a sequence of narrow-field projections (Koenderink and van
Doorn, 1991). These narrow-field projections are, in most cases, images of a
rigid or semi-rigid scene, meaning that the internal geometry of the scene does
not change significantly from image to image. Therefore, among the
assumptions in the SfM problem is the supposition that the changes from image
to image are caused by alterations to the camera parameters, not to the scene
geometry.

A.2.1 CAMERA PROJECTION MATRIX


In general, a camera can be considered to be a mapping of any 3D point Ψ in
the scene, which uses an arbitrary world coordinate system, to a 2D point ψ on
an image, which uses an image coordinate system. Considering any finite,
central-projection camera, the mapping can be represented by a 3x4 camera
projection matrix P, such that:

ψ = PΨ
Equation 12 - A finite projective camera is represented in SfM theory as a 3x4 projection
matrix P with 11 degrees of freedom (Hartley and Zisserman, 2003: 157).

This notation is adapted from Hartley and Zisserman (2003: 158). Note also
that the projected point Ψ’ is [Ψx, Ψy,cf] in the camera coordinate system (the
camera focal length cf is negative to preserve the orientation of the image; see
Figure 33). When these coordinates are divided through by cf then the
projected point is represented in homogeneous coordinates. P can be
factorized into components as in Equation 13.

P = K \, {}^{c}R \, [\, I \mid -t \,]
Equation 13 - Decomposition of a finite projective camera matrix P.
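
As a concrete illustration of Equations 12 and 13, the following Matlab® sketch builds a
projection matrix from an assumed calibration matrix, rotation and camera centre (all
values below are arbitrary placeholders, not parameters used in this project) and projects
a world point into pixel coordinates:

% Minimal sketch of Equations 12 and 13 with assumed (illustrative) parameters.
K  = [1000 0 640; 0 1000 512; 0 0 1];   % assumed camera calibration matrix (pixels)
cR = eye(3);                            % assumed rotation: camera axes aligned with world axes
t  = [5; 2; -10];                       % assumed camera centre in world coordinates
P  = K * cR * [eye(3), -t];             % 3x4 projection matrix (Equation 13)

PSI = [12; 4; 30; 1];                   % a world point in homogeneous coordinates
psi = P * PSI;                          % project the point (Equation 12)
psi = psi ./ psi(3);                    % divide through to obtain pixel coordinates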

A.2.2 CAMERA PARAMETERS


Individuals who are familiar with metric photogrammetry understand that
camera parameters defining P are divided into two functional groups: the
external parameters and internal parameters. External parameters describe the
transformation of a 3D point from the world coordinate system to the camera


coordinate system. Internal parameters describe the projection of the point


from the camera coordinate system to the image coordinate system.

Figure 33 – The Camera and Image Coordinate System.

A.2.3 EXTERNAL PARAMETERS


The camera’s external parameters describe the position and orientation of a
camera’s internal coordinate system with regard to the world coordinate system.
The external parameters can therefore be represented as a rotation matrix cR
and the coordinates of the camera centre in the world coordinate system cO
which transform a point oΨ in the world coordinate system to cΨ in the camera
coordinate system, without actually changing the position of the point as in
Equation 14.

{}^{c}\Psi = {}^{c}R \left( {}^{o}\Psi - {}^{c}O \right)
Equation 14 – A point in the scene can be expressed in the coordinate system of the
camera by subtracting the coordinates of the camera centre and then applying a rotation.
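
A one-line Matlab® illustration of Equation 14, with assumed values for the rotation and
camera centre (consistent with the sketch in paragraph A.2.1), is:

% Express a world point in the camera coordinate system (Equation 14); assumed values.
cR   = eye(3);                 % assumed camera rotation
cO   = [5; 2; -10];            % assumed camera centre in world coordinates
oPsi = [12; 4; 30];            % a point in the world coordinate system
cPsi = cR * (oPsi - cO);       % the same point expressed in camera coordinates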

A.2.4 INTERNAL PARAMETERS


The camera’s internal parameters depend upon the camera model being used,
but are in general the focal length cf, the coordinates [px,py] of the principal point


p in the image coordinate system and one or more parameters describing the
radial distortion of the image. The principal point p is the point on the image
plane intersected by the camera’s principal axis. For cameras with digital image
sensors, it is helpful to express the focal length cf as well as the image
coordinates of the principal point [px,py] in terms of the number of pixels, where
φ is the focal length in pixels (also called the scale factor), [x0,y0] are the
coordinates of the principal point p in pixels and m is the spatial resolution of the
sensor in pixels per unit length. While infrequent, there is a possibility in some
digital sensors that mx ≠ my. This implies that φx ≠ φy as in Equation 15. It will
also affect the coordinates [x0 y0] in pixels of p as in Equation 16.

[\varphi_x \;\; \varphi_y]^{T} = c_f \cdot [m_x \;\; m_y]^{T}

Equation 15 - Expression of the scale factors in the x and y directions in terms of number
of pixels by multiplying by the sensor's resolution in pixels per unit length.

[x_0 \;\; y_0]^{T} = [p_x m_x \;\; p_y m_y]^{T}
Equation 16 - Expression of the coordinates of the principal point in the image
coordinate system in units of pixels.

In addition, if the sensor axes are not orthogonal, a skew factor s must be
accounted for. The internal parameters contribute to the construction of the
3x3 camera calibration matrix K as in Equation 17.

K = \begin{bmatrix} \varphi_x & s & x_0 \\ 0 & \varphi_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}
Equation 17 - Construction of the camera calibration matrix K from the internal
parameters.
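
As an illustration, the Matlab® sketch below assembles K from an assumed focal length,
principal point and sensor resolution following Equations 15 to 17 (the numbers are
placeholders and do not describe the sensor used in this project):

% Assemble the camera calibration matrix K (Equations 15-17); assumed values only.
cf = 0.05;               % focal length in metres
m  = [20000; 20000];     % sensor resolution in pixels per metre [mx; my]
p  = [0.016; 0.012];     % principal point in metres [px; py]
s  = 0;                  % skew factor (orthogonal sensor axes)

phi  = cf .* m;          % Equation 15: scale factors (focal length in pixels)
x0y0 = p .* m;           % Equation 16: principal point in pixels

K = [phi(1)  s       x0y0(1);
     0       phi(2)  x0y0(2);
     0       0       1     ];   % Equation 17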

A.2.5 RADIAL DISTORTION


According to Hartley and Zisserman (2003), the effects of radial distortion are
more pronounced for small focal lengths. Nonetheless, as mentioned in
paragraph A.2.4, a simple radial distortion model is specified by including one or
more radial distortion coefficients in the internal camera parameters. The radial
distortion function L(r) is solely a function of r, where r is the Euclidean distance


of the distorted pixel from the centre of radial distortion [xc,yc]. According to
Hartley and Zisserman (2003: 190-191), the centre of radial distortion need not
be exactly the principal point, although this is a usual assumption. The
corrected coordinates of each pixel [ẋ, ẏ] can be determined:

[\dot{x} \;\; \dot{y}]^{T} = [x_c \;\; y_c]^{T} + L(r)\,[(x - x_c) \;\; (y - y_c)]^{T}
Equation 18 - Calculating the undistorted pixel locations using the radial distortion
function.

In practice, the distortion function is given by a Taylor series expansion with a


number of coefficients:

L(r) = 1 + \kappa_1 r + \kappa_2 r^2 + \cdots + \kappa_n r^n
Equation 19 - Taylor series expansion of the radial distortion function (Heikkilä and
Silvén, 1997).

It is important to note, however, that VisualSFM, the main software application


used to perform reconstructions in this dissertation, uses a single radial
distortion coefficient (Wu, 2013b), which is often adequate for most cases:

L(r) = 1 + \kappa \cdot \left( (x - x_c)^2 + (y - y_c)^2 \right) \approx 1 + \kappa_2 \cdot r^2

Equation 20 - The image distortion function used by VisualSFM is equivalent to a Taylor
series expansion using only the second coefficient (Wu, 2013b).

In addition, because the focal lengths used in this dissertation were relatively
long, the use of κ = 0 (i.e., a pinhole camera model) in the specified calibration
is justified, especially since the British Aerospace Engineering (BAE) Systems
joint multi-mission electro-optical sensor (JMMES) medium-wavelength infrared
(MWIR) camera is a military-grade surveillance sensor (BAE Systems, 2012).
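
A minimal Matlab® sketch of the single-coefficient correction (Equations 18 and 20) is
given below; the coefficient, distortion centre and pixel location are assumed values for
illustration only:

% Correct a pixel location with the single-coefficient radial model (Equations 18 and 20).
kappa = -1e-7;               % assumed radial distortion coefficient
xc = 640; yc = 512;          % assumed centre of radial distortion (taken as the principal point)
x  = 900; y  = 700;          % a distorted pixel location

r2 = (x - xc)^2 + (y - yc)^2;     % squared distance from the distortion centre
L  = 1 + kappa * r2;              % distortion function (Equation 20)
x_corr = xc + L * (x - xc);       % corrected coordinates (Equation 18)
y_corr = yc + L * (y - yc);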


A.3 STRUCTURE AND MOTION


While most often called ‘Structure from Motion’ due to the requirement for
camera centre translation between images to permit 3D reconstruction, the SfM
concept can equally be termed ‘Structure and Motion’ because it involves both
the reconstruction of 3D geometry and the refinement of relative camera pose.

A.3.1 COMPARISON BETWEEN SFM AND METRIC PHOTOGRAMMETRY


As briefly mentioned in paragraph A.2.2, there is a large region of overlap
between the fields of metric photogrammetry and SfM. The geospatial
community is largely more familiar with the discipline of metric photogrammetry
than it is with SfM, but the latter can be seen as a relaxation of the constraints
on the former. Metric photogrammetry is usually conducted with fully calibrated
metric photogrammetric sensors, high-quality global navigation satellite system
(GNSS) and inertial navigation system (INS) equipment on carefully planned
routes, with specific objectives and precise ground control. It focuses on the
minimization of absolute and relative error in the end product through rigorous
post-processing and quality control. By contrast, SfM can make use of many
different types of images captured by a variety of sensors under varying
conditions. For example, SfM techniques have been used to reconstruct
monuments, buildings and even cities from large-scale personal photo
collections obtained via the internet (Agarwal et al., 2009). SfM focuses on
adaptability, robustness, data density and relative geometry; geospatial
accuracy and orientation in worldwide coordinate systems are at best a second
priority.


Figure 34 – Comparison between Metric Photogrammetry and SfM.

Figure 34 displays some of the contrasts and similarities between SfM and
metric photogrammetry.


A.3.2 EPIPOLAR GEOMETRY OF THE IMAGE PAIR


The epipolar geometry of an image pair is essentially a set of constraints on any
point projected onto two images. See Figure 35 for a visual depiction of the
epipolar geometry.

Figure 35 – Epipolar Geometry of the Image Pair.

Any point captured by two cameras lies on a plane defined by the two camera
centres and the point. The baseline is the line connecting the two camera
centres. It intersects each image at the epipole. The epipoles are hence the
image of the opposite camera’s centre on each image. The epipolar plane is
projected onto each image as a line, connecting the epipole and the image of
the defining point, and all such epipolar lines intersect at the epipole (Hartley and
Zisserman, 2003). The epipolar geometry can be shown to give rise to one of
the most important concepts in two-view geometry: There exists a unique, rank-2,
3x3 matrix F, such that for all corresponding points x' ↔ x'':

x'^{T} F x'' = 0


Equation 21 – The Coplanarity Constraint.


The matrix F is termed the fundamental matrix and has several unique
properties.

A.3.3 THE FUNDAMENTAL MATRIX F


The fundamental matrix is a powerful tool in two-view geometry, and has
analogous forms for three views (the trifocal tensor, (Torr and Zisserman,
1997)) or more views. Among the useful properties of the fundamental matrix
discussed in Hartley and Zisserman (2003), it represents the relationship (and
can be used to calculate a possible transformation) between two camera
matrices P and P’. F can be calculated from at least seven image-image point
correspondences (x’,y’,x,y) providing that the 3D points are in general position
(i.e. not coplanar). The following is a short description of the 8-point method for
the computation of F taken from Hartley and Zisserman (2003: 279-280):

A\mathbf{f} = \begin{bmatrix} x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1 \end{bmatrix} \begin{bmatrix} f_{11} \\ \vdots \\ f_{33} \end{bmatrix} = 0

Equation 22 – A system of linear equations Af = 0 can be obtained by exploiting x'^{T} F x'' = 0
for eight or more correspondences x' ↔ x''. Seven correspondences can be used, but
the method is different.

Where fij are the individual elements of F. A least-squares solution for the
elements of F can be obtained using the singular value decomposition (SVD) of
A:

A = U D V^{T}
Equation 23 – The solution for fij which minimizes the sum of squared errors is the last
column of V in the SVD of A (Hartley and Zisserman, 2003: 280).

However, due to the aforementioned image noise as well as possible errors in


feature extraction discussed in Chapter 2, it is highly likely that the solution
obtained by taking the last column of V in Equation 23 will not be singular. To
enforce the singularity of F, Hartley and Zisserman (2003: 281) suggest taking
the SVD of F and substituting D with diag(D11, D22, 0). Then, replace F with F’:

F' = U \, \mathrm{diag}(D_{11}, D_{22}, 0) \, V^{T}


Equation 24 – Enforce the singularity of the matrix F by taking the SVD of F and
substituting D33 = 0.


In this way, computation of the fundamental matrix F can be conducted using


only image-image correspondences.
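
To make the procedure explicit, the Matlab® sketch below estimates F from n >= 8
correspondences stored as n x 2 arrays x1 (holding x') and x2 (holding x''); the variable
names are assumptions for illustration, and the coordinate normalisation recommended by
Hartley and Zisserman (2003) is omitted for brevity:

% Sketch of the 8-point computation of F (Equations 22-24), without normalisation.
function F = eightPointF(x1, x2)
    n = size(x1, 1);
    A = [x1(:,1).*x2(:,1), x1(:,1).*x2(:,2), x1(:,1), ...
         x1(:,2).*x2(:,1), x1(:,2).*x2(:,2), x1(:,2), ...
         x2(:,1),          x2(:,2),          ones(n,1)];   % Equation 22
    [~, ~, V] = svd(A);
    F = reshape(V(:, end), 3, 3)';        % least-squares solution (Equation 23)
    [U, D, V] = svd(F);
    D(3, 3) = 0;                          % enforce rank 2 (Equation 24)
    F = U * D * V';
end

In practice the image coordinates would first be translated and scaled (normalised) before
forming A, which markedly improves the conditioning of the solution.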

A.3.4 CAMERA MATRICES FROM F


The fundamental matrix represents the correspondence between two views,
and from it, the relationship between two cameras can be established. If, for
example the first camera P from a pair of cameras P, P’ is located at the origin:

P = [I | 0]
Equation 25 – A Simple, canonical camera located at the origin.

Then the general formula for a camera P’ corresponding to a fundamental


matrix F is:

P' = [\, [e']_{\times} F + e' v^{T} \mid \lambda e' \,]
Equation 26 – General formula for a camera matrix P' given the fundamental matrix F and
corresponding camera at the origin.

Where e' is the epipole of P', which is the right null-vector of F^{T}:

F^{T} e' = 0
Equation 27 - e' is the right null-vector of F^{T} and can be obtained by solving for the null
space e' = F^{T} \ [0\ 0\ 0]^{T}

Also [e’]× is a skew-symmetric matrix constructed from e’:

[e']_{\times} = \begin{bmatrix} 0 & -e'_3 & e'_2 \\ e'_3 & 0 & -e'_1 \\ -e'_2 & e'_1 & 0 \end{bmatrix}
Equation 28 – Construction of a skew-symmetric matrix based on the epipole e' of P'.

Finally, λ is any scalar and v is any 3-vector.
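
A short Matlab® sketch of this construction (Equations 25 to 28), using the simplest choice
v = 0 and λ = 1, might read as follows; F is assumed to be a valid, rank-2 fundamental
matrix already present in the workspace:

% Recover a canonical camera pair (P, P') from a fundamental matrix F.
P = [eye(3), zeros(3, 1)];          % first camera at the origin (Equation 25)

[~, ~, V] = svd(F');                % e' is the right null-vector of F^T (Equation 27)
e = V(:, end);

ex = [ 0     -e(3)   e(2);
       e(3)   0     -e(1);
      -e(2)   e(1)   0   ];         % skew-symmetric matrix [e']x (Equation 28)

v = zeros(3, 1); lambda = 1;        % free parameters in Equation 26
Pprime = [ex * F + e * v', lambda * e];   % second camera matrix P' (Equation 26)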

A.3.5 PROJECTIVE RECONSTRUCTION


Once camera matrices have been determined based on fundamental matrices
for each image pair, a projective reconstruction of the 3D points can be
undertaken. The equations can be constructed by grouping Equation 12 for
each reconstructed point and each camera matrix:


\begin{bmatrix} P^{1} \\ \vdots \\ P^{i} \end{bmatrix} \begin{bmatrix} X_1 & X_2 & \cdots & X_j \\ Y_1 & Y_2 & \cdots & Y_j \\ Z_1 & Z_2 & \cdots & Z_j \\ 1 & 1 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} x^{1}_{1} & x^{1}_{2} & \cdots & x^{1}_{j} \\ y^{1}_{1} & y^{1}_{2} & \cdots & y^{1}_{j} \\ 1 & 1 & \cdots & 1 \\ \vdots & & & \vdots \\ x^{i}_{1} & x^{i}_{2} & \cdots & x^{i}_{j} \\ y^{i}_{1} & y^{i}_{2} & \cdots & y^{i}_{j} \\ 1 & 1 & \cdots & 1 \end{bmatrix}
Equation 29 – One implementation of projective reconstruction.
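
In this research project the reconstruction itself is performed by VisualSFM; nonetheless,
the idea behind Equation 29 can be illustrated with a simple linear least-squares
triangulation of a single point from two views, sketched below with assumed variable
names (P1, P2 are 3x4 camera matrices; u1, u2 are the measured pixel coordinates of the
same point in each image):

% Linear least-squares triangulation of one 3D point from two views (sketch).
function X = triangulatePoint(P1, P2, u1, u2)
    A = [u1(1) * P1(3,:) - P1(1,:);
         u1(2) * P1(3,:) - P1(2,:);
         u2(1) * P2(3,:) - P2(1,:);
         u2(2) * P2(3,:) - P2(2,:)];
    [~, ~, V] = svd(A);
    X = V(:, end);
    X = X ./ X(4);      % de-homogenise to obtain [X; Y; Z; 1]
end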

A.3.6 BUNDLE ADJUSTMENT (BA)


Equation 29 will not be satisfied exactly due to image noise and other sources
of error. Therefore, assuming a Gaussian distribution of error, the objective of
BA is to minimize the sum of the squared differences between the measured
image points and the reprojected image points based on the reconstruction.
The error function can be described as:

\sum_{ij} \left\| \hat{x}^{i}_{j} - x^{i}_{j} \right\|^{2} = \sum_{ij} \left\| \hat{P}^{i} \hat{X}_{j} - x^{i}_{j} \right\|^{2}
Equation 30 – The reprojection error is the sum of squared distances between the
reprojected image points and measured image points (Hartley and Zisserman, 2003: 434).

This is done by varying the camera matrices and the positions of the
reconstructed 3D points. Since the computational cost of adjusting both the camera
matrices and the positions of the reconstructed points is prohibitive for large
collections of images and points, a number of less-costly solutions are offered
by Hartley and Zisserman (2003). The computational method used by Wu
(2013a) in VisualSFM is to interleave the BA. The actual solution is achieved
by using preconditioned conjugate gradient (PCG) bundle adjustment (Wu et
al., 2011), and camera calibration is refined as images are added to the
reconstruction.
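
The quantity being minimised can be sketched in Matlab® as follows, where cameras is a
cell array of 3x4 camera matrices, X is a 4 x npoints array of homogeneous 3D points and
obs{i} holds the 2 x npoints measured pixel coordinates in image i. These names, the data
layout and the assumption that every point is observed in every image are illustrative
simplifications, not the structures used by VisualSFM:

% Total reprojection error of Equation 30 under an assumed data layout.
function err = reprojectionError(cameras, X, obs)
    err = 0;
    for i = 1:numel(cameras)
        proj = cameras{i} * X;                               % reproject all points into image i
        proj = proj(1:2, :) ./ repmat(proj(3, :), 2, 1);     % convert to pixel coordinates
        d = proj - obs{i};                                   % residuals against measurements
        err = err + sum(sum(d .^ 2));                        % sum of squared distances
    end
end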


A.3.7 CALIBRATION THROUGH THE PERSPECTIVE-THREE-POINT (P3P) PROBLEM
Once a reconstruction has been performed, individual camera calibrations can
be refined from initial estimates by combining equations 1 and 2 (Fischler and
Bolles, 1981):

\lambda_i u_i = K R [\, I \mid t \,] X_i
Equation 31 – Relation between the decomposition of a camera matrix, which is defined
up to scale, and a reprojected point.

The use of this formula on a single view is traditionally called the P3P problem,
wherein the rotation R and translation t of a camera are sought when the
camera calibration K is known. The inverse of this problem is to solve for K
when R and t are known. Since R, t, Xi, and ui are known for at least i = 1…3
when three projected points are known, this equation can be solved for the
camera calibration matrix K, and the camera calibration refined from initial
assumptions.
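
Because Equation 31 is linear in the entries of K once R, t and the point correspondences
are fixed, the inverse problem can be posed as a small least-squares system. The Matlab®
sketch below illustrates this with assumed inputs (R is the 3x3 rotation, t a 3x1 vector,
Xw an n x 3 array of known world points and u an n x 2 array of their measured pixel
coordinates); it is an illustrative formulation rather than the refinement routine used by
VisualSFM:

% Solve Equation 31 for the calibration matrix K when R and t are known (sketch).
function K = calibrationFromPose(R, t, Xw, u)
    n = size(Xw, 1);
    Y = (R * (Xw' + repmat(t, 1, n)))';        % points in the camera coordinate system
    A = zeros(2*n, 5);
    b = zeros(2*n, 1);
    for i = 1:n
        A(2*i-1, :) = [Y(i,1), Y(i,2), Y(i,3), 0, 0];    % equation for the x pixel coordinate
        A(2*i,   :) = [0, 0, 0, Y(i,2), Y(i,3)];         % equation for the y pixel coordinate
        b(2*i-1)    = u(i,1) * Y(i,3);
        b(2*i)      = u(i,2) * Y(i,3);
    end
    k = A \ b;                                  % least-squares solution [fx; s; x0; fy; y0]
    K = [k(1) k(2) k(3); 0 k(4) k(5); 0 0 1];
end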

A.4 TRANSFORMATIONS
The following paragraphs summarize conventions and properties for the
transformations discussed in this research project.

A.4.1 ROTATION AND TRANSLATION


A common way of representing a 3D Euclidean transformation is:

X_{Transformed} = R X_{Original} + t
Equation 32 – Common way of representing Euclidean 3D transformations.

Where R is a rotation about the origin of the coordinate system. This makes the
transformation difficult to intuitively grasp, especially if the centroid of the
dataset is far from the coordinate system’s origin. Instead, Euclidean
transformations R,t are presented in this research project as:

X_{Transformed} = R \left( X_{Original} - \bar{X}_{Original} \right) + \left( \bar{X}_{Original} + t \right)


Equation 33 – The way 3D Euclidean transformations are represented in this research
project: As rotations and translations about a dataset's centroid.


Where \bar{X} is the centroid of the dataset. In this way, a translation of t always
means that the dataset is shifted in the direction and magnitude of t from its
original position. Similarly, affine and similarity transformations are also
represented as transformations about the dataset’s centroid, unless otherwise
stated.
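
A minimal Matlab® sketch of Equation 33, applied to an n x 3 array of points X with a given
rotation R (3x3) and translation t (a 1 x 3 row vector), is shown below; the function name
and argument layout are assumptions for illustration:

% Apply a Euclidean transformation about the dataset centroid (Equation 33).
function Xt = transformAboutCentroid(X, R, t)
    n    = size(X, 1);
    Xbar = mean(X, 1);                              % centroid of the dataset
    Xt   = (R * (X - repmat(Xbar, n, 1))')' ...     % rotate about the centroid
           + repmat(Xbar + t, n, 1);                % restore the centroid and translate by t
end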

A.4.2 THE HIERARCHY OF TRANSFORMATIONS


Hartley and Zisserman (2003: 43-44) give a concise summary of the hierarchy
of 3D transformations in a table, which is reproduced in part here:

Table 6 – Summary of the Hierarchy of 3D Transformations in homogeneous coordinates,
sorted from the most general (projective) to the most constrained (Euclidean). Each
transformation group inherits the list of invariant properties from the transformation
groups above it. (Hartley and Zisserman, 2003: 78)

Group       Matrix           Degrees of Freedom   Invariant Properties
Projective  [A t; v^T k]     15                   Intersection and tangency of faces in
                                                  contact; sign of Gaussian curvature
Affine      [A t; 0^T 1]     12                   Parallelism of planes; volume ratios;
                                                  centroids; the plane at infinity, π∞
Similarity  [sR t; 0^T 1]    7                    The absolute conic, Ω∞
Euclidean   [R t; 0^T 1]     6                    Volume


Annex B IMAGE DATABASE VBA CODE


This annex contains the Microsoft® Visual Basic for Applications (VBA) code for
the image database, which was used to produce the image lists and batch files
necessary for input to VisualSFM.
Option Explicit

Public folderProject, folderInput, nameBatchFile, folderImages As String


Public folderSIFT, folderOther, folderOutput As String
Public fixedCalibration As String
Public desiredPairing As Double
Public minimumPairing As Integer
Public caseSpecifiedMatching As Boolean
Public countImagesTotal, countImagesSkipped, countImagesNotSkipped As Integer
Public caseNumber, caseImages, casePairing As Integer
Public caseDetector, caseEstimation, casePathImages As String
Public imageList, pairList, gcplist As String

Private Sub Form_Load()


On Error GoTo Err_Form_Load
Dim rsImages As DAO.Recordset

MsgBox ("Picking Up Values from TBs.")


folderProject = TB_FolderProject.Value
folderInput = TB_FolderInput.Value
nameBatchFile = TB_NameBatchFile.Value
folderImages = TB_FolderImages.Value
folderSIFT = TB_FolderSift.Value
folderOther = TB_FolderOther.Value
folderOutput = TB_FolderOutput.Value
fixedCalibration = TB_FixedCalibration.Value
desiredPairing = Val(TB_DesiredPairing.Value)
minimumPairing = Val(TB_MinimumPairing.Value)
MsgBox ("Done: " & folderProject & folderInput & folderImages & folderSIFT & folderOther &
folderOutput)

countImagesTotal = 0
countImagesSkipped = 0
countImagesNotSkipped = 0

MsgBox ("Opening full_csv.")


Set rsImages = CurrentDb.OpenRecordset("full_csv")
rsImages.MoveLast
countImagesTotal = rsImages.RecordCount
rsImages.MoveFirst
MsgBox ("countImagesTotal = " & countImagesTotal & ". Looping through full_csv now.")
Do While Not rsImages.EOF
If rsImages("Skip").Value = "Y" Then
countImagesSkipped = countImagesSkipped + 1
End If
rsImages.MoveNext
Loop
MsgBox ("Loop Complete. Closing database.")
rsImages.Close
countImagesNotSkipped = countImagesTotal - countImagesSkipped
MsgBox ("countImagesSkipped = " & countImagesSkipped & ". countImagesNotSkipped = " &
countImagesNotSkipped & ".")
Exit_Form_Load:
MsgBox ("Starting the Exit Sub Routine.")
' Set all private variables to nothing now (if they still exist)
Set rsImages = Nothing
Exit Sub

Err_Form_Load:
MsgBox ("There has been an error.")
'Error Handlers
Resume Exit_Form_Load
End Sub

Private Sub Btn_DoFirst_Click()


MsgBox (Left$("Cats", 2))
End Sub

Private Sub Btn_WriteFiles_Click()


On Error GoTo Err_Btn_WriteFiles_Click
Dim rsCases As DAO.Recordset


Dim p, u As String
Dim fs As Object
Dim batchFile, outputListFile As Object
Dim strOptions, strInput, strOutput, strPathUserData As String

Set fs = CreateObject("Scripting.FileSystemObject")
Set batchFile = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & nameBatchFile &
".bat")
Set outputListFile = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & nameBatchFile &
"_OutputList" & ".txt")
p = "cd " & Chr(34) & folderProject & Chr(34)
batchFile.Writeline p
p = ""
batchFile.Writeline p

Set rsCases = CurrentDb.OpenRecordset("cases")


rsCases.MoveFirst

Do While Not rsCases.EOF


caseNumber = rsCases("Case_Number")
caseDetector = rsCases("Detector")
' Added this if statement so that I could execute the SIFT cases first. Been having problems
with the other cases.
If caseDetector = "Other" Then
caseEstimation = rsCases("Focal_Estimation")
caseImages = rsCases("Images")
If caseImages > countImagesNotSkipped Then
caseImages = countImagesNotSkipped
End If
casePairing = Int(caseImages * desiredPairing)
If casePairing < minimumPairing Then
casePairing = minimumPairing
End If
If casePairing > (caseImages + 1) \ 2 - 1 Then
caseSpecifiedMatching = False
Else
caseSpecifiedMatching = True
End If
If casePairing < 1 Then
casePairing = 1
End If

imageList = Format(caseNumber, "000") & "_" & caseDetector & "_" & caseEstimation & "_" &
Format(caseImages, "0000")
If caseSpecifiedMatching Then
imageList = imageList & "_" & Format(casePairing, "000")
Else
imageList = imageList & "_XXX"
End If
pairList = imageList & ".match.txt"
gcplist = imageList & ".txt.gcp"
'__________________________________________________________________
' Deal with the strOptions String
strOptions = "cmd /c VisualSFM sfm"
' If there aren't enough images to ensure that each image is matched with the minimum
number of neightbours,
' Then actually perform full pairwise matching.
' Otherwise, use the pairs list.
'If caseSpecifiedMatching Then
strOptions = strOptions & "+import"
'End If
If caseEstimation = "Spec" Then
strOptions = strOptions & "+k=" & fixedCalibration
ElseIf caseEstimation = "Est_Shared" Then
strOptions = strOptions & "+shared"
End If
strOptions = strOptions & "+sort+gcp"

If caseDetector = "SIFT" Then


casePathImages = "./" & folderImages & "/" & folderSIFT & "/"
ElseIf caseDetector = "Other" Then
casePathImages = "./" & folderImages & "/" & folderOther & "/"
End If

'__________________________________________________________________
' Deal with the strInput String
strInput = imageList & ".txt"
'__________________________________________________________________
' Deal with the strOutput String
strOutput = folderProject & "\" & folderOutput & "\" & imageList & ".nvm"

'__________________________________________________________________
' Deal with the strPathUserData String
strPathUserData = pairList


'__________________________________________________________________
' Write a command to move the three list files to the VisualSFM directory.
p = "move /y " & folderProject & "\" & folderInput & "\" & imageList & "*.* " &
folderProject
batchFile.Writeline p

'__________________________________________________________________
' Write the line to the batch file.
p = strOptions & " " & strInput & " " & strOutput
'If caseSpecifiedMatching Then
p = p & " " & strPathUserData
'End If
batchFile.Writeline p

'__________________________________________________________________
' Write the line to the output list file.
u = strOutput
outputListFile.Writeline u

'__________________________________________________________________
' Write a command to move the three list files back to the Input directory.
p = "move /y " & folderProject & "\" & imageList & "*.* " & folderProject & "\" &
folderInput
batchFile.Writeline p

'Now Write the image List, Pairs List and gcp List to the Input Folder...
WriteLists
Else
'Do nothing for now.
End If
rsCases.MoveNext
Loop
rsCases.Close

p = "pause"
batchFile.Writeline p

outputListFile.Close
batchFile.Close

Exit_Btn_WriteFiles_Click:
MsgBox ("Starting the Exit Sub Routine.")
' Set all private variables to nothing now (if they still exist)
Set rsCases = Nothing
Exit Sub

Err_Btn_WriteFiles_Click:
MsgBox ("There has been an error.")
'Error Handlers
Resume Exit_Btn_WriteFiles_Click
End Sub

Private Sub WriteLists()


'On Error GoTo Err_WriteLists
Dim rsImages As DAO.Recordset
Dim fs As Object
Dim imageListFile, gcpListFile, imageListSimple, pairListFile As Object
Dim q, r, s, t, u As String
Dim skipFactor, skipCounter As Double
Dim imagesListed, i, j As Integer
Dim imageNames(10000) As String

skipFactor = countImagesNotSkipped / caseImages


Randomize
skipCounter = Rnd * skipFactor
imagesListed = 0
Set fs = CreateObject("Scripting.FileSystemObject")
Set imageListFile = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & imageList &
".txt")
Set gcpListFile = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & gcplist)
Set imageListSimple = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & imageList &
".a")
Set rsImages = CurrentDb.OpenRecordset("full_csv")
rsImages.MoveFirst
Do While Not rsImages.EOF
If rsImages("Skip").Value = "Y" Then
skipCounter = skipCounter - 1
ElseIf skipCounter <= 0 Then
If imagesListed < caseImages Then
' Relativepath to where the images will be"
q = casePathImages & rsImages("ImageName")
imageListFile.Writeline q


u = rsImages("ID")
imageListSimple.Writeline u
r = rsImages("ImageName") & " " & Format(rsImages("POINT_X"), "0.0000") & " " &
Format(rsImages("POINT_Y"), "0.0000") & " " & Format(rsImages("POINT_Z"), "0.0000") & " "
gcpListFile.Writeline r
imageNames(imagesListed) = q
imagesListed = imagesListed + 1
skipCounter = skipCounter - 1 + skipFactor
Else
'Exits the loop
Exit Do
End If
Else
skipCounter = skipCounter - 1
End If
rsImages.MoveNext
Loop
rsImages.Close
imageListSimple.Close
gcpListFile.Close
imageListFile.Close

If caseSpecifiedMatching Then
Set pairListFile = fs.CreateTextFile(folderProject & "\" & folderInput & "\" & pairList)
' For every image file listed in the previous files, take a few images before and after in
sequence. These are image pairs.
' The variable i represents the image being paired, and j is the rover pairing with a set
number of images before and after the
' current image.
For i = 0 To imagesListed - 1
s = imageNames(i)
For j = 1 To casePairing
If i + j >= imagesListed Then
t = imageNames(i - imagesListed + j)
Else
t = imageNames(i + j)
End If
pairListFile.Writeline s & " " & t & " "
Next j
Next i
pairListFile.Close
End If

Exit_WriteLists:
'MsgBox ("Starting the Exit Sub Routine.")
' Set all private variables to nothing now (if they still exist)
Set rsImages = Nothing
Exit Sub

Err_WriteLists:
MsgBox ("There has been an error.")
'Error Handlers
Resume Exit_WriteLists
End Sub

Private Sub Btn_WriteASIFT_Click()


Dim rsImages As DAO.Recordset
Dim fs As Object
Dim batchASIFT As Object
Dim u, v As String
Dim k, l, m, binMax As Integer
Dim imageIndex(8, 939) As Integer

'The countImagesNotSkipped evenly divides into 8.


binMax = (countImagesNotSkipped - 1) / 8
MsgBox ("binMax is: " & binMax)

Set rsImages = CurrentDb.OpenRecordset("full_csv")


rsImages.MoveFirst
'Skip the first image because it's already done.
rsImages.MoveNext
k = 0
l = 0
Do While Not rsImages.EOF
If rsImages("Skip") = "N" Then
If l < binMax Then
imageIndex(k, l) = rsImages("ID")
l = l + 1
Else
MsgBox ("The bin has been filled. k and l are: " & k & ", " & l)
k = k + 1
l = 0
imageIndex(k, l) = rsImages("ID")
l = l + 1


End If
Else
'MsgBox (" Skipped a record. k and l are: " & k & ", " & l)
End If
rsImages.MoveNext
Loop
rsImages.Close
Set rsImages = Nothing

Set fs = CreateObject("Scripting.FileSystemObject")
Set batchASIFT = fs.CreateTextFile("D:\Colpitts\Project\Software\demo_ASIFT\" & "Batch_RunASIFT"
& ".bat")

batchASIFT.Writeline "cd D:\Colpitts\Project\Software\demo_ASIFT\"


batchASIFT.Writeline ""
batchASIFT.Writeline "ASIFT.exe"
batchASIFT.Writeline ""

m = 0
For m = 0 To binMax - 1
u = Format(imageIndex(0, m), "0000")
v = Format(imageIndex(1, m), "0000")
batchASIFT.Writeline "ASIFT.exe .\png\" & u & ".png .\png\" & v & ".png vert.png horz.png
matchings.txt .\png\" & u & ".sift .\png\" & v & ".sift"
u = Format(imageIndex(2, m), "0000")
v = Format(imageIndex(3, m), "0000")
batchASIFT.Writeline "ASIFT.exe .\png\" & u & ".png .\png\" & v & ".png vert.png horz.png
matchings.txt .\png\" & u & ".sift .\png\" & v & ".sift"
u = Format(imageIndex(4, m), "0000")
v = Format(imageIndex(5, m), "0000")
batchASIFT.Writeline "ASIFT.exe .\png\" & u & ".png .\png\" & v & ".png vert.png horz.png
matchings.txt .\png\" & u & ".sift .\png\" & v & ".sift"
u = Format(imageIndex(6, m), "0000")
v = Format(imageIndex(7, m), "0000")
batchASIFT.Writeline "ASIFT.exe .\png\" & u & ".png .\png\" & v & ".png vert.png horz.png
matchings.txt .\png\" & u & ".sift .\png\" & v & ".sift"
Next m
batchASIFT.Writeline ""
batchASIFT.Writeline "pause"

batchASIFT.Close

End Sub

Private Sub Btn_WriteSIFTBatch_Click()


Dim fs As Object
Dim batchConvert As Object
Dim listfile As Object
Dim startnum As Integer
Dim chunksize As Integer
Dim chunkstart As Integer
Dim chunkend As Integer
Dim endnum As Integer
Dim i As Integer
Dim j As Integer

startnum = 1
chunksize = 100
chunkstart = startnum
chunkend = chunkstart + chunksize - 1
endnum = 7539
i = startnum
j = 1

Set fs = CreateObject("Scripting.FileSystemObject")
Set batchConvert = fs.CreateTextFile("D:\Colpitts\Project\Software\VisualSFM\Batch" &
Format(startnum, "0000") & "-" & Format(endnum, "0000") & "batchConvert" & ".bat")
batchConvert.Writeline "cd D:\Colpitts\Project\Software\VisualSFM\"
batchConvert.Writeline ""

Do While i < endnum

'Open a list file


Set listfile = fs.CreateTextFile("D:\Colpitts\Project\Software\VisualSFM\" &
Format(chunkstart, "0000") & "-" & Format(chunkend, "0000") & ".txt")
For j = 1 To chunksize
If i <= endnum Then
batchConvert.Writeline "xcopy
E:\DRDC_Data\DRDC_Ottawa_Imagery\SNAP_NRC_bldg_MWIR_closeup2\ASIFT_Mod\" & Format(i, "0000") & ".sift
D:\Colpitts\Project\Software\VisualSFM\02-Images\02-Other\ /i"
listfile.Writeline "./02-Images/02-Other/" & Format(i, "0000") & ".jpg"
i = i + 1


Else
Exit For
End If
Next j
listfile.Close
batchConvert.Writeline "cmd /c VisualSFM sfm+nomatch+skipsfm " & Format(chunkstart, "0000") &
"-" & Format(chunkend, "0000") & ".txt output.nvm"

If i > endnum Then


Exit Do
Else
chunkstart = i
If chunkstart + chunksize < endnum Then
chunkend = chunkstart + chunksize - 1
Else
chunkend = endnum
End If
End If
Loop

batchConvert.Writeline ""
batchConvert.Writeline "pause"

batchConvert.Close

End Sub

Private Sub Form_Close()

End Sub


Annex C SURF FEATURE EXTRACTION AND MATCHING MATLAB® CODE
This annex contains the Matlab® scripts and functions necessary to perform
SURF feature extraction and matching, and to write the output files in the
appropriate format.
% Matlab Script CollateAndMatch.m
% Written by Andrew Colpitts
% This file opens a csv file, which contains a list of lists. Each case
% in the CSV file dictates what matching sequence should be used.
% A gigantic 7539 by 7539 logical matching matrix is produced, and this is
% used to generate a list of images that need to have SURF features
% extracted.
% Next, the script calls a function to extract SURF features from the
% images which are listed on the SURF list.
% Finally, the script calls a function to match all of the pairs indicated
% on the logical matching matrix.

% Set up variables and the workspace


clear;
clc;

% I have listed all of the pathnames and filenames here


fname_csv = 'Cases_Other.csv';
pname_csv = 'D:\Research_Project\Matlab\interest_points\CSV\';
pname_list = 'D:\Research_Project\Matlab\interest_points\List\';
fname_match = 'match_master.txt';
pname_match = 'D:\Research_Project\Matlab\interest_points\Match\';
pname_surf = 'D:\Research_Project\Matlab\interest_points\SIFT\';
pname_image = 'D:\Research_Project\Matlab\interest_points\Images\';
fname_matcharray_reqr = 'matcharray_reqr.mat';
fname_matcharray_done = 'matcharray_done.mat';
fname_surfvect_reqr = 'surfvect_reqr.mat';
fname_surfvect_done = 'surfvect_done.mat';
fname_feature = 'feature.mat';
fname_validpoint = 'validpoint.mat';
fname_matchindex = 'matchindex.mat';
pname_variable = 'D:\Research_Project\Matlab\interest_points\Variable\';

% All of the specific formats are listed here


format_header = '%s %u\n';
format_num = '%04.0f';
format_csv_header = '%s %s %s %s %s %s %s\n';
format_csv_field = '%u %u %s %s %s %u %s\n';
format_imglead = './02-Images/02-Other/';
format_point = '%3.6f %3.6f %3.6f %1.6f ';
format_feat = [repmat('%u ',1,127),'%u\n'];
zeros_feat = zeros(1,128);

% Initialize script settings here


overwrite_surf = false;
do_matching = true;
rematch = false;
match_reciprocal = false;
nimage_all = 7539;

% Attempts to reload all of the variables to resume work;


exist_matcharray_reqr = (exist([pname_variable,fname_matcharray_reqr],'file') == 2);
exist_matcharray_done = (exist([pname_variable,fname_matcharray_done],'file') == 2);
exist_surfvect_reqr = (exist([pname_variable,fname_surfvect_reqr],'file') == 2);
exist_surfvect_done = (exist([pname_variable,fname_surfvect_done],'file') == 2);
exist_surf_reqr = true;
exist_surf_file = false(nimage_all,1);
exist_feature = (exist([pname_variable,fname_feature],'file') == 2);
exist_validpoint = (exist([pname_variable,fname_validpoint],'file') == 2);
exist_matchindex = (exist([pname_variable,fname_matchindex],'file') == 2);
exist_matchfile = (exist([pname_match,fname_match],'file') == 2);

% Attempts to reload all of the variables to resume work; to reset this


% project, just delete all of the variables from pname_variable.
if exist_matcharray_reqr
disp('Loading the matcharray_reqr variable.');
load([pname_variable,fname_matcharray_reqr],'matcharray_reqr');
else matcharray_reqr = false(nimage_all); end
if exist_matcharray_done
disp('Loading the matcharray_done variable.');


load([pname_variable,fname_matcharray_done],'matcharray_done');
else matcharray_done = false(nimage_all); end
if exist_surfvect_reqr
disp('Loading the surfvect_reqr variable.');
load([pname_variable,fname_surfvect_reqr],'surfvect_reqr');
else surfvect_reqr = []; end
if exist_surfvect_done
disp('Loading the surfvect_done variable.');
load([pname_variable,fname_surfvect_done],'surfvect_done');
else surfvect_done = false(nimage_all,1); end
if exist_feature
disp('Loading the feature variable.');
load([pname_variable,fname_feature],'feature');
else feature = cell(nimage_all,1); end
if exist_validpoint
load([pname_variable,fname_validpoint],'validpoint');
else validpoint = cell(nimage_all,1); end
if exist_matchindex
disp('Loading the matchindex variable.');
load([pname_variable,fname_matchindex],'matchindex');
else matchindex = []; end

% Empty structure for the listcase csv


listcase = [];
listcase = setfield(listcase,'Case_Number',[]); % uint8
listcase = setfield(listcase,'Images',[]); % uint32
listcase = setfield(listcase,'Detector',[]); % string
listcase = setfield(listcase,'Focal_Estimation',[]); % string
listcase = setfield(listcase,'Matching',[]); % string
listcase = setfield(listcase,'Pair',[]); % uint8
listcase = setfield(listcase,'Name',[]); % string

% Reads the list file


% open file in read text mode
[fid_csv,Msg] = fopen([pname_csv,fname_csv],'rt');
if fid_csv == -1, error(Msg); end
clear Msg;
fieldname_csv = textscan(fid_csv,format_csv_header,1,'delimiter',',');
clear fieldname_csv; % no use for this right now.
Buf = textscan(fid_csv,format_csv_field,'delimiter',',');
fclose(fid_csv);
clear fid_csv;

listcase.Case_Number = Buf{1,1};
listcase.Images = Buf{1,2};
listcase.Detector = Buf{1,3};
listcase.Focal_Estimation = Buf{1,4};
listcase.Matching = Buf{1,5};
listcase.Pair = Buf{1,6};
listcase.Name = Buf{1,7};
clear Buf;

if ~exist_matcharray_reqr
% Rebuild the matcharray_reqr variable
disp('Rebuilding the matcharray_reqr variable.');
ncase = size(listcase.Case_Number,1);
for i = 1:ncase
fname_list = listcase.Name{i,1};
[fid_list,Msg] = fopen([pname_list,fname_list],'rt');
if fid_list == -1, error(Msg); end
clear Msg;

listimg = textscan(fid_list,'%u\n');
fclose(fid_list);
listimg = listimg{1,1};
colsize = size(listimg,1);

if strcmp(listcase.Matching{i,1},'Full')
for j = 1:colsize
for k = 1:j
if listimg(j,1) == listimg(k,1)
% do nothing
else
matcharray_reqr(listimg(j,1),listimg(k,1)) = true;
matcharray_reqr(listimg(k,1),listimg(j,1)) = true;
end
end
end
else
% Build listimg into an array of matches, where the first column is
% the left match and the remaining columns are the right match.
for j = 2:listcase.Pair(i,1) + 1
nextcol = zeros(colsize,1);


nextcol(1:end-1,1) = listimg(2:end,j-1);
nextcol(end,1) = listimg(1,j-1);
listimg = [listimg,nextcol];
end
% Now cycle through the match list and set the image pairs in the
% matcharray_reqr to true
for j = 1:colsize
for k = 2:listcase.Pair(i,1) + 1
matcharray_reqr(listimg(j,1),listimg(j,k)) = true;
matcharray_reqr(listimg(j,k),listimg(j,1)) = true;
end
end
end
end
% Now save the matcharray_reqr variable
save([pname_variable,fname_matcharray_reqr],'matcharray_reqr');
end
fclose('all');
clear ans;
clear colsize;
clear fid_list;
clear listimg;
clear ncase;
clear nextcol;

%%%%%%% Remove this line after testing! %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


% matcharray_reqr = false(nimage_all);
% matcharray_done = false(nimage_all);
% matcharray_reqr(200,100) = true;
% matcharray_reqr(100,200) = true;
% matcharray_reqr(1,2) = true;
% matcharray_reqr(2,1) = true;
% matcharray_reqr(50,100) = true;
% matcharray_reqr(100,50) = true;
% for i = 1:50
% matcharray_reqr(100*i,100*(i+1)) = true;
% matcharray_reqr(100*(i+1),100*i) = true;
% end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% The total number of matches required


nmatch_reqr = sum(sum(matcharray_reqr));
if ~match_reciprocal
nmatch_reqr = nmatch_reqr / 2;
end
disp(['There are a total of ',num2str(nmatch_reqr),' matches required.']);

% The total number of matches done already


nmatch_done = sum(sum(matcharray_done));
if ~match_reciprocal
nmatch_done = nmatch_done / 2;
end
disp(['There are ',num2str(nmatch_done),' matches already done.']);

% The total number of matches still left to be calculated.


nmatch_pend = nmatch_reqr - nmatch_done;
disp(['There are ',num2str(nmatch_pend),' matches left to be done.']);

% If the matchindex variable doesn't exist, then create it.


if ~exist_matchindex
disp('The matchindex variable does not exist. Creating.');
matchindex = cell(nmatch_reqr,2);
end

% If the surfvect_reqr variable doesn't exist, create and save it.


if ~exist_surfvect_reqr
disp('The surfvect_reqr variable does not exist. Calculating.');
surfvect_reqr = any(matcharray_reqr,2);
save([pname_variable,fname_surfvect_reqr],'surfvect_reqr');
end

% If the surfvect_done variable doesn't exist, create it.


if ~exist_surfvect_done
disp('The surfvect_done variable does not exist. Creating.');
surfvect_done = false(nimage_all,1);
end

% Check if there are missing surf files;


for i = 1:nimage_all
if surfvect_reqr(i,1)
fname_surf = [num2str(i,format_num),'.sift'];
if (exist([pname_surf,fname_surf],'file') == 2)
exist_surf_file(i,1) = true;
else


exist_surf_reqr = false;
surfvect_done(i,1) = false;
end
end
end
if ~exist_surf_reqr
disp('There are also missing surf files.');
end

% The total number of SURF extractions required


nsurf_reqr = sum(surfvect_reqr);
disp(['There are a total of ',num2str(nsurf_reqr),' feature extractions required.']);

% The total number of extractions done already


nsurf_done = sum(surfvect_done);
disp(['There are a total of ',num2str(nsurf_done),' feature extractions done already.']);

% The total number of extractions left to do.


nsurf_pend = nsurf_reqr - nsurf_done;
disp(['There are a total of ',num2str(nsurf_pend),' feature extractions remaining to do.']);

csurf = 0;
% Now perform the missing feature extractions
if ~((nsurf_pend == 0) && exist_surf_reqr && exist_surfvect_done && exist_feature &&
exist_validpoint)
for i = 1:nimage_all
if surfvect_reqr(i,1)
fname_image = [num2str(i,format_num),'.jpg'];
if ~surfvect_done(i,1)
csurf = csurf + 1;
disp(['SURF extraction # ',num2str(csurf),' of ',num2str(nsurf_pend),' on image:
',fname_image]);
pixel = rgb2gray(imread([pname_image,fname_image]));
point = detectSURFFeatures(pixel);
[feature{i,1},validpoint{i,1}] = extractFeatures(pixel,point);
if ~exist_surf_file(i,1)
fname_surf = [num2str(i,format_num),'.sift'];
npoint = size(validpoint{i,1},1);
[fid_surf] = fopen([pname_surf,fname_surf],'wt');
if fid_surf == -1, error(Msg); end
clear Msg;
fprintf(fid_surf,'%u\n',npoint);
fprintf(fid_surf,'%u\n',128);
for j = 1:npoint
% Remove these two corrections if they are found to not be necessary
% The first correction changes the image coordinate system to match
% Lowe's (2004) output.
% The second correction just ensures that the orientation is in the
% range -pi to pi
loc = validpoint{i,1}(j,1).Location;
loc_x = loc(1,1);
loc_y = loc(1,2);
fprintf(fid_surf,format_point,...
loc_x,...
loc_y,... % first correction
validpoint{i,1}(j,1).Scale,...
rem(validpoint{i,1}(j,1).Orientation+pi,2*pi)-pi); % corr 2
fprintf(fid_surf,format_feat,zeros_feat);
end
fclose(fid_surf);
end
surfvect_done(i,1) = true;
else
disp(['SURF extraction on image: ',fname_image,' is reported as complete.
Skipping.']);
end
end
end
% Now save the surfvect_done, feature and validpoint variables
disp('Saving the surfvect_done variable.');
save([pname_variable,fname_surfvect_done],'surfvect_done');
disp('Saving the feature variable.');
save([pname_variable,fname_feature],'feature');
disp('Saving the validpoint variable.');
save([pname_variable,fname_validpoint],'validpoint');
end
fclose('all');
clear ans;
clear csurf;
clear fid_surf;
clear loc;
clear loc_x;
clear loc_y;
clear pixel;


clear point;

cmatch = 0;
cmatch_bin = 0;
cmatch_binsize = 10000;
% Now actually perform the matching and output to an index
if (nmatch_pend > 0) && do_matching
for i = 1:nimage_all
if surfvect_reqr(i,1)
disp_line = true;
for j = 1:i
if matcharray_reqr(i,j)
cmatch = cmatch + 1;
cmatch_bin = cmatch_bin + 1;
if ~matcharray_done(i,j)
if disp_line
disp(['Match # ',num2str(cmatch),' of ',num2str(nmatch_reqr),' with images:
',num2str(i),' & ',num2str(j)]);
disp_line = false;
end
matchindex{cmatch,2} = matchFeatures(feature{i,1},feature{j,1},'Prenormalized',
true) - 1;
matchindex{cmatch,1} = [i,j,size(matchindex{cmatch,2},1)];
else
if disp_line
disp(['Match # ',num2str(cmatch),' of ',num2str(nmatch_reqr),' with images:
',num2str(i),' & ',num2str(j),' is already stored. Skipping']);
disp_line = false;
end
end
matcharray_done(i,j) = true;
if cmatch_bin >= cmatch_binsize
cmatch_bin = 0;
% Now back-up the matchindex and the matcharray_done
disp('Saving mathindex.');
save([pname_variable,fname_matchindex],'matchindex');
disp('Saving matcharray_done.');
save([pname_variable,fname_matcharray_done],'matcharray_done');
end
end
end
end
end
% Now save matchindex and matcharray_done
disp('Saving the mathindex variable.');
save([pname_variable,fname_matchindex],'matchindex');
disp('Saving matcharray_done variable.');
save([pname_variable,fname_matcharray_done],'matcharray_done');

end
clear cmatch;
clear do_matching;
clear cmatch_bin;
clear cmatch_binsize;
clear disp_line;

% Any change to the matching requires the text file to be completely


% rewritten.
disp('Writing the master match file.');
[fid_match] = fopen([pname_match,fname_match],'wt');
if fid_match == -1, error(Msg); end
clear Msg;
for i = 1:size(matchindex,1)
fprintf(fid_match,format_header,[format_imglead,num2str(matchindex{i,1}(1,1),format_num),'.jpg
',format_imglead,num2str(matchindex{i,1}(1,2),format_num),'.jpg'],[matchindex{i,1}(1,3)]);
format_index = [repmat('%u ',1,matchindex{i,1}(1,3) - 1),'%u\n'];
fprintf(fid_match,format_index,matchindex{i,2}(:,1)');
fprintf(fid_match,format_index,matchindex{i,2}(:,2)');
end
fclose(fid_match);
clear fid_match;
disp('Done writing, process complete!')
clear exist_feature;
clear exist_matcharray_reqr;
clear exist_matcharray_done;
clear exist_matchfile;
clear exist_matchindex;
clear exist_surf_file;
clear exist_surf_reqr;
clear exist_surfvect_done;
clear exist_surfvect_reqr;
clear exist_validpoint;
clear fname_csv;
clear fname_feature;


clear fname_image;
clear fname_list;
clear fname_match;
clear fname_matcharray_done;
clear fname_matcharray_reqr;
clear fname_matchindex;
clear fname_surf;
clear fname_surfvect_done;
clear fname_surfvect_reqr;
clear fname_validpoint;
clear format_csv_field;
clear format_csv_header;
clear format_feat;
clear format_header;
clear format_imglead;
clear format_index;
clear format_point;
clear i;
clear j;
clear k;
clear match_reciprocal;
clear nimage_all;
clear nmatch_done;
clear nmatch_pend;
clear nmatch_reqr;
clear npoint;
clear nsurf_done;
clear nsurf_reqr;
clear overwrite_surf;
clear pname_csv;
clear rematch;
clear zeros_feat;

fclose('all');
clear ans;


Annex D POINT CLOUD ANALYSIS MATLAB® CODE


This annex contains the Matlab® code necessary to process the produced
models and to perform calculations.
% MATLAB Script nvmconvert.m
% Written by Andrew Colpitts 'colpitts2010@gmail.com'
% This script does the following:
% 0) Clear workspace and set up parameters
% 1) Reads a list of filenames and paths to open nvm files and save plys
% 2) Reads the reference data file as a ply file
% <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
% Now open the loop:
% 3) Reads an nvm file from the list
% 4) Trims the nvm file to the desired horizontal extents
% 5) Reads the tie point vertical values from the reference data
% 6) Reads the tie point vertical values from the nvm file
% 7) RECORDS:
% - Whether or not the reconstruction was generally successful
% (if not successful, then skip all of the next steps in the loop)
% - The total number of reconstructed points
% - The mean and variance of distribution of the points in x,y,z
% - The number of outlier points at alpha = 0.001 for the z direction
%
% 8) Calculates the vertical offset and adjusts the nvm height values
% 9) RECORDS the height offset
% 10) ICP <<<<<---- to be added later
% 11) RECORDS the global transformation values for ICP
% 12) Hausdorff Distance <<<<<< to be added later
% 13) RECORDS the mean Hausdorff Distance
% 14) Saves an image of the Hausdorff distribution?
% 15) Saves the modified nvm file as a PLY file
% Now close the loop and goto step 3 to load the next nvm
% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
% 16) Now write all observations to an output CSV file with proper headers
% 17) Close all files and clear the workspace

% 0) Clear workspace and set up parameters and records structure


% Reads the reference data file as a ply file
clear;
clc;

disp('Get reference data.')


[element_ref] = setref;

disp('Set up parameters.')
[param] = setparam(element_ref);

% 1) Reads a list of filenames and paths to open nvm files and save plys
% also sets up the record structure for recording the performance of each
% model.
disp('Get List and set up record structure.')
[listcase,record] = setlist(param);

% <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
% Now open the loop:
disp('Open the loop.')
for i = 1:listcase.numcase
% Reset element_in
element_in = [];

record.success(i,1) = true; % Innocent until proven guilty.


record.centroidcorr(i,1) = false;
disp('##############################################################');
% 3) Reads an nvm file (or a ply file) from the list
disp(['Opening file: ',listcase.fname_load{i,1}])
if strcmp(listcase.ftype_load{i,1},'nvm')
element_in = nvmread(...
listcase.fname_load{i,1},listcase.pname_load{i,1});
elseif strcmp(listcase.ftype_load{i,1},'ply')
element_in = plyread(...
[listcase.pname_load{i,1},listcase.fname_load{i,1}]);
end

record.glbl.point(i,1) = 0;
record.glbl.dirstat.mean.x(i,1) = 0;
record.glbl.dirstat.mean.y(i,1) = 0;
record.glbl.dirstat.mean.z(i,1) = 0;
record.glbl.dirstat.var.x(i,1) = 0;
record.glbl.dirstat.var.y(i,1) = 0;


record.glbl.dirstat.var.z(i,1) = 0;
record.glbl.adjust.offset_x(i,1) = 0;
record.glbl.adjust.offset_y(i,1) = 0;
record.glbl.adjust.offset_z(i,1) = 0;
record.glbl.adjust.icp.rmse.first(i,1) = 0;
record.glbl.adjust.icp.rmse.last(i,1) = 0;

if size(element_in.vertex.x,1) < 10
% Code to skip the rest of the loop.
disp('Reconstruction NOT successful.')
record.success(i,1) = false;
else
% 5) Reads the tie point vertical values from the reference data
% 6) Reads the tie point vertical values from the nvm file
% 8) Calculates the vertical offset and adjusts the nvm height values
disp('Get tie point height values.')
[param.tiepoint.in.coord_z{i,1},...
param.tiepoint.in.vect_logic{i,1}] = tiezval(element_in,param);

if any(param.tiepoint.in.vect_logic{i,1})
tiept_z = param.tiepoint.in.coord_z{i,1};
vect_logic = param.tiepoint.in.vect_logic{i,1};
% Find the overall offset by averaging the tie points
record.glbl.adjust.offset_z(i,1) = mean(...
param.tiepoint.ref.coord_z(vect_logic) - ...
tiept_z(vect_logic));
% Apply the height adjustment to the input data.
disp('Apply the height adjustment to the input data.')
element_in.vertex.z = ...
element_in.vertex.z + record.glbl.adjust.offset_z(i,1);
clear tiept_z;
clear vect_logic;
% Brute Force Horizontal Alignment
% This algorithm tries to align the model to the data in both the x
% and y directions at the same time. It starts with the x
% direction because that is typically worse than the y direction,
% then it moves to the y direction, in both cases moving the window
% before finding a local minima, adjusting the window size and then
% switching axes
disp('Attempting Brute Force Horizontal Bracketing Alignment.')
xguess = param.alignwindow.xguess;
xrange = param.alignwindow.xrange;
xprecis = param.alignwindow.xprecis;
yguess = param.alignwindow.yguess;
yrange = param.alignwindow.yrange;
yprecis = param.alignwindow.yprecis;
dist = zeros(3,1);
% This variable tells us whether the algorithm is currently
% searching along the X direction or the y direction
x_or_y = true; % true for x, false for y
while (xrange > xprecis) || (yrange > yprecis)
% Starting positions for the x and y offsets
if x_or_y % true for x, false for y
yoffset = yguess;
str = sprintf(['The xguess is: ',num2str(xguess),'m. The xrange is:
',num2str(xrange),'m.']);
disp(str);
else
xoffset = xguess;
str = sprintf(['The yguess is: ',num2str(yguess),'m. The yrange is:
',num2str(yrange),'m.']);
disp(str);
end
for j = 1:3
if x_or_y % true for x, false for y
xoffset = xguess + (j - 2) * xrange;
else
yoffset = yguess + (j - 2) * yrange;
end
% Replicate element_in
element_br = element_in;
% Apply the current guess to the model.
element_br.vertex.x = element_br.vertex.x + xoffset;
element_br.vertex.y = element_br.vertex.y + yoffset;
% Trim and the element structure to the desired extents
element_br = meshtrim(element_br,param);
[~,distvect] = knnsearch(param.NS,...
[element_br.vertex.x,...
element_br.vertex.y,...
element_br.vertex.z],...
'K',1,'Distance','euclidean','IncludeTies',false);

% take only the best 90% (rounded down) of matches


len = floor(size(distvect,1) * 0.9);


distvect = distvect(1:len);

% dist will be the sum of squared distances


% (minimize the sum of squared errors)
dist(j,1) = sum(distvect .^ 2);
end
[~,idx] = min(dist);
% The search window can be adjusted for fit around the
% newly found local minima and then the search run on a
% window half as wide. This is the essence of
% bracketing, so that a precise value can quickly be
% obtained with few iterations
if x_or_y % true for x, false for y
xguess = xguess + (idx - 2) * xrange;
xrange = xrange / 2;
x_or_y = false; % true for x, false for y
else
yguess = yguess + (idx - 2) * yrange;
yrange = yrange / 2;
x_or_y = true; % true for x, false for y
end
end
xoffset = xguess;
yoffset = yguess;
str = sprintf(['Successfully found a local minimum with the required precision.\n The
xoffset is: ',num2str(xoffset),'m. The xrange is: ',num2str(xrange),' m.\n The yoffset is:
',num2str(yoffset),'m. The yrange is: ',num2str(yrange),' m.']);
disp(str);
% Now store the x_offset and y_offset and apply it to the data.
record.glbl.adjust.offset_x(i,1) = xoffset;
record.glbl.adjust.offset_y(i,1) = yoffset;
element_in.vertex.x = element_in.vertex.x + xoffset;
element_in.vertex.y = element_in.vertex.y + yoffset;
end
clear xguess;
clear xoffset;
clear xprecis;
clear xrange;
clear yguess;
clear yoffset;
clear yprecis;
clear yrange;
clear x_or_y;
clear idx;
clear dist;
clear distvect;
clear len;
clear element_br;
clear centroid_offset;
clear model_bar;
clear data_bar;

% Iterative Closest Point (ICP)


disp('Executing ICP.')
ticID_ICP = tic;
[record.glbl.adjust.icp.mat_r{i,1},...
record.glbl.adjust.icp.vect_t{i,1},...
rmse_vect,rmse_first,rmse_last] = icp_colpitts(element_in,element_ref,param);
if size(rmse_vect,1) == 0
disp('ICP failed.')
record.success(i,1) = false;
end
record.glbl.adjust.icp.rmse.first(i,1) = rmse_first;
record.glbl.adjust.icp.rmse.last(i,1) = rmse_last;
clear rmse_vect;
timeElapsed_ICP = toc(ticID_ICP);
disp(['Elapsed time: ',num2str(timeElapsed_ICP,2)])
clear timeElapsed_ICP;
clear ticID_ICP;
clear rmse_first;
clear rmse_last;

% Now apply the rotation matrix and translation to the vertices


disp('Applying the transformation to the vertices.')
model = [element_in.vertex.x,...
element_in.vertex.y,...
element_in.vertex.z];
% Finds the model centroid and the centred model
m = size(model,1);
model_bar = mean(model,1);
model_centred = model - repmat(model_bar,m,1);

% Apply the transformation


model = (record.glbl.adjust.icp.mat_r{i,1} * model_centred')' +...
repmat(record.glbl.adjust.icp.vect_t{i,1},m,1) +...
repmat(model_bar,m,1);

element_in.vertex.x = model(:,1);
element_in.vertex.y = model(:,2);
element_in.vertex.z = model(:,3);
clear model;
clear m;
clear model_bar;
clear model_centred;

% Try to trim the mesh now; if there are fewer than 10 points left in
% the mesh, then the reconstruction is not successful.
element_tr = meshtrim(element_in,param);
if size(element_tr.vertex.x,1) < 10
% Code to skip the rest of the loop.
disp('Reconstruction NOT successful.')
record.success(i,1) = false;
end

if record.success(i,1)
% Trim the nvm to the desired extents
disp('Trim the input mesh.')
element_in = meshtrim(element_in,param);

% Nearest point on cloud


disp('Calculating closest points.')
ticID_ClosestPoint = tic;
[~,neardist] = knnsearch(...
param.NS,...
[element_in.vertex.x,...
element_in.vertex.y,...
element_in.vertex.z],...
'K',1,...
'Distance','euclidean',...
'IncludeTies',false...
);
record.glbl.qual.neardist{i,1} = neardist;
timeElapsed_ClosestPoint = toc(ticID_ClosestPoint);
disp(['Elapsed time: ',num2str(timeElapsed_ClosestPoint,2)])
clear timeElapsed_ClosestPoint;
clear ticID_ClosestPoint;
end

% RECORDS how many points are in the model


disp('Record the number of points.')
record.glbl.point(i,1) = size(element_in.vertex.x,1);

% Directional statistics
disp('Generate directional statistics.')
record.glbl.dirstat.mean.x(i,1) = mean(element_in.vertex.x);
record.glbl.dirstat.mean.y(i,1) = mean(element_in.vertex.y);
record.glbl.dirstat.mean.z(i,1) = mean(element_in.vertex.z);
record.glbl.dirstat.var.x(i,1) = var(element_in.vertex.x);
record.glbl.dirstat.var.y(i,1) = var(element_in.vertex.y);
record.glbl.dirstat.var.z(i,1) = var(element_in.vertex.z);

% Write an output ply file in original colours


disp('Writing the ply file in original colours.')
plywrite(element_in,...
[listcase.pname_save{i,1},listcase.fname_save{i,1}],...
param.format.output.ply);

% End of Section to gather stats and write an original-colour ply file.

if record.success(i,1)
% Now write a new ply file, but with colours ranging from blue
% (close) to yellow (far) closest point distance.

% Determine the minimum value for the gradient. All distance
% values equal to or below this value will be given a colour of
% blue (0,0,255). The minimum value is 0 or (mu - 1.96 sigma),
% whichever is greater, where mu and sigma are the mean and
% stdev of the distance vector.
low = mean(neardist) - 1.96 * std(neardist);
if low < 0
low = 0;
end
high = mean(neardist) + 1.96 * std(neardist);

vect_dist = [];
vect_logic_low = [];
vect_logic_high = [];


vect_red = [];
vect_green = [];
vect_blue = [];
rise_red = 255;
rise_green = 255;
rise_blue = -255;
run = high - low;
a_red = rise_red / run;
a_green = rise_green / run;
a_blue = rise_blue / run;
b_red = 0;
b_green = 0;
b_blue = 255;
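% Each colour channel is then a linear ramp, channel = a*(dist - low) + b,
% clamped so that distances <= low map to blue (0,0,255) and distances
% >= high map to yellow (255,255,0).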
vect_dist = neardist;
vect_logic_low = (vect_dist <= low);
vect_logic_high = (vect_dist >= high);
vect_logic_mid = ~any([vect_logic_low,vect_logic_high],2);
vect_red = element_in.vertex.diffuse_red;
vect_green = element_in.vertex.diffuse_green;
vect_blue = element_in.vertex.diffuse_blue;

% Equations for red colour.


vect_red(vect_logic_low) = uint8(b_red);
vect_red(vect_logic_high) = uint8(255 - b_red);
vect_red(vect_logic_mid) = ...
uint8(a_red .* (vect_dist(vect_logic_mid) - low) + b_red);

% Equations for green colour.


vect_green(vect_logic_low) = uint8(b_green);
vect_green(vect_logic_high) = uint8(255 - b_green);
vect_green(vect_logic_mid) = ...
uint8(a_green .* (vect_dist(vect_logic_mid) - low) + b_green);

% Equations for blue colour.


vect_blue(vect_logic_low) = uint8(b_blue);
vect_blue(vect_logic_high) = uint8(255-b_blue);
vect_blue(vect_logic_mid) = ...
uint8(a_blue .* (vect_dist(vect_logic_mid) - low) + b_blue);

% Replace the colour values with the calculated colour values


element_in.vertex.diffuse_red = vect_red;
element_in.vertex.diffuse_green = vect_green;
element_in.vertex.diffuse_blue = vect_blue;

% Clear the variables created in this section


clear a_blue;
clear a_green;
clear a_red;
clear b_blue;
clear b_green;
clear b_red;
clear high;
clear low;
clear rise_blue;
clear rise_green;
clear rise_red;
clear run;
clear vect_blue;
clear vect_green;
clear vect_red;
clear vect_dist;
clear vect_logic_high;
clear vect_logic_low;
clear vect_logic_mid;
clear neardist;

% 15.5) Saves the modified nvm file as a PLY file


disp('Writing the ply file in false colours.')
plywrite(element_in,...
[listcase.pname_coloursave{i,1},listcase.fname_coloursave{i,1}],...
param.format.output.ply);
end % End of Section to write coloured ply file.

end % End of Section for at least one reprojected point


if ~record.success(i,1)
record.glbl.adjust.icp.mat_r{i,1} = zeros(3,3);
record.glbl.adjust.icp.vect_t{i,1} = zeros(1,3);
end
end
% Now close the loop and goto step 3 to load the next nvm
% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
clear i;

% 16) Now write all RECORDS to an output CSV file with proper headers


% open file in write text mode


disp('Writing the main output csv.')
[fid_output_main,Msg] = fopen(...
[param.format.output.csv.main.pname_save,...
param.format.output.csv.main.fname_save],'wt');
if fid_output_main == -1, error(Msg); end
clear Msg;

format_field = [repmat('%s,',1,31),'%s\n'];
fprintf(fid_output_main,format_field,...
param.format.output.csv.main.fieldname{1,:});
clear format_field;
nrows = size(listcase.casenumber,1);
for j = 1:nrows
fprintf(fid_output_main,param.format.output.csv.main.data,...
listcase.casenumber(j,1),...
listcase.imagecount(j,1),...
listcase.detector{j,1},...
listcase.estimation{j,1},...
listcase.matching{j,1},...
listcase.pair(j,1),...
record.success(j,1),...
record.centroidcorr(j,1),...
record.glbl.point(j,1),...
record.glbl.dirstat.mean.x(j,1),...
record.glbl.dirstat.mean.y(j,1),...
record.glbl.dirstat.mean.z(j,1),...
record.glbl.dirstat.var.x(j,1),...
record.glbl.dirstat.var.y(j,1),...
record.glbl.dirstat.var.z(j,1),...
record.glbl.adjust.offset_x(j,1),...
record.glbl.adjust.offset_y(j,1),...
record.glbl.adjust.offset_z(j,1),...
record.glbl.adjust.icp.rmse.first(j,1),...
record.glbl.adjust.icp.rmse.last(j,1),...
record.glbl.adjust.icp.mat_r{j,1}(1,1),...
record.glbl.adjust.icp.mat_r{j,1}(1,2),...
record.glbl.adjust.icp.mat_r{j,1}(1,3),...
record.glbl.adjust.icp.mat_r{j,1}(2,1),...
record.glbl.adjust.icp.mat_r{j,1}(2,2),...
record.glbl.adjust.icp.mat_r{j,1}(2,3),...
record.glbl.adjust.icp.mat_r{j,1}(3,1),...
record.glbl.adjust.icp.mat_r{j,1}(3,2),...
record.glbl.adjust.icp.mat_r{j,1}(3,3),...
record.glbl.adjust.icp.vect_t{j,1}(1,1),...
record.glbl.adjust.icp.vect_t{j,1}(1,2),...
record.glbl.adjust.icp.vect_t{j,1}(1,3)...
);
end
fclose(fid_output_main);
clear nrows;
clear j;
clear fid_output_main;

% 16) Now write all dist vectors to an output CSV file with proper headers
% Obtain the format for the field name
for k = 1:(size(listcase.casenumber,1)-1)
param.format.output.csv.dist.fieldname = [...
param.format.output.csv.dist.fieldname,'%s,'];
param.format.output.csv.dist.data = [...
param.format.output.csv.dist.data,'%4.6f,'];
end
clear k;
param.format.output.csv.dist.fieldname = [...
param.format.output.csv.dist.fieldname,'%s\n'];
param.format.output.csv.dist.data = [...
param.format.output.csv.dist.data,'%4.6f\n'];

% open file in write text mode


disp('Writing the dist output csv.')
[fid_output_dist,Msg] = fopen(...
[param.format.output.csv.dist.pname_save,...
param.format.output.csv.dist.fname_save],'wt');
if fid_output_dist == -1, error(Msg); end
clear Msg;

fprintf(fid_output_dist,param.format.output.csv.dist.fieldname,...
listcase.fname_load{:,1});
cell_casepoint = [];
cell_casepoint = mat2cell(record.glbl.point',[1],repmat([1],1,size(record.glbl.point,1)));
fprintf(fid_output_dist,param.format.output.csv.dist.data,...
cell_casepoint{1,:});
clear cell_casepoint;


ncol = size(record.glbl.point,1);
for i = 1:ncol
neardistlen(i,1) = size(record.glbl.qual.neardist{i,1},1);
end
nrow = max(neardistlen);

mat_out = zeros(nrow,ncol);
for i = 1:ncol
mat_out(1:neardistlen(i,1),i) = record.glbl.qual.neardist{i,1};
end
clear neardistlen;

for i = 1:nrow
fprintf(fid_output_dist,...
param.format.output.csv.dist.data,mat_out(i,:));
end
clear i;
clear nrow;
clear ncol;
clear mat_out;
fclose(fid_output_dist);
clear fid_output_dist;

% 17) Close all files and clear the workspace


fclose('all');
clear ans;
clear str;
disp('SEQUENCE DONE!')

function [element_ref] = setref()


%SETREF Reads reference data either from a Matlab resource, or ply.
% [ELEMENT_REF] = SETREF() Tries to open a matlab data file first,
% and if that fails, then prompts the user to specify a ply file
% as reference data.

disp('Specify the location of the reference file.')


[fname_ref,pname_ref] = uigetfile('*.mat','Reference Data File');
ticID_Load = tic;
if size(fname_ref) < 2
disp('No mat file, open a ply file instead...')
[fname_ref,pname_ref] = uigetfile('*.ply','Reference Data File');
disp('Loading could take a very long time.')
element_ref = plyread([pname_ref,fname_ref]);
else
disp('Loading could take a long time.')
load([pname_ref,fname_ref]);
end
timeElapsed_Load = toc(ticID_Load);
disp(['Elapsed time: ',num2str(timeElapsed_Load,2)])
clear timeElapsed_Load;
clear ticID_Load;
clear fname_ref;
clear pname_ref;

end

function [param] = setparam(element_ref)


%SETPARAM Sets default parameters for the nvmconvert function.
% By using the element_ref as an input, the tie point reference heights
% can be calculated once and then re-used.

% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

param = []; % Empty structure;


param = setfield(param,'tiepoint',[]);
param.tiepoint = setfield(param.tiepoint,'coord_x',[]);
param.tiepoint = setfield(param.tiepoint,'coord_y',[]);
param.tiepoint = setfield(param.tiepoint,'radius',[]);
param.tiepoint = setfield(param.tiepoint,'idwpow',[]);
param.tiepoint = setfield(param.tiepoint,'ref',[]);
param.tiepoint.ref = setfield(param.tiepoint.ref,'vect_logic',[]);
param.tiepoint.ref = setfield(param.tiepoint.ref,'coord_z',[]);
param.tiepoint = setfield(param.tiepoint,'in',[]);
param.tiepoint.in = setfield(param.tiepoint.in,'vect_logic',[]);
param.tiepoint.in = setfield(param.tiepoint.in,'coord_z',[]);

param = setfield(param,'cpoly',[]);
param.cpoly = setfield(param.cpoly,'coord_x1',[]);
param.cpoly = setfield(param.cpoly,'coord_y1',[]);
param.cpoly = setfield(param.cpoly,'coord_x2',[]);
param.cpoly = setfield(param.cpoly,'coord_y2',[]);


param.cpoly = setfield(param.cpoly,'A',[]);
param.cpoly = setfield(param.cpoly,'B',[]);
param.cpoly = setfield(param.cpoly,'C',[]);
param = setfield(param,'alpha',[]);
param = setfield(param,'format',[]);
param.format = setfield(param.format,'list',[]);
param.format.list = setfield(param.format.list,'fieldname',[]);
param.format.list = setfield(param.format.list,'line',[]);
param.format = setfield(param.format,'output',[]);
param.format.output = setfield(param.format.output,'ply',[]);
param.format.output = setfield(param.format.output,'csv',[]);
param.format.output.csv = setfield(param.format.output.csv,'main',[]);
param.format.output.csv.main = setfield(param.format.output.csv.main,'fieldname',[]);
param.format.output.csv.main = setfield(param.format.output.csv.main,'data',[]);
param.format.output.csv.main = setfield(param.format.output.csv.main,'pname_save',[]);
param.format.output.csv.main = setfield(param.format.output.csv.main,'fname_save',[]);
param.format.output.csv = setfield(param.format.output.csv,'dist',[]);
param.format.output.csv.dist = setfield(param.format.output.csv.dist,'fieldname',[]);
param.format.output.csv.dist = setfield(param.format.output.csv.dist,'data',[]);
param.format.output.csv.dist = setfield(param.format.output.csv.dist,'pname_save',[]);
param.format.output.csv.dist = setfield(param.format.output.csv.dist,'fname_save',[]);

param = setfield(param,'alignwindow',[]);
param.alignwindow = setfield(param.alignwindow,'xguess',[]);
param.alignwindow = setfield(param.alignwindow,'xrange',[]);
param.alignwindow = setfield(param.alignwindow,'xprecis',[]);
param.alignwindow = setfield(param.alignwindow,'yguess',[]);
param.alignwindow = setfield(param.alignwindow,'yrange',[]);
param.alignwindow = setfield(param.alignwindow,'yprecis',[]);
param.alignwindow = setfield(param.alignwindow,'zguess',[]);
param.alignwindow = setfield(param.alignwindow,'zrange',[]);
param.alignwindow = setfield(param.alignwindow,'zprecis',[]);

param = setfield(param,'xsect',[]);
param.xsect = setfield(param.xsect,'stripwidth',[]);
param.xsect = setfield(param.xsect,'idw_pow',[]);
param.xsect = setfield(param.xsect,'changethresh',[]);
param.xsect = setfield(param.xsect,'xdir',[]);
param.xsect.xdir = setfield(param.xsect.xdir,'ycoord',[]);
param.xsect.xdir = setfield(param.xsect.xdir,'refsect',[]);
param.xsect.xdir = setfield(param.xsect.xdir,'refchange',[]);
param.xsect = setfield(param.xsect,'ydir',[]);
param.xsect.ydir = setfield(param.xsect.ydir,'xcoord',[]);
param.xsect.ydir = setfield(param.xsect.ydir,'refsect',[]);
param.xsect.ydir = setfield(param.xsect.ydir,'refchange',[]);
% param = setfield(param,'DT',[]); %Place holder for a Delaunay triangulation
% of the reference data.
param = setfield(param,'NS',[]); % Place holder for a KNN search object of
% the reference data.
param = setfield(param,'icp',[]);
param.icp = setfield(param.icp,'max_iter',[]);
param.icp = setfield(param.icp,'max_rmse_change',[]);

% Now fill the param structure with parameters

% The following are the tie point locations and radii


param.tiepoint.coord_x = ...
[437.903500000014;
507.026100000017;
437.572800000024;
389.286299999978];
param.tiepoint.coord_y = ...
[31988.4089999999;
31955.9973999997;
31817.4216;
31836.2731999997];
param.tiepoint.radius = ...
[10.0000;
10.0000;
10.0000;
6.0000];
param.tiepoint.idwpow = 2;

[param.tiepoint.ref.coord_z,...
param.tiepoint.ref.vect_logic] = tiezval(element_ref,param);
if ~all(param.tiepoint.ref.vect_logic)
error('Not all tie points contain reference data!!!')
end

% These are the points defining the lines on the convex polygon which is
% used to constrain the scope of the data.
param.cpoly.coord_x1 = ...
[432.281086999981;
520.916690999991;
447.045187000011;
358.409583];
param.cpoly.coord_y1 = ...
[32001.9688919996;
31961.6198899997;
31799.3447820004;
31839.6937840003];

% x2 and y2 are obtained by changing the order of x1,y1


param.cpoly.coord_x2 = param.cpoly.coord_x1(2:end,1);
param.cpoly.coord_y2 = param.cpoly.coord_y1(2:end,1);
param.cpoly.coord_x2 = [param.cpoly.coord_x2;
param.cpoly.coord_x1(1,1)];
param.cpoly.coord_y2 = [param.cpoly.coord_y2;
param.cpoly.coord_y1(1,1)];

% Any point (x,y) is within the convex polygon (A,B,C)i if it satisfies


% D < 0 for each i, where:
% D = Ax + By + C
% A = -(y2 - y1)
% B = x2-x1
% C = -(Ax1 + By1)
param.cpoly.A = param.cpoly.coord_y1 - param.cpoly.coord_y2;
param.cpoly.B = param.cpoly.coord_x2 - param.cpoly.coord_x1;
param.cpoly.C = - (param.cpoly.A .* param.cpoly.coord_x1 + ...
param.cpoly.B .* param.cpoly.coord_y1);
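% A candidate point (x,y) therefore lies inside the polygon (given the
% vertex ordering above) when
%   all(param.cpoly.A .* x + param.cpoly.B .* y + param.cpoly.C < 0)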

% These will be the csv file headers


% [casenumber,pname_load,fname_load,ftype_load,pname_save,fname_save,...
% pname_coloursave,fname_coloursave,imagecount,detector,estimation...
% matching,pair]
param.format.list.fieldname = '%s %s %s %s %s %s %s %s %s %s %s %s %s\n';
param.format.list.line = '%u %s %s %s %s %s %s %s %u %s %s %s %u\n';
% The param.format.output.ply is used to specify in what format to write
% the ply files.
% The Options are: 'binary_big_endian', 'binary_little_endian' and 'ascii'
param.format.output.ply = 'binary_big_endian';
param.format.output.csv.main.fieldname = {...
['casenumber'],...
['imagecount'],...
['detector'],...
['estimation'],...
['matching'],...
['pair'],...
['success'],...
['centroidcorr'],...
['numpoints'],...
['mean_x'],...
['mean_y'],...
['mean_z'],...
['var_x'],...
['var_y'],...
['var_z'],...
['offset_x'],...
['offset_y'],...
['offset_z'],...
['icp_rmse_start'],...
['icp_rmse_end'],...
['R11'],...
['R12'],...
['R13'],...
['R21'],...
['R22'],...
['R23'],...
['R31'],...
['R32'],...
['R33'],...
['tx'],...
['ty'],...
['tz'],...
};
param.format.output.csv.main.data = ...
['%u,%u,%s,%s,%s,%u,',...% casenumber, imagecount, detector, estimation, matching, pair
'%u,%u,%u,',... % success, centroidcorr, numpoints
'%6.4f,%6.4f,%6.4f,',... % means
'%6.4f,%6.4f,%6.4f,',... % vars
'%6.4f,%6.4f,%6.4f,',... % offsets
'%6.4f,%6.4f,',... % rmse
'%6.4f,%6.4f,%6.4f,',... % rotation
'%6.4f,%6.4f,%6.4f,',... % rotation
'%6.4f,%6.4f,%6.4f,',... % rotation
'%6.4f,%6.4f,%6.4f\n']; % translation

param.format.output.csv.main.pname_save = ...
'D:\Colpitts\Project\Software\Matlab\readply\Output\';
param.format.output.csv.main.fname_save = ...
'output_main.csv';
param.format.output.csv.dist.pname_save = ...
'D:\Colpitts\Project\Software\Matlab\readply\Output\';
param.format.output.csv.dist.fname_save = ...
'output_dist.csv';

param.icp.max_iter = 100;
param.icp.max_rmse_change = 0.00005;
% Create a nearest-neighbours search object
ticID_SearchObject = tic;
disp('Create a nearest-neighbours search object.');
param.NS = createns(...
[element_ref.vertex.x,...
element_ref.vertex.y,...
element_ref.vertex.z],...
'NSMethod','kdtree',...
'Distance','euclidean',...
'BucketSize',50);
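% The kd-tree search object NS is built once here from the reference
% cloud and re-used by every subsequent knnsearch call (rough alignment,
% ICP and the closest-point quality measure).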
elapsedTime_SearchObject = toc(ticID_SearchObject);
disp(['Elapsed time: ',num2str(elapsedTime_SearchObject,2)]);
clear elapsedTime_SearchObject;

% param.xsect.xdir.ycoord = 31902;
% param.xsect.ydir.xcoord = 448;
% param.xsect.stripwidth = 2;
% param.xsect.idw_pow = 3;
% param.xsect.changethresh = 2;
% % Create the reference cross sections and the change section.
% ticID_RefSect = tic;
% disp('Get the reference sections.');
% elapsedTime_RefSect = toc(ticID_RefSect);
% disp(['Elapsed time: ',num2str(elapsedTime_RefSect,2)]);

param.alignwindow.xguess = 0;
param.alignwindow.xrange = 64;
param.alignwindow.xprecis = 0.25;
param.alignwindow.yguess = 0;
param.alignwindow.yrange = 64;
param.alignwindow.yprecis = 0.25;
param.alignwindow.zguess = -40;
param.alignwindow.zrange = 128;
param.alignwindow.zprecis = 0.25;
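% The x/y/z 'guess' values are the initial offsets (in metres); 'range'
% is the initial half-width of the bracketing search window; 'precis' is
% the window size at which the bracketing search stops refining.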

end

function [listcase,record] = setlist(param)


%SETLIST Sets up and reads the input list as a CSV file.
% [LISTCASE,RECORD] = SETLIST() Where LISTCASE is a data structure
% containing the information read from the CSV file.
% There are different fields:
% 'fieldname' stores the field names of the csv file
% 'casenumber' records the case number
% 'pname_load' and fname_load are the path and file name of the input nvm
% 'ftype_load' is the file type, detailing whether the file should
% be read as a ply or an nvm
% The 'coloursave' fields are there in case functionality is added to
% save images representing the qual of the dataset.
% RECORD stores the record fields associated with the qual measures
% obtained with the overall script.

% >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

% Structure to store the list that will be read from the csv.
listcase = [];
listcase = setfield(listcase,'fieldname',[]);
listcase = setfield(listcase,'casenumber',[]);
listcase = setfield(listcase,'pname_load',[]);
listcase = setfield(listcase,'fname_load',[]);
listcase = setfield(listcase,'ftype_load',[]);
listcase = setfield(listcase,'pname_save',[]);
listcase = setfield(listcase,'fname_save',[]);
listcase = setfield(listcase,'pname_coloursave',[]);
listcase = setfield(listcase,'fname_coloursave',[]);
listcase = setfield(listcase,'imagecount',[]);
listcase = setfield(listcase,'detector',[]);
listcase = setfield(listcase,'estimation',[]);
listcase = setfield(listcase,'matching',[]);
listcase = setfield(listcase,'pair',[]);
listcase = setfield(listcase,'numcase',[]); % The total number of cases.


[fname_list,pname_list] = uigetfile('*.csv','Select the input list file.');


% open file in read text mode
[fid_list,Msg] = fopen([pname_list,fname_list],'rt');
clear fname_list;
clear pname_list;
if fid_list == -1, error(Msg); end
clear Msg;

listcase.fieldname = textscan(...
fid_list,param.format.list.fieldname,1,'delimiter',',');
Buf = textscan(fid_list,param.format.list.line,'delimiter',',');
fclose(fid_list);
clear fid_list;
clear ans;

listcase.casenumber = Buf{1,1};
listcase.pname_load = Buf{1,2};
listcase.fname_load = Buf{1,3};
listcase.ftype_load = Buf{1,4};
listcase.pname_save = Buf{1,5};
listcase.fname_save = Buf{1,6};
listcase.pname_coloursave = Buf{1,7};
listcase.fname_coloursave = Buf{1,8};
listcase.imagecount = Buf{1,9};
listcase.detector = Buf{1,10};
listcase.estimation = Buf{1,11};
listcase.matching = Buf{1,12};
listcase.pair = Buf{1,13};
listcase.numcase = size(listcase.casenumber,1);
clear Buf;

record = []; % Empty structure to store the records


record = setfield(record,'fieldnames',[]);
record = setfield(record,'success',[]);
record = setfield(record,'centroidcorr',[]);
record = setfield(record,'glbl',[]);
record.glbl = setfield(record.glbl,'point',[]);

record.glbl = setfield(record.glbl,'dirstat',[]);
record.glbl.dirstat = setfield(record.glbl.dirstat,'mean',[]);
record.glbl.dirstat.mean = setfield(record.glbl.dirstat.mean,'x',[]);
record.glbl.dirstat.mean = setfield(record.glbl.dirstat.mean,'y',[]);
record.glbl.dirstat.mean = setfield(record.glbl.dirstat.mean,'z',[]);
record.glbl.dirstat.mean = setfield(record.glbl.dirstat.mean,'neardist',[]);
record.glbl.dirstat = setfield(record.glbl.dirstat,'var',[]);
record.glbl.dirstat.var = setfield(record.glbl.dirstat.var,'x',[]);
record.glbl.dirstat.var = setfield(record.glbl.dirstat.var,'y',[]);
record.glbl.dirstat.var = setfield(record.glbl.dirstat.var,'z',[]);
record.glbl.dirstat.var = setfield(record.glbl.dirstat.var,'neardist',[]);

record.glbl = setfield(record.glbl,'adjust',[]);
record.glbl.adjust = setfield(record.glbl.adjust,'offset_x',[]);
record.glbl.adjust = setfield(record.glbl.adjust,'offset_y',[]);
record.glbl.adjust = setfield(record.glbl.adjust,'offset_z',[]);

record.glbl.adjust = setfield(record.glbl.adjust,'icp',[]);
record.glbl.adjust.icp = setfield(record.glbl.adjust.icp,'mat_r',[]);
record.glbl.adjust.icp = setfield(record.glbl.adjust.icp,'vect_t',[]);
record.glbl.adjust.icp = setfield(record.glbl.adjust.icp,'rmse',[]);
record.glbl.adjust.icp.rmse = setfield(record.glbl.adjust.icp.rmse,'first',[]);
record.glbl.adjust.icp.rmse = setfield(record.glbl.adjust.icp.rmse,'last',[]);

record.glbl = setfield(record.glbl,'qual',[]);
record.glbl.qual = setfield(record.glbl.qual,'neardist',[]);
record.glbl.qual = setfield(record.glbl.qual,'hausdorff',[]);

record = setfield(record,'section',[]);
record.section = setfield(record.section,'denscorr',[]);
record.section = setfield(record.section,'hausdorff',[]);

record = setfield(record,'visualize',[]);
record.visualize = setfield(record.visualize,'neardist',[]);
record.visualize.neardist = setfield(record.visualize.neardist,'lowval',[]);
record.visualize.neardist = setfield(record.visualize.neardist,'highval',[]);

end

function [RT,tT,rmse_vect,rmse_first,rmse_last] = icp_colpitts(element_in,element_ref,param)


%ICP_COLPITTS This is my take on the common ICP algorithm.
% ICP_COLPITTS assumes the notation that the rotation R and translation t
% are with regard to the model centroid, not the world origin.
% This makes the output much easier to understand. For example, what
% rotation and translation would you have to apply to a model about its
% own centroid such that the model best fit the data?
%
% model_new = (R * (model - model_bar) + t) + model_bar
%
% In this arguably more intuitive notation, the translation vector is
% simply the discrepancy between the model centroid and the data.
%
% Such that the error function:
%
% E = SUM( Wi * Di )
%
% is minimised, where Wi are the individual point weights and Di are the
% distances between the model points and the nearest data point.
%
% In order for the error function to represent the sum of SQUARED
% distances, I have chosen that Wi = Di so that each point is scaled again
% by its distance from the closest reference point, yielding a sum-of-squares
% error function. This supposition appears to work well during the rough
% alignment.
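% With this choice the error function becomes E = SUM( Di^2 ), i.e. the
% sum of squared nearest-neighbour distances.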

% Initialisations
RT = eye(3);
tT = zeros(1,3);
rmse_vect = zeros(param.icp.max_iter,1);

model_orig = [element_in.vertex.x,...
element_in.vertex.y,...
element_in.vertex.z];
data_orig = [element_ref.vertex.x,...
element_ref.vertex.y,...
element_ref.vertex.z];

% Finds the model centroid and the deviations from the centroid. These
% can be saved and used in subsequent iterations.
m_orig = size(model_orig,1);
model_bar_orig = mean(model_orig,1);
model_centred_orig = model_orig - repmat(model_bar_orig,m_orig,1);

i = 1;
stop_iter = false;
while ~stop_iter

element_iter = element_in;

% This is the model dataset after the current iteration's transformation


model = (RT * model_centred_orig')' + repmat(tT,m_orig,1) + repmat(model_bar_orig,m_orig,1);
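% i.e. model_new = (RT*(model - model_bar) + tT) + model_bar, as
% described in the help text above.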

% Saving the transformed model back to the element_iter for trimming


element_iter.vertex.x = model(:,1);
element_iter.vertex.y = model(:,2);
element_iter.vertex.z = model(:,3);

% Trim the input data to the study extents


element_iter = meshtrim(element_iter,param);

% Extract the vertex data back out to the model variable


% But only if there are enough points in the trimmed dataset
if size(element_iter.vertex.x,1) >= 10

model = [element_iter.vertex.x,...
element_iter.vertex.y,...
element_iter.vertex.z];
end

% Perform matching
[idx_ref,dist] = knnsearch(param.NS,model,...
'K',1,'Distance','euclidean','IncludeTies',false);
idx_model = true(size(model,1),1);

% Take only the best 90% of matches, rounded down.


len = floor(size(idx_ref,1) * 0.9);
pair = find(idx_model);
[~,idx] = sort(dist);
idx_model(pair(idx(len+1:end))) = false;
idx_ref = idx_ref(idx_model);
dist = dist(idx_model);

% The current Root Mean Square Error (RMSE)


rmse_vect(i,1) = sqrt(sum(dist .^ 2)/length(dist));
if (i > 1) && ((rmse_vect(i-1,1) - rmse_vect(i,1)) <= param.icp.max_rmse_change)
disp('Reached the maximum RMSE change. Stopping.');
stop_iter = true;
rmse_last = rmse_vect(i,1);


elseif i == 1
% Only the first iteration's RMSE is kept as the starting value.
rmse_first = rmse_vect(i,1);
end

% The data_idx variable is the set of points in the reference dataset


% corresponding to the closest reference point to each point in the
% model.
data_idx = data_orig(idx_ref,:);

model = model(idx_model,:);

% Finds the model centroid and the centred model


m = size(model,1);
model_bar = mean(model,1);
model_centred = model - repmat(model_bar,m,1);

% Finds the data_idx centroid and the centred data (around its centroid)
n = size(data_idx,1);
data_bar = mean(data_idx,1);
data_centred = data_idx - repmat(data_bar,n,1);

% Now solve for the rotation about the centroid and the translation
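% S is the distance-weighted cross-covariance between the centred model
% points and their centred reference correspondences; the SVD-based
% (Kabsch-style) solution below recovers the best-fit rotation, and the
% diag([1,1,det(V*U')]) term guards against a reflection.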
S = (model_centred .* repmat(dist,1,3))' * data_centred;
[U,~,V] = svd(S);
R = V * diag([1,1,det(V * U')]) * U';
t = data_bar - model_bar;

% Now apply the rotation and translation to the total transformation


RT = R * RT;
tT = tT + t;

if i >= param.icp.max_iter
disp('Reached the maximum iterations. Stopping.');
stop_iter = true;
rmse_last = rmse_vect(param.icp.max_iter,1);
elseif ~stop_iter
disp(['Completed iteration number: ',num2str(i),'. The RMSE is: ',num2str(rmse_vect(i,1))]);
i = i + 1;
end
end

str = sprintf(['The final transformation is as follows:\n',...
'[R,t] = [',num2str(RT(1,1),'%01.4f'),' ',num2str(RT(1,2),'%01.4f'),' ',...
num2str(RT(1,3),'%01.4f'),' ',num2str(tT(1,1),'%01.4f'),'\n',...
' ',num2str(RT(2,1),'%01.4f'),' ',num2str(RT(2,2),'%01.4f'),' ',...
num2str(RT(2,3),'%01.4f'),' ',num2str(tT(1,2),'%01.4f'),'\n',...
' ',num2str(RT(3,1),'%01.4f'),' ',num2str(RT(3,2),'%01.4f'),' ',...
num2str(RT(3,3),'%01.4f'),' ',num2str(tT(1,3),'%01.4f'),']']);
disp(str);
str = sprintf(['The final RMSE is: ',num2str(rmse_last),'m.']);
disp(str);

end


Annex E Auxiliary Calculations


The auxiliary calculations in this annex estimate the focal length and other
camera parameters of the input image sequence.

Determining the Basic Parameters of the Fixed Camera Calibration


Adapted from the reverse of the procedure in Snavely (2008):
http://phototour.cs.washington.edu/focal.html
Assumptions:
1. The sensor uses square pixels, i.e. the focal length in pixels is identical for fx and fy
2. The sensor image centre is not offset
3. The sensor has no radial distortion

Step One: Obtain the sensor focal length


Source 1: JMMES Sensor Overview (BAE Systems, 2012a)
Source 2: JMMES Sensor Specifications (BAE Systems, 2012b)
From the sensor documentation, there are 4 possible set focal lengths:
Name          Focal Length   FOV (Source 1)   FOV (Source 2)   Ratio (Src 1 / Src 2)   CCD Width (Source 1)   CCD Width (Source 2)
Wide          27 mm          26.7°            18.50°           1.44                    12.469 mm              8.680 mm
Medium        135 mm         5.43°            3.70°            1.47                    12.789 mm              8.716 mm
Narrow        675 mm         1.09°            0.73°            1.49                    12.841 mm              8.600 mm
Very Narrow   2024 mm        0.36°            0.24°            1.50                    12.717 mm              8.478 mm
Mean                                                           1.48                    12.704 mm              8.619 mm
STD                                                            0.03                    0.165 mm               0.106 mm
From Calcs Below (Step Two)                                                            12.800 mm

                               Sensor Widths
Name          Focal Length f   12.7 mm   12.8 mm   12.9 mm
Possible FOVs (θ):
Wide          27 mm            26.47°    26.7°     26.87°
Medium        135 mm           5.39°     5.43°     5.47°
Narrow        675 mm           1.08°     1.09°     1.09°
Very Narrow   2024 mm          0.36°     0.36°     0.37°


Step Two: Estimate the sensor width (Pinhole Camera Model)

[Diagram: pinhole camera geometry relating the sensor (focal plane) width X,
the focal length f and the field of view θ.]

X = 2·f·sin(θ/2)
X = 12.800 mm (Mean)

Step Three: Converting from Millimetres to Pixels

My first hypothesis, based on image appearance and the flight path, was that the
sensor was using the 'Narrow' FOV. Upon further investigation using ArcMap and the
apparent actual FOV, the sensor is in fact using the 'Medium' FOV.

Image Dimensions:
W: Width = 640 pixels
H: Height = 480 pixels

Focal Length in Pixels (X Direction)


fx = W/X*f

fy = fx (See assumption 1)
fx = 6750 pixels
fy = 6750 pixels

Step Four: (Alternative to Step Three) Find the focal plane resolution of the sensor.
Focal plane resolution is measured in pixels per inch (or per cm); inches are used here.

Resx = Resy (Assumption 1)

Resx = W / X * (25.4 mm / 1 inch)

Resx = 1270.00 pixels per inch


Step Five: Calculate the 35mm equivalent focal length

Focal Length (f) / Sensor Width (X) = f₃₅ / 35mm


f₃₅ = 35 * (f / X)
f₃₅ = 369.140625 mm
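
For convenience, the conversions in Steps Two, Three and Five can be reproduced
with a few lines of MATLAB. The snippet below is a minimal sketch only and is not
part of the analysis code in Annex D; it assumes the 'Medium' FOV figures from
Source 1 and the adopted (rounded) sensor width of 12.8 mm.

% Minimal sketch of the focal length conversion (assumed inputs only)
f_mm   = 135;                          % 'Medium' focal length [mm]
theta  = 5.43;                         % 'Medium' FOV, Source 1 [degrees]
W      = 640;                          % image width [pixels]
X_est  = 2 * f_mm * sind(theta / 2);   % Step Two: estimated sensor width (approx. 12.79 mm)
X_mm   = 12.8;                         % adopted (rounded) sensor width used above [mm]
fx_pix = W / X_mm * f_mm;              % Step Three: focal length in pixels (6750; fy = fx)
f35_mm = 35 * (f_mm / X_mm);           % Step Five: 35 mm equivalent (approx. 369.14 mm)
fprintf('X = %.2f mm (adopted %.1f mm), fx = %.0f px, f35 = %.2f mm\n',...
    X_est, X_mm, fx_pix, f35_mm)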


Annex F Complete List of Cases


This annex contains a complete list of the 276 discrete combinations of the
independent variables used in this research project.

Case #   Input Image Count   Feature Extraction Method   Focal Length Estimation Method
1 3 SIFT Spec
2 3 SIFT EstShared
3 3 SIFT EstIndiv
4 3 Other Spec
5 3 Other EstShared
6 3 Other EstIndiv
7 4 SIFT Spec
8 4 SIFT EstShared
9 4 SIFT EstIndiv
10 4 Other Spec
11 4 Other EstShared
12 4 Other EstIndiv
13 5 SIFT Spec
14 5 SIFT EstShared
15 5 SIFT EstIndiv
16 5 Other Spec
17 5 Other EstShared
18 5 Other EstIndiv
19 6 SIFT Spec
20 6 SIFT EstShared
21 6 SIFT EstIndiv
22 6 Other Spec
23 6 Other EstShared
24 6 Other EstIndiv
25 7 SIFT Spec
26 7 SIFT EstShared
27 7 SIFT EstIndiv
28 7 Other Spec
29 7 Other EstShared
30 7 Other EstIndiv
31 8 SIFT Spec
32 8 SIFT EstShared
33 8 SIFT EstIndiv
34 8 Other Spec
35 8 Other EstShared
36 8 Other EstIndiv
37 9 SIFT Spec
38 9 SIFT EstShared
39 9 SIFT EstIndiv
40 9 Other Spec
41 9 Other EstShared
42 9 Other EstIndiv
43 10 SIFT Spec
44 10 SIFT EstShared
45 10 SIFT EstIndiv
46 10 Other Spec

47 10 Other EstShared
48 10 Other EstIndiv
49 11 SIFT Spec
50 11 SIFT EstShared
51 11 SIFT EstIndiv
52 11 Other Spec
53 11 Other EstShared
54 11 Other EstIndiv
55 12 SIFT Spec
56 12 SIFT EstShared
57 12 SIFT EstIndiv
58 12 Other Spec
59 12 Other EstShared
60 12 Other EstIndiv
61 13 SIFT Spec
62 13 SIFT EstShared
63 13 SIFT EstIndiv
64 13 Other Spec
65 13 Other EstShared
66 13 Other EstIndiv
67 14 SIFT Spec
68 14 SIFT EstShared
69 14 SIFT EstIndiv
70 14 Other Spec
71 14 Other EstShared
72 14 Other EstIndiv
73 15 SIFT Spec
74 15 SIFT EstShared
75 15 SIFT EstIndiv
76 15 Other Spec
77 15 Other EstShared
78 15 Other EstIndiv
79 16 SIFT Spec
80 16 SIFT EstShared
81 16 SIFT EstIndiv
82 16 Other Spec
83 16 Other EstShared
84 16 Other EstIndiv
85 17 SIFT Spec
86 17 SIFT EstShared
87 17 SIFT EstIndiv
88 17 Other Spec
89 17 Other EstShared
90 17 Other EstIndiv
91 18 SIFT Spec
92 18 SIFT EstShared
93 18 SIFT EstIndiv
94 18 Other Spec
95 18 Other EstShared
96 18 Other EstIndiv
97 19 SIFT Spec
98 19 SIFT EstShared

99 19 SIFT EstIndiv
100 19 Other Spec
101 19 Other EstShared
102 19 Other EstIndiv
103 20 SIFT Spec
104 20 SIFT EstShared
105 20 SIFT EstIndiv
106 20 Other Spec
107 20 Other EstShared
108 20 Other EstIndiv
109 21 SIFT Spec
110 21 SIFT EstShared
111 21 SIFT EstIndiv
112 21 Other Spec
113 21 Other EstShared
114 21 Other EstIndiv
115 22 SIFT Spec
116 22 SIFT EstShared
117 22 SIFT EstIndiv
118 22 Other Spec
119 22 Other EstShared
120 22 Other EstIndiv
121 23 SIFT Spec
122 23 SIFT EstShared
123 23 SIFT EstIndiv
124 23 Other Spec
125 23 Other EstShared
126 23 Other EstIndiv
127 24 SIFT Spec
128 24 SIFT EstShared
129 24 SIFT EstIndiv
130 24 Other Spec
131 24 Other EstShared
132 24 Other EstIndiv
133 25 SIFT Spec
134 25 SIFT EstShared
135 25 SIFT EstIndiv
136 25 Other Spec
137 25 Other EstShared
138 25 Other EstIndiv
139 30 SIFT Spec
140 30 SIFT EstShared
141 30 SIFT EstIndiv
142 30 Other Spec
143 30 Other EstShared
144 30 Other EstIndiv
145 35 SIFT Spec
146 35 SIFT EstShared
147 35 SIFT EstIndiv
148 35 Other Spec
149 35 Other EstShared
150 35 Other EstIndiv

151 40 SIFT Spec
152 40 SIFT EstShared
153 40 SIFT EstIndiv
154 40 Other Spec
155 40 Other EstShared
156 40 Other EstIndiv
157 45 SIFT Spec
158 45 SIFT EstShared
159 45 SIFT EstIndiv
160 45 Other Spec
161 45 Other EstShared
162 45 Other EstIndiv
163 50 SIFT Spec
164 50 SIFT EstShared
165 50 SIFT EstIndiv
166 50 Other Spec
167 50 Other EstShared
168 50 Other EstIndiv
169 60 SIFT Spec
170 60 SIFT EstShared
171 60 SIFT EstIndiv
172 60 Other Spec
173 60 Other EstShared
174 60 Other EstIndiv
175 70 SIFT Spec
176 70 SIFT EstShared
177 70 SIFT EstIndiv
178 70 Other Spec
179 70 Other EstShared
180 70 Other EstIndiv
181 80 SIFT Spec
182 80 SIFT EstShared
183 80 SIFT EstIndiv
184 80 Other Spec
185 80 Other EstShared
186 80 Other EstIndiv
187 90 SIFT Spec
188 90 SIFT EstShared
189 90 SIFT EstIndiv
190 90 Other Spec
191 90 Other EstShared
192 90 Other EstIndiv
193 100 SIFT Spec
194 100 SIFT EstShared
195 100 SIFT EstIndiv
196 100 Other Spec
197 100 Other EstShared
198 100 Other EstIndiv
199 125 SIFT Spec
200 125 SIFT EstShared
201 125 SIFT EstIndiv
202 125 Other Spec

203 125 Other EstShared
204 125 Other EstIndiv
205 150 SIFT Spec
206 150 SIFT EstShared
207 150 SIFT EstIndiv
208 150 Other Spec
209 150 Other EstShared
210 150 Other EstIndiv
211 200 SIFT Spec
212 200 SIFT EstShared
213 200 SIFT EstIndiv
214 200 Other Spec
215 200 Other EstShared
216 200 Other EstIndiv
217 250 SIFT Spec
218 250 SIFT EstShared
219 250 SIFT EstIndiv
220 250 Other Spec
221 250 Other EstShared
222 250 Other EstIndiv
223 300 SIFT Spec
224 300 SIFT EstShared
225 300 SIFT EstIndiv
226 300 Other Spec
227 300 Other EstShared
228 300 Other EstIndiv
229 350 SIFT Spec
230 350 SIFT EstShared
231 350 SIFT EstIndiv
232 350 Other Spec
233 350 Other EstShared
234 350 Other EstIndiv
235 400 SIFT Spec
236 400 SIFT EstShared
237 400 SIFT EstIndiv
238 400 Other Spec
239 400 Other EstShared
240 400 Other EstIndiv
241 500 SIFT Spec
242 500 SIFT EstShared
243 500 SIFT EstIndiv
244 500 Other Spec
245 500 Other EstShared
246 500 Other EstIndiv
247 600 SIFT Spec
248 600 SIFT EstShared
249 600 SIFT EstIndiv
250 600 Other Spec
251 600 Other EstShared
252 600 Other EstIndiv
253 700 SIFT Spec
254 700 SIFT EstShared

255 700 SIFT EstIndiv
256 700 Other Spec
257 700 Other EstShared
258 700 Other EstIndiv
259 800 SIFT Spec
260 800 SIFT EstShared
261 800 SIFT EstIndiv
262 800 Other Spec
263 800 Other EstShared
264 800 Other EstIndiv
265 900 SIFT Spec
266 900 SIFT EstShared
267 900 SIFT EstIndiv
268 900 Other Spec
269 900 Other EstShared
270 900 Other EstIndiv
271 1000 SIFT Spec
272 1000 SIFT EstShared
273 1000 SIFT EstIndiv
274 1000 Other Spec
275 1000 Other EstShared
276 1000 Other EstIndiv


Annex G Charts
This annex contains the charts produced for this research project. Additional
charts can be found in the supporting Excel files which accompany this
research project.
