

Kifayat Ullah
CECOS University of IT and Emerging Sciences and Engineering, Hayatabad, Peshawar

Abstract: Several software packages were used to obtain results from multiple images. Structure from Motion (SFM) and Clustering Views for Multi-View Stereo (CMVS) were used to obtain sparse and dense reconstructions of sequential image collections. Images of an engine were taken by a robot from different viewpoints to quantify the quality of the SFM data. We carried out a number of comparisons between different parts of the engine to assess the mesh deviation and the reconstruction accuracy. The SFM and CMVS techniques were mainly tested by creating a complete 3D digital replica of the engine.

1. INTRODUCTION
3D reconstruction is the process through which the real, authentic shape of an object is recovered. It can be achieved with either active or passive methods. There is an increasing need for geometric 3D models in the movie industry, the games industry, reverse engineering, mapping (Street View) and other fields. Generating these models from a sequence of images is much cheaper than earlier techniques (e.g. 3D scanners). These techniques have also been able to take advantage of developments in digital cameras, the increasing resolution and quality of the images they produce, and the large collections of imagery that have been established on the Internet (for example, on Flickr).
The objective of this report is to identify the various approaches to generating sparse 3D reconstructions using Structure from Motion (SFM) algorithms and the methods to generate dense 3D reconstructions using Multi-View Stereo (MVS) algorithms.
Reconstruction is widely applied in robot 3D navigation, virtual reality and other areas. Several notable 3D reconstruction methods from the international research community are summarized below.
Stage 1. The TotalCalib system proposed by Bougnoux et al. can complete the image matching [4]; the camera calibration and 3D reconstruction are semi-automatic.
Stage 2. A system for automatically building 3D surfaces was put forward by Pollefeys at K.U. Leuven, Belgium. The system adopts a camera self-calibration technique with variable parameters [5]. It requires users to capture a series of images of the object with a hand-held video camera and to match corresponding points between the images to achieve self-calibration and layered reconstruction, but the gathered images of the object must be comprehensive.
Stage 3. A similar 3D reconstruction system was developed by the computer vision group at Cambridge University [6]. The system calibrates the intrinsic parameters by manually designating the vanishing points formed in the image by three groups of mutually orthogonal parallel lines in space. However, its applicability and degree of automation are limited.
Stage 4. A fully automatic 3D reconstruction method based on images, without manual modeling, is described in [7]. This method extracts 3D information directly from 2D images. First, feature points are extracted and matched; then the coordinates of the 3D points are calculated using the fundamental matrix together with the camera's intrinsic and extrinsic parameters.

3. METHODOLOGY
Our 3D reconstruction pipeline consists of the following steps:
[Figure: 3D reconstruction pipeline. Input images → calculate intrinsic camera parameters → extract SIFT key points and their matches in consecutive frames → extract fundamental and essential matrices → calculate [R|T] parameters for all matched 3D points → make graph → merge graphs and calculate image points for all points → generate sparse points → generate dense points → coloring of dense points.]
3.1 Technical Approach
Given the complexity involved in creating a full-scale SFM and MVS implementation from scratch, the approach taken on this project was to implement the Structure from Motion algorithms for the engine on top of the material covered in class and sample code found online. These results were compared with those produced by open-source packages.

3.2 Sorting the Photo Collection
One of the first steps when dealing with an unordered photo collection is to organize the available images so that they are grouped into similar views. The data consist of 200 images; matching and reconstruction take a total of 3 hours on a cluster with 213 compute cores. For this project, the SIFT algorithm was used to compare the images in the collection, and images with a high number of correspondences were considered to be close together and therefore good candidates for the SFM process.
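The grouping step above can be sketched as a graph problem: each image is a node, two images are linked when they share enough SIFT correspondences, and the connected components are the groups of similar views. This is an illustrative sketch only; the input format (a dictionary of pairwise match counts), the function names, the example counts and the threshold of 20 matches (borrowed from the pair-rejection rule in Section 3.3) are all assumptions, not the project's actual code.

```python
def rank_image_pairs(match_counts, min_matches=20):
    """Return image pairs ordered by number of SIFT correspondences, best first.

    match_counts maps (image_a, image_b) -> number of matches (hypothetical
    input format). Pairs with fewer than min_matches correspondences are dropped.
    """
    kept = [(pair, n) for pair, n in match_counts.items() if n >= min_matches]
    # Highest count first; ties broken by pair name so the order is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in kept]


def group_similar_views(match_counts, min_matches=20):
    """Group images into connected components of the match graph.

    Two images are linked when they share at least min_matches correspondences;
    each returned group is a sorted list of image names.
    """
    graph = {}
    for (a, b), n in match_counts.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if n >= min_matches:
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


# Hypothetical match counts, for illustration only.
counts = {("img01", "img02"): 350, ("img02", "img03"): 120,
          ("img03", "img04"): 8, ("img04", "img05"): 95}
print(rank_image_pairs(counts))     # [('img01', 'img02'), ('img02', 'img03'), ('img04', 'img05')]
print(group_similar_views(counts))  # [['img01', 'img02', 'img03'], ['img04', 'img05']]
```

The top-ranked pair is a natural candidate for initializing SFM, although the baseline between the two views also matters.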
3.3 Feature Detection and Matching
Given a feature in one image, what is the corresponding projection of the same 3D feature in the other image? This is an ill-posed problem and therefore, in most cases, very hard to solve. Clearly, not all possible image features are good for matching; points are used most often. Many interest point detectors exist, and Schmid et al. concluded that the Harris corner detector gives the best results. Later, more robust techniques were developed; one of them is SIFT [3], which is used in our project.
In the Photo Engine project, the approach used for feature detection and matching was to:
• Find feature points in each image using SIFT.
• For each pair of images, match key points using approximate nearest neighbors, estimate the fundamental matrix for the pair using RANSAC (the 8-point algorithm followed by non-linear refinement), and remove matches that are outliers to the recovered fundamental matrix. If fewer than 20 matches remain, the pair is considered not good.
• Organize the matches into tracks, where a track is a connected set of matching key points across multiple images.
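The matching and track-building steps can be illustrated with the minimal pure-Python sketch below. Two assumptions differ from the description: exact brute-force nearest-neighbour search stands in for the approximate nearest-neighbour search, and Lowe's ratio test (with an assumed threshold of 0.8) rejects ambiguous matches. The RANSAC fundamental-matrix outlier filter is omitted; only the 20-match rule from the text is applied. Tracks are formed with a union-find structure, and tracks containing two key points from the same image are dropped, a common consistency filter that is also an assumption here.

```python
import math

MIN_MATCHES = 20  # from the text: a pair with fewer surviving matches is "not good"


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Exact nearest-neighbour matching with Lowe's ratio test (assumed threshold).

    desc_a, desc_b: lists of equal-length descriptor vectors (128-D for SIFT).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    if len(desc_b) < 2:
        return []  # the ratio test needs a second-nearest neighbour
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches


def build_tracks(pair_matches):
    """Link pairwise matches into tracks with a union-find structure.

    pair_matches maps (image_a, image_b) -> list of (keypoint_a, keypoint_b).
    Returns a list of tracks, each a sorted list of (image, keypoint) nodes.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (img_a, img_b), matches in pair_matches.items():
        if len(matches) < MIN_MATCHES:
            continue  # discard pairs with too few matches
        for kp_a, kp_b in matches:
            union((img_a, kp_a), (img_b, kp_b))

    components = {}
    for node in list(parent):
        components.setdefault(find(node), []).append(node)

    tracks = []
    for members in components.values():
        images = [img for img, _ in members]
        # Drop inconsistent tracks that hold two key points from the same image.
        if len(images) == len(set(images)):
            tracks.append(sorted(members))
    return tracks
```

A track here is simply the list of (image, key point) observations of one scene point; those lists are what the SFM stage later assigns 3D locations to.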
3.4 Structure from Motion (SFM)
In the Photo Engine project, the approach used for the 3D reconstruction was to recover a set of camera parameters and a 3D location for each track. The recovered parameters should be consistent, in that the reprojection error is minimized (a nonlinear least-squares problem that was solved using the Levenberg-Marquardt algorithm). Rather than estimating the parameters for all cameras and tracks at once, an incremental approach was taken, adding one camera at a time.
The first step was to estimate the parameters for a single pair of images. The initial pair should have a large number of feature matches but also a large baseline, so that the 3D locations of the observed points are well-conditioned. Then, another image was selected that observes the largest number of tracks whose 3D locations have already been estimated. The new camera's extrinsic parameters are initialized using the DLT (direct linear transform) technique inside a RANSAC procedure. DLT also gives an estimate of K, the intrinsic camera parameter matrix. Using the estimate from K and the focal length estimated from the EXIF tags of the image, a reasonable estimate for the focal length of the new camera can be computed.
The next step is to add the tracks observed by the new camera into the optimization. A track is added if it is observed by at least one other camera and if triangulating the track gives a well-conditioned estimate of its location. This procedure is repeated, one image at a time, until no remaining image observes any of the reconstructed 3D points. To minimize the objective function at each iteration, the Sparse Bundle Adjustment library was used.
The run times for this process ranged from a few hours (engine, 120 photos) to 3 days.
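3.4.1 SFM using Two Images
Structure from Motion techniques using a pair of images were covered in class. In particular, estimation of the fundamental matrix F from point correspondences and solving the affine Structure from Motion problem using the factorization method proposed by Tomasi and Kanade [1] were implemented in a problem set.
The general technique for solving the structure from motion problem is to:
- Estimate the m 2×4 projection matrices M_i (motion) and the n 3D positions P_j (structure) from the m×n 2D correspondences p_ij (in the affine case, only allow for translation and rotation between the cameras).
- This gives 2mn equations in 8m + 3n unknowns that can be solved using the algebraic method or the factorization method.
- Convert from projective to metric via self-calibration and apply bundle adjustment.
For this project, two approaches were investigated for the scenario where the camera matrices are known (calibrated cameras).
The first approach is based on the material given:
• Compute the essential matrix E using RANSAC.
• Compute the camera matrices P.
• Compute the 3D locations using triangulation. Decomposing E yields 4 possible solutions, of which we select the one whose reconstructed 3D points lie in front of both cameras.
• Run bundle adjustment to minimize the reprojection errors by optimizing the positions of the 3D points and the camera parameters.
The second approach utilizes OpenCV and is based on the material given in
• Compute the fundamental matrix using RANSAC (OpenCV: findFundamentalMat()).
• Compute the essential matrix from the fundamental matrix and K (HZ 9.12/9.13) (OpenCV: compute E = …).
• Decompose E using SVD to get the second camera matrix P2 (HZ 9.19); the first camera matrix P1 is assumed to be at the origin (no rotation or translation).
• Compute the 3D points using triangulation (OpenCV: no function for triangulation; code your own).
When the intrinsic camera parameters are unknown, one can run the self-calibration (also known as auto-calibration) process to estimate the camera parameters from the image features. Possible techniques for self-calibration include using single-view metrology constraints, the direct approach using the Kruppa equations, the algebraic approach, or the stratified approach.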
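The core geometric steps of the two-image pipeline can be sketched as follows, assuming both cameras are written as P = K[R|t]. The essential matrix follows from the fundamental matrix as E = K2ᵀ F K1 (Hartley and Zisserman's E = K'ᵀ F K, with x2ᵀ F x1 = 0). Triangulation here uses the linear inhomogeneous method (fixing the homogeneous coordinate of X to 1 and solving by least squares) rather than an SVD-based DLT, and the cheirality test keeps a solution only if the point has positive depth in both cameras, which is how one of the four (R, t) candidates from E is chosen. All function names are illustrative; current OpenCV releases also provide cv2.triangulatePoints, which is not used here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def essential_from_fundamental(F, K1, K2):
    """E = K2^T F K1, where F satisfies x2^T F x1 = 0 (HZ eq. 9.12 with K' = K2)."""
    return matmul(matmul(transpose(K2), F), K1)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(A, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (e.g. zero baseline)")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def triangulate(P1, P2, x1, x2):
    """Linear (inhomogeneous) triangulation of a point seen at x1 in P1 and x2 in P2.

    Each view contributes u*p3 - p1 = 0 and v*p3 - p2 = 0 (p_i = rows of P).
    With X = (X, Y, Z, 1) this becomes A[:, :3] X = -A[:, 3], solved in the
    least-squares sense through the normal equations.
    """
    rows = []
    for P, (u, v) in ((P1, x1), (P2, x2)):
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(-r[3] * r[i] for r in rows) for i in range(3)]
    return _solve3(AtA, Atb)


def depth(P, X):
    """Third coordinate of P (X, 1): the depth of X when P = K[R|t] with K[2] = (0, 0, 1)."""
    return sum(P[2][k] * c for k, c in enumerate(list(X) + [1.0]))


def in_front_of_both(P1, P2, X):
    """Cheirality test used to pick one of the four (R, t) candidates from E."""
    return depth(P1, X) > 0 and depth(P2, X) > 0
```

For example, with P1 = [I|0] and a second camera centred one unit along x (P2 = [I|(-1, 0, 0)]), the point (0.5, 0.2, 4) projects to (0.125, 0.05) and (-0.125, 0.05); triangulate recovers it, and in_front_of_both returns True.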
3.4.2 SFM using Multiple Images
With two images, we can reconstruct the scene up to a scale factor. However, this scale factor will be different for each pair of images. How can we find a common scale so that multiple images can be combined? One approach is to use the Iterative Closest Point (ICP) algorithm, where we triangulate more points and see how they fit into our existing scene geometry. A second approach (and the one used on this project) is to use the Perspective-n-Point (PnP) algorithm (also known as camera pose estimation), where we solve for the pose of a new camera using the scene points we have already found. OpenCV provides the solvePnP() and solvePnPRansac() functions that implement this.
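Registering a new camera from already-reconstructed scene points can be illustrated with the linear DLT resection mentioned in Section 3.4, which estimates a full 3×4 camera matrix P from at least six 3D-2D correspondences. This is a stand-in for, not a reimplementation of, OpenCV's solvePnP()/solvePnPRansac(), which solve for the pose of a calibrated camera. Assumptions: P[2][3] is fixed to 1 (this fails if the true value is near zero), the system is solved through its normal equations without the coordinate normalization recommended in practice, and the 2-pixel inlier threshold is arbitrary. count_inliers shows how each RANSAC hypothesis could be scored by reprojection error.

```python
import math


def solve_linear(A, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (degenerate point configuration)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def dlt_resection(points3d, points2d):
    """Estimate a 3x4 camera matrix P from >= 6 3D-2D correspondences (linear DLT).

    P is fixed up to scale by setting P[2][3] = 1, leaving 11 unknowns. Each
    correspondence gives two equations, e.g. u * (p3 . X) = p1 . X.
    """
    if len(points3d) < 6:
        raise ValueError("DLT needs at least 6 correspondences")
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points3d, points2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    # Normal equations (A^T A) p = A^T b of the overdetermined system.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(11)] for i in range(11)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(11)]
    p = solve_linear(AtA, Atb) + [1.0]
    return [p[0:4], p[4:8], p[8:12]]


def project(P, X):
    """Pinhole projection x = P X, returned in inhomogeneous image coordinates."""
    h = [sum(P[r][k] * c for k, c in enumerate(list(X) + [1.0])) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def reprojection_errors(P, points3d, points2d):
    return [math.dist(project(P, X), x) for X, x in zip(points3d, points2d)]


def count_inliers(P, points3d, points2d, threshold_px=2.0):
    """Score a candidate camera as a RANSAC loop would: count small reprojection errors."""
    return sum(e < threshold_px for e in reprojection_errors(P, points3d, points2d))
```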
3.5 Multi View Stereo
Multi-View Stereo algorithms are used to generate a dense 3D reconstruction of the object or scene. The techniques are usually based on the measurement of a consistency function, a function that measures whether "this 3D model is consistent with the input images". Generally, the answer is not simple due to the effects of noise and calibration errors.

3.5.1 Coloring the Dense Points
Do the cameras see the same color? This approach is valid for Lambertian surfaces only and is based on a measurement of color variance.
After finding the 3D points, we need to color them. Coloring is done in two steps:
1. For each 3D point X, find the corresponding point x in the images using x = PX.
2. Set the color of the 3D point equal to the color of the corresponding point in the image.
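The two coloring steps, together with the color-variance check, can be sketched as below. Assumptions: images are lists of rows of (r, g, b) tuples, pixel centres are at integer coordinates, the nearest pixel is sampled without interpolation, and the point takes its color from the first view in which it is visible (averaging over all visible views would be an equally reasonable choice). The function names are hypothetical.

```python
import math


def project_to_pixel(P, X):
    """x = P X; returns (col, row) of the nearest pixel, or None if X is behind the camera.

    Assumes pixel centres sit at integer coordinates.
    """
    h = [sum(P[r][k] * c for k, c in enumerate(list(X) + [1.0])) for r in range(3)]
    if h[2] <= 0:
        return None
    u, v = h[0] / h[2], h[1] / h[2]
    return math.floor(u + 0.5), math.floor(v + 0.5)


def sample_color(image, P, X):
    """Color of the pixel that X projects to, or None if it falls outside the image."""
    pixel = project_to_pixel(P, X)
    if pixel is None:
        return None
    col, row = pixel
    if 0 <= row < len(image) and 0 <= col < len(image[0]):
        return image[row][col]
    return None


def color_point(X, views):
    """Step 1: project X into each (image, P) view; step 2: copy the first visible color."""
    for image, P in views:
        color = sample_color(image, P, X)
        if color is not None:
            return color
    return None


def color_variance(colors):
    """Mean per-channel variance of the colors seen by several cameras.

    Low values suggest the views agree (photo-consistency for Lambertian surfaces).
    """
    n = len(colors)
    total = 0.0
    for channel in range(3):
        values = [c[channel] for c in colors]
        mean = sum(values) / n
        total += sum((x - mean) ** 2 for x in values) / n
    return total / 3
```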
3.5.2 Texture
Is the texture around the points the same? This approach can handle glossy materials but has problems with shiny objects. It is based on a measurement of correlation between pixel patches.
One of the following two approaches is generally used to build the dense 3D model:
• Build up the model from the good points. This requires many views; otherwise holes appear.
• Remove the bad points (start from the bounding volume and carve away inconsistent points). This requires texture information to get a good …
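The correlation between pixel patches is commonly measured with zero-mean normalized cross-correlation (NCC), sketched below as one possible choice, since the text does not name a specific measure. NCC is unchanged by a gain and offset in brightness, and it is undefined for a perfectly flat patch, which illustrates why the texture-based approach needs textured surfaces. Patches are lists of rows; extract_patch is a hypothetical helper with no bounds checking.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size grayscale patches.

    Patches are lists of rows. The result lies in [-1, 1]; 1 means the patches
    match up to a gain and offset in brightness.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b):
        raise ValueError("patches must have the same size")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a textureless patch has no variance, so correlation is undefined
    return sum(x * y for x, y in zip(da, db)) / denom


def extract_patch(image, row, col, half=1):
    """(2*half+1) x (2*half+1) window centred on (row, col); the caller keeps it inside the image."""
    return [r[col - half:col + half + 1] for r in image[row - half:row + half + 1]]
```

In a dense matcher, a candidate 3D point would typically be kept when the patches around its projections in several views have a high NCC with each other.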
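6. EXPERIMENT
For the first set of tests, the engine dataset, a set of 24 images from the Dense Multi-View Stereo datasets (VisualSFM & MeshLab), was used. This dataset was a good choice for testing because a complete set of images was available (24 images at approx. 5° increments), the camera matrices are available, ground truth is available, and a dense 3D model is available for comparison.

7. RESULTS

8. STATISTICAL ANALYSIS
Time to generate dense and sparse points
Images    Time (min)
82        62.70
120       32.46
200       277.5
[Chart: images vs time]

Vertices used in images
Images    Image size    Vertices
82        3376×2704     249450
120       3376×2704     266366
200       3376×2704     291750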
[Chart: images vs vertices]

9. CONCLUSION
From the statistical analysis above, we conclude that increasing the number of images given to the SFM software tends to increase the time needed to generate the dense and sparse points (the 200-image set took by far the longest, 277.5 min, although the 120-image run was faster than the 82-image run) and consistently increases the number of vertices. The Structure from Motion and Multi-View Stereo algorithms provide viable methods for building 3D models of a vehicle engine. The key issue with these algorithms is that they are fairly CPU- and memory-intensive, especially when reconstruction is done at large scale, as in the Photo Engine project. The VisualSFM implementation was found to be much faster than the other toolkits (a few seconds per image vs. many seconds to a few minutes per image for the others). This was mainly due to its use of the GPU for SIFT and its multicore bundle adjustment implementation.
There is a lot of scope for improvement:
1. Removing performance bottlenecks.
2. Improving feature detection.
3. Creating a mesh from the point cloud.
4. Applying texture mapping.
5. On-the-fly calculation of the intrinsic camera matrix.
6. Incorporating improvements to handle radial distortion and skew.

[1] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method", IJCV, 9(2):137-154, November.
[2] M. Pollefeys, R. Koch, M. Vergauwen and L. Van, "Automated reconstruction of 3D scenes from sequences of images", ISPRS Journal of Photogrammetry & Remote Sensing 55, pp. 251-267, 2000.
[3] D. G. Lowe, "Object recognition from local scale-invariant features", Proceedings of the International Conference on Computer Vision, vol. 2, 1999.
[4] S. Bougnoux and L. Robert, "A fast and reliable system for off-line calibration of image sequences", Proc. Computer Vision and Pattern Recognition, Demo Session, 1997.
[5] M. Pollefeys, "Self-calibration and metric 3D reconstruction from uncalibrated image sequences", Ph.D. Thesis, Katholieke Universiteit Leuven, Heverlee, 1999.
[6] R. Cipolla, D. P. Robertson and E. G. Boyer, "PhotoBuilder - 3D models of architectural scenes from uncalibrated images", Proc. IEEE International Conference on Multimedia Computing and Systems, Firenze, vol. 1, pp. 25-31, June 1999.
[7] Jiang Ze-tao, Zheng Bi-na, Wu Min and Wu Wen-huan, "A Fully Automatic 3D Reconstruction Method Based on Images", 2009 WRI World Congress on Computer Science and Information Engineering, vol. 5, pp. 327-331, March-April 2009.
[8] Yuan Si-cong and Liu Jin-song, "Research on image matching method in binocular stereo vision", Computer Engineering and Applications, 44(8), pp. 75-, 2008.
[9] N. Snavely, S. Seitz and R. Szeliski, "Photo Tourism: Exploring Photo Collections in 3D", ACM Transactions on Graphics, 2006.
[10] Y. Furukawa, B. Curless, S. Seitz and R. Szeliski, "Towards Internet-scale Multi-view Stereo", CVPR, 2010.