IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 1, JANUARY 2016

Stitching and Surface Reconstruction From Endoscopic Image Sequences: A Review of Applications and Methods

Tobias Bergen and Thomas Wittenberg

Abstract—Endoscopic procedures form part of routine clinical practice for minimally invasive examinations and interventions. While they are beneficial for the patient, reducing surgical trauma and making convalescence times shorter, they make orientation and manipulation more challenging for the physician, due to the limited field of view through the endoscope. However, this drawback can be reduced by means of medical image processing and computer vision, using image stitching and surface reconstruction methods to expand the field of view. This paper provides a comprehensive overview of the current state of the art in endoscopic image stitching and surface reconstruction. The literature in the relevant fields of application and algorithmic approaches is surveyed. The technological maturity of the methods and current challenges and trends are analyzed.
Index Terms—Dynamic view expansion, endoscopy, image-based 3-D reconstruction, image enhancement, image mosaicking, image registration, image stitching, simultaneous localization and mapping (SLAM).

Fig. 1. Endoscopic view expansion incorporates methods from different branches of research on image processing and computer vision. Approaches range from planar image stitching to 3-D surface reconstruction and SLAM.

I. INTRODUCTION

ENDOSCOPY is an established procedure for diagnosing and treating a wide variety of diseases and injuries inside the human body. Endoscopes are used to inspect the lungs and airways (bronchoscopy); the bladder, urethra, and ureter (cystoscopy); the stomach and esophagus (gastroscopy); the colon (colonoscopy); joint cavities (arthroscopy); ear, nose, and throat (sinuscopy, laryngoscopy); and minimally invasive surgery (MIS) is carried out in the abdomen or in neurosurgery—to name only a few of the applications. The advantages for the patient of minimally invasive procedures of this type are usually associated with reduced surgical trauma and shorter convalescence times. On the other hand, the techniques involved require a high degree of orientation, coordination, and fine motor skills on the part of the surgeon, due to the very limited field of view provided by the endoscope and the lack of relation between the orientation of the image and the physical environment (horizon). As a consequence, investigations into ways of providing computer assistance for endoscopic procedures have become a very active field of research in recent years. To overcome the problem of the limited field of view, stitching technologies based on image processing have been developed. The terms image stitching and mosaicking refer to the process of combining several (partially overlapping) images to create a broader field of view or panorama image with a wider perspective (stitching and mosaicking are used synonymously throughout this paper; the result of the process is referred to as panorama image or image mosaic).

Classical image mosaicking, originally developed for stitching together sets of photographs, usually uses a planar surface onto which to project all of the images. However, if the scene observed in the images has a significant 3-D structure and the camera performs any translational motion, the planar projection becomes erroneous. This is often the case in endoscopic applications, and generating a panorama image, therefore, also implies reconstructing the underlying surface geometry and estimating the camera motion. At this point, image mosaicking intersects with the field of 3-D surface reconstruction from a series of images, which has also been widely studied in the computer vision community. In robotics, the problem of determining the position of a robot within an initially unknown scene purely from the robot's own sensor information requires building a map of the environment and at the same time locating the robot on that map. This problem has generally become known as "simultaneous localization and mapping" (SLAM) and is obviously also related to surface reconstruction and image mosaicking, if the robot's sensor is assumed to be a camera. As a consequence, image stitching, 3-D surface reconstruction, and SLAM can be regarded as involving similar problems and have been applied to endoscopic image sequences with the goal of assisting the endoscopist by creating an enhanced field of view (see Fig. 1)—referred to as endoscopic view expansion or endoscopic view enhancement (EVE) throughout this paper. The purpose of this contribution is to provide an overview of the state of the art in these fields with regard to their application to images acquired from endoscopic video.

Manuscript received September 4, 2014; revised November 24, 2014; accepted December 11, 2014. Date of publication December 18, 2014; date of current version December 31, 2015. The authors are with the Fraunhofer Institute for Integrated Circuits IIS, 91058 Erlangen, Germany (e-mail: tobias.bergen@iis.fraunhofer.de; thomas.wittenberg@iis.fraunhofer.de). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2014.2384134. 2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
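The planar-projection limitation described above can be made concrete with a small numeric sketch (not from the paper; the pinhole model and all coordinates are illustrative): under pure camera rotation, two scene points on the same viewing ray move identically in the image, so a single global warp (a homography) aligns the views, whereas a translating camera induces depth-dependent parallax that no planar mosaic can compensate.

```python
import math

def project(point, cam_pos, cam_yaw, f=1.0):
    """Project a 3-D point with a pinhole camera.

    point and cam_pos are (x, z) pairs (a 1-D image suffices for the
    demo); cam_yaw is the camera rotation; f is the focal length.
    Returns the x-image coordinate of the point.
    """
    x, z = point[0] - cam_pos[0], point[1] - cam_pos[1]
    # rotate the scene into the camera frame
    xc = math.cos(cam_yaw) * x - math.sin(cam_yaw) * z
    zc = math.sin(cam_yaw) * x + math.cos(cam_yaw) * z
    return f * xc / zc

# Two scene points on the same viewing ray but at different depths.
near, far = (0.5, 2.0), (1.0, 4.0)

# Pure rotation: both points move by the same amount in the image,
# so one global (homography-like) warp can align the two views.
d_near = project(near, (0, 0), 0.1) - project(near, (0, 0), 0.0)
d_far = project(far, (0, 0), 0.1) - project(far, (0, 0), 0.0)
print(abs(d_near - d_far) < 1e-9)  # True: no parallax under rotation

# Translation: the induced image motion depends on depth (parallax),
# so a single planar projection surface no longer fits both points.
d_near = project(near, (0.2, 0), 0.0) - project(near, (0, 0), 0.0)
d_far = project(far, (0.2, 0), 0.0) - project(far, (0, 0), 0.0)
print(abs(d_near - d_far) > 1e-3)  # True: parallax under translation
```

This is exactly why the endoscopic setting, with close-range 3-D anatomy and a freely translating scope, pushes mosaicking toward surface reconstruction and SLAM.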
The remaining part of this section will outline some general aspects of image stitching and surface reconstruction methods and the challenges and requirements arising from their application to endoscopic images. Section II reviews publications about endoscopic image mosaicking and reconstruction in relation to their specific medical applications, organized by organs and medical specialties. Section III presents the common processing pipeline on which most algorithms are based, in order to analyze the underlying methods and algorithms in greater detail. Section IV describes approaches to the evaluation of stitching results. Section V gives a brief quantitative analysis of publications in the field of endoscopic image stitching. Finally, Section VI presents conclusions on the state of the art, current trends, and the major challenges currently being faced.
Fig. 2. Examples of images from different endoscopic applications. From left to right and top to bottom: urinary bladder (cystoscopy), retina (eye surgery, ophthalmoscopy, [11]), esophagus (gastroscopy), liver (laparoscopy), pituitary gland (endo-nasal neurosurgery), colon polyp (colonoscopy), mouse colon (CLE, [12]), larynx (laryngoscopy).

A. Image Stitching and Surface Reconstruction

The methodology of image stitching has its roots in the photography and photogrammetry community. In the 1980s and 1990s, an increasing number of scientists were exploring ways of automatically registering images obtained from a photo or video camera and generating globally consistent maps with a wide field of view. Comprehensive overviews have been published by Szeliski [1], [2]. Seminal works in this field include the development of bundle adjustment by Triggs et al. [3], the concept for creating image mosaics by Szeliski and Shum [4], mosaicking and super-resolution techniques by Capel and Zisserman [5], recognizing panoramas by Brown and Lowe [6], and the geometry of multiple views by Hartley and Zisserman [7]. A more recent paper by Snavely et al. [8] showed how mosaicking and 3-D reconstruction can be applied to extremely unstructured collections of photographs. In robotics, the creation of panoramic maps has been explored as the problem of SLAM. Seminal works in this area are the presentation of Kalman-filter-based real-time monocular SLAM by Davison et al. [9], as well as a methodology based on feature tracking and simultaneous 3-D reconstruction (parallel tracking and mapping) by Klein and Murray [10].

B. View Expansion of Endoscopic Images

Applying image stitching or surface reconstruction to endoscopic images poses special challenges. We will structure these into three categories, according to their relation to 1) the endoscopic device, 2) the scene, and 3) handling by the surgeon. First, we consider the technical characteristics of endoscopic image acquisition. Image resolution is often quite low. Typical resolutions vary from about 0.25 megapixels (PAL) to 2 megapixels (HD), only part of which often contains relevant information, due to the optical setup (see Fig. 2). Illumination can be very inhomogeneous, as the light source in the endoscope is focused and aligned with the optical axis, leading to a decrease in brightness from the center of the image toward the edges. In addition, most endoscopes are set up with wide-angle lenses, causing major geometric image distortions. Second, the condition of the scene being observed poses certain challenges. The tissue being examined often has little texture, or only poorly contrasted texture; surgical instruments may be moving within the field of view; and staining from blood or other body fluids, as well as cauterization smoke, makes robust image processing difficult. In addition, moist tissue can cause specular highlights. A major challenge can also arise from deformation of tissue, which violates the assumption of a rigid scene implicit in many algorithms. Third, the handling of the endoscope by the surgeon is generally not constrained, usually leading to three rotational and three translational degrees of freedom for camera motion. If the device is moved quickly, motion blurring is commonly observed in endoscopic images.

Besides all of these challenges, we also need to consider the requirements arising from endoscopic applications—which in some respects may be less demanding than in other fields of application. Usually, compromises can be made in terms of the accuracy of the reconstruction or texturization of the scene, so that in contrast to photographic image stitching, some visually perceivable deficiencies may often be acceptable. Depending on the application, computation must be performed in real time or may last up to several hours. We have identified two general use cases with different requirements with regard to computation time. Dynamic view enhancement for navigational support refers to real-time expansion of the field of view, and processing speed is therefore crucial for this application. On the other hand, if panorama images are being generated for documentation and quality assurance, the computation time is less of an issue. An important requirement arises from the fact that interference of the system with the surgical routine has to be kept to a minimum. Visualization enhancement through image stitching should assist the surgeon, not vice versa. This usually impedes the availability of user interaction or the inclusion of special demands on the clinical protocol in order to assist the system.

II. ENDOSCOPIC APPLICATIONS

Image stitching and surface reconstruction have been applied in a variety of endoscopic procedures. We present here an
TABLE I
ENDOSCOPIC VIEW EXPANSION HAS BEEN APPLIED TO A WIDE RANGE OF HUMAN ORGANS AND MEDICAL DISCIPLINES

Organ shape    | Organ                        | Discipline       | Adequate model              | Rigidity  | Device                        | Processing         | Purpose
Planar         | Colon (on microscopic scale) | Gastroenterology | Plane                       | Semirigid | Confocal laser endomicroscope | Real-time          | Dynamic view enhancement (DVE)
Concave hollow | Larynx                       | Otolaryngology   | Plane                       | Nonrigid  | Rigid laryngoscope            | Offline            | Documentation
Concave hollow | Eye (retina)                 | Ophthalmology    | Plane, sphere               | Rigid     | Ophthalmoscope, funduscope    | Real-time          | DVE, navigation
Concave hollow | Bladder                      | Urology          | Plane, sphere               | Semirigid | Rigid or flexible cystoscope  | Real-time, offline | DVE, navigation, documentation
Tubular        | Urethra                      | Urology          | Cylinder                    | Semirigid | Rigid or flexible cystoscope  | Offline            | Documentation
Tubular        | Esophagus                    | Gastroenterology | Cylinder                    | Nonrigid  | Flexible gastroscope          | Offline            | Documentation
Tubular        | Airways (+bifurcations)      | Bronchoscopy     | Cylinder                    | Semirigid | Flexible bronchoscope         | Real-time          | Navigation
Complex 3-D    | Colon                        | Gastroenterology | Plane, cylinder, model-free | Nonrigid  | Pillcam, flexible colonoscope | Offline            | Documentation
Complex 3-D    | Abdomen                      | Laparoscopy      | Plane, model-free           | Nonrigid  | Rigid laparoscope             | Real-time          | DVE, navigation
Complex 3-D    | Sinuses                      | Neurosurgery     | Plane, model-free           | Rigid     | Rigid endoscope               | Real-time          | DVE, navigation

overview of the relevant applications based on medical branches and anatomical structures, summarized in Table I.

The shapes of the organs under consideration vary from a nearly planar appearance—as in the colon at a microscopic scale, observed in confocal laser endomicroscopy (CLE)—to hollow and tubular structures (e.g., in urology), as well as 3-D structures (e.g., in laparoscopy), which strongly deviate from such simple geometric models. Accordingly, the algorithms applied in the different fields differ in complexity. Whereas for some organs, such as the urinary bladder or the retina, most authors assume a planar or spherical shape model, in laparoscopy the mosaicking process is often based on reconstruction of a complex 3-D surface. Another aspect that needs to be considered is the rigidity of the scene. Although the vast majority of algorithms are based on an assumption that the scene is rigid, this rarely holds for endoscopic applications. We characterize scenes as rigid, semirigid, and nonrigid, respectively. Organs that are subject to motion or deformation due to physical processes (such as heartbeat, respiration, and peristalsis) are considered to be nonrigid scenes. An assumption of rigidity can therefore only be regarded as an approximation to the true behavior of the scene. Other organs are not necessarily subject to motion or deformation, but may be deformed under the influence of interactions from the physician or motion by the patient. This may be the case in the urinary bladder, for example, as it is not a rigid organ; if it is filled with a fairly constant amount of fluid, careful handling of the cystoscope can reduce organ deformation to a minimum, making rigidity a valid assumption. However, it is questionable whether this is actually viable in real-world clinical scenarios.

A. Urology

The most prominent organ to which image stitching is applied is the urinary bladder. Both rigid and flexible cystoscopes are used to inspect the inner wall of the bladder. Usually, it is filled up with fluid during the inspection. Many image stitching approaches for the bladder use a planar projection surface, which works well for small parts of the organ. To visualize the entire bladder, spherical models have also been proposed. The physician may vary the amount of fluid and apply pressure to the abdominal wall to make all parts of the bladder visible. While this causes at least temporary deformation of the organ, all stitching and reconstruction approaches found in the literature are based on a rigid-body model. The presented approaches either aim at building a map of the organ for documentation purposes (offline) or provide a real-time view expansion to facilitate orientation and navigation for the surgeon.

Approaches: Initial experiments with a photograph of a pig bladder and a mechanically guided fiberscope were reported in 2004 by Miranda-Luna et al. [13], [14]. This research was continued by Hernandez-Mier et al. in 2006, extending the investigations to fluorescence imaging [15], and by Olijnyk et al. [16] and Ben-Hamadou et al. [17]. The authors applied active stereo techniques by projecting eight laser dots for surface reconstruction of the bladder wall [18], [19]. An acceleration of the method presented by Miranda-Luna et al. was described by Hernandez-Mier et al. in 2010 [20]. Behrens et al. have investigated methods for image stitching of fluorescence cystoscopy in several publications [21]–[25]. They explored the creation of panorama images from cystoscopies with pure image-processing methods, as well as navigation support using inertial sensors.

Bergen et al. [26] have applied a graph-based approach to stitch images from cystoscopic video. They identified coherent submaps from the frame graph to stitch local patches, which are combined into a larger mosaic afterwards. Weibel et al., building on the research by Ben-Hamadou et al., integrated graph-cut and graph-based algorithms to produce visually coherent maps of the urinary bladder [27], [28]. Soper et al. [29] have also described panorama imaging and surface reconstruction methods
to support automated bladder surveillance, complemented by the design of a special endoscopic surveillance system by Yoon et al. [30]–[32].

Inspection of the urethra and ureter are additional urological examinations to which EVE has been applied. These are further discussed in Section II-C, along with other tubular-shaped organs.

B. Retinal Surgery

Another organ to which image stitching has been applied is the retina. In relation to the visual impression of the images that are acquired, as well as the spherical geometry, the retina is very similar to the urinary bladder. Although retinal surgery is not an endoscopic procedure but is performed using an ophthalmoscope or funduscope, we have decided to consider this application here as well, due to the similarity of the problem and the algorithmic solutions to it that have been presented in the literature. For the purpose of image stitching, the eyeball can be considered as a rigid body. Both planar and spherical models have been used to visualize the retina. Real-time processing is an important requirement, since stitching is performed as a navigation aid for the surgeon.

Approaches: In 2002, Can et al. [33] presented mosaics generated from images of the human retina acquired with a fundus microscope. They explicitly exploited vascular structures to register pairs of images and used a quadric surface model to represent the retina. Their work is based on earlier experiments by Becker et al. carried out in 1998 [34]. Cattin et al. [35] built on the work of Can et al. and presented an alternative retina image mosaicking approach using speeded-up robust features (SURF) and a multiband blending algorithm. Choe et al. [36] extracted Y-shaped features for registration and applied a shortest-path algorithm on the frame graph to construct a globally consistent mosaic with minimal registration error. Wei et al. [37] applied principal component analysis of a scale-invariant feature transform (PCA-SIFT) and a quadric surface model for mosaicking. The step toward real-time retinal mosaicking was taken by Seshamani et al. [38] and was further improved by Richa et al. [39] in 2012; the latter authors proposed a hybrid tracking approach for improved robustness against image disturbances.

C. Tubular Organs

Considerable efforts have been made to generate panorama images of tubular-shaped organs such as the esophagus, trachea, intestine, and the ureter and urethra. Tubular structures prohibit the use of classical stitching or surface reconstruction techniques. Since the direction of view during inspection of the above organs usually coincides with the direction of motion, leading to a zooming effect in the images, special mapping and reconstruction methods are needed. Therefore, many approaches use a cylindrical model to approximate the organ shape. All of the aforementioned organs are inspected with flexible endoscopes (except for the urethra, which can also be viewed with a rigid cystoscope) and are subject to deformation. Computation of a map of an organ or parts of it is usually performed offline.

Approaches: An early contribution by Rousso et al. [40], describing a pipe projection model, was used by Seibel et al. [41] to generate panorama images of the esophagus from a capsule endoscope (CE) system. The images are mapped onto a cylindrical surface by unwrapping them around an estimated projection center. The camera motion between consecutive video frames is estimated using an affine optical flow technique. A similar method of pipe projection was used by Yang et al. [42] to detect fluorescent hotspots in Barrett's esophagus images and visualize them on a mosaic map. Initial experiments on calculating an unwrapped image of the esophagus were presented by Shar et al. [43] as long ago as 1990 and by Kim et al. [44] in 1995. Reynolds et al. [45] used an unwrapped map to quantitatively describe Barrett's esophagus. Igarashi et al. [46], [47] and Ishii et al. [48] presented opened panoramic images of tubular organs such as the male urethra, porcine colon, and human colon, using a "shape-from-shading" (SfS) approach. They assumed a cylindrical model for the organs and perfect alignment of the optical axis with the cylindrical axis. The panorama was generated from circles extracted around the image center during constant pull-back motion of the endoscope. Ou-Yang et al. [49] stitched images from a radial imaging CE system by applying a similar unwrapping technique. Recently, Yi et al. [50] have presented a real-time CE video visualization technique based on unwrapped panorama images of the gastrointestinal tract. Although their method was only based on homographies to describe interframe transformations, the group of Iakovidis and Spyrou [51], [52] successfully reduced the amount of video material captured during wireless capsule endoscopy (WCE) by creating frame collages (local panorama images).

D. Laparoscopy

Another medical branch to which stitching and surface reconstruction are applied is MIS, especially laparoscopy. Most of the research published in this field is aimed at providing real-time navigation support during surgical procedures in the abdominal cavity. Most publications have therefore investigated SLAM methods for reconstructing a surface model, as well as the camera position, in real time. The goal of creating an enhanced representation of the scene is closely related to real-time stitching, although the algorithms may differ. The classical method for solving the SLAM problem solely from the images of a single moving camera (or endoscope) is based on the extended Kalman filter (EKF) presented by Davison et al. in 2007 [9]. Numerous publications have built on this methodology to solve the SLAM problem for MIS. Special challenges that emerge in the surgical context include multiple objects in the scene, such as surgical instruments that appear and disappear; blurry images due to rapid camera motion; cauterization smoke; staining from blood and other body fluids; and tissue deformation. Maier-Hein et al. [53] have recently provided a comprehensive overview of state-of-the-art techniques for 3-D surface reconstruction in computer-assisted laparoscopic surgery. The specific goal of using surface reconstruction to provide the surgeon with an extended field of view has been addressed under the heading of dynamic view expansion by several authors.
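The EKF mentioned above maintains an estimate of the camera pose and landmark positions together with their uncertainty, alternating a motion prediction with a measurement update. As a deliberately reduced illustration of that predict/update cycle (a one-dimensional, linear Kalman filter on a toy problem; all noise values are made up, and this is not any specific method from the surveyed literature):

```python
def kalman_step(x, p, z, q=0.01, r=0.25):
    """One predict/update cycle of a 1-D Kalman filter.

    x, p : current state estimate and its variance
    z    : new (noisy) measurement of the state
    q, r : process and measurement noise variances (illustrative)
    """
    # Predict: a static state model, so only the uncertainty grows.
    p = p + q
    # Update: blend prediction and measurement via the Kalman gain.
    k = p / (p + r)           # gain in [0, 1]
    x = x + k * (z - x)       # correct the estimate toward z
    p = (1.0 - k) * p         # uncertainty shrinks after the update
    return x, p

# Fuse repeated noisy measurements of a fixed quantity (true value 1.0).
x, p = 0.0, 1.0  # vague initial guess with high variance
for z in [1.1, 0.9, 1.05, 0.95, 1.0]:
    x, p = kalman_step(x, p, z)

print(abs(x - 1.0) < 0.1)  # True: estimate has converged near 1.0
print(p < 0.2)             # True: uncertainty shrank from 1.0
```

EKF-SLAM generalizes this scalar recursion to a joint state vector of camera pose and landmarks, linearizing the nonlinear camera projection model at each update.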
Approaches: In 2008, Lerotic et al. [54] reported initial experiments on view expansion in natural orifice transluminal endoscopic surgery (NOTES), based on optical flow. An extension of this to a full SLAM approach based on EKF and stereo-endoscopic image data was presented by Mountney and Yang [55], Stoyanov et al. [56], Totz et al. [57], and Warren et al. [58]. Whereas they used stereo imaging to obtain reliable depth information, Grasa et al. explored the capabilities of monocular EKF-based SLAM for MIS [59], [60]. Their SLAM solution is primarily based on the work of Civera et al. [61]–[63]. Dense surface reconstruction from image data from a da Vinci surgical robotic system (www.intuitivesurgical.com) was described in 2012 by Bouma et al. [64]. Due to the challenging characteristics of endoscopic image data, dense reconstruction has gained increasing focus in the recent past. Further dense reconstruction approaches applied to surgical image data have been presented by Totz et al. [65], Röhl et al. [66], Bernhardt et al. [67], and Chang et al. [68]. The application of classical image mosaicking techniques to fibroscopic images of an ex-vivo kidney was reported by Atasoy et al. [69]. Hu et al. [70] applied a mosaicking approach with super-resolution to images of the heart surface captured by the da Vinci system. This was accompanied by their research on 3-D organ reconstruction [71], [72]. First steps toward deformable reconstruction of organ surfaces were taken by Malti et al. [73]. They applied template-based deformable shapes from motion and shading to generate a template for the uterus that subsequently undergoes deformation. Also, Bartoli et al. [74] and Giannarou et al. [75], [76] presented further procedures and theoretical considerations about deformable shape-from-motion (SfM) for MIS.

E. Otorhinolaryngology and Neurosurgery

In the field of otorhinolaryngology—ear, nose, and throat (ENT) conditions—Schuster et al. [77] have successfully applied general-purpose stitching software to laryngoscopic image sequences and presented panorama images of the larynx for documentation purposes.

In neurosurgery, endoscopic image processing techniques as a method of navigational support have been investigated primarily in endonasal surgery, in which the surgeon enters the brain through the nasal cavity and sphenoid bone to reach the anterior skull base. The very restricted operating space and limited field of view here are challenges addressed by several authors, who have described methods of navigational support through image mosaicking or SLAM solutions. Apart from manipulation by the surgeon, the sinuses can be considered as a rigid structure. The majority of the approaches for view enhancement for endonasal surgery aim at 3-D reconstruction of the rather complex geometry of the sinusoidal cavities.

Approaches: Seminal work in the field of SLAM for sinus surgery has been presented by Burschka et al. [78], [79]. They propose a 3-D reconstruction approach from monocular endoscopic images for registration with a preoperative computed tomography (CT) scan, based on SfM and iterative closest point registration. Wittenberg et al. [80] presented a 3-D reconstruction of the sphenoid sinus with an SfM approach. While several papers have used external tracking systems to enable real-time guidance during surgery (e.g., by Konen et al. [81], Winne et al. [82], Schulze et al. [83], and Daly et al. [84]), Shahidi et al. [85] and Lapeer et al. [86] used passive optical markers and image-processing techniques to substitute for the external tracking system. Mirota et al. [87], [88] argued that direct registration of a 3-D reconstruction from endoscopic video and a preoperative CT scan can improve accuracy, since the detour through external tracking systems tends to introduce significant errors. They applied SfM algorithms to reconstruct a surface model from the monocular endoscopic video for CT registration. While all of these contributions are related to view enhancement strategies, to the best of our knowledge Konen et al. [89] were the first to present experiments with an image mosaicking approach for neuroendoscopic videos. They applied a real-time mosaicking approach earlier described by Kourogi et al. [90]. Since this method is purely based on affine frame-to-frame transformations, it is presumably not able to handle the complex geometry of sinusoidal cavities. Bergen et al. [91] have presented the initial results of a real-time stitching approach for neurosurgery, applied to a skull phantom.

F. Colonoscopy

One of the most challenging organs for stitching and surface reconstruction is the human colon, due to its complex geometric structure and extreme lack of rigidity. Although global mapping of the colon would be of great value from a medical point of view to facilitate colonoscopy, only a few contributions on the topic can be found in the literature. The geometric structure inhibits the application of simple mosaicking algorithms, so that all of the research has focused on 3-D surface reconstruction of parts of the colon.

Approaches: Some early work by Thormaehlen et al. [92] used an SfM approach to generate a texturized surface model from colonoscopy video images. They present a reconstruction of part of the colon wall containing a polyp. Similarly, Koppel et al. [93] and Chen et al. [94] reconstructed a small part of the colon, using a variety of feature tracking, camera motion estimation, and stereo rectification algorithms to generate a texturized surface model. Kaufman and Wang [95] combined an SfS algorithm for 3-D geometry estimation with an SfM algorithm to extract the endoscopic camera motion and 3-D feature point locations. Hong et al. [96] took advantage of the tubular nature of the colon to reconstruct a virtual colon segment from a single colonoscopy image, assisted by manually drawn contours of major colon folds.

Three-dimensional reconstruction from CE images has been investigated by several research groups in recent years. CE is probably the most challenging endoscopic image source for stitching or reconstruction, due to the low frame rate of usually 2–6 frames/s and uncontrolled, unrestricted motion. Nevertheless, some promising progress has been made. In 2010, Fan et al. [97] showed first results of an SfM approach, based on SIFT features and epipolar geometry, applied to CE images of the colon. A very similar approach was also followed by Sun et al. [98]. Due to the lack of prominent textural features in CE images, most subsequent research concentrated on SfS approaches (see, e.g.,
[99]). Ciuti et al. [100] successfully experimented with SfS for 3-D reconstruction of the colon as a basis for active locomotion of a CE. Karargyris and Bourbakis [101] presented an advanced SfS approach for 3-D reconstruction of the colon, including an elastic graph model for organ deformation. Prasath et al. [102] proposed an active contour algorithm to detect and segment mucosa before applying an SfS reconstruction approach.

Despite the fact that remarkable progress has been made in this field, there have been no publications presenting panorama images or 3-D reconstructions of the complete colon.

G. Endomicroscopy

Optical biopsy with endomicroscopic devices allows imaging at a cellular level during endoscopic procedures. Confocal laser endomicroscopes (CLEs) are used in gastroenterology, pulmonology, and urology to obtain microscopic images of the esophagus, stomach, colon, lung, and urinary tract. Due to the microscopic image acquisition process, the field of view is usually very small, and image stitching has been applied very successfully. On the microscopic scale, planar image stitching approaches have been shown to be sufficient, so the 3-D structure does not have to be taken into account. To a certain extent, the observed tissue can be assumed to be rigid, but tissue deformation caused by the pressure applied to the surface by the microscope has also been considered. The purpose of image mosaicking is the dynamic extension of the field of view of the microscope, leading to the requirement of real-time computation.

Approaches: The majority of contributions on endomicroscopic image mosaicking have been presented by Vercauteren et al. [12], [103], [104]. A very interesting fact about their research is that, to the best of our knowledge, it is the only endoscopic stitching technology that is market-ready and commercially available. The Cellvizio probe-based CLE developed by Mauna Kea Technologies integrates these algorithms into the

[Fig. 3 pipeline diagram: input image → preprocessing → pair-wise image registration → global alignment and optimization → blending and visualization → output panorama]

Fig. 3. Common pipeline for (endoscopic) image stitching represents the basis for many mosaicking and view expansion algorithms. Each selected input image undergoes a preprocessing step. Pairs of input frames are registered to enable the alignment of all frames in a common coordinate system. Blending between images reduces seam artifacts at the frame borders. The result of stitching several input frames is one panorama image of the scene.

involved in the mosaicking process. Each input image is usually acquired either from the endoscope system via frame-grabbing or, with recorded data, from a video file. In comparison with stitching applications developed for photographic images (usually 4–20 megapixels in size), resolutions are smaller. Most typical are resolutions in the range of PAL (720 × 576 pixels) to high definition (HD; 1920 × 1080 pixels). The images are usually streamed at rates of between 15 and 30 frames/s. A common characteristic of endoscopic video is the use of an image mask, which reduces the relevant image content. Several authors further reduce the amount of image data by subsampling the input images at the beginning of the pipeline. Typical preprocessing steps are detection of the actual image mask, compensation for radial distortion, and reduction of vignetting effects. Image registration refers to the process of finding a transformation that matches images to each other in a pairwise manner. This is one of the most crucial steps in the mosaicking process. Most common is the use of pixel-based or feature-based alignment methods.
These pairwise-registered images then have to be aligned in a
software to generate mosaics of cellular images. Vercauteren
common global coordinate system, meaning that they have to be
et al. describe a planar image stitching approach that also takes
projected onto a surface, which is usually planar, cylindrical, or
into account the local tissue deformation resulting from the mi-
spherical. Some approaches actually extract the 3-D geometry
croscope dragging along some tissue with it when it is moved
of the scene, which is then texturized with the panorama im-
in contact with the surface. An alternative real-time approach to
age. Often, a further optimization process reduces a global error
video mosaicking for endomicroscopy was presented by Bedard
measure to improve the accuracy of the final mosaic. Bundle
et al. [105]. After suppressing the fiber pattern visibile in the im-
adjustment is the most common approach. This is inspired by
ages, they registered sequential images using cross correlation
the idea that the set of projections of scene points onto camera
in the Fourier domain. Aspects of cumulative image registration
images can be interpreted as a bundle of rays. During bundle ad-
and local surface deformation for CLE mosaicking have been
justment, the scene point coordinates and camera parameters are
further investigated by Loewke et al. [106], [107].
refined simultaneously to optimize the global projection error
(usually defined in terms of the deviation of image coordinates
III. ENDOSCOPIC STITCHING AND SURFACE
of projected scene points from their observations). When the
RECONSTRUCTION APPROACHES images are projected onto the surface, usually more than one
Following the consideration above of different fields of appli- source image contributes to the pixel color in the final mosaic.
cation of endoscopic image stitching and surface reconstruction, Choosing the pixel value in a way that creates smooth transitions
we now go on to discuss the algorithms and methods involved in between frames and preserves as much structural information
the generation of an enhanced panoramic view. The majority of from the input images as possible is called image blending.
approaches published about the stitching of images acquired by Finally, the visualization step produces an output, which can
an endoscope share a common basic “pipeline” of procedures. either be a single panorama image or can be rendered as a 3-D
The general concept of this pipeline has been described earlier scene, in some cases allowing for further interaction. A selec-
in publications about general image stitching, comprehensively tion of results of endoscopic view expansion for different fields
reviewed by Szeliski [1]. Fig. 3 illustrates the individual steps of application are presented in Fig. 4. The following sections
310 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 1, JANUARY 2016

Fig. 4. Selection of recent results of endoscopic view expansion, presented for different fields of application. Top row: A planar panorama image, generated in
real-time from fluorescence cystoscopy video frames [25]. A 3-D reconstruction using SfM of an excavated pig bladder [108]. An unrolled panorama image of the
esophagus, based on a cylindrical surface model [109]. A planar panorama image of the larynx, generated with general-purpose stitching software [77]. Dynamic
view enhancement (real-time) for laparoscopy using an EKF-SLAM approach [55]. Bottom row: Three-dimensional reconstruction of a polyp region in the colon,
using an SfM approach [93]. A planar collage from the colon, generated from images captured by a CE, to provide a visual video summary for faster inspection
[51]. A planar panorama image of the interior of a skull phantom for real-time view expansion during endo-nasal neurosurgery [91]. A planar panorama image of
a mouse colon generated in real-time with a CLE [12]. Result of a real-time mosaicking approach for assistance during retinal surgery [39].
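To make the pipeline structure concrete, the following sketch reduces it to its skeleton: a purely translational motion model, exhaustive SSD-based pixel registration, and sequential accumulation of pairwise shifts into global frame positions. All names and parameters are illustrative assumptions; none of the cited systems works exactly this way:

```python
import numpy as np

def register_ssd(ref, mov, max_shift=5):
    """Pairwise pixel-based registration: exhaustive search for the
    integer translation (dy, dx) minimizing the SSD criterion."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(mov, (dy, dx), axis=(0, 1))
            err = np.sum((ref - shifted) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best  # shift that aligns `mov` with `ref`

def global_offsets(frames):
    """Sequential frame registration: chain pairwise shifts into
    global offsets relative to the first frame (the simplest form of
    global alignment; registration errors accumulate over time)."""
    offsets = [(0, 0)]
    for prev, cur in zip(frames, frames[1:]):
        dy, dx = register_ssd(prev, cur)
        offsets.append((offsets[-1][0] + dy, offsets[-1][1] + dx))
    return offsets
```

With the global offsets known, each frame can be pasted into a common canvas and blended; real systems replace each of these steps with the more elaborate techniques surveyed below.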

A. Distortion Correction and Camera Calibration

Typical endoscopes are wide-angle lens systems with viewing angles between 90° and 120°. This setup causes barrel distortion effects in the images, which interfere with the pinhole camera model usually used. Different ways of overcoming this problem have been presented. First, classical camera calibration—e.g., Tsai’s method [110], Heikkilä’s extension [111], and Zhang’s [112] or Hartley’s method [113]—using a calibration pattern can be applied to extract the intrinsic camera parameters, including distortion coefficients. Full calibration of endoscopic camera systems is a challenging task due to strong distortions, low image contrast, and the problem of interfering with the clinical workflow if calibration has to be performed in the operating room. Several publications have explicitly addressed the problem of endoscopic camera calibration. Zhang et al. [114] demonstrated the applicability of their calibration technique to endoscopic video images. Wengert et al. [115] presented a fully automatic calibration approach, with a newly designed sterilizable calibration pattern. Apart from taking at least two calibration images, no further user interaction is required for full calibration using Heikkilä’s model. Stehle et al. [116] considered that the pinhole camera model is not suitable for endoscopes and suggested a more general model, based on prior work by Kannala and Brandt [117]. Li et al. [118] proposed a distortion correction pipeline, including a new polynomial model, for endoscopic images. Barreto et al. [119] developed a single-shot calibration method. Their method is based on lifted coordinates to get a linear formulation of the projection model including radial distortion, allowing full calibration from a single chessboard image. Thus, minimal effort is needed by the surgeon to calibrate the endoscope system within the operating room. This method has been further improved by Melo et al. [120]. A practical problem of endoscopy calibration is the frequent change of focus or zoom settings, which makes prior calibration invalid or at least imprecise. The problem of continuous re-calibration was addressed by Lourenco et al. [121], who tracked salient points in the images to correct for a change of the focal length. Pratt et al. [122] presented a solution for intraoperative re-calibration of a stereo camera setup by reducing the variable parameter space to one parameter (focus position). They suggested laser-engraving a calibration pattern onto a surgical instrument, so that calibration can be performed inside the body.

An alternative approach has been used by Miranda-Luna et al. [14] (mosaicking of the bladder), Soper et al. [29] (bladder reconstruction), Chen et al. [94] (colon reconstruction), Grasa et al. [60] (SLAM for laparoscopy), Stoyanov et al. [123] (enhanced visualization during laparoscopy), and others. In this method, the estimation of calibration parameters is incorporated into a global optimization process (see Section III-D). This autocalibration method makes prior explicit recordings of a calibration pattern unnecessary and facilitates the calibration process—although at the cost of increasing the number of parameters that have to be optimized.
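To illustrate the kind of polynomial radial distortion model these calibration methods estimate, here is a minimal one-coefficient barrel distortion and its fixed-point inversion; the single-parameter model and the iteration count are simplifying assumptions (practical models use several radial and tangential coefficients):

```python
import numpy as np

def distort(xy, k1):
    """One-parameter radial distortion of normalized image
    coordinates: x_d = x * (1 + k1 * r^2). Negative k1 yields the
    barrel distortion typical of wide-angle endoscope optics."""
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2)

def undistort(xy_d, k1, iters=20):
    """Invert the model by fixed-point iteration; the polynomial has
    no closed-form inverse, so iterative refinement is common."""
    xy = xy_d.copy()
    for _ in range(iters):
        r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
        xy = xy_d / (1.0 + k1 * r2)
    return xy
```

Undistorting the frames in this way is what makes the pinhole-plus-homography machinery of the later pipeline stages applicable at all.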

B. De-Vignetting

A vignetting effect is typical in endoscopic images, due to the wide field of view and the fact that the light source is directed toward the center of the field of view. As a consequence, the brightness usually decreases toward the edges of the image. To make image registration more robust and reduce vignetting artifacts in the final panorama image, several authors have opted to compensate for inhomogeneous illumination as a preprocessing step. As the vignetting effect depends on the distance and perspective alignment between the camera and the scene, a de-vignetting filter is usually not kept constant, but has to be recalculated for every image. In the early research by Miranda-Luna et al. [13], a high-pass filtering approach was presented. The authors estimated the frequency range of the vignetting effect from a Fourier transform and subtracted a Gaussian-filtered image with the relevant bandwidth. Most de-vignetting methods used for endoscopic image stitching are related to this approach (Weibel et al. [27] and Hernandez-Mier et al. [20]). Alternatively, de-vignetting can be considered as part of the blending step, favoring pixels closer to the image center when calculating the pixel values of the final mosaic. In this case, the weighting function that compensates for illumination differences is based on some distance measure from the border or center of the image. This function is kept constant for all images, implicitly assuming an invariant vignetting effect. This approach has been followed by Behrens et al. [24], Bouma et al. [64], and Soper et al. [29]. Mountney and Yang [55] have also addressed the vignetting problem in SLAM-based mosaicking. To texturize the reconstructed surface model, they ignored areas close to the edge when selecting the texture images from the video stream.

C. Pairwise Image Registration and Frame Selection

Image registration is probably the most crucial step in the stitching pipeline. The goal of image registration is to find the transformation between pairs of images. Most stitching approaches start by registering frames from the video stream sequentially—i.e., each frame is registered to its (direct or indirect) predecessor. Since using every single frame of the video stream results in a heavy computational load as well as an unnecessarily high amount of overlap between frames, most authors choose to implement some sort of frame selection mechanism. The simplest approach is to take every kth frame (e.g., Behrens et al. [25]). Finding an adequate k for a whole video sequence is difficult, since registration may fail if the value chosen is too large. Some authors have therefore implemented a strategy of adapting k according to the registration results. Soper et al. [29] increment k as long as registration is successful and decrement it if necessary. This registration process can be referred to as sequential frame registration [29]. If stitching is solely based on this sequential approach, small registration errors accumulate and can lead to unacceptably large distortions over time. To address this problem, overlapping image pairs should be sought that are nonsequential in the video stream. Different strategies have been proposed, and these are discussed in Section III-D.

A wide range of algorithms has been proposed for solving the basic problem of registering a pair of images. Szeliski [2] provides an overview of different approaches. The main categories are pixel-based and feature-based approaches. Pixel-based algorithms try to minimize an optimization criterion calculated over the entire set of pixels within the overlap regions of the two images. Common criteria are the sum of squared differences (SSD), normalized cross-correlation (NCC), and mutual information (MI). Feature-based approaches extract higher-level features from two images that are matched on the basis of their similarity. SIFT features [124] have proved to be among the most distinctive and have been used by Behrens et al. [21], [22] and Soper et al. [29]. Bergen et al. [125] have presented a combined tracking approach with SIFT features and Kanade–Lucas–Tomasi (KLT) tracking [126]. The high computational load involved in SIFT features motivated the development of SURF features [127], which provide similar performance at a significantly greater speed through the use of integral images. SURF features are used by Behrens et al. [25], Vemuri et al. [128], Reeff et al. [129], Iakovidis et al. [51], and Richa et al. [39]. More recent developments include fast feature detectors, such as the accelerated segment test (e.g., FAST by Rosten and Drummond [130] and AGAST by Mair et al. [131]) and CenSurE (center-surrounded extrema) by Agrawal et al. [132], as well as descriptors that are represented by binary vectors, such as binary robust independent elementary features (BRIEF) [133], ORB [134], FREAK [135], BRISK [136], and SKB [137], which can be computed rapidly and are claimed by their developers to be comparable to SIFT and SURF in distinctiveness. To the best of our knowledge, none of these has yet been applied to endoscopic stitching, despite their high potential. An exception is the work of Mountney et al. [138], [139], who have presented an online learning scheme for feature descriptors and adapted the method of randomized trees for keypoint recognition by Lepetit et al. [140] to develop context-specific descriptors for application in laparoscopic stereoscopy SLAM. They have also presented a comparison of feature descriptors for MIS [141]. Another method designed for laparoscopic feature tracking was presented by Giannarou et al. [142]. They proposed an anisotropic affine invariant region tracking scheme, which is supported by an EKF-based prediction mechanism to handle the difficult scenario of feature tracking during MIS. Although many authors claim to have used feature-based registration successfully, several research groups argue that only pixel-based registration is able to reliably handle the frames with little texture and small overlap that are present in endoscopic video sequences. Ben-Hamadou et al. [17], Miranda-Luna et al. [13], [14], and Hernandez-Mier et al. [20], therefore, present pixel-based registration techniques. The disadvantage of these approaches is the long computation time needed for registration. Hernandez-Mier et al. [20] report 1.2 s and Ben-Hamadou et al. [17] as long as 60 s to register a single pair of images. Weibel et al. [27] have taken a different approach, aiming at maximal robustness and combining different aspects of the techniques mentioned above. They minimized an energy function consisting of pixel-based color similarity, SURF keypoint similarity, an overall smoothness constraint, and a planarity assumption and reported successful registration results in cases in which most other approaches fail—i.e., small overlap (less than 50%) for nonsequential image pairs. Their implementation requires an average of 20 s per image pair.

An interframe transformation model is needed for the registration process. The one most commonly used is a perspective transformation (homography H) in the 2-D projective space P². The homography accurately describes a coordinate transformation between two views: 1) if the scene is planar, or 2) if the camera motion between the views is a pure rotation (without translation).
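Soper et al.’s increment/decrement rule for the frame step k can be paraphrased in a few lines. The predicate `register_ok` and the step bounds below are illustrative assumptions, not the published parameters:

```python
def select_frames(n_frames, register_ok, k0=5, kmin=1, kmax=30):
    """Adaptive frame selection for sequential frame registration:
    grow the step k while pairwise registration succeeds, shrink it
    when it fails. `register_ok(i, j)` reports whether frames i and
    j could be registered (in practice: whether the registration
    residual is below a threshold)."""
    selected, i, k = [0], 0, k0
    while i + k < n_frames:
        if register_ok(i, i + k):
            i += k
            selected.append(i)
            k = min(k + 1, kmax)   # success: try a larger step next
        elif k > kmin:
            k -= 1                 # failure: retry with a smaller step
        else:
            i += 1                 # even adjacent frames fail: skip one
    return selected
```

The appeal of such a scheme is that the step size automatically shrinks during fast camera motion (small overlap) and grows again when the endoscope is moved slowly.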
These assumptions are reasonable in the case in which camera motion between two successive frames is small, and usually the part of the scene displayed in one image is small enough not to contain any dominant 3-D structure. Obviously, the planarity assumption is violated on a global scale when stitching images from a larger scene. The problems related to this have given rise to a whole set of publications, which are discussed in the next section. While a homography with eight degrees of freedom is most often used, Behrens et al. [21] reduce the complexity by assuming an affine transformation with six degrees of freedom. In order to model the interframe transformation on the curved human retina more accurately, Can et al. [33] and Cattin et al. [35] use a 12-parameter quadratic transformation model.

In the case of pixel-based registration, the transformation parameters are estimated by an optimization procedure. In the case of feature-based registration, the point correspondences give rise to an over-determined system of equations, which can be solved using a Random Sample Consensus (RANSAC) scheme [143] or one of its numerous derivatives, such as MSAC or MLESAC [144].

For comprehensive surveys on general aspects of image registration, the reader is referred to the publications by Brown [145], Goshtaby [146], and Zitová and Flusser [147].

D. Global Alignment and Optimization

On the basis of the results of registering sequential video frames, the images can be aligned in a common coordinate system. The straightforward way of doing this is to compute a global homography for each frame as the product of all homographies describing the local frame-to-frame transformations and thus placing each frame on a planar projection surface. This is the approach taken by Behrens et al. [21], Miranda-Luna et al. [14], and Weibel et al. [27] for cystoscopic panorama images. Similarly, Iakovidis et al. [51] searched for clusters of overlapping consecutive frames within the set of thousands of frames captured during WCE in order to calculate local panorama images. Two problems arise from the consecutive strategy: First, the planar projection surface leads to major distortions if the scene significantly deviates from a plane; and second, the errors occurring during registration accumulate with increasing frame numbers. As a consequence, this strategy is only applicable with small mosaics and tends to fail for the application of mapping of the entire bladder, for example, from a cystoscopy video.

The problem of geometric distortions can be reduced by choosing an appropriate projection surface that is similar to the shape of the scene. This topic is discussed in Section III-E. The issue of accumulating error can be addressed using a graph representation. Each vertex of the graph represents a frame and each edge a connection through a frame-to-frame transformation. The goal is to find further edges between nonconsecutive frames, which can then enable a global optimization strategy to minimize a global error measure. Finding these additional edges is a challenging task and is very similar to the loop-closing problem in SLAM applications. A comparison of loop-closing techniques developed for SLAM problems can be found in [148]. Checking all possible image pairs for a transformation results in n(n − 1)/2 ∈ O(n²) edge candidates. A typical cystoscopy video sequence of several minutes can easily consist of several thousands of frames, resulting in millions of possible edges. The crucial point is therefore how to choose promising edge candidates. Different strategies have been presented in the context of endoscopic mosaicking. Soper et al. [29] tried to complete the graph by reducing an exhaustive search to every nth frame of the sequence, followed by a further edge densification based on associativity within the graph. Although this already significantly reduces the number of transformation estimates to calculate, they report processing times of several hours (including global optimization through incremental bundle adjustment). Miranda-Luna et al. [14] also point out the need to perform global optimization and describe a corresponding optimization scheme, but loop-closing edges are selected manually. Another graph-based approach has been presented by Weibel et al. [27]. They estimate the amount of potential overlap between two nonsequential frames on the basis of the initial homography estimates and use this to model a cost function for the frame graph edges. A greedy algorithm is then used to search for possible overlapping frame pairs within this graph. Their mosaicking method is also an offline method, as they report a processing time of 1 h for a mosaic consisting of 150 out of 1500 images. Seshamani et al. [149] presented different global adjustment methods for direct image registration, based on graph representation and loop detection. They create globally consistent mosaics of some 10–50 images from an endoscopic video of the endometrium. Once additional frame correspondences have been established, a global error measure is minimized using bundle adjustment. Weibel et al. [27] used a twofold strategy, first minimizing the SSD between all overlapping frames and then applying bundle adjustment to grid points regularly chosen over the mosaic. Soper et al. [29] followed the standard SfM approach and estimated camera poses and 3-D positions of feature points in the scene, upon which classical bundle adjustment is performed to reduce the global reprojection error.

Dynamic view enhancement, building on filter-based SLAM, takes a different approach to the generation of a globally consistent scene representation. Typically, the current camera pose and the global scene representation are combined into one state vector. Assuming the Markov property that the current state only depends on its direct predecessor state, the state vector is incrementally modified in an alternating way: In the prediction step, the current state is estimated on the basis of a camera motion model. In the measurement step, the state is updated according to the observation of the scene in the current camera image. When loops can be successfully detected, the additional observations lead to a refined state estimate. This approach has been followed—primarily for laparoscopic view enhancement—by Grasa et al. [59], [60] and Mountney et al. [55], [150].
BERGEN AND WITTENBERG: STITCHING AND SURFACE RECONSTRUCTION FROM ENDOSCOPIC IMAGE SEQUENCES: 313
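In the linear case, the prediction/measurement alternation just described is the classic Kalman filter recursion; the EKF variants cited above linearize their motion and observation models around the current estimate. A generic sketch (state x, covariance P, motion model F, observation model H, noise covariances Q and R):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction step: propagate the state with the motion model."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Measurement step: correct the state with observation z."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    return x + K @ y, (np.eye(len(x)) - K @ H) @ P
```

Each new endoscope frame triggers one predict/update cycle; detected loop closures simply enter the filter as additional measurements, which is why they refine the whole state estimate.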

E. Projection Surface

In order to generate a composite view of the scene, a surface defining a global coordinate system is needed onto which the images or parts of images can be projected. Two basic approaches can be identified for choosing the projection surface, which we call model-based and model-free. The model-based approach assumes a fixed geometric model of the underlying scene structure. A planar projection surface or cylindrical or spherical shapes are most commonly used. The model chosen should fairly approximate the shape of the organ. The majority of stitching approaches use a planar projection model. This allows a straightforward projection procedure based on the previously calculated interframe affine or projective transform matrices. As stated earlier, major distortions arise if the scene is not well represented by a plane. As an alternative representation, Szeliski [1] suggests either cylindrical or spherical coordinate representations. Several papers on cystoscopic and retinal image stitching use a spherical projection surface as a fair approximation to the geometry of the organs (e.g., Soper et al. [29], Wei et al. [37]). A bootstrapping method, which starts with a simple two-parameter translational model and extends to a complex 12-parameter quadratic model (which consequently allows for a spherical scene), has been proposed by Can et al. [33] and Stewart et al. [151]. A cylindrical projection surface is usually the first choice for stitching and reconstruction of tubular organs such as the esophagus (see [41], [42]–[45]) or the urethra (see [46]–[48]).

The model-free approach does not assume any geometric constraints, but aims at general 3-D surface reconstruction. SfM, SfS, and stereo vision, as well as active vision techniques using structured light or distance sensors, have been applied for endoscopic surface reconstruction. Active techniques require dedicated endoscopes—an area that lies beyond the scope of this review and is therefore omitted here. For the field of 3-D surface reconstruction in laparoscopy (including stereoscopic reconstruction, monocular shape-from-X methods, sparse and dense SLAM solutions, as well as structured-light and time-of-flight techniques in both rigid and deformable environments), the reader is referred to the review by Maier-Hein et al. [53].

SfM has been applied in various fields of endoscopy. In cystoscopy, Soper et al. [29] start with a spherical bladder model, which is later relaxed to allow for a more general reconstruction. A wide variety of applications of SfM algorithms exist in the field of MIS. For reconstruction of the surface of the heart, Hu et al. [71], [72] have presented methods based on monocular SfM. Mourgues et al. [152] created a surface model to visualize the heart, suppressing surgical instruments, based on stereo vision. The reconstruction of sinus cavities for video/CT registration with SfM has been described by Mirota et al. [87], [88], Wang et al. [153], [154], Burschka et al. [78], [79], and Wittenberg et al. [80]. In laparoscopy, Bouma et al. [64] have used SfM to reconstruct the abdominal cavity (using either monocular or stereoscopic images). The majority of reconstruction approaches in laparoscopy use EKF-based SLAM methods for scene reconstruction (see [55], [59], [60], [155]), and these have also been extended to handle tissue deformation [156], [157]. Gonzalez et al. [158] have combined SfS to estimate a depth map with a tracking approach for pose estimation of surgical tools. The few publications on stitching and reconstruction of the colon are dominated by the classical SfM approach (see [92]–[94]). Exceptions are Kaufman and Wang [95], who extracted the geometry using SfS, and Hong et al. [96], who used manually labeled colon folds and a tubular model of the organ to reconstruct the local colon geometry from a single image. The optical setup of endoscopes gave rise to several methods which adapt the SfS approach to handle a single light source that is not co-located with the camera center. Wu et al. [159] extended SfS to deal with perspective projection and near point light sources not located at the camera center. They showed the efficacy of their approach on images of artificial bones. Visentini-Scarzanella et al. [160] presented a variation of this approach for metric depth recovery, applied to images of the stomach lining and the esophagus.

F. Blending and Visualization

Once all of the relevant frames have been aligned within a global coordinate system, the mosaic can be rendered. Since usually more than one video frame contributes to a single mosaic pixel, blending techniques have been proposed for choosing the final pixel values. Two general cases can be distinguished. If the mosaic is calculated as an offline process, all frames are usually available and a weighted average can be computed. On the other hand, if the stitching process is incremental—i.e., each new image is added to the mosaic and discarded afterward—only the mosaic and the most recent image can be used for blending. Most blending algorithms can be applied to both scenarios, but with different results. Therefore, only Bouma et al. [64], Soper et al. [29], and Weibel et al. [27] use all available frames for blending; the majority of authors follow the incremental approach. The goal of any blending scheme is to provide smooth transitions between the images and at the same time preserve as much structural information from the input images as possible. Standard blending algorithms applied for endoscopic mosaicking are linear alpha blending [161], multi-band image blending [162], and optimal seam detection [163]. When weighting functions are being designed for the blending schemes, most authors take into account the fact that the inner regions in endoscopic images show a greater contrast and thus contain more relevant information than the outer regions, mainly due to vignetting effects. Weighting pixels in relation to their distance from the center of the image (also referred to as feather blending) is the preferred method for endoscopic stitching. The basic linear alpha blending (with or without feathering) usually produces smooth transitions but also tends to blur image structure. This effect is reduced with multi-band blending, which is therefore applied by Behrens et al., who also introduced a nonlinear component to prefer brighter images over darker ones [24]. In [164] and [27], Weibel et al. present a blending strategy as an energy minimization problem. They implement an optimal seam detection algorithm, in which they formulate the energy function in such a way that sharper image regions (based on Michelson contrast) are preferred over blurrier ones and, at the same time, the color gradient along the seam is kept low. Their method is based on that of Kwatra et al. [165].

The spectrum of methods involved in endoscopic view expansion and discussed in this section is summarized in a mind map in Fig. 5.
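The distance-to-center weighting described above can be sketched as follows; the particular weighting function is an illustrative assumption (published schemes differ in the exact fall-off), but the accumulate-then-normalize structure is the common pattern:

```python
import numpy as np

def center_weight(h, w):
    """Per-pixel weight decreasing with distance from the image
    center, which also downweights the vignetted border regions."""
    yy, xx = np.mgrid[0:h, 0:w]
    dy = (yy - (h - 1) / 2.0) / h
    dx = (xx - (w - 1) / 2.0) / w
    return 1.0 - np.clip(2.0 * np.hypot(dy, dx), 0.0, 0.99)

def feather_blend(canvas, weight, frame, frame_weight, y, x):
    """Accumulate one aligned frame into the mosaic at offset (y, x);
    the final mosaic is canvas / weight (a weighted average over all
    contributing frames)."""
    h, w = frame.shape
    canvas[y:y + h, x:x + w] += frame * frame_weight
    weight[y:y + h, x:x + w] += frame_weight
```

In overlap regions, each output pixel thus becomes a mixture dominated by whichever frame saw it closest to its image center.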
Fig. 5. Methods applied in endoscopic view expansion. (Mind map: concept — planar image stitching; surface reconstruction via shape-from-motion, shape-from-shading, stereo vision, or active vision; SLAM. Preprocessing — image masking, distortion correction, camera calibration, de-vignetting, and frame selection with fixed or adaptive step. Registration — pixel-based methods (SSD, NCC, MI) or feature-based methods (SIFT, SURF, randomized trees), with a translational, affine, projective (homography), or quadratic frame-to-frame transformation. Global frame alignment — accumulative, graph-based, geometric clues, or filter-based SLAM. Projection surface — model-based (plane, cube, sphere, cylinder) or model-free. Blending — linear alpha blending, feather blending, multi-band blending, or optimal seam.)
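Of the pixel-based criteria in the map (SSD, NCC, MI), NCC is notable for being invariant to linear intensity changes, which matters under the varying illumination of endoscopic video. In its basic form:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches;
    returns a value in [-1, 1], with 1 for patches differing only by
    a positive gain and an offset."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

In a registration loop, the candidate transformation maximizing this score (rather than minimizing SSD) is selected.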

B above)—computation has to be performed either online or offline. The stitching and reconstruction approaches mentioned above differ greatly in their processing speed. In general, the great majority of algorithms presented so far are not applicable in real-time and take several minutes or hours to process. This is true for all stitching and reconstruction methods that include a global optimization step, such as bundle adjustment. The SLAM methods, primarily applied in laparoscopy, are an exception to this—as presented by Mountney and Yang's group [55], [138], [139], [141], [150], [156], [157] as a stereoscopic approach, as well as by Grasa et al. [59], [60], [155] for monoscopic views (see Section II-D). Bouma et al. [64] have also presented a real-time reconstruction approach for MIS, which incorporates stereoscopic ego-motion computation in real-time. For microendoscopy, the scene can be assumed to be planar, allowing successful image stitching in real-time, as presented by Vercauteren et al. [12], [103], [104] and Bedard et al. [105] (see Section II-G). In this method, mosaics are generated at 11 frames/s. For planar stitching in fluorescence cystoscopy, Behrens et al. [25], [166] have presented an online method based on a multi-threaded software framework (see Section II-A). They achieved a rate of 5 frames/s on standard PC hardware in 2010. Bergen et al. described live stitching of liver images at a frame rate of 7 frames/s in 2009 [167]. Since 2006, Hager's group [38], [39] have presented several real-time mosaicking approaches for the retina (see Section II-B), reporting frame rates of 30 frames/s. All of these methods are based on incremental planar stitching algorithms without global optimization or reconstruction of the 3-D organ geometry. Exceptions to this are the SLAM methods mentioned above and methods that use depth information from stereo vision.

IV. EVALUATING STITCHING RESULTS

In efforts to assess the quality of stitching algorithms, several aspects need to be addressed: accuracy in terms of registration error, structural information in the resulting mosaic, smoothness along the seams between images, and processing speed. In the case of stitching in combination with surface reconstruction, the precision of the reconstruction is also an issue. There is no gold standard for evaluating stitching results, and to the best of our knowledge there is no public database that provides a ground truth against which the different approaches could be compared. In fact, each group of authors presents their own method of evaluation using nonstandardized data. Since it is generally difficult to generate a ground truth with which to compare the stitching results, many authors do without any quantitative evaluation and limit themselves to presenting stitched images to give the reader a visual impression of the quality. These images can be calculated either from real clinical data or from phantom data. Nevertheless, some efforts have been made to objectify quality assessment in endoscopic mosaicking.

A. Registration Error

The error made during frame-to-frame registration can be measured in several ways. Behrens and Röllinger [168] compared the calculated frame-to-frame homographies to reference homographies calculated from manually set point correspondences. Instead of manual annotation, Ben-Hamadou et al. [18], Hernandez-Mier et al. [20], and Miranda-Luna et al. [14] stitched a photograph of a pig bladder with an additional point grid printed on it as a phantom. The point grid can be extracted automatically from the stitched image to allow comparison of the extracted point positions with the true ones. Another common method of generating ground truth data is through simulation. Weibel et al. [27], among others, have simulated an image sequence by taking subimages from a high-resolution pig bladder photograph. Since all subimage transformations are known, the registration result can be compared to the true transformations by means of the mean endpoint error, as suggested by Baker et al. [169] for optical flow evaluation.
B. Seam and Structure Quality


To quantify the quality of the final mosaic in terms of the
preservation of image structure, Behrens et al. [170] have pre-
sented a measure based on the structural similarity index (SSIM)
published by Wang et al. [171] and extended by Li et al. [172].
Weibel [28] used a measure called “difference of mean gradient
magnitude” to compare the gradient strength within the input
images to the gradient strength in the corresponding mosaic
regions. While preservation of the image structure can be quan-
tified in the ways described, there have been no publications
describing a measure for quantifying smooth seams between
images. This is always left up to the reader’s visual impression.

Fig. 6. Publication database analysis. The search results from the Scopus database are depicted as numbers of listed publications over the years.

C. Processing Speed
Processing speed is probably the only parameter that can
be quantitatively evaluated easily. For offline approaches, most
authors indicate the amount of time needed to generate a full
panorama from a given number of input images. The calculation
time can of course increase disproportionately with the number
of frames, so that averaging over all frames is not legitimate. For
real-time stitching approaches, it is important that the processing
time per frame does not increase over time. An average rate of
frames processed per second is usually an adequate measure for
assessing real-time capability.
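A minimal timing harness for this kind of assessment might look as follows (illustrative only; `process_frame` is a hypothetical stand-in for an actual stitching step). It records the time per frame, so that both the average rate and the stability of per-frame times over the sequence can be checked:

```python
import time

def measure_rate(process_frame, frames):
    """Time each frame individually; real-time capability requires that the
    per-frame times stay bounded while the average rate meets the target."""
    per_frame = []
    for frame in frames:
        t0 = time.perf_counter()
        process_frame(frame)
        per_frame.append(time.perf_counter() - t0)
    fps = len(per_frame) / sum(per_frame)  # average frames per second
    return fps, per_frame

# Hypothetical stand-in for a stitching step applied to 50 dummy frames:
fps, times = measure_rate(lambda f: sum(f), [list(range(10000))] * 50)
print(f"average rate: {fps:.0f} frames/s")
```

Plotting `times` against the frame index makes any growth in per-frame processing time (e.g., from an ever-larger global optimization) immediately visible.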

V. PUBLICATION DATABASE ANALYSIS


We used the Scopus database (www.scopus.com), provided by the Elsevier academic publishing company, to gain some insight into the historical development of academic publications on endoscopic image stitching and surface reconstruction. Scopus includes some 50 million records, covering 20 000 peer-reviewed journals from 5000 publishers, and thus provides a suitable basis for citation analysis. To find relevant papers, we searched for keywords in the fields for title, abstract, and keywords. We were interested in any paper on image stitching, mosaicking, SLAM, or surface reconstruction for endoscopic applications. We therefore carried out six searches for any of the keywords stitch*, mosaic*, slam, surface reconstruction in combination with an endoscopic procedure, or inspected organ: urolog* OR bladder OR cystosc*; laparosc* OR "minimally invasive surgery"; colon OR colonoscopy OR coloscopy OR "capsule endoscopy"; neurosurg*; and retinal surgery. In addition, the search was limited to the subject areas of computer science, engineering, and mathematics, since we are primarily interested in technical papers. Including medicine in the subject area leads to an unacceptable number of false-positive results, since some search terms, such as mosaic (= a pathological tissue appearance) and stitching (= surgical sewing), have a different meaning in the medical vocabulary.

Fig. 7. Search results for different keywords categorized by field of application as totals for the years 1996–2013.

Fig. 6 depicts the results as numbers of database entries found for the years 1996–2013, combining all fields of medical application. The diagram gives an impression of the development of scientific activity on endoscopic image stitching, SLAM, and surface reconstruction over time. A progressive increase in the number of publications over the years can be seen. Very few publications are found in the database before 2004—showing that the application of this type of image processing technique to endoscopic procedures is a fairly young research area that has only attracted major interest during the past ten years. Since 2011, around 20 technical papers have been published every year. As there are no signs of any decrease in this number, it can be assumed that the peak has not yet been reached and that interest in EVE using image processing is continuing to increase. To provide an impression of the most active application fields of EVE, Fig. 7 shows the total number of database entries over time (summed up from 1996 to 2013) for the different application-related search terms. As can be seen, laparoscopy and urology have been the predominant application fields, with 51 and 25 publications, respectively, while around ten publications are found for the other fields.

VI. CONCLUSION

To conclude this review of the current state of the art in endoscopic panorama imaging, we suggest answers to the following questions: What is the degree of technological maturity of the solutions proposed in the literature? What are the current technical and application-related challenges? What trends can be observed in recent developments?
[Fig. 8 pairs the endoscopic applications (left) with the nine Technology Readiness Levels (right): TRL 9, system ready for full-scale deployment—planar stitching in endomicroscopy; TRL 8, system incorporated into a commercial design; TRL 7, system prototype demonstration in a clinical environment; TRL 6, system prototype demonstration in a relevant environment; TRL 5, breadboard validation in relevant environment—bladder stitching, tubular stitching of urethra and esophagus, SLAM in laparoscopy; TRL 4, breadboard validation in laboratory environment—bladder surface reconstruction, stitching/reconstruction in sinus surgery; TRL 3, analytical and experimental critical function and/or characteristic proof-of-concept—surface reconstruction of urethra and esophagus, stitching/reconstruction of small colon parts; TRL 2, technology concept and/or application formulated—laryngoscopic stitching, stitching/reconstruction of colon; TRL 1, basic principles observed and reported—stitching/reconstruction from PillCam.]

Fig. 8. TRLs of endoscopic applications for stitching and surface reconstruction methods.

A. Technology Readiness Level


The technology readiness level (TRL) can be used to assess the technological state of the art of endoscopic image stitching and surface reconstruction [173]. Criteria for the TRL were presented by NASA in 1995 to provide a systematic measurement of technological maturity, categorizing technologies from a basic research level to the state of commercial deployment. We rated the degree of technological readiness in each field of endoscopic applications. Fig. 8 shows the definitions of the nine levels on the right side, with our assessment of technological maturity on the left side. To the best of our knowledge, the only commercially available product (TRL 9) related to endoscopic image stitching is Cellvizio, by Mauna Kea Technologies, for endomicroscopy. We regard a fully functional system that is in the process of being validated in preclinical (TRL 6) or clinical (TRL 7) studies as a system prototype. We are not aware of any endoscopic system providing stitching or 3-D reconstruction functionalities that is currently in this state. Although research in this field has been conducted for more than a decade, the technical challenges involved seem to have prevented a faster development. The complexity of image processing methods which are able to handle mostly deformable tissue inside the human body is high. Consequently, the requirement of designing a robust system which proves itself useful in the clinical routine has not yet been met. Probably closest to this are the stitching approaches based on planar or tubular models, which have been shown to provide reliable results in experimental setups with real patient data (TRL 5). The same holds for SLAM methods applied to laparoscopic image data (also in the presence of periodic motion/deformation). Geometric reconstruction of the urinary bladder has been successfully demonstrated using phantom data (TRL 4), as well as stitching in transnasal surgery and reconstruction of the pituitary gland. Proof of concept (TRL 3) has been successfully provided for 3-D reconstruction of the surface of tubular organs such as the urethra or esophagus, as well as parts of the colon (e.g., polyps). The next challenge will be to quantitatively validate relevant approaches using phantom or ex-vivo data. The conceptual feasibility of laryngoscopic stitching has been demonstrated (TRL 2). Stitching and surface reconstruction of arbitrary dynamic and geometrically complex environments such as the colon is still an unsolved problem. While at least the image data acquired during colonoscopy can be expected to show most of the surface with relatively controlled camera motion, images captured by a CE cannot be guaranteed to provide sufficient overlap for registration—making comprehensive stitching or reconstruction of the colon surface even more challenging.

[Fig. 9 shows four consecutive activities: image acquisition in clinical environment → computation of panorama image or scene reconstruction → visualization for navigation or documentation → human-machine interaction.]

Fig. 9. Activities involved in endoscopic view expansion and panorama imaging.

B. Challenges and Trends

We shall structure our consideration of current challenges and trends in endoscopic stitching and reconstruction according to the activities involved in the process. Fig. 9 depicts four such activities. Image acquisition is the process of maneuvering the endoscope (either manually or with robotic assistance) in the operating or examination room. The panorama image
or reconstructed model is computed either online during image acquisition, or offline in a postprocessing manner. The result has to be visualized for the doctor or clinical staff for navigational and documentary purposes. Depending on the application, human-machine interaction may be necessary.

With regard to the image acquisition process, current trends can be observed in relation to interdisciplinary influences. Advances in multimedia technology and camera chip development have led to rapidly increasing image resolutions. While only a few years ago, VGA (640 × 480 pixels) was a common resolution level, most current endoscopy systems already provide full HD resolution (1920 × 1080 pixels). The first 4K systems (3840 × 2160 pixels) are now also being developed. While this progress is having a positive effect on the image quality, the increasing amount of image data also poses a challenge on the computational side with regard to computational load and memory requirements. In addition, the increasing influence of robotic and machine vision developments on the medical field is providing new data sources for improving stitching and reconstruction approaches, such as kinematic data. The same also applies for other sensory enhancements for endoscopic devices, such as motion sensors (external tracking systems or acceleration sensors) and depth sensors (time-of-flight technology, structured light, and stereo imaging). A further trend, influenced by production technology, is the increasing miniaturization of endoscopes. Single-port laparoscopy devices, with small-diameter endoscopes and microendoscopes as thin as a human hair, are being developed, as well as miniaturized CMOS camera sensors of submillimeter size. The small design often leads to strong optical distortions and an even more reduced field of view. This may even further increase the demand for robust software methods for enhancing the field of view.

Several challenges and trends can be observed with regard to algorithmic approaches for image stitching and 3-D surface reconstruction. The problem of image stitching or texturized reconstruction of unknown rigid scenes was conceptually solved by the computer vision community some 20–30 years ago. Current advances significant for endoscopic applications are acceleration up to real-time capability, and robust and precise handling of large and complex as well as nonrigid scenes. A real-time capability is inevitably necessary for dynamic view enhancement in endoscopy. Some successful implementations of SLAM or real-time stitching have been presented for laparoscopy (using stereoscopic vision) and simple, mostly planar, geometries. Fast reconstruction of more complex shapes is still a major task. Since the human body is a mostly nonrigid environment, the rigidity assumption present in most algorithms is infringed. There have recently been increasing efforts to find appropriate ways of handling dynamic scenes. For laparoscopy or heart surgery, deformation models have been formulated for modeling systematic motion or deformation due to heartbeat and breathing. The reconstruction of a nonrigid environment has to be regarded as a yet unsolved problem, particularly since deformation can become very complex and unsuitable for a periodic motion model in the case of ego-motion of the patient or organ deformation due to the physician's interaction with the patient. At a more basic level, there appears to be room for improvement for many steps in the image-processing pipeline. While suitable algorithms are available for camera calibration in a fixed optical system, changes in the system, such as refocusing, still pose a challenge. Image registration based on salient image features or direct alignment is still a current research topic, due to the commonly difficult image quality conditions and high demands on robustness and computational speed. Interestingly, recent developments in nonmedical applications, such as image processing and computer vision on smartphones, digital photo cameras, and embedded systems, do not appear to have been exhaustively transferred to endoscopic applications, despite their potential for making a valuable contribution. Depending on the hardware platform available in the endoscopic system, different implementations for algorithm parallelization and acceleration are possible. Multicore central processing units, general-purpose graphic processing units, and embedded processors or field-programmable gate arrays allow parallel computation, which is certainly one of the current high-priority research areas for image processing in general and endoscopic stitching and 3-D reconstruction in particular.

In addition to computation, visualization of a large panoramic view and 3-D scene reconstruction also pose challenges in endoscopy. As the physician is used to being presented with the image provided by the endoscopic camera for examination, diagnosis, or intervention, the way in which the additional visual information should be optimally provided and the view augmented is an open question. For navigational purposes, the problem is how to present the unmanipulated endoscope image within an augmented computed context (which may be delayed and is not guaranteed to resemble the current scene correctly) in such a way that it improves orientation for the surgeon without reducing his or her attentiveness. For documentation purposes, the computed map or model has to be integrated into the patient's record. Although there are several publications in the literature that deal with these questions (e.g., [57]), they will become even more important as the technology increasingly matures.

Related to the question of visualization is the matter of human-machine interaction when manipulation of the view presented, such as zooming or changes of perspective, is necessary. Particularly in the clinical environment, classical forms of interaction through a mouse, keyboard, or touch panel are unfeasible due to unacceptable interference with the clinical workflow (e.g., [174]). To deal with this problem, research is being conducted on alternative interfaces such as voice control or gesture recognition ([175], [176]). Like the visualization aspect, these topics are likely to become even more important in the near future.

REFERENCES

[1] R. Szeliski, "Image alignment and stitching: A tutorial," Found. Trends Comput. Graph. Vis., vol. 2, no. 1, pp. 1–104, Jan. 2006.
[2] R. Szeliski, Computer Vision: Algorithms and Applications, 1st ed. New York, NY, USA: Springer-Verlag, 2010.
[3] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment—A modern synthesis," in Vision Algorithms: Theory and Practice (ser. Lecture Notes in Computer Science), B. Triggs, A. Zisserman, and R. Szeliski, Eds., Berlin, Germany: Springer, Jan. 2000, pp. 298–372.
[4] R. Szeliski and H.-Y. Shum, "Creating full view panoramic image mosaics and environment maps," in Proc. 24th Annu. Conf. Comput. Graphics Interactive Techn., 1997, pp. 251–258.
[5] D. Capel and A. Zisserman, "Automated mosaicing with super-resolution zoom," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recog., 1998, pp. 885–891.
[6] M. Brown and D. Lowe, "Recognising panoramas," in Proc. 9th IEEE Int. Conf. Comput. Vision, 2003, vol. 2, pp. 1218–1225.
[7] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from internet photo collections," Int. J. Comput. Vision, vol. 80, no. 2, pp. 189–210, Dec. 2007.
[9] A. Davison, I. Reid, N. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052–1067, Jun. 2007.
[10] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in Proc. 6th IEEE ACM Int. Symp. Mixed Augmented Reality, 2007, pp. 225–234.
[11] I. Fleming, S. Voros, B. Vagvolgyi, Z. Pezzementi, J. Handa, R. Taylor, and G. Hager, "Intraoperative visualization of anatomical targets in retinal surgery," in Proc. IEEE Workshop Appl. Comput. Vision, 2008, pp. 1–6.
[12] T. Vercauteren, A. Perchant, G. Malandain, X. Pennec, and N. Ayache, "Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy," Med. Image Anal., vol. 10, no. 5, pp. 673–692, 2006.
[13] R. Miranda-Luna, Y. Hernandez-Mier, C. Daul, W. Blondel, and D. Wolf, "Mosaicing of medical video-endoscopic images: Data quality improvement and algorithm testing," in Proc. 1st Int. Conf. Electr. Electron. Eng., 2004, pp. 530–535.
[14] R. Miranda-Luna, C. Daul, W. C. P. M. Blondel, Y. Hernandez-Mier, D. Wolf, and F. Guillemin, "Mosaicing of bladder endoscopic image sequences: Distortion calibration and registration algorithm," IEEE Trans. Biomed. Eng., vol. 55, no. 2, pp. 541–553, Feb. 2008.
[15] Y. Hernandez-Mier, W. Blondel, C. Daul, D. Wolf, and G. Bourg-Heckly, "2-D panoramas from cystoscopic image sequences and potential application to fluorescence imaging," in Proc. 6th IFAC Symp. Modelling Control Biomed. Syst., Sep. 2006, pp. 291–296.
[16] S. Olijnyk, Y. Hernández Mier, W. C. P. M. Blondel, C. Daul, D. Wolf, and G. Bourg-Heckly, "Combination of panoramic and fluorescence endoscopic images to obtain tumor spatial distribution information useful for bladder cancer detection," in Proc. Soc. Photo-Opt. Instrum. Eng. Conf. Ser., Jul. 2007, vol. 6631, p. 29.
[17] A. Ben-Hamadou, C. Soussen, W. Blondel, C. Daul, and D. Wolf, "Comparative study of image registration techniques for bladder video-endoscopy," in Proc. Eur. Conf. Biomed. Opt., 2009, p. 737118.
[18] A. Ben-Hamadou, C. Daul, C. Soussen, A. Rekik, and W. Blondel, "A novel 3D surface construction approach: Application to three-dimensional endoscopic data," in Proc. 17th IEEE Int. Conf. Image Process., 2010, pp. 4425–4428.
[19] C. Daul, W. P. C. M. Blondel, A. Ben-Hamadou, R. Miranda-Luna, C. Soussen, D. Wolf, and F. Guillemin, "From 2D towards 3D cartography of hollow organs," in Proc. 7th Int. Electr. Eng. Comput. Sci. Automat. Control Conf., 2010, pp. 285–293.
[20] Y. Hernandez-Mier, W. Blondel, C. Daul, D. Wolf, and F. Guillemin, "Fast construction of panoramic images for cystoscopic exploration," Comput. Med. Imag. Graphics, vol. 34, no. 7, pp. 579–592, 2010.
[21] A. Behrens, "Creating panoramic images for bladder fluorescence endoscopy," Acta Polytechnica J. Adv. Eng., vol. 48, no. 3, pp. 50–54, 2008.
[22] A. Behrens, T. Stehle, S. Gross, and T. Aach, "Local and global panoramic imaging for fluorescence bladder endoscopy," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2009, pp. 6990–6993.
[23] A. Behrens, I. Heisterklaus, Y. Müller, T. Stehle, S. Gross, and T. Aach, "2-D and 3-D visualization methods of endoscopic panoramic bladder images," in Proc. Med. Imag. 2011: Visualization, Image-Guided Procedures, Modeling, Feb. 2011, p. 796408.
[24] A. Behrens, M. Guski, T. Stehle, S. Gross, and T. Aach, "A non-linear multi-scale blending algorithm for fluorescence bladder images," Comput. Sci.-Res. Develop., vol. 26, no. 1, pp. 125–134, 2011.
[25] A. Behrens, M. Bommes, T. Stehle, S. Gross, S. Leonhardt, and T. Aach, "Real-time image composition of bladder mosaics in fluorescence endoscopy," Comput. Sci.-Res. Develop., vol. 26, nos. 1/2, pp. 51–64, 2011.
[26] T. Bergen, T. Wittenberg, C. Münzenmayer, C. C. G. Chen, and G. D. Hager, "A graph-based approach for local and global panorama imaging in cystoscopy," in Proc. SPIE, vol. 8671, p. 86711K-1, 2013.
[27] T. Weibel, C. Daul, D. Wolf, R. Rösch, and F. Guillemin, "Graph based construction of textured large field of view mosaics for bladder cancer diagnosis," Pattern Recog., vol. 45, no. 12, pp. 4138–4150, 2012.
[28] T. Weibel, "Modèles de minimisation d'énergies discrètes pour la cartographie cystoscopique," Ph.D. dissertation, Univ. Lorraine, IAEM – Ecole Doctorale Informatique, Automatique, Électronique – Électrotechnique, Mathématiques, Nancy, France, Jul. 2013.
[29] T. Soper, M. Porter, and E. J. Seibel, "Surface mosaics of the bladder reconstructed from endoscopic video for automated surveillance," IEEE Trans. Biomed. Eng., vol. 59, no. 6, pp. 1670–1680, Jun. 2012.
[30] W. J. Yoon, S. Park, P. G. Reinhall, and E. J. Seibel, "Development of an automated steering mechanism for bladder urothelium surveillance," J. Med. Devices, vol. 3, no. 1, p. 11004, Mar. 2009.
[31] W. Yoon, M. Brown, P. Reinhall, S. Park, and E. Seibel, "Design and preliminary study of custom laser scanning cystoscope for automated bladder surveillance," Minimally Invasive Therapy Allied Technol., vol. 21, no. 5, pp. 320–328, 2012.
[32] M. Burkhardt, T. Soper, W. Yoon, and E. Seibel, "Controlling the trajectory of a flexible ultrathin endoscope for fully automated bladder surveillance," IEEE/ASME Trans. Mechatronics, vol. 19, no. 1, pp. 366–373, Feb. 2014.
[33] A. Can, C. V. Stewart, B. Roysam, and H. L. Tanenbaum, "A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 347–364, Mar. 2002.
[34] D. Becker, A. Can, J. Turner, H. Tanenbaum, and B. Roysam, "Image processing algorithms for retinal montage synthesis, mapping, and real-time location determination," IEEE Trans. Biomed. Eng., vol. 45, no. 1, pp. 105–118, Jan. 1998.
[35] P. C. Cattin, H. Bay, L. V. Gool, and G. Székely, "Retina mosaicing using local features," in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2006, pp. 185–192.
[36] T. E. Choe, I. Cohen, M. Lee, and G. Medioni, "Optimal global mosaic generation from retinal images," in Proc. 18th Int. Conf. Pattern Recog., 2006, vol. 3, pp. 681–684.
[37] L. Wei, L. Huang, L. Pan, and L. Yu, "The retinal image mosaic based on invariant feature and hierarchial transformation models," in Proc. 2nd Int. Cong. Image Signal Process., 2009, pp. 1–5.
[38] S. Seshamani, W. Lau, and G. Hager, "Real-time endoscopic mosaicking," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2006, pp. 355–363.
[39] R. Richa, B. Vágvölgyi, M. Balicki, G. Hager, and R. H. Taylor, "Hybrid tracking and mosaicking for information augmentation in retinal surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2012, pp. 397–404.
[40] B. Rousso, S. Peleg, I. Finci, and A. Rav-Acha, "Universal mosaicing using pipe projection," in Proc. 6th Int. Conf. Comput. Vision, 1998, pp. 945–950.
[41] E. J. Seibel, R. Carroll, J. Dominitz, R. Johnston, C. Melville, C. Lee, S. Seitz, and M. Kimmey, "Tethered capsule endoscopy, a low-cost and high-performance alternative technology for the screening of esophageal cancer and Barrett's esophagus," IEEE Trans. Biomed. Eng., vol. 55, no. 3, pp. 1032–1042, Mar. 2008.
[42] C. Yang, T. Soper, and E. Seibel, "Detecting fluorescence hot-spots using mosaic maps generated from multimodal endoscope imaging," in Proc. SPIE, Prog. Biomed. Opt. Imag., 2013, vol. 8575, p. 857508.
[43] A. O. Shar, J. C. Reynolds, and B. B. Baggott, "Computer enhanced endoscopic visualization," in Proc. Annu. Symp. Comput. Appl. Med. Care, Nov. 1990, pp. 544–546.
[44] R. Kim, B. B. Baggott, S. Rose, A. O. Shar, D. L. Mallory, S. S. Lasky, M. Kressloff, L. Y. Faccenda, and J. C. Reynolds, "Quantitative endoscopy: Precise computerized measurement of metaplastic epithelial surface area in Barrett's esophagus," Gastroenterology, vol. 108, no. 2, pp. 360–366, Feb. 1995.
[45] J. C. Reynolds, "Innovative endoscopic mapping technique of Barrett's mucosa," Am. J. Med., vol. 111, no. 8, Supplement 1, pp. 142–146, Dec. 2001.
[46] T. Igarashi, S. Zenbutsu, T. Yamanishi, and Y. Naya, "Three-dimensional image processing system for the ureter and urethra using endoscopic video," J. Endourol./Endourol. Soc., vol. 22, no. 8, pp. 1569–1572, Aug. 2008.
[47] T. Igarashi, H. Suzuki, and Y. Naya, "Computer-based endoscopic image-processing technology for endourology and laparoscopic surgery," Int. J. Urol., vol. 16, no. 6, pp. 533–543, 2009.
[48] T. Ishii, S. Zenbutsu, T. Nakaguchi, M. Sekine, Y. Naya, and T. Igarashi, "Novel points of view for endoscopy: Panoramized intraluminal opened image and 3D shape reconstruction," J. Med. Imag. Health Informat., vol. 1, no. 1, pp. 13–20, Mar. 2011.
[49] M. Ou-Yang, W.-D. Jeng, Y.-Y. Wu, L.-R. Dung, H.-M. Wu, P.-K. Weng, K.-J. Huang, and L.-J. Chiu, "Image stitching and image reconstruction of intestines captured using radial imaging capsule endoscope," Opt. Eng., vol. 51, no. 5, pp. 057004-1–057004-9, 2012.
[50] S. Yi, J. Xie, P. Mui, and J. A. Leighton, "Achieving real-time capsule endoscopy (CE) video visualization through panoramic imaging," in Proc. IS&T/SPIE Electron. Imag., 2013, p. 86560I.
[51] D. K. Iakovidis, E. Spyrou, and D. Diamantis, "Efficient homography-based video visualization for wireless capsule endoscopy," in Proc. 2013 IEEE 13th Int. Conf. Bioinformat. Bioeng., 2013, pp. 1–4.
[52] E. Spyrou, D. Diamantis, and D. Iakovidis, "Panoramic visual summaries for efficient reading of capsule endoscopy videos," in Proc. 8th Int. Workshop Semantic Soc. Media Adaptation Personalization, Dec. 2013, pp. 41–46.
[53] L. Maier-Hein, P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. Groch, A. Kolb, M. Rodrigues, J. Sorger, S. Speidel, and D. Stoyanov, "Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery," Med. Image Anal., vol. 17, no. 8, pp. 974–996, 2013.
[54] M. Lerotic, A. J. Chung, J. Clark, S. Valibeik, and G.-Z. Yang, "Dynamic view expansion for enhanced navigation in natural orifice transluminal endoscopic surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2008, pp. 467–475.
[55] P. Mountney and G. Yang, "Dynamic view expansion for minimally invasive surgery using simultaneous localization and mapping," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2009, pp. 1184–1187.
[56] D. Stoyanov, M. V. Scarzanella, P. Pratt, and G.-Z. Yang, "Real-time stereo reconstruction in robotically assisted minimally invasive surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2010, pp. 275–282.
[57] J. Totz, K. Fujii, P. Mountney, and G.-Z. Yang, "Enhanced visualisation for minimally invasive surgery," Int. J. Comput. Assist Radiol. Surg., vol. 7, no. 3, pp. 423–432, Jun. 2011.
[68] P.-L. Chang, D. Stoyanov, A. J. Davison, and P. Edwards, "Real-time dense stereo reconstruction using convex optimisation with a cost-volume for image-guided robotic surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2013, pp. 42–49.
[69] S. Atasoy, D. Noonan, S. Benhimane, N. Navab, and G. Yang, "A global approach for automatic fibroscopic video mosaicing in minimally invasive diagnosis," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2008, pp. 850–857.
[70] M. Hu, D. Hawkes, G. Penney, D. Rueckert, P. Edwards, F. Bello, M. Figl, and R. Casula, "A robust mosaicing method for robotic assisted minimally invasive surgery," in Proc. 7th Int. Conf. Informat. Control, Autom. Robot., Funchal, Portugal, 2010, vol. 2, pp. 206–211.
[71] M. Hu, G. Penney, P. Edwards, M. Figl, and D. J. Hawkes, "3D reconstruction of internal organ surfaces for minimal invasive surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2007, pp. 68–77.
[72] M. Hu, G. Penney, M. Figl, P. Edwards, F. Bello, R. Casula, D. Rueckert, and D. Hawkes, "Reconstruction of a 3D surface from video that is robust to missing data and outliers: Application to minimally invasive surgery using stereo and mono endoscopes," Med. Image Anal., vol. 16, no. 3, pp. 597–611, 2012.
[73] A. Malti, A. Bartoli, and T. Collins, "Template-based conformal shape-from-motion-and-shading for laparoscopy," in Information Processing in Computer-Assisted Interventions. New York, NY, USA: Springer, 2012, pp. 1–10.
[74] A. Bartoli, Y. Gerard, F. Chadebecq, and T. Collins, "On template-based reconstruction from a single view: Analytical solutions and proofs of well-posedness for developable, isometric and conformal surfaces," in Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2012, pp. 2026–2033.
[75] S. Giannarou and G.-Z. Yang, "Tissue deformation recovery with Gaussian mixture model based structure from motion," in Augmented Environments for Computer-Assisted Interventions (ser. Lecture Notes in Computer Science), C. A. Linte, J. T. Moore, E. C. S. Chen, and D. R. H. III, Eds., Berlin, Germany: Springer, Jan. 2012, pp. 47–57.
[76] S. Giannarou, Z. Zhang, and G.-Z. Yang, "Deformable structure from motion by fusing visual and inertial measurement data," in Proc. 2012
[58] A. Warren, P. Mountney, D. Noonan, and G.-Z. Yang, “ Horizon IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2012, pp. 4816–4821.
stabilized-dynamic view expansion for robotic assisted surgery (HS- [77] M. Schuster, T. Bergen, M. Reiter, C. Münzenmayer, S. Friedl, and T.
DVE),” Int. J. Comput. Assist Radiol. Surg., vol. 7, no. 2, pp. 281–288, Wittenberg, “Laryngoscopic image stitching for view enhancement and
Jun. 2011. documentation—first experiences,” Biomedizinische Technik. Biomed.
[59] O. G. Grasa, J. Civera, A. Guemes, V. Munoz, and J. M. M. Montiel, Eng., vol. 57, no. 1, pp. 704–707, Aug. 2012.
“EKF monocular SLAM 3D modeling, measuring and augmented reality [78] D. Burschka and G. Hager, “V-GPS(SLAM): vision-based inertial system
from endoscope image sequences,” in Proc. 5th Workshop Augmented for mobile robots,” in Proc. IEEE Int. Conf. Robot. Autom., 2004, vol. 1,
Environ. Med. Imag. including Augmented Reality Comput.-Aided Surg., pp. 409–415.
Held Conjunction MICCAI’09, 2009, pp. 102–109. [79] D. Burschka, M. Li, M. Ishii, R. H. Taylor, and G. D. Hager, “Scale-
[60] O. G. Grasa, J. Civera, and J. M. M. Montiel, “EKF monocular SLAM invariant registration of monocular endoscopic images to CT-scans for
with relocalization for laparoscopic sequences,” in Proc. IEEE Int. Robot. sinus surgery,” Med. Image Anal., vol. 9, no. 5, pp. 413–426, Oct. 2005.
Autom. Conf., 2011, pp. 4816–4821. [80] T. Wittenberg, C. Winter, I. Scholz, S. Rupp, M. Stamminger, K.
[61] J. Civera, A. Davison, and J. Montiel, “Dimensionless monocular Bumm, and C. Nimsky, “3-D reconstruction of the sphenoid sinus from
SLAM,” Pattern Recog. Image Anal., vol. 4478, pp. 412–419, 2007. monocular endoscopic views: First results,” Gemeinsame Jahrestagung
[62] J. Civera, A. Davison, and J. Montiel, “Inverse depth parametrization for der Deutschen, Österreichischen und Schweizerischen Gesellschaften für
monocular SLAM,” IEEE Trans. Robot., vol. 24, no. 5, pp. 932–945, Biomedizinische Technik, DGBMT, Zürich, Schweiz, 2006.
Oct. 2008. [81] W. Konen, S. Tombrock, and M. Scholz, “Robust registration procedures
[63] J. Civera, O. G. Grasa, A. J. Davison, and J. M. M. Montiel, “1-point for endoscopic imaging,” Med. Image Anal., vol. 11, no. 6, pp. 526–539,
RANSAC for EKF filtering: Application to real-time structure from mo- Dec. 2007.
tion and visual odometry,” J. Field Robot., vol. 27, no. 5, pp. 609–631, [82] C. Winne, M. Khan, F. Stopp, E. Jank, and E. Keeve, “Overlay visualiza-
Oct. 2010. tion in endoscopic ENT surgery,” Int. J. Comput. Assisted Radiol. Surg.,
[64] H. Bouma, W. Van Der Mark, P. Eendebak, S. Landsmeer, A. Van Eek- vol. 6, no. 3, pp. 401–406, May 2011.
eren, F. Ter Haar, F. Wieringa, and J.-P. Van Basten, “Streaming video- [83] F. Schulze, K. Bühler, A. Neubauer, A. Kanitsar, L. Holton, and S. Wolfs-
based 3D reconstruction method compatible with existing monoscopic berger, “Intra-operative virtual endoscopy for image guided endonasal
and stereoscopic endoscopy systems,” in Proc. SPIE - Int. Soc. Opt. Eng., transsphenoidal pituitary surgery,” Int. J. Comput. Assisted Radiol. Surg.,
Baltimore, MD, USA, 2012, vol. 8371, p. 837112. vol. 5, no. 2, pp. 143–154, Mar. 2010.
[65] J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang, “Dense surface [84] M. J. Daly, H. Chan, E. Prisman, A. Vescan, S. Nithiananthan, J. Qiu,
reconstruction for enhanced navigation in MIS,” in Proc. Med. Image R. Weersink, J. C. Irish, and J. H. Siewerdsen, “Fusion of intraoperative
Comput. Comput.-Assisted Intervention, Jan. 2011, no. 6891, pp. 89–96. cone-beam CT and endoscopic video for image-guided procedures,”
[66] S. Röhl, S. Bodenstedt, S. Suwelack, H. Kenngott, B. P. Müller-Stich, R. Proc. SPIE, vol. 7625, pp. 762503-1–762503-8, 2010.
Dillmann, and S. Speidel, “Dense GPU-enhanced surface reconstruction [85] R. Shahidi, M. Bax, J. Maurer, C.R., J. Johnson, E. Wilkinson, B. Wang,
from stereo endoscopic images for intraoperative registration,” Med. J. West, M. Citardi, K. Manwaring, and R. Khadem, “Implementa-
Phys., vol. 39, no. 3, pp. 1632–1645, Mar. 2012. tion, calibration and accuracy testing of an image-enhanced endoscopy
[67] S. Bernhardt, J. Abi-Nahed, and R. Abugharbieh, “Robust dense endo- system,” IEEE Trans. Med. Imag., vol. 21, no. 12, pp. 1524–1535,
scopic stereo reconstruction for minimally invasive surgery,” in Medical Dec. 2002.
Computer Vision. Recognition Techniques and Applications in Medical [86] R. Lapeer, M. Chen, G. Gonzalez, A. Linney, and G. Alusi, “Image-
Imaging (ser. Lecture Notes in Computer Science), B. H. Menze, G. enhanced surgical navigation for endoscopic sinus surgery: Evaluating
Langs, L. Lu, A. Montillo, Z. Tu, and A. Criminisi, Eds. Berlin, Ger- calibration, registration and tracking,” Int. J. Med. Robotics Comput.
many: Springer, Jan. 2013, pp. 254–262. Assisted Surg., vol. 4, no. 1, pp. 32–45, 2008.