IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 1, JANUARY 2016
2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
BERGEN AND WITTENBERG: STITCHING AND SURFACE RECONSTRUCTION FROM ENDOSCOPIC IMAGE SEQUENCES: 305
TABLE I
ENDOSCOPIC VIEW EXPANSION HAS BEEN APPLIED TO A WIDE RANGE OF HUMAN ORGANS AND MEDICAL DISCIPLINES

| Shape | Organ | Discipline | Projection surface | Rigidity | Endoscope | Processing | Application |
|---|---|---|---|---|---|---|---|
| Planar | Colon (on microscopic scale) | Gastroenterology | Plane | Semirigid | Confocal laser endomicroscope | Real-time | Dynamic view enhancement (DVE) |
| Concave hollow | Larynx | Otolaryngology | Plane | Nonrigid | Rigid laryngoscope | Offline | Documentation |
| Concave hollow | Eye (retina) | Ophthalmology | Plane, sphere | Rigid | Ophthalmoscope, funduscope | Real-time | DVE, navigation |
| Concave hollow | Bladder | Urology | Plane, sphere | Semirigid | Rigid or flexible cystoscope | Real-time, offline | DVE, navigation, documentation |
| Tubular | Urethra | Urology | Cylinder | Semirigid | Rigid or flexible cystoscope | Offline | Documentation |
| Tubular | Esophagus | Gastroenterology | Cylinder | Nonrigid | Flexible gastroscope | Offline | Documentation |
| Tubular | Airways (+ bifurcations) | Bronchoscopy | Cylinder | Semirigid | Flexible bronchoscope | Real-time | Navigation |
| Complex 3-D | Colon | Gastroenterology | Plane, cylinder, model-free | Nonrigid | Pillcam, flexible colonoscope | Offline | Documentation |
| Complex 3-D | Abdomen | Laparoscopy | Plane, model-free | Nonrigid | Rigid laparoscope | Real-time | DVE, navigation |
| Complex 3-D | Sinuses | Neurosurgery | Plane, model-free | Rigid | Rigid endoscope | Real-time | DVE, navigation |
overview of the relevant applications based on medical branches and anatomical structures, summarized in Table I.

The shapes of the organs under consideration vary from a nearly planar appearance—as in the colon at a microscopic scale, observed in confocal laser endomicroscopy (CLE)—to hollow and tubular structures (e.g., in urology), as well as 3-D structures (e.g., in laparoscopy), which strongly deviate from such simple geometric models. Accordingly, the algorithms applied in the different fields differ in complexity. Whereas for some organs such as the urinary bladder or the retina, most authors assume a planar or spherical shape model, in laparoscopy the mosaicking process is often based on reconstruction of a complex 3-D surface. Another aspect that needs to be considered is the rigidity of the scene. Although the vast majority of algorithms are based on an assumption that the scene is rigid, this rarely holds for endoscopic applications. We characterize scenes as rigid, semirigid, and nonrigid, respectively. Organs that are subject to motion or deformation due to physical processes (such as heartbeat, respiration, and peristalsis) are considered to be nonrigid scenes. An assumption of rigidity can therefore only be regarded as an approximation to the true behavior of the scene. Other organs are not necessarily subject to motion or deformation, but may be deformed under the influence of interactions from the physician or motion by the patient. This may be the case in the urinary bladder, for example: it is not a rigid organ, but if it is filled with a fairly constant amount of fluid, careful handling of the cystoscope can reduce organ deformation to a minimum, making rigidity a valid assumption. However, it is questionable whether this is actually viable in real-world clinical scenarios.

A. Urology

The most prominent organ to which image stitching is applied is the urinary bladder. Both rigid and flexible cystoscopes are used to inspect the inner wall of the bladder. Usually, it is filled with fluid during the inspection. Many image stitching approaches for the bladder use a planar projection surface, which works well for small parts of the organ. To visualize the entire bladder, spherical models have also been proposed. The physician may vary the amount of fluid and apply pressure to the abdominal wall to make all parts of the bladder visible. While this causes at least temporary deformation of the organ, all stitching and reconstruction approaches found in the literature are based on a rigid-body model. The presented approaches either aim at building a map of the organ for documentation purposes (offline) or provide a real-time view expansion to facilitate orientation and navigation for the surgeon.

Approaches: Initial experiments with a photograph of a pig bladder and a mechanically guided fiberscope were reported in 2004 by Miranda-Luna et al. [13], [14]. This research was continued by Hernandez-Mier et al. in 2006, extending the investigations to fluorescence imaging [15], and by Olijnyk et al. [16] and Ben-Hamadou et al. [17]. The authors applied active stereo techniques by projecting eight laser dots for surface reconstruction of the bladder wall [18], [19]. An acceleration of the method presented by Miranda-Luna et al. was described by Hernandez-Mier et al. in 2010 [20]. Behrens et al. have investigated methods for image stitching of fluorescence cystoscopy in several publications [21]–[25]. They explored the creation of panorama images from cystoscopies with pure image-processing methods, as well as navigation support using inertial sensors.

Bergen et al. [26] have applied a graph-based approach to stitch images from cystoscopic video. They identified coherent submaps from the frame graph to stitch local patches, which are combined into a larger mosaic afterwards. Weibel et al., building on the research by Ben-Hamadou et al., integrated graph-cut and graph-based algorithms to produce visually coherent maps of the urinary bladder [27], [28]. Soper et al. [29] have also described panorama imaging and surface reconstruction methods
to support automated bladder surveillance, complemented by the design of a special endoscopic surveillance system by Yoon et al. [30]–[32].

Inspections of the urethra and ureter are additional urological examinations to which EVE has been applied. These are further discussed in Section II-C, along with other tubular-shaped organs.

B. Retinal Surgery

Another organ to which image stitching has been applied is the retina. In relation to the visual impression of the images that are acquired, as well as the spherical geometry, the retina is very similar to the urinary bladder. Although retinal surgery is not an endoscopic procedure but is performed using an ophthalmoscope or funduscope, we have decided to consider this application here as well, due to the similarity of the problem and the algorithmic solutions to it that have been presented in the literature. For the purpose of image stitching, the eyeball can be considered as a rigid body. Both planar and spherical models have been used to visualize the retina. Real-time processing is an important requirement, since stitching is performed as a navigation aid for the surgeon.

Approaches: In 2002, Can et al. [33] presented mosaics generated from images of the human retina acquired with a fundus microscope. They explicitly exploited vascular structures to register pairs of images and used a quadric surface model to represent the retina. Their work is based on earlier experiments by Becker et al. carried out in 1998 [34]. Cattin et al. [35] built on the work of Can et al. and presented an alternative retina image mosaicking approach using speeded-up robust features (SURF) and a multiband blending algorithm. Choe et al. [36] extracted Y-shaped features for registration and applied a shortest-path algorithm on the frame graph to construct a globally consistent mosaic with minimal registration error. Wei et al. [37] applied principal component analysis of the scale-invariant feature transform (PCA-SIFT) and a quadric surface model for mosaicking. The step toward real-time retinal mosaicking was taken by Seshamani et al. [38] and was further improved by Richa et al. [39] in 2012; the latter authors proposed a hybrid tracking approach for improved robustness against image disturbances.

C. Tubular Organs

Considerable efforts have been made to generate panorama images of tubular-shaped organs such as the esophagus, trachea, intestine, and the ureter and urethra. Tubular structures prohibit the use of classical stitching or surface reconstruction techniques. Since the direction of view during inspection of the above organs usually coincides with the direction of motion, leading to a zooming effect in the images, special mapping and reconstruction methods are needed. Therefore, many approaches use a cylindrical model to approximate the organ shape. All aforementioned organs are inspected with flexible endoscopes (except for the urethra, which can also be viewed with a rigid cystoscope) and are subject to deformation. Computation of a map of an organ or parts of it is usually performed offline.

Approaches: An early contribution by Rousso et al. [40], describing a pipe projection model, was used by Seibel et al. [41] to generate panorama images of the esophagus from a capsule endoscope (CE) system. The images are mapped onto a cylindrical surface by unwrapping them around an estimated projection center. The camera motion between consecutive video frames is estimated using an affine optical flow technique. A similar method of pipe projection was used by Yang et al. [42] to detect fluorescent hotspots in Barrett's esophagus images and visualize them on a mosaic map. Initial experiments on calculating an unwrapped image of the esophagus were presented by Shar et al. [43] as long ago as 1990 and Kim et al. [44] in 1995. Reynolds et al. [45] used an unwrapped map to quantitatively describe Barrett's esophagus. Igarashi et al. [46], [47] and Ishii et al. [48] presented opened panoramic images of tubular organs such as the male urethra, porcine colon, and human colon, using a "shape-from-shading" (SfS) approach. They assumed a cylindrical model for the organs and perfect alignment of the optical axis with the cylindrical axis. The panorama was generated from circles extracted around the image center during constant pull-back motion of the endoscope. Ou-Yang et al. [49] stitched images from a radial imaging CE system by applying a similar unwrapping technique. Recently, Yi et al. [50] have presented a real-time CE video visualization technique, based on unwrapped panorama images of the gastrointestinal tract. Although their method was only based on homographies to describe interframe transformations, the group of Iakovidis and Spyrou [51], [52] successfully reduced the amount of video material captured during wireless capsule endoscopy (WCE) by creating frame collages (local panorama images).

D. Laparoscopy

Another medical branch to which stitching and surface reconstruction are applied is MIS, especially laparoscopy. Most of the research published in this field is aimed at providing real-time navigation support during surgical procedures in the abdominal cavity. Most publications have therefore investigated SLAM methods for reconstructing a surface model, as well as the camera position, in real time. The goal of creating an enhanced representation of the scene is closely related to real-time stitching, although the algorithms may differ. The classical method for solving the SLAM problem solely from the images of a single moving camera (or endoscope) is based on the extended Kalman filter (EKF) presented by Davison et al. in 2007 [9]. Numerous publications have built on this methodology to solve the SLAM problem for MIS. Special challenges that emerge in the surgical context include multiple objects in the scene, such as surgical instruments that appear and disappear; blurry images due to rapid camera motion; cauterization smoke; staining from blood and other body fluids; and tissue deformation. Maier-Hein et al. [53] have recently provided a comprehensive overview of state-of-the-art techniques for 3-D surface reconstruction in computer-assisted laparoscopic surgery. The specific goal of using surface reconstruction to provide the surgeon with an extended field of view has been addressed under the heading of dynamic view expansion by several authors.
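To make the EKF machinery behind these monocular SLAM formulations concrete, the toy sketch below runs repeated predict/update cycles on a two-component state (camera position and one landmark position along a single axis) with a relative-position measurement. The state layout, noise values, and function name are illustrative assumptions, not taken from any of the cited systems; with this linear model the EKF reduces to a plain Kalman filter.

```python
import numpy as np

def kalman_step(x, P, u, z, Q, R):
    """One predict/update cycle for the state x = [camera, landmark].
    The camera integrates the control input u; the landmark is static;
    the measurement z is the landmark position relative to the camera."""
    F = np.eye(2)                       # static landmark, identity dynamics
    B = np.array([1.0, 0.0])            # control input only moves the camera
    H = np.array([[-1.0, 1.0]])         # z = landmark - camera
    # Predict
    x = F @ x + B * u
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Simulated run: the camera advances one unit per frame toward a landmark at 10.
x = np.array([0.0, 5.0])                # poor initial landmark guess
P = np.diag([1e-4, 25.0])               # camera well known, landmark uncertain
Q = np.diag([1e-4, 0.0])
R = np.array([[1e-2]])
for k in range(1, 21):
    z = np.array([10.0 - float(k)])     # noiseless measurement, for clarity
    x, P = kalman_step(x, P, 1.0, z, Q, R)

print(x)   # landmark estimate converges toward 10
```

Real systems differ in every practical respect (6-DOF pose, inverse-depth landmark parametrization, nonlinear projection models linearized per step), but the predict/update structure is the same.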
Approaches: In 2008, Lerotic et al. [54] reported initial experiments on view expansion in natural orifice transluminal endoscopic surgery (NOTES), based on optical flow. An extension of this to a full SLAM approach based on the EKF and stereo-endoscopic image data was presented by Mountney and Yang [55], Stoyanov et al. [56], Totz et al. [57], and Warren et al. [58]. Whereas they used stereo imaging to obtain reliable depth information, Grasa et al. explored the capabilities of monocular EKF-based SLAM for MIS [59], [60]. Their SLAM solution is primarily based on the work of Civera et al. [61]–[63]. Dense surface reconstruction from image data from a da Vinci surgical robotic system¹ was described in 2012 by Bouma et al. [64]. Due to the challenging characteristics of endoscopic image data, dense reconstruction has gained increasing attention in the recent past. Further dense reconstruction approaches applied to surgical image data have been presented by Totz et al. [65], Röhl et al. [66], Bernhardt et al. [67], and Chang et al. [68]. The application of classical image mosaicking techniques to fibroscopic images of an ex-vivo kidney was reported by Atasoy et al. [69]. Hu et al. [70] applied a mosaicking approach with superresolution to images of the heart surface captured by the da Vinci system. This was accompanied by their research on 3-D organ reconstruction [71], [72]. First steps toward deformable reconstruction of organ surfaces were taken by Malti et al. [73]. They applied template-based deformable shapes from motion and shading to generate a template for the uterus that subsequently undergoes deformation. Also, Bartoli et al. [74] and Giannarou et al. [75], [76] presented further procedures and theoretical considerations about deformable shape-from-motion (SfM) for MIS.

¹ www.intuitivesurgical.com

E. Otorhinolaryngology and Neurosurgery

In the field of otorhinolaryngology—ear, nose, and throat (ENT) conditions—Schuster et al. [77] have successfully applied general-purpose stitching software to laryngoscopic image sequences and presented panorama images of the larynx for documentation purposes.

In neurosurgery, endoscopic image processing techniques as a method of navigational support have been investigated primarily in endonasal surgery, in which the surgeon enters the brain through the nasal cavity and sphenoid bone to reach the anterior skull base. The very restricted operating space and limited field of view here are challenges addressed by several authors, who have described methods of navigational support through image mosaicking or SLAM solutions. Apart from manipulation by the surgeon, the sinuses can be considered as a rigid structure. The majority of the approaches for view enhancement for endonasal surgery aim at 3-D reconstruction of the rather complex geometry of the sinus cavities.

Approaches: Seminal work in the field of SLAM for sinus surgery has been presented by Burschka et al. [78], [79]. They propose a 3-D reconstruction approach from monocular endoscopic images for registration with a preoperative computed tomography (CT) scan, based on SfM and iterative closest point registration. Wittenberg et al. [80] presented a 3-D reconstruction of the sphenoid sinus with an SfM approach. While several papers have used external tracking systems to enable real-time guidance during surgery (e.g., Konen et al. [81], Winne et al. [82], Schulze et al. [83], and Daly et al. [84]), Shahidi et al. [85] and Lapeer et al. [86] used passive optical markers and image-processing techniques to substitute for the external tracking system. Mirota et al. [87], [88] argued that direct registration of a 3-D reconstruction from endoscopic video and a preoperative CT scan can improve accuracy, since the detour through external tracking systems tends to introduce significant errors. They applied SfM algorithms to reconstruct a surface model from the monocular endoscopic video for CT registration. While all of these contributions are related to view enhancement strategies, to the best of our knowledge Konen et al. [89] were the first to present experiments with an image mosaicking approach for neuroendoscopic videos. They applied a real-time mosaicking approach earlier described by Kourogi et al. [90]. Since this method is purely based on affine frame-to-frame transformations, it is presumably not able to handle the complex geometry of the sinus cavities. Bergen et al. [91] have presented the initial results of a real-time stitching approach for neurosurgery, applied to a skull phantom.

F. Colonoscopy

One of the most challenging organs for stitching and surface reconstruction is the human colon, due to its complex geometric structure and extreme lack of rigidity. Although global mapping of the colon would be of great value from a medical point of view to facilitate colonoscopy, only a few contributions on the topic can be found in the literature. The geometric structure inhibits the application of simple mosaicking algorithms, so that all of the research has focused on 3-D surface reconstruction of parts of the colon.

Approaches: Some early work by Thormaehlen et al. [92] used an SfM approach to generate a texturized surface model from colonoscopy video images. They present a reconstruction of part of the colon wall containing a polyp. Similarly, Koppel et al. [93] and Chen et al. [94] reconstructed a small part of the colon, using a variety of feature tracking, camera motion estimation, and stereo rectification algorithms to generate a texturized surface model. Kaufman and Wang [95] combined an SfS algorithm for 3-D geometry estimation with an SfM algorithm to extract the endoscopic camera motion and 3-D feature point locations. Hong et al. [96] took advantage of the tubular nature of the colon to reconstruct a virtual colon segment from a single colonoscopy image, assisted by manually drawn contours of major colon folds.

Three-dimensional reconstruction from CE images has been investigated by several research groups in recent years. CE is probably the most challenging endoscopic image source for stitching or reconstruction, due to the low frame rate of usually 2–6 frames/s and the uncontrolled, unrestricted motion. Nevertheless, some promising progress has been made. In 2010, Fan et al. [97] showed first results of an SfM approach based on SIFT features and epipolar geometry, applied to CE images of the colon. A very similar approach was also followed by Sun et al. [98]. Due to the lack of prominent textural features in CE images, most subsequent research concentrated on SfS approaches (see, e.g.,
Fig. 4. Selection of recent results of endoscopic view expansion, presented for different fields of application. Top row: A planar panorama image, generated in real time from fluorescence cystoscopy video frames [25]. A 3-D reconstruction using SfM of an excavated pig bladder [108]. An unrolled panorama image of the esophagus, based on a cylindrical surface model [109]. A planar panorama image of the larynx, generated with general-purpose stitching software [77]. Dynamic view enhancement (real-time) for laparoscopy using an EKF-SLAM approach [55]. Bottom row: Three-dimensional reconstruction of a polyp region in the colon, using an SfM approach [93]. A planar collage from the colon, generated from images captured by a CE, to provide a visual video summary for faster inspection [51]. A planar panorama image of the interior of a skull phantom for real-time view expansion during endonasal neurosurgery [91]. A planar panorama image of a mouse colon, generated in real time with a CLE [12]. Result of a real-time mosaicking approach for assistance during retinal surgery [39].
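As a minimal illustration of the cylindrical "unwrapping" behind the unrolled panoramas of tubular organs (Section II-C), the sketch below resamples an annulus around the image center into a radius-by-angle strip. It assumes the optical axis coincides with the organ axis; the function name and the nearest-neighbour sampling are illustrative simplifications, not taken from any cited method.

```python
import numpy as np

def unwrap_ring(img, center, r_min, r_max, n_theta=360):
    """Resample an annulus around `center` into a (radius x angle) strip.
    This is the basic 'unrolled' view used for tubular organs, assuming the
    optical axis coincides with the organ axis; nearest-neighbour sampling
    keeps the sketch short."""
    h, w = img.shape[:2]
    radii = np.arange(r_min, r_max)                       # one strip row per radius
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    xs = np.clip(np.round(center[0] + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.round(center[1] + rr * np.sin(tt)).astype(int), 0, h - 1)
    return img[ys, xs]

# Synthetic check: an image whose value equals the distance from the center
# unwraps into a strip whose rows are (nearly) constant.
yy, xx = np.mgrid[0:100, 0:100]
img = np.hypot(xx - 50.0, yy - 50.0)
strip = unwrap_ring(img, (50.0, 50.0), 5, 40)
```

Stacking such strips over a pull-back sequence, with the per-frame projection center and interframe motion estimated separately (e.g., via optical flow as in the approaches above), yields an unrolled panorama of the organ wall.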
provide details on the different contributions for each step in the pipeline.

A. Distortion Correction and Camera Calibration

Typical endoscopes are wide-angle lens systems with viewing angles between 90° and 120°. This setup causes barrel distortion effects in the images, which interfere with the pinhole camera model usually used. Different ways of overcoming this problem have been presented. First, classical camera calibration—e.g., Tsai's method [110], Heikkilä's extension [111], Zhang's method [112], or Hartley's method [113]—using a calibration pattern, can be applied to extract the intrinsic camera parameters, including distortion coefficients. Full calibration of endoscopic camera systems is a challenging task due to strong distortions, low image contrast, and the problem of interfering with the clinical workflow if calibration has to be performed in the operating room. Several publications have explicitly addressed the problem of endoscopic camera calibration. Zhang et al. [114] demonstrated the applicability of their calibration technique to endoscopic video images. Wengert et al. [115] presented a fully automatic calibration approach, with a newly designed sterilizable calibration pattern. Apart from taking at least two calibration images, no further user interaction is required for full calibration using Heikkilä's model. Stehle et al. [116] considered that the pinhole camera model is not suitable for endoscopes and suggested a more general model, based on prior work by Kannala and Brandt [117]. Li et al. [118] proposed a distortion correction pipeline, including a new polynomial model, for endoscopic images. Barreto et al. [119] developed a single-shot calibration method. Their method is based on lifted coordinates to get a linear formulation of the projection model including radial distortion, allowing full calibration from a single chessboard image. Thus, minimal effort is needed by the surgeon to calibrate the endoscope system within the operating room. This method has been further improved by Melo et al. [120]. A practical problem of endoscopy calibration is the frequent change of focus or zoom settings, which makes a prior calibration invalid or at least imprecise. The problem of continuous re-calibration was addressed by Lourenco et al. [121], who tracked salient points in the images to correct for a change of the focal length. Pratt et al. [122] presented a solution for intraoperative re-calibration of a stereo camera setup by reducing the variable parameter space to one parameter (focus position). They suggested laser-engraving a calibration pattern onto a surgical instrument, so that calibration can be performed inside the body.

An alternative approach has been used by Miranda-Luna et al. [14] (mosaicking of the bladder), Soper et al. [29] (bladder reconstruction), Chen et al. [94] (colon reconstruction), Grasa et al. [60] (SLAM for laparoscopy), Stoyanov et al. [123] (enhanced visualization during laparoscopy), and others. In this method, the estimation of calibration parameters is incorporated into a global optimization process (see Section III-D). This autocalibration method makes prior explicit recordings of a calibration pattern unnecessary and facilitates the calibration process—although at the cost of increasing the number of parameters that have to be optimized.

B. De-Vignetting

A vignetting effect is typical in endoscopic images, due to the wide field of view and the fact that the light source is directed toward the center of the field of view. As a consequence, the brightness usually decreases toward the edges of the image. To
make image registration more robust and reduce vignetting artifacts in the final panorama image, several authors have opted to compensate for inhomogeneous illumination as a preprocessing step. As the vignetting effect depends on the distance and perspective alignment between the camera and the scene, a de-vignetting filter is usually not kept constant, but has to be recalculated for every image. In the early research by Miranda-Luna et al. [13], a high-pass filtering approach was presented. The authors estimated the frequency range of the vignetting effect from a Fourier transform and subtracted a Gaussian-filtered image with the relevant bandwidth. Most de-vignetting methods used for endoscopic image stitching are related to this approach (Weibel et al. [27] and Hernandez-Mier et al. [20]). Alternatively, de-vignetting can be considered as part of the blending step, favoring pixels closer to the image center when calculating the pixel values of the final mosaic. In this case, the weighting function that compensates for illumination differences is based on some distance measure from the border or center of the image. This function is kept constant for all images, implicitly assuming an invariant vignetting effect. This approach has been followed by Behrens et al. [24], Bouma et al. [64], and Soper et al. [29]. Mountney and Yang [55] have also addressed the vignetting problem in SLAM-based mosaicking. To texturize the reconstructed surface model, they ignored areas close to the edge when selecting the texture images from the video stream.

C. Pairwise Image Registration and Frame Selection

Image registration is probably the most crucial step in the stitching pipeline. The goal of image registration is to find the transformation between pairs of images. Most stitching approaches start by registering frames from the video stream sequentially—i.e., each frame is registered to its (direct or indirect) predecessor. Since using every single frame of the video stream results in a heavy computational load, as well as an unnecessarily high amount of overlap between frames, most authors choose to implement some sort of frame selection mechanism. The simplest approach is to take every kth frame (e.g., Behrens et al. [25]). Finding an adequate k for a whole video sequence is difficult, since registration may fail if the value chosen is too large. Some authors have therefore implemented a strategy of adapting k according to the registration results. Soper et al. [29] increment k as long as registration is successful and decrement it if necessary. This registration process can be referred to as sequential frame registration [29]. If stitching is solely based on this sequential approach, small registration errors accumulate and can lead to unacceptably large distortions over time. To address this problem, overlapping image pairs should be sought that are nonsequential in the video stream. Different strategies have been proposed, and these are discussed in Section III-D.

A wide range of algorithms has been proposed for solving the basic problem of registering a pair of images. Szeliski [2] provides an overview of different approaches. The main categories are pixel-based and feature-based approaches. Pixel-based algorithms try to minimize an optimization criterion calculated over the entire set of pixels within the overlap regions of the two images. Common criteria are the sum of squared differences (SSD), normalized cross-correlation (NCC), and mutual information (MI). Feature-based approaches extract higher-level features from two images that are matched on the basis of their similarity. SIFT features [124] have proved to be among the most distinctive and have been used by Behrens et al. [21], [22] and Soper et al. [29]. Bergen et al. [125] have presented a combined tracking approach with SIFT features and Kanade–Lucas–Tomasi (KLT) tracking [126]. The high computational load involved in SIFT features motivated the development of SURF features [127], which provide similar performance at a significantly greater speed through the use of integral images. SURF features are used by Behrens et al. [25], Vemuri et al. [128], Reeff et al. [129], Iakovidis et al. [51], and Richa et al. [39]. More recent developments include fast feature detectors, such as the accelerated segment test (e.g., FAST by Rosten and Drummond [130] and AGAST by Mair et al. [131]) and CenSurE (center-surrounded extrema) by Agrawal et al. [132], as well as descriptors that are represented by binary vectors, such as binary robust independent elementary features (BRIEF) [133] and ORB [134], FREAK [135], BRISK [136], and SKB [137], which can be computed rapidly and are claimed by their developers to be comparable to SIFT and SURF in distinctiveness. To the best of our knowledge, none of these has yet been applied to endoscopic stitching, despite their high potential. An exception is the work of Mountney et al. [138], [139], who have presented an online learning scheme for feature descriptors and adapted the method of randomized trees for keypoint recognition by Lepetit et al. [140] to develop context-specific descriptors for application in laparoscopic stereoscopy SLAM. They have also presented a comparison of feature descriptors for MIS [141]. Another method dedicated to laparoscopic feature tracking was presented by Giannarou et al. [142]. They proposed an anisotropic affine-invariant region tracking scheme, which is supported by an EKF-based prediction mechanism to handle the difficult scenario of feature tracking during MIS. Although many authors claim to have used feature-based registration successfully, several research groups argue that only pixel-based registration is able to reliably handle the frames with little texture and small overlap that are present in endoscopic video sequences. Ben-Hamadou et al. [17], Miranda-Luna et al. [13], [14], and Hernandez-Mier et al. [20], therefore, present pixel-based registration techniques. The disadvantage of these approaches is the long computation time needed for registration. Hernandez-Mier et al. [20] report 1.2 s, and Ben-Hamadou et al. [17] as long as 60 s, to register a single pair of images. Weibel et al. [27] have taken a different approach, aiming at maximal robustness and combining different aspects of the techniques mentioned above. They minimized an energy function consisting of pixel-based color similarity, SURF keypoint similarity, an overall smoothness constraint, and a planarity assumption, and reported successful registration results in cases in which most other approaches fail—i.e., small overlap (less than 50%) for nonsequential image pairs. Their implementation requires an average of 20 s per image pair.

An interframe transformation model is needed for the registration process. The one most commonly used is a perspective transformation (homography H) in the 2-D projective space P². The homography accurately describes a coordinate transformation between two views: 1) if the scene is planar, or 2) if the camera motion between the views is a pure rotation (without
translation). These assumptions are reasonable in the case in which camera motion between two successive frames is small, and usually the part of the scene displayed in one image is small enough not to contain any dominant 3-D structure. Obviously, the planarity assumption is violated on a global scale when stitching images from a larger scene. The problems related to this have given rise to a whole set of publications, which are discussed in the next section. While a homography with eight degrees of freedom is most often used, Behrens et al. [21] reduce the complexity by assuming an affine transformation with six degrees of freedom. In order to model the interframe transformation on the curved human retina more accurately, Can et al. [33] and Cattin et al. [35] use a 12-parameter quadratic transformation model.

In the case of pixel-based registration, the transformation parameters are estimated by an optimization procedure. In the case of feature-based registration, the point correspondences give rise to an over-determined system of equations, which can be solved using a random sample consensus (RANSAC) scheme [143] or one of its numerous derivatives, such as MSAC or MLESAC [144].

For comprehensive surveys on general aspects of image registration, the reader is referred to the publications by Brown [145], Goshtasby [146], and Zitová and Flusser [147].

D. Global Alignment and Optimization

sequence of several minutes can easily consist of several thousand frames, resulting in millions of possible edges. The crucial point is therefore how to choose promising edge candidates. Different strategies have been presented in the context of endoscopic mosaicking. Soper et al. [29] tried to complete the graph by reducing an exhaustive search to every nth frame of the sequence, followed by a further edge densification based on associativity within the graph. Although this already significantly reduces the number of transformation estimates to calculate, they report processing times of several hours (including global optimization through incremental bundle adjustment). Miranda-Luna et al. [14] also point out the need to perform global optimization and describe a corresponding optimization scheme, but loop-closing edges are selected manually. Another graph-based approach has been presented by Weibel et al. [27]. They estimate the amount of potential overlap between two nonsequential frames on the basis of the initial homography estimates and use this to model a cost function for the frame graph edges. A greedy algorithm is then used to search for possible overlapping frame pairs within this graph. Their mosaicking method is also an offline method, as they report a processing time of 1 h for a mosaic consisting of 150 out of 1500 images. Seshamani et al. [149] presented different global adjustment methods for direct image registration, based on graph representation and loop detection. They create globally consistent mosaics of some 10–50 images from an endoscopic video of the endometrium. Once ad-
On the basis of the results of registering sequential video ditional frame correspondences have been established, a global
frames, the images can be aligned in a common coordinate sys- error measure is minimized using bundle adjustment. Weibel
tem. The straightforward way of doing this is to compute a et al. [27] used a twofold strategy, first minimizing the SSD
global homography for each frame as the product of all homo- between all overlapping frames and then applying bundle ad-
graphies describing the local frame-to-frame transformations justment to grid points regularly chosen over the mosaic. Soper
and thus placing each frame on a planar projection surface. This et al. [29] followed the standard SfM approach and estimated
is the approach taken by Behrens et al. [21], Miranda-Luna et al. camera poses and 3-D positions of feature points in the scene,
[14], and Weibel et al. [27] for cystoscopic panorama images. upon which classical bundle adjustment is performed to reduce
Similarly, Iakovidis et al. [51] searched for clusters of overlap- the global reprojection error.
ping consecutive frames within the set of thousands of frames Dynamic view enhancement, building on filter-based SLAM,
captured during WCE in order to calculate local panorama im- takes a different approach to the generation of a globally
ages. Two problems arise from the consecutive strategy: First, consistent scene representation. Typically, the current camera
the planar projection surface leads to major distortions if the pose and the global scene representation are combined into
scene significantly deviates from a plane; and second, the errors one state vector. Assuming the Markov property that the cur-
occurring during registration accumulate with increasing frame rent state only depends on its direct predecessor state, the state
numbers. As a consequence, this strategy is only applicable with vector is incrementally modified in an alternating way: In the
small mosaics and tends to fail for the application of mapping prediction step, the current state is estimated on the basis of
of the entire bladder, for example, from a cystoscopy video. a camera motion model. In the measurement step, the state is
The problem of geometric distortions can be reduced by updated according to the observation of the scene in the cur-
choosing an appropriate projection surface that is similar to rent camera image. When loops can be successfully detected,
the shape of the scene. This topic is discussed in Section III-E. the additional observations lead to a refined state estimate. This
The issue of accumulating error can be addressed using a graph approach has been followed—primarily for laparoscopic view
representation. Each vertex of the graph represents a frame and enhancement—by Grasa et al. [59], [60] and Mountney et al.
each edge a connection through a frame-to-frame transforma- [55], [150].
tion. The goal is to find further edges between nonconsecutive
frames, which can then enable a global optimization strategy to
minimize a global error measure. Finding these additional edges E. Projection Surface
is a challenging task and is very similar to the loop-closing prob- In order to generate a composite view of the scene, a sur-
lem in SLAM applications. A comparison of loop-closing tech- face defining a global coordinate system is needed onto which
niques developed for SLAM problems can be found in [148]. the images or parts of images can be projected. Two basic ap-
Checking all possible image pairs for a transformation results in proaches can be identified for choosing the projection surface,
n (n −1)
2 ∈ O(n2 ) edge candidates. A typical cystoscopy video which we call model-based and model-free. The model-based
approach assumes a fixed geometric model of the underlying scene structure. A planar projection surface or cylindrical or spherical shapes are most commonly used. The model chosen should fairly approximate the shape of the organ. The majority of stitching approaches use a planar projection model. This allows a straightforward projection procedure based on the previously calculated interframe affine or projective transform matrices. As stated earlier, major distortions arise if the scene is not well represented by a plane. As an alternative representation, Szeliski [1] suggests either cylindrical or spherical coordinate representations. Several papers on cystoscopic and retinal image stitching use a spherical projection surface as a fair approximation to the geometry of the organs (e.g., Soper et al. [29], Wei et al. [37]). A bootstrapping method, which starts with a simple two-parameter translational model and extends to a complex 12-parameter quadratic model (which consequently allows for a spherical scene), has been proposed by Can et al. [33] and Stewart et al. [151]. A cylindrical projection surface is usually the first choice for stitching and reconstruction of tubular organs such as the esophagus (see [41], [42]–[45]) or the urethra (see [46]–[48]).

The model-free approach does not assume any geometric constraints, but aims at general 3-D surface reconstruction. SfM, SfS, and stereo vision, as well as active vision techniques using structured light or distance sensors, have been applied for endoscopic surface reconstruction. Active techniques require dedicated endoscopes—an area that lies beyond the scope of this review and is therefore omitted here. For the field of 3-D surface reconstruction in laparoscopy (including stereoscopic reconstruction, monocular shape-from-X methods, sparse and dense SLAM solutions, as well as structured-light and time-of-flight techniques in both rigid and deformable environments), the reader is referred to the review by Maier-Hein et al. [53].

SfM has been applied in various fields of endoscopy. In cystoscopy, Soper et al. [29] start with a spherical bladder model, which is later relaxed to allow for a more general reconstruction. A wide variety of applications of SfM algorithms exist in the field of MIS. For reconstruction of the surface of the heart, Hu et al. [71], [72] have presented methods based on monocular SfM. Mourgues et al. [152] created a surface model to visualize the heart, suppressing surgical instruments, based on stereo vision. The reconstruction of sinus cavities for video/CT registration with SfM has been described by Mirota et al. [87], [88], Wang et al. [153], [154], Burschka et al. [78], [79], and Wittenberg et al. [80]. In laparoscopy, Bouma et al. [64] have used SfM to reconstruct the abdominal cavity (using either monocular or stereoscopic images). The majority of reconstruction approaches in laparoscopy use EKF-based SLAM methods for scene reconstruction (see [55], [59], [60], [155]), and these have also been extended to handle tissue deformation [156], [157]. Gonzalez et al. [158] have combined SfS to estimate a depth map with a tracking approach for pose estimation of surgical tools. The few publications on stitching and reconstruction of the colon are dominated by the classical SfM approach (see [92]–[94]). Exceptions are Kaufman and Wang [95], who extracted the geometry using SfS, and Hong et al. [96], who used manually labeled colon folds and a tubular model of the organ to reconstruct the local colon geometry from a single image. The optical setup of endoscopes gave rise to several methods which adapt the SfS approach to handle a single light source that is not co-located with the camera center. Wu et al. [159] extended SfS to deal with perspective projection and near point light sources not located at the camera center. They showed the efficacy of their approach on images of artificial bones. Visentini-Scarzanella et al. [160] presented a variation of this approach for metric depth recovery, applied to images of the stomach lining and the esophagus.

F. Blending and Visualization

Once all of the relevant frames have been aligned within a global coordinate system, the mosaic can be rendered. Since usually more than one video frame contributes to a single mosaic pixel, blending techniques have been proposed for choosing the final pixel values. Two general cases can be distinguished. If the mosaic is calculated as an offline process, all frames are usually available and a weighted average can be computed. On the other hand, if the stitching process is incremental—i.e., each new image is added to the mosaic and discarded afterward—only the mosaic and the most recent image can be used for blending. Most blending algorithms can be applied to both scenarios, but with different results. Only Bouma et al. [64], Soper et al. [29], and Weibel et al. [27] therefore use all available frames for blending; the majority of authors follow the incremental approach. The goal of any blending scheme is to provide smooth transitions between the images and at the same time preserve as much structural information from the input images as possible. Standard blending algorithms applied for endoscopic mosaicking are linear alpha blending [161], multi-band image blending [162], and optimal seam detection [163]. When weighting functions are being designed for the blending schemes, most authors take into account the fact that the inner regions in endoscopic images show a greater contrast, and thus contain more relevant information, than the outer regions, mainly due to vignetting effects. Weighting pixels in relation to their distance from the center of the image (also referred to as feather blending) is the preferred method for endoscopic stitching. The basic linear alpha blending (with or without feathering) usually produces smooth transitions but also tends to blur image structure. This effect is reduced with multi-band blending, which is therefore applied by Behrens et al., who also introduced a nonlinear component to prefer brighter images over darker ones [24]. In [164] and [27], Weibel et al. present a blending strategy as an energy minimization problem. They implement an optimal seam detection algorithm in which the energy function is formulated in such a way that sharper image regions (based on Michelson contrast) are preferred over blurrier ones, while the color gradient along the seam is kept low at the same time. Their method is based on that of Kwatra et al. [165].

The spectrum of methods involved in endoscopic view expansion and discussed in this section is summarized in a mind map in Fig. 5.

[Fig. 5: mind map of methods for endoscopic view expansion; branches recoverable from the extraction include Preprocessing (image masking, distortion correction, camera calibration, de-vignetting) and Frame selection (fixed step, adaptive step).]

G. Online Versus Offline Methods

Depending on the application concerned—view expansion during endoscopy, or documentation afterward (see Section I-
B above)—computation has to be performed either online or offline. The stitching and reconstruction approaches mentioned above differ greatly in their processing speed. In general, the great majority of algorithms presented so far are not applicable in real time and take several minutes or hours to process. This is true for all stitching and reconstruction methods that include a global optimization step, such as bundle adjustment. The SLAM methods, primarily applied in laparoscopy, are an exception to this—as presented by Mountney and Yang's group [55], [138], [139], [141], [150], [156], [157] as a stereoscopic approach, as well as by Grasa et al. [59], [60], [155] for monoscopic views (see Section II-D). Bouma et al. [64] have also presented a real-time reconstruction approach for MIS, which incorporates stereoscopic ego-motion computation in real time. For microendoscopy, the scene can be assumed to be planar, allowing successful image stitching in real time, as presented by Vercauteren et al. [12], [103], [104] and Bedard et al. [105] (see Section II-G). In this method, mosaics are generated at 11 frames/s. For planar stitching in fluorescence cystoscopy, Behrens et al. [25], [166] have presented an online method based on a multi-threaded software framework (see Section II-A). They achieved a rate of 5 frames/s on standard PC hardware in 2010. Bergen et al. described live stitching of liver images at a frame rate of 7 frames/s in 2009 [167]. Since 2006, Hager's group [38], [39] have presented several real-time mosaicking approaches for the retina (see Section II-B), reporting frame rates of 30 frames/s. All of these methods are based on incremental planar stitching algorithms without global optimization or reconstruction of the 3-D organ geometry. Exceptions to this are the SLAM methods mentioned above and methods that use depth information from stereo vision.

IV. EVALUATING STITCHING RESULTS

In efforts to assess the quality of stitching algorithms, several aspects need to be addressed: accuracy in terms of registration error, structural information in the resulting mosaic, smoothness along the seams between images, and processing speed. In the case of stitching in combination with surface reconstruction, the precision of the reconstruction is also an issue. There is no gold standard for evaluating stitching results, and to the best of our knowledge there is no public database that provides a ground truth against which the different approaches could be compared. In fact, each group of authors presents their own method of evaluation using nonstandardized data. Since it is generally difficult to generate a ground truth with which to compare the stitching results, many authors do without any quantitative evaluation and limit themselves to presenting stitched images to give the reader a visual impression of the quality. These images can be calculated either from real clinical data or from phantom data. Nevertheless, some efforts have been made to objectify quality assessment in endoscopic mosaicking.

A. Registration Error

The error made during frame-to-frame registration can be measured in several ways. Behrens and Röllinger [168] compared the calculated frame-to-frame homographies to reference homographies calculated from manually set point correspondences. Instead of manual annotation, Ben-Hamadou et al. [18], Hernandez-Mier et al. [20], and Miranda-Luna et al. [14] stitched a photograph of a pig bladder with an additional point grid printed on it as a phantom. The point grid can be extracted automatically from the stitched image to allow comparison of the extracted point positions with the true ones. Another common method of generating ground truth data is through simulation. Weibel et al. [27], among others, have simulated an image sequence by taking subimages from a high-resolution pig bladder photograph. Since all subimage transformations are known, the registration result can be compared to the true transformations by means of the mean endpoint error, as suggested by Baker et al. [169] for optical flow evaluation.
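The simulation-based protocol above—warping a fixed set of points with both the estimated and the ground-truth transformation and averaging the point-wise distances—can be sketched in a few lines. The point grid and the two homographies below are illustrative assumptions, not data from any of the cited studies.

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of 2-D points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # dehomogenize

def mean_endpoint_error(H_est, H_true, pts):
    """Mean distance between points warped by the estimated and the true transform."""
    d = np.linalg.norm(warp_points(H_est, pts) - warp_points(H_true, pts), axis=1)
    return float(d.mean())

# Illustrative ground truth: a pure translation; the estimate is half a pixel off in x.
grid   = np.array([[x, y] for x in range(0, 100, 10) for y in range(0, 100, 10)], float)
H_true = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
H_est  = np.array([[1.0, 0.0, 5.5], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]])
print(mean_endpoint_error(H_est, H_true, grid))  # 0.5 (every grid point is off by 0.5 px)
```

The same measure applies directly to simulated sequences with known subimage transformations: each estimated homography can be scored against its reference on a fixed point grid.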
Fig. 6. Publication database analysis. The search results from the Scopus database are depicted as numbers of listed publications over the years.

Fig. 8. TRLs of endoscopic applications for stitching and surface reconstruction methods. [Chart entries recoverable from the extraction: surface reconstruction of urethra and esophagus; stitching/reconstruction of small colon parts; laryngoscopic stitching; stitching/reconstruction of colon; TRL 3—analytical and experimental critical function and/or characteristic proof-of-concept; TRL 2—technology concept and/or application formulated.]

C. Processing Speed

Processing speed is probably the only parameter that can be quantitatively evaluated easily. For offline approaches, most authors indicate the amount of time needed to generate a full panorama from a given number of input images. The calculation time can, of course, increase disproportionately with the number of frames, so that averaging over all frames is not legitimate. For real-time stitching approaches, it is important that the processing time per frame does not increase over time. An average rate of frames processed per second is usually an adequate measure for assessing real-time capability.
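The two checks described in Section IV-C—an average throughput figure plus the requirement that per-frame processing time does not grow as the mosaic does—can be combined in a small timing harness. This is a generic sketch; `process_frame` is a hypothetical stand-in for a registration-plus-blending step, not one of the cited implementations.

```python
import time

def profile_stitching(frames, process_frame):
    """Time an incremental stitching loop frame by frame."""
    times = []
    for frame in frames:
        t0 = time.perf_counter()
        process_frame(frame)
        times.append(time.perf_counter() - t0)
    avg_fps = len(times) / sum(times)
    # Real-time capability also requires that the per-frame cost stays flat:
    # compare the mean per-frame time of the second half of the run with the first.
    half = len(times) // 2
    drift = sum(times[half:]) / (len(times) - half) - sum(times[:half]) / half
    return avg_fps, drift

# Hypothetical workload standing in for per-frame registration and blending.
fps, drift = profile_stitching(range(100), lambda frame: sum(range(1000)))
print(f"{fps:.0f} frames/s, per-frame drift {drift:.2e} s")
```

A positive drift that grows with sequence length would indicate that the method slows down as the mosaic accumulates, i.e., that a frames-per-second average alone overstates its real-time capability.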
or reconstructed model is computed either online during image acquisition, or offline in a postprocessing manner. The result has to be visualized for the doctor or clinical staff for navigational and documentary purposes. Depending on the application, human-machine interaction may be necessary.

With regard to the image acquisition process, current trends can be observed in relation to interdisciplinary influences. Advances in multimedia technology and camera chip development have led to rapidly increasing image resolutions. While only a few years ago VGA (640 × 480 pixels) was a common resolution level, most current endoscopy systems already provide full HD resolution (1920 × 1080 pixels). The first 4K systems (3840 × 2160 pixels) are now also being developed. While this progress is having a positive effect on the image quality, the increasing amount of image data also poses a challenge on the computational side with regard to computational load and memory requirements. In addition, the increasing influence of robotic and machine vision developments on the medical field is providing new data sources for improving stitching and reconstruction approaches, such as kinematic data. The same also applies to other sensory enhancements for endoscopic devices, such as motion sensors (external tracking systems or acceleration sensors) and depth sensors (time-of-flight technology, structured light, and stereo imaging). A further trend, influenced by production technology, is the increasing miniaturization of endoscopes. Single-port laparoscopy devices, with small-diameter endoscopes and microendoscopes as thin as a human hair, are being developed, as well as miniaturized CMOS camera sensors of submillimeter size. The small design often leads to strong optical distortions and an even more reduced field of view. This may further increase the demand for robust software methods for enhancing the field of view.

Several challenges and trends can be observed with regard to algorithmic approaches for image stitching and 3-D surface reconstruction. The problem of image stitching or texturized reconstruction of unknown rigid scenes was conceptually solved by the computer vision community some 20–30 years ago. Current advances significant for endoscopic applications are acceleration up to real-time capability, and robust and precise handling of large and complex as well as nonrigid scenes. A real-time capability is inevitably necessary for dynamic view enhancement in endoscopy. Some successful implementations of SLAM or real-time stitching have been presented for laparoscopy (using stereoscopic vision) and simple, mostly planar, geometries. Fast reconstruction of more complex shapes is still a major task. Since the human body is a mostly nonrigid environment, the rigidity assumption present in most algorithms is violated. There have recently been increasing efforts to find appropriate ways of handling dynamic scenes. For laparoscopy or heart surgery, deformation models have been formulated for modeling systematic motion or deformation due to heartbeat and breathing. The reconstruction of a nonrigid environment has to be regarded as a yet unsolved problem, particularly since deformation can become very complex and unsuitable for a periodic motion model in the case of ego-motion of the patient or organ deformation due to the physician's interaction with the patient.

At a more basic level, there appears to be room for improvement for many steps in the image-processing pipeline. While suitable algorithms are available for camera calibration in a fixed optical system, changes in the system, such as refocusing, still pose a challenge. Image registration based on salient image features or direct alignment is still a current research topic, due to the commonly difficult image quality conditions and high demands on robustness and computational speed. Interestingly, recent developments in nonmedical applications, such as image processing and computer vision on smartphones, digital photo cameras, and embedded systems, do not appear to have been exhaustively transferred to endoscopic applications, despite their potential for making a valuable contribution. Depending on the hardware platform available in the endoscopic system, different implementations for algorithm parallelization and acceleration are possible. Multicore central processing units, general-purpose graphics processing units, and embedded processors or field-programmable gate arrays allow parallel computation, which is certainly one of the current high-priority research areas for image processing in general and endoscopic stitching and 3-D reconstruction in particular.

In addition to computation, visualization of a large panoramic view and 3-D scene reconstruction also pose challenges in endoscopy. As the physician is used to being presented with the image provided by the endoscopic camera for examination, diagnosis, or intervention, the way in which the additional visual information should be optimally provided and the view augmented is an open question. For navigational purposes, the problem is how to present the unmanipulated endoscope image within an augmented computed context (which may be delayed and is not guaranteed to resemble the current scene correctly) in such a way that it improves orientation for the surgeon without reducing his or her attentiveness. For documentation purposes, the computed map or model has to be integrated into the patient's record. Although there are several publications in the literature that deal with these questions (e.g., [57]), they will become even more important as the technology increasingly matures.

Related to the question of visualization is the matter of human-machine interaction when manipulation of the presented view, such as zooming or changes of perspective, is necessary. Particularly in the clinical environment, classical forms of interaction through a mouse, keyboard, or touch panel are unfeasible due to unacceptable interference with the clinical workflow (e.g., [174]). To deal with this problem, research is being conducted on alternative interfaces such as voice control or gesture recognition ([175], [176]). Like the visualization aspect, these topics are likely to become even more important in the near future.

REFERENCES

[1] R. Szeliski, "Image alignment and stitching: A tutorial," Found. Trends Comput. Graph. Vis., vol. 2, no. 1, pp. 1–104, Jan. 2006.
[2] R. Szeliski, Computer Vision: Algorithms and Applications, 1st ed. New York, NY, USA: Springer-Verlag, 2010.
[3] B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment—A modern synthesis," in Vision Algorithms: Theory and Practice (ser. Lecture Notes in Computer Science), B. Triggs, A. Zisserman, and R. Szeliski, Eds. Berlin, Germany: Springer, Jan. 2000, pp. 298–372.
[4] R. Szeliski and H.-Y. Shum, "Creating full view panoramic image mosaics and environment maps," in Proc. 24th Annu. Conf. Comput. Graphics Interactive Techn., 1997, pp. 251–258.
[5] D. Capel and A. Zisserman, "Automated mosaicing with super-resolution zoom," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recog., 1998, pp. 885–891.
[6] M. Brown and D. Lowe, "Recognising panoramas," in Proc. 9th IEEE Int. Conf. Comput. Vision, 2003, vol. 2, pp. 1218–1225.
[7] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[8] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from internet photo collections," Int. J. Comput. Vision, vol. 80, no. 2, pp. 189–210, Dec. 2007.
[9] A. Davison, I. Reid, N. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052–1067, Jun. 2007.
[10] G. Klein and D. Murray, "Parallel tracking and mapping for small AR workspaces," in Proc. 6th IEEE ACM Int. Symp. Mixed Augmented Reality, 2007, pp. 225–234.
[11] I. Fleming, S. Voros, B. Vagvolgyi, Z. Pezzementi, J. Handa, R. Taylor, and G. Hager, "Intraoperative visualization of anatomical targets in retinal surgery," in Proc. IEEE Workshop Appl. Comput. Vision, 2008, pp. 1–6.
[12] T. Vercauteren, A. Perchant, G. Malandain, X. Pennec, and N. Ayache, "Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy," Med. Image Anal., vol. 10, no. 5, pp. 673–692, 2006.
[13] R. Miranda-Luna, Y. Hernandez-Mier, C. Daul, W. Blondel, and D. Wolf, "Mosaicing of medical video-endoscopic images: Data quality improvement and algorithm testing," in Proc. 1st Int. Conf. Electr. Electron. Eng., 2004, pp. 530–535.
[14] R. Miranda-Luna, C. Daul, W. C. P. M. Blondel, Y. Hernandez-Mier, D. Wolf, and F. Guillemin, "Mosaicing of bladder endoscopic image sequences: Distortion calibration and registration algorithm," IEEE Trans. Biomed. Eng., vol. 55, no. 2, pp. 541–553, Feb. 2008.
[15] Y. Hernandez-Mier, W. Blondel, C. Daul, D. Wolf, and G. Bourg-Heckly, "2-D panoramas from cystoscopic image sequences and potential application to fluorescence imaging," in Proc. 6th IFAC Symp. Modelling Control Biomed. Syst., Sep. 2006, pp. 291–296.
[16] S. Olijnyk, Y. Hernández Mier, W. C. P. M. Blondel, C. Daul, D. Wolf, and G. Bourg-Heckly, "Combination of panoramic and fluorescence endoscopic images to obtain tumor spatial distribution information useful for bladder cancer detection," in Proc. Soc. Photo-Opt. Instrum. Eng. Conf. Ser., Jul. 2007, vol. 6631, p. 29.
[17] A. Ben-Hamadou, C. Soussen, W. Blondel, C. Daul, and D. Wolf, "Comparative study of image registration techniques for bladder video-endoscopy," in Proc. Eur. Conf. Biomed. Opt., 2009, p. 737118.
[18] A. Ben-Hamadou, C. Daul, C. Soussen, A. Rekik, and W. Blondel, "A novel 3D surface construction approach: Application to three-dimensional endoscopic data," in Proc. 17th IEEE Int. Conf. Image Process., 2010, pp. 4425–4428.
[19] C. Daul, W. P. C. M. Blondel, A. Ben-Hamadou, R. Miranda-Luna, C. Soussen, D. Wolf, and F. Guillemin, "From 2D towards 3D cartography of hollow organs," in Proc. 7th Int. Electr. Eng. Comput. Sci. Automat. Control Conf., 2010, pp. 285–293.
[20] Y. Hernandez-Mier, W. Blondel, C. Daul, D. Wolf, and F. Guillemin, "Fast construction of panoramic images for cystoscopic exploration," Comput. Med. Imag. Graphics, vol. 34, no. 7, pp. 579–592, 2010.
[21] A. Behrens, "Creating panoramic images for bladder fluorescence endoscopy," Acta Polytechnica J. Adv. Eng., vol. 48, no. 3, pp. 50–54, 2008.
[22] A. Behrens, T. Stehle, S. Gross, and T. Aach, "Local and global panoramic imaging for fluorescence bladder endoscopy," in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2009, pp. 6990–6993.
[23] A. Behrens, I. Heisterklaus, Y. Müller, T. Stehle, S. Gross, and T. Aach, "2-D and 3-D visualization methods of endoscopic panoramic bladder images," in Proc. Med. Imag. 2011: Visualization, Image-Guided Procedures, Modeling, Feb. 2011, p. 796408.
[24] A. Behrens, M. Guski, T. Stehle, S. Gross, and T. Aach, "A non-linear multi-scale blending algorithm for fluorescence bladder images," Comput. Sci.-Res. Develop., vol. 26, no. 1, pp. 125–134, 2011.
[25] A. Behrens, M. Bommes, T. Stehle, S. Gross, S. Leonhardt, and T. Aach, "Real-time image composition of bladder mosaics in fluorescence endoscopy," Comput. Sci.-Res. Develop., vol. 26, nos. 1/2, pp. 51–64,
[27] T. Weibel, C. Daul, D. Wolf, R. Rösch, and F. Guillemin, "Graph based construction of textured large field of view mosaics for bladder cancer diagnosis," Pattern Recog., vol. 45, no. 12, pp. 4138–4150, 2012.
[28] T. Weibel, "Modèles de minimisation d'énergies discrètes pour la cartographie cystoscopique," Ph.D. dissertation, Univ. Lorraine, IAEM – Ecole Doctorale Informatique, Automatique, Électronique – Électrotechnique, Mathématiques, Nancy, France, Jul. 2013.
[29] T. Soper, M. Porter, and E. J. Seibel, "Surface mosaics of the bladder reconstructed from endoscopic video for automated surveillance," IEEE Trans. Biomed. Eng., vol. 59, no. 6, pp. 1670–1680, Jun. 2012.
[30] W. J. Yoon, S. Park, P. G. Reinhall, and E. J. Seibel, "Development of an automated steering mechanism for bladder urothelium surveillance," J. Med. Devices, vol. 3, no. 1, p. 11004, Mar. 2009.
[31] W. Yoon, M. Brown, P. Reinhall, S. Park, and E. Seibel, "Design and preliminary study of custom laser scanning cystoscope for automated bladder surveillance," Minimally Invasive Therapy Allied Technol., vol. 21, no. 5, pp. 320–328, 2012.
[32] M. Burkhardt, T. Soper, W. Yoon, and E. Seibel, "Controlling the trajectory of a flexible ultrathin endoscope for fully automated bladder surveillance," IEEE/ASME Trans. Mechatronics, vol. 19, no. 1, pp. 366–373, Feb. 2014.
[33] A. Can, C. V. Stewart, B. Roysam, and H. L. Tanenbaum, "A feature-based, robust, hierarchical algorithm for registering pairs of images of the curved human retina," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 347–364, Mar. 2002.
[34] D. Becker, A. Can, J. Turner, H. Tanenbaum, and B. Roysam, "Image processing algorithms for retinal montage synthesis, mapping, and real-time location determination," IEEE Trans. Biomed. Eng., vol. 45, no. 1, pp. 105–118, Jan. 1998.
[35] P. C. Cattin, H. Bay, L. V. Gool, and G. Székely, "Retina mosaicing using local features," in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2006, pp. 185–192.
[36] T. E. Choe, I. Cohen, M. Lee, and G. Medioni, "Optimal global mosaic generation from retinal images," in Proc. 18th Int. Conf. Pattern Recog., 2006, vol. 3, pp. 681–684.
[37] L. Wei, L. Huang, L. Pan, and L. Yu, "The retinal image mosaic based on invariant feature and hierarchial transformation models," in Proc. 2nd Int. Cong. Image Signal Process., 2009, pp. 1–5.
[38] S. Seshamani, W. Lau, and G. Hager, "Real-time endoscopic mosaicking," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2006, pp. 355–363.
[39] R. Richa, B. Vágvölgyi, M. Balicki, G. Hager, and R. H. Taylor, "Hybrid tracking and mosaicking for information augmentation in retinal surgery," in Proc. Med. Image Comput. Comput.-Assisted Intervention, 2012, pp. 397–404.
[40] B. Rousso, S. Peleg, I. Finci, and A. Rav-Acha, "Universal mosaicing using pipe projection," in Proc. 6th Int. Conf. Comput. Vision, 1998, pp. 945–950.
[41] E. J. Seibel, R. Carroll, J. Dominitz, R. Johnston, C. Melville, C. Lee, S. Seitz, and M. Kimmey, "Tethered capsule endoscopy, a low-cost and high-performance alternative technology for the screening of esophageal cancer and Barrett's esophagus," IEEE Trans. Biomed. Eng., vol. 55, no. 3, pp. 1032–1042, Mar. 2008.
[42] C. Yang, T. Soper, and E. Seibel, "Detecting fluorescence hot-spots using mosaic maps generated from multimodal endoscope imaging," in Proc. SPIE, Prog. Biomed. Opt. Imag., 2013, vol. 8575, p. 857508.
[43] A. O. Shar, J. C. Reynolds, and B. B. Baggott, "Computer enhanced endoscopic visualization," in Proc. Annu. Symp. Comput. Appl. Med. Care, Nov. 1990, pp. 544–546.
[44] R. Kim, B. B. Baggott, S. Rose, A. O. Shar, D. L. Mallory, S. S. Lasky, M. Kressloff, L. Y. Faccenda, and J. C. Reynolds, "Quantitative endoscopy: Precise computerized measurement of metaplastic epithelial surface area in Barrett's esophagus," Gastroenterology, vol. 108, no. 2, pp. 360–366, Feb. 1995.
[45] J. C. Reynolds, "Innovative endoscopic mapping technique of Barrett's mucosa," Am. J. Med., vol. 111, no. 8, Supplement 1, pp. 142–146, Dec. 2001.
[46] T. Igarashi, S. Zenbutsu, T. Yamanishi, and Y. Naya, "Three-dimensional image processing system for the ureter and urethra using endoscopic video," J. Endourol./Endourol. Soc., vol. 22, no. 8, pp. 1569–1572, Aug.
2011. 2008.
[26] T. Bergen, T. Wittenberg, C. Münzenmayer, C. C. G. Chen, and G. D. [47] T. Igarashi, H. Suzuki, and Y. Naya, “Computer-based endoscopic image-
Hager, “A graph-based approach for local and global panorama imaging processing technology for endourology and laparoscopic surgery,” Int.
in cystoscopy,” in Proc. SPIE, vol. 8671, p. 86711K-1, 2013. J. Urol., vol. 16, no. 6, pp. 533–543, 2009.
BERGEN AND WITTENBERG: STITCHING AND SURFACE RECONSTRUCTION FROM ENDOSCOPIC IMAGE SEQUENCES: 319
[48] T. Ishii, S. Zenbutsu, T. Nakaguchi, M. Sekine, Y. Naya, and T. Igarashi, [68] P.-L. Chang, D. Stoyanov, A. J. Davison, and P. Edwards, “Real-time
“Novel points of view for endoscopy: Panoramized intraluminal opened dense stereo reconstruction using convex optimisation with a cost-
image and 3D shape reconstruction,” J. Med. Imag. Health Informat., volume for image-guided robotic surgery,” in Proc. Med. Image Comput.
vol. 1, no. 1, pp. 13–20, Mar. 2011. Comput.-Assisted Intervention, Jan. 2013, pp. 42–49.
[49] M. Ou-Yang, W.-D. Jeng, Y.-Y. Wu, L.-R. Dung, H.-M. Wu, P.-K. Weng, [69] S. Atasoy, D. Noonan, S. Benhimane, N. Navab, and G. Yang, “A global
K.-J. Huang, and L.-J. Chiu, “Image stitching and image reconstruction approach for automatic fibroscopic video mosaicing in minimally inva-
of intestines captured using radial imaging capsule endoscope,” Opt. sive diagnosis,” in Proc. Med. Image Comput. Comput.-Assisted Inter-
Eng., vol. 51, no. 5, pp. 057004-1–057004-9, 2012. vention, 2008, pp. 850–857.
[50] S. Yi, J. Xie, P. Mui, and J. A. Leighton, “Achieving real-time capsule [70] M. Hu, D. Hawkes, G. Penney, D. Rueckert, P. Edwards, F. Bello, M.
endoscopy (CE) video visualization through panoramic imaging,” in Figl, and R. Casula, “A robust mosaicing method for robotic assisted
Proc. IS&T/SPIE Electron. Imag., 2013, p. 86560I. minimally invasive surgery,” in Proc. 7th Int. Conf. Informat. Control,
[51] D. K. Iakovidis, E. Spyrou, and D. Diamantis, “Efficient homography- Autom. Robot., Funchal, Portugal, 2010, vol. 2, pp. 206–211.
based video visualization for wireless capsule endoscopy,” in Proc. 2013 [71] M. Hu, G. Penney, P. Edwards, M. Figl, and D. J. Hawkes, “3D re-
IEEE 13th Int. Conf. Bioinformat. Bioeng., 2013, pp. 1–4. construction of internal organ surfaces for minimal invasive surgery,” in
[52] E. Spyrou, D. Diamantis, and D. Iakovidis, “Panoramic visual summaries Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2007,
for efficient reading of capsule endoscopy videos,” in Proc. 8th Int. pp. 68–77.
Workshop Semantic Soc. Media Adaptation Personalization, Dec. 2013, [72] M. Hu, G. Penney, M. Figl, P. Edwards, F. Bello, R. Casula, D. Rueckert,
pp. 41–46. and D. Hawkes, “Reconstruction of a 3D surface from video that is robust
[53] L. Maier-Hein, P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. to missing data and outliers: Application to minimally invasive surgery
Groch, A. Kolb, M. Rodrigues, J. Sorger, S. Speidel, and D. Stoyanov, using stereo and mono endoscopes,” Med. Image Anal., vol. 16, no. 3,
“Optical techniques for 3D surface reconstruction in computer-assisted pp. 597–611, 2012.
laparoscopic surgery,” Med. Image Anal., vol. 17, no. 8, pp. 974–996, [73] A. Malti, A. Bartoli, and T. Collins, “Template-based conformal shape-
2013. from-motion-and-shading for laparoscopy,” in Information Processing in
[54] M. Lerotic, A. J. Chung, J. Clark, S. Valibeik, and G.-Z. Yang, “Dynamic Computer-Assisted Interventions. New York, NY, USA: Springer, 2012,
view expansion for enhanced navigation in natural orifice transluminal pp. 1–10.
endoscopic surgery,” in Proc. Med. Image Comput. Comput.-Assisted [74] A. Bartoli, Y. Gerard, F. Chadebecq, and T. Collins, “On template-based
Intervention, 2008, pp. 467–475. reconstruction from a single view: Analytical solutions and proofs of
[55] P. Mountney and G. Yang, “Dynamic view expansion for minimally well-posedness for developable, isometric and conformal surfaces,” in
invasive surgery using simultaneous localization and mapping,” in Proc. Proc. IEEE Conf. Comput. Vision Pattern Recog., Jun. 2012, pp. 2026–
Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 2009, pp. 1184–1187. 2033.
[56] D. Stoyanov, M. V. Scarzanella, P. Pratt, and G.-Z. Yang, “ Real-time [75] S. Giannarou and G.-Z. Yang, “Tissue deformation recovery with Gaus-
stereo reconstruction in robotically assisted minimally invasive surgery,” sian mixture model based structure from motion,” in Augmented En-
in Proc. Med. Image Comput. Comput.-Assisted Intervention, Jan. 2010, vironments for Computer-Assisted Interventions (ser. Lecture Notes in
pp. 275–282. Computer Science), C. A. Linte, J. T. Moore, E. C. S. Chen, and D. R.
[57] J. Totz, K. Fujii, P. Mountney, and G.-Z. Yang, “ Enhanced visualisation H. III, Eds., Berlin, Germany: Springer, Jan. 2012, pp. 47–57.
for minimally invasive surgery,” Int. J. Comput. Assist Radiol. Surg., vol. [76] S. Giannarou, Z. Zhang, and G.-Z. Yang, “Deformable structure from
7, no. 3, pp. 423–432, Jun. 2011. motion by fusing visual and inertial measurement data,” in Proc. 2012
[58] A. Warren, P. Mountney, D. Noonan, and G.-Z. Yang, “ Horizon IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2012, pp. 4816–4821.
stabilized-dynamic view expansion for robotic assisted surgery (HS- [77] M. Schuster, T. Bergen, M. Reiter, C. Münzenmayer, S. Friedl, and T.
DVE),” Int. J. Comput. Assist Radiol. Surg., vol. 7, no. 2, pp. 281–288, Wittenberg, “Laryngoscopic image stitching for view enhancement and
Jun. 2011. documentation—first experiences,” Biomedizinische Technik. Biomed.
[59] O. G. Grasa, J. Civera, A. Guemes, V. Munoz, and J. M. M. Montiel, Eng., vol. 57, no. 1, pp. 704–707, Aug. 2012.
“EKF monocular SLAM 3D modeling, measuring and augmented reality [78] D. Burschka and G. Hager, “V-GPS(SLAM): vision-based inertial system
from endoscope image sequences,” in Proc. 5th Workshop Augmented for mobile robots,” in Proc. IEEE Int. Conf. Robot. Autom., 2004, vol. 1,
Environ. Med. Imag. including Augmented Reality Comput.-Aided Surg., pp. 409–415.
Held Conjunction MICCAI’09, 2009, pp. 102–109. [79] D. Burschka, M. Li, M. Ishii, R. H. Taylor, and G. D. Hager, “Scale-
[60] O. G. Grasa, J. Civera, and J. M. M. Montiel, “EKF monocular SLAM invariant registration of monocular endoscopic images to CT-scans for
with relocalization for laparoscopic sequences,” in Proc. IEEE Int. Robot. sinus surgery,” Med. Image Anal., vol. 9, no. 5, pp. 413–426, Oct. 2005.
Autom. Conf., 2011, pp. 4816–4821. [80] T. Wittenberg, C. Winter, I. Scholz, S. Rupp, M. Stamminger, K.
[61] J. Civera, A. Davison, and J. Montiel, “Dimensionless monocular Bumm, and C. Nimsky, “3-D reconstruction of the sphenoid sinus from
SLAM,” Pattern Recog. Image Anal., vol. 4478, pp. 412–419, 2007. monocular endoscopic views: First results,” Gemeinsame Jahrestagung
[62] J. Civera, A. Davison, and J. Montiel, “Inverse depth parametrization for der Deutschen, Österreichischen und Schweizerischen Gesellschaften für
monocular SLAM,” IEEE Trans. Robot., vol. 24, no. 5, pp. 932–945, Biomedizinische Technik, DGBMT, Zürich, Schweiz, 2006.
Oct. 2008. [81] W. Konen, S. Tombrock, and M. Scholz, “Robust registration procedures
[63] J. Civera, O. G. Grasa, A. J. Davison, and J. M. M. Montiel, “1-point for endoscopic imaging,” Med. Image Anal., vol. 11, no. 6, pp. 526–539,
RANSAC for EKF filtering: Application to real-time structure from mo- Dec. 2007.
tion and visual odometry,” J. Field Robot., vol. 27, no. 5, pp. 609–631, [82] C. Winne, M. Khan, F. Stopp, E. Jank, and E. Keeve, “Overlay visualiza-
Oct. 2010. tion in endoscopic ENT surgery,” Int. J. Comput. Assisted Radiol. Surg.,
[64] H. Bouma, W. Van Der Mark, P. Eendebak, S. Landsmeer, A. Van Eek- vol. 6, no. 3, pp. 401–406, May 2011.
eren, F. Ter Haar, F. Wieringa, and J.-P. Van Basten, “Streaming video- [83] F. Schulze, K. Bühler, A. Neubauer, A. Kanitsar, L. Holton, and S. Wolfs-
based 3D reconstruction method compatible with existing monoscopic berger, “Intra-operative virtual endoscopy for image guided endonasal
and stereoscopic endoscopy systems,” in Proc. SPIE - Int. Soc. Opt. Eng., transsphenoidal pituitary surgery,” Int. J. Comput. Assisted Radiol. Surg.,
Baltimore, MD, USA, 2012, vol. 8371, p. 837112. vol. 5, no. 2, pp. 143–154, Mar. 2010.
[65] J. Totz, P. Mountney, D. Stoyanov, and G.-Z. Yang, “Dense surface [84] M. J. Daly, H. Chan, E. Prisman, A. Vescan, S. Nithiananthan, J. Qiu,
reconstruction for enhanced navigation in MIS,” in Proc. Med. Image R. Weersink, J. C. Irish, and J. H. Siewerdsen, “Fusion of intraoperative
Comput. Comput.-Assisted Intervention, Jan. 2011, no. 6891, pp. 89–96. cone-beam CT and endoscopic video for image-guided procedures,”
[66] S. Röhl, S. Bodenstedt, S. Suwelack, H. Kenngott, B. P. Müller-Stich, R. Proc. SPIE, vol. 7625, pp. 762503-1–762503-8, 2010.
Dillmann, and S. Speidel, “Dense GPU-enhanced surface reconstruction [85] R. Shahidi, M. Bax, J. Maurer, C.R., J. Johnson, E. Wilkinson, B. Wang,
from stereo endoscopic images for intraoperative registration,” Med. J. West, M. Citardi, K. Manwaring, and R. Khadem, “Implementa-
Phys., vol. 39, no. 3, pp. 1632–1645, Mar. 2012. tion, calibration and accuracy testing of an image-enhanced endoscopy
[67] S. Bernhardt, J. Abi-Nahed, and R. Abugharbieh, “Robust dense endo- system,” IEEE Trans. Med. Imag., vol. 21, no. 12, pp. 1524–1535,
scopic stereo reconstruction for minimally invasive surgery,” in Medical Dec. 2002.
Computer Vision. Recognition Techniques and Applications in Medical [86] R. Lapeer, M. Chen, G. Gonzalez, A. Linney, and G. Alusi, “Image-
Imaging (ser. Lecture Notes in Computer Science), B. H. Menze, G. enhanced surgical navigation for endoscopic sinus surgery: Evaluating
Langs, L. Lu, A. Montillo, Z. Tu, and A. Criminisi, Eds. Berlin, Ger- calibration, registration and tracking,” Int. J. Med. Robotics Comput.
many: Springer, Jan. 2013, pp. 254–262. Assisted Surg., vol. 4, no. 1, pp. 32–45, 2008.
320 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 20, NO. 1, JANUARY 2016
[87] D. Mirota, H. Wang, R. H. Taylor, M. Ishii, and G. D. Hager, “Toward [109] R. Carroll and S. Seitz, “Rectified surface mosaics,” Int. J. Comput.
video-based navigation for endoscopic endonasal skull base surgery,” Vision, vol. 85, no. 3, pp. 307–315, 2009.
Med. Image Comput. Comput. Assisted Intervention, vol. 12, no. Pt 1, [110] R. Tsai, “A versatile camera calibration technique for high-accuracy 3D
pp. 91–99, 2009. machine vision metrology using off-the-shelf TV cameras and lenses,”
[88] D. Mirota, H. Wang, R. Taylor, M. Ishii, G. Gallia, and G. Hager, IEEE J. Robot. Autom., vol. RA-3, no. 4, pp. 323–344, Aug. 1987.
“A system for video-based navigation for endoscopic endonasal skull [111] J. Heikkila and O. Silven, “A four-step camera calibration procedure with
base surgery,” IEEE Trans. Med. Imag., vol. 31, no. 4, pp. 963–976, implicit image correction,” in Proc. IEEE Comput. Soc. Conf. Comput.
Apr. 2012. Vision Pattern Recog., 1997, pp. 1106–1112.
[89] W. Konen, M. Naderi, and M. Scholz, “Endoscopic image mosaics [112] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans.
for real-time color video sequences,” Comput.-Assisted Radiol. Surg., Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, Nov. 2000.
vol. 2, no. 1, pp. S224–S225, 2007. [113] R. Hartley and S. B. Kang, “Parameter-free radial distortion correction
[90] M. Kourogi, T. Kurata, J. Hoshino, and Y. Muraoka, “Real-time image with center of distortion estimation,” IEEE Trans. Pattern Anal. Mach.
mosaicing from a video sequence,” in Proc. Int. Conf. Image Process., Intell., vol. 29, no. 8, pp. 1309–1321, Aug. 2007.
1999, vol. 4, pp. 133–137. [114] C. Zhang, J. Helferty, G. McLennan, and W. Higgins, “Nonlinear distor-
[91] T. Bergen, P. Hastreiter, C. Münzenmayer, M. Buchfelder, and T. Wit- tion correction in endoscopic video images,” in Proc. Int. Conf. Image
tenberg, “Image stitching of sphenoid sinuses from monocular endo- Process., 2000, vol. 2, pp. 439–442.
scopic views,” in Proc. Tagungsband, 12. Jahrestagung der Deutschen [115] C. Wengert, M. Reeff, P. C. Cattin, and G. Székely, “Fully automatic
Gesellschaft für Computer- und Roboterassistierte Chirurgie, Nov. 2013, endoscope calibration for intraoperative use,” in Proc. Bildverarbeitung
pp. 226–229. für die Medizin, Mar. 2006, pp. 419–423.
[92] T. Thormaehlen, H. Broszio, and P. N. Meier, Three-Dimensional En- [116] T. Stehle, D. Truhn, T. Aach, C. Trautwein, and J. Tischendorf, “Camera
doscopy. Norwell, MA: USA: Kluwer, 2002. calibration for fish-eye lenses in endoscopy with an application to 3D
[93] D. Koppel, C.-I. Chen, Y.-F. Wang, H. Lee, J. Gu, A. Poirson, and R. reconstruction,” in Proc. 4th IEEE Int. Symp. Biomed. Imag.: From Nano
Wolters, “Toward automated model building from video in computer- to Macro, 2007, pp. 1176–1179.
assisted diagnoses in colonoscopy,” in Proc. SPIE, vol. 6509, pp. 65091L- [117] J. Kannala and S. Brandt, “A generic camera model and calibration
1–65091L-9, 2007. method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans.
[94] C.-I. Chen, D. Sargent, and Y.-F. Wang, “Modeling tumor/polyp/lesion Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1335–1340, Aug. 2006.
structure in 3D for computer-aided diagnosis in colonoscopy,” in Proc. [118] W. Li, S. Nie, M. Soto-Thompson, C.-I. Chen, and Y. I. A-Rahim, “Robust
SPIE, vol. 7625, p. 76252F-1, 2010. distortion correction of endoscope,” Proc. SPIE, vol. 6918, p. 691812,
[95] A. Kaufman and J. Wang, “3D surface reconstruction from endoscopic 2008.
videos,” in Visualization in Medicine and Life Sciences (ser. Mathematics [119] J. Barreto, J. Roquette, P. Sturm, and F. Fonseca, “Automatic camera
and Visualization), L. Linsen, H. Hagen, and B. Hamann, Eds. Berlin, calibration applied to medical endoscopy,” presented at the 20th British
Germany: Springer, Jan. 2008, pp. 61–74. Mach. Vision Conf., London, U.K., 2009.
[96] D. Hong, W. Tavanapong, J. Wong, J. Oh, and P. Groen, “3D reconstruc- [120] R. Melo, J. Barreto, and G. Falcao, “A new solution for camera calibra-
tion of colon segments from colonoscopy images,” in Proc. 9th IEEE tion and real-time image distortion correction in medical endoscopy—
Int. Conf. Bioinformat. BioEng., 2009, pp. 53–60. Initial technical evaluation,” IEEE Trans. Biomed. Eng., vol. 59, no. 3,
[97] Y. Fan, M.-H. Meng, and B. Li, “3D reconstruction of wireless capsule pp. 634–644, Mar. 2012.
endoscopy images,” in Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., [121] M. Lourenço, J. P. Barreto, F. Fonseca, H. Ferreira, R. M. Duarte, and J.
Aug. 2010, pp. 5149–5152. Correia-Pinto, “Continuous zoom calibration by tracking salient points
[98] B. Sun, L. Liu, C. Hu, and M.-H. Meng, “3D reconstruction based on in endoscopic video,” in Proc. Med. Image Comput. Comput.-Assisted
capsule endoscopy image sequences,” in Proc. Int. Conf. Audio Language Intervention, Jan. 2014, pp. 456–463.
Image Process., Nov. 2010, pp. 607–612. [122] P. Pratt, C. Bergeles, A. Darzi, and G.-Z. Yang, “Practical intraopera-
[99] Q. Zhao and M.-H. Meng, “3D reconstruction of GI tract texture sur- tive stereo camera calibration,” in Proc. Med. Image Comput. Comput.-
face using capsule endoscopy images,” in Proc. IEEE Int. Conf. Autom. Assisted Intervention, Jan. 2014, pp. 667–675.
Logistics, Aug. 2012, pp. 277–282. [123] D. Stoyanov, A. Darzi, and G.-Z. Yang, “Laparoscope self-calibration
[100] G. Ciuti, M. Visentini-Scarzanella, A. Dore, A. Menciassi, P. Dario, and for robotic assisted minimally invasive surgery,” in Proc. Med. Image
G.-Z. Yang, “Intra-operative monocular 3D reconstruction for image- Comput. Comput.-Assisted Intervention, Jan. 2005, pp. 114–121.
guided navigation in active locomotion capsule endoscopy,” in Proc. 4th [124] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”
IEEE RAS EMBS Int. Conf. Biomed. Robot. Biomechatron., Jun. 2012, Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, Nov. 2004.
pp. 768–774. [125] T. Bergen, S. Nowack, C. Münzenmayer, and T. Wittenberg, “A hybrid
[101] A. Karargyris and N. Bourbakis, “Three-dimensional reconstruction of tracking approach for endoscopic real-time panorama imaging,” Int. J.
the digestive wall in capsule endoscopy videos using elastic video in- CARS, vol. 8, Suppl. 1, pp. 352–354, Jun. 2013.
terpolation,” IEEE Trans. Med. Imag., vol. 30, no. 4, pp. 957–971, [126] J. Shi, and C. Tomasi, “Good features to track,” in Proc. IEEE Comput.
Apr. 2011. Soc. Conf. Comput. Vision Pattern Recog., 1994, pp. 593–600.
[102] V. Prasath, I. Figueiredo, P. Figueiredo, and K. Palaniappan, “Mucosal [127] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust
region detection and 3D reconstruction in wireless capsule endoscopy features SURF,” Comput. Vision Image Understanding, vol. 110, no. 3,
videos using active contours,” in Proc. 2012 Annu. Int. Conf. IEEE Eng. pp. 346–359, 2008.
Med. Biol. Soc., Aug. 2012, pp. 4014–4017. [128] A. S. Vemuri, K.-C. Liu, Y. Ho, H.-S. Wu, and M.-C. Ku, “Endoscopic
[103] T. Vercauteren, “Image registration and mosaicing for dynamic in vivo video mosaicing: Application to surgery and diagnostics,” in Proc. Living
fibered confocal microscopy,” Ph.D. dissertation, Comput. Sci, Ecole Imag. Workshop, 2011, pp. 1–2.
Nat. Superieure des Mines de Paris, Paris, France, 2008. [129] M. Reeff, F. Gerhard, P. Cattin, and G. Székely, “Mosaicing of en-
[104] T. Vercauteren, A. Meining, F. Lacombe, and A. Perchant, “Real time doscopic placenta images,” GI Jahrestagung, vol. 2006, pp. 467–474,
autonomous video image registration for endomicroscopy: Fighting the 2006.
compromises,” vol. 6861, p. 68610C, Feb. 2008. [130] E. Rosten and T. Drummond, “Machine learning for high-speed corner
[105] N. Bedard, T. Quang, K. Schmeler, R. Richards-Kortum, and T. S. detection,” in Proc. Eur. Conf. Comput. Vision, Jan. 2006, pp. 430–443.
Tkaczyk, “Real-time video mosaicing with a high-resolution microen- [131] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger, “Adap-
doscope,” Biomed. Opt. Exp., vol. 3, no. 10, pp. 2428–2435, Sep. 2012. tive and generic corner detection based on the accelerated segment test,”
[106] K. Loewke, D. Camarillo, C. Jobst, and J. Salisbury, “Real-time image in Proc. Eur. Conf. Comput. Vision, Jan. 2010, pp. 183–196.
mosaicing for medical applications,” Stud. Health Technol. Informat., [132] M. Agrawal, K. Konolige, and M. R. Blas, “CenSurE: Center surround
vol. 125, pp. 304–309, 2007. extremas for realtime feature detection and matching,” in Proc. Eur.
[107] K. Loewke, D. Camarillo, W. Piyawattanametha, M. Mandella, C. Con- Comput. Vision, Jan. 2008, pp. 102–115.
tag, S. Thrun, and J. Salisbury, “In vivo micro-image mosaicing,” IEEE [133] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “BRIEF: Binary robust
Trans. Biomed. Eng., vol. 58, no. 1, pp. 159–171, Jan. 2011. independent elementary features,” in Proc. Eur. Conf. Comput. Vision,
[108] T. Soper, J. Chandler, M. Porter, and E. Seibel, “Constructing spherical Jan. 2010, pp. 778–792.
panoramas of a bladder phantom from endoscopic video using bundle [134] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: An efficient
adjustment,” in Proc. SPIE: Prog. Biomed. Opt. Imag., Lake Buena Vista, alternative to SIFT or SURF,” in Proc. IEEE Int. Conf. Comput. Vision,
FL, USA, 2011, vol. 7964, p. 796417. 2011, pp. 2564–2571.
BERGEN AND WITTENBERG: STITCHING AND SURFACE RECONSTRUCTION FROM ENDOSCOPIC IMAGE SEQUENCES: 321
[135] A. Alahi, R. Ortiz, and P. Vandergheynst, “FREAK: Fast retina keypoint,” [157] P. Mountney, D. Stoyanov, and G. Yang, “Three-dimensional tissue de-
in Proc. IEEE Conf. Comput. Vision Pattern Recog., 2012, pp. 510–517. formation recovery and tracking,” IEEE Signal Process. Mag., vol. 27,
[136] S. Leutenegger, M. Chli, and R. Siegwart, “BRISK: Binary robust invari- no. 4, pp. 14–24, Jul. 2010.
ant scalable keypoints,” in Proc. IEEE Int. Conf. Comput. Vision, 2011, [158] A. M. C. González, P. Sánchez-González, F. M. Sánchez-Margallo, I.
pp. 2548–2555. Oropesa, F. d. Pozo, and E. J. Gómez, “Video-endoscopic image analysis
[137] F. Zilly, C. Riechert, P. Eisert, and P. Kauff, “Semantic kernels for 3D reconstruction of the surgical scene,” in Proc. 4th Eur. Conf. Int.
binarized—A feature descriptor for fast and robust matching,” in Proc. Federation Med. Biol. Eng., Jan. 2009, pp. 923–926.
Conf. Visual Media Prod., 2011, pp. 39–48. [159] C. Wu, S. G. Narasimhan, and B. Jaramaz, “A multi-image shape-from-
[138] P. Mountney, and G. Yang, “Soft tissue tracking for minimally invasive shading framework for near-lighting perspective endoscopes,” Int. J.
surgery learning local deformation online,” in Proc. Med. Image Comput. Comput. Vision, vol. 86, nos. 2/3, pp. 211–228, Jan. 2010.
Comput.-Assisted Intervention, 2008, pp. 364–372. [160] M. Visentini-Scarzanella, D. Stoyanov, and G.-Z. Yang, “Metric depth
[139] P. Mountney and G.-Z. Yang, “Context specific descriptors for tracking recovery from monocular images using shape-from-shading and specu-
deforming tissue,” Med. Image Anal., vol. 16, pp. 550–561, Apr. 2012. larities,” in Proc. 19th IEEE Int. Conf. Image Process., Sep. 2012, pp.
[140] V. Lepetit and P. Fua, “Keypoint recognition using randomized trees,” 25–28.
IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1465–1479, [161] T. Porter and T. Duff, “Compositing digital images,” in Proc. 11th Annu.
Sep. 2006. Conf. Comput. Graphics Interactive Tech., 1984, pp. 253–259.
[141] P. Mountney, B. Lo, S. Thiemjarus, D. Stoyanov, and G. Zhong-Yang, [162] P. J. Burt and E. H. Adelson, “A multiresolution spline with application
“A probabilistic framework for tracking deformable soft tissue in mini- to image mosaics,” ACM Trans. Graph., vol. 2, no. 4, pp. 217–236,
mally invasive surgery,” in Proc. Med. Image Comput. Comput.-Assisted Oct. 1983.
Intervention, 2007, pp. 34–41. [163] J. Davis, “Mosaics of scenes with moving objects,” in Proc. IEEE Com-
[142] S. Giannarou, M. Visentini-Scarzanella, and G.-Z. Yang, “Probabilistic put. Soc. Conf. Comput. Vision Pattern Recog., Jun. 1998, pp. 354–360.
tracking of affine-invariant anisotropic regions,” IEEE Trans. Pattern [164] T. Weibel, C. Daul, D. Wolf, and R. Rosch, “Contrast-enhancing seam
Anal. Mach. Intell., vol. 35, no. 1, pp. 130–143, Jan. 2013. detection and blending using graph cuts,” in Proc. 21st Int. Conf. Pattern
[143] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm Recog., 2012, pp. 2732–2735.
for model fitting with applications to image analysis and automated [165] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, “Graphcut tex-
cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, Jun. 1981. tures: Image and video synthesis using graph cuts,” in Proc. ACM SIG-
[144] P. Torr, and A. Zisserman, “MLESAC: A new robust estimator with GRAPH 2003 Papers, 2003, pp. 277–286.
application to estimating image geometry,” Comput. Vision Image Un- [166] A. Behrens, M. Bommes, T. Stehle, S. Gross, S. Leonhardt, and T.
derstanding, vol. 78, no. 1, pp. 138–156, Apr. 2000. Aach, “A multi-threaded mosaicking algorithm for fast image composi-
[145] L. G. Brown, “A survey of image registration techniques,” ACM Comput. tion of fluorescence bladder images,” in Proc. SPIE Med. Imag., 2010,
Surv., vol. 24, no. 4, pp. 325–376, Dec. 1992. p. 76252S.
[146] A. A. Goshtasby, 2-D and 3-D Image Registration: for Medical, Remote [167] T. Bergen, S. Ruthotto, C. Münzenmayer, S. Rupp, D. Paulus, and C.
Sensing, and Industrial Applications. New York, NY, USA: Wiley, Apr. Winter, “Feature-based real-time endoscopic mosaicking,” in Proc. 6th
2005. Int. Symp. Image Signal Process. Anal., 2009, pp. 695–700.
[147] B. Zitová and J. Flusser, “Image registration methods: A survey,” Image [168] A. Behrens and H. Röllinger, “Analysis of feature point distributions
Vision Comput., vol. 21, no. 11, pp. 977–1000, Oct. 2003. for fast image mosaicking algorithms,” Acta Polytechnica J. Adv. Eng.,
[148] B. Williams, M. Cummins, J. Neira, P. Newman, I. Reid, and J. Tardós, vol. 50, no. 4, pp. 12–18, Aug. 2010.
“A comparison of loop closing techniques in monocular SLAM,” Robot. [169] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, and R. Szeliski,
Auton. Syst., vol. 57, no. 12, pp. 1188–1197, 2009. “A database and evaluation methodology for optical flow,” Int. J. Comput.
[149] S. Seshamani, M. D. Smith, J. J. Corso, M. O. Filipovich, A. Natarajan, Vision, vol. 92, no. 1, pp. 1–31, Mar. 2011.
and G. D. Hager, “Direct global adjustment methods for endoscopic [170] A. Behrens, M. Bommes, S. Gross, and T. Aach, “Image quality assess-
mosaicking,” in Proc. SPIE, 2009, p. 72611D. ment of endoscopic panorama images,” in Proc. 18th Int. Conf. Image
[150] P. Mountney, D. Stoyanov, A. Davison, and G. Yang, “Simultaneous Process., 2011, pp. 3113–3116.
stereoscope localization and soft-tissue mapping for minimal invasive [171] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assess-
surgery,” in Proc. Med. Image Comput. Comput.-Assisted Intervention, ment: from error visibility to structural similarity,” IEEE Trans. Image
2006, pp. 347–354. Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[151] C. Stewart, C.-L. Tsai, and B. Roysam, “The dual-bootstrap iterative [172] C. Li, X. Yang, B. Chu, W. Lu, and L. Pang, “A new image fusion quality
closest point algorithm with application to retinal image registration,” assessment method based on contourlet and SSIM,” in Proc. 2010 3rd
IEEE Trans. Med. Imag., vol. 22, no. 11, pp. 1379–1394, Nov. 2003. IEEE Int. Conf. Comput. Sci. Inf. Technol., 2010, vol. 5, pp. 246–249.
[152] F. Mourgues, F. Devemay, and E. Coste-Maniere, “3D reconstruction of [173] J. C. Mankins, “Technology readiness levels,” White Paper, vol. 6,
the operating field for image overlay in 3D-endoscopic surgery,” in Proc. Apr. 1995.
IEEE ACM Int. Symp. Augmented Reality, 2001, pp. 191–192. [174] A. Albu, Vision-Based User Interfaces for Health Applications: A Survey
[153] H. Wang, D. Mirota, M. Ishii, and G. Hager, “Robust motion estimation (ser. Lecture Notes in Computer Science, including subseries Lecture
and structure recovery from endoscopic image sequences with an adap- Notes in Artificial Intelligence and Lecture Notes in Bioinformatics),
tive scale kernel consensus estimator,” in Proc. 26th IEEE Conf. Comput. vol. 4291. New York, NY, USA: Springer, 2006.
Vision Pattern Recog., 2008, pp. 1–7. [175] L. C. Ebert, G. Hatch, G. Ampanozi, M. J. Thali, and S. Ross, “You can’t
[154] H. Wang, D. Mirota, G. Hager, and M. Ishii, “Anatomical reconstruction touch this touch-free navigation through radiological images,” Surgical
from endoscopic images: toward quantitative endoscopy.” Am. J. Rhinol., Innovation, vol. 19, no. 3, pp. 301–307, Sep. 2012.
vol. 22, no. 1, pp. 47–51, 2008. [176] M. G. Jacob and J. P. Wachs, “Context-based hand gesture recognition
[155] O. Grasa, E. Bernal, S. Casado, I. Gil, and J. Montiel, “Visual SLAM for for the operating room,” Pattern Recog. Lett., vol. 36, pp. 196–203, Jan.
handheld monocular endoscope,” IEEE Trans. Med. Imag., vol. 33, no. 2014.
1, pp. 135–146, Jan. 2014.
[156] P. Mountney and G. Yang, “Motion compensated SLAM for image
guided surgery,” in Proc. Med. Image Comput. Comput.-Assisted In-
tervention, 2010, pp. 496–504. Authors’ photographs and biographies not available at the time of publication.