

Computer Vision and Image Understanding 166 (2018) 81–87

An efficient solution to the perspective-three-point pose problem

Ping Wang^a,⁎, Guili Xu^a, Zhengsheng Wang^b, Yuehua Cheng^c

a College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
b College of Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
c College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Keywords: Perspective-three-point problem (P3P); Absolute position and attitude; Pose estimation; Monocular vision; Computer vision
MSC: 41A05; 41A10; 65D05; 65D17

ABSTRACT

In this paper, we present a new algebraic method to solve the perspective-three-point (P3P) problem, which directly computes the rotation and position of a camera in the world frame without the intermediate derivation of the points in the camera frame. Unlike other online methods, the proposed method uses an "object" coordinate frame, in which the known 3D coordinates are sparse, facilitating formulations of the P3P problem. Additionally, two auxiliary variables are introduced to parameterize the rotation and position matrix, and a closed-form solution for the camera pose is then obtained from subsequent substitutions. This algebraic approach makes the derivation easier to follow and significantly improves performance. Experimental results demonstrate that our method offers accuracy and precision comparable to existing state-of-the-art methods but with better computational efficiency.

1. Introduction

Determining the position and orientation of a camera given its intrinsic parameters and a set of three correspondences between 3D points and their 2D projections is known as the perspective-three-point (P3P) problem (Fischler and Bolles, 1981). Strategies to address the P3P problem have widespread applications in augmented reality (Azuma et al., 2001; 1997), structure from motion (Dani et al., 2012; Ryan et al., 2015), pose recovery (Kangni and Laganiere, 2007), pose tracking (Li et al., 2015), and visual simultaneous localization and mapping (SLAM) (Engelhard et al., 2011; Lategahn et al., 2011; Visual, 2015).

The P3P problem was first solved in 1841 by Grunert (1841) and then refined in 1903 by Finsterwalder and Scheufele (1903). In 1991, Haralick et al. (1991) reviewed the major direct solutions, including six algorithms described by Grunert (1841), Finsterwalder and Scheufele (1903), Merritt (1949), Fischler and Bolles (1981), Linnainmaa et al. (1988) and Grafarend et al. (1989), respectively. Haralick et al. (1991) also gave analytical solutions for the P3P problem with the necessary computation. Dementhon and Davis (1995) showed a simpler computational solution for the P3P problem by using approximations of the perspective. Quan and Lan (1999) reviewed the 3-point algebraic method and proposed a linear algorithm to handle the problem with more than 4 points. Gao et al. (2003) used two approaches to solve the P3P problem and provided the first complete analytical solution to the P3P problem. Nistér and Stewénius (2007) proposed a different approach that works for the generalized camera model (Chen and Chang, 2004). In 2011, Li and Xu (2011) proposed a perspective similar triangle (PST) method for the P3P problem; the strategy of the PST method is to reduce the number of unknown parameters by constructing a perspective similar triangle. The same year, Kneip et al. (2011) proposed a novel closed-form solution to the P3P problem, which computes the camera pose directly in a single stage, without the intermediate derivation of the points in the camera frame. In 2013, Ma et al. (2013) proposed a simple method for the P3P problem, relying on the fact that the inverse of a rotation matrix equals its transpose.

All existing P3P methods cited above can be classified as single-stage methods and two-stage methods. Currently, Kneip's (Kneip et al., 2011) method is the only single-stage solver, which directly computes the position and orientation of the camera in the world frame as a function of the 2D image coordinates and the 3D reference points. The other methods first calculate the distances between the camera center and the 3D reference points, and then determine the orientation and position by singular value decomposition (SVD) (Arun et al., 1987; Horn et al., 1988; Umeyama, 1991). Li's PST method (Li and Xu, 2011) may be the best version of the two-stage methods, offering high numerical stability and accuracy; it is not a direct solution, however, because of the need to construct a perspective similar triangle. In contrast to two-stage methods, the main advantage of single-stage


⁎ Corresponding author.
E-mail addresses: pingwangsky@gmail.com (P. Wang), guilixu@nuaa.edu.cn (G. Xu).

https://doi.org/10.1016/j.cviu.2017.10.005
Received 1 November 2016; Received in revised form 13 July 2017; Accepted 15 October 2017; Available online 31 October 2017
1077-3142/ © 2017 Elsevier Inc. All rights reserved.
methods is superior computational efficiency. Practical applications such as feature-based camera tracking (Lepetit and Fua, 2006; Skrypnyk and Lowe, 2004), visual SLAM (Engelhard et al., 2011; Lategahn et al., 2011; Visual, 2015), and visual odometry (Nistér et al., 2004; 2006) usually require dealing with huge numbers of noisy feature points and outliers in real time, which calls for computationally efficient P3P solvers. Kneip's (Kneip et al., 2011) method, as the only single-stage method, is particularly suitable for these applications. However, Kneip et al. (2011) solve the P3P problem using a geometric approach, involving complex geometric constructions such as coordinate transformations, direction vector computations, and semi-plane definitions. All these transformations reduce overall performance and limit general understanding.

[Fig. 1. Illustration of the P3P problem.]

In this paper, we propose a new single-stage method to address the P3P problem with high numerical stability and computational efficiency. The advantages of our method are summarized as follows:

— Instead of the traditional geometric approach, an algebraic approach is used to solve the P3P problem. The proposed method relies on the definition of an "object" coordinate frame, with sparse known 3D coordinates, facilitating derivations of the P3P problem. Based on the perspective camera model, two auxiliary variables are introduced to parameterize the rotation and position matrix, and a closed-form solution for the camera pose is obtained by subsequent substitutions of the auxiliary variables. All derivations are based on the algebraic approach, which simplifies understanding of the process and improves overall performance.

— The proposed method directly computes the position and orientation of the camera in the world frame. The inputs of our method are the image coordinates and the world coordinates of the reference points, and the direct outputs are the position and orientation of the camera, without the intermediate derivation of the reference points in the camera frame.

— The proposed method offers accuracy and precision comparable to existing state-of-the-art methods but with significantly lower computational cost. Experimental results have demonstrated that the running speed of our method is about 30% higher than that of Kneip's (Kneip et al., 2011) method. This superior computational efficiency makes it particularly suitable for real applications.

[Fig. 2. Illustration of the object frame.]

The rest of the paper is organized as follows. Section 2 describes the derivations of the proposed method. Section 3 provides a thorough analysis of the proposed method by simulated experiments. Section 4 shows the real-image tests. Section 5, finally, summarizes the work.

2. The proposed method

2.1. Problem statement

The problem considered in this paper is illustrated in Fig. 1. The three known 3D reference points are $W_1$, $W_2$, and $W_3$, and their corresponding 2D projections are $p_1$, $p_2$, and $p_3$. The purpose is to recover the rotation $R_{wc}$ and translation $T_{wc}$ of a calibrated camera. The normalized vectors $\vec{f}_1$, $\vec{f}_2$ and $\vec{f}_3$ can be easily obtained from $p_1$, $p_2$, and $p_3$, and are used to facilitate subsequent derivations.

2.2. Building an object frame

The first step involves the definition of an intermediate frame, called the "object frame", from the 3D reference points. As shown in Fig. 2, we choose $W_1$ as the origin and build an object frame $[P_1;\ \vec{u}_x, \vec{u}_y, \vec{u}_z]$, where

$$\vec{u}_z = \frac{\overrightarrow{W_1 W_2}}{\|\overrightarrow{W_1 W_2}\|}, \qquad \vec{u}_y = \frac{\vec{u}_z \times \overrightarrow{W_1 W_3}}{\|\vec{u}_z \times \overrightarrow{W_1 W_3}\|}, \qquad \vec{u}_x = \vec{u}_y \times \vec{u}_z. \tag{1}$$

The $\vec{u}_z$-axis is aligned with the vector $\overrightarrow{W_1 W_2}$, and the $\vec{u}_y$-axis is perpendicular to the plane determined by $\overrightarrow{W_1 W_2}$ and $\overrightarrow{W_1 W_3}$.

Via the transformation matrix $T_{wo} = [\vec{u}_x, \vec{u}_y, \vec{u}_z]^T$, the 3D reference points can be easily transformed into the object frame using

$$P_i = T_{wo}(W_i - W_1) = T_{wo}\overrightarrow{W_1 W_i}, \qquad i = 1, 2, 3, \tag{2}$$

where $P_i = [x_i, y_i, z_i]^T$. It is obvious that $P_1 = [0, 0, 0]^T$. Since $\vec{u}_x \perp \overrightarrow{W_1 W_2}$ and $\vec{u}_y \perp \overrightarrow{W_1 W_3}$ (and, by construction, $\vec{u}_y \perp \overrightarrow{W_1 W_2}$ as well), we can obtain the following equations by the definition of the dot product:

$$\vec{u}_x \cdot \overrightarrow{W_1 W_2} = 0, \qquad \vec{u}_y \cdot \overrightarrow{W_1 W_2} = 0, \qquad \vec{u}_y \cdot \overrightarrow{W_1 W_3} = 0. \tag{3}$$

By substituting (3) into (2), we obtain $P_2 = [0, 0, z_2]^T$ and $P_3 = [x_3, 0, z_3]^T$. Now, if we are able to obtain the rotation and position of the object frame with respect to the camera frame, the rotation $R_{wc}$ and position $T_{wc}$ are obviously also given by $T_{wo}$.
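For concreteness, Eqs. (1)–(3) amount to a few lines of linear algebra. The following NumPy sketch is an illustration under our own naming (`build_object_frame` is not from the authors' released MATLAB/C++ code):

```python
import numpy as np

def build_object_frame(W1, W2, W3):
    """Object frame of Eq. (1) and transformed points of Eq. (2).
    W1, W2, W3 are the 3D reference points in the world frame."""
    uz = (W2 - W1) / np.linalg.norm(W2 - W1)   # u_z aligned with W1W2
    uy = np.cross(uz, W3 - W1)                 # u_y normal to plane(W1W2, W1W3)
    uy /= np.linalg.norm(uy)
    ux = np.cross(uy, uz)                      # u_x completes the frame
    Two = np.vstack((ux, uy, uz))              # T_wo = [u_x, u_y, u_z]^T
    # Eq. (2): P_i = T_wo (W_i - W1); by construction P1 = [0,0,0]^T,
    # P2 = [0,0,z2]^T and P3 = [x3,0,z3]^T, as implied by Eq. (3).
    P2 = Two @ (W2 - W1)
    P3 = Two @ (W3 - W1)
    return Two, P2, P3
```

The sparsity of $P_2$ and $P_3$ is exactly what keeps the projection equations of the next subsection short.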


2.3. Estimating the pose between the object frame and the camera frame

We define the rotation and position of the object frame with respect to the camera frame as $R$ and $T$, respectively. Based on the pinhole camera model, the perspective projection from the object frame to the image plane can be expressed by

$$\alpha_i \vec{f}_i = R P_i + T, \qquad i = 1, 2, 3, \tag{4}$$

where $\alpha_i$ $(i = 1, 2, 3)$ is an unknown scale factor, and $R$ and $T$ can be written as

$$R = \begin{bmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{bmatrix}, \qquad T = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}. \tag{5}$$

Now we target the calculation of the rotation matrix $R$ and the translation vector $T$. By substituting $P_i$ $(i = 1, 2, 3)$ and (5) into (4), we have

$$\alpha_1 \vec{f}_1 = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}, \quad \alpha_2 \vec{f}_2 = \begin{bmatrix} r_3 z_2 + t_x \\ r_6 z_2 + t_y \\ r_9 z_2 + t_z \end{bmatrix}, \quad \alpha_3 \vec{f}_3 = \begin{bmatrix} r_1 x_3 + r_3 z_3 + t_x \\ r_4 x_3 + r_6 z_3 + t_y \\ r_7 x_3 + r_9 z_3 + t_z \end{bmatrix}, \tag{6}$$

where $\vec{f}_1 = (f_{1,x}, f_{1,y}, f_{1,z})^T$, $\vec{f}_2 = (f_{2,x}, f_{2,y}, f_{2,z})^T$ and $\vec{f}_3 = (f_{3,x}, f_{3,y}, f_{3,z})^T$ are known. Dividing the first and second rows of (6) by the third row, the $\alpha_i$ $(i = 1, 2, 3)$ are eliminated, and we have

$$\frac{t_x}{t_z} = \frac{f_{1,x}}{f_{1,z}}, \qquad \frac{t_y}{t_z} = \frac{f_{1,y}}{f_{1,z}}, \tag{7}$$

$$\frac{r_3 z_2 + t_x}{r_9 z_2 + t_z} = \frac{f_{2,x}}{f_{2,z}}, \qquad \frac{r_6 z_2 + t_y}{r_9 z_2 + t_z} = \frac{f_{2,y}}{f_{2,z}}, \tag{8}$$

$$\frac{r_1 x_3 + r_3 z_3 + t_x}{r_7 x_3 + r_9 z_3 + t_z} = \frac{f_{3,x}}{f_{3,z}}, \qquad \frac{r_4 x_3 + r_6 z_3 + t_y}{r_7 x_3 + r_9 z_3 + t_z} = \frac{f_{3,y}}{f_{3,z}}. \tag{9}$$

Now we multiply both the numerator and the denominator of (8) and (9) by $1/t_z$, and we get

$$\frac{r_3 z_2/t_z + t_x/t_z}{r_9 z_2/t_z + 1} = \frac{f_{2,x}}{f_{2,z}}, \qquad \frac{r_6 z_2/t_z + t_y/t_z}{r_9 z_2/t_z + 1} = \frac{f_{2,y}}{f_{2,z}}, \tag{10}$$

$$\frac{r_1 x_3/t_z + r_3 z_3/t_z + t_x/t_z}{r_7 x_3/t_z + r_9 z_3/t_z + 1} = \frac{f_{3,x}}{f_{3,z}}, \qquad \frac{r_4 x_3/t_z + r_6 z_3/t_z + t_y/t_z}{r_7 x_3/t_z + r_9 z_3/t_z + 1} = \frac{f_{3,y}}{f_{3,z}}. \tag{11}$$

Let us define

$$S_1 = t_x/t_z, \quad S_2 = t_y/t_z, \quad S_3 = r_3/t_z, \quad S_4 = r_9/t_z, \quad S_5 = r_6/t_z, \quad S_6 = r_1/t_z, \quad S_7 = r_7/t_z, \quad S_8 = r_4/t_z,$$
$$A_1 = f_{1,x}/f_{1,z}, \quad A_2 = f_{1,y}/f_{1,z}, \quad B_1 = f_{2,x}/f_{2,z}, \quad B_2 = f_{2,y}/f_{2,z}, \quad C_1 = f_{3,x}/f_{3,z}, \quad C_2 = f_{3,y}/f_{3,z},$$

where the $S_i$ $(i = 1, \dots, 8)$ are new unknown variables, and $A_i$, $B_i$ and $C_i$ $(i = 1, 2)$ are constant coefficients. Note that the rotation $R$ and position $T$ are now parameterized by $S_i$ $(i = 1, \dots, 8)$ and $A_i, B_i, C_i$ $(i = 1, 2)$.

Replacing $S_i$ and $A_i, B_i, C_i$ in (7), (10) and (11), then expanding and collecting, we have

$$S_1 = A_1, \qquad S_2 = A_2, \tag{12}$$

$$S_3 = B_1 S_4 + T_1, \qquad S_5 = B_2 S_4 + T_2, \tag{13}$$

$$S_6 = C_1 S_7 + D_1 S_4 + D_2, \qquad S_8 = C_2 S_7 + D_3 S_4 + D_4, \tag{14}$$

where

$$T_1 = \frac{B_1 - A_1}{z_2}, \quad T_2 = \frac{B_2 - A_2}{z_2}, \quad D_1 = \frac{C_1 z_3 - B_1 z_3}{x_3}, \quad D_2 = \frac{C_1 - A_1 - z_3 T_1}{x_3}, \quad D_3 = \frac{C_2 z_3 - B_2 z_3}{x_3}, \quad D_4 = \frac{C_2 - A_2 - z_3 T_2}{x_3}.$$

It is clear that the $S_i$ $(i = 1, 2, 3, 5, 6, 8)$ are now represented by linear combinations of $S_4$ and $S_7$.

Since the rotation $R$ is an orthogonal matrix, we have two additional nonlinear constraints:

$$r_1 r_3 + r_4 r_6 + r_7 r_9 = 0, \tag{15}$$

$$r_1^2 + r_4^2 + r_7^2 = r_3^2 + r_6^2 + r_9^2. \tag{16}$$

We multiply both sides of (15) and (16) by $1/t_z^2$ and plug the definitions above (in particular $S_7 = r_7/t_z$ and $S_4 = r_9/t_z$) into them. We obtain

$$S_6 S_3 + S_8 S_5 + S_7 S_4 = 0, \tag{17}$$

$$S_6^2 + S_8^2 + S_7^2 = S_3^2 + S_5^2 + S_4^2. \tag{18}$$

By substituting (12), (13) and (14) into (17) and (18), we get

$$H_1 S_7^2 + H_2 S_4^2 + H_3 S_4 S_7 + H_4 S_7 + H_5 S_4 + H_6 = 0, \tag{19}$$

$$G_1 S_4^2 + G_2 S_4 S_7 + G_3 S_4 + G_4 S_7 + G_5 = 0, \tag{20}$$

where

$$G_1 = D_1 B_1 + D_3 B_2, \quad G_2 = C_1 B_1 + C_2 B_2 + 1, \quad G_3 = B_1 D_2 + D_1 T_1 + D_3 T_2 + B_2 D_4,$$
$$G_4 = C_1 T_1 + C_2 T_2, \quad G_5 = D_2 T_1 + D_4 T_2,$$
$$H_1 = C_1^2 + C_2^2 + 1, \quad H_2 = D_1^2 + D_3^2 - B_1^2 - B_2^2 - 1, \quad H_3 = 2 C_1 D_1 + 2 C_2 D_3, \quad H_4 = 2 C_1 D_2 + 2 C_2 D_4,$$
$$H_5 = 2 D_1 D_2 + 2 D_3 D_4 - 2 B_1 T_1 - 2 B_2 T_2, \quad H_6 = D_2^2 + D_4^2 - T_1^2 - T_2^2.$$

From (20) we can express $S_7$ via

$$S_7 = -\frac{G_1 S_4^2 + G_3 S_4 + G_5}{G_2 S_4 + G_4}. \tag{21}$$

After plugging (21) back into (19), expanding and collecting, we get a fourth-order polynomial of the form

$$U_4 S_4^4 + U_3 S_4^3 + U_2 S_4^2 + U_1 S_4 + U_0 = 0, \tag{22}$$

where

$$U_4 = H_1 G_1^2 + H_2 G_2^2 - H_3 G_1 G_2,$$
$$U_3 = 2 H_1 G_1 G_3 + 2 H_2 G_2 G_4 - H_3 G_1 G_4 - H_3 G_2 G_3 - H_4 G_1 G_2 + H_5 G_2^2,$$
$$U_2 = H_1 G_3^2 + 2 H_1 G_1 G_5 + H_2 G_4^2 - H_3 G_3 G_4 - H_3 G_2 G_5 - H_4 G_1 G_4 - H_4 G_2 G_3 + 2 H_5 G_2 G_4 + H_6 G_2^2,$$
$$U_1 = 2 H_1 G_3 G_5 - H_3 G_4 G_5 - H_4 G_3 G_4 - H_4 G_2 G_5 + H_5 G_4^2 + 2 H_6 G_2 G_4,$$
$$U_0 = H_1 G_5^2 - H_4 G_4 G_5 + H_6 G_4^2.$$

$S_4$ can be easily solved from (22), which has at most four real solutions. By substituting $S_4$ into (21), we get $S_7$. By substituting $S_4$ and $S_7$ into (13) and (14), $S_3$, $S_5$, $S_6$ and $S_8$ can be computed, respectively.

Since the column vectors of the rotation matrix have unit length, we also have

$$r_3^2 + r_6^2 + r_9^2 = 1. \tag{23}$$

By substituting $S_3 = r_3/t_z$, $S_4 = r_9/t_z$ and $S_5 = r_6/t_z$ into (23), we can derive that
$$t_z = \frac{1}{\sqrt{S_3^2 + S_4^2 + S_5^2}}. \tag{24}$$

After plugging $t_z$ back into $t_x = S_1 t_z$, $t_y = S_2 t_z$, $r_1 = S_6 t_z$, $r_3 = S_3 t_z$, $r_4 = S_8 t_z$, $r_6 = S_5 t_z$, $r_7 = S_7 t_z$ and $r_9 = S_4 t_z$, we get $t_x, t_y, r_1, r_3, r_4, r_6, r_7, r_9$, respectively. Then $r_2, r_5, r_8$ can be easily obtained as the cross product of $(r_1, r_4, r_7)$ and $(r_3, r_6, r_9)$ as follows:

$$\begin{pmatrix} r_2 \\ r_5 \\ r_8 \end{pmatrix} = \begin{pmatrix} r_1 \\ r_4 \\ r_7 \end{pmatrix} \times \begin{pmatrix} r_3 \\ r_6 \\ r_9 \end{pmatrix}. \tag{25}$$

Now the rotation $R$ and translation $T$ of the object frame with respect to the camera frame have been acquired. Hence, the complete rotation $R_{wc}$ and translation $T_{wc}$ of the camera with respect to the world frame are finally given as

$$R_{wc} = R\, T_{wo}, \tag{26}$$

and

$$T_{wc} = T - R\, T_{wo} W_1. \tag{27}$$

[Fig. 3. Average running time of all methods; it is clear that our method is the fastest.]

Table 1
The total (1st line) and per-call (2nd line) running times.

               Gao       Li (PST)   Kneip     Wang
Total time/s   130.927   127.065    38.915    27.179
Each time/µs   13.0927   12.7065    3.8915    2.7179
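Putting Sections 2.2 and 2.3 together, the solver reduces to evaluating closed-form coefficients, extracting the real roots of the quartic (22), and back-substituting. The NumPy sketch below is an illustrative transcription under our own naming (`solve_p3p` and its signature are not from the paper's published code); following Eq. (24) it keeps the positive root for $t_z$, i.e. it assumes the reference points lie in front of the camera:

```python
import numpy as np

def solve_p3p(f1, f2, f3, P2, P3, Two, W1):
    """Candidate camera poses (R_wc, T_wc) from three bearing vectors
    f1, f2, f3 and the object-frame data from build_object_frame."""
    z2, x3, z3 = P2[2], P3[0], P3[2]
    A1, A2 = f1[0] / f1[2], f1[1] / f1[2]
    B1, B2 = f2[0] / f2[2], f2[1] / f2[2]
    C1, C2 = f3[0] / f3[2], f3[1] / f3[2]
    # Coefficients of the linear relations (13)-(14).
    T1, T2 = (B1 - A1) / z2, (B2 - A2) / z2
    D1, D2 = (C1 - B1) * z3 / x3, (C1 - A1 - z3 * T1) / x3
    D3, D4 = (C2 - B2) * z3 / x3, (C2 - A2 - z3 * T2) / x3
    # Coefficients of the orthogonality constraints (19)-(20).
    G1, G2 = D1 * B1 + D3 * B2, C1 * B1 + C2 * B2 + 1
    G3 = B1 * D2 + D1 * T1 + D3 * T2 + B2 * D4
    G4, G5 = C1 * T1 + C2 * T2, D2 * T1 + D4 * T2
    H1 = C1**2 + C2**2 + 1
    H2 = D1**2 + D3**2 - B1**2 - B2**2 - 1
    H3, H4 = 2 * (C1 * D1 + C2 * D3), 2 * (C1 * D2 + C2 * D4)
    H5 = 2 * (D1 * D2 + D3 * D4 - B1 * T1 - B2 * T2)
    H6 = D2**2 + D4**2 - T1**2 - T2**2
    # Quartic (22) in S4, obtained by eliminating S7 through (21).
    U4 = H1 * G1**2 + H2 * G2**2 - H3 * G1 * G2
    U3 = (2 * H1 * G1 * G3 + 2 * H2 * G2 * G4 - H3 * G1 * G4
          - H3 * G2 * G3 - H4 * G1 * G2 + H5 * G2**2)
    U2 = (H1 * G3**2 + 2 * H1 * G1 * G5 + H2 * G4**2 - H3 * G3 * G4
          - H3 * G2 * G5 - H4 * G1 * G4 - H4 * G2 * G3
          + 2 * H5 * G2 * G4 + H6 * G2**2)
    U1 = (2 * H1 * G3 * G5 - H3 * G4 * G5 - H4 * G3 * G4
          - H4 * G2 * G5 + H5 * G4**2 + 2 * H6 * G2 * G4)
    U0 = H1 * G5**2 - H4 * G4 * G5 + H6 * G4**2
    poses = []
    for S4 in np.roots([U4, U3, U2, U1, U0]):
        if abs(S4.imag) > 1e-9:
            continue                                  # keep real roots only
        S4 = S4.real
        S7 = -(G1 * S4**2 + G3 * S4 + G5) / (G2 * S4 + G4)      # Eq. (21)
        S3, S5 = B1 * S4 + T1, B2 * S4 + T2                     # Eq. (13)
        S6 = C1 * S7 + D1 * S4 + D2                             # Eq. (14)
        S8 = C2 * S7 + D3 * S4 + D4
        tz = 1.0 / np.sqrt(S3**2 + S4**2 + S5**2)               # Eq. (24)
        col1 = tz * np.array([S6, S8, S7])                      # (r1, r4, r7)
        col3 = tz * np.array([S3, S5, S4])                      # (r3, r6, r9)
        R = np.column_stack((col1, np.cross(col1, col3), col3)) # Eq. (25)
        T = tz * np.array([A1, A2, 1.0])                        # Eqs. (7), (12)
        poses.append((R @ Two, T - R @ Two @ W1))               # Eqs. (26)-(27)
    return poses
```

Each real root of (22) yields one candidate pose; the wrong candidates can be pruned with an additional correspondence or, as in the experiments of Section 3, by comparison against a reference pose.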

3. Synthetic experiments

In this section, we present our results comparing the performance of the proposed method (denoted Wang in the following) using simulated data. We also compared the numerical stability, accuracy, and running speed of this and other state-of-the-art methods:

• Gao: one of the most popular solver methods (Gao et al., 2003).
• Li: the best two-stage solver, with high numerical stability and accuracy (Li and Xu, 2011).
• Kneip: the best single-stage solver, with comparable accuracy and precision at a substantially lower computational cost (Kneip et al., 2011).

Note that all methods are MATLAB or C++ implementations. The C++ version is based on Kneip's opengv library without any additional optimizations; the details of opengv are described at https://github.com/laurentkneip/opengv. All codes are executed on a quad-core desktop with a 3.7 GHz CPU and 4 GB RAM, and the source codes can be downloaded from http://pingwang.sxl.cn/.

[Fig. 4. Numerical stability of all methods. The horizontal axis shows the log10 value of the absolute rotation error (left) and the absolute translation error (right).]

3.0.1. Synthetic data

We synthesized a virtual perspective camera with an image size of 640 × 480 pixels, a focal length of 200 pixels, and the principal point at the image center. Then, we generated three 3D reference points, randomly distributed in the range [−4,4] × [−4,4] × [8,16] in the camera frame, and transformed these 3D points into the world frame using the ground-truth rotation $R_{true}$ and translation $T_{true}$. Finally, we projected these 3D points onto the 2D image plane using the virtual calibrated camera. Depending on the experiment, a different level of white Gaussian noise was added on the 2D image plane. The rotation and translation errors were calculated as

$$e_{rot}(\mathrm{rad}) = \begin{cases} \|R - R_{true}\|, & \text{noise} = 0 \\ \|\mathrm{Angle}(R) - \mathrm{Angle}(R_{true})\|, & \text{noise} > 0 \end{cases} \qquad e_{trans}(\mathrm{m}) = \begin{cases} \|T - T_{true}\|, & \text{noise} = 0 \\ \dfrac{\|T - T_{true}\|}{\|T_{true}\|}, & \text{noise} > 0 \end{cases} \tag{28}$$

where $R$ and $T$ are the estimated rotation and translation, respectively, and Angle(·) represents the transformation of the rotation matrix into Euler angles.

[Fig. 5. Numerical stability under high noise. The horizontal axis shows the log10 value of the absolute rotation error (left) and the absolute translation error (right).]

3.0.2. Computational time

The first simulated experiment was designed to evaluate the computational cost of all methods. The computational time in MATLAB is not really related to the number of instructions but rather to script interpretation. Therefore, to have a fair comparison, we evaluated the computational cost of all methods using the C++ programs. Additionally, time utilities may behave differently on different platforms (Linux or Windows), so the relative performance is the most meaningful and reasonable comparison. This experiment consisted of 2,000 runs, where one run comprised 5,000 evaluations of the same set of 3D reference points. We measured the time of the 2,000 runs and show the histograms of computing time in Fig. 3.
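For reference, one trial of this synthetic protocol can be sketched as follows (an illustrative reconstruction; `make_trial` and its defaults are ours, not the authors' benchmark code):

```python
import numpy as np

def make_trial(R_true, T_true, noise_px, f=200.0):
    """Generate one synthetic P3P instance: three camera-frame points in
    [-4,4] x [-4,4] x [8,16], projected with focal length f and perturbed
    by white Gaussian pixel noise, as in Section 3.0.1."""
    rng = np.random.default_rng()
    Pc = np.column_stack((rng.uniform(-4, 4, 3),
                          rng.uniform(-4, 4, 3),
                          rng.uniform(8, 16, 3)))
    # World-frame points, using the model Pc = R_true @ W + T_true.
    W = (Pc - T_true) @ R_true            # row-wise R_true^T @ (Pc_i - T_true)
    # Noisy pixel projections, then back to unit bearing vectors.
    uv = f * Pc[:, :2] / Pc[:, 2:] + noise_px * rng.standard_normal((3, 2))
    fvec = np.column_stack((uv / f, np.ones(3)))
    fvec /= np.linalg.norm(fvec, axis=1, keepdims=True)
    return W, fvec
```

In the noise-free case the metrics of Eq. (28) then reduce to `np.linalg.norm(R - R_true)` and `np.linalg.norm(T - T_true)`.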

[Fig. 6. The mean error and standard deviation of rotation (1st and 2nd columns) and position (3rd and 4th columns) of the camera for the compared methods, shown as a function of the noise level.]
The data show that our method is the fastest of all, due primarily to the simplification of the derivation process. Gao's (Gao et al., 2003) method is the slowest, while Kneip's (Kneip et al., 2011) and Li's (Li and Xu, 2011) methods are in second and third place, respectively. The total and per-call running times of all methods are shown in Table 1. Our method offers significantly decreased execution time: taking only about 2.7 µs per call, it is about 30% faster than Kneip's (Kneip et al., 2011) method and nearly 5 times faster than Li's (Li and Xu, 2011) and Gao's (Gao et al., 2003) methods.
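The run structure of this experiment (2,000 runs of 5,000 repeated evaluations each) can be mirrored by a minimal harness; the published benchmark is C++, so the Python sketch below only illustrates the protocol:

```python
import time

def benchmark(solver, args, runs=2000, evals_per_run=5000):
    """Per-call solver times in microseconds, one entry per run, where
    each run evaluates the same problem instance evals_per_run times."""
    per_call_us = []
    for _ in range(runs):
        t0 = time.perf_counter()
        for _ in range(evals_per_run):
            solver(*args)
        per_call_us.append((time.perf_counter() - t0) / evals_per_run * 1e6)
    return per_call_us
```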

3.0.3. Numerical stability

The second simulated experiment tested the numerical stability of our proposed method using noise-free data (noise = 0). We performed 100,000 runs without adding noise to the 2D coordinates. For each triplet of 2D-to-3D points, we evaluated all solvers and calculated the camera rotation and translation. If a solver produced more than one feasible solution, we selected the one closest to the ground-truth camera as the estimated pose. The error histograms for the 100,000 repeats are shown in Fig. 4. The distributions of our method were more concentrated around the smallest error region, which means that our method has better numerical stability than the others. Next, we added high noise (20 pixels) to one of the 3D points and tested the numerical stability again. The results, shown in Fig. 5, indicate that all methods become unstable under such a high amount of noise.
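Since the quartic (22) can admit up to four feasible poses, the selection rule used in this experiment (keep the candidate closest to the ground truth) is a one-liner; the helper below is our own illustrative formulation:

```python
import numpy as np

def closest_to_truth(candidates, R_true, T_true):
    """Pick the candidate (R, T) minimizing the combined Frobenius rotation
    deviation and Euclidean translation deviation from the ground truth."""
    return min(candidates,
               key=lambda p: np.linalg.norm(p[0] - R_true)
                             + np.linalg.norm(p[1] - T_true))
```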
3.0.4. Noise sensitivity

The last simulated experiment was designed to study the effects of noise on the accuracy and precision of the computation for all methods. We added different levels of Gaussian noise, from 0 to 5 pixels, onto the image points and generated 2,000 triplets of 2D-3D points for each noise level. The results, shown in Figs. 6 and 7, indicate that the mean and standard deviation increased almost linearly with the addition of noise for all methods. Gao's (Gao et al., 2003) method yielded unstable results with low accuracy in the overall evaluations. Our method provided accuracy and precision comparable to or better than Li's (Li and Xu, 2011) and Kneip's (Kneip et al., 2011) methods. One exception is that the pitch angle error of our method appears to be less accurate, but it is precise enough for practical use, especially considering the superior computational efficiency. Note that several peaks appear in the standard deviation at increased noise levels for Li's (Li and Xu, 2011) method. Those peaks are caused by single outlier results with high error; this behavior is not present in our method, indicating that the proposed method is more stable in degenerate configurations.
4. Real image

We also tested our method using real images. As shown in Fig. 8, known world coordinates of eight points at the corners of a cuboid and a calibrated camera were used. Three corners of the cuboid, represented by the green circles in Fig. 8, were selected to calculate the camera pose. The corners of this cuboid were then back-projected onto the image using the computed rotation and translation. In order to visualize the results, we used those reprojected points to construct a virtual box (of identical dimensions as the real box) next to the real box. As shown, the proposed method can recover the camera pose.

[Fig. 7. The test results of real images.]
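The back-projection used to draw the virtual box follows directly from the estimated pose; a sketch (with a hypothetical 3 × 3 intrinsic matrix K) is:

```python
import numpy as np

def reproject(points_w, R_wc, T_wc, K):
    """Back-project world points into pixel coordinates with the
    estimated pose; K is the camera's 3x3 intrinsic matrix."""
    Pc = points_w @ R_wc.T + T_wc     # world frame -> camera frame
    uv = Pc @ K.T                     # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:]      # perspective division
```

Feeding the eight cuboid corners through such a function, with the pose recovered from only three of them, produces the overlay shown in the figure.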

5. Conclusion

In this paper, instead of a traditional geometric approach, an algebraic approach is introduced to solve the P3P problem. The main idea is to simplify the known 3D coordinates by using an object frame and to parameterize the rotation and position matrix by introducing two auxiliary variables. The derivations are easy to understand, and the final method is more efficient than existing P3P algorithms. The experimental results show that the proposed method offers numerical stability and accuracy comparable to those of state-of-the-art methods but has better computational efficiency. This superior computational efficiency makes it particularly suitable for practical applications.

Acknowledgment

This study was supported by the National Natural Science Foundation of China (No. 61473148). We furthermore want to thank Laurent Kneip and Mohamed H. Merzban for their supportive feedback.

References

Arun, K.S., Huang, T.S., Blostein, S.D., 1987. Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell. 9 (5), 698–700.
Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B., 2001. Recent advances in augmented reality. IEEE Comput. Graphics Appl. 21 (6), 34–47.
Azuma, R.T., et al., 1997. A survey of augmented reality. Presence 6 (4), 355–385.
Chen, C.-S., Chang, W.-Y., 2004. On pose recovery for generalized visual sensors. IEEE Trans. Pattern Anal. Mach. Intell. 26 (7), 848–861.
Dani, A.P., Fischer, N.R., Dixon, W.E., 2012. Single camera structure and motion. IEEE Trans. Autom. Control 57 (1), 238–243.
Dementhon, D.F., Davis, L.S., 1995. Model-based object pose in 25 lines of code. Int. J. Comput. Vis. 15 (1–2), 123–141.
Engelhard, N., Endres, F., Hess, J., Sturm, J., Burgard, W., 2011. Real-time 3D visual SLAM with a hand-held RGB-D camera. Proc. of the RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, Vasteras, Sweden, p. 180.
Finsterwalder, S., Scheufele, W., 1903. Das Rückwärtseinschneiden im Raum. Verlag d. Bayer. Akad. d. Wiss.
Fischler, M.A., Bolles, R.C., 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24 (6), 381–395.
Gao, X.S., Hou, X.R., Tang, J.L., 2003. Complete solution classification for the perspective-three-point problem. IEEE Trans. Pattern Anal. Mach. Intell. 25 (8), 930–943.
Grafarend, E., Lohse, P., Schaffrin, B., 1989. Dreidimensionaler Rückwärtsschnitt, Teil 3. Zeitschrift für Vermessungswesen 4, 172–175.
Grunert, J., 1841. Das Pothenotsche Problem, in erweiterter Gestalt, nebst Bemerkungen über seine Anwendung in der Geodäsie. Archiv der Mathematik und Physik, 238–248.
Haralick, R.M., Lee, C.N., Ottenburg, K., Nölle, M., 1991. Analysis and solutions of the three point perspective pose estimation problem. Proceedings CVPR '91, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, pp. 592–598.
Horn, B., Hilden, H.M., Negahdaripour, S., 1988. Closed-form solution of absolute orientation using orthonormal matrices. J. Opt. Soc. Am. A 5 (7), 1127–1135.
Kangni, F., Laganiere, R., 2007. Orientation and pose recovery from spherical panoramas. ICCV 2007, IEEE 11th International Conference on Computer Vision. IEEE, pp. 1–8.
Kneip, L., Scaramuzza, D., Siegwart, R., 2011. A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. CVPR 2011, IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 2969–2976.
Lategahn, H., Geiger, A., Kitt, B., 2011. Visual SLAM for autonomous ground vehicles. ICRA 2011, IEEE International Conference on Robotics and Automation. IEEE, pp. 1732–1737.
Lepetit, V., Fua, P., 2006. Keypoint recognition using randomized trees. IEEE Trans. Pattern Anal. Mach. Intell. 28 (9), 1465–1479.
Li, S., Ngan, K., Paramesran, R., Sheng, L., 2015. Real-time head pose tracking with online face template reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. (1), 1–1.
Li, S., Xu, C., 2011. A stable direct solution of perspective-three-point problem. Int. J. Pattern Recognit. Artif. Intell. 25 (05), 627–642.
Linnainmaa, S., Harwood, D., Davis, L.S., 1988. Pose determination of a three-dimensional object using triangle pairs. IEEE Trans. Pattern Anal. Mach. Intell. 10 (5), 634–647.
Ma, D., Chen, Y., Huang, C., Yan, Y., 2013. A simple analytic solution for the perspective-3-point problem. Int. J. Digital Content Technol. Appl. 7 (10), 129.
Merritt, E., 1949. Explicit three-point resection in space. Photogramm. Eng. 15 (4), 649–655.
Nistér, D., Naroditsky, O., Bergen, J., 2004. Visual odometry. CVPR 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1. IEEE, pp. I-652.
Nistér, D., Naroditsky, O., Bergen, J., 2006. Visual odometry for ground vehicle applications. J. Field Rob. 23 (1), 3–20.
Nistér, D., Stewénius, H., 2007. A minimal solution to the generalised 3-point pose problem. J. Math. Imaging Vis. 27 (1), 67–79.
Quan, L., Lan, Z., 1999. Linear N-point camera pose determination. IEEE Trans. Pattern Anal. Mach. Intell. 21 (8), 774–780.
Ryan, J., Hubbard, A., Box, J., Todd, J., Christoffersen, P., Carr, J., Holt, T., Snooke, N., 2015. UAV photogrammetry and structure from motion to assess calving dynamics at Store Glacier, a large outlet draining the Greenland ice sheet. The Cryosphere 9 (1), 1–11.
Skrypnyk, I., Lowe, D.G., 2004. Scene modelling, recognition and tracking with invariant image features. ISMAR 2004, Third IEEE and ACM International Symposium on Mixed and Augmented Reality. IEEE, pp. 110–119.
Umeyama, S., 1991. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13 (4), 376–380.
Visual, S., 2015. IROS 2014: robots descend on Chicago. IEEE Rob. Autom. Mag.
