ARTICLE INFO

Keywords: Perspective-three-point problem (P3P); Absolute position and attitude; Pose estimation; Monocular vision; Computer vision
MSC: 41A05; 41A10; 65D05; 65D17

ABSTRACT

In this paper, we present a new algebraic method to solve the perspective-three-point (P3P) problem, which directly computes the rotation and position of a camera in the world frame without the intermediate derivation of the points in the camera frame. Unlike other methods, the proposed method uses an "object" coordinate frame, in which the known 3D coordinates are sparse, facilitating formulation of the P3P problem. Additionally, two auxiliary variables are introduced to parameterize the rotation and position, and a closed-form solution for the camera pose is then obtained from subsequent substitutions. This algebraic approach makes the derivation easier to follow and significantly improves performance. Experimental results demonstrate that our method offers accuracy and precision comparable to existing state-of-the-art methods while achieving better computational efficiency.
1. Introduction

Determining the position and orientation of a camera given its intrinsic parameters and a set of three correspondences between 3D points and their 2D projections is known as the perspective-three-point (P3P) problem (Fischler and Bolles, 1981). Strategies to address the P3P problem have widespread applications in augmented reality (Azuma et al., 2001; 1997), structure from motion (Dani et al., 2012; Ryan et al., 2015), pose recovery (Kangni and Laganiere, 2007), pose tracking (Li et al., 2015), and visual simultaneous localization and mapping (SLAM) (Engelhard et al., 2011; Lategahn et al., 2011; Visual, 2015).

The P3P problem was first solved in 1841 by Grunert (1841) and was then refined in 1903 by Finsterwalder and Scheufele (1903). In 1991, Haralick et al. (1991) reviewed the major direct solutions, including six algorithms described by Grunert (1841), Finsterwalder and Scheufele (1903), Merritt (1949), Fischler and Bolles (1981), Linnainmaa et al. (1988) and Grafarend et al. (1989), respectively. Haralick et al. (1991) also gave analytical solutions for the P3P problem with the necessary computation. Dementhon and Davis (1995) showed a simpler computational solution for the P3P problem by using approximations of the perspective. Quan and Lan (1999) reviewed the 3-point algebraic method and proposed a linear algorithm to handle the problem with more than 4 points. Gao et al. (2003) used two approaches to solve the P3P problem and provided the first complete analytical solution to the P3P problem. Nistér and Stewénius (2007) proposed a different approach that works for the generalized camera model (Chen and Chang, 2004). In 2011, Li and Xu (2011) proposed a perspective similar triangle (PST) method for the P3P problem. The strategy of the PST method is to reduce the number of unknown parameters by constructing a perspective similar triangle. The same year, Kneip et al. (2011) proposed a novel closed-form solution to the P3P problem, which computes the camera pose directly in a single stage, without the intermediate derivation of the points in the camera frame. In 2013, Ma et al. (2013) proposed a simple method for the P3P problem, relying on the fact that the inverse of a rotation matrix equals its transpose.

All existing P3P methods cited above can be classified as single-stage methods and two-stage methods. Currently, Kneip's (Kneip et al., 2011) method is the only single-stage solver, which directly computes the position and orientation of the camera in the world frame as a function of the 2D image coordinates and the 3D reference points. The other methods first calculate the distances between the camera center and the 3D reference points, and then determine the orientation and position by singular value decomposition (SVD) (Arun et al., 1987; Horn et al., 1988; Umeyama, 1991). Li's PST (Li and Xu, 2011) method may be the best version of the two-stage methods, offering high numerical stability and accuracy. The PST approach is not a direct solution because of the need to construct a perspective similar triangle. In contrast to two-stage methods, the main advantage of single-stage methods is their substantially lower computational cost.
⁎ Corresponding author.
E-mail addresses: pingwangsky@gmail.com (P. Wang), guilixu@nuaa.edu.cn (G. Xu).
https://doi.org/10.1016/j.cviu.2017.10.005
Received 1 November 2016; Received in revised form 13 July 2017; Accepted 15 October 2017
Available online 31 October 2017
1077-3142/ © 2017 Elsevier Inc. All rights reserved.
P. Wang et al. Computer Vision and Image Understanding 166 (2018) 81–87
The rest of the paper is organized as follows. Section 2 describes the derivations of the proposed method. Section 3 provides a thorough analysis of the proposed method by simulated experiments. Section 4 shows the real tests. Section 5, finally, summarizes the work.

2. The proposed method

2.1. Problem statement

The problem considered in this paper is illustrated in Fig. 1. The three known 3D reference points are $W_1$, $W_2$ and $W_3$, and their corresponding 2D projections are $p_1$, $p_2$ and $p_3$. The purpose is to recover the rotation $R_{wc}$ and translation $T_{wc}$ of a calibrated camera. The normalized vectors $f_1$, $f_2$ and $f_3$ can be easily obtained from $p_1$, $p_2$ and $p_3$, and are used to facilitate subsequent derivations.

2.2. Building an object frame

The first step involves the definition of an intermediate frame, called the "object frame", from the 3D reference points. As shown in Fig. 2, we choose $W_1$ as the origin and build an object frame $[P_1;\, \vec{u}_x, \vec{u}_y, \vec{u}_z]$, where

$$\vec{u}_z = \frac{\overrightarrow{W_1 W_2}}{\|\overrightarrow{W_1 W_2}\|}, \qquad \vec{u}_y = \frac{\vec{u}_z \times \overrightarrow{W_1 W_3}}{\|\vec{u}_z \times \overrightarrow{W_1 W_3}\|}, \qquad \vec{u}_x = \vec{u}_y \times \vec{u}_z. \tag{1}$$

The $\vec{u}_z$-axis is aligned with the vector $\overrightarrow{W_1 W_2}$, and the $\vec{u}_y$-axis is perpendicular to the plane determined by $\overrightarrow{W_1 W_2}$ and $\overrightarrow{W_1 W_3}$.

Via the transformation matrix $T_{wo} = [\vec{u}_x, \vec{u}_y, \vec{u}_z]^T$, the 3D reference points can be easily transformed into the object frame using

$$P_i = T_{wo}(W_i - W_1) = T_{wo}\,\overrightarrow{W_1 W_i}, \qquad i = 1, 2, 3, \tag{2}$$

where $P_i = [x_i, y_i, z_i]^T$. It is obvious that $P_1 = [0, 0, 0]^T$. Since $\vec{u}_x \perp \overrightarrow{W_1 W_2}$ and $\vec{u}_y \perp \overrightarrow{W_1 W_3}$, we can obtain the following equations by the definition of the dot product:

$$\vec{u}_x \cdot \overrightarrow{W_1 W_2} = 0, \qquad \vec{u}_y \cdot \overrightarrow{W_1 W_2} = 0, \qquad \vec{u}_y \cdot \overrightarrow{W_1 W_3} = 0. \tag{3}$$

By substituting (3) into (2), we can obtain $P_2 = [0, 0, z_2]^T$ and $P_3 = [x_3, 0, z_3]^T$. Now if we are able to obtain the rotation and position of the object frame with respect to the camera frame, the rotation $R_{wc}$ and translation $T_{wc}$ of the camera can be recovered accordingly.
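The construction in Eqs. (1)–(3) is easy to verify numerically. The sketch below is a minimal NumPy illustration (the function name is ours, not the paper's): it builds $T_{wo}$ from three reference points and checks that the transformed points take the sparse forms $P_1 = [0,0,0]^T$, $P_2 = [0,0,z_2]^T$ and $P_3 = [x_3,0,z_3]^T$.

```python
import numpy as np

def build_object_frame(W1, W2, W3):
    """Object frame of Eq. (1): u_z along W1W2, u_y normal to the
    plane of the three points, u_x completing the triad."""
    w12, w13 = W2 - W1, W3 - W1
    uz = w12 / np.linalg.norm(w12)
    uy = np.cross(uz, w13)
    uy /= np.linalg.norm(uy)
    ux = np.cross(uy, uz)
    Two = np.vstack([ux, uy, uz])  # rows are the new axes
    # Eq. (2): object-frame coordinates P_i = Two (W_i - W1)
    P = [Two @ (W - W1) for W in (W1, W2, W3)]
    return Two, P

W1 = np.array([0.3, -1.0, 2.0])
W2 = np.array([1.5,  0.2, 2.7])
W3 = np.array([-0.4, 1.1, 3.0])
Two, (P1, P2, P3) = build_object_frame(W1, W2, W3)
# P1 is the origin, P2 lies on the z-axis, and P3 lies in the
# xz-plane -- exactly the sparsity the derivation exploits.
```

Because $T_{wo}$ has orthonormal rows, its inverse is simply its transpose, which keeps the later back-substitutions cheap.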
where $S_i\ (i = 1, \ldots, 8)$ are new unknown variables, and $A_i$, $B_i$ and $C_i\ (i = 1, 2)$ are constant coefficients. Note that the rotation $R$ and position $T$ are now parameterized by $S_i\ (i = 1, \ldots, 8)$ and $A_i, B_i, C_i\ (i = 1, 2)$. Replacing $S_i\ (i = 1, \ldots, 8)$ and $A_i, B_i, C_i\ (i = 1, 2)$ in (7), (10) and (11), then expanding and collecting, we have

$$S_1 = A_1, \qquad S_2 = A_2, \tag{12}$$

$$S_3 = B_1 S_4 + T_1, \qquad S_5 = B_2 S_4 + T_2, \tag{13}$$

$S_4$ can be easily solved from (22), which has four real solutions at most. By substituting $S_4$ into (21), we can get $S_7$. By substituting $S_4$ and $S_7$ into (13) and (14), $S_3$, $S_5$, $S_6$ and $S_8$ can be computed, respectively. Since the rotation vectors have the same length, we have

$$r_3^2 + r_6^2 + r_9^2 = 1. \tag{23}$$

By substituting $S_3 = r_3/t_z$, $S_4 = r_9/t_z$ and $S_5 = r_6/t_z$ into (23), we can derive that

$$t_z = \frac{1}{\sqrt{S_3^2 + S_4^2 + S_5^2}}. \tag{24}$$

The remaining column of the rotation matrix follows from

$$\begin{pmatrix} r_2 \\ r_5 \\ r_8 \end{pmatrix} = \begin{pmatrix} r_1 \\ r_4 \\ r_7 \end{pmatrix} \times \begin{pmatrix} r_3 \\ r_6 \\ r_9 \end{pmatrix}. \tag{25}$$
Now the rotation $R$ and translation $T$ of the object frame with respect to the camera frame have been acquired. Hence the complete rotation $R_{wc}$ and translation $T_{wc}$ of the camera with respect to the world frame are finally given as

$$R_{wc} = R\,T_{wo}, \tag{26}$$

and

$$T_{wc} = T - R\,T_{wo} W_1. \tag{27}$$

Fig. 3. Average running time of all methods; it is clear that our method is the fastest.

Table 1
The total (1st line) and single (2nd line) running time.

                 Gao        Li (PST)    Kneip      Wang
Total time/s     130.927    127.065     38.915     27.179
Each time/µs     13.0927    12.7065     3.8915     2.7179
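Assuming $R$ and $T$ (object frame to camera frame) have been recovered as above, Eqs. (26)–(27) lift them to the world-frame camera pose. A quick consistency check, as a NumPy sketch with made-up values: mapping a world point through the object frame must agree with mapping it through $(R_{wc}, T_{wc})$, since $X_c = R\,T_{wo}(W - W_1) + T = R_{wc} W + T_{wc}$.

```python
import numpy as np

def compose_world_pose(R, T, Two, W1):
    """Eqs. (26)-(27): compose the object-to-camera pose (R, T) with the
    world-to-object transform Two into the world-frame pose (Rwc, Twc)."""
    Rwc = R @ Two
    Twc = T - R @ Two @ W1
    return Rwc, Twc

# Made-up example: R rotates about x (cos = 0.8, sin = 0.6),
# Two rotates about z by 90 degrees, with arbitrary T and W1.
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.8, -0.6], [0.0, 0.6, 0.8]])
Two = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
T = np.array([0.1, -0.2, 3.0])
W1 = np.array([0.5, 1.0, -2.0])
Rwc, Twc = compose_world_pose(R, T, Two, W1)
```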
3. Synthetic experiments

• Gao: one of the most popular solvers (Gao et al., 2003).
• Li (PST): the best two-stage solver, with high numerical stability and accuracy (Li and Xu, 2011).
• Kneip: the best single-stage solver, with comparable accuracy and precision at a substantially lower computational cost (Kneip et al., 2011).

Fig. 4. Numerical stability of all methods. The horizontal axis shows the log10 value of the absolute rotation error (left) and the absolute translation error (right).

Note that all methods are MATLAB or C++ implementations. The C++ version is based on Kneip's opengv library without any additional optimizations. The details of opengv are described at https://github.com/laurentkneip/opengv. All codes are executed on a quad-core desktop with a 3.7 GHz CPU and 4 GB RAM, and the source codes can be downloaded from http://pingwang.sxl.cn/.
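The ratio of the two rows in Table 1 implies each solver was called 10^7 times (e.g. 27.179 s total / 2.7179 µs per call). A timing harness along the following lines reproduces that layout; the solver and its arguments here are placeholders, not the paper's code.

```python
import time

def benchmark(solver, args, n_runs=10_000_000):
    """Return (total seconds, microseconds per call), matching the
    'Total time/s' and 'Each time/us' rows of Table 1."""
    t0 = time.perf_counter()
    for _ in range(n_runs):
        solver(*args)
    total = time.perf_counter() - t0
    return total, total / n_runs * 1e6
```

Fixing identical inputs across all solvers, as implied by the table, keeps the comparison about solver cost rather than data variation.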
Fig. 6. The mean error and standard deviation of rotation (1st and 2nd columns) and position (3rd and 4th columns) of the camera for the compared methods are shown as a function of the noise level.
dimensions as the real box) next to the real box. As shown, the proposed
method can recover the camera pose.
5. Conclusion
Nistér, D., Naroditsky, O., Bergen, J., 2004. Visual odometry. Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 1. IEEE, pp. I-652.
Nistér, D., Naroditsky, O., Bergen, J., 2006. Visual odometry for ground vehicle applications. J. Field Rob. 23 (1), 3–20.
Nistér, D., Stewénius, H., 2007. A minimal solution to the generalised 3-point pose problem. J. Math. Imaging Vis. 27 (1), 67–79.
Quan, L., Lan, Z., 1999. Linear n-point camera pose determination. IEEE Trans. Pattern Anal. Mach. Intell. 21 (8), 774–780.
Ryan, J., Hubbard, A., Box, J., Todd, J., Christoffersen, P., Carr, J., Holt, T., Snooke, N., 2015. UAV photogrammetry and structure from motion to assess calving dynamics at Store Glacier, a large outlet draining the Greenland ice sheet. The Cryosphere 9 (1), 1–11.
Skrypnyk, I., Lowe, D.G., 2004. Scene modelling, recognition and tracking with invariant image features. Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on. IEEE, pp. 110–119.
Umeyama, S., 1991. Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13 (4), 376–380.
Visual, S., 2015. IROS 2014: robots descend on Chicago. IEEE Rob. Autom. Mag.