Keywords: Fringe projection profilometry; Deep learning; High dynamic range; High-speed 3D measurement; Binocular system

Abstract: For many three-dimensional (3D) measurement techniques based on fringe projection profilometry (FPP), measuring objects with a large variation range of surface reflectivity is always a tricky problem due to the limited dynamic range of the camera. Many high dynamic range (HDR) 3D measurement methods have been developed for static scenes, but they are fragile for dynamic objects. In this paper, we address the problem of phase information loss in HDR scenes in order to enable 3D reconstruction from saturated or dark images by deep learning. By using a specifically designed convolutional neural network (CNN), we can accurately extract phase information in both the low signal-to-noise ratio (SNR) and saturation situations after proper training. Experimental results demonstrate the success of our network in 3D reconstruction for both static and dynamic HDR objects. Our method can improve the dynamic range of three-step phase-shifting by a factor of 4.8 without any additional projected images or hardware adjustment during measurement. The final 3D measurement speed of our method is about 13.89 Hz (off-line).
∗ Corresponding author at: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, No. 200 Xiaolingwei Street, Nanjing, Jiangsu Province 210094, China.
E-mail addresses: chenqian@njust.edu.cn (Q. Chen), geniusshijie@163.com, shijiefeng@njust.edu.cn (S. Feng).
https://doi.org/10.1016/j.optlaseng.2020.106245
Received 25 January 2020; Received in revised form 3 April 2020; Accepted 7 May 2020
Available online 15 June 2020
0143-8166/© 2020 Elsevier Ltd. All rights reserved.
L. Zhang, Q. Chen and C. Zuo et al. Optics and Lasers in Engineering 134 (2020) 106245
A suitable neural network structure can effectively eliminate the phase error caused by HDR. Based on this finding, we can improve the dynamic range of three-step phase-shifting by a factor of 4.8 without any additional fringe images or adjustment of hardware during measurement. To further improve the measurement speed, a stereo phase unwrapping (SPU) technique [20] is used to calculate the unwrapped phase (absolute phase). Experimental results will be presented to verify the success of the proposed method.

2. Principle

2.1. N-step phase-shifting method

A sinusoidal fringe pattern can be mathematically described as follows:

\[ I_k^p = A + B\cos\left(\phi - \frac{2k\pi}{N}\right), \quad k = 0, 1, \ldots, N-1, \tag{1} \]

where A is the average intensity, B is the intensity modulation, N is the number of phase-shifting steps, k is the phase-shift index, and \(\phi\) is the phase to be measured. The corresponding image captured by the camera can be expressed as

\[ I_k^c = \alpha t r I_k^p + I_n, \tag{2} \]

where \(\alpha\) is the camera sensitivity, t is the camera exposure time, r is the surface reflectivity of the measured object, and \(I_n\) is the random noise. Assuming \(\alpha\) and t remain constant during measurement, the light intensity value of \(I_k^c\) depends only on r. For different values of r, the HDR problem normally has two cases: 1) when the value of r is small, the image brightness of the corresponding areas is low, so the phase information is easily influenced by noise, as shown in Fig. 1(b); 2) when the value of r is large, the corresponding areas are prone to saturation, which leads to phase information loss, as shown in Fig. 1(c).

In the first case, the light intensity value (grayscale) of each pixel is within the dynamic range of the camera, and thus we can directly use the standard N-step phase-shifting algorithm to calculate \(\phi\). First, Eq. (2) is rewritten as

\[ I_k^c = A' + B'\cos\left(\phi - \frac{2k\pi}{N}\right) + I_n, \tag{3} \]

where \(A' = \alpha t r A\) and \(B' = \alpha t r B\). Then \(\phi\) can be calculated as

\[ \phi = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^c \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^c \cos(2k\pi/N)}\right]. \tag{4} \]

In the FPP system, the random noise \(I_n\) is considered as additive Gaussian noise [22]. Therefore, \(I_n\) has a normal distribution with an average value of zero. On this basis, the variance of the phase error \(\sigma_\phi^2\) can be calculated as follows [23]:

\[ \sigma_\phi^2 = \left(\frac{2\sigma_n}{NB'}\right)^2 \sum_{k=0}^{N-1} \sin^2\left(\phi + \frac{2k\pi}{N}\right) = \frac{2\sigma_n^2}{NB'^2} = \frac{2\sigma_n^2}{N\alpha^2 t^2 r^2 B^2}, \tag{5} \]

where \(\sigma_n\) is the standard deviation of \(I_n\). According to Eq. (5), we can increase the number of phase-shifting steps N to reduce the phase error caused by low reflectivity.

In the second case, the captured images are saturated. The phase information loss caused by saturation becomes the main source of phase error. If the dynamic range of the camera were infinite, the exact value of \(\phi\) could be calculated as

\[ \phi = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^c \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^c \cos(2k\pi/N)}\right]. \tag{6} \]

However, this is impossible for existing technology. For an 8-bit camera, when saturation occurs, Eq. (3) should be rewritten as follows:

\[ I_k^{c\prime} = \begin{cases} A' + B'\cos(\phi - 2k\pi/N), & I_k^c < 255 \\ 255, & I_k^c \ge 255 \end{cases}. \tag{7} \]

Inserting Eq. (7) into Eq. (6), we have

\[ \phi' = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^{c\prime} \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^{c\prime} \cos(2k\pi/N)}\right]. \tag{8} \]

From Eq. (8), the essence of the phase-shift algorithm is the discrete Fourier transform (DFT) [24]. Based on the properties of the DFT, when the condition of integer-period sampling is not satisfied, a phase error will be introduced. Since intensity saturation can also be regarded as a special kind of nonlinearity error, we can still reduce the saturation error by increasing the number of phase-shifting steps [12].

In both cases, increasing the number of phase-shifting steps is a very effective way of reducing the phase error. To better illustrate its effectiveness, three-step phase-shifting and twelve-step phase-shifting are used to simulate the phase error with different reflectivity values. In the simulation, the camera is simulated as an 8-bit camera with a resolution of
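The behavior described by Eqs. (1)–(8) can be sketched numerically. The following minimal NumPy simulation (the values of A, B, and the noise level are illustrative, not taken from the paper) recovers the wrapped phase with the arctangent formula, clips the intensities at 255 to mimic 8-bit saturation, and checks the noise-variance prediction of Eq. (5) by Monte Carlo:

```python
import numpy as np

def wrapped_phase(images, N):
    """Least-squares phase retrieval: phi = atan2(sum I*sin, sum I*cos), as in Eq. (6)."""
    k = np.arange(N)
    S = np.sum(images * np.sin(2 * k * np.pi / N), axis=-1)
    C = np.sum(images * np.cos(2 * k * np.pi / N), axis=-1)
    return np.arctan2(S, C)

def simulate(phi, N, A=160.0, B=140.0, clip=True):
    """Render N phase-shifted fringe intensities (Eq. (1)); optionally clip at 255 (Eq. (7))."""
    k = np.arange(N)
    I = A + B * np.cos(phi[..., None] - 2 * k * np.pi / N)
    return np.minimum(I, 255.0) if clip else I

# Saturation error: here A + B = 300 > 255, so the fringe peaks are clipped.
phi_true = np.linspace(-np.pi + 0.01, np.pi - 0.01, 2000)
for N in (3, 12):
    err = np.angle(np.exp(1j * (wrapped_phase(simulate(phi_true, N), N) - phi_true)))
    print(f"N = {N:2d}, max saturation-induced phase error = {np.max(np.abs(err)):.4f} rad")

# Monte Carlo check of Eq. (5): var(phi) ~= 2*sigma_n^2 / (N*B'^2) for unsaturated images.
rng = np.random.default_rng(0)
N, Bp, sigma_n = 3, 100.0, 2.0
phi0 = np.full(200000, 0.7)
I = 120.0 + Bp * np.cos(phi0[..., None] - 2 * np.arange(N) * np.pi / N) \
    + rng.normal(0.0, sigma_n, (phi0.size, N))
print(np.var(wrapped_phase(I, N)), 2 * sigma_n**2 / (N * Bp**2))
```

Increasing N suppresses the harmonics that clipping introduces, which is why the reported saturation error for N = 12 comes out far smaller than for N = 3.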
The phase error is evaluated by the mean absolute error (MAE) with different values of r. In this simulation, the acceptable MAE is set within 0.04 rad. According to this, the corresponding measurable reflectivity ranges of three-step phase-shifting and twelve-step phase-shifting are about 0.6–1.0 and 0.3–3.0 respectively, as shown in Fig. 2(c)-(d). Here we use the quotient of the maximum measurable reflectivity divided by the minimum measurable reflectivity to represent the dynamic range of N-step phase-shifting (DR_N):

\[ \mathrm{Three\text{-}step:}\quad DR_3 = \frac{1.0}{0.6} = \frac{5}{3}, \tag{11} \]

\[ \mathrm{Twelve\text{-}step:}\quad DR_{12} = \frac{3.0}{0.3} = 10. \tag{12} \]

The calculation results show that the dynamic range of twelve-step phase-shifting is six times larger than that of three-step phase-shifting in the simulation. However, the measurement speed of twelve-step phase-shifting is four times slower than that of three-step phase-shifting. It can be seen that the conventional phase-shifting method cannot satisfy the requirements of high speed and HDR at the same time.

2.2. Phase calculation by deep learning

As discussed before, there is a dilemma between increasing measurement speed and improving dynamic range in the conventional phase-shifting method. In recent years, with the rapid development of computer technology, this dilemma can be well solved by using deep learning. Our deep learning network is based on the convolutional neural network (CNN); it consists of an input and an output layer, as well as multiple hidden layers. Its simplified architecture is detailed in Fig. 3. The deep learning process proceeds in two stages called training and test.

In the training stage, we need to collect a large amount of training data to train our network. For each measured object, we project the twelve-step phase-shifting patterns onto it. The projected light intensity is designed to be larger than the measurement upper limit of the camera. The label data (S_gt and C_gt), which are also known as the ground truth, can be calculated as follows:

\[ S_{gt} = \sum_{k=0}^{11} I_k^{c\prime} \sin\left(\frac{2k\pi}{12}\right), \tag{13} \]

\[ C_{gt} = \sum_{k=0}^{11} I_k^{c\prime} \cos\left(\frac{2k\pi}{12}\right). \tag{14} \]

For the first convolutional layer, b_ij is a bias term for the jth feature map, n is the set of the feature maps in the (i − 1)th layer, and \(w_{ijn}^{pq}\) is the value at the position (p, q) of the filter, where P and Q are the height and width of the filter, respectively. In this paper, the size of each filter is 3 × 3 and the convolution stride is 1.

The following pooling layer simplifies the output of the first convolutional layer by performing nonlinear downsampling. The output data of the first convolutional layer are downsampled by ×1, ×2, ×4, and ×8 in four different paths. Therefore, the size is transformed into w × h × 50, (w/2) × (h/2) × 50, (w/4) × (h/4) × 50, and (w/8) × (h/8) × 50. These paths are designed to perceive more surface details. The output of the pooling layer is followed by four residual blocks [26,27]. They can speed up the convergence of the training phase, thus increasing the training efficiency. Following the residual blocks, there are several upsampling blocks [28] which are used to return the above multi-scale data to their original dimensions. After passing through the second convolutional layer, the output data of these four dataflow paths are concatenated into a tensor with size of w × h × 200 by the concatenation block, which is input to the last convolutional layer. Finally, the output of the last convolutional layer consists of two channels, which are denoted as S and C. In fact, S and C are the numerator and denominator of Eq. (6) respectively, and thus the wrapped phase calculated by our network can be expressed as

\[ \phi_{CNN} = \tan^{-1}\left(\frac{S}{C}\right). \tag{16} \]

S and C cannot be directly used for phase calculation; their accuracy should be verified by the loss function. The loss function is defined as the mean-squared error (MSE) of S and C with respect to S_gt and C_gt:

\[ Loss(\theta) = \frac{1}{W \times H}\sum_{x=1}^{W}\sum_{y=1}^{H}\left[\left\|S(x,y) - S_{gt}(x,y)\right\|^2 + \left\|C(x,y) - C_{gt}(x,y)\right\|^2\right], \tag{17} \]

where \(\theta\) is the parameter space which includes the weights, biases and so on. In the training, Loss(\(\theta\)) can be considered as a feedback parameter. Based on Loss(\(\theta\)), the adaptive moment estimation (ADAM) optimizer [29] can tune \(\theta\) to minimize the loss function. The training stage will continue to run until we get the minimum value of Loss(\(\theta\)).

Fig. 3. Architecture of our CNN, which consists of an input and an output layer, as well as multiple hidden layers, and rapidly processes input data in a parallel, multi-scale manner.
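The label-generation and loss steps can be sketched in a few lines. This NumPy fragment is a simplified stand-in (the CNN itself is not reproduced, and the fringe parameters are illustrative): it builds twelve-step ground-truth labels in the spirit of Eq. (13), the wrapped phase of Eq. (18), and the per-pixel loss of Eq. (17):

```python
import numpy as np

def labels_from_twelve_step(images):
    """Ground-truth numerator/denominator from 12 phase-shifted images (Eq. (13) style)."""
    k = np.arange(12)
    S_gt = np.sum(images * np.sin(2 * k * np.pi / 12), axis=-1)
    C_gt = np.sum(images * np.cos(2 * k * np.pi / 12), axis=-1)
    return S_gt, C_gt

def mse_loss(S, C, S_gt, C_gt):
    """Training loss of Eq. (17): mean over the W x H pixels of both squared errors."""
    return np.mean((S - S_gt) ** 2 + (C - C_gt) ** 2)

# Toy ground truth on a small synthetic "image" (illustrative fringe parameters).
H, W = 32, 32
phi = np.linspace(0.0, 4.0 * np.pi, H * W).reshape(H, W) % (2.0 * np.pi)
k = np.arange(12)
imgs = 120.0 + 100.0 * np.cos(phi[..., None] - 2 * k * np.pi / 12)
S_gt, C_gt = labels_from_twelve_step(imgs)
phi_gt = np.arctan2(S_gt, C_gt)                  # Eq. (18)
print(mse_loss(S_gt, C_gt, S_gt, C_gt))          # a perfect prediction gives zero loss
```

During training, the network's predicted S and C maps would replace the first two arguments of `mse_loss`, and the optimizer would drive that value toward zero.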
In the test stage, we also captured twelve images for each test object. Note that the test objects are never seen by our network in the training stage. Their real wrapped phase \(\phi_{gt}\) can be calculated as

\[ \phi_{gt} = \tan^{-1}\left(\frac{S_{gt}}{C_{gt}}\right). \tag{18} \]

Different from the training data, each test dataset contains only three images (I1–I3), which means its S and C are calculated by our network without labels. For the test data, we can use their \(\phi_{CNN}\) and \(\phi_{gt}\) to calculate the phase error. If the phase error is within the acceptable level, we consider that the network is available for actual measurement. Otherwise, we need to adjust the network parameters and return to the training stage.

2.3. Stereo phase unwrapping

The phase provided by our network from Eq. (16) is the wrapped phase, which has 2π phase discontinuities. This wrapped phase cannot be directly used for 3D reconstruction. To obtain a continuous phase distribution, phase unwrapping must be carried out. After years of exploration and development, there are many mature phase unwrapping methods, but not all of them are suitable for our wrapped phase. That is because the phase accuracy of our network changes with the fringe frequency f. Here, if the fringes are perpendicular to the horizontal axis of the projector, f can be expressed as:

\[ f = \frac{R}{\lambda}, \tag{19} \]

where R is the horizontal resolution of the projector and \(\lambda\) is the wavelength of the sinusoidal fringe patterns. In actual measurement, we find that our network is more applicable to high-frequency fringe patterns. Fig. 4 displays some measurement results (here the reason why the frequencies are not integer multiples of ten is that the horizontal resolution of our projector is 912 pixels, and these frequencies make the wavelengths integers) when the phase-shifting step is 3.

Fig. 4. Phase MSE with different frequencies when the phase-shifting step is 3.

The main reason for this phenomenon is that the phase accuracy of our network relies on S_gt and C_gt. As mentioned before, S_gt and C_gt are calculated with the twelve-step phase-shifting algorithm. For low frequencies, especially the basic frequency (f = 1), twelve steps are not enough to eliminate the saturation error [30], which inevitably impacts the output of our network. For high frequencies, excessively high frequencies (f > 90) deteriorate the contrast and the sinusoidal property due to the restraint of projector resolution. On the other hand, excessively high frequencies also increase the impact of saturation due to the point spread function (PSF) of the camera [31]. After several experiments, we have found that the applicable range of frequency is about 60–90.

In this paper, stereo phase unwrapping (SPU) [20] is used to process phase unwrapping. A typical FPP system using SPU is composed of two cameras and a projector, as shown in Fig. 5(a). Depending on the geometric relationship between the two cameras, SPU can calculate the unwrapped phase without the assistance of the basic-frequency fringe patterns. Furthermore, an additional advantage of SPU is that we can improve its robustness by depth constraint [32]. According to the motion states of the measured objects and the FPP system parameters, we can approximately estimate the measurement range. Z_min and Z_max shown in Fig. 5(b) represent the minimum and maximum depth boundaries respectively. If we set the measurement range Z as Z_min < Z < Z_max, we can remove some candidates (e.g., P1 and P5 shown in Fig. 5(b)) to reduce the computational load. However, the applicable frequencies of our network (60–90) are too high for SPU. As shown in Fig. 5(c), when the fringe frequency is high, there still exist many candidates in the measurement range
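The depth-constraint pruning can be illustrated with a toy sketch. The phase-to-depth mapping below is a made-up linear model, not the triangulation geometry of the actual binocular system; the point is only how fringe-order candidates outside [Z_min, Z_max] are discarded:

```python
import numpy as np

def prune_candidates(phi_wrapped, f, phase_to_depth, z_min, z_max):
    """Enumerate the f fringe-order candidates phi + 2*pi*m and keep only the ones
    whose depth, under a caller-supplied model, lies inside [z_min, z_max]."""
    m = np.arange(f)
    phi_abs = phi_wrapped + 2.0 * np.pi * m      # candidate absolute phases
    z = phase_to_depth(phi_abs)
    keep = (z >= z_min) & (z <= z_max)
    return phi_abs[keep], z[keep]

# Hypothetical linear phase-to-depth model (purely illustrative, not the paper's geometry).
to_depth = lambda p: 300.0 + 0.5 * p             # mm; slope and offset are made up
cands, depths = prune_candidates(1.2, 80, to_depth, 350.0, 450.0)
print(f"{cands.size} of 80 candidates survive the depth constraint")
```

Even after such pruning, a high fringe frequency leaves many surviving candidates, which is why SPU additionally relies on the geometric relationship between the two cameras to select the correct fringe order.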
Fig. 6. (a) Photograph of the standard spheres; (b) Captured images with different values of LI; (c) Cross-section plot of the captured images in (b).
3. Experiments
Fig. 8. Experimental results when LI is 75. (a) 3D reconstruction of Sphere A of three-step phase shifting; (b) 3D reconstruction of Sphere B of three-step phase shifting; (c) Error distribution of (a); (d) Error distribution of (b); (e) 3D reconstruction of Sphere A of our method; (f) 3D reconstruction of Sphere B of our method; (g) Error distribution of (e); (h) Error distribution of (f); (i) 3D reconstruction of Sphere A of twelve-step phase shifting; (j) 3D reconstruction of Sphere B of twelve-step phase shifting; (k) Error distribution of (i); (l) Error distribution of (j).
This problem is caused not by design flaws of our network, but by the phase information loss of the input data. As is known to all, when image saturation occurs, the grayscale values of the saturated regions are mapped to the same value (255 for an 8-bit camera). For the three images input to our network, when the saturation extent is serious, the grayscale values of the same point in the three images are all 255. In this case, it is difficult for our network to correctly extract the phase information from the images, as most of the useful information is overridden by the saturation value. Although the dynamic range of our method is smaller than that of twelve-step phase-shifting, it has the advantage of much faster measurement speed. Compared with three-step phase-shifting, our method has the same measurement speed. For Sphere A, if the acceptable RMSE is set within 0.072 mm, we can refer to Eq. (11) to calculate the dynamic ranges by using LI to replace r:

\[ \mathrm{Three\text{-}step:}\quad DR_3 = 50/40 = 1.25, \qquad \mathrm{Our\ method:}\quad DR_{our} = 150/25 = 6.00. \tag{23} \]

Fig. 10. RMSE curves of three methods. (a) RMSE curves of Sphere A of three methods; (b) RMSE curves of Sphere A of twelve-step phase-shifting and our method; (c) RMSE curves of Sphere B of three methods; (d) RMSE curves of Sphere B of twelve-step phase-shifting and our method.
Fig. 12. Experimental results of the first scene. (a) Overview of the 3D reconstruction of three-step phase shifting; (b) Enlarged detail of the region in (a); (c) Enlarged detail of the region in (b); (d) Overview of the 3D reconstruction of our method; (e) Enlarged detail of the region in (d); (f) Enlarged detail of the region in (e).

Fig. 13. Experimental results of the second scene. (a) Enlarged detail of the black cardboard in (b); (b) Overview of the 3D reconstruction of three-step phase shifting; (c) Enlarged detail of the statue in (b); (d) Enlarged detail of the black cardboard in (e); (e) Overview of the 3D reconstruction of our method; (f) Enlarged detail of the statue in (e).
The calculation results show that the dynamic range of our method is
4.8 times larger than that of three-step phase-shifting.
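The dynamic-range figures quoted here are simple quotients; the arithmetic of Eqs. (11), (12), and (23) can be reproduced directly (all numbers are the ones reported above):

```python
# Dynamic range as the quotient of the largest and smallest measurable value,
# reproducing the arithmetic of Eqs. (11), (12), and (23).
def dynamic_range(max_val, min_val):
    return max_val / min_val

dr3_sim = dynamic_range(1.0, 0.6)    # three-step, reflectivity 0.6-1.0
dr12_sim = dynamic_range(3.0, 0.3)   # twelve-step, reflectivity 0.3-3.0
dr3_li = dynamic_range(50, 40)       # three-step, acceptable LI 40-50
dr_our = dynamic_range(150, 25)      # our method, acceptable LI 25-150
print(dr12_sim / dr3_sim)            # 6x gain of twelve-step over three-step
print(dr_our / dr3_li)               # 4.8x gain of the learned method over three-step
```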
Fig. 15. (a) Photograph of the ping-pong ball; (b) One of the fringe images of (a); (c) Grayscale values of (b) represented by the “jet” colormap.
Fig. 17. Dynamic measurement results. (a) x-z view of the pendulum motion of
three-step phase-shifting; (b) Enlarged detail of the region in (a); (c) x-z view of
the pendulum motion of our method; (d) Enlarged detail of the region in (c).
one frame of the dynamic measurement result of our method. The complete dynamic measurement results of our method can be found in Visualization 1 and Visualization 2. Visualization 1 is the x-y view of the pendulum motion, and Visualization 2 is the x-z view of the pendulum motion. For better observation, the display frame rates of Visualization 1 and 2 are both 5 Hz. The experimental results demonstrate that our method can satisfy the measurement requirements of high speed and HDR at the same time.