Keywords: Fringe projection profilometry; Deep learning; High dynamic range; High-speed 3D measurement; Binocular system

Abstract: For many three-dimensional (3D) measurement techniques based on fringe projection profilometry (FPP), measuring objects with a large variation range of surface reflectivity is always a tricky problem due to the limited dynamic range of the camera. Many high dynamic range (HDR) 3D measurement methods have been developed for static scenes, but they are fragile for dynamic objects. In this paper, we address the problem of phase information loss in HDR scenes in order to enable 3D reconstruction from saturated or dark images by deep learning. By using a specifically designed convolutional neural network (CNN), we can accurately extract phase information in both the low signal-to-noise ratio (SNR) and saturation situations after proper training. Experimental results demonstrate the success of our network in 3D reconstruction for both static and dynamic HDR objects. Our method can improve the dynamic range of three-step phase-shifting by a factor of 4.8 without any additional projected images or hardware adjustment during measurement. The final 3D measurement speed of our method is about 13.89 Hz (off-line).
∗ Corresponding author at: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, No. 200 Xiaolingwei Street, Nanjing, Jiangsu Province 210094, China.
E-mail addresses: chenqian@njust.edu.cn (Q. Chen), geniusshijie@163.com, shijiefeng@njust.edu.cn (S. Feng).
https://doi.org/10.1016/j.optlaseng.2020.106245
Received 25 January 2020; Received in revised form 3 April 2020; Accepted 7 May 2020
Available online 15 June 2020
0143-8166/© 2020 Elsevier Ltd. All rights reserved.
L. Zhang, Q. Chen and C. Zuo et al. Optics and Lasers in Engineering 134 (2020) 106245
A suitable neural network structure can effectively eliminate the phase error caused by HDR. Based on this finding, we can improve the dynamic range of three-step phase-shifting by a factor of 4.8 without any additional fringe images or adjustment of hardware during measurement. To further improve the measurement speed, a stereo phase unwrapping (SPU) technique [20] is used to calculate the unwrapped phase (absolute phase). Experimental results will be presented to verify the success of the proposed method.

2. Principle

2.1. N-step phase-shifting method

A sinusoidal fringe pattern can be mathematically described as follows:

\[ I_k^p = A + B\cos\left(\phi - \frac{2k\pi}{N}\right), \quad k = 0, 1, \ldots, N-1, \tag{1} \]

where A is the average intensity, B is the intensity modulation, N is the number of phase-shifting steps, k is the phase-shift index, and \(\phi\) is the phase to be measured. The corresponding image captured by the camera can be expressed as

\[ I_k^c = \alpha t r I_k^p + I_n, \tag{2} \]

where \(\alpha\) is the camera sensitivity, t is the camera exposure time, r is the surface reflectivity of the measured object, and \(I_n\) is the random noise. Assuming \(\alpha\) and t remain constant during measurement, the light intensity value of \(I_k^c\) depends only on r. For different values of r, the HDR problem normally has two cases: 1) when the value of r is small, the image brightness of the corresponding areas is low, so the phase information is easily influenced by noise, as shown in Fig. 1(b); 2) when the value of r is large, the corresponding areas are prone to saturation, which leads to phase information loss, as shown in Fig. 1(c).

In the first case, the light intensity value (grayscale) of each pixel is within the dynamic range of the camera, and thus we can directly use the standard N-step phase-shifting algorithm to calculate \(\phi\). First, Eq. (2) is rewritten as

\[ I_k^c = A' + B'\cos\left(\phi - \frac{2k\pi}{N}\right) + I_n, \tag{3} \]

where \(A' = \alpha t r A\) and \(B' = \alpha t r B\). Then \(\phi\) can be calculated as

\[ \phi = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^c \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^c \cos(2k\pi/N)}\right]. \tag{4} \]

In the FPP system, the random noise \(I_n\) is considered as additive Gaussian noise [22]. Therefore, \(I_n\) has a normal distribution with an average value of zero. On this basis, the variance of the phase error \(\sigma_\phi^2\) can be calculated as follows [23]:

\[ \sigma_\phi^2 = \left(\frac{2\sigma_n}{NB'}\right)^2 \sum_{k=0}^{N-1} \sin^2\left(\phi + \frac{2k\pi}{N}\right) = \frac{2\sigma_n^2}{NB'^2} = \frac{2\sigma_n^2}{N\alpha^2 t^2 r^2 B^2}, \tag{5} \]

where \(\sigma_n\) is the standard deviation of \(I_n\). According to Eq. (5), we can increase the number of phase-shifting steps N to reduce the phase error caused by low reflectivity.

In the second case, the captured images are saturated. The phase information loss caused by saturation becomes the main source of phase error. If the dynamic range of the camera were infinite, the exact value of \(\phi\) could be calculated as

\[ \phi = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^c \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^c \cos(2k\pi/N)}\right]. \tag{6} \]

However, this is impossible for existing technology. For an 8-bit camera, when saturation occurs, Eq. (3) should be rewritten as follows:

\[ I_k^{c\prime} = \begin{cases} A' + B'\cos(\phi - 2k\pi/N), & I_k^c < 255 \\ 255, & I_k^c \ge 255 \end{cases}. \tag{7} \]

Inserting Eq. (7) into Eq. (6), we have

\[ \phi' = \tan^{-1}\left[\frac{\sum_{k=0}^{N-1} I_k^{c\prime} \sin(2k\pi/N)}{\sum_{k=0}^{N-1} I_k^{c\prime} \cos(2k\pi/N)}\right]. \tag{8} \]

From Eq. (8), the essence of the phase-shift algorithm is the discrete Fourier transform (DFT) [24]. Based on the properties of the DFT, when the condition of integer-period sampling is not satisfied, a phase error will be introduced. Since intensity saturation can also be regarded as a special kind of nonlinearity error, we can still reduce the saturation error by increasing the number of phase-shifting steps [12].

In both cases, increasing the number of phase-shifting steps is a very effective way of reducing the phase error. To better illustrate its effectiveness, three-step phase-shifting and twelve-step phase-shifting are used to simulate the phase error with different reflectivity values. In the simulation, the camera is simulated as an 8-bit camera with a resolution of
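The behavior described by Eqs. (1)–(8) can be sketched numerically. The following minimal NumPy simulation (the values of A, B, and the noise level are illustrative, not taken from the paper) recovers the wrapped phase with the arctangent formula, clips the intensities at 255 to mimic 8-bit saturation, and checks the noise-variance prediction of Eq. (5) by Monte Carlo:

```python
import numpy as np

def wrapped_phase(images, N):
    """Least-squares phase retrieval: phi = atan2(sum I*sin, sum I*cos), as in Eq. (6)."""
    k = np.arange(N)
    S = np.sum(images * np.sin(2 * k * np.pi / N), axis=-1)
    C = np.sum(images * np.cos(2 * k * np.pi / N), axis=-1)
    return np.arctan2(S, C)

def simulate(phi, N, A=160.0, B=140.0, clip=True):
    """Render N phase-shifted fringe intensities (Eq. (1)); optionally clip at 255 (Eq. (7))."""
    k = np.arange(N)
    I = A + B * np.cos(phi[..., None] - 2 * k * np.pi / N)
    return np.minimum(I, 255.0) if clip else I

# Saturation error: here A + B = 300 > 255, so the fringe peaks are clipped.
phi_true = np.linspace(-np.pi + 0.01, np.pi - 0.01, 2000)
for N in (3, 12):
    err = np.angle(np.exp(1j * (wrapped_phase(simulate(phi_true, N), N) - phi_true)))
    print(f"N = {N:2d}, max saturation-induced phase error = {np.max(np.abs(err)):.4f} rad")

# Monte Carlo check of Eq. (5): var(phi) ~= 2*sigma_n^2 / (N*B'^2) for unsaturated images.
rng = np.random.default_rng(0)
N, Bp, sigma_n = 3, 100.0, 2.0
phi0 = np.full(200000, 0.7)
I = 120.0 + Bp * np.cos(phi0[..., None] - 2 * np.arange(N) * np.pi / N) \
    + rng.normal(0.0, sigma_n, (phi0.size, N))
print(np.var(wrapped_phase(I, N)), 2 * sigma_n**2 / (N * Bp**2))
```

Increasing N suppresses the harmonics that clipping introduces, which is why the reported saturation error for N = 12 comes out far smaller than for N = 3.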
The phase error is evaluated by the mean absolute error (MAE) with different values of r. In this simulation, the acceptable MAE is set within 0.04 rad. According to this, the corresponding measurable reflectivity ranges of three-step phase-shifting and twelve-step phase-shifting are about 0.6–1.0 and 0.3–3.0 respectively, as shown in Fig. 2(c)-(d). Here we use the quotient of the maximum measurable reflectivity divided by the minimum measurable reflectivity to represent the dynamic range of N-step phase-shifting (DR_N):

\[ \mathrm{Three\text{-}step:}\quad DR_3 = \frac{1.0}{0.6} = \frac{5}{3}, \tag{11} \]

\[ \mathrm{Twelve\text{-}step:}\quad DR_{12} = \frac{3.0}{0.3} = 10. \tag{12} \]

The calculation results show that the dynamic range of twelve-step phase-shifting is six times larger than that of three-step phase-shifting in the simulation. However, the measurement speed of twelve-step phase-shifting is four times slower than that of three-step phase-shifting. It can be seen that the conventional phase-shifting method cannot satisfy the requirements of high speed and HDR at the same time.

2.2. Phase calculation by deep learning

As discussed before, there is a dilemma between increasing measurement speed and improving dynamic range in the conventional phase-shifting method. In recent years, with the rapid development of computer technology, this dilemma can be well solved by using deep learning. Our deep learning network is based on the convolutional neural network (CNN); it consists of an input and an output layer, as well as multiple hidden layers. Its simplified architecture is detailed in Fig. 3. The deep learning process proceeds in two stages called training and test.

In the training stage, we need to collect a large amount of training data to train our network. For each measured object, we project the twelve-step phase-shifting patterns onto it. The projected light intensity is designed to be larger than the measurement upper limit of the camera. The label data (S_gt and C_gt), which are also known as the ground truth, can be calculated as follows:

\[ S_{gt} = \sum_{k=0}^{11} I_k^{c\prime} \sin\left(\frac{2k\pi}{12}\right), \tag{13} \]

\[ C_{gt} = \sum_{k=0}^{11} I_k^{c\prime} \cos\left(\frac{2k\pi}{12}\right). \tag{14} \]

For the first convolutional layer, b_ij is a bias term for the jth feature map, n is the set of the feature maps in the (i − 1)th layer, and \(w_{ijn}^{pq}\) is the value at the position (p, q) of the filter, where P and Q are the height and width of the filter, respectively. In this paper, the size of each filter is 3 × 3 and the convolution stride is 1.

The following pooling layer simplifies the output of the first convolutional layer by performing nonlinear downsampling. The output data of the first convolutional layer are downsampled by ×1, ×2, ×4, and ×8 in four different paths. Therefore, the size is transformed into w × h × 50, (w/2) × (h/2) × 50, (w/4) × (h/4) × 50, and (w/8) × (h/8) × 50. These paths are designed to perceive more surface details. The output of the pooling layer is followed by four residual blocks [26,27]. They can speed up the convergence of the training phase, thus increasing the training efficiency. Following the residual blocks, there are several upsampling blocks [28] which are used to return the above multi-scale data to their original dimensions. After passing through the second convolutional layer, the output data of these four dataflow paths are concatenated into a tensor with size of w × h × 200 by the concatenation block, which is input to the last convolutional layer. Finally, the output of the last convolutional layer consists of two channels, which are denoted as S and C. In fact, S and C are the numerator and denominator of Eq. (6) respectively, and thus the wrapped phase calculated by our network can be expressed as

\[ \phi_{CNN} = \tan^{-1}\left(\frac{S}{C}\right). \tag{16} \]

S and C cannot be directly used for phase calculation; their accuracy should be verified by the loss function. The loss function is defined as the mean-squared error (MSE) of S and C with respect to S_gt and C_gt:

\[ Loss(\theta) = \frac{1}{W \times H}\sum_{x=1}^{W}\sum_{y=1}^{H}\left[\left\|S(x,y) - S_{gt}(x,y)\right\|^2 + \left\|C(x,y) - C_{gt}(x,y)\right\|^2\right], \tag{17} \]

where \(\theta\) is the parameter space which includes the weights, biases and so on. In the training, Loss(\(\theta\)) can be considered as a feedback parameter. Based on Loss(\(\theta\)), the adaptive moment estimation (ADAM) optimizer [29] can tune \(\theta\) to minimize the loss function. The training stage will continue to run until we get the minimum value of Loss(\(\theta\)).

Fig. 3. Architecture of our CNN, which consists of an input and an output layer, as well as multiple hidden layers, and rapidly processes input data in a parallel, multi-scale manner.
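The label-generation and loss steps can be sketched in a few lines. This NumPy fragment is a simplified stand-in (the CNN itself is not reproduced, and the fringe parameters are illustrative): it builds twelve-step ground-truth labels in the spirit of Eq. (13), the wrapped phase of Eq. (18), and the per-pixel loss of Eq. (17):

```python
import numpy as np

def labels_from_twelve_step(images):
    """Ground-truth numerator/denominator from 12 phase-shifted images (Eq. (13) style)."""
    k = np.arange(12)
    S_gt = np.sum(images * np.sin(2 * k * np.pi / 12), axis=-1)
    C_gt = np.sum(images * np.cos(2 * k * np.pi / 12), axis=-1)
    return S_gt, C_gt

def mse_loss(S, C, S_gt, C_gt):
    """Training loss of Eq. (17): mean over the W x H pixels of both squared errors."""
    return np.mean((S - S_gt) ** 2 + (C - C_gt) ** 2)

# Toy ground truth on a small synthetic "image" (illustrative fringe parameters).
H, W = 32, 32
phi = np.linspace(0.0, 4.0 * np.pi, H * W).reshape(H, W) % (2.0 * np.pi)
k = np.arange(12)
imgs = 120.0 + 100.0 * np.cos(phi[..., None] - 2 * k * np.pi / 12)
S_gt, C_gt = labels_from_twelve_step(imgs)
phi_gt = np.arctan2(S_gt, C_gt)                  # Eq. (18)
print(mse_loss(S_gt, C_gt, S_gt, C_gt))          # a perfect prediction gives zero loss
```

During training, the network's predicted S and C maps would replace the first two arguments of `mse_loss`, and the optimizer would drive that value toward zero.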
In the test stage, we also captured twelve images for each test object. Note that the test objects are never seen by our network in the training stage. Their real wrapped phase \(\phi_{gt}\) can be calculated as

\[ \phi_{gt} = \tan^{-1}\left(\frac{S_{gt}}{C_{gt}}\right). \tag{18} \]

Different from the training data, each test dataset contains only three images (I1–I3), which means its S and C are calculated by our network without labels. For the test data, we can use their \(\phi_{CNN}\) and \(\phi_{gt}\) to calculate the phase error. If the phase error is within the acceptable level, we consider that the network is available for actual measurement. Otherwise, we need to adjust the network parameters and return to the training stage.

2.3. Stereo phase unwrapping

The phase provided by our network from Eq. (16) is the wrapped phase, which has 2π phase discontinuities. This wrapped phase cannot be directly used for 3D reconstruction. To obtain a continuous phase distribution, phase unwrapping must be carried out. After years of exploration and development, there are many mature phase unwrapping methods, but not all of them are suitable for our wrapped phase. That is because the phase accuracy of our network changes with the fringe frequency f. Here, if the fringes are perpendicular to the horizontal axis of the projector, f can be expressed as:

\[ f = \frac{R}{\lambda}, \tag{19} \]

where R is the horizontal resolution of the projector and \(\lambda\) is the wavelength of the sinusoidal fringe patterns. In actual measurement, we find that our network is more applicable to high-frequency fringe patterns. Fig. 4 displays some measurement results (here the reason why the frequencies are not integer multiples of ten is that the horizontal resolution of our projector is 912 pixels, and these frequencies make the wavelengths integers) when the phase-shifting step is 3.

Fig. 4. Phase MSE with different frequencies when the phase-shifting step is 3.

The main reason for this phenomenon is that the phase accuracy of our network relies on S_gt and C_gt. As mentioned before, S_gt and C_gt are calculated with the twelve-step phase-shifting algorithm. For low frequencies, especially the basic frequency (f = 1), twelve steps are not enough to eliminate the saturation error [30], which inevitably impacts the output of our network. For high frequencies, excessively high frequencies (f > 90) deteriorate the contrast and the sinusoidal property due to the restraint of projector resolution. On the other hand, excessively high frequencies also increase the impact of saturation due to the point spread function (PSF) of the camera [31]. After several experiments, we have found that the applicable range of frequency is about 60–90.

In this paper, stereo phase unwrapping (SPU) [20] is used to process phase unwrapping. A typical FPP system using SPU is composed of two cameras and a projector, as shown in Fig. 5(a). Depending on the geometric relationship between the two cameras, SPU can calculate the unwrapped phase without the assistance of the basic-frequency fringe patterns. Furthermore, an additional advantage of SPU is that we can improve its robustness by depth constraint [32]. According to the motion states of the measured objects and the FPP system parameters, we can approximately estimate the measurement range. Z_min and Z_max shown in Fig. 5(b) represent the minimum and maximum depth boundaries respectively. If we set the measurement range Z as Z_min < Z < Z_max, we can remove some candidates (e.g., P1 and P5 shown in Fig. 5(b)) to reduce the computational load. However, the applicable frequencies of our network (60–90) are too high for SPU. As shown in Fig. 5(c), when the fringe frequency is high, there still exist many candidates in the measurement range
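The depth-constraint pruning can be illustrated with a toy sketch. The phase-to-depth mapping below is a made-up linear model, not the triangulation geometry of the actual binocular system; the point is only how fringe-order candidates outside [Z_min, Z_max] are discarded:

```python
import numpy as np

def prune_candidates(phi_wrapped, f, phase_to_depth, z_min, z_max):
    """Enumerate the f fringe-order candidates phi + 2*pi*m and keep only the ones
    whose depth, under a caller-supplied model, lies inside [z_min, z_max]."""
    m = np.arange(f)
    phi_abs = phi_wrapped + 2.0 * np.pi * m      # candidate absolute phases
    z = phase_to_depth(phi_abs)
    keep = (z >= z_min) & (z <= z_max)
    return phi_abs[keep], z[keep]

# Hypothetical linear phase-to-depth model (purely illustrative, not the paper's geometry).
to_depth = lambda p: 300.0 + 0.5 * p             # mm; slope and offset are made up
cands, depths = prune_candidates(1.2, 80, to_depth, 350.0, 450.0)
print(f"{cands.size} of 80 candidates survive the depth constraint")
```

Even after such pruning, a high fringe frequency leaves many surviving candidates, which is why SPU additionally relies on the geometric relationship between the two cameras to select the correct fringe order.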
Fig. 6. (a) Photograph of the standard spheres; (b) Captured images with different values of LI; (c) Cross-section plot of the captured images in (b).
3. Experiments
Fig. 8. Experimental results when LI is 75. (a) 3D reconstruction of Sphere A of three-step phase shifting; (b) 3D reconstruction of Sphere B of three-step phase shifting; (c) Error distribution of (a); (d) Error distribution of (b); (e) 3D reconstruction of Sphere A of our method; (f) 3D reconstruction of Sphere B of our method; (g) Error distribution of (e); (h) Error distribution of (f); (i) 3D reconstruction of Sphere A of twelve-step phase shifting; (j) 3D reconstruction of Sphere B of twelve-step phase shifting; (k) Error distribution of (i); (l) Error distribution of (j).
This problem is caused not by design flaws of our network, but by the phase information loss of the input data. As is known to all, when image saturation occurs, the grayscale values of the saturated regions are mapped to the same value (255 for an 8-bit camera). For the three images input to our network, when the saturation extent is serious, the grayscale values of the same point in the three images are all 255. In this case, it is difficult for our network to correctly extract the phase information from the images, as most of the useful information is overridden by the saturation value. Although the dynamic range of our method is smaller than that of twelve-step phase-shifting, it has the advantage of much faster measurement speed. Compared with three-step phase-shifting, our method has the same measurement speed. For Sphere A, if the acceptable RMSE is set within 0.072 mm, we can refer to Eq. (11) to calculate the dynamic ranges by using LI to replace r:

\[ \mathrm{Three\text{-}step:}\quad DR_3 = 50/40 = 1.25, \qquad \mathrm{Our\ method:}\quad DR_{our} = 150/25 = 6.00. \tag{23} \]

Fig. 10. RMSE curves of three methods. (a) RMSE curves of Sphere A of three methods; (b) RMSE curves of Sphere A of twelve-step phase-shifting and our method; (c) RMSE curves of Sphere B of three methods; (d) RMSE curves of Sphere B of twelve-step phase-shifting and our method.
Fig. 12. Experimental results of the first scene. (a) Overview of the 3D reconstruction of three-step phase shifting; (b) Enlarged detail of the region in (a); (c) Enlarged detail of the region in (b); (d) Overview of the 3D reconstruction of our method; (e) Enlarged detail of the region in (d); (f) Enlarged detail of the region in (e).

Fig. 13. Experimental results of the second scene. (a) Enlarged detail of the black cardboard in (b); (b) Overview of the 3D reconstruction of three-step phase shifting; (c) Enlarged detail of the statue in (b); (d) Enlarged detail of the black cardboard in (e); (e) Overview of the 3D reconstruction of our method; (f) Enlarged detail of the statue in (e).
The calculation results show that the dynamic range of our method is
4.8 times larger than that of three-step phase-shifting.
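The dynamic-range figures quoted here are simple quotients; the arithmetic of Eqs. (11), (12), and (23) can be reproduced directly (all numbers are the ones reported above):

```python
# Dynamic range as the quotient of the largest and smallest measurable value,
# reproducing the arithmetic of Eqs. (11), (12), and (23).
def dynamic_range(max_val, min_val):
    return max_val / min_val

dr3_sim = dynamic_range(1.0, 0.6)    # three-step, reflectivity 0.6-1.0
dr12_sim = dynamic_range(3.0, 0.3)   # twelve-step, reflectivity 0.3-3.0
dr3_li = dynamic_range(50, 40)       # three-step, acceptable LI 40-50
dr_our = dynamic_range(150, 25)      # our method, acceptable LI 25-150
print(dr12_sim / dr3_sim)            # 6x gain of twelve-step over three-step
print(dr_our / dr3_li)               # 4.8x gain of the learned method over three-step
```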
Fig. 15. (a) Photograph of the ping-pong ball; (b) One of the fringe images of (a); (c) Grayscale values of (b) represented by the “jet” colormap.
Fig. 17. Dynamic measurement results. (a) x-z view of the pendulum motion of
three-step phase-shifting; (b) Enlarged detail of the region in (a); (c) x-z view of
the pendulum motion of our method; (d) Enlarged detail of the region in (c).
one frame of the dynamic measurement result of our method. The complete dynamic measurement results of our method can be found in Visualization 1 and Visualization 2. Visualization 1 is the x-y view of the pendulum motion, and Visualization 2 is the x-z view of the pendulum motion. For better observation, the display frame rates of Visualization 1 and 2 are both 5 Hz. The experimental results demonstrate that our method can satisfy the measurement requirements of high speed and HDR at the same time.