
Multimedia Tools and Applications (2020) 79:23189–23202

https://doi.org/10.1007/s11042-020-09127-7

Hybrid cost aggregation for dense stereo matching

Ming Yao 1 & Wenbin Ouyang 2 & Bugao Xu 1,2

* Bugao Xu, Bugao.xu@unt.edu
1 Department of Biomedical Engineering, University of Texas at Austin, Austin, TX 78712, USA
2 Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, USA

Received: 5 May 2019 / Revised: 30 December 2019 / Accepted: 27 May 2020 / Published online: 6 June 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Matching cost initialization and aggregation are two major steps in the stereo matching
framework. For dense stereo matching, a matching cost needs to be computed at each
pixel for all disparities within the search range so that it can be used to evaluate pixel-to-
pixel correspondence. Cost aggregation connects the matching cost with a certain
neighbourhood to reduce mismatches by a supporting smoothness term. This paper
presents a hybrid cost aggregation method to overcome mismatches caused by textureless
surfaces, depth-discontinuity areas, and inconsistent lighting in an image. The steps taken to
aggregate costs for an energy function include adaptive support regions, multi-path
aggregation, and adaptive penalties to generate a more accurate disparity map. Compared
with two top-ranked stereo matching algorithms, the proposed algorithm yielded the
disparity maps of the dataset in Middlebury benchmark V2 with smaller error ratios in
depth-discontinuity regions.

Keywords Stereo matching · Hybrid cost aggregation · Adaptive support region

1 Introduction

Stereo correspondences rely on matching costs for determining disparities of corresponding
pixels from paired images [3]. A robust stereo matching method must be able to handle
ambiguities caused by low texture, radiometric bias, depth discontinuity and other abnormalities
in the matching images. For dense stereo matching, four major steps, cost initialization, cost
aggregation, disparity computation and disparity refinement, are normally needed. Of these,
cost aggregation is an important step that updates values in the 3D cost volume generated in the
initialization with costs calculated over local support regions, and it is required for all the local

methods [12, 18] and for many global optimization methods [11]. Much research has
demonstrated that proper selection and aggregation of matching costs of neighbouring pixels
can make a local optimizer more effective and efficient than many global optimization
techniques in generating accurate disparity maps [5, 10].
Approaches of cost aggregation can differ in support regions, aggregation paths and penalty
functions [6] used for calculating the new costs. The support regions for a local approach
traditionally use square windows of a fixed size with the assumption that the disparities of all
pixels in a window are close to the disparity of its center pixel. When this assumption does not
apply, the square window approach performs badly in areas near depth discontinuity bound-
aries [5]. To improve the matching accuracy in depth-discontinuity areas, the shiftable-window
[3], multi-windows [6], and windows with adaptive sizes and shapes [8, 16] have been used as
support regions. The search for the best disparity in the cost volume is a 2D minimization
problem. To be more efficient, the minimization can be performed along individual image
rows using dynamic programming (DP) [2, 15]. However, it is difficult to relate the 1D
optimization of individual image rows to neighbouring rows, causing streaking effects in DP
solutions [2]. An effective way to avoid streaking in a disparity map is to aggregate matching
costs from multiple directions [6]. During the aggregation along a specific path, different
penalty coefficients in the energy function can be assigned to the pixels in the support region
based on their differences in disparity with the current pixel, so that pixels with large
disparity changes can be highlighted [6].
Over the past several years, we have been developing stereovision systems for accurate body
measurements [17, 19], in which a targeted object contains homogeneous texture (skin) and
depth discontinuity regions (edges) and is often imaged under inconsistent lighting conditions.
We adopted multiple techniques in each of the four stereo-matching steps to combat these
problems. In this paper, we focus on reporting a hybrid cost-aggregation approach that
combines adaptive support regions [6] with multi-path aggregation and adaptive penalties [2],
and on experimental results on Middlebury benchmark images and real human body images.

2 Energy function

The matching cost is calculated between pixel p at (p_x, p_y) in a base (left) image and its
potential correspondence pixel q in the match (right) image. The function, q = e_bm(p, d),
represents the epipolar line in the right image for pixel p, where d is called the disparity and bm
indicates the mapping of the base image to the match image. If the paired images are rectified,
we have e_bm(p, d) = [p_x + d, p_y]^T. In the cost initialization, the raw matching cost between p and
q at d can be measured by the color similarity of pixels in their support region. To balance the
performance in matching accuracy and the robustness while dealing with matching ambiguity,
a hybrid approach seems more effective than any single method. The matching cost volume,
C(p, d), can be calculated by using the normalized cross-correlation CNCC(p, d) [16], the census
transform CC(p, d) [7], and the background-suppressed color absolute difference CAD(p, d)
with bilateral filtering [14] as follows:

$$C(p, d) = \rho(C_{NCC}, \lambda_{NCC}) + \rho(C_{AD}, \lambda_{AD}) + \rho(C_{C}, \lambda_{C}) \qquad (1)$$

where ρ(C, λ) is a robust function of variable C defined as:


$$\rho(C_{[\cdot]}, \lambda_{[\cdot]}) = 1 - \exp\left(-\frac{C_{[\cdot]}(p, d)}{\lambda_{[\cdot]}}\right) \qquad (2)$$

The purpose of this function is twofold. Firstly, it maps different cost measures to the range of
[0, 1] so that C(p, d) will not be severely biased by one of the measures. Secondly, it allows a
customizable control on the impact of outliers with parameter λ. C(p, d) is computed at every
pixel and with every possible disparity. However, the pixel-wise calculation of C(p, d) can
easily generate errors because of mismatches, and smoothness constraints are needed to
penalize variations of neighboring disparities in C(p, d). Thus, the energy function, E(D),
can be introduced over the disparity map D as [2]:

$$E(D) = \sum_{p}\Big[ C(p, D(p)) + \sum_{k \in \Omega_p} P_1\, T(|D(p) - D(k)| = 1) + \sum_{k \in \Omega_p} P_2\, T(|D(p) - D(k)| > 1) \Big] \qquad (3)$$

The first term in Eq. (3) is the pixel-wise matching cost at pixel p where the disparity is d =
D(p). The second term is the sum of penalties, P1, for all pixels in p’s neighborhood Ωp, whose
disparities differ from p’s disparity by one step. The third term adds larger penalties, P2, for
pixels in Ωp whose disparity differences from p are larger than 1. The function T(·) takes a
Boolean expression and returns 1 if the expression is true, and 0 otherwise. A smaller penalty for a
small disparity change permits an adaptation of slanted or curved surfaces in a 3D scene, while
a larger penalty for large disparity changes helps to preserve depth discontinuity borders.
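
To make the combination in Eqs. (1) and (2) concrete, below is a minimal C++ sketch of the per-pixel hybrid cost, assuming the three raw measures (NCC, background-suppressed absolute difference, census) have already been computed for a given pixel and disparity. The struct and function names are illustrative and the default λ values are taken from Table 1; this is a sketch, not the paper's actual implementation.

```cpp
#include <cmath>

// Raw matching costs at one (pixel, disparity) pair.
// Field names are illustrative; the paper does not prescribe this layout.
struct RawCosts {
    double ncc;     // normalized cross-correlation cost C_NCC
    double ad;      // background-suppressed color absolute difference C_AD
    double census;  // census (Hamming-distance) cost C_C
};

// Robust mapping of Eq. (2): rho(C, lambda) = 1 - exp(-C / lambda).
// Each measure is squashed into [0, 1) so no single measure dominates.
inline double robustCost(double c, double lambda) {
    return 1.0 - std::exp(-c / lambda);
}

// Hybrid matching cost of Eq. (1): sum of the three robustified measures.
// Default lambdas follow Table 1 (lambda_NCC = 1.0, lambda_AD = 30, lambda_C = 1.0).
inline double hybridCost(const RawCosts& r,
                         double lambdaNcc = 1.0,
                         double lambdaAd  = 30.0,
                         double lambdaC   = 1.0) {
    return robustCost(r.ncc,    lambdaNcc)
         + robustCost(r.ad,     lambdaAd)
         + robustCost(r.census, lambdaC);
}
```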

3 Aggregation with cross-based support region

In Eq. (3), C(p, d) can be updated by aggregating raw matching costs in a local support region
of anchor pixel p. The size and shape of the support region should be adaptive for each pixel to
ensure that all pixels in the region belong to the same 3D surface and at similar depths. To
decide the pixel-wise support regions U(p) for pixel p in the left image and U(q) for pixel q in
the right image, we adopt an approach that is built on upright crosses [21]. As shown in Fig. 1,
a cross at a kernel pixel p is composed of four arms with lengths {h_p^−, h_p^+, v_p^−, v_p^+}. The pixel at
the end of the left arm, pl, is determined by two rules:

1. Dr(pl, p) < τ, where Dr(pl, p) is the radiometric difference between pl and p, and τ is a
preset threshold. The radiometric difference is defined as

$$D_r(p_l, p) = \max_{i \in \{R, B, G\}} |I_i(p_l) - I_i(p)| \qquad (4)$$

2. Ds(pl, p) < L, where Ds(pl, p) is the spatial difference (or distance) between pl and p, and L
is a preset maximum distance in pixels. The spatial distance is defined as Ds(pl, p) = |pl −
p|.

The two rules pose constraints on radiometric similarity and support size with parameters τ and
L. The right, up and bottom arms of p are built in the same way. The support region U(p) is
constructed by merging multiple horizontal segments H(p′) along the vertical segment V(p),
where p′ is a support pixel from V(p) (Fig. 1). Due to the orthogonal construction of a cross, the
complete map of support regions for all pixels in the image can be computed conveniently.

Fig. 1 The adaptive support region U(p) at pixel p is constructed by merging multiple horizontal segments H(p′)
along the vertical segment V(p)
Parameters τ and L control the shape and size of a support region. Textureless regions may
require large τ and L values to include enough color variations. But simply increasing these
parameters for all the pixels would introduce more errors at pixels with depth discontinuity.
We therefore enhance the cross construction with a dual-threshold scheme:

 
1. D_r(p_l, p) < τ_1 and D_r(p_l^+, p_l) < τ_1;
2. D_s(p_l, p) < L_1;
3. D_r(p_l, p) < τ_2, if L_2 < D_s(p_l, p) < L_1.

Rule 1 restricts not only the radiometric difference between p_l and p, but also the radiometric
difference between p_l and its successor p_l^+ on the same arm. Thus, the arm will not run over an
edge in the image. Rules 2 and 3 allow more flexible control of the arm length. A large L_1 is
used to include enough pixels for textureless regions. But when the arm length exceeds a preset
value L_2 (L_2 < L_1), a much stricter threshold τ_2 (τ_2 < τ_1) is introduced for D_r(p_l, p) to make sure
that the arm only extends in a region with a very similar color.
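
As an illustration of the dual-threshold rules, the following C++ sketch computes the left-arm length for one pixel; the other three arms follow symmetrically. The `Image` struct, its RGB layout, and the helper `maxChannelDiff` (implementing Eq. (4)) are assumptions made for this sketch, not code taken from the paper.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Minimal RGB image; the layout and helper are illustrative, not from the paper.
struct Image {
    int width = 0, height = 0;
    std::vector<unsigned char> rgb;        // interleaved R,G,B, row-major

    // Eq. (4): max per-channel absolute difference between pixels
    // (x1, y) and (x2, y) on the same row.
    int maxChannelDiff(int x1, int x2, int y) const {
        const unsigned char* a = &rgb[3 * (y * width + x1)];
        const unsigned char* b = &rgb[3 * (y * width + x2)];
        int d = 0;
        for (int c = 0; c < 3; ++c)
            d = std::max(d, std::abs(int(a[c]) - int(b[c])));
        return d;
    }
};

// Length of the left arm at pixel (x, y) under the dual-threshold rules.
// tau1/tau2 and L1/L2 correspond to the thresholds described in Section 3.
int leftArmLength(const Image& img, int x, int y,
                  int tau1, int tau2, int L1, int L2) {
    int len = 0;
    for (int step = 1; step <= L1 && x - step >= 0; ++step) {   // Rule 2 via bound
        int xl = x - step;                 // candidate arm-end pixel p_l
        int xlSucc = xl - 1;               // its successor p_l^+ on the arm
        // Rule 1: color similarity to the kernel pixel p ...
        if (img.maxChannelDiff(xl, x, y) >= tau1) break;
        // ... and to the next pixel along the arm (stops at image edges).
        if (xlSucc >= 0 && img.maxChannelDiff(xlSucc, xl, y) >= tau1) break;
        // Rule 3: beyond L2, require the stricter threshold tau2.
        if (step > L2 && img.maxChannelDiff(xl, x, y) >= tau2) break;
        len = step;
    }
    return len;
}
```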
Figure 2 shows two examples of the adaptive support regions using upright crosses.
Parameters used to compute cross arms were L1 = 30, L2 = 20, τ1 = 15, and τ2 = 8. The local
support regions approximated local texture structures with great consistency (red lines).

4 Multipath aggregation

In the cost volume, C(p, d), the matching cost at pixel p and disparity d can be aggregated by
summing the costs on all 1D minimum cost paths passing through (p, d). At (p, d), the cost
along a path in direction g, L_g(p, d), is defined recursively as [6]:

Fig. 2 Construction of cross-based local support regions on the Aloe and Cones images. Left column: pixelwise
adaptive crosses are constructed from local support skeletons for each kernel pixel. Right column: the shape-
adaptive local support regions, which approximate local texture structures, are dynamically generated by
integrating multiple horizontal arms of neighboring crosses

 
$$L_g(p, d) = C(p, d) + \min\Big( L_g(p-g, d),\; L_g(p-g, d-1) + P_1,\; L_g(p-g, d+1) + P_1,\; \min_i L_g(p-g, i) + P_2 \Big) \qquad (5)$$

where the pixel at p − g is the predecessor of p along the aggregation path g.

In addition to the cost at (p, d), the lowest cost of the previous pixel p − g on the path is
added, with penalties P_1 and P_2 applied if depth discontinuity occurs. The values of L_g increase
constantly along the path, which may lead to very large values. Equation (5) can be modified
by subtracting the minimum path cost of the previous pixel from the whole term [6], that is,

$$L_g(p, d) = C(p, d) + \min\Big( L_g(p-g, d),\; L_g(p-g, d-1) + P_1,\; L_g(p-g, d+1) + P_1,\; \min_i L_g(p-g, i) + P_2 \Big) - \min_j L_g(p-g, j) \qquad (6)$$

The cost, L_g, is summed over paths in all directions g as:

$$S(p, d) = \sum_{g} L_g(p, d) \qquad (7)$$

The cost volume, C(p, d), can be drawn in a 3D space whose x- and y-axes are the coordinates
of the image, whose z-axis is the disparity, and in which the value at voxel (x, y, z) is the matching cost C. Let
W and H be the width and height of the image, and D be the range of disparities. When cost
aggregation is performed on this cost volume L(W × H × D), multiple path directions can be
defined on each slice of the W-H plane at every disparity step. The parallelized cost aggrega-
tion is illustrated in Fig. 3, in which three path directions, horizontal (east), vertical (south), and
diagonal (southeast), are shown. A total of 16 path directions were implemented in our hybrid
cost aggregation scheme; the remaining paths are handled in the same way as these three.

Fig. 3 Parallelized cost aggregation. Edges highlighted in red on the cross-sectional slice indicate the header
pixels for all paths. Three path directions, horizontal, vertical and diagonal, are shown. The shaded surfaces on
the cost volume represent the voxels that serve as path headers for each aggregation direction at each disparity step
According to Eq. (5), the aggregated value at a voxel in the cost volume depends on its
immediate neighbors preceding it along the path at the same or a similar disparity, i.e., the disparity that
is one step smaller or greater, namely L_g(p − g, d), L_g(p − g, d − 1), and L_g(p − g, d + 1) in Eq.
(5). However, in a parallel computing framework, the execution order of the same operation on
multiple data cannot be predetermined during the programming stage because it is handled by
the operating system's task scheduling mechanism at run time. This data dependency calls
for a careful design of the parallel algorithm to ensure that all the required data have been
updated when following the path to compute the aggregated costs. When multi-thread
computing is executed for multi-path cost aggregation, all the threads must be synchronized
once they finish updating one pixel along the path. Before cost aggregation at a new direction
is started, all the header pixels that define the beginnings of each path along the current
direction are collected (edges highlighted in red in Fig. 3). Each thread then takes one row of
voxels of the cost volume at a specified disparity step and updates the aggregated cost at the
new location. Once updates are done, the threads are synchronized and are ready to move to
the next voxel along the path. This procedure can be visualized as the shaded planes indicated
in Fig. 3, shifting along the path direction one step at a time when updating the cost volume. The
details of the parallelized cost aggregation algorithm are illustrated in Fig. 4.
A total of 16 paths were selected to cover 360° of each pixel for a sufficient coverage of the 2D
image. Paths that are not horizontal, vertical, or diagonal are implemented by going one step beyond
the horizontal, vertical or diagonal line to avoid the interpolation of costs between adjacent pixels.
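
To illustrate Eqs. (5)–(7), the sketch below aggregates costs along the horizontal (west-to-east) direction using Eq. (6) with constant penalties; the other 15 directions differ only in traversal order, and the adaptive penalties of Section 5 would replace the constant P1 and P2. The row-major volume layout [(y·W + x)·D + d] and the function name are assumptions for this sketch, not the paper's implementation.

```cpp
#include <algorithm>
#include <vector>

// Aggregate matching costs along the horizontal (west-to-east) path,
// following Eq. (6). `cost` holds the raw cost volume C(p, d) and `aggr`
// receives L_g(p, d); both are row-major with layout [(y*W + x)*D + d].
void aggregateHorizontal(const std::vector<float>& cost,
                         std::vector<float>& aggr,
                         int W, int H, int D, float P1, float P2) {
    aggr = cost;                                  // path headers (x = 0): L_g = C
    for (int y = 0; y < H; ++y) {
        for (int x = 1; x < W; ++x) {
            const float* prev = &aggr[(y * W + x - 1) * D];  // L_g(p - g, .)
            const float* raw  = &cost[(y * W + x) * D];      // C(p, .)
            float* cur        = &aggr[(y * W + x) * D];      // L_g(p, .)
            // Minimum path cost of the previous pixel (the subtracted term).
            float prevMin = *std::min_element(prev, prev + D);
            for (int d = 0; d < D; ++d) {
                float best = prev[d];                         // same disparity
                if (d > 0)     best = std::min(best, prev[d - 1] + P1);
                if (d + 1 < D) best = std::min(best, prev[d + 1] + P1);
                best = std::min(best, prevMin + P2);          // large jump
                cur[d] = raw[d] + best - prevMin;             // Eq. (6)
            }
        }
    }
}
```

Summing the outputs of such per-direction passes over all 16 directions gives S(p, d) in Eq. (7).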

5 Aggregation with adaptive penalties

As defined in the energy function in Eq. (3), two parameters, P1 and P2, are used to penalize
the disparity changes between neighboring pixels. During the aggregation along a specific
path, P1 penalizes small disparity changes (|Δd| = 1), and P2 penalizes large disparity changes
(|Δd| > 1). As suggested in [6], P2 can be made adaptive to the intensity gradient rather than a
constant value, that is,

Fig. 4 Parallel implementation of matching cost aggregation

$$P_2 = \frac{P_2^*}{|I(p) - I(k)|} \qquad (8)$$

where p and k are neighboring pixels in the base (left) image, and P*2 is a chosen constant.
Equation (8) is a reciprocal function of the absolute radiometric difference between two
neighbouring pixels. It generates a large penalty value when the absolute difference is small
due to its non-linearity, and it is a continuous function with respect to the absolute difference.
However, this function only depends on the radiometric differences in the base image, ignoring
the differences in the match image. It may reject disparity changes from an incorrect value at
the previous pixel to the correct one at the current pixel, when the pixel color barely changes in
the base image. This behavior can be corrected by checking the radiometric differences in both
the base and the match images. Instead of taking the inverse of the difference, we can apply a step
function based on the radiometric differences D_1 = D_r(p, p − g) in the base image and D_2 =
D_r(p + d, p − g + d) in the match image. D_r(·, ·) is the same function as defined in Eq. (4). Then, P_1 and P_2 can be
adaptively adjusted:
$$
\begin{aligned}
& P_1 = P_1^*, && P_2 = P_2^*, && \text{if } D_1 < \tau_{Agg} \text{ and } D_2 < \tau_{Agg}; \\
& P_1 = P_1^*/4, && P_2 = P_2^*/4, && \text{if } D_1 \ge \tau_{Agg} \text{ and } D_2 < \tau_{Agg}; \\
& P_1 = P_1^*/4, && P_2 = P_2^*/4, && \text{if } D_1 < \tau_{Agg} \text{ and } D_2 \ge \tau_{Agg}; \\
& P_1 = P_1^*/10, && P_2 = P_2^*/10, && \text{if } D_1 \ge \tau_{Agg} \text{ and } D_2 \ge \tau_{Agg}.
\end{aligned}
\qquad (9)
$$

In the above rules, P_1^* and P_2^* are constants, and τ_Agg is a threshold on the radiometric difference.
This ensures that a fairly large penalty is applied to disparity changes when the radiometric
differences between two neighboring pixels in both the base and match images are small. In
contrast, a relatively small penalty is applied when the radiometric differences between
neighbouring pixels are large. For any case between these two conditions, an intermediate penalty
is applied, while the constraint P_2^* ≥ P_1^* is maintained.
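
In code, the penalty selection of Eq. (9) reduces to a small branch evaluated at each path step, as in the following C++ sketch. The struct and function names are illustrative, and the defaults follow Table 1 (P1* = 1.0, P2* = 3.0, τAgg = 15); this is not the paper's exact implementation.

```cpp
// Penalty pair used at one step of a path, chosen per Eq. (9).
struct Penalties {
    float p1;
    float p2;
};

// d1: radiometric difference D_r(p, p-g) in the base image.
// d2: radiometric difference D_r(p+d, p-g+d) in the match image.
// Defaults follow Table 1: P1* = 1.0, P2* = 3.0, tauAgg = 15.
Penalties adaptivePenalties(int d1, int d2,
                            float p1Star = 1.0f, float p2Star = 3.0f,
                            int tauAgg = 15) {
    if (d1 < tauAgg && d2 < tauAgg)    // both differences small: full penalty
        return {p1Star, p2Star};
    if (d1 >= tauAgg && d2 >= tauAgg)  // both differences large: small penalty
        return {p1Star / 10.0f, p2Star / 10.0f};
    // one small, one large: intermediate penalty
    return {p1Star / 4.0f, p2Star / 4.0f};
}
```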

6 Experiments

For system evaluation, we implemented the presented algorithm in C++, and collected all
benchmark data on a desktop computer with a 2.8 GHz quad-core processor and 8 GB of
system memory. To further study the parallel performance and speedup with respect to the
number of available processor cores, we also enlisted a hex-core computer for comparison.
The parameters used in the different steps of our hybrid algorithm are given in Table 1. These
parameters were kept the same for all test image pairs.
The results of applying the hybrid algorithm at depth discontinuities for cost aggregation
are shown in Fig. 5, in which the top row shows three original images of Cones, Teddy, and a
human body, and the second, third and fourth rows show their disparity maps generated
respectively without cost aggregation, with cost aggregation using constant penalties, and with
cost aggregation using adaptive penalties. In the fourth row, the disparity maps contain fewer
mismatched pixels (fewer and smaller holes), and the edges of foreground objects are more
accurate than the maps in the second and third rows. At an edge pixel of a foreground object,
our hybrid aggregation tags 16 path directions evenly distributed around a pixel, about half of
which are from the background to the foreground, and applies a large penalty to the disparity
change at the edge pixel. Quantitatively, the proposed algorithm yielded the minimum error
ratios on non-occlusion (nonocc), depth-discontinuity (disc), and all pixels (all) in the disparity
maps of Cones and Teddy as shown in Table 2. Here, the error ratio was defined as the
percentage of mismatched pixels with respect to the ground truth disparity map. The perfor-
mance of the proposed hybrid algorithm surpasses the other two methods in all three pixel
categories, which demonstrates the effectiveness of the adaptive cost aggregation. As a
result, the disparity map computed from cost aggregation with adaptive penalties is more
complete and accurate than a disparity map computed from the other two ways.
Table 3 lists a quantitative comparison between the proposed algorithm and two other
stereo matching algorithms, IGSM [20] and JSOSP+GCP [9], as applied to the Ver. 2
Middlebury benchmark [13] with an error threshold of 1. On the Middlebury benchmark
among over 150 algorithms, the performances of IGSM and JSOSP+GCP were ranked the first
and second at the time when we started this study. Compared with the two top-ranked stereo
matching algorithms, the proposed algorithm exhibits the lowest error ratio at depth-
discontinuity pixels (disc) across the four benchmark images although it has larger or

Table 1 Parameter settings for our cost computation and aggregation methods

Parameters Values Descriptions

λNCC 1.0 The control parameters for robust cost function ρ(C[·], λ[·])
λAD 30
λCensus 1.0
L1 22 Arm lengths for calculating adaptive support regions
L2 10
τ1 20 Thresholds of color difference for adaptive support
τ2 6
σr 20 Variance for radiometric difference in bilateral filtering
σs 3 Variance for spatial distance in bilateral filtering
P*1 1.0 Penalties to the costs at disparity discontinuity in cost aggregation
P*2 3.0
τAgg 15 Threshold of radiometric difference to determine disparity discontinuity

Fig. 5 Cost aggregation with adaptive penalties at depth discontinuity. Top row: depth maps computed without
cost aggregation; middle row: depth maps computed with static penalties in cost aggregation; bottom row: depth
maps computed with adaptive penalties

comparable error ratios in the other two categories. This is particularly important to our body
scanning project. For our body imaging, we designed the stereo system to cover only the front
and back surfaces of an imaged person in order to make the system compact and easy to set up,
thus portable. There was no coverage on the sides. A dedicated 3D body modeling technique
based on surface smoothness and continuity was used in our stereo system to compensate for
the uncovered side surfaces, resulting in high matching accuracy at body boundaries. This
greatly improved the system performance because errors on the boundaries directly influence
the accuracy of body measurements.
The cost computation and aggregation are both computation-intensive tasks, because both
are implemented to work on a three-dimensional cost volume. To improve the algorithm

Table 2 Comparison among the disparity maps generated by the proposed method with adaptive penalties, the
method with static penalties, and the matching algorithm without cost aggregation on Cones and Teddy

Algorithm (Error ratio %) Cones Teddy

nonocc all disc nonocc all disc

No cost aggregation 13.01 23.02 7.4 23.33 32.54 10.30


Aggregation with static penalties 8.89 18.81 5.9 13.97 23.09 7.40
Aggregation with adaptive penalties 5.64 11.59 4.30 4.23 9.45 3.67

performance, we took advantage of the parallel computing power from computers with multi-
core processors and implemented the matching cost computation and cost aggregation in a
multi-thread paradigm to accelerate the computation. The parallelism was implemented with
OpenMP [4]. OpenMP parallelizes a computation task by forking the master thread into a
number of slave threads that run concurrently to distribute the task, with the runtime
environment allocating the threads to different cores or processors. The
number of concurrent threads for a parallel code region can be controlled by OpenMP’s API
call omp_set_num_threads(). Thread numbers from 1 to 8 were tested on our quad-core
desktop computer. The processor on our computer supported Hyper Threading, and thus 8
logical processors were available to our parallel program.
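
The sketch below shows one way such a thread sweep could be scripted: it varies the OpenMP thread count from 1 to 8 and times a workload, reporting the speedup relative to the single-thread run. The `workload` function is only a stand-in for the actual sub-algorithms (bilateral filtering, NCC, cost aggregation), and the sketch uses omp_get_wtime() for timing rather than the time() call mentioned in the next paragraph; it is an illustration, not the paper's benchmarking code.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for one of the timed sub-algorithms; a real
// measurement would call the actual cost computation or aggregation code.
void workload(std::vector<double>& buf) {
    #pragma omp parallel for
    for (long i = 0; i < (long)buf.size(); ++i)
        buf[i] = buf[i] * 0.5 + 1.0;
}

int main() {
    std::vector<double> buf(1 << 22, 1.0);  // dummy data to process
    double t1 = 0.0;                        // single-thread baseline time
    for (int threads = 1; threads <= 8; ++threads) {
        omp_set_num_threads(threads);       // OpenMP API call noted in the text
        double start = omp_get_wtime();
        workload(buf);
        double elapsed = omp_get_wtime() - start;
        if (threads == 1) t1 = elapsed;
        std::printf("threads=%d time=%.3fs speedup=%.2fx\n",
                    threads, elapsed, t1 / elapsed);
    }
    return 0;
}
```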
According to Amdahl's law [1], the theoretical speedup of multi-threading can be
calculated by looking at the serial portion of the program. However, in reality, the practically
measured speedup is always lower than the theoretical maximum because of limited compu-
tational resources and the overhead in launching and scheduling compute threads. In our study,
the speedup was calculated as the ratio of the processing time in a serial mode to the time in a
parallel mode and it represents the practical measurement which is lower than the theoretical
maximum. In our experiments, the processing time was recorded by capturing the timestamps
through the time() call immediately before and after the code blocks of interest. We executed
the serial version and the parallel version of the same algorithms eight times on the Cones
image. The average of speedups at each thread number setting was calculated. To establish the
baseline for the speedup measurement, we took the average of the 8 execution times on 1 core
as T1 and the average execution time with n cores as Tn, and then the speedup can be measured
as T1 / Tn. Figure 6 shows the graphs of the speedups measured on three sub-algorithms: the
bilateral filtering of the input images, the accelerated computation of NCC on adaptive
supports, and the cost aggregation. Performance gains were achieved for all three sub-algorithms
by applying multiple threads in the computation. Approximately, the quad-core and the
hex-core desktop computers achieved overall speedups of 1.8x and 2.4x, respectively.
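
For reference, a standard statement of Amdahl's law (not a formula reproduced from the paper) bounds the achievable speedup: with f the parallelizable fraction of the run time and n the number of cores,

$$S(n) = \frac{1}{(1 - f) + f/n}$$

As an illustrative figure only, if roughly 60% of the run time were parallelizable (f = 0.6), four cores would give at most S(4) = 1/(0.4 + 0.15) ≈ 1.8, which is of the same magnitude as the measured quad-core speedup.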

Table 3 Comparison among the disparity maps generated by the proposed algorithm and two top-ranked stereo
matching algorithms using the Middlebury benchmark version 2 [13] with error threshold of 1

Algorithm (Error ratio %)   Tsukuba              Venus                Teddy                Cones
                            nonocc  all   disc   nonocc  all   disc   nonocc  all   disc   nonocc  all   disc

Proposed algorithm 1.31 2.47 0.94 0.11 0.23 0.24 4.23 9.45 3.67 5.64 11.59 4.30
IGSM [20] 0.93 1.37 5.05 0.07 0.17 1.04 4.08 5.98 11.4 2.14 6.97 7.91
JSOSP+GCP [9] 0.74 1.34 3.98 0.08 0.16 1.15 3.96 10.1 11.8 2.28 7.91 6.74

Fig. 6 Performance analysis of parallel computation of three sub-algorithms, bilateral filtering (BiFil), NCC, and
cost aggregation, on multi-core desktop computers (vertical axis: speedup; horizontal axis: number of cores)

7 Conclusion

In this paper, a hybrid cost aggregation framework that combines adaptive support regions,
multi-path aggregation, and adaptive penalties was introduced. This aggregation approach was
executed based on the concept of global energy minimization to find a disparity map that yields the
lowest total cost. It calculated cross-based support regions of adaptive size and shape for each
pixel so that the pixels in a region have similar disparities, summed the costs along multiple
paths, and applied a high penalty to pixels where large disparity changes occurred to combat
mismatches, particularly in depth-discontinuity regions. Compared with the two top-ranked
stereo matching algorithms, the proposed algorithm yielded the minimum error ratios (with
threshold 1) in depth-discontinuity regions of the samples in Middlebury Benchmark V2 while
maintaining comparable or slightly higher error ratios in other regions, which largely satisfies the
needs of our body imaging project in which the accuracy at edges is critical to circumferential
measurements. The parallel computing scheme presented in the paper enabled the algorithm to
be executed with overall speedups of 1.8x and 2.4x on a quad-core and a hex-core desktop
computer, respectively.

Compliance with ethical standards

Conflict of interest The authors declared no potential conflicts of interest with respect to the research,
authorship and/or publication of this article.

References

1. Amdahl GM (1967) Validity of the single processor approach to achieving large scale computing capabil-
ities. In Proceedings of the April 18–20, 1967, spring joint computer conference
2. Birchfield S, Tomasi C (1999) Depth discontinuities by pixel-to-pixel stereo. Int J Comput Vis 35(3):269–
293
3. Bobick AF, Intille SS (1999) Large occlusion stereo. Int J Comput Vis 33(3):181–200
4. Chapman B, Jost G, Pas RVD (2008) Using OpenMP: portable shared memory parallel programming. MIT
press vol 10
5. Gong M, Yang R, Wang L, Gong M (2007) A performance study on different cost aggregation approaches
used in real-time stereo matching. Int J Comput Vis 75(2):289–296

6. Hirschmuller H (2008) Stereo processing by semiglobal matching and mutual information. IEEE Trans
Pattern Anal Mach Intell 30(2):328–341
7. Hirschmuller H, Scharstein D (2007) Evaluation of cost functions for stereo matching. In Computer Vision
and Pattern Recognition, 2007. CVPR'07. IEEE Conference on pp 1–8
8. Kanade T, Okutomi M (1994) A stereo matching algorithm with an adaptive window: theory and
experiment. IEEE Trans Pattern Anal Mach Intell 16(9):902–932
9. Liu J, Li C, Mei F, Wang Z (2015) 3D entity-based stereo matching with ground control points and joint second-
order smoothness prior. The Visual Computer 31(9):1253–1269
10. Mei X, Sun X, Dong W, Wang H, Zhang X (2013) Segment-tree based cost aggregation for stereo
matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 313–320
11. Mei X, Sun X, Zhou M, Jiao S, Wang H, Zhang X (2011) On building an accurate stereo matching system
on graphics hardware. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International
Conference on, pp 467–474
12. Rhemann C, Hosni A, Bleyer M, Rother C, Gelautz M (2011) Fast cost-volume filtering for visual
correspondence and beyond. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE
Conference on, pp 3017–3024
13. Scharstein D, Szeliski R. Middlebury stereo vision page. [Online]. Available: http://vision.middlebury.edu/stereo/.
[Accessed 21 March 2018]
14. Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Sixth International
Conference on Computer Vision, pp 839–846
15. Van Meerbergen G, Vergauwen M, Pollefeys M, Gool LV (2002) A hierarchical symmetric stereo
algorithm using dynamic programming. Int J Comput Vis 47(1–3):275–285
16. Veksler O (2003) Fast variable window for stereo correspondence using integral images. In Computer
Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol 1 pp
I-I
17. Yao M, Xu B (2019) A dense stereovision system for 3D body imaging. IEEE Access 7(1):170907–170918
18. Yoon K-J, Kweon IS (2006) Adaptive support-weight approach for correspondence search. IEEE Trans
Pattern Anal Mach Intell 28(4):650–656
19. Yu W, Xu B (2010) A portable stereo vision system for whole body surface imaging. Image Vis Comput
28(4):605–613
20. Zhan Y, Gu Y, Huang K, Zhang C, Hu K (2016) Accurate image-guided stereo matching with efficient
matching cost and disparity refinement. IEEE Transactions on Circuits and Systems for Video Technology
26(9):1632–1645
21. Zhang K, Lu J, Lafruit G (2009) Cross-based local stereo matching using orthogonal integral images. IEEE
transactions on circuits and systems for video technology 19(7):1073–1079

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.

Ming Yao received his BS in 2003 and MS in 2006 both in Communication Engineering from Donghua
University, Shanghai, China, and his Ph.D. in 2015 in Biomedical Engineering from the University of Texas at
Austin, Texas, USA. His research interests include image processing, computer vision and computer graphics.

Wenbin Ouyang received his B.S. and M.S. degrees in Electrical Engineering from Donghua University,
Shanghai, China, and the Ph.D. degree in Computer Science and Engineering in 2018 from the University of North
Texas, USA. His research areas include image processing, deep learning and 3D technology.

Bugao Xu received his Ph.D. degree in 1992 from the University of Maryland at College Park and joined the
faculty of the University of Texas at Austin in 1993. Since 2016, he has been a professor and the chair of the
Department of Merchandising and Digital Retailing, and a professor in the Department of Computer Science and
Engineering, University of North Texas. His research interests include high-speed imaging systems, image and
video processing, and 3D imaging and modeling.
