You are on page 1of 1

NCC-Net: Normalized Cross Correlation Based Deep Matcher with Robustness to Illumination Variations

∗ ∗
Arulkumar Subramaniam , Prashanth Balasubramanian , Anurag Mittal
Indian Institute of Technology Madras

Objectives Siamese (Siam-NCC-Net) architecture Results


60
Illumination experiment - II

FC (2400, 500)
13 11
Problem definition : Patch Matching aims to find correspondences of localized, textured regions which
64 30
26
13 5 Table: FPR95 scores of the proposed models and the baselines. These being False Positive Rates (FPR),
(Natural illumination variation dataset)

26
60

30

13
NCC(5, 5) C(96, 1, 1) C(96, 3, 1)

13

11
lower their values, better is their performance. Testbed: UBC Patches dataset. Color coding : best,

5
C(32, 5, 1) C(96, 5, 1)

64
are usually centered at distinctive keypoints. The matching has to be robust across 64x64 + 60x60
M(2, 2)
30x30 + 26x26
M(2, 2)
13x13
+
RELU 13x13
+
RELU 13x13
+
RELU 11x11
M(2, 2)
2400

many geometric and photometric changes.


x1 RELU x32 x32 RELU x96 x96 x2400 x96 x96 same second best.
Contributions : Shared weigths
Train dataset Liberty Notredame Yosemite
mean
1 Exploration of two Deep Convolutional Neural Network (DCNN) architectures 64 60
26 13 11 Test dataset Notredame Yosemite Liberty Yosemite Liberty Notredame

FC (2400, 500)
30 13 5
different
along with the combination of two robust matching layers to overcome

26
Siam-NCC-Net(ours) 1.25 2.03 3.87 1.86 5.16 1.8 2.66

60

11
C(32, 5, 1) CIN(5)

13
C(96, 5, 1) C(96, 1, 1) C(96, 3, 1)

13
30

5
64
+ M(2, 2) + M(2, 2) + + + M(2, 2) Softmax
illumination variation challenges: 64x64 RELU 60x60 30x30
RELU
26x26 13x13 RELU 13x13 RELU 13x13 RELU 11x11 2400
layer
Siam-w/oMP2-NCC-Net(ours) 1.14 2.30 4.02 2.34 4.71 1.81 2.72
x1 x32 x32 x96 x96 x4800 x96 x96
CS-NCC-Net(ours) 1.24 3.09 5.99 4.22 6.54 2.06 3.86
• Siamese and Center-surround architectures
Figure: Siam-NCC-Net architecture. Here, CS-w/oMP2-NCC-Net(ours) 1.17 2.19 4.28 2.30 4.81 1.7 2.74
• Normalized correlation and Cross-input neighborhood matching layers
C(N,m,s) = Convolutional layer of N filters with filter size m × m and a stride of s pixels, L2-Net (Tian et al (CVPR-2017)))[3] 0.56 2.07 1.71 1.76 3.87 1.09 1.84
2 Empirical evaluation of the proposed architectures for resilience with respect to DeepCD (Yang et. al,(CVPR-2017)) 2.59 7.03 5.85 6.69 7.82 2.95 5.49
M(n,s) = max-pooling layer with receptive field of n × n and a stride of s pixels,
illumination variations on two experimental setups: 2ch-CS stream GLoss (Kumar et al (CVPR-2016)) 0.77 3.09 3.69 2.67 4.91 1.14 2.71
NCC(n, m) = NCC layer matching an n × n support region centered at any pixel with its m × m search space,
• manual variation of the pixel intensities of the patches in the UBC Patches 2ch-CS stream (Zagoruyko et al.(CVPR-2015)) 1.9 4.75 4.55 4.1 7.2 2.11 4.10 Figure: Typical natural illumination variations noticed in the Webcam dataset (category: Mexico). Notice the
CIN(m) = CIN layer with a search space of size m × m, Siamese GLoss (Kumar et al (CVPR-2016)) 1.84 6.61 6.39 5.57 8.43 2.83 5.28
dataset extreme change in illuminations as some portions are under-saturated while others are over-saturated. Figure
FC(n, m) = Fully-connected layer with n inputs and m outputs TFeat (Balntas et. al (BMVC-2016)) 3.12 7.82 7.22 7.08 9.79 3.85 6.48
• natural illumination changes from the real-world, using a new dataset of best viewed in color on a display device.
PNNet (Balntas et. al) 3.71 8.99 8.13 7.1 9.65 4.23 6.97
patches collected from publicly available Webcam dataset
Siam CS-stream (Zagoruyko et al.(CVPR-2015)) 3.05 9.02 6.45 10.45 11.51 5.29 7.63
3 Ablation study to evaluate individual matching layers, illumination based Central-Surround (CS-NCC-Net) architecture MatchNet (Serra et. al (ICCV-2015)) 4.75 13.58 8.84 11.00 13.02 7.7 9.81
augmentation while training of the models CENTRAL STREAM
Siam (Zagoruyko et al.(CVPR-2015)) 4.33 14.89 8.77 13.23 13.48 5.75 10.07
32
VGG-Convex (Simonyan et. al (PAMI-2014)) 7.52 11.63 12.88 10.54 14.82 7.11 10.75
8 8
Illumination experiment-II results

FC (1536, 256)
16 4
16 8

16
32

16
NCC(5, 5) C(96, 1, 1) C(96, 3, 1)

4
8
C(32, 5, 1)

8
C(96, 5, 1)
M(2, 2) M(2, 2) + + + M(2, 2)
+ 32x32 + 8x8 8x8 8x8 1536
16x16 16x16 8x8 RELU RELU RELU
RELU x32 x32 RELU x96 x96 x2400 x96 x96 Table: Results on the newly collected Webcam dataset. FPR95 scores of the proposed models and the
Some prior approaches baselines. These being False Positive Rates (FPR), lower their values, better are their performances.

1
Shared weigths

2x
Illumination experiment - I (Manual illumination variation) Datasets trained from: Liberty, Notredame, Yosemite. Scores obtained from 262, 152 test patch-pairs of this

x3
32

32
16 8 8

FC (1536, 256)
16 8 4
Zagoruyko et al.(CVPR-2015) : Deepcompare : proposed and evaluated several architectures such as newly collected Webcam dataset.

16
32
64 C(32, 5, 1) C(96, 5, 1) CIN(5) C(96, 1, 1) C(96, 3, 1)

16

4
8

8
+ M(2, 2) + M(2, 2) + + + M(2, 2)
Siamese (Siam, 2-Stream), Central-surround (CS), 2-Channel etc.,) for RELU 32x32 16x16
x32
RELU
16x16
x96
8x8 RELU 8x8 RELU 8x8 RELU 8x8 1536 same

64
x32 x96 x4800 x96 x96
Trained dataset Liberty Notredame Yosemite

x1
the task of Patch matching

2
x3
Siam-NCC-Net(ours) 9.67 16.68 11.67

32
Balntas et. al (BMVC-2016) : TFeat, PN-Net : proposed a triplet-based loss function and aimed to 64 SURROUND STREAM
32
increase the speed of descriptor computation and decrease its memory 32
Siam-w/oMP2-NCC-Net(ours) 9.45 12.56 10.56

FC (1536, 256)
16 8 8
x3 16 8 4
2x

32
64
1

16
16
NCC(5, 5) C(96, 1, 1) C(96, 3, 1)
footprint

4
8
C(32, 5, 1)

8
C(96, 5, 1)
+ 32x32
M(2, 2)
16x16 + 16x16
M(2, 2)
8x8
+
RELU 8x8
+
RELU 8x8
+
RELU 8x8
M(2, 2)
1536
different
CS-NCC-Net(ours) 11.34 23.04 15.40
Kumar et al (CVPR-2016) : GLoss : global loss that aims to decrease the intra-class variances and RELU x32 x32 RELU x96 x96 x2400 x96 x96
Softmax
layer CS-w/oMP2-NCC-Net(ours) 11.35 19.30 18.28
increase the inter-class distance

R
ES
Shared weigths

32
L2 Net 9.67 9.21 16.11

IZ
x3
Tian et al (CVPR-2017) : L2-Net : learn descriptors in Euclidean space with additional supervi-

E
2x
32
16 8 8

FC (1536, 256)
16 8 4
sion on the intermediate feature maps 2ch-CS-stream GLoss 12.31 20.01 14.76

32

16
C(32, 5, 1) C(96, 5, 1) CIN(5) C(96, 1, 1) C(96, 3, 1)

16

4
8

8
+ M(2, 2) + M(2, 2) + + + M(2, 2)
16x16 16x16 8x8 8x8 8x8 8x8 1536
RELU 32x32
x32 x32
RELU
x96 x96
RELU
x4800
RELU
x96
RELU
x96 2ch-CS stream 12.31 17.84 19.5
U(8) U(6) U(3) U(0) O(3) O(6) O(8) Siam 31.45 28.21 35.21
Robust matching layers Figure: CS-NCC-Net architecture with NCC and CIN matching layers Figure: Each column shows the varying degree of intensity change introduced to a corresponding pair from Siam-CS stream 27.08 32.17 34.65
UBC Patches dataset. Here the illumination change is induced by the models,
Normalized Cross Correlation(NCC) layer: NCC is a statistical measure of the tendency of two (N −i)∗I(x,y) (N −i)∗I(x,y) iE
U (i) : Ii(x, y) = N , O(i) : Ii (x, y) = N + N . Here, U = under-saturation region, O =
signals to vary linearly with each other. It is given by over-saturation region
N
Illustration of Normalized correlation matching layer
1 X(Xi − µX )(Yi − µY )
N CC(X, Y ) = (1)
N − 1 i=1 σX σY Feature map P Feature map Q Ablation study-I Ablation study-II
where N = number of samples, (µX , µY ) = mean and (σX , σY ) = unbiased standard deviations of signals X P(X,Y)

and Y . NCC is well-known to alleviate amplification variation while matching signals. A differentiable layer Illumination experiment - I results Table: FPR95 scores when using the individual Table: FPR95 scores of DeepCompare[5] networks
P(X,Y) matching layers in Siam-w/oMP2-NCC-Net on retrained on illumination specific augmentation for
based on NCC[1] within the CNN framework is proposed to match the extracted CNN features from the given NCC of
train :liberty −> test :yosemite UBC patches dataset[4]. . Here, L : Liberty, N : UBC Patches dataset[4]. Here, L : Liberty, N :
patches. P(X,Y)
100
P(X, Y) Notredame, Y : Yosemite Notredame, Y : Yosemite
Cross-input Neighborhood(CIN) layer: Though the patches are expected to have sufficient tex- Q(X,Y)
SIFT (Lowe et al)
tures, there may arise some pathological cases which lack them. In such cases, NCC is unreliable as 90 2ch−CS stream GLoss (Kumar et al)
Train Liberty Notredame Yosemeite Train L N Y
σX or σY vanishes . As a remedy, it is fused with the Cross-input Neighborhood(CIN)(Ahmed et. al, 2ch−CS stream (Zagoruyko et al) Test N Y L Y L N Test N Y L Y L N
Siam (Zagoruyko et al) NCC 1.23 1.78 4.17 2.24 4.98 2.00 2ch-CS stream+ 1.82 3.73 2.85 2.56 5.99 1.34
CVPR 2015 [2]) layer. CIN builds “difference maps” between every pixel of a feature map 80
Siam−CS stream (Zagoruyko et al) CIN 1.87 5.40 5.27 4.55 7.34 3.11 Siam CS-stream+ 3.56 9.29 6.46 9.56 11.53 5.47
and a neighborhood window (5 × 5) of its corresponding feature map to help the network learn L2−Net (Tian et al)
Both 1.14 2.30 4.02 2.34 4.71 1.81 Siam+ 6.59 11.92 7.98 12.07 13.43 8.36
70 Siamese NCC−Net (ours)
discriminative features from absolute pixel differences.

FPR at TPR−95%
25

Siamese −1Maxpool NCC−Net (ours)


CS NCC−Net (ours)
60 CS −1Maxpool NCC−Net (ours)

Training particulars References


)
,Y

50
(X
C

[1] A. Subramaniam, M. Chatterjee, and A. Mittal, “Deep neural networks with inexact matching for person re-identification,” in
C

• Training with UBC patches dataset


N

40 Advances in Neural Information Processing Systems 29 (D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett,
(Three partitions of the dataset: Liberty, Notredame, Yosemite - 500K pairs of same/different patches) eds.), pp. 2667–2675, Curran Associates, Inc., 2016.
• Given a pair of patches, each of the proposed model outputs the probability that the patches are same or 30
[2] E. Ahmed, M. Jones, and T. K. Marks, “An improved deep learning architecture for person re-identification,” in The IEEE
different using a softmax layer. Conference on Computer Vision and Pattern Recognition, June 2015.
20
• The standard cross-entropy loss is used for training. It is given by [3] Y. Tian, B. Fan, and F. Wu, “L2-net: Deep learning of discriminative patch descriptor in euclidean space,” in Conference on
Figure: Illustrations of support regions. Using NCC, a support region ωP (x,y) on the left is compared with Computer Vision and Pattern Recognition (CVPR), vol. 2, 2017.
N
1 X multiple, neighboring support regions ωQ(.,.) on the right. Each ωQ(.,.), on the right, is centered at a green pixel. 10
L=− (ti log pi + (1 − ti) log(1 − pi)) (2) [4] M. Brown, G. Hua, and S. Winder, “Discriminative learning of local image descriptors,” IEEE Transactions on Pattern Analysis
N i=0 Comparing a support region with multiple neighboring regions, as shown here, helps the network to learn the and Machine Intelligence, vol. 33, pp. 43–57, Jan 2011.
0
patterns from a large neighborhood. The NCC between P and Q results in an NCC feature map, shown in the U(10) U(8) U(6) U(4) U(2) U(0)=O(0) O(2) O(4) O(6) O(8) O(10)
[5] S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in The IEEE
bottom row. Illumination variation in %
Conference on Computer Vision and Pattern Recognition, June 2015.
Code: https://github.com/InnovArul/patchmatch_normxcorr Figure: Impact of changing pixel intensities on the matching performance
* equal contribution

You might also like