Bidirectional Matching Prototypical Network For Few-Shot Image Classification

IEEE SIGNAL PROCESSING LETTERS, VOL. 29, 2022
Wen Fu, Li Zhou, and Jie Chen, Member, IEEE

Abstract—Few-shot image classification (FSIC) is the task of generalizing a model to unknown categories by learning from a small number of labeled samples of some given categories. Recently, metric-based approaches have received much attention for their simplicity and effectiveness, but they often use only the support set to generate prototypes for matching the query set, which yields inaccurate prototypes and ignores both the rich information contained in queries and the reversibility of the matching relationship between the two. In this letter, we propose a new simple and effective metric-based method called Bidirectional Matching Prototypical Network (BMPN), which has three innovations: 1) It has an additional reverse matching process. This process uses queries to generate more accurate prototypes to improve the model’s performance while also forcing the model to learn features far from the decision boundary to enhance generalization capabilities; 2) It has a lightweight coordinate attention feature extractor (CAFE). This module not only captures long-term dependence along one spatial direction but also preserves accurate position information along the other spatial direction, helping the model to locate the region of interest more accurately; 3) It has a joint loss function, including forward matching loss and reverse matching loss, and a progressive weighting strategy is used in the training process to balance the importance of the two. Our model is trained end-to-end, and the experimental results show that it reaches state-of-the-art performance on two benchmark datasets.

Index Terms—Bidirectional matching, deep learning, few-shot image classification, metric-based method.

Manuscript received October 12, 2021; revised February 1, 2022; accepted February 14, 2022. Date of publication February 22, 2022; date of current version April 26, 2022. This work was supported by the National Science Foundation of China under Grant U1832217. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mylene Q. Farias. (Corresponding author: Jie Chen.)
Wen Fu is with the Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: fuwen2020@ime.ac.cn).
Li Zhou and Jie Chen are with the Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China (e-mail: zhouli@ime.ac.cn; jchen@ime.ac.cn).
Digital Object Identifier 10.1109/LSP.2022.3152686
1070-9908 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION
Deep learning has developed rapidly in recent years. With the support of large amounts of data, more complex networks, and powerful hardware, deep learning can achieve satisfactory results in various visual tasks [1]–[8]. However, sometimes the cost of labeling training images is very high, or some scarce samples are difficult to obtain, resulting in few available training samples. In this case, the performance of existing models is unsatisfactory due to overfitting or an inconsistent training-testing distribution [9]. Therefore, the problem of learning with few labeled samples, called few-shot learning, has become a popular research direction [10]–[18].

Metric-based few-shot image classification is an essential direction of few-shot learning. In recent years, many models have been proposed [10], [11], [19], [20]. The matching network [10] combines meta-learning with a deep neural network to train a learnable nearest neighbor classifier. The prototypical network [11] uses Euclidean distance to measure the distance between prototypes and queries to classify queries. Further, the relation network [19] trains an additional neural network to learn a nonlinear metric used to calculate the similarity between prototypes and queries. However, these methods generally use only a limited number of support samples to generate the prototypes that are matched against queries, so the prototypes are inaccurate and the rich information contained in queries is ignored.

Some recent work in few-shot image classification has attempted to use information from both the support set and the query set [21]–[23]. GNN [23] uses supports and queries to construct a graph model that transfers the distance metric from Euclidean space to non-Euclidean space and uses a message-passing algorithm to predict queries. TPN [21] uses transductive learning to feed supports and queries into the network for training and uses a label-propagation algorithm to predict queries. Their common idea is to transfer label information from the labeled support set to the unlabeled query set. CAN [22] proposes a CAM module, which generates a cross-attention map between supports and queries to highlight the target area and generate more discriminative features. Although these attempts have achieved good performance gains, they have ignored that the matching relationship between support and query should be reversible. Inspired by PANet [24], which reuses the predictive mask as a new segmentation annotation, we propose a new metric-based few-shot image classification method based on the prototypical network called Bidirectional Matching Prototypical Network (BMPN). The intuitive motivation of our work is that if the support set can classify the query set well, then the query set must also be able to classify the support set well. Since there are more queries than supports, the model can generate more accurate prototypes. Meanwhile, due to the reverse matching process, the model will learn more general features, which means that the learned features are far from the decision boundary, and the model will be more general. Our main contributions are summarized as follows:
1) BMPN with an additional reverse matching process is proposed. It can use the rich information in queries to generate more accurate prototypes to improve the model’s performance while forcing the model to learn features far from the decision boundary to improve generalization ability.
2) A lightweight coordinate attention feature extractor
(CAFE) is proposed. It captures long-term dependence
along one spatial direction and saves accurate location
information along the other spatial direction, helping the
model locate the region of interest more accurately.
3) A joint loss function containing both forward and reverse
matching loss is proposed, and a progressive weighting
strategy is adopted to balance the importance of both in
the training stages to ensure the stability of training.
4) We have conducted extensive experiments on two benchmark
datasets, and the experimental results show that our method
reaches state-of-the-art performance.
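As a minimal illustration of the directional pooling that contribution 2 relies on (formalized later in Eqs. (5) and (9)), the following Python sketch averages one feature channel along each spatial direction and then reweights every position by a row gate and a column gate. It is only a sketch under stated assumptions: the function names `directional_pool` and `apply_gates` are hypothetical, the gate values are hand-picked for illustration, and the learned 1 × 1 convolutions, batch normalization, and sigmoid of the actual module are omitted.

```python
def directional_pool(x):
    """x: one channel as a list of H rows, each holding W values.
    Returns (u_h, u_w): per-row means (length H) and per-column means (length W)."""
    H, W = len(x), len(x[0])
    u_h = [sum(row) / W for row in x]                              # average over width
    u_w = [sum(x[j][w] for j in range(H)) / H for w in range(W)]   # average over height
    return u_h, u_w


def apply_gates(x, g_h, g_w):
    """Reweight position (i, j) by its row gate g_h[i] and column gate g_w[j]."""
    return [[x[i][j] * g_h[i] * g_w[j] for j in range(len(x[0]))]
            for i in range(len(x))]


channel = [[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0]]
u_h, u_w = directional_pool(channel)
print(u_h)  # [2.0, 5.0]
print(u_w)  # [2.5, 3.5, 4.5]

# Hypothetical gate values; in the module they would come from a sigmoid.
gated = apply_gates(channel, g_h=[1.0, 0.5], g_w=[1.0, 1.0, 0.0])
print(gated)  # [[1.0, 2.0, 0.0], [2.0, 2.5, 0.0]]
```

Because each pooled vector keeps one spatial axis intact, a gate can suppress a whole row or column (here the last column) without discarding where the remaining responses are located.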
Fig. 1. Structure of our proposed BMPN. Operators and paths are illustrated in the dashed box in the lower right corner. The black line represents the forward matching paths, and the red line represents the reverse matching paths. After the forward matching process, queries obtain the pseudo-label and generate reverse prototypes to perform reverse matching.

II. METHODOLOGY
In this section, the basic definition of the few-shot image classification task is introduced and the proposed BMPN model is described in detail.
A. Problem Formulation
In the few-shot image classification setup, datasets are usually divided into a training set $D_{train}$ and a testing set $D_{test}$ ($D_{train} \cap D_{test} = \varnothing$). Following [10], we employ the standard episodic strategy for both the training and the testing sets. Specifically, $C$ ($C \ge 5$) categories are randomly sampled from the dataset, and $K$ ($K \le 5$) samples are randomly selected from each category to form a labeled support set $S = \{(x_i, y_i)\}_{i=1}^{CK}$, referred to as the C-way K-shot task. Then, $N$ ($N = 15$) samples are randomly selected from the same $C$ categories to form an unseen query set $Q = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{CN}$. In the training phase, the model is trained to minimize the prediction error of the query set in each episode. In the testing phase, the generalization performance of the model is tested on $D_{test}$ with the same settings.
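The episodic sampling described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `sample_episode`, the use of integer class labels, and the choice to keep support and query samples of a class disjoint are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one C-way K-shot episode from a list of class labels.
    Returns (support, query), each a list of (sample_index, class_label) pairs."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough samples for K supports plus N queries are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)    # no repeats within a class
        support += [(i, c) for i in picked[:k_shot]]          # C * K labeled supports
        query += [(i, c) for i in picked[k_shot:]]            # C * N queries
    return support, query


# Toy dataset: 20 classes with 600 samples each.
labels = [c for c in range(20) for _ in range(600)]
support, query = sample_episode(labels, n_way=5, k_shot=5, rng=random.Random(0))
print(len(support), len(query))  # 25 75
```

With $C = 5$, $K = 5$, and $N = 15$, each episode therefore contains 25 support samples and 75 query samples.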
B. Prototypical Network
The prototypical network averages the feature vectors of each class in the support set as its corresponding prototype. It then calculates the distance between the prototype and the query through forward matching to achieve classification. For the C-way K-shot task, the c-th class prototype $p_c$ is calculated by the following formula:

$$p_c = \frac{1}{K} \sum_{i=1}^{K} F_\theta(x_i) \tag{1}$$

where $x_i$ belongs to the c-th category in the support set, which contains $K$ images in total, and $F_\theta(\cdot)$ represents the feature extractor. The feature extraction network of the prototypical network consists of four convolution blocks, each of which contains a 64-channel 3×3 convolution kernel, a batch normalization layer, a ReLU layer, and a 2×2 max-pooling layer. Given the distance function $D(\cdot)$, the prediction category of the query set can be expressed as:

$$P(y = c \mid Q) = \frac{\exp(-D(F_\theta(Q), p_c))}{\sum_{c'=1}^{C} \exp(-D(F_\theta(Q), p_{c'}))} \tag{2}$$

$$\hat{y}_i = \arg\max_{c} P(y = c \mid Q) \tag{3}$$

In the training stage, standard cross-entropy loss is used, which is called forward loss in this paper:

$$L_{forward} = -\frac{1}{CN} \sum_{j=1}^{CN} \log P(y = c \mid Q) \tag{4}$$

Fig. 2. (a) is the overall structure of CAFE, which has four CA blocks. (b) is the composition of a CA block. The coordinate attention module decomposes the 2-D global pooling into two 1-D average pooling operations along the X and Y directions and generates attention by subsequent operations.

C. Bidirectional Matching
The BMPN proposed in this letter consists of a feature extractor with coordinate attention, an additional reverse matching process, and a progressive weighting strategy, as shown in Fig. 1.
1) Coordinate Attention Based Feature Extractor: The structure of CAFE is shown in Fig. 2. Unlike the prototypical network, it consists of four CA blocks, each containing an additional coordinate attention module [25]. Some traditional attention modules, such as the SE module [26], use global pooling operations, which lose the positional information that is essential for capturing spatial structures in visual tasks. To solve this problem, we introduced the coordinate attention module to decompose the
2-D global pooling into two 1-D average pooling operations, which not only captures the long-term dependence of one spatial direction but also preserves the precise position information in the other spatial direction, helping the model locate the region of interest more accurately. Experimental results show that CAFE effectively suppresses useless information and significantly improves the performance and robustness of the model.

Specifically, for a feature map $x$ of size $C \times H \times W$, its operation through the coordinate attention module can be expressed as:

$$u_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i), \qquad u_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \tag{5}$$

$$v = \mathrm{ReLU}\left(\mathrm{BN}\left(H_1\left([u^h, u^w]\right)\right)\right) \tag{6}$$

$$v^h = H_2(v), \qquad v^w = H_3(v) \tag{7}$$

$$g^h = \mathrm{Sigmoid}(v^h), \qquad g^w = \mathrm{Sigmoid}(v^w) \tag{8}$$

where $x_c$ is the feature map of the c-th channel of $x$, $H_i$ ($i = 1, 2, 3$) is a 1 × 1 convolution, and $u_c^h$ and $u_c^w$ are the outputs of $x_c$ after 1-D pooling in the X direction and Y direction, respectively. $[\cdot]$ is the concatenation operation, $v \in \mathbb{R}^{\frac{C}{r} \times (H+W)}$ is the output after the nonlinear operation, $r$ is the reduction ratio, which is set to 32 in this letter, and $v^h \in \mathbb{R}^{\frac{C}{r} \times H}$ and $v^w \in \mathbb{R}^{\frac{C}{r} \times W}$ are the partition of $v$ into two independent tensors along the spatial dimension. The final output of the feature map after the CA module is:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \quad (c = 1 \cdots C) \tag{9}$$

2) Bidirectional Matching Process: The flow of the bidirectional matching algorithm is shown in Fig. 1. The forward matching process is precisely the same as in the prototypical network, and the model learns from the forward matching loss $L_{forward}$. After completing the forward matching process, we get the network’s prediction for queries, which we call the pseudo-label of queries. We get the reverse prototypes by averaging the queries with the same pseudo-label. Assuming that forward matching is accurate enough, adding a reverse matching process has the following two advantages: 1) The reverse matching loss $L_{reverse}$ will force the model to learn features far from the decision boundary, making the model more generalized. 2) More queries than supports will result in more accurate prototypes. In particular, both kinds of losses are cross-entropy losses. The total loss of the model is $L_{total} = L_{forward} + \lambda L_{reverse}$, where $\lambda$ is used to balance the two kinds of losses.

3) Progressive Weight Strategy: The reverse matching process assumes that the pseudo-label obtained by the forward matching is sufficiently accurate. Since our model is trained end-to-end, the predictions of queries are mostly wrong at the beginning. To solve this problem, we designed a progressive weight generation module, as shown in Algorithm 1. It can be seen that at the beginning of training, the proportion of $L_{reverse}$ is almost zero, and the model is mainly supervised by the forward matching loss $L_{forward}$. As the training progresses, the model’s performance is improved, and the proportion of $L_{reverse}$ is increasing. In the later stage of training, only $L_{reverse}$ supervises model training. $\alpha$ is the maximum value that $\lambda$ can reach, set to 4 in this letter.

Algorithm 1: Progressive Weight Strategy.
Input: $S = \{(x_i, y_i)\}_{i=1}^{CK}$, $Q = \{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{CN}$, learning rate $\xi$, hyper-parameter $\alpha$;
Output: model parameter $\theta$;
1: initialize $\theta$;
2: for iter in iterations do
3:   forward matching, labeling queries: $\{(\hat{x}_i, \hat{y}_i)\}_{i=1}^{CN}$;
4:   $L_{forward} = -\frac{1}{CN} \sum_{j=1}^{CN} \log P(y = c \mid Q)$;
5:   reverse matching, labeling supports: $\{(x_i, y_i)\}_{i=1}^{CK}$;
6:   $L_{reverse} = -\frac{1}{CK} \sum_{i=1}^{CK} \log P(y = c \mid S)$;
7:   $\lambda = \alpha \times \frac{iter}{iterations}$;
8:   $L_{total} = L_{forward} + \lambda L_{reverse}$;
9:   updating model: $\theta \leftarrow \theta - \xi \cdot \nabla_\theta L_{total}$;
10: end for

III. EXPERIMENTS
We evaluated our BMPN on two few-shot image classification benchmarks, MiniImageNet [10] and CUB-200-2011 [27], and compared it with the most advanced methods available.

A. Dataset
1) MiniImageNet: It is a subset of ImageNet, containing 60,000 RGB images and 100 categories, with 600 images in each category, and the size of each image is 84 × 84. Following the settings in [13], we used 64, 16, and 20 randomly selected classes for training, validation, and testing, respectively.
2) CUB-200-2011: It is commonly used to evaluate fine-grained classification and contains 11,788 images in 200 categories. Following the settings in [28], we use 100, 50, and 50 randomly selected classes for training, validation, and testing, respectively.

B. Implementation Details
We now conduct experiments on the most common 5-way 1-shot and 5-way 5-shot cases in few-shot image classification. Accuracy is used as the performance metric. During the training and testing phases, each class has 15 queries (75 in total). Our model is trained end-to-end from scratch for 150,000 iterations. CAFE is used as the feature extraction network, and Adam [29] is used as the optimizer. The initial learning rate is set to 0.001 and is halved at 30,000, 50,000, and 100,000 iterations. All input images are uniformly resized to 84 × 84 × 3. We use standard data augmentation methods in the training phase, including random crop, left-right flip, and color jitter. $\alpha$ is set to 4 in this letter. In the testing phase, to make the evaluation more convincing, we conducted more than 600 tests and report the results with 95% confidence intervals. It is worth emphasizing that we used the same setup for all datasets. Our code is implemented in PyTorch [30] and runs on an NVIDIA TITAN Xp GPU.

C. Result and Analysis
1) Result: Our backbone is CAFE, which consists of four layers of convolution blocks; methods with the same structure are selected for a fair comparison, and the baseline network is
the prototypical network. Tables I and II show the performance comparison of BMPN and other methods on each dataset. On the MiniImageNet dataset, the 1-shot and 5-shot performance of BMPN is improved by about 8.8% and 6.7%, respectively, compared to the baseline. For transductive learning methods, BMPN improved performance by about 4.2% and 6.5% in 1-shot and 5-shot, respectively, compared to GNN. Furthermore, compared to TPN, the performance is improved by about 3.8% and 2.7% in 1-shot and 5-shot, respectively. Two aspects account for the differences. First, GNN and TPN use the plain Conv-4 backbone. The extracted features contain noise such as background noise, resulting in compromised performance. Second, GNN and TPN do not have the reverse matching process, so the prediction of queries depends on a limited number of supports, leading to a model that is easily overfitted to the supports. BMPN is at least 1.4% and 2.5% ahead of other methods, which verifies the effectiveness and superiority of our approach. On the CUB-200-2011 dataset, the 1-shot and 5-shot performance of BMPN is improved by 16% and 5.5%, respectively, compared with the baseline. Compared with other methods, there are also apparent improvements. The main reason is that the image background of the CUB dataset is clearer than that of the MiniImageNet dataset. The CAFE module can locate objects of interest well and suppress useless information, thus significantly improving model performance.

TABLE I: RESULT ON MINIIMAGENET
TABLE II: RESULT ON CUB-200-2011

2) Analysis: We now conduct an ablation study on MiniImageNet to explore the effectiveness of the proposed components. As shown in Table III, the model’s performance is substantially improved with the CAFE module, which indicates that the CAFE module enables the model to extract features that are more helpful for classification. The SE module brings less gain than the CAFE module due to global pooling, which causes the loss of location information. The performance is further improved with the reverse matching process, indicating that the reverse matching process can effectively utilize the information in queries. It is worth emphasizing that at this stage the weight of the reverse loss remains constant throughout the process, i.e., λ = 4. The progressive weight strategy allows the model to dynamically adjust the weight of the reverse matching loss based on the training process. Compared with keeping a fixed weight, this strategy avoids the performance impairment caused by the high early error rate and helps the model learn more accurate information in queries; thus, the model’s performance is further improved.

TABLE III: ABLATION STUDY ON MINIIMAGENET

3) Visualization: We point out that BMPN has a characteristic: it forces the model to learn features far away from the decision boundary and improves the model’s generalization ability. T-SNE [32] was used to visualize the feature space on the CUB dataset under the 5-way 5-shot task to verify our statement. As shown in Fig. 3, the feature space formed by (a) is poorly discriminative, and some categories are mixed. As shown in (b), after adding the CAFE module, the separability is improved, but the decision boundary is still blurred. Our proposed (c) is more compact within classes and significantly enhances the separability between different categories, which supports our view.

Fig. 3. Visualization of feature space. (a) is the original Prototypical Network (PN), (b) is PN with the CAFE module, and (c) is BMPN. Each color represents a class.

IV. CONCLUSION
In this letter, we proposed a new metric-based few-shot image classification method. We forced the network to learn features far away from the decision boundary by adding a reverse matching process, enhancing the model’s generalization ability and performance. We designed a progressive weighting strategy to balance the ratio of forward matching loss and reverse matching loss during training to ensure training stability. In addition, we also developed a feature extractor with coordinate attention to help the model extract the region of interest, which significantly improved the model’s performance. Extensive experiments on benchmark datasets show that our proposed method is superior to other advanced few-shot classification methods.
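The bidirectional matching objective summarized above (Section II-C and Algorithm 1) can be sketched for a single episode in plain Python. This is a hedged sketch and not the authors' implementation: it works on already-extracted feature vectors, assumes squared Euclidean distance for $D(\cdot)$, assumes the reverse loss scores supports against their ground-truth labels, assumes $\lambda = \alpha \times iter / iterations$, and falls back to the forward prototype when no query receives a given pseudo-label. All function names are hypothetical.

```python
import math


def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def log_probs(feat, prototypes):
    """log P(y = c | feat) from a softmax over negative distances, as in Eq. (2)."""
    logits = [-sq_dist(feat, p) for p in prototypes]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def class_prototypes(feats, labels, n_classes, fallback=None):
    """Per-class mean feature; an empty class uses fallback[c] (assumption)."""
    protos = []
    for c in range(n_classes):
        members = [f for f, y in zip(feats, labels) if y == c]
        protos.append(mean_vec(members) if members else fallback[c])
    return protos


def cross_entropy(feats, labels, prototypes):
    return -sum(log_probs(f, prototypes)[y] for f, y in zip(feats, labels)) / len(feats)


def predict(feat, prototypes):
    scores = log_probs(feat, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)


def bmpn_loss(s_feats, s_labels, q_feats, q_labels, n_classes, it, total_iters, alpha=4.0):
    # Forward matching: support prototypes classify the queries.
    fwd_protos = class_prototypes(s_feats, s_labels, n_classes)
    l_forward = cross_entropy(q_feats, q_labels, fwd_protos)
    # Reverse matching: pseudo-labeled queries form prototypes that classify the supports.
    pseudo = [predict(f, fwd_protos) for f in q_feats]
    rev_protos = class_prototypes(q_feats, pseudo, n_classes, fallback=fwd_protos)
    l_reverse = cross_entropy(s_feats, s_labels, rev_protos)
    # Progressive weight: grows linearly from 0 towards alpha.
    lam = alpha * it / total_iters
    return l_forward + lam * l_reverse, l_forward, l_reverse, lam


# Toy 2-way 1-shot episode with two queries per class.
s_feats, s_labels = [[0.0, 0.0], [4.0, 4.0]], [0, 1]
q_feats = [[0.5, 0.0], [0.0, 0.5], [4.0, 3.5], [3.5, 4.0]]
q_labels = [0, 0, 1, 1]
total, l_f, l_r, lam = bmpn_loss(s_feats, s_labels, q_feats, q_labels,
                                 n_classes=2, it=75_000, total_iters=150_000)
print(lam)  # 2.0
```

In a real training loop the features would come from the feature extractor and the returned total loss would be backpropagated; the sketch only shows how the two losses and the weight are combined.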
REFERENCES
[1] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
[2] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2016.
[3] J. Donahue et al., “Long-term recurrent convolutional networks for visual recognition and description,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 2625–2634.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016, pp. 770–778.
[5] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788.
[6] G. Huang, Z. Liu, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4700–4708.
[7] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, Apr. 2017.
[8] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6154–6162.
[9] W.-Y. Chen, Y.-C. Liu, Z. Kira, Y. Wang, and J.-B. Huang, “A closer look at few-shot classification,” arXiv, vol. abs/1904.04232, 2019.
[10] O. Vinyals, C. Blundell, T. P. Lillicrap, K. Kavukcuoglu, and D. Wierstra, “Matching networks for one shot learning,” in Proc. Adv. Neural Inform. Process. Syst., 2016.
[11] J. Snell, K. Swersky, and R. S. Zemel, “Prototypical networks for few-shot learning,” in Proc. Adv. Neural Inform. Process. Syst., 2017.
[12] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
[13] S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” in Proc. Int. Conf. Learn. Representations, 2017.
[14] Q. Sun, X. Li, Y. Liu, S. Zheng, T.-S. Chua, and B. Schiele, “Learning to self-train for semi-supervised few-shot classification,” in Proc. Adv. Neural Inform. Process. Syst., 2019.
[15] Y. Qin, W. Zhang, Z. Wang, C. Zhao, and J. Shi, “Layer-wise adaptive updating for few-shot image classification,” IEEE Signal Process. Lett., vol. 27, pp. 2044–2048, 2020.
[16] G. S. Dhillon, P. Chaudhari, A. Ravichandran, and S. Soatto, “A baseline for few-shot image classification,” arXiv, vol. abs/1909.02729, 2020.
[17] C. Simon, P. Koniusz, R. Nock, and M. T. Harandi, “Adaptive subspaces for few-shot learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 4136–4145.
[18] C. Xiong, W. Li, Y. Liu, and M. Wang, “Multi-dimensional edge features graph neural network on few-shot image classification,” IEEE Signal Process. Lett., vol. 28, pp. 573–577, 2021.
[19] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. S. Torr, and T. M. Hospedales, “Learning to compare: Relation network for few-shot learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1199–1208.
[20] W. Li, L. Wang, J. Xu, J. Huo, Y. Gao, and J. Luo, “Revisiting local descriptor based image-to-class measure for few-shot learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 7260–7268.
[21] Y. Liu et al., “Learning to propagate labels: Transductive propagation network for few-shot learning,” in Proc. Int. Conf. Learn. Representations, 2019.
[22] R. Hou, H. Chang, B. Ma, S. Shan, and X. Chen, “Cross attention network for few-shot classification,” in Proc. Adv. Neural Inform. Process. Syst., 2019.
[23] V. G. Satorras and J. Bruna, “Few-shot learning with graph neural networks,” arXiv, vol. abs/1711.04043, 2018.
[24] K. Wang, J. H. Liew, Y. Zou, D. Zhou, and J. Feng, “Panet: Few-shot image semantic segmentation with prototype alignment,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 9197–9206.
[25] Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 13713–13722.
[26] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 8, pp. 2011–2023, Aug. 2019.
[27] C. Wah, S. Branson, P. Welinder, P. Perona, and S. J. Belongie, “The Caltech-UCSD birds-200-2011 dataset,” 2011.
[28] N. Hilliard, L. Phillips, S. Howland, A. Yankov, C. D. Corley, and N. O. Hodas, “Few-shot learning with metric-agnostic conditional embeddings,” arXiv, vol. abs/1802.04376, 2018.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2015.
[30] A. Paszke et al., “Automatic differentiation in pytorch,” 2017.
[31] L. Bertinetto, J. F. Henriques, P. H. S. Torr, and A. Vedaldi, “Meta-learning with differentiable closed-form solvers,” arXiv, vol. abs/1805.08136, 2019.
[32] L. van der Maaten and G. E. Hinton, “Visualizing data using T-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
