Adaptive Few-Shot Deep Metric Learning

Most prevalent deep learning methods require a large amount of training data, whereas few-shot learning aims to learn a model from limited data without extensive retraining. In this paper, we present a loss function based on triplet loss for solving few-shot problems with metric-based learning. Instead of setting the margin distance in triplet loss empirically as a constant, we propose an adaptive margin distance strategy that obtains an appropriate margin distance automatically. We implement the strategy in a deep Siamese network for deep metric embedding, using an optimization approach that penalizes the worst case and rewards the best case. Our experiments on image recognition and on a co-segmentation model demonstrate that our proposed triplet loss with adaptive margin distance can significantly improve performance.
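To make the role of the margin concrete, the following minimal sketch computes the standard triplet loss, max(0, d(a, p) - d(a, n) + m), over a batch of embeddings. The function names and the `batch_margin` heuristic are assumptions introduced here for illustration only: the heuristic sets the margin to the gap between the mean anchor-negative and mean anchor-positive distances in the batch, and it is not the adaptive strategy proposed in this paper, whose details are not given in this section.

```python
import numpy as np


def pairwise_dist(x, y):
    """Euclidean distance between matching rows of x and y."""
    return np.linalg.norm(x - y, axis=1)


def triplet_loss(anchor, positive, negative, margin):
    """Mean hinge loss max(0, d(a,p) - d(a,n) + margin) over a batch."""
    d_ap = pairwise_dist(anchor, positive)
    d_an = pairwise_dist(anchor, negative)
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))


def batch_margin(anchor, positive, negative, floor=0.0):
    """Hypothetical heuristic (NOT the paper's method): use the gap between
    the mean negative distance and the mean positive distance as the margin,
    never going below `floor`."""
    d_ap = pairwise_dist(anchor, positive)
    d_an = pairwise_dist(anchor, negative)
    return max(floor, float(np.mean(d_an) - np.mean(d_ap)))


# Toy 2-D embeddings for two triplets (illustrative values only).
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 1.0]])
negative = np.array([[3.0, 0.0], [0.0, 2.0]])

fixed_loss = triplet_loss(anchor, positive, negative, margin=2.0)
margin = batch_margin(anchor, positive, negative)
adaptive_loss = triplet_loss(anchor, positive, negative, margin=margin)
print(fixed_loss, margin, adaptive_loss)
```

With a fixed margin, the loss depends on a value chosen by hand; deriving the margin from the data, as in this toy heuristic, removes that hand-tuned constant, which is the general motivation for an adaptive margin.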
