Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Large pose discrepancy is one of the critical challenges for face recognition in video surveillance. Because pose attributes are entangled with identity information, conventional approaches to pose-independent representation perform poorly when recognizing faces under large pose variations. In this paper, we propose a practical approach that disentangles the pose attribute from the identity information and then synthesizes a face, using a classifier network in the latent space. The proposed approach employs a modified generative adversarial network (GAN) framework consisting of an encoder-decoder structure with a classifier embedded in the manifold space to factorize the latent encoding. The approach can be further generalized to other facial and non-facial attributes in real-life video frames containing faces with significant attribute variations. Experimental results and comparisons with the state of the art show that the representation learned by the proposed approach synthesizes perceptually more compelling images through a combination of adversarial and classification losses.
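
The architecture described above can be read as an encoder-decoder generator whose latent identity code is regularized by a pose classifier, with an image-level discriminator supplying the adversarial loss. The PyTorch sketch below is only an illustrative reading of that description: the layer sizes, the discretization of pose into N_POSES classes, the confusion-style disentanglement loss, and the unit loss weights are all assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_DIM, ID_DIM, POSE_DIM, N_POSES = 64 * 64 * 3, 128, 16, 9  # assumed sizes


class Encoder(nn.Module):
    """Maps a face image to a latent code split into identity and pose parts."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, ID_DIM + POSE_DIM))

    def forward(self, x):
        z = self.net(x.flatten(1))
        return z[:, :ID_DIM], z[:, ID_DIM:]  # identity code, pose code


class Decoder(nn.Module):
    """Synthesizes a face from the identity code and a (possibly new) pose code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ID_DIM + POSE_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, IMG_DIM), nn.Tanh())

    def forward(self, z_id, z_pose):
        return self.net(torch.cat([z_id, z_pose], dim=1))


class LatentPoseClassifier(nn.Module):
    """Classifier in the latent (manifold) space: predicts pose from the identity code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ID_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_POSES))

    def forward(self, z_id):
        return self.net(z_id)


class Discriminator(nn.Module):
    """Image-level real/fake critic that supplies the adversarial loss."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 1))

    def forward(self, x):
        return self.net(x.flatten(1))


# --- one illustrative training step on placeholder data ---
enc, dec, clf, disc = Encoder(), Decoder(), LatentPoseClassifier(), Discriminator()
x = torch.randn(8, 3, 64, 64)                  # batch of face images (dummy tensor)
pose_labels = torch.randint(0, N_POSES, (8,))  # discretized pose labels (assumed)

z_id, z_pose = enc(x)
x_hat = dec(z_id, z_pose)

# Classification loss: the latent classifier learns to recover pose from the identity code.
clf_loss = F.cross_entropy(clf(z_id), pose_labels)

# Disentanglement (confusion) loss: the encoder is pushed to make the classifier's
# prediction uninformative, so pose information is factored out of the identity code.
uniform = torch.full((8, N_POSES), 1.0 / N_POSES)
disent_loss = -(uniform * F.log_softmax(clf(z_id), dim=1)).sum(dim=1).mean()

# Adversarial loss on the synthesized face plus a pixel reconstruction term.
adv_loss = F.binary_cross_entropy_with_logits(disc(x_hat), torch.ones(8, 1))
recon_loss = F.l1_loss(x_hat, x.flatten(1))

gen_loss = recon_loss + adv_loss + disent_loss  # encoder-decoder objective (weights assumed)
```

In this reading, the classifier is trained to predict pose from the identity code, while the encoder is trained to make that prediction uninformative; this pushes pose information out of the identity code and into the separate pose code consumed by the decoder.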



