Optimizing the Capacity of a Convolutional Neural Network for Image Segmentation and Pattern Recognition

In this paper, we study the factors that determine the capacity of a Convolutional Neural Network (CNN) model and propose ways to evaluate and adjust this capacity so that it best matches a specific pattern recognition task. First, a scheme is proposed to adjust the number of independent functional units within a CNN model so that the model better fits a given task. Second, the number of independent functional units in a capsule network is adjusted to fit the network to the training dataset. Third, a method based on Bayesian GAN is proposed to enrich the variation in an existing dataset and thereby increase its complexity. Experimental results on the PASCAL VOC 2010 Person Part dataset and the MNIST dataset show that, in both conventional CNN models and capsule networks, the number of independent functional units is an important factor determining the capacity of a network model. By adjusting the number of functional units, the capacity of a model can be made to better match the complexity of a dataset.
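To give a concrete sense of how the number of independent functional units governs model capacity, the following minimal sketch treats convolutional filters as the functional units and scales each layer's unit count by a width multiplier, reporting the resulting parameter count. The function names and the width-multiplier scheme are illustrative assumptions, not the exact adjustment scheme of the paper.

```python
# Hypothetical sketch: adjusting a CNN's capacity by scaling the number of
# independent functional units (here, convolutional filters) in each layer.

def conv_params(in_channels, out_channels, kernel_size):
    """Parameter count of one conv layer (weights + biases)."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def scaled_capacity(base_units, width_multiplier, kernel_size=3, in_channels=3):
    """Total parameter count after scaling every layer's unit count."""
    total = 0
    prev = in_channels
    for units in base_units:
        scaled = max(1, round(units * width_multiplier))  # at least one unit
        total += conv_params(prev, scaled, kernel_size)
        prev = scaled  # this layer's outputs feed the next layer
    return total

base = [64, 128, 256]                    # baseline unit counts per layer
full = scaled_capacity(base, 1.0)
half = scaled_capacity(base, 0.5)
print(full, half)
```

Because each layer's parameter count depends on both its own and the previous layer's unit counts, halving the multiplier shrinks capacity roughly quadratically, which is why unit count is such a strong lever for matching a model to a dataset's complexity.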



