The Relationship between Representational Conflicts, Generalization, and Encoding Requirements in an Instance Memory Network

This paper proposes an interpretation of artificial neural networks (ANNs) and explores some of its implications. The interpretation views an ANN as a memory that encodes instances of experience. An experiment examines the behavior of encoding and retrieval of instances from this memory. A localised-representation ANN is constructed that allows control over encoding and over the size of the retrieved memory sample, and it is evaluated on the MNIST digits dataset. The relationship between input familiarity, conflict within retrieved samples, and error rates is described and shown to be an effective driver of memory encoding. Results indicate that selective encoding, combined with retrieval samples large enough to reveal memory conflicts, produces optimal performance, and that error rates are normally distributed with respect to input familiarity and conflict. By using input familiarity and sample consistency to guide memory encoding, the number of encoding trials on the dataset was reduced to 18.33% of the training data while maintaining good recognition performance on the test data.
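The selective encoding rule summarised above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration only, not the paper's implementation: the class name InstanceMemory, the Euclidean distance measure, the sample size k, and the familiarity and consistency thresholds are assumptions made for the sake of the example. The idea it illustrates is that an input is encoded into memory only when it is unfamiliar (no close stored match) or when the retrieved memory sample conflicts with the input's label.

```python
import numpy as np

class InstanceMemory:
    """Sketch of an instance memory with familiarity/conflict-gated encoding.

    Hypothetical example: names, distance measure, and thresholds are
    illustrative assumptions, not the architecture or parameters of the paper.
    """

    def __init__(self, k=5, familiarity_threshold=0.8, consistency_threshold=0.6):
        self.k = k                                        # retrieved memory sample size
        self.familiarity_threshold = familiarity_threshold
        self.consistency_threshold = consistency_threshold
        self.instances = []                               # stored input vectors
        self.labels = []                                  # stored class labels

    def retrieve(self, x):
        """Return distances and labels of the k stored instances closest to x."""
        if not self.instances:
            return [], []
        dists = [float(np.linalg.norm(x - m)) for m in self.instances]
        order = np.argsort(dists)[: self.k]
        return [dists[i] for i in order], [self.labels[i] for i in order]

    def maybe_encode(self, x, label):
        """Encode x only if it is unfamiliar or the retrieved sample conflicts with its label."""
        dists, sample_labels = self.retrieve(x)
        if not sample_labels:                             # empty memory: always encode
            self.instances.append(x)
            self.labels.append(label)
            return True
        familiarity = 1.0 / (1.0 + min(dists))            # closer best match -> higher familiarity
        consistency = sample_labels.count(label) / len(sample_labels)
        if familiarity < self.familiarity_threshold or consistency < self.consistency_threshold:
            self.instances.append(x)                      # unfamiliar or conflicting: encode
            self.labels.append(label)
            return True
        return False                                      # familiar and consistent: skip encoding
```

In this sketch, iterating maybe_encode over a training set stores only the inputs that are novel or that produce conflicting retrieved samples, which is the mechanism by which the number of encoding trials can fall well below the size of the training data.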
