Date: 2020-02-12

Yena Han (left) and Tomaso Poggio present an example of the visual stimuli that were used in a new psychophysics study. Photo credit: Kris Brewer

Suppose you glance from a short distance at someone you have never met. Take a few steps back and look again. Can you still recognize her face? “Yes, of course,” you probably think. If so, it would mean that after seeing a single image of an object, such as a particular face, our visual system recognizes it robustly despite changes in the object's position and scale. State-of-the-art classifiers such as vanilla deep networks, on the other hand, are known to fail this simple test.

To recognize a particular face under a range of transformations, neural networks must be trained with many examples of the face under the different conditions. In other words, they can achieve invariance through memorization, but not when only one image is available. Understanding how human vision accomplishes this remarkable feat is therefore relevant for engineers who want to improve their existing classifiers. It is also important for neuroscientists who model the primate visual system with deep networks. In particular, it is possible that the invariance with one-shot learning shown by biological vision requires a rather different computational strategy from that of deep networks.

A new paper in Scientific Reports by Yena Han, an MIT PhD candidate in electrical engineering and computer science, and colleagues, titled “Scale and translation-invariance for novel objects in human vision,” describes how they study this phenomenon more carefully in order to create novel, biologically inspired networks.

“In contrast to deep networks, humans can learn from very few examples. This is a big difference, with far-reaching implications for the engineering of vision systems and for understanding how human vision really works,” explains co-author Tomaso Poggio, director of the Center for Brains, Minds and Machines (CBMM) and the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT. “A key reason for this difference is the relative invariance of the primate visual system to scale, shift, and other transformations. Strangely, this has been largely neglected in the AI community, in part because the psychophysical data have so far been less than clear-cut. Han's work has now established solid measurements of basic invariances of human vision.”

To distinguish invariance that arises from intrinsic computation from invariance that arises from experience and memorization, the new study measured the range of invariance in one-shot learning. In the one-shot learning task, Korean letters were presented as stimuli to human subjects who were unfamiliar with the language. Each letter was first presented a single time under one specific condition and then tested at scales or positions different from the original condition. The first experimental result is that, as you may have guessed, humans showed significant scale-invariant recognition after only a single exposure to these novel objects. The second result is that the range of position invariance is limited and depends on the size and placement of the objects.

Han and her colleagues next performed a comparable experiment in deep neural networks designed to reproduce this human performance. The results suggest that to explain the invariant recognition of objects by humans, neural network models should explicitly incorporate built-in scale invariance. In addition, the limited position invariance of human vision is better replicated in the network when the receptive fields of the model neurons grow larger the farther they are from the center of the visual field. This architecture differs from commonly used neural network models, in which an image is processed at uniform resolution with the same shared filters.
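
The idea of receptive fields that widen away from the center of the visual field can be illustrated with a small sketch. The code below is not the model from the study; it is a toy one-dimensional example, and the function names, the linear growth rule, and the parameters `foveal_width` and `slope` are assumptions chosen only for illustration. It averages a signal with windows that get wider with distance from a fixation point, so a small feature stays sharp near the center and is smeared out in the periphery.

```python
def receptive_field_width(eccentricity, foveal_width=1.0, slope=0.5):
    """Width of a model neuron's receptive field at a given distance
    from fixation. Linear growth is an illustrative assumption."""
    return foveal_width + slope * abs(eccentricity)


def eccentricity_pooled(signal, center, foveal_width=1.0, slope=0.5):
    """Average a 1-D signal with windows that widen away from `center`."""
    pooled = []
    for i in range(len(signal)):
        width = receptive_field_width(i - center, foveal_width, slope)
        half = int(width // 2)  # half-width of the window, in samples
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        window = signal[lo:hi]
        pooled.append(sum(window) / len(window))
    return pooled


def impulse(length, position):
    """A signal that is zero everywhere except for a single 1.0."""
    return [1.0 if i == position else 0.0 for i in range(length)]


length, center = 61, 30
# The same one-sample feature, shown at fixation and 16 samples away.
near = eccentricity_pooled(impulse(length, center), center)
far = eccentricity_pooled(impulse(length, center + 16), center)

print("peak response at fixation:", max(near))
print("peak response in periphery:", max(far))
```

In this toy setting the feature at fixation keeps its full peak response, while the same feature in the periphery is averaged over a wider window and produces a much weaker peak. A model with uniform resolution and shared filters would respond identically at both positions.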

“Our work provides a new understanding of how the brain represents objects under different viewpoints. It also has implications for AI, as the results provide new insights into what makes a good architectural design for deep neural networks,” notes Han, a CBMM researcher and lead author of the study.

