Primates understand the surrounding visual world quickly and accurately. Their visual cortex performs invariant object recognition in just 100-150 ms [1]. Some scientists believe that this rapid process leaves no room for time-consuming neural computations, such as estimating firing rates or waiting for feedback signals. Instead, they suggest that the earliest spikes carry the most important information about the visual object and are sufficient for invariant object recognition [2].
In 2007, Masquelier and Thorpe [3] began to examine this idea by proposing a shallow feedforward spiking neural network (SNN) for object recognition. The input image is passed at multiple scales through oriented Gabor filters to extract saliency maps of oriented edges (simulating the role of the primary visual cortex, V1). The saliency maps are then converted to spike latencies, with each neuron limited to firing at most once. The more salient the edge, the earlier the corresponding spike. These spikes are then passed through a local pooling layer and gathered by integrate-and-fire (IF) neurons in the next layer via plastic synapses.
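The latency coding and the at-most-one-spike IF neuron described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear saliency-to-latency mapping, the function names, and the parameters `t_max` and `threshold` are assumptions made for illustration only.

```python
import numpy as np

def intensity_to_latency(saliency, t_max=1.0):
    """Map saliency to first-spike times: the stronger the edge, the earlier the spike.

    Assumption: a linear mapping onto [0, t_max]. Units with zero saliency
    never fire, which is encoded here as an infinite latency.
    """
    saliency = np.asarray(saliency, dtype=float)
    latency = np.full(saliency.shape, np.inf)
    peak = saliency.max()
    if peak > 0:
        active = saliency > 0
        latency[active] = t_max * (1.0 - saliency[active] / peak)
    return latency

def if_neuron_first_spike(latencies, weights, threshold):
    """Non-leaky integrate-and-fire neuron that fires at most once.

    Input spikes are integrated in order of arrival. The function returns the
    arrival time of the spike that pushes the potential over the threshold,
    or inf if the threshold is never reached.
    """
    latencies = np.asarray(latencies, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    potential = 0.0
    for i in np.argsort(latencies, kind="stable"):
        if not np.isfinite(latencies[i]):
            break  # the remaining inputs never spike
        potential += weights[i]
        if potential >= threshold:
            return latencies[i]
    return np.inf

latencies = intensity_to_latency([0.0, 0.5, 1.0, 0.25])
spike_time = if_neuron_first_spike(latencies, [0.4, 0.4, 0.4, 0.4], threshold=0.7)
```

In this toy example the most salient input spikes at time 0, and the IF neuron fires as soon as the second-earliest spike arrives, because two synapses of weight 0.4 are enough to cross the threshold of 0.7.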
By applying the spike-timing-dependent plasticity (STDP) learning rule, they found that the network extracts meaningful high-level features from the incoming spike waves, each representing the presence of a particular object in the input image. Their results showed that the network’s output is enough to distinguish motorbikes from faces, supporting the importance and efficiency of the earliest spikes. Later, in 2016, Kheradpisheh et al. showed that this shallow network can solve multi-category object recognition problems even when the objects undergo in-depth rotations [4].
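To make the STDP rule concrete, here is a hedged sketch of one weight update for a single postsynaptic neuron. It assumes a simplified form in which only the order of pre- and postsynaptic spikes matters and a multiplicative term `w * (1 - w)` keeps weights between 0 and 1. This form, the function name, and the learning rates `a_plus` and `a_minus` are illustrative assumptions, not details stated in this article.

```python
import numpy as np

def stdp_update(weights, pre_times, post_time, a_plus=0.004, a_minus=0.003):
    """One simplified STDP step for the synapses of a single neuron.

    Synapses whose presynaptic spike arrived at or before the postsynaptic
    spike are strengthened; all others (including silent inputs, encoded as
    inf) are weakened. The learning rates are hypothetical values.
    """
    weights = np.asarray(weights, dtype=float)
    pre_times = np.asarray(pre_times, dtype=float)
    causal = pre_times <= post_time
    soft_bound = weights * (1.0 - weights)  # shrinks updates near 0 and 1
    delta = np.where(causal, a_plus, -a_minus) * soft_bound
    return np.clip(weights + delta, 0.0, 1.0)

new_weights = stdp_update([0.5, 0.5], pre_times=[0.1, 0.9], post_time=0.5)
```

The first synapse contributed to the postsynaptic spike and is potentiated, while the second spiked too late and is depressed. Repeating such updates over many images is what lets a neuron become selective to a recurring spike pattern.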
Indeed, hierarchical processing of visual stimuli in the primate visual cortex is vital to invariant object recognition. Toward the higher layers of this hierarchy, the visual features to which the neurons are selective become more complex and abstract. To capture this hierarchical processing, Kheradpisheh et al. proposed a deep convolutional SNN (DCSNN) that extends Masquelier and Thorpe’s model [5]. They used difference-of-Gaussians (DoG) filters, as a model of retinal ganglion cells, for the entry layer and let the first trainable layer converge to V1-like neurons. With three trainable layers of IF neurons trained by STDP, the network converged to a set of hierarchically structured features: simple oriented bars in the first layer (basic), object parts in the second layer (intermediate), and complete object prototypes in the third layer (high-level). Using linear support vector machines (SVMs) as a readout, the network performed well on several image datasets.
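As an illustration of the DoG entry layer, the sketch below builds an ON-center difference-of-Gaussians kernel and applies it to an image. The kernel size, the two standard deviations, and the helper names are assumptions chosen for the example, not the values used in [5].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dog_kernel(size=7, sigma_center=1.0, sigma_surround=2.0):
    """ON-center DoG kernel: narrow Gaussian minus wide Gaussian.

    Each Gaussian is normalised to sum to 1, so the kernel sums to 0 and
    gives no response to uniform regions. `size` is assumed to be odd.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x ** 2 + y ** 2

    def gaussian(sigma):
        g = np.exp(-r2 / (2.0 * sigma ** 2))
        return g / g.sum()

    return gaussian(sigma_center) - gaussian(sigma_surround)

def apply_filter(image, kernel):
    """Valid-mode 2D correlation of `image` with `kernel` (no padding)."""
    windows = sliding_window_view(np.asarray(image, dtype=float), kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

kernel = dog_kernel()
image = np.zeros((9, 9))
image[4, 4] = 1.0  # a single bright dot on a dark background
response = apply_filter(image, kernel)
```

The filter responds positively where a bright dot sits on a darker surround and stays silent on flat regions, which is the contrast-detecting behavior the DoG layer is meant to model. An OFF-center kernel would simply be the negative of this one.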
The use of STDP made these networks more biologically plausible, yet they still relied on supervised readouts, for which there is no biological evidence. A biologically plausible way to tackle this problem is to use reinforcement learning (RL) rules. Considerable evidence suggests that RL takes place in the brain through neurotransmitters such as dopamine and acetylcholine, which take part in synaptic plasticity. With RL, the network becomes aware of its behavior by receiving feedback (reward/punishment) signals from the environment, and it keeps adjusting its behavior toward receiving more positive feedback (reward).
Reward-modulated STDP (R-STDP) is one of the well-known bio-plausible RL rules for synaptic plasticity. According to this rule, neurotransmitters affect STDP by modifying its polarity, magnitude, and effective time window. Mozafari et al. eliminated the need for an external supervised readout by applying R-STDP to the previously mentioned SNNs [6,7]. The main idea is to label the neurons of the last layer with the categories of the input objects and use them as decision indicators. Specifically, for each image, the neuron with the earliest spike or the maximum potential in the last layer indicates the network’s decision on the category of the input object. A reward signal is then propagated globally if the decision matches the label of the image, and a punishment signal if it does not.
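The decision-and-feedback loop can be sketched as follows. This is an illustrative reading of the description above, not the published implementation: falling back to the maximum potential only when no neuron has spiked, reversing the polarity of the update under punishment, and all function names and learning rates are assumptions.

```python
import numpy as np

def decide(first_spike_times, potentials, neuron_labels):
    """Return (winner index, predicted label) for one image.

    The earliest-spiking neuron wins. If no neuron spiked (all times are inf),
    the neuron with the highest potential wins instead (an assumption).
    """
    first_spike_times = np.asarray(first_spike_times, dtype=float)
    if np.isfinite(first_spike_times).any():
        winner = int(np.argmin(first_spike_times))
    else:
        winner = int(np.argmax(potentials))
    return winner, neuron_labels[winner]

def r_stdp_update(weights, pre_times, post_time, rewarded,
                  a_r_plus=0.004, a_r_minus=0.003,
                  a_p_plus=0.0005, a_p_minus=0.004):
    """Reward-modulated STDP step for the winner's synapses.

    Reward: ordinary STDP (causal inputs strengthened, others weakened).
    Punishment: the polarity is reversed (causal inputs weakened, others
    strengthened). All four learning rates are hypothetical values.
    """
    weights = np.asarray(weights, dtype=float)
    causal = np.asarray(pre_times, dtype=float) <= post_time
    if rewarded:
        rate = np.where(causal, a_r_plus, -a_r_minus)
    else:
        rate = np.where(causal, -a_p_minus, a_p_plus)
    return np.clip(weights + rate * weights * (1.0 - weights), 0.0, 1.0)

labels = ["face", "motorbike", "face"]
winner, prediction = decide([0.3, 0.1, np.inf], [0.0, 0.0, 0.0], labels)
rewarded = prediction == "motorbike"  # compare with the true image label
updated = r_stdp_update([0.5, 0.5], [0.1, 0.9], post_time=0.5, rewarded=rewarded)
```

Because the feedback is a single global signal, the same rule applies to every synapse without any per-neuron error term, which is what makes it attractive as a replacement for an external supervised readout.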
The results of their experiments showed that the R-STDP rule can replace complex external readouts and even achieve higher performance. They also examined the effect of applying R-STDP to multiple layers of a deep SNN and suggested that it helps when the number of neurons is limited and input distractors are frequent [7]. These RL-based networks not only achieved high performance but also enable efficient online on-chip learning, which suits the needs of neuromorphic engineering.
Although deep learning methods and supervised (backprop-based) SNNs are among the state of the art in visual tasks in terms of accuracy, they are not biologically plausible in terms of neural processing (sending floating-point values) or learning mechanism (the supervised backprop algorithm). Actual neurons communicate with each other by sending spikes, and the information is mainly coded in the spike times. Besides, learning in primates is mostly unsupervised or reward-based, and at the neuronal level it depends on spike times (e.g., STDP). Our results raise hope that more biologically plausible SNNs can solve challenging visual tasks while offering advantages such as lower computational complexity and higher energy efficiency. Moreover, these models help us better understand the neural computations underlying brain function and test neuroscientific hypotheses.
These findings are described in the article entitled STDP-based spiking deep convolutional neural networks for object recognition, recently published in the journal Neural Networks. Further information on this line of work can be found in the articles below:
- Unsupervised Learning of Visual Features through Spike Timing Dependent Plasticity
- Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition
- First-Spike-Based Visual Categorization Using Reward-Modulated STDP
- Combining STDP and Reward-Modulated STDP in Deep Convolutional Spiking Neural Networks for Digit Recognition
This work was conducted by Saeed Reza Kheradpisheh and Mohammad Ganjtabesh from the University of Tehran, and Simon J. Thorpe and Timothée Masquelier from the Université Toulouse.
References:
[1] Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381(6582), 520.
[2] VanRullen, R., & Thorpe, S. J. (2001). The time course of visual processing: from early perception to decision-making. Journal of Cognitive Neuroscience, 13(4), 454-461.
[3] Masquelier, T., & Thorpe, S. J. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Computational Biology, 3(2), e31.
[4] Kheradpisheh, S. R., Ganjtabesh, M., & Masquelier, T. (2016). Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition. Neurocomputing, 205, 382-392.
[5] Kheradpisheh, S. R., Ganjtabesh, M., Thorpe, S. J., & Masquelier, T. (2018). STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 99, 56-67.
[6] Mozafari, M., Kheradpisheh, S. R., Masquelier, T., Nowzari-Dalini, A., & Ganjtabesh, M. (2018). First-spike-based visual categorization using reward-modulated STDP. IEEE Transactions on Neural Networks and Learning Systems.
[7] Mozafari, M., Ganjtabesh, M., Nowzari-Dalini, A., Thorpe, S. J., & Masquelier, T. (2018). Combining STDP and reward-modulated STDP in deep convolutional spiking neural networks for digit recognition. arXiv preprint arXiv:1804.00227.