Biased And Crossmap Dropout Strategies For Convolutional Neural Networks

Over the last decade, deep learning models, particularly Convolutional Neural Networks (CNNs), have shown outstanding results in various fields, including computer vision. Driven by major advances in GPU-based parallel computing, CNNs surpass conventional computer vision approaches by learning features from their input automatically, layer by layer, forming a hierarchy of features.

That said, overfitting becomes a crucial problem because a CNN contains a large number of parameters, and it worsens when the network is trained on an insufficient amount of data. Dropout is considered one of the most effective ways to combat overfitting in such cases, primarily by introducing noise into the network (dropping a random subset of units/neurons) during every training pass. In our recent article, “Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network,” published in the journal Neural Networks, we present simple yet effective approaches that improve the effectiveness of Dropout in CNN models.
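For reference, the following is a minimal NumPy sketch of the standard (inverted) Dropout described above; the function name and the rescaling convention are our own illustration, not code from the paper.

    import numpy as np

    def dropout(x, p, training=True, rng=None):
        # Standard (inverted) dropout: zero each unit independently with
        # probability p and rescale the survivors so the expected
        # activation is unchanged, so no scaling is needed at test time.
        if not training or p == 0.0:
            return x
        rng = rng or np.random.default_rng()
        mask = rng.random(x.shape) >= p   # keep a unit with probability 1 - p
        return x * mask / (1.0 - p)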

Biased Dropout

The original Dropout approach gives every unit the same probability p of being dropped in each training case; we found that assigning different probabilities to different units can affect how effective Dropout is. Because each unit contributes a different degree to the network’s final performance, we believe that some units are “important” while others are “less important.”

In a convolution layer, a unit with a high activation value indicates an important feature, and dropping it during training may cause a crucial loss of information in the network. In contrast, a unit with a low activation value contributes little to the network loss and consequently receives only a small error gradient during backpropagation; it can therefore be regarded as less important.

Given this observation, we introduced a new approach, named Biased Dropout, which assigns different (biased) Dropout probabilities to units according to their importance. Important units (clustered into one group) receive a low probability of being dropped, so that the vital information they carry is retained, while the less important units are exposed to a high probability to compensate for the lack of noise introduced in the first group. We found that this approach outperformed the original Dropout in most cases, both in accuracy and in training convergence speed. Furthermore, we demonstrated that reversing Biased Dropout, i.e., exposing the important units to the higher Dropout probability, resulted in an underfitted network, which further justifies the effectiveness of this approach.
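As a rough sketch of this idea (not the paper’s exact procedure), the snippet below splits the units of a layer into two groups using a simple importance proxy, the mean activation over the batch, and drops the “important” group with the lower probability; the median split and the two probabilities p_low and p_high are illustrative assumptions.

    import numpy as np

    def biased_dropout(x, p_low, p_high, rng=None):
        # x: batch of activations with shape (batch, units).
        # Units whose mean activation exceeds the median are treated as
        # "important" and dropped with the lower probability p_low; the
        # remaining units are dropped with the higher probability p_high.
        rng = rng or np.random.default_rng()
        importance = x.mean(axis=0)                      # per-unit importance proxy
        important = importance >= np.median(importance)  # two groups of units
        p = np.where(important, p_low, p_high)           # biased drop probability per unit
        mask = rng.random(x.shape) >= p                  # sample the drop mask
        return x * mask / (1.0 - p)                      # inverted-dropout rescaling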

Crossmap Dropout

Crossmap Dropout, in turn, generalizes the original Dropout implementation in the convolution layer. The motivation behind this idea is that units in a convolution layer are structured in a unique way, different from a standard MLP or fully connected layer: a convolution layer consists of multiple feature maps that are strongly correlated with one another. Crossmap Dropout aims to maintain this correlation between feature maps during training while still introducing noise within the layer. Therefore, instead of dropping units entirely at random across all feature maps, as the original Dropout does, Crossmap Dropout picks a random subset of units in the first feature map and copies the same noise pattern to all remaining feature maps. We found that this approach works well for small datasets that require only a small number of training iterations.
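The following is a minimal sketch of this cross-map masking pattern, again in NumPy and assuming activations in (batch, channels, height, width) layout; the details are our own illustration of the idea rather than the paper’s code.

    import numpy as np

    def crossmap_dropout(x, p, rng=None):
        # x: conv-layer activations with shape (batch, channels, height, width).
        # One spatial drop pattern is sampled per example and broadcast to
        # every feature map, so all channels share the same noise pattern.
        rng = rng or np.random.default_rng()
        n, _, h, w = x.shape
        mask = rng.random((n, 1, h, w)) >= p   # single pattern shared across channels
        return x * mask / (1.0 - p)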

Results and Future Work

We tested our approaches on several benchmark datasets, including MNIST, CIFAR, and ImageNet, and both approaches outperformed the original Dropout on most of the datasets we used. Furthermore, combining the two approaches achieved the best results in several cases. However, our current work has clear limitations, including the lack of a mathematical justification for our hypotheses and the exhaustive validation process required to find the optimal hyperparameters for Biased Dropout. Further investigation is therefore needed to address these issues and provide deeper insight.

These findings are described in the article entitled Biased Dropout and Crossmap Dropout: Learning towards effective Dropout regularization in convolutional neural network, recently published in the journal Neural Networks. This work was conducted by Alvin Poernomo and Dae-Ki Kang from Dongseo University.