S. Aydore, B. Thirion, O. Grisel, and G. Varoquaux, Using feature grouping as a stochastic regularizer for high-dimensional noisy data, 2018.

J. Ba and B. Frey, Adaptive dropout for training deep neural networks, Advances in Neural Information Processing Systems, vol.26, pp.3084-3092, 2013.

P. Baldi and P. Sadowski, Understanding dropout, Advances in Neural Information Processing Systems, vol.26, pp.2814-2822, 2013.

H. B. Barlow, Possible principles underlying the transformations of sensory messages, Sensory Communication, pp.217-234, 1961.

Y. Bengio, A. Courville, and P. Vincent, Representation learning: a review and new perspectives, 2013.

C. M. Bishop, Training with noise is equivalent to Tikhonov regularization, Neural Computation, vol.7, issue.1, pp.108-116, 1995.

G. Chen, S. A. Jaradat, N. Banerjee, T. S. Tanaka, M. S. H. Ko, and M. Q. Zhang, Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data, Statistica Sinica, vol.12, pp.241-262, 2002.

M. Cogswell, F. Ahmed, R. Girshick, L. Zitnick, and D. Batra, Reducing overfitting in deep networks by decorrelating representations, 2015.

G. Desjardins, K. Simonyan, and R. Pascanu, Natural neural networks, Advances in Neural Information Processing Systems, pp.2071-2079, 2015.

T. DeVries and G. W. Taylor, Improved regularization of convolutional neural networks with cutout, 2017.

T. G. Dietterich, Ensemble methods in machine learning, International Workshop on Multiple Classifier Systems, pp.1-15, 2000.

Y. Gal and Z. Ghahramani, A theoretically grounded application of dropout in recurrent neural networks, Advances in Neural Information Processing Systems, pp.1019-1027, 2016.

J. E. Gentle, Computational Statistics, Springer, 2009.

I. Guyon, J. Li, T. Mader, P. A. Pletscher, G. Schneider, and M. Uhr, Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark, Pattern Recognition Letters, vol.28, issue.12, pp.1438-1444, 2007.

B. Hassibi and D. G. Stork, Second order derivatives for network pruning: Optimal brain surgeon, Advances in Neural Information Processing Systems, pp.164-171, 1993.

D. P. Helmbold and P. M. Long, Surprising properties of dropout in deep networks, The Journal of Machine Learning Research, vol.18, issue.1, pp.7284-7311, 2017.

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, 2012.

A. Hyvärinen, Independent component analysis: recent advances, Phil. Trans. R. Soc. A, vol.371, issue.1984, p.20110534, 2013.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning, vol.37, pp.448-456, 2015.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Communications of the ACM, vol.60, issue.6, pp.84-90, 2017.

L. I. Kuncheva and C. J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy, Machine Learning, vol.51, issue.2, pp.181-207, 2003.

Y. LeCun, J. S. Denker, and S. A. Solla, Optimal brain damage, Advances in Neural Information Processing Systems, pp.598-605, 1990.

F. Leisch, A. Weingessel, and K. Hornik, On the generation of correlated artificial binary data, 1998.

P. Luo, Learning deep architectures via generalized whitened neural networks, International Conference on Machine Learning, pp.2238-2246, 2017.

S. Maeda, A Bayesian encourages dropout, 2014.

S. Mallat, Group invariant scattering, Comm. Pure Appl. Math, vol.65, issue.10, pp.1331-1398, 2012.

Z. Mariet and S. Sra, Diversity networks, 2016.

H. Peng, F. Long, and C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell, vol.27, issue.8, pp.1226-1238, 2005.

J. S. Preisser and B. F. Qaqish, A comparison of methods for simulating correlated binary variables with specified marginal means and correlations, Journal of Statistical Computation and Simulation, vol.84, issue.11, pp.2441-2452, 2014.

P. Rodríguez, J. Gonzalez, G. Cucurull, J. M. Gonfaus, and X. Roca, Regularizing CNNs with locally constrained decorrelations, 2016.

P. J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, vol.20, pp.53-65, 1987.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

N. Tishby and N. Zaslavsky, Deep learning and the information bottleneck principle, 2015.

J. Tompson, R. Goroshin, A. Jain, Y. LeCun, and C. Bregler, Efficient object localization using convolutional networks, 2014.

S. Wager, S. Wang, and P. Liang, Dropout training as adaptive regularization, Advances in Neural Information Processing Systems, vol.26, pp.351-359, 2013.