
Graph interpolating activation improves both natural and robust accuracies in data-efficient deep learning

Published online by Cambridge University Press: 28 December 2020

BAO WANG
Affiliation:
Department of Mathematics, Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, UT, USA, email: wangbaonj@gmail.com
STAN J. OSHER
Affiliation:
Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA, email: sjo@math.ucla.edu

Abstract

Improving the accuracy and robustness of deep neural nets (DNNs) and adapting them to small training sets are primary tasks in deep learning (DL) research. In this paper, we replace the output activation function of DNNs, typically the data-agnostic softmax function, with a graph Laplacian-based high-dimensional interpolating function which, in the continuum limit, converges to the solution of a Laplace–Beltrami equation on a high-dimensional manifold. Furthermore, we propose end-to-end training and testing algorithms for this new architecture. The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning. Compared to conventional DNNs with softmax output activation, the new framework has the following major advantages. First, it is better suited to data-efficient learning, in which high-capacity DNNs are trained without a large number of training examples. Second, it remarkably improves both natural accuracy on clean images and robust accuracy on adversarial images crafted by both white-box and black-box attacks. Third, it is a natural choice for semi-supervised learning. This paper is a significant extension of our earlier work published at NeurIPS 2018 (Wang et al., 2018b). For reproducibility, the code is available at https://github.com/BaoWangMath/DNN-DataDependentActivation.
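To make the idea in the abstract concrete, the following is a minimal sketch of the kind of graph-based interpolation it describes: class scores on unlabelled points are obtained by harmonic extension over a Gaussian-weighted k-nearest-neighbour graph built on DNN features. This is an illustrative simplification, not the authors' exact weighted nonlocal Laplacian (WNLL) activation; the function name, the parameters k and sigma, and the plain harmonic-extension solve are assumptions made for exposition.

```python
# Hypothetical sketch of graph Laplacian label interpolation (harmonic
# extension), a simplified stand-in for the paper's WNLL output activation.
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import splu
from scipy.spatial import cKDTree

def graph_interpolate(features, labels, labeled_idx, k=15, sigma=1.0):
    """Propagate one-hot labels from labelled to unlabelled points.

    features    : (n, d) DNN features for all n points (labelled + unlabelled).
    labels      : (m, c) one-hot labels of the m labelled points.
    labeled_idx : indices of the labelled points within `features`.
    Returns an (n, c) array of interpolated class scores.
    """
    n = features.shape[0]
    # Gaussian weights on a symmetrised kNN graph
    # (neighbour 0 returned by the query is the point itself, so skip it).
    dist, nbr = cKDTree(features).query(features, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    vals = np.exp(-(dist[:, 1:].ravel() / sigma) ** 2)
    W = csr_matrix((vals, (rows, nbr[:, 1:].ravel())), shape=(n, n))
    W = 0.5 * (W + W.T)
    # Unnormalised graph Laplacian L = D - W.
    L = (diags(np.asarray(W.sum(axis=1)).ravel()) - W).tocsr()

    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    # Harmonic extension: u is fixed on labelled points; on unlabelled
    # points solve L_uu u_u = -L_ul u_l, so that (L u) vanishes there.
    L_uu = L[unlabeled_idx][:, unlabeled_idx]
    L_ul = L[unlabeled_idx][:, labeled_idx]
    u = np.zeros((n, labels.shape[1]))
    u[labeled_idx] = labels
    u[unlabeled_idx] = splu(L_uu.tocsc()).solve(-(L_ul @ labels))
    return u  # predict classes on unlabelled points via u.argmax(axis=1)
```

In the architecture the abstract outlines, `features` would be penultimate-layer DNN outputs for labelled (template) and test points, and predictions on the test points are read off from the interpolated scores. Roughly speaking, the WNLL variant additionally reweights the terms coupling labelled and unlabelled points to stabilise this solve when labelled data are scarce.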

Type
Papers
Copyright
© The Author(s), 2020. Published by Cambridge University Press

References

Agostinelli, F., Hoffman, M., Sadowski, P. & Baldi, P. (2014) Learning Activation Functions to Improve Deep Neural Networks. arXiv preprint arXiv:1412.6830.
Anonymous. (2019) Adversarial Machine Learning against Tesla’s Autopilot. https://www.schneier.com/blog/archives/2019/04/adversarial_mac.html.
Athalye, A., Carlini, N. & Wagner, D. (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: International Conference on Machine Learning.
Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems.
Brendel, W., Rauber, J. & Bethge, M. (2017) Decision-Based Adversarial Attacks: Reliable Attacks against Black-Box Machine Learning Models. arXiv preprint arXiv:1712.04248.
Carlini, N. & Wagner, D. A. (2016) Towards evaluating the robustness of neural networks. In: IEEE Symposium on Security and Privacy, pp. 39–57.
Chapelle, O., Scholkopf, B. & Zien, A. (2006) Semi-Supervised Learning, MIT Press, Cambridge, Massachusetts.
Chen, X., Liu, C., Li, B., Liu, K. & Song, D. (2017a) Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint arXiv:1712.05526.
Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S. & Feng, J. (2017b) Dual path networks. In: Advances in Neural Information Processing Systems.
Cohen, J., Rosenfeld, E. & Kolter, J. Z. (2019) Certified Adversarial Robustness via Randomized Smoothing. arXiv preprint arXiv:1902.02918v1.
Dou, Z., Osher, S. J. & Wang, B. (2018) Mathematical Analysis of Adversarial Attacks. arXiv preprint arXiv:1811.06492.
Glorot, X., Bordes, A. & Bengio, Y. (2011) Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323.
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A. & Bengio, Y. (2013) Maxout networks. arXiv preprint arXiv:1302.4389.
Goodfellow, I. J., Shlens, J. & Szegedy, C. (2014) Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.
Guo, C., Rana, M., Cisse, M. & van der Maaten, L. (2018) Countering adversarial images using input transformations. In: International Conference on Learning Representations. https://openreview.net/forum?id=SyJ7ClWCb.
He, K., Zhang, X., Ren, S. & Sun, J. (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
He, K., Zhang, X., Ren, S. & Sun, J. (2016a) Identity mappings in deep residual networks. In: European Conference on Computer Vision.
He, K., Zhang, X., Ren, S. & Sun, J. (2016b) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hinton, G., Osindero, S. & Teh, Y. W. (2006) A fast learning algorithm for deep belief nets. Neural Comput. 18 (7), 1527–1554.
Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2012) Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors. arXiv preprint arXiv:1207.0580.
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. (2017) Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition.
Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. (2016) Deep networks with stochastic depth. In: European Conference on Computer Vision.
Kingma, D. & Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky, A. (2009) Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/cifar.html.
Krizhevsky, A., Sutskever, I. & Hinton, G. (2012) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.
Kurakin, A., Goodfellow, I. & Bengio, S. (2017) Adversarial machine learning at scale. In: International Conference on Learning Representations.
LeCun, Y. (1998) The MNIST Database of Handwritten Digits.
LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature 521, 436–444.
Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D. & Jana, S. (2019) Certified robustness to adversarial examples with differential privacy. In: IEEE Symposium on Security and Privacy (SP).
Li, Z. & Shi, Z. (2017) Deep Residual Learning and PDEs on Manifold. arXiv preprint arXiv:1708.05115.
Liu, Y., Chen, X., Liu, C. & Song, D. (2016) Delving into Transferable Adversarial Examples and Black-Box Attacks. arXiv preprint arXiv:1611.02770.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. (2018) Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations. https://openreview.net/forum?id=rJzIBfZAb.
Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O. & Frossard, P. (2017) Universal adversarial perturbations. In: IEEE Conference on Computer Vision and Pattern Recognition, July 2017.
Muja, M. & Lowe, D. G. (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 36, 2227–2240.
Nair, V. & Hinton, G. (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, pp. 807–814.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. & Ng, A. (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Osher, S. J., Wang, B., Yin, P., Luo, X., Pham, M. & Lin, A. (2018) Laplacian Smoothing Gradient Descent. arXiv preprint arXiv:1806.06317.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B. & Swami, A. (2016a) The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy, pp. 372–387.
Papernot, N., McDaniel, P., Wu, X., Jha, S. & Swami, A. (2016b) Distillation as a defense to adversarial perturbations against deep neural networks. In: IEEE Symposium on Security and Privacy.
Papernot, N., McDaniel, P. D. & Goodfellow, I. J. (2016c) Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. arXiv preprint arXiv:1605.07277.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. & Lerer, A. (2017) Automatic Differentiation in PyTorch. https://openreview.net/forum?id=BJJsrmfCZ.
Ross, A. & Doshi-Velez, F. (2017) Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing Their Input Gradients. arXiv preprint arXiv:1711.09404.
Samangouei, P., Kabkab, M. & Chellappa, R. (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In: International Conference on Learning Representations. https://openreview.net/forum?id=BkJ3ibb0-.
Shi, Z., Wang, B. & Osher, S. (2018) Error Estimation of the Weighted Nonlocal Laplacian on Random Point Cloud. arXiv preprint arXiv:1809.08622.
Simonyan, K. & Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. & Fergus, R. (2013) Intriguing Properties of Neural Networks. arXiv preprint arXiv:1312.6199.
Tang, Y. (2013) Deep Learning Using Linear Support Vector Machines. arXiv preprint arXiv:1306.0239.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D. & McDaniel, P. (2018) Ensemble adversarial training: attacks and defenses. In: International Conference on Learning Representations. https://openreview.net/forum?id=rkZvSe-RZ.
Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitliagkas, I., Courville, A., Lopez-Paz, D. & Bengio, Y. (2018) Manifold Mixup: Better Representations by Interpolating Hidden States. arXiv preprint arXiv:1806.05236.
Wan, L., Zeiler, M., Zhang, S., LeCun, Y. & Fergus, R. (2013) Regularization of neural networks using dropconnect. In: International Conference on Machine Learning, pp. 1058–1066.
Wang, B., Lin, A. T., Shi, Z., Zhu, W., Yin, P., Bertozzi, A. L. & Osher, S. J. (2018a) Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization. arXiv preprint arXiv:1809.08516.
Wang, B., Luo, X., Li, Z., Zhu, W., Shi, Z. & Osher, S. (2018b) Deep neural nets with interpolating function as output activation. In: Advances in Neural Information Processing Systems.
Wang, B., Yuan, B., Shi, Z. & Osher, S. (2019) ResNets ensemble via the Feynman–Kac formalism to improve natural and robust accuracies. In: Advances in Neural Information Processing Systems.
Zagoruyko, S. & Komodakis, N. (2016) Wide residual networks. In: British Machine Vision Conference.
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L. & Jordan, M. (2019) Theoretically Principled Trade-Off between Robustness and Accuracy. arXiv preprint arXiv:1901.08573.
Zheng, S., Song, Y., Leung, T. & Goodfellow, I. (2016) Improving the robustness of deep neural networks via stability training. In: IEEE Conference on Computer Vision and Pattern Recognition.