
AI in actuarial science – a review of recent advances – part 1

Published online by Cambridge University Press:  26 August 2020

Ronald Richman
Affiliation: QED Actuaries and Consultants

Abstract

Rapid advances in artificial intelligence (AI) and machine learning are creating products and services with the potential not only to change the environment in which actuaries operate but also to provide new opportunities within actuarial science. These advances are based on a modern approach to designing, fitting and applying neural networks, generally referred to as “Deep Learning.” This paper investigates how actuarial science may adapt and evolve in the coming years to incorporate these new techniques and methodologies. Part 1 of this paper provides background on machine learning and deep learning, as well as an heuristic for where actuaries might benefit from applying these techniques. Part 2 of the paper then surveys emerging applications of AI in actuarial science, with examples from mortality modelling, claims reserving, non-life pricing and telematics. For some of the examples, code has been provided on GitHub so that the interested reader can experiment with these techniques for themselves. Part 2 concludes with an outlook on the potential for actuaries to integrate deep learning into their activities. Finally, a supplementary appendix discusses further resources providing more in-depth background on machine learning and deep learning.

Type: Review
Copyright: © Institute and Faculty of Actuaries 2020


Supplementary material: File

Richman supplementary material
