
Automatic analysis of insurance reports through deep neural networks to identify severe claims

Published online by Cambridge University Press: 09 March 2021

Isaac Cohen Sabban*
Affiliations:
Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, 4 place Jussieu, F-75005 Paris, France; Pacifica, Crédit Agricole Assurances, F-75015 Paris, France
Olivier Lopez*
Affiliation:
Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, 4 place Jussieu, F-75005 Paris, France
Yann Mercuzot
Affiliation:
Pacifica, Crédit Agricole Assurances, F-75015 Paris, France

Abstract

In this paper, we develop a methodology to automatically classify claims using the information contained in text reports (written when the claims are opened). From this automatic analysis, the aim is to predict whether a claim is expected to be particularly severe or not. The difficulty is the rarity of such extreme claims in the database, which makes it hard for classical prediction techniques, such as logistic regression, to accurately predict the outcome. Since the data are imbalanced (too few observations are associated with a positive label), we propose different rebalancing algorithms to deal with this issue. We also discuss the embedding methodologies used to process the text data, and the role of the architectures of the networks.
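
As a purely illustrative sketch of the pipeline the abstract describes, the snippet below embeds report text, feeds it to a small neural classifier, and counteracts class imbalance with inverse-frequency class weights (one of several possible rebalancing strategies). It assumes TensorFlow/Keras and toy data; it is not the authors' actual code, architecture, or rebalancing method.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins for claim-opening reports; label 1 = severe claim.
# Real data, vocabulary size, and architecture would differ.
reports = np.array([
    "minor water leak in kitchen repaired the same day",
    "fire destroyed the building and two people were hospitalised",
    "broken window after a storm with a small repair cost",
    "scratched bumper in a parking lot no injuries",
])
labels = np.array([0, 1, 0, 0])

# Embedding step: word vectors are learned from scratch here; pre-trained
# vectors (word2vec/fastText-style) could be loaded into the Embedding layer.
vectorize = tf.keras.layers.TextVectorization(
    max_tokens=5000, output_sequence_length=50)
vectorize.adapt(reports)

model = tf.keras.Sequential([
    vectorize,
    tf.keras.layers.Embedding(input_dim=5000, output_dim=32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(severe | report)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])

# One simple rebalancing option: weight the rare positive class inversely
# to its frequency so severe claims are not drowned out in the loss.
n, n_pos = len(labels), int(labels.sum())
class_weight = {0: n / (2.0 * (n - n_pos)), 1: n / (2.0 * n_pos)}
model.fit(reports, labels, epochs=5, class_weight=class_weight, verbose=0)

# Score a new report: the output is the predicted probability of severity.
print(model.predict(np.array(["severe fire with multiple casualties"])))
```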

Type: Original Research Paper

Copyright: © The Author(s), 2021. Published by Cambridge University Press on behalf of the Institute and Faculty of Actuaries

