
ADDRESSING IMBALANCED INSURANCE DATA THROUGH ZERO-INFLATED POISSON REGRESSION WITH BOOSTING

Published online by Cambridge University Press: 17 December 2020

Simon C.K. Lee*
Affiliation:
Department of Statistics and Actuarial Science, The University of Hong Kong, E-Mail: slee2016@hku.hk

Abstract

A machine learning approach to zero-inflated Poisson (ZIP) regression is introduced to address a common difficulty arising from imbalanced financial data. The suggested ZIP can be interpreted as an adaptive weight adjustment procedure that removes the need for post-modeling re-calibration and results in a substantial enhancement of predictive accuracy. Notwithstanding the increased complexity due to the expanded parameter set, we utilize a cyclic coordinate descent optimization to implement the ZIP regression, with adjustments made to address saddle points. We also study how various approaches alleviate the potential drawbacks of incomplete exposures in insurance applications. The procedure is tested on real-life data. We demonstrate a significant improvement in performance relative to other popular alternatives, which justifies our modeling techniques.
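
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a ZIP likelihood fitted by cyclic coordinate descent. It is not the paper's boosting algorithm: it assumes an intercept-only model with a single Poisson mean `lam` and a single structural-zero probability `p`, ignores exposure, and replaces any tree-based update with a one-dimensional golden-section search, which in turn assumes the loss is unimodal along each coordinate. The function names (`zip_log_pmf`, `neg_log_lik`, `golden_min`, `fit_zip`) and the search bounds are hypothetical choices made for illustration.

```python
import math


def zip_log_pmf(k, lam, p):
    """Log-probability of count k under a zero-inflated Poisson.

    p is the probability of a structural zero; lam is the Poisson mean.
    """
    if k == 0:
        # A zero is either structural or an ordinary Poisson zero.
        return math.log(p + (1.0 - p) * math.exp(-lam))
    log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
    return math.log(1.0 - p) + log_pois


def neg_log_lik(counts, lam, p):
    """Negative log-likelihood of the sample (the loss being minimized)."""
    return -sum(zip_log_pmf(k, lam, p) for k in counts)


def golden_min(f, lo, hi, iters=60):
    """Golden-section search for a 1-D minimum on [lo, hi].

    Assumes f is unimodal on the interval (an assumption of this sketch).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def fit_zip(counts, sweeps=50):
    """Cyclic coordinate descent: update lam with p fixed, then p with lam fixed."""
    lam = max(sum(counts) / len(counts), 1e-3)  # start at the sample mean
    p = 0.5
    for _ in range(sweeps):
        lam = golden_min(lambda x: neg_log_lik(counts, x, p), 1e-6, 50.0)
        p = golden_min(lambda x: neg_log_lik(counts, lam, x), 1e-6, 1.0 - 1e-6)
    return lam, p


if __name__ == "__main__":
    # Toy imbalanced sample: 80% zeros, as is typical of claim counts.
    counts = [0] * 80 + [1] * 8 + [2] * 8 + [3] * 4
    lam, p = fit_zip(counts)
    print("lam =", round(lam, 3), "p =", round(p, 3))
```

Each sweep cycles through the parameters one at a time, holding the other fixed, which is the basic pattern of cyclic coordinate descent. In a regression setting, `lam` and `p` would each depend on covariates, and each coordinate update would be a model-fitting step rather than a scalar line search.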

Type
Research Article
Copyright
© 2020 by Astin Bulletin. All rights reserved
