Bayesian Forecasting with a Regime-Switching Zero-Inflated Multilevel Poisson Regression Model: An Application to Adolescent Alcohol Use with Spatial Covariates

Yanling Li; Zita Oravecz; Shuai Zhou; Yosef Bodovski; Ian J. Barnett; Guangqing Chi; Yuan Zhou; Naomi P. Friedman; Scott I. Vrieze; Sy-Miin Chow

doi:10.1007/s11336-021-09831-9

Published online by Cambridge University Press: 01 January 2025

Yanling Li*, The Pennsylvania State University
Zita Oravecz, The Pennsylvania State University
Shuai Zhou, The Pennsylvania State University
Yosef Bodovski, The Pennsylvania State University
Ian J. Barnett, University of Pennsylvania
Guangqing Chi, The Pennsylvania State University
Yuan Zhou, University of Minnesota
Naomi P. Friedman, University of Colorado Boulder
Scott I. Vrieze, University of Minnesota
Sy-Miin Chow, The Pennsylvania State University
*Correspondence should be made to Yanling Li, Department of Agricultural Economics, Sociology, and Education, The Pennsylvania State University, State College, PA 16802, USA. Email: yxl823@psu.edu

Abstract

In this paper, we present and evaluate a novel Bayesian regime-switching zero-inflated multilevel Poisson (RS-ZIMLP) regression model for forecasting alcohol use dynamics. The model partitions individuals’ data into two phases, known as regimes: (1) a zero-inflation regime that accommodates high instances of zeros (non-drinking) and (2) a multilevel Poisson regression regime in which variations in individuals’ log-transformed average rates of alcohol use are captured by an autoregressive process with exogenous predictors and a person-specific intercept. The times at which individuals are in each regime are unknown but may be estimated from the data. We assume that the regime indicator follows a first-order Markov process related to exogenous predictors of interest. The forecast performance of the proposed model was evaluated using a Monte Carlo simulation study and further demonstrated using substance use and spatial covariate data from the Colorado Online Twin Study (CoTwins). Results showed that the proposed model yielded better forecast performance than both a baseline model that predicted all cases as non-drinking and a reduced ZIMLP model without the RS structure, as indicated by higher area under the receiver operating characteristic (ROC) curve (AUC) scores and lower mean absolute errors (MAEs) and root-mean-square errors (RMSEs). The improvements in forecast performance were even more pronounced when we limited the comparisons to participants who showed at least one instance of transition to drinking.
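To make the two-regime structure described in the abstract more concrete, the following Python sketch simulates data from a simplified process of this kind and scores the all-zero baseline forecast with MAE and RMSE. It is only an illustration under stated assumptions, not the authors' model or code: the logistic form of the regime transitions, the exact autoregressive equation for the log rate, all parameter values, and the names `simulate_rs_zimlp`, `mae`, and `rmse` are hypothetical choices made here.

```python
import numpy as np


def simulate_rs_zimlp(n_persons=50, n_times=60, seed=0):
    """Simulate counts from a simplified two-regime zero-inflated Poisson process."""
    rng = np.random.default_rng(seed)

    # Hypothetical parameter values, chosen only for illustration.
    a_from_zero, a_from_poisson = -2.0, 1.5  # transition intercepts by previous regime
    gamma = 0.8                              # covariate effect on regime transitions
    phi, beta = 0.4, 0.3                     # autoregression and covariate effect on log rate

    mu = rng.normal(0.5, 0.3, size=n_persons)   # person-specific intercepts
    x = rng.normal(size=(n_persons, n_times))   # one exogenous predictor
    regime = np.zeros((n_persons, n_times), dtype=int)
    y = np.zeros((n_persons, n_times), dtype=int)

    for i in range(n_persons):
        s_prev, eta_prev = 0, mu[i]
        for t in range(n_times):
            # First-order Markov regime indicator: the probability of being in
            # the Poisson regime depends on the previous regime and on x.
            intercept = a_from_poisson if s_prev == 1 else a_from_zero
            p_poisson = 1.0 / (1.0 + np.exp(-(intercept + gamma * x[i, t])))
            s = int(rng.random() < p_poisson)

            # Assumed autoregressive form for the log-transformed average rate.
            eta = mu[i] + phi * (eta_prev - mu[i]) + beta * x[i, t]

            # Regime 0 always yields a zero; regime 1 draws a Poisson count,
            # which can itself be zero.
            if s == 1:
                y[i, t] = rng.poisson(np.exp(eta))
            regime[i, t] = s
            s_prev, eta_prev = s, eta
    return y, regime, x


def mae(obs, pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(obs - pred)))


def rmse(obs, pred):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


y, regime, x = simulate_rs_zimlp()
baseline = np.zeros_like(y)  # baseline that forecasts non-drinking everywhere
print("share of zeros:", float(np.mean(y == 0)))
print("baseline MAE:", mae(y, baseline), "baseline RMSE:", rmse(y, baseline))
```

In a setting like this, a candidate forecasting model would be judged by whether its MAE and RMSE fall below the baseline values printed above, and by whether its predicted drinking probabilities achieve a higher AUC.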

Keywords

Bayesian zero-inflated Poisson model; forecast; intensive longitudinal data; regime-switching; spatial data; substance use

Type: Theory and Methods
Information: Psychometrika, Volume 87, Issue 2: Special Issue on Forecasting with Intensive Longitudinal Data, June 2022, pp. 376-402

DOI: https://doi.org/10.1007/s11336-021-09831-9
Copyright: Copyright © 2022 The Author(s) under exclusive licence to The Psychometric Society
