This paper synthesizes and extends the literature on multivariate two-part regression modelling, with an emphasis on actuarial applications. To illustrate the modelling, we use data from the US Medical Expenditure Panel Survey (MEPS) to study expenditures that come in two parts. In the first part, a zero expenditure corresponds to no payment for health care services during the year. In the second part, a positive expenditure corresponds to the payment amount, a measure of utilization. Expenditures are multivariate, with five components: (i) office-based, (ii) hospital outpatient, (iii) emergency room, (iv) hospital inpatient, and (v) home health expenditures. Not surprisingly, expenditure types are strongly associated, so we use models that account for these associations. These include multivariate binary regressions for whether each type of payment occurs and generalized linear models with Gaussian copulas for the payment amounts.
As anticipated, the strong associations among expenditure types allow us to establish significant differences among models on an in-sample basis. Despite these strong associations, commonly used statistical measures perform similarly across models on a held-out validation sample. In contrast, out-of-sample risk measures used by actuaries reveal differences in how the models capture the association among expenditure types.
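The two-part structure and the dependence described above can be made concrete with a small simulation. The following Python sketch is illustrative only and is not the fitted model from this paper: it omits covariates, and the probabilities, gamma parameters, and correlation values are hypothetical. Whether each expenditure type is positive is drawn from a multivariate probit (correlated latent normals thresholded so that type j is positive with probability p_j), and positive amounts have gamma marginals joined by a Gaussian copula. The helper names `exchangeable` and `simulate_two_part` are assumptions introduced for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical labels for the five MEPS expenditure types.
TYPES = ["office", "outpatient", "er", "inpatient", "home_health"]


def exchangeable(k, rho):
    """Hypothetical helper: k x k correlation matrix with common off-diagonal rho."""
    return (1 - rho) * np.eye(k) + rho * np.ones((k, k))


def simulate_two_part(n, p_any, corr_freq, gamma_shape, gamma_scale, corr_sev, seed=0):
    """Simulate an (n, k) array of expenditures from a two-part model (no covariates)."""
    rng = np.random.default_rng(seed)
    p_any = np.asarray(p_any, dtype=float)
    k = p_any.size
    # Part 1 (multivariate probit): type j is positive when its latent normal
    # falls below Phi^{-1}(p_j), so P(positive_j) = p_j; corr_freq links the types.
    z_freq = rng.multivariate_normal(np.zeros(k), corr_freq, size=n)
    positive = z_freq < stats.norm.ppf(p_any)
    # Part 2 (Gaussian copula): correlated normals -> uniforms -> gamma quantiles.
    z_sev = rng.multivariate_normal(np.zeros(k), corr_sev, size=n)
    u = stats.norm.cdf(z_sev)
    amounts = stats.gamma.ppf(u, a=gamma_shape, scale=gamma_scale)
    # Combine: zero where no payment occurred, otherwise the simulated amount.
    return np.where(positive, amounts, 0.0)


# Hypothetical parameters, not estimates from MEPS.
p_any = [0.70, 0.20, 0.12, 0.08, 0.05]
shape = np.array([1.0, 0.8, 1.2, 1.5, 0.7])
scale = np.array([800.0, 1500.0, 900.0, 9000.0, 3000.0])
k = len(TYPES)
y = simulate_two_part(20000, p_any, exchangeable(k, 0.4), shape, scale, exchangeable(k, 0.3))

total = y.sum(axis=1)               # total annual expenditure per simulated person
var_95 = np.quantile(total, 0.95)   # empirical 95% Value at Risk of the total
print(y.shape, round(float(var_95), 2))
```

Rerunning the simulation with `np.eye(k)` in place of either correlation matrix keeps every marginal distribution unchanged while altering only the dependence, which is one way to see how association assumptions can move a tail risk measure of the total even when per-type summaries look alike.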