In this paper, we apply flexible data-driven analysis methods to large-scale mass transit data to identify areas for improvement in the engineering and operation of urban rail systems. Specifically, we use data from automated fare collection (AFC) and automated vehicle location (AVL) systems to characterise the drivers of journey time variance on the London Underground more precisely, and thus to improve the understanding of delay. Total journey times are decomposed via a probabilistic assignment algorithm, and semiparametric regression is used to disentangle the effects of passenger-specific travel characteristics from network-related factors. For total journey times, we find that network characteristics, primarily train speeds and headways, account for the majority of journey time variance. However, within the access and egress time components, which are typically twice as onerous, passenger-level heterogeneity is more influential. On average, we find that intra-passenger heterogeneity accounts for 6% and 19% of the variance in access and egress times, respectively, and that inter-passenger effects have a degree of influence similar to or greater than that of static network characteristics. The analysis shows that while network-specific characteristics are the primary drivers of journey time variance in absolute terms, a nontrivial proportion of passenger-perceived variance is influenced by passenger-specific characteristics. The findings have potential applications in improving the understanding of passenger movements within stations. For example, the analysis can be used to assess the relative way-finding complexity of stations, which can in turn guide transit operators in targeting potential interventions.
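
To make the notion of intra-passenger and inter-passenger variance shares concrete, the following minimal sketch splits the variance of a set of repeated observations into a between-passenger part and a within-passenger part. It is a plain one-way sum-of-squares decomposition, not the semiparametric regression used in the paper, and the function name `variance_shares`, the passenger labels, and the egress times are all hypothetical values chosen for illustration.

```python
import numpy as np


def variance_shares(passenger_ids, times):
    """Split total variance into between- and within-passenger shares.

    Illustrative one-way decomposition only; the paper's own estimates
    come from a semiparametric regression, which this does not reproduce.
    """
    ids = np.asarray(passenger_ids)
    t = np.asarray(times, dtype=float)

    grand_mean = t.mean()
    total_ss = ((t - grand_mean) ** 2).sum()

    # Map every observation to the mean time of its own passenger.
    _, inverse = np.unique(ids, return_inverse=True)
    group_means = np.bincount(inverse, weights=t) / np.bincount(inverse)
    fitted = group_means[inverse]

    # Between (inter-passenger): spread of passenger means around the grand mean.
    between_ss = ((fitted - grand_mean) ** 2).sum()
    # Within (intra-passenger): spread of each trip around its passenger's mean.
    within_ss = ((t - fitted) ** 2).sum()

    return {"between": between_ss / total_ss, "within": within_ss / total_ss}


# Hypothetical egress times in seconds: two trips for each of three passengers.
ids = ["a", "a", "b", "b", "c", "c"]
egress = [60, 70, 90, 100, 120, 130]
shares = variance_shares(ids, egress)
print(shares)
```

In this toy data the two shares sum to one by construction. In the paper's setting, network-related factors would also need to be controlled for before any such share could be interpreted.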