1. Introduction
Transportation systems and commuting patterns shape and reveal functional regions in a city (Barbosa et al. Reference Barbosa, Barthelemy, Ghoshal, James, Lenormand, Louail, Menezes, Ramasco, Simini and Tomasini2018; Barthelemy Reference Barthelemy2019). The continuing development of network-analysis methods and the increasing availability of transportation data (from a multitude of transportation modes) give exciting opportunities to improve understanding of urban dynamics at both large and small scales. One prominent type of transportation service is bicycle-sharing systems, which can adapt quickly to the needs of travelers. The number of bicycle-sharing programs worldwide has grown rapidly, from 5 in 2005 to 1571 in 2018 (Schmidt Reference Schmidt2018). Over 50 systems were launched in the United States alone between 2010 and 2016, and over 20 bicycle-sharing systems have been launched in France since 2005 (Etienne and Latifa Reference Etienne and Latifa2014; Bike share in the U.S 2018). Many existing bicycle-sharing systems are also growing. For example, the number of stations in New York City’s “Citi Bike” system has more than doubled since it debuted in 2013.
Docked bicycle-sharing systems follow a general structure: Groups of bicycles are parked at “stations” (which are also called “docks” or “hubs”) throughout a coverage area, and users withdraw and return bicycles to these stations on demand, with a cost that depends on usage time. A growing number of bicycle-sharing systems are dockless (as are the increasingly prominent e-scooters), so users can park bicycles at any location in a coverage area. In the present paper, we analyze docked systems, but we consider how to adapt our models to dockless systems in Section 7.
Data from bicycle-sharing systems are available from many cities throughout the world (Austwick et al. Reference Austwick, O’Brien, Strano and Viana2013; Romanillos and Zaltz Austwick Reference Romanillos and Zaltz Austwick2016; Wergin and Buehler Reference Wergin and Buehler2017; Munoz-Mendez et al. Reference Munoz-Mendez, Han, Klemmer and Jarvis2018; Romanillos et al. Reference Romanillos, Moya-GÓmez, Zaltz-Austwick and Lamquiz-DaudÉn2018). Such data, which include detailed temporal records and GPS-tracked routes in some cases, are helpful for capturing commuting behavior (Fishman et al. Reference Fishman, Washington and Haworth2013). Analyzing bicycle-sharing data can increase our understanding of urban flows and human commuting. Bicycle-sharing is often used for “last-mile” transportation, bridging the gap between public transportation and a final destination (Griffin and Sener Reference Griffin and Sener2016), and insights into the dynamics of bicycle-sharing systems can help transit systems evolve to meet the needs of changing cities (Ashqar 2019; Natera Orozco 2020). Studying bicycle-sharing systems is also helpful for maintaining and expanding them (Ashqar 2021). For example, it can yield helpful ideas for the efficient redistribution of bicycles, a problem that has received much attention (Shu et al. Reference Shu, Chou, Liu, Teo and Wang2013; Pfrommer et al. Reference Pfrommer, Warrington, Schildbach and Morari2014; Forma et al. Reference Forma, Raviv and Tzur2015; Singhvi et al. Reference Singhvi, Singhvi, Frazier, Henderson, Mahony, Shmoys and Woodard2015).
In the present paper, we propose two models of time-dependent networks to capture the functional roles of bicycle-sharing stations. We use the lens of mesoscale-structure detection in time-dependent networks (Holme and SaramÄki Reference Holme and SaramÄki2012; Holme Reference Holme2015; Fortunato and Hric Reference Fortunato and Hric2016). We examine trip histories from bicycle-sharing systems in the form of multilayer networks (KivelÄ et al. Reference KivelÄ, Arenas, Barthelemy, Gleeson, Moreno and Porter2014; Aleta and Moreno Reference Aleta and Moreno2019; Porter Reference Porter2018) in which each layer is a network of trips that start in a given hour.
We aim to partition each of our networks based on a relational equivalence of nodes (a perspective with a rich history in the social-networks literature (Lorrain and White Reference Lorrain and White1971; Rossi and Ahmed Reference Rossi and Ahmed2015)), rather than on high internal traffic within sets of nodes (Munoz-Mendez et al. Reference Munoz-Mendez, Han, Klemmer and Jarvis2018). The analysis of time-aggregated data can shed light on “community” membership in the latter sense (Austwick et al. Reference Austwick, O’Brien, Strano and Viana2013) through a partition of a network into contiguous spatial clusters (Munoz-Mendez et al. Reference Munoz-Mendez, Han, Klemmer and Jarvis2018). However, such an approach ignores how bicycle-sharing usage relates to travel patterns during a day (Fishman et al. Reference Fishman, Washington and Haworth2013). By contrast, our models are designed to detect functional roles of bicycle-sharing stations based on time-dependent usage patterns.
The models that we introduce in this paper are time-dependent extensions of stochastic block models (SBMs) (Snijders and Nowicki Reference Snijders and Nowicki1997; Nowicki and Snijders Reference Nowicki and Snijders2001; Karrer and Newman Reference Karrer and Newman2011; Peixoto Reference Peixoto2019). We include parameters to describe intra-block and inter-block traffic that starts in each hour, but we fix the block assignment of each node over time. That is, we treat a bicycle-sharing network as a temporal multilayer network with fixed node identities across layers (KivelÄ et al. Reference KivelÄ, Arenas, Barthelemy, Gleeson, Moreno and Porter2014; Aleta and Moreno Reference Aleta and Moreno2019). We use the terminology “time-dependent” to emphasize the time-dependent nature of our application and the associated interpretations of our results, but our models are relevant for studying any multilayer network with the same set of nodes across layers and no interlayer edges. We also note that the layers do not need to be ordered.
We formulate both mixed-membership and discrete-membership models, where nodes can be members of multiple blocks or exactly one block, respectively. Our models are degree-corrected. In an SBM without degree-correction, the expected weight of an edge is determined by the block memberships of its incident nodes. By contrast, in a degree-corrected SBM, the expected weight of an edge also depends on the activity levels (i.e., the degrees) of its incident nodes. We extend the degree-correction that was introduced by Karrer and Newman (Reference Karrer and Newman2011) to our multilayer SBMs (see Section 3). Incorporating degree-correction when modeling networks with heterogeneous degrees can increase the performance of a model while only slightly increasing its complexity (Karrer and Newman Reference Karrer and Newman2011).In Section 6, we illustrate that degree-correction enables us to identify blocks based on their functional roles in a network, rather than based on their levels of activity. This is important for bicycle-sharing networks, as stations have heterogeneous numbers of parking spaces for bicycles and different neighborhoods have different baseline levels of bicycle usage. Our models are applicable to directed networks and are easily extendable to undirected networks. Additionally, although our models are inspired by the analysis of bicycle-sharing systems, they are applicable more generally to multilayer networks in which nodes have heterogeneous degrees and belong to fixed classes.
Community detection in multilayer networks is a very active research area, and it has yielded insights into many problems in diverse disciplines (Cranmer et al. Reference Cranmer, Menninga and Mucha2015; Valdano et al. Reference Valdano, Ferreri, Poletto and Colizza2015; Barbillon et al. Reference Barbillon, Donnet, Lazega and Bar-Hen2017; Papadopoulos et al. Reference Papadopoulos, Porter, Daniels and Bassett2018; Kobayashi et al. Reference Kobayashi, Takaguchi and Barrat2019; Bazzi et al. Reference Bazzi, Jeub, Arenas, Howison and Porter2020). Many algorithms for community detection have been generalized to multilayer networks. See, for example, Mucha et al. (Reference Mucha, Richardson, Macon, Porter and Onnela2010), Yang et al. (Reference Yang, Chi, Zhu, Gong and Jin2011), Paul and Chen (2016), Stanley et al. (Reference Stanley, Shai, Taylor and Mucha2016), Valles-Catala et al. (Reference Valles-Catala, Massucci, Guimera and Sales-Pardo2016), Barbillon et al. (Reference Barbillon, Donnet, Lazega and Bar-Hen2017), Jeub et al. (Reference Jeub, Mahoney, Mucha and Porter2017), and Zhang et al. (Reference Zhang, Moore and Newman2017).
The SBMs that we introduce have parameters for block-to-block activity that are distinct for different departing hours; however, the block-membership (i.e., community-membership) parameters are fixed across all hours. Most related SBMs that have been introduced differ in one or both of these features. Yang et al. (Reference Yang, Chi, Zhu, Gong and Jin2011) introduced a discrete-membership SBM in which the parameters for block-to-block activity are fixed with respect to time but block memberships evolve with time. Xu and Hero (Reference Xu and Hero2014) and Matias and Miele (Reference Matias and Miele2017) proposed related models (for unweighted and weighted networks, respectively). However, unlike in the model of Yang et al., their models do not fix block-to-block activity parameters over time. Zhang et al. (Reference Zhang, Moore and Newman2017) developed a time-dependent block model with degree-correction in which block membership does not change with time and the edge dynamics are described by a continuous-time Markov process, with edges that are added or removed between block pairs at constant rates. Mixed-membership SBMs with time-evolving communities have also been developed (Xing et al. Reference Xing, Fu and Song2010; Ho et al. Reference Ho, Song and Xing2011). See Rossetti and Cazabet (Reference Rossetti and Cazabet2018) for a survey of community-detection methods for time-dependent networks.
Community detection has been applied to urban bicycle-sharing systems using a variety of approaches (Rosvall et al. Reference Rosvall, Axelsson and Bergstrom2009; Borgnat et al. Reference Borgnat, Abry, Flandrin, Robardet, Rouquier and Fleury2011; Austwick et al. Reference Austwick, O’Brien, Strano and Viana2013; Akbarzadeh et al. Reference Akbarzadeh, Mohri and Yazdian2018; Munoz-Mendez et al. Reference Munoz-Mendez, Han, Klemmer and Jarvis2018; Xie and Wang Reference Xie and Wang2018; He et al. Reference He, Glasser, Bhamidi and Kaza2019; Kobayashi et al. Reference Kobayashi, Takaguchi and Barrat2019). Zhu et al. (Reference Zhu, Khan, Kats, Santosh Bamne and Sobolevsky2018) applied k-means clustering to undirected, time-dependent usage data from bicycle-sharing systems and other urban systems in New York City. Austwick et al. (Reference Austwick, O’Brien, Strano and Viana2013) examined modularity optimization with a directed and spatially-corrected null model to identify communities of bicycle-sharing stations in several cities. However, they detected communities in time-aggregated networks; as they pointed out, there are significant limitations in bicycle-sharing applications when examining community structure while ignoring time-dependent usage patterns. Munoz-Mendez et al. (Reference Munoz-Mendez, Han, Klemmer and Jarvis2018) identified communities in hourly slices of bicycle-sharing data from London using an InfoMap algorithm (Rosvall et al. Reference Rosvall, Axelsson and Bergstrom2009) that respects the directed nature of edges in the underlying bicycle-trip networks. The changes with time that they discovered in those communities highlight the importance of time of day in the usage patterns of bicycle-sharing systems.
There is some recent work that is closely related to ours. Matias et al. (Reference Matias, Rebafka and Villers2018) constructed a time-dependent, discrete-membership SBM with fixed blocks over time and applied it to bicycle-sharing networks in London. They detected some functional blocks, but their approach does not incorporate degree-correction. Xie and Wang (Reference Xie and Wang2018) employed an approach that does not use an SBM directly, but they were able to successfully partition a bicycle-sharing network to find home and work roles of bicycle-sharing stations. They used the ratio of in-degree to out-degree at different times to discover home–work splits during peak commute times. In the present paper, we observe such splits using our two-block models. A similarity of their approach to ours is that it corrects for degree; a key difference is that they relied on human supervision to determine peak hours, whereas our models implicitly increase the weights of more-active hours in our likelihood function. Etienne and Latifa (Reference Etienne and Latifa2014) used a Poisson mixture model to cluster bicycle-sharing stations in Paris based on their time-dependent usage profiles. They were able to capture the time-dependent activity patterns of each group of stations, distinguish between arrival and departure activity, control for the overall activity level of a given station (via degree-correction), and associate the groups that they identified with their roles in the city. A major difference between their approach and ours is that we distinguish activity between blocks, which allows us to detect behavior like last-mile commuting that occurs within blocks. We also consider mixed membership, which they proposed as a subject for future work.
Our paper proceeds as follows. In Section 2, we list our data sources and present basic statistical analysis of the data. In Section 3, we introduce our time-dependent SBMs—a mixed-membership model and a discrete-membership model—and we show that they are equivalent up to a constraint. In Section 4, we describe estimation algorithms for our SBMs. In Section 5, we generate synthetic networks using both of our models; we evaluate the fits of our models and a model by Matias et al. (Reference Matias, Rebafka and Villers2018) to the generated data. In Section 6, we present the results of fitting our models to bicycle-sharing networks in Los Angeles, San Francisco, and New York City. We summarize our results, discuss their implications for bicycle-sharing systems, and suggest areas of further study in Section 7. We show some additional details of our work in the Appendix. We provide code and data to implement our models and replicate the results in our paper at https://github.com/jcarlen/tdsbm_supplementary_material.
2. Data
We examine bicycle-sharing systems in Los Angeles, the Bay Area, and New York City. For Los Angeles, we study only the system’s downtown region, which is self-contained in the sense that very few trips connect it to stations that are not downtown. Similarly, for the Bay Area, we consider only the San Francisco network and exclude stations in Mountain View, Palo Alto, and San Jose. (If we applied our models to these nearly disconnected portions of the Bay Area bicycle-sharing network, it would be better to treat them as separate networks than to model them jointly.) Each of our three focal systems covers a dense urban area, but they vary in their sizes and daily usage. Because of this variability and how the data were reported, we study different time periods for each system. We also select our time periods to exclude days that are likely to be extremely hot or extremely cold. Each of the bicycle-sharing systems that we study has an open-data portal, from which we downloaded our data. After cleaning (see our discussion in the next paragraph), our data consist of the following:
• Downtown Los Angeles: October–December 2016; there are 61 stations and 40,130 trips, of which 73.4% are during weekdays (NYCBS 2017).
-
• San Francisco: September 2015–August 2016; there are 35 stations and 267,412 trips, of which 92.1% are during weekdays (Bay Area Bike Share 2017).
-
• New York City: October 2016; there are 601 stations and 1,551,692 trips, of which 75.6% are during weekdays (LA Metro 2017).
A trip consists of a user checking out a bicycle from a fixed location (a station that includes multiple parking spaces) and returning it to a station. The data for each trip include the starting time; the ending time; and the starting and ending locations by station ID, latitude, and longitude. Each data set also has a few additional fields about the users; these details include whether or not they have memberships in the bicycle-sharing system, but we do not use this information in our investigation. We clean the data by removing anomalous trips, including extremely short and extremely long trips,Footnote 1 and trips that were for testing or maintenance (as indicated in the data). We also exclude a very small number of stations (two in Los Angeles and six in New York City) that did not have at least one departure and at least one arrival during the examined time period. Finally, we exclude one station in New York City that was accessible to other stations only by ferry; it was involved in only nine trips during the examined time period. In total, cleaning removes 7.1% of the trips in Los Angeles, 4.5% of the trips in San Francisco, and 1.4% of the trips in New York City.
We retain self-edges, which represent trips that start and end at the same station. Although it is common to remove self-edges when analyzing networks (Newman Reference Newman2018), we expect the self-edges in our data to have a very similar data-generating mechanism as trips to nearby stations. As in Karrer and Newman (Reference Karrer and Newman2011), including self-edges also simplifies some aspects of parameter estimation. When fitting our models, we include only weekday trips, as we observe that weekday and weekend activity follow distinct patterns (see Figure 1). Moreover, weekend activity does not reflect daily commuting behavior. From the data sets, we construct multilayer networks that are both weighted and directed. In our networks, nodes represent stations, edge values encode the number of directed trips from one station to another that begin in a specified hour, and each of the 24 layers encompasses the trips that start in a certain hour.
2.1 Preliminary data analysis
When examining the bicycle-sharing data sets, we observe strong daily usage patterns, which motivate the development of our time-dependent SBMs. In Figure 1, we illustrate these usage patterns by plotting the number of trips by starting hour for each city. In each city, weekday activity spikes during morning and evening commuting hours (and Los Angeles also has a mid-day spike), whereas weekend trips peak in early afternoon. Similar patterns have been observed previously for bicycle-sharing systems in New York City and many other cities (Austwick et al. Reference Austwick, O’Brien, Strano and Viana2013; Etienne and Latifa Reference Etienne and Latifa2014; Taiyeb Reference Taiyeb2014; Xie and Wang Reference Xie and Wang2018; Zhu et al. Reference Zhu, Khan, Kats, Santosh Bamne and Sobolevsky2018).
For individual stations, the morning and evening peaks for arriving trips (i.e., in-edges) and departing trips (i.e., out-edges) are often unbalanced. One direction has a stronger morning peak, and the opposite direction has a stronger evening peak. In the bottom panel of Figure 1, we show examples of this imbalance for the six busiest stations in San Francisco. (In Appendix A.1, we further explore this issue using singular-value decompositions of the trip data.) These patterns suggest how stations are used in commuting, and they motivate our time-dependent identification of stations into “home” and “work” types.
Another characteristic of our data that we incorporate into the design of our models is the strong positive Pearson correlation coefficient between the total (summed over all hours) in-degree and out-degree of each station; it is $0.99$ in New York City, $0.98$ in San Francisco, and $0.91$ in Los Angeles. Accordingly, we apply degree-correction to the overall degree of each station, rather than to the in-degrees and out-degrees separately (see Section 3.1). This correlation is an intrinsic feature of docked bicycle-sharing systems; at some point, bicycles must be returned to a station before any new trips can originate from it. (The use of trucks to redistribute bicycles within a system can loosen this requirement.) More generally, the strong correlation also relates to a famous axiom of human mobility: for each current of travel, there is an associated countercurrent (Ravenstein Reference Ravenstein1885).
3. Our stochastic block models
In this section, we introduce our time-dependent mixed-membership stochastic block model (TDMM-SBM) and time-dependent discrete stochastic block model (TDD-SBM).
SBMs are a popular class of statistical network models (Peixoto Reference Peixoto2019). The motivating principle of SBMs is a notion of stochastic equivalence in which edges whose incident nodes have the same block membership are identically distributed. It is a standard assumption of SBMs that edge weights are independent, given the block memberships of their associated nodes. More formally, an unweighted (i.e., Bernoulli) random graph Y (with adjacency matrix A) from an SBM with K blocks is defined by
where G (with entries $g_i\in \{1,\ldots, K\}$, with $i \in \{1, \ldots, N\}$) is a vector of block assignments for the nodes of Y and $\eta$ is a $K \times K$ matrix of block-to-block edge probabilities. This definition allows directed graphs, in which $\eta$ can be asymmetric. For early presentations of SBMs, see Holland (1983), Faust and Wasserman et al. (1992), Snijders and Nowicki (Reference Snijders and Nowicki1997), and Nowicki and Snijders (Reference Nowicki and Snijders2001). More recent advances have added flexibility to SBMs. Examples include mixed-membership and overlapping-membership SBMs (Airoldi et al. Reference Airoldi, Blei, Fienberg and Xing2008; Latouche et al. Reference Latouche, BirmelÉ and Ambroise2011), models with covariates (Leger et al. Reference Leger, Barbillon and Chiquet2020), degree-corrected SBMs (Karrer and Newman Reference Karrer and Newman2011), and Bayesian implementations (Peixoto Reference Peixoto2019). Applications of SBMs to time-dependent networks include discrete-membership (Yang et al. Reference Yang, Chi, Zhu, Gong and Jin2011; Zhang et al. Reference Zhang, Moore and Newman2017) and mixed-membership (Xing et al. Reference Xing, Fu and Song2010; Ho et al. Reference Ho, Song and Xing2011) models in which the block assignments of nodes are time-dependent and discrete-membership models (without degree-correction) with fixed blocks over time (Xu and Hero Reference Xu and Hero2014; Matias et al. Reference Matias, Rebafka and Villers2018).
3.1 Time-dependent mixed-membership stochastic block model (TDMM-SBM)
We now describe the framework for our mixed-membership SBM. Let $i,j \in \mathcal{N}$ (with $|\mathcal{N}| = N$) be nodes, which represent bicycle-sharing stations; let $g,h \in \mathcal{K}$ (with $|\mathcal{K}| = K$) be blocks. We treat both the number of nodes and the number of blocks as fixed and given. To each bicycle-sharing network, we associate a three-dimensional array of size (N,N,T), where T is the number of time slices (i.e., time layers). Our model is also valid when layers have meanings other than time slices. However, because of our application to bicycle-sharing systems, we describe the layers as time layers for convenience. We consider hourly groupings of bicycle trips based on their starting times, and we do not include interlayer edges.Footnote 2 The array entry $A_{ijt}$ is the observed number of trips from station i to station j with a starting time that is at least t and less than $t+1$. Our networks are directed, so we count each trip that starts at node i in hour t and ends at node i (i.e., self-edges) exactly once in $A_{iit}$. We use $\tilde{A}_{ij}=\sum_{t=0}^{23}A_{ijt}$ to denote the edge weights of the associated time-aggregated matrix.
For each node i, there is a latent length-K vector of real numbers $C_{ig}\in [0,1]$. These vectors, which we will estimate, signify the mixed-membership block assignments of the nodes. The block-assignment parameter $C_{ig}$ indicates the “strength” of node i in block g. For each ordered pair g,h of blocks and each time $t\in \{0, 1, \ldots, 23\}$ (where $t = 0$ represents the hour that starts at midnight), there is a parameter $\omega_{ght}$, which we call the “block-connectivity” parameter (i.e., the “block-to-block” activity parameter), that represents the directed activity from block g to block h in layer t. Note that $\omega_{ght}$ need not be equal to $\omega_{hgt}$ for a directed network; in our application, this captures asymmetries in the numbers of trips with respect to reversing origins and destinations. We also define the notation $\tilde{\omega}_{gh}=\sum_{t=0}^{23} \omega_{ght}$ for our time-aggregated matrix.
For each pair of nodes, i and j, we assume that the number of trips that depart from i in hour t and arrive at j at any time is Poisson-distributed with mean $\mu_{ijt} = \sum_{g,h} C_{ig}\omega_{ght} C_{jh}$. Our use of the Poisson distribution follows Karrer and Newman (Reference Karrer and Newman2011) and Peixoto (Reference Peixoto2019), facilitates computation, and is standard for modeling count data (although overdispersion is a concern).
For identifiability (i.e., to allow unambiguous inference), we apply the constraint $\sum_i C_{ig}=1$ for all g. This does not constrain the set of possible models in terms of realizable mean edge activities $\mu_{ijt}$. Consider a model with unconstrained parameters $\omega_{ght}$ and $C_{ig}$. A model with parameters $\omega'_{ght}$ and $C'_{ig}$ such that $C'_{ig}=\frac{C_{ig}}{\sum_{j} C_{jg}}$ and $\omega'_{ght}=\omega_{ght}\left(\sum_{j} C_{jg}\right)\left(\sum_{j}C_{jh}\right)$ gives an equivalent parameterization, because the means of the distributions of edge weights are equal to the those of the model with unconstrained parameters. That is, $\mu'_{ijt}=\sum_{g,h}C'_{ig}\omega'_{ght}C'_{jh}=\sum_{g,h}C_{ig}\omega_{ght}C_{jh}=\mu_{ijt}$.
Given that $\sum_{i} C_{ig}=1$, we can think of $C_{ig}$ as the proportion of the total activity of block g from the activity of node i. We interpret $C_{ig}$ relative to $C_{ih}$ as how strongly block g is associated with node i relative to how strongly block h is associated with node i. The expected total number of trips at node i is $\sum_g C_{ig}\sum_{h,t} (\omega_{ght}+\omega_{hgt})$. By contrast, $\sum_g C_{ig}$ is a measure of the activity of node i in which we do not weight $C_{ig}$ by the total activity of its corresponding block. We use $\sum_g C_{ig}$ to determine the size of node i when plotting the block assignments that we infer using the TDMM-SBM (see the left panels of Figures 3 and 6) because it helps ensure that we do not overlook blocks with important usage patterns but relatively low activity. The parameters $C_{ig}$, with i fixed and $g \in \mathcal{K}$, extend the degree-correction parameter for SBMs that was introduced by Karrer and Newman (Reference Karrer and Newman2011) to mixed block membership. We elaborate on this connection in Section 3.2, where we introduce a model that specifies that nodes are assigned to only one block.
We now compute the likelihood function that we will optimize to obtain maximum-likelihood estimates (MLEs) of parameters. We assume conditional independence between the Poisson-distributed hourly numbers of trips along each edge, given model parameters, so the likelihood of the data is
where $\mathbf{\omega}$ and $\mathbf{C}$ are the model parameters (i.e., $\mathbf{\omega}=\{\omega _{ght}\}$ and $\mathbf{C}=\{C_{ig}\}$). Note that $\mu_{ijt}=\sum_{g,h} C_{ig}\omega_{ght} C_{jh}$ is a function of these parameters, the set $\mathcal{N}$ of nodes of the network is fixed and given, and the number K of blocks is also fixed and given. The unnormalized log-likelihood is
where we omit the constant ${-}\!\sum_{i,j,t}\log\left(A_{ijt}!\right)$ because it does not affect the maximum of the function. In (3) and throughout this paper, we use “log” to denote the natural logarithm.
3.2 Time-dependent discrete stochastic block model (TDD-SBM)
We derive a single-membership SBM from our mixed-membership SBM by making the extra assumption, for each node $i\in \mathcal{N}$, that $C_{ig}>0$ for only one block $g \in \mathcal{K}$. For our single-membership SBM, we introduce some new notation to aid our description and promote consistency with the notation in other work (Karrer and Newman Reference Karrer and Newman2011; Zhu et al. Reference Zhu, Yan and Moore2013). For a given node i, the block g for which $C_{ig}>0$ is the block $g_i$ that is associated with node i. Therefore, we describe this model as a discrete-membership model. We introduce a scalar parameter $\theta_i=C_{ig_i}$ for each node i; it is a multilayer extension of the degree-correction parameter in Karrer and Newman (Reference Karrer and Newman2011). The expectation of the Poisson distribution of the value of an edge from node i to node j at time t is $\theta_i\theta_j\omega_{g_ig_jt}=C_{ig_i}\omega_{g_{i},g_{j}t} C_{jg_{j}}{=\mu_{ijt}}$. We retain the sum constraints of our mixed-membership model, so $\sum_{i\in g}\theta_i = 1$ for all g.
We compute optimal values of the parameters $\mathbf{\omega}$ and $\mathbf{\theta}=\{\theta_i\}_{i\in\mathcal{N}}$. As in our TDMM-SBM, we take $\mathcal{N}$ and K to be fixed and given. Again dropping the constant term $-\!\sum_{i,j,t}\log\left(A_{ijt}!\right)$, the log-likelihood of our discrete-membership SBM is
This resembles the degree-corrected SBM in equation (14) of Karrer and Newman (Reference Karrer and Newman2011), but we have adapted it to directed networks and summed over time layers.
We find explicit formulas for the conditional MLEs of $\theta_i$ and $\omega_{ght}$, given the block memberships $\{g_i\}_{i \in \mathcal{N}}$. In the following calculations, the absence of t in the subscript of a parameter and the presence of a tilde designates a sum over all t. Specifically, we define $\tilde{A}_{ij}=\sum_{t=0}^{23} A_{ijt}$ and $\tilde{\omega}_{gh}=\sum_{t=0}^{23} \omega_{ght}$. We differentiate $\ell$ with respect to $\omega_{ght}$ to obtain
where we have used the blockwise sum constraints on $\theta$ to simplify $\sum_{i\in g, j\in h}\theta_i\theta_j$ to 1. Therefore, the conditional MLE for $\omega_{ght}$ is
where $m_{ght}$ is the sum of the weights of the edges from nodes in block g to nodes in block h in layer t. That is, $m_{ght} = \sum_{i\in g,j\in h}A_{ijt}$.
We then differentiate $\ell$ with respect to $\theta_i$ to obtain
At $\hat{\omega}_{ght}$, the conditional MLE for $\theta_i$ is
where $k_i=\sum_j \left(\tilde{A}_{ij}+\tilde{A}_{ji}\right)$ is the degree of node i (i.e., the sum of the in-degree and the out-degreeFootnote 3 of i) over all time layers, $\tilde{m}_{gh}=\sum_{t=0}^{23} m_{ght}$, and $\kappa_{g}=\sum_{h} \left(\tilde{m}_{gh}+\tilde{m}_{hg}\right)$ is the sum of the in-degrees and out-degrees of all nodes in block g over all time layers. The term $2\tilde{m}_{gg}$ in the equation for $\kappa_g$ implies that we count each intra-block edge twice: once for departing from g and once for arriving at g. Similarly, $k_i$ includes the term $2\tilde{A}_{ii}$, so we count self-edges twice in this term.
The parameter MLE $\hat{\theta}_i$ in our TDD-SBM for directed, multilayer networks is analogous to the MLE of a degree-correction parameter in a degree-corrected SBM for undirected networks with one layer (Karrer and Newman Reference Karrer and Newman2011). Indeed, the MLE for the degree-correction parameter of a node in the latter model is the degree of that node divided by the sum of the degrees of all nodes in its block. Another similarity between the degree-corrected SBM of Karrer and Newman (Reference Karrer and Newman2011) and our TDD-SBM is that at the MLE of the TDD-SBM, the expectation of the (time-aggregated) degree of a node i is equal to the degree of node i from the observed data. That is, $\sum_{t}\sum_{j}\left(\mu_{ijt}+\mu_{jit}\right) = k_i$. (See Appendix A.2 for the proof.) For our mixed-membership SBM, we are not aware of such a precise relationship between the data and the expected value of any model statistics.
We now substitute the above conditional parameter estimates into the unnormalized log-likelihood of the TDD-SBM to obtain
where $\tilde{m}$ is the total number of edges in the network. We do a similar calculation as one in Karrer and Newman (Reference Karrer and Newman2011) and write
where $k_{\text{in},it}$ and $k_{\text{out},it}$ are the in-degrees and out-degrees of node i in layer t, the quantity $\kappa_{\text{in},gt}=\sum_{i\in g} k_{\text{in},it}$ is the number of edges that arrive at g in hour t, and $\kappa_{\text{out},gt}=\sum_{i\in g} k_{\text{out},it}$ is the number of edges that depart from g in hour t. Including only the terms that depend on block assignments yields the objective function
which we maximize to obtain the MLE of the block assignments.
Unlike the directed SBM of Zhu et al. (Reference Zhu, Yan and Moore2013), our TDD-SBM does not have two degree-correction parameters for each station. Their directed SBM includes a parameter $\theta_i^{\text{in}}$ to modify in-degree and a parameter $\theta_i^{\text{out}}$ to modify out-degree. The validity of using undirected degree-correction parameters $\{\theta_i\}_{i\in\mathcal{N}}$ to model directed networks depends on the presence of a large correlation between the time-aggregated in-degrees and time-aggregated out-degrees of the nodes. In this situation, including distinct $\theta_i^{\text{in}}$ and $\theta_i^{\text{out}}$ parameters would lead to overfitting. The Pearson correlations between in-degree and out-degree are larger than $0.9$ for all of our bicycle-sharing networks (see Section 2). To increase the applicability of our work to other networks (which may not have such large correlations), we provide estimation code in our implementation materials (see Section 4.2) for a version of our TDD-SBM with directed degree-correction parameters $\theta_i^{\text{in}}$ and $\theta_i^{\text{out}}$.
4. Computations
In this section, we describe the algorithms that we use to estimate the parameters of our TDMM-SBM and our TDD-SBM.
4.1 Inference using the TDMM-SBM
Let $\mathbf{\omega}=\{\omega_{ght}\}$ be the $K\times K\times T$ array of block-connectivity parameters, and let $\mathbf{C}=\{C_{ig}\}$ be the matrix of node-strength parameters. We estimate the model parameters using a two-step gradient descent.Footnote 4 First, we move in the direction of the gradient with respect to $\mathbf{\omega}$ and update the block-connectivity parameters. Second, we move along the direction of the gradient with respect to $\mathbf{C}$ and update the node-strength parameters.
In our algorithm, $\omega^{(n)}$ and $\mathbf{C}^{(n)}$ denote the $n^{\text{th}}$ updates of the block-connectivity and node-strength parameters, respectively. We initialize the algorithm with random values $\mathbf{\omega}^{(0)}$ and $\mathbf{C}^{(0)}$ with entries that are distributed according to $\exp(X)$, where X is a Gaussian random variable with mean 0 and variance 1. (That is, we draw random values from a log-normal distribution.) We denote the mean activity along edge (i,j) with initial parameters $\mathbf{\omega}^{(0)}$ and $\mathbf{C}^{(0)}$ by $\mu_{ijt}^{(0)}$. We scale the initial parameters so that the TDMM-SBM at the starting point of the optimization procedure has the same mean total number of trips as the data. Specifically, we multiply the block-connectivity parameters $\omega^{(0)}_{ght}$ by $\left(\sum_{t}\omega^{(0)}_{ght}\right)^{-1}\big(\sum_{i,j,t} A_{ijt}\big)\big/K^2$ and normalize $\mathbf{C}^{0}$ to satisfy the constraint $\sum_i C_{ig}^{(0)}=1$ for each block g. This yields $\sum_{ijt} \mu_{ijt}^{(0)}=\sum_{i,j,t} \sum_{g,h} C_{ig}^{(0)}\omega_{ght}^{(0)} C_{jh}^{(0)}=\sum_{g,h,t} \omega_{ght}^{(0)}=\sum_{g,h}\left(\sum_{i,j,t} A_{ijt}\right)\big/K^2=\sum_{i,j,t} A_{ijt}$, which is the total number of edges in the network. Without this scaling, the initial parameters can have very small magnitudes, such that the mean total number of trips from the TDMM-SBM with these initial parameters is much smaller than the total number of trips in the data. In that situation, the initial likelihood is much smaller than the likelihood at the MLE.
To ensure that our estimated parameters are nonnegative, we use the following change of variables: $\exp(\tilde{\omega}^{(n)})=\omega^{(n)}$ and $\exp(\tilde{\mathbf{C}}^{(n)})=\mathbf{C}^{(n)}$, where the exponential of a matrix or a vector signifies componentwise exponentiation of each of its entries. We can then write the gradient descent as
where $h^{(n)}$ and $\eta^{(n)}$ are small positive numbers. From the definitions of $\tilde{C}^{(n)}$ and $\tilde{\omega}^{(n)}$, we write
Let $h^{(0)}=\eta^{(0)}=\Delta>0$ be the fixed initial step size. For our application, we choose $\Delta =10^{-4}$. We generate two candidate updates for $\mathbf{\omega}^{(n+1)}$ for the first step in our algorithm using $h^{(n+1)}=1.2\,h^{(n)}$ and $h^{(n+1)}=0.8\,h^{(n)}$, and we choose the one that gives the $\mathbf{\omega}^{(n+1)}$ value that yields the larger $\ell(\mathbf{C}^{(n)},\mathbf{\omega}^{(n+1)})$. Similarly, we choose the one of $\eta^{(n+1)}=1.2\,\eta^{(n)}$ and $\eta^{(n+1)}=0.8\,\eta^{(n)}$ that gives the $\mathbf{C}^{(n+1)}$ value that yields the larger $\ell(\mathbf{C}^{(n+1)},\mathbf{\omega}^{(n+1)})$.
We compute the gradient of the log-likelihood function $\ell$ using the chain rule. Recall that we compute the log-likelihood in two parts. One part is the computation of the mean $\mu_{ijt}=\sum_{g,h} C_{ig}\omega_{gh}^t C_{jh}$ of the number of trips from node i to node j at time t. We then insert the expression for the mean into $\ell=\sum_t\sum_{i,j}\left(A_{ijt}\log(\mu_{ijt})-\mu_{ijt}\right)$. We compute the derivative of $\ell$ with respect to $\mu_{ijt}$ to obtain
The derivative of $\mu_{ijt}$ with respect to $C_{kg}$ is
where $\delta_{ab}$ denotes the Kronecker delta (i.e., $\delta_{ab}=1$ if $a=b$ and $\delta_{ab}=0$ if $a \neq b$). The derivative of $\mu_{ijt}$ with respect to $\omega_{ght}$ is
Using the above calculations, we see that the derivatives of $\ell$ with respect to $C_{kg}$ and $\omega_{ght}$ are
We run gradient descent until four significant digits of the base-10 floating-point representation of the log-likelihood (3) do not change for 600 consecutive steps. For the networks that we examine, this usually takes between 600 and 5,000 iterations, with models with more blocks generally needing more iterations to reach this stopping criterion. Because of the nonconvexity of the log-likelihood function (3), we are not guaranteed to reach a global optimum. Most of the time, our method converges to an interesting local optimum (which may also be a global optimum) that appears to reveal functional roles of bicycle-sharing stations (see Section 6). Our results produce recognizable block-connectivity parameters $\omega_{ght}$ (e.g., reflecting home–work commute patterns and leisure patterns) and the parameters $C_{ig}$ indicate known spatial divisions of the stations (e.g., residential versus commercial districts). In some cases, however, our algorithm converges to an uninteresting local optimum; for example, sometimes the block-assignment parameters $C_{ig}$ for each station appear as if they were assigned independently at random. To improve our results, we run our algorithm several times (specifically, 10 times for each network) and store the estimate with the largest likelihood.
For several examples, we compare the parameters that we obtain from gradient descent to those that we obtain by running a Hamiltonian Monte Carlo (HMC) sampling method in a Bayesian framework with weak priors. (We implement this sampling method in $\textsf{Stan}$ (Carpenter et al. Reference Carpenter, Gelman, Hoffman, Lee, Goodrich, Betancourt, Brubaker, Guo, Li and Riddell2017; Stan Development Team 2018).) The log-likelihoods that result from our gradient-descent method are as good or better than those that we obtain with the HMC method. The HMC method is more computationally intensive and memory intensive than our gradient-descent method, although it may be preferable in applications in which one has meaningful prior information about parameters. Improving our optimization method and investigating trade-offs between accuracy and efficiency are worthwhile topics for future work. For instance, it likely will be beneficial to adapt optimization methods for related time-dependent SBMs (Xing et al. Reference Xing, Fu and Song2010; Ho et al. Reference Ho, Song and Xing2011; Yang et al. Reference Yang, Chi, Zhu, Gong and Jin2011; Xu and Hero Reference Xu and Hero2014; Matias et al. Reference Matias, Rebafka and Villers2018) to the optimization of our SBMs.
The repository at https://github.com/jcarlen/tdsbm_supplementary_material has our $\textsf{Python}$ implementation of our gradient-descent method and code for our inference using <monospace>Stan</monospace> in $\textsf{R}$.
4.2 Inference using the TDD-SBM
To fit our TDD-SBM, we use a Kernighan–Lin-type (KL-type) algorithm (Kernighan and Lin Reference Kernighan and Lin1970) that we base on the one in Karrer and Newman (Reference Karrer and Newman2011). (Karrer and Newman (Reference Karrer and Newman2011) noted for their time-independent degree-corrected SBM that their KL-type algorithm is faster than node-switching algorithms and that it achieves results that are similar to those from such algorithms.) Given K blocks, we initialize the KL-type algorithm by assigning each node to a block uniformly at random, so each node has probability $1/K$ of being assigned to a given block. The algorithm then calculates the best possible block reassignment for any single node with respect to the associated change in log-likelihood (either the largest increase or the smallest decrease). It then chooses a new node uniformly at random and determines its best reassignment with respect to change in log-likelihood. The algorithm cycles through all nodes; a key feature of the algorithm is that a node that has been reassigned cannot change blocks again until all other nodes have been reassigned. A sequential reassignment of all nodes constitutes one step of the algorithm. The algorithm then searches all of the states that occurred during the step, and it selects the state with the maximum log-likelihood of any state in that step. This state is the starting point of the next step of the algorithm. A single run of the algorithm finishes when a step does not increase the log-likelihood beyond a preset tolerance value near 0. (In practice, we use $1 \times 10^{-4}$.) To find block assignments that are as good as possible, we do many runs of the algorithm for each network; this process is easily parallelizable. In our examples, we use 50 runs per network. We initialize each run randomly as described above.
Another key feature of the KL-type algorithm is that changes in the block memberships of nodes affect only the terms of the objective function that involve the origin and destination blocks of the changes. The simplified objective function (7) is a sum over block-pair terms over T time layers. Consequently, we do not need to recalculate the full objective function for each node reassignment. This estimation method is practical for networks with up to a few thousand nodes and a small number of blocks, depending on available computing resources.
We implement our KL-type algorithm for our TDD-SBM in an $\textsf{R}$ package using Rcpp (Eddelbuettel and FranÇois Reference Eddelbuettel and FranÇois2011; R Core Team 2018). The back-end calculations are in $\textsf{C++}$ for speed, and we return results in $\textsf{R}$ to enable visualization and other downstream tasks. One can also use our $\textsf{R}$ package to estimate the parameters of time-independent SBMs (including directed and/or degree-corrected ones), a variant of our TDD-SBM without degree-correction, and a variant of our TDD-SBM with directed degree-correction parameters $\theta_i^{\text{in}}$ and $\theta_i^{\text{out}}$. This facilitates comparisons of inferences from various related SBMs. See https://github.com/jcarlen/sbm for our $\textsf{R}$ package $\textbf{\texttt{sbmt}}$ for parameter estimation for our TDD-SBM.
5. Evaluation of model fitting for synthetic networks
In this section, we generate networks using three models: our TDD-SBM, a variant of our TDD-SBM without degree-correction, and our TDMM-SBM. Unless we specify otherwise, we use the term TDD-SBM to refer to our time-dependent discrete-membership SBM with degree-correction. For networks that we generate using discrete-membership models, we estimate model parameters using the TDD-SBM, the TDD-SBM variant without degree-correction, and the Poisson-process SBM (PPSBM) of Matias et al. (Reference Matias, Rebafka and Villers2018),Footnote 5 which also does not have degree-correction. For the networks that we generate using our TDMM-SBM, we estimate model parameters using our TDMM-SBM and our TDD-SBM. We compare how well these estimated models fit the generated data. We illustrate three key points. First, our estimation methods are able to recover data-generating parameters for generated examples. Second, incorporating degree-correction in our models facilitates the identification of blocks with similar temporal activity patterns, as opposed to ones with similar activity levels (i.e., similar degrees), when nodes have heterogeneous degrees. Third, fitting our discrete-membership model (TDD-SBM) to networks that are generated by our mixed-membership model (TDMM-SBM) can result in smaller likelihoods of the observed data than when fitting to these data using our TDMM-SBM. That is, one can lose information when modeling mixed-membership networks as discrete-membership networks.
We now discuss how we generate TDD-SBM networks. We construct networks with 16 time layers to facilitate comparison to the PPSBM. In Figure 2, we show the mean edge weights between nodes in the block pairs for each time layer of the TDD-SBM. We design the underlying curves to resemble patterns that we observed in the urban bicycle-sharing networks. Using these mean edge weights and a specified number of nodes in each block, we calculate the expected sums of the edge weights from one block to another for each layer. In other words, we calculate the MLE $\hat{\omega}$ (see (5)), given this information. We set the parameter $\mathbf{\omega} = \{\omega_{ght}\} = \hat{\omega}$ to generate our networks.
We construct both two-block and three-block networks, where the former only uses values in the upper $2 \times 2$ panel of Figure 2. For simplicity, we divide block membership equally between blocks in the TDD-SBM. For example, if there are 30 nodes and two blocks, 15 nodes belong to each block. We assume that the degree-correction parameters $\mathbf{\theta}$ take one of five distinct values $\{1x, 2x, \ldots, 5x\}$, which are distributed evenly within each block; the value of x satisfies the blockwise sum constraints on $\mathbf{\theta}$.
Using the above setup, we generate multilayer networks from our TDD-SBM. We also generate and fit networks using a version of our TDD-SBM without degree-correction. A discussion of this model is beyond the scope of this paper, but we implemented parameter estimation for it in our $\textbf{\texttt{sbmt}}$ package. The log-likelihood of this model is
which is the log-likelihood of the TDD-SBM without the degree-correction parameter $\mathbf{\theta}$. For each experiment, we produce 100 instantiations of the networks. All of the networks that we generate are directed, and we allow self-edges.
For the networks that we generate using the TDD-SBM or the TDD-SBM without degree-correction, we estimate model parameters by fitting them using our TDD-SBM, our TDD-SBM without degree-correction, and the PPSBM of Matias et al. (Reference Matias, Rebafka and Villers2018) (which is implemented in the $\textbf{\texttt{ppsbm}}$ package in $\textsf{R}$ (Giorgi et al. Reference Giorgi, Matias, Rebafka and Villers2018)). Their PPSBM is a time-dependent SBM that is similar to our TDD-SBM without degree-correction. It differs in that it assumes continuous-time networks and either adaptively splits data into equal-sized bins (to use a nonparametric histogram-based estimator) or uses a kernel estimator of time-dependent intensity functions. For our comparison, we implement the former approach because the data input structure aligns well with our preprocessed data. We estimate the parameters of the PPSBM using the variational expectation-maximization (EM) algorithm from the <monospace>ppsbm</monospace> package. The package includes a method to select the number of blocks using integrated classification likelihood (ICL).
In Table 1, we show the results for networks that we generate using our TDD-SBM and our TDD-SBM without degree-correction. The first two columns give the number of blocks (K) and number of nodes (N) that we use to construct the networks. (Recall that we predetermine these values in our models, so they are given.) We indicate in the “Gen DC” column whether or not the data-generating TDD-SBM has degree-correction. We indicate the model type that we use to fit the data in the “Fit Method” column, and we indicate whether or not it incorporates degree-correction in the “Fit DC” column.
We evaluate the success of each estimation method at detecting the true block assignments by calculating the adjusted Rand index (ARI) using the $\textbf{\texttt{fossil}}$ package in $\textsf{R}$ (Vavrek Reference Vavrek2011). We report the mean ARI values of 100 network instantiations in the “ARI” column of Table 1. The ARI is a function of two group-membership vectors and achieves its maximum value of 1 when they match perfectly (up to label permutations). We expect an ARI of 0 when we compare two uniformly random permutations of block assignments that preserve the block sizes.
We evaluate the ability of the estimation methods to recover the true value of the parameter $\mathbf{\omega}$ using the mean absolute-percentage error (MAPE) between $\hat{\mathbf{\omega}}$ and $\mathbf{\omega}$. The formula for the MAPE is $\text{MAPE}(\hat{\omega})=\frac{1}{ght}\sum_{ght}\big|\frac{\hat{\omega}_{ght} - \omega_{ght}}{\omega_{ght}}\big|$; we report the mean MAPE values of 100 network instantiations in the “MAPE$(\hat{\omega})$” column of Table 1. We also compare the unnormalized log-likelihoods of the generated networks under the data-generating model (we show their means in the “Gen LLIK” column) to the unnormalized log-likelihoods under the estimated model. Given the similarities of the candidate models, this yields an overall measure of how well we estimate the means of the Poisson distributions that describe the edge weights. We show the means of the differences between these two quantities in the “Diff LLIK” column, where positive values signify that the data are more likely under the estimated model.
We find perfect matches (i.e., an ARI of 1 with standard deviation of 0) between the true and estimated block-membership assignments whenever the network-generation and estimation methods match or the estimation model (TDD-SBM) is a generalization of the network-generation model (TDD-SBM without degree-correction). We also find perfect matches when the networks are generated by our TDD-SBM without degree-correction and are estimated by the PPSBM; this illustrates the similarity between these two models. In all of those cases, there is very little error in $\hat{\mathbf{\omega}}$, as indicated by the small MAPE values, and the likelihoods of the generated networks under the estimated model are slightly larger than the likelihoods under the data-generating model. (Recall that we fit the former to the generated networks.) This indicates that our TDD-SBM estimation method is effective in these examples.
When we generate networks using our TDD-SBM with degree-correction, the two methods without degree-correction (i.e., the TDD-SBM variant and PPSBM) do a poor job of fitting the data. In these cases, we obtain large values of both ARI and MAPE for three-block networks. Even in the two-block examples, for which we recover the blocks effectively, we obtain much smaller log-likelihoods of the generated networks under their estimated models than under the data-generating model. This is the case because using a model without degree-correction implies that all edges between a given pair of blocks are identically distributed, regardless of the nodes to which the edges are attached. This does not accurately reflect our generated networks, which have heterogeneous degrees within blocks.
For estimation using PPSBM, the results in Table 1 reflect the fact that we set the number of blocks to be equal to the value (which is two or three in our examples) in our data-generating model. When we apply the block-selection method of Matias et al. (Reference Matias, Rebafka and Villers2018) to models with between one and four blocks (using the implementation in the $\textbf{\texttt{ppsbm}}$ package), we identify the true number of blocks for models without degree-correction. However, when we apply it to networks that we generate using our TDD-SBM with degree-correction and let it search for up to four blocks, we overestimate the number of blocks. The PPSBM method attempts to classify nodes based both on their roles and on their activity levels, so it requires extra blocks.
We generate TDMM-SBM networks using the same values of $\omega$, with corresponding numbers of blocks and nodes, that we used to generate TDD-SBM networks. We are interested in how well our mixed-membership model (i.e., TDMM-SBM) is able to recover data-generating parameters and how much log-likelihood we lose when we use a TDD-SBM to fit a network that we generate using the TDMM-SBM. For each combination of number of blocks (K) and number of nodes (N), we draw values of $\mathbf{C}$ uniformly at random from $\{0, 1, \ldots, 5\}$ and then normalize each block so that $\sum_i C_{ig}=1$. We also include results for networks that we generate with our TDMM-SBM using parameters that we determine by fitting to the Los Angeles bicycle-sharing network (see Section 6.1) with two and three blocks. In these examples, $N=61$ because that is the number bicycle-sharing stations in that network. As with the TDD-SBM, we produce 100 network instantiations for each case. We summarize our results in Table 2.
Because block membership is mixed, we do not use ARI to compare $\hat{\mathbf{C}}$ with $\mathbf{C}$. Instead, we calculate the blockwise absolute error (BAE), which is the sum of the absolute errors divided by the number of blocks. That is, $\text{BAE}(\hat{\mathbf{C}}) = \frac{1}{K}\sum_{i,g}|\hat{C}_{ig} - C_{ig}|$. To evaluate the error in $\hat{\mathbf{\omega}}$, we calculate the MAPE. We also calculate a “pairwise” MAPE of $\hat{\mathbf{\omega}}$ (which we denote by MAPE$_p(\hat{\omega})$ in Table 2), in which we average the absolute percentage error over each pair of blocks. That is, $\text{MAPE}_p(\hat{\omega})= \frac{1}{K^2}\sum_{g,h}\frac{\sum_t|\hat{\omega}_{ght} - \omega_{ght}|}{\sum_t\omega_{ght}}$. The pairwise MAPE is less susceptible than MAPE to having very large values when individual values of $\omega_{ght}$ are near 0.
We compare the log-likelihoods of the generated networks under the data-generating TDMM-SBM (“Gen LLIK”) to their log-likelihoods under the estimated parameters, and we report the means of the differences (the latter minus the former) in the “Diff LLIK” column of Table 2. Finally, we fit our TDD-SBM to the generated networks and calculate the log-likelihoods of the networks under that model. We report the means of these log-likelihoods minus those that we averaged for the “Gen LLIK” column in the “Diff LLIK Discrete” column.
We find that the likelihoods of the generated networks when we use the estimated parameters are slightly larger on average than the likelihoods of the networks when we use the data-generating parameters. (Because of the randomness in the networks that we generate, we expect that a network that we generate using a given model will have a larger likelihood when we use the parameters that we fit to that network than when we use the data-generating parameters.) Additionally, the mean likelihoods of the generated networks under parameters that we estimate using our TDD-SBM are significantly smaller than those that we obtain under the data-generating TDMM-SBM parameters. (See the “Diff LLIK Discrete” column of Table 2.) In short, we gain a lot of information by employing our mixed-membership model in this case.
The BAE, MAPE, and MAPE$_p$ values (i.e., the “error values”) tend to be small for the two-block models, indicating that we are recovering the data-generating parameters fairly well in these cases. (We do not report MAPE values for generated networks when we use the estimated parameters of our TDMM-SBM for Los Angeles, which has $N=61$ bicycle-sharing stations, because a few values of $\mathbf{\omega}$ that are very close to 0 drastically inflate those values.) However, the values of BAE, MAPE, and MAPE$_p$ are much larger for the three-block models (except for the BAE for Los Angeles). The fact that the estimated parameters achieve similar likelihoods as the data-generating parameters despite fairly large values of BAE, MAPE, and MAPE$_p$ suggests that our TDMM-SBM is not completely identifiable. It is important to be aware of this identifiability limitation when interpreting the parameters of our TDMM-SBM. This is especially true for $\mathbf{\omega}$, given the large MAPE and MAPE$_p$ values for all three-block TDMM-SBMs.
The identifiability of time-dependent SBMs is a relatively unexplored research topic. Matias and Miele (Reference Matias and Miele2017) pointed out that the time-dependent SBM of Xu and Hero (Reference Xu and Hero2014) is not fully identifiable, and they introduced a demonstrably identifiable (up to label permutations) time-dependent SBM. They also established that identifiability is an inherent limitation of time-dependent SBMs in which block-membership parameters and block-connectivity parameters vary with time, unless one applies additional constraints or penalization. As we described in Section 3, our models allow only the block-connectivity parameters to vary. Further research is necessary to establish what conditions ensure the identifiability of our time-dependent SBMs.
6. Results on bicycle-sharing networks
We apply our models to bicycle-sharing networks in Los Angeles, San Francisco, and New York City. (See Section 2 for full descriptions of these data sets.) The networks that we examine in downtown Los Angeles and San Francisco are relatively small; they have 61 stations and 35 stations, respectively. The stations in these networks are concentrated in downtown areas, where residential buildings and high-rise office buildings are interspersed. The New York City network is much larger than the other two networks. It has about 600 stations, which encompass a range of commercial areas, residential neighborhoods, parks, and manufacturing areas.
Before discussing our results in detail, we highlight some of our key insights. We present results for two-block models for the Los Angeles (see Section 6.1) and San Francisco (see Section 6.2) bicycle-sharing networks that illustrate the ability of our models to detect functional roles in the form of “home” and “work” roles of bicycle-sharing stations. These results enable us to compare bicycle-sharing systems across cities. We fit discrete-membership and mixed-membership models with two, three, and four blocks to each of our data sets; we also fit five-block and six-block models in several cases. We provide the outputs of these models and all code to recreate our results at https://github.com/jcarlen/tdsbm_supplementary_material. In most cases, we find that additional blocks do not illuminate new functional roles beyond the home and work roles. For example, using models with additional blocks for Los Angeles and San Francisco results in small variations of these roles but increases the noise in the block-connectivity parameters $\hat{\omega}_{ght}$. In Section 6.3, we present five-block TDMM-SBM and TDD-SBM fits of a subset of the New York City bicycle-sharing network that illustrate another role (a “park” block) that does not appear distinctly when we employ models with fewer blocks. A method to select the number of blocks is beyond the scope of our paper, but we discuss aspects of this model-selection problem in Appendix A.4 and illustrate it by fitting mixed-membership models with up to 10 blocks on the Los Angeles bicycle-sharing network.
6.1 Downtown Los Angeles
In Figure 3, we show the mixed-membership (TDMM-SBM) and discrete-membership (TDD-SBM) block assignments of two-block models for the downtown Los Angeles bicycle-sharing network. For the TDMM-SBM, we scale the size of each node i in our plots based on its value of $\sum_gC_{ig}$. We refer to these sums as “C total” values. These values correlate strongly with node degree (specifically, with the sum of the in-degrees and out-degrees of all layers), which is evident in the similarity of node sizes in the left and right panels of Figure 3. For both models, we observe that home and work blocks are interspersed geographically. As part of our discussion in this subsection, we will describe our method for determining the block labels in Figure 3.
The TDMM-SBM output reveals a group of stations in the left panel of Figure 3 that are neither strongly home-identified nor strongly work-identified. Instead, they possess a roughly even mixture of the two types. For this network, the TDD-SBM output is very similar to what we obtain from a discretization of the TDMM-SBM output (where we discretize by assigning each node i to the block with the maximum value of its $C_{ig}$ parameter), but this is not true for all of our bicycle-sharing networks.
Our model does not yield “home” and “work” labels for each block on its own, so we use the time-dependent block-connectivity parameter estimates $\hat{\omega}_{ght}$ to assign these labels. We assign the labels heuristically under the assumption that the “home” block is the origin of many trips to the work block in the morning and the “work” block is the origin of many trips to the home block in the evening. Figure 4, which shows $\hat{\omega}_{ght}$ for each possible value of g and h, with the hour t on the horizontal axis, supports our labeling. Given our labeling, we observe a clear peak in home-to-work traffic in the mornings and work-to-home traffic in the evenings. We make similar “home” and “work” assignments for San Francisco and New York City. In Los Angeles, the traffic within the work block peaks in the middle of the day. This may represent lunchtime errands, leisure activity, or touristic activity. (There are many tourist attractions in the downtown area.) The traffic from the home block to itself has a mild evening peak and is the lowest overall among the block pairs.
To further evaluate our block labels, we use the zoning map for downtown Los Angeles from the Los Angeles Department of City Planning (Department of City Planning, City of Los Angeles 2015).Footnote 6 Zoning ordinances determine the allowable uses of city land. These ordinances distinguish between land that is available for commercial uses, industrial uses, residential uses, park districts, and other uses. In the background of Figure 5, we show a simplified version of the underlying zoning map in which we have grouped similar designations. The industrial areas include a mixture of manufacturing and commercial uses. Public facilities include government buildings, public schools, parking under freeways, and police and fire stations (Department of City Planning, City of Los Angeles 2022). In downtown Los Angeles, manufacturing and industrial areas are split cleanly from residential areas, whereas residential and commercial areas are intermixed across the bicycle-sharing system’s coverage area.
Figure 5 illustrates that most stations that are strongly home-identified are in or near zones for pure residential use or mixed residential and commercial use. We find that many stations that are not predominantly home-identified or work-identified align with mixed-use residential/commercial zones. The discrete-role plot (see the right panel of Figure 3) has a stripe of “home” stations that cut diagonally through the “work” stations. In Figure 5, we see that this aligns roughly with areas that are zoned for purely residential use. By contrast, industrial and public-facility zones tend to have stations that are primarily work-identified, although some of the most strongly work-identified stations are in mixed-used areas.
One station that seems to deviate from the overall pattern is the heavily trafficked station at Union Station. Although it is adjacent to a public facility zone with many government buildings, it is also strongly home-identified. This may seem surprising on its surface, but this classification is consistent with those of other home-identified stations because Union Station is a major transit hub for the Los Angeles metropolitan area. Many morning trips originate there as commuters transition from other modes of transportation, and many evening trips conclude there. This type of activity pattern is sensibly associated with home-identified stations. Such idiosyncrasies of transit hubs also arise in our results for San Francisco and New York City.
6.2 San Francisco
In Figure 6, we compare our two-block TDMM-SBM and two-block TDD-SBM for San Francisco. As we saw for Los Angeles, the San Francisco blocks are interspersed geographically and stations vary from strongly home-identified to strongly work-identified. The two most strongly home-identified stations are located at a major transit hub (the San Francisco Caltrain Station on 4th Street).
In Figure 7, we show the estimated traffic between the “home” and “work” blocks for the two-block TDMM-SBM and the two-block TDD-SBM. As in the downtown Los Angeles network, we observe inter-block commuting. However, unlike in downtown LA, the TDD-SBM reveals intra-block morning and evening peaks in both the home and work blocks. This may be due to last-mile commuting, such as using bicycle-sharing facilities to get to or from a transit station. The 10 most-popular trips within blocks (i.e., home-to-home travel or work-to-work travel when there are two blocks) that originate during the heavy commuting hours of 7–9 am and 4–6 pm all begin at major public transportation stops in the morning and end at those stops in the evening. Recognizing last-mile usage is important for integrating bicycle-sharing systems with nearby public transportation. One possible reason that we do not observe a similar phenomenon in downtown LA is that San Franciscans are more likely than residents of Los Angeles to use public transportation (Weikel and Karlamangla Reference Weikel and Karlamangla2015). The intra-block morning and evening peaks may also arise from the intermixing of commercial and residential uses of land. In other words, some travel within blocks may also constitute commuting.
6.2.1 Comparison to SBM variants
We briefly compare our above results for the Los Angeles and San Francisco bicycle-sharing networks to results that we obtain for these networks using a time-independent SBM and a time-dependent SBM without degree-correction. We do this to illustrate that both time-dependence and degree-correction are important for achieving our goal of detecting functional roles of bicycle-sharing stations.
To fit the time-independent SBM, we construct the time-independent adjacency matrix with entries $\tilde{A}_{ij}=\sum_{t=0}^{23}A_{ijt}$ and fit a two-block TDD-SBM to the associated time-aggregated network with a single layer. For the downtown Los Angeles network, we observe a clear geographical division in the output of the time-independent SBM (see Figure 8). For the San Francisco network, the differences between the blocks of the time-dependent SBM and time-independent SBM are less noticeable, although they are still present. This confirms that our time-dependent SBMs detect usage patterns that are not evident in the time-aggregated data.
We now fit the variant of our two-block TDD-SBM without degree-correction (see Section 5) to the Los Angeles and San Francisco bicycle-sharing networks. In Figure 9, we see that stations are now divided into one group of small-degree stations and one group of large-degree stations, where the “degree” is the sum over all layers of the in-degrees and out-degrees. These blocks differ significantly from those that we obtained using the TDD-SBM with degree-correction (see Figures 3 and 6). For example, for Los Angeles, the TDD-SBM with degree-correction classifies the high-traffic node at Union Station as a “home” station because of its frequent departures in the morning and frequent arrivals in the evening, whereas the TDD-SBM without degree-correction groups Union Station with other high-traffic stations that are classified predominantly as “work” stations in the degree-corrected TDD-SBM.
6.3 New York City
We fit our TDMM-SBM and our TDD-SBM to the New York City bicycle-sharing network with 2–6 blocks. Our results suggest that there is a limit to the size of networks for which our time-dependent SBMs can recover blocks that are based on functional roles of stations (rather than on geographical proximity). Therefore, in the present subsection, we apply our models to a subset of the New York City bicycle-sharing network to gain insight about the functional roles of stations in this subset. We select this subset based on the results of applying our three-block TDD-SBM to the entire New York City network. We examine the subnetwork that is induced by the stations that are in one of these three blocks. We explore three-block SBMs of the entire New York City network in Appendix A.3. In those results, we infer that stations are assigned to blocks along geographical lines, rather than to blocks with distinct functions.
6.3.1 Manhattan subnetwork
We examine the New York City bicycle-sharing network on a scale that is smaller than the full system and fit models to the subset of stations and trips that lie within the Manhattan (home) block of the three-block TDD-SBM (see Figure A.4) that we examine in Appendix A.3. We refer to this subnetwork as the “Manhattan” network, even though it does not include all of the stations in the borough of Manhattan. This network consists of 166 stations and 256,840 trips. In Figure 10, we show our results for a five-block TDMM-SBM and five-block TDD-SBM for the Manhattan network. The area without stations in the middle of each panel of Figure 10 is Central Park, which has stations on its perimeter but not in its interior.
The blocks that we infer using the five-block TDMM-SBM and five-block TDD-SBM outline similar regions. These models yield block-membership parameters that capture residential and commercial zones much better than the three-block TDMM-SBM and TDD-SBM do for the full New York City network. One can see this by comparing the five-block results with the underlying zoning map of the area in Figure A.6. The stations in residential zones tend to have larger block-membership parameters for “home” blocks than for “work” blocks, and the opposite is true for stations in commercial zones. We label the five detected blocks as (clockwise from the top left in each panel of Figure 10) “home (west),” “park,” “home (east),” “work,” and “mixed.” We base these labels on the land use of the underlying areas and the time-dependent block-connectivity parameters $\hat{\omega}_{ght}$ (see Figure 11). When we use the TDMM-SBM, the regions tend to blend together; they have sharp boundaries in the TDD-SBM (see Figure 10).
We highlight the appearance of the “park” block, which we have not observed in previous examples and has distinctive usage patterns. The park block is similar to a residential block in its spike in morning traffic to the work block and its spike in evening traffic from the work block, but it has distinctive intra-block activity that peaks in the afternoon. The intra-block activity resembles weekend activity in the New York City bicycle-sharing system as a whole (see Figure 1); this reflects leisure use of the bicycles. Bicycles near Central Park (which also places them near several major museums) are likely to be used by tourists and other non-commuters during the day for leisure or for travel to nearby attractions.
In Figure 11, we show the values of the block-connectivity parameters $\hat{\omega}_{ght}$ for the five-block TDMM-SBM and the five-block TDD-SBM. Our estimates of $\hat{\omega}_{ght}$ for these models illustrate important differences in the usage patterns of different blocks that we observe only when we employ a time-dependent model.Footnote 7 We observe some overlap in the time-dependent usage patterns of the blocks, so there is potential overfitting. For example, the home (east), home (west), and mixed blocks have traffic that is similar to that in blocks other than their own. However, examining the Manhattan network using models with fewer than five blocks does not clearly distinguish the “park” block of stations from other stations in residential zones.
One reason that our time-dependent SBMs for the Manhattan network perform better (with respect to detecting functionally meaningful blocks) than models that we apply to the entire New York City network (see Appendix A.3) is the dependence of station-to-station trip counts on the distances between stations. Although our SBMs correct for the overall activity of each station, they do not normalize expected edge values by the distances between stations. In a small geographical area, such as the coverage areas of the Los Angeles and San Francisco networks, this is a reasonable choice, as all stations are within “biking distance” of each other. However, when examining a system as large as New York City’s, the lack of distance correction weakens the functional roles that we obtain using our time-dependent SBMs. Intra-block trips dwarf inter-block trips (see Figure A.5), and it seems more reasonable to interpret each block as its own ecosystem.
7. Conclusions and discussion
We introduced two types of time-dependent stochastic block models (SBMs) and used them to reveal interesting patterns in urban bicycle-sharing networks. We formulated these SBMs to account both for degree heterogeneity and for a balance between cumulative in-degrees and out-degrees of bicycle-sharing stations over the course of a day. (The latter feature reflects a classical axiom of Ravenstein (Reference Ravenstein1885) that every current of human mobility has an associated countercurrent (Barbosa et al. Reference Barbosa, Barthelemy, Ghoshal, James, Lenormand, Louail, Menezes, Ramasco, Simini and Tomasini2018).) Our SBMs group stations based on their time-dependent usage patterns; this allows us to identify them with home and work roles. Work stations are characterized by inflow from home stations in the morning and outflow to home stations in the afternoon and evening, and residential stations have the opposite flows. It is also sometimes possible to identify other roles, such as a Manhattan park block that combines residential and leisure/touristic behavior.
We illustrated through case studies of Los Angeles, San Francisco, and New York City how our mixed-membership and discrete-membership SBMs can provide complementary insights about transportation patterns in bicycle-sharing systems. We found that many bicycle-sharing stations in our three focal cities have a mixture of roles, which we captured with our mixed-membership SBM. However, in some cases, we observed that a discrete-membership SBM gives a clearer picture of the usage patterns that are associated with each block. We also demonstrated the importance of time-dependence and degree-correction in our models by comparing our results to those from time-independent SBMs and time-dependent SBMs without degree-correction.
We evaluated our block labels by comparing them to city zoning maps. The home–work dichotomies that we detected generally align well with the underlying zones, but there are important deviations near major transit hubs. It is common to evaluate the results of community detection through comparisons with so-called “ground-truth” communities (Fortunato and Hric Reference Fortunato and Hric2016; Barbillon et al. Reference Barbillon, Donnet, Lazega and Bar-Hen2017) (although it is crucial to be cautious in such evaluations (Peel et al. Reference Peel, Larremore and Clauset2017)). The time-dependent commuting flows that we detected using our SBMs enabled us to identify and label the functional roles of blocks of bicycle-sharing stations even without a corresponding zoning map. In the future, it will be beneficial to compare the activity patterns that we detected to activity patterns from other mobility data, such as taxis, subway systems, e-scooters, and geo-tagged mobile apps (Zhu et al. Reference Zhu, Khan, Kats, Santosh Bamne and Sobolevsky2018).
There are several worthwhile ways to improve our models and algorithms. One valuable extension is to allow distributions other than the Poisson distribution for the numbers of trips between stations. Additionally, although it is a convenient simplification to assume independence between the numbers of trips that start in each hour, it may be beneficial to relax this assumption. For example, these numbers are not independent if there are stations that run out of bicycles at some point. As we have discussed, mixed-membership and discrete-membership SBMs can reveal different insights, as can examining SBMs with different numbers of blocks. This helps illustrate why it is important to consider model selection in greater depth. It is also worth considering other estimation techniques for both our mixed-membership and discrete-membership models, especially if one alters the assumption of conditionally independent Poisson edge distributions that we used to simplify our parameter estimation. Variational inference has been employed for many related models (Xing et al. Reference Xing, Fu and Song2010; Yang et al. Reference Yang, Chi, Zhu, Gong and Jin2011; Matias and Miele Reference Matias and Miele2017; Matias et al. Reference Matias, Rebafka and Villers2018), and it is likely that it can also be used in our setting to improve performance (especially for large networks).
Another important direction for future work is the exploration of different methods of preprocessing data to include only the most important edges. The two most apparent ways to do this are (1) eliminating potentially insignificant edges by thresholding and (2) choosing time layers that reduce variance. The preferential-attachment model of Zhu et al. (Reference Zhu, Yan and Moore2013) gives one possible approach for eliminating insignificant edges. The way that one splits the times of a day can improve both accuracy and computational efficiency by reducing the total number of parameters. For example, in the cities that we studied, bicycle trips occurred sporadically between 1 am and 5 am, so it may be desirable to aggregate all of the associated time layers into one time layer to decrease the number of parameters and thereby decrease variance. There exist methods that aim to find suitable ways to segment time periods (Caceres and Berger-Wolf Reference Caceres and Berger-Wolf2013), and trying to determine the best ones to use in different situations is an active area of research.
Broadening our models to control explicitly for spatial relationships (Barthelemy Reference Barthelemy2018) is another natural direction to build on our work. For example, it may enable one to identify functional blocks in geographically diffuse networks such the New York City bicycle-sharing network. The radiation, intervening-opportunities, and gravity models have had some success at describing human mobility over various distances (Barbosa et al. Reference Barbosa, Barthelemy, Ghoshal, James, Lenormand, Louail, Menezes, Ramasco, Simini and Tomasini2018). These mobility models put more weight on longer trips, and some mobility models account for opportunities that lie between origin and destination locations. Some mobility models also possess statistical justification based on entropy arguments, and it is worthwhile to investigate methods to incorporate them into SBMs. Mobility models were incorporated into null models in time-dependent modularity objective functions in Sarzynska et al. (Reference Sarzynska, Leicht, Chowell and Porter2016), who found that radiation and gravity null models are more appropriate than the usual Newman–Girvan null model (which is a variant of a configuration model (Fosdick et al. Reference Fosdick, Larremore, Nishimura and Ugander2018)) for spatial networks. One can use similar ideas to incorporate spatial information into SBMs. The value of using spatial null models in studies of bicycle-sharing systems has been examined previously (Austwick et al. Reference Austwick, O’Brien, Strano and Viana2013; Taiyeb Reference Taiyeb2014), so this is a very interesting direction to pursue.
It is also worthwhile to use our time time-dependent SBMs and extensions of them to study other types of mobility data. One example is dockless vehicle-sharing networks, such as those from e-scooter-sharing services. To make the data from such systems amenable to analysis using our models, one can partition a city into a grid and measure the usage in each region, taking care to recognize irregularities from transit hubs. Depending on how heavily these systems are used in commuting, one may discover functional blocks other than ones for “home” and “work.” We expect that it will also be possible to tailor our methods to analyze multimodal transportation systems and other urban flows, which are particularly suitable for analysis using multilayer networks (KivelÄ et al. Reference KivelÄ, Arenas, Barthelemy, Gleeson, Moreno and Porter2014; Aleta and Moreno Reference Aleta and Moreno2019; Natera Orozco 2020; Gallotti et al. Reference Gallotti, Bertagnolli and De Domenico2021).
Acknowledgements. We thank Brian Karrer and Mark Newman for allowing us to use and share their code for degree-corrected SBMs from Karrer and Newman (Reference Karrer and Newman2011). We thank Susan Handy’s lab at UC Davis for useful discussions on contextualizing our work for transportation researchers and planners, David Kempe for pointing us to Caceres and Berger-Wolf (Reference Caceres and Berger-Wolf2013), and Michelle Feng and others in the networks journal club at UCLA for helpful comments. We also thank two anonymous referees for many helpful comments and Phil Chodrow for technical assistance with setting up computations. CM and SSC thank NSF (DMS-1351860) for funding, and SSC also thanks NIGMS (R01 GM126556) and an NIH Ruth L. Kirschstein National Research Service Award (T32-GM008185) for funding. SW thanks NSF (CCF-1422795), ONR (N000141110719, N000141210834), DOD (W81XWH-15-1-0147), Intel STC-Visual Computing Grant (20112360), and Adobe Inc. for funding.
Competing interests. None.
A.1 Singular vectors for bicycle-sharing data
To inform our modeling approach, we explore the imbalances between morning and evening activity in the stations of each network. We calculate singular-value decompositions (SVDs) of the matrices of the hourly in-degree and out-degree of each station. To be explicit, entry i,j of the matrix of in-degrees is equal to the total number of trips that arrive at station i in hour j, and we construct the matrix of out-degrees analogously for departing trips. We show results for Los Angeles, San Francisco, and New York City in Figures A.1, A.2 and A.3, respectively. The first two principle components either strengthen both observed peaks or weaken one peak while strengthening the other. Across the three data sets, the first two singular vectors explain between $88 \%$ and $97 \%$ of the variation in the corresponding matrix, supporting the importance of peak morning and peak evening commutes for classifying stations.
A.2 Proof that the expected node degrees of the MLE of the TDD-SBM equal the observed node degrees
Suppose that we generate a network from the MLE of our time-dependent discrete-membership stochastic block model (TDD-SBM). We prove that the expected value of the degree of a node (i.e., the sum of its in-degree and its out-degree over all time layers) is the same as the degree of the node in the observed data. That is, we prove that $\sum_{j}\sum_{t=0}^{23} \left(\mu_{ijt} + \mu_{jit}\right) = k_i=\sum_j \sum_{t=0}^{23} A_{ijt}$.
Let i and j be nodes of the network, and let $t \in \{0,1,\ldots,23\}$ index the time layers. Additionally, recall that g denotes block g in the TDD-SBM and that $\kappa_g$ is the sum of the in-degrees and the out-degrees of all nodes in block g over all time layers. We calculate
A.3 TDMM-SBM and TDD-SBM for the entire New York City bicycle-sharing network
In Figure A.4 we compare our results from a three-block TDMM-SBM and a three-block TDD-SBM for the New York City bicycle-sharing network. In initial calculations, we found that a two-block TDD-SBM divides the network along the East River into a Manhattan block and Brooklyn block and that a two-block TDMM-SBM divides the network slightly farther north in Lower Manhattan. We do not show the outputs of the two-block models, but we provide the replication code, estimated parameters, and associated visualizations at https://github.com/jcarlen/tdsbm_supplementary_material.
In Figure A.5, we compare the inferred inter-block traffic, as captured by the values of $\hat{\omega}_{ght}$, for the three-block TDMM-SBM and three-block TDD-SBM. We observe prominently that all intra-block traffic has two peaks and much higher hourly trip counts than inter-block traffic. The double peaks are reminiscent of the overall system activity in Figure 1. This may be due in part to last-mile commuting, as we saw in the San Francisco bicycle-sharing network. However, for a system that is this geographically large, the two peaks and minimal inter-block traffic suggest that it is useful (and important) to consider each block as its own ecosystem. There is also a strong similarity between the results from our TDD-SBM and the results (not shown) from a three-block time-independent SBM for a time-aggregated New York City network, providing further evidence that our time-dependent SBMs are not capturing time-dependent roles for New York City when we consider its entire bicycle-sharing system. Consequently, we choose the labels of these blocks (see Figures A.4 and A.5) based on the primary borough and zone type of each block’s stations, as indicated in the zoning map for this part of New York City (Department of City Planning, New York City 2018) in Figure A.6.
In Figure A.6, we illustrate that there is general overlap, although it is far from perfect, between (1) the Upper Manhattan (“home”) block and residential areas or parks and between (2) the Lower Manhattan (“work”) block and commercial or manufacturing areas. All stations in Brooklyn are in the third block, which mostly has residential areas. We show only the blocks that we infer using our TDD-SBM in Figure A.6, but the same reasoning motivates our labels for the three-block TDMM-SBM. No block has exclusively commercial or residential areas.
We examined several time-dependent SBMs for New York City with more than three blocks to try to discover functional blocks, but we found that the blocks were still primarily geographical. In some cases, the TDMM-SBM with four or more blocks yielded functional divisions within smaller geographical areas (subdividing the blocks in Figure A.4), but none of our models detected system-wide “home” or “work” blocks. See our code and figures at https://github.com/jcarlen/tdsbm_supplementary_material to generate and view the results of applying time-dependent SBMs with 2–6 blocks to the New York City bicycle-sharing network.
Across our models, we find that the distance between two stations in the New York City bicycle-sharing network is the feature that is associated most strongly with whether or not those stations belong to the same block. As we noted above, the detected blocks represent primarily geographical divisions, with most of the traffic occurring within blocks. This contrasts with our results for the Los Angeles and San Francisco networks. We believe that this occurs because the New York City bicycle-sharing network is geographically larger than the San Francisco and Los Angeles bicycle-sharing networks. Our models are useful for detecting functional blocks only when the distances between stations are not the main determinant of inter-station traffic over time.
A.4 Model selection
Although statistically rigorous model selection is outside the scope of our paper, we briefly compare the number of parameters in our mixed-membership SBM (TDMM-SBM) and discrete-membership SBM (TDD-SBM). This is valuable for considering model-selection criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), that penalize a model based on its number of parameters. For a network with N nodes, K blocks, and T time layers, the number of parameters in the TDMM-SBM is
and the number of parameters in the TDD-SBM is
The first term of (8) comes from the fact that each node in the TDMM-SBM has K parameters ($C_{ig}$, with $g \in \{1,\ldots,K\}$) that express the strength of membership in each block. By contrast, each node in the TDD-SBM has one parameter for block membership and one degree-correction parameter. Therefore, given a value of N, the first term in (8) increases linearly with the number of blocks, whereas the corresponding term in (9) is fixed. Otherwise, formulas (8) and (9) are identical. The $-K$ term in each formula arises from the identifiability constraints in each model. As we described in Section 3, these constraints are $\sum_i C_{ig}=1$ (for each block g) for the TDMM-SBM and $\sum_{i\in g}\theta_i = 1$ (for each block g) for TDD-SBM. The last term (i.e., $TK^2$) in each formula is the total number of $\omega_{ght}$ terms in the model (see Section 3.1).
We show the unnormalized log-likelihoods and numbers of parameters ($N_p$) in the TDMM-SBM and TDD-SBM for the Manhattan (home) subnetwork (which has $N = 166$ nodes) with 2–6 blocks (see Table A.1). In this example, the TDMM-SBM outperforms the TDD-SBM with respect to log-likelihood when the two models have the same number of parameters. This result makes sense because of the additional constraint of the TDD-SBM that each station must belong to exactly one block.
In Figure A.7, we plot the Akaike information criterion (AIC) (Akaike Reference Akaike1974), which is given by
versus the number of blocks in our TDMM-SBM for the Los Angeles bicycle-sharing network with 2–10 blocks. We consider models with a larger numbers of blocks in the Los Angeles network than we did in the Manhattan network because the former’s smaller number of stations allows faster computations.
The AIC is a cost function for comparing the relative qualities of statistical models. Given two models, one regards the model with a smaller AIC as the “better” model. The AIC accounts for both the likelihood and the complexity of a model, where one measures the latter based on the number of parameters in the model. In Figure A.7, we see that the AIC of the TDMM-SBM for the Los Angeles bicycle-sharing network decreases as we increase the number of blocks from 2 to 10. Based on the AIC criterion, one would select the 10-block model. However, plots of the block-connectivity parameters $\hat{\omega}_{ght}$ for the TDMM-SBM for the Los Angeles network with 7 or more blocks are noisier and no more informative than ones from the TDMM-SBM with fewer blocks. These plots suggest strongly that there is overfitting in these cases. This is in line with experiments by Funke and Becker (Reference Funke and Becker2019), who found that the AIC overestimates the number of blocks for several types of SBMs.
Our time-dependent SBMs are incomplete descriptions of the data-generating process in bicycle-sharing networks. For example, they do not account for the distances between stations. The AIC tends to recommend a model with more blocks to compensate for this incompleteness, but (as we just discussed) visual inspection reveals the likely overfitting that results from including too many blocks. We seek to gain insight into bicycle-sharing systems, and we prefer models with fewer blocks that describe station roles in broad but interpretable terms over less-interpretable models with more blocks but smaller AIC values. Choosing appropriate model-selection criteria for SBMs deserves additional consideration and is an active research area (Yan et al. Reference Yan, Shalizi, Jensen, Krzakala, Moore, ZdeborovÁ, Zhang and Zhu2014; Yan Reference Yan2016; SaldaÑa et al. Reference SaldaÑa, Yu and Feng2017; Wang and Bickel Reference Wang and Bickel2017; Hu et al. Reference Hu, Qin, Yan and Zhao2020). We leave such investigations for future work.