Policy Significance Statement
The COVID-19 crisis confirmed that mobile phone data can provide information on the population present in a given place at a given time, and can support mobility analyses, both of which are useful for public decision-making well beyond times of epidemics. However, indicators provided by the data analytics solutions of the different mobile network operators (MNOs) showed some shortcomings with regard to official statistics standards and needs. This calls for stronger collaborations between official statistics and MNOs, to construct a fruitful partnership in which commercial and public uses are articulated in a secure legal environment using privacy-enhancing technologies.
1. Introduction
Data passively generated by mobile networks have emerged as a valuable data source for studies of human presence, mobility, and social interactions (Blondel et al., 2015). Given that they provide precise and up-to-date information, these data are of great interest for public decision-making. The French National Institute of Statistics and Economic Studies (INSEE), together with Eurostat and some other European national statistical institutes (NSIs), identified some years ago their potential for official statistics, as a complement to more standard statistical data sources. These NSIs have developed frameworks to integrate mobile phone data (MPD) into statistical production, but practical experiments remain rare (DGINS, 2013; Debusschere et al., 2016; Ricciato et al., 2017; Ricciato et al., 2020; ESSnet Big Data WP5, n.d.). In parallel, INSEE, as a data producer, has initiated methodological collaborations with the R&D departments of different French mobile network operators (MNOs) in order to assess MPD against official statistics standards as well as to develop innovative experimental statistics (Vanhoof et al., 2017; Sakarovitch et al., 2018; Vanhoof et al., 2018).
At the beginning of the COVID-19 lockdown, with a view to measuring the large movements of people that occurred just before the confinement came into force, INSEE quickly initiated limited-in-time and specific collaborations with three of the four MNOs operating in France. INSEE’s strategy was to combine aggregated and anonymous indicators on population movements and mobility provided by those MNOs, as well as to keep control over the final statistical treatments in order to rapidly disseminate these experimental statistics as reliably as possible. This paper relates this initiative.
This MNO project was part of a wider strategy to provide statistics on crucial issues such as the economic downturn and excess mortality (see Chief Statistician J. L. Tavernier’s post on the INSEE blog; Tavernier, 2020). This experience has confirmed that MNO data are of strong interest for public decision-making and for official statistics more generally. However, more collaborations between statistical institutes and MNOs are needed to increase the relevance of and trust in these data, in particular through methodological improvements. Our experience has shown that MNOs have an important role to play alongside NSIs in the production of valuable information that serves the public interest.
2. Setting the Scene
France, like many countries affected by SARS-CoV-2, took strict restrictive measures in March 2020 to contain the circulation of the virus. Within 4 days (March 14–17), schools and nonessential businesses were successively closed and the population was placed under lockdown nationwide for almost 2 months (March 17–May 11); see Salje et al., 2020; Legifrance, 2020. Just before the lockdown, major population movements took place, leading to an unknown redistribution of the population over the territory, which had to be documented in the context of the management of the health crisis. This prompted INSEE to seek access to contemporary MPD to provide nationwide information on population distribution that could be used in addition to more traditional residence-based population statistics (census and administrative data).
During the very first weeks of lockdown, INSEE offered the four main mobile phone operators in the country a collaboration specific to the health crisis. The topics of collaboration were clearly specified from the beginning: documenting the distribution of the population across the territory, which was expected to differ from the distribution of people by usual place of residence, and its evolution once the lockdown was lifted; and providing mobility indicators, especially commuting indicators, to enrich the toolbox of high-frequency indicators used in the INSEE economic outlook.
As the NSI, INSEE must ensure that the statistical products it disseminates comply with the principles of official statistics, that is, professional independence, objectivity, impartiality, relevance, quality, and sound methodology, as established by national and European regulations (National Statistical Law, n.d.; Regulation (EC) No 223/2009). This is especially challenging when the Institute is not the direct collector of data and when the data are not primarily collected for statistical purposes. Given the urgency and the massive nature of the socioeconomic shocks that had to be measured, it was not feasible to address in time all the methodological aspects needed to ensure that official statistics standards were respected. INSEE used anonymous indicators produced by the data analytics services of the MNOs based on network signaling data in accordance with the Directive 2002//EC (the ePrivacy Directive; EUR-Lex, n.d.). Since INSEE had no control over the methodology used to construct the indicators, its strategy relied on combining data products, already available or easily achievable, coming from various MNOs, in order to mitigate risks of bias or other consistency issues (Batista e Silva et al., 2020). INSEE also chose to retain control of the final calibration of the results.
Three MNOs responded favorably to the proposal and engaged in time-limited philanthropic collaborations for the sole context of the health crisis. Confidentiality agreements were established to frame the delivery and use of aggregates in compliance with the GDPR. Each collaboration had its specificities concerning the geographical and time ranges, the purposes, and the scope of the philanthropic collaboration. At the same time, some MNOs were solicited by various public bodies and also felt the need for a (partly) centralized response in which INSEE could participate. This lasted until the end of May. Since then, INSEE has not engaged in commercial partnerships.
INSEE already had ongoing methodological collaborations with two MNOs for the construction of experimental statistics on the population present in a given place at a given time. The collaborations with these MNOs were faster to launch. INSEE’s participation in the European Task Force on the use of MPD for official statistics and its role as a regular methodological partner of telecommunication operators accelerated the design of the necessary data and their processing.
3. Providing Input Data to a Data/Statistical Producer
The three operators who collaborated with INSEE had data analytics units exploiting network signaling data. They provided statistical products coming from their commercial offers, namely aggregated and anonymous population-adjusted counts of people by department of presence (NUTS3) during the night and by department of residence, and origin–destination trip count matrices. Only one operator agreed to produce a specific product for INSEE. Given INSEE’s objective of combining information, the use of already existing statistical products raised a first problem of compatibility between concepts, measures, and methods, all of which depended on each MNO’s choices.
INSEE’s core competency is to process and produce data and to make them understandable. INSEE therefore needed data that were as lightly preprocessed as possible, in order to adjust the results using its own data sources and to perform specific statistical treatments (the lockdown situation made it reasonable to assume no entry into or exit from French territory, which justified recalibrating the overall population counts to national population estimates). With these post-treatments, INSEE went further than a direct use of the products provided by MNOs. However, many indicators received by INSEE were already statistically adjusted to the whole population following MNOs’ methodologies, which prevented INSEE from getting the most out of the combination of sources. NSIs have the legitimacy and a unique capacity to pull together various data sources, inter alia to implement relevant statistical adjustments in order to identify and correct sample bias. MNOs do not share this capacity, and their datasets are limited by definition to their customer databases, which are not representative of the whole population. Since there is a public interest in improving the quality of the data used to produce new public or private services, it would be worth studying further under which terms and conditions NSIs could produce and provide the ad hoc anonymous information that operators need to correct the sample biases of their indicators more accurately.
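The recalibration to national population estimates mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible adjustment, a single proportional scaling factor applied to all departments so that the counts sum to a known national total; the function name, department codes, and figures are hypothetical, and this is not presented as the adjustment INSEE actually implemented.

```python
def recalibrate_to_national_total(counts, national_total):
    """Scale departmental counts proportionally so that they sum to national_total.

    Assumption (for illustration only): with closed borders, the population
    present on the territory is constant, so only its distribution changes.
    """
    raw_total = sum(counts.values())
    if raw_total <= 0:
        raise ValueError("counts must sum to a positive number")
    factor = national_total / raw_total
    return {dept: value * factor for dept, value in counts.items()}


# Hypothetical nightly presence counts from one operator for three departments.
mno_counts = {"75": 1_800_000, "33": 1_700_000, "17": 700_000}

# Hypothetical reference population for the same three departments combined.
calibrated = recalibrate_to_national_total(mno_counts, national_total=4_410_000)

for dept, value in sorted(calibrated.items()):
    print(dept, round(value))
```

A proportional factor preserves each department's share of the total, so it corrects the overall level without altering the spatial distribution reported by the operator. It cannot correct biases that differ from one department to another.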
The transmission of nonadjusted data to NSIs: issues for MNOs
Nonadjusted data can reveal the respective market shares of the competing MNOs at various geographical scales. Although they are required to provide their market share at the national level to the regulator, MNOs are reluctant to share their local market knowledge with NSIs, for fear that it could leak to competitors. A solid guarantee of confidentiality drawn up between NSIs and operators is therefore crucial.
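A short sketch shows why nonadjusted counts are commercially sensitive: dividing an operator's raw subscriber counts by a public population figure directly yields an implied local market share. The function name, zone labels, and numbers below are hypothetical.

```python
def implied_market_share(subscriber_counts, resident_population):
    """Ratio of an operator's raw counts to the public population figure, per zone."""
    return {
        zone: subscriber_counts[zone] / resident_population[zone]
        for zone in subscriber_counts
    }


# Hypothetical raw (nonadjusted) counts for one operator and public population data.
raw_counts = {"zone_a": 30_000, "zone_b": 12_000}
population = {"zone_a": 100_000, "zone_b": 60_000}

shares = implied_market_share(raw_counts, population)
print(shares)
```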
The frequency of data deliveries during the crisis varied widely, from a single delivery for one MNO to more than 10 deliveries for another. Some operators agreed to hold very regular exchanges on the methodology used to compute the indicators, making their methodologists available to answer our questions and sometimes even adjusting the calculation method. However, even in the case of very regular exchanges, the methodology was never made fully transparent. On the one hand, operators considered that fully exposing their methodology carries the risk of revealing their technical innovations. This issue is crucial and legitimate. On the other hand, INSEE must respect its commitments in terms of professional ethics and standards. In particular, INSEE needs to measure the quality of the data it produces and to communicate its reliability to the public. The balance between these two requirements has yet to be found.
The transparency of the methodology: issues for MNOs
INSEE recommends that common methodological standards be adopted (definition of what is a place of residence, what is commuting, etc.). Nevertheless, in the MNOs’ view, sharing this methodology would raise at least two problems: it could reveal their innovations, and it would require them to adjust their statistical production line. The latter is costly given the computational burden in a Big Data ecosystem, but also because a new algorithm may require the approval of the Data Protection Authority.
4. Outcomes and Impacts
When the lockdown was announced, significant population movements were observed. These changes raised concerns about pressures on local health systems and therefore had to be quantified. In addition, a better understanding of who had changed residences (workers, second homeowners, and students) was also of interest for decision-makers and for public information. After the lockdown, the gradual return of population movements was also measured, and the return of the population, especially nonresident workers, to the big cities was related to the rebound of economic activity in the country. INSEE published two press releases. The first one, published 3 weeks after the lockdown announcement, provided first experimental results on the population distribution before and during lockdown at the department level (NUTS3) (see INSEE Press Release of April 8, n.d.). It gave first measures of the population movements that had taken place: tourists, workers, and students returning to their homes, and urban dwellers moving to more rural areas. It relied on data coming from one MNO only, and the results were presented as provisional, pending confirmation by a later cross-MNO analysis. In addition to the press release, INSEE also communicated these experimental statistics to prefects in the context of the management of the health crisis.
The second press release (INSEE Press Release of May 18, n.d.) consolidated the first results with data coming from two MNOs and covering a longer period (up to the end of April). An econometric approach was used to combine the MNOs’ indicators, with the final population adjustment made by INSEE. The second press release was accompanied by a detailed analysis report, which compared the population distribution over the territory before and after lockdown, as derived from signaling data, with descriptions based on the census and other official statistics of the territories most likely to host mobile groups (students, young adults, and owners of secondary residences) and most likely to show population changes when the lockdown was eased. The lockdown was gradually and partially lifted on May 11. The econometric approach also made it possible to convey the inherent uncertainty of daily indicators based on signaling data and to compare the population variations before and after lockdown with usual weekly changes.
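The econometric specification used to combine the operators’ indicators is not detailed here. As a deliberately simpler stand-in, the sketch below expresses each operator’s indicator as a relative change with respect to that operator’s own pre-lockdown baseline, averages these changes across operators, and reports the spread between operators as a crude signal of disagreement. All names and figures are hypothetical.

```python
from statistics import mean


def relative_change(baseline, current):
    """Relative change per department with respect to the operator's own baseline."""
    return {dept: current[dept] / baseline[dept] - 1.0 for dept in baseline}


def combine_operators(changes_by_operator):
    """Average relative changes across operators, department by department.

    Only departments reported by every operator are kept. The spread
    (max minus min) is a rough disagreement measure, not a confidence interval.
    """
    if not changes_by_operator:
        raise ValueError("at least one operator is required")
    common = set.intersection(*(set(c) for c in changes_by_operator.values()))
    combined = {}
    for dept in sorted(common):
        values = [changes[dept] for changes in changes_by_operator.values()]
        combined[dept] = {"mean": mean(values), "spread": max(values) - min(values)}
    return combined


# Hypothetical counts before and during lockdown for two operators.
changes = {
    "operator_a": relative_change({"75": 1000, "17": 400}, {"75": 800, "17": 500}),
    "operator_b": relative_change({"75": 2000, "17": 500}, {"75": 1500, "17": 600}),
}
combined = combine_operators(changes)
print(combined)
```

Working with relative changes rather than levels sidesteps differences in each operator’s population-adjustment method, at the cost of assuming that each operator’s bias is stable over time within a department.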
The third communication was published in the INSEE collection dedicated to summaries, for a wide audience, of research studies conducted by the Institute (see Galiana et al., 2020). This study relied on data coming from three MNOs and covered the whole period from prelockdown to the first phase of the easing of the lockdown (up to the end of May). As for the second press release, it used an econometric approach to combine indicators coming from the three MNOs, and the population adjustment of the MNOs’ counts was performed by INSEE (even when the indicators were already population-adjusted by the MNOs). It consolidated the previous findings and focused on what happened at the end of the lockdown period: weekly movements between urban centers during the week and more rural and coastal departments on weekends increased.
In addition to population movements, INSEE used daily morning mobility indicators as proxies for commuting indicators to shed light on the pace of the economic recovery once the lockdown began to ease. This study relied on daily origin–destination matrices on a fairly fine geographical grid provided by only one MNO. This analysis of morning commuting was published in the Economic Outlook of June 17, 2020, along with other high-frequency indicators (INSEE, 2020). It relied on the same dataset as the one used in the INSERM–Orange lab study that characterizes mobility to inform an age-structured stochastic transmission model and evaluates the impact of the lockdown in curbing the COVID-19 epidemic (Pullano et al., 2020).
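As an illustration of how an origin–destination matrix can be turned into a commuting proxy, the sketch below sums morning trips between distinct zones and expresses them as an index relative to a reference day (reference = 100). The data structure, zone labels, and trip counts are assumptions made for the example, not the indicator definition used in the Economic Outlook.

```python
def inter_zone_trips(od_matrix):
    """Total trips whose origin and destination zones differ."""
    return sum(trips for (origin, dest), trips in od_matrix.items() if origin != dest)


def mobility_index(od_day, od_reference):
    """Inter-zone trips on a given day as a percentage of a reference day."""
    reference_trips = inter_zone_trips(od_reference)
    if reference_trips == 0:
        raise ValueError("reference day has no inter-zone trips")
    return 100.0 * inter_zone_trips(od_day) / reference_trips


# Hypothetical morning trip counts keyed by (origin zone, destination zone).
reference_day = {("A", "B"): 500, ("B", "A"): 300, ("A", "A"): 900}
lockdown_day = {("A", "B"): 150, ("B", "A"): 90, ("A", "A"): 400}

index = mobility_index(lockdown_day, reference_day)
print(index)  # 30.0
```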
Aside from disseminating statistical results, INSEE also published a post on its blog explaining its strategy and practical approach regarding MPD (Sémécurbe et al., 2020). At a time when all kinds of MPD were perceived by the public as a potential means of tracing, which legitimately raises privacy concerns, it was necessary to explain what was used by INSEE (anonymous aggregate counts), what for, and how the current work specific to the health crisis fitted into INSEE’s long-term strategy and methodological work, often carried out in collaboration with MNOs and other European NSIs.
All publications were positively received, and widely covered by the media. This initiative undoubtedly confirmed the need for official statistics information on population present within a given place and time in addition to residential population, and on daily mobility.
Nevertheless, comparisons with INSEE reference/benchmark statistics also showed some shortcomings of the indicators based on network signaling data, as produced by MNOs, for meeting official statistics needs and standards, namely the inherent uncertainty and variability of these kinds of data, and the population-adjustment process. The combination of sources from several MNOs allowed us to limit the impact of information collection problems. Measurement biases due to changes in the behavior of users could be corrected in the lockdown context through the final step of population adjustment performed by INSEE. However, this approach, which assumed that the overall population present on the territory was constant day after day, was made possible by the very particular context of closed borders.
As a whole, the results showed consistent trends but detailed understanding of the phenomena at stake at a local level remained sometimes arduous, as reported, for instance, by the INSEE regional representatives.
5. Strengthen the Collaborations Between MNOs, Research Institutes, and NSIs to Make the Most of the Public Interest of MPD
The rapid response to the crisis was made possible by the fact that there was already substantial methodological expertise within the NSI on mobile data and their potential use for public interest. To prepare for future epidemics and more generally for public interest uses of these data, this investment must continue and be deepened. In fact, the production of robust and reliable statistics of public interest requires going further in the combination of sources, including the mobilization of raw data. As described above, commercial indicators are not always adapted to the NSI’s needs for official statistics (Cousin and Hillaireau, 2018), which has led Eurostat and various NSIs to launch initiatives to design an end-to-end statistical process (Dattilo et al., 2016; Ricciato et al., 2020; ESSnet Big Data II WPI, n.d.). Consequently, work to assess the quality of the information that can be extracted from mobile data, and to establish reliable and transparent statistical processing methods that protect privacy, must continue. In this sense, multipartner research initiatives as well as experimental work on real data should be encouraged. As an example, the MobiTic project (measuring people’s mobility and presence using information and communication technologies), funded by the French National Research Agency, brings together teams from Gustave Eiffel University, INSEE, CNRS, and Orange. MobiTic aims to produce a reliable, representative, and open-source method for estimating the population present in a given place at a given time, as well as mobility statistics, by combining digital and traditional data (https://mobitic.huma-num.fr). Like some other initiatives, such as the OPAL project (OPen ALgorithms for better decisions, https://www.opalproject.org), it aims to prepare, test, and validate algorithms that could be integrated directly into the MNO Information System.
The technical issues of collecting, storing, and processing the massive data required for human mobility analyses have, in fact, long since been resolved.
6. Build Roles for Public Parties in Accordance with the Purposes They Are Mandated for
The great willingness of the parties to collaborate for the public interest in the context of the health crisis has made it possible to respond quickly and effectively. However, future partnerships could take even better advantage of the roles that the parties can play in accordance with the purposes they are mandated for. For instance, the principles under which an NSI can build partnerships with mobile operators are: (i) neutrality: an NSI is open to work with each and all mobile operators, (ii) protection of privacy and business secrets: an NSI will protect both, and (iii) transparency and quality control: an NSI operates on strict transparency principles, which require complete traceability of the data used to produce official statistics and access to the information needed to assess and ensure the quality of the products. The COVID-19 crisis gave INSEE the first opportunity to release multioperator statistics on the population present in a given place at a given time. This position must be consolidated. The NSI is a legitimate party to combine sensitive information, such as penetration rates, coming from several operators, since it is accustomed to protecting business secrets (NSIs already receive very detailed financial information on firms through tax records, for instance) as well as privacy.
7. A Regulatory Framework Aligned to GDPR
While the GDPR has well integrated the public interest objective of processing personal data (including geolocation data), the e-privacy regulation has not yet followed. EU telecom operators are subject to the e-privacy regulation and were therefore unable to make available the billing data they permanently store. Moreover, location data collected from electronic communication providers, such as MNOs, may only be processed within the remits of Articles 6 and 9 of the ePrivacy Directive. The national laws implementing the ePrivacy Directive specify that such data can only be used by the operator when they are made anonymous, or with the consent of the individuals. This regulation does not provide for an exemption for scientific research or public statistics, as Article 89 of the GDPR does (https://gdpr-info.eu/art-89-gdpr). The future regulation, currently under negotiation, should be aligned with the GDPR.
8. Investments in Privacy-Preserving Techniques
Whether for research or for official statistics, it seems necessary to be able to develop anonymized aggregates from raw data in collaboration with the operator. Developing high-quality indicators requires exploiting individual and longitudinal microdata over a long period of time for adjustment. The development of technical solutions such as multiparty computing, which by design preserves privacy, could be a way to limit the reidentification risk and, at the same time, to allay MNOs’ concerns about revealing sensitive business information.
Investing in privacy-preserving techniques: issues for MNOs
On top of confidentiality guarantees, secure multiparty computation in which different actors collaborate on producing a common output from private data could be realistically considered when leading to product improvement. Nevertheless, such new solutions would require financial investments that most operators consider out of reach given the fragility and uncertainty of the current market.
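To make the idea of secure multiparty computation concrete, the toy sketch below uses additive secret sharing so that several operators can obtain the sum of their private counts without any of them disclosing its own figure. It is an illustration only: the modulus, function names, and counts are hypothetical, and a real deployment would also need secure channels between parties, protection against collusion, and disclosure control on the output.

```python
import secrets

# Public modulus agreed by all parties; private values must be in [0, MODULUS).
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n_parties random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def secure_sum(private_values):
    """Simulate the protocol: only shares and partial sums are ever exchanged."""
    n = len(private_values)
    # Party i splits its private value and sends share j to party j.
    all_shares = [make_shares(value, n) for value in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums add up to the total of the private values.
    return sum(partial_sums) % MODULUS


# Hypothetical counts of devices observed in one zone by three operators.
total = secure_sum([1200, 950, 400])
print(total)  # 2550
```

Each individual share is uniformly random, so a single share reveals nothing about the value it was derived from; only the combination of all partial sums yields the total.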
9. Jointly Promote the Social Acceptability of a Reasoned Use of MPD
A central issue has been the tension between the reputational risks facing an MNO and its willingness to participate in the fight against the pandemic. Even when only statistical indicators were actually used, the risk of possible suspicion of individual tracing was critical for the operator’s decision-makers. In fact, bad press and the cost in terms of brand image seem to be a considerable risk to consider when using customer data, even when transformed into anonymous aggregates. However, the open collaborations with official statistics described here serve as an example and make it possible to clarify the processing conditions in the interest of privacy while emphasizing the public interest of the information finally released, as did the INSEE blog post that accompanied the press releases (Sémécurbe et al., 2020).
10. Articulating Commercial and Public Interest Rationales
The simultaneous treatment of MPD for commercial purposes and for public statistical uses creates obvious tensions. Both aims are legitimate but follow different modus operandi and can conflict with each other if not articulated. The circumstances of the COVID-19 crisis were so exceptional that the issue was easily overcome, on an ad hoc basis, during the apex of the crisis. But, defining ex-ante the principle of such articulation in more normal circumstances appears to be critical to the sustainability of any public–private partnership. The commercial potential of mobile data lies in addressing the specific requirement of public or private customers through custom-made treatment (i.e., specific geographical and temporal scope), whereas NSIs are producing reference statistics which are more general and do not aim to address specific needs. The potential overlap appears minimal, unless the market remains a niche market. In addition, one of the constraints to grow such a market lies in the difficulty for the potential clients to assess the accuracy and value of the statistics produced. The simultaneous exploitation of mobile data by operators and NSI can be an opportunity for the NSI to contribute to the qualification and improvement of the data extraction process, thus contributing to the development of the market. It relies on the capacity of the NSI to access the needed information on raw data treatment by mobile operators while legitimately preserving business secrets.
11. Articulating Transparency and Business Secrets Requirements
The business aims pursued by mobile operators imply protecting the intellectual property deriving from the investment made to produce the data and develop the associated data treatments. The methodological investments constitute legitimate business secrets, which can be protected by law. At the same time, when the raw data underlying the statistics are particularly sensitive on privacy grounds, the requirement to process the data transparently is even more acute. Accordingly, as partnerships are developed, mobile operators should further document the treatments applied to produce the commercial statistics, in order to improve their commercial value, while not revealing the core intellectual property. This documentation would not typically include the release of the detailed treatments and algorithms’ source code to third parties. The NSIs, which are routinely subject to the requirement of documenting their statistics, can assist in this process with their experience, to ensure that the information released is sufficient to assess the relevance and accuracy of the statistics produced. This potential contribution of NSIs appears as a natural synergy of NSI/mobile operator partnerships, as NSIs are required to understand and assess the relevance and quality of intermediate data used to produce public statistics.
12. Toward a Sound Governance
Even though all parties were willing to make exceptional efforts to contribute to addressing the challenges arising from the COVID-19 crisis, the need to ensure a strict compliance with relevant regulation and to clarify respective responsibilities was a real constraint. The principles under which an NSI can build partnerships with mobile operators should be reaffirmed. To enlarge the scope of future fruitful collaborations, governance schemes between mobile operators, NSIs, and public bodies should be further strengthened on an ex ante basis, as developing governance schemes in crisis periods is more challenging.
A future partnership between MNOs and INSEE? A conditional Yes!
The shared exploitation of mobile data by the individual MNOs and INSEE could be a real opportunity to improve the MNOs’ statistical production, to multiply the users of this underused dataset, and ultimately to contribute to the positive evolution of this market. However, a win–win partnership is achievable only if public indicators seldom interfere with the operators’ market, and as long as MNOs can rely on a financial incentive to put these operations in place.
Acknowledgements
The authors deeply thank Bouygues Telecom, Orange Business Services, and SFR data analytics units for having made the COVID-19 collaboration possible, as well as for their fruitful exchanges and discussions all along the COVID-19 collaboration, and hope this will continue. They are grateful to Pascal Chambreuil (OBS), Marc Jossermoz, and Loïc Lelievre (SFR Géostatistics) for fruitful discussions on building long-term partnerships between INSEE and mobile network operators. They also greatly thank Benoit Loutrel, Stefania Rubrichi, and Zbigniew Smoreda for helpful comments and discussions on earlier versions of the paper. This paper reflects the authors’ opinions.
Funding Statement
This work received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing Interests
The authors declare no competing interests exist.
Data Availability Statement
The data concerning the exchanges between INSEE and MNOs following-up on a specifically designed questionnaire are available from the authors with the permission of the MNOs.
Author Contributions
Writing-review & editing, E.C., M.P., and M.S.C. The authors contributed equally to the article.