Today’s technology giants have won market dominance through the collection, analysis, and synthesis of data. With the increasing dependence on digital technology, and increasing data dependency of said technology, data can be seen as a precondition to economic participation. Exploiting the steep economies of scale and network externalities of data, firms such as Google, Facebook, and Amazon are in positions of near monopoly. When such service providers disallow users from transferring their data to competing services, they can lock in users and markets, limiting the entry of market competition. Providing users with rights to both retrieve their data and transmit it to other firms potentially serves as a counterbalance, easing the acquisition of users for new market entrants. As such, data portability legislation has been claimed to have far-reaching implications for the private sector, reducing or hindering tools of forced tenancy. With users no longer married to a single firm, inroads for new technology are paved, with the average user more likely to have the ability and resource to change provider and adopt a solution that better suits their individual needs.
This chapter explores the concept of data portability in a world driven by artificial intelligence (AI). Section I maps out the journey that data takes in a data economy and investigates the valuation and cost of data. It posits that, because of data analytics and machine learning models, “generated” data, as data that has been derived or inferred from “raw” data, is of higher value in the data market, and carries a higher cost of production. Section II discusses what is required for the free flow of data in competitive datacentric markets: regulations on data tradability and portability. Our analysis leads to doubt that the newly introduced, hotly debated rules regarding portability of data under European Union (EU) law will adequately provide these prerequisites. The chapter concludes by suggesting an alternative model for data portability that distinguishes on a value basis rather than between personal and nonpersonal data.
I The Journey and Value of Data
This first section reviews the journey of data from collection to classification: the path from its moment of provision by a data subject to its subsequent transformation into inferred data. We present a categorization model that distinguishes data according to its origin, primarily distinguishing between raw and generated data. Utilizing these categories, we illustrate how data generated by machine learning models is being created at an exponential rate in today’s data-driven economy. Lastly, a data valuation model is introduced, holding that the value of generated data is higher than raw data, and that the value of generated data scales exponentially in aggregation. We claim that the added value of generated data is created by firms that carry the costs of providing big data analytics, including machine learning.
A Origin of Data
Data can be classified according to a variety of parameters. A classification model can rely on the sensitivity of the subject, purpose of use, context of procession, degree of identifiability, or method of collection of data. We build on a categorization model of data that was introduced by a roundtable of Organisation for Economic Co-operation and Development (OECD) privacy experts in 2014,Footnote 1 and expanded by Malgieri.Footnote 2 The taxonomy categorizes data according to its origin – that is, the manner in which it originated – and distinguishes between raw data (provided and observed data) and generated data (derived and inferred data).
Raw data (“user-generated data”) encompasses provided and observed data. Provided data is data originating from the direct actions of individuals (e.g., registration form filing, product purchases with credit card, social media post, etc.). Observed data is data recorded by the data controller (e.g., data from online cookies, geolocation data, or data collected by sensors).
Generated data (“data controller-generated data”) consists of derived and inferred data. Derived data is data generated from other data, created in a “mechanical” manner using simple, non-probabilistic reasoning and basic mathematics for pattern recognition and classification creation (e.g., customer profitability as a ratio of visits and purchases, common attributes among profitable customers). Inferred data is data generated from other data either by using probabilistic statistical models for testing causal explanation (“causal inferences”) or by using machine learning models for predicting output values for new observations given their input values (“predictive inferences”).Footnote 3
The relationship between information and the data subject can be classified as either strong (provided data), intermediate (observed and derived data), or weak (inferred data). The stronger the relationship, the more individuals are involved in the creation of the data. Illustratively, a Facebook user has a strong relationship with their registration data and their posts. An example of a weaker relationship would exist when Facebook, based on its algorithmic models, assigns a liberal or conservative political score to this user. The user’s age, geographic location, and posts are all data provided by the user, and eventually included as independent variables in the model. But it is Facebook’s model that will ultimately predict the likelihood the user belongs to either group.
The evolving relationship from provided to inferred data, or from a weak to strong relationship between the data subject and the data, is illustrated in Figure 11.1. Although the delimitation of data types is crucial, a number of gray areas exist. Take the example of a data subject that does not upload data themself, but actively selects which sets of data and their conditions the data controller may access. It is unclear whether these datasets are received or observed by the controller.Footnote 4 Note that inferred data is created not only by the analysis of a specific user’s data, but also by the analysis – via statistical learning and automatic techniques to elicit patterns – of all data available to the data generator, including personal data provided by other users.Footnote 5
B Artificial Intelligence and Data
With the rise of AI, generated data is expected to proliferate at an exponential rate. As more and more institutions take advantage of increasingly broad datasets, computing power, and mathematical processes,Footnote 6 the amount of generated data will expand and the costs of prediction decrease.Footnote 7 As pointed out by a recent report ordered by the House of Commons of the United Kingdom, protecting data helps to secure the past, but protecting inferences is what will be needed to protect the future.Footnote 8 Inferential data generated by machine learning techniques has already been used (with varying success)Footnote 9 to predict sexual orientation based upon facial recognition; emotions of individuals based on voice, text, images, and video; a neighborhood’s political leanings by its cars; and physical and mental predictions, to name but a few.Footnote 10 With the advancement of the technology and the availability of large training sets, the accuracy of inferred predictions will increase as well.
The predictive potential of machine learning is not confined to academic use cases, as commercial applications abound. Recent patent applications in the USA include methods for predicting personality types from social media messages,Footnote 11 predicting user behavior based on location data,Footnote 12 predicting user interests based on image or video metadata,Footnote 13 or inferring the user’s sleep schedule based on smartphone and communication data.Footnote 14 In all these instances, raw user data is collected on mobile devices (e.g., smartphones, tablets, etc.) to build and train the predictive model, and then used to predict individual user characteristics and behaviors (as generated data). This generated data is of value to marketing and advertising firms or organizations more generally to identify target users for their products and services.
C Valuation of Data
In general, the valuation of data is difficult, as it varies widely by type, scale, and industry sector. We make two assumptions that underly this chapter, and that support our position that generated data is of higher value than raw data. We claim that the higher value of generated data derives from the investment of firms in development, and subsequent use of statistical and machine learning models.
Our first assumption is that at the single datapoint level, raw data is on average of lower value than generated data. Our explanation for this assumption is as follows: raw data (such as the age of a data subject) is assumed to be, on average, of lower value to companies than generated data (such as future health predictions). In fact, in the marketplace, the price for general information, such as age, gender, and location, can be purchased for as little as $0.0005 or $0.50 per 1,000 people.Footnote 15 We assume that the price for creation of and access to generated data is higher.Footnote 16 The value of the datapoint integrates the value-added created by the respective algorithm. This is a generalizable claim despite specific and highly contextual differences. To provide a counterexample, data relating to diseases directly provided by a patient might be of higher value to an insurance company than a prediction based on that data.Footnote 17
Our second assumption is that, on a large scale, the value of raw data increases linearly, whereas the value of generated data increases exponentially. We make this assumption for the following reasons: for statistical or machine learning approaches, received and observed data will need to be purchased on a large scale in order to build models that will create inferred data. Since the accuracy of predictions is largely a function of the size of training datasets, we can assume that the value of received and observed data is close to zero for small-scale datasets. On the other hand, past acquisitions of datacentric companies reveal a significantly higher value per user, varying between $15 and $40. For instance, Facebook acquired WhatsApp and Instagram for $30 per user.Footnote 18 These per-user valuations reflect both the quality and scope of the information collected, as well as the expectation of continued platform engagement, and subsequent additional data creation by each acquired user.Footnote 19 In short, these acquisitions aim at exploiting trends and patterns in large groups with high confidence in the quality of the data. The process of value creation directly depends on investment in machine learning models needed to convert data into predictions.Footnote 20 These include direct operating costs, such as the employment costs of engineers, the licensing costs for software programs, the costs for obtaining more computer power and storage, and the costs of integrating systems with the implementation platform, as well as indirect costs such as training and change management costs or cybersecurity monitoring costs.Footnote 21 Therefore, the valuation of datacentric companies reflects the value of aggregated generated data or the potential for firms to create aggregated generated data.Footnote 22 We represent the respective value of raw data and generated data in Figure 11.2: with more data, the value of raw data increases linearly (a), whereas the value of generated data increases exponentially (b).
II Portability of Data
This second section analyzes the newly introduced regulation of data portability in Europe. With the goal of moving toward a single market for data, the EU has sought to remove obstacles to the free movement of data via two regulations regarding personal and non-personal data. We evaluate the newly introduced right to data portability under the General Data Protection Regulation (GDPR)Footnote 23 and the porting of data regime under the Non-Personal Data Regulation (NPDR).Footnote 24 Our analysis of both data portability concepts suggests that the current separation between personal and non-personal data does not provide for a comprehensive and coherent data portability regime.
A Free Flow of Data
EU law has a long tradition of shaping regulation to create a single market for goods, services, people, and capital. In recent years, the European Commission has emphasized the need for a data ecosystem built on trust, data availability, and infrastructure.Footnote 25 Ensuring the free flow of data is part of this effort to establish a “digital single market”.Footnote 26 Data is increasingly seen as a tradable commodity.Footnote 27 While the framework for trading data can be found in the traditional civil law rules for purchase contracts, the contract performance – that is, the actual transfer of the data – largely depends on the existence of data portability as a legal institution.Footnote 28 We are interested in how a regulatory framework for the market may level the playing field, challenging large incumbents with a vested interest in not transferring potentially valuable data to competitors.Footnote 29
The more data is concentrated in the hands of a provider, the more likely it will be considered to hold a dominant position under EU competition law.Footnote 30 Although the dominant competition law test is based on market share, not data concentration, said concentration is likely to lead to large market shares in data-driven markets.Footnote 31 The European Data Protection Supervisor has discussed how portability of data can foster a functioning market by preventing the abuse of dominance and the lock-in of consumers.Footnote 32 EU competition law, however, can generally be characterized as an ex post regulation: in fact, the European Commission only intervenes once a dominant position has been abused in already existing markets.Footnote 33
As digital markets are especially prone to winner-takes-all (or -most) outcomes,Footnote 34 additional ex ante regulations are key. The EU has set up a number of these ex ante mechanisms, in particular in sector-specific regulation. A prominent example of this is the telecommunications sector: the Universal Service Directive established a right to number portability, considered a predecessor to the right to data portability under EU law.Footnote 35 The portability of telephone numbers and of data facilitates effective competition and can be considered a form of ex ante regulation as it creates the prerequisites for establishing a functioning telecommunication market.
The free movement of data is further addressed in Art. 16(2)1 of the Treaty on the Functioning of the European Union (TFEU), which gives the EU legislator the power to establish rules regarding the protection and free movement of personal data. The GDPR confirms the free movement of data as a subject-matter of the regulation and postulates that the free movement of personal data within the EU shall not be restricted or prohibited for the protection of personal data.Footnote 36 These affirmations refer once more to the foundation of the EU: free movement of goods, services, people, capital, and now data in a single market. Since May 2019, the regime is complemented by the NPDR.Footnote 37 Targeting non-personal data, the NPDR is entirely based on the ideal of the free flow of data. According to the NPDR, the two regulations provide a coherent set of rules that cater for free movement of different types of data.Footnote 38
B Regimes for Data Portability
The portability of data is explicitly covered by both the GDPR and the NPDR. The former only applies to “personal data”, the latter to “non-personal data”.Footnote 39 EU law therefore clearly delineates personal from non-personal data.
Personal data is defined as “any information relating to an identified or identifiable natural person (‘data subject’)”.Footnote 40 The notion of personal data is broad, as it only requires that a natural person can be identified directly or indirectly. It is sufficient, for instance, that the link to the natural person can be established using other reasonably accessible information – such as a combination of specific browser settings used to track behavior for personalized advertising.Footnote 41
Non-personal data, by contrast, is any information that does not relate to an identified or identifiable natural person. Firstly, this encompasses data that originally does not relate to an identified or identifiable natural person, such as weather information or data relating to the operation of machines. Secondly, properly anonymized data cannot be attributed to a specific person and is therefore non-personal.Footnote 42 However, if non-personal data can be linked to an individual, the data must be considered personal.Footnote 43
1 Portability of Personal Data
The newly introduced right to data portability in Art. 20 GDPR gives the data subject the “right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided”. Data subjects shall have the right to receive the personal data concerning them and transmit that data to other controllers.
The provision mirrors the GDPR’s dual purpose: both the protection of personal data and the free flow of personal data. The right to the protection of personal data is intertwined with a market-centered economic approach to personal data.Footnote 44
Not all personal data is subject to the right of portability. Only personal data for which the processing is based on consent or a contractual relationship is covered by the norm.Footnote 45 This limitation largely corresponds to the requirement that the personal data in question was provided by the data subject.Footnote 46 Accordingly, raw personal data is covered by portability because provided data is, by definition, directly provided by the data subject and observed data is (by most accounts) considered as such.Footnote 47 Generated data, however, whether derived or inferred, is not considered as being provided by the data subject.Footnote 48 Therefore, a large share of personal data is not subject to portability as it is not provided by the data subject.Footnote 49
The GDPR provides a relatively strong right to data portability for the data subject. Data portability is seen from the data subject’s perspective, with a focus on data protection. Creating a comprehensive regime for the portability of all kinds of personal data was not the priority of the EU legislator, as shown by the exclusion of generated personal data. Although the norm is often discussed as being situated in the area of competition law – with its aim of facilitating the free flow of data – data portability under the GDPR is still being considered closer to genuine data protection law than to regulatory competition law.Footnote 50
2 Portability of Non-personal Data
With the NPDR, the EU encourages the porting of non-personal data.Footnote 51 The Internet of Things or industrial settings are major sources of non-personal data, as exemplified by aggregate and anonymized datasets used for big data analytics, data on precision farming, or data on maintenance needs for industrial machines.Footnote 52 The Regulation addresses two obstacles to non-personal data mobility: data localization requirements imposed by the public sector and private vendor lock-in practices.Footnote 53 Such a lock-in effect might exist if cloud services like data storage or cloud-based data applications do not ensure the portability of the respective data.
While the GDPR provides for an enforceable right of the data subject, the NPDR approaches portability differently. The regulation encourages self-regulatory codes of conducts; that is, legally nonbinding instruments. The norm expressively refers to best practices that should facilitate the porting of data through “structured, commonly used and machine-readable formats including open standard formats where required or requested by the service provider receiving the data”.Footnote 54 Meanwhile, codes of conduct on the porting of data and switching between cloud service providers have been developed by the cloud switching and porting data working group (SWIPO) for Infrastructure-as-a-Service (IaaS) and Software-as-a-Service (SaaS) cloud services.Footnote 55 These codes require, inter alia, the use of application programming interfaces (APIs),Footnote 56 open standards, and open protocols by cloud service providers.
C Analysis: The Concept of Data Portability
Our analysis depicts the limitations of existing EU law in providing for the free movement of data via a comprehensive and effective portability regime. In particular, we discuss how the distinction between personal versus non-personal data and raw versus generated data may impact the concept of data portability.
1 Distinction between Personal and Non-personal Data
The EU framework separates data into two types: personal and non-personal. This separation subjects data to different regulatory regimes – with a number of consequences in terms of portability. The distinction between personal and non-personal data is meant to preserve a high level of protection for data that can be related to an individual. The GDPR accordingly sets forth a right to access available for all type of personal data, whether raw or generated.Footnote 57 The NPDR targets data that is not related to an identifiable natural person. The interests of datacentric businesses stand in the center of the regulation. The free flow of data is therefore targeted from the data subject’s perspective as well as from the perspective of market regulation.
In theory, the distinction between personal and non-personal data appears straightforward. In practice, this is often not the case. For instance, large datasets where personal and non-personal data are mixed up (so-called mixed datasets) make it hard to identify the applicable legal regime. The NPDR recognizes this situation and addresses it by splitting up the application of both legal regimes to the respective type of data.Footnote 58 In cases where both types are inextricably linked, the application of the GDPR takes precedence (even if personal data represents a small part of the set only).Footnote 59 Addressing the complexity of GDPR compliance for mixed datasets can have a large impact on technology firms’ associated costs. Uncertainty still prevails in the field on how to avoid falling under the GDPR.
This lack of legal certainty provides an incentive to anonymize data. The underlying belief is that personal data can be turned into non-personal data by anonymization, as anonymization destroys the link to an identifiable person. Consequently, the NPDR takes into consideration future technological developments making it possible to turn anonymized data into personal data, with the consequence of then having to treat such data as personal data and to apply the GDPR to it.Footnote 60 Recent studies, however, have challenged the common understanding of anonymization. The European Commission itself has addressed these concerns, but remains committed to the belief that anonymization can be achieved in practice.Footnote 61 In an influential study, Rocher, Hendrickx, and de Montjoye showed that 99.98 percent of Americans would be correctly reidentified in any dataset using fifteen demographic attributes.Footnote 62 A range of additional studies have supported this point, with reidentification of supposedly anonymous datasets in healthcare, ride-sharing, subway, mobile phone, and credit card datasets.Footnote 63 All this raises doubts about whether the distinction between personal and non-personal data can be upheld in the future.
2 Distinction between Raw and Generated Data
The right to data portability under the GDPR only applies to personal data provided by the data subject. From the viewpoint of providing access to the market of social media services, the portability of raw data alone is considered sufficient to prevent customer lock-in. Although the controller uses raw data (provided and observed) to generate derived and inferred data, generated data is not considered as “provided by the data subject” in the sense of Art. 20(1) GDPR. As such, generated data does not fall under the right to data portability. However, if it qualifies as personal data, generated data is still subject to the right of access or the right to not be subject to automated individual decision-making.Footnote 64 Consequently, the GDPR offers the data subject access to its personal data and protection regardless of whether the data is raw or generated.Footnote 65
A reason why the right to data portability under the GDPR does not cover data created by the controller (i.e., generated data) might be that portability would here grant a strong advantage to competitors. Porting generated data would grant companies access to especially valuable data (Assumption 1), whose aggregation scales exponentially (Assumption 2).Footnote 66 The GDPR envisages a model where the data subject provides raw data to social media providers and leaves the additional value of the data to these providers as a compensation of their costs. But this is only justified in instances like Facebook, where the user “pays” with their data in exchange for the free use of the service. The service provider bears the cost of providing the social network and may recoup their investments by gaining profit from the added value the raw data gains through inferential information, illustratively via advertising. If the data subject, however, pays for a service, be it social networking or an analysis of their personal data, the situation is entirely different: the service provider’s costs are being compensated by monetary payment. The added value of the derived or inferred data should then remain with the data subject and fall under the scope of the right to data portability.Footnote 67
This situation is similar to the one envisaged by the NPDR: one between a customer and a service provider. When the customer provides raw data to the data service provider who conducts statistical analysis or prediction through machine learning on this data on behalf of the customer, the customer bears the cost of transformation of the data. As the costs are assigned to them, they should be able to obtain the value of the resultant generated data, to transfer it and switch providers. This right would in general already be subject to a civil law contract by which the relationship between service provider and customer is governed. The role and task of regulation would then only be to enforce portability in cases where service providers have market power to the extent that such portability and its conditions (portable file format, interfaces, etc.) is not subject to the service agreement. For this reason, the data porting rules under the NPDR may be insufficient as they are nonbinding and limited to self-regulatory measures. The European Commission or the respective member state authorities would need to take competition law measures based on abuse of dominant position, which have the limitation of being ex post in nature.
An already binding obligation of non-personal data portability can be seen in Art. 16(4) Digital Content Directive,Footnote 68 albeit limited to the area of digital content and digital services: the consumer is granted the right to request “any content other than personal data, which was provided or created by the consumer when using the digital content or digital service supplied by the trader”. This stipulation affirms our position: the value of digital content – that is, the data – created by the customer is assigned to the customer, leading to a right to retrieve that data in a commonly used and machine-readable format, as the second subparagraph states.
D Data Portability beyond the European Union
The EU has taken the lead in shaping the way the world thinks about data protection, privacy, and other areas of digital market regulation.Footnote 69 Its data protection standards in particular have been diffused globally.Footnote 70 Firstly, the ideas and concepts of the GDPR – in our case of data portability – have influenced a number of jurisdictions to enact data portability norms themselves. Secondly, international firms are bound directly by the GDPR’s and the NPDR’s extraterritorial scope, even without being established in the EU. Thirdly, because of the “Brussels Effect” foreign corporations often prefer to respect EU law even without a legal obligation to do so. Fourthly, international soft law has been and can be deployed to integrate data privacy principles from the EU, playing thereby a key role in the international governance of portability regimes. Fifthly, data privacy obligations have been stipulated in international treaties, requiring the implementation of data portability norms within the national law of ratifying states. In this regard, international economic law can help to diffuse data portability rules across the world.
1 Adoption by Third Countries
Numerous data protection laws around the world have emulated the GDPR, including its right to data portability.Footnote 71 A prominent example is the California Consumer Privacy Act, signed a month after the GDPR came into effect. The legislation incorporates portability in the context of the right to access as a modality of how electronic access should be provided; that is, in a portable format.Footnote 72 In comparison to the GDPR, the stipulation has an arguably broader scope, as all personal data is portable, and not only the personal data provided by the data subject. On the other hand, the norm is weaker as the businesses collecting personal information can provide access nonelectronically by simple mail, even if the data is stored digitally. Businesses are thus offered a way to circumvent portability, unless they do not mind the additional costs of mail delivery (which might be less than investing in interoperability).
Other examples of adoption include Benin, which enacted a GDPR-like legislation with its Code du numérique and included a right to data portability.Footnote 73 Brazil has adopted a new General Data Protection Law that introduces a right to the portability of data.Footnote 74 A possible codification of data portability is further vividly discussed by a number of countries.Footnote 75 Japan, for instance, has initiated a study to assess the merits and demerits of data portability, taking into consideration the costs for firms to establish portability.Footnote 76
2 Extraterritorial Application
EU law imposes itself on foreign entities by extending its scope of application beyond EU territory. Inspired by the famous Google Spain judgment of the European Court of Justice,Footnote 77 Art. 3(2) GDPR introduces a remarkably broad territorial scope: GDPR applies to controllers or processors not established in the EU if the processing activities are related to the offering of goods or services to data subjects in the EU or to the monitoring of their behavior within the EU.Footnote 78 Data portability can therefore be requested by an EU citizen or resident from a foreign – for instance, US – firm, if the activities of the firm fall under the GDPR.
Similarly, the NPDR applies in cases where the processing of electronic non-personal data in the EU is provided as a service to users residing or having an establishment in the EU, regardless of whether the service provider is established in the EU (Art. 2(1)(a) NPDR). However, since data portability under the NPDR is of a nonbinding nature, abiding is voluntary. As we suggest, a comprehensive data portability regime for personal and nonpersonal data would therefore be desirable at an international level.
3 Unilateral Power
The EU has been able to externalize its laws beyond its borders via the so-called unilateral power of the EU. While foreign firms are only bound by their national laws, they increasingly have been following EU data protection law.Footnote 79 This can, on the one hand, be explained by the advantages of international firms following a single rule, and preferring to harmonize their process and services for cost mitigation purposes.Footnote 80 In other words, it might be cheaper for a company to develop a single framework (that follows European data protection law), rather than two or more different ones (one following a stricter European regime, one a more lenient one). In the past, large technology companies like Facebook and Google have often made their data portability tools available to all their customers, independently of their location.Footnote 81 On the other hand, Apple took a staged approach and introduced its portability tool for users in Europe only in 2019 and made it available to US and Canadian users in 2020.Footnote 82 Apple, Facebook, Google, Microsoft, and Twitter are further contributing to the creation of an open-source framework connecting providers by translating provider specific APIs into common “data models” that can be transferred.Footnote 83
4 International Soft Law
In the past, a number of guiding documents from the EU, such as the Article 29 Working Party Guidelines on the right to data portability in particular, already had a major impact on the interpretation of data portability concepts.Footnote 84 The guidelines, set by this former advisory board that has been replaced by the European Data Protection Board (EDPB) representing the data protection authorities of the EU member states, have been subject to extensive academic discussion and scrutiny by corporations.Footnote 85
International soft law has long served as inspiration for national privacy codification, beginning with the OECD Privacy Guidelines of 1980, which were revised in 2013.Footnote 86 The Guidelines explicitly refer to personal data as an “increasingly valuable asset”. Their aim has been to foster the free flow of information by preventing unjustified obstacles to economic development, namely by setting a minimum standard for national legal frameworks on privacy. The original 1980 OECD Privacy Guidelines were influential at first, encouraging the adoption of data protection laws in eight countries outside Europe (including Canada and Japan), but their impact diminished when the EU adopted its Data Protection Directive in 1995,Footnote 87 which went beyond the OECD Guidelines.Footnote 88 The OECD Guidelines also influenced the Asia-Pacific Economic Cooperation (APEC) Privacy Framework.Footnote 89
As the OECD is reviewing its Guidelines, it could include a data portability norm in a future revision. However, as the OECD Guidelines only cover personal data, a right to data portability in the OECD Guidelines (alone) would not match its (optimal) scope.Footnote 90 Data portability should, in our view, rather be added to other international soft law instruments and encompass both personal and non-personal data.
5 International Hard Law
European portability concepts have been reflected in international treaties. This may be exemplified by the inclusion of a clause regarding the portability of telephone numbers in the Comprehensive and Progressive Agreement for Trans-Pacific Partnership (CPTPP), a trade agreement between Australia, Brunei, Canada, Chile, Japan, Malaysia, Mexico, New Zealand, Peru, Singapore, and Vietnam.Footnote 91 As mentioned in subsection B, the right to telephone number portability in the former Art. 30 Universal Services DirectiveFootnote 92 can be seen as a predecessor to data portability. Furthermore, the Eastern Caribbean Telecommunications Authority is planning to codify number portability in its new Electronic Communications Bill.Footnote 93
Against this backdrop, the question arises whether international trade agreements should include data portability provisions going forward – either in competition chapters or in dedicated electronic commerce or digital trade chapters. Regulation on an international level, however, would require a supranational understanding of the modalities of data portability. Because data flows cross-countries, the need for a coherent regulation of portability is strong. Nonetheless, views on the modalities of a portability regime differ across states. The type of data it should cover, the concrete definition of “portability”, the extent of interoperability required, the kinds of standardization of formats and interfaces, and whether retrieval “in a commonly used and machine-readable format” suffices are some of the many questions on which consensus should be reached. In this regard, the first experiences with the GDPR and NPDR will be crucial in determining the future of portability.
III Conclusion: Toward an Alternative Concept of Data Portability
The EU regulations regarding data portability have an ambitious aim: to contribute to the creation of an effective data ecosystem characterized by the free flow of data. Both regimes, however, were designed to address very specific situations – the GDPR regime for users and their free-of-charge social media provider; the NPDR regime for business customers and their big data analytics providers. Both regimes find application beyond the use cases they were designed for. Instead of distinguishing between personal and non-personal data, a better regime for data portability could hinge on whether the value of generated data serves as compensation for the respective service providers’ costs.
Ultimately, the distinction between personal and non-personal data can be challenged as inappropriate for data portability. Data portability is a concept that primarily serves the free flow of data rather than the protection of personal data. A classification distinguishing between raw and generated data has its advantages, particularly when it factors in the value of data. Competition law could rely more heavily on the value of data and its role in providing cost compensation, instead of using a terminology inherited from data protection law. Future data portability regimes may be better designed once they are removed from the realm of data protection. This assumes that the data subject is sufficiently protected by the remaining rights under the GDPR, especially via the right of access. Guaranteeing an effective right of access for raw and generated data is key.
Consequently, we propose that the difference in choice of the regulatory regime for data portability should be made with a view to the value of data and depending on whether it provides compensation for cost-bearing. Raw data, being assigned to the customer or the data subject, would be portable, while generated data would require a more refined regime depending on whether it serves as a means of compensation.