The training of Artificial Intelligence (AI) models relies on extensive amounts of “data,” often sourced from content protected by copyright, related and sui generis rights. The discussion of whether and how to strike a balance between licensing and exceptions under copyright law is one of global relevance. While some countries have adopted or considered adopting specific exceptions to allow text and data mining (TDM), most have not introduced any new legislation. In Europe, much of the attention has so far centred on Article 4 of Directive 2019/790 (DSMD), including in the context of a potential UK reform.
The starting point of this contribution is the following four-fold observation. First, TDM may be part of AI training processes, but it is neither synonymous with AI training nor all that AI training entails, including in terms of acts restricted by copyright and related rights. Second, from a European (thus including both the EU and the UK) perspective, limiting the attention to Article 4 DSMD is myopic, as national case law demonstrates. Third, calls have recently been made to relax EU copyright rules to facilitate “research,” seemingly including by the President of the European Commission herself, who announced forthcoming legislative proposals “to make Europe the home of innovation again.” Fourth, the UK Government’s Copyright and AI consultation has recently ended: should no reform ultimately be undertaken, the application of the existing TDM exception will depend to a large extent on how courts construe the notion of “research” and its “non-commercial” requirement.
Building on the above, this study investigates whether and to what extent unlicensed AI training activities could be undertaken by relying, not on Article 4 DSMD as transposed into national law or a hypothetical reform of the UK system of exceptions, but rather on what appear to be so far potentially overlooked defences. Reference is made specifically to research and education exceptions, notably Article 3 DSMD and Article 5(3)(a) of Directive 2001/29 (InfoSoc Directive), also read in light of Article 5 DSMD. Other jurisdictions – including the US and countries, like South Korea and Singapore, which have adopted open-ended fair use-style defences – are also discussed. This is done to determine whether unlicensed AI training, including training seemingly done for the purpose of research or education/learning, might be considered lawful.
In light of the context summarized above, the study tackles two key questions: (a) whether unlicensed AI training may be classified as “research” or even “learning” in the context of “teaching,” and (b) whether commercial AI developers may take advantage of the provisions above. Ultimately, both questions are answered in the negative, finding that no exception or open-ended defence fully covers unlicensed AI training activities. As a result, a licensing approach (and culture) appears to be the way for AI training to be undertaken lawfully, including when this is done for “research” and “learning.”