Introduction
Scholars interested in explaining judicial behavior often use court judgments as a primary source of information to test their theoretical expectations. In doing so, it may seem natural to approach the judgment as the main or only unit of observation (see, for example, Vanberg 2005; Owens et al. 2013; Corley and Wedeking 2014). After all, the main output of litigation is frequently a single court-produced document. However, this approach overlooks that judgments are neither disconnected from each other nor internally homogeneous. For one, judgments are interlinked because they address similar and related questions. This is obvious when a court in one judgment explicitly cites existing case law for a rule and then develops that rule to be applied in subsequent cases. Further, a court settling a dispute must frequently address multiple legal questions within a single judgment, including procedural, constitutional, and substantive questions.
Consider, for example, the judgment of the Court of Justice of the European Union (CJEU) in Laval.Footnote 1 The case concerned a labor conflict between Laval un Partneri Ltd., a Latvian corporation that had won a contract to construct school buildings in Sweden, and Swedish labor unions that blocked Laval’s access to the construction site to force it to enter into a collective bargaining agreement (CBA) with terms in line with other Swedish CBAs. Although the unions’ actions were supported by Swedish law, Laval challenged their legality on the grounds of EU law.Footnote 2 Laval convinced the Swedish court to request a preliminary ruling from the CJEU on two questions regarding the interpretation of EU law. In addition to answering those two questions, the Court also had to decide whether the request for a preliminary ruling was admissible. Thus, the CJEU’s judgment in Laval addresses three legal questions, two substantive and one procedural, that are connected to one dispute but distinct from each other.
These characteristics of judgments have implications for scholars of judicial politics who rely on quantitative text data in their work. As scholars’ focus shifted toward the evolution of law at the hands of judges, Clark and Lauderdale (2012, 329) diagnosed that the empirical literature “has struggled to keep pace” with theories of judicial decision making that center on the contents of judge-made law. Working closely with the text of judgments appears to be the most promising avenue to close this gap (Panagis and Sadl 2015), and recent studies illustrate the promise of using computer-assisted text analysis in the field of judicial politics (see Lauderdale and Clark 2014; Aletras et al. 2016; Dyevre 2020; Solan 2017; Vogel et al. 2018; Medvedeva et al. 2020).
To make full use of these powerful methods to explain how judge-made law develops, how individual judges’ preferences feed into their writings, and how external influences shape judges’ answers to the legal questions before them (see Clark and Carrubba 2012; Owens and Wedeking 2011; Corley and Wedeking 2014; Staton and Vanberg 2008), we argue that judgment texts ought to be split into blocks addressing individual, internally coherent issues, a concept we explain in more detail in Section 2. Viewing judgments as combinations of text blocks that address distinct issues unlocks a unit of observation that captures the aspects of judicial decisions we are often most interested in: the written reasoning of courts on the legal questions they need to address. In this article, we show that splitting judgments into issues is practically feasible and that it reveals patterns in case law and judicial decision making that studies relying on judgments as units of analysis struggle to uncover.
The article proceeds as follows. In Section 2, we conceptualize what we understand as issues in judgments. Throughout the remainder of the article, we then draw on our experience of working with judgments of the CJEU to illustrate the implementation and benefits of our issue-splitting approach. In Section 3, we show how machine learning classifiers can reduce the effort needed to split judgments into issues through manual coding. Empirical illustrations in Section 4 demonstrate how working with issue-split judgments improves our ability to identify the topical content and coherent clusters of judge-made law compared to relying on entire judgments. Finally, in Section 5, we discuss the implications of using issues rather than entire judgments as units of observation for studies applying standard econometric tools to study judicial behavior. We replicate a study by Larsson et al. (2017) on the CJEU’s strategic references to its own case law and compare results from an issue-level and a judgment-level analysis. Section 6 offers concluding remarks.
Splitting the judgment
The cases that judges hear engage with the law at various levels of generality. From a narrow and result-focused perspective, cases are about rulings. A ruling refers to the outcome in a specific case: how the court decided the lawsuit and, more specifically, how it ruled on the claim(s) brought before it by the parties. For example, one could say that a case is about whether a defendant should be required to pay damages to a plaintiff.
However, most modern studies of judicial behavior center on the questions that the court had to answer in order to decide the case (see Clark and Lauderdale 2010; Lax 2011). These questions are typically divided into two types: questions of fact and questions of law, that is, questions regarding what the law is on a particular point. All courts answer questions of fact and law, frequently several of each type. Questions of law are particularly significant in cases heard by apex courts, as their answers serve as models for deciding subsequent cases.
Previous research acknowledges that courts often need to address multiple legal questions to decide a case. For example, the Supreme Court Database offers data on judgments as a whole, split by what it refers to as issues and legal provisions, and split by actions (see Baum 2017; Epstein and Knight 1998; Segal and Spaeth 2002). We draw on the terminology of issues in judgments to characterize blocks of text within judgments that address distinct legal questions.Footnote 3 We conceptualize an issue as a connecting middle layer between the judgment level and the paragraph level that clusters paragraphs addressing the same legal question.
To illustrate, we return to the example of the CJEU’s judgment in Laval. Figure 1 provides a simplified illustration of the CJEU’s judgment in the case, clustering paragraphs into three blocks of text in the judgment: paragraphs 42 to 50 addressing the admissibility of the preliminary reference, paragraphs 53 to 111 addressing the Swedish court’s first substantive question,Footnote 4 and paragraphs 112 to 120 addressing its second substantive question.Footnote 5
The nature of the issues considered by different courts depends on the institutional and procedural context. In preliminary references answered by the CJEU, issues are closely tied to the questions referred by national courts and capture the CJEU’s interpretations of different aspects of EU law. The issues considered by the European Court of Human Rights (ECtHR), for instance, concern whether States party to the European Convention on Human Rights failed to respect the rights it guarantees, while courts like the German Federal Constitutional Court (GFCC) typically interpret different constitutional provisions in their judgments to determine whether a statutory act or ordinance infringed claimants’ constitutional rights.
Courts differ not only with regard to the kinds of issues they need to address but also, due to legal and stylistic differences, in how they signal a transition from one issue to another in their judgment texts. Nonetheless, we often find that courts write in a clear, consistent, and structured manner that gives rise to linguistic patterns that mark such transitions. Illustrations in Tables 1 and 2 from jurisprudence of the ECtHR, the GFCC, and infringement proceedings at the CJEU show that such patterns tend to be unique to a particular court or even to one type of procedure within a court.Footnote 6
The table provides typical examples of paragraphs starting an issue within judgments from three courts and the associated, recurring linguistic patterns we can find in these paragraphs.
The table provides typical examples of paragraphs concluding an issue within judgments from three courts and the associated, recurring linguistic patterns we can find in these paragraphs.
For example, in the ECtHR’s judgments, a text block addressing an issue can be distinguished from other issues because the court consistently begins its answer by restating the applicant’s claim, using the word “applicant” together with a verb associated with making a claim (e.g., “submit,” “complain,” or “contend”). Paragraphs concluding whether the State in fact violated a Convention article then commonly include words such as “accordingly” or “therefore” together with the patterns “been a violation” or “been no violation.” We should highlight that not all judgment texts follow patterns that mark both the paragraphs beginning and those concluding an issue, although the CJEU’s judgments on direct actions and preliminary references typically do (see below). The GFCC is an interesting example here: it is linguistically consistent and predictable when it starts reasoning on a new issue (in fact, by stating its conclusion) but not when it concludes. Even so, given that the GFCC consistently uses particular linguistic patterns, it should be feasible to split its judgments into issues by taking the blocks of text between paragraphs that start an issue. Overall, although the legal questions considered by courts such as the CJEU, the GFCC, or the ECtHR are distinctly different, each court makes use of recurring linguistic patterns, and we show below that once we identify the relevant legal context, we can exploit these patterns to split judgments into issues.
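To make the idea of recurring cues concrete, the following is a minimal Python sketch of a rule-based labeler. It is an illustration only, not the approach used in this article, which relies on supervised classifiers (Section 3). The regular expressions are hypothetical approximations of the ECtHR cues just described (“applicant” plus a claim verb; “accordingly” or “therefore” plus “been a violation” or “been no violation”), and `split_on_starts` shows the GFCC-style strategy of treating the text between consecutive issue-starting paragraphs as one issue. All function and pattern names are hypothetical.

```python
import re

# Hypothetical cue patterns approximating the ECtHR conventions described above.
START_APPLICANT = re.compile(r"\bapplicants?\b", re.IGNORECASE)
START_VERB = re.compile(r"\b(?:submit|complain|contend)\w*", re.IGNORECASE)
STOP_CONNECTIVE = re.compile(r"\b(?:accordingly|therefore)\b", re.IGNORECASE)
STOP_VIOLATION = re.compile(r"\bbeen (?:a|no) violation\b", re.IGNORECASE)


def label_paragraph(text: str) -> str:
    """Label one paragraph as 'start', 'stop', or 'residual'."""
    # Check the concluding cues first: a concluding paragraph may also
    # mention the applicant, but it will contain the violation formula.
    if STOP_CONNECTIVE.search(text) and STOP_VIOLATION.search(text):
        return "stop"
    if START_APPLICANT.search(text) and START_VERB.search(text):
        return "start"
    return "residual"


def split_on_starts(paragraphs, is_start):
    """GFCC-style splitting: each issue runs from one start paragraph up to,
    but not including, the next one. Paragraphs before the first start are
    left unassigned; the last issue runs to the end of the judgment."""
    issues, current = [], None
    for i, para in enumerate(paragraphs):
        if is_start(para):
            if current is not None:
                issues.append(current)
            current = [i]
        elif current is not None:
            current.append(i)
    if current is not None:
        issues.append(current)
    return issues
```

Patterns this coarse would need validation against hand-coded judgments; they will misfire on, for example, paragraphs that mention an applicant’s submission only in passing.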
Going through the trouble of splitting judgments into issues has several uses. First, we can classify what topic an issue concerns by analyzing the associated text or references, for example, using unsupervised approaches such as topic modeling or network-based clustering on the basis of references. Thus, we distinguish between the clustering problem of determining which text deals with the same legal question (issues) and the classification problem of determining the nature of that question (topics). While there is no objectively correct number of topics, for topics to be a useful analytical tool there should obviously be significantly fewer topics than issues. Second, issue splitting allows case-law references to be analyzed at the issue level and issue-to-issue citation networks to be constructed. Figure 2 illustrates how such a network would more accurately capture the relevant references between judgments, allowing for a more accurate representation and analysis of network structure and centrality. It would also enhance the ability to assess how central a judgment is in the context of a particular topic, as judgments need not belong entirely to a single cluster.
Splitting judgments using supervised classification
A researcher familiar with the jurisprudence of a court is capable of reading a judgment and assigning paragraphs within the text to distinct issues. Such manual content analyses are resource intensive, especially when researchers are dealing with large numbers of judgments (see Dyevre 2020). We argue that investing these efforts pays off, particularly where manual coding can be facilitated by computer-assisted methods. Existing research has shown that, once an adequate volume of manually coded data is available, machine learning classifiers can be trained to replicate even complex coding tasks on new data (see Lowe et al. 2011; Anastasopoulos and Bertelli 2020).
In the following section, we draw on our experience of working with the CJEU’s judgments in preliminary reference proceedings to highlight the characteristics of judgments that allow researchers to rely on supervised machine learning classification and mitigate the costs of manually splitting large volumes of judgments into issues.
Issues in the CJEU’s preliminary rulings
The CJEU’s decisions in so-called preliminary reference proceedings have left a deep imprint on the legal systems of EU Member States (Craig and de Búrca 2020). As national courts may (and, in some instances, must) submit a preliminary reference to the CJEU concerning the interpretation of EU law, the CJEU was able to enroll national courts as decentralized “enforcers of EU law” (Craig and de Búrca 2020, 497) and expand the reach of EU law even against the interests of Member States (see Alter 2001; Weiler 1994; Stone Sweet and Brunell 1998).
National courts often submit multiple questions concerning different aspects of EU law to the CJEU within a single reference, and the Court’s judgments typically comprise a discussion of the relevant national legal context and details from the original case before the national court. A researcher studying CJEU judgment texts to learn from its answers to national courts therefore needs to process the texts prior to analysis to avoid mixing in text elements that have little connection to their research questions.
We hired four research assistants with a background in European law to read a sample of the CJEU’s judgments in preliminary reference proceedings and separate text segments comprising the CJEU’s answers to national court questions from other elements of the judgments (e.g., national legal contexts, case facts). Our research assistants were instructed to identify three classes of paragraphs within each judgment: (1) paragraphs introducing the CJEU’s response to a national court’s referred question (question_start), (2) paragraphs stating the CJEU’s concluding response to the national court’s question (question_stop), and (3) paragraphs stating that a national court question does not require an answer from the CJEU (question_noanswer).Footnote 7 All remaining paragraphs in the judgment were assigned to a fourth, residual category (residual). Table 3 provides typical examples of the semantic structure of these paragraph classes. Section A in the online appendix provides further details on our research assistants’ manual coding of the CJEU’s judgments, illustrates with examples how these paragraph classes are embedded in the judgment texts, and discusses the intercoder reliability of the coding.
Note: Trained research assistants were asked to identify paragraphs marking the beginning of the CJEU’s answer to a national court’s referred question (question_start), and the Court’s concluding answer to that question (question_stop and question_noanswer). All remaining paragraphs were assigned to a residual category (residual).
Once our research assistants had coded all paragraphs within a judgment, we were able to identify text segments that concern a particular issue. A text segment capturing an issue begins at the paragraph marking the start of the Court’s answer (question_start), ends at the next paragraph marking the conclusion of the Court’s answer (question_stop), and comprises all paragraphs between these two. At the time of writing, our hand coders had completed their manual coding task for a sample of 1,080 preliminary references lodged with the CJEU between 1998 and 2011. In total, hand coders had identified 1,804 paragraphs of the class question_start, 1,804 corresponding paragraphs of the class question_stop, and 189 paragraphs of the class question_noanswer. We processed the hand-coded paragraphs’ texts, removed numbers and punctuation, stemmed each term, and tokenized terms into three-grams. We then identified the most frequently occurring three-grams per paragraph class, displayed in Table 4. The most frequent terms for the three classes question_start, question_stop, and question_noanswer displayed in Table 4 appear to have plausible connections to their respective paragraph classes, although some terms frequently appear in more than one class, a possible complication we address in Section 3.2.
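The segmentation rule just described can be sketched in a few lines of Python. This is a minimal illustration assuming one label per paragraph, in document order, using the four class names from the coding scheme. The text does not specify how an unmatched question_start or a stray question_stop should be handled, so the choices in the code (and the function name `segment_issues`) are assumptions.

```python
def segment_issues(labels):
    """Return inclusive (first, last) paragraph indices for each issue.

    An issue opens at a 'question_start' paragraph and closes at the next
    'question_stop' paragraph; all paragraphs in between belong to it.
    Assumptions (not specified in the text): a second 'question_start'
    before any 'question_stop' restarts the open segment, and a
    'question_stop' with no open segment is ignored.
    """
    issues, open_at = [], None
    for i, label in enumerate(labels):
        if label == "question_start":
            open_at = i
        elif label == "question_stop" and open_at is not None:
            issues.append((open_at, i))
            open_at = None
    return issues


labels = ["residual", "question_start", "residual", "question_stop",
          "residual", "question_start", "question_stop", "question_noanswer"]
print(segment_issues(labels))  # [(1, 3), (5, 6)]
```

Paragraphs labeled question_noanswer are left outside any segment here; whether they should form their own one-paragraph issues is a design choice this sketch does not settle.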
Note: The column “Most frequent features (three-grams)” shows the five most frequent three-grams per paragraph class (N = 13,797).
We show in the following section that these linguistic similarities across paragraphs within the same class allow us to put the collected data to work and train a supervised machine learning classifier that can replicate our research assistants’ task with high accuracy.
Classifying paragraphs
Our data comprise a total of 13,797 hand-coded paragraphs from our sample of 1,080 judgments the CJEU issued in preliminary reference proceedings.Footnote 8 After processing the text as described above and dropping any features that occur in 10 or fewer paragraphs, we randomly split the data for each paragraph class in half to create training and test sets and evaluate the performance of three supervised machine learning classifiers: a naive Bayes classifier, a random forest model, and a feedforward neural network.
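The feature-construction and splitting steps can be approximated in Python with the standard library alone. This sketch is an assumption-laden stand-in for the quanteda pipeline used in the article: it lowercases text, removes numbers and punctuation, and forms underscore-joined word three-grams (mirroring the feature names shown in Figure 3), but it omits stemming to avoid external dependencies. The vocabulary keeps features occurring in at least 11 paragraphs, and the split halves each class separately; the seed and all function names are hypothetical.

```python
import random
import re
from collections import Counter, defaultdict


def trigrams(text):
    """Lowercase, replace numbers, punctuation, and underscores with spaces,
    and return underscore-joined word three-grams. Stemming is omitted."""
    tokens = re.sub(r"[\W\d_]+", " ", text.lower()).split()
    return ["_".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def build_vocabulary(docs, min_docs=11):
    """Keep three-grams found in at least `min_docs` paragraphs, i.e., drop
    features that occur in 10 or fewer paragraphs."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(trigrams(doc)))  # count each feature once per paragraph
    return {feat for feat, n in doc_freq.items() if n >= min_docs}


def stratified_half_split(labels, seed=42):
    """Randomly split paragraph indices in half within each class.
    With an odd class size, the extra paragraph goes to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        half = len(indices) // 2
        train += indices[:half]
        test += indices[half:]
    return sorted(train), sorted(test)
```

Splitting within each class, rather than over the pooled data, guarantees that a rare class such as question_noanswer is represented in both the training and the test set.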
Naive Bayes classifiers are relatively simple machine learning classifiers that can be fit easily while often providing reasonable performance (see Kim et al. 2006). Random forests and neural networks are computationally more demanding classifiers, yet well suited to learning more complex patterns for classification tasks. All three classifiers learn from patterns in sparse document-feature matrices that represent paragraphs as bags of words, ignoring the sequence of features. In addition, we programmed a convolutional neural network (CNN) and a long short-term memory (LSTM) network, which incorporate the sequence of features during training, to solve our classification problem, but found that the bag-of-words approaches outperform these classifiers (see appendix Section B). We programmed all classifiers in R, relying on the quanteda, randomForest, and keras packages. All replication material, including text data, R code, and files of the trained models, is available in the supplementary material. Details on the tuning process used to define optimal hyperparameters for both the random forest model and the feedforward neural network are provided in Section B of the appendix.
Performance metrics for the naive Bayes classifier, the random forest model, and the feedforward neural network are reported for each paragraph class in Table 5.Footnote 9 Table 5 shows that the naive Bayes classifier performs reasonably well for the paragraph classes question_start, question_stop, and residual, yet poorly for question_noanswer. This is not surprising, as there are only 94 paragraphs classed as question_noanswer in our training data, well below the counts for the remaining paragraph classes and arguably too few to train a machine learning classifier adequately. However, turning to the metrics for the random forest model and the neural network, we can see that both classifiers perform remarkably well across all four paragraph classes, including question_noanswer. We find that the more sophisticated classifiers can effectively replicate the coding decisions of our research assistants (i.e., $F_1$ scores for both classifiers are well above or close to 0.90 across the four paragraph classes).
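For readers less familiar with the metric, the following minimal Python sketch computes per-class precision, recall, and $F_1$ (the harmonic mean of precision and recall) in a one-vs-rest fashion, assuming the standard definitions. It does not reproduce Table 5; the function name is hypothetical and the toy labels in the test are invented.

```python
def per_class_f1(y_true, y_pred):
    """Return {class: (precision, recall, f1)}, treating each class
    one-vs-rest; undefined ratios (zero denominators) are set to 0.0."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # correctly labeled c
        fp = sum(t != c and p == c for t, p in pairs)  # wrongly labeled c
        fn = sum(t == c and p != c for t, p in pairs)  # c labeled as something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores
```

Because $F_1$ is computed per class, a rare class such as question_noanswer can score poorly even when overall accuracy is high, which is why per-class reporting is informative here.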
Note: All classifiers were trained on identical $6{,}899 \times 10{,}175$ document-feature matrices (paragraphs × three-grams).
The key to successful classification is the CJEU’s consistent use of linguistic patterns. Patterns such as “by its question the referring court asks in essence” or “the answer to the question must be” are characteristic of paragraphs that begin and conclude, respectively, the Court’s reasoning on legal issues, while linguistic patterns such as “there is no need to answer” occur virtually exclusively in paragraphs our hand coders had identified as belonging to the class question_noanswer. These patterns allow machine learning classifiers to distinguish between paragraph classes even when the amount of available training data is limited. To illustrate, we plot the 30 most important features for the classification of paragraphs in our random forest model in Figure 3. All three-grams identified as the most important features are connected to linguistic patterns characteristic of the respective paragraph classes. Recall also that some of the features listed in Figure 3 frequently appear in more than one paragraph class (e.g., “must_be_interpret” frequently appears in the paragraph classes question_start and question_stop). Although this was initially a cause for concern, we find that all three classifiers rarely struggled to distinguish between the classes question_start and question_stop; instead, most misclassifications occur between the class residual and each of the remaining classes.Footnote 10, Footnote 11
The results we present here suggest that, instead of instructing our research assistants to manually classify paragraphs in the entire sample of judgments of interest, the task can be performed for a subset of this sample and the manually coded data can be used to train a classifier to complete the job for the remaining judgment texts. In our case, rather than having our research assistants classify paragraphs from all 2,460 preliminary rulings the CJEU issued between 1998 and 2011, we were able to limit their task to a sample of roughly 1,000 judgments.
Limitations
While these results are encouraging, our approach has limitations. We need to be confident that the same linguistic patterns present in the training data also appear in the documents to be classified by the trained classifier. For our purposes, given that we asked our research assistants to code a sample of preliminary references lodged with the CJEU between 1998 and 2011, the use of the classifier is, strictly speaking, limited to preliminary rulings issued within this time frame.Footnote 12 Without evidence that the linguistic patterns driving the performance of the classifiers discussed above are present in judgments outside the time frame of our analysis (for instance, via manual validation of a number of out-of-sample classifications), we would caution against using the same trained classifier to predict paragraph classes in older or more recent judgments, or even in judgments issued in other procedures (e.g., direct actions) or by other courts. Put simply, the use of a trained classifier is limited to the temporal-, institution-, and procedure-specific context of the training data.
Our approach does not make manual coding obsolete, and the investigating researcher still has to carefully select both the judgments on which a classifier is trained and the judgments that ought to be classified for a particular research project. However, even within the limitations outlined above, manually coding a subsample of judgment texts, training a machine learning classifier, and then classifying paragraphs in the remaining set of judgments of interest saves resources and is worth the effort. We believe that classifying paragraphs to split judgments into issues is beneficial for research in the domain of law and politics, and in the following sections we demonstrate these benefits for the CJEU’s preliminary rulings.
Putting issues to the test: The value of issue splitting
In this section, we highlight the practical benefits of issue splitting and show that working with issues as data is preferable to working with entire judgments when analyzing court decisions. First, we set out to identify topics in a sample of the CJEU’s case law and compare the results from a topic model estimated on complete judgments and a topic model estimated on judgments that had been split into issues. Second, we identify clusters of case law connected through references, again comparing the results of a network analysis performed on complete judgments and a network analysis of issue-split judgments.
Here, we use a sample dataset consisting of all 206 CJEU judgments in preliminary reference proceedings concerning the free movement of goods referred to the CJEU between 1998 and 2011.Footnote 13 The free movement of goods is one of the fundamental freedoms that serve as the pillars of the internal market, which in turn constitutes the heart of the European Union and EU law. The right to free movement of goods is enshrined in the Treaties, but the Treaty language is enigmatic, which has given rise to a significant body of CJEU case law clarifying the proper interpretation of EU law.
Classification using LDA topic modeling
Judgments address distinct legal questions. Knowing what those questions are and which judgments address which question is arguably the most important knowledge that lawyers have, but it is also important to scholars seeking to understand how courts behave in different areas of law. Automated text analysis allows us to predict topic labels from text (see, e.g., Aletras et al. 2016; Ashley and Bruninghaus 2009; Salaün et al. 2020), typically using latent Dirichlet allocation (LDA) topic modeling, which summarizes a corpus by assuming an unobserved structure of topics reflected in the individual documents of the corpus. Previous studies have used LDA topic modeling to identify topics in different bodies of case law (Carter et al. 2016; Lauderdale and Clark 2014; Panagis et al. 2016; Soh et al. 2019; Trappey et al. 2020; Venkatesh and Raghuveer 2013).
However, applying topic models to entire judgment texts may mask significant differences in the topics addressed in their constituent issues. To illustrate, we trained an LDA model and applied it to the CJEU’s judgment in Fazenda Pública, a judgment addressing two issues. Figure 4 shows that, applied to the entire judgment text, the model indicates that the judgment predominantly addresses Topic 4 and, to a lesser extent, Topic 8. However, when the model is applied to the two constituent issues, a different and clearer pattern emerges: Issue 1 predominantly concerns Topic 4, and Issue 2 is dominated by Topic 8.
In the following, we use LDA topic modeling to compare how well a topic model performs when classifying complete judgments and issue-split judgments. We examine whether and to what extent a topic model identifies the topic of an issue with greater probability than that of the judgment to which the issue belongs. Specifically, we compare the maximum topic probability of the issue to the maximum topic probability of the judgment, arguing that the better approach is the one that achieves the higher probability.Footnote 14
We first drew a random selection of 165 of the 206 judgments as our training set. Training an LDA model involves identifying words in the corpus that tend to appear together in documents, and the model is fit over multiple iterations to provide an efficient representation of the entire corpus as well as the documents of which it consists (Blei et al. 2003). Following standard text preprocessing,Footnote 15 we trained an LDA topic model that, to ensure that the model was not biased in favor of complete judgments or issues, included all judgment text twice: once with the entire judgment as the document and once with the issues as documents.Footnote 16 The resulting model essentially represents free movement of goods case law as 10 topics, each capturing the probability that certain words appear together. In Section C of the online appendix, we describe the topics that the LDA model identified and show that these relate to several distinct themes we would expect to find in CJEU judgments concerning the free movement of goods.
This model was then applied to classify the text of the remaining 41 judgments, our test set, identifying topics in unseen free movement of goods judgments.Footnote 17 To test the efficacy of issue splitting, the model was applied to classify (i) the complete text of each judgment, (ii) the text of each issue, and (iii) a filtered version of each judgment consisting only of text belonging to an issue.Footnote 18 This returns a classification of each document (here, either a judgment or an issue) expressed as the probability that the document addresses each of the topics in the model. The approach used for classifying the text is thus constant, the only changing variable being whether the judgment text is analyzed in its entirety or split into issues. We then match and compare the maximum topic probability of the issues to that of the judgments to which they belong, using both the complete text and the filtered version.
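The comparison of maximum topic probabilities can be sketched as follows. The sketch assumes that a fitted LDA model (not shown) has already produced a topic distribution, a list of probabilities summing to one, for each judgment and each issue, and it also shows how the filtered judgment in (iii) could be assembled from issue spans. Expressing the comparison as a percentage difference per issue is an assumption about how summary figures such as an average increase might be computed; the function names and the distributions in the usage example are hypothetical.

```python
def max_topic_prob(distribution):
    """Highest probability in one document's topic distribution."""
    return max(distribution)


def relative_gain(issue_distributions, judgment_distribution):
    """Percentage by which each issue's maximum topic probability exceeds
    the maximum topic probability of the judgment it belongs to."""
    baseline = max_topic_prob(judgment_distribution)
    return [100 * (max_topic_prob(d) / baseline - 1) for d in issue_distributions]


def filtered_judgment(paragraphs, issue_spans):
    """Variant (iii): keep only paragraphs inside an issue span, where each
    span is an inclusive (first, last) pair of paragraph indices."""
    keep = {i for first, last in issue_spans for i in range(first, last + 1)}
    return [p for i, p in enumerate(paragraphs) if i in keep]


# Invented topic distributions for one two-issue judgment:
gains = relative_gain([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]], [0.5, 0.3, 0.2])
print([round(g) for g in gains])  # [60, 20]
```

Comparing against both the complete and the filtered judgment separates the gain due to removing non-issue text from the gain due to splitting itself.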
Results for judgments containing more than one issue are displayed in Figure 5.Footnote 19 With few exceptions, the topic model performs markedly better on issues than on judgments. The maximum topic probability for issues is on average 45% higher than that of the complete judgments they were taken from and, in some instances, 100–400% higher.
This is in part attributable to a key advantage of issue splitting: “noise filtering.” Complete judgments contain portions of text unrelated to any legal question, such as presentations of the actors involved and discussion of litigation costs, and removing these improves classification accuracy. However, even if judgments are allowed to benefit from this filtering, the maximum topic probability for issues is on average 36% higher than the maximum topic probability of the filtered version of the judgments to which they belong.Footnote 20 Thus, even when disregarding the filtering function, issue splitting per se offers a significant advantage for text classification. In practical terms, this means that scholars using issue splitting will be able to classify judgments more accurately and, consequently, draw more accurate conclusions. Finally, while we demonstrated the benefits for one specific text classification approach, we expect other approaches to benefit similarly.
Community detection using network analysis
The last decade has seen a significant rise in the use of network analysis for studying courts, for example, on the basis of references between cases, but these studies have generally suffered from a lack of nuanced data (Panagis and Sadl 2015; Winkels et al. 2011). For example, following the approach of studies of other courts (see Fowler et al. 2007; Lupu and Voeten 2012; Winkels et al. 2011), Derlén and Lindholm (2014) studied the CJEU’s references to its own previous decisions and concluded that the systemic importance of some decisions, such as Bosman,Footnote 21 has been overlooked.
A key use of network analysis is community structure detection, also known as clustering, which identifies “densely connected groups of vertices, with only sparser connections between groups” (Newman 2006a, 8577). Community detection has a broad range of uses; applied to case law citation networks, for instance, it can identify communities of judgments addressing similar topics (Mirshahvalad et al. 2012). The most widely used measure of the quality of a set of communities is modularity, calculated as the fraction of edges within communities minus the expected fraction if edges were distributed at random (Newman and Girvan 2004, 7). Modularity is at most 1; values near 0 indicate no more community structure than would be expected by chance, and higher values indicate stronger community structure.
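To make the definition concrete, the sketch below computes Newman-Girvan modularity for a given partition. It is a minimal illustration that assumes an undirected, unweighted graph; the citation networks studied here are directed, and the community detection algorithms used may rely on directed or weighted variants. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ], where m is the
    number of edges, L_c the number of edges inside c, and d_c the summed degree of c.
    """
    m = len(edges)
    inside = defaultdict(int)   # L_c: edges with both endpoints in community c
    degree = defaultdict(int)   # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single bridge edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {n: 0 if n in "abc" else 1 for n in "abcdef"}   # one community per triangle
lumped = {n: 0 for n in "abcdef"}                        # everything in one community

print(round(modularity(edges, split), 3))   # 0.357
print(modularity(edges, lumped))            # 0.0
```

Placing all nodes in a single community always yields a modularity of 0, which is why partitions are judged by how far their value rises above it.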
We use modularity to evaluate the impact of issue splitting on community detection. We construct two citation networks based on references to CJEU judgments found in the test set described above: one based on references from and to complete judgments (judgment network), and one based on the same references but running from issues to judgments (issue network). We then apply six leading community detection algorithms to both networks and compare the modularity of the resulting communities. Table 6 shows that the communities in the issue network consistently have greater internal density and lower external density than those in the judgment network, in most cases by around 10%.
Note: The table displays the modularity of communities in the judgment network and the issue network, respectively, using the algorithms introduced by (in order) Rosvall and Bergstrom (2008); Pons and Latapy (2006); Blondel et al. (2008); Newman (2006b); Clauset et al. (2004); and Newman and Girvan (2004). Higher values indicate better-separated communities.
This means that a citation network based on issue-split judgments represents the structure of the case law more accurately. A more accurate understanding of which judgments belong to the same community is practically important: it is itself a form of topic identification, with the benefits explained immediately above, and it enables researchers to identify more accurately the centrality of judgments within a topic.
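The difference between the two networks compared above lies only in the citing node, as the following sketch illustrates. It assumes citation records of the form (citing judgment, citing issue, cited judgment); the identifiers and records are invented, and a community detection algorithm would then be run separately on each resulting edge list.

```python
# Hypothetical citation records: (citing judgment, citing issue, cited judgment).
citations = [
    ("J1", "J1#1", "J0"),
    ("J1", "J1#2", "J9"),
    ("J2", "J2#1", "J0"),
    ("J2", "J2#1", "J1"),
]


def judgment_network(records):
    """Edges run from the citing judgment to the cited judgment."""
    return [(citing_j, cited) for citing_j, _, cited in records]


def issue_network(records):
    """The same citations, but each edge starts at the issue whose text contains it."""
    return [(citing_i, cited) for _, citing_i, cited in records]


def nodes(edges):
    """All distinct nodes appearing in an edge list."""
    return {n for edge in edges for n in edge}


print(sorted(nodes(judgment_network(citations))))  # ['J0', 'J1', 'J2', 'J9']
print(sorted(nodes(issue_network(citations))))     # ['J0', 'J1', 'J1#1', 'J1#2', 'J2#1', 'J9']
```

Both networks contain the same number of edges. In the issue network, however, the two issues of J1 cite different judgments and can therefore be assigned to different communities, whereas in the judgment network J1 is a single node tied to both.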
Application: The CJEU’s strategic references to case law
In this final section, we show how moving from judgments to issues as units of analysis affects the specification and estimation of statistical models used to test theories of judicial behavior. Existing research has argued that judges at the CJEU are aware that EU Member States are instrumental in the implementation of its judgments and may attempt to override unfavorable decisions of the Court (Garrett et al. 1998; Carrubba et al. 2008; Carrubba and Gabel 2015; Larsson and Naurin 2016). The CJEU is therefore sensitive, and evidently responsive, to the interests of Member States. Drawing on this literature, Larsson et al. (2017) argue that the CJEU uses legal justifications as a legitimation strategy when its decisions run counter to Member States’ interests: “[T]he Court argues more carefully, by means of reference to precedent, when it takes decisions that conflict with the positions of EU governments” (Larsson et al. 2017, 881).
Their empirical analysis draws on two separate datasets. Data provided by Derlén and Lindholm (2014) capture citation patterns between preliminary rulings and are complemented by information on the CJEU’s and Member States’ positions on the questions addressed in preliminary rulings between 1998 and 2011 (see Naurin et al. 2015). While the units of observation for the CJEU’s references to precedent are judgments, actors’ positions are recorded for the specific questions that national courts referred to the CJEU, and a single CJEU judgment typically deals with multiple questions from the national court. Larsson et al. (2017) resolve this discrepancy in units of observation by aggregating data on actors’ positions to the judgment level.
Avoiding aggregation of data at the judgment level is potentially critical, as the following example demonstrates. In Case C-324/99 DaimlerChrysler AG v. Land Baden-Württemberg, the CJEU considered four distinct issues within a single judgment. On two of these issues, Member States supported the Court’s conclusion; on another, Member States held no clearly identifiable position; and on the final issue, Member States opposed the Court’s answer. Aggregated to the judgment level, these positions suggest that Member States overall supported the CJEU’s conclusions, and the judgment-level data show that the Court made four references to its own case law. Our issue-level data reveal, however, that all four of these references were made in response to the first issue, while the Court made no references in its answer to the final issue despite facing opposition from Member States. Such patterns, which are at odds with Larsson et al.’s expectations, are lost in aggregation.
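The loss of information in the DaimlerChrysler example can be reproduced in a few lines. This sketch encodes the issue-level positions and reference counts as summarized above. Two details are assumptions: the text does not say which of the first three issues was the ambivalent one, so the ordering below is illustrative, and the plurality rule used to aggregate positions is a hypothetical stand-in for the aggregation actually used by Larsson et al. (2017).

```python
from collections import Counter

# Issue-level coding of C-324/99 DaimlerChrysler as summarized in the text.
# Assumption: which of issues 1-3 was ambivalent is not stated; this order is illustrative.
issues = [
    {"issue": 1, "ms_position": "in favor", "outdegree": 4},
    {"issue": 2, "ms_position": "in favor", "outdegree": 0},
    {"issue": 3, "ms_position": "ambivalent", "outdegree": 0},
    {"issue": 4, "ms_position": "in conflict", "outdegree": 0},
]


def aggregate(issue_rows):
    """Hypothetical plurality rule collapsing issue rows into one judgment-level row."""
    position, _ = Counter(r["ms_position"] for r in issue_rows).most_common(1)[0]
    return {"ms_position": position,
            "outdegree": sum(r["outdegree"] for r in issue_rows)}


print(aggregate(issues))   # {'ms_position': 'in favor', 'outdegree': 4}
# At the issue level, the one issue facing Member State opposition has no references.
print([(r["ms_position"], r["outdegree"]) for r in issues if r["ms_position"] == "in conflict"])
```

The aggregated row pairs four references with a favorable Member State position, while the issue-level rows show that the issue in conflict received none.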
Operationalizations of outcome and explanatory variables
We first identify the CJEU’s citations of its previous case law in the texts of the preliminary rulings using regular expressions.Footnote 22 Because our issue-splitting approach provides the text block for each issue, we can identify citations both at the judgment level ($N=206$) and at the issue level ($N=487$). As in Larsson et al. (2017), we then construct a variable, Outdegree, which counts the number of outward citations for a given unit of observation (i.e., a judgment or an issue).Footnote 23 A specific case can be cited multiple times in a judgment, and we count each of these instances. Consequently, the count of outward citations at the judgment level equals the sum of outward citations across the issues within the judgment.
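The counting logic behind Outdegree can be sketched as follows. The regular expressions actually used are described in the footnote and are not reproduced here; the pattern below is a hypothetical simplification that only matches case numbers of the form "C-123/45" or "T-123/45" and would, for instance, miss older case numbers without a prefix. The issue texts are invented.

```python
import re

# Hypothetical, simplified pattern for case numbers such as "C-100/01" or "T-200/02".
CASE_NO = re.compile(r"\b[CT]-\d{1,3}/\d{2}\b")


def outdegree(text: str) -> int:
    """Count every citation in a text block, including repeated citations of one case."""
    return len(CASE_NO.findall(text))


# Invented issue texts belonging to a single judgment.
issue_texts = {
    "issue 1": "As held in C-100/01, the rule applies (see C-100/01 and C-200/02).",
    "issue 2": "The Court refers to C-300/03.",
    "issue 3": "No case law is cited here.",
}

per_issue = {name: outdegree(text) for name, text in issue_texts.items()}
judgment_outdegree = sum(per_issue.values())   # judgment count = sum over its issues

print(per_issue)            # {'issue 1': 3, 'issue 2': 1, 'issue 3': 0}
print(judgment_outdegree)   # 4
```

Because repeated citations are counted, C-100/01 contributes twice to the first issue.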
Following Larsson et al. (2017), we also construct a variable, MS Conflict, comprising three categories: (1) in conflict, indicating that the CJEU favored an interpretation of EU law that would restrict Member States’ autonomy while Member States’ net position favored an interpretation that preserved national autonomy; (2) in favor, indicating that Member States’ net position aligned with the CJEU’s position on a ruling concerning national autonomy; and (3) ambivalent, indicating that no clear implications regarding the effects of legal integration on national autonomy could be drawn from the CJEU’s position, the Member States’ position, or both.Footnote 24
The same approach was used to measure whether the CJEU’s positions conflicted with those of the Advocate General (AG) and the European Commission, captured by the variables AG Conflict and Commission Conflict, respectively. We reconstruct MS Conflict, AG Conflict, and Commission Conflict for our subset of preliminary rulings at both the judgment level and the issue level.Footnote 25
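The three-category coding of MS Conflict can be sketched as a small mapping. This is a minimal illustration under stated assumptions: positions are encoded as "restrict" or "preserve" national autonomy, or None when no clear implication can be drawn, and the net-position rule (whichever side more Member State submissions take) is hypothetical. The definitions above do not cover a Court that preserves autonomy against a Member State majority favoring integration, so the sketch returns None in that case.

```python
from typing import Optional


def net_position(observations: list[str]) -> Optional[str]:
    """Hypothetical rule: the side ("preserve" or "restrict") taken by more submissions."""
    preserve = observations.count("preserve")
    restrict = observations.count("restrict")
    if preserve > restrict:
        return "preserve"
    if restrict > preserve:
        return "restrict"
    return None  # tie or no position taken: no clear implication


def ms_conflict(cjeu: Optional[str], ms_net: Optional[str]) -> Optional[str]:
    """Map the CJEU's position and the Member States' net position to MS Conflict."""
    if cjeu is None or ms_net is None:
        return "ambivalent"      # (3) no clear implication for one side or both
    if cjeu == ms_net:
        return "in favor"        # (2) net position aligned with the Court
    if cjeu == "restrict" and ms_net == "preserve":
        return "in conflict"     # (1) Court restricts autonomy against Member States
    return None                  # combination not defined in the text


print(ms_conflict("restrict", net_position(["preserve", "preserve", "restrict"])))  # in conflict
print(ms_conflict("preserve", net_position(["preserve"])))                          # in favor
print(ms_conflict(None, net_position(["preserve", "restrict"])))                    # ambivalent
```

The same mapping could be applied to AG Conflict and Commission Conflict by passing the single actor’s position directly in place of a net position.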
Estimation
Not every CJEU judgment comprises multiple issues. Of the 206 judgments, 85 contain only one issue, while the remaining 121 discuss two or more. Rather than estimating judgment-level regressions after aggregating the values of issue-level predictors, we incorporate the hierarchical structure of our data in the statistical model and estimate a multilevel regression (Gelman and Hill 2007). Some of the control variables included in the original model by Larsson et al. (2017) are measured at the judgment level; our multilevel model can handle predictors at both the issue and the judgment level, while accounting for judgment-level variation by allowing intercepts to vary across judgments.
Given that our outcome variable is a discrete count, we estimate a negative binomial multilevel regression model. In light of the relatively large number of multilevel model parameters to be estimated from a relatively small dataset, we follow the advice of Gelman and Hill (2007) and opt for Bayesian estimation of the parameters’ posterior distributions, specifying uninformative priors and running four chains with 10,000 sampling iterations. All estimations are implemented with the rstanarm package for R.
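One way to write the varying-intercept negative binomial model described above, following the notation of Gelman and Hill (2007), is given below. The symbols are our own and serve only as an illustration: $y_{ij}$ is the Outdegree of issue $i$ in judgment $j$, $\mathbf{x}_{ij}$ collects issue-level predictors (such as MS Conflict, AG Conflict, and Commission Conflict), $\mathbf{z}_j$ collects judgment-level controls, and $\phi$ is the dispersion parameter. Priors are omitted.

```latex
\begin{aligned}
y_{ij} &\sim \operatorname{NegBin}(\mu_{ij}, \phi), \qquad
  \operatorname{Var}(y_{ij}) = \mu_{ij} + \mu_{ij}^{2}/\phi,\\
\log \mu_{ij} &= \alpha_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta},\\
\alpha_j &\sim \mathcal{N}\!\left(\gamma_0 + \mathbf{z}_j^{\top}\boldsymbol{\gamma},\ \sigma_\alpha^{2}\right).
\end{aligned}
```

Judgment-level controls thus shift the intercept shared by all issues in a judgment, while issue-level predictors vary within it. In rstanarm, a model of this form would typically be specified with the stan_glmer.nb function and a (1 | judgment) term in the formula.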
Results
Figure 6 plots the posterior means of the regression coefficients along with their 95% highest probability density (HPD) intervals for the judgment-level and the multilevel regressions. The reference categories for the three variables MS Conflict, AG Conflict, and Commission Conflict are observations indicating no conflict between the CJEU’s position and the positions of Member States, the AG, and the Commission, respectively.
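An empirical HPD interval can be computed directly from posterior draws, as sketched below. This is a minimal illustration, similar in spirit to HPDinterval in the R package coda, that assumes a unimodal posterior: it sorts the draws, considers every window containing 95% of them, and keeps the narrowest. The function name and the simulated draws are hypothetical.

```python
import random


def hpd_interval(draws, prob=0.95):
    """Shortest interval containing a share `prob` of the posterior draws."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, int(round(prob * n)))   # number of draws the interval must contain
    # Width of every window of k consecutive sorted draws.
    widths = [xs[i + k - 1] - xs[i] for i in range(n - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)   # narrowest window
    return xs[i], xs[i + k - 1]


# Simulated draws for one coefficient (normal posterior, mean 0.5, sd 0.2).
rng = random.Random(1)
draws = [rng.gauss(0.5, 0.2) for _ in range(20_000)]
lo, hi = hpd_interval(draws)
print(f"95% HPD: [{lo:.3f}, {hi:.3f}]")
```

For a symmetric posterior the HPD interval is close to the equal-tailed interval (here roughly 0.5 ± 1.96 × 0.2); an interval that excludes zero is what marks a coefficient as distinguishable from zero in Figure 6.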
Two patterns stand out in Figure 6. First, coefficient estimates for the categories indicating conflict between the CJEU’s and the respective actors’ positions for MS Conflict, AG Conflict, and Commission Conflict are broadly similar across the judgment-level and multilevel regressions. The CJEU is more likely to reference its own case law when its position conflicts with the expressed positions of Member States and the AG, relative to instances in which their respective positions align, while no such effect is discernible when the CJEU’s position conflicts with the Commission’s position. This evidence is consistent with the expectation formulated by Larsson et al. (2017) that the CJEU makes an effort to embed its decisions in existing case law when facing an adverse environment, in order to signal the legal legitimacy of an otherwise controversial decision.
Second, the coefficients for MS Conflict: Ambivalent and AG Conflict: Ambivalent differ markedly between the judgment-level regression and the multilevel regression. Results from the judgment-level regression suggest that the Court makes an additional effort to embed its decisions in existing case law even when Member States’ positions are ambivalent, and that it makes fewer references to existing case law when the AG’s position is ambivalent. The coefficients from the multilevel regression, however, show that neither of these inferences holds once we consider the CJEU’s positions on the actual issues discussed in a judgment: the coefficients for MS Conflict: Ambivalent and AG Conflict: Ambivalent are indistinguishable from zero.
These differences arise from the nuance that is lost when data are aggregated at the judgment level. Figure 7 plots the distribution of the explanatory variable MS Conflict at the judgment level and at the issue level. Splitting the CJEU’s preliminary rulings into issues reveals that most issues fall into the ambivalent category, that is, no clear implications for national autonomy can be drawn from the CJEU’s position, the Member States’ position, or both. Our results suggest that the loss of nuance from aggregating data at the judgment level ultimately translates into coefficients that would lead researchers to draw misleading inferences from their analyses.
In the final step, we show that our issue-level data not only uncover substantively important differences from analyses relying on aggregated data but also allow us to predict more precisely how many references the CJEU makes to case law in its judgments. We first predict the number of citations for each judgment using the coefficients from our judgment-level regression and compare these predictions to the observed citations. We then predict the number of citations for each issue using our issue-level regression coefficients, sum the predictions for issues belonging to the same judgment, and again compare these sums to the observed citations at the judgment level. Figure 8 shows the distributions of residuals from these two approaches: residuals from our issue-level analysis are more tightly clustered around zero.
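The two ways of producing judgment-level residuals compared in Figure 8 can be sketched as follows. The predictions would in practice come from the fitted models; here, all numbers and identifiers are invented for illustration and are not the article’s results. The mean absolute residual is used as one simple summary of how tightly residuals cluster around zero.

```python
from collections import defaultdict
from statistics import mean


def residuals_judgment_model(observed, judgment_preds):
    """Observed minus predicted citations, with one prediction per judgment."""
    return {j: observed[j] - judgment_preds[j] for j in observed}


def residuals_issue_model(observed, issue_preds):
    """Sum issue-level predictions within each judgment, then compare with the observed count."""
    summed = defaultdict(float)
    for judgment_id, pred in issue_preds:
        summed[judgment_id] += pred
    return {j: observed[j] - summed[j] for j in observed}


# Invented numbers for illustration only.
observed = {"J1": 4, "J2": 0, "J3": 2}
judgment_preds = {"J1": 1.5, "J2": 1.8, "J3": 1.2}
issue_preds = [("J1", 3.1), ("J1", 0.4), ("J2", 0.3), ("J3", 1.0), ("J3", 0.6)]

r_judgment = residuals_judgment_model(observed, judgment_preds)
r_issue = residuals_issue_model(observed, issue_preds)
print("mean |residual|, judgment model:", round(mean(abs(r) for r in r_judgment.values()), 2))
print("mean |residual|, issue model:   ", round(mean(abs(r) for r in r_issue.values()), 2))
```

Summing issue-level predictions before comparing keeps both approaches evaluated against the same judgment-level observations.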
Conclusion
Scholars of law and judicial politics have urged students of judicial behavior to center their attention on the text of courts’ jurisprudence (see, for example, Lax 2011). Tiller and Cross (2006, 523) argue that “[t]he language of the opinion at least purports to establish the rules to govern future cases, but political science researchers have generally disregarded the significance of this language.” Until recently, empirical analyses of judicial behavior had overlooked that “decisions are often most important because of the qualitative changes in law that they effect, rather than because of the decision they provide on the case facing the Court” (Clark and Lauderdale 2010, 871).
However, once scholars shift their attention to the language of court decisions, they are faced with large volumes of text from judgments that commonly address multiple issues. In this contribution, we introduced an approach that structures the text of judgments into clusters of paragraphs dealing with distinct, internally consistent issues. We showed that supervised classification can facilitate splitting judgments into issues by exploiting recurring linguistic patterns in judgment texts. Although our approach does not eliminate the need for manual coding, it reduces the time and effort coders would otherwise need to identify distinct issues in a sample of judgments. A key benefit of our approach is that researchers end up with the actual text of each issue discussed in a judgment. This opens up a variety of opportunities for empirical research. Rather than having to rely on full judgment texts, which often include more information than needed, scholars may construct measures of relevant aspects of judicial behavior based on word counts, lexical diversity, or sentiment analysis specific to each substantive issue a court considered in its judgment.
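As an illustration of such issue-specific measures, the sketch below computes a word count and a type-token ratio, one simple measure of lexical diversity, for a single issue’s text. The tokenization rule (runs of letters, lowercased) and the function name are our own assumptions; more robust lexical diversity measures exist, and the type-token ratio is sensitive to text length.

```python
import re


def issue_measures(text: str) -> dict:
    """Word count and type-token ratio for the text of one issue."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())   # runs of letters, any script
    n = len(tokens)
    return {"words": n, "ttr": len(set(tokens)) / n if n else 0.0}


print(issue_measures("The Court held that the rule applies"))
# 7 words, 6 distinct ("the" repeats), so ttr = 6/7
```

Applied to each issue’s text block rather than to the full judgment, such measures reflect the language a court used on that specific question.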
Our experience of splitting a subset of the CJEU’s preliminary rulings into issues suggests that supervised classification allows us to provide structure to complex judicial decisions without having to read every single word within them. We are conscious, however, that the context of preliminary reference proceedings, in which national courts submit distinct legal questions for the CJEU to resolve, appears particularly well suited to our approach. Nonetheless, we are confident that our approach can be used or modified to identify similar structures and issues in other courts’ jurisprudence as well. Whenever courts use recurring linguistic patterns in their judgments, researchers can employ machine learning classifiers trained to identify such patterns to provide structure to large volumes of unstructured text. Our approach thus helps to reduce the complexity of courts’ jurisprudence that would otherwise present obstacles to text-driven research on judicial behavior.
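The role of recurring linguistic patterns can be illustrated with a deliberately simple rule-based stand-in. This is not the supervised classifier described in this article: it assumes that a new issue begins at any paragraph opening with a phrase such as “By its first question”, a formula the CJEU commonly uses in preliminary rulings, and the example paragraphs are invented. A trained classifier would learn such cues from labeled data and could also separate out unrelated text, which this rule cannot do.

```python
import re

# Hypothetical cue: "By its first question", "By its second and third questions", etc.
CUE = re.compile(r"^\s*By its (?:\w+ )*questions?\b", re.IGNORECASE)


def split_into_issues(paragraphs):
    """Start a new issue at each cue paragraph; text before the first cue is preamble."""
    preamble, issues = [], []
    for p in paragraphs:
        if CUE.match(p):
            issues.append([p])         # cue paragraph opens a new issue
        elif issues:
            issues[-1].append(p)       # otherwise attach to the current issue
        else:
            preamble.append(p)
    return preamble, issues


paragraphs = [
    "Judgment",
    "Legal context ...",
    "By its first question, the referring court asks, in essence, whether ...",
    "The Court notes that ...",
    "By its second and third questions, the referring court asks ...",
    "Costs",
]
preamble, issues = split_into_issues(paragraphs)
print(len(preamble), [len(i) for i in issues])   # 2 [2, 2]
```

Note that the final “Costs” paragraph is attached to the last issue, exactly the kind of noise that a trained classifier is meant to filter out.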
Acknowledgments
We are thankful for thoughtful feedback from Daniel Naurin, Lisa Lechner, Benjamin Engst, Theresa Squatrito, Måns Magnusson, and Andreas Östling on earlier versions of the manuscript. We would also like to thank the two anonymous reviewers and the editor for their useful suggestions on how to improve the manuscript.
Funding Statement
This research was conducted as part of the IUROPA project (www.iuropa.pol.gu.se), financed by the Swedish Research Council Project No. 2018-04215.
Data Availability Statement
All replication material is available at the Journal’s Dataverse archive.
Supplementary Materials
To view supplementary material for this article, please visit https://doi.org/10.1086/717421