1. Introduction
The introduction of a novel noun to French, whether by borrowing, spontaneous neology or any other process of word formation, necessarily brings with it the question of the word's grammatical gender. Despite a noted preference in the literature for the masculine as the “default” gender in French, many factors come into play in determining the ultimate gender of a given noun, such as lexical, semantic and phonological factors, as well as diatopic and diastratic variation (grosso modo, geographic and socioeconomic factors, respectively) and finally the attitudes of the speech community towards linguistic authority.
Given its sudden but well-documented genesis, the word “COVID-19”Footnote 1 provides an interesting case study in the morphosyntactic incorporation of neologisms in French. It effectively serves as a microcosm for observing both the establishment of norms in borrowing and the real-time influence of the previously noted factors in loanword adaptation. In this article, we take a quantitative and multi-pronged approach to this question, tracing the evolution of the gender of this noun in varieties of French spoken in Africa, AmericaFootnote 2 and Europe.
Two corpora inform our study: First, we generated a database of more than 76,000 unique, French-language tweets from February to June 2020 from the COVID-19-TweetIDs repository, complete with unambiguous geographic data and gender cues for the word “COVID.” Second, we queried the Eureka.cc databaseFootnote 3 to build a similar corpus from traditional francophone media, for the same three areas and time frame. We then correlated trends in the data with the publication of gender-specific recommendations by the press and linguistic authorities such as the Office québécois de la langue française and the Académie française. While much ink has been spilled in the public sphere over the question of the gender of “COVID” in French, the present study is unique in its two-pronged approach and its inclusion of African varieties of French, which are frequently neglected in such discussions.
The rest of our article is structured as follows: In section 2, we discuss gender in the French lexicon, with an emphasis on borrowings, as well as region-specific differences. We also discuss the history of the word “COVID,” with specific reference to French. Section 3 outlines the methodology of both studies, and section 4 presents our results. We analyze our results in section 5, discuss future directions and conclude.
2. Literature Review
In this section, we discuss the literature surrounding various aspects of gender in French and present a brief history of the word “COVID-19”.
2.1 Gender in the French lexicon
French nouns obligatorily fall into two morphosyntactic groups, traditionally called the masculine and feminine genders, which can readily be observed in prenominal determiners (e.g., the definite articles le vs. la, which are masculine and feminine, respectively) and in adjective agreement. Unlike certain languages in which gender is highly predictable according to phonological factors (e.g., Afar; see Parker and Hayward 1985) or a combination of phonological factors and declensional classes (e.g., Russian; see Corbett 1991), French gender is often considered more opaque with respect to these factors (see, for instance, Bloomfield 1933 for an early formulation, and Poplack 2018 for a view on the diminished role of phonetic factors). However, more recent research finds a certain number of regularities – albeit often interacting and sometimes competing – across the lexicon.
Setting aside animate nouns,Footnote 4 both phonological and derivational factors contribute to gender certainty (i.e., the degree to which a given form can be reliably predicted as having a certain gender) in French nouns. As Tucker et al. (1977) show, certain word-final strings, whether simple or complex, demonstrate high degrees of gender regularity in the lexicon. For instance, more than 99% of words ending in [ɑ̃] are masculine (e.g., un accent ‘an accent’), versus only 12% of [ad]-final nouns (e.g., un grade ‘a rank’). Since nominal suffixes contribute a categorical gender, the segmentability of these endings (that is, whether or not a given substring constitutes or belongs to a separable morpheme) must also be considered. For instance, whereas the endings [aʒ] and [ɛʒ] are both predominantly masculine, only the former is a productive suffix (e.g., the -age in lavage ‘washing’). Meanwhile, words ending in [ʁʒ] are predominantly feminine and monomorphemic (e.g., auberge ‘hostel’; see Tucker et al. 1977: 104 for more information). These findings are corroborated to an even stronger degree by Lyster (2006), who finds that the gender of at least 80% of both feminine and masculine nouns is categorically predictable based on their endings. Note that his analysis makes a more explicit link between rhyme shape and orthographic form (e.g., distinguishing highly masculine -al from highly feminine -alle from ambiguous -ale, all for the same rhyme [al]).
Beyond these observations, the evidence is robust that French speakers attend to these cues, both independently and conjointly, when processing lexical information and when assigning gender to nonce or novel words (e.g., Tucker et al. 1977, Karmiloff-Smith 1979, Desrochers et al. 1989, Taft and Meunier 1998, Holmes and Dejean de la Bâtie 1999, Holmes and Segui 2004, Becker and Dow 2013), and gender errors are strikingly uncommon in L1 French acquisition (Carroll 1989).
2.2 Gender in anglicisms and borrowings
Just as in the French lexicon in general, the attribution of gender to borrowed words in French has been described as arbitrary and mysterious (Pergnier 1989, p. 39); however, more recent studies reveal the existence of several complex and competing factors.
First of all, there is some evidence that nouns borrowed from languages with a grammatical gender system preserve their gender in the target language, at least between French, on the one hand, and classical and Romance languages, on the other (Roché 1992), insofar as the source language's categories align with those of French. While general and more theoretical discussions of borrowings and gender in French do not specifically consider African varieties of French, this tendency is independently confirmed for Arabic loans in Algerian French (Smaali 1994, Derradji 1999) and Moroccan French (Benzakour 1995, Gaadi 1995), as well as for Italian loans in the French of Cameroonian internet users (Cutrì 2014). The adaptation of genders other than masculine and feminine (e.g., neuter) has been shown to be subject to the same forces as those driving borrowings from languages without gender (see Baetens Beardsmore 1971 for the adaptation of Flemish neuter nouns in Brussels French), to which we now turn our attention.Footnote 5
The French lexicon has a fairly equal number of nouns of each gender, if slightly biased towards the masculine (56% vs. 44%); however, the vast majority of contemporary borrowings from languages without gender, 85%, are masculine (Roché 1992). While the general equilibrium of genders is noted as far back as Old French, the disparity in borrowings at that stage is reversed (only 36% masculine), with a steady rise in masculine borrowings over time (ibid.). This reversal can be explained in part by a change in source languages. Borrowings in Old French were most prominently technical or learned vocabulary from Latin, which skews heavily feminine due to its derivational suffixes. After a rise in borrowings from Romance languages (with their own gender systems, see above) in Middle French, English became the dominant source language in the 19th century (Roché 1992), accounting for nearly 2.5% of the modern French lexicon (Rey-Debove 1987). This coincided with, if not contributed to, the increasing and self-reinforcing productivity of the masculine, to the point where scholars consider it the “default” or “unmarked” gender in French (see in particular Roché 1992, pp. 114–116),Footnote 6 and currently only 10 to 12.4 percent of borrowings from English are feminine (Hanon 1970, Humbley 1974, Surridge 1984, Soubrier 1985, Johnson 1986).
A major factor in determining the gender of an English borrowing in French is the attraction of pre-existing words in the lexicon. This is typically discussed in the literature in terms of parasynonyms and/or quasi-homonyms. That is, English words often receive the gender of their French calques or translations, whether based on orthographic, phonetic or semantic analogy (Haden and Joliat 1940, Nymansson 1995, Lupu 2005). Examples of this for the feminine include une love affair (based on une affaire) and une backroom (based on the correspondence of room with the French une pièce).
Another, somewhat more opaque factor in determining a borrowing's gender is ellipsis of a syntactically higher and often unexpressed French noun (Haden and Joliat 1940, Nymansson 1995, Lupu 2005). This is the argument for words such as une Ford and une start-up, which are said to receive the feminine gender from the understood nouns une voiture ‘a car’ and une entreprise ‘a business’ (or une firme ‘a firm’), respectively. Belleau (2016) similarly notes the importance of “paradigmatic integration,” by which borrowings in a certain semantic field (e.g., types of sausages and cured meats such as pepperoni, prosciutto, chorizo, and so on) tend to pattern together within a variety of French, presumably by analogy with a more frequent and/or established borrowing within that field. Note that this sort of explanation (in particular, ellipsis) is the argument put forward by several linguistic authorities for “COVID” (see section 2.4).
On a somewhat similar note, the gender attributed to English-based initialisms tends to be that of their French equivalent (e.g., la CIA, based on une agence ‘an agency’), provided the equivalent is “visible” enough (Lupu 2005: 267). As for true acronyms such as laser, Saugera (2006, 2017) argues that this transparency is usually not available to speakers, in which case we presume lexical tendencies and phonetic factors (discussed below) to take precedence.Footnote 7
Phonetic factors play a role, though a diminished one (Belleau 2016), in determining a borrowing's gender. These may be based on analogy with the lexicon, or may be unique to borrowings. Concerning the former, English word endings may be associated with certain French word endings and their gender. For instance, English -y (as in party) is frequently associated with French -ie [i], which skews feminine in the lexicon (Haden and Joliat 1940, Nymansson 1995). These factors may conflict with those discussed above, yielding variation. For instance, the name of the new beat genre of music may be either feminine by analogy with la musique or masculine due to phonetic factors (Nymansson 1995).Footnote 8 It is crucial to recall, however, that such variable forms are not the norm, as discussed above.
Other phonetic factors are documented, but must be considered in light of regional differences, to which we turn in section 2.3. Before doing so, given the potential influence of the aforementioned phonetic factors, we find it opportune to present some basic statistics on the gender of nouns ending in [id], and of the various orthographic representations of this ending, in the French lexicon.
A survey of the [id]-final singular nouns of Lexique-Infra (Gimenes et al. 2020) yields 108 entries, 80 of which are masculine and 28 feminine. In nearly all words, the [id] rhyme corresponds in the orthography to -ide, -ïde, -yde or -oïd. Almost all of the feminine forms belong to the first two endings (though both are still predominantly masculine). The loanwords ending in [id], namely caïd (Arabic), kid, speed, tweed (English) and lied (German), are all masculine. Otherwise, words ending in orthographic -id are not pronounced [id] but rather [a] (as in froid) or [ɛ] (as in laid), and all gender-bearing entries are masculine, with the sole exception of forms related to the English loan maid. If we generalize to [d]-final singular, gender-bearing nouns, however, 411 of 613 (67%) are feminine. As such, we find at best mixed evidence in the lexicon for the attribution of the feminine to “COVID” based on phonetic factors (setting aside the “19” for the sake of argument).
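As a quick check on the proportions just cited, the following Python sketch recomputes them from the raw Lexique-Infra counts given above. The constant names and the helper `share` are ours and purely illustrative.

```python
# Raw counts reported above from Lexique-Infra (Gimenes et al. 2020).
ID_FINAL = {"masculine": 80, "feminine": 28}   # [id]-final singular nouns
D_FINAL_TOTAL = 613                            # [d]-final, gender-bearing nouns
D_FINAL_FEMININE = 411


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


id_total = sum(ID_FINAL.values())
id_masc_pct = share(ID_FINAL["masculine"], id_total)
d_fem_pct = share(D_FINAL_FEMININE, D_FINAL_TOTAL)

print(f"[id]-final: {id_total} nouns, {id_masc_pct:.1f}% masculine")
print(f"[d]-final: {D_FINAL_TOTAL} nouns, {d_fem_pct:.1f}% feminine")
```

The two figures point in opposite directions: roughly three quarters (74.1%) of [id]-final nouns are masculine, while about two thirds of [d]-final nouns are feminine, which is why the phonetic evidence for “COVID” is mixed.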
2.3 Regional differences in gender and borrowing
We start with European and American varieties of French, which are more extensively studied with respect to gender and often contrasted with each other. These varieties do not show significant differences with respect to the gender of common, native French lexemes. A small number of exceptions are noted in the literature, especially in native French vowel-initial words (where a tendency is noted for the feminine in QuébécoisFootnote 9 varieties of French, e.g., une avion ‘a plane’ in place of the normative un avion). Phenomena such as this, however, are not necessarily specific to a region, but are rather a property of oral, vernacular French (see Belleau 2016: 62–67, for example).
Quantitatively speaking, patterns in gender assignment to English borrowings are noted to be quite similar between Québécois and/or Canadian and European varieties of French (Haden and Joliat 1940, Nymansson 1995, Belleau 2016), with a few exceptions. First, the gender of specific words may differ, a famous example being party, which is feminine in European varieties of French but masculine in Québécois varieties (e.g., Belleau 2016). Additionally, the gender of specific morphemes has historically differed between the two regions. English -ing has occasionally yielded feminine nouns in the history of Canadian varieties of French, whether phonetically adapted or not, for instance, la réguine ‘old machine or car’ (from rigging) vs. la siding, respectively (Haden and Joliat 1940).Footnote 10 Meanwhile, this ending is categorically masculine in European varieties of French.
These two examples (i.e., party and -ing) illustrate two purported broader differences between the regions with respect to phonetic factors. First, vowel-final English words tend to be masculine and consonant-final ones feminine in varieties of French spoken in Quebec, unlike in European varieties (Léard 1995: 178–180). Second, monosyllabic words (e.g., job) tend to be masculine in European varieties but feminine in Québécois varieties (Belleau 2016).
Findings on gender in African varieties of French generally fall into two categories. First, a general difficulty in acquiring and consistently applying the gender distinctions of French is noted among language learners and in certain lects of French in certain countries, regardless of the existence of noun class systems in co-existing vernacular languages. Biloa (2003, among others) notes this for Cameroonian French, going so far as to state that “en l’état actuel des études portant sur le français du Cameroun, il est difficile de systématiser l'emploi du genre en français du Cameroun, sans courir le risque de se tromper à chaque fois” (“in the current state of studies about Cameroonian French, it is difficult to systematize the use of gender in Cameroonian French, without running the risk of being wrong every time”) (pp. 144–145). Holtzer (2004) and Calvet and Dumont (1969) make similar observations for Guinean and Senegalese French, respectively. Ndjerassem (2005) mentions that certain words in Chadian French have a different gender than in normative French (e.g., cafétéria being masculine instead of the normative feminine).
A second theme arising in the literature is the omission of gender-signaling determiners. This is noted for French as spoken in Côte d'Ivoire (Jabet 2006, Boutin 2007) as well as in the French of Ivorian students (Hérault and Vonrospach 1967, N'Guessan 1982), a phenomenon which leads to general confusion over the use of the masculine and feminine (Ayewa 2009). Such determiner dropping has been proposed to be a commonality between Ivorian French and the popular French of Montreal (e.g., quand j'ai lâché [l’]école ‘when I quit school’), though it is less common in the latter (Hattiger and Simard 1982, citing Sankoff and Cedergren 1971). Omission of gender agreement is a documented feature of “Camfranglais”,Footnote 11 both spoken (de Féral 2006) and written on the Internet (Telep 2014).
While English loanwords in African varieties of French are extensively documented (e.g., Schmidt 1990 and what can be extracted from Blondé 1983), we were not able to find a detailed discussion or synthesis of the attribution of gender to these nouns, especially inanimate ones. A superficial survey of the lexicons of Gabonese (Boucher and Lafage 2000), Chadian (Ndjerassem 2005) and Cameroonian (Nzesse 2009) French reveals a very small number of feminine loans from English (Gabonese la blaze ‘showoff’, la shoes ‘pair of shoes’; Cameroonian la dream-team and la shoes), but not enough to derive any significant trends for any one variety.Footnote 12 A search of Tunisian French (Naffati and Queffélec 2004) yielded no feminine English loans.
2.4 COVID-19
On February 11, 2020, the International Committee on Taxonomy of Viruses officially named the novel coronavirus detected a few months prior “severe acute respiratory syndrome coronavirus 2” (abbreviated SARS-CoV-2). The same day, the World Health Organization (WHO) gave the disease caused by SARS-CoV-2 the abbreviated name “COVID-19,” for “coronavirus disease 2019” (World Health Organization 2020a). Also on February 11, both Radio-Canada (R-C) and the Office québécois de la langue française (OQLF) created terminological records for the term. On the one hand, Nathalie Bonsaint, a linguistic consultant at Radio-Canada (p.c.), reported that the R-C record classified “COVID-19” as masculine. On the other hand, Xavier Darras (p.c.), a language production coordinator at the OQLF, indicated that the OQLF record did not, at that time, include a gender. As observed by Nathalie Bonsaint, and corroborated by our corpora and by the analysis that Mathieu Avanzi published on the news site The Conversation in May (Avanzi 2020), the term “COVID-19” was generally employed in the masculine until early March, with the exception of earlier WHO publications on the subject, which variably used the feminine form. A statement on the web page for one of the WHO's online courses reflects that fact (World Health Organization 2020b):
Suite à la révision par l'OMS de l'appellation de la maladie et du virus qui la cause, ‘COVID-19’ est considérée comme une locution féminine. Nous vous prions ainsi de noter que toute mention ‘le COVID-19’ fait donc référence à la COVID-19. (Following the WHO's revision of the name of the disease and the virus that causes it, ‘COVID-19’ is considered a feminine expression. Please note that any mention of ‘COVID-19’ in the masculine thus references ‘COVID-19’ in the feminine.)
Bonsaint reported that by March 6 the WHO had updated its web site to use the feminine, and that she took the same action on R-C's internal terminological record in keeping with the French publications of the WHO (Radio-Canada 2020). We are not aware of a press release by the WHO specifically recommending the feminine apart from sporadic mentions on web pages dealing with the disease. According to Xavier Darras, the OQLF also updated its terminological record on the same day, classifying the term “COVID-19” as feminine (Office québécois de la langue française 2020). The Académie française finally published an official recommendation of the feminine on May 7, 2020 (Académie française 2020).Footnote 13 As far as we know, at the time of writing, the Délégation générale à la langue française et aux langues de France (DGLFLF) had not put forward any formal recommendation on the subject.
The reasoning for the classification of “COVID-19” as feminine in all three sources (i.e., the Bonsaint memo and the recommendations of the OQLF and Académie française) is the same, that is, the base referent of the term is the feminine word maladie ‘disease’, whether directly expressed or not. That is, regardless of whether one acknowledges the ‘D’ for the English ‘disease’ in the acronym, these sources argue that la COVID-19 should be interpreted as an ellipsis for la maladie COVID-19 or similar.
The question of the gender of “COVID-19” proved contentious in the public sphere, and one can find polemics on the matter in francophone media up through December 2020 (e.g., Meteyer 2020). This debate is outside the scope of this article, and we make no claims regarding the merits of arguments for or against the feminine usage of “COVID-19”. We seek only to document usage of either gender by the public and the media over time as a function of variety of French, and to elucidate potential causes for and/or explanations of these trends.
3. Methodology
In this section, we present the methodology of both our Twitter and traditional media studies.
3.1 Twitter study
The COVID-19-TweetIDs repository (Chen et al. 2020) served as the starting point for the current study's Twitter database. This repository provides the unique identification numbers (hereafter, “tweet IDs”) of all publicly available tweets since January 21, 2020 containing any of a list of keywords such as “coronavirus,” “COVID-19,” and so on.Footnote 14 According to the June 23, 2020 version of the project's documentation (around the date that we stopped our data collection), French-language tweets comprised roughly 3% of the corpus, numbering over 5.5 million tweets.
The tweet IDs from the months of January to June inclusive were then “hydrated”. The process of hydration essentially consists of downloading all available information provided by Twitter for a given unique tweet identifier, the amount of information varying from tweet to tweet. This was performed using a Python script provided by the authors of the repository. The data were then standardized, subsetted and analyzed for gender in the R language (R Core Team 2020) along the following lines.
3.1.1 Text processing
First, all non-French-language tweets were discarded. Here and throughout this article, “French-language tweets” refers to tweets whose language is automatically identified as such by Twitter's proprietary algorithm.Footnote 15 In accordance with findings that geolocation is a useful metric in gauging the accuracy of Twitter's automatic language detection (e.g., Williams and Dagli 2017, Graham et al. 2014), we also extracted geographical data from the user profiles in our database. By focusing on continents with large French-speaking populations, we were able to limit our dataset to more probable true positive identifications. As indicated in section 3.1.2, samples of potentially questionable tokens (e.g., those originating from Spain) were manually verified and confirmed that automatic language detection functions with a high degree of accuracy.
Tweets were then limited to those whose text contains gender-marked instances of the string “covid” in a case-insensitive search, regardless of the presence of “-19” (or any permutation thereof). An instance was considered gender-marked if the immediately preceding word was le, au, du or ce (masculine) or la or cette (feminine).
Tweet text was then cleaned up as follows. In order to later eliminate duplicate tweets, any entry-initial “RT @[username]” was removed. URLs and Unicode characters were also removed. Apostrophes were standardized, and all punctuation (including the hash character) was then removed, except for apostrophes, commas and periods. Finally, once line breaks and superfluous whitespace had been cleaned up, each set of duplicate tweets was reduced to a single instance.
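The gender-marking criterion can be expressed as a single regular expression. The Python sketch below is our own illustration, not the R code used in the study; the names `gender_counts` and `is_gender_marked` are hypothetical. It assumes that only whitespace separates the determiner from “covid”, and it counts every match in a tweet, so a tweet using both genders contributes to both counts.

```python
import re
from typing import Tuple

# Determiners listed in the text; "au" (= à + le) and "du" (= de + le)
# are contracted forms of the masculine article.
MASCULINE = {"le", "au", "du", "ce"}
FEMININE = {"la", "cette"}

# Determiner, then whitespace, then "covid" in any case. Nothing is required
# after "covid", so "COVID-19", "covid19" and bare "Covid" all match.
GENDERED_COVID = re.compile(r"\b(cette|ce|le|la|au|du)\s+covid", re.IGNORECASE)


def gender_counts(text: str) -> Tuple[int, int]:
    """Return (masculine, feminine) counts of gender-marked "covid" tokens."""
    masc = fem = 0
    for match in GENDERED_COVID.finditer(text):
        determiner = match.group(1).lower()
        if determiner in MASCULINE:
            masc += 1
        elif determiner in FEMININE:
            fem += 1
    return masc, fem


def is_gender_marked(text: str) -> bool:
    """Keep a tweet only if it contains at least one gender-marked token."""
    return any(gender_counts(text))
```

Because the leading `\b` requires a word boundary, strings such as “cela covid” do not yield a spurious match on the la inside cela.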
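The cleaning and de-duplication steps might look as follows in Python. This is a sketch under assumptions of our own, with hypothetical function names: we read “Unicode characters” as emoji and similar symbols rather than accented letters (which French text needs), we standardize curly apostrophes before stripping punctuation so that they are not lost, and we keep the first copy of each duplicate.

```python
import re

RT_PREFIX = re.compile(r"^RT @\w+:?\s*")   # entry-initial "RT @username:"
URL = re.compile(r"https?://\S+")
# Anything that is not a word character (Unicode-aware, so accented letters
# survive), whitespace, or one of the kept marks: apostrophe, comma, period.
DISALLOWED = re.compile(r"[^\w\s',.]")


def clean_tweet(text: str) -> str:
    text = RT_PREFIX.sub("", text)
    text = URL.sub("", text)
    # Standardize typographic apostrophes to the ASCII apostrophe.
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    # Drops emoji, "#", "!", hyphens, etc. ("COVID-19" becomes "COVID19").
    text = DISALLOWED.sub("", text)
    # Collapse line breaks and runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(texts):
    """Clean every tweet and keep one instance of each resulting text."""
    return list(dict.fromkeys(clean_tweet(t) for t in texts))
```

Removing the retweet prefix before de-duplication is what lets a retweet collapse onto its original, e.g. `deduplicate(["RT @a: la covid", "la covid"])` keeps a single `"la covid"`.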
The number of masculine and feminine occurrences in each entry of the database was then tabulated. Meanwhile, the timestamps provided by Twitter (expressed in Coordinated Universal Time) were converted to a POSIX date/time class interpretable by R, and the month and day were retained. The total of masculine and feminine occurrences of the word “COVID” was then calculated for each day.
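A minimal Python sketch of the daily tabulation follows, assuming the timestamps are the `created_at` strings of Twitter API v1.1 tweet objects (always in UTC) and that per-tweet counts come from a step like the gender-marking one. The function names are hypothetical; the study itself performed this step in R.

```python
from collections import Counter
from datetime import datetime

# Format of the "created_at" field in Twitter API v1.1 tweet objects,
# always expressed in UTC, e.g. "Wed Mar 04 18:02:11 +0000 2020".
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"


def daily_totals(rows):
    """Aggregate per-tweet counts by UTC calendar day.

    rows: iterable of (created_at, masculine_count, feminine_count).
    Returns {date: (masculine_total, feminine_total)}, ordered by date.
    """
    masc, fem = Counter(), Counter()
    for created_at, m, f in rows:
        day = datetime.strptime(created_at, CREATED_AT_FMT).date()
        masc[day] += m   # creates the key even when m == 0
        fem[day] += f
    return {day: (masc[day], fem[day]) for day in sorted(masc)}


def feminine_percentage(totals):
    """Percentage of gender-marked uses that are feminine, per day."""
    return {day: 100 * f / (m + f) for day, (m, f) in totals.items() if m + f}
```

Grouping by UTC day, as described above, means that a tweet posted in the evening in Quebec may be counted on the following calendar day.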
The package EnvCpt (Killick et al. 2020) was used to detect the dates of the maximum-likelihood estimates of change points in the percentage of feminine uses for each subgroup (continent by follower size). In our case, this corresponds to any day on which the percentage of feminine uses of the word “COVID” rises substantially. The dates identified by this procedure were then compared manually with the percentages themselves to eliminate negligible or ephemeral switchpoints.Footnote 16
Finally, user follower count was used to approximate popularity. We present these results for informational purposes only and refrain from making explicit links between popularity and social influence, on the one hand, and sociolinguistic explanations, on the other, with respect to our results (see, for instance, Garcia et al. 2017 for the terms and stakes involved). Accounts within each continent were separated into three bins, “small,” “medium” and “large,” each containing a roughly equal number of observations (i.e., tweets). This was achieved using the cut_number() function of the ggplot2 package for R. These ranges are reported in section 4.2.
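EnvCpt is an R package that, as we understand it, compares several change-point models (including changes in mean and in trend, with or without autocorrelation) and can return multiple change points. As a much simpler stand-in, and not a reimplementation of EnvCpt, the Python sketch below finds the maximum-likelihood location of a single change in mean. It assumes Gaussian noise with one shared variance, under which maximizing the likelihood reduces to minimizing the total within-segment squared error. The function names are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def single_mean_shift(series):
    """Return the index k (1 <= k < len(series)) at which splitting the
    series into series[:k] and series[k:] minimizes total squared error.

    Under a Gaussian model with one shared variance and a different mean
    on each side of k, this k is the maximum-likelihood change point.
    Ties are broken in favor of the earliest k.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    costs = {k: _sse(series[:k]) + _sse(series[k:]) for k in range(1, len(series))}
    return min(costs, key=costs.get)
```

Applied to a daily series of feminine percentages, the returned index points to the first day of the new regime; as in the study, such a date would still need to be checked against the percentages themselves to rule out negligible or ephemeral shifts.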
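ggplot2's cut_number() places breaks at sample quantiles and builds right-closed intervals. Assuming it relies on R's default quantile definition (type 7, which corresponds to `method="inclusive"` in Python's `statistics.quantiles`), the following Python sketch mimics that binning. The function name is hypothetical, and unlike cut_number() it does not check for duplicate breaks caused by heavily tied follower counts.

```python
import bisect
import statistics


def follower_bins(follower_counts, labels=("small", "medium", "large")):
    """Assign each observation to one of len(labels) bins holding roughly
    equal numbers of observations.

    Breaks are sample quantiles computed with method="inclusive" (R's
    default type 7). Intervals are right-closed, so a value equal to a
    break falls in the lower bin, and the minimum goes in the first bin.
    """
    cuts = statistics.quantiles(follower_counts, n=len(labels), method="inclusive")
    return [labels[bisect.bisect_left(cuts, c)] for c in follower_counts]
```

For example, follower counts of 1 through 9 yield breaks of about 3.67 and 6.33, and therefore three bins of three tweets each.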
3.1.2 Geographical information
The remaining tweets were then processed for geographical information, ultimately in order to deduce the continent of users in our database. While Twitter allows for users to tag their tweets for location, unfortunately, this information was present in only approximately 1% of the data at this stage of processing. In order to fill this gap, we processed the user.location field (non-empty for nearly 63% of the dataset) for relevant information, after Unicode characters had been removed.
Two initial issues presented themselves with this field: First, the formatting is non-standard, in that people can include information such as city, country, both or neither. Second, country names may be in either French or English (among others). To counteract these issues, we made a bilingual database of cities and regions (equivalent to French régions and Canadian provinces) with their respective countries and continents using the maps (Brownrigg 2018), countrycode (Arel-Bundock et al. 2018) and raster (Hijmans 2020) R packages. Names in this database were limited to those found on the European, African and American continents, in order to reduce mismatches.
After standardizing names across the packages, we removed from the user-provided information all words unattested in our custom place-name database. Words in user.location were then matched against the city names in our database and their corresponding continents. This process was repeated separately for region and country names. Finally, a subset of the 1,000 most common unmatched user-provided locations were manually assigned a continent. Subsets were also verified throughout the procedure, and certain manual corrections were implemented in the algorithm. For instance, North American cities beginning with “San” matched both America and Africa due to the commune of San in Mali; this was corrected. Geotagged users’ country information was also extracted from the place.country field and matched with its continent. In the rare occurrence of mismatches between sources of information, or of multiple returns (typically because place names spanning two or more continents were provided by the user), the manually provided and geotagged information were taken as authoritative. Otherwise, the first continent was arbitrarily chosen.
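The matching logic can be sketched in Python as follows. The three miniature dictionaries stand in for the study's place-name database and are purely hypothetical, as is the function name. Two further assumptions: the text does not rank manual assignment against geotags, so we check the manual value first; and single-word matching, as here, cannot handle multi-word names such as “New Orleans” or “San Francisco”, which is the kind of limitation behind the San (Mali) mismatch mentioned above.

```python
import re

# Hypothetical miniature gazetteer (lower-cased keys). The study's real
# database was built with the maps, countrycode and raster R packages.
CITIES = {"montréal": "America", "montreal": "America", "paris": "Europe",
          "lyon": "Europe", "dakar": "Africa"}
REGIONS = {"québec": "America", "quebec": "America", "ontario": "America"}
COUNTRIES = {"canada": "America", "france": "Europe",
             "sénégal": "Africa", "senegal": "Africa"}


def resolve_continent(user_location, manual=None, geotag=None):
    """Guess a user's continent from the free-text user.location field.

    manual: continent assigned by hand to this location string, if any.
    geotag: continent derived from the tweet's place.country, if any.
    Both are treated as authoritative (manual first, an assumption);
    otherwise words are matched against cities, then regions, then
    countries, and the first continent found is kept.
    """
    if manual:
        return manual
    if geotag:
        return geotag
    # Words absent from the gazetteer are simply ignored, mirroring the
    # removal of words unattested in the place-name database.
    words = re.findall(r"\w+", (user_location or "").lower())
    for table in (CITIES, REGIONS, COUNTRIES):
        for word in words:
            if word in table:
                return table[word]
    return None
```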
A subset of 450 users, 50 per continent per follower number group, was randomly selected for verification of the accuracy of continent identification. We found 93.8% of the subset to be correctly identified and thus within the limits of acceptability. Africa had the lowest accuracy of the three continents at 87.3%, versus America at 95.3% and Europe at 98.7%.
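The overall accuracy can be checked against the per-continent figures. Assuming, as the sampling design implies, exactly 150 verified users per continent (50 in each of three follower bins), each percentage corresponds to a whole number of correctly identified users. These counts are inferred in the sketch below and are not reported in the text.

```python
USERS_PER_BIN = 50
BINS = 3
per_continent = USERS_PER_BIN * BINS  # 150 verified users per continent

# Per-continent accuracies reported in the text (percent).
reported = {"Africa": 87.3, "America": 95.3, "Europe": 98.7}

# Whole numbers of correct identifications implied by those percentages
# (inferred, not reported in the text).
implied_correct = {c: round(pct / 100 * per_continent) for c, pct in reported.items()}

total_users = per_continent * len(reported)  # 450
overall = 100 * sum(implied_correct.values()) / total_users

print(implied_correct)
print(f"overall accuracy: {overall:.1f}% of {total_users} users")
```

The pooled rate (422 of 450 users) reproduces the 93.8% reported above. It coincides with the mean of the three continental rates only because each continent contributes the same number of users.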
All in all, this procedure resulted in a final database of 76,054 unique French-language tweets which, in summary, contained unambiguous gender information about the word “COVID” and from which geographical information could be ascertained.
3.2 Media study
The Eureka.cc database, essentially an aggregator of the world's newspapers and other forms of media, was used to trace the evolution in usage of both genders for the term “COVID(-19)” in francophone media. The same masculine and feminine forms of “COVID” detailed above were searched separately in week-long intervals beginning on February 11, 2020 and ending on June 30, 2020. Omission of “-19” did not preclude the full form “COVID-19” from appearing in the results. Each week's search was performed separately across all French-language media in the database for each continent. At the time of data collection, the number of sources for each continent was as follows: 653 (North America), 825 (Europe) and 78 (Africa).
The number of articles corresponding to each gender (again, by week and continent) was then entered into a database.Footnote 17 While syndicated articles (i.e., a single article that is reprinted in several news outlets) are present in the database, they could not be eliminated, nor do we believe they should be. Not only do we strongly doubt that the gender of the term “COVID-19” is a deciding factor in which articles are syndicated, but we also believe that the proliferation of certain articles containing one gender or the other reflects a certain Zeitgeist as well as consumers’ experience.
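The weekly search windows can be generated with Python's standard `datetime` module, as in the sketch below (the function name is hypothetical). February 11 to June 30, 2020 spans 141 days inclusive, that is, 20 full weeks plus one day. The text does not say how that final day was handled, so clipping the last window to June 30 is our assumption.

```python
from datetime import date, timedelta


def weekly_windows(start, end):
    """Consecutive 7-day search windows (inclusive bounds) from start to end.

    If the span is not a whole number of weeks, the last window is clipped
    to `end` (an assumption; the text does not say how this was handled).
    """
    windows = []
    current = start
    while current <= end:
        windows.append((current, min(current + timedelta(days=6), end)))
        current += timedelta(days=7)
    return windows


windows = weekly_windows(date(2020, 2, 11), date(2020, 6, 30))
print(len(windows), windows[0], windows[-1])
```

Under this assumption, the search yields 21 windows, the first running from February 11 to 17 and the last covering June 30 alone.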
4. Results
Here, we present the results of our study, starting with a breakdown of the places included in our system of geographical categorization.
4.1 Geographical results
While we recognize the diversity of the varieties of French and the context in which it is spoken within any given continent, our tagging was necessarily limited to the level of continent, seeing as the number of observations was insufficient to extend the analysis to the level of country or region/province. Each continent is necessarily diverse, but certain locations are predominant, which will ultimately inform our interpretation of the results. In this section, we provide additional details about the three continents under study.
At the level of country, Canada accounts for the vast majority of the pre-processed American Twitter database (67.6%), with the United States (12.8%) and Haiti (5.8%) in second and third place, respectively. All in all, North American countries account for 93.6% of the American database. At a finer level, “Quebec” is the most frequent word (English and French stopwords removed) in the user description field, with 4,036 occurrences, compared with 414, 297 and 88 occurrences for “Ottawa,” “Ontario” and “Manitoba,” respectively. Variants of “Louisiana” and “New Orleans” appear only eight times. Concerning the Eureka.cc database, while it would be unfeasible to profile our sources exhaustively, a manual inspection suggests that the vast majority of North American sources are based in Quebec, and virtually all are based in Canada. We can confirm that Haitian news sources are classified as Central American, and there do not appear to be any news sources from the French overseas departments in our entire media corpus, regardless of continent. In sum, we focus on Québécois French in our interpretation of both the so-called American Twitter data and the North American media corpus.
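The word-frequency comparison above can be illustrated with a minimal sketch, assuming simple lowercase tokenization on word characters and a small hypothetical stopword list (`STOPWORDS`); the tokenizer and stopword lists actually used are not described here. The sample descriptions are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real analysis would use
# full English and French stopword lists.
STOPWORDS = {"the", "and", "of", "in", "et", "de", "la", "le", "les", "l", "d", "à", "en", "du"}


def description_word_counts(descriptions: list[str]) -> Counter:
    """Count lowercase word tokens across user descriptions, skipping stopwords."""
    counts = Counter()
    for text in descriptions:
        # \w+ matches Unicode word characters, so accented letters are kept;
        # elided forms such as "l'Université" split into "l" and "université".
        for token in re.findall(r"\w+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


sample = [
    "Journaliste à Montréal, Quebec",
    "Father of two in Quebec City",
    "Étudiante de l'Université d'Ottawa",
]
counts = description_word_counts(sample)
print(counts.most_common(3))
```

Spelling variants (e.g., “Québec” vs. “Quebec”) produce distinct tokens under this scheme and would need to be merged before counting, as the grouping of “Louisiana” variants above implies.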
France represents 88.9% of the pre-processed European Twitter data, with Belgium and Switzerland in second and third place at 3% and 2.6%, respectively. Within the user description field, “Paris” is the most frequent place name below the level of country (6,876 occurrences), with “Lyon” a distant second at 1,388 occurrences. As for the media database, all but 50 of the 776 sources listed at the time of revision were based in France. We thus focus on Hexagonal French attitudes and institutions in our interpretation of our European results.
Finally, while our review of the extant literature does not allow us to nuance our results for the varieties of French spoken in Africa, we note that the African Twitter data are much more heterogeneous with respect to country. Senegal is the most represented, at 23.1% of the data, followed by the Democratic Republic of the Congo (12%) and Cameroon (10.5%). This diversity of countries of origin is also reflected in the list of sources in our media database, which additionally includes a few pan-African or larger regional (e.g., Maghreb) sources.
4.2 Twitter results
Table 1 presents the number of tweets in our final database by continent and month. The number of distinct users per continent over the entire database is as follows: 6,649 for Africa, 4,712 for America and 32,767 for Europe. Given the sum of tweets per continent reported in Table 1, these users contributed on average 1.8 (Africa), 1.97 (America) and 1.67 (Europe) tweets each.
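The per-user averages above are total tweets divided by distinct users within each continent. A minimal sketch, assuming each tweet is reduced to a hypothetical `(continent, user_id)` pair; the record layout and function name are illustrative, not the authors' pipeline. Working backwards, 6,649 × 1.8, 4,712 × 1.97 and 32,767 × 1.67 give roughly 12,000, 9,300 and 54,700 tweets, about 76,000 in total, consistent with the overall size of the database given the rounding of the averages.

```python
from collections import defaultdict


def tweets_per_user(tweets: list[tuple[str, str]]) -> dict[str, float]:
    """Mean number of tweets per distinct user, by continent.

    Each tweet is a hypothetical (continent, user_id) pair.
    """
    tweet_counts = defaultdict(int)  # continent -> number of tweets
    users = defaultdict(set)         # continent -> set of distinct user ids
    for continent, user_id in tweets:
        tweet_counts[continent] += 1
        users[continent].add(user_id)
    return {c: tweet_counts[c] / len(users[c]) for c in tweet_counts}


# Toy data: two African users (one tweeting twice) and one European user.
sample = [("Africa", "u1"), ("Africa", "u1"), ("Africa", "u2"), ("Europe", "u3")]
averages = tweets_per_user(sample)
print(averages)
```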
Follower size groups were defined as follows: small accounts (abbreviated “S” in certain tables and figures) range from 0 to 213 followers in Africa, 0 to 285 in America and 0 to 196 in Europe. Medium (M) accounts have 214 to 1,558 followers in Africa, 286 to 1,595 in America and 197 to 1,017 in Europe. Finally, large (L) accounts have at least 1,559 followers in Africa, 1,596 in America and 1,018 in Europe.
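Because the cut-offs differ by continent, assigning an account to a group requires the continent as well as the follower count. The sketch below simply hard-codes the reported bounds; how those cut-off values were derived is not stated in this section, and the names `FOLLOWER_THRESHOLDS` and `follower_group` are hypothetical.

```python
# Inclusive upper bounds of the small and medium groups, per continent,
# as reported above; anything larger is a large account.
FOLLOWER_THRESHOLDS = {
    "Africa": (213, 1558),
    "America": (285, 1595),
    "Europe": (196, 1017),
}


def follower_group(continent: str, followers: int) -> str:
    """Return "S", "M" or "L" for an account's follower count on a given continent."""
    small_max, medium_max = FOLLOWER_THRESHOLDS[continent]
    if followers <= small_max:
        return "S"
    if followers <= medium_max:
        return "M"
    return "L"


print(follower_group("Europe", 500), follower_group("America", 500))
```

The same follower count can land in different groups: 500 followers is medium in Europe and America alike, but 1,100 followers is large in Europe and medium in America.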
Table 2 presents the number of feminine uses of “COVID” and their percentage of all gendered uses (masculine or feminine) per month within each continent's group. Daily counts of masculine and feminine uses (distinguished by colour) are graphed over time using X-splines in Figure 1, by continent and follower size. Note that the x- and y-axis limits differ from pane to pane. The marked spike in activity in early June across all account types is due to a shift by Chen et al. (2020) towards cloud computing for tweet collection.
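The percentage in Table 2 uses only gendered uses as its denominator. A minimal sketch of that computation, assuming hypothetical `(continent, month, gender)` records in which gender is `"f"`, `"m"`, or another marker (here `"?"`) for uses without an unambiguous gender cue; the record format and function name are illustrative assumptions.

```python
from collections import Counter


def percent_feminine_by_group(uses: list[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """Percent feminine among gendered uses for each (continent, month) group."""
    feminine = Counter()
    gendered = Counter()
    for continent, month, gender in uses:
        if gender not in ("f", "m"):
            continue  # uses without a clear gender cue are left out of the denominator
        key = (continent, month)
        gendered[key] += 1
        if gender == "f":
            feminine[key] += 1
    return {key: 100 * feminine[key] / gendered[key] for key in gendered}


sample = [
    ("Europe", "2020-04", "m"), ("Europe", "2020-04", "m"), ("Europe", "2020-04", "m"),
    ("Europe", "2020-04", "f"), ("Europe", "2020-04", "?"),
    ("Africa", "2020-05", "f"), ("Africa", "2020-05", "m"),
]
shares = percent_feminine_by_group(sample)
print(shares)
```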
The American Twitter data show an immediate and substantial increase in the feminine, coinciding with the events detailed in section 2.4 (in particular, the Radio-Canada memo and the related publication). This effect, however, is stratified by number of followers: small and medium accounts converge on 50% feminine usage towards June, while large accounts show a steeper increase in March and converge at a higher level, around 70%.
In stark contrast, the European Twitter data show both negligible use of the feminine and little stratification between account sizes. While all account types see a rise in feminine instances of “COVID” coinciding with the recommendation of the Académie française in early May, the increase from April to May is only about 2 to 4 percentage points, from 1 or 2 percent to 5 or 6 percent. Although June saw a similar rise from May, the average percentage of feminine use did not reach 9 percent.
The African data can be seen as lying between those of the other two continents. As with the European Twitter data, African accounts are not stratified in the way the American accounts are. They do, however, show a larger increase in the use of the feminine in May (approximately 12 to 16 percentage points), a trend that continues into June.
The switchpoint results are plotted in Figure 2. The y-axis of each plot corresponds to the percentage feminine per day. Red lines indicate the mean percentage for each period identified by the model; a switchpoint is thus a date at which the mean changes. From these data, we attempted to identify a single crucial date for each type of account. These dates are as follows (in the order small, medium and large within each continent): May 12, 11 and 10 for African accounts; March 7, 6 and 8 for American accounts; and May 9, 8 and 7 for European accounts. Note, of course, that the magnitude of change is not comparable from one group to the next, especially across continents, as can be seen in Figure 2.
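The switchpoint model itself is not specified in this section. As an illustration of the idea only, the sketch below finds a single change in mean by least squares: it tries every split of a daily series and keeps the one that minimizes the squared error when each side is summarized by its own mean. This is a simple swapped-in technique, not necessarily the authors' model, and `single_switchpoint` is a hypothetical name.

```python
def single_switchpoint(values: list[float]) -> tuple[int, float, float]:
    """Least-squares single change-in-mean detection.

    Returns (k, mean_before, mean_after), where values[k] is the first
    observation of the second regime.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    best = None
    for k in range(1, len(values)):
        before, after = values[:k], values[k:]
        mean_before = sum(before) / len(before)
        mean_after = sum(after) / len(after)
        # Total squared error when each segment is summarized by its own mean.
        sse = sum((v - mean_before) ** 2 for v in before) + sum((v - mean_after) ** 2 for v in after)
        if best is None or sse < best[0]:
            best = (sse, k, mean_before, mean_after)
    _, k, mean_before, mean_after = best
    return k, mean_before, mean_after


# Toy daily percentages feminine: a low plateau followed by a higher one.
daily = [2.0, 1.0, 3.0, 2.0, 10.0, 12.0, 11.0, 11.0]
k, before, after = single_switchpoint(daily)
print(k, before, after)  # 4 2.0 11.0
```

If the series is indexed by calendar day, `k` maps directly to the date on which the new mean begins; a probabilistic model would additionally express uncertainty about that date.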
4.3 Media results
The number and proportion of feminine instances of “COVID” are provided in Table 3 by week (identified by the first day of the seven-day period) and by continent; these numbers are plotted in Figure 3. North American sources pass near-categorically to the feminine in early March, while European media outlets range from 1 to 3 percent feminine around the publication of the Académie française's recommendation and reach a maximum of only 6.9 percent in mid-June. Interestingly, these European media sources appear to be even slower and less consistent in their use of the feminine than their counterparts in the European Twitter corpus. Finally, African media sources show a sharp rise in the feminine in early May, which continues thereafter, mirroring the African Twitter data (though with a slightly higher end result, at 49%).
Using the Eureka.cc database, we finally extended the media analysis beyond the period of the initial study. In December 2020, North American media remained stable at more than 95 percent feminine (32,021 feminine vs. 1,613 masculine). We also see increases in both African and European media, more prominently in the latter. European outlets reach slightly more than 15 percent feminine (12,354 feminine vs. 68,732 masculine), while African outlets rise to over 61 percent (3,441 feminine vs. 2,191 masculine). It remains to be seen whether African media will converge on near-categorical use of the feminine, as North American media have, or will continue to show variation (as American Twitter accounts do towards the end of the Twitter database).
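The December 2020 percentages follow directly from the reported counts, with feminine uses divided by all gendered uses. A short check using only the figures given above (the dictionary and function names are illustrative):

```python
# December 2020 counts of feminine and masculine "COVID" in Eureka.cc, as reported above.
DECEMBER_2020 = {
    "North America": (32_021, 1_613),
    "Europe": (12_354, 68_732),
    "Africa": (3_441, 2_191),
}


def percent_feminine(feminine: int, masculine: int) -> float:
    """Feminine uses as a percentage of all gendered uses."""
    return 100 * feminine / (feminine + masculine)


for continent, (fem, masc) in DECEMBER_2020.items():
    print(f"{continent}: {percent_feminine(fem, masc):.1f}% feminine")
```

This gives roughly 95.2, 15.2 and 61.1 percent, matching the figures in the text.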
5. Discussion and conclusion
In this section, we analyze our results in light of both language-internal and language-external factors and conclude our paper.
5.1 Analysis
Both our Twitter data and our media data show important differences among the three continents studied, concerning the mean usage of the feminine gender of “COVID” and variation therein. Without direct input from speakers (e.g., survey data), we can only speculate on the reasons behind these trends. However, we see three originating causes for the differences among continents, one linguistic and two extralinguistic.
First, we consider the linguistic variable of dialect-specific practices in the morphosyntactic adaptation of loanwords (especially English loanwords), as well as community-specific differences in the functional load of gender. We noted in section 2.3 that Québécois French tends to feminize consonant-final (English) loanwords. This is one factor that may favour the attribution of the feminine to “COVID.” The reader is reminded, however, that this is only one of several differences from European varieties of French, and that Québécois varieties of French do not categorically feminize English loans (recall, for instance, the word party). It is unclear whether this case study of “COVID” suggests that generalization across word-final strings in the lexicon (i.e., [id] being a predominantly masculine ending) is less important to speakers of Québécois French; we leave this matter open to future research. It should be noted, however, that Poplack (2018) and Poplack et al. (1982) express scepticism about the role of phonetic factors in determining the normative gender of English borrowings into French, and predict high degrees of variation. They instead find that frequency is the determining factor in the establishment of a “fixed” gender. Since the American Twitter data still show relatively high degrees of variation as of June 2020, a follow-up study would be needed to pursue this line of reasoning.
Concerning African varieties of French, less discussion was available in the literature, but we saw that in certain lects and/or certain geographically specific varieties, gender distinctions proved less important. This was manifested by the omission of gender markers and by variation in the gender of native French words. While we believe that these observations may account for some of the variation, we are sceptical as to whether they are the impetus for the tendencies observed in either the Twitter or the media results, especially given that both are written media. Much more research is needed in this area before stronger conclusions can be drawn, with respect to both shared and novel vocabulary as well as to loans from various sources.
The second potential explanatory factor is the unique relationship between media outlets and linguistic authorities in Quebec. (Recall that Quebec represents the vast majority of the American corpora.) Specifically, the OQLF offers a linguistic consultation service to Québécois media outlets for terminological and neological questions, as does Radio-Canada for its own journalists across Canada. With respect to the term “COVID” and its gender, both the OQLF and Radio-Canada reported having consulted with journalists, and the recommendations in favour of the feminine detailed in section 2.4 met with little resistance from Canadian journalists (Darras, p.c.; Bonsaint, p.c.). We are not aware of similar services offered by the Académie, and while the Délégation générale à la langue française et aux langues de France (DGLFLF) does offer linguistic consultation to French journalists, the DGLFLF has not, at the time of writing, published a recommendation for either gender for the word “COVID.”Footnote 18 Meanwhile, we are not aware of governmental agencies in African countries specific to the French language, though some countries have agencies for matters of the Francophonie or for national languages.Footnote 19
The influence of these institutions on the North American francophone media landscape, together with journalists' evident (but voluntary) compliance with these authorities, is no doubt a crucial factor in the propagation of the feminine there and eventually beyond its borders. This stands in stark contrast with the persistence of the masculine in European media after both the March and May recommendations. Meanwhile, judging from the increase in the feminine in early May in African media (traditional and social), it would appear, at least in the case of “COVID-19,” that a non-negligible sector of African francophone media defers to the Académie française in matters of terminology and neology, although local authorities or intermediaries may well have played a role in encouraging the feminine.
The third factor, related to the second, is the attitude of the public towards linguistic authorities and their recommendations. In Kim's (2017) study, Québécois participants responded positively to the statements (1) that French should be regulated in line with the societal norm and (2) that the government's work in promoting French is helpful. In comparison, French, Belgian and Swiss participants responded negatively to these statements. Similarly, Tremblay (1994) finds in a survey of Québécois speakers that, while they generally prefer endogenous terms (that is, terms arising organically or spontaneously in Quebec) to those created by the OQLF, they respect the work of the OQLF and hold a positive attitude towards the French spoken in Quebec. This positive attitude towards their own variety of French has grown stronger since then, as documented in multiple studies (Pöll 2005; Maurais 2008; Chalier 2018, 2019; Pustka et al. 2019; Sebková et al. 2020). To our knowledge, little has been written on the attitudes of speakers of African varieties of French towards the Académie française, although, according to Spolsky (2018: 71), language policy in francophone Africa has largely proven ineffectual:
After independence (whether it was seized or granted), the French-speaking elite replaced the colonial rulers, applying much the same language policy in most cases or attempting to establish hegemony for a local variety […] [C]entralized language policy failed to change the widespread traditional language practices […] Assuming that the answers [to language problems] are linguistic and that central language management will work appears, from the French colonial experience, to be a mistake.
Indeed, this failure has created an environment in African countries for innovation and the creation of local norms, as Francine Quérémer of the Organisation internationale de la Francophonie notes: “The French language is not going to wait in all these [African] countries for the Académie to decide before it evolves” (O'Mahony 2019). It would appear from our results that a sizeable cross-section of African media outlets and Twitter users do indeed defer to the Académie, but it is unclear to what degree this deference is sustainable or representative of the future of African varieties of French.
The American data also touch upon the question of a local language norm in Québécois society and the role that the media, most specifically Radio-Canada, play therein. As Bigot (2017) notes, Radio-Canada presenters regularly receive linguistic training (Bertrand 2005), and, according to the results of Bouchard and Maurais (1999), their French is largely considered the reference variety for Québécois French. The widespread and seemingly immediate acceptance of the feminine in the American (read: Québécois) Twitter corpus may thus speak to a complex interplay between homegrown, implicit community norms and the explicit norms of language authorities, be they the OQLF or the media by proxy. More specifically, in the absence of competition from a more spontaneous, informal and widely accepted in-group (Québécois) variant, the substantial rise in the feminine in spring 2020 may in part be attributed to the public's trust in and (incomplete) cooperation with entities like Radio-Canada and the OQLF. This assumes, of course, that most uses were made with direct knowledge of these recommendations and that linguistic variables (in particular, the feminization of consonant-final words) were not the sole cause of the early results in the Twitter corpus.
Additionally, it is worth noting that this acceptance and propagation of the feminine in America occurred despite the persistence of the masculine in European media (both traditional and social), as well as the silence of the Académie française until May. This may be taken as a sign of the codification of a norm for Québécois French independent of an international standard. However, the very act of this codification and the public's acceptance of it may also speak to a persistent pressure on Québécois French to justify its features to outside parties. It may be the case that speakers of the more prestigious European variety of French, especially Hexagonal and Parisian French, do not feel such pressure to defer to linguistic authorities, going so far as to equate “la COVID” with snobbery (Meteyer 2020).
It is crucial to note, finally, that the Académie's relatively late recommendation of the feminine and the lack of action from the DGLFLF gave the masculine ample time to take root in European usage. As Poplack et al. (1982) and Poplack (2018) note, the gender of loanwords in French, once “established,” is essentially invariable. The European data thus suggest that the period from February to May was sufficient for the masculine gender to become fixed. (This interpretation assumes, however, ignorance or disregard of the February recommendation from the WHO.) Meanwhile, the variation still present (at least in June) in American Twitter accounts may reflect the difficulty of switching from the masculine to the feminine after only a month of exposure. Only time will tell whether the feminine prevails in the everyday French of American (as well as African) speakers, although, with a little optimism, we can only hope that circumstances give us ever fewer occasions to speak of COVID-19 in the future.
5.2 Future directions
Since our initial period of data collection, with the arrival of vaccines and new variants, COVID-19 has remained a topic of discussion. Follow-up work may determine whether a gender has stabilized on any given continent. In addition, with larger amounts of data, a more geographically specific analysis may become tractable. We see this as crucial especially in North America, where one can reasonably expect areas such as Haiti and the French overseas départements to align with Hexagonal French practices rather than Canadian (Québécois) ones. We also intentionally refrained in this article from drawing conclusions about the potential role of popularity on Twitter (via follower count) in driving trends; a social network analysis is needed to answer such questions. Finally, targeted surveys may allow us to confirm or refine our hypotheses concerning the motivation behind the use of either gender in the varieties of French spoken in these three areas of the world. The issue remains particularly enigmatic with respect to the African continent.
5.3 Conclusion
Our goal with this article was to follow the evolution of gender for the noun “COVID-19” in French. As a sudden but globally used neologism, this word provides an unparalleled testing ground for the factors influencing the morphosyntactic incorporation of novel words in various varieties of French. We processed data from a social media (Twitter) corpus and a newspaper corpus, identifying the geographical origin of the tweets and newspaper articles in order to compare and contrast the varieties of French spoken on three continents: Africa, (North) America and Europe. Overall, we found that American media passed overwhelmingly to the feminine in March 2020, following recommendations by Canadian (and more specifically, Québécois) sources of linguistic authority, while usage on American Twitter plateaued at 50–70% by June. Meanwhile, African media and Twitter users sharply increased their use of the feminine, but only after the recommendation of the Académie française in May. Finally, use of the feminine is essentially negligible in both European datasets. We proposed an interplay of several factors, both linguistic and extralinguistic, to explain these results. First, varieties of French differ somewhat with respect to their gender systems, particularly in English loanword adaptation. In addition, we noted differing roles of and attitudes towards language authorities. Finally, the relative tardiness of European (French) institutions likely helped solidify these trends (despite a similar recommendation by the WHO months earlier), allowing the masculine to become the community norm.