1. Introduction
Abusive language is a form of language that aims to create and heighten social tensions, causing psychological and physical problems in victims and contributing to dangerous forms of violence such as offline attacks. For these reasons, in recent years, detecting the various forms of abusive language online (such as hate speech and stereotypes) has become an important concern for several public and private actors as well as activists.
However, the expression of abuse is not always explicit. Some messages, like Example (1), implicitly contribute to undermining social tolerance towards the perceived outgroup (Fiske 1998).
-
(1) I carabinieri hanno individuato come possibile spacciatore un 27enne del Marocco. La tipica #risorsa straniera, ammiro la madre! URL
The Italian police have identified a 27-year-old from Morocco as a possible drug dealer. The typical foreign #resource, I admire the mother! URL
As suggested by Bianchi (2021), we can identify two main dimensions of abusive language: an evident dimension, consisting of the “verbal violence” that evokes “physical violence” and appears explicitly aggressive and offensive; and a dimension of propaganda, which aims at asserting social identity by presenting some roles or assumptions as normal and conventional, and which works as a form of proselytism for negative ideas. Their consequences are therefore various, both for victims and for society.
Exposure to harassment and microaggressions could provoke, in the long run, serious physical health issues such as cardiovascular diseases (Calvin et al. 2003), as well as immediate and complex mental health issues such as depression and anxiety that might culminate in suicide (Nadal et al. 2014). It could also affect victims’ willingness to engage in public and civic life, leaving, on the one hand, the communities they come from unrepresented and depriving, on the other hand, the society where they live of a plurality of perspectives that is useful in a democratic context.
Moreover, although a causal link between cyberharassment and hate crime is generally hard to demonstrate, because it is difficult to trace the particular texts that encourage physical offenses, the risk of crime is assessed by victim surveys collected in the EU by the European Union Agency for Fundamental Rights, by the systematic recording of crimes motivated by discriminatory bias, and by specific social studies that show the connection between hate speech spread online and crimes against women, refugees, or religious and cultural minorities.
For instance, in the USA, Fulper et al. (2014) demonstrated the existence of a correlation between the number of rapes and the number of misogynistic tweets per state. In London, linking police crime, census, and Twitter data, Williams et al. (2020) revealed a consistent association between online hate speech targeting race and religion and offline racially and religiously aggravated crimes. In Germany, Müller and Schwarz (2021) demonstrated a connection among anti-refugee political propaganda on Facebook, higher usage of social media, and crimes against refugees, as a clear effect of the echo chamber phenomenon.
The detection of hateful messages online is, therefore, a task of growing interest, and techniques from Natural Language Processing (NLP) and Computational Linguistics (CL) could help to provide frameworks to formalize abusive language, unmask the cognitive and linguistic processes involved in its comprehension, and propose models that allow machines to detect it automatically. However, detecting abusive language is very challenging because of the different forms it can take. Waseem et al. (2017) emphasize the need to take into account two factors: the type of target (individuals or groups) and the degree of explicitness, which is established by looking at the level of denotation and connotation of the hateful message.
Looking at the following examples:
-
(2) user user … ma tutti i psicolabili stranieri li dobbiamo tenere noi in Italia?… se non sanno reprimere i loro istinti vanno tenuti segregati per non nuocere pi… invece stanno liberi…
user user … but do we have to keep all the psychologically insane foreigners in Italy? … if they don’t know how to repress their instincts they must be kept segregated in order not to harm anymore … instead they stay free …
-
(3) Come osano chiedere il biglietto ai #migranti? Mandate subito i caschi blu dell’#ONU a vigilare sui diritti delle #risorse! Aggredito controllore sul #treno GTT URL
How dare they ask #migrants for a ticket? Send the #ONU peacekeepers immediately to monitor the rights of #resources! Controller assaulted on #train GTT URL
we notice that in (2) the abusive language is unambiguous, whereas Example (3) involves sociocultural assumptions and ironic intention that could be difficult for humans with different backgrounds to comprehend (Akhtar, Basile, and Patti 2020). Correctly detecting abusive language thus also means understanding the processes that make it indirect, such as typical cognitive biases and figurative language.
Our main purpose in this paper is to shed light, by means of an ensemble of computational techniques, on the linguistic and cognitive elements involved in the implicit and explicit manifestations of abuse in texts, considering different media sources characterized by different ways of conveying abusive language.
In this regard, we focus on a benchmark dataset, called here HaSpeeDe2020, released by the organizers of the HaSpeeDe shared task (Sanguinetti et al. 2020) at EVALITA 2020 for detecting hate speech and stereotypes in Italian. This shared task provides a suitable framework to investigate the different expressions of abusive language in a corpus of tweets and news headlines.
In HaSpeeDe, stereotypes are conceived as an orthogonal dimension of abusive language that does not necessarily coexist with hate speech: “a standardized mental picture that is held in common by members of a group and that represents an oversimplified opinion, prejudiced attitude, or uncritical judgment.” The proliferation of oversimplified and uncritical judgments, especially about minorities, reinforces the perceived homogeneity of the outgroup, which is seen as different from and, sometimes, in contrast with one’s own in-group (Fiske 1998), as in:
-
(4) #DecretoSalvini esatto e’ buono anche per gli immigrati regolari che si vogliono integrare sul serio. la nostra cultura millenaria fara loro del bene.
#DecretoSalvini exactly is even good for legal immigrants who want to integrate seriously. our millennial culture will do them good.
Concerning hate speech, messages that aim at spreading or justifying hate, inciting violence, or threatening the freedom, dignity, and safety of individuals are considered hateful (Erjavec and Kovačič 2012; Sanguinetti et al. 2018). In HaSpeeDe2020, the targets considered are the three most attacked minority communities in Italy: immigrants, Muslims, and Roma. Hate speech about these targets is often based on negative stereotyped ideas that categorize them as criminals or parasites. Such negative evaluations are also very common in traditional media such as news headlines, where the hateful message is mostly implicit but still hurtful, for example in:
-
(5) Il regno di immigrati e no global: “Ecco l’anticamera dell’inferno”
The kingdom of immigrants and no global: “Here is the antechamber of hell”
The context of HaSpeeDe gives us the opportunity to investigate this social problem also from a multi-genre perspective, highlighting how hate speech and stereotypes are expressed in texts from different sources, and how, as orthogonal dimensions of abusive language, they interact with each other.
As seen in Examples (1) and (3), hate speech could be made ambiguous by the use of figurative devices. Positive words (“resources,” “admire,” “peacekeepers,” “rights”) are sarcastically used to sugar-coat the real negative meaning and mask the stereotypes about immigrants. Some figures of speech, indeed, prove to be suitable for expressing hurtful opinions. For instance, in Frenda et al. (2022), sarcasm was shown to be characterized by aggressive language, unlike other forms of irony, which appear principally offensive in texts about immigration issues. Furthermore, Sanguinetti et al. (2018) noticed that, especially in the case of negative and hateful opinions, social media users tend to be less explicit, employing irony in their claims in order to limit their exposure. Considering these previous studies, our main intuition is that implicit abuse online can be manifested through ironic language and that, therefore, making the system aware of irony could improve the detection of abusive language.
HaSpeeDe2020 contains texts coming from the dataset of tweets released during the first edition of HaSpeeDe at EVALITA 2018 (Bosco et al. 2018) and, since that dataset was part of the Italian Hate Speech corpus (IHSC) described in Sanguinetti et al. (2018), these texts are already annotated as ironic and non-ironic. Therefore, we harmonized the annotation of HaSpeeDe2020 by labeling the missing instances, and, inspired by Cignarella et al. (2018), we also provided the annotation of sarcastic and non-sarcastic texts. Then, HaSpeeDe2020 was extended with other tweets with the same annotation coming from IHSC. This larger sample, called here HaSpeeDe2020_ext, was used for the statistical and experimental analyses.
Considering the availability of such a dataset, we wonder:
-
RQ1 What are the evident and hinted characteristics of hate speech and stereotypes in different textual genres?
-
RQ2 How do hate speech and stereotypes interact with each other?
To this purpose, we carried out various statistical and computational analyses, exploiting the advantages of fine-tuning three Italian BERT-based Language Models (LMs) to detect hate speech and stereotypes in tweets and news headlines, with emphasis on their “knowledge” derived from the data used for training. Along with pre-trained LMs, we employed multi-task learning techniques to make the classifier aware of other phenomena (i.e., stereotypes, hate speech, irony, and sarcasm). And, to bring out the linguistic peculiarities of hate speech and stereotypes, we created a set of linguistic features aimed at capturing mainly connotative meanings, affective information, and syntactic patterns.
Answering these research questions leads us to examine deeply the creative and cognitive aspects of the abusive language online, providing a solid basis for the development of systems able to identify and prevent the spread of explicit and implicit manifestations of hate speech and stereotypes in texts coming from social media and news.
2. Implicit abuse and its open challenges
Although attention to abusive language is recent in the NLP and CL communities, the existing literature in this field is vast and not uniform. The subjective perception of the issue has caused various interpretations of the term hate speech and a certain vagueness in the use of related terms such as abusive, toxic, dangerous, offensive, and aggressive language (Poletto et al. 2021; Vidgen and Derczynski 2021). Following the typology delineated by Waseem et al. (2017) and Poletto et al. (2021), we adopt the term abusive language as an umbrella term that covers different expressions of abuse online.
Due to this lack of uniformity, most studies have investigated the detection of specific manifestations of abusive language, such as aggressiveness (Kumar et al. 2018; Carmona et al. 2018), flames (Lapidot-Lefler and Barak 2012), incivility (Rösner and Krämer 2016), cyberbullying (Dinakar, Reichart, and Lieberman 2011), offensiveness (Zampieri et al. 2019), toxicity (Taulé et al. 2021), misogyny (Fersini et al. 2018; Pamungkas et al. 2020), and racism (Waseem and Hovy 2016).
Recently, the focus of some benchmark competitions has been extended to the detection of various types of abuse in the same instances, thus providing a new basis for more robust theoretical observations and computational models. Among them, Basile et al. (2019) organized the HatEval shared task at SemEval-2019 on hate speech and aggressive behavior detection in a multilingual corpus; the organizers of HASOC (Hate Speech and Offensive Content Identification in Indo-European Languages) at FIRE 2020 (Mandl et al. 2020) proposed to detect and distinguish offensive language, hate speech, and profanities in a multilingual dataset; Kumar et al. (2020) presented the second edition of TRAC (Trolling, Aggression and Cyberbullying) in 2020 on the detection of aggression and misogynistic aggression in multilingual data; Fersini, Nozza, and Rosso (2020) proposed a new edition of AMI (Automatic Misogyny Identification) at EVALITA 2020, asking participants to detect misogynistic texts and their aggressive attitude in Italian data; and finally, on the same occasion, Sanguinetti et al. (2020) presented the second edition of HaSpeeDe, focused on detecting hate speech and stereotypes in Italian tweets and news headlines.
Nevertheless, efforts towards combined analyses are still few. Clarke and Grieve (2017), for instance, investigated the functional linguistic variation between racist and sexist tweets in the corpus of Waseem and Hovy (2016), discovering that tweets against women tend to be more interactive and attitudinal than racist ones, which are aimed principally at persuading and at arguing for discrimination by reporting events. Lavergne et al. (2020) developed, for the second edition of HaSpeeDe, a competitive model based on a multi-task learning approach to detect hate speech and stereotypes simultaneously, showing that injected knowledge about stereotypes improves the detection of hate speech only in tweets.
Moreover, the existing surveys on abusive language detection (Schmidt and Wiegand 2017; Fortuna and Nunes 2018) underline the need to approach computationally the implicitness of toxic discourse, especially where it is disguised by sarcasm, euphemism, rhetorical questions, or litotes, or where there are no explicit accusations, negative evaluations, or insults. This kind of implicitness conceals the offensiveness of the text, making its recognition hard, especially for machines (Nobata et al. 2016; Frenda 2018; MacAvaney et al. 2019).
Additionally, an evaluation process based on specific test sets and measures, such as accuracy and F1-score, could overestimate model performance without revealing particular weak points. To address this, Röttger et al. (2021) developed a suite of functional tests, called HateCheck, to evaluate models for abusive language detection in English in depth. In particular, on the basis of previous work and interviews, they elaborated 29 functional tests that cover the most common challenges in hate speech detection, such as negation, slurs, pronoun reference, threatening language, counter speech, and spelling errors. The usefulness of this tool was confirmed by Vidgen et al. (2021), who tested their dynamic approach to dataset generation, proving the robustness of their model trained over various rounds. A dynamic approach, indeed, allows problems to be addressed while the work is being conducted, by discussing them with expert annotators and by extending and improving step by step the training set, which is annotated taking into account different types and targets of hate.
The availability of various vulnerable targets helps the classifier to generalize better about the presence of hate, without excluding the identification of abusive language towards unseen groups. An example comes from Talat, Thorne, and Bingel (2018), where the authors proposed a model based on multi-task learning with the aim of bridging differences in annotation and data collection, such as different annotation schemes, labels, or geographic and cultural influences from data sampling.
As suggested by Jurgens, Hemphill, and Chandrasekharan (2019), the NLP community needs, indeed, to expand its efforts to recognize infrequent abuse (taking into account especially the context where it occurs) and to detect subtle abuse that could be manifested as benevolent stereotyping, condescension, minimization, or disparity in the treatment of social groups. In addition, Wiegand, Ruppenhofer, and Eder (2021) identified specific subtypes of implicit abuse by analyzing various benchmark datasets in English: stereotypes, perpetrators, comparisons, dehumanization, euphemistic constructions, call-for-action, multimodal abuse, and all the phenomena that require world knowledge and inferences, such as jokes, sarcasm, and rhetorical questions.
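The idea of scoring a model per functionality rather than with one aggregate measure can be illustrated with a minimal sketch. It is not the HateCheck implementation: the test cases, the functionality names, the placeholder target `group_x`, and the toy keyword classifier are all invented for illustration.

```python
from collections import defaultdict


def per_functionality_accuracy(cases, predict):
    """Accuracy per functionality.

    cases: iterable of (functionality, text, gold_label) with labels in {0, 1}.
    predict: function mapping a text to a label in {0, 1}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for functionality, text, gold in cases:
        total[functionality] += 1
        correct[functionality] += int(predict(text) == gold)
    return {name: correct[name] / total[name] for name in total}


def keyword_classifier(text):
    """Hypothetical toy model: flags any text that contains the token 'hate'."""
    return int("hate" in text.lower().split())


# Invented test cases; 1 = hateful, 0 = not hateful.
cases = [
    ("explicit", "I hate group_x", 1),
    ("explicit", "We really hate group_x", 1),
    ("negation", "I do not hate group_x", 0),
    ("counter_speech", "Telling people to hate group_x is wrong", 0),
]

scores = per_functionality_accuracy(cases, keyword_classifier)
overall = sum(keyword_classifier(t) == g for _, t, g in cases) / len(cases)
```

Here the aggregate accuracy of the toy classifier is 0.5, while the per-functionality view shows that it succeeds on every explicit case and fails on every negation and counter-speech case.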
Some of these subtypes have been identified by scholars as problematic challenges in abusive language detection, demonstrating that only its explicit manifestations are understood by current classifiers (supervised and unsupervised). For instance, Van Aken et al. (2018) proposed a detailed error analysis of an ensemble classifier’s performance on a Wikipedia dataset and a Twitter dataset (Davidson et al. 2017), identifying specific phenomena that make abusive language difficult to recognize: lack of explicit offenses (such as swearwords), idiosyncratic expressions, rhetorical questions, and metaphorical and ironic language. As shown also by Wiegand, Ruppenhofer, and Kleinbauer (2019), the performance of classifiers in the presence of implicit abuse decreases considerably, with some exceptions regarding those cases where the sampling process introduces data bias in the training and test sets. These analyses, which take into account the explicit and implicit portions of abusive documents, are carried out by looking at the vocabulary of the corpora: a document contains explicit abusive language if it includes at least one word from a lexicon of abusive words (Wiegand et al. 2018). The same approach is applied to the OLID/OffensEval dataset (Zampieri et al. 2019) by Caselli et al. (2020) as a basic analysis to reflect on the notions of explicit/implicit and offensive/abusive and then to propose a new annotation of OLID/OffensEval, creating AbuseEval v1.0. As expected, the authors showed that the documents annotated as offensive in OffensEval largely overlap with the documents annotated as explicitly abusive in AbuseEval and that the identification of implicit abuse is more difficult than that of explicit abuse.
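The lexicon-based criterion described above (a document is explicit if it contains at least one word from a lexicon of abusive words) can be sketched as follows. This is only an illustration: the three lexicon entries are hypothetical placeholders, the tokenizer is a simple regular expression, and the cited studies may preprocess and match texts differently.

```python
import re

# Hypothetical placeholder entries; a real analysis would load a curated lexicon.
ABUSIVE_LEXICON = {"idiot", "scum", "parasite"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_explicit(text, lexicon=ABUSIVE_LEXICON):
    """True if at least one token of the text appears in the lexicon."""
    return any(token in lexicon for token in tokenize(text))


def split_by_explicitness(abusive_texts, lexicon=ABUSIVE_LEXICON):
    """Partition texts already labeled as abusive into explicit and implicit ones."""
    explicit = [t for t in abusive_texts if is_explicit(t, lexicon)]
    implicit = [t for t in abusive_texts if not is_explicit(t, lexicon)]
    return explicit, implicit


explicit, implicit = split_by_explicitness([
    "You are scum.",
    "The typical foreign resource, I admire the mother!",
])
```

Under this criterion, a sarcastic message built only from positive words falls into the implicit portion, which is the portion that classifiers find harder to detect.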
Coping with implicit phenomena is necessary to make systems able to understand those messages that have a strong abusive effect but a very weak offensive form. Bowes and Katz (2011), for example, noted that the victims of sarcastic utterances do not perceive the expression as humorous, differently from the aggressors’ point of view, and not less polite than the literal counterpart. This study contradicts the position of some scholars who consider ironic language a device to mute the negative meaning (Dews and Winner 1995). In this regard, Pexman and Olineck (2002) proposed a pragmatic analysis of ironic insult and ironic compliment: the former is perceived as more polite, whereas the latter is perceived as mocking and sarcastic. Speakers, in fact, tend to criticize someone while lowering the social cost of doing so, and ironic language seems appropriate to conceal the abuse.
Although the theoretical literature clearly describes the implicitness of abusive language, the computational efforts that could support it are few. To our knowledge, only stereotypes and metaphors have been exploited for abusive language detection. For instance, Lemmens, Markov, and Daelemans (2021) proved the contribution of hateful metaphors as features for the identification of the type and target of hate speech in Dutch Facebook comments, using models based on classical machine learning and transformers.
In this context, our contribution aims to bring to light, by means of statistical and computational analyses, the invisible processes that characterize indirect abusive language in terms of cognitive bias and ironic language.
3. Figurative and cognitive aspects: Statistical analysis of the corpus
To answer our research questions, the HaSpeeDe context seems to provide a suitable framework for the Italian language. The HaSpeeDe shared task proposed by Sanguinetti et al. (2020) consists of three sub-tasks:
-
Task A (Hate Speech Detection) is a binary classification task that, like in the first edition in 2018, asks participating systems to predict whether a text contains hate speech (hs or non-hs) towards a given target (immigrants, Muslims and Roma);
-
Task B (Stereotype Detection) is a binary classification task aimed at determining the presence or the absence of a stereotype (stereo or non-stereo) towards the same targets;
-
Task C (Identification of Nominal Utterances) is a sequence labeling task to recognize NUs in texts previously predicted as hateful.
Considering this context of analysis on the implicitness of abusive language, in this work we address computationally only the first two tasks, exploiting the annotation provided for Task C for additional analysis.
3.1 Description of the dataset
The HaSpeeDe2020 dataset contains tweets and news headlines: one part was gathered from other existing corpora and another was collected in recent years from social media and online newspapers. All these data were annotated using the guidelines defined for IHSC.
Since the organizers of HaSpeeDe aimed to encourage the development of more robust detection systems in cross-genre contexts, they proposed two test sets in the competition: one composed of tweets (Test_TW) and another composed of news headlines (Test_NW), whereas the training set (Train_TW) of HaSpeeDe2020 consists only of tweets.
As for the composition of Train_TW and Test_TW, some of the tweets come from the Twitter dataset released in the first edition of HaSpeeDe in 2018 (and partially derived from IHSC); the rest were gathered for the Italian hate speech monitoring project “Contro l’Odio” (Capozzi et al. 2019). In particular, only data posted between September and December 2018 were included in Train_TW, whereas Test_TW contains the tweets posted between January and May 2019.
The news headlines about immigrant-related events were used only in the second test set (Test_NW). These data were retrieved between October 2017 and February 2018 from online newspapers such as La Stampa, La Repubblica, Il Giornale, and Liberoquotidiano.
Taking into account this composition of HaSpeeDe2020 and the fact that some tweets are already annotated as ironic and non-ironic (iro and non-iro), we harmonized the annotation of HaSpeeDe2020 by labeling the missing instances and by also providing the annotation of sarcastic and non-sarcastic (sarc and non-sarc) texts. For this last label, we followed the annotation schema used for creating the dataset released for the IronITA shared task organized by Cignarella et al. (2018) at EVALITA 2018.
In IronITA, irony is defined as a figurative language device that conveys a meaning that is secondary or opposite to the literal one. Linguistic literature places irony among the metalogic figures, that is, the figures that modify the logical value of the utterance, breaking the maxim of quality (Grice 1975) and affecting the literal meaning (Garavelli 1997). Sarcasm, instead, is conceived as a type of irony that aims to mock and scorn a victim and that, unlike other forms of irony, is used to ridicule a specific target (Lee and Katz 1998).
Finally, Train_TW was extended with other instances with the same annotation from IHSC, to obtain a larger sample for the analyses. This extended version is called here HaSpeeDe2020_ext.
Table 1 shows the data distribution in HaSpeeDe2020_ext (as well as the increase in tweets from Train_TW to Train_TW_ext) and the average number of tokens per instance in each class (#token/text). Taking into account the different textual genres of our dataset, we calculated the average separately for tweets and news headlines. Table 2 reports some examples from Train_TW_ext.
3.2 Statistical analysis
Exploiting the various labels for each instance, we applied a statistical analysis to study the association between ironic language (irony and sarcasm) and abusive language (hate speech and stereotype) interpreted as nominal variables of a population. In particular, we computed:
-
the $\chi^2$ test of independence that, by means of the interpretation of p-value, gives information on the existence or not of significant relations between nominal variables;
-
the Yule’s Q to indicate if the association between two binary variables is positive (values close to 1), negative (values close to -1), or null (values close to 0).
To reject the null hypothesis of the $\chi^2$ test of independence (the hypothesis that the variables are independent), the p-value should be lower than the significance level, set by convention to 0.05; to calculate the p-value, we considered a degree of freedom based on the number of observations. The results of this analysis are reported in Table 3.
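As an illustration of the two measures, the following sketch computes them for a single pair of binary variables arranged in a 2x2 contingency table. It rests on stated assumptions: the counts are invented, the Pearson statistic is computed without continuity correction, and the p-value uses the textbook 2x2 case with one degree of freedom, which may differ from the exact setup behind Table 3.

```python
import math


def chi2_independence_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def yules_q(a, b, c, d):
    """Yule's Q: close to 1 for positive, -1 for negative, 0 for no association."""
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Yule's Q is undefined when a*d + b*c == 0")
    return (a * d - b * c) / denominator


# Invented counts: rows = sarcastic / non-sarcastic, columns = hs / non-hs.
a, b, c, d = 40, 10, 20, 30
statistic, p_value = chi2_independence_2x2(a, b, c, d)
q = yules_q(a, b, c, d)
significant = p_value < 0.05  # reject independence at the conventional level
```

With these invented counts the test rejects independence and Yule's Q is positive, meaning that the two labels tend to co-occur.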
Table 3 shows that in tweets, the association between sarcasm and abusive language reports high scores, especially in the case of hate speech. Differently from sarcasm, the values related to the relation between irony and stereotypes are lower and in the case of non-sarcastic irony the relation is even absent.
As for news headlines, the association between hateful and ironic language appears stronger than in tweets, especially in the cases where irony is not sarcastic. Although the values related to news headlines in particular are based on very few data (Table 1), this analysis gives us a first look at the possible characteristics of indirect language in messages containing hate speech and stereotypes in different textual genres.
We propose the same analysis between the two dimensions of abusive language analyzed in this work.
Table 4 shows a strong association between hate speech and stereotypes, confirming the fact that, especially in implicit contexts such as news headlines, abusive language is characterized mainly by negative stereotypes that support the intolerance towards specific groups (see Table 2).
4. Computational analyses
The statistical analysis, which partially answers our research questions, confirms our initial intuition about the possible implicit manifestations of abusive language through sarcasm in informal texts and through stereotyped ideas in formal contexts such as news headlines. In order to bring to light the implicit and explicit characteristics of hate speech and stereotypes in different textual genres [RQ1], and to understand how these two dimensions of hate interact with each other [RQ2], we carried out a battery of computational experiments on abusive language detection exploiting the HaSpeeDe framework.
We used the same test sets (Test_TW and Test_NW) and evaluation measure proposed at the shared task, to compare the results with attested baselines and other models. However, for the training phase, we exploited the extended version of the training set: Train_TW_ext.
In particular, we designed three different sets of systems based on:
-
1. simple fine-tuning of three LMs for Italian (FT_model),
-
2. combination of the LMs’ knowledge with the awareness derived from the simultaneous learning of related tasks (MTL_model),
-
3. combination of the knowledge derived from MTL_model with specific linguistic features (MTL_model+Features).
FT_model. Considering the popularity in recent years of transformers, we selected three LMs trained on different genres of texts to reveal the contribution of transfer learning in a cross-genre context for abusive language detection.
-
a. AlBERTo (Polignano et al. 2019) is trained on twita (Basile, Lai, and Sanguinetti 2018), a large dataset collecting Italian tweets from February 2012.
-
b. Italian BERT (ItBERT) is trained on a Wikipedia dump and various texts from the OPUS corpora (Tiedemann and Nygaard 2003) for a total size of 13 GB.
-
c. Italian BERT XXL (ItBERTXXL) is an extended version of ItBERT. It is trained on the same data from OPUS as ItBERT and on additional documents from the Italian part of the OSCAR corpus (Ortiz Suárez, Sagot, and Romary 2019), for a final size of 81 GB.
MTL_model. As shown in Tables 3 and 4, abusive language tends to be expressed through creative or cognitive aspects depending on the formal or informal context. Therefore, in this set of experiments we aim:
-
(1) to quantify the impact of additional knowledge related to ironic language in abusive language detection,
-
(2) to comprehend deeply the interaction between hate speech and stereotypes, even in a neural network context.
To this purpose, we employed an approach based on multi-task learning. At the computational level, the use of multi-task learning techniques such as hard parameter sharing offers various advantages. Firstly, this technique gives systems more evidence to evaluate whether a feature is relevant or not, focusing strictly on the most relevant ones for each task. Secondly, hard parameter sharing allows better generalization for each task: learning several tasks simultaneously means finding a representation that is appropriate for all of them, consequently reducing over-fitting on the original task (Baxter 1997).
MTL_model+Features. In the last set of experiments, we aim to examine the relevance of specific linguistic features in texts containing hate speech and stereotypes, and to estimate their contribution ahead of a classification task.
To this purpose, we designed:
-
a set of linguistic features to capture information related to style of writing, syntax, lexical semantics, and pragmatics such as emotions and sentiment,
-
a neural network that merges into a single model the knowledge coming from the LMs in the MTL context and the specific knowledge derived from dedicated linguistic features.
4.1 Neural network architecture
In the first set of experiments (FT_model), we simply fine-tuned the LMs on the hate speech and stereotype classification tasks: we took the pooled output of the BERT-based model and added a dropout layer to prevent over-fitting, a dense layer with standard ReLU activation, and a final dense layer with a sigmoid function to obtain the class probability. As optimizer, we used Adam with a very low learning rate (0.00001), found by means of a specific callback function.Footnote j Finally, as the loss function to minimize during training, we used the binary cross-entropy provided by the Keras library.
The same structure and hyperparameters are applied in the second set of experiments (MTL_model). For the MTL context, on top of the previous network, we added two final dense layers with a sigmoid function to obtain one output for each task. In accordance with the standard BERT input representation (Devlin et al. 2018), the text is represented in both networks as token, segment, and mask inputs. To load the pre-trained models in the TensorFlow environment and to tokenize the texts for creating the token inputs, we used the transformers library.Footnote k
In the last set of experiments (MTL_model+Features), we employed the same network as MTL_model, adding a concatenation layer that combines the pooled output of the BERT-based models with an input layer holding the feature vector representation. Before the concatenation, we applied batch normalization to the feature input layer to standardize it and stabilize the learning process. As in the MTL context, here too we have two output layers, one for each task.
To weight most of the extracted features, we used the TF-IDF (term frequency-inverse document frequency) measure calculated for each word of the vocabulary (TF-IDF_dict). Other semantic and pragmatic features are weighted using polarity scores and cosine similarity. These weights were rescaled using the MinMaxScaler of scikit-learn with its default scaling range.
To create the TF-IDF_dict and the word embedding model used to compute the cosine similarity, we preprocessed the texts by: deleting URLs and symbols such as @ and #, so as to keep the lexical information of hashtags and usernames; tokenizing and lemmatizing words using the implementation of TreeTaggerFootnote l for PythonFootnote m; and removing stopwordsFootnote n to retain lexically significant words.
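The weighting and preprocessing steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `preprocess`, `build_tfidf_dict`, `feature_sum`, and `min_max` are hypothetical, lemmatization and stopword removal are omitted, and the exact TF-IDF variant (here a corpus-level term frequency multiplied by log(N / document frequency)) is an assumption, since the text only states that one weight is computed per vocabulary word.

```python
import math
import re
from collections import Counter

# Assumption: URLs are either raw links or the placeholder "URL" used in the examples.
URL_RE = re.compile(r"https?://\S+|\bURL\b")


def preprocess(text):
    """Drop URLs, strip @ and # while keeping the word they mark, then tokenize."""
    text = URL_RE.sub(" ", text)
    text = text.replace("@", " ").replace("#", " ")
    return re.findall(r"\w+", text.lower())


def build_tfidf_dict(docs):
    """One weight per vocabulary word: corpus-level TF times log(N / doc frequency)."""
    n_docs = len(docs)
    term_counts = Counter(tok for doc in docs for tok in doc)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    total = sum(term_counts.values())
    return {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in term_counts.items()
    }


def feature_sum(tokens, tfidf_dict, lexicon):
    """Sum the TF-IDF weights of the tokens that belong to a feature's lexicon."""
    return sum(tfidf_dict.get(tok, 0.0) for tok in tokens if tok in lexicon)


def min_max(values):
    """Rescale a feature column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In this sketch, a feature such as negation would be obtained by calling `feature_sum` with a (hypothetical) set of negation markers as `lexicon`, and each feature column would then be rescaled across the dataset with `min_max`.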
4.2 Linguistic features
To create the set of features, we took inspiration from previous works that define specific patterns to identify irony, sarcasm (Hernández Farías, Patti, and Rosso 2016; Cignarella et al. 2020), and abusive language (Frenda et al. 2020).
Style-related features. This set contains punctuation marks and patterns of negation. Punctuation (punct) is commonly used to convey the intended meaning of the message. For instance, users use quotation marks to signal the opposite of the literal meaning of a word, such as “risorsa” (“resource”). Negation (negation) has proved to play an important role in the comprehension of figurative language (Giora, Givoni, and Fein 2015; Karoui et al. 2017). In the feature vector, they are represented by the sum of their TF-IDF values in the text.
Syntax-related features. This set involves specific syntactic dependencies expressing adverbial locutions (adv_loc), intensifiers (intens), discourse connections (disc_conn), mentions (mention), and nominal phrases together with their number in the tweet (nom_phrase and num_nom_phrase). To extract these features, we used the spacy-udpipe library with the TWITTIRò model, which is specific to short texts in ItalianFootnote o (Cignarella et al. 2019), and to retrieve their weights, we exploited TF-IDF_dict.
Lexical semantics-related features. This set is composed of:
-
Lexical information about offensive language extracted from the HurtLexFootnote p multilingual lexicon (Bassignana, Basile, and Patti 2018). HurtLex was created from the Italian lexicon “Le Parole per Ferire” by Tullio de Mauro, and the words in the lexicon are classified into 17 types of offenses (see Table 5), grouped into two macro-categories: conservative, i.e., words with a literally offensive sense; and inclusive, i.e., words whose sense is not literally offensive but that could be used with a negative connotation. To extract these features, we used the featurizerFootnote q created specifically for this lexicon, which focuses on the categories. Therefore, to represent them in the feature vector, we computed the sum of the TF-IDF values of all words in the text belonging to each category.
-
Semantic information about incongruities and similarities revealed by words and pairs of words in the text. These features are extracted by considering the variability of the TF-IDF weights of the words in the text by means of the standard deviation ($\sigma$) and the coefficient of variation (cv), the average of the weights (avg), and the maximum (max), minimum (min), and median (med) values of the list of the TF-IDF values of words (W) and bigrams of words (B) in a text. The values related to bigrams are computed using the normalization of the weights on the maximum and minimum scores (C1) and on the standard deviation and average (C2). Moreover, we calculated the cosine similarity ($\cos(\theta)$): (1) between pairs of words (bigram vectors) and the sentence context (the sentence vector) ($\mathtt{\cos(\theta)\_BS}$), and (2) between the bigrams of words within the sentence ($\mathtt{\cos(\theta)\_BB}$). To represent them in the feature vector, we computed $\sigma$, the coefficient of variation, the average, and the maximum, minimum, and median scores of the lists of cosine similarity values. The word embedding model used for computing these values was created following the methodology presented in Frenda et al. (2022).
Pragmatics-related features. This set consists of affective information extracted from the Sentix and EmoLex dictionaries.
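As a rough illustration of the variability statistics listed above, the following sketch computes $\sigma$, cv, avg, max, min, and med over a list of values (TF-IDF weights or cosine similarities), together with the cosine similarity between two vectors. The function names are hypothetical, the use of the population standard deviation is an assumption, and the bigram normalizations C1 and C2 are not covered.

```python
import math
import statistics


def variability_features(values):
    """Summary statistics over a list of weights or similarity scores."""
    if not values:
        return {k: 0.0 for k in ("sigma", "cv", "avg", "max", "min", "med")}
    avg = statistics.mean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    return {
        "sigma": sigma,
        "cv": sigma / avg if avg else 0.0,  # coefficient of variation
        "avg": avg,
        "max": max(values),
        "min": min(values),
        "med": statistics.median(values),
    }


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under these assumptions, the $\mathtt{\cos(\theta)\_BS}$ features would come from applying `variability_features` to the list of `cosine(bigram_vector, sentence_vector)` values of a text.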
-
To extract the sentiment of the text, we used SentixFootnote r (Basile and Nissim 2013), which contains, for each lemma, scores for positive and negative sentiment, polarity, and intensity. Using this information, we calculated the average positive and negative scores of the words in the text (avg_positive and avg_negative), the $\sigma$ of polarity, and the average intensity (avg_intensity).
-
To capture emotions and feelings, we used EmoLexFootnote s (Mohammad and Turney 2013), a multilingual lexicon containing, for each lemma, information about the 8 principal emotions of Plutchik (Plutchik and Kellerman 2013). Inspired by Plutchik (2001), we exploited the wheel of emotions to capture in the message the principal emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust), the primary dyads or feelings (aggressiveness, optimism, love, submission, awe, disapproval, remorse, contempt), and the variability of opposite emotions/feelings by means of $\sigma$. Emotions and feelings are represented in the feature vector by the sum of the TF-IDF values of the words related to each emotion/feeling in the text.
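The emotion and feeling features can be sketched as follows. The pairing of primary emotions into dyads and the opposite pairs follow Plutchik's wheel; everything else is an assumption made for illustration: the function name `emotion_features`, the toy lexicon format (lemma mapped to a set of emotion labels), the choice of scoring a dyad as the sum of its two component emotions, and the use of the population standard deviation over each opposite pair.

```python
import statistics

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Primary dyads of Plutchik's wheel: each feeling combines two adjacent emotions.
DYADS = {
    "aggressiveness": ("anger", "anticipation"),
    "optimism": ("anticipation", "joy"),
    "love": ("joy", "trust"),
    "submission": ("trust", "fear"),
    "awe": ("fear", "surprise"),
    "disapproval": ("surprise", "sadness"),
    "remorse": ("sadness", "disgust"),
    "contempt": ("disgust", "anger"),
}

# Emotions sitting on opposite sides of the wheel.
OPPOSITES = (("joy", "sadness"), ("trust", "disgust"),
             ("fear", "anger"), ("surprise", "anticipation"))


def emotion_features(tokens, emolex, tfidf_dict):
    """Sum TF-IDF weights per emotion, derive dyads, and measure opposite-pair spread."""
    emotions = {e: 0.0 for e in EMOTIONS}
    for tok in tokens:
        for label in emolex.get(tok, ()):
            if label in emotions:  # skip labels outside the 8 emotions
                emotions[label] += tfidf_dict.get(tok, 0.0)
    # Assumption: a dyad score is the sum of its two component emotions.
    feelings = {d: emotions[a] + emotions[b] for d, (a, b) in DYADS.items()}
    # Assumption: variability is the population std. dev. of each opposite pair.
    spread = {f"sigma_{a}_{b}": statistics.pstdev([emotions[a], emotions[b]])
              for a, b in OPPOSITES}
    return {**emotions, **feelings, **spread}
```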
4.2.1 Features relevance
Figure 1 shows the relevance of the sets of features analyzed for each phenomenon, computed by means of $\chi^2$ values. In particular, we report the most relevant features with a $\chi^2$ greater than 3.
Looking at Figure 1, we can notice that Hate Speech and Stereotypes are characterized by very similar features. In general, both are marked by negative emotions and feelings (anger, awe, disgust, aggressiveness, fear) and by offensive words with conservative and inclusive interpretations. Some categories of offenses, related especially to animals, physical disabilities or diversity, behaviors/morality, and general swear words, are more relevant in hate speech, whereas in stereotyped messages the most significant offenses are linked in particular to economic and social issues and to the cognitive and ethnic spheres, even if in a more indirect way.
At the semantic level, the minimum similarity score between the bigrams of words within the sentence ($\mathtt{\cos(\theta)\_BB}$) appears to be relevant in stereotype recognition. This specific feature brings out the semantic incongruity in the text, a common technique used to express irony (Riloff et al. 2013; Joshi, Sharma, and Bhattacharyya 2015; Pan et al. 2020).
Finally, both phenomena appear to be characterized by specific syntactic patterns, such as negation and adverbial locutions. From a manual examination, we found that the former are used especially to mark some characteristics of the outgroup, sometimes juxtaposing it with the in-group, while the latter tend to increase the intensity of some beliefs or make the sentences mainly nominal. Here are some examples of the presence of these markers:
-
(6) Ora ankio andrò ad emigrare dato che qui sarà tutto occupato da stranieri. tanto di noi italiani non gliene frega nessuno…. vergognaaa
Now me too I will go to emigrate since everything here will be occupied by foreigners. so much of us Italians do not give a damn…. shame on
-
(7) I nostri migranti non erano assolutamente come questa gentaglia qui! Continuare a fare questo paragone un’offesa per tutti quelli che si sono rotti mani e schiena nelle miniere e nelle fabbriche e vivevano nascosti il resto del tempo.
Our migrants were absolutely not like this scum here! Continuing to make this comparison is an offense to all those who broke their hands and backs in mines and factories and lived in hiding the rest of the time.
4.3 Experimental setting
Our computational analyses, interpreted in this work as classification tasks of the texts containing hate speech (Task A) and stereotypes (Task B), are performed using the same setting and hyperparameters for all the sets of experiments.
To evaluate the performance of the models during training, we used 20% of Train_TW_ext as validation set. The systems were trained for a maximum of 20 epochs per run with a batch size of 32. To avoid over-fitting, we used early stopping, monitoring the loss obtained on the validation set, and, to obtain reproducible results, we set a fixed random seed.
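The early stopping logic mentioned above can be illustrated with a small, framework-independent sketch. The function name and the `patience` value are hypothetical, since the number of epochs tolerated without improvement is not stated here; only the 20-epoch cap and the monitoring of the validation loss come from the text.

```python
def early_stopping_epoch(val_losses, max_epochs=20, patience=3):
    """Return (last_epoch_run, best_epoch) given per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return last_epoch, best_epoch
```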
In regard to MTL_model+Features, we evaluated the performance using both the designed features (Feat) and selected features (SelectedFeat) for each detection task. To select the best features, we considered their $\chi^2$ value (greater than 10) for a total of 27 features for hate speech and 25 for stereotype detection.
Exploiting the HaSpeeDe framework, each model, for both Task A and Task B, has been evaluated using the test sets (Test_TW and Test_NW) and the evaluation measure proposed by the organizers of the shared task: macro-F1, i.e., the average of the F1 scores of each class. Moreover, to assess the characteristics of hate speech and stereotypes in tweets and news headlines, we compared the obtained results with the straightforward baselines provided by the organizers and the best scores obtained by the top-ranked teams in the competition:
-
Baseline_MFC (Most Frequent Class) that assigns to each instance the majority class of the respective task.
-
Baseline_SVC that classifies the texts using an SVM algorithm with unigrams and character n-grams (2-5) weighted with TF-IDF.
-
TheNorth team (Lavergne et al. 2020) used a simple neural network that fine-tunes the UmBERToFootnote t LM, using specifically a linear layer with a softmax on top of the CLS token and applying a novel technique of layer-wise learning rate. TheNorth submitted two runs for each task. The first was obtained by simply fine-tuning UmBERTo, and the second by using a multi-task approach that exploits the possible correlation between texts containing hate speech and texts expressing stereotypes about the targets.
-
CHILab team (Gambino and Pirrone 2020) experimented with transformer encoders in the first run, creating specifically two transformer/convolution blocks for each input (texts and Part-of-Speech or PoS tags), averaged through max pooling and finally processed by a dropout and a dense layer to obtain the predictions, and with a depth-wise separable convolution technique in the second one. CHILab also used additional tweets taken from twita by means of some keywords extracted from the provided training set to extend the embedding layer of their model.
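The evaluation measure used in this setting, macro-averaged F1, and the Most Frequent Class baseline can be sketched in plain Python as follows. The function names and the toy labels are hypothetical, and the sketch is not the official scorer of the shared task.

```python
from collections import Counter


def f1_for_class(gold, pred, label):
    """F1 of a single class, treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted average of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, lab) for lab in labels) / len(labels)


def most_frequent_class(train_labels):
    """Label that a Most Frequent Class baseline assigns to every instance."""
    return Counter(train_labels).most_common(1)[0][0]
```

Because the macro average weights both classes equally, a majority-class predictor scores 0 on the minority class, which is why such a baseline stays low on this measure even when one class dominates.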
4.4 Results and discussion
Observing the results in Table 6Footnote u obtained in the first set of experiments, we can notice interesting correlations in both tasks: the systems using the ItBERTXXL model perform well on tweets (0.788 and 0.764), whereas the systems using ItBERT perform well on news headlines (0.674 and 0.733). AlBERTo, contrary to what we expected, contributes to improving the detection of stereotypes in the news genre when the classifier is aware of irony (0.752). Evidently, the generalization about the style of short texts coming from AlBERTo and the knowledge derived from training on ironic texts help the system to catch some indirect dependencies that enable it to recognize stereotypes in a very difficult context such as news headlines.
In order to quantify the impact of linguistic and cognitive aspects, we computed the percentage variation ($\Delta$) between the F1-scores of the best models (underlined in Table 6) and of the best baseline (Baseline_SVC). For a better visualization, we report these values in Table 7.
As regards the expression of stereotypes, unlike the statistical results reported in Table 3, the percentage of variation from the baseline model (12.41%) suggests that it is characterized by specific patterns that are also typical of ironic language, such as the reference to secondary or indirect meanings, mainly in less spontaneous texts like news headlines. The opposite trend is visible in hate speech detection, which shows higher values of $\Delta$ when the classifier is aware of sarcasm in both genres, confirming the previous statistics about the use of a sharper form of irony, especially in tweets [RQ1].
Looking at the contribution of the mutual information between hate speech and stereotypes, we can notice that only the detection of hate speech actually takes advantage of the knowledge derived from the simultaneous learning of stereotype detection. Moreover, as we can see in Table 7, the values of $\Delta$ in Task A (12.07% and 11.59%) are very high regardless of the textual genre. During the HaSpeeDe shared task, only TheNorth team experimented with a multi-task learning approach, showing that the knowledge about stereotypes could improve the identification of hate speech in tweets ($\Delta$ = 12.20%).
These results allow us to better interpret the strong association that emerged in Table 4, proving that even if hate speech could be expressed using negative stereotypes to reinforce or justify the message, the same is not true in reverse [RQ2].
Finally, observing in Table 6 the role of features in the different experiments, we notice that specific linguistic information increases above all the performance of ItBERT-based models on news headlines. News headlines, in general, are characterized by a different style of writing, less spontaneous than texts coming from social media. To examine their characteristics, we exploited the coarse-grained annotation of NUs provided for Task C, extracting from them the most frequent trigrams of words: “basta balle ecco” (“no more lies here”), “via i migranti” (“out the migrants”), and “immigrati la verità” (“immigrants the truth”). These are involved in specific syntactic contexts, like:
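The exact formula for $\Delta$ is not given here; assuming it denotes the relative variation of the F1-score of a model with respect to the best baseline, expressed as a percentage, it would read:

```latex
\Delta = \frac{F1_{\text{model}} - F1_{\text{baseline}}}{F1_{\text{baseline}}} \times 100
```

Under this assumption, a positive $\Delta$ indicates an improvement over Baseline_SVC, and its size is relative to the baseline score rather than an absolute difference in F1 points.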
-
(8) Immigrati, ammiraglio brutale: ora basta balle. “Ecco chi trama contro l’Italia, serve una guerra”
Immigrants, brutal admiral: no more lies now. “Here is who is plotting against Italy, we need a war”
-
(9) C’è la scuola, via i migranti: “Siamo contrari all’apartheid ma ora serve più sicurezza”
There is the school, out the migrants: “We are against apartheid but now we need more security”
The identified NUs recall a peculiar political rhetoric that aims to feed intolerance against immigrants, specifically called Slogan-like NUs by Comandini and Patti (2019). The style of news headlines thus makes the detection of hate speech in particular harder and, although the designed linguistic features seem to help the systems go beyond the stylistic aspects, the best performance on hate speech detection in news headlines is obtained by the CHILab team, which exploits the syntactic representation of the text.
5. Conclusion
In this work, we investigated the linguistic and cognitive aspects involved in the explicit and implicit manifestations of abusive language in social media and news, with the aim of providing a solid basis for computational applications. Inspired by previous works, we examined the presence of ironic language, specific linguistic patterns, and the co-occurrence of various forms of abusive language in Italian texts, exploiting the framework of the HaSpeeDe shared task.
In particular, we carried out various statistical and computational analyses that shed light, firstly, on how hate speech and stereotypes are expressed in texts from different sources, and, secondly, on how they interact with each other as orthogonal dimensions of abusive language.
The statistical analyses revealed a strong association, confirmed also at the computational level, between the presence of hate speech and sarcasm in spontaneous texts such as tweets. Sarcasm is a sharp form of irony, used to mock and ridicule a victim (Lee and Katz 1998). Owing to its peculiarities, it proved to be suited to expressing hateful messages, lowering the social cost of what has been said (Frenda et al. 2022).
As regards stereotypes, a different trend was observed between statistical and computational analyses. Indeed, the awareness of irony improves the performance of the classifier, mainly on news headlines, which also stand out for their syntactic structure. This suggests that the expressions of stereotypes are characterized by specific patterns that are also typical of ironic language, such as the reference to secondary or indirect meanings.
Moreover, although general negative emotions and offenses appear similarly in texts containing hate speech and stereotypes, the analysis of the relevance of linguistic features shows some differences. For instance, offenses related especially to animals, physical disabilities or diversity, behaviors/morality, and general swear words are more relevant in hate speech, whereas in stereotyped messages the most significant offenses are linked to economic and social issues and to the cognitive and ethnic spheres. Stereotypes, in addition, are characterized by particular features aimed at capturing the semantic incongruity within the text, commonly used also to express irony [RQ1].
Stereotypes are conceived as oversimplified judgments shared by the members of a group that reinforce the homogeneity of the outgroup, perceived as different or in contrast (Fiske 1998). Unlike hate speech, stereotypes rarely appear explicitly aggressive or offensive. For this reason, Wiegand et al. (2021) categorized them as a type of implicit abuse that could be used to justify hate. This explains the good performance reached in hate speech detection when the classifier also learned to recognize stereotypes [RQ2].
However, when we process the texts only at message level, we miss contextual information. Indeed, reading the text in isolation oversimplifies how hate speech happens in reality. Even if some texts are clearly abusive, in other cases context could help to give a more informed perspective for interpreting them as abuses or not. Along this line, as further work, we want to investigate the impact of contextual information (images, URLs, conversational thread) on the resolution of implicitness in abusive language.
6. Ethical considerations
The issue addressed in this paper reflects a real social problem, and we are aware that some readers could feel offended by the reported examples. Their illocutionary force, in both explicit and implicit cases, is strong and is reinforced by the fact that the addressed targets are entire identity groups and the offenses could touch us personally. Taking into account the sensitivity of this issue, we chose to anonymize the users’ names and replaced the URLs with the label URL.
We want to underline that these examples in no way reflect the opinion of the authors. The aim of this work is to create a solid foundation for theoretical debate and for computational applications of prevention and detection of abusive language, encouraging academia and industry to take into account its implicitness, which has severe effects just as direct offenses do.
Acknowledgements
The work of S. Frenda and V. Patti was partially funded by the research project “STudying European Racial Hoaxes and sterEOTYPES” (STERHEOTYPES, under the call “Challenges for Europe” of Volkswagen Stiftung and Compagnia di San Paolo). The work of P. Rosso was partially funded by the Spanish Ministry of Science and Innovation under the research project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media “FAKE news and HATE speech” (PGC2018-096212-B-C31) and by the Generalitat Valenciana under DeepPattern (PROMETEO/2019/121).