Armed conflict presents a multitude of risks to civilians, prisoners of war and others caught in the middle of hostilities. Harmful information spreading on social media compounds such risks in a variety of tangible ways, from potentially influencing acts that cause physical harm to undermining a person's financial stability, contributing to psychological distress, spurring social ostracization and eroding societal trust in evidentiary standards, among many others. Despite this span of risks, no typology exists that maps the full range of such harms. This article attempts to fill this gap, proposing a typology of harms related to the spread of harmful information on social media platforms experienced by persons affected by armed conflict. Developed using real-world examples, it divides potential harm into five categories: harms to life and physical well-being, harms to economic or financial well-being, harms to psychological well-being, harms to social inclusion or cultural well-being, and society-wide harms. After detailing each component of the typology, the article concludes by laying out several implications, including the need to view harmful information as a protection risk, the importance of a conflict-specific approach to harmful information, the relevance of several provisions under international law, and the possible long-term consequences for societies from harmful information.
The information used for this typology is based entirely on open-source reporting covering acts that occurred during armed conflict and that were seemingly related to identified harmful information on social media platforms or messaging applications. The authors did not verify any reported incidents or information beyond what was included in cited sources. Throughout the article, sources have been redacted from citations where there is a risk of reprinting harmful information or further propagating it, and where redaction was necessary to avoid the impression that the authors were attributing acts to particular groups or actors.
Although pretrained large language models (PLMs) have achieved state-of-the-art performance on many natural language processing tasks, they lack an understanding of subtle expressions of implicit hate speech. Various attempts have been made to enhance the detection of implicit hate by augmenting external context or enforcing label separation via distance-based metrics. Combining these two approaches, we introduce FiADD, a novel focused inferential adaptive density discrimination framework. FiADD enhances the PLM finetuning pipeline by bringing the surface form of an implicit hate utterance closer to its implied meaning while increasing the intercluster distance among the various labels. We test FiADD on three implicit hate datasets and observe significant improvement in the two-way and three-way hate classification tasks. We further test the generalizability of FiADD on three other tasks in which surface and implied forms differ, namely sarcasm, irony, and stance detection, and observe similar performance improvements. Finally, we analyze the generated latent space to understand its evolution under FiADD, which corroborates the advantage of employing FiADD for implicit hate speech detection.
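To illustrate the kind of objective the abstract describes, the following is a minimal sketch, assuming a PyTorch setup, of a loss that pulls each surface-form embedding toward its implied-form embedding while pushing class centroids apart. The function name, margin, and weighting are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a FiADD-style objective (names and hyperparameters
# are assumptions for illustration, not the authors' code).
import torch
import torch.nn.functional as F

def fiadd_style_loss(surface_emb, implied_emb, labels, margin=1.0, alpha=0.5):
    """Pull surface-form embeddings toward their implied-form counterparts,
    and push class centroids at least `margin` apart (hinge penalty)."""
    # Alignment term: the surface form of each post should sit near its implied form.
    pull = F.mse_loss(surface_emb, implied_emb)

    # Separation term: centroids of different labels should be far apart.
    centroids = torch.stack([surface_emb[labels == c].mean(dim=0)
                             for c in labels.unique()])
    push = surface_emb.new_zeros(())
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            dist = torch.norm(centroids[i] - centroids[j])
            push = push + F.relu(margin - dist)

    return pull + alpha * push

# Example usage (illustrative): 8 posts, 16-dim embeddings, 2 labels.
# loss = fiadd_style_loss(torch.randn(8, 16), torch.randn(8, 16),
#                         torch.randint(0, 2, (8,)))
```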
Hate speech comprises any form of hateful or contemptuous expression that attacks, degrades, or vilifies people based on their social identities. This Element focuses on hate speech targeting social identities that are devalued by a society's dominant groups, and that is likely to evoke, promote, or legitimize harms such as violence, discrimination, and oppression. After detailing the ways in which hate speech is expressed (e.g., through derogatory labels, metaphors, offensive imagery), the production of hate speech is explored at the individual level (e.g., prejudiced attitudes), group level (e.g., realistic intergroup threat), and societal level (e.g., hierarchy maintenance; free speech protections). A discussion of the effects of blatant and anonymous hate speech on targets (e.g., anxiety and depression) and nontargets (e.g., stereotype activation; desensitization; fomenting violence) follows. Finally, the effectiveness of mitigation efforts is explored, including the use of computer-based technologies, speech codes, confrontation, and counterspeech.
This paper argues that what scholars call ‘the free speech principle’ is not one principle but a slew of principles, and that these principles harbour several important differences that have remained largely unremarked upon, namely: (i) extending vs. limiting principles; (ii) comparative vs. non-comparative principles; and (iii) monistic vs. pluralistic principles. The paper also critically assesses certain generalisations that people might be tempted to make about these different principles, such as that one kind of free speech principle is harder to defend than another. Finally, the paper teases out the practical as well as theoretical implications of these insights, including degrees of complexity, the logical relationship between free speech principles and free speech policy dilemmas, and the virtue of compromise over free speech principles.
Digital platforms and social media have expanded the ways in which individuals can exercise their right to freedom of expression and obtain and diffuse information. At the same time, they have become a principal means for haters to express and spread their hate in ways that would have been unthinkable some years ago. Responsive to the challenge, the EU has progressively developed a broad range of instruments and tools to counter online hate speech. This chapter discusses the key characteristics of the EU arrangements made to fight digital hate speech, shedding light on what is a multi-faceted and daunting regulatory task.
This paper provides data resources for low-resource hate speech detection. Specifically, we introduce two data resources: (i) the HateBR 2.0 corpus, composed of 7,000 comments extracted from Brazilian politicians' accounts on Instagram and manually annotated with a binary class (offensive versus non-offensive) and hate speech targets. It is an updated version of the HateBR corpus in which highly similar and one-word comments were replaced; and (ii) the multilingual offensive lexicon (MOL), which consists of 1,000 explicit and implicit terms and expressions annotated with context information. The lexicon also comprises native-speaker translations and cultural adaptations in English, Spanish, French, German, and Turkish. Both the corpus and the lexicon were annotated by three experts and achieved high inter-annotator agreement. Lastly, we implemented baseline experiments on the proposed data resources. The results demonstrate the reliability of the data, outperforming baseline dataset results in Portuguese and showing promising results for hate speech detection in the other languages.
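As a rough illustration of how a lexicon such as MOL could feed a simple baseline, here is a hedged sketch of lexicon lookup over comments; the file name, column name, and word-boundary matching rule are assumptions, not the paper's actual experimental setup.

```python
# Hypothetical lexicon-lookup baseline over MOL-style entries; file name,
# column name, and matching rule are illustrative assumptions.
import csv
import re

def load_lexicon(path):
    """Load offensive terms/expressions from a CSV assumed to have a 'term' column."""
    with open(path, encoding="utf-8") as f:
        return {row["term"].lower() for row in csv.DictReader(f)}

def flag_offensive(comment, lexicon):
    """Flag a comment if any lexicon entry appears as a whole word or phrase
    (case-insensitive)."""
    text = comment.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in lexicon)

# Example usage (illustrative file name):
# lexicon = load_lexicon("mol_pt.csv")
# flag_offensive("algum comentário aqui", lexicon)
```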
Though social media helps spread knowledge more effectively, it also stimulates the propagation of online abuse and harassment, including hate speech. It is crucial to prevent hate speech since it may have serious adverse effects on both society and individuals. Therefore, it is important not only for models to detect such speech but also to output explanations of why a given text is toxic. While plenty of research is ongoing on detecting online hate speech in English, there is very little research on low-resource languages like Hindi or on the explainability aspect of hate speech. Recent laws like the “right to explanations” of the General Data Protection Regulation have spurred research into developing interpretable models rather than focusing only on performance. Motivated by this, we create the first interpretable benchmark hate speech explanation corpus (HHES) in the Hindi language, where each hate post is annotated with its stereotypical bias and target group category. Providing descriptions of the internal stereotypical bias as an explanation of a hate post makes a hate speech detection model more trustworthy. The current work proposes a commonsense-aware unified generative framework, CGenEx, which reframes the multitask problem as a text-to-text generation task. The novelty of this framework is that it can solve two different categories of tasks (generation and classification) simultaneously. We establish the efficacy of our proposed model (CGenEx-fuse) against other baselines on various evaluation metrics when applied to the Hindi HHES dataset.
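To make the text-to-text reframing concrete, here is a minimal sketch using the Hugging Face transformers API with a generic multilingual seq2seq model; the model choice, prompt template, and output format are illustrative assumptions rather than the CGenEx implementation, and the model would need finetuning on HHES before its decoded output is meaningful.

```python
# Illustrative text-to-text framing of joint detection + explanation.
# Model name, prompt template, and target format are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"  # any multilingual seq2seq model would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

post = "<Hindi social media post>"
prompt = f"classify and explain: {post}"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
# After finetuning, the decoded string might look like:
# "label: hate | target: <group> | bias: <stereotype description>"
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Framing both the class label and the explanation as a single generated string is what lets one generative model handle classification and generation at once.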
Disclaimer
The article contains profanity, which is unavoidable given the nature of the work. The examples in no way reflect the opinions of the authors.
This study adds to the analogic perspective-taking literature by examining whether an online perspective-taking intervention affects both antisemitic attitudes and behaviors – in particular, engagement with antisemitic websites. Subjects who were randomly assigned to the treatment viewed a 90-second video of a college student describing an experience with antisemitism and reflected on its similarity to their own experiences. In a survey, treated subjects reported greater feelings of sympathy (+29 p.p.), more positive feelings toward Jews, a greater sense that Jews are discriminated against, and more support for policy solutions (+2–4 p.p.). However, these effects did not persist after 14 days. Examining our subjects’ web browsing data, we find a 5% reduction in time spent viewing antisemitic content during the post-treatment period and some limited, suggestive evidence of effects on the number of site visits. These findings provide the first evidence that perspective-taking interventions may affect online browsing behavior.
Chapter 3 makes the case for classifying five grey area examples as hate speech in the ordinary sense of the term based on the global resemblance test. We also argue that Facebook’s community standard on hate speech is ambiguous, inconsistent, and incomplete in relation to these examples, and so recommend specific reforms. Section 3.2 looks at hybrid attacks, which, like personal insults, target specific individuals but, like prototypical hate speech, also attack the groups to which targeted individuals belong and go beyond standard protected characteristics. Section 3.3 investigates selective attacks, which are derogatory or insulting words that refer to only a subset of a group or else to an amalgamated set of people who belong to multiple groups. Section 3.4 scrutinises reverse attacks, which involve words used by members of groups typically perceived as victims against groups more usually considered perpetrators. Section 3.5 examines righteous attacks, which we associate with attacks in pursuit of some righteous cause. Finally, Section 3.6 assesses indirect attacks, wherein the speaker intentionally uses a derogatory word to address or target someone whom the speaker knows or presumes to be not a member of the group referred to by the word under its literal or primary meaning.
Chapter 8 pulls together the various analyses and practical recommendations we have made concerning the two concepts of hate speech and deals with potential objections and lingering worries. Section 8.2 focuses on the ordinary concept hate speech and confronts possible objections to our normative conceptualisation. This section also summarises the many recommendations we made concerning how Meta (Facebook, Instagram) should amend and revise its community standard and authoritative internal policy on hate speech, and tries to allay residual doubts the company might have about these recommendations. Section 8.3 turns to consider the legal concept hate speech and deals with potential objections to our limited formal definition of the concept and our analyses of grey area laws including those that appear to fall between hate speech laws and hate crime provisions, and also denialism laws. In addition, we reiterate our guidance on how states can develop a coherent body of hate speech laws in the light of both national perspectives and international norms against hate speech, and offer some reasons why states should place special weight on the latter.
Chapter 7 argues that denialist speech can, and should, be classified as a sui generis form of hate speech in the legal sense of the term. Section 7.2 looks at the many faces of denialist speech, including forms of synchronous denialism that are often overlooked in these debates. Sections 7.3 and 7.4 address the thorny problem of classification and attempt to explain why it matters. In Sections 7.5 and 7.6 we turn to examine in more detail the spread of denialism laws at the domestic level and try to uncover the many different functions or purposes served by such laws. Finally, in Section 7.7 we address two sceptical challenges to our main thesis. First, if denialist speech is rightly classifiable as hate speech, then why were denialism provisions absent from the early landmark international laws dealing with incitement to genocide and hate speech in general? And second, if denialist speech can be considered hate speech despite its absence from landmark international laws, then what about other things, such as defamation of religion, and what prevents our characterisation of hate speech in the legal sense from becoming absurdly capacious?
Chapter 6 explores the distinction between the legal concepts of hate speech and hate crime. Our purpose is not only to shed light on but also to resolve the ambiguity, as well as to further illustrate and stress test our analyses. Sections 6.2 and 6.3 propose that the legal concept hate speech, formally speaking, only refers to laws which create bespoke crimes or other sorts of offences that do not have corresponding or parallel basic or base versions, whereas the legal concept hate crime only refers to laws which identify aggravated crimes that do have corresponding or parallel basic or base versions. Section 6.4 makes several key comparisons and contrasts between the concepts, beyond the merely formal analysis, while Section 6.5 develops an account of why the distinction between hate speech and hate crime matters legally speaking, both for victims and defendants. Finally, Section 6.6 discusses four potential grey areas of hate speech law, namely using threatening words or behaviour to stir up hatred; incitement to commit genocide; incitement to discrimination or violence; and torts and delicts involving racist abuse.
Chapter 1 establishes the context of our project and defends its theoretical and practical importance. Section 1.2 outlines the basic conceptual framework employed in the book, including the distinction between two concepts of hate speech and our twin-track approach to analysing them. We also highlight some of the pay-offs that flow from this conceptual framework. Section 1.3 explains what we mean by ‘grey areas of hate speech’ including identifying three underlying reasons or explanations why certain phenomena might end up falling into these areas, namely moral, semantic, and conceptual. We also try to motivate the significance and value of working to clear up the grey areas. Finally, Section 1.4 introduces and attempts to respond to the sceptical challenge that says, because the term ‘hate speech’ is linked to conceptual ambiguities, misleading connotations, an explosion of applications, and politicisation, it would be better to dispense with both the term and its concepts. We critically examine five main ways of responding to this sceptical challenge: rehabilitation, downsizing, abandonment, replacement, and enhanced understanding. We defend the final response as being the most promising and the overarching goal of the book.
Chapter 2 identifies prototypical examples of hate speech and seeks to explain what makes them such. Section 2.2 lists the original examples of hate speech cited in Mari Matsuda’s seminal article on the legal concept. We then explain how, even though the ordinary and legal concepts of hate speech share paradigmatic examples, the ordinary concept now has its own extended body of exemplars. Section 2.3 attempts to plot the complex pattern of overlapping and criss-crossing similarities among these exemplars. Section 2.4 looks in more depth at one of the paradigmatic examples of hate speech, namely racial slurs such as ‘nigger’. We highlight similarities it shares with other prototypical examples of hate speech. Finally, Section 2.5 defends a particular account of what it means for a new example to have enough similarities with exemplars to count as hate speech. If there are enough similarities across at least four out of five of the distinguishing qualities of target, style, message, act, and effect, then this conceptually justifies applying the phrase ‘x is also hate speech’ to the new example. We dub this the global resemblance test.
Chapter 5 seeks to orient the ordinary and legal concepts of hate speech. Section 5.2 uncovers various ways in which the ordinary and legal concepts of hate speech come together, including in terms of the kinds of speech they both count as hate speech. In Section 5.3, however, we turn to consider the potential sources of divergence between the ordinary and legal concepts of hate speech including the differing social functions or purposes played by the two concepts. Section 5.4 addresses the nature of the relationship and interaction between the ordinary and legal concepts of hate speech. Finally, in Section 5.5 we try to show why theoretical disagreements about the relationship between the ordinary and legal concepts of hate speech matter. In particular, we argue that uncovering these deeper disagreements can help to explain both the source of some academic controversies about the legitimacy of hate speech laws and the source of some wider public debates about the rights and wrongs of social media platform content policies on hate speech.
Chapter 4 defends classifying a further five grey area examples as hate speech in the ordinary sense of the term under the global resemblance test. We shall also critically examine Facebook’s community standard on hate speech in relation to its handling of these kinds of attacks, and make specific recommendations to address relevant weaknesses. Section 4.2 looks at what we call identity attacks. Section 4.3 investigates existential denials, namely statements denying the very existence of people identified by a protected characteristic. Section 4.4 scrutinises identity denials, by which we mean statements denying that certain people are who they take themselves to be, based on protected characteristics. Section 4.5 examines identity miscategorisations, which go one step further and attribute identities to people that do not match the identities they take themselves to possess, based on protected characteristics. Finally, Section 4.6 assesses identity appropriations, wherein people adopt elements of the identities of other people, based on protected characteristics, but without claiming to possess the relevant identities.
No serious attempt to answer the question 'What is hate speech?' would be complete without an exploration of the outer limits of the concept(s). This book critically examines both the ordinary and legal concepts of hate speech, contrasting social media platform content policies with national and international laws. It also explores a range of controversial grey area examples of hate speech. Part I focuses on the ordinary concept and looks at hybrid attacks, selective attacks, reverse attacks, righteous attacks, indirect attacks, identity attacks, existential denials, identity denials, identity miscategorisations, and identity appropriations. Part II concentrates on the legal concept. It considers how to distinguish between hate speech and hate crime, and examines the precarious position of denialism laws in national and international law. Together, the authors draw on conceptual analysis, doctrinal analysis, linguistic analysis, critical analysis, and diachronic analysis to map the new frontiers of the concepts of hate speech.
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare it with English tweets, analyzing key cultural differences. We observe consistent usage of these emojis to signal offensiveness across different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar, and violence content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. We evaluate our models on external datasets (a Twitter dataset collected using a completely different method, and a multi-platform dataset containing comments from Twitter, YouTube, and Facebook) to assess generalization capability. Competitive results on these datasets suggest that the data collected using our method capture universal characteristics of offensive language. Our findings also highlight the common words used in offensive communications and the common targets of hate speech, identify specific patterns in violence tweets, and pinpoint common classification errors that can be attributed to limitations of NLP models. We observe that even state-of-the-art transformer models may fail to take into account culture, background, and context, or to understand nuances present in real-world data such as sarcasm.
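As a rough illustration of emoji-anchored collection, the following sketch keeps only tweets containing emojis from an anchor set; the particular emojis and the filtering rule are placeholder assumptions rather than the authors' curated list or collection pipeline.

```python
# Illustrative emoji-anchored collection filter. The anchor set below is a
# placeholder assumption, not the authors' curated emoji list.
EMOJI_ANCHORS = {"🖕", "🤬", "💩"}  # emojis that often co-occur with offensive text

def keep_tweet(text):
    """Keep a tweet if it contains any anchor emoji, regardless of topic or genre."""
    return any(emoji in text for emoji in EMOJI_ANCHORS)

stream = [
    "some angry reply 🤬",
    "beautiful sunset today 🌅",
]
candidates = [t for t in stream if keep_tweet(t)]
print(candidates)  # only the tweet containing an anchor emoji survives
```

Because the filter keys on emojis rather than keywords, it is language-independent: the same anchor set can be applied to Arabic, English, or any other tweet stream before manual annotation.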
Disinformation, hate speech and political polarization are evident problems of the growing relevance of information and communication technologies (ICTs) in current societies. To address these issues, decision-makers and regulators worldwide discuss the role of digital platforms in content moderation and in curtailing harmful content produced by third parties. However, intermediary liability rules require a balance that avoids the risks arising from the circulation at scale of harmful content and the risks of censorship if excessive burdens force content providers to adopt a risk-averse posture in content moderation. This piece examines the trend of altering intermediary liability models to include ‘duty of care’ provisions, describing three models in Europe, North America and South America. We discuss how these models are being modified to include greater monitoring and takedown burdens on internet content providers. We conclude with a word of caution regarding this balance between censorship and freedom of expression.