In most social psychological studies, researchers conduct analyses that treat participants as a random effect. This means that inferential statistics about the effects of manipulated variables address the question of whether one can generalize effects from the sample of participants included in the research to other participants who might have been used. In many research domains, experiments actually involve multiple random variables (e.g., stimuli or items to which participants respond, experimental accomplices, interacting partners, groups). If analyses in these studies treat participants as the only random factor, then conclusions cannot be generalized to other stimuli, items, accomplices, partners, or groups. Mixed models that allow for multiple random factors are required. For studies with single experimental manipulations, we consider alternative designs with multiple random factors, analytic models, and power considerations. Additionally, we discuss how random factors that vary between studies, rather than within them, may induce effect size heterogeneity, with implications for power and the conduct of replication studies.
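Hedged illustration: the chapter itself contains no code, but the minimal sketch below shows, on simulated data with hypothetical variable names, how a single-manipulation design with participants and stimuli as crossed random factors can be fitted as a mixed model (here via statsmodels' variance-components interface; packages such as lme4 in R serve the same purpose).

```python
# A minimal sketch (simulated data, hypothetical variable names) of a mixed model
# with two crossed random factors -- participants and stimuli.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_stimuli = 30, 20

# Simulate random intercepts for participants and stimuli plus a condition effect.
part_re = rng.normal(0, 0.8, n_participants)
stim_re = rng.normal(0, 0.6, n_stimuli)
rows = []
for p in range(n_participants):
    for s in range(n_stimuli):
        cond = (p + s) % 2                                   # toy condition assignment
        y = 0.5 * cond + part_re[p] + stim_re[s] + rng.normal(0, 1)
        rows.append({"participant": p, "stimulus": s, "condition": cond, "y": y})
df = pd.DataFrame(rows)

# Crossed random intercepts: all rows share one "group" and participant/stimulus
# enter as variance components, so inference targets both random factors at once.
df["grp"] = 1
vc = {"participant": "0 + C(participant)", "stimulus": "0 + C(stimulus)"}
model = smf.mixedlm("y ~ condition", df, groups="grp", vc_formula=vc, re_formula="0")
print(model.fit().summary())
```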
This chapter focuses on experimental designs, in which one or more factors are randomly assigned and manipulated. The first topic is statistical power, or the likelihood of obtaining a significant result, which depends on several aspects of design. Second, the chapter examines the factors (independent variables) in a design, including the selection of levels of a factor and their treatment as fixed or random, and then dependent variables, including the selection of items, stimuli, or other aspects of a measure. Finally, artifacts and confounds that can affect the validity of results are addressed, as well as special designs for studying mediation. A concluding section raises the possibility that traditional conceptualizations of design – generally focusing on a single study and on the question of whether a manipulation has an effect – may be inadequate in the current world, where multiple-study research programs are the more meaningful unit of evidence and mediational questions are often of primary interest.
This chapter introduces a research design to study the effects of community policing. The chapter introduces the Metaketa model of multi-site trials, which answer policy-relevant questions with coordinated experiments in which the same intervention is randomly assigned to units in multiple contexts and the same outcomes are measured to estimate effects. Specifically, the chapter describes how the six countries were selected for study, characterizes them in terms of crime and policing, and explains how the interventions were selected and harmonized across settings and how they compare to community policing policies around the world. The remainder of the chapter details the experimental design: how police beats and units were sampled, how the community policing intervention was randomly assigned, how outcomes were measured and harmonized, how effects were estimated for each site and then averaged across sites, and how we planned to address threats to inference.
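The chapter's own pooling procedure is not reproduced here; purely as an illustration of averaging site-level estimates, the sketch below uses made-up numbers and standard inverse-variance (fixed-effect) weighting.

```python
# A minimal sketch (made-up numbers) of combining site-level effect estimates
# into an overall average via inverse-variance weighting. This illustrates the
# general idea only; the Metaketa analysis may weight or model sites differently.
import numpy as np

site_effects = np.array([0.10, -0.02, 0.05, 0.00, 0.08, 0.03])  # per-site estimates
site_ses = np.array([0.04, 0.05, 0.03, 0.06, 0.04, 0.05])       # their standard errors

weights = 1.0 / site_ses**2                  # more precise sites get more weight
pooled = np.average(site_effects, weights=weights)
pooled_se = np.sqrt(1.0 / weights.sum())
print(f"pooled effect = {pooled:.3f} (SE = {pooled_se:.3f})")
```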
From Part I of The Philosophy and Methodology of Experimentation in Sociology, by Davide Barrera (Università degli Studi di Torino, Italy), Klarita Gërxhani (Vrije Universiteit Amsterdam), Bernhard Kittel (Universität Wien, Austria), Luis Miller (Institute of Public Goods and Policies, Spanish National Research Council), and Tobias Wolbring (School of Business, Economics and Society, Friedrich-Alexander-University Erlangen-Nürnberg).
This chapter focuses on different research designs in experimental sociology. Most definitions of what constitutes an experiment converge on the idea that the experimenter "controls" the phenomenon under investigation, thereby setting the conditions under which the phenomenon is observed and analyzed. Typically, the researcher exerts experimental control by creating two situations that are virtually identical, except for one element that the researcher introduces or manipulates in only one of the situations. The purpose of this exercise is to observe the effects of the manipulation by comparing its outcomes with those of the situation in which the manipulation is absent. One way to look at how the implementation of this rather straightforward exercise produces a variety of designs is by focusing on the relationship that experimental design bears to the theory that inspires it. Therefore, we begin this chapter with a discussion of the relationship between theory and experimental design before turning to a description of the most important features of various types of designs. The chapter closes with a short overview of experiments in different settings, such as laboratory, field, and multifactorial survey experiments.
Diabetes and depression have a bidirectional relationship, but some antidepressants (such as the tricyclics) may have detrimental effects in diabetes that are exacerbated by behavioural changes associated with depression. This month's Cochrane Review evaluated the efficacy of psychological and pharmacological treatments of comorbid depression in diabetes and found that such interventions have a moderate and clinically significant effect on depression outcomes in people with diabetes. However, conclusions were limited by significant heterogeneity within examined populations and interventions, and significant risk of bias within trials. This commentary critically appraises the review and aims to contextualise its findings.
When researchers design an experiment, they usually hold potentially relevant features of the experiment constant. We call these details the “topic” of the experiment. For example, researchers studying the impact of party cues on attitudes must inform respondents of the parties’ positions on a particular policy. In doing so, researchers implement just one of many possible designs. Clifford, Leeper, and Rainey (2023, “Generalizing Survey Experiments Using Topic Sampling: An Application to Party Cues,” forthcoming in Political Behavior, https://doi.org/10.1007/s11109-023-09870-1) argue that researchers should implement many of the possible designs in parallel—what they call “topic sampling”—to generalize to a larger population of topics. We describe two estimators for topic-sampling designs: first, a nonparametric estimator of the typical effect that is unbiased under the assumptions of the design; and second, a hierarchical model that researchers can use to describe the heterogeneity. We suggest describing the heterogeneity across topics in three ways: (1) the standard deviation in treatment effects across topics, (2) the treatment effects for particular topics, and (3) how the treatment effects for particular topics vary with topic-level predictors. We evaluate the performance of the hierarchical model using the Strengthening Democracy Challenge megastudy and show that the hierarchical model works well.
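As a hedged illustration (not the authors' own code), the sketch below simulates a topic-sampling design and computes the kind of nonparametric estimator described above: a difference in means within each topic, averaged across topics, with the raw standard deviation of the per-topic estimates as a first look at heterogeneity.

```python
# A minimal sketch (simulated data, hypothetical column names) of the
# nonparametric "typical effect" estimator: per-topic differences in means,
# averaged over the sampled topics.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = []
for k in range(40):                                   # 40 sampled topics
    true_effect = rng.normal(0.3, 0.15)               # effects vary across topics
    for _ in range(100):                              # respondents per topic
        treat = rng.integers(0, 2)
        rows.append({"topic": k, "treat": treat,
                     "y": true_effect * treat + rng.normal(0, 1)})
df = pd.DataFrame(rows)

# Difference in means within each topic, then the unweighted average across topics.
means = df.groupby(["topic", "treat"])["y"].mean().unstack("treat")
per_topic_effects = means[1] - means[0]
print("typical effect:", per_topic_effects.mean())
# Raw SD of estimated effects; it overstates true heterogeneity because it also
# contains sampling error, which the hierarchical model is designed to strip out.
print("SD of estimated effects across topics:", per_topic_effects.std(ddof=1))
```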
Researchers are often interested in whether discrimination on the basis of racial cues persists above and beyond discrimination on the basis of nonracial attributes that decision makers—e.g., employers and legislators—infer from such cues. We show that existing audit experiments may be unable to parse these mechanisms because of an asymmetry in when decision makers are exposed to cues of race and additional signals intended to rule out discrimination due to other attributes. For example, email audit experiments typically cue race via the name in the email address, at which point legislators can choose to open the email, but cue other attributes in the body of the email, which decision makers can be exposed to only after opening the email. We derive the bias resulting from this asymmetry and then propose two distinct solutions for email audit experiments. The first exposes decision makers to all cues before the decision to open. The second crafts the email to ensure no discrimination in opening and then exposes decision makers to all cues in the body of the email after opening. This second solution works without measures of opening, but can be improved when researchers do measure opening, even if with error.
We aimed to assess whether viewing expert witness evidence regarding the mental health of Johnny Depp and Amber Heard in the 2022 court case in the USA would affect viewers’ attitudes towards the mental health of the two protagonists and towards mental illness in general. After viewing excerpts of the cross-examination evidence, 38 trial-naive undergraduate students completed the Prejudice towards People with a Mental Illness (PPMI) scale.
Results: Following viewing, participants held more stigmatising views of the protagonists than they held about mental disorders in general.
Clinical implications: It is plausible that mass media trial coverage further stigmatises mental illness.
Productive scholars prioritize research and use productive research approaches. How else could some produce ten or more publications per year and hundreds over their career? Productive scholars spend about half their work days focused on research, usually preserving the morning hours for research and writing, because those are their top priority and they want to give them their full attention when they are most alert. Productive scholars rarely publish alone; they collaborate on nearly 90 percent of their publications. Benefits of collaboration include the division of labor, multiple viewpoints, quicker outputs, and working on several projects simultaneously. Productive scholars typically juggle a half-dozen projects or more, in various phases of completion. They often seek grants that help them do more and better research. They also find publication opportunities by occasionally mining existing data sets, conducting meta-analyses, and composing literature reviews and conceptual pieces. Their research is marked by good research questions that are feasible to carry out with simple but powerful research designs. Productive scholars are self-regulatory, carefully monitoring progress and adjusting their approach as needed. Still, they occasionally fail, as all do. They are not disheartened, knowing that failure is both a catalyst and a guide to success.
The bumblebee gut parasite, Crithidia bombi, is widespread and prevalent in the field. Its interaction with Bombus spp. is a well-established epidemiological model. It is spread faecal-orally between colonies via the shared use of flowers when foraging. Accurately measuring the level of infection in bumblebees is important for assessing its distribution in the field, and also when conducting epidemiological experiments. Studies generally use one of two methods for measuring infection: one approach measures infection in faeces, whereas the other measures infection in guts. We tested whether the method of measuring infection affected the estimation of infection. Bumblebees were inoculated with a standardized inoculum and infection was measured one week later using either the faecal or the gut method. We found that when the gut method was used, infection intensity estimates were significantly different from, and approximately double, those from the faecal method. These results have implications for the interpretation of previous study results and for the planning of future studies. Given the importance of bumblebees as pollinators, the impact of C. bombi on bumblebee health, and its use as an epidemiological model, we call on researchers to move towards consistent quantification of infections to enable future comparisons and meta-analyses of studies.
Even in the best-designed experiment, noncompliance can complicate analysis. While the intent-to-treat effect remains identified, randomization alone no longer identifies the complier average causal effect (CACE). Instrumental variables approaches, which rely on the exclusion restriction, can suffer from high variance, particularly when the experiment has a low compliance rate. We provide a framework that broadens the set of design and analysis techniques political science researchers can use when addressing noncompliance. Building on the growing literature about the advantages of ex-ante design decisions to improve precision, we show that blocking on variables related to both compliance and the outcome can greatly improve all the estimators we propose. Drawing on work in statistics, we introduce the principal ignorability assumption and a class of principal score weighting estimators, which can exhibit large gains in precision in low compliance settings. We then combine principal ignorability and blocking with a simple estimation strategy to derive a more efficient estimator of the CACE. In a re-evaluation of a study on the effect of GOTV on turnout, we find that the principal ignorability approaches result in confidence intervals roughly half the size of those from traditional instrumental variables approaches.
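To make the contrast concrete, here is a minimal sketch on simulated data (not the article's replication code) of the standard IV/Wald estimator of the CACE next to a simple principal-score-weighting estimator under principal ignorability, with one-sided noncompliance and a hypothetical covariate x that predicts compliance.

```python
# A minimal sketch (simulated data) contrasting the IV/Wald estimator of the
# CACE with a principal-score-weighting estimator under principal ignorability.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)                              # covariate tied to compliance and outcome
z = rng.integers(0, 2, size=n)                      # randomized assignment
complier = rng.random(n) < 1 / (1 + np.exp(-x))     # latent complier status depends on x
d = z * complier                                    # one-sided noncompliance
y = 1.0 * d + 0.5 * x + rng.normal(size=n)          # true CACE = 1.0

# IV / Wald estimator: ITT on the outcome divided by ITT on treatment receipt.
itt_y = y[z == 1].mean() - y[z == 0].mean()
itt_d = d[z == 1].mean() - d[z == 0].mean()
print("IV estimate of CACE:", itt_y / itt_d)

# Principal score weighting: estimate Pr(complier | x) among assigned units
# (where compliance is observed), weight control units by that score so they
# resemble compliers, then compare treated compliers with the weighted controls.
ps = LogisticRegression().fit(x[z == 1].reshape(-1, 1), d[z == 1])
w = ps.predict_proba(x[z == 0].reshape(-1, 1))[:, 1]
cace_psw = y[(z == 1) & (d == 1)].mean() - np.average(y[z == 0], weights=w)
print("Principal-score-weighted estimate of CACE:", cace_psw)
```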
This chapter details the practical, theoretical, and philosophical aspects of experimental science. It discusses how one chooses a project, performs experiments, interprets the resulting data, makes inferences, and develops and tests theories. It then asks the question, "Are our theories accurate representations of the natural world; that is, do they reflect reality?" Surprisingly, this is not an easy question to answer. Scientists assume so, but are they warranted in this assumption? Realists say "yes," but anti-realists argue that realism is simply a mental representation of the world as we perceive it, that is, metaphysical in nature. Regardless of one's sense of reality, the fact remains that science has been and continues to be of tremendous practical value. It would have to be a miracle if our knowledge and manipulation of nature were not real. Even if they are, how do we know they are true in an absolute sense, and not just relative to our own experience? This is a thorny philosophical question, the answer to which depends on the context in which it is asked. The take-home message for the practicing scientist is "never assume your results are true."
The majority of research papers in computer-assisted language learning (CALL) report on primarily quantitative studies measuring the effectiveness of pedagogical interventions in relation to language learning outcomes. These studies are frequently referred to in the literature as experiments, although this designation is often incorrect because of the approach to sampling that has been used. This methodological discussion paper provides a broad overview of the current CALL literature, examining reported trends in the field that relate to experimental research and the recommendations made for improving practice. It finds that little attention is given to sampling, even in review articles. This indicates that sampling problems are widespread and that there may be limited awareness of the role of formal sampling procedures in experimental reasoning. The paper then reviews the roles of two key aspects of sampling in experiments: random selection of participants and random assignment of participants to control and experimental conditions. The corresponding differences between experimental and quasi-experimental studies are discussed, along with the implications for interpreting a study’s results. Acknowledging that genuine experimental sampling procedures will not be possible for many CALL researchers, the final section of the paper presents practical recommendations for improved design, reporting, review, and interpretation of quasi-experimental studies in the field.
This chapter provides an accessible introduction to experimental methods for social and behavioral scientists. We cover the process of experimentation from generating hypotheses through to statistical analyses. The chapter discusses classical issues (e.g., experimental design, selecting appropriate samples) but also more recent developments that have attracted the attention of experimental researchers. These issues include replication, preregistration, online samples, and power analyses. We also discuss the strengths and weaknesses of experimental methods. We conclude by noting that, for many research questions, experimental methods provide the strongest test of hypothesized causal relationships. Furthermore, well-designed experiments can elicit the same mental processes as in the real world; this typically makes them generalizable to new people and real-life situations.
A strong participant recruitment plan is a major determinant of the success of human subjects research. The plan adopted by researchers will determine the kinds of inferences that follow from the collected data and how much the data will cost to collect. Research studies with weak or non-existent recruitment plans risk recruiting too few participants, or the wrong kind of participants, to be able to answer the question that motivated them. This chapter outlines key considerations for researchers who are developing recruitment plans and provides suggestions for how to make recruiting more efficient.
Conventional models of voting behavior depict individuals who judge governments for how the world unfolds during their time in office. This phenomenon of retrospective voting requires that individuals integrate and appraise streams of performance information over time. Yet past experimental studies short-circuit this 'integration-appraisal' process. In this Element, we develop a new framework for studying retrospective voting and present eleven experiments building on that framework. Notably, when we allow integration and appraisal to unfold freely, we find little support for models of 'blind retrospection.' Although we observe clear recency bias, we find respondents who are quick to appraise and who make reasonable use of information cues. Critically, they regularly employ benchmarking strategies to manage complex, variable, and even confounded streams of performance information. The results highlight the importance of centering the integration-appraisal challenge in both theoretical models and experimental designs and begin to uncover the cognitive foundations of retrospective voting.
Many animal preference experiments involve test stimuli that have been chosen by the experimenter to represent different strengths of a single attribute. It is assumed that the animals also scale the test stimuli along a single dimension. This paper shows how it is possible to use the ‘Unfolding Technique’ developed by Coombs (1964) to check the validity of this assumption. A simple experiment is described which used Coombs’ technique to verify that three visual test stimuli were ranked by laboratory rats along a single dimension. These stimuli were subsequently used in an experiment to see how different housing conditions changed rats’ preferences for visual complexity.
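As a hedged illustration (hypothetical data, not the paper's own stimuli), the sketch below shows the logical core of such an unfolding check: if three stimuli A < B < C lie on a single dimension and each animal's preference is single-peaked on it, then orders that place the two extremes above the middle stimulus (A > C > B or C > A > B) should not occur.

```python
# A minimal sketch (hypothetical data) of checking preference orders for
# consistency with a single dimension, in the spirit of Coombs' unfolding.
from itertools import permutations

DIMENSION = ("A", "B", "C")          # hypothesized ordering of the three stimuli

def is_single_peaked(order, dimension=DIMENSION):
    """True if a most-to-least-preferred order can arise from an ideal point on
    the dimension: the two extreme stimuli cannot both outrank the middle one."""
    rank = {stim: i for i, stim in enumerate(order)}
    left, middle, right = dimension
    return not (rank[left] < rank[middle] and rank[right] < rank[middle])

# Hypothetical observed orders (most- to least-preferred) from individual animals.
observed = [("B", "A", "C"), ("C", "B", "A"), ("A", "C", "B"), ("B", "C", "A")]
for order in observed:
    verdict = "consistent" if is_single_peaked(order) else "violates unidimensionality"
    print(order, "->", verdict)

# For reference: the four orders admissible under this dimension.
print([o for o in permutations(DIMENSION) if is_single_peaked(o)])
```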
Although the physiological and behavioural changes that can indicate poor welfare are generally agreed upon, using these measures in practice sometimes yields results that are hard to interpret. For example, different types of measure may suggest quite different things about an animal's welfare. Such contradictions are often due to the differing properties of the variables being measured. How each variable responds to a stressor can be affected by several factors - the type of unpleasant stimulus to which the animal is exposed; when and for how long exposure occurs; the animal's psychological state, eg does it feel that it is in control?; and the time at which the measurement is made, relative to the stressor. Typical responses also often differ between species and between individuals, and may even change in a single individual over time. Furthermore, some responses used to assess welfare lack specificity: they can be elicited by neutral or even pleasant events as well as by aversive ones. Appreciating these factors is vital when designing experiments, when choosing what to measure along with each welfare variable, and when interpreting results. Even after taking these factors into consideration, interpreting a result can still be difficult. One approach then is to consider the effects on welfare of the changes measured, eg if there is immunosuppression, does the animal succumb to disease? Another is to use the animal's behaviour to indicate its preference for, or aversion to, particular environments. Ultimately, however, interpreting welfare measures involves subjective judgements which will be influenced by the nature of our concern for the animal under consideration. By raising these problems, we hope that this review will highlight and clarify the apparent contradictions that sometimes emerge in scientific studies of animal welfare, and help researchers improve the designs of their experiments for the benefit of the animals concerned.
Prospect Theory (PT; Kahneman & Tversky, 1979) of risky decision making is based on psychological phenomena (paradoxes) that motivate assumptions about how people react to gains and losses, and how they weight outcomes with probabilities. Recent studies suggest that people’s numeracy affects their decision making. We therefore conducted a large-scale conceptual replication of the seminal study by Kahneman and Tversky (1979), in which we targeted participants with larger variability in numeracy. Because people low in numeracy may be more dependent on anchors in the form of other judgments, we also manipulated design type (a within-subject design vs. a single-stimulus design, in which participants assess only one problem). The results from about 1,800 participants showed that design type had no effect on the modal choices. The rate of replication of the paradoxes in Kahneman and Tversky was poor and positively related to the participants’ numeracy. The Probabilistic Insurance Effect was observed at all levels of numeracy. The Reflection Effects were not fully replicated at any numeracy level. The Certainty and Isolation Effects explained by nonlinear probability weighting were replicated only at high numeracy. No participant exhibited all nine paradoxes, and more than 50% of the participants exhibited at most three of the nine. The choices by the participants with low numeracy were consistent with a shift towards a cautionary non-compensatory strategy of minimizing the risk of receiving the worst possible outcome. We discuss the implications for the psychological assumptions of PT.
Determination of sample size (the number of replications) is a key step in the design of an observational study or randomized experiment. Statistical procedures for this purpose are readily available. Their treatment in textbooks is often somewhat marginal, however, and frequently the focus is on just one particular method of inference (significance test, confidence interval). Here, we provide a unified review of approaches and explain their close interrelationships, emphasizing that all approaches rely on the standard error of the quantity of interest, most often a pairwise difference of two means. The focus is on methods that are easy to compute, even without a computer. Our main recommendation based on standard errors is summarized as what we call the 1-2-3 rule for a difference of two treatment means.
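Hedged illustration: the review's own 1-2-3 rule is not reproduced here, but the following minimal sketch implements the standard-error-based logic it builds on, using the usual normal-approximation formula for the number of replications per group needed to detect a given difference of two means (the function and parameter names are hypothetical, not the authors').

```python
# A minimal sketch of standard-error-based sample size planning for a
# difference of two treatment means (normal approximation, two-sided test).
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Replications per group so that `delta` is a large enough multiple of the
    standard error of the difference, SE = sigma * sqrt(2 / n)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # about 1.96 + 0.84 = 2.8
    return ceil(2 * (z * sigma / delta) ** 2)

# Example: detect a 1-unit difference when the residual standard deviation is 1.2.
print(n_per_group(delta=1.0, sigma=1.2))            # roughly 23 replications per group
```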