6.1 Introduction
Social scientists and policy-makers care deeply about their ability to draw clear causal inferences from research – and justifiably so. But descriptive accuracy also matters profoundly for the success of this enterprise. Correctly identifying relevant parties, choice points, and perceptions, for example, strongly impacts our ability to understand sources of influence on development outcomes, successfully disrupt and overcome obstacles, and identify scope conditions. The challenge is how to tease out this kind of information in interview-based qualitative research.
This chapter draws on a decade of experience in developing policy implementation case studies under the auspices of a Princeton University program called Innovations for Successful Societies in order to highlight ways to address some of the most common difficulties in achieving descriptive accuracy. The program responded to a need, in the mid-2000s, to enable people leading public sector reform and innovation in low-income and low-middle-income countries to share experiences and evolve practical wisdom suited to context. To develop reasonably accurate portrayals of the reform process and create accurate after-action reports, the program carried out in-depth interviews with decision-makers, their deputies, and the other people with whom they engaged, as well as critics. The research employed intensive conversation with small-N purposive samples of public servants and politicians as a means of data collection.Footnote 1 In the eyes of some, this interview-generated information was suspect because it was potentially vulnerable to bias or gloss, fading or selective memory, partial knowledge, and the pressures of the moment. Taking these concerns seriously, the program drew on research about the interaction between survey design and respondent behavior and evolved routines to boost the robustness of the interviews it conducted.
6.2 Background
Accuracy is related to reliability, or the degree of confidence we have that the same description would emerge if someone else reviewed the data available or conducted additional interviews.Footnote 2 Our methods have to help us come as close as possible to the “true value” of the actual process used to accomplish something or the perceptions of the people in the room where a decision took place. How do we ensure that statements about processes, decisions, actions, preferences, judgments, and outcomes acquired from interviews closely mirror actual perceptions and choices at the time an event occurred?
In this chapter, I propose that we understand the interview process as an exercise in theory building and theory testing. At the core is a person we ask to play dual roles. On the one hand, our interviewees are a source of facts about an otherwise opaque process. On the other hand, the people we talk to are themselves the object of research. To borrow the words of my colleague Tommaso Pavone, who commented on a draft of this chapter, “The interviewee acts like a pair of reading glasses, allowing us to see an objective reality that is otherwise inaccessible.” At the same time, however, we are also interested in that person’s own perceptions, subjective interpretations of events, and motivations. “For example,” Pavone said, “we might want to know whether and why a given meeting produced divergent interpretations about its relative collegiality or contentiousness, and we might subsequently probe how the interviewee’s positionality, personality, and preferences might have affected their views.”
In both instances, there are many potential confounding influences that might blur the view. Some threats stem from the character of the subject matter – whether it is comparatively simple or causally dense (in Michael Woolcock’s terms, i.e., subject to many sources of influence, interactions, and feedback loops; see Chapter 5, this volume), whether it is socially or politically sensitive, or whether the person interviewed is still involved in the activity and has friends and relatives whose careers might be affected by the study. Other threats emanate from the nature of the interviewee’s exposure to the subject matter, such as the amount of time that has elapsed since the events (memory), the extent of contact (knowledge), and the intensity of engagement at the time of the events. Still other influences may stem from the interview setting: rapport with the researcher, the order and phrasing of questions, whether there is a risk of being overheard, and the time available to expand on answers, for example.
Our job as case writers is to identify those influences and minimize them in order to get as close as possible to a true account, much as we do in other kinds of social science. And there are potential trade-offs. A source may be biased in an account of the facts, but, as Pavone suggested in his earlier comments, “he may be fascinating if we treat him as a subjective interpreter,” whose gloss on a subject may reveal how decision-makers rationalized intense investment in a particular outcome or how they responded to local norms in the way they cloaked discord.
The following section of this chapter treats the pursuit of descriptive accuracy as an endeavor very closely aligned with the logic used in other types of research. Subsequent sections outline practices we can mobilize to ensure a close match between the information drawn from interviews and objective reality – whether of a process or of a perspective.
6.3 The Interview as Social Science Research
An interview usually aims to take the measure of something, and any time a scientist takes a measurement, there is some risk of error, systematic or random. We can try to reduce this problem by modeling the effects of our instruments or methods on the information generated. The goal is to refine the interview process so that it improves the accuracy and completeness of recall, whether of events and facts or of views.
First, let’s step back a bit and consider how theory fuels qualitative interviewing, for even though we often talk about this kind of research as inductive, a good interview is rarely free-form. A skilled interviewer always approaches a conversation self-consciously, with a number of key ideas and hypotheses about the subject and the interviewee’s relationship to that subject in mind. The inquiry is exploratory, but it has a strong initial deductive framework.
This process proceeds on three dimensions simultaneously, focused at once on the subject matter; the interviewee’s preferences, perceptions, and biases; and the need to triangulate among competing accounts.
Theory and interview focus.
The capacity to generate good description draws heavily on being able to identify the general, abstract problem or problems at the core of what someone is saying and to quickly ask about the conditions that likely set up this challenge or the options likely tried. At the outset, the interviewer has presumably already thought hard about the general focus of the conversation in the kinds of terms Robert Weiss outlines in his helpful book, Learning from StrangersFootnote 3 – for example, describing a process, learning how someone interpreted an event, presenting a point of view, framing hypotheses and identifying variables, etc. In policy-focused case studies, we begin by identifying the broad outcomes decision-makers sought and then consider hypotheses about the underlying strategic challenges or impediments decision-makers were likely to encounter. Collective action? Coordination across institutions? Alignment of interests or incentives (principal–agent problems)? Critical mass? Coordination of social expectations? Capacity? Risk mitigation? Spoilers? All of the above? Others?
Locking down an initial general understanding – “This is a case of what?” – helps launch the conversation: “As I understand it, you faced ___ challenge in order to achieve the outcomes the program was supposed to generate. How would you characterize that challenge? … This problem often has a couple of dimensions [fill in]. In what ways were these important here, or were they not so important?” Asking follow-up questions in order to assess each possible impediment helps overcome problems of omission, deliberate or inadvertent. In this sense, accuracy is partly a function of the richness of the dialogue between the interviewer’s theoretical imagination, the questions posed, and the answers received.
The interview then proceeds to document the steps taken to address each component problem, and here, again, theory is helpful. The characterization of the core strategic challenges spawns a set of hypotheses. For example, if the key delivery challenge is the need for collective action, then we know we will have to ask questions to help assess whether the outcome sought was really a public good as well as how the decision-maker helped devise a solution, including (most likely) a way to reduce the costs of contributing to the provision of that public good, a system for monitoring contributions, or whether there was one person or organization with an exceptional stake in the outcome and therefore a willingness to bear the costs. In short, the interview script flows in large part from a sense of curiosity informed by theory.
Theory also helps us think about the outcomes we seek to explain in a case study and discuss in an interview. In policy-relevant research we are constantly thinking in terms of measures or indicators, each of which has an imperfect relationship with the overarching development outcome we seek to explain. A skilled interviewer comes to a conversation having thought deeply about possible measures to evaluate the success or failure of an action and asks not only how the speaker, the interviewee, thought about this matter, but also whether any of these plausible measures were mooted, understanding that the public servants or citizens involved in a program may have had entirely different metrics in mind.
Often the outcomes are not that easy to measure. Nonetheless, we want something more concrete than a personal opinion, a thumbs up or thumbs down. To take one example, suppose the aim of a case study is to trace the impact of cabinet management or cabinet design on “ability of factions to work together.” This outcome is not easy to assess. Certainly, we want to know how people themselves define “ability to work together,” and open-ended questions are initially helpful. Instead of nudging the interviewee’s mind down a particular path, the interviewer allows people to organize their own thoughts.Footnote 4 But a skilled interviewer then engages the speaker to try to get a better sense of what it was about people’s perceptions or behavior that really changed, if anything. In the abstract it is possible to think of several possible measures: how long it took to arrange meetings to coordinate joint initiatives, how often requested meetings actually took place, how often the person interviewed was included in subcabinet deliberations that involved the other political parties, whether the person interviewed felt part of the decision process, whether the deputy minister (always from the other party in this instance) followed through on action items within the prescribed timeframe, whether there was name-calling in the meeting room or fistfights, etc.
An interviewer also wants to figure out whether the theory of change that motivated a policy intervention actually generated the outcomes observed. Again, theory plays a role. It helps to come to an interview with plausible alternative explanations in mind. In the example above, maybe power-sharing had little to do with reducing tension among faction leaders. Instead, war weariness, a shift in public opinion, a sharp expansion of economic opportunity outside government that reduced the desire to remain in government, the collapse of factional differences in the wake of demographic change, the personality of the head of government – any of these things might also have accounted for the results. A skilled interviewer devises questions to identify whether any of these causal dynamics were in play and which facts would help us understand the relative importance of one explanation versus the others.
Let me add a caveat at this point. Theoretically informed interviewing leads us to the kinds of descriptive detail important for analysis and understanding. However, it is also crucial to remember that our initial frameworks, if too narrowly defined, can cause us to lose the added value that interviews can generate. As Albert Hirschman (1985) noted years ago, paradigms and hypotheses can become a straitjacket (“confirmation bias”), and the unique contribution of interview-based research is that it can foster a dialogue that corrects misimpressions. Openness to ideas outside the interview script is important for this reason.
For example, understanding the source of political will is important in a lot of policy research, but sometimes the most important outcome the lead decision-maker wants to achieve is not the one that most people associated with a policy know about or share. Say we want to use interview-based cases to help identify the conditions that prompt municipal public works programs and other city services to invest in changes that would improve access to early childhood development services. It soon becomes clear that the mayors who had made the most progress in promoting this kind of investment and collaboration sought outcomes that went well beyond boosting children’s preparedness for preschool, the initial supposition, and, moreover, each wanted to achieve something quite distinctive. For some, the larger and longer-term aim was to reduce neighborhood violence, while for others the ambition was to diminish inequality or boost social capital and build trust. The open-ended question “Why was this program important to you?” helps leverage this insight.
Theory and the interview process.
Interviewing employs theory in a second sense as well. To reveal what really happened, we have to weed out the details people have remembered incorrectly while filling in the details some never knew and others didn’t consider important, didn’t want to highlight, or simply forgot. Therefore, in the context of an interview, it is the researcher’s job not only to seek relevant detail about processes, but also to perceive the gaps and silences and use additional follow-up questions or return interviews to secure explanations or elaboration.
In this instance, the researcher navigates through a series of hypotheses about the speaker’s relationship to the issue at hand and knowledge of events. A few of the questions that leverage information for assessing or weighting answers include:
“You had a peripheral role in the early stages of these deliberations/this implementation process, as I understand it. How did you learn about the rationale for these decisions? [Co-workers on the committee? Friends? Briefed by the person who led the committee or kept the minutes? Gleaned this information as you began to participate?] How would you say that joining the deliberation/process/negotiation late colored your view of the issues and shaped your actions, or did it not make much difference?”
“You were involved in a lot of difficult decisions at the time. How closely involved were you in this matter? Did you spend a lot of time on it? Was it an especially high priority for you, or was it just part of your daily work?” “Given all the difficult matters you had to deal with at the time, how greatly did this issue stand out, or is it hard to remember?” (Level of knowledge helps the interviewer weight the account when trying to integrate it with other information.)
“The other people involved in this decision/process/negotiation had strong ties to political factions. At least some of them must have tried to influence you. At what stages and in what form did these kinds of pressures arise?” “Were some voices stronger than others?” “How would you say these lobbying efforts affected your decision/work/stance?”
“As I understand it, you took a decision/action that was unusual/worked against your personal interest/was sure to be unpopular with some important people. How would you characterize your reasons for doing so?”
The information these questions leverage helps the case writer assess the likely accuracy of an account, in at least three ways: First, it helps us understand whether someone was in a position to know or heard about an action secondhand. Second, it helps us assess the integrity of a response – for example, does a statement run contrary to the speaker’s obvious personal interests and is it therefore more believable? Third, it can also help spot purely opportunistic spin: Is the view expressed consistent with the speaker’s other actions and attitudes, or is it at odds with these? Can the person offer a clear story about this divergence, or did this perception or preference evolve in association with a promotion, an election, or some other event that may have influenced behavior?
Theory and ability to arbitrate among competing statements.
The third task of interview-based case research is to meld information garnered from different conversations and other types of sources in order to triangulate to the truth. This too is a theory-driven enterprise. Every time there is a clash between two assertions, we ask ourselves the familiar refrain “what could be going on here?” (hypothesis formation, drawn from theory), “how would I know?” (observable measures), and “by what method can I get this information?” (framing a follow-up question, checking a news report, consulting a register, etc.).
We may weigh the account of someone who joined a process late less heavily if it clashes with the information provided by those closer to a process, but it could be that the latecomer is less vulnerable to groupthink, has no reputation at stake, and offers a clearheaded story. Maybe we know the person was brought in as a troubleshooter and carried out a careful review of program data, or that the person is highly ambitious, eager to appear the hero who saved a failing initiative that, in fact, had not performed as badly as stated. Career paths, reputational information, and the written record – for example, longitudinal performance data – can all assist in making sense of disparate accounts.
This thought process may have to take place in the context of an interview as we listen and form follow-up questions, but it can also fuel exit interviews or second conversations designed both to provide another occasion to relate events remembered after the first encounter and to afford a chance to react to divergent information or ideas others may have voiced. This is the task of the exit interview.
To stress that skilled interviewing is theory-driven does not mean social scientists do a better job than journalists. Journalists might call the same kind of thought process “intuition” or “savvy,” but, when asked to step back, be self-conscious, and break down the mental exercise involved, the reality of what they do differs little from how social scientists or historians proceed in their work. The editor who tells a cub reporter, “I smell a rat” upon hearing the sketch of a story is positing an alternative hypothesis to test the adequacy of a description. Employing a general model built on experience, the editor pushes the reporter to use evidence to identify what the people described in the reporter’s draft are really doing.Footnote 5 A reporter’s intuition is equivalent to a social scientist’s skill in quickly framing plausible hypotheses and crafting a follow-up question that will yield the evidence to arbitrate among conflicting accounts – conducting social science inquiry “on the fly.”
Regardless of the interviewer’s own background, skilled interviewing places a premium on preparation. Even if the interviewer does not have a full-blown research design, it is crucial to have a preconversation sketch that frames hypotheses about the subject matter of the case and alternative plausible explanations; specifies the role the interviewee played in these events and the level of knowledge or type of gloss that relationship might produce (or at least does so with as much care as possible at this stage); and summarizes what archival sources say about the story line. This practice helps frame the questions that will tease out the evidence that disconfirms, modifies, or corroborates different versions of a story mid-conversation as new facts and observations emerge.
6.4 Improving Recall and Specificity
Solid, substantive preparation alone does not generate the requisite level of detail and accuracy needed in a policy case study. The skilled interviewer also has to overcome barriers to cognition. The people we interview are busy. They often work in a language different from ours. They may not understand what a study is about and what kinds of information the interviewer seeks. Further, like the rest of us, they forget and they tire. As a result, their answers to questions may vary from one interview to the next, making the descriptions we assemble less reliable.
Survey researchers have struggled with these challenges for decades.Footnote 6 They have investigated how people answer questions and how to improve accuracy in responses. Their reflections are helpful for those who do qualitative interviews.
1. One fairly obvious starting point or maxim is to make sure that the interviewee understands the purpose of the project or study and can perceive the level of detail expected. It is common for someone to ask quizzically, “Why would anyone be interested in what I did?”
Helping a speaker understand the intended audience improves motivation and accuracy. With respect to policy-focused interviews, asking someone to help a peer learn from a program’s experience usually changes an interviewee’s mental stance, and enables the person to home in on the kind of subject matter sought and the level of operational detail needed. In the Princeton program we often used the phrase, “The purpose is to help your counterparts in other countries learn from your experience” in the invitation letter and follow-up, as well as in the lead-in to the interview itself. We also emphasized that “the aim of the case study is to profile the reform you helped design, the steps you took to implement the new system, and the results you have observed so that others can learn from you.” Periodically, we reiterated these points. When an interviewee can imagine a conversation with the person who will use the information, answers are more likely to be specific. It also becomes easier to induce someone to be compassionate and speak honestly about the real problems that arose during a process, so that the target group of readers does not go astray or fail to benefit from the experience.
2. A second maxim is to ensure questions are clear so the interviewee does not have to struggle with meaning. A long, rambling question that requires energy to parse can sink an interview. By contrast, a simple, open-ended “grand tour” question is often a good place to begin, because many people are natural storytellers, become engaged, and start to focus their comments themselves when given this latitude. In his ethnographic interview classic, for example, Spradley suggests asking “Could you describe what happened that day?” or “Could you tell me how this office works?” Subsequent questions can focus on the elements of special relevance to the subject and may include prompts to reach specific subject matter or the requisite level of detail.Footnote 7
3. In framing questions, we try to avoid ambiguous or culturally loaded terms that increase the amount of mental calculation an answer requires.Footnote 8 How much is “usually” or “regularly”? “Big”? How many years is “young” or “recently”? It may be better to ask, “About how often did that happen during that year?” or “How many times did that happen that year?” Novice interviewers often refer to seasons as they pinpoint the time of an action, but of course these vary globally, so the references merely confuse (moral: benchmark to events or national holidays).
Similarly, we try to eliminate questions that require the interviewee to talk about two different things – the “double-barreled question” or compound question. “Did that step reduce error?” is clear, but “Did that process reduce error and delay?” asks about two dimensions that may not be related, yet it seems to require one answer. In this instance, it does not take much effort to sort out the two dimensions and in an interview context, as opposed to a survey, that is feasible. However, a speaker will have a slightly tougher time with a compound question about a preference, motivation, or interaction: “Was the main challenge to compensate those who would have to alter their farming practices and to help the community monitor illegal deforestation?” “Was this group an obstacle to winning the vote in the legislature and a source of public backlash?” Simple questions and quick follow-ups usually elicit better information than complex questions that ask for views on two or more things at once.
4. The passage of time influences the ability to remember information and potentially also makes it hard to check the reliability of a description. In the 1980s, studies of physician recall found that memory of specific patient visits decayed very rapidly, within two weeks of a visit.Footnote 9 Norman Bradburn and his colleagues reported that about 20 percent of critical details are irretrievable by interviewees after a year and 60 percent are irretrievable after five years.Footnote 10 The ability to remember distant events interacts with the salience or importance of the events to the interviewee and with social desirability. A well-received achievement that occurred two years earlier may be easier to remember than something that did not work very well or was not considered important at the time. Using “probes,” or questions that fill in a little detail from archival research, can help break the mental logjam.
Phrasing that takes the interviewee carefully back in time and provides reminders of the events that occurred or the locations in which they occurred may improve recall. Specific dates rarely have the same effect (imagine trying to remember what you were doing in August three years ago). Recall can improve during the course of an interview or after the interviewer has left.
The passage of time may also alter perceptions. Views change, and the interviewee may subconsciously try to harmonize interests, attitudes, or opinions. As in a historical account, what the principal actors knew at the time they recognized a problem and decided what to do is very important to capture accurately, and it may take some extra effort to trigger memory of initial perceptions and how these changed. Here is an example from a series of cases on the 2014 West Africa Ebola Outbreak Response.Footnote 11
Example: Effects of time on accuracy (two interviews conducted in late 2015 about the Ebola response):
Interview 1: Question: “How useful was the US military response to improving logistics capability?” Answer: “The US timing was all wrong. The military built emergency treatment centers that were never used because the epidemic ended by the time the centers were ready. The US military action was irrelevant.”
Interview 2: Question: “Let’s go back to August and September 2014 when the outbreak escalated dramatically in Liberia. Could you talk about the impact of the US military on logistics?” Answer: “In September 2014, the models said the number of people infected would rise to over a million. The US military prepared for that eventuality. Later the epidemic declined and the ETUs [emergency treatment units] weren’t used, but in the end what seemed to matter to the public was the visible sign that a big power cared, which generated a psychological boost. We hoped the military would be more useful in moving lab materials around but they had instructions not to enter areas where an outbreak had occurred so they just dropped us at the edges of these areas and then we made our way from there.”
There is some truth to both statements, but the timestamp in the second question elicited a more complete answer that helped resolve tensions among accounts.
5. Memory of actions taken in a crisis atmosphere, when people may have worked intensely on many different fronts, tends to be less good, emerges in a highly fragmented form with high levels of error, or acquires a gloss. Said one ISS interviewee who had worked intensely on a disaster response, “As we talk, I can feel PTSD [post-traumatic stress disorder] coming back.” Words tumbled out, and the interviewer had to piece together the order in which actions occurred.
In these circumstances, it is helpful to plan one or more return interviews. Between sessions, people will tend to remember more, though their memories may also start to embellish or spin the account. Questions that contain specific information about the circumstances and ask for a reaction may help alleviate that problem. For the researcher, the challenge then becomes integrating the different versions of an event to ensure that they synchronize accurately.
6. Research on how respondents react to surveys suggests that question order can make a big difference in the responses people offer.Footnote 12 Although there is not parallel research on long-form qualitative interviews, it stands to reason that some of the same issues arise in this slightly different context, although it is easier for the interviewer to circle back and ask for views a second time than it might be in a survey, providing a possible corrective.
In designing and modifying the informal script that structures the interviews, it may help to consider how the sequence or juxtaposition of particular questions might influence what people say by inadvertently priming a particular response. For example, if one question focuses attention on the influence an interest group brought to bear on a decision, the answer to an unrelated question may place heavier emphasis on interest groups than it would have in a different question lineup. Sometimes the best cure for this type of spillover is to acknowledge it directly: “We have talked a lot about interest group influence. I want to change the topic and focus on ____, now. Although there may have been some interest group influence, most likely other things were important too, so I encourage you to step back and think about what shaped this decision, more broadly.” An alternative is to shift to a different, more minor topic – or recommend a brief break – before returning to the line of questioning.
In policy research, political or social sensitivity may lead to self-censorship. To lessen this response, while also respecting the risks a speaker faces, it is sometimes possible to sequence questions so that they enable the speaker to articulate a problem in a diplomatic way, threading the needle: “I imagine that people who had invested in that land were upset that the city wanted to build a road there. Did any of those people ever speak about this problem in public? Did any of them ever come here to express their views? I see in the newspapers that politician X owned some land in that area – was he part of the group that objected? Did the program change after this point?”
Pacing sensitive questions may necessitate extra care in order to prevent the interviewee from calling an end to the conversation or from shifting to highly abbreviated responses. If the point of an interview is to acquire information about a particular stage of a negotiation, then it may be better to proceed to that point in the conversation before posing sensitive questions about earlier matters – and then loop back to these other sensitive issues when asking about results or reflections toward the conclusion of the conversation. By that point the interviewer has had a chance to build credibility and signal facility with some of the technical details, making it more likely the speaker feels s/he has had a fair chance to explain actions and views, while also realizing the interviewer is unlikely to be satisfied with a stock answer. Ethics rules that require returning to the speaker for permission to use quotes or identifying information can also assist willingness to speak, provided the speaker trusts that the interviewer will indeed live up to this commitment. (Note that this research ethics commitment runs counter to the standards in journalism, where the emphasis is on conveying publicly important information in real time and not allowing the holders of that information to act as censors.)
Ending on a more positive note is also helpful, both for the well-being of the interviewee and for maintaining the goodwill that makes it possible to return to the conversation later: “You accomplished __, ___, and ____. When you think back on this episode are there other things that make you especially proud/happy/satisfied with the work/____?”
7. Offer the right kinds of rewards. Because it takes a lot of mental energy to respond to questions and because there is no immediate tangible reward, an interview has to generate and sustain motivation. Usually, helping someone understand the purpose and specific focus of the research increases interest. Most people also want some sense that they are providing useful information; without that sense, they will drop out.
There is a fine line between leading, on the one hand, and nondirective feedback that merely sustains a conversation, on the other. A leading question suggests correct answers or reveals the researcher’s point of view, and it reduces accuracy. By contrast, there are neutral forms of feedback that can motivate the interviewee to persist in answering questions. Cannell, Miller, and Oksenberg (1981: 409–411) suggest a set of four responses that ISS has also found helpful:
“Thanks, that’s useful, OK.”
“I see. This is the kind of information we want.”
“Thanks, you have mentioned ___ things …”
“Thanks, we are interested in details like these …”
Speakers often model the length of their responses on the interviewer’s behavior. Rewarding specificity in a response to an open-ended question early in the interview with “Thanks, we are interested in details like these” can send the right signal (assuming detail is in fact what we want).
6.5 Integrating Streams of Evidence, Arbitrating Differences
In survey research, social scientists aggregate data from multiple respondents by analyzing central tendencies, assessing variance, and then evaluating the influence of causal factors on responses to questions using some type of regression analysis. Although less concerned with central tendencies and average effects, qualitative case study research also has to integrate multiple streams of information from interviews – information about views as well as processes. This stage of the research can catch and reconcile discrepancies or spin, but it can also become a source of error if the researcher incorrectly judges one account to be more truthful than another, with little basis in fact or little transparency about the reasons for privileging a particular point of view.
Arbitrating among conflicting streams of evidence takes place in journalism every day, and the adages journalists follow are equally applicable in social science research. Editors and reporters call the failure to resolve a contradiction or a clash of perspectives “he said, she said” journalism.Footnote 13 New York University journalism professor Jay Rosen, who led the charge against “he said, she said” reporting, offered an illustration. In this instance, a US National Public Radio reporter described a controversy over new reproductive health regulations and said that one group portrayed the rules as “common sense” while another saw them as designed to drive clinics out of business. The reporter laid out each group’s claims and moved on. Rosen cried foul, arguing that the reporter had an obligation to offer a more complete description that gave the reader some sense of the evidence underlying the seemingly disparate claims. This imperative has grown stronger as quality journalism has tried to combat disinformation.
Rosen’s remedies were exactly those his social science counterparts would have offered: hypothesis formation, measurement, and follow-up questions. In this instance, Rosen said, the reporter could have compared the new regulations to those already in place for similar procedures in the same state and to regulations in other jurisdictions so the reader could see whether the claim that the new rules were “common sense” had some basis in fact. The reporter could have read the safety report to see whether the accident or infection rates were especially high compared to related procedures. In short, Rosen argues, the researcher’s obligation to the reader is to resolve discrepancies when they involve matters that affect the reader’s ability to make a judgment about the core subject matter of the case. The reader’s mind should not buzz with further questions, and the description must have all the components necessary to understand the intervention described, including those an expert would consider fundamental.
Discrepancies in streams of interview evidence can arise from many sources, including differences regarding when two people became involved in a process, the roles they played and the knowledge available to them in each of these roles, and the life experiences or technical skills they brought to the job. That is, disagreements do not always arise from deliberate spin. Here are three examples of descriptive or reporting challenges drawn from ISS case study research and the intellectual process these challenges triggered.
One: Superficially discrepant timelines (case about the Liberian Ebola Outbreak response coordinationFootnote 14):
Question: When did Liberia adopt an incident management system for responding to the Ebola outbreak?
Interview 1: CDC Director Tom Frieden persuaded President Ellen Johnson Sirleaf to support an incident management system for coordinating the Ebola response. (From archival record: This meeting took place on or around August 24, 2014, during Frieden’s visit to the country.)
Interview 2: A CDC team visited Monrovia the third week in July 2014 and began to work with officials to set up an incident management system. (From archival record: The president appointed Tolbert Nyenswah head of the incident management system on August 10.)
Thought Process: The interviewer seeks accuracy in describing a sequence of events. At first blush it might seem that one subject just remembered a date incorrectly, but the archival evidence suggests that the dates of the events cited are indeed different. What else could be going on? One hypothesis is that something happened in between the two periods that required the president to revisit the choice of approach. An interviewer in strong command of the timeline might then frame a follow-up for interviewee 1: “Could you clarify the situation for me? I thought that the president had earlier appointed someone to head an incident management system. Did the first effort to launch the system fail or flounder?” For interviewee 2: “I understand that later the president and the head of the CDC discussed whether to continue the system in late August. Did anything happen in mid-August to shake the president’s confidence that the IMS was the right approach?”
Two: Superficially discrepant information about states of mind or relationships: (case on cabinet coordination in a power-sharing governmentFootnote 15)
Interview 1 (with someone who was in the meeting): “The dialogue process helped resolve stalemates and we emerged from these sessions in a better position to work together.”
Interview 2 (with a knowledgeable observer who was not in the meeting): “The dialogue process just helped the parties delay taking steps to meet the goals they had jointly agreed to. The leaders argued for long periods.”
Thought Process: In this instance, the researcher wants to know whether tensions among political parties in a unity government were lower, about the same, or higher after resort to an externally mediated “dialogue” mechanism. There are three challenges. First, few people were in the room and the perceptions may vary with knowledge. Second, “tension” or “trust” among political parties is something that is “latent” or hard to measure. Third, delay and levels of distrust could be related in a wide variety of ways. Delay might have increased trust or decreased it.
In this instance, the researcher would likely have to return to the people interviewed with follow-up questions. One might venture several hypotheses and ask what evidence would allow us to rule out each one, then frame questions accordingly. Did the number of matters referred to mediation go down over time? Did the number of days of mediation required diminish over time? Did deputy ministers perceive that it became easier or harder to work with colleagues from the other party during this period? Did progress toward pre-agreed priorities stall or proceed during this period?
If what went on in the mediation room is confidential, then the researcher has to frame questions that rely on other types of information: “Comparing the period before the mediation with the weeks after the mediation, would you say that you had more purely social conversations with people in the opposite party, fewer, or about the same? Was there a new practice introduced after the mediation that affected your ability to have these conversations?”
Three: Insufficient detail; “the mind does not come to rest” and the reader is left with an obvious, big, unanswered question. This challenge arises frequently. For example, the Princeton ISS program ran into this issue in trying to document the introduction of a public service delivery tracking system in the Dominican Republic.Footnote 16
Interview 1 (with an officer responsible for tracking action items in a ministry): “At first we added data to the tracking system each month but after a few months everything slowed down and we added information only every three or four months.”
Interview 2 (with an officer responsible for overseeing the central recording process): “Some ministries didn’t report at all. They never added information to the tracking system.”
Thought process: There is a discrepancy between the two statements, but in both instances it is clear that work had ground to a halt. The issue is when and why. Was the new system unworkable in all ministries or just some? Further, was the system impossible for most to use, or was there something else going on? One could ask a general question, “Why did that happen?” – an approach that often yields surprising answers. But hypotheses and follow-ups might also help winnow out plausible from less-plausible explanations. Did a few ministries try to report and then give up and join the others in noncompliance, or did a few continue to report? Then to the rationale: Was there no progress to report? Was there no penalty for not reporting? Was it hard to find the time to file the report? Did the software break down, or was there limited electrical power to run the system? Was someone designated to acquire and upload the information, or was no one really in charge of that function? Were the instructions hard to follow? Did the minister care or say anything when reporting slowed or halted? Did anyone from the president’s office/delivery unit call to ask why the report was slow? Was there pressure to delay the reports? Why?
If there are no data available to resolve a contradiction or settle a logical subsidiary question, then the researcher can say so. Andrew Bennett has proposed valuing evidence from different interviews according to a kind of schedule of plausibility.Footnote 17 Attach an estimate or “prior” to the possible motives of each interviewee who provides evidence and weigh the evidence provided accordingly. If someone offers evidence that clearly does not make that person “look good,” one might have more confidence in the other information offered. “Social psychologists have long noted that audiences find an individual more convincing when that person espouses a view that is seemingly contrary to his or her instrumental goals,” Bennett suggests; “For similar reasons, researchers should follow established advice on considering issues of context and authorship in assessing evidence. Spontaneous statements have a different evidentiary status from prepared remarks. Public statements have a different evidentiary status from private ones or from those that will remain classified for a period of time.”
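Bennett’s “schedule of plausibility” can be read as informal Bayesian updating. The sketch below is purely illustrative: the hypothesis, the numbers, and the likelihood ratios are invented assumptions, not values from the text, but they show why an admission against interest should move our confidence more than a self-serving claim.

```python
# Toy sketch of Bennett's weighting idea: attach a prior to a hypothesis
# and update on testimony whose evidentiary weight (likelihood ratio)
# depends on the speaker's stake in it. All numbers are illustrative.

def update(prior: float, likelihood_ratio: float) -> float:
    """Bayesian update via odds: posterior odds = prior odds * LR."""
    odds = prior / (1 - prior)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical hypothesis H: "reporting stalled because there was no penalty."
p = 0.5  # agnostic prior

# A self-serving statement supporting H barely moves us (LR near 1)...
p = update(p, 1.2)
# ...while an admission against interest supporting H moves us much more.
p = update(p, 4.0)

print(round(p, 3))  # → 0.828
```

The exact numbers matter less than the discipline: stating priors and evidentiary weights explicitly is one way to be transparent about why one account was privileged over another.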
6.6 Selection Bias
The protocols used to select interviewees are always a potential source of error, whether in survey research or in qualitative interviewing. In process-tracing case studies, we choose the people to interview instrumentally. That is, we want information from people who have direct knowledge of a process. But there is a consequent risk of interviewing only people from a single political party; from the group tasked with the daily work of carrying out a program; or from one ethnic, religious, or economic group affected. This problem may not always be damning. It may be partly contingent on the question asked: for example, there may be circumstances when the only people who are in a position to know about a series of actions are indeed people from the same small group. However, if the aim is to know how others perceived a decision or a program, or whether people thought a process was representative – or whether beneficiaries viewed the program in the same way policy-makers did – it goes without saying that we need the views of a broader group of people.
Avoiding selection bias in interview-based research can prove challenging, especially in less open societies. At ISS, which has focused on governmental reform, researchers typically spend much of their time securing an accurate description of a change in structure or practice and its implementation. Usually the only people with that information are those in government who actually carried out the daily legwork. In some settings, these people are likely to have a party affiliation and come only from one political party. They also may not know how the “clients” – the country’s citizens – view what they do. Because of this, the research program made it standard practice to include in its interview lists:
people most likely to be critical of the reforms we profile (we try to identify such people by looking at local newspaper editorials and headlines, speaking on background with journalists, etc.)
counterparts from another political party where these exist (predecessors in the same role, for example)
civic leaders or university researchers who work closely with the intended beneficiaries or clients
public servants who worked on the project in different locations
the “victors” and the “vanquished” – the people whose views prevailed and those whose views did not.
Where there are few civic groups it may be particularly difficult to identify people who are close to the views of clients and users and can generalize reliably about perceptions and experiences.
Problem: Critics won’t speak (case study on extension of civilian oversight of the military)
Interview 1: The defense minister in the political party that recently came to power says, “We retrenched several thousand soldiers and gave them severance pay. There was no serious objection to the new policy. Some of the senior military officers believed the policy was the right thing to do and supported it.”
Missing Interview: From archival sources we know that a political party led a protest against this very policy. Neither the officers of that political party nor the identifiable leaders of the protest assented to an interview, however.
Remedy: There are some partial solutions for countering selection bias that arises from this sort of “missing actor” problem. One is to try to induce people who will speak on the record to be self-reflective. For example, Jeffrey Berry suggests that the researcher “ask the subject to critique his own case – Why aren’t Democrats buying this?” or to say, “I’m confused on one point; I read ….”Footnote 18 Another approach is to draw on the publications the critics have authored. These may not get at the real reasons for the criticisms the groups raise, but they may provide enough information to represent the view, and enough detail for the researcher to use to seek a reaction from those who will go on the record.
Another kind of selection bias can arise in new democracies or more authoritarian political systems: self-censorship. In this situation, because of concerns about vulnerability or because of traditions that discourage openly critical comments, everyone interviewed offers a “careful” or biased response. The question is how to break through the reserve without jeopardizing any participant’s safety. One possibility is to identify fractures within the political party – a committee or wing or leadership group that genuinely wants to know how well something works and is willing to talk about suspected problems. We can then use these to frame questions that don’t require an interviewee to criticize but instead just ask for steps taken when X happened. Phrasing questions so that they don’t force one person to impugn another can also help: “If you had to do this over again, what would you do differently?” “If you could advise your counterpart in another country how to do _____, what special advice or tips would you want to convey?”
The act of “getting in the door” can create selection bias problems too.Footnote 19 The first people to respond favorably to requests for interviews may be distinctive in some way – those who feel empowered, younger people, people who have aspirations in electoral politics and see a way to lend some credibility to their campaigns. To guard against this kind of bias, it is important to step back periodically and to ensure that the list of those who responded favorably to interview invitations includes people who were involved with what we seek to document but don’t share that profile.
6.7 Conclusion
Qualitative case studies are a form of empirical research. Facts are the currency in which they trade. As such they are potentially vulnerable to the same kinds of problems that bedevil quantitative research, from low measurement validity to data collection techniques that bias the views or accounts surveyed or introduce error. This chapter offers a schema for thinking about these challenges in the context of preparing interview-based process-tracing case studies, along with a few partial solutions to some common problems.
One implication of the observations offered here is that careful interview preparation yields a high return with respect to the accuracy and completeness of a process-tracing case study. That means 1) knowing the subject well enough to frame thoughtful hypotheses and measures in advance, and to build these into draft questions; 2) establishing a timeline and “prestory” from news sources, operations reports, or preliminary “informant” interviews; 3) learning about the backgrounds of the people central to the policy initiative; 4) identifying representatives of the beneficiary groups as well as likely critics or people who had special vantage points; and 5) understanding options tried in other, similar settings or in other periods. This background preparation then shapes the development of interview scripts, useful for thinking hard about clarity, narrative flow, question sequence, and other matters that impinge on the quality of the information elicited. Although the interview itself is a conversation, not (usually) a series of survey questions read off a schedule, the development of the written script sharpens the interviewer’s ability to elicit the information required, while maintaining a positive rapport with the speaker.
A second implication is that we have to be transparent about the basis for arbitrating differences that emerge across interviews. Sometimes the prose can say what the reader needs to know: “Staff members involved at the beginning of the project, when the initial pilot failed, remembered that ____. But those who joined later, after the first results of the revised program started to emerge and neighborhood resistance had dissipated, had a different view of the challenges the government faced.” In other instances, a discursive footnote of the sort that Andrew Moravcsik (Chapter 8, this volume) proposes may be the best way to help the reader understand the judgments the author made.
A third implication of this analysis is that the purported differences in our ability to rely on quantitatively analyzed survey data, on the one hand, and qualitative interview data, on the other, are vastly overstated. The main difference between the two has to do with whether frequencies or the distribution of perspectives across populations matter to the aim of the project. If they do, then survey data may have greater value. But if the aim is to understand strategic interaction or a process, then purposive interviewing will tell us more. In both contexts, however, the same concerns about eliciting accurate responses apply, and some of the same remedies prove useful.
7.1 Introduction
In the lead article of the first issue of Comparative Politics, Harold Lasswell posited that the “scientific approach” and the “comparative method” are one and the same (Lasswell 1968: 3). So important is comparative case study research to the modern social sciences that two disciplinary subfields – comparative politics in political science and comparative-historical sociology – crystallized in no small part because of their shared use of comparative case study research (Collier 1993; Adams, Clemens, and Orloff 2005: 22–26; Mahoney and Thelen 2015). As a result, a first-principles methodological debate emerged about the appropriate ways to select cases for causal inquiry. In particular, the diffusion of econometric methods in the social sciences exposed case study researchers to allegations that they were “selecting on the dependent variable” and that “selection bias” would hamper the “answers they get” (Geddes 1990). Lest they be pushed to randomly select cases or turn to statistical and experimental approaches, case study researchers had to develop a set of persuasive analytic tools for their enterprise.
It is unsurprising, therefore, that there has been a profusion of scholarship discussing case selection over the years.Footnote 1 Gerring and Cojocaru (2015) synthesize this literature by deriving no fewer than five distinct types (representative, anomalous, most-similar, crucial, and most-different) and eighteen subtypes of cases, each with its own logic of case selection. It falls outside the scope of this chapter to provide a descriptive overview of each approach to case selection. Rather, the purpose of the present inquiry is to place the literature on case selection in constructive dialogue with the equally lively and burgeoning body of scholarship on process tracing (George and Bennett 2005; Brady and Collier 2010; Beach and Pedersen 2013; Bennett and Checkel 2015). I ask a simple question: Should our evolving understanding of causation and our toolkit for case-based causal inference, courtesy of process-tracing scholars, alter how scholars approach case selection? If so, why, and what may be the most fruitful paths forward?
To propose an answer, this chapter focuses on perhaps the most influential and widely used means of conducting qualitative research involving two or more cases: Mill’s methods of agreement and difference. Also known as the “most-different systems/cases” and “most-similar systems/cases” designs, these strategies have not escaped challenge – although, as we will see, many of these critiques were fallaciously premised on case study research serving as a weaker analogue to econometric analysis. Here, I take a different approach: I argue that the traditional use of Millian methods of case selection can indeed be flawed – not for the reasons critics typically allege, but because it risks treating cases as static units to be synchronically compared rather than as social processes unfolding over time. As a result, Millian methods risk prematurely rejecting or otherwise overlooking (1) ordered causal processes, (2) paced causal processes, and (3) equifinality, or the presence of multiple pathways that produce the same outcome. While qualitative methodologists have stressed the importance of these processual dynamics, they have been less attentive to how these factors may problematize pairing Millian methods of case selection with within-case process tracing (e.g., Hall 2003; Tarrow 2010; Falleti and Mahoney 2015). This chapter begins to fill that gap.
Taking a more constructive and prescriptive turn, the chapter provides a set of recommendations for ensuring the alignment of Millian methods of case selection with within-case sequential analysis. It begins by outlining how the deductive use of processualist theories can help reformulate Millian case selection designs to accommodate ordered and paced processes (but not equifinal processes). More originally, the chapter concludes by proposing a new, alternative approach to comparative case study research: the method of inductive case selection. By making use of Millian methods to select cases for comparison after a causal process has been identified within a particular case, the method of inductive case selection enables researchers to assess (1) the generalizability of the causal sequences, (2) the logics of scope conditions on the causal argument, and (3) the presence of equifinal pathways to the same outcome. In so doing, scholars can convert the weaknesses of Millian approaches into strengths and better align comparative case study research with the advances of processualist researchers.
Organizationally, the chapter proceeds as follows. Section 7.2 provides an overview of Millian methods for case selection and articulates how the literature on process tracing fits within debates about the utility and shortcomings of the comparative method. Section 7.3 articulates why the traditional use of Millian methods risks blinding the researcher to ordered, paced, and equifinal causal processes, and describes how deductive, processualist theorizing helps attenuate some of these risks. Section 7.4 develops a new inductive method of case selection and provides a number of concrete examples from development practice to illustrate how it can be used by scholars and policy practitioners alike. Section 7.5 concludes.
7.2 Case Selection in Comparative Research
7.2.1 Case Selection Before the Processual Turn
Before “process tracing” entered the lexicon of social scientists, the dominant case selection strategy in case study research sought to maximize causal leverage via comparison, particularly via the “methods of agreement and difference” of John Stuart Mill (1843 [1974]: 388–391).
In Mill’s method of difference, the researcher purposively chooses two (or more) cases that experience different outcomes, despite otherwise being very similar on a number of relevant dimensions. Put differently, the researcher seeks to maximize variation in the outcome variable while minimizing variation amongst a set of plausible explanatory variables. It is for this reason that the approach also came to be referred to as the ‘most-similar systems’ or ‘most-similar cases’ design – while Mill’s nomenclature highlights variation in the outcome of interest, the alternative terminology highlights minimal variation amongst a set of possible explanatory factors. The underlying logic of this case selection strategy is that because the cases are so similar, the researcher can subsequently probe for the explanatory factor that actually does exhibit cross-case variation and isolate it as a likely cause.
Mill’s method of agreement is the mirror image of the method of difference. Here, the researcher chooses two (or more) cases that experience similar outcomes despite being very different on a number of relevant dimensions. That is, the researcher seeks to minimize variation in the outcome variable while maximizing variation amongst a set of plausible explanatory variables. An alternative, independent variable-focused terminology for this approach was developed – the ‘most-different systems’ or ‘most-different cases’ design – breeding some confusion. The underlying logic of this case selection strategy is that it helps the researcher isolate the explanatory factor that is similar across the otherwise different cases as a likely cause.Footnote 2
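The shared logic of the two designs can be made concrete with a toy comparison. Everything in the sketch below is invented for illustration – the case names, the binary background factors, and the outcomes – and real Millian analysis is of course qualitative judgment rather than mechanical computation.

```python
# Toy illustration of Millian case selection logic. Each case is a dict of
# binary background factors plus an outcome; all names here are hypothetical.

def method_of_difference(case_a, case_b):
    """Most-similar design: outcomes differ across otherwise similar cases;
    the candidate cause is whichever factor also differs."""
    assert case_a["outcome"] != case_b["outcome"]
    return {f for f in case_a if f != "outcome" and case_a[f] != case_b[f]}

def method_of_agreement(case_a, case_b):
    """Most-different design: outcomes agree across otherwise dissimilar
    cases; the candidate cause is whichever factor they share."""
    assert case_a["outcome"] == case_b["outcome"]
    return {f for f in case_a if f != "outcome" and case_a[f] == case_b[f]}

norway  = {"wealthy": True,  "federal": False, "pr_voting": True,  "outcome": True}
denmark = {"wealthy": True,  "federal": False, "pr_voting": False, "outcome": False}
bolivia = {"wealthy": False, "federal": True,  "pr_voting": True,  "outcome": True}

print(method_of_difference(norway, denmark))  # {'pr_voting'}
print(method_of_agreement(norway, bolivia))   # {'pr_voting'}
```

Note how static the comparison is: nothing in it registers when, in what order, or at what pace "pr_voting" emerged in each case – precisely the processual information at issue in Section 7.3.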
Mill himself did not believe that such methods could yield causal inferences outside of the physical sciences (Mill 1843 [1974]: 452). Nevertheless, in the 1970s a number of comparative social scientists endorsed Millian methods as the cornerstones of the comparative method. For example, Przeworski and Teune (1970) advocated in favor of the most-different cases design, whereas Lijphart (1971) favored the most-similar cases approach. In so doing, scholars sought case selection techniques that would be as analogous as possible to regression analysis: focused on controlling for independent variables across cases, maximizing covariation between the outcome and a plausible explanatory variable, and treating cases as a qualitative equivalent to a row of dataset observations. It is not difficult to see why this contributed to the view that case study research serves as the “inherently flawed” version of econometrics (Adams, Clemens, and Orloff 2005: 25; Tarrow 2010). Indeed, despite his prominence as a case study researcher, Lijphart (1975: 165; 1971: 685) concluded that “because the comparative method must be considered the weaker method,” then “if at all possible one should generally use the statistical (or perhaps even the experimental) method instead.” As Hall (2003: 380, 396) brilliantly notes, case study research
was deeply influenced by [Lijphart’s] framing of it … [where] the only important observations to be drawn from the cases are taken on the values of the dependent variable and a few explanatory variables … From this perspective, because the number of pertinent observations available from small-N comparison is seriously limited, the analyst lacks the degrees of freedom to consider more than a few explanatory variables, and the value of small-N comparison for causal inference seems distinctly limited.
In other words, the predominant case selection approach through the 1990s sought to do its best to reproduce a regression framework in a small-N setting – hence Lijphart’s concern with the “many variables, small number of cases” problem, which he argued could be only partially mitigated if, inter alia, the researcher increased the number of cases and decreased the number of variables across said cases (Lijphart 1971: 685–686). Later works embraced Lijphart’s formulation of the problem even as they sought to address it: for example, Eckstein (1975: 85) argued that a “case” could actually comprise many “cases” if the unit of analysis shifted from being, say, the electoral system to, say, the voter. Predictably, such interventions invited retorts: Lieberson (1994), for example, claimed that Millian methods’ inability to accommodate probabilistic causation,Footnote 3 interaction effects, and multivariate analysis would remain fatal flaws.
7.2.2 Enter Process Tracing
It is in this light that ‘process tracing’ – a term first used by Hobarth (1972) but popularized by George (1979) and particularly George and Bennett (2005), Brady and Collier (2010), Beach and Pedersen (2013), and Bennett and Checkel (2015) – proved revolutionary for the ways in which social scientists conceive of case study research. Cases have gradually been reconceptualized not as dataset observations but as concatenations of concrete historical events that produce a specific outcome (Goertz and Mahoney 2012). That is, cases are increasingly treated as social processes, where a process is defined as “a particular type of sequence in which the temporally ordered events belong to a single coherent pattern of activity” (Falleti and Mahoney 2015: 214). Although there exist multiple distinct conceptions of process tracing – from Bayesian approaches (Bennett 2015) to set-theoretic approaches (Mahoney et al. 2009) to mechanistic approaches (Beach and Pedersen 2013) to sequentialist approaches (Falleti and Mahoney 2015) – their overall esprit is the same: reconstructing the sequence of events and interlinking causal logics that produce an outcome – isolating the ‘causes of effects’ – rather than probing a variable’s mean impact across cases via an ‘effects of causes’ approach.Footnote 4
For this intellectual shift to occur, processualist social scientists had to show how a number of assumptions underlying Millian comparative methods – as well as frequentist approaches more generally – are usually inappropriate for case study research. For example, the correlational approach endorsed by Przeworski and Teune (1970), Lijphart (1971), and Eckstein (1975) treats observational units as homogeneous and independent (Hall 2003: 382; Goertz and Mahoney 2012). Unit homogeneity means that “different units are presumed to be fully identical to each other in all relevant respects except for the values of the main independent variable,” such that each observation contributes equally to the confidence we have in the accuracy and magnitude of our causal estimates (Brady and Collier 2010: 41–42). Given this assumption, more observations are better – hence Lijphart’s (1971) dictum to “increase the number of cases” and, in its more recent variant, to “increase the number of observations” (King, Keohane, and Verba 1994: 208–230). By independence, we mean that “for each observation, the value of a particular variable is not influenced by its value in other observations”; thus, each observation contributes “new information about the phenomenon in question” (Brady and Collier 2010: 43).
By contrast, practitioners of process tracing have shown that treating cases as social processes implies that case study observations are often interdependent and derived from heterogeneous units (Goertz and Mahoney 2012). Unit heterogeneity means that not all historical events, and the observable evidence they generate, are created equal. Hence, some observations may better enable the reconstruction of a causal process because they are more proximate to the central events under study. Correlatively, this is why historians accord greater ‘weight’ to primary than to secondary sources, and why primary sources concerning actors central to a key event are more important than those for peripheral figures (Trachtenberg 2009; Tansey 2007). In short, while process tracing may yield a bounty of observable evidence, we seek to increase not the number but the quality of observations. Finally, by interdependence we mean that because time is “fateful” (Sewell 2005: 6), antecedent events in a sequence may influence subsequent events. This “fatefulness” has multiple sources. For instance, historical institutionalists have shown how social processes can exhibit path dependencies in which the outcome of interest becomes a central driver of its own reproduction (Pierson 1996; Pierson 2000; Mahoney 2000; Hall 2003; Falleti and Mahoney 2015).
At the individual level, processual sociologists have noted that causation in the social world is rarely a matter of one billiard ball hitting another, as in Hume’s (1738 [2003]) frequentist concept of “constant conjunction.” Rather, it hinges upon actors endowed with memory, such that the micro-foundations of social causation rest on individuals aware of their own historicality (Sewell 2005; Abbott 2001; Abbott 2016).
At its core, eschewing the independence and unit homogeneity assumptions simply means situating case study evidence within its spatiotemporal context (Hall 2003; Falleti and Lynch 2009). This commitment is showcased by the language that process-sensitive case study researchers use when making causal inferences. First, rather than relating ‘independent variables’ to ‘dependent variables’, they often privilege the contextualizing language of relating ‘events’ to ‘outcomes’ (Falleti and Mahoney 2015). Second, they prefer to speak not of ‘dataset observations’, evocative of cross-sectional analysis, but of ‘causal process observations’, evocative of sequential analysis (Brady and Collier 2010; Goertz and Mahoney 2012). Third, they may substitute the language of ‘causal inference via concatenation’ – a terminology implying that unobservable causal mechanisms are embedded within a sequence of observable events – for that of ‘causal inference via correlation’, evocative of the frequentist billiard-ball analogy (Waldner 2012: 68). The result is that case study research is increasingly hailed as a “distinctive approach that offers a much richer set of observations, especially about causal processes, than statistical analyses normally allow” (Hall 2003: 397).
7.3 Threats to Processual Inference and the Role of Theory
While scholars have shown how process-tracing methods have reconceived the utility of case studies for causal inference, there remains some ambiguity about the implications for case selection, particularly using Millian methods. Although several works have touched upon this theme (e.g., Hall 2003; George and Bennett 2005; Levy 2008; Tarrow 2010), the contribution that most explicitly wrestles with this topic is Falleti and Mahoney (2015), who acknowledge that “the application of Millian methods for sequential arguments has not been systematically explored, although we believe it is commonly used in practice” (Falleti and Mahoney 2015: 226). Falleti and Mahoney argue that process tracing can remedy the weaknesses of Millian approaches: “When used in isolation, the methods of agreement and difference are weak instruments for small-N causal inference … small-N researchers thus normally must combine Millian methods with process tracing or other within-case methods to make a positive case for causality” (2015: 225–226). Their optimism about the synergy between Millian methods and process tracing leads them to conclude that “by fusing these two elements, the comparative sequential method merits the distinction of being the principal overarching methodology for [comparative-historical analysis] in general” (2015: 236).
Falleti and Mahoney’s contribution is the definitive statement of how comparative case study research has long abandoned its Lijphartian origins and fully embraced treating cases as social processes. It is certainly true that process-tracing advocates have shown that some past critiques of Millian methods may not have been as damning as they first appeared. For example, Lieberson’s (1994) critique that Millian case selection requires a deterministic understanding of causation has been countered by set-theoretic process tracers, who note that causal processes can indeed be conceptualized as concatenations of necessary and sufficient conditions (Goertz and Mahoney 2012; Mahoney and Vanderpoel 2015). After all, “at the individual case level, the ex post (objective) probability of a specific outcome occurring is either 1 or 0” (Mahoney 2008: 415). Even for those who do not explicitly embrace set-theoretic approaches and prefer to perform a series of “process tracing tests” (such as straw-in-the-wind, hoop, smoking gun, and doubly decisive tests), the objective remains to evaluate the deterministic causal relevance of a historical event for the next linkage in a sequence (Collier 2011; Mahoney 2012). In this light, Millian methods appear to have been thrown a much-needed lifeline.
Yet processualist researchers have implicitly exposed new, and perhaps more damning, weaknesses in the traditional use of the comparative method. Here, Falleti and Mahoney (2015) are less forthcoming: they do not highlight how their focus on comparing within-case sequences should push scholars to revisit strategies for case selection premised on assumptions that process-tracing advocates have themselves undermined. In this light, I begin by outlining three hitherto underappreciated threats to inference associated with the traditional use of Millian case selection: ignoring (1) ordered and (2) paced causal processes, and ignoring (3) the possibility of equifinality. I then demonstrate how risks (1) and (2) can be attenuated deductively by formulating processualist theories and tweaking Millian designs for case selection.
Risk 1: Ignoring Ordered Processes
Process-sensitive social scientists have long noted that “the temporal order of the events in a sequence [can be] causally consequential for the outcome of interest” (Falleti and Mahoney 2015: 218; see also Pierson 2004: 54–78). For example, where individual acts of agency play a critical role – such as political elites’ response to a violent protest – “reordering can radically change [a] subject’s understanding of the meaning of particular events,” altering their response and the resulting outcomes (Abbott 1995: 97).
An evocative illustration is provided by Sewell’s (1996) analysis of how the storming of the Bastille in 1789 produced the modern concept of “revolution.” After overrunning the fortress, the crowd freed the few prisoners held within it; shot, stabbed, and beheaded the Bastille’s commander; and paraded his severed head through the streets of Paris (Sewell 1996: 850). When the French National Assembly heard of the taking of the Bastille, it first interpreted the contentious event as “disastrous news” and an “excess of fury”; yet, when the king subsequently responded by retreating his troops to their provincial barracks, the Assembly recognized that the storming of the Bastille had strengthened its hand, and proceeded to reinterpret the event as a patriotic act of protest in support of political change (Sewell 1996: 854–855). The king’s reaction to the Bastille thus bolstered the Assembly’s resolve to “invent” the modern concept of revolution as a “legitimate rising of the sovereign people that transformed the political system of a nation” (Sewell 1996: 854–858). Proceeding counterfactually, had the ordering of events been reversed – had the king withdrawn his troops before the Bastille had been stormed – the National Assembly would have had little reason to interpret the popular uprising as a patriotic act legitimating reform rather than a violent act of barbarism.
Temporal ordering may also alter a social process’s political outcomes through macro-level mechanisms. For example, consider Falleti’s (2005, 2010) analysis of the conditions under which state decentralization – the devolution of national powers to subnational administrative bodies – increases local political autonomy in Latin America. Through process tracing, Falleti demonstrates that when fiscal decentralization precedes electoral decentralization, local autonomy is increased, since this sequence endows local districts with the monetary resources necessary to subsequently administer an election effectively. However, when the reverse occurs, such that electoral decentralization precedes fiscal decentralization, local autonomy is compromised: although the district is offered the opportunity to hold local elections, it lacks the monetary resources to administer them effectively, endowing the national government with added leverage to impose conditions upon the devolution of fiscal resources.
For our purposes, what is crucial to note is not simply that temporal ordering matters, but that in ordered processes it is not the presence or absence of events that is most consequential for the outcome of interest. For instance, in Falleti’s analysis both fiscal and electoral decentralization occur. This means that a traditional Millian framework risks dismissing some explanatory events as causally irrelevant on the grounds that their presence is insufficient for explicating the outcome of interest (see Figure 7.2).
The way to deductively attenuate the foregoing risk is to develop an ordered theory and then modify the traditional Millian setup to assess the effect of ordering on an outcome of interest. That is, deductive theorizing aimed at probing the causal effect of ordering can guide us in constructing an appropriate Millian case selection design, such as that in Figure 7.3. In this example, we redefine the fourth independent variable to measure not the presence or absence of a fourth event, but rather the ordering of two previously defined events (in this case, events 1 and 2). This case selection setup would be appropriate if deductive theorizing predicts that the outcome of interest is produced when event 1 is followed by event 2 (such that, unless this specific ordering occurs, the presence of events 1 and 2 is insufficient to generate the outcome). In other words, if Millian methods are to be deductively used to select cases for comparison, the way to guard against prematurely dismissing the causal role of temporal ordering is to explicitly theorize said ordering a priori. If this proves difficult, or if the researcher lacks sufficient knowledge to develop such a theory, it is advisable to switch to the more inductive method for case selection outlined in the next section.
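The recoding just described can be sketched in miniature. The following is a hypothetical illustration only – the cases, events, and variable names are invented, and real designs would rest on far richer evidence – but it shows why presence/absence coding alone cannot register an ordered process, whereas a derived ordering variable can.

```python
# Hypothetical sketch of an ordered Millian design: two cases share the
# same events, but only their ordering covaries with the outcome.
cases = {
    # each case is the temporally ordered list of events observed in it
    "A": ["e1", "e2", "e3"],   # outcome occurs
    "B": ["e2", "e1", "e3"],   # outcome absent
}

def presence_coding(seq):
    """Traditional Millian coding: presence/absence of each event."""
    return {e: (e in seq) for e in ("e1", "e2", "e3")}

def ordered_coding(seq):
    """Revised coding: add a variable for whether e1 precedes e2."""
    variables = presence_coding(seq)
    variables["e1_before_e2"] = seq.index("e1") < seq.index("e2")
    return variables

# Under presence-only coding the two cases are indistinguishable, so the
# method of difference would (wrongly) eliminate e1 and e2 as causes.
assert presence_coding(cases["A"]) == presence_coding(cases["B"])

# The ordering variable does covary with the outcome, preserving the
# hypothesis that "e1 followed by e2" produces it.
assert ordered_coding(cases["A"])["e1_before_e2"] is True
assert ordered_coding(cases["B"])["e1_before_e2"] is False
```

The point of the sketch is simply that the ordering hypothesis must be encoded as a variable a priori; nothing in the presence/absence table would ever suggest it inductively.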
Risk 2: Ignoring Paced Processes
Processualist researchers have also emphasized that, beyond temporal order, “the speed or duration of events … is causally consequential” (Falleti and Mahoney 2015: 219). For example, social scientists have long distinguished an “eventful temporality” (Sewell 1996) from those “big, slow moving” incremental sequences devoid of rapid social change (Pierson 2003). For historical institutionalists, this distinction is illustrated by “critical junctures” – defined as “relatively short periods of time during which there is a substantially heightened probability that agents’ choices will affect the outcome of interest” (Capoccia and Kelemen 2007: 348; Capoccia 2015: 150–151) – on the one hand, and those “causal forces that develop over an extended period of time,” such as “cumulative” social processes, sequences involving “threshold effects,” and “extended causal chains,” on the other hand (Pierson 2004: 82–90; Mahoney and Thelen 2010).
An excellent illustration is provided by Beissinger’s (2002) analysis of the contentious events that led to the collapse of the Soviet State. Descriptively, the sequence of events has its origins in the increasing transparency of Soviet institutions and freedom of expression accompanying Gorbachev’s Glasnost (Beissinger 2002: 47). As internal fissures within the Politburo began to emerge in 1987, Glasnost facilitated media coverage of the split within the Soviet leadership (2002: 64). In response, “interactive attempts to contest the state grew regularized and began to influence one another” (2002: 74). These challenging acts mobilized around previously dormant national identities, and for the first time – often out of state incompetence – these early protests were not shut down (2002: 67). Protests reached a boiling point in early 1989 as the first semicompetitive electoral campaign spurred challengers to mobilize the electorate and cultivate grievances in response to regime efforts to “control nominations and electoral outcomes” (2002: 86). By 1990 the Soviet State was crumbling, and “in many parts of the USSR demonstration activity … had become a normal means for dealing with political conflict” (2002: 90).
Crucially, Beissinger stresses that to understand the causal dynamics of the Soviet State’s collapse, highlighting the chronology of events is insufficient. The 1987–1990 period comprised a moment of “thickened history” wherein “what takes place … has the potential to move history onto tracks otherwise unimaginable … all within an extremely compressed period of time” (2002: 27). Information overload, the density of interaction between diverse social actors, and the diffusion of contention engendered “enormous confusion and division within Soviet institutions,” allowing the hypertrophy of challenging acts to play “an increasingly significant role in their own causal structure” (2002: 97, 27). In this light, the temporal compression of a sequence of events can bolster the causal role of human agency and erode the constraints of social structure. Proceeding counterfactually, had the exact same sequence of contentious events unfolded more slowly, it is doubtful that the Soviet State would have suddenly collapsed.
Many examples could be cited of how the prolongation of a sequence of events can render it invisible, and thus produce different outcomes. Consider, for example, how global climate change – highlighted by Pierson (2004: 81) as a prototypical process with prolonged time horizons – conditions the psychological response of social actors. As a report from the American Psychological Association underscores, “climate change that is construed as rapid is more likely to be dreaded,” for “people often apply sharp discounts to costs or benefits that will occur in the future … relative to experiencing them immediately” (Swim et al. 2009: 24–25; Loewenstein and Elster 1992). This logic is captured by the metaphor of the “boiling frog”: “place a frog in a pot of cool water, and gradually raise the temperature to boiling, and the frog will remain in the water until it is cooked” (Boyatzis 2006: 614).
What is important to note is that, once more, paced processes are not premised on the absence or presence of their constitutive events being causally determinative; rather, they are premised on the duration of events (or their temporal separation) bearing explanatory significance. Hence the traditional approach to case selection risks neglecting the causal impact of temporal duration on the outcome of interest (see Figure 7.4).
Here, too, the way to deductively assess the causal role of pacing on an outcome of interest is to explicitly develop a paced theory before selecting cases for empirical analysis. On the one hand, we might theorize that it is the duration of a given event that is causally consequential; on the other hand, we might theorize that it is the temporal separation of said event from other events that is significant. Figure 7.5 suggests how a researcher can assess both theories through a revised Millian design. In the first example, we define a fourth independent variable measuring not the presence of a fourth event, but rather the temporal duration of a previously defined event (in this case, event 1). This would be an appropriate case selection design to assess a theory predicting that the outcome of interest occurs when event 1 unfolds over a prolonged period of time (such that if event 1 unfolds more rapidly, its mere occurrence is insufficient for the outcome). In the second example, we define a fourth independent variable measuring the temporal separation between two previously defined events (in this case, events 1 and 2). This would be an appropriate case selection design for a theory predicting that the outcome of interest only occurs when event 1 is temporally distant from event 2 (such that events 1 and 2 are insufficient for the outcome if they are proximate). Again, if the researcher lacks a priori knowledge to theorize how a paced process may be generating the outcome, it is advisable to adopt the inductive method of case selection described in Section 7.4.
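Both recodings just described – duration and temporal separation – can be sketched as derived variables over timestamped events. This is a hypothetical illustration under invented data: the cases, timestamps, and cutoff thresholds are all assumptions made for the sketch, not values from any study.

```python
# Hypothetical sketch of a paced Millian design: events carry
# (start, end) timestamps, and derived variables measure (a) the
# duration of event 1 and (b) the gap between events 1 and 2.
cases = {
    "A": {"e1": (0, 10), "e2": (20, 21)},  # e1 prolonged; outcome occurs
    "B": {"e1": (0, 1),  "e2": (3, 4)},    # e1 brief; outcome absent
}

DURATION_CUTOFF = 5    # assumed threshold separating "slow" from "rapid"
SEPARATION_CUTOFF = 5  # assumed threshold for "temporally distant"

def paced_coding(events):
    """Code presence plus the two paced variables of Figure 7.5."""
    start1, end1 = events["e1"]
    start2, _ = events["e2"]
    return {
        "e1_present": True,
        "e2_present": True,
        "e1_prolonged": (end1 - start1) >= DURATION_CUTOFF,
        "e1_e2_distant": (start2 - end1) >= SEPARATION_CUTOFF,
    }

coded_a, coded_b = paced_coding(cases["A"]), paced_coding(cases["B"])

# Presence alone is identical across the cases; only the paced
# variables discriminate between them.
assert coded_a["e1_present"] == coded_b["e1_present"]
assert coded_a["e1_prolonged"] and not coded_b["e1_prolonged"]
assert coded_a["e1_e2_distant"] and not coded_b["e1_e2_distant"]
```

As with ordering, the duration and separation variables only exist because a paced theory told us to construct them; a presence/absence table would treat cases A and B as identical.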
Risk 3: Ignoring Equifinal Causal Processes
Finally, researchers have noted that causal processes may be marked by equifinality: the fact that “multiple combinations of values … produce the same outcome” (Mahoney 2008: 424; see also George and Bennett 2005; Goertz and Mahoney 2012). More formally, set-theoretic process tracers account for equifinality by emphasizing that, in most circumstances, “necessary” conditions or events are actually INUS conditions – insufficient but necessary components of an unnecessary but sufficient combination of factors (Mahoney and Vanderpoel 2015: 15–18).
One of the reasons why processualist social scientists increasingly take equifinality seriously is the recognition that causal mechanisms may be context-dependent. Sewell’s work stresses that “the consequences of a given act … are not intrinsic to the act but rather will depend on the nature of the social world within which it takes place” (Sewell 2005: 9–10). Similarly, Falleti and Lynch (2009: 2, 11) argue that “causal effects depend on the interaction of specific mechanisms with aspects of the context within which these mechanisms operate,” hence the necessity of imposing “scope conditions” on theory building. One implication is that the exact same sequence of events in two different settings may produce vastly different causal outcomes. The flip side of this conclusion is that we should not expect a given outcome to always be produced by the same sequence of events.
For example, consider Sewell’s critique of Skocpol’s (1979) States and Social Revolutions for embracing an “experimental temporality.” Skocpol deploys Millian methods of case selection to theorize that the great social revolutions – the French, Russian, and Chinese revolutions – were caused by a conjunction of three necessary conditions: “(1) military backwardness, (2) politically powerful landlord classes, and (3) autonomous peasant communities” (Sewell 2005: 93). Yet to permit comparison, Skocpol assumes that the outcomes of one revolution, and the processes of historical change more generally, have no effect on a subsequent revolution (Sewell 2005: 94–95). This approach amounts to “cutting up the congealed block of historical time into artificially interchangeable units,” ignoring the fatefulness of historical sequences (Sewell 2005). For example, the Industrial Revolution “intervened” between the French and Russian Revolutions, and consequently one could argue that “the revolt of the Petersburg and Moscow proletariat was a necessary condition for social revolution in Russia in 1917, even if it was not a condition for the French Revolution in 1789” (Sewell 2005: 94–95). What Sewell is emphasizing, in short, is that peasant rebellion is an INUS condition (as is a proletariat uprising), rather than a necessary condition.
Another prominent example of equifinality is outlined in Collier’s (1999: 5–11) review of the diverse pathways through which democratization occurs. In the elite-driven pathway, emphasized by O’Donnell and Schmitter (1986), an internal split amongst authoritarian incumbents emerges; this is followed by liberalizing efforts by some incumbents, which enables the resurrection of civil society and popular mobilization; finally, authoritarian incumbents negotiate a pacted transition with opposition leaders. By contrast, in the working-class-driven pathway, emphasized by Rueschemeyer, Stephens, and Stephens (1992), a shift in the material balance of power in favor of the democracy-demanding working class and against the democracy-resisting landed aristocracy causes the former to overpower the latter, and a regime transition occurs via a democratic revolution from below. Crucially, Collier (1999: 12) emphasizes that these two pathways need not be contradictory (or exhaustive): the elite-driven pathway appears more common in the Latin American context during the second wave of democratization, whereas the working-class-driven pathway appears more common in Europe during the first wave of democratization.
What is crucial is that Millian case selection is premised on there being a single cause underlying the outcome of interest. As a result, Millian methods risk dismissing a set of events as causally irrelevant ex ante in one case simply because that same set of events fails to produce the outcome in another case (see Figure 7.6). Unlike with ordered and paced processes, there is no clear way to leverage deductive theorizing to reconfigure Millian methods for case selection so as to accommodate equifinality. However, I argue that the presence of equifinal pathways can be fruitfully probed if we embrace a more inductive approach to comparative case selection, as the next section outlines.
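The set-theoretic formulation of equifinality lends itself to a small Boolean sketch. The example below is purely illustrative – the four factors and two pathways are invented – but it shows concretely why the method of agreement fails when two positive cases reach the same outcome through different sufficient combinations, and what it means for each factor to be an INUS condition.

```python
# Illustrative Boolean sketch of equifinality: the outcome is produced
# by either of two sufficient combinations of invented factors.
def outcome(a, b, c, d):
    # Two equifinal pathways: (a AND b) OR (c AND d)
    return (a and b) or (c and d)

# Two positive cases reach the outcome via different pathways.
case1 = dict(a=True,  b=True,  c=False, d=False)
case2 = dict(a=False, b=False, c=True,  d=True)
assert outcome(**case1) and outcome(**case2)

# The method of agreement retains only factors present in ALL positive
# cases -- here, none at all, wrongly eliminating every real cause.
shared = {factor for factor in case1 if case1[factor] and case2[factor]}
assert shared == set()

# Yet each factor is an INUS condition: necessary within its own
# sufficient (but unnecessary) combination. Removing 'a' alone breaks
# pathway 1, even though 'a' is not necessary for the outcome overall.
assert outcome(a=False, b=True, c=False, d=False) is False
assert outcome(a=False, b=False, c=True, d=True) is True
```

Cross-case elimination treats the absence of a factor in one positive case as disqualifying; the sketch shows that under equifinality this logic deletes every pathway-specific cause.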
7.4 A New Approach: The Method of Inductive Case Selection
If a researcher wishes to guard against ignoring consequential temporal dynamics but lacks the a priori knowledge necessary to develop a processual theory and tailor their case selection strategy, is there an alternative path forward? I suggest there is: researchers can wield most-similar or most-different cases designs to (1) probe causal generalizability, (2) reveal scope conditions, and (3) explore the presence of equifinality.Footnote 5 To walk through this more inductive case selection approach, I engage case studies from development practice to illustrate how researchers and practitioners alike could implement and benefit from the method.
7.4.1 Tempering the Deductive Use of Millian Methods
To begin, one way to guard against a Millian case selection design overlooking an ordered, paced, or equifinal causal process (in the absence of deductive theorizing) is to be wary of leveraging the methods of agreement and difference to eliminate potential explanatory factors (Falleti and Mahoney 2015: 225–226). That is, the decision to discard an explanatory variable or historical event as causally unnecessary (via the method of agreement) or insufficient (via the method of difference) may be remanded to the process-tracing stage, rather than being made ex ante at the case selection stage.
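The recommendation to defer elimination can be made concrete with a minimal sketch: instead of deleting a factor that fails the method of agreement, flag it as provisionally discounted so that process tracing can later restore it. The factor names and cases below are invented for illustration and carry no substantive claim.

```python
# Minimal sketch: a method-of-agreement pass that flags rather than
# eliminates. Factor names and case codings are hypothetical.
def method_of_agreement(positive_cases):
    """Return (retained, provisionally_discounted) factor sets.

    A factor is retained only if present in every positive case;
    everything else is flagged for re-examination, not discarded.
    """
    factors = set(positive_cases[0])
    shared = {f for f in factors if all(case[f] for case in positive_cases)}
    return shared, factors - shared

cases = [
    {"land_reform": True, "war_defeat": True,  "elite_split": True},
    {"land_reform": True, "war_defeat": False, "elite_split": True},
]
retained, provisional = method_of_agreement(cases)

assert retained == {"land_reform", "elite_split"}
# 'war_defeat' is flagged, not eliminated: process tracing may yet show
# it matters within an ordered, paced, or equifinal pathway.
assert provisional == {"war_defeat"}
```

The design choice is simply to make the output of cross-case comparison an agenda for within-case analysis rather than a verdict on causal relevance.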
Notice how this recommendation is particularly intuitive in light of the advances in process-tracing methods. Before this burgeoning literature existed, Millian methods were called upon to accomplish two things at once: (1) provide a justification for selecting two or more cases for social inquiry, and (2) yield causal leverage via comparison and the elimination of potential explanatory factors as unnecessary or insufficient. But process-tracing methodologists have showcased how the analysis of temporal variation – disciplined via counterfactual analysis, congruence testing, and process-tracing tests – renders within-case causal inference possible even in the absence of an empirical comparative case (George and Bennett 2005; Gerring 2007; Collier 2011; Mahoney 2012; Beach and Pedersen 2013; Bennett and Checkel 2015; Levy 2015). That is, the ability to make causal inferences need not be primarily determined at the case selection stage.
The foregoing implies that if a researcher does not take temporal dynamics into account when developing their theory, the use of Millian methods should do no more than provisionally discount the explanatory purchase of a given factor. The researcher should then bear in mind that, as the causal process is reconstructed from a given outcome, the provisionally discounted factor may nonetheless prove causally relevant – particularly if the underlying process is ordered or paced, or if equifinal pathways are possible.
Despite these limitations, Millian methods might fruitfully serve additional functions from the standpoint of case selection, particularly if researchers shift (1) when and (2) why they make use of them. First, Millian methods may be as useful – if not more so – after process tracing of a particular case is completed as they are in setting the stage for within-case analysis. Such a chronological reversal – process tracing followed by Millian case selection, instead of Millian case selection followed by process tracing – inherently embraces a more inductive, theory-building approach to case study research (Falleti and Mahoney 2015: 229–231) which, I suspect, is far more commonly used in practice than is acknowledged. I refer to this approach as the method of inductive case selection, wherein “theory-building process tracing” (Beach and Pedersen 2013: 16–18) of a single case is subsequently followed by the use of a most-similar or most-different cases design.
7.4.2 Getting Started: Selecting the Initial Case
The method of inductive case selection begins by assuming that the researcher has justifiable reasons for picking a particular case for process tracing and is subsequently looking to contextualize the findings or build a theory outwards. Hence, the first step involves picking an initial case. Qualitative methodologists have already supplied a number of plausible logics for selecting a single case, and I describe three nonexhaustive possibilities here: (1) theoretical or historical importance; (2) policy relevance and salience; and (3) empirically puzzling nature.
First, an initial case may be selected due to its theoretical or historical importance. Eckstein (1975), for example, defines an idiographic case study as one in which the specific empirical events/outcome serve as a central referent for a scholarly literature. As an illustration, Gerring and Cojocaru (2015: 11) point to North and Weingast’s (1989) influential study of how the Glorious Revolution in seventeenth-century Britain favorably shifted the constitutional balance of power, enabling the government to make credible commitments to protecting property rights (and paving the way for the financial revolution of the early eighteenth century). Given that so much of the scholarly debate amongst economic historians centers on the institutional foundations of economic growth, North and Weingast’s case study was “chosen (it would appear) because of its central importance in the [historical political economy] literature on the topic, and because it is … a prominent and much-studied case” (Gerring and Cojocaru 2015: 11). In other words, North and Weingast’s (1989) study is idiographic in that it “aim[s] to explain and/or interpret a single historical episode,” but it remains “theory-guided” in that it “focuses attention on some theoretically specified aspects of reality and neglects others” (Levy 2008: 4).
While the causes of the Glorious Revolution are a much-debated topic amongst economic historians, they have less relevance to researchers and practitioners focused on assessing the effects of contemporary public policy interventions. Hence, a second logic for picking a first case for process tracing is its policy relevance and salience. George and Bennett (2005: 263–286) define a policy-relevant case study as one where the outcome is of interest to policy-makers and its causes are at least partially amenable to policy manipulation. For example, one recent World Bank case study (El-Saharty and Nagaraj 2015) analyzes how HIV/AIDS prevalence amongst vulnerable subpopulations – particularly female sex workers – can be reduced via targeted service delivery. To study this outcome, two states in India – Andhra Pradesh and Karnataka – were selected for process tracing. There are three reasons why this constitutes an appropriate policy-relevant case selection choice. First, the outcome of interest – a decline in HIV/AIDS prevalence amongst female sex workers – was present in both Indian states. Second, because India accounts for almost 17.5 percent of the world population and has a large population of female sex workers, this outcome was salient to the government (El-Saharty and Nagaraj 2015: 3). Third, the Indian government had created a four-phase National AIDS Control Program (NACP) spanning from 1986 through 2017, meaning that at least one set of possible explanatory factors for the decline in HIV/AIDS prevalence comprised policy interventions that could be manipulated.Footnote 6
A third logic for picking an initial case for process tracing is its puzzling empirical nature. One obvious instantiation is when an exogenous shock or otherwise significant event/policy intervention yields a different outcome from the one scholars and practitioners expected.Footnote 7 For example, in 2004 the federal government of Nigeria partnered with the World Bank to improve the share of Nigeria's urban population with access to piped drinking water. This partnership – the National Urban Water Sector Reform Project (NUWSRP1) – aimed to "increase access to piped water supply in selected urban areas by improving the reliability and financial viability of selected urban water utilities" and by shifting resources away from "infrastructure rehabilitation" that had failed in the past (Hima and Santibanez 2015: 2). Despite $200 million worth of investments, ultimately the NUWSRP1 "did not perform as strongly on the institutional reforms needed to ensure sustainability" (Hima and Santibanez 2015). Given this puzzling outcome, the World Bank conducted an intensive case study to ask why the program did "not fully meet its essential objective of achieving a sustainable water delivery service" (Hima and Santibanez 2015).Footnote 8
The common thread of these three logics for selecting an initial case is that the case itself is theoretically or substantively important and that its empirical dynamics – underlying either the outcome itself or its relationship to some explanatory events – are not well understood. That being said, the method of inductive case selection merely presumes that there is some theoretical, policy-related, empirical, or normative justification to pick the initial case.
7.4.3 Probing Generalizability Via a Most-Similar Cases Design
It is after picking an initial case that the method of inductive case selection contributes novel guidelines for case study researchers by reconfiguring how Millian methods are used. Namely, how should one (or more) additional cases be selected for comparison, and why? This question presumes that the researcher wishes to move beyond an idiographic, single-case study for the purposes of generating inferences that can travel. Yet in this effort, we should take seriously process-tracing scholars’ argument that causal mechanisms are often context-dependent. As a result, the selection of one or more comparative cases is not meant to uncover universally generalizable abstractions; rather, it is meant to contextualize the initial case within a set or family of cases that are spatiotemporally bounded.
That being said, the first logical step is to understand whether the causal inferences yielded by the process-traced case can indeed travel to other contexts (Goertz 2017: 239). This constitutes the first reconfiguration of Millian methods: the use of comparative case studies to assess generalizability. Specifically, after within-case process tracing reveals a factor or sequence of factors as causally important to an outcome of interest, the logic is to select a case that is as contextually analogous as possible such that there is a higher probability that the causal process will operate similarly in the second case. This approach exploits the context-dependence of causal mechanisms to the researcher's advantage: Similarity of context increases the probability that a causal mechanism will operate similarly across both cases. By "context," it is useful to follow Falleti and Lynch (2009: 14) and to be
concerned with a variety of contextual layers: those that are quite proximate to the input (e.g., in a study of the emergence of radical right-wing parties, one such layer might be the electoral system); exogenous shocks quite distant from the input that might nevertheless affect the functioning of the mechanism and, hence, the outcome (e.g., a rise in the price of oil that slows the economy and makes voters more sensitive to higher taxes); and the middle-range context that is neither completely exogenous nor tightly coupled to the input and so may include other relevant institutions and structures (the tax system, social solidarity) as well as more atmospheric conditions, such as rates of economic growth, flows of immigrants, trends in partisan identification, and the like.
For this approach to yield valuable insights, the researcher focuses on 'controlling' for as many of these contextual explanatory factors (crudely put, for as many independent variables) as possible. In other words, the researcher selects a most-similar case: if the causal chain similarly operates in the second case, this would support the conclusion that the causal process is likely at work across the constellation of cases bearing 'family resemblances' to the process-traced case (Soifer 2020). Figure 7.7 displays the logic of this design:
As in Figure 7.7, suppose that process tracing of Case 1 reveals that some sequence of events (in this example, event 4 followed by event 5) caused the outcome of interest. The researcher would then select a most-similar case (a case with similar values/occurrences of other independent variables/events [here, IV1–IV3] that might also influence the outcome). The researcher would then scout whether the sequence in Case 1 (event 4 followed by event 5) also occurs in the comparative case. If it does, the expectation for a minimally generalizable theory is that it would produce a similar outcome in Case 2 as in Case 1. Correlatively, if the sequence does not occur in Case 2, the expectation is that Case 2 would not experience the same outcome as Case 1. These findings would provide evidence that the explanatory sequence (event 4 followed by event 5) has causal power that is generalizable across a set of cases bearing family resemblances.
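For readers who find a procedural rendering helpful, the matching logic behind Figure 7.7 can be loosely sketched in code: treat each case as a bundle of background conditions and select the candidate that agrees with the process-traced case on the most of them. This is only an illustrative sketch; all case data, variable names, and the `most_similar` function are hypothetical placeholders, not drawn from any study discussed in this chapter.

```python
# Hypothetical sketch of most-similar case selection: count agreements on
# background conditions (IV1-IV3) and pick the candidate with the most matches.
def most_similar(initial, candidates, background_vars):
    """Return the candidate case matching `initial` on the most background variables."""
    def similarity(case):
        return sum(case[v] == initial[v] for v in background_vars)
    return max(candidates, key=similarity)

# Toy data: Case 1 is the process-traced case; the pool holds candidates.
case_1 = {"name": "Case 1", "IV1": "high", "IV2": "yes", "IV3": "federal"}
pool = [
    {"name": "Case 2", "IV1": "high", "IV2": "yes", "IV3": "federal"},
    {"name": "Case 3", "IV1": "low",  "IV2": "no",  "IV3": "unitary"},
]
match = most_similar(case_1, pool, ["IV1", "IV2", "IV3"])
print(match["name"])  # -> Case 2
```

Note that the sketch only mechanizes the selection step; whether the causal sequence (event 4 followed by event 5) then operates in the chosen case must still be established by within-case process tracing.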
For example, suppose a researcher studying democratization in Country A finds evidence congruent with the elite-centric theory of democratization of O'Donnell and Schmitter (1986) described previously. To assess causal generalizability, the researcher would subsequently select a case – Country B – that is similar in the background conditions that the literature has shown to be conducive to democratization, such as level of GDP per capita (Przeworski and Limongi 1997; Boix and Stokes 2003) or belonging to the same "wave" of democratization via spatial and temporal proximity (Collier 1991; Huntington 1993). Notice that these background conditions in Country B have to be at least partially exogenous to the causal process whose generalizability is being probed – that is, they cannot constitute the events that directly comprise the causal chain revealed in Country A. One way to think about them is as factors that in Country A appear to have been necessary, but less proximate and important, conditions for the outcome. Here, importance is determined by the "extent that they are [logically/counterfactually] present only when the outcome is present" (Mahoney et al. 2009: 119), whereas proximity is determined by the degree to which the condition is "tightly coupled" with the chain of events directly producing the outcome (Falleti and Mahoney 2015: 233).
An example related to the impact of service delivery in developmental contexts can be drawn from the World Bank's case study of HIV/AIDS interventions in India. Recall that this case study actually spans across two states: Andhra Pradesh and Karnataka. In a traditional comparative case study setup, the selection of both cases would seem to yield limited insights. After all, they are contextually similar: "Andhra Pradesh and Karnataka … represent the epicenter of the HIV/AIDS epidemic in India. In addition, they were early adopters of the targeted interventions"; and they also experience a similar outcome: "HIV/AIDS prevalence among female sex workers declined from 20 percent to 7 percent in Andhra Pradesh and from 15 percent to 5 percent in Karnataka between 2003 and 2011" (El-Saharty and Nagaraj 2015: 7, 3). In truth, this comparative case study design makes substantial sense: had the researchers focused on the impact of the Indian government's NACP program only in Andhra Pradesh or only in Karnataka, one might have argued that there was something unique about either state that rendered it impossible to generalize the causal inferences. By instead demonstrating that favorable public health outcomes can be traced to the NACP program in both states, the researchers can support the argument that the intervention would likely prove successful in other contexts to the extent that they are similar to Andhra Pradesh and Karnataka.
One risk of the foregoing approach is highlighted by Sewell (2005: 95–96): contextual similarity may suggest cross-case interactions that hamper the ability to treat the second, most-similar case as if it were independent of the process-traced case. For example, an extensive body of research has underscored how protests often diffuse across proximate spatiotemporal contexts through mimicry and the modularity of repertoires of contention (Tilly 1995; Tarrow 1998). And, returning to the World Bank case study of HIV/AIDS interventions in Andhra Pradesh and Karnataka, one concern is that because these states share a common border, cross-state learning or other interactions might limit the value-added of a comparative design over a single case study, since the second case may not constitute truly new data. The researcher should be highly sensitive to this possibility when selecting and subsequently process tracing the most-similar case: the greater the likelihood of cross-case interactions, the lesser the likelihood that it is a case-specific causal process – as opposed to a cross-case diffusion mechanism – that is doing most of the explanatory work.
Conversely, if the causal chain is found to operate differently in the second, most-similar case, then the researcher can make an argument for rejecting the generalizability of the causal explanation with some confidence. The conclusion would be that the causal process is sui generis and requires the "localization" of the theoretical explanation for the outcome of interest (Tarrow 2010: 251–252). In short, this would suggest that the process-traced case is an exceptional or deviant case, given a lack of causal generalizability even to cases bearing strong family resemblances. Here, we are using the 'strong' notion of 'deviant': the inability of a causal process to generalize to similar contexts substantially decreases the likelihood that "other cases" could be explained with reference to (or even in opposition to) the process-traced case.
There is, of course, the risk that by getting mired in the weeds of the first case, the researcher is unable to recognize how the overall chronology of events and causal logics in the most-similar case strongly resembles the process-traced case. That is, a null finding of generalizability in a most-similar context calls on the researcher to probe whether they have descended too far down the "ladder of generality," requiring more abstract conceptual categories to compare effectively (Sartori 1970; Collier and Levitsky 1997).
7.4.4 Probing Scope Conditions and Equifinality Via a Most-Different Cases Design
A researcher who has process-traced a given case and revealed a factor or sequence of factors as causally relevant may also benefit from leveraging a most-different cases approach. This case selection technique yields complementary insights to the most-similar cases design described in the previous section, but its focus is altogether different: instead of uncovering the degree to which an identified causal process travels, the objective is to try to understand where and why it fails to travel and whether alternative pathways to the same outcome may be possible.
More precisely, by selecting a case that differs substantially from the process-traced case in background characteristics, the researcher maximizes contextual heterogeneity and the likelihood that the causal process will not generalize to the second case (Soifer 2020). Put differently, the scholar would be selecting a least-likely case for generalizability, because the context-dependence of causal mechanisms renders it unlikely that the same sequence of events will generate the same outcome in the second case. This would offer a first cut at establishing "scope conditions" upon the generalizability of the theory (Tarrow 2010: 251) by isolating which contextual factors prevented the process from producing the outcome in the most-different case.
Figure 7.8 provides a visual illustration of what this design could look like. Suppose, once more, that process tracing in Case 1 has revealed that some event 4 followed by event 5 generated the outcome of interest. To maximize the probability that we will be able to place scope conditions on this finding, we would select a comparative case that is most different from the process-traced case (a case with different values/occurrences of other independent variables/events [denoted as IV1–IV3 in Figure 7.8] that might also influence the outcome) but which also experienced the sequence of event 4 followed by event 5. Given the contextual differences between these two cases, the likelihood that the same sequence will produce the same outcome in both is low, which then opens up opportunities for the researcher to probe the logic of scope conditions. In this endeavor, temporality can serve as a useful guide: a means for restricting the set of potential contextual factors that prevented the causal process from reproducing the outcome in Case 2 is to identify at what chronological point the linkages between events 4 and 5 on the one hand and the outcome of interest on the other hand branched off from the way they unfolded in Case 1. The researcher can then scout which contextual factors exerted the greatest influence at that temporal location and identify them as central to the scope conditions to be placed upon the findings.
To provide an example of how this logic of inquiry can work, consider a recent case study focused on understanding the effectiveness of Mexico's conditional cash transfer program – Oportunidades, the first program of its kind – in providing monetary support to the female heads of Indigenous households (Alva Estrabridis and Ortega Nieto 2015). The program suffered from the fact that Indigenous beneficiaries dropped out at higher rates than their non-Indigenous counterparts. In 2009 the World Bank spearheaded an Indigenous Peoples Plan (IPP) to bolster service delivery of cash transfers to Indigenous populations, which crucially included "catering to indigenous peoples in their native languages and disseminating information in their languages" (Alva Estrabridis and Ortega Nieto 2015: 2). A subsequent impact evaluation found that "[w]hen program messages were offered in beneficiaries' mother tongues, they were more convincing, and beneficiaries tended to participate and express themselves more actively" (Alva Estrabridis and Ortega Nieto 2015; Mir et al. 2011).
Researchers might well be interested in the portability of the foregoing finding, in which case the previously described most-similar cases design is appropriate – for example, a comparison with the Familias en Accion program in Colombia may be undertaken (Attanasio et al. 2005). But they might also be interested in the limits of the policy intervention – in understanding where and why it is unlikely to yield similar outcomes. To assess the scope conditions upon the "bilingualism" effect of cash transfer programs, a most-different cases design is appropriate. Thankfully, conditional cash transfer programs are increasingly common even in historical, cultural, and linguistic contexts markedly different from Mexico, most prominently in sub-Saharan Africa (Lagarde et al. 2007; Garcia and Moore 2012). Selecting a comparative case from sub-Saharan Africa should prove effective for probing scope conditions: the more divergent the contextual factors, the less likely it is that the policy intervention will produce the same outcome in both contexts.
On the flip side, in the unlikely event that part or all of the causal process is nonetheless reproduced in the most-different case, the researcher would obtain a strong signal that they have identified one of those rare causal explanations of general scope. In coming to this conclusion, however, the researcher should be wary of "conceptual stretching" (Sartori 1970: 1034), such that there is confidence that the similarity in the causal chain across the most-different cases lies at the empirical level and is not an artificial by-product of imprecise conceptual categories (Bennett and Checkel 2015: 10–11). Here process tracing, by pushing researchers to not only specify a sequence of "tightly-coupled" events (Falleti and Mahoney 2015: 233), but also to collect observable implications about the causal mechanisms concatenating these events, can guard against conceptual stretching. By opening the "black box" of causation through detailed within-case analysis, process tracing limits the researcher's ability to posit "pseudo-equivalences" across contexts (Sartori 1970: 1035).
Selecting a most-different case vis-à-vis the process-traced case is also an excellent strategy for probing equifinality – for maximizing the likelihood that the scholar will be able to uncover multiple causal pathways to the same outcome. To do so, it is not sufficient to merely ensure divergence in background conditions; it is equally necessary to follow Mill's method of agreement by ensuring that the outcome in the process-traced case is also present in the second, most-different case. By ensuring minimal variation in outcome, the scholar guarantees that process tracing the second case will lead to the desired destination; by ensuring maximal variation in background conditions, the scholar substantially increases the likelihood that process tracing will reveal a slightly or significantly different causal pathway to said destination. Should an alternative route to the outcome be found, then its generalizability could be assessed using the most-similar cases approach described previously.
Figure 7.9 visualizes what this case selection design might look like. Here, as in previous examples, suppose process tracing in Case 1 provides evidence that event 4 followed by event 5 produced the outcome of interest. The researcher then selects a case with the same outcome, but with different values/occurrences of some independent variables/events (in this case, IV1–IV3) that may influence the outcome. Working backwards from the outcome to reconstruct the causal chain that produced it, the researcher then probes whether (i) the sequence (event 4 followed by event 5) also occurred in Case 2, and (ii) whether the outcome of interest can be retraced to said sequence. Given the contextual dissimilarities between these most-different cases, such a finding is rather unlikely, which would subsequently enable the researcher to probe whether some other factor (perhaps IV2/event 2 in the example of Figure 7.9) produced the outcome in the comparative case instead – clear evidence of equifinality.
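By symmetry with the most-similar design, the selection step in Figure 7.9 can also be rendered procedurally: hold the outcome fixed (Mill's method of agreement) while maximizing divergence on background conditions. The sketch below is purely illustrative; the case data, variable names, and the `most_different_same_outcome` function are hypothetical and not drawn from any study discussed here.

```python
# Hypothetical sketch of most-different case selection for probing equifinality:
# keep only candidates sharing the outcome, then pick the one that diverges on
# the most background conditions (IV1-IV3).
def most_different_same_outcome(initial, candidates, background_vars):
    """Among candidates with the same outcome, return the one differing on the most background variables."""
    same_outcome = [c for c in candidates if c["outcome"] == initial["outcome"]]
    def dissimilarity(case):
        return sum(case[v] != initial[v] for v in background_vars)
    return max(same_outcome, key=dissimilarity)

# Toy data: Case 2 shares Case 1's context but not its outcome, so it is
# filtered out; Case 3 shares the outcome while diverging maximally.
case_1 = {"name": "Case 1", "IV1": "high", "IV2": "yes", "IV3": "federal", "outcome": "success"}
pool = [
    {"name": "Case 2", "IV1": "high", "IV2": "yes", "IV3": "federal", "outcome": "failure"},
    {"name": "Case 3", "IV1": "low",  "IV2": "no",  "IV3": "unitary", "outcome": "success"},
]
match = most_different_same_outcome(case_1, pool, ["IV1", "IV2", "IV3"])
print(match["name"])  # -> Case 3
```

As with the most-similar sketch, this only mechanizes the selection criterion; establishing whether a different causal pathway produced the shared outcome in the chosen case remains a matter of within-case process tracing.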
To return to the concrete example of Mexico's conditional cash transfer program's successful outreach to marginalized populations via bilingual service provision, an alternative route to the same outcome might be unearthed if a cash transfer program without bilingual outreach implemented in a country characterized by different linguistic, gender, and financial decision-making norms proves similarly successful in targeting marginalized populations. Several factors – including recruitment procedures, the size of the cash transfers, the requirements for participation, and the supply of other benefits (Lagarde et al. 2007: 1902) – could interact with the different setting to produce similar intervention outcomes, regardless of whether multilingual services are provided. Such a finding would suggest that these policy interventions can be designed in multiple ways and still prove effective.
To conclude, the method of inductive case selection complements within-case analysis by supplying a coherent logic for probing generalizability, scope conditions, and equifinality. To summarize, Figure 7.10 provides a roadmap of this approach to comparative case selection.
In short, if the researcher has the requisite time and resources, a multistage use of Millian methods to conduct four comparative case studies could prove very fertile. The researcher would begin by selecting a second, most-similar case to assess causal generalizability to a family of cases similar to the process-traced case; subsequently, a third, most-different case would be selected to surface possible scope conditions blocking the portability of the theory to divergent contexts; and a fourth, most-different case experiencing the same outcome would be picked to probe equifinal pathways. This sequential, four-case comparison would substantially improve the researcher’s ability to map the portability and contours of both their empirical analysis and their theoretical claims.Footnote 9
7.5 Conclusion
The method of inductive case selection converts process tracing meant to simply "craft a minimally sufficient explanation of a particular outcome" into a methodology used to build and refine a causal theory – a form of "theory-building process-tracing" (Beach and Pedersen 2013: 16–18). Millian methods are called upon to probe the portability of a particular causal process or causal mechanism and to specify the logics of its relative context-dependence. In so doing, they enable theory-building without presuming that the case study researcher holds the a priori knowledge necessary to account for complex temporal dynamics at the deductive theorizing stage. Both of these approaches – deductive, processualist theorizing on the one hand, and the method of inductive case selection on the other hand – provide some insurance against Millian methods leading the researcher to ignore the ordered, paced, or equifinal structure that may underlie the pathway(s) to the outcome of interest. But, I would argue, the more inductive approach is uniquely suited for research that is not only process-sensitive, but also open to novel insights supplied by the empirical world that may not be captured by existing theories.
Furthermore, case study research often does (and should!) proceed with the scholar outlining why an outcome is of interest, and then seeking ways to not only make inferences about what produced said outcome (via process tracing) but situating it within a broader empirical and theoretical landscape (via the method of inductive case selection). This approach pushes scholars to answer that pesky yet fundamental question – why should we care or be interested in this case/outcome? – before disciplining their drive for generalizable causal inferences. After all, the deductive use of Millian methods tells us nothing about why we should care about the cases selected, yet arguably this is an essential component of any case selection justification. By deploying a most-similar or most-different cases design after an initial case has been justifiably selected due to its theoretical or historical importance, policy relevance, or puzzling empirical nature, the researcher is nudged toward undertaking case study research yielding causal theories that are not only comparatively engaged, but also substantively interesting.
The method of inductive case selection is most useful when the foregoing approach constitutes the esprit of the case study researcher. Undoubtedly, deductively oriented case study research (see Lieberman 2005, 2015) and traditional uses of Millian methods will continue to contribute to social scientific understanding. Nevertheless, the perils of ignoring important sequential causal dynamics – particularly in the absence of good, processualist theories – should caution researchers to proceed with the greatest of care. In particular, researchers should be willing to shift both theory building and research design toward their more inductive variants should process tracing reveal temporal sequences that eschew the analytic possibilities of the traditional comparative method.
8.1 Introduction
A transparency revolution is sweeping the social sciences.Footnote 1 The failure to replicate existing findings, a suspicious absence of disconfirming results, the proliferation of uninformative or inaccurate citations, and broader concerns about a media environment that privileges “fake news” and sensationalism over rigorously grounded facts have all raised concerns about the legitimacy and credibility of academic scholarship. Journals, professional associations, funders, politicians, regulators, and colleagues now press researchers to open their data, analysis, and methods to greater scrutiny.Footnote 2 Qualitative researchers who conduct case studies, collect archival or interview data, and do ethnography, participant observation, or other types of nonquantitative studies are no exception. They have been developing specific standards and techniques for enhancing transparency, including some that exploit digital technology. Reputable research now requires more than solid empirical evidence, state-of-the-art theory, and sophisticated methods: It must be transparent.Footnote 3
Yet the transparency of qualitative analysis by practitioners in governmental, intergovernmental, and civil society institutions lags behind. In recent years, practitioners have pushed policy-makers to improve governmental transparency, yet, ironically, the data, analysis, methods, and other elements of their own research lack a similar openness.Footnote 4 The data and analysis in policy case studies and histories, after-action reports, and interview or focus-group analyses are often opaque. This is troubling, since the justifications for enhancing transparency in academic research apply equally, or even more so, to research by practitioners in governments, think tanks, and international organizations. To them, moreover, we can add numerous and pressing justifications for greater transparency specific to the policy world. Safeguarding the clarity, accessibility, and integrity of policy-relevant research helps ensure that decision-makers avoid basing costly policy interventions on flawed analysis or incomplete information. Transparency helps guard against potential conflicts of interest that might arise in research or policy implementation. Most importantly, it opens up public assessment and evaluation to proper official and public deliberation – thus according them greater legitimacy.
This chapter offers a brief background on the basic logic and practice of transparency in qualitative social science and reviews the cost-effectiveness of the available practical options to enhance it – both in the academy and in the policy world. Section 8.2 defines three dimensions of research transparency and explores some of the distinctiveness of qualitative research, which suggests various reasons why the applied transparency standards in qualitative research may differ from those employed in quantitative research. Section 8.3 examines three commonly discussed strategies to enhance transparency. It argues that in most cases it is infeasible and inappropriate – and, at the very least, insufficient – for qualitative policy analysts to employ conventional footnotes, hyperlinks to web-based sources, or, as some suggest by analogy to statistical research, centralized “datasets” to store all of a project’s qualitative source material. Section 8.4 introduces a new strategy to enhance qualitative research transparency that is emerging as a “best practice.” This is “Active Citation” (AC) or “Annotation for Transparency Initiative” (ATI): a digitally enabled open-source discursive annotation system that is flexible, simple, and compatible with all existing online formats.Footnote 5 For practitioners, as for scholars, AC/ATI is likely to be the most practical and broadly applicable means to enhance the transparency of qualitative research and reporting.
8.2 Research Transparency in the Social Sciences
Transparency is a norm that mandates that “researchers have an ethical obligation to facilitate the evaluation of their evidence-based knowledge claims.”Footnote 6 This is a foundational principle of all scientific work. Scholars embrace it across the full range of epistemological commitments, theoretical views, and substantive interests. It enjoys this status because nearly all researchers view scholarship as a collective enterprise: a conversation among scholars and often extending to those outside academia.Footnote 7 Researchers who conduct transparent work enhance the ability of others to engage in the conversation through productive evaluation, application, critique, debate, and extension of existing work. Without transparent data, theory, and methods, the conversation would be impoverished. A research community in which scholars can read, understand, verify, and debate published work when they choose should foster legitimate confidence in results. A research community in which analysts accept findings because of the prominence of the author or the apparent authority of big data, copious citations, clever arguments, or sophisticated “gold standard” methods should not inspire trust.
Research transparency has three broad dimensions.Footnote 8 The first, data transparency, stipulates that researchers should publicize the data and evidence on which their research rests. This helps readers apprehend the richness and diversity of the real-world political activity scholars study and to assess for themselves to what extent (and how reliably) the evidence of that activity confirms particular descriptive, interpretive, or causal interpretations and theories linked to it. The second dimension, analytic transparency, stipulates that researchers should publicize how they interpret and analyze evidence in order to generate descriptive and causal inferences. In social research, evidence does not speak for itself but is analyzed to infer unobservable characteristics such as preferences, identities, beliefs, rationality, power, strategic intent, and causality. For readers to understand and engage with research, they must be able to assess how the author purports to conceptualize and measure behavior, draw descriptive and causal inferences from those measures, determine that the results are conclusive vis-à-vis alternatives, and specify broader implications. The third dimension, production transparency, stipulates that social scientists should publicize the broader set of design choices that underlie the research. Decisions on how to select data, measure variables, test propositions, and weight overall findings – before, during, and after data analysis – often drive research results by defining the particular combination of data, theories, and methods they use for empirical analysis. Researchers are obliged, to the extent possible, to afford readers all three types of research transparency.
These three elements of research transparency underlie all scientific research communities, including those in fields such as history, law, ethnography, policy assessment, and discourse analysis.Footnote 9 Yet the form transparency takes varies by research method. The appropriate rules and standards of applied transparency in qualitative research, for example, differ from those governing quantitative research. An ideal-typical qualitative case study of public policy has three distinctive characteristics. It focuses intensively on only one or a few cases. It employs primarily textual evidence, such as documents, transcripts, descriptions, and notes (though visual and numerical evidence may sometimes also be used). And, finally, it is generally reported and written up as a temporal, causal, or descriptive narrative, with individual pieces of evidence (and interpretation) inserted at specific points in the story. Different types of data and inference should generate subtly different transparency norms.
Qualitative research methods – intensive, text-based narrative studies of individual cases – are indispensable. They play a critical role in a healthy and balanced environment of research and policy evaluation – not just in the academy, but in the policy world as well. In both contexts, qualitative research enjoys distinct comparative advantages. For policy-makers, one of the most important is that qualitative analysis permits analysts to draw inferences from and about single cases (see Cartwright, Chapter 2 this volume). Detailed knowledge and insights about the characteristics of a single case, rather than average outcomes, are often what policy-makers and analysts most need. This may be because some types of phenomena are intrinsically rare, even unique. If only a limited number of cases exist, a case study may be the best way to inform policy.Footnote 10 The demand for precise knowledge about a single case may arise also because policy-makers are focused on designing a particular intervention at a specific time and geographical location. Even if solid quantitative generalizations exist, policy-makers often want to know exactly what mix of factors is at work in that case – that is, whether the case before them is a typical case or an outlier. If, for example, after-action reports show that a promising program design recently failed when implemented in Northern India, does that mean it is less likely to succeed if launched in Bolivia? Answering this type of everyday policy problem in real time often requires detailed knowledge of important contextual nuances of the local culture, politics, and economics. This, in turn, implies that, in order to be useful, the original after-action report may need to include detailed evidence of incentives, perceptions, and inclinations as revealed by actions, documents, and statements.
For similar reasons, case studies often enjoy a comparative advantage in situations where analysts possess relatively little prior knowledge and seek to observe and theorize previously unknown causal mechanisms, social contexts, and outcomes in detail, thus contributing to the development of new and more accurate explanations and theories.Footnote 11
8.3 Practical Options for Enhancing Qualitative Transparency
By what means can we best render qualitative research more transparent? Social scientists generally possess some inkling of the research transparency norms governing statistical and experimental research. When we turn to qualitative research, however, many analysts remain unaware that explicit standards for transparency of data, analysis, or methods exist, let alone what they are. In recent years, qualitative social scientists have moved to establish stronger norms of transparency. Building on the American Political Science Association’s initiative on Data Access and Research Transparency (APSA/DA-RT) in the US field of political science, a team of scholars has developed specific applied transparency guidelines for qualitative research.Footnote 12 A series of conferences, workshops, journal articles, and foundation projects are further elaborating how best to implement qualitative transparency in practice.Footnote 13 The National Science Foundation (NSF) has funded a Qualitative Data Repository (QDR) based at Syracuse University, as well as various projects demonstrating new transparency standards and instruments that use new software and internet technologies.Footnote 14
Scholars have thereby generated shared knowledge and experience about this issue. They have learned that qualitative research poses distinctive practical problems due to factors such as human subject protection, intellectual property law, and logistical complexity, and distinctive epistemological problems, which arise from its unique narrative form. These must be kept in mind when assessing alternative proposals to enhance transparency.
Four major options exist: conventional footnotes, hyperlinks to online sources, archiving textual data, and digitally enabled discursive notes. A close examination of these options reveals, first, that the practical and epistemological distinctiveness of qualitative research implies a different strategy than is employed in quantitative research, and, second, that the optimal strategy is that of creating digital entries containing annotated source material, often called Active Citation or the Annotation for Transparency Initiative. We consider each of these four options in turn.
8.3.1 Conventional Footnotes
The simplest and most widespread instruments of transparency used today in social science are citations found in footnotes, endnotes, and the text itself. Yet the current state of citation practice demonstrates the flaws in this approach. Basic citations in published work are often incomplete or incorrect, particularly if they appear as brief in-text “scientific citations” designed for a world in which most (quantitative) analysts use footnotes to acknowledge other researchers rather than cite evidence. Such citations do not provide either data access or analytic transparency. Scientific citations are often incomplete, leaving out page numbers and failing to specify the concrete textual reference within an article or on a page that the author considers decisive. Even if a citation is precise, most readers will be deterred by the need to locate each source at some third location, perhaps a library or an archive – and, in many cases, as with interviews and records of focus groups, the source material may not be available at all.Footnote 15 Even more troubling, conventional citations offer no analytical transparency whatsoever: the reader knows what is cited, but generally much less about why.
In theory, an attractive solution would be to return to the traditional method of linking evidence and explanation in most scholarly fields: long discursive footnotes containing extended quotations with interpretive annotations. Discursive footnotes of this kind remain widespread in legal academia, history, some humanities, and a few other academic disciplines that still prize qualitative transparency. In legal academia, for example, where fidelity to the precise text and rigorous interpretation are of great academic and practical value, articles may have dozens, even hundreds, of such discursive footnotes – a body of supplementary material many times longer than the article itself. The format evolved because it can enhance all three dimensions of transparency. The researcher is often obliged to insert extensive quotations from sources (data access); annotate those quotations with extensive interpretation of how, why, and to what extent they support a claim made in the text and how they fit into the broader context (analytic transparency); and discuss issues of data selection and opposing evidence (production transparency). At a glance, readers can scan everything: the main argument, the citation, the source material, the author’s interpretation, and information about how representative the source is. In many ways, discursive footnotes remain the “best practice” instruments for providing efficient qualitative transparency.
Yet recent trends in formatting social science journals – in particular, the advent of so-called scientific citations and ever-tighter word limits – have all but banished discursive footnotes. This trend is not methodologically neutral: it privileges quantitative research that employs external datasets and cites secondary journals rather than data, while blocking qualitative research from citing and interpreting texts in detail. As a result, in many social sciences, we see relatively little serious debate about the empirics of qualitative research. Replication or reanalysis is extremely difficult, and extension or secondary analysis almost impossible.Footnote 16 Given the economics of social science journals, this trend is unlikely to reverse. Practitioners and policy analysts face similar constraints, because they often aim their publications, at least in part, at nonexperts. Memos and reports have been growing shorter. Long discursive footnotes pose a visual barrier, both expanding the size of a text, and rendering it less readable and accessible. In sum, conventional footnotes and word limits are part of the problem, not the solution.
8.3.2 Hyperlinks to Online Sources
Some suggest that a simple digital solution would be to link articles and reports to source documents already posted online. Many government reports, journalistic articles, contemporary scholarship, and blogs often do just this. Yet this offers an inadequate level of research transparency, for three basic reasons. First, much material simply cannot be found online: Most primary field research evidence (e.g., interviews) is not there, and despite the efforts of archives to digitize, we are far from having all documents online even in the most advanced industrial democracies, let alone elsewhere. Even journalistic articles and secondary scholarly works are unevenly available, with much inaccessible online (or hidden behind paywalls), in foreign languages, or buried within longer documents. Second, links to outside sources are notoriously unstable, and subject to “link rot” or removal.Footnote 17 Attempts to stabilize links to permit cross-citation have proven extremely challenging even when they focus on a very narrow range of documents (e.g., academic medical journals), and it is nearly impossible to do so if one is dealing, as policy analysts do, with an essentially unlimited range of contemporary material of many types and in many languages. Third, even when sources are available online – or when we place them online for this purpose – hyperlinks provide only data transparency, not analytic and production transparency. We learn what source a scholar cited but not why, let alone how he or she interpreted, contextualized, and weighed the evidence. This undermines one of the distinctive epistemological advantages of qualitative research.
8.3.3 Archiving Evidence in a Centralized Database
For many from other research traditions, data archiving may seem at first glance the most natural way to enhance transparency. It is, after all, the conventional solution employed by statistical researchers, who create centralized, homogeneous “datasets” where all evidence is stored, connected to a single set of algorithms used to analyze it. Moreover, data repositories do already exist for textual material, notably the Qualitative Data Repository for social science materials recently established with NSF funding at Syracuse University.Footnote 18 Data archiving is admittedly essential, especially for the purpose of preserving complete collections of new field data drawn from interviews, ethnographic notes, primary document collections, and web searches of manageable size that are unencumbered by human subject or copyright restrictions.Footnote 19 Archiving full datasets can also help create a stronger bulwark against selection bias (“cherry-picking” or constructing biased case studies by selecting only confirming evidence) by obliging qualitative scholars to archive “all” their data.
Yet, while data archiving can be a useful ancillary technique in selected cases, it is unworkable as a general “default” approach for assuring qualitative research transparency because it is both impractical and inappropriate. Archiving is often impractical because ethical, legal, and logistical constraints limit the analyst’s ability to reveal to readers all the interviews, documents, or notes underlying qualitative research. Doing so often threatens to infringe the confidentiality of human subjects and violates copyright law limiting the reproduction of published material.Footnote 20 Sanitizing all the interviews, documents, and notes (i.e., rendering them entirely anonymous and consistent with confidentiality agreements) is likely to impose a prohibitive logistical burden on many research projects. These limitations become much greater when the researcher seeks to archive comprehensive sets of complete documents, as opposed to just releasing quotations or summaries, as some other transparency strategies require. This is often particularly problematic for policy practitioners, perhaps more so than scholars, because policy case studies and histories, after-action reports, and interview or focus-group analyses so commonly contain sensitive information.
Archiving is also inappropriate because it dilutes the distinctive epistemological advantages of qualitative research. The notion that archiving documents in one large collection generates transparency overlooks a distinctive quality of case study analysis. A qualitative analyst does not treat the data as one undifferentiated mass, analyzing all of it at once using a centralized algorithm, as in a statistical study. Instead, he or she presents and interprets individual pieces of data one at a time, each linked to a single step in the main narrative.Footnote 21 Qualitative analysts enjoy considerable flexibility to assign a different location, role, relative weight, reliability, and exact meaning to each piece of evidence, depending on its logical position in a causal narrative, the specific type of document it is, and the textual content of the quotation within that document. This type of nuanced and open-ended, yet rigorous and informed, contextual interpretation of sources is highly prized in fields such as history, law, anthropology, and the humanities. Any serious effort to enhance qualitative transparency must thus make clear to the reader how the analyst interprets each piece of data and exactly where in the narrative it fits. Simply placing all the evidence in a single database, even where it is logistically and legally feasible, does not help the reader much.Footnote 22 Links from citations to archived material are, at best, cumbersome. Moreover, as with hyperlinks and conventional citations, archiving fails to specify particular passages and provides little analytic transparency, because it fails to explain why each source supports the underlying argument at that point in the narrative. To achieve qualitative transparency, a less costly approach is required – one that reveals the inferential connection between each datum and the underlying analytical point in the narrative.
8.3.4 Active Citation/ATI: A “Best Practice” Standard of Qualitative Transparency
Given the practical and epistemological constraints outlined above, social scientists have recently agreed that the best way to enhance transparency is to exploit recent innovations in internet formatting and software engineering. These technologies permit us to create new digital formats that can reestablish the high levels of qualitative transparency afforded by discursive footnotes in a more efficient and flexible way. Active Citation (AC) and the Annotation for Transparency Initiative (ATI) are two related, digitally enhanced transparency standards designed to do just this. They are practical and epistemologically appropriate to qualitative research.
AC/ATI envisages a digitally enabled appendix to research publications and reports. Rather than being an entirely separate document, however, the appendix embeds each source and annotation in an entry linked to a specific statement or citation in the main narrative of a research article or report. These may take the form of numbered hyperlinks from the article to an appendix or, in the ATI version, a set of annotations that overlay the article using a separate but parallel software platform. Unlike modern in-text footnotes, hyperlinks, and archiving, AC/ATI reinforces the epistemological link between narrative, data, and interpretation central to qualitative research. This author-driven process of annotation and elaboration via a separate document assures the same (or greater) levels of data, analytical, and production transparency as discursive footnotes, but with greater flexibility and no constraint on overall length. Moreover, it reduces the logistical difficulties by leaving the existing format of basic digital or paper articles and reports completely unchanged. Indeed, AC/ATI has the advantage that some audiences can simply skim or read the article without any additional materials, while those with a desire for more information can activate the additional materials.
Two ways exist to implement the AC/ATI standards. One, initially proposed by advocates of AC, obliges authors to design standardized entries that promote realistic levels of data, analytic, and production transparency in a relatively structured way. Accordingly, AC prescribes that researchers link each citation that supports an “empirically contestable knowledge claim” to a corresponding appendix entry. Of course, this still leaves tremendous leeway to the author(s), who decide (as with any footnote or citation) what is sufficiently “empirical” or “contestable” to merit further elaboration. Once an author decides that further elaboration is required, each entry would contain three mandatory elements plus one optional element – though, again, the author would decide how detailed and lengthy this elaboration needs to be.
An examination of the four elements in an AC entry shows how, in essence, this system simply updates the centuries-old practice of discursive footnoting in a flexible, author-driven, and electronic form appropriate to a digital age.Footnote 23 The four elements of each entry are:
1) A textual excerpt from the source. This excerpt is presumptively 50–100 words long, though the length is ultimately up to the author. It achieves basic qualitative data transparency by placing the essential textual source material that supports the claim “one click away” from the reader. Sources subject to human subject or copyright restrictions can be replaced with a sanitized version, a summary, or a brief description, as is feasible. This provides a modest level of prima facie data transparency, while minimizing the logistical demands on authors, the ethical threats to subjects, and the potential legal liability.
2) An annotation. This interpretive commentary explains how, why, to what extent, and with what certainty the source supports the underlying claim in the main text. This provides basic analytic transparency, explaining how the author has interpreted the source. In this section, the author may address not just the analysis of a given source, but also its interpretive context, its representativeness of a broader sample, the existence of counterevidence, how it was translated, and so on. This annotation can be of any length the author believes is justified.
3) A copy of the full footnote citation, sufficient to locate the document. This is critical because authors may seek to use the appendices independently of the text – for example, in a bibliography or database. Also, it assures that, whatever format the main report employs, a genuine full citation exists somewhere, which is far from true today.
4) An optional link to (or scan of) the full source. A visual copy of the source would provide more context and unambiguous evidence of the source, as well as creating additional flexibility to accommodate nontraditional sources such as maps, charts, photographs, drawings, video, recordings, and so on. This option can be invoked, however, only if the author has the right to link or copy material legally and the ability to do so cost effectively, which may not always be the case – and doing so at all remains at the discretion of the author.
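Purely as an illustration, the four elements above can be thought of as fields in a simple data record. The following Python sketch is hypothetical – the field names and structure are this author's invention, not part of any AC/ATI specification – but it makes concrete how three elements are mandatory while the fourth remains optional:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ActiveCitationEntry:
    """One digital appendix entry (hypothetical representation of an AC entry)."""
    excerpt: str        # 1) textual excerpt, presumptively 50-100 words (or a sanitized summary)
    annotation: str     # 2) interpretive commentary: how, why, and to what extent the source supports the claim
    full_citation: str  # 3) complete citation sufficient to locate the document
    source_link: Optional[str] = None  # 4) optional link to (or scan of) the full source

# Example entry; content is invented for illustration only.
entry = ActiveCitationEntry(
    excerpt="...quoted passage from an interview transcript...",
    annotation="Supports the claim that officials anticipated resistance; "
               "representative of five similar interviews conducted in 2015.",
    full_citation="Interview with senior civil servant, Ministry of Finance, March 2015.",
)
print(entry.source_link)  # the fourth element stays empty unless the author can legally supply it
```

Because the fourth field defaults to `None`, an author who lacks the legal right or the resources to reproduce the full source can still complete a valid entry, mirroring the discretionary character of the standard described above.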
Of course, the de facto level of transparency that an author chooses to provide in any specific case will still reflect other important constraints. One constraint is ethical. The active citations cannot make material transparent that would harm research subjects or that is subject to confidentiality agreements. Ethical imperatives obviously override transparency.Footnote 24 A second constraint is legal. The content of the entries must respect intellectual property rights. Fortunately, short excerpts of most published material (except artistic or visual products) can be quoted subject to “fair use” or its equivalent in almost all jurisdictions – but in cases of conflict, legal requirements override transparency. A third constraint is logistical. The amount of time and effort required to provide discursive notes of the type AC envisages is surely manageable, since discursive footnotes with roughly the same content were the norm in some academic disciplines and were widely used in the social sciences until a generation ago – and still appear in many published books. Today, the advent of electronic scanning and word processing makes the process far easier. Still, one can readily imagine situations in which annotation would create excessive work relative to the likely benefit. This is yet another reason why the decision of how many annotations to provide and how long they are remains primarily with individual authors, subject to guidance from relevant research communities, as is currently the case with conventional citations. Ultimately, the number of such entries, and their length and content, remain essentially up to the author, much as the nature of footnotes is today.
ATI offers the slightly different prospect of a more flexible, open-ended standard. ATI’s major innovation is to use innovative software provided by the nonprofit organization Hypothesis.Footnote 25 In lieu of storing the annotated source entries in a conventional appendix (akin to existing practice with formal and quantitative research) and hyperlinking individual entries to selected citations, as AC initially recommended, ATI allows the annotations to be written at will, stored in a separate program, and seamlessly layered on top of a PDF article by running the two programs simultaneously. ATI software makes the annotated sections appear as highlighted portions of the article, and when one clicks on a section of highlighting, the additional material appears in a box alongside the article. ATI provides a particularly efficient and manipulable means of delivering source materials and annotations, and it provides almost infinite flexibility to authors. In trials, authors have used the software to add annotations as they see fit. This type of software option also allows for separate commentary by readers. One might imagine the social sciences moving forward for a time with a set of such experiments that recommend no specific set of minimum standards for transparency but permit authors to define their own digital options. In a number of large test studies, dozens of younger scholars have tried ATI out with considerable enthusiasm, and this approach is in the process of adoption by major university presses that publish journals. This, it seems, is the future.
8.4 Conclusion: Qualitative Transparency in the Future
Qualitative social science journals, publishers, and scholars, having inadvertently undermined traditional qualitative transparency in recent decades, appear now to be moving back toward the higher levels practiced by researchers in history, law, and the humanities. An approach such as AC/ATI offers a more attractive trade-off between enhanced research transparency and the imperatives of ethics/confidentiality, intellectual property rights, and logistics than that offered by any existing alternative, even if data archives, conventional citations, and hyperlinks to existing web sources can occasionally be useful. These new digital standards are logistically efficient, flexible in the face of competing concerns, and remain firmly decentralized in the hands of researchers themselves. Over the next decade, journals and research communities are likely to adopt levels and strategies of qualitative transparency that differ in detail but all move in this direction, not least because funders and their fellow scholars are coming to expect it. Thus, while it remains to be seen precisely how standards for qualitative transparency will evolve in the future, it seems likely that digital means will be deployed more intensively to enhance research transparency. This is true not just because transparency renders social science research richer and more rigorous, but because society as a whole is moving in that direction. As digital transparency that clicks through to more detailed source material has become the norm in journalism, government messaging, business, and entertainment, the notion that researchers should not follow suit seems increasingly anachronistic. The same is true, of course, for practitioners and policy analysts who work on the major international challenges of our time.