
2 - Incredible Certitude

from Part I - Characterizing Uncertainty

Published online by Cambridge University Press:  02 January 2025

Charles F. Manski
Affiliation:
Northwestern University, Illinois


Analyses of public policy regularly express certitude about the consequences of alternative policy choices. Expressions of uncertainty are rare. Yet predictions are often fragile. Conclusions may rest on critical unsupported assumptions or on leaps of logic. Then the certitude of policy analysis is not credible.

One can illuminate the tension between the credibility and power of assumptions by posing assumptions of varying strength and determining the conclusions that follow. In practice, policy analysis tends to sacrifice credibility in return for strong conclusions. Why?

Analysts and policy makers respond to incentives. The scientific community rewards strong novel findings. The public wants unequivocal policy recommendations. These incentives make it tempting to maintain assumptions far stronger than can be persuasively defended, in order to draw strong conclusions.

My concern with incredible certitude originated at a 1988 conference on evaluation of income tax policy, where I first presented in public my early findings on partial identification with missing data. Responding to my remarks, the econometrician Jerry Hausman stated: “You can’t give the client a bound. The client needs a point.” In the early 1990s, my colleague and friend Daniel McFadden relayed to me a story he had heard about an economist’s attempt to describe uncertainty about a forecast to US President Lyndon B. Johnson. The economist is said to have presented the forecast as a likely range of values for the quantity under discussion. Johnson is said to have replied, “Ranges are for cattle. Give me a number.”

In Manski (2011b), I introduced a typology of practices that contribute to incredible certitude. I have since elaborated in Manski (2013c, 2015a, 2019a, 2020e). The typology is:

  • conventional certitude: A prediction that is generally accepted as true but is not necessarily true.

  • dueling certitudes: Contradictory predictions made with alternative assumptions.

  • conflating science and advocacy: Specifying assumptions to generate a predetermined conclusion.

  • wishful extrapolation: Using untenable assumptions to extrapolate.

  • illogical certitude: Drawing an unfounded conclusion based on logical errors.

  • media overreach: Premature or exaggerated public reporting of policy analysis.

Most of this chapter documents the prevalence of incredible certitude. Section 2.1 calls attention to the core role that certitude has played in major streams of religion and philosophy. Section 2.2 describes conventional certitude in official economic statistics reported by federal statistical agencies in the United States. Section 2.3 discusses dueling certitudes in research on criminal justice. Section 2.4 documents wishful extrapolation from medical research to patient care. Section 2.5 remarks on the complementary practice of sacrificing relevance for certitude, again using medical research to illustrate.

The closing part of the chapter poses and assesses arguments that seek to explain incredible certitude. Section 2.6 discusses psychological arguments asserting that expression of incredible certitude in policy analysis is necessary because the public is unable to cope with uncertainty. Section 2.7 considers arguments asserting that incredible certitude is useful or necessary as a device to simplify collective decision making.

2.1 Certitude in Religion and Philosophy

While my concern is incredible certitude in modern policy analysis, it is worth keeping in mind that expression of uncertainty is an ancient human issue. Religious dogma provides extreme manifestations of incredible certitude. Hebrew prayers asserting the existence and power of God end with the congregation stating, “Amen,” which is variously interpreted in English to mean “certainty,” “truth,” or “I believe.” The Apostles’ Creed of Christianity asserts that the speaker believes in basic tenets of the faith and concludes: “I believe in the Holy Spirit, the holy Catholic Church, the communion of saints, the forgiveness of sins, the resurrection of the body, and the life everlasting. Amen.” No proof of these tenets is given, and no space is left for uncertainty. The faith asks that one simply believe.

Religious dogma is a conventional certitude in a society with a consensus faith. Dueling certitudes occur when people hold different faiths whose dogmas are inconsistent with one another. It is sometimes said that dueling certitudes may be useful as a device to promote learning. The ancient idea of dialectic proposes that debating contradictory perspectives can be an effective way to determine truth. However, history presents numerous examples of bitter conflicts that result from dueling religious certitudes.

Classical and Enlightenment philosophers manifest a spectrum of views about uncertainty. Some assert that they know basic truths while others express uncertainty. I will focus on one persistent idea in the philosophy of science, namely that a scientist should choose one hypothesis among those that are consistent with the available data.

Researchers often refer to Occam's Razor, the medieval philosophical declaration that "Plurality should not be posited without necessity." Duignan (2023) gives the usual modern interpretation of this cryptic statement, remarking that "The principle gives precedence to simplicity; of two competing theories, the simplest explanation of an entity is to be preferred." The philosopher Richard Swinburne wrote (Swinburne, 1997, p. 1):

I seek to show that – other things being equal – the simplest hypothesis proposed as an explanation of phenomena is more likely to be the true one than is any other available hypothesis, that its predictions are more likely to be true than those of any other available hypothesis, and that it is an ultimate a priori epistemic principle that simplicity is evidence for truth.

The choice criterion offered here is as imprecise as the one given by Occam. What do Duignan and Swinburne mean by “simplicity”?

Among economists, Milton Friedman expressed the Occam perspective in an influential methodological essay. Friedman (1953) placed prediction as the central objective of science, writing (p. 5): "The ultimate goal of a positive science is the development of a 'theory' or 'hypothesis' that yields valid and meaningful (i.e. not truistic) predictions about phenomena not yet observed." He later wrote (p. 10): "The choice among alternative hypotheses equally consistent with the available evidence must to some extent be arbitrary, though there is general agreement that relevant considerations are suggested by the criteria 'simplicity' and 'fruitfulness,' themselves notions that defy completely objective specification."

Thus, Friedman counseled scientists to choose one hypothesis, even though this may require the use of “to some extent … arbitrary” criteria. He did not explain why scientists should choose one hypothesis from many. He did not entertain the idea that scientists might offer predictions under a range of plausible hypotheses that are consistent with the available evidence.

However one tries to operationalize the Occam perspective, its relevance to planning is not evident. In policy analysis, knowledge is instrumental to the objective of making good decisions. Discussions of Occam’s Razor do not pose this objective. Does use of a criterion such as “simplicity” to choose one hypothesis promote good decision making? As far as I am aware, philosophers have not addressed this essential question.

2.2 Conventional Certitude in Official Economic Statistics

2.2.1 Congressional Budget Office Scoring of Legislation

Conventional certitude is exemplified by US Congressional Budget Office (CBO) scoring of federal legislation. The CBO was established by the Congressional Budget Act of 1974. The Act has been interpreted as mandating the CBO to provide point predictions (scores) of the budgetary impact of legislation. CBO scores are conveyed in letters that the Director writes to leaders of Congress, unaccompanied by measures of uncertainty. CBO scores exemplify conventional certitude because they have achieved broad acceptance. They are used by both Democratic and Republican members of Congress. Media reports largely take them at face value.

A well-known example is the scoring of the Patient Protection and Affordable Care Act of 2010, commonly known as Obamacare or the ACA. In March of 2010, the CBO and the Joint Committee on Taxation (JCT) together scored the combined consequences of the ACA and the Reconciliation Act of 2010 and reported (Elmendorf, 2010, p. 2) that "enacting both pieces of legislation … would produce a net reduction in federal deficits of $138 billion over the 2010–2019 period as a result of changes in direct spending and revenue." Media reports largely accepted the CBO score as fact without questioning its validity, the hallmark of conventional certitude.

A simple approach to avoid incredible certitude would be to provide interval forecasts of the budgetary impacts of legislation. The CBO would produce two scores for a bill, a low score and a high score, and report both. Or it could present a full probabilistic forecast in a graphical fan chart such as the one used by the Bank of England to predict gross domestic product (GDP) growth. If the CBO must provide a point prediction, it could continue to do so, with some convention used to locate the point within the interval forecast.

In 2010, when I became concerned about the incredible certitude expressed by the CBO when scoring Obamacare, I spoke with Douglas Holtz-Eakin, a former director of the CBO. He told me that he expected Congress would be highly displeased if the CBO were to express uncertainty when scoring pending legislation. I gave a seminar at the CBO and talked with staff members. They agreed that there is enormous uncertainty when attempting to predict the impact on the federal debt of complex legislation such as Obamacare. Yet they shared the perspective expressed by Holtz-Eakin, that they could not express this uncertainty in their official reports to Congress.

2.2.2 Economic Statistics Reported by Federal Statistical Agencies

Further leading cases of conventional certitude are evident in the official statistics published by federal statistical agencies in the United States, including the Bureau of Economic Analysis, Bureau of Labor Statistics, and Census Bureau. These agencies respectively report point estimates of GDP growth, unemployment, and household income. Agency staff know that official statistics suffer from sampling and non-sampling errors. Yet the practice has been to report statistics with only occasional measurement of sampling errors and no measurement of non-sampling errors. The media and the public generally accept the estimates as reported, making them instances of conventional certitude.

Government agencies communicate official economic statistics in news releases that make little if any mention of uncertainty in the reported estimates. Technical publications documenting data and methods acknowledge that official statistics are subject to error. They may use standard errors or confidence intervals to measure sampling errors; that is, the statistical imprecision that occurs with finite samples of the population. However, they generally do not attempt to quantify the many forms of non-sampling errors that generate identification problems. Neglect of non-sampling errors may reflect the fact that statistical theory has mainly focused on sampling error, making strong assumptions that imply point identification.

Reporting official statistics as point estimates without adequate attention to error manifests conventional certitude: The point estimates may be viewed as true but they are not necessarily true. In the absence of agency guidance, some users of official statistics may naively assume that errors are small and inconsequential. Persons who understand that the statistics are subject to error must fend for themselves and conjecture the error magnitudes. Thus, users of official statistics – economists, government officials, firm managers, and citizens – may misinterpret the information that the statistics provide.

Considering error from the perspective of users of statistics rather than of statisticians, I think it essential to refine the general problem of conventional certitude in official statistics, distinguishing errors in measurement of well-defined concepts from uncertainty about the concepts themselves. I also think it useful to distinguish transitory and permanent measurement problems. To highlight these distinctions, Manski (2015a) discussed transitory statistical uncertainty, permanent statistical uncertainty, and conceptual uncertainty. In what follows, I define these ideas and give illustrative examples.

2.2.3 Transitory Uncertainty: Revisions in National Income Accounts

Transitory statistical uncertainty arises because data collection takes time. Agencies may release a preliminary statistic based on incomplete data and revise it as new data arrive. Uncertainty diminishes as the data accumulate. A leading example is the Bureau of Economic Analysis (BEA) estimation of GDP, which begins with an initial measurement and is revised repeatedly thereafter. The BEA reports multiple vintages of quarterly GDP estimates. An "advance" estimate combines data available one month after the end of a quarter with trend extrapolations. "Second" and "third" estimates are released after two and three months, when new data become available. A "first annual" estimate is released in the summer, using data collected annually. There are subsequent annual and five-year revisions. Yet the BEA reports GDP estimates without quantitative measures of uncertainty.

A publication by BEA staff explains the practice of reporting estimates without measures of error as a response to the presumed wishes of the users of GDP statistics. Fixler et al. (2014) state (p. 2): "Given that BEA routinely revises its estimates during the course of a year, one might ask why BEA produces point estimates of GDP instead of interval estimates … Although interval estimates would inform users of the uncertainty surrounding the estimates, most users prefer point estimates, and so they are featured." BEA analysts have provided an upbeat perspective on the accuracy of GDP statistics (Fixler et al., 2011).

In contrast, Croushore (2011) offers a more cautionary perspective, writing (p. 73): "Until recently, macroeconomists assumed that data revisions were small and random and thus had no effect on structural modeling, policy analysis, or forecasting. But real-time research has shown that this assumption is false and that data revisions matter in many unexpected ways."

Communication of the transitory uncertainty of GDP estimates should be relatively easy to accomplish. The historical record of revisions has been made accessible for study in two "real-time" data sets maintained by the Philadelphia and St. Louis Federal Reserve Banks; see Croushore (2011) for a definition of "real time." Measurement of transitory uncertainty in GDP estimates is straightforward if one finds it credible to assume that the revision process is time stationary. Then historical estimates of the magnitudes of revisions can credibly be extrapolated to measure the uncertainty of future revisions. The BEA could communicate uncertainty as a probability distribution via a fan chart, as the Bank of England does regularly. See Aikman et al. (2011) for commentary on the thinking underlying the Bank's use of fan charts to communicate uncertainty.
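
To make the extrapolation concrete, the sketch below computes an interval for a current advance estimate from the empirical distribution of past revisions, under the stationarity assumption. The revision figures and the current estimate are hypothetical placeholders, not actual BEA data.

```python
import numpy as np

# Hypothetical pairs of (advance, latest) estimates of quarterly GDP growth,
# in percent; the actual figures would come from a real-time data set.
advance = np.array([2.3, 1.1, 3.0, 0.5, 2.8, 1.9, 3.4, 0.2])
latest = np.array([2.9, 0.6, 3.3, 1.2, 2.1, 2.4, 3.1, -0.3])

# Historical revisions; extrapolating them to the future assumes that the
# revision process is time stationary.
revisions = latest - advance

# Central 90 percent of the revision distribution.
lo, hi = np.percentile(revisions, [5, 95])

current_advance = 2.5  # this quarter's advance estimate (hypothetical)
print(f"advance estimate: {current_advance:.1f}")
print(f"interval allowing for revision: [{current_advance + lo:.1f}, {current_advance + hi:.1f}]")
```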

2.2.4 Permanent Uncertainty: Nonresponse in Surveys

Permanent statistical uncertainty arises from incompleteness or inadequacy of data collection that is not resolved over time. Sources include sampling error due to finite sample size and non-sampling error due to nonresponse and misreporting. I focus here on nonresponse to employment and income questions in the Current Population Survey (CPS).

Each year the US Census Bureau reports statistics on the household income distribution based on data collected in a supplement to the CPS. The Census Bureau’s annual Current Population Report provides statistics characterizing the income distribution and measures sampling error by providing 90-percent confidence intervals for various estimates. The report does not measure non-sampling errors. A supplementary document describes some sources of non-sampling error but does not quantify them.

Each month, the BLS issues a news release reporting the unemployment rate for the previous month based on data collected in the monthly CPS. A "technical note" issued with the release contains a section on "Reliability of the Estimates" that acknowledges the possibility of errors (U.S. Bureau of Labor Statistics, 2023). The note describes the use of standard errors and confidence intervals to measure sampling error. It states that non-sampling errors "can occur for many reasons, including the failure to sample a segment of the population, inability to obtain information for all respondents in the sample, inability or unwillingness of respondents to provide correct information on a timely basis, mistakes made by respondents, and errors made in the collection or processing of the data." The note does not measure the magnitudes of non-sampling errors.

When the Census Bureau and BLS report point estimates of statistics on household income and employment, they assume that nonresponse is random conditional on specified observed covariates of sample members. This assumption, which implies the absence of non-sampling error, is implemented as weights for unit nonresponse and imputations for item nonresponse. CPS documentation of its imputation approach offers no evidence that the method yields a distribution for missing data that is close to the actual distribution. Another Census document describing the American Housing Survey is revealing. The US Census Bureau (2011) states: "The Census Bureau does not know how close the imputed values are to the actual values." Indeed, lack of knowledge of the closeness of imputed values to actual ones is common. Manski (2024) critiques imputation from the perspective of partial identification analysis.

Research on partial identification shows how to measure potential non-sampling error due to nonresponse without making assumptions about the nature of the missing data. One contemplates all values that the missing data can take. Then the data yield interval estimates of official statistics. The literature derives intervals for population means and quantiles. The intervals have simple forms, their lower and upper bounds being the values that the estimate would take if all missing data were to take the smallest or largest logically possible value. The literature shows how to form confidence intervals that jointly measure sampling and nonresponse error. See Manski (2007a) for a textbook exposition.
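
As a minimal sketch, the worst-case interval for a population proportion (such as a poverty rate) can be computed as follows, assuming only that the outcome is binary so the logical bounds are 0 and 1; the data and function name are illustrative.

```python
import numpy as np

def proportion_bounds(y):
    """Worst-case bounds on a population proportion with missing outcomes.

    y holds 0/1 outcomes, with np.nan marking nonresponse. The lower (upper)
    bound is the estimate obtained by setting every missing value to 0 (1);
    no assumption is made about the nature of the nonresponse.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    n_missing = np.isnan(y).sum()
    observed_sum = np.nansum(y)
    return observed_sum / n, (observed_sum + n_missing) / n

# Illustrative data: 1 = family below the poverty line, np.nan = nonresponse.
y = [0, 1, 0, np.nan, 0, 1, np.nan, 0, 0, np.nan]
print(proportion_bounds(y))  # (0.2, 0.5)
```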

To illustrate, Manski (2016) used CPS data to form interval estimates of median household income and the fraction of families with income below the poverty line in 2001–2011. There was considerable nonresponse to the income questions. During 2002–2012, 7 to 9 percent of the sampled households yielded no income data due to unit nonresponse and 41 to 47 percent of the interviewed households yielded incomplete income data due to item nonresponse. One set of estimates recognizes item nonresponse alone and another recognizes unit nonresponse as well. The interval estimate for the family poverty rate in 2011 is [0.14, 0.34] if one makes no assumptions about item nonresponse but assumes that unit nonresponse is random. The interval is [0.13, 0.39] if one drops the assumption that unit nonresponse is random.

Interval estimates of official statistics that place no assumptions on the values of missing data are easy to understand and simple to compute. One might therefore think that it would be standard practice for government statistical agencies to report them, but official statistics are not reported this way. It is sometimes said that such interval estimates are “too wide to be informative.” Nevertheless, I recommend that statistical agencies report them.

Wide bounds reflect real data uncertainties that cannot be washed away by assumptions lacking credibility. Even when wide, interval estimates making no assumptions on nonresponse are valuable for three reasons: (a) They are easy to compute and understand. (b) They are maximally credible in the sense that they express all logically possible values of the statistic of interest. (c) They make explicit the fundamental role that assumptions play in inferential methods that yield tighter findings.

The above does not imply that statistical agencies should refrain from making assumptions about nonresponse. Interval estimates making no assumptions may be excessively conservative if agency analysts have some understanding of the nature of nonresponse. There is much middle ground between interval estimation with no assumptions and point estimation assuming that nonresponse is conditionally random. The middle ground obtains interval estimates using assumptions that may include random nonresponse as one among various possibilities. Manski (2016) posed some alternatives that agencies may want to consider.

2.2.5 Conceptual Uncertainty: Seasonal Adjustment of Official Statistics

Conceptual uncertainty arises from incomplete understanding of the information that official statistics provide about economic concepts or from lack of clarity in the concepts themselves. Conceptual uncertainty concerns the interpretation of statistics rather than their magnitudes.

A leading example is seasonal adjustment of statistics. Viewed from a sufficiently high altitude, the purpose of seasonal adjustment appears straightforward to explain. It is less clear from ground level how one should perform seasonal adjustment.

The prevalent X-12-ARIMA method was developed by the Census Bureau and is used by the BLS and BEA. X-12, along with its predecessor X-11 and successor X-13, may be a sophisticated and successful algorithm for seasonal adjustment. Or it may be an unfathomable black box containing a complex set of operations that lack economic foundation. Wright (2013) noted the difficulty of understanding X-12, writing (p. 67): "Most academics treat seasonal adjustment as a very mundane job, rumored to be undertaken by hobbits living in holes in the ground. I believe that this is a terrible mistake, but one in which the statistical agencies share at least a little of the blame." He added that understanding the practice of seasonal adjustment matters because (p. 65): "Seasonal adjustment is extraordinarily consequential."

There presently exists no clearly appropriate way to measure the uncertainty associated with seasonal adjustment. X-12 is a standalone algorithm, not a method based on a specified dynamic theory of the economy. It is not obvious how to evaluate the extent to which it accomplishes the objective of removing the influences of predictable seasonal patterns. One might perhaps juxtapose X-12 with other seemingly reasonable algorithms, perform seasonal adjustment with each one, and view the range of resulting estimates as a measure of conceptual uncertainty. More principled ways to evaluate uncertainty may open up if agencies were to use a seasonal adjustment method derived from a specified model of the economy. One could then assess the sensitivity of seasonally adjusted estimates to variation in the parameters and the basic structure of the model.
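
As a rough sketch of this juxtaposition idea, the code below applies three off-the-shelf procedures (STL, a classical moving-average decomposition, and a seasonal-dummy adjustment, standing in for X-12 and its alternatives) to a simulated monthly series and reports the spread of the adjusted values. The simulated series and the choice of procedures are illustrative assumptions, not the agencies' methods.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL, seasonal_decompose

# Hypothetical monthly series with trend, seasonality, and noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01", periods=96, freq="MS")
y = pd.Series(100 + 0.2 * np.arange(96)
              + 5 * np.sin(2 * np.pi * np.arange(96) / 12)
              + rng.normal(0, 1, 96), index=idx)

# Method 1: STL decomposition.
adj_stl = y - STL(y, period=12).fit().seasonal

# Method 2: classical moving-average decomposition.
adj_ma = y - seasonal_decompose(y, model="additive", period=12).seasonal

# Method 3: remove estimated month effects (seasonal dummies).
month_means = y.groupby(y.index.month).transform("mean")
adj_dummy = y - (month_means - y.mean())

# Spread across methods as a crude measure of conceptual uncertainty.
adjusted = pd.concat({"stl": adj_stl, "ma": adj_ma, "dummy": adj_dummy}, axis=1)
spread = adjusted.max(axis=1) - adjusted.min(axis=1)
print(spread.describe())
```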

A more radical departure from present practice would be to abandon seasonal adjustment and leave it to the users of statistics to interpret unadjusted statistics. Publication of unadjusted statistics should be particularly valuable to users who want to make year-to-year rather than month-to-month comparisons of statistics. Suppose that one wants to compare unemployment in March 2013 and March 2014. It is arguably more reasonable to compare the unadjusted estimates for these months than to compare the seasonally adjusted estimates. Comparison of unadjusted estimates for the same month each year sensibly removes the influences of predictable seasonal patterns, and compares data collected in the two months of interest.

2.2.6 Why Do Statistical Agencies Practice Incredible Certitude?

The concerns that I have expressed about incredible certitude in official economic statistics are not new. Simon Kuznets (1948), the father of national income accounting, called for publication of "margins of error" with these official statistics. Soon after, Oskar Morgenstern wrote a book urgently arguing for regular measurement of error in all official economic statistics (Morgenstern, 1950, 1963). He was well placed to influence the status quo, being famous for his contribution to game theory. Yet his efforts did not bear fruit. More recently, agencies have not adhered to the National Research Council (2013) call for "Openness about Sources and Limitations of the Data Provided" in the document Principles and Practices for a Federal Statistical Agency.

Why is it that statistical agencies do so little to communicate uncertainty in official statistics? I am not aware of any valid professional reason that would explain the failure of the BLS and Census to report measures of sampling error in their news releases of employment and income statistics. Agency administrators could task their research staffs to develop measures of non-sampling error. While I cannot conjure a valid professional explanation for the status quo, I do see a possible political explanation.

Federal statistical agencies may perceive a political incentive to express incredible certitude about the state of the economy when they publish official economic statistics. Morgenstern (1963) commented cogently on the political incentives facing statistical agencies, writing (p. 11):

Finally, we mention a serious organizational difficulty in discussing and criticizing statistics. These are virtually always produced by large organizations, government or private; and these organizations are frequently mutually dependent upon each other in order to function normally. Often one office cannot publicly raise questions about the work of another, even when it suspects the quality of the work, since this might adversely affect bureaucratic-diplomatic relations between the two and the flow of information from one office to another might be hampered. A marked esprit de corps prevails. All offices must try to impress the public with the quality of their work. Should too many doubts be raised, financial support from Congress or other sources may not be forthcoming. More than once has it happened that Congressional appropriations were endangered when it was suspected that government statistics might not be 100 percent accurate. It is natural, therefore, that various offices will defend the quality of their work even to an unreasonable degree.

2.3 Dueling Certitudes in Criminal Justice Research

Dueling certitudes, contradictory predictions made with alternative assumptions, are common in research on controversial policy questions. Research on criminal justice policy provides many illustrations. I describe three controversies here.

2.3.1 The RAND and IDA Studies of Cocaine-Control Policy

In the mid-1990s, two studies of cocaine-control policy played prominent roles in discussions of federal policy towards illegal drugs. One was performed by analysts at RAND (Rydell and Everingham, 1994) and the other by analysts at the Institute for Defense Analyses (IDA) (Crane, Rivolo, and Comfort, 1997). The two studies posed similar hypothetical objectives for cocaine-control policy, namely reduction in cocaine consumption in the United States by 1 percent. Both studies predicted the cost of using certain policies to achieve this objective. However, the RAND and IDA authors used different assumptions and data to reach dramatically different policy conclusions.

The RAND study specified a model of the supply and demand for cocaine that aimed to characterize the interaction of producers and users and the process through which alternative cocaine-control policies may affect consumption and prices. It used this model to evaluate various demand-control and supply-control policies and concluded that drug treatment, a demand-control policy, is much more effective than any supply policy. The IDA study examined the time series association between source-zone interdiction activities and retail cocaine prices. It concluded that source-zone interdiction, a supply-control policy, is at least as effective as is drug treatment.

When they appeared, the RAND and IDA studies drew attention to the ongoing struggle over federal funding of drug-control activities. The RAND study was used to argue that funding should be shifted towards drug-treatment programs and away from activities to reduce drug production or to interdict drug shipments. The IDA study, undertaken in part as a response to the RAND findings, was used to argue that interdiction activities should be funded at current or higher levels.

At a congressional hearing, Lee Brown, then director of the Office of National Drug Control Policy (ONDCP), used the RAND study to argue for drug treatment (Subcommittee on National Security, International Affairs, and Criminal Justice, 1996, p. 61): "Let me now talk about what we know works in addressing the drug problem. There is compelling evidence that treatment is cost-effective and provides significant benefits to public safety. In June 1994, a RAND Corporation study concluded that drug treatment is the most cost effective drug control intervention."

In a subsequent hearing specifically devoted to the IDA study, Subcommittee Chair William Zeliff used the study to argue for interdiction (Subcommittee on National Security, International Affairs, and Criminal Justice 1998, p. 1):

We are holding these hearings today to review a study on drug policy, a study we believe to have significant findings, prepared by an independent group, the Institute for Defense Analysis, at the request of Secretary of Defense Perry in 1994…. The subcommittee has questioned for some time the administration’s strong reliance on treatment as the key to winning our Nation’s drug war, and furthermore this subcommittee has questioned the wisdom of drastically cutting to the bone interdiction programs in order to support major increases in hardcore drug addiction treatment programs. The basis for this change in strategy has been the administration’s reliance on the 1994 RAND study.

At the request of the ONDCP, the National Research Council Committee on Data and Research for Policy on Illegal Drugs assessed the RAND and IDA studies; see National Research Council (1999). After examining the two studies, the committee concluded that neither constitutes a persuasive basis for the formation of cocaine-control policy. Specifically, the committee concluded that neither the RAND nor the IDA study provides a credible estimate of what it would cost to use alternative policies to reduce cocaine consumption in the United States.

I chaired the National Research Council Committee. When I think now about the RAND and IDA studies, I consider their many specific differences to be less salient than their shared lack of credibility. Each study may have been coherent internally, but each rested on such a fragile foundation of weak data and unsubstantiated assumptions as to undermine its findings. To its great frustration, the committee had to conclude that the nation should not draw even the most tentative policy lessons from either study. Neither yields usable findings.

What troubles me most about both studies is their injudicious efforts to draw strong policy conclusions. It is not necessarily problematic for researchers to try to make sense of weak data and to entertain unsubstantiated conjectures. However, the strength of the conclusions drawn in a study should be commensurate with the quality of the evidence. When researchers overreach, they not only squander their own credibility, but they also diminish public trust in science more generally. The damage to public trust is particularly severe when researchers inappropriately draw strong conclusions about matters as contentious as drug policy.

2.3.2 The Deterrent Effect of the Death Penalty

American society has long debated the deterrent effect of the death penalty as a punishment for murder. Disagreement persists because research has not been able to settle the question. Researchers have used data on homicide rates and sanctions across states and years to examine the deterrent effect of the death penalty. The fundamental difficulty is that the outcomes of counterfactual policies are unobservable. Data alone cannot reveal what the homicide rate in a state without (with) a death penalty would have been had the state (not) adopted a death penalty statute. Data must be combined with assumptions to predict homicides under counterfactual deterrence policies.

A large body of work has addressed deterrence and the death penalty, yet the literature has not achieved consensus. Researchers studying the question have used much the same data, but have maintained different assumptions and have consequently reached different conclusions. Rather than acknowledge uncertainty about the realism of its maintained assumptions, each published article touts its findings as accurate. The result is dueling certitudes across articles.

Two committees of the National Research Council have documented the substantial variation in research findings and have investigated in depth the problem of inference on deterrence; see Blumstein, Cohen, and Nagin (1978) and National Research Council (2012). The latter committee, reiterating a basic conclusion of the former one, wrote (p. 2): "The committee concludes that research to date on the effect of capital punishment on homicide is not informative about whether capital punishment decreases, increases, or has no effect on homicide rates."

To illustrate in a simple setting how research that uses the same data, but different assumptions, can reach very different findings, Manski and Pepper (2013) examined data from the critical 1970s period when the Supreme Court decided the constitutionality of the death penalty. The 1972 Supreme Court case Furman v. Georgia resulted in a multi-year moratorium on the application of the death penalty, while the 1976 case Gregg v. Georgia ruled that the death penalty could be applied subject to certain criteria. We examined the effect of death penalty statutes on homicide rates in two years: 1975, the last full year of the moratorium, and 1977, the first full year after the moratorium was lifted. In 1975, the death penalty was illegal throughout the country. In 1977, thirty-two states had statutes legalizing the death penalty. For each state and year, we observe the homicide rate and whether the death penalty is legal.

We computed three simple estimates of the effect of death penalty statutes on homicide. A before-and-after analysis compares homicide rates in the treated states in 1975 and 1977. The 1975 homicide rate in these states, when none had the death penalty, was 10.3 per 100,000. The 1977 rate, when all had the death penalty, was 9.7. The before-and-after estimate is the difference between the 1977 and 1975 homicide rates; that is, 9.7 − 10.3 = −0.6. This is interpretable as the average effect of the death penalty on homicide in the treated states if one assumes that nothing germane to homicide occurred in these states between 1975 and 1977 except for legalization of capital punishment.

Alternatively, one might compare the 1977 homicide rates in the treated and untreated states. The 1977 rate in the treated states, which had the death penalty, was 9.7. The 1977 rate in the untreated states, which did not have the death penalty, was 6.9. The estimate is the difference between these homicide rates; that is, 9.7 − 6.9 = 2.8. This is interpretable as the nationwide average effect of the death penalty on homicide in 1977 if one assumes that persons living in the treated and untreated states have the same propensity to commit murder in the absence of the death penalty and respond similarly to enactment of the death penalty. With this assumption, the observed homicide rate in the treated states reveals what the rate would have been in the untreated states if they had enacted the death penalty and vice versa.

Yet a third way to use the data is to compare the temporal changes in homicide rates in the treated and untreated states. Between 1975 and 1977, the homicide rate in the treated states fell from 10.3 to 9.7, while the rate in the untreated states fell from 8.0 to 6.9. The so-called difference-in-difference (DID) estimate is the difference between these temporal changes; that is, (9.7 − 10.3) − (6.9 − 8.0) = 0.5. This is interpretable as the nationwide effect of the death penalty on homicide if one assumes that all states experience a common time trend in homicide and that enactment of the death penalty has the same effect in all states.
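
The arithmetic behind the three estimates can be reproduced directly from the rates quoted above; a minimal sketch:

```python
# Homicide rates per 100,000, as quoted above.
treated_1975, treated_1977 = 10.3, 9.7      # states with the death penalty in 1977
untreated_1975, untreated_1977 = 8.0, 6.9   # states without it in 1977

before_after = treated_1977 - treated_1975                                # -0.6
cross_section = treated_1977 - untreated_1977                             #  2.8
did = (treated_1977 - treated_1975) - (untreated_1977 - untreated_1975)   #  0.5

print(round(before_after, 1), round(cross_section, 1), round(did, 1))
```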

These three estimates yield different empirical findings regarding the effect of the death penalty on homicide. The before-and-after estimate implies that enactment of a death penalty statute reduces the homicide rate by 0.6 per 100,000. The other two estimates imply that having the death penalty raises the homicide rate by 2.8 or 0.5 per 100,000. The idea that capital punishment may increase the homicide rate is contrary to the traditional view of punishment as a deterrent. However, some researchers have argued that the death penalty shows a lack of concern for life that brutalizes society into greater acceptance of commission of murder.

Which estimate is correct? Given certain assumptions, each appropriately measures the effect of the death penalty on homicide. However, the assumptions that justify this interpretation differ across estimates. One may be correct, or none of them. If three researchers were to each maintain a different one of the assumptions and report one of the three estimates, they would exhibit dueling certitudes.

The antidote to dueling certitudes about the deterrent effect of capital punishment is to recognize uncertainty by generating a set of estimates under alternative assumptions. To formalize this idea in a flexible manner, Manski and Pepper (2013) studied the conclusions implied by relatively weak bounded-variation assumptions that restrict variation in treatment response across places and time. See Chapter 3 for a formal description of such assumptions.

The results are findings that bound the deterrent effect of capital punishment. By successively adding stronger identifying assumptions, we sought to make transparent how assumptions shape inference. We performed empirical analysis using state-level data in the United States in 1975 and 1977. Under the weakest restrictions, there is substantial ambiguity: we cannot rule out the possibility that having a death penalty statute substantially increases or decreases homicide. This ambiguity is reduced when we impose stronger assumptions, but inferences are sensitive to the maintained restrictions. Combining the data with some assumptions implies that the death penalty increases homicide, but other assumptions imply that the death penalty deters it.

2.3.3 How Do Right-to-Carry Laws Affect Crime Rates?

A considerable body of research on crime in the United States has used data on county or state crime rates to evaluate the impact of laws allowing individuals to carry concealed handguns – so-called right-to-carry (RTC) laws. Theory alone cannot predict even the direction of the impact. The knowledge or belief that potential victims may be carrying weapons may deter commission of some crimes but may escalate the severity of criminal encounters. Ultimately, how allowing individuals to carry concealed weapons affects crime is an empirical question.

Lott (2010) described some of this empirical research in a book with the provocative and unambiguous title More Guns, Less Crime. Yet, despite dozens of studies, the full body of research provides no clear insight on whether more guns yield less crime. Some studies find that RTC laws reduce crime, others find that the effects are negligible, and still others find that such laws increase crime. In a series of papers starting in 1997, Lott and co-authors have argued forcefully that RTC laws have important deterrent effects which can play a role in reducing violent crime. Lott and Mustard (1997) and Lott (2010), for example, found that RTC laws reduce crime rates in every violent crime category by between 5 and 8 percent. Using different models and revised/updated data, however, other researchers have found that RTC laws either have little impact or may increase violent crime rates. See, for example, Black and Nagin (1998), Duggan (2001), Aneja et al. (2011), and Durlauf et al. (2016).

This sharp disagreement may seem surprising. How can researchers using similar data draw such different conclusions? In fact, it has long been known that inferring the magnitude and direction of treatment effects is inherently difficult due to the unobservability of counterfactual outcomes. Suppose that one wants to learn how crime rates would differ with and without an RTC law in a given place and time. Data cannot reveal what the crime rate in an RTC state would have been if the state had not enacted the law. Nor can data reveal what the crime rate in a non-RTC state would have been if an RTC law had been in effect. To identify the law's effect, one must somehow "fill in" the missing counterfactual observations. This requires making assumptions that cannot be tested empirically. Different assumptions may yield different inferences, hence dueling certitudes.

Empirical research on RTC laws has struggled to find consensus on a set of credible assumptions. Reviewing the literature, the National Research Council Committee to Improve Research Information and Data on Firearms concluded that it is not possible to infer a credible causal link between RTC laws and crime using the current evidence (National Research Council, 2005). Indeed, the committee concluded that (National Research Council, 2005, p. 150), "additional analysis along the lines of the current literature is unlikely to yield results that will persuasively demonstrate" this link. The committee observed that findings are highly sensitive to model specification. Yet there is no solid foundation for specific assumptions and, as a result, no obvious way to prefer specific results. Hence, drawing credible precise findings that lead to consensus about the impact of RTC laws has been impossible.

The antidote to dueling certitudes about the effect on crime of RTC laws is to recognize uncertainty by generating a set of estimates under alternative assumptions. To formalize this idea in a flexible manner, Manski and Pepper (2018) studied the conclusions implied by relatively weak bounded-variation assumptions that restrict variation in treatment response across places and time. The methodology extended that used in the Manski and Pepper (2013) analysis of the deterrent effect of capital punishment, discussed above. The results were findings that bound the crime effect of RTC laws. Considering alternative assumptions makes transparent how assumptions shape inference.

2.4 Wishful Extrapolation from Medical Research to Patient Care

Extrapolation is essential to policy analysis. A central objective is to inform policy choice by predicting the outcomes that would occur if past policies were to be continued or alternative ones were to be enacted. Researchers often use untenable assumptions to extrapolate. I have called this manifestation of incredible certitude wishful extrapolation. To illustrate, I will discuss extrapolation from randomized trials in medicine to inform patient care, drawing on Manski (2019b).

Trials have long enjoyed a favored status within medical research on treatment response. They are often called the “gold standard” for such research. The appeal of trials is that, with sufficient sample size and complete observation of outcomes, they deliver credible findings on treatment response in the study population. However, extrapolation of findings from trials to clinical practice can be difficult. Researchers and guideline developers often use untenable assumptions to extrapolate.

2.4.1 Extrapolation from Study Populations to Patient Populations

Study populations in trials often differ from patient populations. It is common to perform trials studying treatment of a specific disease only on subjects who have no comorbidities. Another source of difference between study and patient populations is that a study population consists of persons with specified demographic attributes who volunteer to participate in a trial. Participation in a trial may be restricted to persons in certain age categories who reside in certain locales. Among such persons, volunteers are those who respond to financial and medical incentives to participate. It may be wishful extrapolation to assume that treatment response in trials performed on volunteers with specified demographic attributes who lack comorbidities is the same as what would occur in actual patient populations.

To justify trials performed on study populations that may differ substantially from patient populations, researchers often cite Donald Campbell, who distinguished between the internal and external validity of studies of treatment response (Campbell and Stanley, 1963). A study is said to have internal validity if it has credible findings for the study population. It has external validity if an invariance assumption permits credible extrapolation. The appeal of randomized trials is their internal validity. Wishful extrapolation is an absence of external validity.

Campbell argued that studies should be judged primarily by their internal validity and secondarily by their external validity. This perspective has been used to argue for the primacy of experimental research over observational studies, whatever the study population may be. The Campbell position is well grounded if treatment response is homogeneous. Then researchers can learn about treatment response in easy-to-analyze study populations and clinicians can confidently extrapolate findings to patient populations. However, homogeneity of treatment response may be the exception rather than the rule. Hence, it may be wishful to extrapolate from a study population to a patient population. See Section 2.5.2 for further discussion of the distinction between internal and external validity.

2.4.2 Extrapolation from Experimental Treatments to Clinical Treatments

Treatments in trials often differ from those that occur in clinical practice. This is particularly so in trials comparing drug treatments. Drug trials are commonly double-blinded, neither the patient nor the clinician knowing the assigned treatment. A double-blinded drug trial reveals the distribution of response in a setting where patients and clinicians are uncertain what treatment a patient is receiving. It does not reveal what response would be when patients and clinicians know what drug is being administered and can react to this information.

Consider drug treatments for hypertension. Patients may react heterogeneously to the various drugs available for prescription. A clinician treating a specific patient may sequentially prescribe alternative drugs, trying each for a period in an effort to find one that performs satisfactorily. Sequential experimentation is not possible in a blinded trial. The standard protocol prohibits the clinician from knowing what drug a subject is receiving and from using judgment to modify the treatment. Blinding is also problematic for interpretation of noncompliance with assigned treatments.

2.4.3 Wishful Meta-analyses of Disparate Studies

The problems discussed above concern extrapolation of findings from a single trial. Further difficulties arise when one attempts to combine findings from multiple trials.

It is easy to understand the impetus for combination of findings. Decision makers must somehow interpret the mass of information provided by empirical research. The hard question is how to interpret this information sensibly. Combination of findings is sometimes performed by systematic review of a set of studies. This is a subjective process similar to the exercise of clinical judgment.

Statisticians have proposed meta-analysis, attempting to provide an objective methodology for combining the findings of multiple studies. Meta-analysis was originally developed to address a purely statistical problem. Suppose that multiple trials have been performed on the same population, each drawing an independent random sample. The best way to use the data combines them into one sample.

Suppose that the raw data are unavailable. Instead, multiple parameter estimates are available, each computed with the data from a different sample. Meta-analysis proposes methods to combine the estimates. A common proposal computes a weighted average, weighting estimates by sample size.

The original concept of meta-analysis is uncontroversial, but its applicability is limited. It is common to have multiple disparate studies. The studies may examine distinct patient populations, whose members may have different risk of disease or different distributions of treatment response. Administration of treatments and measurement of outcomes may vary. Gene Glass, who introduced the term meta-analysis, wrote (Glass, 1977, p. 358): "The tough intellectual work in many applied fields is to make incommensurables commensurable, in short, to compare apples and oranges."

Meta-analysis is performed often in such settings, computing weighted averages of estimates for distinct study populations and trial designs. Specifically, meta-analyses often use a random-effects model (DerSimonian and Laird, 1986). The model considers trials to be drawn at random "from a population of possible studies." Then each trial estimates a parameter drawn at random from a population of possible parameters. A weighted average estimates the mean of these parameters.
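
For concreteness, here is a sketch of the usual DerSimonian-Laird computation applied to hypothetical trial-level estimates and sampling variances; the numbers are placeholders, and the sketch is illustrative rather than an endorsement of the method.

```python
import numpy as np

def dersimonian_laird(estimates, variances):
    """Random-effects pooling of trial-level estimates (DerSimonian-Laird).

    estimates: per-trial treatment-effect estimates
    variances: their within-trial sampling variances
    """
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                               # inverse-variance (fixed-effect) weights
    fixed = np.sum(w * y) / np.sum(w)         # fixed-effect pooled estimate
    q = np.sum(w * (y - fixed) ** 2)          # Cochran's Q heterogeneity statistic
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                 # random-effects weights
    pooled = np.sum(w_star * y) / np.sum(w_star)
    return pooled, tau2

# Hypothetical trial estimates (e.g., risk differences) and sampling variances.
print(dersimonian_laird([0.10, 0.03, -0.02, 0.07], [0.004, 0.002, 0.003, 0.001]))
```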

The relevance to clinical practice is obscure. DerSimonian and Laird do not explain what is meant by a population of possible studies, nor why published studies should be considered a random sample from this population. They do not explain how a population of possible studies connects to what matters to a clinician – the distribution of health outcomes across the relevant population of patients.

Manski (2020f) draws on econometric research on partial identification to propose principles for patient-centered meta-analysis. One specifies a prediction of concern and determines what each available study reveals. Given common imperfections in internal and external validity, studies typically yield credible set-valued rather than point predictions. Thus, a study may enable one to conclude that a probability of disease, or mean treatment response, lies within a range of possibilities. Patient-centered meta-analysis would combine the findings of multiple studies by computing the intersection of the set-valued predictions that they yield. See Chapter 3 for further discussion.
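
A minimal sketch of this combination step, using hypothetical interval findings from three studies:

```python
def intersect_intervals(intervals):
    """Combine set-valued findings by intersecting the intervals they yield.

    intervals: list of (lower, upper) bounds, one per study. Returns the
    intersection, or None if the studies are jointly inconsistent.
    """
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    return (lower, upper) if lower <= upper else None

# Hypothetical bounds on mean treatment response reported by three studies.
print(intersect_intervals([(0.10, 0.60), (0.25, 0.70), (0.20, 0.55)]))  # (0.25, 0.55)
```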

2.5 Sacrificing Relevance for Certitude

Researchers often are aware that they cannot form a credible point prediction or estimate of a quantity of interest. They could face up to uncertainty and determine what they can credibly infer about the quantity, perhaps obtaining a bound. However, the lure of incredible certitude being strong, they often respond differently. They change the objective and focus on another quantity that is not of substantive interest but that can be predicted or estimated credibly. Thus, they sacrifice relevance for certitude.

Notable scientists have critiqued this practice. The statistician John Tukey wrote (Tukey, 1962, pp. 13–14): "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." Many cite some version of the joke about the drunk and the lamppost. Noam Chomsky has been quoted as putting it this way (Barsky, 1998, p. 95): "Science is a bit like the joke about the drunk who is looking under a lamppost for a key that he has lost on the other side of the street, because that's where the light is."

Sacrificing relevance for certitude does not imply incredible certitude if everyone understands that the quantity being estimated or predicted is not of substantive interest. The problem is that authors may not be forthright about this, or readers may misinterpret findings. I provide two illustrations, focusing on medical research.

2.5.1 The Odds Ratio and Public Health

In a well-known text on epidemiology, Fleiss (1981) stated that retrospective studies of disease do not yield policy-relevant predictions and so are (p. 92) "necessarily useless from the point of view of public health." Nevertheless, he went on to say that "retrospective studies are eminently valid from the more general point of view of the advancement of knowledge." What Fleiss meant in the first statement is that retrospective studies do not provide data that enable credible point estimation of attributable risk, a quantity of substantive interest in public health. The second statement means that retrospective studies enable credible point estimation of the odds ratio, a quantity that is not of substantive interest but that is widely reported in epidemiological research. I explain here, drawing on Manski (2007a, Chapter 5).

The term retrospective studies refers to a sampling process that is also known to epidemiologists as case-control sampling and to econometricians studying behavior as choice-based sampling (Manski and Lerman, 1977). I call it response-based sampling here, as in Manski (2007a). Formally, consider a population each of whose members is described by covariates x and a response (or outcome) y. Consider inference on the response probabilities P(y|x) when the population is divided into response strata and random samples are drawn from each stratum. This is response-based sampling.

In a simple case prevalent in epidemiology, y is a binary health outcome and x is a binary risk factor. Thus, y=1 if a person becomes ill and y=0 otherwise, while x=1 if the person has the risk factor and x=0 otherwise. In a classic example, y denotes the presence of lung cancer and x denotes whether a person is a smoker. Response-based sampling draws random samples of ill and healthy persons. This reveals the distributions of the risk factor among those who are ill and healthy; that is, P(x|y=1) and P(x|y=0). It does not reveal P(y|x).

A basic concern of research in public health is to learn how the probability of illness varies across persons who do and who do not have a risk factor. Attributable risk is the difference in illness probability between these groups; that is, P(y=1|x=1) − P(y=1|x=0). Another measure of the variation of illness with the risk factor is the ratio P(y=1|x=1)/P(y=1|x=0), called relative risk.

Texts on epidemiology discuss both relative and attributable risk, but empirical research has focused on relative risk. This focus is hard to justify from the perspective of public health. The health impact of a risk factor presumably depends on the number of illness cases it generates; that is, on attributable risk times the size of the population. The relative risk statistic is uninformative about this quantity.

For example, consider two scenarios. In one, the probability of lung cancer conditional on smoking is 0.12 and conditional on nonsmoking is 0.08. In the other, these probabilities are 0.00012 and 0.00008. The relative risk in both scenarios is 1.5. Attributable risk is 0.04 in the first scenario and 0.00004 in the second. The first scenario is clearly much more concerning to public health than the second. The relative risk statistic does not differentiate the scenarios, but attributable risk does.
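The arithmetic of the two scenarios can be checked directly. The short sketch below uses the illness probabilities given in the text; the scenario labels are my own.

```python
# Relative risk and attributable risk in the two scenarios from the text.

def relative_risk(p_ill_exposed, p_ill_unexposed):
    return p_ill_exposed / p_ill_unexposed

def attributable_risk(p_ill_exposed, p_ill_unexposed):
    return p_ill_exposed - p_ill_unexposed

scenarios = {"common disease": (0.12, 0.08), "rare disease": (0.00012, 0.00008)}

for name, (p1, p0) in scenarios.items():
    print(name,
          "relative risk:", round(relative_risk(p1, p0), 2),   # 1.5 in both scenarios
          "attributable risk:", attributable_risk(p1, p0))     # 0.04 versus 0.00004
```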

Given that attributable risk is more relevant to public health, it seems odd that epidemiological research has emphasized relative risk rather than attributable risk. Indeed, the practice has long been criticized; see Berkson (Reference Berkson1958), Fleiss (Reference Fleiss1981, Section 6.3), and Hsieh, Manski, and McFadden (Reference Hsieh, Manski and McFadden1985). The rationale, such as it is, rests on the widespread use in epidemiology of response-based sampling.

The data generated by response-based sampling do not point-identify attributable risk. Fleiss (Reference Fleiss1981) remarked that (p. 92) “retrospective studies are incapable of providing estimates” of attributable risk. Manski (Reference Manski2007a) proved that these data do yield a bound.

Cornfield (Reference Cornfield1951) showed that the data from response-based sampling point-identify the odds ratio, defined as [P(y=1|x=1)/P(y=0|x=1)]/[P(y=1|x=0)/P(y=0|x=0)]. He also observed that when P(y=1) is close to zero, a condition called the “rare-disease” assumption, the odds ratio approximately equals relative risk. The rare-disease assumption is credible when considering some diseases. In such cases, epidemiologists have used the odds ratio as a point estimate of relative risk.
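The following sketch illustrates these facts numerically. The conditional distributions of the risk factor are hypothetical, not taken from any study. Computing the odds ratio from P(x|y=1) and P(x|y=0) uses Cornfield's observation that the odds ratio of y given x equals the odds ratio of x given y. Tracing relative and attributable risk over candidate values of the unknown P(y=1) is a numerical stand-in for the identification analysis discussed above, not a statement of the analytical bound in Manski (Reference Manski2007a).

```python
import numpy as np

# Response-based sampling reveals P(x|y=1) and P(x|y=0) but not P(y=1).
# Hypothetical values: the risk factor is more common among the ill.
px1_given_ill, px1_given_well = 0.60, 0.30

# The odds ratio of y given x equals the odds ratio of x given y, so it is
# point-identified without knowledge of P(y=1).
odds_ratio = ((px1_given_ill / (1 - px1_given_ill)) /
              (px1_given_well / (1 - px1_given_well)))

def risks(p_ill):
    """Return (relative risk, attributable risk) implied by Bayes' theorem
    for a candidate value of the unknown marginal P(y=1)."""
    p_ill_x1 = (px1_given_ill * p_ill /
                (px1_given_ill * p_ill + px1_given_well * (1 - p_ill)))
    p_ill_x0 = ((1 - px1_given_ill) * p_ill /
                ((1 - px1_given_ill) * p_ill + (1 - px1_given_well) * (1 - p_ill)))
    return p_ill_x1 / p_ill_x0, p_ill_x1 - p_ill_x0

print("odds ratio:", round(odds_ratio, 2))            # 3.5
print("RR, AR when P(y=1)=0.001:", risks(0.001))      # RR close to the odds ratio
print("RR, AR when P(y=1)=0.3:", risks(0.3))          # RR far from the odds ratio

# Treating P(y=1) as completely unknown and tracing attributable risk over a
# grid of candidate values gives a numerical approximation to its bound.
grid = np.linspace(0.001, 0.999, 999)
ar = np.array([risks(p)[1] for p in grid])
print("approximate AR bound:", round(ar.min(), 3), "to", round(ar.max(), 3))
```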

Cornfield’s finding motivates the widespread epidemiological practice of using response-based samples to estimate the odds ratio and then invoking the rare-disease assumption to interpret the odds ratio as relative risk. Fleiss’ (Reference Fleiss1981) statement that retrospective studies are (p. 92) “valid from the more general point of view of the advancement of knowledge” endorses this practice. Thus, use of the odds ratio to point-estimate relative risk sacrifices relevance for certitude.

2.5.2 Randomized Trials and the Primacy of Internal Validity

As discussed earlier, randomized trials of treatment response have long enjoyed a favored status in medical research and have increasingly acquired this status in the social sciences. However, the treatment response studied in a trial may differ considerably from the response that a clinician or other planner would find of substantive interest.

Seeking to justify the estimates obtained in trials, researchers in public health and the social sciences often cite Donald Campbell, who distinguished between the internal and external validity of studies of treatment response (Campbell and Stanley, Reference Campbell and Stanley1963; Campbell, Reference Campbell1984). The appeal of randomized trials is their internal validity. Campbell argued that studies of treatment response should be judged first by their internal validity and secondarily by their external validity.

In practice, researchers commonly neglect external validity. Analyses of trials focus on the outcomes measured with the treatments assigned in the study population. Research articles may offer verbal conjectures on external validity in their discussion sections, but they do not assess it quantitatively. Thus, relevance is sacrificed for certitude.

The doctrine of the primacy of internal validity has been extended from randomized trials to observational studies. When considering the design and analysis of observational studies, Campbell and his collaborators recommended that researchers aim to emulate as closely as possible the conditions of a randomized experiment, even if this requires focus on a study population that differs materially from the population of interest.

Among economists, this perspective on observational studies has been championed by those who advocate study of a local average treatment effect (LATE). This is defined as the average treatment effect within the subpopulation of persons whose received treatment would be modified by altering the value of an instrumental variable; see Imbens and Angrist (Reference Imbens and Angrist1994) and Angrist, Imbens, and Rubin (Reference Angrist, Imbens and Rubin1996). Local average treatment effects generally are not quantities of substantive interest; see Manski (Reference Manski1996, Reference Manski2007a), Deaton (Reference Deaton2009), and Heckman and Urzua (Reference Heckman and Urzua2009). Their study has been motivated by the fact that they are point-identified given certain assumptions that are sometimes thought credible.
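For readers unfamiliar with how a LATE is estimated, here is a minimal simulated sketch of the standard Wald ratio, which point-identifies the LATE under the Imbens–Angrist assumptions. The data-generating process, the fraction of compliers, and the effect sizes are hypothetical; the example is meant only to show that the estimand is the average effect for compliers, which can differ from the population-average effect.

```python
import numpy as np

# Hypothetical data-generating process satisfying the Imbens-Angrist
# assumptions (randomly assigned instrument, exclusion, monotonicity).
rng = np.random.default_rng(0)
n = 200_000

z = rng.integers(0, 2, n)                      # binary instrument
complier = rng.random(n) < 0.4                 # 40% compliers, 60% never-takers
d = np.where(complier, z, 0)                   # treatment actually received
effect = np.where(complier, 2.0, 0.5)          # heterogeneous treatment effects
y = 1.0 + effect * d + rng.normal(0, 1, n)     # observed outcome

# Wald ratio: point-identifies the average effect for compliers only.
late_hat = ((y[z == 1].mean() - y[z == 0].mean()) /
            (d[z == 1].mean() - d[z == 0].mean()))
print("estimated LATE:", round(float(late_hat), 2))        # close to 2.0
print("population ATE:", round(float(effect.mean()), 2))   # about 1.1, a different quantity
```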

2.6 Psychological Rationales for Incredible Certitude

I have repeatedly heard colleagues who advise policy makers assert that expression of incredible certitude is necessary because the consumers of their research are psychologically unable or unwilling to cope with uncertainty. They contend that, if they were to express uncertainty, policy makers would either misinterpret findings or not listen at all.

Colleagues sometimes state that “psychologists have shown” that humans cannot deal with uncertainty, without providing citations. What has research in psychology and related fields shown about the ability and willingness of humans to deal with uncertainty? I will discuss several literatures that relate to this question. They do not provide a basis to conclude that expression of incredible certitude is a psychological necessity.

2.6.1 Intolerance of Uncertainty

Clinical psychologists have studied “intolerance of uncertainty” (IU) as a phenomenon associated with the clinical disorder called “generalized anxiety disorder” (GAD). Buhr and Dugas (Reference Buhr and Dugas2009) define IU as follows (Buhr and Dugas, Reference Buhr and Dugas2009, p. 216):

Research has shown that intolerance of uncertainty is a fundamental cognitive process involved in excessive worry and GAD. Intolerance of uncertainty can be viewed as a dispositional characteristic that results from a set of negative beliefs about uncertainty and its implications … and involves the tendency to react negatively on an emotional, cognitive, and behavioral level to uncertain situations and events…. More specifically, individuals who are intolerant of uncertainty find uncertainty stressful and upsetting, believe that uncertainty is negative and should be avoided, and experience difficulties functioning in uncertainty-inducing situations … These individuals find many aspects of life difficult to tolerate given the inherent uncertainties of daily living. They tend to feel threatened in the face of uncertainty and engage in futile attempts to control or eliminate uncertainty.

If IU as defined here were a common occurrence, researchers might have good reason to think that expression of incredible certitude is a psychological necessity. However, it does not appear to be common. I am unaware of estimates of the prevalence of IU, but Kessler and Wittchen (Reference Kessler and Wittchen2002) and Craske and Stein (Reference Craske and Stein2016) give estimates of the prevalence of GAD, a disorder that encompasses IU and much else. Relying on epidemiological surveys from various countries, they report that roughly 4–7 percent and 3–5 percent of persons, respectively, suffer from GAD at some point in their lives. These estimates, to the extent they are accurate, give upper bounds on the lifetime prevalence of IU. If the lifetime prevalence of IU is no more than 3–7 percent, the disorder is too rare for researchers to conclude that incredible certitude is a psychological necessity.

Moreover, IU may be a treatable disorder. Clinical psychologists have developed “intolerance of uncertainty therapy” (IUT) as a treatment. IUT is defined by Van der Heiden, Muris, and van der Molen (Reference Van der Heiden, Muris and van der Molen2012) as follows (Van der Heiden et al., Reference Van der Heiden, Muris and van der Molen2012, p. 103): “IUT focuses on decreasing anxiety and the tendency to worry by helping patients develop the ability to tolerate, cope with, and even accept uncertainty in their everyday lives.” In a randomized trial comparing IUT with other treatments for GAD, these authors find that IUT yields clinically significant reductions in the symptoms of GAD.

2.6.2 Motivated Reasoning Regarding Uncertainty

Now consider the general population; that is, the 93 percent or more of persons who do not have diagnosable IU disorder. Economists studying the general population have commonly maintained a sharp distinction between preferences and beliefs. This distinction is expressed cleanly in the expected utility model. A utility function evaluates the desirability of an action in a specified state of nature. A subjective probability distribution expresses belief about the likelihood of each feasible state.

In contrast, social psychologists commingle preferences and beliefs in various ways. They sometimes use the term motivated reasoning; see Kunda (Reference Kunda1990). Some closing of the gap between economic and social psychological thinking is evident in recent economic work that formalizes the notion of motivated reasoning. See Akerlof and Dickens (Reference Akerlof and Dickens1982), Caplin and Leahy (Reference Caplin and Leahy2001), Brunnermeier and Parker (Reference Brunnermeier and Parker2005), Gollier and Muermann (Reference Gollier and Muermann2010), and Bénabou and Tirole (Reference Bénabou and Tirole2016).

A subset of the work by social psychologists focuses on uncertainty as a motivating force per se. Bar-Anan, Wilson, and Gilbert (Reference Bar–Anan, Wilson and Gilbert2009) put it this way (p. 123): “Uncertainty has both an informational component (a deficit in knowledge) and a subjective component (a feeling of not knowing).” The idea of “a feeling of not knowing” has no interpretation in the expected utility model.

While social psychologists embrace the notion that uncertainty engenders feelings, they have not attained consensus about the nature of those feelings. Citing earlier research, Bar-Anan, Wilson, and Gilbert (Reference Bar–Anan, Wilson and Gilbert2009) initially write that (p. 123) “uncertainty is generally viewed as an aversive state that organisms are motivated to reduce.” This view, if accurate, might give researchers an incentive to express certitude to mitigate the negative feelings that uncertainty generates. However, these authors go on to question the general view, stating (Bar-Anan, Wilson, and Gilbert, Reference Bar–Anan, Wilson and Gilbert2009, p. 123): “In contrast, we propose an uncertainty intensification hypothesis, whereby uncertainty makes unpleasant events more unpleasant (as prevailing theories suggest) but also makes pleasant events more pleasant (contrary to what prevailing theories suggest).” The theme that uncertainty may sometimes be pleasurable is developed further in other papers, including Wilson et al. (Reference Wilson, Centerbar, Kermer and Gilbert2005) and Whitchurch, Wilson, and Gilbert (Reference Whitchurch, Wilson and Gilbert2011).

2.6.3 Expression of Uncertainty in Probability Judgments

Possible evidence for the psychological view that persons are motivated to reduce uncertainty exists within a body of empirical research that asks subjects to place subjective probabilities on the truth of objectively verifiable statements and subjective distributions on the values of objectively measurable quantities. Some studies have reported findings of overconfidence. Combining evidence across multiple experiments, psychologists have found that reported subjective probabilities that statements are true tend to be higher than the frequency with which they are true. Confidence intervals for real-valued quantities tend to be too narrow. The phenomenon has come to be called “overconfidence bias.” Tversky and Kahneman (Reference Tversky and Kahneman1974) and Fischhoff and MacGregor (Reference Fischhoff and MacGregor1982) view overconfidence bias as a well-established and widespread phenomenon.

Nevertheless, the literature on overconfidence bias does not provide a rationale for policy analysts to express incredible certitude. Experimental subjects typically do not manifest bias so extreme as to give responses of 0 or 1 when asked to state subjective probabilities of uncertain events. They commonly give responses that express uncertainty, albeit not as much uncertainty as warranted. Moreover, Gigerenzer, Hoffrage, and Kleinbölting (Reference Gigerenzer, Hoffrage and Kleinbölting1991) and others argue that research findings on overconfidence bias are fragile. They report that subjects often express more uncertainty when they are asked questions with different wording than psychologists have traditionally used.

Further reason to question the prevalence of overconfidence appears in the large body of economic research that elicits subjective probabilities of future personal events from survey respondents. This literature finds substantial heterogeneity in the expectations that persons hold, including the degree to which they express uncertainty. It does not find that respondents are generally overconfident. Review articles by Manski (Reference Manski2004a, Reference Manski2018b) describe the emergence of this field and summarize a range of applications. Review articles by Hurd (Reference Hurd2009), Armantier et al. (Reference Armantier, Bruine de Bruin, Potter, Topa, van der Klaauw and Zafar2013), Delavande (Reference Delavande2014), and Schotter and Trevino (Reference Schotter and Trevino2014) focus on work measuring probabilistic expectations of older persons, inflation, populations in developing countries, and subjects making decisions in lab experiments. See Chapter 4 for further discussion of economic research measuring probabilistic expectations.

2.7 As-If Optimization with Incredible Certitude

A possible rationale for incredible certitude is that it may be useful as a device to simplify decision making under uncertainty. The broad idea, following Simon (Reference Simon1955), is that humans are boundedly rational, in the sense of having computational limitations in cognition. Simon argued that it may be burdensome or infeasible for people to make choices with the decision criteria studied in standard decision theory. He suggested that people use approximations or heuristics to reduce decision effort. One such heuristic, which he called “satisficing,” is to settle for an option that is good enough rather than search for the best.

As discussed in Chapter 1, standard consequentialist decision theory assumes that a decision maker determines the set of undominated actions and uses a reasonable decision criterion to make a choice. However, these tasks may require substantial computational effort. The feasibility of applying these criteria depends on the setting, but they often become less tractable as the sizes of the choice set C and the state space S grow. Maximization of expected utility requires integration of welfare over S and then maximization over C. The maximin and minimax regret criteria require solution of saddle point problems in S and C. The literature in applied decision analysis encounters many cases in which it is infeasible to find exact solutions to these problems, even with modern computers and software. Researchers use numerical or analytical approximations to simplify.

Expressing incredible certitude enables a more extreme simplification than is typically performed in applied decision analysis. One selects a single state of nature, say s*, and optimizes “as if” this is the actual state. Thus, letting w(c, s) denote the welfare yielded by choosing action c when s is the true state, one simply maximizes w(c, s*) over c in C. This is much simpler than the criteria discussed above.

The question is the quality of the decision yielded by as-if optimization. When it yields a unique solution, the choice is necessarily undominated. However, it does not seem possible to say anything further without placing more structure on the decision problem. Depending on the circumstances, as-if optimization may yield relatively high or low expected welfare, minimum welfare, or maximum regret.
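A concrete example makes the point. The sketch below uses a hypothetical welfare table, with numbers chosen purely for illustration, to compare the expected utility, maximin, and minimax-regret choices with the choice made by optimizing as if a single state s* were known to hold.

```python
import numpy as np

# Hypothetical welfare table w[c, s]: rows are actions, columns are two states.
w = np.array([[5.0, 6.0],    # action a: decent welfare in both states
              [9.0, 1.0],    # action b: excellent in state s0, poor in s1
              [2.0, 8.0]])   # action c: poor in s0, good in s1
actions = ["a", "b", "c"]

prior = np.array([0.5, 0.5])                               # subjective state probabilities
eu_choice = actions[int(np.argmax(w @ prior))]             # expected utility -> a
maximin_choice = actions[int(np.argmax(w.min(axis=1)))]    # maximin -> a
regret = w.max(axis=0) - w                                 # regret in each state
mmr_choice = actions[int(np.argmin(regret.max(axis=1)))]   # minimax regret -> a
as_if_choice = actions[int(np.argmax(w[:, 0]))]            # as-if s* = s0 -> b

print(eu_choice, maximin_choice, mmr_choice, as_if_choice)
# Acting as if s0 is the true state selects b, which all three criteria that
# recognize uncertainty rank below a: b has lower minimum welfare (1 versus 5)
# and higher maximum regret (7 versus 4).
```

In this particular table, as-if optimization performs poorly by every uncertainty-respecting criterion; with other tables it may happen to perform well, which is the sense in which nothing general can be said.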

As-if optimization cannot yield some choices that may be attractive if one recognizes uncertainty. In particular, it cannot yield a choice that involves costly information acquisition. If one acts as if the actual state is s*, there exists no relevant information to acquire.

As-if optimization also cannot yield diversification. As will be discussed in Chapter 5, Manski (Reference Manski2009) studied allocation of two treatments to a population and showed that the minimax regret criterion always yields a diversified allocation under uncertainty. In contrast, as-if optimization does not diversify. It allocates the entire population to the treatment that gives the higher welfare in state s*.
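The contrast can be illustrated numerically. The sketch below uses hypothetical welfare bounds rather than the specific setting of Manski (Reference Manski2009), and it finds the minimax-regret allocation by grid search rather than by that paper's derivation; it shows an interior (diversified) allocation, whereas as-if optimization assigns everyone to a single treatment.

```python
import numpy as np

# Hypothetical setting: treatment A has known mean welfare alpha, while the
# mean welfare beta of treatment B is known only to lie in [lo, hi].
alpha, lo, hi = 0.5, 0.2, 0.9

betas = np.linspace(lo, hi, 701)        # grid over states of nature
deltas = np.linspace(0.0, 1.0, 1001)    # grid over fractions assigned to B

# welfare[i, j]: mean population welfare of allocation deltas[i] in state betas[j]
welfare = deltas[:, None] * betas[None, :] + (1 - deltas)[:, None] * alpha
best = np.maximum(betas, alpha)         # welfare of the best allocation in each state
max_regret = (best - welfare).max(axis=1)

print("minimax-regret fraction assigned to B:",
      round(float(deltas[np.argmin(max_regret)]), 3))   # about 0.57, an interior allocation

# As-if optimization at any single beta* assigns everyone to one treatment.
beta_star = 0.8
print("as-if fraction assigned to B:", 1.0 if beta_star > alpha else 0.0)
```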

2.7.1 Using As-If Consensus to Coordinate Collective Decisions: Financial Accounting

An idea similar to as-if optimization is to use “as-if consensus” to simplify collective decision making. As-if consensus means that the members of a community agree to accept a conventional certitude, which asserts that some specified state of nature holds. The motivation is that this eliminates coordination failures that may arise if persons recognize uncertainty and deal with it in different ways. I am aware of one context with a compelling argument for as-if consensus. This is in establishment of rules for financial accounting.

The literature on accounting has long been aware of uncertainties in the estimates that accounting systems make; see Brief (Reference Brief1975) for a historical perspective. The question has been how to deal with uncertainty. The answer has been to propose conventions for producing point estimates and seek to have them widely accepted, the result being as-if consensus.

As-if consensus seems essential when formulating rules for transactions. Without it, parties may not agree on the amounts to be transacted. Consider, for example, the use by the federal government of decennial state-by-state Census population estimates in apportionment of the U.S. House of Representatives and allocation of federal funds across the states. It is recognized that Census population estimates may have various forms of error; see, for example, Seeskin and Spencer (Reference Seeskin and Spencer2015). Nevertheless, apportionment and fund allocation require that the Census Bureau use some convention to produce a point estimate of each state’s population.

The use of point estimates in accounting may be inevitable, but such use does not imply that the producers of these estimates should act as if they are errorless. The conceptual framework for accounting promulgated in Financial Accounting Standards Board (2018) is instructive. The framework calls for accountants to provide a “faithful representation” of financial information, writing (Financial Accounting Standards Board, 2018, p. 4):

Faithful representation does not mean accurate in all respects. Free from error means there are no errors or omissions in the description of the phenomenon, and the process used to produce the reported information has been selected and applied with no errors in the process. In this context, free from error does not mean perfectly accurate in all respects. For example, an estimate of an unobservable price or value cannot be determined to be accurate or inaccurate. However, a representation of that estimate can be faithful if the amount is described clearly and accurately as being an estimate, the nature and limitations of the estimating process are explained, and no errors have been made in selecting and applying an appropriate process for developing the estimate.

I find admirable the way the board defines “free from error.” It does not ask that a financial estimate or prediction be “perfectly accurate in all respects,” which would require incredible certitude. It asks the accountant to describe without error “the process used to produce the reported information” and to explain “the limitations of the estimating process.” Thus, the board calls on accountants to describe uncertainty transparently rather than hide it.
