Review

# The realities of risk-cost-benefit analysis


Science  30 Oct 2015:
Vol. 350, Issue 6260, aaa6516
DOI: 10.1126/science.aaa6516

## Setting policy, knowing risks

Policy-makers often commission formal analyses to estimate the costs, risks, and benefits of proposed projects or policies. Applications range from estimating the risks of commercial nuclear power, to setting priorities among environmental risks, to comparing technologies for generating electricity, to weighing the benefits and risks of prescription drugs. In the United States, analyses are required for all major federal regulations. Fischhoff reviews how such analyses are limited by the scientific and ethical judgments inherent in the process and require collaboration between those who generate the analyses and those who want to use them.

Science, this issue p. 10.1126/science.aaa6516

## Structured Abstract

### BACKGROUND

Synthetic biology, nanotechnology, geoengineering, and other innovative technologies share a property: Their effects must often be inferred long before they are experienced. If those inferences are sound, then informed decisions are possible. If not, then decision-makers may incur risks and costs far greater than any expected benefits. Risk, cost, and benefit analysis can offer transparent ways to assemble and integrate relevant evidence to support complex decision-making. All forms of analysis have the same logic: Decompose complex systems into manageable components and then calculate how they might perform together. All require scientific judgment to bound the set of components and assess the limits to those bounds. All require ethical judgment to determine which outcomes to predict and to extract the policy implications of the results. The usefulness of any analysis depends on how well its underlying assumptions and their implications are understood by those hoping to use its results. The present review uses historical examples to illustrate the roles of judgment in analyses that address four basic questions: (i) How large are the risks from a single technology? (ii) Which risks merit the greatest attention? (iii) Which technology produces the least risk per unit of benefit? (iv) Are a technology’s expected benefits acceptable, given its risks and other expected costs?

Analyses are always incomplete. They neglect concerns that are hard to quantify. They define terms in ways that serve some interests more than others. They consider some sources of uncertainty but not others. Advances in the science of analysis have often occurred after critics unhappy with the results of an analysis challenged the legitimacy of its assumptions. Awareness of the role of judgment in analysis has grown over time, in parallel with improvements in the sophistication of analytical calculations. Progress has been made in some areas, but more is needed, including better ways to model human behavior, elicit expert judgments, articulate decision-makers’ preferences, characterize the robustness of conclusions, and communicate with decision-makers. The practice of analysis draws on the sciences of public participation and science communication, both shaped by the challenges faced in securing a fair hearing for science in issues where it plays a central role.

### OUTLOOK

The pace of advances will depend on the degree of collaboration among the sciences relevant to these problems, including not only the sciences underlying the technology in question but social, behavioral, and economic science as well. How well the science of analysis aids its practice will depend on how well analysts collaborate with decision-makers so as to produce the estimates that decision-makers need and ensure that analytical results are properly understood. Over time, those interactions will help decision-makers understand the capabilities and limitations of analysis while helping analysts become trusted allies, dedicated to producing relevant, properly qualified estimates of cost, risk, and benefit.

## Abstract

Formal analyses can be valuable aids to decision-making if their limits are understood. Those limits arise from the two forms of subjectivity found in all analyses: ethical judgments, made when setting the terms of an analysis, and scientific judgments, made when conducting it. As formal analysis has assumed a larger role in policy decisions, awareness of those judgments has grown, as have methods for making them. The present review traces these developments, using examples that illustrate the issues that arise when designing, executing, and interpreting analyses. It concludes with lessons learned from the science and practice of analysis. One common thread in these lessons is the importance of collaborative processes, whereby analysts and decision-makers educate one another about their respective needs and capabilities.

Formal analyses are often commissioned to estimate the costs, risks, and benefits of projects or policies. As seen below, the range of applications is as diverse as estimating the risks of commercial nuclear power, setting priorities among environmental risks, comparing technologies for generating electricity, and weighing the benefits and risks of prescription drugs. In the United States, analyses are required for all major federal regulations. One current analysis is examining the risks of gain-of-function research for pathogens with pandemic potential (i.e., studying how they could become more potent), hoping to resolve a dispute among biological scientists (1).

Risk, cost, and benefit analyses reflect a strategy of bounded rationality (2). Rather than attempting to address all aspects of a complex decision, such analyses “bound” it, in the sense of ignoring enough of its elements to be able to treat those that remain “rationally.” Typically, that means estimating the expected effect of each decision option by multiplying the size of possible outcomes by their probability of occurring should the option be chosen.
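
As a minimal sketch of the calculation just described, the following Python snippet computes the expected effect of each option as the probability-weighted sum of its possible outcomes. The option names, outcome sizes, and probabilities are invented for illustration and do not come from any analysis discussed in this review.

```python
# Hypothetical options: each maps to (outcome size, probability) pairs.
# Outcome sizes are in arbitrary units of harm; all numbers are illustrative.
options = {
    "option_a": [(0.0, 0.90), (10.0, 0.09), (1000.0, 0.01)],
    "option_b": [(5.0, 0.50), (20.0, 0.50)],
}


def expected_effect(outcomes):
    """Sum of outcome size times probability, after checking probabilities sum to 1."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(size * p for size, p in outcomes)


expected = {name: expected_effect(outcomes) for name, outcomes in options.items()}
```

In this made-up case, the two expected values are close even though only `option_a` carries a small chance of a very large loss. That difference is exactly the kind of information that a single expected value leaves outside the bounds.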

Whether those calculations lead to better decisions depends on how well two sets of judgments are made and understood. One set comprises the ethical judgments involved in defining “risk,” “cost,” and “benefit,” thereby specifying which outcomes are deemed worth estimating. The second set comprises the scientific judgments involved in recruiting and interpreting the evidence used in estimating those outcomes (3–7).

All analyses have potentially controversial formulations, reflecting the ethical judgments underlying them, and uncertain conclusions, reflecting the scientific ones. For example, an ethical judgment determines whether an analysis estimates just the total risks, costs, and benefits for all people affected by a decision or if it also considers distributional effects, reflecting how those outcomes differ across groups of people (e.g., rich people versus poor people or people today versus people tomorrow). A scientific judgment determines whether an analysis considers just physical processes affecting the outcomes (e.g., valve failures and toxic plumes) or also human factors (e.g., how well workers operate equipment or how faithfully patients take medications). To use analyses wisely, decision-makers need to know what judgments were made and how they affected the results.

Awareness of such ethical and scientific judgments has grown slowly over the history of analysis, often emerging when motivated critics claimed to have found flaws in analyses whose results displeased them. The present review uses historical examples to illustrate the roles of judgment in analyses that address four basic questions: (i) How large are the risks from a single technology? (ii) Which risks merit the greatest attention? (iii) Which technology produces the least risk per unit of benefit? and (iv) Are a technology’s expected benefits acceptable, given its risks and other expected costs?

## The science of analysis

Analysis is not a science, in the sense of formulating and evaluating general theories. However, analyses rely on scientific results to guide the judgments needed when setting bounds, calculating estimates, and assessing their robustness. For example, when evaluating a new drug, analysts’ ethical judgments might be informed by research into which benefits and side effects have the greatest effect on patients’ lives; their scientific judgments might be informed by research into how likely patients are to take the drug as prescribed.

Exercising scientific judgment when gathering and interpreting evidence is a task faced by both analysts and scientists. Where analysts’ work differs from that of scientists is in the breadth of their evidence-gathering and the length of their interpretative chains. For example, analysts estimating the environmental impacts of a genetically modified crop must gather evidence regarding ecology, entomology, agronomy, and industrial chemistry, among other things, and then must project the implications of that evidence into a future world with potential changes in climate, land use, trade, and regulation, among other things. Whereas scientists might consider one or two interactions among such factors (e.g., how a longer, drier growing season might affect a pest), analysts might need to consider them all.

The science of analysis develops general methods for performing these tasks. Those methods include procedures for eliciting expert judgments (when observations are lacking), for combining diverse forms of evidence, and for assessing residual uncertainties (3, 7–10). The science of analysis has also developed methods for making ethical judgments (3–5). Those methods include procedures for eliciting individuals’ preferences directly and for inferring those preferences from their behavior (11, 12). As seen in the examples that follow, the science of analysis, like other sciences, has often advanced in response to challenges to controversial results [e.g., (6, 13–15)].

## Analyzing risks from one source: Nuclear power

In 1972, facing public concern over the risks of commercial nuclear power, the Atomic Energy Commission sponsored a probabilistic risk analysis of pressurized water reactors. In 1975, Norman Rasmussen and colleagues delivered WASH-1400, the Reactor Safety Study (16). Building on work in the chemical, aerospace, and other industries (17, 18), WASH-1400 sought to identify all major accident sequences and then calculate the probability of each sequence occurring and the expected number of immediate deaths should that happen. It used both forward-looking event-tree analysis, asking how accidents could arise from normal operations, and backward-looking fault-tree analysis, looking for precursors of possible accidents.
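
The forward-looking, event-tree part of such a study can be sketched in a few lines of Python. The sketch below enumerates every branch that follows a single initiating event, multiplies the branch probabilities to get each sequence's frequency, and sums the expected immediate deaths. The initiating-event frequency, the two safety systems, their failure probabilities, and the death tolls are all hypothetical, and the failures are assumed to be independent; none of the values is taken from WASH-1400.

```python
from itertools import product

# All values below are hypothetical and are not taken from WASH-1400.
INITIATING_EVENT_FREQUENCY = 1e-3   # initiating events per reactor-year
SYSTEM_FAILURE_PROBABILITY = {      # chance each safety system fails on demand
    "emergency_cooling": 1e-2,
    "containment": 1e-1,
}
# Assumed immediate deaths for each combination of failed systems.
DEATHS_BY_FAILED_SYSTEMS = {
    frozenset(): 0.0,
    frozenset({"emergency_cooling"}): 0.0,
    frozenset({"containment"}): 0.0,
    frozenset({"emergency_cooling", "containment"}): 100.0,
}


def accident_sequences():
    """Enumerate every branch of the event tree, treating system failures as independent."""
    systems = list(SYSTEM_FAILURE_PROBABILITY)
    sequences = []
    for outcome in product([False, True], repeat=len(systems)):
        frequency = INITIATING_EVENT_FREQUENCY
        failed = set()
        for system, has_failed in zip(systems, outcome):
            p_fail = SYSTEM_FAILURE_PROBABILITY[system]
            frequency *= p_fail if has_failed else 1.0 - p_fail
            if has_failed:
                failed.add(system)
        sequences.append((frozenset(failed), frequency))
    return sequences


sequences = accident_sequences()
expected_deaths_per_year = sum(
    frequency * DEATHS_BY_FAILED_SYSTEMS[failed] for failed, frequency in sequences
)
```

The result depends entirely on which systems and sequences are included in the tree, so the enumeration is only as complete as the analysts' judgment about what belongs in it.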

WASH-1400 greatly advanced the science of risk analysis. However, rather than resolving the question of nuclear safety, it sparked intense controversy. Reviews from the American Physical Society and the Nuclear Regulatory Commission, examining the scientific judgments underlying WASH-1400 (19, 20), concluded that there was no reason to believe that the report’s risk estimates were biased upward or downward. However, the reviewers were confident that those estimates had been stated with unwarranted certainty. Moreover, they could not say by how much (20). One source of the reviewers’ uncertainty was “that WASH-1400 is inscrutable,…it is very difficult to follow the detailed thread of any calculation through the report” [(20), p. vii].

Even though they could not repeat the study’s calculations, the reviewers could still audit it for structural problems, in the sense of seemingly relevant factors left outside its bounds. Those omissions included risks related to evacuation and manufacturing, as well as “common cause failures,” whereby an initiating event (e.g., a tsunami, earthquake, or terrorist attack) damages systems meant to provide redundant protection. The poor documentation was in itself troubling. If the reviewers could not follow the work, how well could the analysts have stayed on top of it?
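
A small numerical sketch shows why omitting common cause failures matters. It compares the chance that two redundant pumps both fail when their failures are treated as independent with the chance when a single external event can disable both. The pump failure probability and the common-cause probability are hypothetical values chosen only to illustrate the structure of the problem.

```python
# Hypothetical numbers chosen only to illustrate the structure of the problem.
P_PUMP_FAILS = 1e-2      # chance one cooling pump fails on demand, on its own
P_COMMON_CAUSE = 1e-3    # chance of an event (e.g., a flood) that disables both pumps


def p_both_fail_independent(p_single):
    """Both redundant pumps fail, assuming their failures are independent."""
    return p_single ** 2


def p_both_fail_with_common_cause(p_single, p_common):
    """Both pumps fail through the common cause or, failing that, independently."""
    return p_common + (1.0 - p_common) * p_single ** 2


independent = p_both_fail_independent(P_PUMP_FAILS)
with_common_cause = p_both_fail_with_common_cause(P_PUMP_FAILS, P_COMMON_CAUSE)
```

With these assumed values, the common cause dominates the result, so an analysis that left it outside its bounds would understate the chance of losing both pumps by roughly a factor of ten.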

The reviewers commended the study for considering the effects of operator behavior on reactor safety rather than examining only physical factors. Nonetheless, they believed that the analysts’ scientific judgment had overestimated those risks by underestimating “human adaptability during the course of an accident” [(20), p. vi].

The formal analysis of human behavior received a boost, a few years later, as a result of operators’ apparent role in the Three Mile Island accident of 1979. One approach has been human reliability analysis, which applies probabilistic risk analysis to human behavior (21). Such computational methods are potentially useful for estimating failure rates with highly structured tasks, like the assembly-line munitions production for which they were initially developed. Unfortunately, it is another matter to produce quantitative estimates of the risks arising from human factors in cognitively intense tasks, such as the design, operation, and management of complex systems (22–25). On the other hand, when the need to calculate everything is relaxed, the logic of analysis can clarify the tasks facing operators, anticipating and perhaps avoiding problems (22, 26, 27).

Although WASH-1400’s omissions compromised its overall risk estimates, its wide scope still produced “increased understanding of the full spectrum of reactor accident sequences [with] implications for nuclear power plant design, siting, and planning for mitigation of consequences” [(20), p. ix]. Thus, the study may have identified ways to make nuclear power safer, even though it could not establish how safe the technology was overall. Assessing the relative risk of alternative designs is a more tractable task analytically than estimating any one design’s absolute risk level (28). Comparing designs requires examining only those elements where they differ. Estimating any single design’s overall risk requires examining every element in every accident sequence, however hard to quantify (e.g., terrorists’ plans for attacking reactors).

Although it focused on the study’s scientific judgments, the review also questioned the ethical judgments embodied in its definition of risk, noting that evaluating the “acceptability of nuclear reactors solely on the risk of early fatalities, and latent health effects, and property damage for Class 9 [major] accidents is inappropriate. All the issues associated with the nuclear fuel cycle are important, including economic and environmental matters, and weapons proliferation” [(20), p. 40]. Failure to recognize these limits may have contributed to “instances in which WASH-1400 has been misused…to judge the acceptability of reactor risks” [(20), p. x], thereby neglecting the full risks, costs, and benefits of generating electricity with nuclear power and alternative technologies.

Advances in the theory and application of probabilistic risk analysis can be found in any issue of Risk Analysis, Reliability Engineering and System Safety, IEEE Transactions on Reliability, and related journals. All forms of analysis have the same logic: Decompose complex systems into manageable components and then calculate how they might perform together. All require scientific judgment to bound the set of components and assess the limits to those bounds. All require ethical judgment to determine which outcomes to predict and to extract the policy implications of the results.

Examples of the progress possible when assumptions are clearly documented can be found in the peer-reviewed assessments of microbial risks in a recent issue of Risk Analysis. Results there include the possibility of treating municipal wastewater well enough to use for irrigating vegetables (29) and the impossibility of treating victims as a way to eliminate the parasite responsible for schistosomiasis (30). Clear documentation can also reveal fundamental flaws, as when an external review (31) concluded that the Department of Homeland Security’s 2006 Bioterrorism Risk Assessment (32) “should not be used” (italics in original), given such problems as using estimates “not supported by any existing data,” omitting “economic loss and environmental and agricultural effects,” and lacking “a realistic representation of the behavior of an intelligent adversary.” The review notes that, for any revision to be useful to decision-makers, “documentation should be sufficient for scientific peer review” [(31), pp. 3–5].

## Analyzing risks from multiple sources: Priority setting

The Reactor Safety Study made an explicit ethical judgment in omitting risks related to nuclear proliferation (among others). Some such screening of relevant outcomes is required when bounding any analysis—a process that must balance the risk of looking at so many issues that none are understood well against the risk of ignoring issues that would prove important were they to receive proper attention.

The U.S. Environmental Protection Agency addressed this challenge in the late 1980s and early 1990s in a series of risk-ranking exercises conducted with its staff (33), with its Scientific Advisory Board (34), and, eventually, with citizens from many states and regions (35). In these exercises, participants chose the risks (e.g., infectious disease and urban sprawl) and the valued outcomes (e.g., morbidity, mortality, economic development, and, in one case, the Vermont way of life). Analysts then roughly estimated each outcome for each risk, after which participants compared those expected outcomes in order to set priorities for future analysis and action. Thus, analysts’ work was driven by policy-makers’ concerns.

The success of risk-ranking depends on the scientific judgment involved in identifying potentially relevant outcomes and providing the initial estimates. How much experts know about each topic depends on the state of the science. The usefulness of their knowledge depends on how well they can translate it into the terms that decision-makers need. Since the Reactor Safety Study, collaborations between behavioral and decision scientists have developed procedures for expert elicitation (6–10), designed to structure that translation process so as to reduce judgmental biases such as overconfidence and anchoring (36, 37). For example, to avoid the ambiguity of verbal quantifiers (e.g., “common” side effect, “small” risk) (38), these procedures elicit numerical expressions (e.g., “there is a 95% chance of between 5 and 50 people dying in coal mining accidents next year”) (39). Expert elicitation has been used in domains as diverse as ocean acidification (40), nuclear power (41), and genetically modified crops (42).
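
One advantage of a numerical expression is that it can be carried into later calculations. The sketch below shows one possible way to do that, which is an assumption made here rather than a procedure described in the elicitation literature cited above: it treats the example interval of 5 to 50 deaths as the central 95% interval of a lognormal distribution and recovers that distribution's parameters. The choice of a lognormal shape is itself a scientific judgment that an analyst would need to defend.

```python
import math
from statistics import NormalDist


def lognormal_from_interval(lower, upper, coverage=0.95):
    """Return (mu, sigma) of ln(X) so that [lower, upper] is the central `coverage` interval."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95%
    mu = (math.log(lower) + math.log(upper)) / 2.0
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return mu, sigma


# Interval taken from the example statement: a 95% chance of 5 to 50 deaths.
mu, sigma = lognormal_from_interval(5.0, 50.0)
median_deaths = math.exp(mu)

# Check: probability mass the fitted distribution places inside the stated interval.
log_deaths = NormalDist(mu, sigma)
mass_inside = log_deaths.cdf(math.log(50.0)) - log_deaths.cdf(math.log(5.0))
```

Under this assumed shape, the implied median is the geometric mean of the two bounds (about 16 deaths), which sits well below the midpoint of the interval.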

The success of risk-ranking also depends on the ethical judgments made in defining its key terms (35, 43). For example, “risk of death” could be defined as “probability of premature mortality” or “expected life-years lost.” The former definition treats all deaths as equal, whereas the latter gives extra weight to deaths of young people (who lose more years of expected life) (43, 44). In the United States, similar numbers of people die annually from accidents and chronic lower respiratory diseases (45). By the first definition, the two risks are similar; by the second, accidents are a greater threat (because they disproportionately affect younger people).
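
The effect of the definition can be seen in a short sketch that applies both to the same data. The two causes below have equal numbers of deaths but different age profiles. The death counts, the ages, and the assumed life expectancy of 80 years are hypothetical and are not actual U.S. mortality statistics.

```python
# Hypothetical deaths by age for two causes with equal totals; not real mortality data.
ASSUMED_LIFE_EXPECTANCY = 80  # years; a simplifying assumption
deaths_by_age = {
    "accidents": {25: 60, 45: 30, 75: 10},
    "chronic_respiratory_disease": {45: 5, 65: 25, 75: 70},
}


def total_deaths(by_age):
    """First definition: every death counts equally."""
    return sum(by_age.values())


def life_years_lost(by_age, life_expectancy=ASSUMED_LIFE_EXPECTANCY):
    """Second definition: each death is weighted by the years of expected life lost."""
    return sum(max(life_expectancy - age, 0) * n for age, n in by_age.items())


summary = {
    cause: (total_deaths(by_age), life_years_lost(by_age))
    for cause, by_age in deaths_by_age.items()
}
```

With these invented numbers, the two causes tie on the first measure, whereas accidents account for several times as many lost life-years on the second.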

One could distinguish among deaths for other reasons, too. For example, Starr’s seminal article on risk analysis (46) proposed that people treat voluntary and involuntary risks differently (e.g., skiing versus electric power). Therefore, voluntariness should be part of the definition of risk. Subsequent research has identified other features that might matter to people, such as how equitably risks are distributed (e.g., over people and over time), how well risks are understood (by scientists and by those exposed to the risks), and how much dread (and psychological discomfort) risks evoke (5, 46–48).

Understanding Risk, an influential National Research Council (49) report, proposed an analytical-deliberative process for defining “risk,” thereby deciding which features to consider and ignore. In that process, stakeholders state their concerns. Analysts then propose a formal expression that is precise enough to be used in quantitative analyses. Stakeholders deliberate that proposal in terms of whether it captures their intent, iterating as needed with the analysts, in the manner of EPA’s risk-ranking exercises.

Having stakeholders define the terms of an analysis tailors it to their needs. However, it also limits comparisons across analyses, if each has its own definition of “risk” (or “cost” or “benefit”) (50). Figure 1 shows a standard definition protocol endorsed by the U.K. government’s economic and finance ministry (51). On the left are two outcomes that might be monetized as how much people should be willing to pay to eliminate them. On the right are six societal concerns, representing reasons people might view risks with otherwise similar outcomes differently (35, 46–49). The format invites discussion of whether concerns that affect feelings should also affect policies (e.g., “I dislike risks that are unfamiliar to me, but what matters is how well scientists understand them”) (52). The format also legitimates asking about the importance of nonmonetary concerns (e.g., are death and harm worse when equity is violated, in the sense that the people who bear the risk are not those who stand to benefit?). This protocol was illustrated in the context of whether to treat all roadway accidents similarly or, for example, to give added weight to those affecting children (51).

The success of an analytical-deliberative process is an empirical question (11, 12, 53, 54). One measure of that success is how well the process enables participants to understand the risks and develop stable preferences among them (e.g., can they make sound inferences based on what they know? Do their preferences change when additional perspectives are suggested?). A second measure is whether a process leads to fewer, but better, conflicts, by focusing participants on genuine disagreements and avoiding ones that arise from misunderstanding (e.g., can participants describe their opponents’ position, even when they reject it?). Rather than seeking consensus, these processes accept the legitimacy of informed differences and try to articulate their policy implications. For practical purposes, it might be enough for participants to agree about which risks rank highest, and hence deserve attention, and which rank lowest, and hence can be set aside (55). If systematic prioritization fails these tests, then decision-makers might be better off “muddling through,” in the sense of tackling problems as they arise and reshaping their priorities as they go along (56–58). Proponents of deliberative democracy (59) study how such focused processes compare to conventional (“aggregative”) policy-making. For example, the Energy Systems Project was a national consultation that found perhaps surprising agreement in its diverse participants’ visions for U.K. energy futures (60).

## Analyzing risks per unit of benefit

Risk decisions are rarely about risks alone. Large risks may be acceptable if they bring large benefits and there are no good ways to reduce them. Small risks may be unacceptable if they bring small benefits or could be easily reduced (3, 43, 44, 61).

Thus, making sound decisions means comparing the expected risks, costs, and benefits of the available options. In an early step toward informing choices among ways to generate electricity, Inhaber (62) estimated their “risk per megawatt-year of electric power.” He defined “risk” as the number of workdays lost from injury and the number of deaths, treating all deaths as equal. Predictably, given the stakes, his work met vigorous criticism on both ethical and scientific grounds (63) and was followed by more ambitious analyses (64–66).
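
A risk-per-unit-of-benefit measure of this kind amounts to dividing each risk measure by the electricity produced. The sketch below illustrates the normalization for two unnamed technologies. The death counts, workdays lost, and output figures are hypothetical and are not Inhaber's estimates.

```python
# Hypothetical figures for two unnamed technologies; not Inhaber's estimates.
technologies = {
    "technology_a": {"deaths": 2.0, "workdays_lost": 4000.0, "megawatt_years": 1000.0},
    "technology_b": {"deaths": 0.5, "workdays_lost": 9000.0, "megawatt_years": 500.0},
}


def risk_per_megawatt_year(record):
    """Normalize each risk measure by the electricity produced."""
    output = record["megawatt_years"]
    return {
        "deaths": record["deaths"] / output,
        "workdays_lost": record["workdays_lost"] / output,
    }


normalized = {name: risk_per_megawatt_year(r) for name, r in technologies.items()}
```

In this invented case, `technology_a` looks worse on deaths per megawatt-year, whereas `technology_b` looks worse on workdays lost, so the ranking depends on how "risk" is defined and how the two measures are weighed.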

Such calculations are special cases of life-cycle analysis, which tries to account for all of the energy and materials involved with creating, using, and disposing of products, processes, or services (67, 68). That accounting depends on the bounds of the analysis, including how far it goes upstream (e.g., does it consider the energy and material embodied in equipment?) and downstream (e.g., does it include methane releases from landfills?). Setting those bounds requires scientific judgments (e.g., is the risk from terrorist attacks large enough to include in the analysis?) and ethical ones (e.g., do effects in other countries matter?).

Once the bounds are set and expected outcomes estimated within them, decision-makers must weigh those outcomes against one another. Cost-benefit analysis (69) does that by translating all outcomes into a common unit: money. In the United States, cost-benefit analysis got one push from a mandate to monetize the effects of water projects (70) and another from President Reagan’s Executive Order 12291 (71) requiring analysis of the “net benefit to society” of “major rules” and “alternative approaches that could substantially achieve the same regulatory goal.”
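
Once every outcome has been expressed in money, comparing alternatives reduces to simple arithmetic. The sketch below computes the net benefit of a proposed rule, an alternative approach, and no action. The alternative names and all monetized values are hypothetical; the hard part of a real analysis lies in producing such numbers, not in subtracting them.

```python
# Hypothetical monetized effects, in millions of dollars per year.
alternatives = {
    "proposed_rule": {"benefits": [120.0, 15.0], "costs": [90.0, 5.0]},
    "alternative_approach": {"benefits": [100.0], "costs": [50.0, 5.0]},
    "no_action": {"benefits": [], "costs": []},
}


def net_benefit(effects):
    """Total monetized benefits minus total monetized costs."""
    return sum(effects["benefits"]) - sum(effects["costs"])


net_benefits = {name: net_benefit(effects) for name, effects in alternatives.items()}
best_alternative = max(net_benefits, key=net_benefits.get)
```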

As an example of such analyses (and their assumptions), economists from the U.S. Environmental Protection Agency in 2014 contrasted the costs of compliance estimated before and after implementing various regulatory rules (e.g., the 2001 National Primary Drinking Water Regulations for Arsenic and the 1998 Locomotive Emission Standards) (72, 73). Observing that the actual costs were generally less than the predicted ones, these analysts attributed that bias to innovative cost-cutting (spurred by the regulations), incomplete compliance, and initial estimates based on industry data that deliberately overestimated anticipated costs. They also lamented the inconsistent bounds that complicated comparing analyses (e.g., some considered just capital costs, whereas others included both capital and operating costs).

Market prices provide one source of guidance for monetization, whose interpretation requires both scientific and ethical judgments. For example, car prices reflect their benefit to buyers. If a buyer gets a bargain, then there is consumer surplus, and the price is less than the car’s worth. If a market suffers from monopoly pricing, misleading advertising, or questionable loan practices, then a car’s price might not capture its worth (or full cost). In such cases, using that price might bias estimates (a question of science) or endorse unfair practices (a question of ethics).

For outcomes that are not traded in efficient markets, other measures of value are possible. For example, tax credits (e.g., for fuel efficiency) might be taken as capturing benefits to society beyond those received by consumers. If the bounds of an analysis include the externalized costs of driving (i.e., those imposed on others), then those costs might be partially captured by hospital bills for lung disease (due to particulate matter) and aspirin sales (due to tropospheric ozone) (69, 70). The travel-cost method monetizes destinations by what people pay to see them (74, 75). Thus, the benefits of wilderness areas are partially captured by visitors’ entry fees, fuel and lodging costs, and outfitting expenses. The cost of preserving those areas might be partially captured by the economic benefits promised by would-be developers. One tongue-in-cheek analysis applies travel-cost method assumptions made in one analysis (the Roskill Commission) to justify leveling Westminster in favor of a third London airport, given the time that travelers would save in getting to such a central location (76). Ecosystem services analyses assess the economic benefits of natural systems, such as the flood protection provided by marshes and barrier islands (77).

Human life, a focus of many risk analyses, is, of course, not traded directly in any legitimate marketplace. One possible way to monetize the value of human life is in terms of wage premiums for riskier jobs (e.g., receiving $X for assuming a Y% increase in premature death) (78–80). Such analyses require scientific judgments in order to hold constant factors that can make entry-level jobs riskier than better-paid senior ones and the ethical judgment of accepting wages as measuring individuals’ worth (81). A common value in current U.S. regulatory analyses is $6 to $7 million per life. Considering lost life-years (and not lost lives) has been criticized on ethical grounds (51), sometimes as applying a “senior discount” (82).
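
The arithmetic behind the wage-premium approach can be sketched as follows: divide the premium by the added probability of death to obtain the implied value per life, often called the value of a statistical life. The premium and risk increment below are hypothetical, chosen so that the result falls in the range quoted above.

```python
def value_of_statistical_life(wage_premium_dollars, added_risk_percent):
    """Implied value per life: wage premium divided by the added probability of death."""
    added_probability = added_risk_percent / 100.0
    if added_probability <= 0:
        raise ValueError("added risk must be positive")
    return wage_premium_dollars / added_probability


# Hypothetical: workers accept $650 per year for an added 0.01% annual chance of death.
implied_value = value_of_statistical_life(650.0, 0.01)
```

The calculation takes the premium and the risk increment at face value. It does not address the judgments noted above, such as whether other job characteristics have been held constant.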

These approaches analyze the preferences “revealed” in market transactions. In the absence of such transactions, stated preference methods ask people how much they are willing to pay to gain an outcome [or, less commonly, how much they are willing to accept for losing it (83)]. One variant, the contingent valuation method, asks respondents to imagine a market offering a transaction that links money to a valued outcome (e.g., the opportunity to pay to preserve an endangered species) (84). Its use in monetizing damages from the Exxon Valdez grounding sparked lively debate (85, 86).

One recurrent topic for stated preference methods is whether to use surveys, which ask people to answer questions on their own, or interactive methods, where moderators help people think through their answers, as with risk-ranking or decision analysis (35, 87). Surveys produce biased estimates when respondents fail to think of all relevant perspectives by themselves. Interactive methods produce biased estimates when moderators fail to present all perspectives fairly (11, 12, 35, 87–89). A second recurrent topic is how people deal with missing details. For example, if a question says nothing about distributional effects, do respondents try to guess what those might be, feel implicit pressure to ignore them, or resent the implication that equity does not matter?

## Analyzing risks and benefits: Evaluating medical treatments

When the U.S. Food and Drug Administration (FDA) decides whether to approve drugs, the diverse, uncertain benefits and risks defy monetization. Nonetheless, manufacturers still want predictability in FDA’s rulings. Recognizing that desire, FDA has committed to developing a more explicit form of analysis than its traditional reliance on expert judgment, whereby its reviewers consult with one another and FDA advisory panels before rendering a narrative summary of their conclusions (90).

One potential approach to greater predictability is cost-effectiveness analysis, developed to allocate resources among approved medical treatments (91). It typically measures expected health benefits in quality-adjusted life-years (QALYs) or similar units (92, 93). Because those benefits represent reduced risks (of ill health), measuring them faces the same issues as measuring “risk.” Ethically, looking at years of life saved should be uncontroversial: More years are better. However, considering the quality of those years means placing less value on years of ill health (and, perhaps implicitly, on individuals experiencing them). Scientifically, analysts must judge how well people can answer QALY questions such as “How many years of perfect health would be equivalent to 10 years with limited mobility?”
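
The following sketch shows how an answer to such a question could feed a cost-effectiveness calculation. It turns a respondent's answer into a quality weight, computes QALYs with and without a treatment, and divides the treatment's cost by the QALYs gained. The respondent's answer (7 years), the post-treatment weight (0.9), and the cost ($50,000) are hypothetical.

```python
def quality_weight(equivalent_healthy_years, years_in_state):
    """Quality weight: healthy years judged equivalent, per year spent in the state."""
    if not 0 <= equivalent_healthy_years <= years_in_state:
        raise ValueError("equivalent years must lie between 0 and the years in the state")
    return equivalent_healthy_years / years_in_state


def qalys(years, weight):
    """Quality-adjusted life-years: years lived times the quality weight."""
    return years * weight


# Hypothetical answer: 10 years with limited mobility is judged equal to 7 healthy years.
weight_limited_mobility = quality_weight(7.0, 10.0)

# Hypothetical treatment: raises the weight to 0.9 for the same 10 years, at a cost of $50,000.
qalys_without = qalys(10.0, weight_limited_mobility)
qalys_with = qalys(10.0, 0.9)
cost_per_qaly_gained = 50_000.0 / (qalys_with - qalys_without)
```

Every number downstream inherits whatever error lies in the respondent's answer, which is why the validity of those answers is a scientific question in its own right.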

With any stated preference method, respondents must first understand the outcomes (how mobile will I be?), then imagine the subjective experience (how will I tolerate the lack of mobility?), and finally translate those feelings into allowable answers (equivalent years of perfect health). Evaluating respondents’ success requires scientific judgment of the construct validity of their responses. That is, to what extent are they sensitive to relevant features of questions (e.g., whether the cure rate is 30% or 50%) (94) and insensitive to irrelevant ones (e.g., whether the question asks about the probability of death or the complementary probability of survival) (95, 96)?

Although QALY-like approaches have strong advocates (97, 98), FDA has adopted a benefit-risk framework (Fig. 2) that does not translate all outcomes into a common unit but leaves them in their natural units (e.g., survival rates and mobility). The left-hand side of Fig. 2 summarizes FDA’s analysis of the evidence (99). The right-hand side interprets those findings in terms of FDA’s regulatory mandate. The bottom box explains FDA’s weighing of the expected risks and benefits, given the nature of the medical condition, the unmet medical need (with other treatments), and the plans for managing residual risks if the product is approved. The top row of the framework addresses a scientific and ethical question facing any analysis that makes choices on others’ behalf: Do the revealed or stated preferences used in the analysis adequately represent those individuals’ concerns? “Analysis of condition” is meant to convey a feeling for, say, life with the psychological sequelae of sickle cell disease or the incapacitating constipation that can accompany irritable bowel syndrome. FDA’s Voice of the Patient initiative (100) seeks to inform that summary by hearing directly from patients and advocates.

Preserving the reality of patients’ experience is one reason for leaving estimated risks and benefits in their natural units, rather than using standard measures of benefit and risk (e.g., QALYs). Those estimates, along with the narrative explaining FDA’s approval decision, preserve some of the richness in its reviewers’ intense deliberations over the evidence. A second reason for not using standard measures is that FDA recognizes that patients’ preferences may differ. As a result, it provides them with the inputs needed to make personally relevant choices. How people make such difficult trade-offs (e.g., when drugs offer both benefits and risks) is an active area of research (88, 89, 94–96, 101).

Thus, instead of a computed weighting of risks and benefits expressed in a common unit, FDA offers a narrative weighing of estimated outcomes. As a result, FDA’s regulatory decisions are less predictable than they would be were FDA bound by a calculation. However, if the retained richness of the summaries helps FDA to construct more stable preferences and to explain them more fully, then more predictable policies should reveal themselves over time, without calculations (102, 103).

## Communication and coordination

Sound analysis requires sound communication between the analysts and the stakeholders whom they serve. Analysts need to know which issues matter to stakeholders so that they can make appropriate ethical judgments when bounding analyses and defining their terms. Stakeholders need to know what scientific judgments analysts made when conducting analyses so that they can decide how much to rely on their conclusions.

Given the social, professional, and institutional distance between most analysts and stakeholders, such communication requires deliberate coordination, with a structured process like that in Fig. 3 (104), whose features echo those of many related proposals (11, 12, 35, 48–54, 105, 106). Its horizontal arrows imply continuing engagement, offering stakeholders an opportunity to hear and be heard, at each stage of the process. The vertical arrows imply continuing reflection by analysts on the progress of the work, with the opportunity and obligation to decide whether to continue, revise, or abandon it.

Among the examples above, EPA’s risk-ranking exercises come closest to such a process, whereas Inhaber’s analysis of the risks of generating electricity is the furthest away, guided solely by his intuitions about public concerns. Although stakeholder input was not part of WASH-1400, the recent Blue Ribbon Commission on America’s Nuclear Future called for proactive engagement (106). FDA’s framework was developed for use by its staff to protect the confidentiality of clinical trial data. However, its implementation requires stakeholder input.

Thus, the social context of analyses varies and, with it, the collaborative process that is needed and possible. Within those constraints, the science of stakeholder participation can provide guidance (11, 12, 107). It emphasizes creating respectful relationships, whereby stakeholders need not struggle to hear or be heard. That science has evolved from a simplistic Decide-Announce-Defend model, delivering analysts’ conclusions, to sustained, two-way communications, with stakeholders helping to define and interpret analyses. Recent examples of such processes include consultations over British energy policy (60), Cultus Lake (British Columbia) salmon (11), and the development of Zurich North (108).

The science of science communication provides guidance on how to create the content for such processes (109–111). Research here began with (what is now called) a “deficit model,” whereby experts decide what knowledge people need to be deemed “literate” (about science, energy, climate, finance, and so on). The research has evolved to user-centered models, which begin by analyzing the decisions facing stakeholders, in order to identify the few things that they most need to know from among all those things that it might be nice to know. The work then proceeds by drafting messages designed to close critical gaps, testing their efficacy, and repeating as necessary until people are adequately informed.

Applying the science of communication to the communication of science requires overcoming three common barriers. One is the natural tendency for people to overestimate how well they know what other people are thinking and vice versa (112). As a result, people fail to learn enough about their audience to communicate effectively, and then may blame the audience when their message inexplicably does not get through. A second barrier is casual analysis of stakeholders’ information needs. For example, different facts about storm surges may be critical for the same individual when deciding whether to evacuate, buy a home, or support zoning rules. It takes analysis to determine the facts that people facing each decision need (39, 110, 113). A third barrier to effective communication is many experts’ conviction that lay people cannot grasp technical information. So why bother trying? The limits to lay judgments are, indeed, well documented (22, 36, 37, 47). However, that research also identifies ways to address these limits (39, 107–114). Moreover, for securing public trust, even poor communication may be better than silence, by showing respect for the public’s right to know, even when meeting it clumsily (4, 11, 12, 48, 53).

Two-way communication might increase the cost of analysis. However, it can also increase its expected benefit by forestalling the radical skepticism that can poison public discourse once science loses public trust (115–118). At the extreme, stakeholders may reject analysis per se, as when they advocate precautionary principles that proscribe highly uncertain actions with great potential risks (119, 120). These principles reject a tenet of rational analysis, namely, that any risk can be acceptable, given enough compensating benefit. By offering simple decision rules, precautionary principles may also provide a way to address a seeming imbalance of power, when stakeholders feel excluded from setting the terms of an analysis or without the resources to check its calculations (4, 28).
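The contrast between the two kinds of decision rule can be made concrete with a small sketch. It is an illustration under stated assumptions, not a formalization drawn from the cited literature: the `Option` fields, the 0-to-1 uncertainty score, and both thresholds are hypothetical, and real precautionary principles are stated in many different ways.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    expected_benefit: float  # in some common unit (hypothetical)
    expected_harm: float     # same unit as expected_benefit
    worst_case_harm: float   # largest harm considered plausible
    uncertainty: float       # 0 (well understood) to 1 (highly uncertain)


def expected_value_rule(option: Option) -> bool:
    """Accept whenever expected benefit exceeds expected harm."""
    return option.expected_benefit > option.expected_harm


def precautionary_rule(option: Option, harm_limit: float,
                       uncertainty_limit: float) -> bool:
    """Reject highly uncertain options with great potential harm,
    regardless of how large the expected benefit is."""
    too_risky = (option.worst_case_harm > harm_limit
                 and option.uncertainty > uncertainty_limit)
    return not too_risky


# Hypothetical technology: attractive on average, but poorly understood
# and with a severe worst case.
novel = Option("novel technology", expected_benefit=100.0,
               expected_harm=20.0, worst_case_harm=1000.0, uncertainty=0.9)

print(expected_value_rule(novel))                                     # True
print(precautionary_rule(novel, harm_limit=500.0, uncertainty_limit=0.5))  # False
```

Under the precautionary rule as sketched, no increase in `expected_benefit` changes the verdict, which is the sense in which such rules reject the tenet that any risk can be acceptable given enough compensating benefit. The rule is also simple enough to check without access to the full analysis.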

## Conclusion: Skill and wisdom in analysis

The science of analysis has seen advances in both the sophistication of its calculations and the awareness of the ethical and scientific judgments that they entail. It has also developed better ways to integrate behavioral science knowledge when communicating with stakeholders, eliciting expert knowledge, assessing preferences, and anticipating the effects of behavior (e.g., of operators or patients). Often, these advances emerged from controversies that revealed limits to analytical conventions, such as how problems are bounded or “risk” is defined. In other cases, the advances emerged from analysts grappling with the unique properties of new problems, as might be expected in future work analyzing the expected effects of hydrofracking, nanobeads, robotic surgery, commercial drones, or gain-of-function research on pathogens with pandemic potential.

To realize the potential of their science, analysts must create partnerships with the stakeholders who depend on their work, so that analyses are relevant and understood. To that end, analysts’ professional standards should require full disclosure of the scientific and ethical judgments made in formulating and executing analyses. However, those disclosures may not satisfy stakeholders unless preceded by continuing engagement with them. Together, analysts and stakeholders need to determine how best to invest analytical resources, resolve definitional issues, and interpret the resulting estimates. Through interactions like those depicted in Fig. 3, analysts can share their expertise while creating the mutually respectful personal relations needed to secure a trusted hearing for their work.

## References and Notes

1. For an example of the left-hand (scientific judgment) side of the framework, applied to a cancer drug given accelerated approval, see (121).
2. Acknowledgments: Preparation of this article was supported by the National Science Foundation (SES-0949710) Center for Climate and Energy Decision Making and the Swedish Foundation for the Humanities and the Social Sciences (Riksbankens Jubileumsfond) Program on Science and Proven Experience. It is gratefully acknowledged, as are the comments of D. Caruso, S. Eggers, A. Finkel, N.-E. Sahlin, and three anonymous reviewers. The views expressed are those of the author.