Report

Failure to Detect Mismatches Between Intention and Outcome in a Simple Decision Task

Science  07 Oct 2005:
Vol. 310, Issue 5745, pp. 116-119
DOI: 10.1126/science.1111709

Abstract

A fundamental assumption of theories of decision making is that we detect mismatches between intention and outcome, adjust our behavior in the face of error, and adapt to changing circumstances. Is this always the case? We investigated the relation between intention, choice, and introspection. Participants made choices between presented pairs of faces on the basis of attractiveness, while we covertly manipulated the relationship between choice and outcome that they experienced. Participants failed to notice conspicuous mismatches between their intended choice and the outcome they were presented with, while nevertheless offering introspectively derived reasons for why they chose the way they did. We call this effect choice blindness.

A fundamental assumption of theories of decision making is that intentions and outcomes form a tight loop (1). The ability to monitor and to compare the outcome of our choices with prior intentions and goals is seen to be critical for adaptive behavior (2–4). This type of cognitive control has been studied extensively, and it has been proposed that intentions work by way of forward models (5) that enable us to simulate the feedback from our choices and actions even before we execute them (6, 7).

However, in studies of cognitive control, the intentions are often tightly specified by the task at hand (8–10). Although important in itself, this type of research may not tell us much about natural environments, where intentions are plentiful and obscure and where the actual need for monitoring is unknown. Despite all its shortcomings, the world is in many ways a forgiving place in which to implement our decisions. Mismatches between intention and outcome are surely possible, but when we reach for a bottle of beer, we very seldom end up with a glass of milk in our hands. But what if the world were less forgiving? What if it instead conspired to create discrepancies between the choices we make and the feedback we get? Would we always be able to tell if an error were made? And if not, what would we think, and what would we say?

To examine these questions, we created a choice experiment that permitted us to surreptitiously manipulate the relationship between choice and outcome that our participants experienced. We showed picture pairs of female faces to 120 participants (70 female) and asked them to choose which face in each pair they found most attractive. On some trials, immediately after their choice, they were asked to verbally describe the reasons for choosing the way they did. Unknown to the participants, on certain trials, a double-card ploy was used to covertly exchange one face for the other (Fig. 1). Thus, on these trials, the outcome of the choice became the opposite of what they intended. Each participant completed a sequence of 15 face pairs, three of which were manipulated (M). The M face pairs always appeared at the same position in the sequence, and for each of these pairs, participants were asked to state the reasons behind their choice. Verbal reports were also solicited for three trials of nonmanipulated (NM) pairs (11).

Fig. 1.

A snapshot sequence of the choice procedure during a manipulation trial. (A) Participants are shown two pictures of female faces and asked to choose which one they find most attractive. Unknown to the participants, a second card depicting the opposite face is concealed behind the visible alternatives. (B) Participants indicate their choice by pointing at the face they prefer the most. (C) The experimenter flips down the pictures and slides the hidden picture over to the participants, covering the previously shown picture with the sleeve of his moving arm. (D) Participants pick up the picture and are immediately asked to explain why they chose the way they did.

The experiment employed a 3-by-2 between-group factorial design, with deliberation time and similarity of the face pairs as factors. For time, three choice conditions were included: one with 2 s of deliberation time, one with 5 s, and one in which participants could take as much time as they liked. Participants generally feel that they are able to form an opinion given 2 s of deliberation time (supporting online text). Nevertheless, the free deliberation time condition was included to provide an individual criterion of choice. For similarity, we created two sets of target faces, a high-similarity (HS) and a low-similarity (LS) set (fig. S1). On an interval scale from 1 to 10, where 1 represents “very dissimilar” and 10 “very similar,” the HS set had a mean similarity of 5.7 (SD = 2.1) and the LS set a mean similarity of 3.4 (SD = 2.0).

Detection rates for the manipulated pictures were measured both concurrently, during the experimental task, and retrospectively, through a post-experimental interview (11) (supporting online text). There was a very low level of concurrent detection. Of a total of 354 M trials performed, only 46 (13%) were detected concurrently. Even when participants were given free deliberation time and a set of LS faces to judge, no more than 27% of the trials were detected this way. There were no significant differences in detection rate between the 2-s and 5-s viewing time conditions, but there was a higher detection rate in the free compared to the fixed viewing time conditions [t(118) = 2.17, P < 0.05]. Across all conditions, there were no differences in detection rate between the HS and the LS sets (Fig. 2A). In addition, there were no significant sex or age differences in detection rate. Tallying all forms of detection across all groups revealed that no more than 26% of all M trials were exposed.
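
For concreteness, the following sketch (in Python, using NumPy and SciPy) illustrates how an overall concurrent detection rate and the free-versus-fixed comparison could be computed from per-participant detection counts. The data are simulated and the group sizes (80 fixed-time and 40 free-time participants) are assumptions made purely for illustration; this is not the study's analysis code.

```python
# Illustrative sketch only: simulated per-participant data, not the study's raw data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical number of concurrently detected M trials (out of 3) per participant;
# the group sizes (80 fixed-time, 40 free-time) are an assumption for illustration.
fixed_detected = rng.binomial(n=3, p=0.10, size=80)  # 2-s and 5-s groups pooled
free_detected = rng.binomial(n=3, p=0.20, size=40)   # free deliberation group

# Overall concurrent detection rate across all M trials
# (the paper reports 46 of 354 M trials, roughly 13%).
total_m_trials = 3 * (80 + 40)
overall_rate = (fixed_detected.sum() + free_detected.sum()) / total_m_trials
print(f"overall concurrent detection rate: {overall_rate:.1%}")

# Independent-samples t-test on per-participant detection rates,
# the kind of comparison behind the reported t(118) = 2.17, P < 0.05.
t, p = stats.ttest_ind(free_detected / 3, fixed_detected / 3)
print(f"t = {t:.2f}, p = {p:.3f}")
```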

Fig. 2.

Percent detection, divided into deliberation time and similarity, for (A) all trials and (B) trials corrected for prior detections. Sim, similar (HS); Dis, dissimilar (LS). Error bars indicate the standard deviation of the means.

However, even these figures are inflated. The moment a detection is made, the outlook of the participants changes: They become suspicious, and more resources are diverted to monitoring and control. To avoid such cascading detection effects, it is necessary to discard all trials after the first detection is made. Figure 2B shows detection rates with this correction in place. The overall detection rate was significantly lower [t(118) = 3.21, P < 0.005], but none of our prior conclusions are affected by the use of this data set (the percentage of participants that detected the manipulation is shown in fig. S2).
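
One way such a correction could be implemented is sketched below, assuming each participant's three M trials are represented as an ordered list of detection flags; the data shown are hypothetical and serve only to illustrate the discard rule.

```python
# Minimal sketch of the correction for cascading detection effects.

def trials_up_to_first_detection(detected):
    """Keep a participant's M trials only up to and including the first
    detection; later trials are discarded to avoid cascading suspicion."""
    kept = []
    for d in detected:
        kept.append(d)
        if d:  # once the manipulation has been detected, stop counting
            break
    return kept

# Hypothetical participants, three M trials each (True = detected concurrently).
participants = [
    [False, False, False],  # never detects
    [False, True, False],   # detects on the second M trial
    [True, True, True],     # detects immediately
]

kept = [trial for p in participants for trial in trials_up_to_first_detection(p)]
corrected_rate = sum(kept) / len(kept)
print(f"corrected detection rate: {corrected_rate:.1%}")
```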

Our experiment indicates that the relationship between intentions and outcomes may sometimes be far looser than what current theorizing has suggested (6, 9). The detection rate was not influenced by the similarity of the face pairs, indicating the robustness of the finding. The face pairs of the LS set bore very little resemblance to each other, and it is hard to imagine how a choice between them could be confused (fig. S1 and supporting online text). The overall detection rate was higher when participants were given free deliberation time. This shows the importance of allowing individual criteria to govern choice, but it is not likely to indicate a simple subjective threshold. The great majority of the participants in the 2-s groups believed themselves to have had enough time to make a choice (as determined by post-test interviews), and there was no difference in the actual distribution of choices among the pairs from fixed to free deliberation time.

Next, we examined the relationship between choice and introspective report. One might suspect that the reports given for NM and M trials would differ in many ways. After all, the former reports stem from a situation common to everyday life (revealing the reasons behind a choice), whereas the latter reports stem from a truly anomalous one (revealing the reasons behind a choice one manifestly did not make).

We classified the verbal reports into a number of different categories that potentially could differentiate between NM and M reports. For all classifications, we used three independent blind raters, and interrater reliability was consistently high (supporting online text and table S1). We found no differences in the number of empty reports (when participants were unable to present any reasons at all) or in the degree to which reports were phrased in the present or past tense (which might indicate whether the report is made in response to the present face or the prior context of choice). The length of the statements, as measured by number of characters, did not differ between the two sets (NM = 33, SD = 45.4; M = 38, SD = 44.4), nor did the amount of laughter present in the reports (laughter being a potential marker of nervousness or distress). We found significantly more dynamic self-commentary in the M reports [t(118) = 3.31, P < 0.005]. In this type of commentary, participants come to reflect upon their own choice (typically by questioning their own prior motives). However, even in the M trials, such reports occurred infrequently (5% of the M reports).

We rated the reports along three dimensions: emotionality, specificity, and certainty (using a numeric scale from 1 to 5). Emotionality was defined as the level of emotional engagement in the report, specificity as the level of detail in the description, and certainty as the level of confidence the participants expressed in their choice. There were no differences between the verbal reports elicited from NM and M trials with respect to these three dimensions (fig. S3). Seemingly, the M reports were delivered with the same confidence as the NM ones, and with the same level of detail and emotionality. One possible explanation is that overall engagement in the task was low, and this created a floor effect for both NM and M reports. However, this is unlikely to be the case. All three measures were rated around the midline of our scale (emotionality = 3.5, SD = 0.9; specificity = 3.1, SD = 1.2; certainty = 3.3, SD = 1.1). Another possibility is that the lack of differentiation between NM and M reports indicates that delivering an M report came naturally to most of the participants in our task. On a radical reading of this view, suspicion would be cast even on the NM reports. Confabulation could be seen as the norm, and truthful reporting as something that needs to be argued for.

To scrutinize these possibilities more closely, we conducted a final analysis of the M reports, adding a contextual dimension to the classification previously used. Figure 3 shows the percentage of M reports falling into eight different categories. The “specific confabulation” category contains reports that refer to features unique to the face participants ended up with in a manipulated trial. As these reports cannot possibly be about the original choice (e.g., “I chose her [the blond woman] because she had dark hair”), this would indeed be an indisputable case of “telling more than we can know” (12). Equally interesting is the “original choice” category. These are reports that must be about the original choice, because they are inconsistent with the face participants ended up with (e.g., “I chose her because she smiled” [said about the solemn one]). Here, despite the imposing context of the manipulated choice, vestiges of the original intention are revealed in the M reports. Analogous to the earlier example of confabulation, this would be an unquestionable case of truthful report.

Fig. 3.

Frequency distribution of the contents of the M reports aligned along a rough continuum from confabulatory to truthful report. Sample sentences (translated from Swedish) are drawn from the set of reports for the displayed face pair. Letters in brackets indicate whether the report was given by a male (M) or a female (F) participant. The specific confabulation (Conf.) category contains reports that refer to features unique to the face participants ended up with in an M trial. The detailed and emotional confabulation categories contain reports that rank exceptionally high on detail and emotionality (>4.0 on a scale from 1 to 5). The simple and relational confabulation categories include reports in which the generality of the face descriptions precluded us from conclusively associating them with either of the two faces (e.g., everybody has a nose and a personality). The category of uncertainty contains reports dominated by uncertainty (<2 on a scale from 1 to 5). The dynamic reports are reports in which participants reflect upon their own choice, and the final category contains reports that refer to the original context of choice.

In summary, when evaluating facial attractiveness, participants may fail to notice a radical change to the outcome of their choice. As an extension of the well-known phenomenon of change blindness (13), we call this effect choice blindness (supporting online text). This finding can be used as an instrument to estimate the representational detail of the decisions that humans make (14). We do not doubt that humans can form very specific and detailed prior intentions, but as the phenomenon of choice blindness demonstrates, this is not something that should be taken for granted in everyday decision tasks. Although the current experiment warrants no conclusions about the mechanisms behind this effect, we hope it will lead to an increased scrutiny of the concept of intention itself. As a strongly counterintuitive finding, choice blindness warns of the dangers of aligning the technical concept of intention too closely with common sense (15, 16).

In addition, we have presented a method for studying the relationship between choice and introspection. Classic studies of social psychology have shown that telling discrepancies between choice and introspection can sometimes be discerned in group-level response patterns (12) but not for each of the individuals at hand. In the current experiment, using choice blindness as a wedge, we were able to “get between” the decisions of the participants and the outcomes with which they were presented. This allowed us to show, unequivocally, that normal participants may produce confabulatory reports when asked to describe the reasons behind their choices. More importantly, the current experiment contains a seed of systematicity for the study of choice and subjective report. The possibility of detailing the properties of confabulation that choice blindness affords could give researchers an increased foothold in the quest to understand the processes behind truthful report.

Supporting Online Material

www.sciencemag.org/cgi/content/full/310/5745/116/DC1

Materials and Methods

SOM Text

Figs. S1 to S3

Table S1
