Optimally Interacting Minds


Science  27 Aug 2010:
Vol. 329, Issue 5995, pp. 1081-1085
DOI: 10.1126/science.1185718


In everyday life, many people believe that two heads are better than one. Our ability to solve problems together appears to be fundamental to the current dominance and future survival of the human species. But are two heads really better than one? We addressed this question in the context of a collective low-level perceptual decision-making task. For two observers of nearly equal visual sensitivity, two heads were definitely better than one, provided they were given the opportunity to communicate freely, even in the absence of any feedback about decision outcomes. But for observers with very different visual sensitivities, two heads were actually worse than the better one. These seemingly discrepant patterns of group behavior can be explained by a model in which two heads are Bayes optimal under the assumption that individuals accurately communicate their level of confidence on every trial.

To come to an optimal joint decision, individuals must share information with each other and, importantly, weigh that information by its reliability (1, 2). It has been well established that isolated individuals can accurately weigh information when combining different sources of sensory information (3–5). Little is known, however, about how, or even whether, two individuals can accurately combine information that they communicate with each other. To investigate this issue, we examined the behavior of pairs of individuals in a simple perceptual decision task, and we asked how signals from the same sensory modality (vision) in the brains of two different individuals could be combined through social interaction.

Work on perceptual decision-making has shown that when combining information from different senses, individuals have access not just to magnitudes of sensory signals, but also to their probability distributions, or at least to their means and variances (3–8). However, this may not be true for interpersonal communication. Whereas probability distributions arising from different sensory modalities are available within an individual’s brain, it is not clear whether such distributions can be passed directly to another person or what types of information can be communicated. To answer this, we considered four models (9), each of which proposes that different types of information could be communicated, and quantitatively compared the predictions of those models to empirical data in a low-level visual decision-making task.

The first model proposes that nothing except the decision about the visual stimulus is communicated, and when there is disagreement, the joint decision is no better than a coin flip (CF model). This strategy is expected from previous work on collective decision-making without feedback (10). The second model proposes that nothing except the decision is communicated, but that pairs of individuals learn, from trial-to-trial feedback, which of them is more accurate, so they eventually use that individual’s decisions [the behavior and feedback (BF) model]. This model was motivated by previous work showing that collective decisions are dominated by the most competent group member in situations where clear feedback about “the truth” (in our case, the correct answer) is available (11, 12). The third model, put forward here for the first time, proposes that confidence, which we define as an internal estimate of the probability of being correct (13), is communicated [the weighted confidence sharing (WCS) model] (9). Finally, the fourth model proposes that the mean and standard deviation of the sensory response to the stimulus about which the decision is made are communicated [direct signal sharing (DSS) model]. This model is used to account for multisensory integration within an individual (3, 4) and also for collective decisions in groups (14). To anticipate our findings, we determined that the WCS model was quantitatively consistent with our empirical data, whereas the other three models were not.

Our empirical data were obtained from pairs of participants (dyads) who viewed brief visual displays containing a faint target (contrast oddball; Fig. 1A) in either the first or second viewing interval (9). We performed a series of four experiments, each of which followed very similar procedures. Initially, each participant chose the interval that they thought contained the target, without consulting the other. Individual decisions were then shared, and if participants disagreed, they discussed the matter until they reached a joint decision. Subsequently, both participants were informed of the correct choice (with the exception of experiment 4 in which no feedback was given). Individual and dyad psychometric functions (Fig. 1B, left and middle panels) were fit with a cumulative Gaussian function, from which we extracted the slope s. The slope provided an estimate of sensitivity (the steeper the slope, the higher the sensitivity). More sensitive observers were, by definition, more reliable in their estimates of contrast.
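The fitting step described above can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the contrast levels, the grid-search fit, the bias parameter, and the reading of "slope" as the maximum slope of the cumulative Gaussian, 1/(sigma * sqrt(2*pi)), are all assumptions made here for the example.

```python
import math

def cumulative_gaussian(x, bias, sigma):
    """Probability of reporting interval 2 for contrast difference x."""
    return 0.5 * (1.0 + math.erf((x + bias) / (sigma * math.sqrt(2.0))))

def fit_psychometric(contrasts, proportions, sigma_grid, bias_grid):
    """Least-squares grid search over candidate (bias, sigma) pairs."""
    best = None
    for sigma in sigma_grid:
        for bias in bias_grid:
            err = sum((cumulative_gaussian(x, bias, sigma) - p) ** 2
                      for x, p in zip(contrasts, proportions))
            if best is None or err < best[0]:
                best = (err, bias, sigma)
    return best[1], best[2]

def max_slope(sigma):
    """Maximum slope of a cumulative Gaussian (assumed slope definition)."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical, noise-free observer with bias 0.0 and sigma 0.05.
contrasts = [-0.15, -0.07, -0.035, -0.015, 0.015, 0.035, 0.07, 0.15]
proportions = [cumulative_gaussian(x, 0.0, 0.05) for x in contrasts]

sigma_grid = [0.01 * k for k in range(1, 21)]   # 0.01 ... 0.20
bias_grid = [0.01 * k for k in range(-5, 6)]    # -0.05 ... 0.05

bias_hat, sigma_hat = fit_psychometric(contrasts, proportions,
                                       sigma_grid, bias_grid)
slope_hat = max_slope(sigma_hat)
print(bias_hat, sigma_hat, slope_hat)
```

Under this definition a smaller fitted sigma gives a steeper slope, which matches the statement that steeper slopes indicate higher sensitivity. A real analysis would more likely use maximum-likelihood fitting on trial counts rather than a coarse grid.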

Fig. 1

(A) Experimental paradigm. Each trial consisted of two observation intervals. In each interval, six vertically oriented Gabor patches were displayed equidistantly around an imaginary circle (duration: 85 ms). In either the first or second interval, there was one oddball target that had slightly higher contrast than all of the others (in this example, upper-left target in interval 1). (B) Two example psychometric functions and the group average in experiment 1. The proportion of trials in which the oddball was reported to be in the second interval is plotted against the contrast difference at the oddball location (i.e., contrast in the second interval minus contrast in the first). A highly sensitive observer would produce a steeply rising psychometric function with a large slope. Blue circles, performance of the less sensitive observer (smin) of the dyad; red squares, performance of the more sensitive observer (smax); and black diamonds, performance of the dyad (sdyad). The blue and red dashed curves are the best fit to a cumulative Gaussian function (9); the solid black curve is the prediction of the WCS. N = 15 dyads. (C) Predictions of the four models (see Eqs. 1 to 4). The x axis shows the ratio of individual sensitivities (smin/smax), with values near one corresponding to dyad members with similar sensitivities and values near zero to dyad members with very different sensitivities. The y axis shows the ratio of dyad sensitivity to the more sensitive member (sdyad/smax). Values above the horizontal line indicate communication benefit; in this range the dyad is better than the more sensitive observer. The red curve, which corresponds to the WCS model, is above the horizontal line only if smin/smax is larger than ~0.4, reflecting the prediction that communication by WCS is beneficial only if dyad members have approximately the same competence. 
The green curve, which corresponds to the DSS model, never crosses the black horizontal line, so for this model, communication will invariably be beneficial. The dot-dashed and solid black lines indicate the CF and BF models, respectively.

The four models made different predictions for the relation between the slope of the psychometric function for each individual and the collective dyad; thus, by comparing predicted and observed dyad slopes, we could distinguish the models. For each of the four models (9), we computed the predicted dyad slopes, $s_{\mathrm{dyad}}^{\mathrm{model}}$, in terms of the individual slopes, $s_1$ and $s_2$, of observers 1 and 2. For the CF model, the predicted dyad slope is related to the individual slopes by
\[ s_{\mathrm{dyad}}^{\mathrm{CF}} \approx \frac{s_1 + s_2}{2} \tag{1} \]
for the BF model by
\[ s_{\mathrm{dyad}}^{\mathrm{BF}} = \max(s_1, s_2) \tag{2} \]
for the WCS model by
\[ s_{\mathrm{dyad}}^{\mathrm{WCS}} = \frac{s_1 + s_2}{2^{1/2}} \tag{3} \]
and for the DSS model by
\[ s_{\mathrm{dyad}}^{\mathrm{DSS}} = \left(s_1^2 + s_2^2\right)^{1/2} \tag{4} \]
These equations provide upper bounds on performance for each model: For example, Eq. 3 provides the largest possible dyad slope, given that participants share only confidence. If the dyads reach that slope, then they are Bayes optimal, given the model assumptions, where by “Bayes optimal” we mean that participants made decisions that maximized their probability of being correct, given their model assumptions.
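As a quick numerical companion to Eqs. 1 to 4, the sketch below evaluates the four predicted dyad slopes and the resulting collective benefit (predicted dyad slope divided by the better member's slope). The function names and the example slope values are illustrative choices, not taken from the study.

```python
import math

def dyad_slope_cf(s1, s2):
    """Coin flip model (Eq. 1): approximately the mean of the two slopes."""
    return (s1 + s2) / 2.0

def dyad_slope_bf(s1, s2):
    """Behavior and feedback model (Eq. 2): the better member's slope."""
    return max(s1, s2)

def dyad_slope_wcs(s1, s2):
    """Weighted confidence sharing model (Eq. 3)."""
    return (s1 + s2) / math.sqrt(2.0)

def dyad_slope_dss(s1, s2):
    """Direct signal sharing model (Eq. 4)."""
    return math.sqrt(s1 ** 2 + s2 ** 2)

def collective_benefit(model, s1, s2):
    """Predicted dyad slope relative to the more sensitive member."""
    return model(s1, s2) / max(s1, s2)

models = {"CF": dyad_slope_cf, "BF": dyad_slope_bf,
          "WCS": dyad_slope_wcs, "DSS": dyad_slope_dss}

# Hypothetical dyads: s_max fixed at 1.0, s_min varied.
for s_min in (0.2, 0.5, 1.0):
    row = {name: round(collective_benefit(f, s_min, 1.0), 3)
           for name, f in models.items()}
    print(s_min, row)
```

For equally sensitive members the WCS and DSS predictions coincide, and they separate as the sensitivities diverge, which is why a wide range of relative sensitivities is needed to tell the two models apart.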

Fig. 1C shows the predictions (from Eqs. 1 to 4) for the collective benefit (the ratio sdyad/smax) versus relative sensitivity (smin/smax), where smin and smax are the minimum (less sensitive) and maximum (more sensitive) of the individual slopes, respectively. The models clearly make different predictions, but to distinguish them requires experiments with a broad range of smin/smax; we would need to investigate dyad members with nearly identical performance (smin/smax ~ 1), as well as those with very different performance (smin/smax << 1). Experiments 1 and 2 were performed to test the model predictions in different ranges of smin/smax.

In experiment 1, participants viewed identical stimuli, and individual sensitivities of the dyad members were similar (smin/smax > 0.5) (Fig. 2B). The CF model (Eq. 1 and Fig. 1C, black dot-dashed line) predicted that dyad sensitivity would never be higher than that of the better participant. The BF model (Eq. 2 and Fig. 1C, solid black line) predicted that dyad sensitivity would be as good as that of the better participant. In contrast, the WCS model (Eq. 3 and Fig. 1C, red line) and DSS model (Eq. 4 and Fig. 1C, green curve) both predicted that, within the relative sensitivity range tested here (smin/smax > 0.5), dyad sensitivity would be higher than that of the better participant.

Fig. 2

Results of experiment 1. (A) Plot of the ratio of the dyad slope to the slope predicted by each model. The BF model comparison also depicts collective benefit over the more sensitive observer. Error bars indicate SEM (N = 15). (B) Distribution of data points and model predictions. Collective benefit (sdyad/smax) is plotted against relative sensitivity (smin/smax). Each blue square represents one dyad.

We found that the dyad slope was significantly larger than that of the better participant [t(14) = 5.24, p < 10⁻³, paired t test]. Thus, these data ruled out both the CF (Fig. 2A; p < 10⁻⁵) and BF (Fig. 2A; p < 10⁻³) models, for which the dyad slope can be no larger than that of the better participant, and instead favored the sharing models (p > 0.1), for which the dyad can outperform the individuals. The sharing models were also able to accurately predict, via Eqs. 3 and 4, the dyad slopes on a case-by-case basis (fig. S1). Thus, communication conferred a significant benefit, and, at least on this task, two heads did perform better than one.
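For readers unfamiliar with the paired comparison used here, the following sketch computes a paired t statistic and its degrees of freedom from two matched lists of slopes. The numbers are made up for illustration and are not the study's data; only the statistic itself is standard.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t, df) for a paired t test on equal-length samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical slopes for five dyads and their more sensitive members.
dyad_slopes = [2.0, 3.0, 4.0, 5.0, 6.0]
best_member_slopes = [1.0, 2.5, 3.0, 4.5, 5.0]

t_stat, df = paired_t(dyad_slopes, best_member_slopes)
print(round(t_stat, 3), df)
```

With 15 dyads the same calculation has 14 degrees of freedom, matching the t(14) reported above.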

Experiment 1 favored the WCS and DSS models, but was not able to distinguish between them. For the range of relative sensitivities tested in experiment 1, the two models made very similar predictions (Fig. 2B). To distinguish the models, we sought to study dyads with very different individual sensitivities (smin/smax << 1) for which the WCS model (Fig. 1C, red line) made a counterintuitive prediction: If one participant’s sensitivity was no better than ~40% of the other’s (i.e., smin/smax < √2 − 1 ≈ 0.4), then two heads should do worse than the better one (sdyad/smax < 1), even when individuals accurately communicated their confidence. In contrast, the DSS model (Fig. 1C, green curve) invariably predicted a benefit for dyads, consistent with the fact that when signals are directly available (as in multisensory integration within a single brain), putting them together is never worse than either one alone (4).
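The ~0.4 threshold follows from Eq. 3 in two lines. The derivation below writes x for the relative sensitivity smin/smax (a symbol introduced here for brevity) and contrasts the WCS result with the DSS prediction from Eq. 4.

```latex
\[
\frac{s_{\mathrm{dyad}}^{\mathrm{WCS}}}{s_{\max}}
  = \frac{s_{\min} + s_{\max}}{\sqrt{2}\, s_{\max}}
  = \frac{1 + x}{\sqrt{2}},
  \qquad x = \frac{s_{\min}}{s_{\max}} \in (0, 1]
\]
\[
\frac{1 + x}{\sqrt{2}} < 1
  \iff x < \sqrt{2} - 1 \approx 0.41
\]
\[
\frac{s_{\mathrm{dyad}}^{\mathrm{DSS}}}{s_{\max}}
  = \frac{\sqrt{s_{\min}^{2} + s_{\max}^{2}}}{s_{\max}}
  = \sqrt{1 + x^{2}} \;\ge\; 1
\]
```

So the WCS prediction drops below the better member whenever x is smaller than about 0.41, whereas the DSS ratio never falls below 1 for any x.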

We tested these predictions in experiment 2. In randomly chosen trials, we surreptitiously reduced one or the other (or both) participants’ sensitivity by adding a substantial amount of noise to their stimuli (9) without having told the participants about this manipulation. The four noise regimes were randomized, so on each trial, noise was given to both participants (“equal” condition), to one but not the other (both possibilities combined together as the “unequal” condition), or to neither participant (“none” condition). For each participant and dyad, four psychometric functions (corresponding to the four noise regimes) were constructed, and the slopes were estimated (fig. S2). Figure 3 shows that, in equal and none conditions—in which participants received identical amounts of noise—robust group benefits were obtained [Fig. 3A; for equal condition: t(10) = 2.50, p = 0.03, paired t test; for none condition: t(10) = 3.38, p = 0.007, paired t test]. This replicated the results of experiment 1. However, in the unequal condition, dyads did not perform better than the better participant, and reliable group benefit was not observed [Fig. 3A; t(21) = 0.68, p = 0.54, paired t test].

Fig. 3

Results of experiment 2. (A) Ratio of the dyad slope to the maximum individual slope for the three noise conditions (equal, unequal, and none; see main text). The line at sdyad/smax = 1 corresponds to the case in which the dyad is performing exactly as well as the more sensitive member. Values above and below the line correspond to benefit and loss due to communication, respectively. ns, not significant. (B) Ratio of the dyad slope to the slope predicted by the WCS model, the latter denoted sWCS. This ratio was not significantly different from 1 for any of the noise conditions. (C) Ratio of the dyad slope to the slope of the DSS model. For the unequal noise condition, this ratio was significantly smaller than 1 (p < 10⁻⁴). (D) Distribution of data points and model predictions (the latter taken from Fig. 1C). Collective benefit (sdyad/smax) is plotted against relative sensitivity (smin/smax). Each dyad contributed four sets of data points (one triangle for equal, one square for none, and two circles for unequal conditions). The solid black line indicates the boundary of collective benefit (see Fig. 1C). In (A) to (C), error bars denote SEM (N = 11 data points for equal and none conditions; N = 22 for unequal condition).

In all three conditions, the results were consistent with the predictions of the WCS model (Fig. 3B). Importantly, the majority of the data points for which smin/smax < 0.4 fell below the black line in Fig. 3D, indicating that, in these instances, two heads did worse than the better one. The DSS model, on the other hand, was rejected in the unequal condition [Fig. 3C; t(21) = 4.52, p < 10⁻³, paired t test]. Moreover, randomized addition of noise resulted in a wide range of relative sensitivity, and a highly significant linear correlation was observed between collective benefit and relative sensitivity [Fig. 3D; dotted blue line; R² = 0.51, F(42,1) = 43.22, p < 10⁻⁷] with a slope (0.6 ± 0.09) and intercept (0.74 ± 0.05) that were very close to the slope (1/√2 ≈ 0.71) and intercept (1/√2 ≈ 0.71) predicted by the WCS model.
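The predicted slope and intercept come from dividing Eq. 3 by smax, which turns the WCS prediction into a straight line in smin/smax. The sketch below checks how far the fitted values quoted above lie from that line, in units of their reported uncertainties. Treating the ± values as one error unit (for instance a standard error) is an assumption; the text does not say what they denote.

```python
import math

# Dividing Eq. 3 by smax gives a straight line in x = smin/smax:
#   sdyad/smax = 1/sqrt(2) + (1/sqrt(2)) * x
wcs_slope = 1.0 / math.sqrt(2.0)
wcs_intercept = 1.0 / math.sqrt(2.0)

# Fitted values and uncertainties as quoted in the text.
fit_slope, fit_slope_err = 0.6, 0.09
fit_intercept, fit_intercept_err = 0.74, 0.05

def deviation_in_errors(fitted, error, predicted):
    """Distance between fit and prediction, in units of the quoted error."""
    return abs(fitted - predicted) / error

slope_dev = deviation_in_errors(fit_slope, fit_slope_err, wcs_slope)
intercept_dev = deviation_in_errors(fit_intercept, fit_intercept_err,
                                    wcs_intercept)
print(round(slope_dev, 2), round(intercept_dev, 2))
```

Under that assumption the fitted slope sits roughly 1.2 error units below the WCS value and the intercept roughly 0.7 error units above it.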

In these experiments, two aspects of social information contributed to collective decision-making: communication and feedback. However, the experiments could not tell us whether either or both types of information were necessary for collective benefit in sensitivity. To address this issue, we conducted two more experiments: Experiment 3 tested whether communication was necessary, whereas experiment 4 tested whether feedback was necessary. We found that communication was necessary, but, surprisingly, feedback was not.

It is conceivable that, even if the participants were not able to communicate their confidence on each trial, they would still be able to estimate each other’s average reliability (defined explicitly as the slope of the psychometric curves; see Fig. 1B), not through direct trial-by-trial interaction and confidence sharing, but by accumulating information about one another’s accuracy through feedback. Armed with such an estimate, dyads might conceivably be able to match the performance of those that did communicate, and so match the performance of the WCS model. On theoretical grounds, we did not expect this; instead, we expected performance without communication to match the BF model. We hypothesized that trial-by-trial communication was necessary and that feedback alone would not be sufficient for achieving collective benefit.

Experiment 3 tested this prediction using the same paradigm as experiment 1, modified so that participants were now not allowed to communicate anything but their choice. Whenever the participants disagreed in their decision, one of the two (chosen randomly by the computer) made a decision individually by arbitrating between their own choice and that of the other participant. Feedback about the correct choice was then given to both participants (9). The results were unequivocal. In contrast to experiment 1, dyad sensitivity did not exceed that of the more sensitive observer [Fig. 4A, red bar; t(13) = 0.18, p = 0.85, paired t test], as predicted by the BF model. More important, dyad sensitivity was significantly lower than the upper bound predicted by the WCS model [Fig. 4B, red bar; t(13) = 5.91, p < 10⁻⁴, paired t test], demonstrating that knowledge of current choice and previous outcomes was not adequate for the dyads to reach the level of performance observed in experiment 1, expected from the WCS model.

Fig. 4

Results of experiments 3 and 4. y-axis conventions are the same as in Fig. 3, A and B. (A) Collective benefit (sdyad/smax) is plotted for experiment 3 (red, without communication) and for experiment 4 (blue, without feedback). (B) Ratio of the dyad slope to the slope predicted by the WCS model for experiment 3 (red, without communication), and experiment 4 (blue; without feedback). In all panels, error bars denote SEM (N = 14 for experiment 3; N = 11 for experiment 4).

Experiment 3 showed that communication was necessary and that feedback alone was not sufficient for dyads to achieve a collaboration benefit. However, the results do not address the question of whether communication alone, without feedback, is sufficient for achieving collaboration benefit. Could dyads achieve any group benefit at all without ever receiving any objective feedback about the accuracy of their decisions? This is an important question, because feedback is not formally incorporated in the confidence-sharing model (9). Taking this model at face value, one may make the extremely counterintuitive prediction that, as long as accurate communication of confidence is ensured, dyad benefit can still be achieved without any feedback (that is, without any definitive knowledge of decision outcomes).

In experiment 4, we removed the feedback stage of the task to test this prediction (9): After the joint decision was made (either automatically in the agreement trials or after interaction in the disagreement trials), the participants were not told the correct answer. All other aspects of the experiment were identical to experiment 1. Consistent with our prediction, even without feedback, the dyads nevertheless achieved a significant collaboration benefit [Fig. 4A, blue bar; t(10) = 2.68, p = 0.022, paired t test], and dyad sensitivity was statistically indistinguishable from the prediction of the confidence sharing model [Fig. 4B, blue bar; t(10) = 1.16, p = 0.27, paired t test]. These findings indicate that objective feedback was not necessary, and communication alone was sufficient for achieving collective benefit.

Our results show that interactive decision-making between two individuals can significantly improve perceptual sensitivity, but, importantly, only for similarly sensitive observers. Moreover, such joint behavior is Bayes optimal under the assumption that participants accurately communicate their internal estimate of the probability that they are correct. Our findings show that interpersonal communication is rich enough to permit sharing of subjective estimates of confidence, and that humans are perceptive enough to make optimal use of this information. Furthermore, communication of trial-by-trial confidence is necessary for collective benefit, but, somewhat surprisingly, feedback about decision outcomes is not.

Quantitatively, we tested four models, and only one—the WCS model, in which participants communicated only an internal estimate of their reliability on each trial—was consistent with the data. Of the three models that were not consistent with the data, one, the DSS model, posited that participants communicated both the perceived contrast and their estimate of its reliability on each trial. That model was rejected because it outperformed the dyads in experiment 2. This leaves open the possibility that the participants did communicate contrast and reliability, but used that information suboptimally, which seems unlikely, as we never observed any dyads explicitly communicating contrast and reliability separately. However, our data cannot definitively rule out this idea, and further research is needed to distinguish between optimal use of WCS versus suboptimal DSS.

The general consensus from extensive earlier work on collective decision-making is that groups rarely outperform their best members (11, 15). Even in one of the rare cases in which consistent collaborative benefit was established, group performance failed to reach the bound predicted by the proposed ideal combination of individual decisions (14). That study employed the DSS model (see Eq. 4) to estimate the ideal, expected group sensitivity. As shown in experiments 1 and 2, however, the predictions of that model deviate significantly from empirical data if individuals’ sensitivities differ markedly. In particular, experiment 2 demonstrated the detrimental side effect of collective decision-making based on Bayesian combination of confidence: Individuals with very different sensitivities are best advised to avoid collaboration and instead should rely entirely on the more sensitive individual. In fact, the WCS model and the results of experiment 2 (Fig. 3D) set a quantitative limit on the usefulness of cooperation that, to our knowledge, is not predicted by current economic and social theories of collective decision-making (15). An important next step for future research is to test the generality of this limit in other types of dyadic interactions.

Our findings have direct bearing on studies in social psychology that have discovered numerous situations in which groups fail to do better than their individuals. Many explanations for such “process loss” have been proposed, such as reduced effort in the presence of others [e.g., “social loafing” (16)], interpersonal competition (11), and groupthink (17). Our results raise the rather different possibility that, when the communicated evidence (perceived contrast) cannot be separated from its reliability (slope), such failures of collective decision-making may be the natural consequence of a perfectly reasonable strategy (for instance, WCS). Indeed, we know all too well about the catastrophic consequences of consulting “evidence” of unknown reliability on problems as diverse as the existence of weapons of mass destruction and the possibility of risk-free investments.

Supporting Online Material

Materials and Methods

Figs. S1 and S2


References and Notes

  9. For details of methods, models, and additional results, see the supporting material on Science Online.
  13. In our model, this internal estimate of the probability of being correct is a function of the ratio of stimulus contrast in a given trial to its standard deviation. The standard deviation is found by fitting a cumulative Gaussian to the participant’s psychometric function.
  18. We thank D. Bang for generous help with data collection in experiments 2 and 4 and F. Scharnowski for his insightful comments in the early stages of the project. This work was supported by the European Union MindBridge project (B.B.), the Danish National Research Foundation and the Danish Research Council for Culture and Communication (A.R. and C.D.F.), the Gatsby Charitable Foundation (P.E.L.), and the Wellcome Trust (G.R.). Support from the MIND Lab UNIK initiative at Aarhus University was funded by the Danish Ministry of Science, Technology and Innovation.