State-Dependent Learned Valuation Drives Choice in an Invertebrate


Science  17 Mar 2006:
Vol. 311, Issue 5767, pp. 1613-1615
DOI: 10.1126/science.1123924


Humans and other vertebrates occasionally show a preference for items remembered to be costly, or experienced when the subject was in a poor condition (a phenomenon known as the sunk-costs fallacy or state-dependent valuation). Whether these mechanisms, shared across vertebrates, are the result of convergence toward an adaptive solution or are evolutionary relicts reflecting common ancestral traits is unknown. Here we show that state-dependent valuation also occurs in an invertebrate, the desert locust Schistocerca gregaria (Orthoptera: Acrididae). Given this species' phylogenetic and neurobiological distance from the groups in which the phenomenon was already known, we suggest that state-dependent valuation mechanisms are probably ecologically rational solutions to widespread problems of choice.

Animal decision-making is often modeled using the assumption that choices are based on the fitness consequences that each choice yields. Fitness gains, in turn, depend on both the intrinsic properties of the options and the state of the subject at the time of the choice. Recently, however, studies in humans and other vertebrates (1, 2) have shown that understanding the adaptive significance of learning mechanisms may be the key to progress in functional modeling of decision-making, because preferences more closely reflect the subject's state at the time of learning than at the time of choice. Classical learning models (3) do not address the subject's state, but recent treatments of evaluative incentive behavior do (4) and are compatible with the approach taken here.

A recent theoretical model linking learning to decision-making (5) proposes that anomalies of choice behavior in which past investments rather than expected returns dominate preference (examples include Sunk Costs, Work Ethics, and the Concorde Fallacy) result from a decelerated function of value (fitness or utility) versus objective payoff, combined with a mechanism of choice that depends on the remembered benefit previously yielded by each option (Fig. 1). Although some utility functions can be accelerated or sigmoid, because “desperados” in dire states would accrue smaller marginal gains from resources than would better-off individuals, most surviving organisms operate beyond this extreme zone, and hence the assumption of decelerated gains has very wide justification. In summary, if two sources (L and H) yielding the same objective payoff (ML = MH; M, magnitude) are systematically encountered when the individual is in different states (low or high reserves for L and H, respectively), then the source encountered when needs are greater (L) will yield larger value gains (VL > VH). According to the model, although gains depend jointly on payoff magnitude and present state, it is the remembered gains, rather than remembered payoff magnitudes or states, that drive future preferences.
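The core prediction can be illustrated with a minimal numerical sketch. The square-root value-versus-state function below is an illustrative assumption, not the model published in (5); any increasing function with diminishing returns gives the same qualitative result.

```python
import math

def value(state):
    # Illustrative decelerated (concave) value-vs-state function.
    # Any increasing function with diminishing returns behaves similarly.
    return math.sqrt(state)

def value_gain(initial_state, magnitude):
    # Benefit of a reward of fixed objective magnitude received at a
    # given initial state: V = value(state + M) - value(state).
    return value(initial_state + magnitude) - value(initial_state)

M = 4.0                    # same objective payoff for both options (ML = MH)
V_L = value_gain(1.0, M)   # option L: encountered at low reserves
V_H = value_gain(16.0, M)  # option H: encountered at high reserves

# Concavity implies V_L > V_H despite identical payoffs,
# so remembered gains favor the option met when needs were greater.
print(V_L, V_H)
```

Under the model, it is these unequal remembered gains (VL > VH), not the equal payoff magnitudes, that determine later preference.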

Fig. 1.

Putative mechanisms of valuation learning as a function of a subject's state. The ordinate is a currency that is assumed to correlate to adaptive value, and the abscissa is a metric of objective state, here assumed to be the level of accumulated reserves. The plot illustrates consequences for a subject that encounters two food sources (L and H), each when the subject is in either of two states: low or high, respectively. The magnitudes of the outcomes are labeled ML and MH and are represented as arrows causing positive state displacements. The value (or benefit) of each outcome (VL and VH) is the vertical displacement that corresponds to each change in state. The first derivative (marginal rate) of the value-versus-state function at each initial state is indicated by the slopes of the tangents SL and SH. The inset shows that the subject's representation of the magnitude of rewards (m) may differ from the objective metrics of the outcomes (in the example, ML = MH but mL > mH). Models of learning may use M, S, V, or m as being directly responsible for value assignment.

The adaptive advantages of such a mechanism are not obvious because, at least under experimental conditions, it can produce irrational preferences: Starlings can prefer a more delayed over a more immediate reward even when they have explicit knowledge of the delays involved (6), and rats can frantically operate a lever or chain that causes food or water rewards even when they are neither hungry nor thirsty (7). Supporting evidence for incentive or state-dependent learning comes from the mammal and bird species that have been studied so far, but it remains an open question whether this mechanism of learned value assignment was an early vertebrate acquisition or a wider phenomenon, perhaps universally present because it confers selective advantages.

We tested whether such state-dependent valuation learning occurs in a grasshopper, an animal with a simpler nervous system (8) than that of the vertebrates in which these effects are known. Grasshoppers make particularly good test subjects for studying and modeling individual decision-making because they forage for themselves and are capable of learning (9, 10). Additionally, much is known about how changes in their nutritional state affect their feeding behavior (11).

We manipulated nutritional state both at the time of learning and at the time of preference testing. We trained grasshoppers so that they encountered each of two options under different nutritional states: low (option L) and high (option H). Each option consisted of an odor (lemon grass or peppermint) paired during learning with a food item (a small piece of seedling wheat). Food items were of the same size and quality in both options, and each odor was always associated with the same state for each subject. Individuals received an equal number of reinforced trials with each option over a 3-day training regime (fig. S1). After training was completed, individual grasshoppers were presented with a choice between the two options. Half of the subjects had the test in the low state and the other half in the high state (12).

We considered four possible outcomes. The first of these, Magnitude Priority, states that if choices depend on the intrinsic properties of the options, no systematic preference will be observed between odors, because the food items were identical. The second, Value Priority, states that if choices are controlled by past gains, preference should be for the option experienced in the low state during training, regardless of state at the time of choice. The third, State Priority, stipulates that if options are valued by association with the desirability of the state they evoke, the option preferred should be that experienced in the high state during training, regardless of state during choice. The fourth, State-Option Association, stipulates that choice should favor the source met under the same state as at the time of training: subjects should choose option L when tested in the low state and option H when tested in the high state.

A majority of the grasshoppers preferred option L (the stimulus to which the grasshoppers were trained when in a state of low reserves) regardless of their state at the time of testing (Fig. 2). Averaged across all test subjects, the mean preference (±SE) for option L was 0.71 ± 0.06. These results indicate a significant preference for option L (t1,15 = 3.60, P < 0.01; one sample t test against indifference). Preference was not affected by the state of the subject at the time of testing (Fig. 2) or by odor bias, and the state-by-odor interaction was not significant [analysis of variance (ANOVA): F1,15 = 0.09, P > 0.77; F1,15 = 0.01, P > 0.92; and F1,15 = 0.01, P > 0.91, respectively]. There also was no left arm–right arm positioning effect (paired-samples t test: t1,15 = 0.17, P > 0.86). Next, we considered whether the speed of learning during training, as measured by latencies to contact and eat the reward, might have predicted the preference results. A repeated-measures ANOVA indicated that latencies to start eating decreased across the 3 days of training (F2,15 = 15.00, P < 0.01; fig. S2), but averaged over time, the latencies between the L and H options were similar (F1,15 = 0.08, P > 0.78), and no significant option-by-day interaction was observed (F2,15 = 0.91, P > 0.42). Finally, we considered the possibility that each grasshopper preferred the option for which it had a shorter latency during training (regardless of whether shorter latencies were exhibited for the L or H option) (12). We found, however, no association between latencies and choices (Pearson's correlation coefficient, r = 0.19, P = 0.48).
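The key inferential step, a one-sample t test of mean preference against indifference (0.5), can be sketched as follows. The per-subject preference scores here are invented for illustration; they are not the paper's data.

```python
import statistics

# Hypothetical per-subject preference scores for option L (proportion of
# choices for L in the test). Invented for illustration only.
prefs = [0.8, 0.6, 0.9, 0.7, 0.5, 0.75, 0.85, 0.6]

n = len(prefs)
mean = statistics.mean(prefs)
se = statistics.stdev(prefs) / n ** 0.5  # standard error of the mean

# One-sample t statistic against indifference (0.5), with df = n - 1.
t = (mean - 0.5) / se
print(f"mean preference = {mean:.2f}, t({n - 1}) = {t:.2f}")
```

A t value exceeding the critical value for n − 1 degrees of freedom would, as in the reported result, indicate a preference for option L significantly above indifference.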

Fig. 2.

Percentage of choices for option L exhibited by each subject under the two different states (low and high reserves) during testing. The figure shows individual data points (black circles) and means (±SE) (gray circles) with respect to indifference (dashed line). For both states, the percentage of choices for option L was significantly higher than indifference.

Our experiment supports the idea that in this insect, the benefit gained at the time of training affects later preference even when the magnitudes of the rewards are equal (namely, Value Priority). The Value Priority outcome can be mediated by two very different mechanisms that could be labeled Perception Distortion and Remembered Value. Perception Distortion states that the energetic state at the time of training influences the distal mechanisms of perception, so that the memory of the properties of the options is altered: Equal payoffs are perceived as being different (in Fig. 1, mL > mH). Under the Remembered Value mechanism, the memory for the magnitudes is accurate, but the animal attaches different subjective attractiveness to each option, depending on its state while learning. These considerations may apply to similar anomalies of decision-making in all animals, including humans.

In grasshoppers, there is evidence favoring the Perception Distortion mechanism, because preference for the option experienced when reserves are low could be explained by peripheral gustatory responses underlying feeding behavior. In these animals, as time since the last feed increases, nutrient levels in the hemolymph drop, and as a consequence, mouthpart taste receptors become increasingly sensitive to key depleted nutrients (13, 14). This means that at a neurological level, a grasshopper with low reserves will receive greater feedback when it contacts a food item (15–17). Similarly, through digestive adaptations, individuals may extract more nutrients from identical food items when in greater need (18), and later choices may be governed by the memory of the postabsorptive gain (or the sensory adaptation consequent on the gains) and not of the objective features of the food items. The latter route for Perception Distortion could in theory also apply to vertebrates, but the available evidence does not point in this direction, at least for starlings. In their case, peripheral adjustments leading to either distorted representations or distorted perceptions due to rapid absorptive adaptations are both unlikely. This is because in learned valuation effects, starlings' preference between equally delayed rewards is not accompanied by alterations in the pecking rate (suggesting that neither the perception of the magnitude of the reward nor timing was altered) (1, 5, 6).

Thus, although similar behavioral outcomes are observed in starlings and grasshoppers, it is possible that different underlying mechanisms drive state-dependent learned valuation in each species. This difference supports the view that state-dependent learned valuation has intrinsic, although not yet identified, adaptive advantages and has probably emerged and persisted in distant species via convergent evolution.

State-dependent valuation may be computationally more efficient than remembering the attributes of each option and weighing them against current nutritional state. This may reduce errors and help when decisions need to be made quickly and where neural constraints limit the amount of information that can be processed (19, 20). State-dependent valuation can cause suboptimal choices if there is a difference between the choice circumstances and the circumstances for learning about each option, as in our experiment. In particular, for there to be a cost, there must be a correlation between state and the probability of encounter with each option when options are met singly: when those options are later met simultaneously and a choice takes place, information about past gains could then be misleading. Outside these probably rare circumstances, the mechanisms do not favor suboptimal alternatives (21). It could be argued that even if state-dependent valuation causes frequent and costly suboptimal choices in nature, it may persist because of neural or psychological constraints or because the cost of developing a different mechanism is higher than the cost of using such a metric. The latter possibility cannot be discarded, but we do not favor it as a working hypothesis.

Ultimately, it would be ideal to measure the prevalence of different learning and choice circumstances in natural environments, but until that becomes possible, progress can be made by modeling the theoretical ecological worlds under which state-dependent valuation would be evolutionarily stable when used in competition with animals that form preferences based on the absolute properties of their options.

Supporting Online Material

Materials and Methods

Figs. S1 and S2

