Technical Comments

Response to Comment on “Morality in everyday life”

Science  15 May 2015:
Vol. 348, Issue 6236, pp. 767
DOI: 10.1126/science.aaa3053


Voelkle challenges our conclusions regarding the relationship between morality and momentary happiness/sense of purpose based on methodological concerns. We show that our main conclusions are not affected by this methodological critique and clarify that the discrepancies between our and Voelkle’s effect size estimates can be reconciled by the realization that two different (but compatible) research questions are being asked.

We thank Voelkle for detecting a display error in our figure 3; catching such errors is one benefit of sharing replication data publicly (1, 2). We want to highlight that neither the reported F values nor the effect sizes (Cohen’s d) were affected by the display error. We maintain that the overall interaction pattern and conclusions remain unchanged: The biggest effects of morality on happiness occur when one is the target of moral versus immoral acts, and the biggest effects on sense of purpose occur when one commits moral versus immoral acts.

But why is there such a striking difference in effect size estimates between our analysis and Voelkle’s? The answer is that we asked one analytical question about the connection between morality and happiness (and, analogously, sense of purpose), whereas Voelkle asks a different one. Our key question was, “How does the occurrence of a moral/immoral event affect momentary happiness?” We refer to this as the momentary impact effect (MIE) (Fig. 1). Framed differently, what difference in momentary happiness can we (typically) expect between the experience of a moral versus immoral event? The answer is provided by standardizing the difference in estimated cell means by the standard deviation of momentary happiness (Cohen’s d). Full information maximum likelihood estimation is fairly robust to missing data, especially in large samples. Hence, there is little reason to worry about the precision of estimates for these naturally sampled events (estimated within persons, averaged across participants). Reasonably robust estimation requires neither that each person is forced to contribute data to each cell nor that a “direct transition” from a moral to an immoral event is experienced. Adding the baseline data to this analysis is nonessential and only trivially affects MIE calculations (Table 1) (mean estimates from the corrected, dummy, and continuous-time analyses correlate above 0.99 on average). However, including the baseline is useful because it allows for decomposing each of the overall d values reported into the respective differences from the baseline (e.g., d_moral_to_base = 0.46; d_immoral_to_base = 0.88 for being the target of acts).
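The decomposition described above can be illustrated with a minimal sketch. The cell means and the standard deviation below are hypothetical values chosen for illustration, not estimates from the data, and the helper name `cohens_d` is ours. The sketch assumes that all three comparisons are standardized by the same standard deviation, in which case the two baseline-referenced components add up to the overall moral-versus-immoral d.

```python
def cohens_d(mean_a, mean_b, sd):
    """Standardized mean difference: (mean_a - mean_b) / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_a - mean_b) / sd


# Hypothetical values on an arbitrary happiness scale (NOT from the paper).
SD_HAPPINESS = 1.5   # assumed common standardizer for all comparisons
M_MORAL = 1.2        # assumed cell mean after a moral event
M_BASE = 0.5         # assumed cell mean on baseline occasions
M_IMMORAL = -0.8     # assumed cell mean after an immoral event

d_moral_to_base = cohens_d(M_MORAL, M_BASE, SD_HAPPINESS)
d_base_to_immoral = cohens_d(M_BASE, M_IMMORAL, SD_HAPPINESS)
d_overall = cohens_d(M_MORAL, M_IMMORAL, SD_HAPPINESS)

# With a shared standardizer, the components sum to the overall effect.
print(round(d_moral_to_base, 3), round(d_base_to_immoral, 3), round(d_overall, 3))
```

Under the same shared-standardizer assumption, the two components quoted in the text (0.46 and 0.88) would sum to roughly 1.34 for being the target of acts.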

Fig. 1 Simple illustration of the two questions asked.

Table 1 Effect size estimates.

Effect size estimates (Cohen’s d) for the comparison of moral-immoral cell means from the original happiness-related analysis reported by Hofmann et al. (HWBS) (1), three analyses suggested by Voelkle (V) (2), and another sensitivity analysis (including baseline as a Level 2 covariate) run by Hofmann et al. but not reported in their original paper.

The discrepancy and confusion stem from the fact that Voelkle’s dummy coding approach addresses a different question that we did not ask in our original paper: How much of the overall variance in happiness observed across all measurement occasions is due to the specific types of moral/immoral events in question? We call this the overall variance accounting effect (OVAE). Framed differently, in light of all other possible events affecting happiness in everyday life, how much can be attributed to, for instance, the commission of moral/immoral acts? The baseline data are essential here. To address the issue statistically, Voelkle assigned one dummy variable to represent committed moral acts (6.9% of the data), one to represent committed immoral acts (2.6%), and all remaining morally relevant (19.4%) and all morally irrelevant occasions (71.1%) to “baseline” (Fig. 1). The amount of variance accounted for by the two dummy variables (covering less than 10% of the data) is f² = 0.022, interpreted as a small effect according to Cohen’s conventions.
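To make the variance-accounting metric concrete, the following minimal sketch converts between Cohen’s f² and the proportion of variance explained, using the standard definition f² = R² / (1 − R²) and Cohen’s conventional benchmarks (0.02 small, 0.15 medium, 0.35 large). The function names are ours, and the sketch assumes the ordinary single-level regression definition; it does not reproduce the multilevel analysis itself.

```python
def f2_from_r2(r2):
    """Cohen's f^2 from a proportion of variance explained (0 <= r2 < 1)."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    return r2 / (1 - r2)


def r2_from_f2(f2):
    """Inverse conversion: proportion of variance explained from f^2."""
    if f2 < 0:
        raise ValueError("f2 must be non-negative")
    return f2 / (1 + f2)


def cohen_label(f2):
    """Label f^2 by Cohen's conventional benchmarks (0.02 / 0.15 / 0.35)."""
    if f2 < 0.02:
        return "below small"
    if f2 < 0.15:
        return "small"
    if f2 < 0.35:
        return "medium"
    return "large"


# f^2 = 0.022 (the value quoted in the text) corresponds to about 2.2%
# of variance explained under the single-level definition.
print(round(r2_from_f2(0.022), 4), cohen_label(0.022))
```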

Although we agree that estimating the OVAE is important, we see three noteworthy problems: First, in contrast to the MIE analysis, this analysis confounds the effect of an event with its frequency of occurrence. Because all explanatory power is limited to the two isolated dummies, the low resulting effect size estimates are essentially another way of establishing that people do not commit moral/immoral acts all too often, consistent with our initial frequency data, and that happiness is affected by many sources other than morality. Does that necessarily mean these events are negligible for people’s everyday well-being? Addressing the OVAE more generally by including all eight dummies in concert (rather than having them “work against each other” by assigning remaining moral events to the unaccounted-for baseline) accounts for 8% of the overall variance in daily life. Yet again, to determine whether this is a large or small amount within this specific (i.e., high-frequency sampling) research context, comparing these effects with those of other salient but comparatively rare events (e.g., professional achievements, sexual activity, and interpersonal conflicts) may be useful, and Cohen’s conventions should be applied cautiously (3, 4).
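The confound between event impact and event frequency can be illustrated with a deliberately simplified sketch. It assumes a single-level model with one dummy predictor, where d is standardized by the within-group standard deviation and p is the proportion of occasions on which the event occurs; in that case f² = p(1 − p)d². This is not the multilevel model used in either analysis, the function name is ours, and d = 1.0 is a hypothetical value. Only the two prevalences are taken from the text.

```python
def f2_single_dummy(d, p):
    """f^2 for one binary predictor in a single-level model.

    d: mean difference standardized by the within-group SD (assumed)
    p: proportion of occasions on which the event occurs
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return p * (1 - p) * d ** 2


HYPOTHETICAL_D = 1.0  # same momentary impact in every scenario (assumed)

# Same per-event impact, different event frequencies.
for label, p in [("balanced (50%)", 0.5),
                 ("committed moral acts (6.9%)", 0.069),
                 ("committed immoral acts (2.6%)", 0.026)]:
    print(label, round(f2_single_dummy(HYPOTHETICAL_D, p), 4))
```

Holding the per-event impact fixed, the variance-accounting index shrinks as the event becomes rarer, which is the sense in which a small f² need not imply a small momentary effect.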

Second, caution is strongly warranted because Voelkle himself presents remarkable evidence for strong temporal carry-over effects of moral events to subsequent occasions (continuous time analysis), indicating that moral events exert an effect well beyond the immediate moment. Inconsistent with this insight, however, the above dummy approach used for effect size determination inappropriately models happiness levels on subsequent (baseline) occasions as unaccounted-for fluctuation wholly due to “other” sources, thus underestimating the true OVAE in all likelihood.

Third, many experts have cautioned for some time against a simple transfer of f²/R² from ordinary linear regression to multilevel models, rendering these indices neither appropriate nor truly “standardized” in this context (5–9). For instance, because variation in the response can come from multiple sources (i.e., within-person fluctuation versus stable between-person differences), a considerable amount of between-person variance (due to, e.g., genetic factors) would count toward the unaccounted-for portion of the model within Voelkle’s approach, possibly biasing estimates downwards.
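A small arithmetic sketch shows why the choice of reference variance matters in a two-level design. The variance components below are hypothetical, and the function name is ours; the sketch simply contrasts the share of total variance explained by a within-person predictor with the share of within-person variance it explains.

```python
def r2_total_and_within(explained, var_within, var_between):
    """Return (R^2 against total variance, R^2 against within-person variance).

    explained:   variance explained by a within-person predictor
    var_within:  total within-person (occasion-level) variance
    var_between: stable between-person variance
    """
    if var_within <= 0 or var_between < 0:
        raise ValueError("variance components must be valid")
    if not 0 <= explained <= var_within:
        raise ValueError("explained variance cannot exceed var_within")
    total = var_within + var_between
    return explained / total, explained / var_within


# Hypothetical components: half of all variance is stable between persons.
r2_total, r2_within = r2_total_and_within(
    explained=0.10, var_within=1.0, var_between=1.0
)
print(round(r2_total, 3), round(r2_within, 3))
```

In this hypothetical case the same predictor accounts for 10% of the within-person variance but only 5% of the total variance, because stable between-person differences enlarge the denominator.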

Finally, the citations used to challenge our MIE estimates seem inadequate: Almost all of the positive psychology interventions summarized (10) consist of cognitive exercises such as counting one’s blessings. The one study that comes closest to our naturalistic setting by instructing people to perform daily kind acts for 10 days (11) yielded an effect size of d = 0.62 in posttest to pretest increase in (immediate) life satisfaction—an effect of comparable size to those found in our analyses. The remaining cited work (12, 13) assessed long-term adaptation rather than momentary impact.

In sum, what the data show is that moral and immoral events are robustly associated with people’s momentary happiness and sense of purpose—even through time. The claim that our effect sizes are “too big” or “unrealistic” stems from the misconception that there was only one “right” question to ask. Finding out more about how much morality affects happiness in relation to all other things is important, too, but addressing this complex issue as precisely as possible requires more careful analysis and better reference standards.

References and Notes

  1. Acknowledgments: We thank M. Friese and M. Luhmann for comments on an earlier version of this article.