Cope's Rule and the Dynamics of Body Mass Evolution in North American Fossil Mammals


Science, 01 May 1998, Vol. 280, Issue 5364, pp. 731-734
DOI: 10.1126/science.280.5364.731


Body mass estimates for 1534 North American fossil mammal species show that new species are on average 9.1% larger than older species in the same genera. This within-lineage effect is not a sampling bias. It persists throughout the Cenozoic, accounting for the gradual overall increase in average mass (Cope's rule). The effect is stronger for larger mammals, being near zero for small mammals. This variation partially explains the unwavering lower size limit and the gradually expanding mid-sized gap, but not the sudden large increase in the upper size limit, at the Cretaceous-Tertiary boundary.

Shortly after Cope described the first important Paleocene faunas from North America, he realized that the average size of mammals had increased dramatically during the Cenozoic (1). He attributed this pattern to a tendency for new groups to evolve at small sizes, combined with a persistent innate drive toward larger size. The idea that evolutionary increases in body size are common has since been recast in more Darwinian terms and named “Cope's rule.” Despite a long history of research (2), most modern studies have found little evidence to support this rule (3-5), dismissed it as context-dependent (6), or explained it with the statistical argument that means will rise passively as a group founded by small species diffuses through a bounded morphospace (7-12). Even actively driven trends have been attributed to convergence on an optimal body size, not to a general tendency toward size increase (7, 8). Here I show that there is an active within-lineage trend in the fossil record of North American mammals that is consistent with Cope's prediction.

Earlier studies of Cope's rule have focused on short-term trends (3, 5, 8), analyzed small sets of species (3, 4, 6, 8), discovered patterns to be sampling biases (9), or failed to make direct comparisons of potential ancestor-descendant species pairs (5, 10, 11). However, direct comparisons make it possible to distinguish within-lineage processes (for example, selection) from among-lineage processes (for example, differential extinction or origination), two factors that have been conflated in earlier analyses of the overall size ranges of individual clades (5) or of clade-subclade pairs (11).

I analyzed species ranging in age from Campanian (late Cretaceous) to late Pleistocene by using generic assignment and relative age as indicators of potential ancestor-descendant relationships. This is not a very robust phylogenetic method. But, as discussed below, it is highly conservative, similar to more sophisticated methods that are widely accepted, and based on seemingly uncontroversial assumptions. Furthermore, a specially designed bootstrapping test shows that the main result could not have been obtained unless the species-to-species comparisons did contain a large amount of phylogenetic signal.

Studying body mass trends requires not just an approximate phylogeny but both robust mass estimates and precise dates of first and last appearance (Fig. 1). The mass estimates were based on published lower first molar (m1) measurements, which have been related precisely to body mass in living mammals (13-17). Data were available for 1534 species, represented by 15,281 measured specimens from 2875 fossil populations. The data encompass those of some earlier studies (3, 6, 7, 11) but are at least an order of magnitude more plentiful.

Figure 1

Temporal distribution of Cenozoic mammalian species across the body mass spectrum. Age ranges were based on a multivariate ordination of faunal lists (18-21). Mass estimates were computed with the use of published regression coefficients for mass against m1 length × width [Carnivora, Insectivora, Primates, and Rodentia (13)] or against m1 length [Artiodactyla and Perissodactyla (14)]. Coefficients for Primates were also used for Plesiadapiformes (15); coefficients for Carnivora were also used for Mesonychia (16). Proboscidean m1's are rarely described, and their lower cheek teeth all are relatively large; mass estimates based on m2 area measurements and the all-mammal regression for combined p4-m2 area agreed with earlier literature (17). The all-mammal m1 area regression was used for all remaining mammals.

The appearance dates were based on a recent time-scale analysis (18, 19) of a comprehensive faunal database for North American fossil mammals (18, 20, 21). These data include 4015 taxonomic lists for individual fossil localities, which have been standardized taxonomically by referring to a companion database that flags 2692 invalid species names and 1197 invalid genus-species combinations. The corrected lists document occurrences of 3181 valid species. Instead of using the traditional system of North American land mammal ages, I converted the raw data directly into numerical age-range estimates by subjecting the lists to multivariate ordination and then calibrating the results to numerical time using 152 independent estimates of geochronological age (21).
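
The caption's choice of regression by taxonomic group can be sketched as a lookup followed by a prediction. This is a minimal sketch, assuming each published regression has the form ln(mass) = a + b·ln(X), where X is m1 length × width or m1 length; the names `REGRESSION_FOR`, `regression_for`, `ln_mass`, and the `coeffs` table are hypothetical, and the real intercepts, slopes, log base, and units must be taken from refs. 13 to 17. Proboscideans, handled separately via m2 and p4-m2 area, are not covered.

```python
import math

# Which published regression and which m1 measurement each group uses,
# following the Fig. 1 caption. Unlisted groups fall back to the
# all-mammal m1-area regression. Proboscideans (m2-based) are not handled.
REGRESSION_FOR = {
    "Carnivora": ("Carnivora", "area"),
    "Mesonychia": ("Carnivora", "area"),
    "Insectivora": ("Insectivora", "area"),
    "Primates": ("Primates", "area"),
    "Plesiadapiformes": ("Primates", "area"),
    "Rodentia": ("Rodentia", "area"),
    "Artiodactyla": ("Artiodactyla", "length"),
    "Perissodactyla": ("Perissodactyla", "length"),
}


def regression_for(group):
    """Return (regression source, measurement) for a taxonomic group."""
    return REGRESSION_FOR.get(group, ("all-mammal", "area"))


def ln_mass(m1_length, m1_width, group, coeffs):
    """Predicted ln body mass, ASSUMING ln(mass) = a + b * ln(X).

    `coeffs` maps regression source -> (a, b); it must be filled with
    published values, none of which are supplied here.
    """
    source, measure = regression_for(group)
    a, b = coeffs[source]
    x = m1_length * m1_width if measure == "area" else m1_length
    return a + b * math.log(x)
```

Keeping the group-to-regression mapping in one table makes the substitutions described in the caption (Primates coefficients for Plesiadapiformes, Carnivora coefficients for Mesonychia) explicit and easy to audit.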

For each new species, one potential ancestor was selected from the other species in the same genus that appeared before it did. If some of these older species were still extant at this time, one was selected at random; if not, then the older species that last went extinct was selected. Like several new methods that incorporate temporal information into phylogenetics (22), this procedure tends to minimize the number of implied ghost lineages. In order to test for trends, the difference in log body mass was computed for each older-younger species pair. This is similar to the widely used phylogenetic contrast procedure (23), in which measured characters are transformed into differences between putative sister species.
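
The proxy-ancestor rule and the resulting contrasts can be sketched as follows. This is an illustration, not the author's code: the `Species` record and the function names are hypothetical, ages are assumed to be in millions of years before present (so larger means older), and congeners with exactly the same first appearance are treated as not older.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Species:
    name: str
    genus: str
    first_ma: float  # first appearance, Ma (larger = older)
    last_ma: float   # last appearance, Ma
    ln_mass: float   # natural log of estimated body mass (g)


def proxy_ancestor(young: Species, pool: list, rng: random.Random) -> Optional[Species]:
    # Candidates: congeners that first appeared before `young` did.
    older = [s for s in pool if s.genus == young.genus and s.first_ma > young.first_ma]
    if not older:
        return None  # oldest species in its genus: nothing to compare against
    # Congeners still extant when `young` first appears.
    extant = [s for s in older if s.last_ma <= young.first_ma]
    if extant:
        return rng.choice(extant)
    # Otherwise, the older congener that went extinct most recently.
    return min(older, key=lambda s: s.last_ma)


def contrasts(pool: list, seed: int = 0) -> list:
    """Younger-minus-older ln-mass difference for every species with a proxy ancestor."""
    rng = random.Random(seed)
    out = []
    for young in pool:
        anc = proxy_ancestor(young, pool, rng)
        if anc is not None:
            out.append(young.ln_mass - anc.ln_mass)
    return out
```

Because the oldest member of each genus returns `None`, the sketch drops those species automatically, which mirrors the loss of sample size discussed below.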

Admittedly, the proxy ancestor method does not directly examine character data and therefore is oversimplified and error prone. However, its assumptions are justified. First, because the mammalian fossil record is well sampled, ancestor-descendant species should be observed with great frequency regardless of the assumed evolutionary model (24). Second, there is a correlation of age rank and clade rank in many mammalian groups (25): The relative ages of fossil species do correspond with the relative sequences of evolutionary splitting implied by phylogenies. Third, errors in identifying ancestor-descendant pairs will push the average size difference toward zero, which should obscure anything less than the strongest within-lineage trends. There are many possible errors: Older species might be closely related but not directly ancestral to younger species (for example, sister species); they might be only distantly related if a genus is diverse or polyphyletic; or they might be descendants, instead of ancestors, if undersampling leads to incorrect estimation of the relative order of appearance. Finally, the algorithm is even more conservative because it reduces sample sizes. Many genera are represented in the mass estimate data set by only one species and therefore cannot be studied. In addition, at least one species must be the oldest in each polytypic genus and therefore cannot be matched to a still older species. Despite these losses, 779 of the 1534 measured species (50.8%) were assigned a putative ancestor.

The basic pattern is overwhelming (Fig. 2). Newly appearing species are on average 0.0874 natural log units (9.1%) larger than older congeneric species, a highly significant difference according to two standard tests and a nonparametric resampling analysis. The only clear-cut hypothesis that predicts such a pattern is the most narrow and deterministic interpretation of Cope's rule; namely, that there are directional trends within lineages. Alternative hypotheses make no special predictions about average differences in mass between taxonomically paired species: neither increases in variance by diffusion away from evolutionary boundaries nor differential origination and extinction among lineages bear directly on within-lineage patterns. One could argue that the trend would be artifactual if taxonomists preferentially removed relatively small and derived lineages from nominal genera. But this bias is not obvious in the literature, and the argument raises the question of why taxonomists would not only do this but at the same time retain relatively large derived species in nominal genera.
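
The two standard tests named in the Fig. 2 caption can be written in a few lines. This sketch assumes the t test is a one-sample test on the younger-minus-older differences and the G test compares the number of larger younger species with a 50:50 expectation; the function names are hypothetical, P values would come from the t and chi-square (1 df) distributions, and published analyses may use variants (for example, corrections to G), so the sketch is not guaranteed to reproduce the reported statistics exactly.

```python
import math


def one_sample_t(diffs):
    """t statistic for H0: mean difference = 0 (df = n - 1)."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)


def g_test_sign(n_larger, n_total):
    """G = 2 * sum(O * ln(O / E)) against a 50:50 null, with 1 df."""
    expected = n_total / 2
    g = 0.0
    for observed in (n_larger, n_total - n_larger):
        if observed > 0:  # a zero cell contributes nothing
            g += observed * math.log(observed / expected)
    return 2 * g
```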

Figure 2

Frequency distribution of differences in body mass between 779 matched pairs of younger and older species in the same genera. Dashed line indicates zero difference. Younger species are significantly larger, either according to a standard t test (t = 3.225, df = 776, P < 0.01) or according to a G test [442 of 779 (56.7%) are larger, with a null expectation of 50%, G = 14.782, df = 1, P < 0.005]. A more robust, nonparametric test shows that the pattern is due to within-lineage trends instead of an among-lineage trend. This involves creating pseudo-matches of the younger species to older ones drawn randomly with replacement (bootstrapping). Completely random draws would generate unrealistically large temporal and body mass differences, because species are only placed in the same genera if they appear at similar times and have similar sizes. Therefore, a conservative algorithm was used as follows: (i) The differences in first appearance dates and the absolute differences in the mass of matched species were counted (bin sizes were set at 0.1 My and 0.01 ln g). (ii) As candidate older species were drawn randomly, the time and mass differences implied by each candidate pairing were subtracted from the two count vectors. (iii) If either difference had a zero count, the counts were restored and the candidate species was replaced. (iv) Once all younger species had been matched, the mean difference was computed. (v) The procedure was iterated to create a null distribution. Because only absolute values of mass differences were held constant, the average differences could take on any value. For combinatoric reasons, an average of 74.7 pairs per trial (9.6%) could not be matched. Even though the average differences in mass for unmatched species pairs were high (0.118 ln g), the remaining matched pairs averaged differences that were very close to the original value (0.084 versus 0.087 ln g). The bootstrapped species pairs in 10,000 trials differed in mass by an average of 0.022 ln g, which is significantly less than 0.084 ln g (P = 0.0071).
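
The constrained bootstrap in the Fig. 2 caption can be sketched as below. This is a simplified reconstruction under stated assumptions, not the author's program: candidate older species are drawn with replacement from the older members of the observed pairs, younger species are processed in random order, a hypothetical `max_tries` limit decides when a younger species is left unmatched, and the one-sided P value is the fraction of null means at least as large as the observed mean. All function names are hypothetical.

```python
import random
from collections import Counter


def pseudo_match_mean(pairs, rng, dt_bin=0.1, dm_bin=0.01, max_tries=1000):
    """One bootstrap trial.

    pairs: (young_first_ma, young_ln_mass, old_first_ma, old_ln_mass) tuples.
    Returns (mean younger-minus-older ln-mass difference, number left unmatched).
    """
    def dt_key(young_first, old_first):
        return round((old_first - young_first) / dt_bin)  # 0.1-My bins

    def dm_key(young_mass, old_mass):
        return round(abs(young_mass - old_mass) / dm_bin)  # 0.01-ln-g bins

    # (i) Count observed time gaps and absolute mass gaps.
    dt_left = Counter(dt_key(yf, of) for yf, _, of, _ in pairs)
    dm_left = Counter(dm_key(ym, om) for _, ym, _, om in pairs)
    olds = [(of, om) for _, _, of, om in pairs]
    order = list(pairs)
    rng.shuffle(order)  # assumption: younger species processed in random order
    diffs = []
    for yf, ym, _, _ in order:
        for _ in range(max_tries):
            of, om = rng.choice(olds)  # draw an older species with replacement
            kt, km = dt_key(yf, of), dm_key(ym, om)
            # (ii)-(iii) Keep the draw only if both gaps still have unused counts.
            if dt_left[kt] > 0 and dm_left[km] > 0:
                dt_left[kt] -= 1
                dm_left[km] -= 1
                diffs.append(ym - om)
                break
    # (iv) Mean signed difference over the pairs that could be matched.
    mean = sum(diffs) / len(diffs) if diffs else float("nan")
    return mean, len(pairs) - len(diffs)


def bootstrap_p(pairs, trials=1000, seed=0):
    """(v) Repeat trials to build a null distribution; return (observed, one-sided P)."""
    rng = random.Random(seed)
    observed = sum(ym - om for _, ym, _, om in pairs) / len(pairs)
    null = [pseudo_match_mean(pairs, rng)[0] for _ in range(trials)]
    null = [m for m in null if m == m]  # drop trials with no matches (NaN)
    return observed, sum(m >= observed for m in null) / len(null)
```

Checking the counts before decrementing them is equivalent to the caption's "subtract, then restore if a count hits zero" description, and it keeps the sign of each mass difference free to vary because only absolute differences are constrained.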

The strong support for a within-lineage effect raises several questions. First, average body mass across the fauna increases dramatically during the Cenozoic; can the within-lineage effect account for this trend by itself, or are among-lineage effects such as differential extinction also needed to explain it? A simple calculation shows that the within-lineage effect is sufficient. A least-squares fit of mean size against time for the Cenozoic data yields a slope of 0.0392 ± 0.0037 ln g per million years (My). The first appearances of the older and younger species in each comparison differ on average by 2.62 My, so the increase of 0.0874 ln g per older-younger step amounts to an increase of 0.0334 ln g/My, only 1.6 standard errors below the observed slope, an insignificant difference.
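
The back-of-the-envelope comparison in this paragraph can be reproduced directly from the reported values; nothing here is new data, and the variable names are only labels.

```python
import math

per_step = 0.0874            # mean ln-mass gain per older-younger comparison
gap_my = 2.62                # mean gap between first appearances (My)
slope, se = 0.0392, 0.0037   # Cenozoic trend in mean ln mass (ln g/My) and its SE

percent = (math.exp(per_step) - 1) * 100  # 0.0874 ln units as a percentage
rate = per_step / gap_my                  # within-lineage rate, ln g/My
z = (slope - rate) / se                   # shortfall in standard errors
print(f"{percent:.1f}% per step, {rate:.4f} ln g/My, {z:.1f} SE below the slope")
```

This prints "9.1% per step, 0.0334 ln g/My, 1.6 SE below the slope", matching the 9.1%, 0.0334 ln g/My, and 1.6 standard errors quoted in the text.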

A second question is whether the within-lineage trend varies through time: It might just be a feature of one unusual interval, such as the recovery phase immediately after the Cretaceous-Tertiary (K-T) boundary. To address this question I binned the older-younger matches into 2.5-My intervals throughout the Cenozoic (Fig. 3). There is a weak, marginally significant correlation between time and average size difference (Spearman's r = 0.342, t = 1.784, P < 0.10). However, this correlation is positive; the effect's strength actually increased over time. Least-squares regression predicts a mean size change of +2.7% during the initial, early Paleocene radiation of mammals, but +21.0% in the latest Pleistocene. This strengthening might have tracked either the appearance of new taxonomic groups with stronger biases or environmental changes that favored large sizes. Short-term excursions from the trend do not persist, as shown by the lack of significant serial correlation [Spearman's r = 0.132, t = 0.636, not significant (NS)]. These results suggest that progressive increase in size has been an important pattern throughout much of mammalian history.
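
The binning and rank-correlation steps can be sketched with the standard library. This is a minimal sketch under assumptions: each matched pair is assigned to a bin by a single age in Ma (which age the paper used is not specified here), ties receive average ranks, and the t value uses the usual conversion t = r·sqrt((n − 2)/(1 − r²)). The function names are hypothetical.

```python
import math
from collections import defaultdict


def bin_means(records, width=2.5):
    """records: (age_ma, ln_mass_difference) -> [(bin_start_ma, mean)], oldest first."""
    bins = defaultdict(list)
    for age, diff in records:
        bins[math.floor(age / width) * width].append(diff)
    return [(start, sum(v) / len(v)) for start, v in sorted(bins.items(), reverse=True)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's r as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def t_from_r(r, n):
    """t statistic for a correlation r from n points (n - 2 df)."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

To correlate with time running forward, pass negated ages, for example `spearman([-s for s, _ in bm], [m for _, m in bm])`; the lag-1 serial correlation is `spearman(means[:-1], means[1:])`. As a consistency check, `t_from_r(0.342, 26)` gives about 1.78, close to the reported t = 1.784 if the roughly 65-My Cenozoic is split into 26 bins (an assumption).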

Figure 3

Trend in strength of the within-lineage Cope's rule effect through the Cenozoic. Here, the data shown in Fig. 2 are binned into intervals 2.5 My long and averaged. Sample sizes range from 12 to 79 older-younger species pairs per interval, with an average of 29.1. Alternative bin sizes of 1 to 10 My yield similar patterns. Cretaceous data are too sparse to allow reliable averages to be computed. The dashed line illustrates the expected average change of zero if there is no effect.

Despite the consistency of this evolutionary bias, it does not account for all of the major features of the body mass distribution (Fig. 1). These are a constant lower mass limit of about 2 ln units; a gradual increase in the upper mass limit throughout the Cenozoic; a rapid expansion in the upper, but not lower, mass limit immediately after the K-T boundary; and the gradual development of a gap in the middle part of the size spectrum that begins in the Eocene at about 46 million years ago (Ma).

Most of these patterns could be explained by the existence of two body mass optima, each serving as a statistical point attractor or equilibrium. Unlike purely unconstrained distributions, distributions with attractors eventually cease to expand. Therefore, a preestablished small-sized equilibrium might explain the invariant lower size limit. Meanwhile, a second, larger optimum, combined with the observation that there were no truly large mammals before the K-T boundary, might explain why the upper limit was not stable: There may not have been enough time during the Cenozoic for the distribution to expand and envelop the upper optimum.

Large-scale ecological studies of mostly small-sized, extant North American mammals do suggest that there is a body mass optimum at about 100 g (26), which is close to the average size of Cretaceous mammals and in agreement with the lower half of the temporal distribution (Fig. 1). We can test for this optimum by regressing the difference in body mass between younger and older species against the mass of the older species (Fig. 4A). If a single optimum exists, then relatively large older species will be matched to smaller, younger species and vice versa, thereby creating a negative correlation. Assuming a linear, or Ornstein-Uhlenbeck, model (23), the optimal mass is the negative of the ratio of the resulting intercept to the slope; that is, the point at which the expected change in mass is zero.
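
The linear test for an optimum amounts to an ordinary least-squares fit of change on starting mass, followed by solving for the zero crossing. This sketch fits the uncorrected regression only; it omits the regression-to-the-mean correction described in the Fig. 4 caption, and the function names are hypothetical.

```python
def ols(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def linear_optimum(ln_mass_old, ln_mass_change):
    """Mass (ln g) where the fitted change is zero, x* = -a / b.

    The zero is an attractor (an optimum) only when the slope b is negative;
    a positive slope means lineages are pushed away from x*.
    """
    a, b = ols(ln_mass_old, ln_mass_change)
    return -a / b, b < 0
```

For example, if the change were exactly 2 − 0.5x, the fitted optimum would be x* = 4 ln g, and it would be stable because the slope is negative.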

Figure 4

Positive correlation between the mass of older species in each matched pair and the difference in mass between the younger and older species. Although not significantly better than a linear fit, a cubic fit (thick solid line) implies a biologically realistic falloff in the rate of size increase at very large body sizes. Data are shown in (A); one point at the 2.43, +4.56 coordinate falls outside the plot's limits. The same polynomial fit is shown in (B), where the y axis has been expanded. The relation implies evolutionary tendencies (arrows) toward stable optima in body mass (solid circles) and away from an unstable equilibrium (open circle); these points fall where the expected change (thick line) equals zero (dashed line). The thick arrow shows the trajectory implied by both linear and cubic functions. The 95% confidence intervals (thin solid lines) are based on 1000 bootstrap replicates of the original data. Regressions are corrected for the “regression to the mean” artifact, that is, the spurious negative correlation between any two variables $y - x$ and $x$. Let $s_x$ equal the slope of the regression of $y - x$ on $x$, $s_y$ equal that of $y - x$ on $y$, and $s_E$ equal the slope of the Ornstein-Uhlenbeck equilibrium function; and assume that the data result from a summation of the linear regression-to-the-mean and equilibrium functions. Because the value of the slope fixes the covariance, correlation, and intercept, the desired coefficients can be estimated by numerically solving for $s_E$ in the easily derived equation $s_x + s_y = s_E(s_E + 2)/(s_E + 1)$. For a polynomial regression, this is done separately for each of the regressions of $y - x$ on a power of $x$ that is involved.
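
The correction equation can also be solved in closed form. Writing $S = s_x + s_y$ and multiplying through by $s_E + 1$ gives the quadratic $s_E^2 + (2 - S)s_E - S = 0$, whose root on the branch $s_E > -1$ is $s_E = [(S - 2) + \sqrt{S^2 + 4}]/2$; that branch passes through $s_E = 0$ when $S = 0$, and the right-hand side is strictly increasing there, so the root is unique. The sketch below (function names hypothetical) computes that root and checks it by substitution; it assumes the caption's additive model and does not by itself estimate $s_x$ or $s_y$.

```python
import math


def equilibrium_slope(s_x, s_y):
    """Solve s_x + s_y = s_E (s_E + 2) / (s_E + 1) for the root with s_E > -1."""
    S = s_x + s_y
    return ((S - 2) + math.sqrt(S * S + 4)) / 2


def slope_sum_implied(s_e):
    """Right-hand side of the caption's equation, used to check a solution."""
    return s_e * (s_e + 2) / (s_e + 1)
```

When $S = 0$ (the regression-to-the-mean slopes cancel), the solution is $s_E = 0$, that is, no equilibrium tendency.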

With the appropriate corrections for the fact that the independent variable (mass of the older species) also appears as part of the dependent variable (the change in mass), the predicted negative correlation is not seen; instead, there is a significant positive correlation (n = 779; r = 0.113; F = 10.06; df = 1, 777; P < 0.005). Superficially, this positive linear relation implies that body mass continues to increase at an ever-accelerating rate. A disequilibrium such as this one is biologically unrealistic because there must be biomechanical and physiological limits to size. The dilemma could be resolved by showing that although the rate of increase is rapid in the middle of the size range, it falls to zero at very large sizes. Such a dual-optimum dynamic should resemble a quadratic or cubic function; both functions can imply one stable and one unstable equilibrium, but they differ fundamentally because only the cubic can imply a second stable equilibrium.

A quadratic fit does not significantly improve on the original r value (r = 0.115; F = 0.41; df = 1, 776; NS), and neither does a cubic fit (r = 0.126; F = 1.22; df = 2, 775; NS). This result may be due to noise in the proxy phylogeny or to undersampling of the large lineages. But in any case, both fits do imply that the rate of increase is maximal in the middle of the distribution, and the 95% confidence intervals cannot exclude near-zero rates at the ends (Fig. 4B). The rate of increase is maximal at 75.3 kg (predicted difference = 0.233 ln g). The function is so flat at the lower end that for small mammals there is more of an optimal zone than an optimal point; the biologically required large-mammal equilibrium is so large that it is not statistically clear-cut and apparently never was attained during the Cenozoic. In any event, either stability or an increase in size, but not a decrease, is predicted for lineages of almost any size.
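
The reading of Fig. 4B, with equilibria where the fitted change crosses zero and stability given by the sign of the fitted slope there, can be sketched with a plain least-squares polynomial fit. This is an illustration under assumptions: it omits the regression-to-the-mean correction from the Fig. 4 caption, locates roots and the peak rate by a hypothetical grid search plus bisection, and the helper names are not from the paper.

```python
def polyfit(x, y, deg):
    """Least-squares coefficients c[0..deg] with y ~ sum(c[k] * x**k)."""
    n = deg + 1
    # Normal equations A c = rhs.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef


def poly(coef, x):
    return sum(c * x ** k for k, c in enumerate(coef))


def deriv(coef):
    return [k * c for k, c in enumerate(coef)][1:]


def equilibria(coef, lo, hi, steps=10_000):
    """Zeros of the fitted change on [lo, hi]; stable where the fitted slope is negative."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    found = []
    for a, b in zip(grid, grid[1:]):
        neg_a = poly(coef, a) < 0
        if neg_a == (poly(coef, b) < 0):
            continue  # no sign change in this cell
        for _ in range(60):  # bisection
            m = (a + b) / 2
            if (poly(coef, m) < 0) == neg_a:
                a = m
            else:
                b = m
        root = (a + b) / 2
        found.append((root, "stable" if poly(deriv(coef), root) < 0 else "unstable"))
    return found


def peak_rate(coef, lo, hi, steps=10_000):
    """Grid location of the largest fitted change on [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda v: poly(coef, v))
```

In the paper's setting, x would be the ln mass of the older species and y the ln-mass change; a cubic with three zeros yields the stable, unstable, stable pattern drawn in Fig. 4B, which a quadratic cannot produce.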

Any of these equilibrium models could partially account for the trend toward larger size, the persistence of an unwavering lower limit, and the gradual opening up of a gap in the middle of the distribution. However, they cannot account for the sudden expansion of the distribution after the K-T mass extinction event at 65 Ma. In the last million years of the Cretaceous, 29 measured species averaged 150 g. In the first million years of the Cenozoic, 33 measured species (27 of them new) averaged 1.01 kg. This extraordinary shift of 1.91 ln units is unequaled elsewhere in the data set.

Would there have been a rapid shift if the modern size-change function (Fig. 4B) suddenly came into existence at the K-T boundary? On the basis of the linear and cubic equilibrium models, the expected increase from 150 g is only 0.035 or 0.006 ln units per first appearance, respectively. Therefore, the Cretaceous fauna already was solidly within the optimal zone of most Cenozoic small mammals, and the size-change dynamic does not explain the sudden shift by itself. Better explanations might involve stochastic factors or short-term changes in the underlying dynamic. That the early Paleocene really was a very unusual time is indicated by the phenomenal rates of origination seen then (21). In any case, the data are compatible with the idea that the extinction of large terrestrial vertebrates such as dinosaurs at the K-T boundary opened up the larger end of the body size spectrum for occupation by mammals.
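
The size of the K-T shift relative to the model predictions can be checked with simple arithmetic, assuming the reported averages of 150 g and 1.01 kg can be compared on the ln scale. The variable names are only labels; the per-step values are the linear and cubic predictions quoted above.

```python
import math

shift = math.log(1010 / 150)  # 150 g -> 1.01 kg, in ln units
for model, per_step in (("linear", 0.035), ("cubic", 0.006)):
    steps = shift / per_step  # successive first appearances needed
    print(f"{model}: {shift:.2f} ln units / {per_step} per step = about {steps:.0f} steps")
```

The shift comes to 1.91 ln units, and covering it at the predicted rates would take about 54 (linear) or 318 (cubic) successive first appearances, rather than the roughly one-million-year interval actually observed.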

Despite the clear evidence for nonrandom within-lineage evolution, the overall trend in body size known as Cope's rule may reflect a balance of forces operating both within and among lineages (27). Differential turnover rates at small sizes may help to explain the very sharp lower size limit. Similarly, if higher rates of extinction or lower rates of origination (or both) have suppressed the diversity of large mammals, that might explain why the biologically required optimum at a large size seems to be at or beyond the limit of the observed size range. The situation would be analogous to the hypothesized “taxon cycle” in island communities (28), in which newly arriving species evolve toward niches that are opened up by the extinction of older species. The analogy would be particularly relevant if the existence of a large body mass optimum itself were due not to functional constraints but to character displacement pushing lineages away from the center of the distribution. Even if these speculations eventually are refuted, the extraordinary size bias in the production of new mammalian species throughout the Cenozoic will continue to demand explanation.

