Research Article

Phanerozoic Trends in the Global Diversity of Marine Invertebrates

Science  04 Jul 2008:
Vol. 321, Issue 5885, pp. 97-100
DOI: 10.1126/science.1156963


It has previously been thought that there was a steep Cretaceous and Cenozoic radiation of marine invertebrates. This pattern can be replicated with a new data set of fossil occurrences representing 3.5 million specimens, but only when older analytical protocols are used. Moreover, analyses that employ sampling standardization and more robust counting methods show a modest rise in diversity with no clear trend after the mid-Cretaceous. Globally, locally, and at both high and low latitudes, diversity was less than twice as high in the Neogene as in the mid-Paleozoic. The ratio of global to local richness has changed little, and a latitudinal diversity gradient was present in the early Paleozoic.

Diversity curves showing changes through the geological record in the number of fossil marine genera or families have fueled decades of macroevolutionary research (1–10). Traditionally, these curves were based on literature compilations that recorded only the first and last appearances of taxa (3, 5). These compilations suggested that diversity rapidly rose during the Cambrian and Ordovician and then either stayed at a plateau (3) or declined erratically (5) through the Paleozoic. There was a modest rebound after the end-Permian mass extinction, the largest (11) and most ecologically important (9, 12) of the Phanerozoic. Diversity in these curves then rose steadily and with a possibly increasing absolute rate, suggesting to some an exponential radiation (4, 6, 9, 10).

The appearance of a Paleozoic equilibrium followed by a nearly unbridled Meso-Cenozoic radiation has presented a puzzle: How could global diversity reach limits and then much later cast them off, rising to far higher levels than those seen during the Paleozoic? The possibility that the apparent radiation was exaggerated by secular trends in the quality and especially quantity of the preserved fossil record (8, 13–15) was first proposed more than three decades ago (16). This claim was at first put aside because sampling effects were thought to be minor and could not be assessed without more detailed information (2, 8).

The Paleobiology Database makes it possible to address the problem with the use of contemporary statistical methods because it records occurrences of genera and species within particular fossil collections. An intensive data collection effort has quadrupled the database since data for two long Phanerozoic intervals were presented in 2001 (17). This initiative has focused on both filling gaps and sampling the Cenozoic at a high level (18). The data set includes 44,446 collections with individually recorded ages and geographic coordinates. The collections comprise 284,816 fossil occurrences of 18,702 genera that equate to ∼3.5 million specimens and derive from 5384 literature sources.

Sampling and counting. The amount of data per time interval and, therefore, the shape of a diversity curve may vary greatly as a result of uneven preservation and sampling effort. The key advantage of collection data is that this variation can be removed by subsampling (17). A random subset of the available collections is drawn until each interval, called a sampling bin, includes the same estimated number of specimens. Genera are counted, and the procedure is repeated to obtain averages. We tallied actual specimen counts when available and otherwise estimated them using a gently curved, one-parameter empirical function that relates the logarithms of specimen and genus counts in each collection (18). The parameter is called a calibrated weight. Previous studies (17, 19–22) all presumed that this relation was log-log linear and had the same shape in every time interval, or else that there was little change through time in the average size of collections. We instead rarefied actual abundances to produce a separate estimation curve for each interval.

Additionally, we weighted the chance of drawing each collection inversely by its specimen count to distribute sampling both spatially and environmentally, which avoids underestimation of global diversity (18). We excluded samples from entirely unlithified rocks, sieved samples from poorly lithified rocks, and samples that preserve original aragonite because it is easier to collect small and fragile specimens in such cases. Furthermore, samples falling into any of these categories are extremely uncommon before the Cenozoic, and the Cenozoic samples are concentrated in a narrow region of the temperate zone [supporting online material (SOM) text]. Finally, before each round of subsampling we restricted the data set to 65 randomly drawn references per interval, exceeding this figure only when more are needed to provide the quota of specimens used in subsampling. Use of a reference quota holds the effective size of the sampling universe more constant, which avoids such problems as a correlation between apparent diversity and the geographic extent of fossil collections (SOM text).
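
The subsampling steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Collection` record, the function names, and the use of recorded specimen counts (rather than counts estimated from calibrated weights) are all assumptions made for illustration. The sketch restricts each trial to a random set of references, enlarges that set only if it cannot supply the specimen quota, and then draws collections with probability inversely proportional to their specimen counts.

```python
import random
from collections import namedtuple

# Hypothetical record layout: one fossil collection taken from one reference.
# `specimens` is assumed to be a positive count.
Collection = namedtuple("Collection", ["reference", "genera", "specimens"])

def build_pool(collections, specimen_quota, reference_quota, rng):
    """Restrict the bin to a random set of references, adding more
    references only when the quota of specimens cannot otherwise be met."""
    refs = sorted({c.reference for c in collections})
    rng.shuffle(refs)
    n_refs = min(reference_quota, len(refs))
    while True:
        kept = set(refs[:n_refs])
        pool = [c for c in collections if c.reference in kept]
        enough = sum(c.specimens for c in pool) >= specimen_quota
        if enough or n_refs == len(refs):
            return pool
        n_refs += 1

def subsample_bin(collections, specimen_quota, reference_quota, rng):
    """One trial: draw collections without replacement until the specimen
    quota is reached, and return the set of genera encountered."""
    pool = build_pool(collections, specimen_quota, reference_quota, rng)
    genera, specimens = set(), 0
    while pool and specimens < specimen_quota:
        # Weight each remaining collection inversely by its specimen count.
        weights = [1.0 / c.specimens for c in pool]
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        drawn = pool.pop(i)
        genera |= set(drawn.genera)
        specimens += drawn.specimens
    return genera

def mean_subsampled_richness(collections, specimen_quota, reference_quota,
                             trials=20, seed=0):
    """Average genus count across repeated subsampling trials."""
    rng = random.Random(seed)
    counts = [len(subsample_bin(collections, specimen_quota,
                                reference_quota, rng))
              for _ in range(trials)]
    return sum(counts) / trials
```

Applying `mean_subsampled_richness` to every bin with the same quotas would give raw sampled-in-bin values of the kind that the counting correction discussed next is applied to.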

A second issue is how to tally genera (17). Conventional protocols count not only genera sampled inside a temporal bin but also genera found at any time before and after a bin (but not inside it). Adding these unsampled taxa to the count creates dropoffs at the edges of curves, local depression of curves near extinction and origination events [the Signor-Lipps effect (23)], and interpolated presences of polyphyletic (“wastebasket”) genera that have artificially long ranges with large gaps connecting unrelated species. Our direct documentation of occurrences allowed us to avoid all such problems by counting only genera that have actually been sampled. We used a variant of this sampled-in-bin method (20) that corrects for residual error by assessing the proportion of genera found immediately before and after a sampling bin but not inside it. This correction has little effect other than reducing some short-term variation (18).
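
The contrast between the two counting protocols can be made concrete with a short Python sketch. The data layout (a dictionary mapping each genus to the set of bin numbers in which it was actually sampled) and all function names are assumptions for illustration. The last function computes only the proportion described in the text, that is, the share of genera found immediately before and after a bin that are missing from it; it does not reproduce the published correction formula.

```python
def range_through_count(occurrences, b):
    """Conventional count: every genus whose observed range spans bin b,
    whether or not it was actually sampled in b."""
    return sum(1 for bins in occurrences.values()
               if min(bins) <= b <= max(bins))

def sampled_in_bin_count(occurrences, b):
    """Sampled-in-bin count: only genera actually found in bin b."""
    return sum(1 for bins in occurrences.values() if b in bins)

def gap_proportion(occurrences, b):
    """Among genera sampled in both b-1 and b+1, the proportion that
    were not sampled in b. Returns None if no genus brackets the bin."""
    bracketing = [bins for bins in occurrences.values()
                  if (b - 1) in bins and (b + 1) in bins]
    if not bracketing:
        return None
    missed = sum(1 for bins in bracketing if b not in bins)
    return missed / len(bracketing)

# Hypothetical example: bins are numbered from oldest (0) to youngest (4).
occurrences = {
    "a": {1, 2, 3},
    "b": {1, 3},      # brackets bin 2 but is not sampled in it
    "c": {2},
    "d": {0, 4},      # long gap; inflates the range-through count
    "e": {1, 2, 3},
}
print(range_through_count(occurrences, 2))   # 5
print(sampled_in_bin_count(occurrences, 2))  # 3
print(gap_proportion(occurrences, 2))        # 0.333...
```

Genus "d" shows how a long-ranging genus with a large sampling gap adds to every intervening bin under the conventional protocol but to none of them under sampled-in-bin counting.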

Global diversity. The sampling-standardized diversity curve (Fig. 1) shows many key features of older curves (e.g., 5), such as the Cambro-Ordovician radiation, 78% end-Permian extinction, and 63% end-Triassic extinction (18). However, it also includes features that are not highly visible in earlier published curves. Some of them are short-term excursions that may not be robust, such as the brief peak in our sixth Cretaceous bin. Others, however, are seen in numerous treatments of our data (SOM text). (i) The curve suggests that there was a large mid-Devonian drop with no clear recovery until the Permian, instead of a mid-Paleozoic plateau (3, 6). The initial decline begins well before the Frasnian-Famennian ecological collapse, and diversity does not fall across that boundary. (ii) The onset of the late Paleozoic glacial interval was no earlier than the late Famennian (24). However, the curve's large mid-Permian increase roughly corresponds with the end of glaciation and an increase in the number of latitudinally restricted tropical genera (25). (iii) The recovery from the Permo-Triassic mass extinction is so rapid that Early Triassic standing diversity is only 32% lower than Late Permian standing diversity. (iv) For similar reasons, the 55% Cretaceous-Tertiary mass extinction does not register as a visible net change in diversity at our curve's level of resolution.

Fig. 1.

Genus-level diversity of both extant and extinct marine invertebrates (metazoans less tetrapods) during the Phanerozoic, based on a sampling-standardized analysis of the Paleobiology Database. Points represent 48 temporal bins defined to be of roughly equal length (averaging 11 My) by grouping short geological stages when necessary. Vertical lines show the 95% confidence intervals based on Chernoff bounds, which are always conservative regardless of the number of genera that could be sampled or variation in their sampling probabilities (18). Data are standardized by repeatedly drawing collections from a randomly generated set of 65 publications until a quota of 16,200 specimens has been recovered in each bin. On average, 461 collections had to be drawn to reach this total. The curve shows average values found across 20 separate subsampling trials—enough to yield high precision with such large sample sizes. Ma, million years ago. Cm, Cambrian; O, Ordovician; S, Silurian; D, Devonian; C, Carboniferous; P, Permian; Tr, Triassic; J, Jurassic; K, Cretaceous; Pg, Paleogene; Ng, Neogene.

Most importantly, the curve casts doubt on the existence of an exponential radiation extending throughout the Mesozoic and Cenozoic. The Triassic data points are higher than the early Jurassic points instead of lower, and there is little net change from the mid-Cretaceous through the Paleogene. Even the Neogene peak is subdued: The curve's last data point is only a factor of 1.14 and 1.37 higher than the two highest points in the early and mid-Paleozoic (both in the Early Devonian) and a factor of 1.74 higher than the median Paleozoic point. Older subsampling methods suggest an even smaller increase (18), and the Neogene values may be exaggerated by geographic factors (SOM text). Thus, the new results suggest that any post-Paleozoic radiation was largely confined to the Jurassic and Early Cretaceous.

In sum, the net increase in global diversity over nearly a half-billion years was proportionately not much larger than some of the changes in genus counts between neighboring 11-million-year (My) intervals. However, some treatments of Sepkoski's genus-level compilation imply that the mid-Paleozoic–to–Neogene increase was by a factor of 3.5 (7), or even 4.1 (6). We next show that within-collection diversity patterns and changes in latitudinal gradients are only consistent with the new curve.

Collection-level diversity and evenness. Abundance distributions are even when each taxon is represented by a similar number of specimens. Evenness is of intrinsic ecological interest because it controls sampled richness when collections are of the size normally studied by paleoecologists (about 100 to 300 specimens) (26–28). Fortuitously, the single governing parameter estimated by the calibrated-weights method is easily translated into Hurlbert's probability of interspecific encounter (PIE) index of evenness (27). PIE bears no necessary relation to the total number of taxa that might be sampled.
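
For readers unfamiliar with the index, the following Python sketch computes Hurlbert's PIE directly from specimen counts in a single collection. This is the standard definition of the index, the probability that two specimens drawn at random without replacement belong to different taxa; it is not the translation from calibrated weights used in this study, and the function name is an assumption.

```python
def hurlbert_pie(counts):
    """Hurlbert's probability of interspecific encounter.

    counts: number of specimens per taxon in one collection.
    PIE = N / (N - 1) * (1 - sum(p_i ** 2)), where p_i = N_i / N.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("PIE needs at least two specimens")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical collections of 100 specimens and four taxa each.
even = [25, 25, 25, 25]
uneven = [97, 1, 1, 1]
print(round(hurlbert_pie(even), 3))    # 0.758
print(round(hurlbert_pie(uneven), 3))  # 0.059
```

Both collections contain four taxa, yet their PIE values differ by more than a factor of 10, which illustrates why the index carries no necessary information about total richness.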

Evenness changed substantially through the Phanerozoic, if not as dramatically as sometimes suggested (29), and the greatest values are seen in the late Cretaceous and Cenozoic (Fig. 2). They imply, for example, that 200 specimens will yield about 11.3 genera in the latest Ordovician, 22.2 in the Paleocene, and 19.3 in the late Neogene. The general pattern of a long-term increase is confirmed by related (but more restricted) studies (26, 30). However, the evenness curve (Fig. 2) does not simply increase. It suggests a plateau between the Late Ordovician and Carboniferous and then a large rise through the Permian. After a weak recovery from the Permo-Triassic decline, it shows little change until a rise in the Late Cretaceous (18). All of these features are broadly consistent with the idea that local diversity reached its maximum sometime during the past 100 My (29) as the number of occupied niches expanded (31). However, they do not suggest a radical increase between the early Paleozoic and Cenozoic. They also do not mirror the shift in abundance distributions from simple to complex shapes at the Permo-Triassic boundary (12), because evenness dropped across this boundary instead of rising, and the Triassic figures are well within the narrow range seen throughout most of the Paleozoic.
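
The dependence of sampled richness on evenness at a fixed number of specimens can be illustrated with classical rarefaction. The sketch below uses Hurlbert's expected-richness formula applied to known abundances; the figures quoted above were instead derived from the calibrated function, so this is an analogy under stated assumptions rather than a reconstruction of those values. The function name and the example abundances are hypothetical.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of taxa in a random draw of n specimens
    (without replacement) from a collection with the given counts.

    E[S_n] = sum over taxa of 1 - C(N - N_i, n) / C(N, n)
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total specimen count")
    denom = comb(total, n)
    # math.comb returns 0 when n > total - c, so abundant taxa contribute 1.
    return sum(1 - comb(total - c, n) / denom for c in counts)

# Two hypothetical 1000-specimen assemblages with 30 genera each.
even = [34] * 10 + [33] * 20        # nearly equal abundances
uneven = [710] + [10] * 29          # one dominant genus
print(rarefied_richness(even, 200))
print(rarefied_richness(uneven, 200))
```

With total richness held at 30 genera, the even assemblage is expected to yield more genera in 200 specimens than the uneven one.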

Fig. 2.

Evenness estimates (thick line) for 11-My-long bins, based on Hurlbert's PIE index (27). The values are computed from the calibrated occurrence weights used in subsampling, and each is a weighted moving average across five consecutive bins. Changes in global diversity (thin line, same as in Fig. 1) are shown for comparison.

Theoretically, if biogeographic and environmental gradients (i.e., beta diversity) do not change, global diversity should track local sampled diversity and, therefore, evenness. Furthermore, the global curve is methodologically grounded on the evenness data because the calibrated function estimates fewer specimens per collection when evenness is high, which causes more collections to be drawn. Hence, it is no surprise that after logging and differencing the global and local curves (Fig. 2), we find a significant correlation (Spearman's rank-order correlation ρ = 0.332, P = 0.023). Specific similarities include the parallel increase during the Cambro-Ordovician radiation, the joint rise during the Permian, and the drop in evenness during the severe global extinction at the Permo-Triassic boundary (9, 11, 12).
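
The comparison of logged and differenced curves can be sketched in a few lines of Python. The series below are invented for illustration, the function names are assumptions, and the significance test is omitted; the sketch shows only how bin-to-bin changes in log values are ranked and correlated.

```python
import math

def log_differences(series):
    """Change in the natural log between each pair of successive bins."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical global-diversity and evenness values for five bins.
global_curve = [100, 120, 90, 150, 160]
evenness_curve = [0.50, 0.60, 0.55, 0.70, 0.72]
rho = spearman_rho(log_differences(global_curve),
                   log_differences(evenness_curve))
print(rho)
```

Differencing before correlating means the statistic responds to shared bin-to-bin changes rather than to a shared long-term trend.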

However, the curves depart from one another for long stretches of time. Evenness remained relatively high during the long late Paleozoic trough in global richness (Fig. 2). The offset may be driven by geography: The continents were widely dispersed during the early Paleozoic, late Mesozoic, and Cenozoic, and the late Paleozoic low roughly corresponds with the assembly of the supercontinent Pangaea. Thus, steeper biogeographic gradients during the late Phanerozoic may have accommodated greater global diversity (1), and the same may have been true during the early Paleozoic. Nevertheless, this hypothesis needs to be explored in more detail because the mid-Phanerozoic offset between the curves is not consistent, and nonbiogeographic factors such as onshore-offshore gradients also may have changed.

Latitudinal diversity gradients. To ensure that the global curve captured any radiation in the tropics that might have occurred, we deliberately focused on the Neogene tropics while simultaneously establishing a baseline sampling level for the entire Phanerozoic. Counts of references in the Paleobiology Database, Sepkoski's compendium, GeoRef, and the Treatise on Invertebrate Paleontology suggest that our data sample the tropics as well as, or better than, those compilations (SOM text). Enough data are available to compare latitudinal belts within several key time intervals.

Subsampling curves for individual Neogene bins show that low-latitude diversity (within 30° paleolatitude of the equator) was substantially higher than northern temperate zone diversity (Fig. 3A). The Neogene tropical curves overlap, and the high-latitude curves are also very similar to each other. In the Ordovician, there was little benthic habitat in the north, so we have computed low-latitude and southern temperate zone curves (Fig. 3B). They show a clear latitudinal diversity gradient for one bin but not the other. Thus, regardless of whether the modern gradient came into existence recently, at least some much older intervals did witness similar patterns. Furthermore, because there is a large difference between the Neogene and Ordovician at both high and low latitudes (Fig. 3), the moderate net increase through the Phanerozoic (Fig. 1) seems to have been a global phenomenon instead of being driven strictly by a radiation in the tropics.

Fig. 3.

Low- (30°S to 30°N paleolatitude) and high-latitude subsampling curves for individual 11-My-long bins. Gray lines indicate low-latitude data. (A) Data for the Cenozoic bins, including the Early/Middle Miocene (dotted lines) and Late Miocene/Pliocene/Pleistocene (solid lines). Black lines indicate data from above 30°N. (B) Data for the Ordovician bins, including the Llanvirn (dotted lines) and Caradoc (solid lines). Black lines denote data from below 30°S.

Previous data sets. The fact that most treatments of older compilations depict a massive Cretaceous-Cenozoic radiation (1–7, 9, 10) raises the question of whether the differences are primarily methodological or primarily related to coverage of the literature. We answer this question by tabulating our data set and Sepkoski's range-based, genus-level compendium (5) in exactly the same way. Because it is impossible to either sample-standardize or take sampled-in-bin counts if only ranges are available (5), we use raw, unstandardized data and treat our genus age ranges as if they were continuously sampled.

The two data sets yield similarly shaped curves of a comparable magnitude (Fig. 4). The genus-level curves (Fig. 4) virtually overlap, and the same is true for ordinal-level data sets (SOM text). The new genus-level curve is higher than Sepkoski's throughout much of the Mesozoic but still suggests a large Cenozoic radiation, albeit smaller than that in Sepkoski's data. Our raw curve suggests a 3.74 times difference between the late Neogene and the median Ordovician, Silurian, or Devonian interval—slightly more than the factor of 3.54 seen in Sepkoski's compendium (Fig. 4). Both curves identify not just major but also minor features, such as a peak in the late Jurassic.

Fig. 4.

Genus-level diversity curves based on Sepkoski's compendium [thin line (5)] and our new data (thick line). Counts are of marine metazoan genera crossing boundaries between temporal bins (boundary crossers) and exclude tetrapods. Ranges are pulled forward from first fossil appearances to the Recent, instead of ending at the last known fossil appearance. Extant genera are systematically marked as such based on Sepkoski's compendium and the primary literature. There is no correction for sampling, and genera are assumed to be sampled everywhere within their ranges because Sepkoski's traditional synoptic data (5) do not record occurrences within individual collections.

Thus, the dramatic differences between the standardized and conventional curves (Figs. 1 and 4) do not result from data-quality or -quantity problems. We instead attribute these differences to two biases that can only be removed with standardized sampling and sampled-in-bin counting of occurrence data (SOM text). First, we have exceeded the magnitude of Sepkoski's curve in the mid-Phanerozoic with a modest data set but fallen just short of it in the Cenozoic with a massive data set (18). Fully matching his curve would require us to make our sampling even more heterogeneous. Hence, the literature compiled by Sepkoski seems likely to contain a strong Cenozoic sampling bias. Second, counting each extant taxon as sampled everywhere from its last fossil appearance to the Recent [i.e., the Pull of the Recent (7, 32)] exaggerates the artifactual Cretaceous and Cenozoic increase. This problem cannot affect counts of sampled taxa because they take no note of which genera are extant or which extend beyond a particular bin.
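
The effect of the Pull of the Recent on range-based counts can be shown with a small Python sketch. The data layout (each genus mapped to its first and last fossil bins, numbered from oldest to youngest), the function names, and the example genera are assumptions for illustration only.

```python
def boundary_crossers(ranges, boundary):
    """Genera whose range crosses the base of bin `boundary`, i.e. that
    first appear in an older bin and last appear in that bin or later."""
    return sum(1 for first, last in ranges.values()
               if first < boundary <= last)

def pull_ranges_to_recent(ranges, extant, recent_bin):
    """Extend the range of every extant genus to the youngest bin."""
    return {genus: (first, recent_bin if genus in extant else last)
            for genus, (first, last) in ranges.items()}

# Hypothetical fossil ranges; bin 5 is the youngest.
fossil_ranges = {
    "a": (0, 5),
    "b": (2, 3),
    "c": (1, 2),   # extant, but last found as a fossil in bin 2
    "d": (4, 5),
}
extant = {"c"}
pulled = pull_ranges_to_recent(fossil_ranges, extant, recent_bin=5)

print(boundary_crossers(fossil_ranges, 4))  # 1
print(boundary_crossers(pulled, 4))         # 2
```

Genus "c" is counted at the younger boundary only after its range is extended, even though it has no fossil occurrences there. A sampled-in-bin count never consults the list of extant genera, so it cannot gain genera in this way.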

Conclusion. The new diversity curve (Fig. 1) records substantial volatility, including some potentially meaningful excursions that might relate to evolutionary innovations, paleogeographic shifts, global climate change, sea-level change, or other factors. In particular, the fact that local and global diversity have not always changed in tandem (Fig. 2) implies that compositional differences between environments or geographic regions have waxed and waned.

Regardless of whether this is true, a more general and important pattern is evident: Most of the Meso-Cenozoic radiation took place well before the Late Cretaceous and Cenozoic, and the net increase through the Phanerozoic was proportionately small relative to the enormous amount of time that elapsed. Strong latitudinal diversity gradients as far back as the Ordovician and modest changes in local-scale sampled diversity (Fig. 2) (26, 29, 30) are consistent with the suggestion that global biodiversity is constrained. Additional evidence includes rapid rebounds from all of the major extinction episodes (Fig. 1) (3).

Although any limit to diversity could not have been static and must have had a net increase (3, 13, 16, 17, 33), our results cannot be reconciled with previous studies that argued for exponential long-term growth on the basis of raw, unstandardized data (1, 4, 6, 9, 10). Allowing for ecological reorganizations in the wake of mass extinction (9) or for the addition of ecological niches by means of evolutionary innovation (31, 34) does not explain how diversification could have been limitless in the face of interval-to-interval changes that rivaled the entire net increase over the Phanerozoic.

Thus, we now must ask what mechanisms could have led to saturation. They may have involved the way that energy is captured from lower trophic levels. This capture could have remained roughly constant (35). Alternatively, if total energy capture did increase (36), this change may have been offset by the diversification of groups with high metabolic rates (9), making it energetically difficult for a large net radiation to occur.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S21


References and Notes
