## Abstract

Shipley *et al*. (Reports, 3 November 2006, p. 812) developed a quantitative method for predicting the relative abundance of species from measured traits. We show that the method can have high explanatory power even when all trait and abundance data are randomly and independently generated, because of a mathematical dependence between the observations and predictions. We also suggest a potential solution to this problem.

Shipley *et al*. (*1*) developed a quantitative method, using tools borrowed from statistical mechanics, to predict plant relative abundances from field-measured species traits. When the authors applied the method to their field data (from 12 vineyards along a 42-year chronosequence of secondary succession in southern France), they found that up to 94% of the variation in the observed relative abundances could be accounted for by their model. The method is elegant and appealing (*2*, *3*), particularly when considered against the backdrop of a highly successful field application. However, we show that (i) the model can generate high levels of explanatory power even when all trait and abundance data are randomly and independently generated and (ii) this spurious explanatory power arises because the model predictions and the field observations that are being compared are not statistically independent; they are functionally related within the model equations.

This functional dependence can be seen through inspection of the equation that is solved to generate the predictions of the species relative abundances (*4*)

$$\sum_{i=1}^{S} p_i t_{ij} = \bar{t}_j$$

where

$$p_i = \frac{\exp\left(-\sum_{j} \lambda_j t_{ij}\right)}{\sum_{k=1}^{S} \exp\left(-\sum_{j} \lambda_j t_{kj}\right)}$$

and

$$\bar{t}_j = \sum_{i=1}^{S} o_i t_{ij}$$

is the observed community-aggregate value for trait *j* summed over *S* species. The symbol *o*_{i} is the observed relative abundance of species *i*, and *p*_{i} is the model-predicted relative abundance of the same species. The remaining symbols in the above equations are defined in (*1*). Because the observed relative abundances are used both to test the model-predicted abundances and in the calculation of those predictions, the two quantities cannot be considered independent, and a relationship between them would a priori be expected.

We explored the implications of this dependence by Monte Carlo analysis. Random communities were constructed with numbers of traits and species similar to those reported by Shipley *et al*. (*1*). Each random community comprised a matrix of *T* = 8 trait values with varying species richness (*S* = 10, 15, 20, 30, 50, or 100 species). Traits for each species were drawn independently and at random (*5*) based on the eight measured traits analyzed in (*1*), using previously published trait data for the intermediate-succession field ages (*6*). Random sets of “observed” relative abundances were drawn from the standard exponential distribution and normalized to sum to unity. An exponential distribution was used to simulate the typical pattern of dominance by a few abundant species. We refer to these abundances as “pseudo-observed” to prevent confusion with the actual field abundances. For each richness level, 5000 random communities were generated and provided as input to the algorithm (*7*).
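The simulation procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm of (*7*): traits are drawn from a standard normal distribution rather than from the published trait data, predictions are assumed to take the maximum-entropy form in which *p*_{i} is proportional to exp(−Σ_j λ_j t_ij), and the constraint equations are solved with a damped Newton iteration on the convex dual problem. The function names `maxent_fit` and `simulate_r2` are hypothetical.

```python
import numpy as np


def maxent_fit(traits, target, n_iter=200, tol=1e-10):
    """Return p with p_i proportional to exp(-traits[i] @ lam) and p @ traits ~= target.

    Minimizes the convex dual f(lam) = log(sum_i exp(-traits[i] @ lam)) + lam @ target
    by Newton steps with backtracking (an assumed solver, not the published one).
    """
    n_traits = traits.shape[1]

    def dual_and_p(lam):
        z = -traits @ lam
        z_max = z.max()                      # subtract the max to avoid overflow
        w = np.exp(z - z_max)
        total = w.sum()
        return z_max + np.log(total) + lam @ target, w / total

    lam = np.zeros(n_traits)
    f, p = dual_and_p(lam)
    for _ in range(n_iter):
        grad = target - p @ traits           # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        centered = traits - p @ traits
        hess = (centered * p[:, None]).T @ centered   # trait covariance under p
        hess += 1e-12 * np.eye(n_traits)              # tiny ridge for stability
        step = np.linalg.solve(hess, -grad)
        step_size = 1.0
        f_new, p_new = dual_and_p(lam + step)
        # Backtrack until the dual decreases sufficiently (Armijo condition).
        while f_new > f + 1e-4 * step_size * (grad @ step) and step_size > 1e-10:
            step_size *= 0.5
            f_new, p_new = dual_and_p(lam + step_size * step)
        lam, f, p = lam + step_size * step, f_new, p_new
    return p


def simulate_r2(n_species, n_traits=8, n_communities=100, seed=0):
    """r^2 between pseudo-observed and predicted abundances for random communities."""
    rng = np.random.default_rng(seed)
    r2 = np.empty(n_communities)
    for k in range(n_communities):
        # ASSUMPTION: standard normal traits stand in for the field trait data.
        traits = rng.standard_normal((n_species, n_traits))
        obs = rng.exponential(size=n_species)   # standard exponential draws
        obs /= obs.sum()                        # normalize to sum to unity
        pred = maxent_fit(traits, obs @ traits) # constraints built from obs itself
        r2[k] = np.corrcoef(obs, pred)[0, 1] ** 2
    return r2
```

Comparing `simulate_r2(10).mean()` with `simulate_r2(100).mean()` gives a quick check of how the explained variation changes with richness. Because the trait distribution here is a placeholder, the values need not match those reported for Fig. 1B.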

In Fig. 1A, the pseudo-observed and model-predicted relative abundances are plotted for the first 25 random communities of each simulation, together with the least-squares line of best fit. Figure 1B displays frequency histograms of the coefficient of determination (*r*^{2}) for the pseudo-observed versus model-predicted abundances for the 5000 random communities. With 8 traits and 10 species, on average 90% of the variation is explained, even though the underlying pseudo-observed abundances and trait data were independently and randomly generated. As richness increases, the mean variation explained by the model declines (but always remains greater than zero). This reflects a change in the degrees of freedom of the analysis, with correspondingly fewer constraints on the optimization as the number of species increases relative to the number of traits measured (*2*).
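The degrees-of-freedom argument can be made concrete by counting how many directions the predicted abundances remain free to move in once the constraints are imposed. The sketch below assumes that the constraints are the *T* community-aggregate trait equations plus the requirement that abundances sum to one, and that trait values are in general position (standard normal placeholders are used here). Under those assumptions, *S* species leave *S* − (*T* + 1) free dimensions. The function name `free_dimensions` is hypothetical.

```python
import numpy as np


def free_dimensions(n_species, n_traits=8, seed=0):
    """Dimension of the set of abundance vectors satisfying all linear constraints."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: standard normal traits stand in for the field trait data.
    traits = rng.standard_normal((n_species, n_traits))
    # One row forcing sum(p) = 1, plus one row per community-aggregate trait.
    constraints = np.vstack([np.ones(n_species), traits.T])
    return n_species - np.linalg.matrix_rank(constraints)


for s in (10, 15, 20, 30, 50, 100):
    print(s, free_dimensions(s))
```

With 8 traits and 10 species only one free dimension remains, so the predictions are pinned close to the pseudo-observed abundances that defined the constraints. With 100 species, 91 dimensions are left to the optimization.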

Although it is tempting to compare our simulation results directly with the analysis of field data by Shipley *et al*. (*1*), such comparisons may be misleading for two reasons. First, the aim of our analysis was to explore the overall sensitivity of the theoretical framework to the lack of independence, using *r*^{2} values calculated from simulated communities, each containing *S* species. However, the results reported by Shipley *et al*. (*1*) are calculated from the combined abundance predictions from 12 separate analyses, one for each of 12 successional communities. Second, Shipley *et al*. analyzed each of their 12 communities with a matrix size of *S* = 30 species (the global richness across all 12 communities), even though the observed richness for any individual community was usually less than 20, declining to just two species toward the end of the succession (*1*, *8*). Shipley *et al*.'s analysis therefore included matrices with a number of species with zero relative abundance, whereas all species in our simulations had positive abundances. A specific test of the validity of Shipley *et al*.'s field results would therefore require an analysis tailored to their field data set, that is, by aggregating predictions from multiple optimizations and by allowing species to have zero relative abundance.

Our analyses show that care must be taken when interpreting the results from Shipley *et al*.'s (*1*) method, as the variation explained by the model is confounded by at least two sources of variability. The first is the source of variation that is of primary interest, that is, the signature of nonrandom ecological patterns resulting from the process of community assembly. The second is variation due to the mathematical dependence of the observed and predicted abundances, as discussed above. To be of practical utility, the method requires modification to separate these two sources of variation. The randomization approach illustrated here, which quantifies the second source of variation through a null model–based approach, provides one option for separating these effects.
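One way such a null model comparison might be set up is sketched below. It assumes that an *r*^{2} value from field data and a set of *r*^{2} values from randomized communities are already in hand. The function name `null_model_summary`, the one-sided randomization *P* value with a +1 correction, and the "excess over the null mean" summary are illustrative choices rather than part of the published method, and the numbers in the example are placeholders, not results.

```python
import numpy as np


def null_model_summary(observed_r2, null_r2):
    """Compare an observed r^2 against r^2 values from randomized communities.

    Returns a one-sided randomization P value and the amount by which the
    observed r^2 exceeds the mean of the null distribution.
    """
    null_r2 = np.asarray(null_r2, dtype=float)
    n_at_least = np.sum(null_r2 >= observed_r2)
    # The +1 terms count the observed value as one member of the null set.
    p_value = (n_at_least + 1) / (null_r2.size + 1)
    excess = observed_r2 - null_r2.mean()
    return p_value, excess


# Placeholder values for illustration only.
p_value, excess = null_model_summary(0.7, [0.2, 0.4, 0.6, 0.8])
print(p_value, excess)
```

Only the portion of *r*^{2} that exceeds what the null communities produce would then be attributed to nonrandom community assembly.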