## Abstract

We maintain that there is no circularity or structural bias in our model that inflates its predictive ability beyond the sampling bias inherent in the *r*^{2} statistic. When observed average traits are replaced by environmental variables, the model's generality and predictive ability depend on one's empirical ability to predict these average values. Finally, maximizing rather than minimizing entropy given constraints is justified by axioms of information theory and is not an ecological assumption.

We developed a quantitative method based on plant traits to predict the relative abundance of plant species in a given ecological community (*1*). Roxburgh and Mokany (*2*) argue that the high explanatory power of our model is due to a mathematical dependence between the model predictions and field observations, whereas Marks and Muller-Landau (*3*) suggest that the model's entropy maximization assumption is relatively unimportant and that its explanatory power is spurious “because the predictions are being made for the same sites from which the trait averages were calculated.”

In response to Roxburgh and Mokany (*2*), there is a bias in our reported *r*^{2} values, but it is a sampling bias that is a general property of the *r*^{2} statistic applied to any model (*4*, *5*), not one specific to ours. Roxburgh and Mokany incorrectly state that spurious explanatory power in our model arises because model predictions and field observations “are functionally related within the model equations.” Their own simulation results show that this is not true, because the average *r*^{2} value and its variation decreased as the number of species (*S*) in the simulation increased, even though the model equations remained the same. This occurs because even independent random variables will have an *r* value different from zero due simply to sampling effects, and squaring this value will always result in an *r*^{2} greater than zero in any finite sample. The smaller the residual degrees of freedom relative to the number of variables in a model, the larger the spread of *r* values around zero, and therefore the farther the average *r*^{2} will be from zero (*4*, *5*). As an example, we repeated the simulation exercise of Roxburgh and Mokany based on a multiple regression model rather than our Maxent model. We generated 5000 simulated data sets for each of six sample sizes [10, 15, 20, 30, 50, and 100, analogous to the species pool sizes specified in (*2*)] in which a standard normal random value (*y*) was regressed on a series of eight mutually independent standard normal values (analogous to our eight traits): *y* ∼ *x*_{1} + *x*_{2} + ... + *x*_{8}. The result is shown in Fig. 1. The sample *r*^{2} statistic is a biased estimate of the population value (ρ^{2}), even when ρ^{2} = 0 (*4*, *5*). Its expected value, given ρ^{2} = 0 and specified degrees of freedom (*v*), is (*5*) (1)

Table 1 summarizes the simulation values of Roxburgh and Mokany (*2*), of our simulation, and the theoretical values from Eq. 1. The similarity of the results shows that the bias is due to sampling variation, not to anything in particular about our model. It is because of this bias that statisticians recommend (e.g., *6*) that the bias-adjusted *r*^{2} value be given when reporting *r*^{2} values from a multiple regression (2)

Number of species (*S*) | Theoretical (Eq. 1) | Observed [from (*2*)] | Observed (this study) |
---|---|---|---|
10 | 0.90 | 0.90 | 0.89 |
15 | 0.60 | 0.66 | 0.57 |
20 | 0.45 | 0.50 | 0.42 |
30 | 0.30 | 0.33 | 0.28 |
50 | 0.18 | 0.18 | 0.16 |
100 | 0.09 | 0.07 | 0.08 |
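
The regression simulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `mean_null_r2`, the random seed, and the reduced number of replicates (2000 rather than 5000) are assumptions made here. For ordinary least squares with an intercept and *p* pure-noise predictors fitted to *n* observations, the null expectation of *r*^{2} is *p*/(*n* − 1); the sketch prints it as a reference value, and whether it is the exact form of Eq. 1 is not confirmed by the text.

```python
import numpy as np


def mean_null_r2(n, p=8, n_sim=2000, seed=0):
    """Average r^2 when a noise response is regressed on p noise predictors.

    n     : sample size (analogous to the species pool size S)
    p     : number of independent standard normal predictors
    n_sim : number of simulated data sets (assumed value, fewer than 5000)
    """
    rng = np.random.default_rng(seed)
    r2 = np.empty(n_sim)
    for k in range(n_sim):
        x = rng.standard_normal((n, p))
        y = rng.standard_normal(n)
        design = np.column_stack([np.ones(n), x])  # intercept + predictors
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2[k] = 1.0 - ss_res / ss_tot
    return float(r2.mean())


if __name__ == "__main__":
    for n in (10, 15, 20, 30, 50, 100):
        simulated = mean_null_r2(n)
        reference = 8 / (n - 1)  # null expectation p / (n - 1) for OLS
        print(f"n={n:3d}  simulated mean r2={simulated:.2f}  p/(n-1)={reference:.2f}")
```

The average *r*^{2} falls steadily as the sample size grows even though the fitted model never changes, which is the pattern the argument relies on.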

In our study (*1*), the value of *S* was always 30 (the species pool) for every site because we always predicted the relative abundance of every species in the pool, not just the species abundances that were actually observed in a given site. Using observed community-aggregated traits, we found that *r*^{2} = 0.94; thus, *r*^{2}_{adj} = 0.92. None of the 5000 simulations in (*2*) had *r*^{2} values this high. Roxburgh and Mokany's simulation results therefore show that our model is significant at *P* < 1/5000. Their use of Monte Carlo simulations to test for significance is a good idea and could be routinely done using a permutation approach (*7*). Their simulations also show why our model cannot be used with few species in the species pool and why it is not intended to predict relative abundances of only those species that are known to exist at a site. *S* in our model is the number of species occurring in the entire species pool, not at a single site. If the functional traits of a plant affect its chances of dispersing, surviving, growing, and reproducing (thus, capturing resources), then a species that is present in the pool but absent from a site must also be predicted based on its functional traits. This is a key element of our model and sets a much higher predictive standard than simply predicting the relative abundance of those species already known to occur at the site.
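
As a check on the arithmetic, the conventional bias-adjusted *r*^{2} for a regression with *n* observations and *p* predictors can be written out. Taking *n* = *S* = 30 and *p* = 8 (the eight traits) is an assumption made here, and the notation may differ from that of Eq. 2:

$$
r^{2}_{\mathrm{adj}} = 1 - \left(1 - r^{2}\right)\frac{n - 1}{n - p - 1}
= 1 - (1 - 0.94)\,\frac{29}{21} \approx 0.92
$$

Under these assumed values the adjustment reproduces the reported *r*^{2}_{adj} of 0.92.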

Marks and Muller-Landau (*3*) claim that the high predictive ability of our model is largely a statistical effect. Our discussion above partly addresses this issue. Our model can account for a high proportion of the observed relative abundances, even correcting for sampling bias, when the actual community-aggregated traits are known; this is not a statistical artifact. However, the criticism of Marks and Muller-Landau extends beyond the point raised by Roxburgh and Mokany in two ways: (i) by distinguishing between the ability to predict relative abundances at a site given the observed community-aggregated traits and an ability to predict these relative abundances given only indirect indicators (successional age) of these community-aggregated traits and (ii) by distinguishing between the ability of the model to predict the observed relative abundances given site information and its ability to generalize to sites not included in the data set. We agree with these criticisms. This is why we explained in (*1*) that “[r]ealizing these potentials [of our model] will require... a demonstration of generality in the patterns of community-aggregated traits along such gradients.” The best way of doing this is to test the predictions on completely new sites. That said, the cross-validation procedure described in (*3*) is a good way of evaluating this generality using only our data. Marks and Muller-Landau reported that *r*^{2} = 0.32 for the cross-validated data. The *r*^{2} value in our study (0.94), with which they contrast their value, was not based on the observed relative abundances as in (*3*) but rather on the observed relative abundances that had been smoothed across the successional gradient using cubic-spline smoothers (*1*). This is a standard procedure in multivariate vegetation analysis (*8*) to describe the major trend in a species' distribution along an environmental gradient while removing fluctuations due to sampling. 
The cross-validated *r*^{2} would be much higher than 0.32 when using such smoothed values. Nonetheless, the cross-validated *r*^{2} value is instructive and useful. We calculated the cross-validation *r*^{2} values separately for each site using the method described in (*3*) and obtained the following values for the 12 sites, in order of decreasing predictive ability: 0.99, 0.94, 0.75, 0.65, 0.32, 0.25, 0.15, 0.02, 0.02, 0.00, 0.00, 0.00. The average of these 12 values (0.34) is close to the value reported in (*3*). Our model's ability to predict community structure in a site that is not included in the analysis, given only successional age but not the community-aggregated traits themselves, varies from almost perfect to none. This means that successional age is not an accurate general surrogate for the actual environmental gradients selecting for changes in our community-aggregated traits over sites. The ability to quantify such deviations between observed and predicted patterns is an important component of our model because it suggests where new traits and better quantification of environmental gradients are needed. The general usefulness of our method will depend on our ability to predict community-aggregated traits from environmental variables.
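
The summary of the per-site values can be reproduced directly from the twelve numbers listed above. The snippet below is only an arithmetic check; the variable names and the 0.02 cutoff used to count poorly predicted sites are choices made here for illustration.

```python
# Per-site cross-validated r^2 values, copied from the text (12 sites).
site_r2 = [0.99, 0.94, 0.75, 0.65, 0.32, 0.25,
           0.15, 0.02, 0.02, 0.00, 0.00, 0.00]

mean_r2 = sum(site_r2) / len(site_r2)

# Illustrative cutoff (an assumption): sites where prediction essentially fails.
n_poor = sum(1 for r in site_r2 if r <= 0.02)

print(f"mean r2 = {mean_r2:.2f}; {n_poor} of {len(site_r2)} sites have r2 <= 0.02")
```

The mean hides a strongly uneven distribution: a few sites are predicted almost perfectly while several are not predicted at all.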

Our model is a specific application of the maximum entropy formalism (*9*–*12*), which consists of assigning probabilities (relative abundances) to each mutually exclusive state of a system, given only partial information in the form of constraints representing macroscopic properties. This is done by choosing the probability distribution that both agrees with these constraints and, subject to these constraints, maximizes Shannon's information entropy. Marks and Muller-Landau contend that the assumption of maximum entropy is relatively unimportant because a distribution that minimizes the entropy does almost as well at predicting the observed relative abundances of the 30 species over the 12 sites in (*1*).
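
In generic notation, the constrained maximization and the form of its solution can be sketched as follows. The symbols are illustrative and are not taken from (*1*): *p*_{i} is the relative abundance of species *i*, *t*_{ij} is the value of trait *j* for species *i*, and $\bar{t}_j$ is the community-aggregated value of trait *j*.

$$
\max_{p}\; -\sum_{i=1}^{S} p_i \ln p_i
\quad \text{subject to} \quad
\sum_{i=1}^{S} p_i = 1, \qquad \sum_{i=1}^{S} p_i\, t_{ij} = \bar{t}_j \ \text{ for each trait } j
$$

$$
p_i = \frac{\exp\!\left(-\sum_j \lambda_j t_{ij}\right)}{\sum_{k=1}^{S} \exp\!\left(-\sum_j \lambda_j t_{kj}\right)}
$$

Here the $\lambda_j$ are Lagrange multipliers chosen so that the trait constraints are satisfied. With no trait constraints at all, the solution reduces to the uniform distribution $p_i = 1/S$.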

Assigning probabilities by maximizing Shannon's entropy is neither novel (*13*) nor arbitrary (*9*–*12*). The justification for maximizing entropy, given constraints, has nothing to do with biodiversity, niche differentiation, or ecological theory; perhaps the confusion implied in (*3*) arises because ecologists use Shannon's measure as an ad hoc quantification of biodiversity (*14*). Rather, the justification comes from the basic axioms of probability theory and Bayesian statistical inference (*9*). It is equivalent to assigning a maximally uninformative (i.e., uniform) discrete prior probability in a Bayesian analysis and is a generalization of Laplace's principle of indifference. It consists of choosing the probability distribution that is consistent with the available information (quantified in the constraints) but that does not imply any further constraints for which information is not available; that is, it is maximally uninformative as quantified by Shannon's entropy, which is the only consistent measure of the amount of uncertainty in a probability distribution (*15*). For example, imagine that we must assign probabilities to a die but that the only information (*I*_{0}) we have is that it has six sides. *I*_{0} implies only one constraint: ∑*p*_{i} = 1. The distribution that is consistent with *I*_{0}, but that is otherwise maximally noncommittal (i.e., has highest entropy), is the uniform distribution *P*_{1} = {*p*_{i} = 1/6 for all *i*}. We could, following Marks and Muller-Landau, choose the distribution *P*_{2} = (1,0,0,0,0,0), which is one of six different distributions that minimize entropy and which is also consistent with this one piece of information (*I*_{0}), but doing so implies that we have more information about this die than simply that it has six sides, because the entropy of *P*_{2} is lower than that of *P*_{1}. In fact, *P*_{2} has minimum (zero) entropy, which implies that we know exactly how the die will next fall.
If we do have all of the information (*I*_{1}) required to perfectly predict the behavior of the die, then maximizing the entropy conditional on both *I*_{0} and *I*_{1} would result in a distribution with minimum entropy.
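
The die example can be made concrete with a short sketch. The entropy comparison of *P*_{1} and *P*_{2} follows the text directly. The second function, `maxent_die`, goes one step beyond the text and is an illustrative assumption: it adds a single constraint on the mean face value (loosely analogous to one community-aggregated trait) and finds the maximum-entropy distribution by bisection on one Lagrange multiplier. All function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats, using the convention 0 * log(0) = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy distribution over `faces` with a fixed mean.

    The solution has the form p_i proportional to exp(lam * face_i);
    the mean increases monotonically with lam, so bisection finds it.
    """
    def dist(lam):
        weights = [math.exp(lam * f) for f in faces]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(lam):
        return sum(f * q for f, q in zip(faces, dist(lam)))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


p1 = [1 / 6] * 6            # uniform: only the normalization constraint
p2 = [1, 0, 0, 0, 0, 0]     # a minimum-entropy distribution

print("H(P1) =", shannon_entropy(p1), " ln 6 =", math.log(6))
print("H(P2) =", shannon_entropy(p2))

p_mean = maxent_die(4.5)    # extra information: the mean face value is 4.5
print("H(maxent with mean 4.5) =", shannon_entropy(p_mean))
```

Adding the mean constraint lowers the entropy below ln 6 but leaves it well above zero: the distribution commits only as far as the stated information requires, which is the point of the argument against choosing *P*_{2}.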

In our study (*1*), the available information for assigning relative abundances to the 30 species in a site was in the form of the eight community-aggregated traits. If, as Marks and Muller-Landau suggest, one chooses a distribution of relative abundances that minimizes the information entropy (and there is more than one), one is assuming information beyond that quantified by the community-aggregated traits. In effect, one is claiming, as a general principle, that plant communities will always consist of a single species (minimum entropy) unless prevented by some other force. Such information, if true, would revolutionize plant ecology.