## Abstract

If the fraction of species in area *A* that are also found in one-half of that area is independent of *A*, the distribution of species is self-similar and a number of observed patterns in ecology, including the widely cited species-area relationship connecting species richness to censused area, follow. Self-similarity also leads to a species-abundance distribution, which deviates considerably from the commonly assumed lognormal distribution and predicts considerably more rare species than the latter. Because the abundance distribution is derived under the condition of self-similarity, it may be widely applicable beyond ecology.

Patterns in the distribution and abundance of species within a biome are central concerns in ecology, providing important information about total species richness, the likelihood of species extinction under habitat loss, the design of reserves, and the processes that allow species to coexist and partition resources (1). A number of mathematical functions have been suggested as useful for characterizing observed patterns. Perhaps the most widely cited, though by no means the only plausible ones, are the power law form of the species-area relationship (SAR) (2–4) and the lognormal species-abundance distribution (4–6). The former states that the number of species found in a census patch of area $A$ varies as a constant power of $A$: $S = cA^{z}$; the latter states that the fraction of species with $n$ individuals is a gaussian function of $\log(n)$.

Although available data sets suggest that the lognormal abundance distribution may underestimate the number of rare species in an ecosystem or biome (1, 4, 7–10), the use of existing data sets to distinguish among candidate functions describing patterns, and therefore among the underlying theories that generate these functions, is in general quite limited by incomplete censusing and other sources of bias (1, 9, 11). An overarching theoretical framework that unifies our understanding of patterns of species abundance and distribution in ecology is therefore desirable, for three reasons: these empirical limitations; the fact that an effort (4, 5) to demonstrate a theoretical connection between the lognormal abundance distribution and the species-area relationship has been questioned on theoretical grounds (12); and the prospect that establishing mathematical linkages and incompatibilities among patterns may help us understand the mechanisms generating observed patterns.

Consider area $A_0$ where there are $S_0$ species. The number of individuals in each species is described by probability distribution $P_0(n)$, where $P_0(n)S_0$ is the expected number of species with $n$ individuals. For convenience we take $A_0$ to be a rectangle with a length-to-width ratio of […] $A_i$ the area of each of the rectangles that are formed at the $i$th bisection, so that $A_i = A_0/2^{i}$, and we denote by $S_i$ the number of species found on average in an $A_i$ rectangle (Fig. 1).

We define self-similarity in conformity with the fractal literature (13): a pattern is self-similar if it does not vary with spatial scale. We impose self-similarity in the distribution of species by assuming that if a species is known to be in an $A_i$ rectangle, and nothing else about that species (such as its abundance) is known, then the probability that under bisection it will be found in at least a specific one of the two resulting $A_{i+1}$ rectangles is a constant, $a$, that is independent of $i$. This implies that the fraction of those species found in $A_i$ that are also found in a specific one of the two $A_{i+1}$ rectangles is the same constant $a$. The resulting spatial distribution of species is self-similar in the sense that the likelihood of occurrence in a half-patch under bisection is independent of spatial scale (14).

If a species is known to exist in patch $A_i$, there are three mutually exclusive possibilities for its presence or absence in the two $A_{i+1}$ patches that comprise $A_i$: it is found only in the left half, it is found only in the right half, or it is found in both halves. From the above definition of $a$, the probability associated with each of these options is easily worked out:

$$P(\text{left half only}) = 1 - a \tag{1}$$

$$P(\text{right half only}) = 1 - a \tag{2}$$

$$P(\text{both halves}) = 2a - 1 \tag{3}$$

Note that the probabilities of these three options sum to 1, as they must. Because the probability that a species in $A_i$ is found in a specific half of $A_i$ must be at least 0.5, it follows that $0.5 \le a \le 1$. The extreme values of $a$ correspond to the case in which one species is found everywhere ($a = 1$) and the case in which every individual belongs to a unique species ($a = 0.5$).
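The three probabilities follow from two facts: a species present in the parent patch occupies a specified half with probability $a$, and the three outcomes are exhaustive. A minimal Python sketch of that bookkeeping is given here; the function name and the example value $a = 0.8$ are illustrative assumptions, not part of the original analysis.

```python
def bisection_probabilities(a):
    """Occupancy probabilities for the two halves of a bisected patch.

    `a` is the probability that a species present in the parent patch
    is present in one specified half (hypothetical helper, for illustration).
    """
    if not 0.5 <= a <= 1.0:
        raise ValueError("a must lie in [0.5, 1]")
    # P(in left) = P(left only) + P(both) = a, and the same for the right half.
    # The three outcomes sum to 1, which fixes all three values.
    both = 2.0 * a - 1.0
    left_only = 1.0 - a
    right_only = 1.0 - a
    return {"left_only": left_only, "right_only": right_only, "both": both}


probs = bisection_probabilities(0.8)  # 0.8 is an arbitrary example value
print(probs)
```

At $a = 0.5$ the "both" outcome has probability zero, and at $a = 1$ it is certain, matching the two extreme cases described above.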

By application of our probability rule under repeated bisections, it follows that the average number of species found in any particular $A_i$ rectangle is

$$S_i = a^{i} S_0 \tag{4}$$

From Eq. 4 it follows that $S_i/S_j = a^{i-j}$. Now define $z$ by letting

$$a = 2^{-z} \tag{5}$$

Then $S_i/S_j = 2^{-iz}/2^{-jz}$. However, $A_i/A_j = 2^{-i}/2^{-j}$, so we can write $S_i/S_j = (A_i/A_j)^{z}$. This is equivalent to $S_i = cA_i^{z}$, which is just the power law form of the SAR. Thus, our self-similarity condition leads to the power law form of the SAR. Elsewhere (15) we have shown that the power law form of the SAR implies Eq. 4 and thus self-similarity. Note from Eq. 5 that $0.5 \le a \le 1$ implies $1 \ge z \ge 0$.
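The equivalence between repeated bisection and the power law can be checked numerically. The sketch below assumes the relations $S_i = a^{i}S_0$ and $a = 2^{-z}$ used in this paragraph; the values of `S0` and `A0` are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def sar_exponent(a):
    """Exponent z defined by a = 2**(-z)."""
    return -math.log2(a)


def species_in_patch(S0, a, i):
    """Mean species count in a patch after i bisections: a**i * S0."""
    return S0 * a ** i


# Illustrative values only (not taken from any data set).
S0, A0 = 100.0, 1024.0
a = 2.0 ** (-0.25)

z = sar_exponent(a)
c = S0 / A0 ** z  # constant fixed by requiring S = c * A**z at the largest scale

rows = []
for i in range(6):
    A_i = A0 / 2 ** i
    from_bisection = species_in_patch(S0, a, i)
    from_power_law = c * A_i ** z
    rows.append((i, from_bisection, from_power_law))
    print(i, round(from_bisection, 4), round(from_power_law, 4))
```

The two printed columns should agree to rounding error at every scale, which is the content of the statement that self-similarity yields the power law SAR.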

Consider, next, the consequence of Eqs. 1 and 2 above, which can be reexpressed as

$$P(\text{found in only a specified one of the two } A_{i+1} \mid \text{found in } A_i) = 1 - a \tag{6}$$

Using Eq. 6 combined with the same reasoning that led to Eq. 4, the average number, $E_i$, of species found only in a specified $A_i$ rectangle is given by

$$E_i = (1 - a)^{i} S_0 \tag{7}$$

Defining $z' = -\log_2(1 - a)$, Eq. 7 is equivalent to $E(A_i)/E(A_j) = (A_i/A_j)^{z'}$ or $E(A) = c'A^{z'}$. This is just the “endemics-area relationship” previously derived by us from the SAR (15). We note that $0.5 \le a \le 1$ implies $z' \ge 1$ and that, for the commonly observed value $z = 0.25$, we have $z' = 2.65$.
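The quoted value of $z'$ can be reproduced directly from $a = 2^{-z}$ and $z' = -\log_2(1 - a)$. In the sketch below, the function names are hypothetical, and `endemic_species` assumes the form $E_i = (1 - a)^{i}S_0$, which is the form implied by $E(A_i)/E(A_j) = (A_i/A_j)^{z'}$.

```python
import math


def endemics_exponent(z):
    """Return z' = -log2(1 - a) with a = 2**(-z), for 0 < z <= 1."""
    if not 0.0 < z <= 1.0:
        raise ValueError("z must satisfy 0 < z <= 1")
    a = 2.0 ** (-z)
    return -math.log2(1.0 - a)


def endemic_species(S0, z, i):
    """Mean number of species confined to one specified A_i patch.

    Assumes E_i = (1 - a)**i * S0 (see the lead-in above).
    """
    a = 2.0 ** (-z)
    return S0 * (1.0 - a) ** i


z_prime = endemics_exponent(0.25)
print(round(z_prime, 2))  # 2.65
```

At the other end of the allowed range, $z = 1$ gives $a = 0.5$ and $z' = 1$, the lower bound stated in the text.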

To derive the distribution $P_0(n)$ of abundances of individuals within species, we introduce the notion of a smallest patch size or unit rectangle within $A_0$. This area, $A_m$, contains on average one individual, so that $A_m = A_0/2^{m}$, where the mean total number of individuals in $A_0$ is $N_0 = 2^{m}$. Because the unit rectangle contains one species as well as one individual, $a^{m}S_0 = S_m = 1$, or $S_0 = a^{-m}$. Moreover, using Eq. 5, $S_0 = N_0^{z}$.
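The last step can be written out explicitly. It uses only the relation $a = 2^{-z}$ and the definition $N_0 = 2^{m}$:

$$S_0 = a^{-m} = \left(2^{-z}\right)^{-m} = \left(2^{m}\right)^{z} = N_0^{z}$$

As an illustration with values chosen here purely as an example, $m = 20$ and $z = 0.25$ give $N_0 = 2^{20} \approx 1.05 \times 10^{6}$ individuals and $S_0 = 2^{5} = 32$ species.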

We generalize our definition of $P$ so that $P_i(n)$ is the probability that if a species is found in a patch of area $A_i$, then it contains $n$ individuals. Our interest ultimately is in $P_0(n)$, the fraction of species in the entire surface that have $n$ individuals, but to obtain that distribution we derive it recursively from the $P_i(n)$ for $0 < i \le m$. Note that $P_m(1) = 1$ (there will be on average one individual of whatever species is present in a unit rectangle) and, for each $i$, $P_i(n) = 0$ for $n > 2^{m-i}$ (on average, one cannot fit more individuals into an area than there are unit patches in that area) and $\sum_n P_i(n) = 1$ (the sum of probabilities of all possible occurrences is 1).

Using Eqs. 1 to 3, and letting $2(1 - a) = x$, where $0 \le x \le 1$, it can readily be shown (Fig. 1) that the $P_i(n)$ satisfy the following double recursion relation (16):

$$P_i(n) = x\,P_{i+1}(n) + (1 - x)\sum_{k=1}^{n-1} P_{i+1}(n-k)\,P_{i+1}(k) \tag{8}$$

Analytical solutions to Eq. 8 can be derived for the first few values of $n$ (17), but we have not been able to derive the general analytical solution for all $i$, $n$, and $x$. Nevertheless, numerical solutions for $P_0(n)$ are revealing. With $P$ plotted against $\log(n)$, these species-abundance distributions are seen to deviate considerably from lognormal, being skewed more toward rarity (more species with low abundance) (Fig. 2).
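A numerical solution of the recursion can be sketched in a few lines of Python. The sketch assumes the recursion has the form $P_i(n) = x\,P_{i+1}(n) + (1 - x)\sum_{k=1}^{n-1} P_{i+1}(n-k)\,P_{i+1}(k)$: with probability $x$ the species occupies only one half, and with probability $1 - x$ it occupies both halves, whose abundances are treated as independent. The function names and the choices $m = 8$, $x = 0.376$ are illustrative assumptions, and this is not the authors' own code.

```python
def convolve(p, q):
    """Discrete convolution of two lists: out[n] = sum_k p[k] * q[n - k]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for j, pj in enumerate(p):
        if pj == 0.0:
            continue
        for k, qk in enumerate(q):
            out[j + k] += pj * qk
    return out


def abundance_distribution(m, x):
    """Return a list P with P[n] = P_0(n) for n = 0 .. 2**m.

    Starts from the unit rectangle, where P_m(1) = 1, and applies the
    assumed recursion m times, moving up one bisection level per step.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    P = [0.0, 1.0]  # P_m: index 0 is n = 0, index 1 is n = 1
    for _ in range(m):
        both = convolve(P, P)  # species present in both halves
        P = [
            x * (P[n] if n < len(P) else 0.0) + (1.0 - x) * both[n]
            for n in range(len(both))
        ]
    return P


P0 = abundance_distribution(8, 0.376)  # illustrative m and x
mean_abundance = sum(n * p for n, p in enumerate(P0))
print(sum(P0), mean_abundance)
```

Because `P[0]` is always zero, the convolution term only counts splits in which each half holds at least one individual, matching the limits $k = 1$ to $n - 1$ in the sum. Two sanity checks follow from the recursion itself: the probabilities sum to 1 at every level, and the mean abundance grows by a factor $(2 - x)$ per level, so it equals $(2 - x)^{m}$ in $A_0$.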

Plotted on a linear abundance scale, the distributions are more skewed toward commonness than the gaussian but less so than the lognormal. Because the lognormal distribution results from a product of random variables and the normal from a sum, it is not surprising that the distribution resulting from the sum of products in Eq. 8 exhibits intermediate features. Plotted on log-log scales, the $P_0(n)$ are seen to be of the form $P_0(n) \sim n^{c(x)}$ (18) for $n$ values sufficiently below the modal abundance, with $c \sim 3/2$, $1$, and $3/4$ for $x = 0.26$, $0.376$, and $0.484$; the exponents $c(x)$ are independent of $m$, as expected from self-similarity (Fig. 3). The parameter $x$ in the species-abundance distribution can be related to the SAR parameter $z$; using the relations $x = 2(1 - a)$ and $a = 2^{-z}$, we get $z = -\log_2[1 - (x/2)]$. Corresponding to the values $x = 0.260$, $0.376$, and $0.484$ in Fig. 2 are the values for the SAR power $z = 0.2$, $0.3$, and $0.4$.
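The correspondence between $x$ and $z$ is easy to verify. The following sketch implements $x = 2(1 - 2^{-z})$ and its inverse $z = -\log_2[1 - (x/2)]$; the function names are hypothetical.

```python
import math


def x_from_z(z):
    """x = 2 * (1 - a) with a = 2**(-z)."""
    return 2.0 * (1.0 - 2.0 ** (-z))


def z_from_x(x):
    """Inverse relation: z = -log2(1 - x / 2)."""
    return -math.log2(1.0 - x / 2.0)


# Values of x quoted in the text; each should map to z of about 0.2, 0.3, 0.4.
z_values = [z_from_x(x) for x in (0.26, 0.376, 0.484)]
for z in z_values:
    print(round(z, 1))
```

Going in the other direction, `x_from_z(0.2)`, `x_from_z(0.3)`, and `x_from_z(0.4)` reproduce the quoted $x$ values to within about 0.001, the level of rounding in the text.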

There is considerable observational support for our self-similarity condition and the abundance distribution it predicts. First, numerous census data sets are compatible with the power-law form of the SAR, as reviewed by Rosenzweig (3). Second, the few tests carried out on the endemics-area relationship (Eq. 7) show good agreement (15, 19), although considerably more testing is needed. Third, there is considerable evidence (7, 19, 20) that the fraction of species in common to two spatially separated censused patches is a decreasing function of interpatch distance ($\propto d^{-2z}$), in conformity with self-similarity (15). Fourth, measurements of the dependence of species richness on the shape as well as area of censused patches agree with predictions (19, 21). Fifth, Kunin (22) presented empirical evidence that the amount of habitat occupied by a given plant species exhibits an approximate scale independence when viewed at different scales of resolution through “censusing windows” of various sizes. Our theory not only predicts this result but also provides an explicit relation between the box-counting fractal dimension implied by Kunin's finding and the abundance of the given species (23). Sixth, available abundance data, while often qualitative at best because of sampling problems (9, 11), generally resemble our predicted distribution more than they do the lognormal, exhibiting considerably more rarity than is predicted by the latter distribution (1, 4, 8–10). Important exceptions to this occur, however, indicating that self-similarity and the SAR do not always describe species abundance and distributions (24).

Two caveats are in order. First, it is extremely unlikely that a strictly constant value of $z$ in the SAR holds across an entire accessible scale range (3). If, however, $z$ is a nonconstant function of spatial scale, so that $z = z(i)$, then that dependence can be inserted into Eq. 8 and an abundance distribution can still be derived. The nature of the breakdown of strict self-similarity in small patches, for example strong attraction ($x \approx 0$) or repulsion ($x \approx 1$) between nearby individuals of the same species, will then influence the shape of the abundance distribution in larger patches in a testable manner. Nevertheless, an abundance distribution skewed toward rarity relative to the lognormal still results as long as the curvature in $z(i)$ is not extreme. Second, ecosystems are heterogeneous with respect to habitat quality, and thus quantities like $S(A_i)$ and $P_i(n)$ depend on which patch of area $A_i$ is censused. Moreover, the minimum area per individual ($A_m$) will differ among species and among individuals in a species and thus can be defined only statistically (especially for motile organisms). Thus all statements we have made about the number of species, or the number of individuals within a particular species, in a patch of area $A$ refer to the average over all the nonoverlapping patches of area $A$ that comprise the system.
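The first caveat can be explored numerically by letting $x$ vary with bisection level. The sketch below is one possible reading of "inserting $z(i)$ into Eq. 8", under two assumptions: the recursion has the form $P_i(n) = x_i\,P_{i+1}(n) + (1 - x_i)\sum_{k=1}^{n-1} P_{i+1}(n-k)\,P_{i+1}(k)$, and $x_i = 2(1 - 2^{-z(i)})$ applies to the bisection of an $A_i$ patch. The linear $z(i)$ profile and all names are hypothetical.

```python
def convolve(p, q):
    """Discrete convolution of two lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for j, pj in enumerate(p):
        for k, qk in enumerate(q):
            out[j + k] += pj * qk
    return out


def abundance_distribution_variable(x_by_level):
    """P_0(n) when x depends on scale.

    x_by_level[i] is the value of x used when an A_i patch is bisected,
    for i = 0 .. m - 1 (an assumed indexing convention).
    """
    P = [0.0, 1.0]  # unit rectangle: one individual
    for x in reversed(x_by_level):  # work upward from level m - 1 to level 0
        both = convolve(P, P)
        P = [
            x * (P[n] if n < len(P) else 0.0) + (1.0 - x) * both[n]
            for n in range(len(both))
        ]
    return P


def x_from_z(z):
    return 2.0 * (1.0 - 2.0 ** (-z))


m = 8
# Hypothetical profile: z rises linearly from 0.2 (largest scale) to 0.3 (smallest).
z_levels = [0.2 + 0.1 * i / (m - 1) for i in range(m)]
x_levels = [x_from_z(z) for z in z_levels]
P0_variable = abundance_distribution_variable(x_levels)
print(sum(P0_variable))
```

With a constant list of `x` values this reduces to the scale-independent case, so the two can be compared directly to see how a given $z(i)$ profile reshapes the distribution.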

We have demonstrated that self-similarity theory provides an overarching framework within which empirically supported patterns in ecology are unified, new and plausible results are derived, and the connection between the SAR and the lognormal abundance distribution is questioned. Because our recursion relation for the species-abundance distribution is derived under the assumption of self-similarity, it may be more widely applicable to other spatial arrays of types of objects or to the distribution of energy fluctuations in turbulent media (25).