Technical Comments

The Isomap Algorithm and Topological Stability

Science  04 Jan 2002:
Vol. 295, Issue 5552, pp. 7
DOI: 10.1126/science.295.5552.7a

Tenenbaum et al. (1) presented an algorithm, Isomap, for computing a quasi-isometric, low-dimensional embedding of a set of high-dimensional data points. Two issues need to be raised concerning this work. First, the basic approach presented by Tenenbaum et al. is not new, having been described in the context of flattening cortical surfaces using geodesic distances and multidimensional scaling (2, 3) and in later work that used Dijkstra's algorithm to approximate geodesic distances (4, 5). These ideas generalize to arbitrary dimensionality if the connectivity and metric information of the manifold are correctly supplied.

Second, and more important, this approach is topologically unstable and can only be used after careful preprocessing of the data (6). In the application domain of cortical flattening, it is necessary to check manually for connectivity errors, so that points nearby in 3-space (for example, on opposite banks of a cortical sulcus) are not taken to be nearby in the cortical surface. If such care is taken, this method represents the preferred method for quasi-isometric cortical flattening.

What is new about the Isomap algorithm is how it defines the connectivity of each data point via its nearest Euclidean neighbors in the high-dimensional space. This step is vulnerable to short-circuit errors if the neighborhood is too large with respect to folds in the manifold on which the data points lie or if noise in the data moves the points slightly off the manifold. Even a single short-circuit error can alter many entries in the geodesic distance matrix, which in turn can lead to a drastically different (and incorrect) low-dimensional embedding.

We illustrate this failure in Fig. 1, using the MATLAB code published by Tenenbaum et al. along with their “Swiss roll” data, to which we have added a small amount of noise (7). Clearly, the algorithm is topologically unstable: Small errors in the data connectivity (topology) can lead to large errors in the solution. Choosing a very small neighborhood is not a satisfactory solution, as this can fragment the manifold into a large number of disconnected regions. Choosing the neighborhood “just right” requires a priori information about the global geometry of the high-dimensional data manifold (8), but, presumably, it is exactly in the absence of such information that one would need to use an algorithm to find “meaningful low-dimensional structures hidden in high-dimensional observations” (1). In summary, the basic idea of Isomap has long been known, and the new component introduced by Tenenbaum et al. provides an unreliable estimate of surface connectivity, which can lead to failure of the algorithm to perform as claimed.

Figure 1

(A) The “Swiss roll” data used by Tenenbaum et al. (1) to illustrate their algorithm (n = 1000). (B) The two-dimensional (2D) representation computed by theɛ−Isomap variant of the Isomap algorithm, withɛ = 5. Nearby points in the 2D embedding are also nearby points in the 3D manifold, as desired. (C) Data shown in A, with zero-mean normally distributed noise added to the coordinates of each point, where the standard deviation of the noise was chosen to be 2% of smallest dimension of the bounding box enclosing the data. (D) The Isomap (ɛ = 5) solution for the noisy data. There are gross “folds” in the embedding, and neither the metric nor the topological structure of the solution in (B) is preserved.

Figure 1

(A) Our Swiss roll data set (1), shaded according to one intrinsic dimension of the underlying manifold. (B) The Swiss roll data plus Gaussian noise, with zero mean and standard devation equal to approximately 7.5% of the separation between branches of the manifold (comparable to the noise level used by Balasubramanian and Schwartz). (C) An Isomap embedding of the noiseless data in (A) with neighborhood size ɛ = 5 preserves both the intrinsic geometry and the topological order of the underlying manifold. (D) An Isomap embedding of the noisy data in (B) with an appropriately chosen neighborhood size (ɛ = 4.6) also preserves both the intrinsic geometry and the topological order of the underlying manifold, just as in (C). (E) An appropriate neighborhood size for embedding the noiseless data into 2D Euclidean space can be determined based on a trade-off between two cost functions: the fraction of the variance in geodesic distance estimates not accounted for in the Euclidean embedding [(6); black triangles], and the fraction of points not included in the largest connected component of the neighborhood graph, and thus not included in the Euclidean embedding (gray filled circles, with vertical scale three times that shown). Setting the neighborhood size (x axis) too large (above ɛ = 6.2) masks the intrinsic low-dimensional structure of the data set, leading to geodesic distance estimates that are poorly represented by a 2D Euclidean embedding and thus a much higher residual variance. Setting the neighborhood size too small (below ɛ = 3.5) leaves out many points from the largest connected component of the neighborhood graph, leading to the deletion of those points from the Isomap solution and increased residual variance for the remaining points. A stable range of neighborhood sizes between these two extremes leads to embeddings of the entire data set with near-zero residual variance. (F) The same analysis can be used to select a reasonable neighborhood size for the noisy data. A neighborhood size that works well for the noiseless data (ɛ = 5) is too large here, yielding a residual variance of 0.25. However, a stable range of neighborhood sizes just slightly smaller (ɛbetween 3.5 and 4.6) yields reasonable results, with residual variances all less than or equal to 0.01.

REFERENCES AND NOTES

Response: Balasubramanian and Schwartz claim that “the basic idea” of our Isomap technique for nonlinear dimensionality reduction (1) has “long been known” in the context of flattening cortical surfaces. However, the problem of cortical flattening differs in crucial ways from our problem of finding low-dimensional structure in a cloud of high-dimensional data points. State-of-the-art procedures for cortical flattening (2,3) take as input a triangulated mesh of fixed topology and dimensionality, which represents additional structure not available in the general problem we attempted to solve. We take as input only a collection of unorganized data points; the topology and dimensionality of the underlying manifold are unknown and need to be estimated in the process of constructing a faithful low-dimensional embedding. Our algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point's neighbors on the manifold. The simplicity of our algorithm, in contrast to the specialized methods used in cortical flattening (2–4), makes it highly efficient and generally applicable to a broad range of data sources and dimensionalities, and makes possible our theoretical analyses of asymptotic convergence (1).

Balasubramanian and Schwartz also assert that Isomap is “topologically unstable,” based on their failure to construct a topology-preserving 2D embedding of a Swiss roll data set that has been corrupted by additive Gaussian noise. As we discussed in our original report (1), the success of Isomap depends on being able to choose a neighborhood size (ɛ or K) that is neither so large that it introduces “short-circuit” edges into the neighborhood graph, nor so small that the graph becomes too sparse to approximate geodesic paths accurately. Short-circuit edges can lead to low-dimensional embeddings that do not preserve a manifold's true topology, as Balasubramanian and Schwartz illustrate with a choice of neighborhood size (ɛ = 5) that works well for our original noiseless data but that is too large given the level of noise (5) in their data. However, it is misleading to characterize our algorithm as “topologically unstable” without considering its stability over a range of neighborhood sizes. Indeed, this same level of noise poses no difficulty for constructing a topology-preserving embedding if the neighborhood size is chosen just slightly smaller, with ɛ between 4.3 and 4.6 (Fig. 1D).

The suggestion by Balasubramanian and Schwartz that there is no way to choose an appropriate neighborhood size without “a priori information about the global geometry” of the data is incorrect. Such a priori information is useful in bounding the worst-case performance of our algorithm as a function of the number of data points (1), but it is not necessary to use the algorithm in practice. Fig. 1 illustrates one practical approach to selecting an appropriate neighborhood size, based on a trade-off between maximizing the number of points captured in the Euclidean embedding and minimizing the distortion of the geodesic distance estimates (6). This method successfully picks out appropriate neighborhood sizes for both the noiseless (Fig. 1E) and noisy (Fig. 1F) Swiss roll data sets. In both cases, the topology-preserving solution is stable, with a range of neighborhood sizes that achieve close to zero distortion of the geodesic distances while preserving the integrity of the underlying manifold.

For the Swiss roll with 1000 data points and Gaussian additive noise, the stability analysis illustrated in Fig. 1 can find neighborhood sizes that yield topology-preserving embeddings as long as the noise standard deviation is less than approximately 12% of the separation between branches of the manifold (7). Given a larger number of data points or a less curved manifold, the degree of noise tolerance could be substantially better. For instance, with 2000 data points on the Swiss roll, the noise standard deviation may be as high as 20% of the branch separation. For the face images from our original paper (1), topology-preserving embeddings are reliably found in the presence of Gaussian pixel noise with standard deviation as high 70% of each pixel's standard deviation in the noiseless data. More generally, short-circuit edges pose a threat to any attempt to infer the global geometry of a nonlinear data set from a bottom-up estimate of the data's topology; the locally linear embedding (LLE) algorithm (8) embodies a different approach to this same general goal and suffers from the same problem of short-circuit edges in the presence of noisy or sparse data (9). Improving the robustness of these dimensionality reduction algorithms in the presence of high noise levels—where short-circuit edges cannot be eliminated simply by shrinking the neighborhood size—is an important area for future research.

  • * Current address: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

REFERENCES AND NOTES

Related Content

Subjects

Navigate This Article