Tenenbaum et al. (1) presented an algorithm, Isomap, for computing a quasiisometric, lowdimensional embedding of a set of highdimensional data points. Two issues need to be raised concerning this work. First, the basic approach presented by Tenenbaum et al. is not new, having been described in the context of flattening cortical surfaces using geodesic distances and multidimensional scaling (2, 3) and in later work that used Dijkstra's algorithm to approximate geodesic distances (4, 5). These ideas generalize to arbitrary dimensionality if the connectivity and metric information of the manifold are correctly supplied.
Second, and more important, this approach is topologically unstable and can only be used after careful preprocessing of the data (6). In the application domain of cortical flattening, it is necessary to check manually for connectivity errors, so that points nearby in 3space (for example, on opposite banks of a cortical sulcus) are not taken to be nearby in the cortical surface. If such care is taken, this method represents the preferred method for quasiisometric cortical flattening.
What is new about the Isomap algorithm is how it defines the connectivity of each data point via its nearest Euclidean neighbors in the highdimensional space. This step is vulnerable to shortcircuit errors if the neighborhood is too large with respect to folds in the manifold on which the data points lie or if noise in the data moves the points slightly off the manifold. Even a single shortcircuit error can alter many entries in the geodesic distance matrix, which in turn can lead to a drastically different (and incorrect) lowdimensional embedding.
We illustrate this failure in Fig. 1, using the MATLAB code published by Tenenbaum et al. along with their “Swiss roll” data, to which we have added a small amount of noise (7). Clearly, the algorithm is topologically unstable: Small errors in the data connectivity (topology) can lead to large errors in the solution. Choosing a very small neighborhood is not a satisfactory solution, as this can fragment the manifold into a large number of disconnected regions. Choosing the neighborhood “just right” requires a priori information about the global geometry of the highdimensional data manifold (8), but, presumably, it is exactly in the absence of such information that one would need to use an algorithm to find “meaningful lowdimensional structures hidden in highdimensional observations” (1). In summary, the basic idea of Isomap has long been known, and the new component introduced by Tenenbaum et al. provides an unreliable estimate of surface connectivity, which can lead to failure of the algorithm to perform as claimed.
REFERENCES AND NOTES
Response: Balasubramanian and Schwartz claim that “the basic idea” of our Isomap technique for nonlinear dimensionality reduction (1) has “long been known” in the context of flattening cortical surfaces. However, the problem of cortical flattening differs in crucial ways from our problem of finding lowdimensional structure in a cloud of highdimensional data points. Stateoftheart procedures for cortical flattening (2,3) take as input a triangulated mesh of fixed topology and dimensionality, which represents additional structure not available in the general problem we attempted to solve. We take as input only a collection of unorganized data points; the topology and dimensionality of the underlying manifold are unknown and need to be estimated in the process of constructing a faithful lowdimensional embedding. Our algorithm provides a simple method for estimating the intrinsic geometry of a data manifold based on a rough estimate of each data point's neighbors on the manifold. The simplicity of our algorithm, in contrast to the specialized methods used in cortical flattening (2–4), makes it highly efficient and generally applicable to a broad range of data sources and dimensionalities, and makes possible our theoretical analyses of asymptotic convergence (1).
Balasubramanian and Schwartz also assert that Isomap is “topologically unstable,” based on their failure to construct a topologypreserving 2D embedding of a Swiss roll data set that has been corrupted by additive Gaussian noise. As we discussed in our original report (1), the success of Isomap depends on being able to choose a neighborhood size (ɛ or K) that is neither so large that it introduces “shortcircuit” edges into the neighborhood graph, nor so small that the graph becomes too sparse to approximate geodesic paths accurately. Shortcircuit edges can lead to lowdimensional embeddings that do not preserve a manifold's true topology, as Balasubramanian and Schwartz illustrate with a choice of neighborhood size (ɛ = 5) that works well for our original noiseless data but that is too large given the level of noise (5) in their data. However, it is misleading to characterize our algorithm as “topologically unstable” without considering its stability over a range of neighborhood sizes. Indeed, this same level of noise poses no difficulty for constructing a topologypreserving embedding if the neighborhood size is chosen just slightly smaller, with ɛ between 4.3 and 4.6 (Fig. 1D).
The suggestion by Balasubramanian and Schwartz that there is no way to choose an appropriate neighborhood size without “a priori information about the global geometry” of the data is incorrect. Such a priori information is useful in bounding the worstcase performance of our algorithm as a function of the number of data points (1), but it is not necessary to use the algorithm in practice. Fig. 1 illustrates one practical approach to selecting an appropriate neighborhood size, based on a tradeoff between maximizing the number of points captured in the Euclidean embedding and minimizing the distortion of the geodesic distance estimates (6). This method successfully picks out appropriate neighborhood sizes for both the noiseless (Fig. 1E) and noisy (Fig. 1F) Swiss roll data sets. In both cases, the topologypreserving solution is stable, with a range of neighborhood sizes that achieve close to zero distortion of the geodesic distances while preserving the integrity of the underlying manifold.
For the Swiss roll with 1000 data points and Gaussian additive noise, the stability analysis illustrated in Fig. 1 can find neighborhood sizes that yield topologypreserving embeddings as long as the noise standard deviation is less than approximately 12% of the separation between branches of the manifold (7). Given a larger number of data points or a less curved manifold, the degree of noise tolerance could be substantially better. For instance, with 2000 data points on the Swiss roll, the noise standard deviation may be as high as 20% of the branch separation. For the face images from our original paper (1), topologypreserving embeddings are reliably found in the presence of Gaussian pixel noise with standard deviation as high 70% of each pixel's standard deviation in the noiseless data. More generally, shortcircuit edges pose a threat to any attempt to infer the global geometry of a nonlinear data set from a bottomup estimate of the data's topology; the locally linear embedding (LLE) algorithm (8) embodies a different approach to this same general goal and suffers from the same problem of shortcircuit edges in the presence of noisy or sparse data (9). Improving the robustness of these dimensionality reduction algorithms in the presence of high noise levels—where shortcircuit edges cannot be eliminated simply by shrinking the neighborhood size—is an important area for future research.

↵* Current address: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA