## Abstract

Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.

How do we judge similarity? Our mental representations of the world are formed by processing large numbers of sensory inputs—including, for example, the pixel intensities of images, the power spectra of sounds, and the joint angles of articulated bodies. While complex stimuli of this form can be represented by points in a high-dimensional vector space, they typically have a much more compact description. Coherent structure in the world leads to strong correlations between inputs (such as between neighboring pixels in images), generating observations that lie on or close to a smooth low-dimensional manifold. To compare and classify such observations—in effect, to reason about the world—depends crucially on modeling the nonlinear geometry of these low-dimensional manifolds.

Scientists interested in exploratory analysis or visualization of multivariate data (1) face a similar problem in dimensionality reduction. The problem, as illustrated in Fig. 1, involves mapping high-dimensional inputs into a low-dimensional “description” space with as many coordinates as observed modes of variability. Previous approaches to this problem, based on multidimensional scaling (MDS) (2), have computed embeddings that attempt to preserve pairwise distances [or generalized disparities (3)] between data points; these distances are measured along straight lines or, in more sophisticated usages of MDS such as Isomap (4), along shortest paths confined to the manifold of observed inputs. Here, we take a different approach, called locally linear embedding (LLE), that eliminates the need to estimate pairwise distances between widely separated data points. Unlike previous methods, LLE recovers global nonlinear structure from locally linear fits.

The LLE algorithm, summarized in Fig. 2, is based on simple geometric intuitions. Suppose the data consist of *N* real-valued vectors *X⃗*
_{i}, each of dimensionality *D*, sampled from some underlying manifold. Provided there is sufficient data (such that the manifold is well-sampled), we expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold. We characterize the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors. Reconstruction errors are measured by the cost function(1)which adds up the squared distances between all the data points and their reconstructions. The weights*W*
_{ij} summarize the contribution of the*j*th data point to the *i*th reconstruction. To compute the weights *W*
_{ij}, we minimize the cost function subject to two constraints: first, that each data point*X⃗*
_{i} is reconstructed only from its neighbors (5), enforcing *W*
_{ij}= 0 if *X⃗*
_{j} does not belong to the set of neighbors of *X⃗*
_{i}; second, that the rows of the weight matrix sum to one: Σ_{j}
*W*
_{ij} = 1. The optimal weights *W*
_{ij} subject to these constraints (6) are found by solving a least-squares problem (7).

The constrained weights that minimize these reconstruction errors obey an important symmetry: for any particular data point, they are invariant to rotations, rescalings, and translations of that data point and its neighbors. By symmetry, it follows that the reconstruction weights characterize intrinsic geometric properties of each neighborhood, as opposed to properties that depend on a particular frame of reference (8). Note that the invariance to translations is specifically enforced by the sum-to-one constraint on the rows of the weight matrix.

Suppose the data lie on or near a smooth nonlinear manifold of lower dimensionality *d* << *D*. To a good approximation then, there exists a linear mapping—consisting of a translation, rotation, and rescaling—that maps the high-dimensional coordinates of each neighborhood to global internal coordinates on the manifold. By design, the reconstruction weights*W*
_{ij} reflect intrinsic geometric properties of the data that are invariant to exactly such transformations. We therefore expect their characterization of local geometry in the original data space to be equally valid for local patches on the manifold. In particular, the same weights *W*
_{ij}that reconstruct the *i*th data point in *D*dimensions should also reconstruct its embedded manifold coordinates in*d* dimensions.

LLE constructs a neighborhood-preserving mapping based on the above idea. In the final step of the algorithm, each high-dimensional observation *X⃗*
_{i} is mapped to a low-dimensional vector *Y⃗*
_{i}representing global internal coordinates on the manifold. This is done by choosing *d*-dimensional coordinates*Y⃗*
_{i} to minimize the embedding cost function(2)This cost function, like the previous one, is based on locally linear reconstruction errors, but here we fix the weights *W*
_{ij} while optimizing the coordinates*Y⃗*
_{i}. The embedding cost in Eq. 2 defines a quadratic form in the vectors*Y⃗*
_{i}. Subject to constraints that make the problem well-posed, it can be minimized by solving a sparse *N* × *N* eigenvalue problem (9), whose bottom *d* nonzero eigenvectors provide an ordered set of orthogonal coordinates centered on the origin.

Implementation of the algorithm is straightforward. In our experiments, data points were reconstructed from their *K* nearest neighbors, as measured by Euclidean distance or normalized dot products. For such implementations of LLE, the algorithm has only one free parameter: the number of neighbors, *K*. Once neighbors are chosen, the optimal weights *W*
_{ij} and coordinates *Y⃗*
_{i} are computed by standard methods in linear algebra. The algorithm involves a single pass through the three steps in Fig. 2 and finds global minima of the reconstruction and embedding costs in Eqs. 1 and 2.

In addition to the example in Fig. 1, for which the true manifold structure was known (10), we also applied LLE to images of faces (11) and vectors of word-document counts (12). Two-dimensional embeddings of faces and words are shown in Figs. 3 and4. Note how the coordinates of these embedding spaces are related to meaningful attributes, such as the pose and expression of human faces and the semantic associations of words.

Many popular learning algorithms for nonlinear dimensionality reduction do not share the favorable properties of LLE. Iterative hill-climbing methods for autoencoder neural networks (13, 14), self-organizing maps (15), and latent variable models (16) do not have the same guarantees of global optimality or convergence; they also tend to involve many more free parameters, such as learning rates, convergence criteria, and architectural specifications. Finally, whereas other nonlinear methods rely on deterministic annealing schemes (17) to avoid local minima, the optimizations of LLE are especially tractable.

LLE scales well with the intrinsic manifold dimensionality,*d*, and does not require a discretized gridding of the embedding space. As more dimensions are added to the embedding space, the existing ones do not change, so that LLE does not have to be rerun to compute higher dimensional embeddings. Unlike methods such as principal curves and surfaces (18) or additive component models (19), LLE is not limited in practice to manifolds of extremely low dimensionality or codimensionality. Also, the intrinsic value of *d* can itself be estimated by analyzing a reciprocal cost function, in which reconstruction weights derived from the embedding vectors *Y⃗*
_{i} are applied to the data points *X⃗*
_{i}.

LLE illustrates a general principle of manifold learning, elucidated by Martinetz and Schulten (20) and Tenenbaum (4), that overlapping local neighborhoods—collectively analyzed—can provide information about global geometry. Many virtues of LLE are shared by Tenenbaum's algorithm, Isomap, which has been successfully applied to similar problems in nonlinear dimensionality reduction. Isomap's embeddings, however, are optimized to preserve geodesic distances between general pairs of data points, which can only be estimated by computing shortest paths through large sublattices of data. LLE takes a different approach, analyzing local symmetries, linear coefficients, and reconstruction errors instead of global constraints, pairwise distances, and stress functions. It thus avoids the need to solve large dynamic programming problems, and it also tends to accumulate very sparse matrices, whose structure can be exploited for savings in time and space.

LLE is likely to be even more useful in combination with other methods in data analysis and statistical learning. For example, a parametric mapping between the observation and embedding spaces could be learned by supervised neural networks (21) whose target values are generated by LLE. LLE can also be generalized to harder settings, such as the case of disjoint data manifolds (22), and specialized to simpler ones, such as the case of time-ordered observations (23).

Perhaps the greatest potential lies in applying LLE to diverse problems beyond those considered here. Given the broad appeal of traditional methods, such as PCA and MDS, the algorithm should find widespread use in many areas of science.