Johnson–Lindenstrauss lemma

In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.

The lemma has applications in compressed sensing, manifold learning, dimensionality reduction, graph embedding, and natural language processing. Much of the data stored and manipulated on computers, including text and images, can be represented as points in a high-dimensional space (see vector space model for the case of text). However, the essential algorithms for working with such data tend to become bogged down very quickly as dimension increases.[1] It is therefore desirable to reduce the dimensionality of the data in a way that preserves its relevant structure. The lemma may help explain how large language models (LLMs) are able to represent highly nuanced word meanings and context in relatively low-dimensional embeddings.[Note 2]

Concretely, given $0 < \varepsilon < 1$ and a set $X$ of $N$ points in $\mathbb{R}^n$, there is a linear map $f : \mathbb{R}^n \to \mathbb{R}^k$ with $k = O(\varepsilon^{-2} \log N)$ such that
$(1-\varepsilon)\,\|u-v\|^2 \;\le\; \|f(u)-f(v)\|^2 \;\le\; (1+\varepsilon)\,\|u-v\|^2$
for all $u, v \in X$. The lemma is tight up to a constant factor: there exists a set of points of size $N$ that needs dimension $\Omega(\varepsilon^{-2} \log N)$ in order to preserve the distances between all pairs of points within a factor of $1 \pm \varepsilon$.

An orthogonal projection collapses some dimensions of the space it is applied to, which reduces the length of all vectors, as well as the distances between vectors in the space. Under the conditions of the lemma, concentration of measure ensures there is a nonzero chance that a random orthogonal projection reduces the pairwise distances between all points in $X$ by roughly a constant factor $c$; since the chance is nonzero, such projections must exist, so one can choose a projection $P$ and set $f(v) = Pv/c$. By repeatedly sampling random projections ("rolling the dice"), one eventually obtains a suitable projection in polynomial random time.

A related result is the distributional JL lemma: for any $0 < \varepsilon, \delta < 1/2$ and positive integer $d$, there is a distribution over $k \times d$ matrices with $k = O(\varepsilon^{-2} \log \tfrac{1}{\delta})$ such that a matrix $A$ drawn from it satisfies, for every fixed unit vector $x \in \mathbb{R}^d$,
$\Pr\left(\left|\,\|Ax\|_2^2 - 1\,\right| > \varepsilon\right) < \delta.$
One such distribution is given by the random matrix $A \in \mathbb{R}^{k \times d}$, obtained by sampling each entry from the standard normal distribution and rescaling by $1/\sqrt{k}$; then $k\|Ax\|_2^2$ is chi-square distributed with $k$ degrees of freedom, and a standard concentration inequality gives the bound, allowing arbitrarily high probability of success per sample, and a fortiori polynomial random time.[7] One can obtain the JL lemma from the distributional version by setting $x = (u-v)/\|u-v\|_2$ and $\delta < 1/N^2$ for each pair $u, v \in X$, then taking a union bound over all pairs.

Achlioptas (2003)[8] proposed a "database-friendly" JL transform, using matrices whose entries are drawn from $\{-1, 0, +1\}$: each entry is $+1$ with probability $1/6$, $0$ with probability $2/3$, and $-1$ with probability $1/6$, with the matrix rescaled by $\sqrt{3/k}$. The proof of correctness proceeds by upper-bounding the cumulant generating function (CGF) of the projected squared norm on both tails. Matoušek (2008)[9] proposed a variant of this transform that is even more sparsified, though it only works on "well-spread" vectors.
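As an illustration of the constructions above, the following sketch (not from the article; it assumes NumPy, and the sizes N, d and the distortion eps are arbitrary illustrative choices) draws both a dense Gaussian JL matrix and Achlioptas' sparse $\{-1, 0, +1\}$ variant, then checks empirically that all pairwise squared distances are preserved up to roughly a $1 \pm \varepsilon$ factor.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_jl(d, k, rng):
    """Dense JL matrix with i.i.d. N(0, 1/k) entries, so E||Ax||^2 = ||x||^2."""
    return rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

def achlioptas_jl(d, k, rng):
    """Achlioptas' "database-friendly" matrix: entries are +sqrt(3/k) w.p. 1/6,
    0 w.p. 2/3, -sqrt(3/k) w.p. 1/6 (same mean and variance as the Gaussian case)."""
    signs = rng.choice([-1.0, 0.0, 1.0], p=[1 / 6, 2 / 3, 1 / 6], size=(k, d))
    return np.sqrt(3.0 / k) * signs

def pairwise_sq_dists(Z):
    """All pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    i, j = np.triu_indices(len(Z), k=1)
    return D[i, j]

# Illustrative sizes: N points in dimension d, projected down to
# k = O(eps^-2 log N) dimensions (the constant 8 is illustrative).
N, d, eps = 100, 3000, 0.25
k = int(np.ceil(8 * np.log(N) / eps ** 2))

X = rng.normal(size=(N, d))          # an arbitrary high-dimensional point set

for make in (gaussian_jl, achlioptas_jl):
    A = make(d, k, rng)
    Y = X @ A.T                      # embed every point: f(x) = Ax
    ratio = pairwise_sq_dists(Y) / pairwise_sq_dists(X)
    # With high probability every ratio lies in (or very near) [1-eps, 1+eps].
    print(f"{make.__name__}: min {ratio.min():.3f}, max {ratio.max():.3f}")
```

Both constructions are used identically; the sparse variant simply replaces most matrix entries with exact zeros, which is what makes it cheaper to store and apply in a database setting.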
There has been some work on deriving distributions for which the matrix-vector product can be computed in less than $O(kd)$ time. The first such construction, the Fast Johnson–Lindenstrauss Transform (FJLT),[10] was introduced by Ailon and Chazelle in 2006. This method allows the computation of the matrix-vector product in just $d \log d + k^{2+\gamma}$ time for any constant $\gamma > 0$. Another approach is to build a distribution supported over matrices that are sparse,[11] keeping only an $\varepsilon$ fraction of the entries, so that the product can be computed in $kd\varepsilon$ time; furthermore, if the vector has only $b$ non-zero entries, the sparse JL takes time $kb\varepsilon$, which may be much less than the $d \log d$ time used by the fast JL.

It is possible to combine two JL matrices by taking the so-called face-splitting product, which is defined as the tensor product of their rows (it was proposed by V. Slyusar[12] in 1996[13][14][15][16][17] for radar and digital antenna array applications). More precisely, if $A$ and $B$ have the same number of rows, the $i$-th row of the face-splitting product $A \bullet B$ is the Kronecker product of the $i$-th rows of $A$ and $B$.[13][14][15][16][17] This idea of tensorization was used by Kasiviswanathan et al. for differential privacy.[18] JL matrices defined like this use fewer random bits, and can be applied quickly to vectors that have tensor structure, due to the following identity:[15]
$(A \bullet B)(x \otimes y) = (Ax) \circ (By),$
where $\circ$ denotes the element-wise (Hadamard) product.

In 2020 it was shown that if the matrices $T_1, T_2, \dots, T_c$ are independent $\pm 1$ or Gaussian matrices, the combined matrix $T_1 \bullet T_2 \bullet \dots \bullet T_c$ satisfies the distributional JL lemma if the number of rows is at least
$O\!\left(\varepsilon^{-2} \log\tfrac{1}{\delta} + \varepsilon^{-1} \left(\tfrac{1}{c} \log\tfrac{1}{\delta}\right)^{c}\right).$
For large $\varepsilon$ this is as good as the completely random Johnson–Lindenstrauss transform, but a matching lower bound in the same paper shows that this exponential dependence on $(\log\tfrac{1}{\delta})^{c}$ is necessary. Alternative JL constructions are suggested to circumvent this.
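To illustrate the identity above, the sketch below (again assuming NumPy; the helper name face_split and all sizes are illustrative, not from the article) forms the face-splitting product of two random $\pm 1$ matrices and verifies that applying it to a Kronecker-structured vector $x \otimes y$ agrees with the element-wise product $(Ax) \circ (By)$. The right-hand side needs only two small matrix-vector products, which is the source of the speed-up on tensor-structured inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def face_split(A, B):
    """Face-splitting product: row i is the Kronecker product of A[i] and B[i]."""
    assert A.shape[0] == B.shape[0]
    return np.stack([np.kron(A[i], B[i]) for i in range(A.shape[0])])

k, d1, d2 = 64, 30, 40
A = rng.choice([-1.0, 1.0], size=(k, d1))   # random sign matrices; the usual
B = rng.choice([-1.0, 1.0], size=(k, d2))   # 1/sqrt(k) JL scaling is omitted here

x = rng.normal(size=d1)
y = rng.normal(size=d2)

# Identity from the text: (A • B)(x ⊗ y) = (Ax) ∘ (By).
lhs = face_split(A, B) @ np.kron(x, y)      # acts on the full d1*d2-dimensional vector
rhs = (A @ x) * (B @ y)                     # only two small matrix-vector products
print(np.allclose(lhs, rhs))                # True
```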
See also: Random projection, Restricted isometry property, Word embedding, Nearest neighbor search