The Homepage of Nearest Neighbors and Similarity Search

Maintained by Yury Lifshits


To preprocess a database of N objects so that, given a query object,
one can efficiently determine its nearest neighbors in the database.
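
As a point of reference for the problem statement above, here is a minimal sketch of the baseline that needs no preprocessing at all: a linear scan that compares the query against every object in the database. The function name `k_nearest_neighbors`, the sample points, and the choice of Euclidean distance are illustrative assumptions, not something taken from this page. The solution concepts collected here aim to answer the same query faster than this scan.

```python
import heapq
import math


def k_nearest_neighbors(database, query, k=1, distance=math.dist):
    """Return the k objects of `database` closest to `query`.

    Hypothetical baseline: a linear scan over all N objects, with
    Euclidean distance (math.dist, Python 3.8+) as the assumed default.
    """
    # nsmallest keeps only the k best candidates while scanning once.
    return heapq.nsmallest(k, database, key=lambda obj: distance(obj, query))


# Illustrative data: four points in the plane and one query point.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.5)]
print(k_nearest_neighbors(points, (1.5, 0.5), k=2))
```

Any distance function can be passed in place of the default, so the same sketch applies to general metric spaces, at the cost of N distance computations per query.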

The purpose of this page is to collect links, people, ideas, keywords, papers, slides, code and data sets on nearest neighbors in a single place.

Call for promotion and feedback: please help bring these materials to everyone who might be interested. In particular, could you link to this page from somewhere? Please send any feedback, as well as missing links, papers and researchers, to the maintainer's email: yura@logic.pdmi.ras.ru. Thanks for visiting this page!

Names of the problem: nearest neighbors, k nearest neighbors (kNN, k-NN), nearest neighbor search, proximity search, similarity search, approximate nearest neighbors (ANN), range queries, maximal intersection queries, post-office problem, partial match, best match file searching, best match retrieval, sequence nearest neighbors (SNN).

Solution concepts: locality-sensitive hashing (LSH), low-distortion embeddings, k-d trees, kd-trees, metric trees, M-trees, R*-trees, vp-trees, vantage point trees, vantage point forest, multi-vantage point tree, bisector trees, Orchard's algorithm, random projections, fixed queries tree, Voronoi tree, BBD-tree, min-wise independent permutations, Burkhard-Keller tree, generalized hyperplane tree, geometric near-neighbor access tree (GNAT), approximating and eliminating search algorithm (AESA), inverted index, spatial approximation tree (SAT).

Applications: k-nearest neighbor classification algorithm, image similarity identification, audio similarity identification, fingerprint search, audio/video compression (MPEG), optical character recognition, coding theory, function approximation, recommendation systems, near-duplicate detection, targeting on-line ads, distributional similarity computation, spelling correction, nearest neighbor interpolation.

Related keywords: all-nearest-neighbors problem, indexing methods, spatial index, Voronoi diagram, spatial access methods (SAM), multidimensional access methods, closest pair, indexing algorithm, intrinsic dimension, Johnson-Lindenstrauss lemma, Johnson-Lindenstrauss transform, large-scale algorithms, scalability, dimensionality reduction, high dimensions, curse of dimensionality, high-dimensional spaces, cell probe model, metric spaces, Euclidean space, branch-and-bound search, divide and conquer, massive data sets, metric embeddings, cell probe complexity, spatial data structures, Euclidean k-median problem, point sampling.
