The Homepage of Nearest Neighbors and Similarity Search
Maintained by Yury Lifshits

Intro     Bibliography     Researchers     Links     Maintainer's corner

I am now trying to sort all papers by topic. This work is in progress. Please email me all papers I missed so far.

Nearest neighbors in general metric space

P. Zezula, G. Amato, V. Dohnal, M. Batko, Similarity Search - The Metric Space Approach, Springer, 2006.
E. Chavez, G. Navarro, R.A. Baeza-Yates, J.L. Marroquin, Searching in metric spaces, 2001
G.R. Hjaltason, H. Samet, Index-driven similarity search in metric spaces, ACM Transactions on Database Systems, 2003.
K. Fukunaga, P.M. Narendra, A Branch and Bound Algorithm for Computing k-Nearest Neighbors, Transactions on Computers, 1975.
M.T. Orchard, A fast nearest-neighbor search algorithm, ICASSP'91.
N. Roussopoulos, S. Kelley, F. Vincent, Nearest neighbor queries, SIGMOD'95.
L. Mico, J. Oncina, R.C. Carrasco, An algorithm for finding nearest neighbours in constant average time with a linear space complexity, Pattern Recognition Letters, 1996.
D. White, R. Jain, Similarity Indexing: Algorithms and Performance, SPIE'96.
K. Clarkson, Nearest Neighbor Queries in Metric Spaces, STOC'97
P. Ciaccia, M. Patella, P. Zezula, M-tree An Efficient Access Method for Similarity Search in Metric Spaces, VLDB'97.
T. Seidl, H.P. Kriegel, Optimal multi-step k-nearest neighbor search, SIGMOD'98.
R. Weber, H.J. Schek, S. Blott, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces, VLDB'98.
K.P. Bennett, U. Fayyad, D. Geiger, Density-Based Indexing for Approximate Nearest-Neighbor Queries, KDD'99.
K.L. Clarkson, Nearest Neighbor Queries in Metric Spaces, Discrete and Computational Geometry, 1999.
M. Farach-Colton, P. Indyk, Approximate Nearest Neighbor Algorithms for Hausdorff Metrics via Embeddings, FOCS'99.
T. Bozkaya, M. Ozsoyoglu, Indexing large metric spaces for similarity search queries, ACM Transactions on Database Systems, 1999.
P. Ciaccia, M. Patella, PAC nearest neighbor queries: Approximate and controlled search inhigh-dimensional and metric spaces, ICDE'00.
M. Batko, D. Novak, P. Zezula, MESSIF: Metric Similarity Search Implementation Framework, DELOS'07.
M.E. Houle, J. Sakuma, Fast approximate similarity search in extremely high-dimensional data sets, ICDE'05.
P.N. Yianilos, Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces, SODA'93.
V.G. Pestov and A. Stojmirović, Indexing schemes for similarity search: an illustrated paradigm, Fundamenta Informaticae, 2006.
N. Katayama, S. Satoh, The SR-tree: an index structure for high-dimensional nearest neighbor queries, SIGMOD'97.
S. Brin, Near neighbor search in large metric spaces, VLDB'95.
A. Guttman, R-trees: a dynamic index structure for spatial searching, SIGMOD'84.
E. Chavez, G. Navarro, A Compact Space Decomposition for Effective Metric Indexing, Pattern Recognition Letters, 2005.
B. Bustos, G. Navarro, E. Chavez, Pivot Selection Techniques for Proximity Searching in Metric Spaces, Pattern Recognition Letters, 2003.
V. Gaede, O. Gunther, Multidimensional Access Methods, ACM Computing Surveys, 1998

Nearest neighbors in Euclidean space

P. Indyk, Nearest neighbors in high-dimensional spaces, chapter of Handbook of Discrete and Computational Geometry, 2004.
J. Bentley, M. Shamos, Divide and Conquer in Multidimensional Spaces, STOC'76, pp. 220-230.
D. Dobkin, R. Lipton, Multidimensional Searching Problems, SIAM Journal of Computing 5 (1976), pp. 181-186.
Robert F. Sproull Refinements to nearest-neighbor searching in k-dimensional trees (1989)
P. Frankl, H. Maehara, The Johnson-Lindenstrauss Lemma and the Sphericity of Some Graphs, Journal of Combinatorial Theory A, 44(1987):355-362.
S. Arya, D. Mount, Approximate nearest neighbor queries in fixed dimensions, SODA'93.
S.A. Nene and S.K. Nayar A Simple Algorithm for Nearest Neighbor Search in High Dimensions Transactions on Pattern Analysis and Machine Intelligence, 1997.
T.M. Chan Approximate Nearest Neighbor Queries Revisited, SoCG'97.
J. Kleinberg,  Two algorithms for nearest-neighbor search in high dimensions , STOC'97.
S. Berchtold, C. Böhm, D.A. Keim, H.P. Kriegel, A cost model for nearest neighbor search in high-dimensional data space, PODS'97.
S. Berchtold, C. Böhm, B Braunmüller, D.A. Keim, H.P. Kriegel, Fast parallel similarity search in multimedia databases, SIGMOD'97.
S. Arya, D. N. Mount, S. Netanyahu, R. Silverman, A. Y. Wu, An optimal algorithm for approximate nearest neighbor searching, JACM, 1998.
P. Indyk, On Approximate Nearest Neighbors in Non-Euclidean Spaces, FOCS'98.
P. Indyk, R. Motwani, Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality, STOC'98.
E. Kushilevitz, R. Ostrovsky, Y. Rabani, Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces, STOC'98.
K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, When Is "Nearest Neighbor" Meaningful? ICDT'99.
S. Dasgupta, A. Gupta, An elementary proof of the Johnson-Lindenstrauss lemma, Berkeley, TR-99-006.
A. Gionis, P. Indyk, R. Motwani, Similarity Search in High Dimensions via Hashing, VLDB'99.
A. Hinneburg, C.C. Aggarwal, D.A. Keim, What is the Nearest Neighbor in High Dimensional Spaces?, VLDB'00.
P. Indyk, Dimensionality Reduction Techniques for Proximity Problems, SODA 2000.
R. Fagin, R. Kumar, D. Sivakumar, Efficient Similarity Search and Classification via Rank Aggregation, SIGMOD'03.
Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search, VLDB'07.
N. Ailon, B. Chazelle, Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform, STOC'06.
C.C. Aggarwal, On the effects of dimensionality reduction on high dimensional similarity search, SIGMOD'01.
C. Li, E. Chang, H. Garcia-Molina, G. Wiederhold, Clustering for Approximate Similarity Search in High-Dimensional Spaces IEEE Transactions on Knowledge and Data Engineering, 2002.
A. Andoni, P. Indyk, Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, FOCS'06.

Nearest neighbors and intrinsic dimension

V.G. Pestov, On the geometry of similarity search: dimensionality curse and concentration of measure, Information Processing Letters, 2000.
D.R. Karger, M. Ruhl, Finding nearest neighbors in growth-restricted metrics, STOC'02.
R. Krauthgamer and J.R. Lee, Navigating nets: simple algorithms for proximity search, SODA'04.
R. Krauthgamer and J.R. Lee, The black-box complexity of nearest-neighbor search, Theoretical Computer Science, 2005.
R. Cole and L.-A. Gottlieb, Searching dynamic point sets in spaces with bounded doubling dimension, STOC'06.
K.L. Clarkson, Nearest-neighbor searching and metric space dimensions, 2006.
V.G. Pestov Intrinsic dimension of a dataset: what properties does one expect?, IJCNN'07.
N. Goyal, Y. Lifshits, H. Schütze, Disorder Inequality: A Combinatorial Approach to Nearest Neighbor Search, submitted, 2007.

Nearest neighbors for sequences

R. Agrawal, C. Faloutsos, A. Swami, Efficient similarity search in sequence databases FODO'93.
M. Muthukrishnan, C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations, STOC'00.
U. Keich, M. Li, B. Ma, J. Tromp , On spaced seeds for similarity search, Discrete Applied Mathematics, 2004.
V.G. Pestov, A. Stojmirović, Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments, accepted to Information Systems, 2007.

Lower bounds

A. Borodin, R. Ostrovsky, Y. Rabani, Lower Bounds for High Dimensional Nearest Neighbor Search and Related Problems, STOC'99.
R. Motwani, A. Naor, R. Panigrahi, Lower bounds on locality sensitive hashing, SoCG'06.
C. Sahinalp and A. Utis, Hardness of String Similarity Search and Other Indexing Problems, ICALP'04.
J. Bruck, M. Naor, The hardness of decoding linear codes with preprocessing, IEEE Transactions on Information Theory, 1990.
A. Chakrabarti, B. Chazelle, B. Gum, A. Lvov, A Lower Bound on the Complexity of Approximate Nearest-Neighbor Searching on the Hamming Cube, STOC'99.
M. Patrascu, M. Thorup, Higher Lower Bounds for Near-Neighbor and Further Rich Problems, FOCS'06.

Related algorithmic problems

K. Clarkson, A randomized algorithm for closest-point queries , SICOMP'88.
P. Agarwal, H. Edelsbrunner, O. Schwartzkopf, E. Wezl, Euclidean minimum spanning trees and bichromatic closest pair, SoCG'90, also Discrete Computational Geometry 1991.
M. Bern, Approximate closest point queries in high dimensions, Information Processing Letters, 1993.
S. Meiser, Point location in arrangements of hyperplanes, Information and Computation, 1993.
K. Clarkson, An algorithm for approximate closest-point queries SoCG'94.
P. Agarwal, J. Erickson, Geometric range searching and its relatives, Advances in Discrete and Computational Geometry, 1999.
D. Eppstein, Fast hierarchical clustering and other applications of dynamic closest pairs, SODA'98.
A. Borodin, R. Ostrovsky, Y. Rabani, Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces, STOC'99.
P. Indyk, Sublinear Time Algorithms for Metric Space Problems , STOC'99.
A.Z. Broder, M. Charikar, A.M. Frieze, M. Mitzenmacher, Min-wise independent permutations, Journal of Computer and System Sciences, 2000.
K. Jain, V.V. Vazirani, Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation, J ACM 2001.
A. Goel, P. Indyk, K. Varadarajan, Reductions among high dimensional proximity problems, SODA 2001.
B. Hoffmann, Y. Lifshits, D. Nowotka, Maximal Intersection Queries in Randomized Graph Models, CSR'07.
R. Benetis, C.S. Jensen, G. Karciauskas, S. Saltenis, Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects VLDB J., 2006.
H. Schutze and C. Silverstein, Projections for Efficient Document Clustering, SIGIR'97.
B. Chazelle, D. Liu, A. Magen, Approximate Range Searching in Higher Dimension, CCCG'2004.
Y. Tao, D. Papadias, Q. Shen, Continuous Nearest Neighbor Search, VLDB'02.

Applications of similarity search

T. Cover, P. Hart, Nearest Neighbor pattern classification, IEEE Transactions on Information Theory, 1967.
A.J. Broder Strategies for efficient incremental nearest neighbor search, Pattern Recognition, 1990.
R. Agrawal, K.I. Lin, H.S. Sawhney, K. Shim Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases, VLDB'95.
S. Berchtold, H.P. Kriegel, S3: similarity search in CAD database systems, SIGMOD'97.
T. Seidl, H.P. Kriegel Efficient User-Adaptable Similarity Search in Large Multimedia Databases, VLDB'97.
D.A. Keim, Efficient Geometry-based Similarity Search of 3D Spatial Databases, SIGMOD'99.
E.J. Keogh and M.J. Pazzani, An Indexing Scheme for Fast Similarity Search in Large Time Series Databases, PAKDD'00.
T. Schlieder, Similarity Search in XML Data using Cost-Based Query Transformations, Workshop on Web and Databases, 2001.
R. Ohbuchi, T. Otagiri, M. Ibato, T. Takei Shape-similarity search of three-dimensional models using parameterized statistics, Pacific Graphics'02.
A. Blessing, H. Schutze, Inverted Indexes For Fast Distributional Similarity Computations, to appear 2007.
E. Keogh, K. Chakrabarti, M. Pazzani, S. Mehrotra, Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases, Knowledge and Information Systems, 2001.
B. Zhang, S.N. Srihari A fast algorithm for finding k-nearest neighbors with non-metric dissimilarity, Frontiers in Handwriting Recognition, 2002.
R.F. Santos Filho, A. Traina, C. Traina Jr, C. Faloutsos, Similarity Search without Tears: the OMNI-Family of All-Purpose Access Methods, ICDE'01.
T.H. Haveliwala, A. Gionis, D. Klein, P. Indyk, Evaluating strategies for similarity search on the web, WWW'02.
S. Chakrabarti, K. Puniyani, S. Das, Optimizing Scoring Functions and Indexes for Proximity Search in Type-annotated Corpora, WWW'06.
K. Kummamuru, R. Krishnapuram, R. Agrawal, Learning Spatially Variant Dissimilarity (SVaD) Measures, KDD'04.
J. Dean and M.R. Henzinger, Finding Related Web Pages in the World Wide Web, WWW'99.
Z. Bar-Yossef, I. Keidar, U. Schonfeld, Do Not Crawl in the dust: Different URLs with Similar Text, WWW'06.
S. Ilyinsky, M. Kuzmin, A. Melkov, I. Segalovich, An efficient method to detect duplicates of web documents with the use of inverted index, WWW'02.
D. Fogaras, B. Racz, Scaling link-based similarity search, WWW'05.
Q. Lv, M. Charikar, K. Li, Image similarity search with compact data structures, CIKM'04.
S. Prabhakar, D. Agrawal, A.E. Abbadi, Efficient Disk Allocation for Fast Similarity Searching, SPAA'97.
H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, A.E. Abbadi, Approximate Nearest Neighbor Searching in Multimedia Databases, ICDE'01.
A.N. Papadopoulos, Y. Manolopoulos, Nearest Neighbor Search: A Database Perspective, Springer, 2005.
C. Faloutsos, K.-I. Lin, FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets, SIGMOD'95.