Poster
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice
Hendrik Fichtenberger · Dennis Rohde
Room 210 #91
Keywords: [ Learning Theory ] [ Classification ] [ Computational Complexity ]
Abstract:
In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^{\delta}$ and a directed graph $G = (P, E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n}\, k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / (\epsilon k)})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.
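To make the $k$-NN graph property concrete, the following is a minimal Python sketch of a naive sampling check. It is not the authors' tester and does not achieve the sublinear complexity stated in the abstract: each sampled vertex is verified by a brute-force scan over all points. The names `violates` and `sample_check`, the Euclidean metric, the adjacency-dict representation of $G$, and the requirement $k \ge 1$ are all assumptions made for illustration.

```python
import math
import random


def violates(points, edges, i, k):
    """True if vertex i does not point to exactly k nearest neighbors.

    Assumes k >= 1. Ties are tolerated: i is fine as long as no
    non-neighbor is strictly closer than its farthest out-neighbor.
    """
    nbrs = set(edges[i])
    if len(nbrs) != k or i in nbrs:
        return True
    radius = max(math.dist(points[i], points[j]) for j in nbrs)
    return any(
        j != i and j not in nbrs and math.dist(points[i], points[j]) < radius
        for j in range(len(points))
    )


def sample_check(points, edges, k, samples, rng):
    """Inspect `samples` random vertices; return a violating one or None.

    The error is one-sided: a correct k-NN graph is never rejected,
    while a graph with few bad vertices may slip through.
    """
    for _ in range(samples):
        i = rng.randrange(len(points))
        if violates(points, edges, i, k):
            return i
    return None


# Hypothetical toy data: four points on a line, k = 1.
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
good = {0: [1], 1: [0], 2: [1], 3: [2]}
bad = {0: [1], 1: [0], 2: [1], 3: [0]}  # vertex 3 points to the wrong neighbor
```

Calling `sample_check(points, good, 1, 200, random.Random(0))` never reports a violation, whereas on `bad` the only vertex that can be reported is 3. How many samples are needed to reject an $\epsilon$-far graph with good probability is exactly what the analysis in the paper addresses; this sketch makes no claim about that.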