Results 1-10 of 393
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Cited by 457 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Scalable Network Distance Browsing in Spatial Databases
, 2008
Cited by 84 (10 self)
An algorithm is presented for finding the k nearest neighbors in a spatial network in a best-first manner using network distance. The algorithm is based on precomputing the shortest paths between all possible vertices in the network and then making use of an encoding that takes advantage of the fact that the shortest paths from vertex u to all of the remaining vertices can be decomposed into subsets based on the first edges on the shortest paths to them from u. Thus, in the worst case, the amount of work depends on the number of objects that are examined and the number of links on the shortest paths to them from q, rather than depending on the number of vertices in the network. The amount of storage required to keep track of the subsets is reduced by taking advantage of their spatial coherence, which is captured with the aid of a shortest path quadtree. In particular, experiments on a number of large road networks as
Spatial Join Techniques
Cited by 50 (5 self)
A variety of techniques for performing a spatial join are reviewed. Instead of just summarizing the literature and presenting each technique in its entirety, distinct components of the different techniques are described and each is decomposed into an overall framework for performing a spatial join. A typical spatial join technique consists of the following components: partitioning the data, performing internal-memory spatial joins on subsets of the data, and checking if the full polygons intersect. Each technique is decomposed into these components and each component addressed in a separate section so as to compare and contrast similar aspects of each technique. The goal of this survey is to describe the algorithms within each component in detail, comparing and contrasting competing methods, thereby enabling further analysis and experimentation with each component and allowing the best algorithms for a particular situation to be built piecemeal, or, even better, enabling an optimizer to choose which algorithms to use. Categories and Subject Descriptors: H.2.4 [Database Management]: Systems—Query processing; H.2.8 [Database Management]: Database Applications—Spatial databases and GIS
Differentially private spatial decompositions
 In ICDE
, 2012
Cited by 43 (5 self)
Differential privacy has recently emerged as the de facto standard for private data release. This makes it possible to provide strong theoretical guarantees on the privacy and utility of released data. While it is well understood how to release data based on counts and simple functions under this guarantee, it remains to provide general purpose techniques that are useful for a wider variety of queries. In this paper, we focus on spatial data, i.e., any multidimensional data that can be indexed by a tree structure. Directly applying existing differential privacy methods to this type of data simply generates noise. We propose instead the class of “private spatial decompositions”: these adapt standard spatial indexing methods such as quadtrees and kd-trees to provide a private description of the data distribution. Equipping such structures with differential privacy requires several steps to ensure that they provide meaningful privacy guarantees. Various basic steps, such as choosing splitting points and describing the distribution of points within a region, must be done privately, and the guarantees of the different building blocks must be composed into an overall guarantee. Consequently, we expose the design space for private spatial decompositions, and analyze some key examples. A major contribution of our work is to provide new techniques for parameter setting and post-processing of the output to improve the accuracy of query answers. Our experimental study demonstrates that it is possible to build such decompositions efficiently, and use them to answer a variety of queries privately and with high accuracy.
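As an illustration of the idea of composing per-node guarantees, the following is a minimal sketch of a quadtree with noisy counts, not the paper's method. It assumes data-independent splits (each cell is halved along both axes), Laplace noise on every node count, and a privacy budget split uniformly across levels; the function names, the uniform split, and the example points are all assumptions made for this sketch. Each point falls in exactly one node per level, so the counts at one level have sensitivity 1, and the per-level budgets add up to the total `epsilon` by sequential composition.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_quadtree(points, box, depth, epsilon, rng):
    """Return {(level, ix, iy): noisy_count} for a full quadtree over `box`.

    Assumptions of this sketch: all points lie inside `box`, splits are
    fixed midpoints (so no budget is spent choosing them), and the budget
    is divided uniformly over the depth + 1 levels.
    """
    x0, y0, x1, y1 = box
    eps_level = epsilon / (depth + 1)
    tree = {}
    for level in range(depth + 1):
        cells = 2 ** level  # number of cells per axis at this level
        counts = {}
        for x, y in points:
            ix = min(int((x - x0) / (x1 - x0) * cells), cells - 1)
            iy = min(int((y - y0) / (y1 - y0) * cells), cells - 1)
            counts[(ix, iy)] = counts.get((ix, iy), 0) + 1
        for ix in range(cells):
            for iy in range(cells):
                true_count = counts.get((ix, iy), 0)
                # Sensitivity 1 per level, so the noise scale is 1 / eps_level.
                tree[(level, ix, iy)] = true_count + laplace(1.0 / eps_level, rng)
    return tree


# Hypothetical example data in the unit square.
points = [(0.1, 0.1), (0.2, 0.8), (0.9, 0.9), (0.7, 0.6)]
tree = private_quadtree(points, (0.0, 0.0, 1.0, 1.0), depth=2,
                        epsilon=1.0, rng=random.Random(0))
```

Empty cells also receive noise in this sketch, because publishing only the non-empty cells would itself reveal where the data lies.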
Architecture of a Spatio-Textual Search Engine
 In: Proceedings of the 15th ACM Int. Symp. on Advances in Geographic Information Systems (ACMGIS07), ACM Press (2007) 186 – 193
Cited by 42 (19 self)
STEWARD ("Spatio-Textual Extraction on the Web Aiding Retrieval of Documents"), a system for extracting, querying, and visualizing textual references to geographic locations in unstructured text documents, is presented. Methods for retrieving and processing web documents, extracting and disambiguating georeferences, and identifying geographic focus are described. A brief overview of STEWARD's querying capabilities, as well as the design of an intuitive user interface, are provided. Finally, several application scenarios and future extensions to STEWARD are discussed.
How do humans sketch objects?
 ACM Trans. on Graphics (Proc. SIGGRAPH
Cited by 40 (1 self)
Figure 1: In this paper we explore how humans sketch and recognize objects from 250 categories. Humans have used sketching to depict our visual world since prehistoric times. Even today, sketching is possibly the only rendering technique readily available to all humans. This paper is the first large scale exploration of human sketches. We analyze the distribution of non-expert sketches of everyday objects such as ‘teapot’ or ‘car’. We ask humans to sketch objects of a given category and gather 20,000 unique sketches evenly distributed over 250 object categories. With this dataset we perform a perceptual study and find that humans can correctly identify the object category of a sketch 73% of the time. We compare human performance against computational recognition methods. We develop a bag-of-features sketch representation and use multiclass support vector machines, trained on our sketch dataset, to classify sketches. The resulting recognition method is able to identify unknown sketches with 56% accuracy (chance is 0.4%). Based on the computational model, we demonstrate an interactive sketch recognition system. We release the complete crowdsourced dataset of sketches to the community.
A Fast Similarity Join Algorithm Using Graphics Processing Units
Cited by 37 (1 self)
A similarity join operation A ⋈_ε B takes two sets of points A, B and a value ε ∈ ℝ, and outputs pairs of points p ∈ A, q ∈ B, such that the distance D(p, q) ≤ ε. Similarity joins find use in a variety of fields, such as clustering, text mining, and multimedia databases. A novel similarity join algorithm called LSS is presented that executes on a Graphics Processing Unit (GPU), exploiting its parallelism and high data throughput. As GPUs only allow simple data operations such as the sorting and searching of arrays, LSS uses these two operations to cast a similarity join operation as a GPU sort-and-search problem. It first creates, on the fly, a set of space-filling curves on one of its input datasets, using a parallel GPU sort routine. Next, LSS processes each point p of the other dataset in parallel. For each p, it searches an interval of one of the space-filling curves guaranteed to contain all the pairs in which p participates. Using extensive theoretical and experimental analysis, LSS is shown to offer a good balance between time and work efficiency. Experimental results demonstrate that LSS is suitable for similarity joins in large high-dimensional datasets, and that it performs well when compared against two existing prominent similarity join methods.
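To make the sort-and-search formulation concrete, here is a minimal CPU sketch in one dimension. It is not LSS: there is no GPU, and a plain sorted array stands in for the space-filling curves, which are only needed in higher dimensions. The function name and the sample values are assumptions for illustration. One input is sorted once, and each point of the other input binary-searches the interval [p − ε, p + ε], which in one dimension contains exactly the points within distance ε of p.

```python
from bisect import bisect_left, bisect_right


def similarity_join_1d(a, b, eps):
    """Return all pairs (p, q) with p in a, q in b and |p - q| <= eps."""
    b_sorted = sorted(b)                         # the "sort" phase, done once
    pairs = []
    for p in a:                                  # independent per p, so parallelizable
        lo = bisect_left(b_sorted, p - eps)      # first q >= p - eps
        hi = bisect_right(b_sorted, p + eps)     # one past the last q <= p + eps
        pairs.extend((p, q) for q in b_sorted[lo:hi])
    return pairs


# Hypothetical sample inputs.
A = [1.0, 4.0]
B = [9.0, 0.5, 3.5, 2.0]
pairs = similarity_join_1d(A, B, eps=1.0)
```

The work per query point is a logarithmic search plus the size of its result interval, rather than a scan of all of B.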
Efficient query processing on spatial networks
 In Proceedings of the 13th ACM International Symposium on Advances in Geographic Information Systems
, 2005
Cited by 34 (15 self)
A framework for determining the shortest path and the distance between every pair of vertices on a spatial network is presented. The framework, termed SILC, uses path coherence between the shortest path and the spatial positions of vertices on the spatial network, thereby resulting in an encoding that is compact in representation and fast in path and distance retrievals. Using this framework, a wide variety of spatial queries such as incremental nearest neighbor searches and spatial distance joins can be shown to work on datasets of locations residing on a spatial network of sufficiently large size. The suggested framework is suitable for both main memory and disk-resident datasets.
Effective Proximity Retrieval by Ordering Permutations
, 2007
Cited by 34 (6 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (KNN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the KNN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and non-metric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
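The anchor-ordering idea can be sketched as follows. This is an illustrative reading of the abstract rather than the authors' implementation: the Euclidean distance, the Spearman footrule as the measure of similarity between orders, and all function names and sample points are assumptions. Each element is represented by the permutation of anchor indices sorted from closest to farthest, and database elements are then ranked by how similar their permutation is to the query's, so that the most promising candidates are compared against the query first.

```python
import math


def anchor_permutation(x, anchors):
    """Anchor indices sorted from closest to farthest from x."""
    return sorted(range(len(anchors)), key=lambda i: math.dist(x, anchors[i]))


def footrule(perm_a, perm_b):
    """Spearman footrule: total displacement of each anchor's rank."""
    rank_in_b = {anchor: rank for rank, anchor in enumerate(perm_b)}
    return sum(abs(rank - rank_in_b[anchor]) for rank, anchor in enumerate(perm_a))


def candidate_order(query, data, anchors):
    """Indices of `data`, most promising first, judged only by permutations."""
    q_perm = anchor_permutation(query, anchors)
    perms = [anchor_permutation(x, anchors) for x in data]  # precomputable offline
    return sorted(range(len(data)), key=lambda i: footrule(q_perm, perms[i]))


# Hypothetical anchors, database and query in the plane.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(3.0, 1.0), (9.0, 1.0), (1.0, 9.0)]
query = (2.0, 1.0)
order = candidate_order(query, data, anchors)
```

In a search, only a prefix of this order would be checked with real distance computations, which is what makes the approach probabilistic rather than exact.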
Recognition and Retrieval of Mathematical Expressions
 International Journal on Document Analysis and Recognition
Cited by 31 (10 self)
Document recognition and retrieval technologies complement one another, providing improved access to increasingly large document collections. While recognition and retrieval of textual information is fairly mature, with widespread availability of Optical Character Recognition (OCR) and text-based search engines, recognition and retrieval of graphics such as images, figures, tables, diagrams, and mathematical expressions are in comparatively early stages of research. This paper surveys the state of the art in recognition and retrieval of mathematical expressions, organized around four key problems in math retrieval (query construction, normalization, indexing, and relevance feedback), and four key problems in math recognition (detecting expressions, detecting and classifying symbols, analyzing symbol layout, and constructing a representation of meaning). Of special interest is the machine learning problem of jointly optimizing the component algorithms in a math recognition system, and developing effective indexing, retrieval and relevance feedback algorithms for math retrieval. Another important open problem is developing user interfaces that seamlessly integrate recognition and retrieval. Activity in these important research areas is increasing, in part because math notation provides an excellent domain for studying problems common to many document and graphics recognition and retrieval applications, and also because mature applications will likely provide substantial benefits for education, research, and mathematical literacy.