Results 1 - 10
of
503
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
, 1997
"... A new access meth d, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion o ..."
Abstract
-
Cited by 663 (38 self)
- Add to MetaCart
A new access meth d, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion of objects and split management, whF h keep th M-tree always balanced - severalheralvFV split alternatives are considered and experimentally evaluated. Algorithd for similarity (range and k-nearest neigh bors) queries are also described. Results from extensive experimentationwith a prototype system are reported, considering as th performance criteria th number of page I/O's and th number of distance computations. Th results demonstratethm th Mtree indeed extendsth domain of applicability beyond th traditional vector spaces, performs reasonably well inhE[94Kv#E44V[vh data spaces, and scales well in case of growing files. 1
Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval,”
- In IEEE Transaction on Circuits and Video Technology,
, 1998
"... ..."
Similarity search in high dimensions via hashing
, 1999
"... The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image dat ..."
Abstract
-
Cited by 641 (10 self)
- Add to MetaCart
The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. Unfortunately, all known techniques for solving this problem fall prey to the \curse of dimensionality. " That is, the data structures scale poorly with data dimensionality; in fact, if the number of dimensions exceeds 10 to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather heuristic, determining an approximate nearest neighbor should su ce for most practical purposes. In this paper, we examine a novel scheme for approximate similarity search based on hashing. The basic idea is to hash the points
Fast subsequence matching in time-series databases
- PROCEEDINGS OF THE 1994 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 1994
"... We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space ..."
Abstract
-
Cited by 533 (24 self)
- Add to MetaCart
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an ecient and eective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
Image retrieval: Current techniques, promising directions and open issues
- Journal of Visual Communication and Image Representation
, 1999
"... This paper provides a comprehensive survey of the technical achievements in the research area of image retrieval, especially content-based image retrieval, an area that has been so active and prosperous in the past few years. The survey includes 100+ papers covering the research aspects of image fea ..."
Abstract
-
Cited by 507 (15 self)
- Add to MetaCart
(Show Context)
This paper provides a comprehensive survey of the technical achievements in the research area of image retrieval, especially content-based image retrieval, an area that has been so active and prosperous in the past few years. The survey includes 100+ papers covering the research aspects of image feature representation and extraction, multidimensional indexing, and system design, three of the fundamental bases of content-based image retrieval. Furthermore, based on the state-of-the-art technology available now and the demand from real-world applications, open research issues are identified and future promising research directions are suggested. C ○ 1999 Academic Press 1.
Fast Multiresolution Image Querying
, 1995
"... We present a method for searching in an image database using a query image that is similar to the intended target. The query image may be a hand-drawn sketch or a (potentially low-quality) scan of the image to be retrieved. Our searching algorithm makes use of multiresolution wavelet decompositions ..."
Abstract
-
Cited by 314 (4 self)
- Add to MetaCart
(Show Context)
We present a method for searching in an image database using a query image that is similar to the intended target. The query image may be a hand-drawn sketch or a (potentially low-quality) scan of the image to be retrieved. Our searching algorithm makes use of multiresolution wavelet decompositions of the query and database images. The coefficients of these decompositions are distilled into small "signatures" for each image. We introduce an "image querying metric" that operates on these signatures. This metric essentially compares how many significant wavelet coefficients the query has in common with potential targets. The metric includes parameters that can be tuned, using a statistical analysis, to accommodate the kinds of image distortions found in different types of image queries. The resulting algorithm is simple, requires very little storage overhead for the database of signatures, and is fast enough to be performed on a database of 20,000 images at interactive rates (on standard...
Chabot: Retrieval from a Relational Database of Images
, 1995
"... Chabot is a picture retrieval system for a database that will eventually include over 500,000 digitized multi-resolution images. We describe the design and construction of this system which uses the relational database management system POSTGRES for storing and managing the images and their associat ..."
Abstract
-
Cited by 299 (1 self)
- Add to MetaCart
Chabot is a picture retrieval system for a database that will eventually include over 500,000 digitized multi-resolution images. We describe the design and construction of this system which uses the relational database management system POSTGRES for storing and managing the images and their associated textual data. For retrieval, Chabot uses tools provided by POSTGRES, such as representation of complex data types, a rich query language, and extensible types and functions. To implement retrieval from the current collection of 11,643 images, Chabot integrates the use of stored text and other data types with content-based analysis of the images to perform "concept queries". 1. Introduction The Chabot project was initiated at UC Berkeley to study storage and retrieval from a large collection of digitized images. The images we use belong to the State of California Department of Water Resources (DWR), the agency that oversees the system of reservoirs, aqueducts and water pumping stations th...
Peer-to-Peer Information Retrieval Using Self-Organizing Semantic Overlay Networks
, 2003
"... Content-based full-text search is a challenging problem in Peer-toPeer (P2P) systems. Traditional approaches have either been centralized or use flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSea ..."
Abstract
-
Cited by 242 (7 self)
- Add to MetaCart
(Show Context)
Content-based full-text search is a challenging problem in Peer-toPeer (P2P) systems. Traditional approaches have either been centralized or use flooding to ensure accuracy of the results returned. In this paper, we present pSearch, a decentralized non-flooding P2P information retrieval system. pSearch distributes document indices through the P2P network based on document semantics generated by Latent Semantic Indexing (LSI). The search cost (in terms of different nodes searched and data transmitted) for a given query is thereby reduced, since the indices of semantically related documents are likely to be co-located in the network. We also describe techniques that help distribute the indices more evenly across the nodes, and further reduce the number of nodes accessed using appropriate index distribution as well as using index samples and recently processed queries to guide the search. Experiments show that pSearch can achieve performance comparable to centralized information retrieval systems by searching only a small number of nodes. For a system with 128,000 nodes and 528,543 documents (from news, magazines, etc.), pSearch searches only 19 nodes and transmits only 95.5KB data during the search, whereas the top 15 documents returned by pSearch and LSI have a 91.7% intersection.
Image classification for content-based indexing
- IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2001
"... Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Using binary Bayesian classifiers, we attempt to capture high-level concepts from low-level image features under the constraint that the ..."
Abstract
-
Cited by 227 (2 self)
- Add to MetaCart
Grouping images into (semantically) meaningful categories using low-level visual features is a challenging and important problem in content-based image retrieval. Using binary Bayesian classifiers, we attempt to capture high-level concepts from low-level image features under the constraint that the test image does belong to one of the classes. Specifically, we consider the hierarchical classification of vacation images; at the highest level, images are classified as indoor or outdoor; outdoor images are further classified as city or landscape; finally, a subset of landscape images is classified into sunset, forest, and mountain classes. We demonstrate that a small vector quantizer (whose optimal size is selected using a modified MDL criterion) can be used to model the class-conditional densities of the features, required by the Bayesian methodology. The classifiers have been designed and evaluated on a database of 6931 vacation photographs. Our system achieved a classification accuracy of 90.5 % for indoor/outdoor, 95.3 % for city/landscape, 96.6 % for sunset/forest & mountain, and 96 % for forest/mountain classification problems. We further develop a learning method to incrementally train the classifiers as additional data become available. We also show preliminary results for feature reduction using clustering techniques. Our goal is to combine multiple two-class classifiers into a single hierarchical classifier.