Blinks: Ranked keyword searches on graphs, 2007
Cited by 139 (9 self)
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: the bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.
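The idea of a top-k keyword query on a graph can be illustrated with a minimal Python sketch. It is not the BLINKS algorithm: it has no bi-level index and no block partitioning, and it assumes an unweighted directed graph, answers identified by a root node that reaches every keyword, and a score equal to the sum of hop distances to the nearest match for each keyword. The names `keyword_distances` and `top_k_roots` are hypothetical.

```python
from collections import deque

def keyword_distances(graph, keyword_nodes):
    """Multi-source BFS over reversed edges.

    Returns, for every node that can reach a keyword node, the number
    of hops to the nearest one.
    """
    reverse = {u: [] for u in graph}
    for u, neighbours in graph.items():
        for v in neighbours:
            reverse.setdefault(v, []).append(u)
    dist = {n: 0 for n in keyword_nodes}
    queue = deque(keyword_nodes)
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def top_k_roots(graph, node_keywords, query, k):
    """Rank candidate roots by total distance to all query keywords (lower is better)."""
    per_keyword = []
    for kw in query:
        sources = [n for n, kws in node_keywords.items() if kw in kws]
        per_keyword.append(keyword_distances(graph, sources))
    if not per_keyword:
        return []
    # A root qualifies only if it reaches a match for every keyword.
    candidates = set(per_keyword[0]).intersection(*per_keyword[1:])
    scored = sorted((sum(d[n] for d in per_keyword), n) for n in candidates)
    return scored[:k]

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
node_keywords = {"a": set(), "b": {"graph"}, "c": {"search"}, "d": {"graph", "search"}}
print(top_k_roots(graph, node_keywords, ["graph", "search"], 2))
```

This sketch explores the whole graph once per keyword, which is the kind of cost an index over precomputed keyword distances is meant to avoid.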
Identifying Meaningful Return Information for XML Keyword Search - In SIGMOD Conference, 2007
Cited by 86 (12 self)
Keyword search enables web users to easily access XML data without the need to learn a structured query language and to study possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match keywords and connecting them in a meaningful way, in the spirit of inferring a where clause in XQuery. However, how to infer the return clause for keyword search is an open problem. To address this challenge, we present an XML keyword search engine, XSeek, to infer the semantics of the search and identify return nodes effectively. XSeek recognizes possible entities and attributes inherently represented in the data. It also distinguishes between
SPARK: Top-k keyword query in relational databases - In Proceedings of SIGMOD, 2007
Cited by 75 (3 self)
With the increasing amount of text data stored in relational databases, there is a demand for RDBMS to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and the efficiency issues of answering top-k keyword query in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that have minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
Ease: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data - In SIGMOD, 2008
Cited by 68 (12 self)
Conventional keyword search engines are restricted to a given data model and cannot easily adapt to unstructured, semi-structured or structured data. In this paper, we propose an efficient and adaptive keyword search method, called EASE, for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, we first model unstructured, semi-structured and structured data as graphs, and then summarize the graphs and construct graph indices instead of using traditional inverted indices. We propose an extended inverted index to facilitate keyword-based search, and present a novel ranking mechanism for enhancing search effectiveness. We have conducted an extensive experimental study using real datasets, and the results show that EASE achieves both high search efficiency and high accuracy, and outperforms the existing approaches significantly.
Efficient keyword search across heterogeneous relational databases - In ICDE, 2007
Cited by 44 (4 self)
Keyword search is a familiar and potentially effective way to find information of interest that is “locked” inside relational databases. Current work has generally assumed that answers for a keyword query reside within a single database. Many practical settings, however, require that we combine tuples from multiple databases to obtain the desired answers. Such databases are often autonomous and heterogeneous in their schemas and data. This paper describes Kite, a solution to the keyword-search problem over heterogeneous relational databases. Kite combines schema matching and structure discovery techniques to find approximate foreign-key joins across heterogeneous databases. Such joins are critical for producing query results that span multiple databases and relations. Kite then exploits the joins – discovered automatically across the databases – to enable fast and effective querying over the distributed data. Our extensive experiments over real-world data sets show that (1) our query processing algorithms are efficient and (2) our approach manages to produce high-quality query results spanning multiple heterogeneous databases, with no need for human reconciliation of the different databases.
Efficient Type-Ahead Search on Relational Data: a TASTIER Approach, 2009
Cited by 36 (14 self)
Existing keyword-search systems in relational databases require users to submit a complete query to compute answers. Often users feel “left in the dark” when they have limited knowledge about the data, and have to use a try-and-see method to modify queries and find answers. In this paper we propose a novel approach to keyword search in the relational world, called Tastier. A Tastier system can bring instant gratification to users by supporting type-ahead search, which finds answers “on the fly” as the user types in query keywords. A main challenge is how to achieve a high interactive speed for large amounts of data in multiple tables, so that a query can be answered efficiently within milliseconds. We propose efficient index structures and algorithms for finding relevant answers on-the-fly by joining tuples in the database. We devise a partition-based method to improve query performance by grouping relevant tuples and pruning irrelevant tuples efficiently. We also develop a technique to answer a query efficiently by predicting highly relevant complete queries for the user. We have conducted a thorough experimental evaluation of the proposed techniques on real data sets to demonstrate the efficiency and practicality of this new search paradigm.
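Type-ahead search can be sketched in a few lines of Python. This is an illustration of the general idea only, not the Tastier index structures or algorithms: it assumes a single table of text records, treats the last word of the partial query as a prefix, and finds matching keywords by binary search over a sorted vocabulary. It does not join tuples across tables. All function names are hypothetical.

```python
from bisect import bisect_left

def build_index(records):
    """Build a sorted vocabulary and an inverted index: word -> set of record ids."""
    inverted = {}
    for rid, text in records.items():
        for word in text.lower().split():
            inverted.setdefault(word, set()).add(rid)
    return sorted(inverted), inverted

def prefix_matches(vocab, prefix):
    """Return all vocabulary words starting with `prefix` (binary search on a sorted list)."""
    lo = bisect_left(vocab, prefix)
    # "\uffff" serves as an upper bound; assumes words contain no characters above it.
    hi = bisect_left(vocab, prefix + "\uffff")
    return vocab[lo:hi]

def type_ahead(vocab, inverted, partial_query):
    """Answer a query whose last word may still be incomplete."""
    words = partial_query.lower().split()
    if not words:
        return set()
    *complete, last = words
    result = set()
    for word in prefix_matches(vocab, last):
        result |= inverted[word]          # any completion of the prefix may match
    for word in complete:
        result &= inverted.get(word, set())  # complete keywords must all match
    return result

records = {
    1: "keyword search on graphs",
    2: "keyword query processing",
    3: "graph search engine",
}
vocab, inverted = build_index(records)
print(type_ahead(vocab, inverted, "keyword se"))
```

The function would be called again on every keystroke; making that fast enough on large multi-table data is the challenge the abstract describes.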
Keyword Search on Structured and Semi-Structured Data
Cited by 30 (6 self)
Empowering users to access databases using simple keywords can relieve the users from the steep learning curve of mastering a structured query language and understanding complex and possibly fast evolving data schemas. In this tutorial, we give an overview of the state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. Various data models will be discussed, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications that are built upon
SQAK: doing more with keywords - in SIGMOD, 2008
Cited by 30 (0 self)
Today’s enterprise databases are large and complex, often relating hundreds of entities. Enabling ordinary users to query such databases and derive value from them has been of great interest in database research. Today, keyword search over relational databases allows users to find pieces of information without having to write complicated SQL queries. However, in order to compute even simple aggregates, a user is required to write a SQL statement and can no longer use simple keywords. This not only requires the ordinary user to learn SQL, but also to learn the schema of the complex database in detail in order to correctly construct the required query. This greatly limits the options of the user who wishes to examine a database in more depth. As a solution to this problem, we propose a framework called SQAK (SQL Aggregates using Keywords) that enables users to pose aggregate queries using simple keywords with little or no knowledge of the schema. SQAK provides a novel and exciting way to trade off some of the expressive power of SQL in exchange for the ability to express a large class of aggregate queries using simple keywords. SQAK accomplishes this by taking advantage of the data in the database and the schema (tables, attributes, keys, and referential constraints). SQAK does not require any changes to the database engine and can be used with any existing database. We demonstrate using several experiments that SQAK is effective and can be an enormously powerful tool for ordinary users.
Keyword Search on External Memory Data Graphs, 2008
Cited by 27 (1 self)
Keyword search on graph-structured data has attracted a lot of attention in recent years. Graphs are a natural “lowest common denominator” representation which can combine relational, XML and HTML data. Responses to keyword queries are usually modeled as trees that connect nodes matching the keywords. In this paper we address the problem of keyword search on graphs that may be significantly larger than memory. We propose a graph representation technique that combines a condensed version of the graph (the “supernode graph”) which is always memory resident, along with whatever parts of the detailed graph are in a cache, to form a multi-granular graph representation. We propose two alternative approaches which extend existing search algorithms to exploit multi-granular graphs; both approaches attempt to minimize IO by directing search towards areas of the graph that are likely to give good results. We compare our algorithms with a virtual memory approach on several real data sets. Our experimental results show significant benefits in terms of reduction in IO due to our algorithms.
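A condensed "supernode graph" can be sketched as follows. The Python below is only an illustration under stated assumptions: the partition of nodes into supernodes is given in advance (the abstract does not say how it is chosen), edges are directed pairs, and a superedge exists whenever at least one original edge crosses between two supernodes. The function name `build_supernode_graph` is hypothetical.

```python
def build_supernode_graph(edges, partition):
    """Condense a graph given a node -> supernode assignment.

    Returns (members, super_edges): the nodes grouped under each supernode,
    and the set of directed edges between distinct supernodes.
    """
    members = {}
    for node, supernode in partition.items():
        members.setdefault(supernode, set()).add(node)
    super_edges = set()
    for u, v in edges:
        su, sv = partition[u], partition[v]
        if su != sv:  # edges inside one supernode are hidden in the condensed view
            super_edges.add((su, sv))
    return members, super_edges

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
partition = {"a": "S1", "b": "S1", "c": "S2", "d": "S2"}
members, super_edges = build_supernode_graph(edges, partition)
print(members, super_edges)
```

In this toy case four nodes and four edges shrink to two supernodes and two superedges, which suggests how the condensed graph could stay memory resident while the detailed graph is loaded on demand.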
Effective keyword-based selection of relational databases - In Proceedings of SIGMOD, 2007
Cited by 23 (7 self)
over World Wide Web has fueled the demand for incorporating keyword-based search over structured databases. However, most of the current research work focuses on keyword-based searching over a single structured data source. With the growing interest in distributed databases and service-oriented architecture over the Internet, it is important to extend such a capability over multiple structured data sources. One of the most important problems for enabling such a query facility is to be able to select the most useful data sources relevant to the keyword query. Traditional database summary techniques used for selecting unstructured data sources developed in IR literature are inadequate for our problem, as they do not capture the structure of the data sources. In this paper, we study the database selection problem for relational data sources, and propose a method that effectively summarizes the relationships between keywords in a relational database based on its structure. We develop effective ranking methods based on the keyword relationship summaries in order to select the most useful databases for a given keyword query. We have implemented our system on PlanetLab. In that environment we use extensive experiments with real datasets to demonstrate the effectiveness of our proposed summarization method.