Ranking with uncertain scores
- In ICDE
, 2009
Cited by 24 (2 self)
Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes raises new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records' scores induces a partial order over records, as opposed to the total order that is assumed in conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders. In addition, we design novel sampling techniques to compute approximate query answers. Our experimental evaluation uses both real and synthetic data and demonstrates the efficiency and effectiveness of our techniques in different settings.
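To make the partial-order idea concrete, here is a minimal Python sketch, not the paper's algorithm. It assumes each record's score is only known to lie in an interval, treats one record as dominating another when its lowest possible score exceeds the other's highest, and enumerates every ranking consistent with that dominance relation. The record names, the intervals, and the helper functions are all hypothetical, and the brute-force enumeration is only viable for tiny inputs.

```python
from itertools import permutations

# Hypothetical records: name -> (lowest possible score, highest possible score).
records = {"a": (6.0, 9.0), "b": (5.0, 7.0), "c": (1.0, 4.0)}

def dominates(x, y):
    """x must rank above y when x's lowest score exceeds y's highest score."""
    return records[x][0] > records[y][1]

def possible_rankings():
    """Enumerate every total order consistent with the dominance partial order."""
    result = []
    for order in permutations(sorted(records)):
        # Reject the order if some later record dominates an earlier one.
        consistent = all(
            not dominates(order[j], order[i])
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if consistent:
            result.append(order)
    return result

rankings = possible_rankings()
```

With these intervals, "a" and "b" overlap and may appear in either order, while "c" is dominated by both, so two rankings survive. A probabilistic model would additionally weight each surviving ranking; that step is omitted here.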
Mining Approximate Functional Dependencies as Condensed Representations of Association Rules
, 2008
Cited by 8 (5 self)
Approximate Functional Dependencies (AFDs) mined from database relations represent potentially interesting patterns and have proven useful for tasks such as feature selection for classification, query optimization, and query rewriting. Though the discovery of Functional Dependencies (FDs) from a relational database is a well-studied problem, the discovery of AFDs remains underexplored and poses a special set of challenges. Such challenges include defining the right interestingness measures for AFDs, employing effective pruning strategies, and performing an efficient traversal of the search space of the attribute lattice. This thesis presents a novel perspective on AFDs as condensed representations of association rules; for example, an AFD (Model determines Make) is a condensation of various association rules such as (Model:Accord determines Make:Honda) and (Model:Camry determines Make:Toyota). In this regard, this thesis describes two metrics, Confidence and Specificity, analogous respectively to the standard confidence and support metrics used for association rules. This thesis presents an algorithm called AFDMiner for efficiently mining high-quality AFDs by employing effective pruning strategies. AFDMiner performs a
Design by example for SQL table definitions with functional dependencies
- THE VLDB JOURNAL
, 2011
Dynamic Query Forms for Database Queries
- Jayashri M. Jambukar et al., (IJCSIT) International Journal of Computer Science and Information Technologies, 2013
Cited by 5 (1 self)
Modern scientific databases and web databases maintain large and heterogeneous data. These real-world databases contain hundreds or even thousands of relations and attributes. Traditional predefined query forms cannot satisfy the various ad-hoc queries that users issue on those databases. This paper proposes DQF, a novel database query form interface that dynamically generates query forms. The essence of DQF is to capture a user's preference and rank query form components, assisting him/her in making decisions. The generation of a query form is an iterative process guided by the user. At each iteration, the system automatically generates ranked lists of form components, and the user then adds the desired form components to the query form. The ranking of form components is based on the captured user preference. A user can also fill in the query form and submit queries to view the query result at each iteration. In this way, a query form can be dynamically refined until the user is satisfied with the query results. We use the expected F-measure to measure the goodness of a query form. A probabilistic model is developed for estimating the goodness of a query form in DQF. Our experimental evaluation and user study demonstrate the effectiveness and efficiency of the system.
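For readers unfamiliar with the F-measure mentioned in this abstract, the following Python sketch shows the standard weighted harmonic mean of precision and recall. It is only the textbook formula: the abstract does not say how DQF takes expectations over user behavior, so that part is not modeled, and the function name and the `beta` parameter are illustrative choices.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (the F-beta score)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid dividing by zero when nothing relevant was retrieved.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# A form that returns only relevant rows but misses half of them.
score = f_measure(precision=1.0, recall=0.5)
```

A `beta` above 1 weights recall more heavily, and a `beta` below 1 favors precision.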
SMARTINT: A System for Answering Queries over Web Databases Using Attribute Dependencies
Cited by 3 (3 self)
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables do share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SMARTINT, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables and schemas for joining them. The result tuples produced by our system strike a favorable balance between precision and recall.
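The spurious-entity problem described above can be illustrated with a small Python sketch. The two tables, their attribute names, and the `naive_join` helper are hypothetical and are not taken from SMARTINT; the sketch only shows why joining on a shared non-key attribute inflates the result.

```python
# Hypothetical source tables that share "model" but have no key relationship.
cars = [
    {"model": "Civic", "make": "Honda", "year": 2005},
    {"model": "Civic", "make": "Honda", "year": 2008},
]
reviews = [
    {"model": "Civic", "year_reviewed": 2005, "rating": 4},
    {"model": "Civic", "year_reviewed": 2008, "rating": 5},
]

def naive_join(left, right, attr):
    """Pair every left row with every right row that agrees on attr."""
    return [{**a, **b} for a in left for b in right if a[attr] == b[attr]]

joined = naive_join(cars, reviews, "model")
# Rows whose car year and review year disagree describe no real entity.
spurious = [row for row in joined if row["year"] != row["year_reviewed"]]
```

Here the join yields four tuples, two of which pair a car with a review of a different model year. Every tuple a correct integration would return is present, so recall is unaffected, but half of the output is wrong, which is the precision loss the abstract refers to.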
SMARTINT: Using Mined Attribute Dependencies to Integrate Fragmented Web Databases
Cited by 3 (1 self)
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about the individual entities that are fragmented over multiple sources. At first blush this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation given the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables do share attributes, direct joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. We present a unified approach that supports intelligent retrieval over fragmented web databases by mining and using intertable dependencies. Experiments with the prototype implementation, SMARTINT, show that its retrieval strikes a good balance between precision and recall. See arxiv.org/abs/1101.5334 for a longer version of this paper.
Improving retrieval accuracy in web databases using . . .
, 2009
Cited by 1 (1 self)
This thesis deals with query answering over Web databases. Because Web databases are populated independently and autonomously by Web users, several problems arise. One such problem is missing key information, which renders the direct join infeasible. Primary key-foreign key (PK-FK) information lies at the heart of traditional databases and assists in joining tables. In recent years, increasing amounts of data have been populated by lay users into autonomous Web databases such as Google Base and Amazon SimpleDB. This has led to an absence of any centralized control over the data being populated. Issues such as missing data, imprecise queries, and missing PK-FK information began creeping into Web databases. In this thesis, a system to deal with the problem of missing PK-FK information is described. The SMARTINT system contains three important modules: Source Selection, Query Processing, and Learning. The key idea underlying the framework is to exploit the mined attribute dependencies present in the data and use them to select a tree of tables, which is subsequently expanded to form the result set. The performance and accuracy of SMARTINT have been thoroughly evaluated over test data crawled from Google Base. The precision and recall of
MODULARIZING DATA MINING: A CASE STUDY FRAMEWORK
Cited by 1 (1 self)
This paper presents the fundamental concepts underpinning MoLS, a framework for exploring and applying many variations of algorithms for one data mining problem: mining a database relation for Approximate Functional Dependencies (AFDs). An engineering approach to AFD mining suggests a framework which can be customized with plug-ins, yielding targetability and improved performance. This paper organizes familiar approaches for navigating search spaces and introduces new concepts to define and utilize variations of those spaces.
Graduate Supervisory Committee:
Ranking is of definitive importance to both the usability and the profitability of web information systems. While ranking of results is crucial for the accessibility of information to the user, the ranking of online ads increases the profitability of the search provider. The scope of my thesis includes both search and ad ranking. I consider the emerging problem of ranking deep web data by trustworthiness and relevance. I address end-to-end deep web ranking by focusing on: (i) ranking and selection of the deep web databases; (ii) topic-sensitive ranking of the sources; (iii) ranking the result tuples from the selected databases. Assessing the trustworthiness and relevance of results for ranking is especially hard, since the currently used link analysis is inapplicable (deep web records do not have links). I formulated a method, namely SourceRank, to assess the trustworthiness and relevance of the sources based on inter-source agreement. Secondly, I extend the SourceRank to consider