Using Trees to Depict a Forest
- PVLDB
Abstract
-
Cited by 15 (0 self)
When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the “best” results appear first. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined. An alternative approach to the first page is not to try to show the best results, but instead to help users learn what is available in the whole result set and direct them to finding what they need. In this paper, we demonstrate through a user study that a page comprising one representative from each of k clusters (generated through a k-medoid clustering) is superior to multiple alternative candidate methods for generating representatives of a data set. Users often refine query specifications based on returned results. Traditional clustering may lead to completely new representatives after a refinement step. Furthermore, clustering can be computationally expensive. We propose a tree-based method for efficiently generating the representatives, and smoothly adapting them with query refinement. Experiments show that our algorithms outperform the state-of-the-art in both result quality and efficiency.
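To make the idea of "one representative from each of k clusters" concrete, here is a minimal sketch of plain k-medoid selection over numeric tuples. It is not the tree-based method the abstract proposes; the function names `manhattan` and `pick_representatives`, the Manhattan distance, and the farthest-first seeding are all assumptions chosen for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two numeric tuples (an assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pick_representatives(points, k, max_iter=100):
    """Return the sorted indices of k medoids, one per cluster.

    Assumes the tuples are distinct and that 1 <= k <= len(points).
    """
    # Farthest-first seeding: deterministic, and an assumption of this sketch.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        medoids.append(max(
            candidates,
            key=lambda i: min(manhattan(points[i], points[m]) for m in medoids),
        ))

    for _ in range(max_iter):
        # Assign every tuple to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: manhattan(p, points[m]))
            clusters[nearest].append(i)

        # In each cluster, the new medoid is the member with the smallest
        # total distance to all other members.
        new_medoids = []
        for m, members in clusters.items():
            members = members or [m]  # keep the old medoid if a cluster is empty
            new_medoids.append(min(
                members,
                key=lambda c: sum(manhattan(points[c], points[j]) for j in members),
            ))

        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return sorted(medoids)


if __name__ == "__main__":
    result_set = [
        (1, 1), (2, 1), (3, 1),        # group A
        (10, 10), (11, 10), (12, 10),  # group B
        (20, 1), (21, 1), (22, 1),     # group C
    ]
    first_page = [result_set[i] for i in pick_representatives(result_set, 3)]
    print(first_page)
```

Unlike a k-means centroid, each medoid is an actual tuple from the result set, so it can be displayed directly on the first page. Rerunning this procedure after every query refinement may produce entirely different representatives, which is the drawback the abstract raises against traditional clustering.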
BibNetMiner: Mining Bibliographic Information Networks
Abstract
-
Cited by 7 (0 self)
Online bibliographic databases, such as DBLP in computer science and PubMed in medical sciences, contain abundant information about research publications in different fields. Each such database forms a gigantic information network (hence called BibNet), connecting in complex ways research papers, authors, conferences/journals, and possibly citation information as well, and provides fertile ground for information network analysis. Our BibNetMiner is designed for sophisticated information network mining on such bibliographic databases. In this demo, we will take the DBLP database as an example and demonstrate several attractive functions of BibNetMiner, including clustering, ranking and profiling of conferences and authors based on the research subfields. A user-friendly, visualization-enhanced interface will be provided to facilitate interactive exploration of a bibliographic database. This project will serve as an example to demonstrate the power of links in information network mining. Since the dataset is large and the network is heterogeneous, such a study will benefit the research on the analysis of massive heterogeneous information networks.
Skimmer: Rapid Scrolling of Relational Query Results
2012
Abstract
-
Cited by 2 (2 self)
A relational database often yields a large set of tuples as the result of a query. Users browse this result set to find the information they require. If the result set is large, there may be many pages of data to browse. Since results comprise tuples of alphanumeric values that have few visual markers, it is hard to browse the data quickly, even if it is sorted. In this paper, we describe the design of a system for browsing relational data by scrolling through it at a high speed. Rather than showing the user a fast-changing blur, the system presents the user with a small number of representative tuples. Representative tuples are selected to provide a “good impression” of the query result. We show that the information loss to the user is limited, even at high scrolling speeds, and that our algorithms can pick good representatives fast enough to provide for real-time, high-speed scrolling over large datasets.
Datalens: making a good first impression
- In SIGMOD ’09
Abstract
-
Cited by 2 (1 self)
When a database query has a large number of results, the user can only be shown one page of results at a time. One popular approach is to rank results such that the “best” results appear first. This approach is well-suited for information retrieval, and for some database queries, such as similarity queries or under-specified (or keyword) queries with known (or guessable) user preferences. However, standard database query results comprise a set of tuples, with no associated ranking. It is typical to allow users the ability to sort results on selected attributes, but no actual ranking is defined. An alternative approach is not to try to show the estimated best results on the first page, but instead to help users learn what is available in the whole result set and direct them to finding what they need. We present DataLens, a framework that: i) generates the most representative data points to display on the first page without sorting or ranking, ii) allows users to drill down to more similar items in a hierarchical fashion, and iii) dynamically adjusts the representatives based on the user’s new query conditions. To the best of our knowledge, DataLens is the first to allow hierarchical database result browsing and searching at the same time.
Research Challenges for Data Mining in Science and Engineering
Abstract
With the rapid development of computer and information technology in the last several decades, an enormous amount of data in science and engineering has been and will continuously be generated in massive scale, either being stored in gigantic storage devices or flowing into and out of the system in the form of data streams. Moreover, such data has been made widely available, e.g., via the Internet. Such a tremendous amount of data, in the order of tera- to petabytes, has fundamentally changed science and engineering, transforming many disciplines from data-poor to increasingly data-rich, and calling for new, data-intensive methods to conduct research in science and engineering. In this paper, we discuss the research challenges in science and engineering, from the data mining perspective, with a focus on the following issues: (1) information network analysis, (2) discovery, usage, and understanding of patterns and knowledge, (3) stream data mining, (4) mining moving object data, RFID data, and data from sensor networks, (5) spatiotemporal and multimedia data mining, (6) mining text, Web, and other unstructured data, (7) data cube-oriented multidimensional online analytical mining, (8) visual data mining, and (9) data mining by integration of sophisticated scientific and engineering domain knowledge.
A FRAMEWORK FOR PROMOTION ANALYSIS IN MULTI-DIMENSIONAL SPACE
2010
Abstract
Promotion is one of the most important elements in marketing. It is often desirable to find merit in an object (e.g., product, person, organization, or other business entity) and promote it in an appropriate community confidently. In this thesis, we motivate and discuss a novel class of data mining problems, called promotion analysis, for promoting a given object in a multi-dimensional space by leveraging object ranking information. The key observation is that most objects may not be highly ranked in the global space, where all objects are compared by all aspects; in contrast, there often exist interesting and meaningful local spaces in which the given object becomes prominent. Therefore, our general goal is to break down the data space and discover the most interesting local spaces in an effective and efficient way. We formally present the promotion analysis problem and formulate its variants and related notions. The promotion analysis problem is highly practical and useful in a wide spectrum of decision support applications. Typical application examples include merit discovery, product positioning and customer targeting, object profiling and summarization, identification of interesting features, and explorative search of objects. In fact, these applications are not new as they have been
Chapter 1 Research Challenges for Data Mining in Science and Engineering
Abstract
With the rapid development of computer and information technology in the last several decades, an enormous amount of data in science and engineering has been and will continuously be generated in massive scale, either being stored in gigantic storage devices or flowing into and out of the system in the form of data streams. Moreover, such data has been made widely available, e.g., via the Internet. Such a tremendous amount of data, in the order of tera- to petabytes, has fundamentally changed science and engineering, transforming many disciplines from data-poor to increasingly data-rich, and calling for new, data-intensive methods to conduct research in science and engineering. In this paper, we discuss the research challenges in science and engineering, from the data mining perspective, with a focus on the following issues: (1) information network analysis, (2) discovery, usage, and understanding of patterns and knowledge, (3) stream data mining, (4) mining moving object data, RFID data, and data from sensor networks, (5) spatiotemporal and multimedia data mining, (6) mining text, Web, and other unstructured data, (7) data cube-oriented multidimensional online analytical mining, (8) visual data mining, and (9) data mining by integration of sophisticated scientific and engineering domain knowledge.
Subspace Discovery for Promotion: A Cell Clustering Approach
Abstract
The promotion analysis problem has been proposed in [16], where ranking-based promotion query processing techniques are studied to effectively and efficiently promote a given object, such as a product, by exploring ranked answers. To be more specific, in a multidimensional data set, our goal is to discover interesting subspaces in which the object is ranked high. In this paper, we extend the previously proposed promotion cube techniques and develop a cell clustering approach that is able to further achieve a better tradeoff between offline materialization and online query processing. We formally formulate our problem and present a solution to it. Our empirical evaluation on both synthetic and real data sets shows that the proposed technique can greatly speed up query processing with respect to baseline implementations.
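The goal of finding subspaces in which an object is ranked high can be illustrated with a brute-force sketch. This is not the promotion cube or the cell clustering technique from the abstract; it simply enumerates every subset of dimensions and ranks the target among the rows that share its values on that subset. The function name `promotion_subspaces`, the field names, and the sample rows are all hypothetical.

```python
from itertools import combinations


def promotion_subspaces(rows, dims, score, target, top=3):
    """Return up to `top` subspaces as (rank, subspace_size, conditions).

    rows:   list of dicts, one per object
    dims:   names of the categorical dimensions
    score:  name of the numeric field used for ranking (higher is better)
    target: the row to promote (must be one of `rows`)
    """
    results = []
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            # The subspace holds every row that agrees with the target
            # on all dimensions in `subset` (the empty subset is the global space).
            peers = [row for row in rows
                     if all(row[d] == target[d] for d in subset)]
            # Rank = 1 + number of peers with a strictly higher score.
            rank = 1 + sum(1 for row in peers if row[score] > target[score])
            results.append((rank, len(peers), {d: target[d] for d in subset}))
    # Best rank first; among equal ranks, prefer the larger subspace.
    results.sort(key=lambda t: (t[0], -t[1]))
    return results[:top]


if __name__ == "__main__":
    sales = [
        {"name": "A", "region": "east", "category": "laptop", "sales": 50},
        {"name": "B", "region": "east", "category": "phone", "sales": 90},
        {"name": "C", "region": "west", "category": "laptop", "sales": 70},
        {"name": "D", "region": "west", "category": "phone", "sales": 80},
        {"name": "E", "region": "east", "category": "laptop", "sales": 40},
    ]
    for rank, size, cond in promotion_subspaces(
            sales, ["region", "category"], "sales", sales[0]):
        print(rank, size, cond)
```

The enumeration visits 2^d subsets for d dimensions and rescans the data for each one, which is the kind of online cost that offline materialization is meant to reduce.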