Results 1-10 of 13
Mining rank-correlated sets of numerical attributes
In KDD’06, 2006
Cited by 20 (2 self)
We study the mining of interesting patterns in the presence of numerical attributes. Instead of the usual discretization methods, we propose the use of rank-based measures to score the similarity of sets of numerical attributes. New support measures for numerical data are introduced, based on extensions of Kendall’s tau, and Spearman’s Footrule and rho. We show how these support measures are related. Furthermore, we introduce a novel type of pattern combining numerical and categorical attributes. We give efficient algorithms to find all frequent patterns for the proposed support measures, and evaluate their performance on real-life datasets.
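As a rough illustration of a rank-based support measure in the spirit of Kendall's tau, the sketch below counts the fraction of row pairs that are ordered the same way on every attribute in a set. The function name `concordant_pair_support`, the toy rows, and the normalization by the number of row pairs are assumptions made for illustration; the paper's own definitions may differ.

```python
from itertools import combinations


def concordant_pair_support(rows, attrs):
    """Fraction of row pairs strictly ordered the same way on every attribute.

    Illustrative assumption, not the paper's definition: a pair counts when
    one row is strictly smaller than the other on all attributes in attrs.
    """
    n = len(rows)
    if n < 2:
        return 0.0
    concordant = 0
    for r, s in combinations(rows, 2):
        all_up = all(r[a] < s[a] for a in attrs)
        all_down = all(r[a] > s[a] for a in attrs)
        if all_up or all_down:
            concordant += 1
    return concordant / (n * (n - 1) / 2)


# Hypothetical toy data: x and y rise together, z only partly follows.
rows = [
    {"x": 1, "y": 10, "z": 5},
    {"x": 2, "y": 20, "z": 4},
    {"x": 3, "y": 30, "z": 6},
]

support_xy = concordant_pair_support(rows, ["x", "y"])
support_xyz = concordant_pair_support(rows, ["x", "y", "z"])
```

Under this definition, adding an attribute can only remove concordant pairs, so the value never increases as the attribute set grows. That is the kind of monotonicity that level-wise frequent pattern search relies on.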
From Local Patterns to Global Models: The LeGo Approach to Data Mining
Cited by 19 (5 self)
In this paper we present LeGo, a generic framework that utilizes existing local pattern mining techniques for global modeling in a variety of diverse data mining tasks. In the spirit of well-known KDD process models, our work identifies different phases within the data mining step, each of which is formulated in terms of different formal constraints. It starts with a phase of mining patterns that are individually promising. Later phases establish the context given by the global data mining task by selecting groups of diverse and highly informative patterns, which are finally combined to one or more global models that address the overall data mining task(s). The paper discusses the connection to various learning techniques, and illustrates that our framework is broad enough to cover and leverage frequent pattern mining, subgroup discovery, pattern teams, multi-view learning, and several other popular algorithms. The Safarii learning toolbox serves as a proof-of-concept of its high potential for practical data mining applications. Finally, we point out several challenging open research questions that naturally emerge in a constraint-based local-to-global pattern mining, selection, and combination framework.
Multi-Label Classification with Label Constraints
Cited by 8 (1 self)
We extend the multi-label classification setting with constraints on labels. This leads to two new machine learning tasks: first, the label constraints must be properly integrated into the classification process to improve its performance, and second, we can try to automatically derive useful constraints from data. In this paper, we experiment with two constraint-based correction approaches as a post-processing step within the ranking by pairwise comparison (RPC) framework. In addition, association rule learning is considered for the task of label constraints learning. We report on the current status of our work, together with evaluations on synthetic datasets and two real-world datasets.
Efficient Pattern Mining of Uncertain Data with Sampling
Cited by 5 (0 self)
Mining frequent itemsets from transactional datasets is a well-known problem with good algorithmic solutions. In the case of uncertain data, however, several new techniques have been proposed. Unfortunately, these proposals often suffer when a lot of items occur with many different probabilities. Here we propose an approach based on sampling by instantiating “possible worlds” of the uncertain data, on which we subsequently run optimized frequent itemset mining algorithms. As such we gain efficiency at a surprisingly low loss in accuracy. This is confirmed by a statistical and an empirical evaluation on real and synthetic data.
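The possible-worlds idea can be sketched as follows, assuming each item occurs in a transaction independently with its stated probability. The function names, the dict-based data layout, and the plain counting step (used here in place of an optimized frequent itemset miner) are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_world(uncertain_db, rng):
    """Instantiate one possible world: keep each item with its probability."""
    return [
        {item for item, p in transaction.items() if rng.random() < p}
        for transaction in uncertain_db
    ]


def estimated_support(uncertain_db, itemset, n_worlds=1000, seed=0):
    """Average support count of itemset over n_worlds sampled worlds."""
    rng = random.Random(seed)
    itemset = set(itemset)
    total = 0
    for _ in range(n_worlds):
        world = sample_world(uncertain_db, rng)
        total += sum(1 for transaction in world if itemset <= transaction)
    return total / n_worlds


def expected_support(uncertain_db, itemset):
    """Exact expected support under the independence assumption."""
    total = 0.0
    for transaction in uncertain_db:
        p = 1.0
        for item in itemset:
            p *= transaction.get(item, 0.0)
        total += p
    return total


# Hypothetical uncertain database: item -> existence probability.
uncertain_db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "b": 0.8, "c": 1.0},
    {"b": 0.3, "c": 0.6},
]

approx = estimated_support(uncertain_db, {"a", "b"}, n_worlds=20000)
exact = expected_support(uncertain_db, {"a", "b"})
```

With enough sampled worlds the estimate should land close to the exact expected support. In practice, each sampled world is an ordinary transaction database, so any standard frequent itemset miner could replace the counting loop.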
Itemset Frequency Satisfiability: Complexity and Axiomatization
2007
Cited by 3 (1 self)
Computing frequent itemsets is one of the most prominent problems in data mining. We study the following related problem, called FREQSAT, in depth: given some itemset-interval pairs, does there exist a database such that for every pair the frequency of the itemset falls into the interval? This problem is shown to be NP-complete. The problem is then further extended to include arbitrary Boolean expressions over items and conditional frequency expressions in the form of association rules. We also show that, unless P equals NP, the related function problem (find the best interval for an itemset under some frequency constraints) cannot be approximated efficiently. Furthermore, it is shown that FREQSAT is recursively axiomatizable, but that there cannot exist an axiomatization of finite arity.
Itemset support queries using frequent itemsets and their condensed representations
In: Proceedings of the 9th International Conference Discovery Science (DS 2006), Springer-Verlag, LNCS, 2006
Cited by 1 (0 self)
The purpose of this paper is twofold: First, we give efficient algorithms for answering itemset support queries for collections of itemsets from various representations of the frequency information. As index structures we use itemset tries of transaction databases, frequent itemsets and their condensed representations. Second, we evaluate the usefulness of condensed representations of frequent itemsets to answer itemset support queries using the proposed query algorithms and index structures. We study analytically the worst-case time complexities of querying condensed representations and evaluate experimentally the query efficiency with random itemset queries to several benchmark transaction databases.
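A minimal sketch of an itemset trie used as an index for support queries is given below. It assumes the frequent itemsets and their supports are already known; the class and function names and the toy supports are hypothetical, and the paper's query algorithms for condensed representations are not reproduced here.

```python
class TrieNode:
    """One trie node; support is set only where a stored itemset ends."""

    def __init__(self):
        self.children = {}
        self.support = None


def build_trie(itemset_supports):
    """Insert each itemset along the path of its items in sorted order."""
    root = TrieNode()
    for itemset, support in itemset_supports.items():
        node = root
        for item in sorted(itemset):
            node = node.children.setdefault(item, TrieNode())
        node.support = support
    return root


def query_support(root, itemset):
    """Return the stored support, or None if the itemset is not indexed."""
    node = root
    for item in sorted(itemset):
        node = node.children.get(item)
        if node is None:
            return None
    return node.support


# Hypothetical frequent itemsets with their support counts.
frequent = {
    frozenset({"a"}): 5,
    frozenset({"b"}): 4,
    frozenset({"a", "b"}): 3,
}
trie = build_trie(frequent)

support_ab = query_support(trie, {"b", "a"})
support_ac = query_support(trie, {"a", "c"})
```

A lookup follows one path per query, so its cost is linear in the size of the queried itemset apart from sorting. An itemset that is not stored returns `None` here, whereas a condensed representation would need extra reasoning to derive its support.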
Sensing trending topics in Twitter
Cited by 1 (0 self)
Online social and news media generate rich and timely information about real-world events of all kinds. However, the huge amount of data available, along with the breadth of the user base, requires a substantial effort of information filtering to successfully drill down to relevant topics and events. Trending topic detection is therefore a fundamental building block to monitor and summarize information originating from social sources. There are a wide variety of methods and variables and they greatly affect the quality of results. We compare six topic detection methods on three Twitter datasets related to major events, which differ in their time scale and topic churn rate. We observe how the nature of the event considered, the volume of activity over time, the sampling procedure and the preprocessing of the data all greatly affect the quality of detected topics, which also depends on the type of detection method used. We find that standard natural language processing techniques can perform well for social streams on very focused topics, but novel techniques designed to mine the temporal distribution of concepts are needed to handle more heterogeneous streams containing multiple stories evolving in parallel. One of the novel topic detection methods we propose, based on n-grams co-occurrence and df-idf_t topic ranking, consistently achieves the best performance across all these conditions, thus being more reliable than other state-of-the-art techniques.
Itemset Frequency Satisfiability: Complexity and Axiomatization
Computing frequent itemsets is one of the most prominent problems in data mining. We study the following related problem, called FREQSAT, in depth: given some itemset-interval pairs, does there exist a database such that for every pair the frequency of the itemset falls into the interval? This problem is shown to be NP-complete. The problem is then further extended to include arbitrary Boolean expressions over items and conditional frequency expressions in the form of association rules. We also show that, unless P equals NP, the related function problem (find the best interval for an itemset under some frequency constraints) cannot be approximated efficiently. Furthermore, it is shown that FREQSAT is recursively axiomatizable, but that there cannot exist an axiomatization of finite arity.
DOI: 10.1016/j.eswa.2012.08.039 Open Archive Toulouse Archive Ouverte (OATAO)
2012
OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.