Results 11-20 of 147
A systematic approach to the assessment of fuzzy association rules. Data Mining and Knowledge Discovery
, 2006
Cited by 43 (6 self)
In order to allow for the analysis of data sets including numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. In particular, it assumes an understanding of the semantic meaning of a fuzzy rule. This aspect has been ignored by most existing proposals, which must therefore be considered ad hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we proceed from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. The problem of finding a proper partition has a rather obvious solution for standard association rules but becomes less trivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means for constructing meaningful alternatives.
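As a rough illustration of the examples/counterexamples idea, the sketch below computes min-based fuzzy support and confidence for a rule A -> B. The function name, the input format, and the choice of the minimum as conjunction are assumptions made for illustration; this is one commonly used generalization, not necessarily the measure the paper ends up recommending.

```python
def fuzzy_rule_measures(memberships):
    """Return (support, confidence, counterexample_mass) for a fuzzy rule A -> B.

    memberships: list of (a, b) pairs in [0, 1], where a is the degree to which
    a record satisfies the antecedent A and b the degree for the consequent B.
    Assumption: conjunction is modeled with the minimum operator.
    """
    n = len(memberships)
    if n == 0:
        return 0.0, 0.0, 0.0
    # Degree to which each record counts as an example of the rule.
    examples = sum(min(a, b) for a, b in memberships)
    # Degree to which each record counts as a counterexample (A holds, B does not).
    counterexamples = sum(min(a, 1.0 - b) for a, b in memberships)
    # Total degree to which the antecedent is satisfied.
    antecedent = sum(a for a, _ in memberships)
    support = examples / n
    confidence = examples / antecedent if antecedent > 0 else 0.0
    return support, confidence, counterexamples
```

With crisp memberships (only 0 or 1) this reduces to the standard support and confidence of an association rule. For intermediate degrees the example and counterexample masses need not add up to the antecedent mass, which hints at why the partition is less obvious in the fuzzy case.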
Interestingness-based interval merger for numeric association rules
 Proceedings of the Fourth International Conference on Knowledge Discovery & Data Mining
, 1998
Cited by 38 (1 self)
We present an algorithm for mining association rules from relational tables containing numeric and categorical attributes. The approach is to merge adjacent intervals of numeric values, in a bottom-up manner, on the basis of maximizing the interestingness of a set of association rules. A modification of the B-tree is adopted for performing this task efficiently. The algorithm takes O(kN) I/O time, where k is the number of attributes and N is the number of rows in the table. We evaluate the effectiveness of producing good intervals.
Handling Very Large Numbers of Association Rules in the Analysis of Microarray Data
 In Proceedings of SIGKDD’02
Cited by 32 (0 self)
The problem of analyzing microarray data has become one of the important topics in bioinformatics over the past several years, and different data mining techniques have been proposed for the analysis of such data. In this paper, we propose to use association rule discovery methods for determining associations among expression levels of different genes. One of the main problems related to the discovery of these associations is the scalability issue. Microarrays usually contain very large numbers of genes, sometimes measured in the tens of thousands. Therefore, analysis of such data can generate a very large number of associations, often measured in millions. The paper addresses this problem by presenting a method that enables biologists to evaluate these very large numbers of discovered association rules during the post-analysis stage of the data mining process. This is achieved by providing several rule evaluation operators, including rule grouping, filtering, browsing, and data inspection operators, that allow biologists to validate multiple individual gene regulation patterns at a time. By iteratively applying these operators, biologists can explore a significant part of all the initially generated rules in an acceptable period of time and thus answer biological questions that are of particular interest to them. To validate our method, we tested our system on microarray data pertaining to studies of environmental hazards and their influence on gene expression processes. As a result, we managed to answer several questions that were of interest to the biologists who had collected this data.
Mining negative association rules
 In Seventh International Symposium on Computers and Communications
, 2002
Cited by 25 (1 self)
The focus of this paper is the discovery of negative association rules. Such association rules are complementary to the sorts of association rules most often encountered in the literature and have the form X → ¬Y or ¬X → Y. We present a rule discovery algorithm that finds a useful subset of valid negative rules. In generating negative rules, we employ a hierarchical graph-structured taxonomy of domain terms. A taxonomy containing classification information records the similarity between items. Given the taxonomy, sibling rules, duplicated from positive rules with a couple of items replaced, are derived together with their estimated confidence. Those sibling rules that show a large confidence deviation are considered candidate negative rules. Our study shows that negative association rules can be discovered efficiently from large databases.
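To make the notion of a negative rule concrete, the sketch below computes the confidence of a rule of the form X → ¬Y directly from a list of transactions. It only illustrates the measure; it does not implement the taxonomy-based sibling-rule generation described in the abstract. The function name and the reading of ¬Y as "Y is not fully contained in the transaction" are assumptions.

```python
def negative_rule_confidence(transactions, x, y):
    """Confidence of the negative rule X -> not Y.

    transactions: iterable of item collections.
    x, y: item collections for the antecedent and the negated consequent.
    Assumption: 'not Y' holds for a transaction unless it contains every item of Y.
    """
    x, y = set(x), set(y)
    # Transactions that contain the whole antecedent X.
    with_x = [set(t) for t in transactions if x <= set(t)]
    if not with_x:
        return 0.0
    # Among those, count the transactions that do not contain all of Y.
    without_y = sum(1 for t in with_x if not (y <= t))
    return without_y / len(with_x)
```

Under this reading, the confidence of X → ¬Y equals one minus the confidence of the positive rule X → Y.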
On-Line Analytical Mining of Association Rules
, 1998
Cited by 23 (0 self)
With wide applications of computers and automated data collection tools, massive amounts of data have been continuously collected and stored in databases, which creates an imminent need and great opportunities for mining interesting knowledge from data. Association rule mining is one kind of data mining technique, which discovers strong association or correlation relationships among data. The discovered rules may help market basket or cross-sales analysis, decision making, and business management. In this thesis, we propose and develop an interesting association rule mining approach, called on-line analytical mining of association rules, which integrates the recently developed OLAP (on-line analytical processing) technology with some efficient association mining methods. It leads to flexible, multidimensional, multilevel association rule mining with high performance. Several algorithms are developed based on this approach for mining various kinds of associations in multidimensional ...
Mining association rules from XML data
 In Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery
, 2002
Cited by 22 (1 self)
The eXtensible Markup Language (XML) rapidly emerged as a standard for representing and exchanging information. The fast-growing amount of available XML data sets a pressing need for languages and tools to manage collections of XML documents, as well as to mine interesting information out of them. Although the data mining community has not yet rushed into the use of XML, there have been some proposals to exploit XML. However, in practice these proposals mainly rely on more or less traditional relational databases with an XML interface. In this paper, we introduce association rules from native XML documents and discuss the new challenges and opportunities that this topic sets to the data mining community. More specifically, we introduce an extension of XQuery for mining association rules. This extension is used throughout the paper to better define association rule mining within XML and to emphasize its implications in the XML context.
Maximum Independent Set of Rectangles
Cited by 21 (0 self)
We study the Maximum Independent Set of Rectangles (MISR) problem: given a collection R of n axis-parallel rectangles, find a maximum-cardinality subset of disjoint rectangles. MISR is a special case of the classical Maximum Independent Set problem, where the input is restricted to intersection graphs of axis-parallel rectangles. Due to its many applications, ranging from map labeling to data mining, MISR has received a significant amount of attention from various research communities. Since the problem is NP-hard, the main focus has been on the design of approximation algorithms. Several groups of researchers have independently suggested O(log n)-approximation algorithms for MISR, and this has remained the best known approximation factor for the problem. The main result of our paper is an O(log log n)-approximation algorithm for MISR. Our algorithm combines existing approaches for solving special cases of the problem, in which the input set of rectangles is restricted to containing specific intersection types, with new insights into the combinatorial structure of sets of intersecting rectangles in the plane. We also consider a generalization of MISR to higher dimensions, where rectangles are replaced by d-dimensional hyper-rectangles. Our results for MISR imply an O((log n)^(d−2) log log n)-approximation algorithm for this problem, improving upon the best previously known O((log n)^(d−1))-approximation.
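The problem statement itself can be illustrated with an exact brute-force search, which is only feasible for a handful of rectangles. This is not the approximation algorithm from the paper; it is a sketch of the problem definition. The tuple layout (x1, y1, x2, y2) and the convention that rectangles sharing only a boundary count as disjoint are assumptions.

```python
from itertools import combinations


def disjoint(r, s):
    """True if axis-parallel rectangles r and s have no interior overlap.

    Each rectangle is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Assumption: rectangles that only touch along an edge are treated as disjoint.
    """
    return r[2] <= s[0] or s[2] <= r[0] or r[3] <= s[1] or s[3] <= r[1]


def max_independent_rectangles(rects):
    """Return a largest subset of pairwise disjoint rectangles (exponential time)."""
    # Try subset sizes from largest to smallest; the first feasible subset is optimal.
    for size in range(len(rects), 0, -1):
        for subset in combinations(rects, size):
            if all(disjoint(a, b) for a, b in combinations(subset, 2)):
                return list(subset)
    return []
```

The exhaustive search inspects up to 2^n subsets, which is why work on MISR concentrates on approximation algorithms.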
Mining frequent itemsets without support threshold: With and without item constraints
, 2004
Cited by 20 (1 self)
In classical association rules mining, a minimum support threshold is assumed to be available for mining frequent itemsets. However, setting such a threshold is typically hard. In this paper, we handle a more practical problem; roughly speaking, it is to mine N k-itemsets with the highest supports for k up to a certain kmax value. We call the results the N-most interesting itemsets. Generally, it is more straightforward for users to determine N and kmax. We propose two new algorithms, LOOPBACK and BOMO. Experiments show that our methods outperform the previously proposed Itemset-Loop algorithm, and the performance of BOMO can be an order of magnitude better than the original FP-tree algorithm, even with the assumption of an optimally chosen support threshold. We also propose the mining of “N-most interesting k-itemsets with item constraints.” This allows users to specify different degrees of interestingness for different itemsets. Experiments show that our proposed Double FP-trees algorithm, which is based on BOMO, is highly efficient in solving this problem. Index Terms: Association rules, N-most interesting itemsets, FP-tree, item constraints.
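The task of mining the N k-itemsets with the highest supports, for each k up to kmax, can be sketched with a naive enumeration. This is not LOOPBACK or BOMO; it simply counts every k-itemset and is meant only to show what the output looks like. The function name is hypothetical, and breaking ties lexicographically is an assumption made to keep the result deterministic.

```python
from collections import Counter
from itertools import combinations


def n_most_interesting(transactions, n, k_max):
    """Return {k: [(itemset, support_count), ...]} with the n best k-itemsets per k.

    Naive enumeration of all k-itemsets in every transaction; suitable only
    for small inputs. Ties are broken lexicographically (an assumption).
    """
    result = {}
    for k in range(1, k_max + 1):
        counts = Counter()
        for t in transactions:
            # Sort the distinct items so each itemset has one canonical tuple form.
            for itemset in combinations(sorted(set(t)), k):
                counts[itemset] += 1
        # Highest support first, then lexicographic order of the itemset.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result[k] = ranked[:n]
    return result
```

The user supplies only N and kmax, and no minimum support threshold appears anywhere in the call.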
Clustering in a High-Dimensional Space Using Hypergraph Models
, 1987
Cited by 19 (3 self)
Clustering of data in a large dimension space is of great interest in many data mining applications. Most of the traditional algorithms such as K-means or AutoClass fail to produce meaningful clusters in such data sets even when they are used with well-known dimensionality reduction techniques such as Principal Component Analysis and Latent Semantic Indexing. In this paper, we propose a method for clustering of data in a high-dimensional space based on a hypergraph model. The hypergraph model maps the relationship present in the original data in high-dimensional space into a hypergraph. A hyperedge represents a relationship (affinity) among subsets of data and the weight of the hyperedge reflects the strength of this affinity. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. We present results of experiments on three different data sets: S&P500 stock data for the period of 1994-1996, protein coding data, and Web document data. Wherever applicable, we compared our results with those of the AutoClass and K-means clustering algorithms on the original data as well as on the reduced dimensionality data obtained via Principal Component Analysis or the Latent Semantic Indexing scheme. These experiments demonstrate that our approach is applicable and effective in a wide range