Boolean Tensor Factorizations
Cited by 10 (5 self)
Abstract. Tensors are multiway generalizations of matrices and, like matrices, they can be factorized, that is, represented (approximately) as a product of factors. These factors are typically either all matrices or a mixture of matrices and tensors. With the widespread adoption of matrix factorization techniques in data mining, tensor factorizations have also started to gain attention. In this paper we study Boolean tensor factorizations. We assume that the data is binary multiway data, and we want to factorize it into binary factors using Boolean arithmetic (i.e., defining that 1+1 = 1). Boolean tensor factorizations are, therefore, a natural generalization of Boolean matrix factorizations. We will study the theory of Boolean tensor factorizations and show that at least some of the benefits Boolean matrix factorizations have over normal matrix factorizations carry over to tensor data. We will also present algorithms for Boolean variations of the CP and Tucker decompositions, the two most common types of tensor factorizations. With experiments on synthetic and real-world data, we show that Boolean tensor factorizations are a viable alternative when the data is naturally binary. Keywords: Tensor factorization; CP factorization; Tucker factorization; Boolean tensor factorization; Boolean matrix factorization
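To make the Boolean arithmetic in this abstract concrete, here is a minimal sketch of how a binary 3-way tensor could be rebuilt from three binary factor matrices in a Boolean CP model. It is not the paper's algorithm: the function names `boolean_cp_reconstruct` and `reconstruction_error`, and the list-of-lists data layout, are assumptions made for illustration.

```python
def boolean_cp_reconstruct(A, B, C):
    """Rebuild a binary 3-way tensor from binary factor matrices.

    A is n-by-r, B is m-by-r, C is l-by-r, given as lists of 0/1 rows
    (hypothetical layout). Entry (i, j, k) is the Boolean OR over the r
    components of A[i][q] AND B[j][q] AND C[k][q], so that 1 + 1 = 1.
    """
    r = len(A[0])
    return [[[int(any(A[i][q] and B[j][q] and C[k][q] for q in range(r)))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def reconstruction_error(X, Y):
    """Count the entries where two equally sized binary tensors differ."""
    return sum(x != y
               for Xi, Yi in zip(X, Y)
               for Xij, Yij in zip(Xi, Yi)
               for x, y in zip(Xij, Yij))


# Two rank-1 components that overlap at entry (0, 1, 0).
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
C = [[1, 1]]
print(boolean_cp_reconstruct(A, B, C))  # [[[1], [1]], [[0], [1]]]
```

Under normal arithmetic the overlapping entry (0, 1, 0) would sum to 2; under Boolean arithmetic it stays 1, which is why the reconstruction remains binary.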
Discovering Relevant Cross-Graph Cliques in Dynamic Networks
Cited by 4 (2 self)
Abstract. Several algorithms, namely CubeMiner, Trias, and DataPeeler, have been recently proposed to mine closed patterns in ternary relations. We consider here the specific context where a ternary relation denotes the value of a graph adjacency matrix at different timestamps. Then, we discuss the constraint-based extraction of patterns in such dynamic graphs. We formalize the concept of δ-contiguous closed 3-clique and we discuss the availability of a complete algorithm for mining them. It is based on a specialization of the enumeration strategy implemented in DataPeeler. Indeed, clique relevancy can be specified by means of a conjunction of constraints which can be efficiently exploited. The added value of our strategy is assessed on a real dataset about a public bicycle renting system. The raw data encode the relationships between the renting stations during one year. The extracted δ-contiguous closed 3-cliques are shown to be consistent with our domain knowledge on the considered city.
Mining Biclusters of Similar Values with Triadic Concept Analysis
Cited by 4 (0 self)
Abstract. Biclustering numerical data became a popular data mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal subtables with close values. Only a few methods address a complete, correct and non-redundant enumeration of such patterns, which is a well-known intractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More specifically, we originally show that Triadic Concept Analysis (TCA) provides a nice mathematical framework for biclustering. Interestingly, existing TCA algorithms, which usually apply to binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values.
From Triadic FCA to Triclustering: Experimental Comparison of Some Triclustering Algorithms
Cited by 3 (2 self)
Abstract. In this paper we show the results of an experimental comparison of five triclustering algorithms on real-world and synthetic data with respect to resource efficiency and four quality measures. One of the algorithms, OAC-triclustering based on prime operators, is presented for the first time in this paper. An interpretation of the results for real-world datasets is provided.
Discovering Descriptive Rules in Relational Dynamic Graphs
Cited by 3 (3 self)
Graph mining methods have become quite popular, and a timely challenge is to discover dynamic properties in evolving graphs or networks. We consider the so-called relational dynamic oriented graphs that can be encoded as n-ary relations with n ≥ 3 and thus represented by Boolean tensors. Two dimensions are used to encode the graph adjacency matrices and at least one other denotes time. We design the pattern domain of multidimensional association rules, i.e., non-trivial extensions of the popular association rules that may involve subsets of any dimensions in their antecedents and their consequents. First, we design new objective interestingness measures for such rules, which leads to different approaches for measuring the rule confidence. Second, we must compute collections of a priori interesting rules. This is considered here as a post-processing of the closed patterns that can be extracted efficiently from Boolean tensors. We propose optimizations to support both rule extraction scalability and non-redundancy. We illustrate the added value of this new data mining task by discovering patterns in a real-life relational dynamic graph.
Multidimensional Association Rules in Boolean Tensors
Cited by 3 (2 self)
Popular data mining methods support knowledge discovery from patterns that hold in binary relations. We study the generalization of association rule mining within arbitrary n-ary relations and thus Boolean tensors instead of Boolean matrices. Indeed, many datasets of interest correspond to relations whose number of dimensions is greater than or equal to 3. However, just a few proposals deal with rule discovery when both the head and the body can involve subsets of any dimensions. A challenging problem is to provide a semantics to such generalized rules by means of objective interestingness measures that have to be carefully designed. Therefore, we discuss the need for different generalizations of the classical confidence measure. We also present the
Mining Constrained Cross-Graph Cliques in Dynamic Networks
Cited by 3 (2 self)
have been recently proposed to mine closed patterns in ternary relations, i.e., a generalization of the so-called formal concept extraction from binary relations. In this paper, we consider the specific context where a ternary relation denotes the value of a graph adjacency matrix (i.e., a Vertices × Vertices matrix) at different timestamps. We discuss the constraint-based extraction of patterns in such dynamic graphs. We formalize the concept of δ-contiguous closed 3-clique and we discuss the availability of a complete algorithm for mining them. It is based on a specialization of the enumeration strategy implemented in DataPeeler. Indeed, the relevant cliques are specified by means of a conjunction of constraints which can be efficiently exploited. The added value of our strategy for computing constrained clique patterns is assessed on a real dataset about a public bicycle renting system. The raw data encode the relationships between the renting stations during one year. The extracted δ-contiguous closed 3-cliques are shown to be consistent with our knowledge on the considered city.
Agglomerating Local Patterns Hierarchically with ALPHA
Cited by 2 (2 self)
To increase the relevancy of local patterns discovered from noisy relations, it makes sense to formalize error tolerance. Our starting point is to address the limitations of state-of-the-art methods for this purpose. Some extractors perform an exhaustive search w.r.t. a declarative specification of error tolerance. Nevertheless, their computational complexity prevents the discovery of large relevant patterns. Alpha is a three-step method that (1) computes complete collections of closed patterns, possibly error-tolerant ones, from arbitrary n-ary relations, (2) enlarges them by hierarchical agglomeration, and (3) selects the relevant agglomerated patterns.
Mining chains of relations
Cited by 2 (0 self)
Abstract. Traditional data mining methods consider the problem of mining a single relation that relates two different attributes. For example, in a scientific bibliography database, authors are related to papers, and we may be interested in discovering association rules between authors based on the papers that they have coauthored. However, in real life it is often the case that we have multiple attributes related through chains of relations. For example, authors write papers, and papers belong to one or more topics, defining a three-level chain of relations. In this paper we consider the problem of mining such relational chains. We formulate a generic problem of finding selector sets (subsets of objects from one of the attributes) such that the projected dataset (the part of the dataset determined by the selector set) satisfies a specific property. The motivation for our approach is that a given property might not hold on the whole dataset, but holds when projecting the data on a subset of objects. We show that many existing and new data mining problems can be formulated in the framework. We discuss various algorithms and identify the conditions under which the apriori technique can be used. We experimentally demonstrate the effectiveness and efficiency of our methods.
Incorporating Occupancy into Frequent Pattern Mining for High Quality Pattern Recommendation
Cited by 1 (0 self)
Mining interesting patterns from transaction databases has attracted a lot of research interest for more than a decade. Most of those studies use frequency, the number of times a pattern appears in a transaction database, as the key measure for pattern interestingness. In this paper, we introduce a new measure of pattern interestingness, occupancy. The measure of occupancy is motivated by some real-world pattern recommendation applications which require that any interesting pattern X should occupy a large portion of the transactions it appears in. Namely, for any supporting transaction t of pattern X, the number of items in X should be close to the total number of items in t. In these pattern recommendation applications, patterns with higher occupancy may lead to higher recall while patterns with higher frequency lead to higher precision. With the definition of occupancy we call a pattern dominant if its occupancy is above a user-specified threshold. Then, our task is to identify the qualified patterns which are both frequent and dominant. Additionally, we also formulate the problem of mining top-k qualified patterns: finding the qualified patterns with the top-k values of any function (e.g., a weighted sum of both occupancy and support). The challenge in these tasks is that the monotone or anti-monotone property does not hold on occupancy. In other words, the value of occupancy does not increase or decrease monotonically when we add more items to a given itemset. Thus, we propose an algorithm called DOFIA (DOminant and Frequent Itemset mining Algorithm), which explores the upper bound properties on occupancy to reduce the search process. The tradeoff between bound tightness and computational complexity is also systematically addressed. Finally, we show the effectiveness of DOFIA in a real-world application on print-area recommendation for Web pages, and also demonstrate the efficiency of DOFIA on several large synthetic data sets.
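The occupancy idea described above can be illustrated with a small sketch. The abstract does not give the exact formula, so this example assumes occupancy is the average of |X| / |t| over the supporting transactions t of pattern X; that averaging choice and the names `support`, `occupancy`, and `is_qualified` are assumptions for illustration, not the DOFIA algorithm.

```python
def support(pattern, transactions):
    """Fraction of transactions that contain every item of the pattern."""
    pattern = frozenset(pattern)
    return sum(pattern <= set(t) for t in transactions) / len(transactions)


def occupancy(pattern, transactions):
    """Average share of each supporting transaction taken up by the pattern.

    Assumption: the per-transaction ratios |X| / |t| are combined by a
    plain mean; the paper may aggregate them differently.
    """
    pattern = frozenset(pattern)
    supporting = [set(t) for t in transactions if pattern <= set(t)]
    if not supporting:
        return 0.0
    return sum(len(pattern) / len(t) for t in supporting) / len(supporting)


def is_qualified(pattern, transactions, min_support, min_occupancy):
    """A pattern is qualified when it is both frequent and dominant."""
    return (support(pattern, transactions) >= min_support
            and occupancy(pattern, transactions) >= min_occupancy)


transactions = [{"a", "b"}, {"a", "b", "c", "d"}, {"c"}]
print(occupancy({"a"}, transactions))       # 0.375
print(occupancy({"a", "b"}, transactions))  # 0.75
```

In this toy data, adding item "b" to the pattern {"a"} raises its occupancy while its support stays the same, a small instance of occupancy not behaving monotonically the way support does.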