Constraint-based mining of fault-tolerant patterns from Boolean data. Knowledge Discovery in Inductive Databases (2005)

by J Besson, R Pensa, C Robardet, J-F Boulicaut

Results 1 - 7 of 7

The discrete basis problem

by Pauli Miettinen, 2005
Cited by 41 (13 self)
We consider the Discrete Basis Problem, which can be described as follows: given a collection of Boolean vectors find a collection of k Boolean basis vectors such that the original vectors can be represented using disjunctions of these basis vectors. We show that the decision version of this problem is NP-complete and that the optimization version cannot be approximated within any finite ratio. We also study two variations of this problem, where the Boolean basis vectors must be mutually orthogonal. We show that the other variation is closely related with the well-known Metric k-median Problem in Boolean space. To solve these problems, two algorithms will be presented. One is designed for the variations mentioned above, and it is solely based on solving the k-median problem, while another is a heuristic intended to solve the general Discrete Basis Problem. We will also study the results of extensive experiments made with these two algorithms with both synthetic and real-world data. The results are twofold: with the synthetic data, the algorithms did rather well, but with the real-world data the results were not as good.

Citation Context

...ces full of 1s. The main difference to DBP is that no 0s can be covered in a feasible tiling. Methods have been developed for finding also large approximate tiles, for example fault-tolerant patterns [22] and conjunctive clusters [23], but obtaining an accurate description of the whole data set with a small number of approximate tiles has not been explicitly studied previously. Boolean factorization, ...

Mining Bi-sets in Numerical Data

by Jérémy Besson, Céline Robardet, Jean-François Boulicaut
Cited by 6 (0 self)
Abstract. Thanks to an important research effort the last few years, inductive queries on set patterns and complete solvers which can evaluate them on large 0/1 data sets have been proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Therefore, using efficient 0/1 mining techniques needs for tedious Boolean property encoding phases. This is, e.g., the case, when considering microarray data mining and its impact for knowledge discovery in molecular biology. We consider the possibility to mine directly numerical data to extract collections of relevant bi-sets, i.e., couples of associated sets of objects and attributes which satisfy some user-defined constraints. Not only we propose a new pattern domain but also we introduce a complete solver for computing the so-called numerical bi-sets. Preliminary experimental validation is given.

Citation Context

..., and we provide a complete solver for computing NBS patterns. We start from a recent formalization of constraint-based bi-set mining from 0/1 data (extension of formal concepts towards fault-tolerance introduced in [3]) both for the design of the pattern domain and its associated solver. The next section concerns the formalization of the NBS pattern domain and its properties. Section 3 sketches our algorithm and Sec...

Application-Independent Feature Construction from Noisy Samples

by Dominique Gay, Nazha Selmaoui, Jean-François Boulicaut
Cited by 1 (0 self)
Abstract. When training classifiers, presence of noise can severely harm their performance. In this paper, we focus on “non-class” attribute noise and we consider how a frequent fault-tolerant (FFT) pattern mining task can be used to support noise-tolerant classification. Our method is based on an application independent strategy for feature construction based on the so-called δ-free patterns. Our experiments on noisy training data shows accuracy improvement when using the computed features instead of the original ones.

Citation Context

...nces in fault-tolerant itemset mining and feature construction. The goal of fault-tolerant itemset mining [6] is to support the discovery of relevant frequent itemsets in noisy binary data (see, e.g., [7] for a recent survey). Among others, an extension to (frequent) closed set mining towards fault-tolerance has been studied in [8] that enables a bounded number (δ) of errors per item/attribute. It is ...

Clustering formal concepts to discover biologically relevant knowledge

by Sylvain Blachon, Ruggero G. Pensa, Jérémy Besson, Céline Robardet, Jean-François Boulicaut, Olivier G

Citation Context

... a 1 value or vice versa). This is particularly true with Boolean gene expression data which are intrinsically noisy. Therefore, an idea is to extend formal concepts toward fault-tolerance (see, e.g., Besson et al., 2006a; Besson et al., 2006b). Such fault-tolerant patterns (FTPs) can be viewed as formal concepts in which a limited number of exceptions are tolerated (e.g., one tolerates that a few genes are not over-...

New Applications of Formal Concept Analysis: A Need for Original Pattern Domains

by Jean-François Boulicaut, Jérémy Besson, Loïc Cerf, Kim-Ngan T. Nguyen
Abstract. We survey the results obtained by our research group (joint

Citation Context

... a result, we were discussing the use of primitive constraints to compute more relevant formal concepts, for instance large-enough ones [13] but also some generalizations that provide fault-tolerance [14]. A few years later, it is now possible to discuss such issues in the enlarged setting of arbitrary n-ary relations. Therefore, we can consider (a) our generic algorithm that mines set patterns and ex...

Mining Local Staircase Patterns in Noisy Data

by Luc De Raedt, Kathleen Marchal
Abstract—Most traditional biclustering algorithms identify biclusters with no or little overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative for causal relationships between columns and can not easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising. Index Terms—Staircase patterns; pattern sets; constraint programming; MDL; biclustering.

Citation Context

...ists of exactly the same symbol. However, many datasets contain noise, that is, a fraction of the elements have a deviating symbol. We can cater for this by introducing the concept of fault-tolerance [2]. A fault-tolerant bicluster is a bicluster that allows a small amount of noise on the rows and columns. In the following, we use the Iverson bracket [·] to convert the truth value of an equation into...

Mining bi-sets in numerical data

by Luc De Raedt
Abstract. Thanks to an important research effort the last few years, inductive queries on set patterns and complete solvers which can evaluate them on large 0/1 data sets have been proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Therefore, using efficient 0/1 mining techniques needs for tedious Boolean property encoding phases. This is, e.g., the case, when considering microarray data mining and its impact for knowledge discovery in molecular biology. We consider the possibility to mine directly numerical data to extract collections of relevant bi-sets, i.e., couples of associated sets of objects and attributes which satisfy some user-defined constraints. Not only we propose a new pattern domain but also we introduce a complete solver for computing the so-called numerical bi-sets. Preliminary experimental validation is given.

Citation Context

... a complete solver for computing NBS patterns. We start from a recent formalization of constraint-based bi-set mining from 0/1 data (extension of formal concepts towards fault-tolerance introduced in [3]) both for the design of the pattern domain and its associated solver. The next section concerns the formalization of the NBS pattern domain and its properties. Section 3 sketches our algorithm and Se...

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University