Results 1–10 of 85
GeneCodis: interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Research 37: W317, 2009.
Cited by 50 (3 self)
A systematic approach to the assessment of fuzzy association rules. Data Mining and Knowledge Discovery, 2006.
Cited by 43 (6 self)
In order to allow for the analysis of data sets including numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. In particular, it assumes an understanding of the semantic meaning of a fuzzy rule. This aspect has been ignored by most existing proposals, which must therefore be considered as ad hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we proceed from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. The problem of finding a proper partition has a rather obvious solution for standard association rules but becomes less trivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means for constructing meaningful alternatives.
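The partitioning idea from this abstract can be sketched concretely. In the sketch below, the min t-norm and the resulting support/confidence measures are illustrative assumptions, not necessarily the exact definitions developed in the paper:

```python
# Hedged sketch: assessing a fuzzy rule A => B by partitioning records
# into fuzzy examples, counterexamples, and irrelevant data, then deriving
# measures from the cardinalities of those subsets. The min t-norm and the
# measure definitions are illustrative choices.

def assess_fuzzy_rule(records):
    """records: list of (mu_A, mu_B) membership degrees in [0, 1]."""
    examples = sum(min(a, b) for a, b in records)         # A and B hold
    counterex = sum(min(a, 1.0 - b) for a, b in records)  # A holds, B fails
    irrelevant = sum(1.0 - a for a, b in records)         # A does not hold
    support = examples / len(records)
    total = examples + counterex
    confidence = examples / total if total else 0.0
    return support, confidence, irrelevant

# Made-up membership degrees for four records.
data = [(1.0, 1.0), (0.8, 0.6), (0.9, 0.1), (0.0, 1.0)]
sup, conf, irr = assess_fuzzy_rule(data)
```

With crisp (0/1) memberships this reduces to the standard support and confidence, which is the consistency the paper's systematic treatment aims for.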
Reasoning about Sets using Redescription Mining. KDD'05, 2005.
Cited by 25 (13 self)
Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of association rule mining, from finding implications to equivalences; as a form of conceptual clustering, where the goal is to identify clusters that afford dual characterizations; and as a form of constructive induction, to build features based on given descriptors that mutually reinforce each other. In this paper, we present the use of redescription mining as an important tool to reason about a collection of sets, especially their overlaps, similarities, and differences. We outline algorithms to mine all minimal (non-redundant) redescriptions underlying a dataset using notions of minimal generators of closed itemsets. We also show the use of these algorithms in an interactive context, supporting constraint-based exploration and querying. Specifically, we showcase a bioinformatics application that empowers the biologist to define a vocabulary of sets underlying a domain of genes and to reason about these sets, yielding significant biological insight.
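The core notion in this abstract can be illustrated in a few lines: two descriptors form an (exact) redescription when they select the same object set. The toy dataset, the conjunctive descriptor semantics, and the Jaccard test below are made-up illustrations, not the paper's mining algorithms:

```python
# Hedged sketch: a redescription holds when two different descriptors
# (here: conjunctions of attributes) select the same set of objects.
# Dataset and names are invented for illustration.

def supporting_set(dataset, descriptor):
    """Objects whose attribute set contains every attribute in descriptor."""
    return {obj for obj, attrs in dataset.items() if descriptor <= attrs}

def jaccard(s, t):
    """Similarity of two object sets; 1.0 means an exact redescription."""
    return len(s & t) / len(s | t) if s | t else 1.0

dataset = {
    "g1": {"a", "b", "x"},
    "g2": {"a", "b", "y"},
    "g3": {"c"},
}
left = supporting_set(dataset, {"a"})
right = supporting_set(dataset, {"b"})
is_redescription = jaccard(left, right) == 1.0  # {"a"} and {"b"} pick the same objects
```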
Duplessis: Mining gene expression data with pattern structures in formal concept analysis. Information Sciences, 2011.
Time series knowledge mining, 2006.
Cited by 20 (2 self)
An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge. The patterns have a hierarchical structure; each level corresponds to a single temporal concept. On the lowest level, intervals are used to represent duration. Overlapping parts of intervals represent coincidence on the next level. Several such blocks of intervals are connected with a partial order relation on the highest level. Each pattern element consists of a semiotic triple to connect syntactic and semantic information with pragmatics. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. Efficient algorithms for the discovery of the patterns are proposed. The search for coincidence as well as partial order can be formulated as variants of the well-known frequent itemset problem. One of the best-known algorithms for this problem is therefore adapted for our purposes. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using several data sets. In an application to sports medicine the results were recognized as valid and useful by an expert of the field.
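The "coincidence" level of the TSKR hierarchy can be sketched as the common overlap of a set of labeled intervals. The interval data and the closed-interval convention below are illustrative assumptions:

```python
# Hedged sketch of TSKR-style "coincidence": a set of labeled intervals
# coincides wherever all of them hold simultaneously. Closed intervals
# and the example tracks are illustrative choices.

def coincidence(intervals):
    """intervals: list of (label, start, end).
    Returns the common overlap (start, end) of all intervals,
    or None if the symbols never all hold at the same time."""
    start = max(s for _, s, _ in intervals)
    end = min(e for _, _, e in intervals)
    return (start, end) if start <= end else None

tracks = [("A", 0, 10), ("B", 4, 8), ("C", 6, 12)]
overlap = coincidence(tracks)  # all three symbols hold on [6, 8]
```

Note how the result depends only on the latest start and earliest end, which is why small boundary disturbances merely shrink or grow the overlap instead of changing the pattern's structure.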
InClose, a Fast Algorithm for Computing Formal Concepts. The Seventeenth International Conference on Conceptual Structures, 2009.
Cited by 18 (10 self)
This paper presents an algorithm, called InClose, that uses incremental closure and matrix searching to quickly compute all formal concepts in a formal context. InClose is based, conceptually, on a well-known algorithm called CloseByOne. The serial version of a recently published algorithm (Krajca, 2008) was shown to be in the order of 100 times faster than several well-known algorithms, and timings of other algorithms in reviews suggest that none of them are faster than Krajca. This paper compares InClose to Krajca, discussing computational methods, data requirements and memory considerations. From experiments using several public data sets and random data, this paper shows that InClose is in the order of 20 times faster than Krajca. InClose is small, straightforward, requires no matrix preprocessing and is simple to implement.
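What InClose computes can be shown with a brute-force baseline: enumerate all attribute subsets of a small binary context and keep the distinct (extent, closed intent) pairs. This is not the InClose algorithm itself, only an illustration of its output on a made-up context:

```python
# Hedged sketch: brute-force enumeration of all formal concepts of a tiny
# binary context via the two derivation operators. InClose computes the
# same set far more efficiently via incremental closure. Toy data.
from itertools import combinations

objects = {
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c"},
}
attributes = {"a", "b", "c"}

def extent(intent_set):
    """Objects having every attribute in intent_set."""
    return frozenset(o for o, attrs in objects.items() if intent_set <= attrs)

def intent(ext):
    """Attributes shared by every object in ext."""
    shared = set(attributes)
    for o in ext:
        shared &= objects[o]
    return frozenset(shared)

concepts = set()
for r in range(len(attributes) + 1):
    for combo in combinations(sorted(attributes), r):
        ext = extent(frozenset(combo))
        concepts.add((ext, intent(ext)))  # each pair is a formal concept
```

On this context the enumeration yields four concepts, e.g. ({o1, o3}, {a, b}); the exponential subset loop is exactly the cost that closure-based algorithms such as CloseByOne and InClose avoid.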
A better tool than Allen’s relations for expressing temporal knowledge in interval data, 2006.
Cited by 15 (1 self)
Temporal patterns composed of symbolic intervals are commonly formulated with Allen’s interval relations originating in temporal reasoning. We show that this representation has severe disadvantages for knowledge discovery. The patterns are not robust, in the sense that small disturbances of interval boundaries lead to different patterns for similar situations. The representation is ambiguous since the same pattern can have quantitatively widely varying appearances. For all but very simple cases the patterns are not understandable because the textual descriptions are lengthy and unstructured. We present the Time Series Knowledge Representation (TSKR), a new hierarchical language for interval patterns to express the temporal concepts of coincidence and partial order. We demonstrate the superiority of this novel form of representing temporal knowledge over Allen’s relations for data mining. Results on a real data set support our claims and show a successful application.
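The robustness problem described in this abstract is easy to demonstrate: classifying two intervals with Allen's relations flips the result entirely when a boundary moves by a tiny amount. The classifier below covers only a few of Allen's thirteen relations and is an illustration, not the paper's method:

```python
# Hedged sketch: a partial classifier for Allen's interval relations,
# showing that a small disturbance of one boundary changes the relation.
# Only a handful of the thirteen relations are distinguished here.

def allen(x, y):
    """x, y: (start, end) intervals with start < end."""
    xs, xe = x
    ys, ye = y
    if xe < ys:
        return "before"
    if xe == ys:
        return "meets"
    if xs < ys and ys < xe < ye:
        return "overlaps"
    if xs == ys and xe == ye:
        return "equals"
    return "other"

r1 = allen((0, 5), (5, 9))      # boundaries touch exactly
r2 = allen((0, 5.001), (5, 9))  # boundary disturbed by 0.001
```

The 0.001 shift turns "meets" into "overlaps", two different patterns for practically the same situation, which is the non-robustness the TSKR's coincidence/partial-order concepts are designed to avoid.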
BLOSOM: A Framework for Mining Arbitrary Boolean Expressions over Attribute Sets. In: Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006.
Cited by 12 (5 self)
We introduce a novel framework (BLOSOM) for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. The closed expressions and their minimal generators give the most specific and most general boolean expressions that are satisfied by their corresponding object set. Further, the closed/minimal generator expressions form a lossless representation of all possible boolean expressions. BLOSOM efficiently …
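The closure idea for one of the four categories, pure disjunctions, can be sketched as follows: a disjunction is closed when no further item can be added without changing the set of rows it satisfies. The data and the closure routine are illustrative assumptions, not BLOSOM's actual operators:

```python
# Hedged sketch of a closure operator for pure disjunctions: starting from
# a disjunction, add every item that leaves its satisfied row set
# unchanged. Rows and items are made up for illustration.

rows = [
    {"a", "b"},
    {"b", "c"},
    {"d"},
]
items = {"a", "b", "c", "d"}

def satisfied(disjunction):
    """Indices of rows containing at least one item of the disjunction."""
    return frozenset(i for i, row in enumerate(rows) if row & disjunction)

def close_disjunction(disjunction):
    """Maximal item set with the same satisfied row set (the closure)."""
    base = satisfied(disjunction)
    return frozenset(j for j in items if satisfied(disjunction | {j}) == base)

closed = close_disjunction({"b"})  # adding "a" or "c" changes nothing
```

Here the closure of {b} is {a, b, c}: all three disjunctions satisfy exactly rows 0 and 1, so {b} is a minimal generator and {a, b, c} the closed expression for that object set.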
Two FCA-based methods for mining gene expression data. International Conference on Formal Concept Analysis (ICFCA), volume 5548 of Lecture Notes in Computer Science, 2009.
Cited by 11 (5 self)
Gene expression data are numerical and describe the level of expression of genes in different situations, thus featuring behaviour of the genes. Two methods based on FCA (Formal Concept Analysis) are considered for clustering gene expression data. The first one is based on interordinal scaling and can be realized using standard FCA algorithms. The second method is based on pattern structures and needs adaptations of standard algorithms to computing with interval algebra. The two methods are described in detail and discussed. The second method is shown to be more computationally efficient and to provide more readable results. Experiments with gene expression data are discussed.
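The first method's interordinal scaling step can be sketched directly: each numerical attribute is replaced by binary attributes of the form "value ≤ v" and "value ≥ v" for every threshold occurring in the data, after which standard FCA algorithms apply. The gene names and expression values below are invented:

```python
# Hedged sketch of interordinal scaling: turn a numerical attribute into
# a binary context of "<= v" / ">= v" attributes over the observed
# thresholds. Made-up expression values for three genes.

expression = {"g1": 2.0, "g2": 5.0, "g3": 9.0}
thresholds = sorted(set(expression.values()))

def interordinal_scale(values, cuts):
    """Map each object to the set of threshold attributes it satisfies."""
    scaled = {}
    for obj, x in values.items():
        attrs = set()
        for v in cuts:
            if x <= v:
                attrs.add(f"<= {v}")
            if x >= v:
                attrs.add(f">= {v}")
        scaled[obj] = attrs
    return scaled

context = interordinal_scale(expression, thresholds)
```

The blow-up is visible even here: one numerical column becomes 2 × (number of distinct values) binary attributes, which is the overhead the pattern-structure method avoids by computing with intervals directly.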
Mining Top-K Patterns from Binary Datasets in Presence of Noise
Cited by 8 (0 self)
The discovery of patterns in binary datasets has many applications, e.g. in electronic commerce, TCP/IP networking, Web usage logging, etc. Still, this is a very challenging task in many respects: overlapping vs. non-overlapping patterns, presence of noise, extraction of the most important patterns only. In this paper we formalize the problem of discovering the Top-K patterns from binary datasets in the presence of noise, as the minimization of a novel cost function. According to the Minimum Description Length principle, the proposed cost function favors succinct pattern sets that may approximately describe the input data. We propose a greedy algorithm for the discovery of Patterns in Noisy Datasets, named PaNDa, and show that it outperforms related techniques on both synthetic and real-world data.
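The MDL-style trade-off behind the cost function can be sketched as follows: a pattern set pays for its own description plus every cell where the union of its pattern "rectangles" disagrees with the data. The exact cost function in PaNDa differs; this only illustrates the principle on made-up data:

```python
# Hedged sketch of an MDL-style cost for noisy binary pattern mining:
# description length of the patterns plus the noise (cells where the
# patterns' cover and the data disagree). Illustrative, not PaNDa's
# actual cost function.

def cost(data, patterns):
    """data: set of (row, col) cells that are 1.
    patterns: list of (row_set, col_set) rectangles."""
    cover = {(r, c) for rows, cols in patterns for r in rows for c in cols}
    description = sum(len(rows) + len(cols) for rows, cols in patterns)
    noise = len(cover ^ data)  # false positives + uncovered ones
    return description + noise

# A 3x3 block of ones with a single noisy zero at (1, 1).
data = {(r, c) for r in range(3) for c in range(3)} - {(1, 1)}
full = [({0, 1, 2}, {0, 1, 2})]  # one pattern covering the whole block
empty = []                       # no patterns: all ones count as noise
```

Even with one wrongly covered cell, the single 3×3 pattern (cost 6 + 1 = 7) beats describing all eight ones as noise (cost 0 + 8 = 8), which is exactly how the MDL principle tolerates noise while favoring succinct pattern sets.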