Duplessis: Mining gene expression data with pattern structures in formal concept analysis
 Information Sciences, 2011
"... concept analysis ..."
Identification of regulatory modules in time series gene expression data using a linear time biclustering algorithm
 IEEE/ACM Transactions on Computational Biology and Bioinformatics
Cited by 18 (2 self)
Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of experimental conditions, where the genes exhibit highly correlated behaviors. These correlated behaviors correspond to coherent expression patterns and can be used to identify potential regulatory modules possibly involved in regulatory mechanisms. Many specific versions of the biclustering problem have been shown to be NP-complete. However, when we are interested in identifying biclusters in time series expression data, we can restrict the problem by finding only maximal biclusters with contiguous columns. This restriction leads to a tractable problem. Its motivation is the fact that biological processes start and finish in an identifiable contiguous period of time, leading to increased (or decreased) activity of sets of genes forming biclusters with contiguous
Towards fault-tolerant formal concept analysis
 In AI*IA’05, volume 3673 of LNAI, 2005
Cited by 11 (0 self)
Abstract. Given Boolean data sets which record properties of objects, Formal Concept Analysis is a well-known approach for knowledge discovery. Recent application domains, e.g., for very large data sets, have motivated new algorithms which can perform constraint-based mining of formal concepts (i.e., closed sets on both dimensions which are associated by the Galois connection and satisfy some user-defined constraints). In this paper, we consider a major limitation of these approaches when considering noisy data sets. This is indeed the case of Boolean gene expression data analysis where objects denote biological experiments and attributes denote gene expression properties. In this type of intrinsically noisy data, the Galois association is so strong that the number of extracted formal concepts explodes. We formalize the computation of the so-called δ-bisets as an alternative for capturing strong associations between sets of objects and sets of properties. Based on previous work on approximate condensed representations of frequent sets by means of δ-free itemsets, we get an efficient technique which can be applied on large data sets. An experimental validation on both synthetic and real data is given. It confirms the added value of our approach w.r.t. formal concept discovery, i.e., the extraction of smaller collections of relevant associations.
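To make the notion of a formal concept in this abstract concrete (closed sets on both dimensions, associated by the Galois connection), the following is a minimal brute-force sketch in Python. The function names and the toy contexts are assumptions made for illustration; this is not the paper's algorithm, it is exponential in the number of attributes, and it does not compute δ-bisets. It only shows how a single missing value in a small context changes the number of concepts.

```python
from itertools import combinations

def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, props in context.items() if attrs <= props)

def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for attrs in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(attrs))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts

# Hypothetical toy contexts: experiments e1..e3, gene properties g1, g2.
clean = {"e1": {"g1", "g2"}, "e2": {"g1", "g2"}, "e3": {"g2"}}
# Same data with one true value lost for e2 (simulated noise).
noisy = {"e1": {"g1", "g2"}, "e2": {"g1"}, "e3": {"g2"}}

print(len(formal_concepts(clean)), len(formal_concepts(noisy)))
```

On these toy contexts the clean data yields 2 concepts and the noisy data yields 4, a small-scale picture of why exact closure is fragile on noisy Boolean data.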
Mining formal concepts with a bounded number of exceptions from transactional data
 In: Proceedings KDID’04. Volume 3377 of LNCS, Springer-Verlag, 2004
Cited by 9 (5 self)
Abstract. We are designing new data mining techniques on Boolean contexts to identify a priori interesting bisets (i.e., sets of objects or transactions associated to sets of attributes or items). A typical important case concerns formal concept mining (i.e., maximal rectangles of true values or associated closed sets by means of the so-called Galois connection). It has been applied with some success to, e.g., gene expression data analysis where objects denote biological situations and attributes denote gene expression properties. However, in such real-life application domains, it turns out that the Galois association is too strong when considering intrinsically noisy data. Strong associations that nevertheless accept a bounded number of exceptions would be extremely useful. We study the new pattern domain of α/β concepts, i.e., consistent maximal bisets with less than α false values per row and less than β false values per column. We provide a complete algorithm that computes all the α/β concepts based on the generation of concept unions pruned thanks to anti-monotonic constraints. An experimental validation on synthetic data is given. It illustrates that more relevant associations can be discovered in noisy data. We also discuss a practical application in molecular biology that illustrates an incomplete but quite useful extraction when all the concepts that are needed beforehand cannot be discovered.
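The exception bounds in the α/β definition above can be illustrated with a short Python check. This is only a sketch under stated assumptions: the function name and the toy matrix are hypothetical, and the code tests only the "less than α false values per row and less than β false values per column" condition. It does not test the consistency or maximality requirements, and it is not the complete algorithm described in the abstract.

```python
def within_exception_bounds(matrix, rows, cols, alpha, beta):
    """Return True if the biset (rows, cols) has fewer than alpha false values
    in each selected row and fewer than beta false values in each selected column."""
    rows_ok = all(sum(not matrix[r][c] for c in cols) < alpha for r in rows)
    cols_ok = all(sum(not matrix[r][c] for r in rows) < beta for c in cols)
    return rows_ok and cols_ok

# Hypothetical Boolean context: 4 objects (rows) by 3 attributes (columns).
matrix = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]

# Rows 0..2 over all columns contain a single false value (row 1, column 1).
print(within_exception_bounds(matrix, [0, 1, 2], [0, 1, 2], alpha=2, beta=2))
# With alpha = beta = 1 no exception is tolerated, as for exact rectangles of true values.
print(within_exception_bounds(matrix, [0, 1, 2], [0, 1, 2], alpha=1, beta=1))
```

Setting both bounds to 1 recovers the exact "rectangle of true values" condition, which is why the first call accepts the biset and the second rejects it.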
Mining Bisets in Numerical Data
Cited by 6 (0 self)
Abstract. Thanks to an important research effort over the last few years, inductive queries on set patterns and complete solvers which can evaluate them on large 0/1 data sets have proved extremely useful. However, for many application domains, the raw data is numerical (matrices of real numbers whose dimensions denote objects and properties). Therefore, using efficient 0/1 mining techniques requires tedious Boolean property encoding phases. This is, e.g., the case when considering microarray data mining and its impact on knowledge discovery in molecular biology. We consider the possibility of mining numerical data directly to extract collections of relevant bisets, i.e., couples of associated sets of objects and attributes which satisfy some user-defined constraints. We not only propose a new pattern domain but also introduce a complete solver for computing the so-called numerical bisets. Preliminary experimental validation is given.
From local pattern mining to relevant bicluster characterization
 Proceedings of the 6th International Symposium on Intelligent Data Analysis IDA 2005
Cited by 5 (3 self)
Abstract. Clustering or biclustering techniques have proved quite useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. We consider possibly large Boolean data sets which record properties of objects and we assume that a bipartition is available. We introduce a generic cluster characterization technique which is based on collections of bisets (i.e., sets of objects associated to sets of properties) which satisfy some user-defined constraints, and a measure of the accuracy of a given biset as a bicluster characterization pattern. The method is illustrated on both formal concepts (i.e., "maximal rectangles of true values") and the new type of δ-bisets (i.e., "rectangles of true values with a bounded number of exceptions per column"). The added value is illustrated on benchmark data and two real data sets which are intrinsically noisy: medical data about meningitis and Plasmodium falciparum gene expression data.
Boolean property encoding for local set pattern discovery: an application to gene expression data analysis
 Local Pattern Detection. Springer-Verlag LNAI 3539, 2005
Cited by 4 (0 self)
Abstract. In the domain of gene expression data analysis, several researchers have recently emphasized the promising application of local pattern (e.g., association rules, closed sets) discovery techniques from Boolean matrices that encode gene properties. Detecting local patterns by means of complete constraint-based mining techniques turns out to be an important complementary approach or invaluable counterpart to heuristic global model mining. To get the most from local set pattern mining approaches, a needed step concerns gene expression property encoding (e.g., overexpression). The impact of this preprocessing phase on both the quantity and the quality of the extracted patterns is crucial. In this paper, we study the impact of discretization techniques by a sound comparison between the dendrograms, i.e., trees that are generated by a hierarchical clustering algorithm on raw numerical expression data and its various derived Boolean matrices. Thanks to a new similarity measure, we can select the Boolean property encoding technique which preserves similarity structures holding in the raw data. The discussion relies on several experimental results for three gene expression data sets. We believe our framework is an interesting direction of work for the many application domains in which (a) local set patterns have proved useful, and (b) Boolean properties have to be derived from raw numerical data.
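As a rough illustration of what a Boolean property encoding step can look like, the Python sketch below marks a gene as overexpressed in a sample when its value exceeds that gene's mean across samples. This threshold rule, the function name, and the toy values are all assumptions chosen for illustration; the abstract does not say which discretization techniques the paper compares, and the sketch does not implement the dendrogram comparison or the similarity measure.

```python
def encode_overexpression(expression):
    """Map each gene's numeric profile to Booleans: True where the value is
    strictly above that gene's mean over all samples (an assumed rule)."""
    encoded = {}
    for gene, values in expression.items():
        threshold = sum(values) / len(values)
        encoded[gene] = [value > threshold for value in values]
    return encoded

# Hypothetical raw expression values for two genes over three samples.
raw = {
    "g1": [1.0, 2.0, 6.0],
    "g2": [5.0, 5.0, 5.0],
}

boolean_matrix = encode_overexpression(raw)
print(boolean_matrix)
```

A different threshold would yield a different Boolean matrix from the same raw data, which is the sensitivity the abstract sets out to study.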
A methodology for biologically relevant pattern discovery from gene expression data
 In Discovery Science, 2004
Cited by 3 (1 self)
Abstract. One of the most exciting scientific challenges in functional genomics concerns the discovery of biologically relevant patterns from gene expression data. For instance, it is extremely useful to provide putative synexpression groups or transcription modules to molecular biologists. We propose a methodology that has proved useful in real cases. It is described as a prototypical KDD scenario which runs from raw expression data selection to the delivery of useful patterns. Our conceptual contribution is (a) to emphasize how to get the most from recent progress in constraint-based mining of set patterns, and (b) to propose a generic approach for gene expression data enrichment. The methodology has been validated on real data sets.
Supporting bicluster interpretation in 0/1 data by means of local patterns
 2006
Cited by 1 (0 self)
Clustering or co-clustering techniques have proved useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. As a result, interpreting clustering results and discovering knowledge from them can be quite hard. We consider potentially large Boolean data sets which record properties of objects and we assume the availability of a bipartition which has to be characterized by means of a symbolic description. Our generic approach exploits collections of local patterns which satisfy some user-defined constraints in the data, and a measure of the accuracy of a given local pattern as a bicluster characterization pattern. We consider local patterns which are bisets, i.e., sets of objects associated to sets of properties. Two concrete examples are formal concepts (i.e., associated closed sets) and the so-called δ-bisets (i.e., an extension of formal concepts towards fault tolerance). We introduce the idea of a characterizing query which can be used by experts to support knowledge discovery from bipartitions thanks to available local patterns. The added value is illustrated on benchmark data and three real data sets: a medical data set and two gene expression data sets.
NaviMoz: Mining Navigational Patterns in Portal Catalogs
Cited by 1 (0 self)
Abstract. Portal catalogs are a popular means of searching for information on the Web. They provide querying and browsing capabilities on data organized in a hierarchy, on a category/subcategory basis. This paper presents mining techniques on user navigational patterns in the hierarchies of portal catalogs. Specifically, we study and implement navigation retrieval methods and clustering tasks based on navigational patterns. These mining tasks are quite useful for portal administrators, since they can be used to observe users’ behavior, extract personal preferences and reorganize the structure of the portal to better satisfy user needs and navigational habits. These mining tasks have been implemented in NaviMoz, a prototype system for mining navigational patterns in portal catalogs.