Results 1–10 of 36
Probabilistic approaches to rough sets
Expert Systems, 2003
Abstract

Cited by 22 (10 self)
This paper reviews probabilistic approaches to rough sets in granulation, approximation, and rule induction. The Shannon entropy function is used to quantitatively characterize partitions of a universe. Both algebraic and probabilistic rough set approximations are studied. The probabilistic approximations are defined in a decision-theoretic framework. The problem of rule induction, a major application of rough set theory, is studied in probabilistic and information-theoretic terms. Two types of rules are analyzed: local, low-order rules and global, high-order rules.
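The entropy characterization mentioned in this abstract is easy to make concrete: the Shannon entropy of a partition of a finite universe, with blocks weighted by their relative sizes, measures how fine the granulation is. A minimal sketch (function and variable names are my own, not the paper's):

```python
import math

def partition_entropy(partition):
    """Shannon entropy H(pi) of a partition of a finite universe.

    partition: list of disjoint blocks (sets) whose union is the universe.
    A finer partition (smaller blocks) has higher entropy.
    """
    total = sum(len(block) for block in partition)
    return -sum((len(b) / total) * math.log2(len(b) / total)
                for b in partition if len(b) > 0)

# A 4-element universe split into two equal blocks: H = 1 bit.
coarse = [{1, 2}, {3, 4}]
# The finest partition (all singletons): H = log2(4) = 2 bits.
fine = [{1}, {2}, {3}, {4}]
```

The coarsest partition (one block) has entropy 0, matching the intuition that it carries no discriminating information.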
Isosurface Similarity Maps
Abstract

Cited by 16 (2 self)
In this paper, we introduce the concept of isosurface similarity maps for the visualization of volume data. Isosurface similarity maps present structural information of a volume data set by depicting similarities between individual isosurfaces quantified by a robust information-theoretic measure. Unlike conventional histograms, they are not based on the frequency of isovalues and/or derivatives and therefore provide complementary information. We demonstrate that this new representation can be used to guide transfer function design and visualization parameter specification. Furthermore, we use isosurface similarity to develop an automatic parameter-free method for identifying representative isovalues. Using real-world data sets, we show that isosurface similarity maps can be a useful addition to conventional classification techniques. Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—Display algorithms
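The information-theoretic measure behind such similarity maps is mutual information between distributions derived from pairs of isosurfaces. A generic helper for mutual information over a discretized joint histogram sketches the underlying quantity; the isosurface-specific distance transforms are omitted, and all names are my own:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a 2D joint probability table (rows: X, cols: Y)."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Perfectly dependent binary variables share 1 bit of information...
dependent = [[0.5, 0.0], [0.0, 0.5]]
# ...while independent ones share none.
independent = [[0.25, 0.25], [0.25, 0.25]]
```

In a similarity map, each cell would hold such a value for one pair of isovalues, normalized to [0, 1].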
PSEUDO-LIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
SUBMITTED TO THE ANNALS OF STATISTICS, 2013
Abstract

Cited by 16 (2 self)
Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks and often fail on sparse networks. Here we propose a new fast pseudo-likelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate them on a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudo-likelihood. We prove that pseudo-likelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two balanced communities.
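The spectral initializer can be sketched in a few lines: add a small constant to every entry of the adjacency matrix before forming the normalized Laplacian, so that very sparse graphs still yield an informative spectrum. This is a simplified two-community sketch under my own naming, not the authors' exact algorithm:

```python
import numpy as np

def spectral_init_two_blocks(A, tau=None):
    """Perturbed spectral clustering for two communities.

    A: symmetric adjacency matrix (numpy array). tau defaults to the average
    degree; adding tau/n to every entry regularizes sparse graphs where
    plain spectral clustering fails. The 0/1 labels returned can serve as
    the starting value for a pseudo-likelihood fit.
    """
    n = A.shape[0]
    if tau is None:
        tau = A.sum() / n
    A_tau = A + tau / n                      # uniform perturbation
    d = A_tau.sum(axis=1)
    L = np.diag(d ** -0.5) @ A_tau @ np.diag(d ** -0.5)
    vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    # The sign pattern of the second-leading eigenvector splits the nodes.
    return (vecs[:, -2] > 0).astype(int)
```

On two 5-node cliques joined by a single edge, the sign split of the second eigenvector recovers the two cliques.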
A Measurement-theoretic foundation for rule interestingness evaluation
Proceedings of the Workshop on Foundations and New Directions in Data Mining, Third IEEE International Conference on Data Mining (ICDM 2003), 2003
Abstract

Cited by 10 (7 self)
Summary. Many measures have been proposed and studied extensively in data mining for evaluating the interestingness (or usefulness) of discovered rules. They are usually defined based on structural characteristics or statistical information about the rules. The meaningfulness of each measure has been interpreted based either on intuitive arguments or on mathematical properties. There does not exist a framework in which one is able to represent the user judgment explicitly, precisely, and formally. Since the usefulness of discovered rules must eventually be judged by users, a framework that takes user preference or judgment into consideration will be very valuable. The objective of this paper is to propose such a framework based on the notion of user preference. The results are useful in establishing a measurement-theoretic foundation of rule interestingness evaluation.
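Two of the standard statistical measures this line of work contrasts with user-based evaluation are confidence and lift, both computed from co-occurrence counts of a rule A ⇒ B. A minimal sketch (function names my own):

```python
def confidence(n_ab, n_a):
    """Confidence of A => B: the conditional probability P(B | A)."""
    return n_ab / n_a

def lift(n_ab, n_a, n_b, n):
    """Lift of A => B: P(A, B) / (P(A) P(B)).

    1.0 means A and B are independent; values above 1 indicate
    positive association.
    """
    return n * n_ab / (n_a * n_b)

# In 100 transactions: A appears in 40, B in 50, both in 30.
# confidence(30, 40) -> 0.75, lift(30, 40, 50, 100) -> 1.5
```

A measurement-theoretic framework asks which such measures agree with a given user's preference ordering on rules, rather than taking any one of them as canonical.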
Level Construction of Decision Trees in a Partition-based Framework for Classification
Proceedings of the 16th International Conference on Software Engineering and Knowledge Engineering (SEKE'04), 2004
Abstract

Cited by 7 (3 self)
A partition-based framework is presented for a formal study of consistent classification problems. An information table is used as the knowledge representation. Solutions to, and the solution space of, classification problems are formulated in terms of partitions. Algorithms for finding solutions are modeled as searching a space of partitions under a refinement order relation. We focus on a particular type of solutions called conjunctively definable partitions. Two level-construction methods for decision trees are investigated. Experimental results are reported to compare the two methods.
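The refinement order that structures this search space can be checked directly: π1 refines π2 when every block of π1 is contained in some block of π2. A minimal sketch (names my own):

```python
def refines(finer, coarser):
    """True iff every block of `finer` is a subset of some block of `coarser`.

    Partitions are lists of disjoint sets covering the same universe; this
    relation orders the space of candidate classification solutions, with
    decision-tree growth moving toward finer partitions.
    """
    return all(any(block <= big for big in coarser) for block in finer)

# {1}{2}{3,4} refines {1,2}{3,4}; crossing elements between blocks breaks it.
```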
User-centered Interactive Data Mining
In: Proc. of the IEEE ICCI'06, 2006
Abstract

Cited by 6 (0 self)
While many data mining models concentrate on automation and efficiency, interactive data mining models focus on adaptive and effective communication between human users and computer systems. User views, preferences, strategies, and judgments play the most important roles in human-machine interaction and guide the selection of target knowledge representations, operations, and measurements. In practice, user views, preferences, and judgments also determine strategies for handling abnormal situations and for explaining mined patterns. In this paper, we discuss these fundamental issues.
Epigenetic profiles distinguish pleural mesothelioma from normal pleura and predict lung asbestos burden and clinical outcome
Cancer Res, 2009
Abstract

Cited by 6 (3 self)
Access the most recent version of this article at doi: 10.1158/0008-5472.CAN-08-2586
Active Post-refined Multimodality Video Semantic Concept Detection with Tensor Representation
Abstract

Cited by 4 (0 self)
In this paper, we address the problem of multimodality video representation and semantic concept detection. Interaction and integration of multimodality media types such as visual, audio, and textual data in video are essential to video semantic analysis. Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these vectors in a high-dimensional space for dimension reduction, classification, clustering, and so on. However, the multiple modalities in video not only have their own properties but also have correlations among them, whereas the simple vector representation weakens the power of these relatively independent modalities and even ignores their relations to some extent. In this paper, we introduce a higher-order tensor framework for video analysis, in which we represent the image, video, and text modalities of video shots as data points using 3rd-order tensors called tensor-shots. We propose a novel dimension reduction method that explicitly considers the manifold structure of the tensor space of temporally associated, co-occurring multimodal media data, and then detect video semantic concepts with powerful classifiers that take tensors as input. Our algorithm preserves the intrinsic structure of the submanifold from which tensor-shots are sampled and is also able to map out-of-sample data points directly. Moreover, we apply an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Experimental results show that our method improves the performance of video semantic concept detection.
A systematic investigation of explicit and implicit schema information on the linked open data cloud
The Semantic Web: Semantics and Big Data, 2013
Abstract

Cited by 4 (1 self)
Schema information on the Linked Open Data cloud can be provided in a twofold way: it can be explicitly defined by attaching RDF types to the resources, or it is provided implicitly via the definition of the resources' properties. In this paper, we present a method and metrics to analyse the information-theoretic properties of, and the correlation between, the two manifestations of schema information. Furthermore, we actually perform such an analysis on large-scale linked data sets. To this end, we have extracted schema information regarding the types and properties defined in the data set segments provided for the Billion Triples Challenge 2012. We have conducted an in-depth analysis and have computed various entropy measures as well as the mutual information encoded in the two types of schema information. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative and that applications involving schema information based on either types or properties alone will only capture between 63.5% and 88.1% of the schema information contained in the data. Based on these observations, we derive conclusions about the design of future schemas for LOD as well as potential application scenarios.
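The entropy and mutual-information measures described here can be illustrated on toy data: treat each resource's explicit type and its implicit property information as two random variables and compare their empirical entropies. The resource labels below are invented for illustration only:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Each resource: (explicit RDF type, one stand-in property) -- toy values.
resources = [("Person", "name"), ("Person", "name"),
             ("Place", "geo"), ("Place", "name")]
types = [t for t, _ in resources]
props = [p for _, p in resources]

h_types, h_props, h_joint = entropy(types), entropy(props), entropy(resources)
mutual_info = h_types + h_props - h_joint      # I(types; properties)
share_in_types = h_types / h_joint             # schema info captured by types alone
```

Ratios like `share_in_types` are the kind of quantity behind the 63.5%–88.1% finding: how much of the joint schema information one manifestation alone retains.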
Role Discovery in Networks
2014
Abstract

Cited by 4 (2 self)
Roles represent node-level connectivity patterns such as star-center or star-edge nodes, near-cliques, or nodes that act as bridges to different regions of the graph. Intuitively, two nodes belong to the same role if they are structurally similar. Roles have been mainly of interest to sociologists, but more recently they have become increasingly useful in other domains. Traditionally, the notion of roles was defined based on graph equivalences such as structural, regular, and stochastic equivalences. We briefly revisit these notions and instead propose a more general formulation of roles based on the similarity of a feature representation (in contrast to the graph representation). This leads us to propose a taxonomy of two general classes of techniques for discovering roles: (i) graph-based roles and (ii) feature-based roles. This survey focuses primarily on feature-based roles. In particular, we also introduce a flexible framework for discovering roles using the notion of structural similarity on a feature-based representation. The framework consists of two fundamental components: (1) role feature construction and (2) role assignment using the learned feature representation. We discuss the relevant decisions for discovering feature-based roles and highlight the advantages and disadvantages of the many techniques that can be used for this purpose. Finally, we discuss potential applications and future directions and challenges.
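A feature-based role assignment of the kind surveyed here can be sketched in two steps that mirror the framework's components: construct structural features per node, then group nodes with matching features. Exact-match grouping stands in for the clustering or factorization a real system would use; all names are my own:

```python
def node_features(adj):
    """Per-node structural features: (degree, number of 2-hop-only neighbours).

    adj: dict mapping node -> set of neighbours (undirected graph).
    """
    feats = {}
    for v, nbrs in adj.items():
        reach = set().union(*(adj[u] for u in nbrs)) if nbrs else set()
        feats[v] = (len(nbrs), len(reach - nbrs - {v}))
    return feats

def assign_roles(feats):
    """Nodes with identical feature vectors share a role (exact-match sketch)."""
    role_ids, role_of = {}, {}
    for v, f in sorted(feats.items()):
        role_of[v] = role_ids.setdefault(f, len(role_ids))
    return role_of

# On a star graph, the center gets one role and every leaf another,
# even though no leaf is adjacent to any other leaf.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
```

Unlike community detection, nodes in the same role need not be connected to each other; the star's leaves share a role purely by structural similarity.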