Results 1-10 of 115
A Clustering Algorithm based on Graph Connectivity
 Information Processing Letters
, 1999
Cited by 138 (3 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques.
Evaluating Structural Similarity in XML Documents
, 2002
Cited by 105 (0 self)
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper. Given two
Hierarchic social entropy: An information theoretic measure of robot group diversity
 Autonomous Robots
, 2000
Cited by 73 (1 self)
Abstract. As research expands in multiagent intelligent systems, investigators need new tools for evaluating the artificial societies they study. It is impossible, for example, to correlate heterogeneity with performance in multiagent robotics without a quantitative metric of diversity. Currently diversity is evaluated on a bipolar scale with systems classified as either heterogeneous or homogeneous, depending on whether any of the agents differ. Unfortunately, this labeling doesn’t tell us much about the extent of diversity in heterogeneous teams. How can it be determined if one system is more or less diverse than another? Heterogeneity must be evaluated on a continuous scale to enable substantive comparisons between systems. To enable these types of comparisons, we introduce: (1) a continuous measure of robot behavioral difference, and (2) hierarchic social entropy, an application of Shannon’s information entropy metric to robotic groups that provides a continuous, quantitative measure of robot team diversity. The metric captures important components of the meaning of diversity, including the number and size of behavioral groups in a society and the extent to which agents differ. The utility of the metrics is demonstrated in the experimental evaluation of multirobot soccer and multirobot foraging teams.
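The entropy idea in this abstract can be illustrated with a minimal sketch. It computes plain Shannon entropy over the proportions of agents in each behavioral group, assuming the grouping is already given as one label per robot. The function name `social_entropy` and the label encoding are illustrative assumptions; the hierarchic version, which also accounts for how far agents differ, is not reproduced here.

```python
import math
from collections import Counter

def social_entropy(group_labels):
    """Shannon entropy (in bits) of the distribution of agents over groups.

    group_labels: one hashable label per agent (hypothetical encoding).
    A homogeneous team scores 0; more and more evenly sized groups score higher.
    """
    total = len(group_labels)
    counts = Counter(group_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two equally sized behavioral groups of two robots each.
print(social_entropy(["forager", "forager", "guard", "guard"]))
```

Unlike a heterogeneous/homogeneous label, the value grows as the team splits into more groups or as group sizes become more even.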
Data clustering using a model granular magnet
 Neural Computation
, 1997
Cited by 72 (4 self)
We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures, it is completely ordered; all spins are aligned. At very high temperatures, the system does not exhibit any ordering, and in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.
On the similarity of dendrograms
 Journal of Theoretical Biology
, 1978
Cited by 54 (1 self)
A metric on binary trees is defined to give the similarity of two dendrograms. One of the major desirable properties of the proposed tree similarity measure is to clarify the decision ordering nature of biological trees. This metric is applied to evolutionary tree reconstructions and comparative embryogenesis. The mathematical properties of this metric are discussed, and an algorithm is proposed to compute the metric. "... our essential task lies in the comparison of related forms rather than in the precise definition of each; and the deformation of a complicated figure may be a phenomenon easy of comprehension, though the figure itself have to be left unanalysed ..." D'Arcy Thompson, 1917
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 52 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set.
1 Introduction
Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
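A rough sketch of clustering by highly connected subgraphs follows. It rests on assumptions that the abstract does not state: a subgraph counts as "highly connected" when its minimum edge cut exceeds half its vertex count, and a graph that fails the test is split along a minimum cut and each side is processed recursively. The brute-force `min_cut` helper is exponential and only suitable for tiny graphs; it stands in for the polynomial algorithm the abstract mentions and is not the authors' implementation.

```python
from itertools import combinations

def min_cut(nodes, edges):
    """Brute-force minimum edge cut; returns (cut_size, side_a, side_b).

    Illustrative only: enumerates every bipartition, so it is exponential.
    """
    nodes = sorted(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    # Pin `first` to side A so each bipartition is visited once;
    # k < len(rest) keeps side B non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side_a = {first, *extra}
            size = sum(1 for u, v in edges if (u in side_a) != (v in side_a))
            if best is None or size < best[0]:
                best = (size, side_a, set(nodes) - side_a)
    return best

def hcs(nodes, edges):
    """Recursively split a graph until every part is highly connected.

    Assumed criterion: min cut size > (number of vertices) / 2.
    Single vertices are returned as their own parts.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return [nodes] if nodes else []
    # Keep only edges inside the current vertex set.
    edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    size, side_a, side_b = min_cut(nodes, edges)
    if size > len(nodes) / 2:
        return [nodes]
    return hcs(side_a, edges) + hcs(side_b, edges)

# Two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print([sorted(c) for c in hcs(range(1, 7), edges)])  # [[1, 2, 3], [4, 5, 6]]
```

In the example the bridge edge is the unique minimum cut, so the graph splits there, and each triangle then passes the connectivity test (cut size 2 against a threshold of 1.5).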
A Partition Model of Granular Computing
 LNCS Transactions on Rough Sets
, 2004
Cited by 45 (14 self)
There are two objectives of this chapter. One objective is to examine the basic principles and issues of granular computing. We focus on the tasks of granulation and computing with granules. From semantic and algorithmic perspectives, we study the construction, interpretation, and representation of granules, as well as principles and operations of computing and reasoning with granules. The other objective is to study a partition model of granular computing in a set-theoretic setting. The model is based on the assumption that a finite set of universe is granulated through a family of pairwise disjoint subsets. A hierarchy of granulations is modeled by the notion of the partition lattice.
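The partition-lattice idea can be made concrete with a small sketch. It represents a granulation as a list of pairwise disjoint blocks, checks the refinement order between two partitions, and computes their meet (the coarsest common refinement). The helper names `is_partition`, `refines`, and `meet` are illustrative assumptions and do not come from the chapter.

```python
def is_partition(universe, blocks):
    """True if the blocks are non-empty, pairwise disjoint, and cover the universe."""
    universe = set(universe)
    blocks = [set(b) for b in blocks]
    covered = set().union(*blocks)
    # Equal total size plus equal union implies the blocks do not overlap.
    return all(blocks) and covered == universe and sum(map(len, blocks)) == len(universe)

def refines(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(b) <= set(c) for c in coarse) for b in fine)

def meet(p, q):
    """Coarsest common refinement: all non-empty pairwise block intersections."""
    return [set(a) & set(b) for a in p for b in q if set(a) & set(b)]

universe = {1, 2, 3, 4}
p = [{1, 2}, {3, 4}]
q = [{1, 3}, {2, 4}]
m = meet(p, q)
print(m, refines(m, p), refines(p, m))
```

Moving down the refinement order gives finer granules; the meet of two granulations is the coarsest one that is finer than both.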
A Unified Framework for Expressing Software Subsystem Classification Techniques
, 1996
Cited by 44 (0 self)
The architecture of a software system classifies its components into subsystems and describes the relationships between the subsystems. The information contained in such an abstraction is of immense significance in various software maintenance activities. There is considerable interest in extracting the architecture of a software system from its source code, and hence in techniques that classify the components of a program into subsystems. Techniques for classifying subsystems presented in the literature differ in the type of components they place in a subsystem and the information they use to identify related components. However, these techniques have been presented using different terminology and symbols, making it harder to perform comparative analyses. This paper presents a unified framework for expressing techniques of classifying subsystems of a software system. The framework consists of a consistent set of terminology, notation, and symbols that may be used to describe the input, output, and processing performed by these techniques. Using this framework several subsystem classification techniques have been reformulated. This reformulation makes it easier to compare these techniques, a first step towards evaluating their relative effectiveness.
Entropy-Based Criterion in Categorical Clustering
 Proc. of Intl. Conf. on Machine Learning (ICML)
, 2004
Cited by 35 (4 self)
Entropy-type measures for the heterogeneity of clusters have been used for a long time. This paper studies the entropy-based criterion in clustering categorical data. It first shows that the entropy-based criterion can be derived in the formal framework of probabilistic clustering models and establishes the connection between the criterion and the approach based on dissimilarity coefficients.
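One common way to write an entropy criterion for categorical clusters is sketched below, under stated assumptions. Each cluster's heterogeneity is taken to be the sum of per-attribute Shannon entropies, and a clustering is scored by the size-weighted average over clusters, with lower values meaning more homogeneous clusters. The abstract does not give the paper's exact formula, so this form and the names `cluster_entropy` and `expected_entropy` are illustrative only.

```python
import math
from collections import Counter

def cluster_entropy(rows):
    """Sum over attributes of the entropy (bits) of that attribute's values.

    rows: non-empty list of equal-length tuples of categorical values.
    """
    n = len(rows)
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total

def expected_entropy(clusters):
    """Size-weighted average of cluster entropies (assumed criterion form)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters if c)

pure = [[("a", "x"), ("a", "x")], [("b", "y"), ("b", "y")]]
mixed = [[("a", "x"), ("b", "y")], [("a", "x"), ("b", "y")]]
print(expected_entropy(pure), expected_entropy(mixed))
```

The two example clusterings group the same four records; the one that keeps identical records together scores lower.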