Results 1 - 10 of 706,317
The Anatomy of a Hierarchical Clustering Engine for Web-Page, News and Book Snippets
"... this paper, we investigate the web snippet hierarchical clustering problem in its full extent by devising an algorithmic solution, and a software prototype called SnakeT (accessible at http://roquefort.di.unipi.it/), that: (1) draws the snippets from 16 Web search engines, the Amazon collection of b ..."
Hierarchical Dirichlet processes
Journal of the American Statistical Association, 2004
"... program. The authors wish to acknowledge helpful discussions with Lancelot James and Jim Pitman and the referees for useful comments. We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture comp ..."
Cited by 918 (78 self)
components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a
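The clustering property referred to here can be illustrated with the Chinese restaurant process, the sequential (predictive) form of a single Dirichlet process with concentration parameter alpha. This is only an illustration under stated assumptions: it is not the hierarchical model or the inference scheme of the paper, and the function name `crp_assignments` and its parameters are hypothetical.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Seat n customers with the Chinese restaurant process (CRP).

    Customer i (0-based) joins an existing table k with probability
    counts[k] / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). Tables play the role of mixture components.
    Returns (assignments, counts).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []        # counts[k] = number of customers at table k
    assignments = []   # assignments[i] = table chosen by customer i
    for _ in range(n):
        # Unnormalised weights: existing tables first, then the "new table" option.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # a new component is created
        else:
            counts[k] += 1        # popular components are reused more often
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    labels, sizes = crp_assignments(200, alpha=2.0)
    print(len(sizes), sorted(sizes, reverse=True)[:5])
```

Each table is a component that later observations can share; larger alpha tends to open more tables, while the expected number of tables grows only logarithmically in n.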
A comparison of document clustering techniques
In KDD Workshop on Text Mining, 2000
"... This paper presents the results of an experimental study of some common document clustering techniques: agglomerative hierarchical clustering and K-means. (We used both a “standard” K-means algorithm and a “bisecting” K-means algorithm.) Our results indicate that the bisecting K-means technique is ..."
Cited by 596 (26 self)
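The bisecting variant can be sketched as follows: start with one cluster and repeatedly split a chosen cluster in two with 2-means, keeping the best of several trial splits, until k clusters exist. This is a minimal sketch under assumptions not taken from the paper: points are numeric tuples compared with squared Euclidean distance (document clustering would normally use a cosine-style similarity), the cluster with the largest sum of squared errors is the one split, and all function names are hypothetical.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(dim))


def _sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = _mean(points)
    return sum(_sq_dist(p, c) for p in points)


def _two_means(points, rng, iters=20):
    """One run of Lloyd's 2-means; returns (group0, group1) or None."""
    if len(points) < 2:
        return None
    centers = rng.sample(points, 2)
    groups = None
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            side = 0 if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]) else 1
            groups[side].append(p)
        if not groups[0] or not groups[1]:
            return None                    # degenerate split, e.g. duplicate seeds
        new_centers = [_mean(groups[0]), _mean(groups[1])]
        if new_centers == centers:
            break                          # assignments have stabilised
        centers = new_centers
    return groups


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split clusters in two until k clusters exist (or no split is possible)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumption: split the cluster with the largest SSE; splitting the
        # largest cluster by size is another common rule.
        clusters.sort(key=_sse)
        target = clusters.pop()
        best = None
        for _ in range(trials):
            split = _two_means(target, rng)
            if split is not None:
                score = _sse(split[0]) + _sse(split[1])
                if best is None or score < best[0]:
                    best = (score, split)
        if best is None:
            clusters.append(target)        # nothing left that can be split
            break
        clusters.extend(best[1])
    return clusters
```

Keeping the best of several trial splits makes each bisection less sensitive to the random choice of seeds, which is the main weakness of a single 2-means run.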
ImageNet: A large-scale hierarchical image database
In CVPR, 2009
"... The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce her ..."
Cited by 800 (28 self)
datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We
Clustering by passing messages between data points
Science, 2007
"... Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initi ..."
Cited by 689 (8 self)
so in less than one-hundredth the amount of time. Clustering data based on a measure of similarity is a critical step in scientific data analysis and in engineering systems. A common approach is to use data to learn a set of centers such that the sum of
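The title refers to affinity propagation, in which data points exchange "responsibility" and "availability" messages until a set of exemplars emerges. The sketch follows the standard damped update rules; it is a minimal illustration, not the authors' reference implementation, and it assumes negative squared Euclidean distance as the similarity and the median similarity as the shared preference (a common default). Function names are hypothetical.

```python
import statistics


def similarity_matrix(points):
    """Negative squared Euclidean distance; diagonal = median similarity."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
          for k in range(n)] for i in range(n)]
    # The "preference" s(k, k) controls how many exemplars emerge.
    pref = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    return S


def affinity_propagation(S, damping=0.5, iters=200):
    """Return (exemplar indices, label = exemplar index for each point)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]   # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]   # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max((vals[k] for k in range(n) if k != best), default=float("-inf"))
            for k in range(n):
                competitor = second if k == best else vals[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n   # no exemplar emerged; tune preference or iterations
    # Each exemplar labels itself; other points join their most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels
```

Raising the preference above the median tends to produce more exemplars and lowering it produces fewer; damping slows the message updates to avoid oscillation.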
Estimating the number of clusters in a dataset via the Gap statistic
2000
"... We propose a method (the “Gap statistic”) for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical), comparing the change in within-cluster dispersion to that expected under an appropriate reference ..."
Cited by 492 (1 self)
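The comparison can be sketched by computing log W_k, the pooled within-cluster sum of squares for k clusters, subtracting it from its average over reference data sets, and choosing the smallest k such that Gap(k) ≥ Gap(k+1) - s_{k+1}. In this sketch the clustering algorithm is plain k-means with random restarts and the reference distribution is uniform over the bounding box of the data (one of the reference choices the paper discusses); all names, the number of reference sets, and the restart counts are assumptions.

```python
import math
import random
import statistics

_EPS = 1e-12  # guards log(0) when k equals the number of distinct points


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def within_dispersion(points, k, rng, restarts=10, iters=50):
    """W_k: pooled within-cluster sum of squares from Lloyd's k-means.

    For squared Euclidean distance this equals sum_r D_r / (2 n_r).
    The best of several random restarts is kept.
    """
    best = math.inf
    for _ in range(restarts):
        centers = rng.sample(points, k)
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: _sq_dist(p, centers[c]))
                groups[j].append(p)
            new = [_mean(g) if g else centers[j] for j, g in enumerate(groups)]
            if new == centers:
                break
            centers = new
        w = sum(min(_sq_dist(p, c) for c in centers) for p in points)
        best = min(best, w)
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (chosen k, gaps) using uniform reference data over the bounding box."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[j] for p in points) for j in range(dim)]
    highs = [max(p[j] for p in points) for j in range(dim)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(max(within_dispersion(points, k, rng), _EPS))
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lows[j], highs[j]) for j in range(dim))
                   for _ in points]
            ref_logs.append(math.log(max(within_dispersion(ref, k, rng), _EPS)))
        gaps.append(statistics.fmean(ref_logs) - log_w)
        # s_k = sd_k * sqrt(1 + 1/B) accounts for simulation error in the reference mean.
        s.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_refs))
    # Smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

Any other clustering routine that returns W_k could replace `within_dispersion`, which is the sense in which the method works with the output of any clustering algorithm.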
CATH - a hierarchic classification of protein domain structures
Structure, 1997
"... Background: Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can ..."
Cited by 465 (33 self)
can also be assigned. The ever-increasing number of known protein structures is too large to classify all proteins manually, therefore, automatic methods are needed for fast evaluation of protein structures. Results: We present a semi-automatic procedure for deriving a novel hierarchical
Knowledge-based Analysis of Microarray Gene Expression Data By Using Support Vector Machines
2000
"... We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of ..."
Cited by 511 (8 self)
of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression
Model-Based Analysis of Oligonucleotide Arrays: Model Validation, Design Issues and Standard Error Application
2001
"... Background: A model-based analysis of oligonucleotide expression arrays we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). MBEI has standard error attached to it as a measure of ..."
Cited by 751 (28 self)
better ranking statistic for filtering genes. We can assign reliability indexes for genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. A software dChip implementing many of these analysis methods is made available. Conclusions: The model-based approach
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
In EuroSys, 2007
"... Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication “channels” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Cited by 728 (27 self)
... single computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer
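The vertex-and-channel model can be illustrated with a toy, single-process scheduler that runs each vertex once all of its input channels have delivered data, i.e. in a topological order of the dataflow graph. This is a hypothetical sketch, not Dryad's API: `run_dataflow` and its arguments are invented for illustration, and it ignores what the real engine handles, such as placing vertices on machines, concurrency, channel types (files, TCP pipes, shared-memory FIFOs) and recovery from failures.

```python
from collections import deque


def run_dataflow(vertices, channels, source_inputs):
    """Run an acyclic dataflow graph sequentially (hypothetical sketch).

    vertices:      name -> function taking a list of input values
    channels:      list of (src, dst) edges; a vertex receives its inputs
                   in the order its incoming channels are listed
    source_inputs: name -> initial value for vertices with no inputs
    """
    preds = {v: [] for v in vertices}
    succs = {v: [] for v in vertices}
    for src, dst in channels:
        preds[dst].append(src)
        succs[src].append(dst)

    waiting = {v: len(preds[v]) for v in vertices}   # inputs not yet produced
    ready = deque(v for v in vertices if waiting[v] == 0)
    outputs = {}
    while ready:
        v = ready.popleft()
        args = [outputs[p] for p in preds[v]] if preds[v] else [source_inputs[v]]
        outputs[v] = vertices[v](args)
        for s in succs[v]:
            waiting[s] -= 1
            if waiting[s] == 0:                       # all inputs available
                ready.append(s)

    if len(outputs) != len(vertices):
        raise ValueError("channels form a cycle; the graph must be acyclic")
    return outputs


if __name__ == "__main__":
    graph = {
        "read_a": lambda xs: xs[0].split(),
        "read_b": lambda xs: xs[0].split(),
        "merge": lambda xs: xs[0] + xs[1],
        "count": lambda xs: len(xs[0]),
    }
    edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "count")]
    result = run_dataflow(graph, edges, {"read_a": "a b c", "read_b": "d e"})
    print(result["count"])  # 5
```

Because a vertex only depends on its input channels, independent vertices (here `read_a` and `read_b`) could run in parallel; the queue above simply serialises that choice.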