Results 1–10 of 63
Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models
Cited by 46 (13 self)
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem, we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large-scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with an error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.
Clustered Nyström method for large scale manifold learning and dimension reduction
IEEE Transactions on Neural Networks, 2010
Cited by 24 (6 self)
The kernel (or similarity) matrix plays a key role in many machine learning algorithms such as kernel methods, manifold learning, and dimension reduction. However, the cost of storing and manipulating the complete kernel matrix makes it infeasible for large problems. The Nyström method is a popular sampling-based low-rank approximation scheme for reducing the computational burden of handling large kernel matrices. In this paper, we analyze how the approximation quality of the Nyström method depends on the choice of landmark points, and in particular on the encoding power of the landmark points in summarizing the data. Our (non-probabilistic) error analysis justifies a "clustered Nyström method" that uses the k-means cluster centers as landmark points. Our algorithm can be applied to scale up a wide variety of algorithms that depend on the eigenvalue decomposition of the kernel matrix (or a variant of it), such as kernel principal component analysis, Laplacian eigenmaps, and spectral clustering, as well as those involving the kernel matrix inverse, such as least-squares support vector machines and Gaussian process regression. Extensive experiments demonstrate the competitive performance of our algorithm in both accuracy and efficiency. Index terms: dimension reduction, eigenvalue decomposition, kernel matrix, low-rank approximation, manifold learning.
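The clustered Nyström idea above can be sketched in a few lines of NumPy: pick landmark points with a bare-bones Lloyd's k-means, then approximate the full kernel matrix as K ≈ C W⁺ Cᵀ. This is a minimal sketch under assumptions, not the authors' implementation; the RBF kernel, the landmark count, and all function names are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.2):
    # Gaussian kernel exp(-gamma * ||x - y||^2) between rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kmeans_centers(X, k, iters=25, seed=0):
    # Bare-bones Lloyd's iterations; the resulting centers serve as landmarks.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def nystrom(X, landmarks, gamma=0.2):
    # Low-rank approximation K ~ C W^+ C^T with C = K(X, L), W = K(L, L).
    C = rbf_kernel(X, landmarks, gamma)
    W = rbf_kernel(landmarks, landmarks, gamma)
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
K = rbf_kernel(X, X)
K_hat = nystrom(X, kmeans_centers(X, 30))
rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
```

Swapping the k-means landmarks for uniformly sampled ones recovers the standard Nyström baseline, which is the comparison the paper's error analysis addresses.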
Large Scale Spectral Clustering with Landmark-Based Representation
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
Cited by 23 (1 self)
Spectral clustering is one of the most popular clustering approaches. Despite its good performance, its applicability to large-scale problems is limited by its high computational complexity. Recently, many approaches have been proposed to accelerate spectral clustering. Unfortunately, these methods usually sacrifice a great deal of information from the original data, resulting in degraded performance. In this paper, we propose a novel approach, called Landmark-based Spectral Clustering (LSC), for large-scale clustering problems. Specifically, we select p (≪ n) representative data points as landmarks and represent the original data points as linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach compared to state-of-the-art methods.
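The landmark-based representation can be sketched as follows: code each point over its few nearest landmarks, then obtain the spectral embedding from an SVD of the (normalized) coding matrix. The landmark choice, coding weights, and normalization below are simplified assumptions, not the paper's exact recipe.

```python
import numpy as np

def lsc_embedding(X, p=20, r=3, s=5, seed=0):
    # Landmark-based spectral embedding sketch: Z codes each point over
    # its s nearest landmarks; the left singular vectors of a normalized
    # Z give the embedding, on which k-means would then run.
    rng = np.random.default_rng(seed)
    landmarks = X[rng.choice(len(X), p, replace=False)]
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.zeros((len(X), p))
    for i in range(len(X)):
        nn = np.argsort(d2[i])[:s]
        w = np.exp(-d2[i, nn] / (d2[i, nn].mean() + 1e-12))
        Z[i, nn] = w / w.sum()
    # Column normalization mimics degree normalization of the implied
    # affinity matrix Z Z^T; the SVD of the n-by-p factor avoids ever
    # forming the n-by-n matrix, which is why the method scales linearly.
    Z_hat = Z / np.sqrt(Z.sum(0) + 1e-12)
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    return U[:, :r]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-4, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])
E = lsc_embedding(X)
```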
Time and Space Efficient Spectral Clustering via Column Sampling
Cited by 10 (0 self)
Spectral clustering is an elegant and powerful approach to clustering. However, the underlying eigendecomposition takes cubic time and quadratic space with respect to the data set size. Both can be reduced by the Nyström method, which samples only a subset of columns from the matrix. However, the manipulation and storage of these sampled columns can still be expensive when the data set is large. In this paper, we propose a time- and space-efficient spectral clustering algorithm that can scale to very large data sets. A general procedure to orthogonalize the approximated eigenvectors is also proposed. Extensive spectral clustering experiments on a number of data sets, ranging in size from a few thousand to several million points, demonstrate the accuracy and scalability of the proposed approach. We further apply it to the task of image segmentation. For images with more than 10 million pixels, this algorithm can obtain the eigenvectors in one minute on a single machine.
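On the orthogonalization step mentioned above: eigenvectors recovered from a column-sampled approximation are generally not orthogonal, and a thin QR factorization is a standard way to produce an orthonormal basis spanning the same subspace. The paper proposes its own procedure; QR is shown here only as a generic stand-in.

```python
import numpy as np

def orthogonalize(U):
    # Thin QR: Q has orthonormal columns spanning the same subspace as U,
    # so downstream k-means on the rows sees a proper spectral embedding.
    Q, _ = np.linalg.qr(U)
    return Q

rng = np.random.default_rng(0)
U = rng.standard_normal((1000, 5))  # stand-in for approximate eigenvectors
Q = orthogonalize(U)
```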
Efficient kernel clustering using random Fourier features
In Proceedings of ICDM ’12, 2012
Cited by 9 (1 self)
Kernel clustering algorithms can capture the nonlinear structure inherent in many real-world data sets and thereby achieve better clustering performance than Euclidean-distance-based clustering algorithms. However, their quadratic computational complexity renders them non-scalable to large data sets. In this paper, we employ random Fourier maps, originally proposed for large-scale classification, to accelerate kernel clustering. The key idea behind the use of random Fourier maps for clustering is to project the data into a low-dimensional space where the inner product of the transformed data points approximates the kernel similarity between them. An efficient linear clustering algorithm can then be applied to the points in the transformed space. We also propose an improved scheme that uses the top singular vectors of the transformed data matrix to perform clustering, and yields a better approximation of kernel clustering under appropriate conditions. Our empirical studies demonstrate that the proposed schemes can be efficiently applied to large data sets containing millions of data points, while achieving accuracy similar to that of state-of-the-art kernel clustering algorithms. Keywords: kernel clustering, kernel k-means, random Fourier features, scalability.
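The random Fourier map itself is compact enough to sketch. For the RBF kernel exp(-γ‖x−y‖²), drawing frequencies from a Gaussian with the matching variance makes the inner product of the transformed points an unbiased estimate of the kernel value; a linear method such as k-means can then run directly on the transformed matrix Z. The dimensions and the γ convention below are illustrative assumptions.

```python
import numpy as np

def random_fourier_features(X, D=2000, gamma=1.0, seed=0):
    # z(x) = sqrt(2/D) * cos(W^T x + b), with W ~ N(0, 2*gamma*I) and
    # b ~ Uniform[0, 2*pi], so that z(x) . z(y) ~ exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 3))
Z = random_fourier_features(X)
# Compare the implicit kernel of Z against the exact RBF kernel (gamma = 1).
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
max_err = np.abs(Z @ Z.T - K_exact).max()
```

The improved scheme in the abstract would additionally take the top singular vectors of Z before clustering.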
A very fast method for clustering big text datasets
In ECAI, 2010
Cited by 7 (4 self)
Large-scale text datasets have long eluded a family of particularly elegant and effective clustering methods that exploit the power of pairwise similarities between data points, due to the prohibitive cost, in both time and space, of operating on a similarity matrix, where the state of the art is at best quadratic in time and space. We present an extremely fast and simple method that also uses all pairwise similarities between data points, and show through experiments that it matches previous methods in clustering accuracy while running in linear time and space, without sampling data points or sparsifying the similarity matrix.
From Deformations to Parts: Motion-based Segmentation of 3D Objects
Cited by 6 (3 self)
We develop a method for discovering the parts of an articulated object from aligned meshes of the object in various three-dimensional poses. We adapt the distance-dependent Chinese restaurant process (ddCRP) to allow nonparametric discovery of a potentially unbounded number of parts, while simultaneously guaranteeing a spatially connected segmentation. To allow analysis of datasets in which object instances have varying 3D shapes, we model part variability across poses via affine transformations. By placing a matrix normal-inverse-Wishart prior on these affine transformations, we develop a ddCRP Gibbs sampler that tractably marginalizes over transformation uncertainty. Analyzing a dataset of humans captured in dozens of poses, we infer parts that provide quantitatively better deformation predictions than conventional clustering methods.
Scalable sparse subspace clustering
 In CVPR
Cited by 6 (2 self)
In this paper, we address two problems in the Sparse Subspace Clustering (SSC) algorithm: the scalability issue and the out-of-sample problem. SSC constructs a sparse similarity graph for spectral clustering using ℓ1-minimization-based coefficients, and has achieved state-of-the-art results for image clustering and motion segmentation. However, the time complexity of SSC is cubic in the problem size, making it inefficient to apply in large-scale settings. Moreover, SSC does not handle out-of-sample data that were not used to construct the similarity graph. For each new datum, SSC must recalculate the cluster memberships of the whole data set, which makes it uncompetitive for fast online clustering. To address these problems, this paper proposes an out-of-sample extension of SSC, named Scalable Sparse Subspace Clustering (SSSC), which makes it feasible for SSC to cluster large-scale data sets. SSSC adopts a "sampling, clustering, coding, and classifying" strategy. Extensive experimental results on several popular data sets demonstrate the effectiveness and efficiency of our method compared with state-of-the-art algorithms.
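The "coding, classifying" half of the strategy above can be sketched: code each out-of-sample point over the labeled in-sample points and assign it to the cluster carrying the most coefficient weight. The paper codes with ℓ1-sparse coefficients over the in-sample subset; the dependency-free stand-in below uses Gaussian weights on the q nearest in-sample points instead, and the toy run stands in ground-truth labels for the "sampling, clustering" half.

```python
import numpy as np

def sssc_out_of_sample(X_in, labels_in, X_out, k, q=5):
    # Code each out-of-sample point over its q nearest in-sample points
    # (Gaussian weights as a stand-in for sparse coding), then classify
    # by the cluster with the largest summed coefficient weight.
    d2 = ((X_out[:, None, :] - X_in[None, :, :]) ** 2).sum(-1)
    out = np.empty(len(X_out), dtype=int)
    for i in range(len(X_out)):
        nn = np.argsort(d2[i])[:q]
        w = np.exp(-d2[i, nn] / (d2[i, nn].mean() + 1e-12))
        out[i] = np.bincount(labels_in[nn], weights=w, minlength=k).argmax()
    return out

# Toy run on two well-separated clusters.
rng = np.random.default_rng(0)
X_in = np.vstack([rng.normal(-5, 0.4, (30, 2)), rng.normal(5, 0.4, (30, 2))])
labels_in = np.repeat([0, 1], 30)
X_out = np.vstack([rng.normal(-5, 0.4, (20, 2)), rng.normal(5, 0.4, (20, 2))])
pred = sssc_out_of_sample(X_in, labels_in, X_out, k=2)
```

Because only the in-sample subset ever enters a spectral or ℓ1 solver, the expensive part of SSC stays fixed-size while the classification step is linear in the number of new points.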
Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion
Cited by 6 (2 self)
Many semi-supervised clustering algorithms have been proposed to improve clustering accuracy by effectively exploiting the available side information, which is usually in the form of pairwise constraints. However, the existing semi-supervised clustering algorithms have two main shortcomings. First, they have to deal with non-convex optimization problems, leading to clustering results that are sensitive to the initialization. Second, none of these algorithms is equipped with a theoretical guarantee on clustering performance. We address these limitations by developing a framework for semi-supervised clustering based on input-pattern-assisted matrix completion. The key idea is to cast clustering as a matrix completion problem and solve it efficiently by exploiting the correlation between input patterns and cluster assignments. Our analysis shows that, under appropriate conditions, only O(log n) pairwise constraints are needed to accurately recover the true cluster partition. We verify the effectiveness of the proposed algorithm by comparing it to state-of-the-art semi-supervised clustering algorithms on several benchmark datasets.
Comparing data-dependent and data-independent embeddings for classification and ranking of Internet images
In CVPR, 2011
Cited by 5 (0 self)
This paper presents a comparative evaluation of feature embeddings for classification and ranking in large-scale Internet image datasets. We follow a popular framework for scalable visual learning, in which the data is first transformed by a nonlinear embedding and then an efficient linear classifier is trained in the resulting space. Our study includes data-dependent embeddings inspired by the semi-supervised learning literature, and data-independent ones based on approximating specific kernels (such as the Gaussian kernel for GIST features and the histogram intersection kernel for bags of words). Perhaps surprisingly, we find that data-dependent embeddings, despite being computed from large amounts of unlabeled data, do not have any advantage over data-independent ones in the regime of scarce labeled data. On the other hand, we find that several data-dependent embeddings are competitive with popular data-independent choices for large-scale classification.