Results 1-10 of 434
Just Relax: Convex Programming Methods for Identifying Sparse Signals in Noise
, 2006
Cited by 496 (2 self)
This paper studies a difficult and fundamental problem that arises throughout electrical engineering, applied mathematics, and statistics. Suppose that one forms a short linear combination of elementary signals drawn from a large, fixed collection. Given an observation of the linear combination that has been contaminated with additive noise, the goal is to identify which elementary signals participated and to approximate their coefficients. Although many algorithms have been proposed, there is little theory which guarantees that these algorithms can accurately and efficiently solve the problem. This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program. This approach is powerful because the optimization can be completed in polynomial time with standard scientific software. The paper provides general conditions which ensure that convex relaxation succeeds. As evidence of the broad impact of these results, the paper describes how convex relaxation can be used for several concrete signal recovery problems. It also describes applications to channel coding, linear regression, and numerical analysis.
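The convex relaxation idea in this abstract can be illustrated with a small sketch. It assumes the common l1-penalized least-squares form, minimize 0.5*||s - Phi b||^2 + gamma*||b||_1, and solves it with iterative soft-thresholding (ISTA). ISTA is a solver picked here for illustration and is not taken from the paper; the names `ista`, `soft_threshold`, `Phi`, `s`, `gamma`, and `step` are hypothetical.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; this is the proximal map of t*|x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def ista(Phi, s, gamma, step, iters=500):
    """Approximately minimize 0.5*||s - Phi b||^2 + gamma*||b||_1.

    Phi is a list of rows (m x n), s has length m.
    Assumption: step <= 1 / (largest eigenvalue of Phi^T Phi).
    """
    m, n = len(Phi), len(Phi[0])
    b = [0.0] * n
    for _ in range(iters):
        # Residual of the current estimate: r = Phi b - s
        r = [sum(Phi[i][j] * b[j] for j in range(n)) - s[i] for i in range(m)]
        # Gradient of the smooth term: Phi^T r
        grad = [sum(Phi[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by soft-thresholding
        b = [soft_threshold(b[j] - step * grad[j], step * gamma) for j in range(n)]
    return b
```

With an identity dictionary the minimizer is simply the soft-thresholded observation, which makes the sketch easy to check by hand: small coefficients are set exactly to zero, and that is how the l1 penalty promotes sparsity.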
Data Clustering: 50 Years Beyond K-Means
, 2008
Cited by 274 (6 self)
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature: to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general-purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large-scale data clustering.
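For readers unfamiliar with the algorithm this abstract centers on, here is a minimal sketch of the standard K-means iteration (often called Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. The function names and the choice to pass initial centers explicitly are assumptions made for illustration, not details from the paper.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iters=100):
    """Run K-means from the given initial centers.

    Returns (centers, labels), where labels[i] is the index of the
    center closest to points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # Keep an empty cluster's center where it was
                new_centers.append(centers[k])
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels
```

The result depends on the initial centers, which is one concrete face of the ill-posedness the abstract mentions: different starts can converge to different local optima.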
A Probabilistic Framework for Semi-Supervised Clustering
, 2004
Cited by 271 (14 self)
Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to the same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints either to modify the objective function or to learn the distance measure. We propose a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that provides a principled framework for incorporating supervision into prototype-based clustering. The model generalizes a previous approach that combines constraints and Euclidean distance learning, and allows the use of a broad range of clustering distortion measures, including Bregman divergences (e.g., Euclidean distance and I-divergence) and directional similarity measures (e.g., cosine similarity). We present an algorithm that performs partitional semi-supervised clustering of data by minimizing an objective function derived from the posterior energy of the HMRF model. Experimental results on several text data sets demonstrate the advantages of the proposed framework.
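Bregman divergences recur in several of the abstracts listed here. As a hedged illustration, the sketch below uses the standard definition d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for a convex function phi, and shows how the two examples named above arise: the squared norm yields squared Euclidean distance, and negative entropy yields the I-divergence. All function names are hypothetical and nothing here is taken from the paper's implementation.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Bregman divergence d_phi(x, y) for a convex phi with gradient grad_phi."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


# phi(v) = ||v||^2 gives squared Euclidean distance
def sq_norm(v):
    return sum(vi * vi for vi in v)


def sq_norm_grad(v):
    return [2 * vi for vi in v]


# phi(v) = sum v_i log v_i (v_i > 0) gives the I-divergence:
#   sum x_i log(x_i / y_i) - x_i + y_i
def neg_entropy(v):
    return sum(vi * math.log(vi) for vi in v)


def neg_entropy_grad(v):
    return [math.log(vi) + 1 for vi in v]
```

For example, `bregman(sq_norm, sq_norm_grad, [1, 2], [3, 5])` evaluates to the squared distance between the two points. Note that a Bregman divergence is in general not symmetric in its arguments.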
Kernel k-means, Spectral Clustering and Normalized Cuts
 KDD '04
, 2004
A Generalized Maximum Entropy Approach to Bregman Co-clustering and Matrix Approximation
 In KDD
, 2004
Cited by 133 (29 self)
Co-clustering is a powerful data mining technique with varied applications such as text clustering, microarray analysis and recommender systems. Recently, an information-theoretic co-clustering approach applicable to empirical joint probability distributions was proposed. In many situations, co-clustering of more general matrices is desired. In this paper, we present a substantially generalized co-clustering framework wherein any Bregman divergence can be used in the objective function, and various conditional expectation based constraints can be considered based on the statistics that need to be preserved. Analysis of the co-clustering problem leads to the minimum Bregman information principle, which generalizes the maximum entropy principle, and yields an elegant meta algorithm that is guaranteed to achieve local optimality. Our methodology yields new algorithms and also encompasses several previously known clustering and co-clustering algorithms based on alternate minimization.
Relational Learning via Collective Matrix Factorization
, 2008
Cited by 127 (4 self)
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations would then encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users × #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a
Top 10 algorithms in data mining
, 2007
Cited by 113 (2 self)
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification,
Generalized nonnegative matrix approximations with Bregman divergences
 In: Neural Information Proc. Systems
, 2005
Cited by 97 (5 self)
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts-based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous applications, the algorithmic development for computing the NNMA factors has been relatively deficient. This paper makes algorithmic progress by modeling and solving (using multiplicative updates) new generalized NNMA problems that minimize Bregman divergences between the input matrix and its low-rank approximation. The multiplicative update formulae in the pioneering work by Lee and Seung [11] arise as a special case of our algorithms. In addition, the paper shows how to use penalty functions for incorporating constraints other than nonnegativity into the problem. Further, some interesting extensions to the use of "link" functions for modeling nonlinear relationships are also discussed.
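To make the phrase "multiplicative updates" concrete, here is a minimal sketch of the Lee and Seung updates for the squared Frobenius error only, which is the special case the abstract refers to. It does not implement the paper's generalized Bregman updates. The helper names, the pure-Python matrix representation (lists of rows), and the small `eps` added to denominators to avoid division by zero are all assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def scale(M, num, den, eps):
    """Elementwise M * num / (den + eps)."""
    return [
        [m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
        for mr, nr, dr in zip(M, num, den)
    ]


def nnma_frobenius(V, W, H, iters=200, eps=1e-12):
    """Approximate nonnegative V by W H using multiplicative updates.

    W and H are nonnegative starting factors; the inputs are not modified.
    """
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for _ in range(iters):
        Wt = transpose(W)
        # H <- H * (W^T V) / (W^T W H)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        # W <- W * (V H^T) / (W H H^T)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for vr, ar in zip(V, WH) for v, a in zip(vr, ar))
```

Because every update multiplies the current entry by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection step.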
Compressed Histogram of Gradients: A Low-Bitrate Descriptor
 INT J COMPUT VIS
, 2011
Cited by 97 (23 self)
Establishing visual correspondences is an essential component of many computer vision problems, which is often done with local feature descriptors. Transmission and storage of these descriptors are of critical importance in the context of mobile visual search applications. We propose a framework for computing low-bitrate feature descriptors with a 20× reduction in bit rate compared to state-of-the-art descriptors. The framework offers low complexity and has significant speedup in the matching stage. We show how to efficiently compute distances between descriptors in the compressed domain, eliminating the need for decoding. We perform a comprehensive performance comparison with SIFT, SURF, BRIEF, MPEG-7 image signatures and other low-bitrate descriptors and show that our proposed CHoG descriptor outperforms existing schemes significantly over a wide range of bitrates. We implement the descriptor in a mobile image retrieval system and, for a database of 1 million CD, DVD and book covers, we achieve 96% retrieval accuracy using only 4 KB of data per query image.
A Unified Framework for Model-based Clustering
 Journal of Machine Learning Research
, 2003
Cited by 74 (7 self)
Model-based clustering techniques have been widely used and have shown promising results in many applications involving complex data. This paper presents a unified framework for probabilistic model-based clustering based on a bipartite graph view of data and models that highlights the commonalities and differences among existing model-based clustering algorithms. In this view, clusters are represented as probabilistic models in a model space that is conceptually separate from the data space. For partitional clustering, the view is conceptually similar to the Expectation-Maximization (EM) algorithm. For hierarchical clustering, the graph-based view helps to visualize critical/important distinctions between similarity-based approaches and model-based approaches.