Results 1  10
of
174
Distance metric learning for large margin nearest neighbor classification
 In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract

Cited by 685 (14 self)
 Add to MetaCart
(Show Context)
We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews
, 2003
"... The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixe ..."
Abstract

Cited by 440 (0 self)
 Add to MetaCart
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete webbased tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.
InformationTheoretic CoClustering
 In KDD
, 2003
"... Twodimensional contingency or cooccurrence tables arise frequently in important applications such as text, weblog and marketbasket data analysis. A basic problem in contingency table analysis is coclustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views ..."
Abstract

Cited by 344 (12 self)
 Add to MetaCart
(Show Context)
Twodimensional contingency or cooccurrence tables arise frequently in important applications such as text, weblog and marketbasket data analysis. A basic problem in contingency table analysis is coclustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the coclustering problem as an optimization problem in information theory  the optimal coclustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters.
A search engine for 3d models
 ACM Transactions on Graphics
, 2003
"... As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people find them. Unfortunately, traditional textbased search techniques are not always effective for 3D data. In this paper, we investigate new shapebased search methods. The key challen ..."
Abstract

Cited by 313 (22 self)
 Add to MetaCart
As the number of 3D models available on the Web grows, there is an increasing need for a search engine to help people find them. Unfortunately, traditional textbased search techniques are not always effective for 3D data. In this paper, we investigate new shapebased search methods. The key challenges are to develop query methods simple enough for novice users and matching algorithms robust enough to work for arbitrary polygonal models. We present a webbased search engine system that supports queries based on 3D sketches, 2D sketches, 3D
An interiorpoint method for largescale l1regularized logistic regression
 Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract

Cited by 280 (9 self)
 Add to MetaCart
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warmstart techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
A Minmax Cut Algorithm for Graph Partitioning and Data Clustering
, 2001
"... An important application of graph partitioning is data clustering using a graph model  the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objec ..."
Abstract

Cited by 212 (15 self)
 Add to MetaCart
An important application of graph partitioning is data clustering using a graph model  the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. Here we propose a new algorithm for graph partition with an objective function that follows the minmax clustering principle. The relaxed version of the optimization of the minmax cut objective function leads to the Fiedler vector in spectral graph partition. Theoretical analyses of minmax cut indicate that it leads to balanced partitions, and lower bonds are derived. The minmax cut algorithm is tested on newsgroup datasets and is found to outperform other current popular partitioning/clustering methods. The linkagebased re nements in the algorithm further improve the quality of clustering substantially. We also demonstrate that the linearized search order based on linkage differential is better than that based on the Fiedler vector, providing another effective partition method.
Spectral Relaxation for Kmeans Clustering
, 2001
"... The popular Kmeans clustering partitions a data set by minimizing a sumofsquares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the dat ..."
Abstract

Cited by 194 (27 self)
 Add to MetaCart
(Show Context)
The popular Kmeans clustering partitions a data set by minimizing a sumofsquares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vectors can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a byproduct we also derive a lower bound for the minimum of the sumofsquares cost function.
Document Clustering using Word Clusters via the Information Bottleneck Method
 In ACM SIGIR 2000
, 2000
"... We present a novel implementation of the recently introduced information bottleneck method for unsupervised document clustering. Given a joint empirical distribution of words and documents, p(x; y), we first cluster the words, Y , so that the obtained word clusters, Y_hat , maximally preserve the in ..."
Abstract

Cited by 178 (17 self)
 Add to MetaCart
(Show Context)
We present a novel implementation of the recently introduced information bottleneck method for unsupervised document clustering. Given a joint empirical distribution of words and documents, p(x; y), we first cluster the words, Y , so that the obtained word clusters, Y_hat , maximally preserve the information on the documents. The resulting joint distribution, p(X; Y_hat ), contains most of the original information about the documents, I(X; Y_hat ) ~= I(X;Y ), but it is much less sparse and noisy. Using the same procedure we then cluster the documents, X , so that the information about the wordclusters is preserved. Thus, we first find wordclusters that capture most of the mutual information about the set of documents, and then find document clusters, that preserve the information about the word clusters. We tested this procedure over several document collections based on subsets taken from the standard 20Newsgroups corpus. The results were assessed by calculating the correlation between the document clusters and the correct labels for these documents. Finding from our experiments show that this double clustering procedure, which uses the information bottleneck method, yields significantly superior performance compared to other common document distributional clustering algorithms. Moreover, the double clustering procedure improves all the distributional clustering methods examined here.
Stochastic Neighbor Embedding
 Advances in Neural Information Processing Systems 15
"... We describe a probabilistic approach to the task of placing objects, described by highdimensional vectors or by pairwise dissimilarities, in a lowdimensional space in a way that preserves neighbor identities. A Gaussian is centered on each object in the highdimensional space and the densities ..."
Abstract

Cited by 169 (9 self)
 Add to MetaCart
(Show Context)
We describe a probabilistic approach to the task of placing objects, described by highdimensional vectors or by pairwise dissimilarities, in a lowdimensional space in a way that preserves neighbor identities. A Gaussian is centered on each object in the highdimensional space and the densities under this Gaussian (or the given dissimilarities) are used to define a probability distribution over all the potential neighbors of the object. The aim of the embedding is to approximate this distribution as well as possible when the same operation is performed on the lowdimensional "images" of the objects. A natural cost function is a sum of KullbackLeibler divergences, one per object, which leads to a simple gradient for adjusting the positions of the lowdimensional images.
CentroidBased Document Classification: Analysis & Experimental Results
, 2000
"... In this paper we present a simple lineartime centroidbased document classification algorithm, that despite its simplicity and robust performance, has not been extensively studied and analyzed. Our experiments show that this centroidbased classifier consistently and substantially outperforms o ..."
Abstract

Cited by 138 (1 self)
 Add to MetaCart
(Show Context)
In this paper we present a simple lineartime centroidbased document classification algorithm, that despite its simplicity and robust performance, has not been extensively studied and analyzed. Our experiments show that this centroidbased classifier consistently and substantially outperforms other algorithms such as Naive Bayesian, knearestneighbors, and C4.5, on a wide range of datasets. Our analysis shows that the similarity measure used by the centroidbased scheme allows it to classify a new document based on how closely its behavior matches the behavior of the documents belonging to different classes. This matching allows it to dynamically adjust for classes with different densities and accounts for dependencies between the terms in the different classes.