Extracting relevant structures with side information (2002)

by G Chechik, N Tishby
Venue: In NIPS

Results 1 - 10 of 51

Learning Distance Functions Using Equivalence Relations

by Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall - In Proceedings of the Twentieth International Conference on Machine Learning, 2003
"... We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and e#cient algorithm for learning a full ranked Mahalanobis metric (Shental et al., 2002). ..."
Abstract - Cited by 173 (6 self) - Add to MetaCart
We address the problem of learning distance metrics using side-information in the form of groups of "similar" points. We propose to use the RCA algorithm, which is a simple and e#cient algorithm for learning a full ranked Mahalanobis metric (Shental et al., 2002).
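
The RCA step named in the abstract is compact enough to sketch. Below is a minimal NumPy illustration of estimating a Mahalanobis metric from "chunklets" of mutually similar points, following the description in Shental et al. (2002); the function and variable names are ours, and the sketch omits the paper's optional dimensionality-reduction preprocessing.

```python
import numpy as np

def rca_metric(chunklets):
    """chunklets: list of (n_i, d) arrays of points known to be similar.
    Returns the inverse of the average within-chunklet covariance, used
    as a Mahalanobis metric M in d(x, y) = (x - y)^T M (x - y)."""
    d = chunklets[0].shape[1]
    scatter = np.zeros((d, d))
    n_total = 0
    for ch in chunklets:
        centered = ch - ch.mean(axis=0)   # center each chunklet on its own mean
        scatter += centered.T @ centered  # accumulate within-chunklet scatter
        n_total += ch.shape[0]
    cov = scatter / n_total               # average within-chunklet covariance
    return np.linalg.inv(cov)             # full-rank Mahalanobis metric

# Toy usage: within-chunklet variability lies mostly along the first axis,
# so the learned metric down-weights exactly that direction.
rng = np.random.default_rng(0)
chunklets = [rng.normal([0, 0], [1.0, 0.1], size=(30, 2)),
             rng.normal([5, 5], [1.0, 0.1], size=(30, 2))]
M = rca_metric(chunklets)
```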

MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification

by Jun Zhu, Amr Ahmed, Eric P. Xing
"... Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variab ..."
Abstract - Cited by 93 (27 self) - Add to MetaCart
Supervised topic models utilize document’s side information for discovering predictive low dimensional representations of documents; and existing models apply likelihoodbased estimation. In this paper, we present a max-margin supervised topic model for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction. We develop efficient variational methods for posterior inference and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets. 1.
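
As a hedged sketch of what max-margin training of a topic model amounts to here: the classification variant regularizes the topic model's variational objective with hinge-loss margin constraints on each document's expected topic representation. The notation below is ours and shows a simplified binary case, not the paper's exact multi-class formulation.

```latex
% L(q): variational bound of the underlying topic model;
% E_q[z_d]: expected topic proportions of document d;
% eta: prediction weights; xi_d: slack variables; C: trade-off constant.
\min_{q,\;\eta,\;\xi \ge 0} \; \mathcal{L}(q) \;+\; C \sum_{d=1}^{D} \xi_d
\qquad \text{s.t.} \quad
y_d \, \eta^{\top} \mathbb{E}_q[\bar{\mathbf{z}}_d] \;\ge\; 1 - \xi_d
\quad \forall d
```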

Distributional Word Clusters vs. Words for Text Categorization

by Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, Yoad Winter, Isabelle Guyon, André Elisseeff - Journal of Machine Learning Research, 2003
"... We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representati ..."
Abstract - Cited by 89 (7 self) - Add to MetaCart
We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets.

Citation Context

...g., Slonim and Tishby, 2001; El-Yaniv and Souroujon, 2001; Slonim et al., 2002) and has interesting extensions (Friedman et al., 2001; Chechik and Tishby, 2002). We note also that unlike other variants of distributional clustering (such as the PLSI approach of Hofmann, 2001), the IB method is not based on a generative (mixture) modelling approach (including ...
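
For readers unfamiliar with the Information Bottleneck method that both this paper and Chechik and Tishby (2002) build on: its functional (Tishby, Pereira, and Bialek, 1999) compresses words W into clusters C while preserving mutual information about the category variable Y, and documents are then represented by word-cluster counts before the SVM is applied.

```latex
% beta controls the compression/relevance trade-off: larger beta preserves
% more category information at the cost of less compact word clusters.
\min_{p(c \mid w)} \; \mathcal{L}_{\mathrm{IB}}
  \;=\; I(W; C) \;-\; \beta \, I(C; Y)
```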

Non-Redundant Data Clustering

by David Gondek, Thomas Hofmann, 2004
"... Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to de ..."
Abstract - Cited by 87 (3 self) - Add to MetaCart
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
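
The "conditional mutual information score subject to constraints" in the abstract can be paraphrased roughly as the problem below, where X is the data, C the sought clustering, Y a relevance variable, and Z the known structure to be avoided. This shape is our paraphrase; the paper's coordinated variant adds further coordination terms.

```latex
% Seek a clustering C informative about Y beyond what the known
% structure Z already explains, under a compression constraint on C.
\max_{p(c \mid x)} \; I(C; Y \mid Z)
\qquad \text{s.t.} \quad I(C; X) \le \kappa
```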

Euclidean embedding of co-occurrence data

by Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby - Advances in Neural Information Processing Systems 17, 2005
"... Abstract Embedding algorithms search for low dimensional structure in complexdata, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for em-bedding objects of different types, such as images and text, into a single comm ..."
Abstract - Cited by 65 (1 self) - Add to MetaCart
Abstract Embedding algorithms search for low dimensional structure in complexdata, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for em-bedding objects of different types, such as images and text, into a single common Euclidean space based on their co-occurrence statistics. Thejoint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to con-vex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via ran-dom walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and signifi-cantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1 Introduction Embeddings of objects in a low-dimensional space are an important tool in unsupervisedlearning and in preprocessing data for supervised learning algorithms. They are especially valuable for exploratory data analysis and visualization by providing easily interpretablerepresentations of the relationships among objects. Most current embedding techniques build low dimensional mappings that preserve certain relationships among objects and dif-fer in the relationships they choose to preserve, which range from pairwise distances in multidimensional scaling (MDS) [4] to neighborhood structure in locally linear embedding[12]. All these methods operate on objects of a single type endowed with a measure of similarity or dissimilarity. However, real-world data often involve objects of several very different types without anatural measure of similarity. For example, typical web pages or scientific papers contain

Citation Context

... 7 newsgroup sets. Embedding dimension is 2. 5.2 Information Retrieval To obtain a more quantitative estimate of performance, we applied CODE to the 20 newsgroups corpus, preprocessed as described in [3]. This corpus consists of 20 groups, each with 1000 documents. We first removed the 100 most frequent words, and then selected the next k most frequent words for different values of k (see below). The...
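
The statement in the abstract above that joint distributions are "modeled as exponentials of Euclidean distances" corresponds, in its simplest symmetric form, to the model below for embeddings φ(x) and ψ(y) of the two object types; this is our simplification, and the paper also considers variants with marginal terms. Maximizing the likelihood of the observed co-occurrence counts under this model yields the embedding.

```latex
% Frequently co-occurring pairs are pulled together in the shared space;
% Z normalizes over all pairs of type-1 and type-2 objects.
p(x, y) \;=\; \frac{1}{Z} \exp\!\big( -\lVert \phi(x) - \psi(y) \rVert^{2} \big),
\qquad
Z \;=\; \sum_{x', y'} \exp\!\big( -\lVert \phi(x') - \psi(y') \rVert^{2} \big)
```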

Similarity Scores based on Background Samples

by Lior Wolf, Tal Hassner, Yaniv Taigman
"... Abstract. Evaluating the similarity of images and their descriptors by employing discriminative learners has proven itself to be an effective face recognition paradigm. In this paper we show how “background samples”, that is, examples which do not belong to any of the classes being learned, may prov ..."
Abstract - Cited by 65 (6 self) - Add to MetaCart
Abstract. Evaluating the similarity of images and their descriptors by employing discriminative learners has proven itself to be an effective face recognition paradigm. In this paper we show how “background samples”, that is, examples which do not belong to any of the classes being learned, may provide a significant performance boost to such face recognition systems. In particular, we make the following contributions. First, we define and evaluate the “Two-Shot Similarity ” (TSS) score as an extension to the recently proposed “One-Shot Similarity ” (OSS) measure. Both these measures utilize background samples to facilitate better recognition rates. Second, we examine the ranking of images most similar to a query image and employ these as a descriptor for that image. Finally, we provide results underscoring the importance of proper face alignment in automatic face recognition systems. These contributions in concert allow us to obtain a success rate of 86.83 % on the Labeled Faces in the Wild (LFW) benchmark, outperforming current state-of-the-art results. 1

Citation Context

... some of the previous contributions, e.g., [17–19], require having training samples with the same identity. Other side-information contributions, e.g., [20], assume that the variability in the side information differs from that in the relevant data. 3 The Two-Shot Similarity Score We begin our description of the TSS measure by reviewing the OSS [10, 11]. ...
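
A minimal sketch of the One-Shot Similarity idea that the TSS score extends: score a pair by training a discriminative model to separate one example from the background set, evaluate it on the other example, then symmetrize. The paper uses an efficient LDA-based learner; we substitute scikit-learn's logistic regression to keep the sketch short and runnable, and the names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def one_shot_similarity(x1, x2, background):
    """x1, x2: (d,) descriptors of the probe pair; background: (n, d)
    samples assumed to belong to neither identity."""
    def half_score(positive, probe):
        X = np.vstack([positive[None, :], background])
        y = np.r_[1, np.zeros(len(background))]       # one positive vs. background
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        return clf.decision_function(probe[None, :])[0]
    # Symmetrize: train on x1 and score x2, then the other way around.
    return 0.5 * (half_score(x1, x2) + half_score(x2, x1))
```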

Non-Redundant Multi-View Clustering Via Orthogonalization

by Ying Cui
"... Typical clustering algorithms output a single clustering of the data. However, in real world applications, data can often be interpreted in many different ways; data can have different groupings that are reasonable and interesting from different perspectives. This is especially true for high-dimensi ..."
Abstract - Cited by 39 (5 self) - Add to MetaCart
Typical clustering algorithms output a single clustering of the data. However, in real world applications, data can often be interpreted in many different ways; data can have different groupings that are reasonable and interesting from different perspectives. This is especially true for high-dimensional data, where different feature subspaces may reveal different structures of the data. Why commit to one clustering solution while all these alternative clustering views might be interesting to the user. In this paper, we propose a new clustering paradigm for explorative data analysis: find all non-redundant clustering views of the data, where data points of one cluster can belong to different clusters in other views. We present a framework to solve this problem and suggest two approaches within this framework: (1) orthogonal clustering, and (2) clustering in orthogonal subspaces. In essence, both approaches find alternative ways to partition the data by projecting it to a space that is orthogonal to our current solution. The first approach seeks orthogonality in the cluster space, while the second approach seeks orthogonality in the feature space. We test our framework on both synthetic and high-dimensional benchmark data sets, and the results show that indeed our approaches were able to discover varied solutions that are interesting and meaningful.

Citation Context

...e user. 2 Literature Review In this section, we briefly review the literature related to our research in different aspects. First, our research can be considered as performing non-redundant clustering [13, 5]. In non-redundant clustering, we are typically given a set of data objects together with an existing clustering solution and the goal is to learn an alternative clustering that captures new informati...
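
The first of the two approaches in the abstract (orthogonal clustering) admits a short sketch: after one k-means solution, project each point onto the subspace orthogonal to its own cluster centroid, then re-cluster the residuals to expose a different grouping. This is a simplified illustration under our reading of the abstract; the iteration and stopping details are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def orthogonalize_to_solution(X, labels, centroids):
    """Remove from each point its component along its own cluster centroid."""
    X_new = X.copy()
    for k, mu in enumerate(centroids):
        mask = labels == k
        denom = mu @ mu
        if denom > 0:
            X_new[mask] -= np.outer(X[mask] @ mu, mu) / denom  # project out mu
    return X_new

# First clustering view, then an alternative view on the residuals.
X = np.random.default_rng(1).normal(size=(200, 10))
km1 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X2 = orthogonalize_to_solution(X, km1.labels_, km1.cluster_centers_)
km2 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)  # alternative view
```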

COALA: A novel approach for the extraction of an alternate clustering of high quality and high dissimilarity

by Eric Bae, James Bailey - In ICDM, 2006
"... Cluster analysis has long been a fundamental task in data mining and machine learning. However, traditional clustering methods concentrate on producing a single solution, even though multiple alternative clusterings may exist. It is thus difficult for the user to validate whether the given solution ..."
Abstract - Cited by 38 (4 self) - Add to MetaCart
Cluster analysis has long been a fundamental task in data mining and machine learning. However, traditional clustering methods concentrate on producing a single solution, even though multiple alternative clusterings may exist. It is thus difficult for the user to validate whether the given solution is in fact appropriate, particularly for large and complex datasets. In this paper we explore the critical requirements for systematically finding a new clustering, given that an already known clustering is available and we also propose a novel algorithm, COALA, to discover this new clustering. Our approach is driven by two important factors; dissimilarity and quality. These are especially important for finding a new clustering which is highly informative about the underlying structure of data, but is at the same time distinctively different from the provided clustering. We undertake an experimental analysis and show that our method is able to outperform existing techniques, for both synthetic and real datasets. 1.

Citation Context

... knowledge to guide their clustering process. In constraint clustering [6, 24], knowledge is expressed as 'must-link' and 'cannot-link' constraints to produce more efficient and accurate clusters. In [3, 13], negative information about undesired structures or features is provided to ensure that the clustering process avoids this information and focuses on the clusterings in 'positive' data. However, unlike...

Effective Unconstrained Face Recognition by Combining Multiple Descriptors and Learned Background Statistics

by Lior Wolf, Tal Hassner, Yaniv Taigman
"... Abstract—Computer Vision and Biometrics systems have demonstrated considerable improvement in recognizing and verifying faces in digital images. Still, recognizing faces appearing in unconstrained, natural conditions remains a challenging task. In this paper we present a face-image, pair-matching ap ..."
Abstract - Cited by 37 (2 self) - Add to MetaCart
Abstract—Computer Vision and Biometrics systems have demonstrated considerable improvement in recognizing and verifying faces in digital images. Still, recognizing faces appearing in unconstrained, natural conditions remains a challenging task. In this paper we present a face-image, pair-matching approach primarily developed and tested on the “Labeled Faces in the Wild ” (LFW) benchmark that reflect the challenges of face recognition from unconstrained images. The approach we propose makes the following contributions. (a) We present a family of novel face-image descriptors designed to capture statistics of local patch similarities. (b) We demonstrate how semi-labeled background samples may be used to better evaluate image similarities. To this end we describe a number of novel, effective similarity measures. (c) We show how labeled background samples, when available, may further improve classification performance, by employing a unique pair-matching pipeline. We present state-of-the-art results on the LFW pair-matching benchmarks. In addition, we show our system to be well suited for multi-label face classification (recognition) problems. We perform recognition tests on LFW images as well images from the laboratory controlled multiPIE database.

Citation Context

... literature known to us. In particular, some of the previous contributions, e.g., [36], [42], [43], require having training samples with the same identity. Other side-information contributions, e.g., [44], assume that the variability in the side information differs from that in the relevant data. Also related to our work is the recent method of [6]. They study trait- or identity-based classifier-output...

Finding alternative clusterings using constraints

by Ian Davidson, Zijie Qi - In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM), 2008
"... The aim of data mining is to find novel and actionable insights. However, most algorithms typically just find a single explanation of the data even though alternatives could exist. In this work, we explore a general purpose approach to find an alternative clustering of the data with the aid of mustl ..."
Abstract - Cited by 31 (2 self) - Add to MetaCart
The aim of data mining is to find novel and actionable insights. However, most algorithms typically just find a single explanation of the data even though alternatives could exist. In this work, we explore a general purpose approach to find an alternative clustering of the data with the aid of mustlink and cannot-link constraints. This problem has received little attention in the literature and since our approach can be incorporated into many clustering algorithm that uses a distance function, compares favorably with existing work. 1.