  y Ravi Kumar

by Moses Charikar, Venkatesan Guruswami, Sridhar Rajagopalan, Amit Sahai
ftp://theory.lcs.mit.edu/pub/people/venkat/feature-sel.ps

Abstract:

Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.
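As a rough illustration of the problem statement in the abstract, the sketch below treats each object as a tuple, a feature subset K as a tuple of dimension indices, and the desired property as a predicate on the restricted set. The property used here (all restricted objects remain pairwise distinct), the minimization objective, and every function name are assumptions chosen for the example, not taken from the paper. The search is a plain brute force over subsets, not one of the paper's approximation algorithms.

```python
from itertools import combinations


def restrict(obj, dims):
    """Project one object (a tuple) onto the chosen dimensions."""
    return tuple(obj[d] for d in dims)


def all_distinct(projected):
    """Hypothetical example property: no two restricted objects coincide."""
    return len(set(projected)) == len(projected)


def smallest_feature_subset(objects, holds):
    """Return a minimum-size tuple of dimensions K such that the property
    `holds` is true for the objects restricted to K, or None if no subset works.

    Exhaustive search, exponential in the number of dimensions;
    meant only to make the problem definition concrete.
    """
    n_dims = len(objects[0])
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            if holds([restrict(o, dims) for o in objects]):
                return dims
    return None


# Four objects over three dimensions; the last dimension carries no information.
S = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
K = smallest_feature_subset(S, all_distinct)
print(K)  # (0, 1)
```

Swapping in a different predicate for `holds`, or searching from the largest subsets downward, gives the maximization variants the abstract mentions.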
