Motivated by frequently recurring themes in information retrieval and related disciplines, we define a genre of problems called combinatorial feature selection problems. Given a set S of multidimensional objects, the goal is to select a subset K of relevant dimensions (or features) such that some desired property Π holds for the set S restricted to K. Depending on Π, the goal could be to either maximize or minimize the size of the subset K. Several well-studied feature selection problems can be cast in this form. We study the problems in this class derived from several natural and interesting properties Π, including variants of the classical p-center problem as well as problems akin to determining the VC-dimension of a set system. Our main contribution is a theoretical framework for studying combinatorial feature selection, providing (in most cases essentially tight) approximation algorithms and hardness results for several instances of these problems.
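To make the problem statement concrete, the following is a minimal brute-force sketch of one possible instance, not the algorithms the abstract refers to. It assumes the desired property is "all objects remain pairwise distinct after restriction to K" and that the goal is to minimize the size of K. The names `restrict`, `all_distinct`, and `smallest_feature_subset` are hypothetical and chosen only for illustration; the search is exponential in the number of dimensions.

```python
from itertools import combinations


def restrict(obj, dims):
    """Project one object (a tuple of coordinates) onto the dimensions in dims."""
    return tuple(obj[d] for d in dims)


def all_distinct(objects, dims):
    """Assumed example property: the restricted objects are pairwise distinct."""
    projected = [restrict(o, dims) for o in objects]
    return len(set(projected)) == len(projected)


def smallest_feature_subset(objects, holds):
    """Exhaustively search for a minimum-size subset K of dimensions
    such that holds(objects, K) is true; return None if no subset works."""
    n_dims = len(objects[0])
    for size in range(n_dims + 1):
        for dims in combinations(range(n_dims), size):
            if holds(objects, dims):
                return dims
    return None


# Four objects in three dimensions; the third coordinate is constant
# and therefore carries no information for telling the objects apart.
S = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
K = smallest_feature_subset(S, all_distinct)
print(K)
```

Swapping in a different predicate for `holds`, or searching from large subsets downward, would give other members of the same family under the same exhaustive scheme.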