Determining the Number of Factors in Approximate Factor Models
, 2000
Cited by 538 (29 self)
In this paper we develop some statistical theory for factor models of large dimensions. The focus is the determination of the number of factors, which is an unresolved issue in the rapidly growing literature on multifactor models. We propose a panel Cp criterion and show that the number of factors
Learning with local and global consistency
 Advances in Neural Information Processing Systems 16
, 2004
Cited by 666 (21 self)
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic
Reputation and Imperfect Information
 Journal of Economic Theory
, 1982
Cited by 500 (5 self)
of finite games, such as Selten’s finitely repeated chain-store game or in the finitely repeated prisoners’ dilemma. We reexamine Selten’s model, adding to it a “small” amount of imperfect (or incomplete) information about players’ payoffs, and we find that this addition is sufficient to give rise
Unsupervised Models for Named Entity Classification
 In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
, 1999
Cited by 534 (4 self)
This paper discusses the use of unlabeled examples for the problem of named entity classification. A large number of rules is needed for coverage of the domain, suggesting that a fairly large number of labeled examples should be required to train a classifier. However, we show that the use
Greed is Good: Algorithmic Results for Sparse Approximation
, 2004
Cited by 916 (8 self)
This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho’s basis pursuit (BP) paradigm can recover the optimal
A Re-examination of Text Categorization Methods
, 1999
Cited by 832 (24 self)
focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category
The Dantzig Selector: Statistical Estimation When p Is Much Larger Than n
, 2007
Cited by 877 (14 self)
In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Xβ + z, where β ∈ R^p is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n
A theory of communicating sequential processes
, 1984
Cited by 4135 (17 self)
A mathematical model for communicating sequential processes is given, and a number of its interesting and useful properties are stated and proved. The possibilities of nondeterminism are fully taken into account.
The space complexity of approximating the frequency moments
 JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1996
Cited by 855 (12 self)
The frequency moments of a sequence containing m_i elements of type i, for 1 ≤ i ≤ n, are the numbers $F_k = \sum_{i=1}^{n} m_i^k$. We consider the space complexity of randomized algorithms that approximate the numbers F_k, when the elements of the sequence are given one by one and cannot be stored. Surprisingly
