For Most Large Underdetermined Systems of Linear Equations the Minimal ℓ1-norm Solution is also the Sparsest Solution
Comm. Pure Appl. Math., 2004
Cited by 568 (10 self)
We consider linear equations y = Φα where y is a given vector in R^n, Φ is a given n × m matrix with n < m ≤ An, and we wish to solve for α ∈ R^m. We suppose that the columns of Φ are normalized to unit ℓ2 norm and we place uniform measure on such Φ. We prove the existence of ρ = ρ(A) so that for large n, and for all Φ's except a negligible fraction, the following property holds: for every y having a representation y = Φα0 by a coefficient vector α0 ∈ R^m with fewer than ρ · n nonzeros, the solution α1 of the ℓ1 minimization problem min ‖α‖_1 subject to Φα = y is unique and equal to α0. In contrast, heuristic attempts to sparsely solve such systems (greedy algorithms and thresholding) perform poorly in this challenging setting. The techniques include the use of random proportional embeddings and almost-spherical sections in Banach space theory, and deviation bounds for the eigenvalues of random Wishart matrices.
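The recovery property in this abstract can be illustrated on a toy instance. The sketch below is not the paper's method or proof technique. It assumes Φ has full row rank, in which case the ℓ1 minimum is attained at a solution supported on n linearly independent columns, so a brute-force search over column subsets finds it. The matrix, the vector α0 = (0, 0, 5, 0) and the function names are hypothetical, and the search is only practical for very small m.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination; None if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def min_l1_solution(Phi, y):
    """Brute-force min ||alpha||_1 subject to Phi alpha = y (toy sizes only)."""
    n, m = len(Phi), len(Phi[0])
    best = None
    for support in combinations(range(m), n):
        sub = [[Phi[i][j] for j in support] for i in range(n)]
        coeffs = solve_square(sub, y)
        if coeffs is None:
            continue  # these n columns are linearly dependent
        alpha = [Fraction(0)] * m
        for j, c in zip(support, coeffs):
            alpha[j] = c
        if best is None or sum(map(abs, alpha)) < sum(map(abs, best)):
            best = alpha
    return best


# Hypothetical 2 x 4 system whose columns all have unit l2 norm.
Phi = [
    [1, 0, Fraction(3, 5), Fraction(4, 5)],
    [0, 1, Fraction(4, 5), Fraction(-3, 5)],
]
alpha0 = [0, 0, 5, 0]  # sparse generating vector (one nonzero)
y = [sum(Phi[i][j] * alpha0[j] for j in range(4)) for i in range(2)]

alpha_hat = min_l1_solution(Phi, y)
print(alpha_hat == alpha0)
```

Exact rational arithmetic keeps the comparison with α0 free of rounding issues. For realistic sizes the same problem would normally be posed as a linear program rather than enumerated.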
PEGASUS: A policy search method for large MDPs and POMDPs
In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000
Cited by 257 (9 self)
We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method appl...
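One way to read the transformation described above: if all randomness is drawn in advance and passed to the simulator as an input, every transition becomes a deterministic function of state, action and the pre-drawn number, and all policies can be scored on the same fixed draws. The sketch below illustrates that reading only. The chain MDP, its 0.8 success probability, the discount and every name are hypothetical, not taken from the paper.

```python
import random


def step(state, action, u):
    """Deterministic transition: all randomness enters through u in [0, 1)."""
    # Hypothetical chain on {-5, ..., 5}: the move succeeds when u < 0.8,
    # otherwise the agent stays where it is.
    move = action if u < 0.8 else 0
    next_state = max(-5, min(5, state + move))
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward


def make_scenarios(num_scenarios, horizon, seed=0):
    """Pre-draw the random numbers once; each scenario is one noise sequence."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(horizon)] for _ in range(num_scenarios)]


def estimate_value(policy, scenarios, start_state=0, discount=0.95):
    """Average discounted return of a policy over the fixed scenarios."""
    total = 0.0
    for noise in scenarios:
        state, ret, weight = start_state, 0.0, 1.0
        for u in noise:
            state, reward = step(state, policy(state), u)
            ret += weight * reward
            weight *= discount
        total += ret
    return total / len(scenarios)


scenarios = make_scenarios(num_scenarios=50, horizon=30)
policies = {
    "always_right": lambda s: 1,
    "always_left": lambda s: -1,
}
# Every policy is scored on the same scenarios, so the estimate is a
# deterministic function of the policy and a plain search can be run over it.
values = {name: estimate_value(p, scenarios) for name, p in policies.items()}
best = max(values, key=values.get)
print(best, values)
```

Because the scenarios are reused, re-evaluating a policy returns exactly the same number, which is what allows policy search to treat the estimate as an ordinary deterministic objective.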
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
Machine Learning Journal 71:89–129, 2008
Convergence rates of posterior distributions
Ann. Statist., 2000
Cited by 110 (19 self)
We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring.
Projection estimation in multiple regression with application to functional ANOVA models
Ann. Statist., 1998
Cited by 58 (14 self)
A general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed. The theory is applied to the functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables). The least-squares projection is onto an approximating space constructed from arbitrary linear spaces of functions and their tensor products respecting the assumed ANOVA structure of the regression function. The linear spaces that serve as building blocks can be any of the ones commonly used in practice: polynomials, trigonometric polynomials, splines, wavelets and finite elements. The rate of convergence result that is obtained reinforces the intuition that low-order ANOVA modeling can achieve dimension reduction and thus overcome the curse of dimensionality. Moreover, the components of the projection estimate in an appropriately defined ANOVA decomposition provide consistent estimates of the corresponding components of the regression function. When the regression function does not satisfy the assumed ANOVA form, the projection estimate converges to its best approximation of that form.
Convergence rates of posterior distributions for non-i.i.d. observations
Ann. Statist., 2007
Cited by 57 (6 self)
We consider the asymptotic behavior of posterior distributions and Bayes estimators based on observations which are required to be neither independent nor identically distributed. We give general results on the rate of convergence of the posterior measure relative to distances derived from a testing criterion. We then specialize our results to independent, non-identically distributed observations, Markov processes, stationary Gaussian time series and the white noise model. We apply our general results to several examples of infinite-dimensional statistical models including nonparametric regression with normal errors, binary regression, Poisson regression, an interval censoring model, Whittle estimation of the spectral density of a time series and a nonlinear autoregressive model. [...] : θ ∈ Θ) be a sequence of statistical experiments with observations X^(n), where the parameter set Θ is arbitrary and n is an indexing parameter, usually the sample size. We put a prior distribution Π_n on θ ∈ Θ and study the rate of convergence of the posterior
Modulation Estimators and Confidence Sets
Ann. Statist., 1999
Cited by 50 (11 self)
An unknown signal plus white noise is observed at n discrete time points. Within a large convex class of linear estimators of the signal, we choose the estimator b that minimizes estimated quadratic risk. By construction, b is nonlinear. This estimation is done after orthogonal transformation of the data to a reasonable coordinate system. The procedure adaptively tapers the coefficients of the transformed data. If the class of candidate estimators satisfies a uniform entropy condition, then b is asymptotically minimax in Pinsker's sense over certain ellipsoids in the parameter space and shares one such asymptotic minimax property with the James-Stein estimator. We describe computational algorithms for b and construct confidence sets for the unknown signal. These confidence sets are centered at b, have correct asymptotic coverage probability, and have relatively small risk as set-valued estimators of the signal.
Maximal inequalities for degenerate U-processes with applications to optimization estimators
1991
Cited by 39 (2 self)
Maximal inequalities for degenerate U-processes of order k, k ≥ 1, are established. The results rest on a moment inequality (due to Bonami (1970)) for kth-order forms, and extensions of chaining and symmetrization inequalities from the theory of empirical processes. Rates of uniform convergence are obtained. The maximal inequalities can be used to determine the limiting distribution of estimators that optimize criterion functions having U-process structure. As an application, a semiparametric regression estimator that maximizes a U-process of order three is shown to be √n-consistent and asymptotically normally distributed.
Efficient Estimation for the Cox Model with Interval Censoring
Annals of Statistics, 1996
Cited by 34 (5 self)
The maximum likelihood estimator (MLE) for the proportional hazards model with current status data is studied. It is shown that the MLE for the regression parameter is asymptotically normal with √n convergence rate and achieves the information bound, even though the MLE for the baseline cumulative hazard function only converges at n^{1/3} rate. Estimation of the asymptotic variance matrix for the MLE of the regression parameter is also considered. To prove our main results, we also establish a general theorem showing that the MLE of the finite dimensional parameter in a class of semiparametric models is asymptotically efficient even though the MLE of the infinite dimensional parameter converges at a rate slower than √n. The results are illustrated by applying them to a data set from a tumorigenicity study. 1. Introduction. In many survival analysis problems, we are interested in the relationship between a failure time T and a vector of covariates Z. However, it is common that obs...