Results 1–10 of 852,162
The Importance of Convexity in Learning with Squared Loss
 IEEE Transactions on Information Theory
, 1997
"... We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss (using only hypotheses in F) is Ω(ln(1/δ)/ε²), where 1 − δ is the probability of success ..."
Abstract

Cited by 65 (14 self)
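The Ω(ln(1/δ)/ε²) lower bound quoted in the snippet can be illustrated numerically; the constant `c` below is hypothetical (the Ω notation hides it), so only the growth in δ and ε is meaningful:

```python
import math

def sample_complexity_lower_bound(delta, eps, c=1.0):
    """Scaling of the Omega(ln(1/delta)/eps^2) sample-complexity lower bound.

    c stands in for the unspecified constant hidden by the Omega notation;
    only how the value grows as delta and eps shrink is meaningful.
    """
    return c * math.log(1.0 / delta) / eps ** 2

# Halving the accuracy parameter eps quadruples the bound,
# while shrinking delta only grows it logarithmically.
```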
Least Median of Squares Regression
 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
, 1984
"... ..."
Least-Squares Policy Iteration
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... We propose a new approach to reinforcement learning for control problems which combines value-function approximation with linear architectures and approximate policy iteration. This new approach ..."
Abstract

Cited by 461 (12 self)
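The value-estimation step that least-squares policy iteration repeats can be sketched as an LSTD solve, A w = b with A = Σ φ(φ − γφ')ᵀ and b = Σ φ r; the two-state chain and one-hot features below are illustrative choices, not from the paper:

```python
def lstd_weights(features, next_features, rewards, gamma=0.9):
    """Least-squares temporal-difference solve for 2-D features:
    w solves A w = b with A = sum phi (phi - gamma*phi')^T, b = sum phi*r."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for phi, phi2, r in zip(features, next_features, rewards):
        diff = [p - gamma * q for p, q in zip(phi, phi2)]
        for i in range(2):
            for j in range(2):
                A[i][j] += phi[i] * diff[j]
            b[i] += phi[i] * r
    # 2x2 solve by Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    w0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    w1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [w0, w1]

# Toy chain: state 0 -> state 1 (reward 0), state 1 -> state 1 (reward 1),
# one-hot features, so the weights are the state values V(0)=9, V(1)=10.
features = [[1.0, 0.0], [0.0, 1.0]]
next_features = [[0.0, 1.0], [0.0, 1.0]]
rewards = [0.0, 1.0]
w = lstd_weights(features, next_features, rewards)
```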
A note on the role of squared loss in regression
, 2005
"... Abstract. In the context of regression, a particularly important role is played by the conditional first moment of the outputs, the so-called regression function. In order to estimate this target function, one usually minimizes a regularized form of the empirical average of a loss function. We show ..."
Abstract

Cited by 2 (0 self)
that if no assumption on the underlying probability distribution is available, the squared loss is the only convex loss function whose target is indeed the regression function. Minimization of penalized loss functionals is quite a common approach in statistical learning. Different loss functions have been proposed
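The claim that squared loss targets the regression function (the conditional mean) can be checked numerically: among constant predictions, the sample mean minimizes average squared loss. A minimal sketch with made-up data:

```python
def avg_squared_loss(t, ys):
    """Average squared loss of the constant prediction t on outputs ys."""
    return sum((y - t) ** 2 for y in ys) / len(ys)

ys = [1.0, 2.0, 6.0]          # illustrative outputs
mean = sum(ys) / len(ys)      # 3.0, the empirical "regression function"

# Scan a grid of constant predictions around the mean;
# the mean attains the smallest average squared loss.
candidates = [mean + k * 0.5 for k in range(-6, 7)]
best = min(candidates, key=lambda t: avg_squared_loss(t, ys))
```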
A Stagewise Least Square Loss Function for Classification
"... This paper presents a stagewise least square (SLS) loss function for classification. It uses a least square form within each stage to approximate a bounded monotonic non-convex loss function in a stagewise manner. Several benefits are obtained from using the SLS loss function, such as: (i) higher gen ..."
Abstract

Cited by 2 (2 self)
A unified bias-variance decomposition for zero-one and squared loss
 In AAAI’00
"... The bias-variance decomposition is a very useful and widely used tool for understanding machine-learning algorithms. It was originally developed for squared loss. In recent years, several authors have proposed decompositions for zero-one loss, but each has significant shortcomings. In particular, al ..."
Abstract

Cited by 56 (0 self)
The performance of TCP/IP for networks with high bandwidth-delay products and random loss
, 1997
"... This paper examines the performance of TCP/IP, the Internet data transport protocol, over Wide Area Networks (WANs) in which data traffic could coexist with real-time traffic such as voice and video. Specifically, we attempt to develop a basic understanding, using analysis and simulation, of the pro ..."
Abstract

Cited by 464 (6 self)
of interest. The following key results are obtained. First, random loss leads to significant throughput deterioration when the product of the loss probability and the square of the bandwidth-delay product is larger than one. Unless network resources are specifically reserved for data traffic, data traffic
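The threshold stated in the snippet, loss probability × (bandwidth-delay product)² > 1, can be wrapped in a small check; the default MSS is an assumption used only to convert the bandwidth-delay product from bytes to segments:

```python
def random_loss_degrades(loss_prob, bandwidth_bps, rtt_s, mss_bytes=1460):
    """True when the paper's rule of thumb predicts significant TCP
    throughput deterioration: loss_prob * (BDP in segments)^2 > 1.

    mss_bytes (maximum segment size) is an assumed value, used only to
    express the bandwidth-delay product in segments.
    """
    bdp_segments = bandwidth_bps * rtt_s / (8 * mss_bytes)
    return loss_prob * bdp_segments ** 2 > 1

# Example: a 100 Mbit/s path with 100 ms RTT has a BDP of roughly
# 856 segments, so even a 1e-5 random loss rate crosses the threshold.
```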
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual
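The least-squares case mentioned in the abstract can be sketched as stagewise fitting of stumps to residuals, which are the negative gradient of squared loss; this toy implementation and dataset are illustrative, not Friedman's actual algorithm:

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (least squares)."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def gradient_boost(xs, ys, rounds=50, lr=0.1):
    """Least-squares gradient boosting: each stump fits the current
    residuals (the negative gradient of squared loss), and the model
    is a shrunken sum of stumps added to the mean prediction."""
    f0 = sum(ys) / len(ys)
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + lr * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(h(x) for h in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # toy 1-D regression data
ys = [0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(xs, ys)
```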