Results 1–10 of 2,725
The Importance of Convexity in Learning with Squared Loss
 IEEE Transactions on Information Theory
, 1997
"... We show that if the closure of a function class F under the metric induced by some probability distribution is not convex, then the sample complexity for agnostically learning F with squared loss (using only hypotheses in F) is Ω(ln(1/δ)/ε²), where 1 − δ is the probability of success ..."
Cited by 62 (14 self)
The Sample Complexity of Learning Linear Predictors with the Squared Loss
, 2015
"... We provide a tight sample complexity bound for learning bounded-norm linear predictors with respect to the squared loss. Our focus is on an agnostic PAC-style setting, where no assumptions are made on the data distribution beyond boundedness. This contrasts with existing results in the lit ..."
A note on the role of squared loss in regression
, 2005
"... In the context of regression, a particularly important role is played by the conditional first moment of the outputs, the so-called regression function. In order to estimate this target function, one usually minimizes a regularized form of the empirical average of a loss function. We show ..."
Abstract
Cited by 2 (0 self)
We show that if no assumption on the underlying probability distribution is available, the squared loss is the only convex loss function whose target is indeed the regression function. Minimization of penalized loss functionals is quite a common approach in statistical learning. Different loss functions have been proposed
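The fact underlying this abstract, that the expected squared loss E[(y − c)²] over constants c is minimized at c = E[y] (the regression function reduces to the mean when there are no inputs), is easy to check numerically. A minimal illustration using NumPy; the distribution and grid are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=100_000)  # deliberately skewed outputs

# Scan constant predictions c and record the empirical squared risk of each.
cs = np.linspace(0.0, 5.0, 501)
sq_risk = [np.mean((y - c) ** 2) for c in cs]
best_c = cs[int(np.argmin(sq_risk))]

# The minimizer coincides (up to grid resolution) with the sample mean.
print(best_c, y.mean())
```

For a skewed distribution like this one, the minimizer of absolute loss (the median, about 1.39 here) would differ markedly from the mean, which is what makes the squared loss's targeting of the conditional mean a distinguishing property.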
A Stagewise Least Square Loss Function for Classification
"... This paper presents a stagewise least square (SLS) loss function for classification. It uses a least square form within each stage to approximate a bounded monotonic non-convex loss function in a stagewise manner. Several benefits are obtained from using the SLS loss function, such as: (i) higher gen ..."
Cited by 2 (2 self)
A unified bias-variance decomposition for zero-one and squared loss
 In AAAI’00
"... The bias-variance decomposition is a very useful and widely used tool for understanding machine learning algorithms. It was originally developed for squared loss. In recent years, several authors have proposed decompositions for zero-one loss, but each has significant shortcomings. In particular, al ..."
Cited by 57 (0 self)
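The squared-loss decomposition this abstract starts from is, at a fixed target, E[(f̂ − f)²] = (E[f̂] − f)² + Var[f̂] (plus irreducible noise when predicting noisy labels). A small Monte Carlo sketch verifying the identity for a deliberately biased estimator; the shrinkage estimator and all constants are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate a true mean f_true from n noisy samples, using the sample mean
# shrunk toward zero by a factor 0.8 -- a biased but low-variance estimator.
f_true = 3.0
n, trials = 20, 200_000

samples = f_true + rng.normal(0.0, 1.0, size=(trials, n))
f_hat = 0.8 * samples.mean(axis=1)  # one estimate per simulated training set

mse = np.mean((f_hat - f_true) ** 2)          # expected squared error
bias_sq = (f_hat.mean() - f_true) ** 2        # squared bias
var = f_hat.var()                             # variance across training sets

print(mse, bias_sq + var)  # the two quantities agree
```

Here the bias term (≈0.36) dominates the variance term (≈0.03), illustrating the trade-off the decomposition makes visible; the identity itself holds exactly for these empirical moments.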
The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss.
 IEEE/ACM Trans. Networking,
, 1997
"... This paper examines the performance of TCP/IP, the Internet data transport protocol, over wide-area networks (WANs) in which data traffic could coexist with real-time traffic such as voice and video. Specifically, we attempt to develop a basic understanding, using analysis and simulation, ..."
Abstract
Cited by 465 (6 self)
The following key results are obtained. First, random loss leads to significant throughput deterioration when the product of the loss probability and the square of the bandwidth-delay product is larger than one. Second, for multiple connections sharing a bottleneck link, TCP is grossly unfair toward connections
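The first result above lends itself to a back-of-envelope check. The function below encodes the stated condition, loss probability × (bandwidth-delay product)² > 1, with the BDP measured in packets; the path parameters in the example are illustrative assumptions, not figures from the paper:

```python
def random_loss_limited(loss_prob: float, bdp_packets: float) -> bool:
    """Condition from the abstract: throughput deteriorates significantly
    once loss_prob * bdp_packets**2 exceeds 1 (BDP counted in packets)."""
    return loss_prob * bdp_packets ** 2 > 1.0

# A 100 Mb/s path with 100 ms RTT and 1500-byte packets has a BDP of
# roughly 100e6 * 0.1 / (1500 * 8) ~= 833 packets, so even a random loss
# rate of 1e-5 pushes the product well past 1.
bdp = 100e6 * 0.1 / (1500 * 8)
print(random_loss_limited(1e-5, bdp))   # loss-limited
print(random_loss_limited(1e-7, bdp))   # not loss-limited
```

Because the BDP enters squared, high bandwidth-delay-product paths are disproportionately sensitive to random loss, which is the deterioration the paper analyzes.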
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additi ..."
Abstract
Cited by 1000 (13 self)
for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual
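The least-squares case of this gradient-boosting paradigm is especially simple: the negative gradient of the squared loss is just the residual y − F(x), so each stage fits a base learner to the current residuals. A minimal sketch; the regression-stump base learner, learning rate, and number of stages are illustrative assumptions, not the paper's specific algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stump(x, r):
    """Least-squares regression stump: best single threshold split on x
    predicting residuals r, with the mean of each side as leaf value."""
    best = None
    for s in np.unique(x)[:-1]:  # splitting at the max leaves one side empty
        left, right = r[x <= s], r[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda q: np.where(q <= s, lv, rv)

# Toy 1-D regression problem.
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(0.0, 0.1, 200)

F = np.zeros_like(y)   # start from the zero function
lr = 0.1               # shrinkage (learning rate)
for _ in range(200):
    h = fit_stump(x, y - F)   # fit the negative gradient = residuals
    F += lr * h(x)            # take a small step in function space

print(np.var(y), np.mean((y - F) ** 2))  # training MSE falls well below Var(y)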
Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation
"... The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that is ‘sufficient’ for predicting output values. In this paper, we propose a novel sufficient dimension reduction method using a squared-loss variant of mutual information as ..."
Abstract
Cited by 35 (30 self)
as a dependency measure. We utilize an analytic approximator of squared-loss mutual information based on density ratio estimation, which is shown to possess suitable convergence properties. We then develop a natural gradient algorithm for sufficient subspace search. Numerical experiments show