How to compare different loss functions and their risks
, 2006
"... Many learning problems are described by a risk functional which in turn is defined by a loss function, and a straightforward and widely known approach to learn such problems is to minimize a (modified) empirical version of this risk functional. However, in many cases this approach suffers from subst ..."
described. However, beyond the classification problem little is known on good surrogate loss functions up to now. In this work we establish a general theory that provides powerful tools for comparing excess risks of different loss functions. We then apply this theory to several learning problems including
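For readers unfamiliar with the term, the idea of a surrogate loss in binary classification can be illustrated with a minimal sketch. The three surrogates below (hinge, base-2 logistic, squared) are standard textbook examples chosen here for illustration, not the losses or the comparison theory of the listed paper; all function names are hypothetical. Each is written as a function of the margin m = y * f(x) with labels y in {-1, +1}.

```python
import math

def zero_one(m):
    """Target classification loss: 1 if the margin is non-positive."""
    return 1.0 if m <= 0 else 0.0

def hinge(m):
    """Hinge loss, as used by support vector machines."""
    return max(0.0, 1.0 - m)

def logistic(m):
    """Base-2 logistic loss log2(1 + exp(-m)), computed without overflow."""
    return (max(-m, 0.0) + math.log1p(math.exp(-abs(m)))) / math.log(2)

def squared(m):
    """Squared (least-squares) loss written in terms of the margin."""
    return (1.0 - m) ** 2

def empirical_risk(loss, scores, labels):
    """Average loss over a sample of real-valued scores and +/-1 labels."""
    return sum(loss(y * s) for s, y in zip(scores, labels)) / len(labels)

# Illustrative data (made up): three scores with their true labels.
scores = [2.0, -0.5, 0.3]
labels = [1, 1, -1]
for loss in (zero_one, hinge, logistic, squared):
    print(loss.__name__, round(empirical_risk(loss, scores, labels), 4))
```

Each of these surrogates is convex in the margin and bounds the 0-1 loss from above, so its empirical risk is never smaller than the empirical classification error on the same sample.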
Flexible Nonparametric Kernel Learning with Different Loss Functions
Abstract. Side information is highly useful in the learning of a nonparametric kernel matrix. However, this often leads to an expensive semidefinite program (SDP). In recent years, a number of dedicated solvers have been proposed. Though much better than off-the-shelf SDP solvers, they still cannot scale to large data sets. In this paper, we propose a novel solver based on the alternating direction method of multipliers (ADMM). The key idea is to use a low-rank decomposition of the kernel matrix Z = XY, with the constraint that X = Y. The resultant optimization problem, though nonconvex, has favorable convergence properties and can be efficiently solved without requiring eigendecomposition in each iteration. Experimental results on a number of real-world data sets demonstrate that the proposed method is as accurate as directly solving the SDP, but can be one to two orders of magnitude faster.
Bayesian analysis of shape parameter of Lomax distribution using different loss functions IJSM
Are investors reluctant to realize their losses
 Journal of Finance
, 1998
"... I test the disposition effect, the tendency of investors to hold losing investments too long and sell winning investments too soon, by analyzing trading records for 10,000 accounts at a large discount brokerage house. These investors demonstrate a strong preference for realizing winners rather than ..."
motivated selling is most evident in December. THE TENDENCY TO HOLD LOSERS too long and sell winners too soon has been labeled the disposition effect by Shefrin and Statman (1985). For taxable investments the disposition effect predicts that people will behave quite differently than they would if they paid
Employing different loss functions for the classification of images via supervised learning
, 2013
"... Abstract. Supervised learning methods are powerful techniques to learn a function from a given set of labeled data, the so-called training data. In this paper the support vector machines approach is applied to an image classification task. Starting with the corresponding Tikhonov regularization prob ..."
problem, reformulated as a convex optimization problem, we introduce a conjugate dual problem to it and prove that, whenever strong duality holds, the function to be learned can be expressed via the dual optimal solutions. Corresponding dual problems are then derived for different loss functions
Advances in Prospect Theory: Cumulative Representation of Uncertainty
 JOURNAL OF RISK AND UNCERTAINTY, 5:297-323 (1992)
, 1992
"... We develop a new version of prospect theory that employs cumulative rather than separable decision weights and extends the theory in several respects. This version, called cumulative prospect theory, applies to uncertain as well as to risky prospects with any number of outcomes, and it allows differ ..."
different weighting functions for gains and for losses. Two principles, diminishing sensitivity and loss aversion, are invoked to explain the characteristic curvature of the value function and the weighting functions. A review of the experimental evidence and the results of a new experiment confirm a
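The two principles named in this abstract can be made concrete with a short sketch. The power-form value function and the one-parameter weighting function below, together with the parameter values (0.88, 2.25, 0.61, 0.69), are the ones usually quoted from this paper; they do not appear in the snippet above, so treat them as assumptions. Function and parameter names are hypothetical.

```python
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value of a gain or loss x relative to the reference point.

    Concave for gains and convex for losses (diminishing sensitivity);
    losses are scaled by lam > 1 (loss aversion). Defaults are assumed.
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)

def weight(p, gamma=0.61):
    """Probability weighting function on [0, 1].

    gamma=0.61 is the value usually quoted for gains and 0.69 for losses;
    both are assumptions here, not taken from the snippet.
    """
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

print(value(100.0), value(-100.0))   # a loss looms larger than an equal gain
print(weight(0.01), weight(0.99))    # small p overweighted, large p underweighted
```

With equal exponents for gains and losses, a loss is valued at exactly `lam` times the magnitude of the equal-sized gain, which is one simple way to read "loss aversion" in this parametrization.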
Functional discovery via a compendium of expression profiles. Cell 102:109
, 2000
"... have been devised to survey gene functions en masse either computationally (Marcotte et al., 1999) or experimentally; among these, highly parallel assays of ..."
Experimental Estimates of Education Production Functions
 Princeton University, Industrial Relations Section Working Paper No. 379
, 1997
"... This paper analyzes data on 11,600 students and their teachers who were randomly assigned to different size classes from kindergarten through third grade. Statistical methods are used to adjust for nonrandom attrition and transitions between classes. The main conclusions are (1) on average, performa ..."
Greedy Function Approximation: A Gradient Boosting Machine
 Annals of Statistics
, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additi ..."
for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual
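The connection between additive expansions and steepest descent can be sketched in a few lines: at each stage a base learner is fitted to the negative gradient of the loss at the current predictions, and a shrunken copy is added to the model. The sketch below is a simplified illustration, not the paper's algorithms. It assumes one-dimensional inputs, a regression stump as base learner, a fixed step size `rate` in place of a line search, and a fixed Huber threshold `delta`; all names are hypothetical.

```python
def sign(r):
    return (r > 0) - (r < 0)

def pseudo_residual(y, f, loss="squared", delta=1.0):
    """Negative gradient of the loss with respect to the prediction f."""
    r = y - f
    if loss == "squared":        # L = (y - f)^2 / 2
        return r
    if loss == "absolute":       # L = |y - f|
        return sign(r)
    if loss == "huber":          # quadratic for |r| <= delta, linear beyond
        return r if abs(r) <= delta else delta * sign(r)
    raise ValueError("unknown loss: " + loss)

def fit_stump(x, r):
    """Least-squares regression stump: returns (threshold, left, right)."""
    best = None
    for t in sorted(set(x))[:-1]:            # keep both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, loss="squared", rounds=50, rate=0.1, delta=1.0):
    """Stagewise additive fit; returns (initial constant, stumps, fitted values)."""
    f0 = sum(y) / len(y)                      # constant initial model (assumed)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        r = [pseudo_residual(yi, fi, loss, delta) for yi, fi in zip(y, pred)]
        t, lm, rm = fit_stump(x, r)
        stumps.append((t, lm, rm))
        pred = [fi + rate * (lm if xi <= t else rm) for xi, fi in zip(x, pred)]
    return f0, stumps, pred

f0, stumps, pred = boost([0, 1, 2, 3], [0.0, 0.0, 1.0, 1.0])
print([round(p, 3) for p in pred])
```

Changing the `loss` argument changes only the pseudo-residuals, which is what makes the scheme applicable to different fitting criteria. The input must contain at least two distinct `x` values for a stump to exist.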
The Plenoptic Function and the Elements of Early Vision
 Computational Models of Visual Processing
, 1991
experiment. Electrophysiologists have described neurons in striate cortex that are selectively sensitive to certain visual properties; for reviews, see Hubel (1988) and DeValois and DeValois (1988). Psychophysicists have inferred the existence of channels that are tuned for certain visual properties; for reviews, see Graham (1989), Olzak and Thomas (1986), Pokorny and Smith (1986), and Watson (1986). Researchers in perception have found aspects of visual stimuli that are processed preattentively (Beck, 1966; Bergen & Julesz, 1983; Julesz & Bergen, 1983; Treisman, 1986; Treisman & Gelade, 1980). And in computational
[Fig. 1.1: A generic diagram for visual processing. In this approach, early vision consists of a set of parallel pathways, each analyzing some particular aspect of the visual stimulus. Diagram labels: Retina; Retinal processing; Early vision; Motion; Color; Binocular disparity; Orientation; Etc.; More processing; Still more processing; Higher-level vision; Memory.]
