Learning from noisy labels with deep neural networks, arXiv
, 2014
We propose several simple approaches to training deep neural networks on data with noisy labels. We introduce an extra noise layer into the network which adapts the network outputs to match the noisy label distribution. The parameters of this noise layer can be estimated as part of the training process and involve simple modifications to current training infrastructures for deep networks. We demonstrate the approaches on several datasets, including large-scale experiments on the ImageNet classification benchmark, showing how additional noisy data can improve state-of-the-art recognition models.
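The noise-layer idea can be pictured as multiplying the network's softmax output by a label-transition matrix, so the model is trained to match the distribution of the *observed* (noisy) labels. Below is a minimal NumPy sketch of that forward pass; the uniform-flip noise rate and all names here are illustrative assumptions, and in the paper's approach the transition matrix is a trainable layer estimated during training rather than a fixed array.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def noisy_output(logits, T):
    """Adapt clean class probabilities to the noisy label distribution.

    T[i, j] = P(noisy label = j | true label = i); each row sums to 1.
    Here T is fixed; in the paper it is estimated jointly with the network.
    """
    p_clean = softmax(logits)   # model's belief about the true labels
    return p_clean @ T          # implied distribution over observed labels

# Toy example: 3 classes, labels flipped uniformly with probability 0.2.
eps = 0.2
T = (1 - eps) * np.eye(3) + (eps / 2) * (np.ones((3, 3)) - np.eye(3))
logits = np.array([[4.0, 0.0, 0.0]])
q = noisy_output(logits, T)   # still peaked on class 0, but flattened
```

Training against `q` with cross-entropy (rather than against the clean softmax) is what lets the noise parameters absorb systematic label errors.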
PU Learning for Matrix Completion
 Inderjit S. Dhillon, Dept. of Computer Science, UT Austin
, 2015
In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only “likes” or “friendships” are observed. The problem is an instance of PU (positive-unlabeled) learning, i.e. learning from only positive and unlabeled examples, which has been studied in the context of binary classification. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a “shifted matrix completion” method that recovers M using only a subset of indices corresponding to ones; for the second case, we propose a “biased matrix completion” method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds: if M ∈ R^{n×n}, the error is bounded as O
Robust distance metric learning in the presence of label noise
 in Twenty-Eighth AAAI Conference on Artificial Intelligence
, 2014
Many distance learning algorithms have been developed in recent years. However, few of them consider the problem when the class labels of the training data are noisy, and this may lead to serious performance deterioration. In this paper, we present a robust distance learning method in the presence of label noise, by extending a previous non-parametric discriminative distance learning algorithm, i.e., Neighbourhood Components Analysis (NCA). In particular, we analyze the effect of label noise on the derivative of the likelihood with respect to the transformation matrix, and propose to model the conditional probability of the true label of each point so as to reduce that effect. The model is then optimized within the EM framework, with additional regularization used to avoid overfitting. Our experiments on several UCI datasets and a real dataset with unknown noise patterns show that the proposed RNCA is more tolerant to class label noise than the original NCA method.
To Re(label), or Not To Re(label)
One of the most popular uses of crowdsourcing is to provide training data for supervised machine learning algorithms. Since human annotators often make errors, requesters commonly ask multiple workers to label each example. But is this strategy always the most cost-effective use of crowdsourced workers? We argue “No”: often classifiers can achieve higher accuracies when trained with noisy “unilabeled” data. However, in some cases relabeling is extremely important. We discuss three factors that may make relabeling an effective strategy: classifier expressiveness, worker accuracy, and budget.
Error Rate Bounds and Iterative Weighted Majority Voting for Crowdsourcing
, 2014
Crowdsourcing has become an effective and popular tool for human-powered computation to label large datasets. Since the workers can be unreliable, it is common in crowdsourcing to assign multiple workers to one task, and to aggregate the labels in order to obtain results of high quality. In this paper, we provide finite-sample exponential bounds on the error rate (in probability and in expectation) of general aggregation rules under the Dawid-Skene crowdsourcing model. The bounds are derived for multiclass labeling, and can be used to analyze many aggregation methods, including majority voting, weighted majority voting, and the oracle Maximum A Posteriori (MAP) rule. We show that the oracle MAP rule approximately optimizes our upper bound on the mean error rate of weighted majority voting in certain settings. We propose an iterative weighted majority voting (IWMV) method that optimizes the error rate bound and approximates the oracle MAP rule. Its one-step version has a provable theoretical guarantee on the error rate. The IWMV method is intuitive and computationally simple. Experimental results on simulated and real data show that IWMV performs at least on par with the state-of-the-art methods, and it has a much lower computational cost (around one hundred times faster) than the state-of-the-art methods.
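The iterative scheme described above can be sketched in a few lines: alternate between a weighted vote and re-estimating each worker's reliability from agreement with the current aggregate. This is only an illustration of the general idea under a one-coin Dawid-Skene model; the log-odds weighting and all names below are assumptions, not necessarily the exact update rule derived in the paper.

```python
import numpy as np

def iwmv(responses, num_classes, iters=10):
    """Iterative weighted majority voting (illustrative sketch).

    responses: (num_workers, num_tasks) int array of labels in {0..K-1},
               with -1 marking "no response".
    Alternates weighted-vote aggregation with re-estimating each worker's
    accuracy from agreement with the current aggregate. The log-odds
    weights are a common heuristic, assumed here for illustration.
    """
    W, N = responses.shape
    K = num_classes
    weights = np.ones(W)              # first pass is plain majority vote
    labels = None
    for _ in range(iters):
        votes = np.zeros((N, K))
        for i in range(W):
            for j in range(N):
                if responses[i, j] >= 0:
                    votes[j, responses[i, j]] += weights[i]
        labels = votes.argmax(axis=1)
        for i in range(W):            # re-estimate worker accuracies
            answered = responses[i] >= 0
            if answered.any():
                p = (responses[i, answered] == labels[answered]).mean()
                p = np.clip(p, 1e-3, 1 - 1e-3)
                weights[i] = np.log((K - 1) * p / (1 - p))
    return labels

# Toy example: 5 workers, 30 binary tasks; two workers are nearly random.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=30)
acc = np.array([0.9, 0.9, 0.85, 0.55, 0.5])
resp = np.where(rng.random((5, 30)) < acc[:, None], truth, 1 - truth)
est = iwmv(resp, num_classes=2)
```

Near-random workers end up with weights near zero, so after one iteration the aggregate is dominated by the reliable workers.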
A rate of convergence for mixture proportion estimation, with application to learning from noisy labels.
 to appear in International Conference on Artificial Intelligence and Statistics (AISTATS)
, 2015
Mixture proportion estimation (MPE) is a fundamental tool for solving a number of weakly supervised learning problems: supervised learning problems where label information is noisy or missing. Previous work on MPE has established a universally consistent estimator. In this work we establish a rate of convergence for mixture proportion estimation under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. To illustrate this idea, we examine an algorithm for classification in the presence of noisy labels based on surrogate risk minimization, and show that the rate of convergence for MPE enables a proof of the algorithm's consistency. Finally, we provide a practical implementation of mixture proportion estimation and demonstrate its efficacy in classification with noisy labels.
Learning from Corrupted Binary Labels via Class-Probability Estimation
Many supervised learning problems involve learning from samples whose labels are corrupted in some way. For example, each label may be flipped with some constant probability (learning with label noise), or one may have a pool of unlabelled samples in lieu of negative samples (learning from positive and unlabelled data). This paper uses class-probability estimation to study these and other corruption processes belonging to the mutually contaminated distributions framework. In many practical scenarios involving learning from binary labels, one observes samples whose labels are corrupted versions of the actual ground truth; for example, in learning from class-conditional label noise (CCN learning), the labels are flipped with some constant probability. A fundamental question is whether one can minimise a given performance measure with respect to the clean distribution D, given access only to samples from the corrupted distribution D_corr. Intuitively, in general this requires knowledge of the parameters of the corruption process that determines D_corr. This yields two further questions: are there measures for which knowledge of these corruption parameters is unnecessary, and for other measures, can we estimate these parameters? While some of our results are known for the special cases of CCN and PU learning, our interest is in determining to what extent they generalise to other label corruption problems, as a step towards a unified treatment of these problems.
Decontamination of mutually contaminated models
 in AISTATS
, 2014
A variety of machine learning problems are characterized by data sets that are drawn from multiple different convex combinations of a fixed set of base distributions. We call this a mutual contamination model. In such problems, it is often of interest to recover these base distributions, or otherwise discern their properties. This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. We develop a procedure for decontamination of the contaminated models from data, which then facilitates the design of a consistent discrimination rule. Our approach relies on a novel method for estimating the error when projecting one distribution onto a convex combination of others, where the projection is with respect to a statistical distance known as the separation distance. Under sufficient conditions on the amount of noise and purity of the base distributions, this projection procedure successfully recovers the underlying class distributions. Connections to novelty detection, topic modeling, and other learning problems are also discussed.
Mapping Problems to Skills: Combining Expert Opinion and Student Data
Construction of a mapping between educational content and skills is an important part of the development of adaptive educational systems. This task is difficult, requires a domain expert, and any mistakes in the mapping may hinder the potential of an educational system. In this work we study techniques for improving a problem-skill mapping constructed by a domain expert using student data, particularly problem solving times. We describe and compare different techniques for the task: a multidimensional model of problem solving times and supervised classification techniques. In the evaluation we focus on surveying situations where the combination of expert opinion with student data is most useful.
Convex Formulation for Learning from Positive and Unlabeled Data
We discuss binary classification from only positive and unlabeled data (PU classification), which is conceivable in various real-world machine learning problems. Since unlabeled data consist of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.
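The bias-cancellation property alluded to above can be checked numerically. Assuming the double hinge loss takes the form l(z) = max(-z, max(0, (1 - z)/2)) (an assumption to verify against the paper), it is convex and satisfies l(z) - l(-z) = -z, the linearity condition that lets the PU risk estimate stay unbiased where the plain hinge loss fails:

```python
import numpy as np

def double_hinge(z):
    """Double hinge loss l(z) = max(-z, max(0, (1 - z) / 2)).

    Convex and piecewise linear, and satisfies l(z) - l(-z) = -z,
    so the difference of losses on positive vs. negated margins is
    linear in z, which is what cancels the PU bias. (Exact form assumed
    from this line of work; check the paper for the definition used.)
    """
    z = np.asarray(z, dtype=float)
    return np.maximum(-z, np.maximum(0.0, (1.0 - z) / 2.0))

z = np.linspace(-3, 3, 13)
gap = double_hinge(z) - double_hinge(-z)   # should equal -z everywhere
```

By contrast, the hinge loss max(0, 1 - z) has a gap of max(0, 1 - z) - max(0, 1 + z), which is not linear in z on [-1, 1], which is why it is "not permissible" in this setup.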