Active Learning with Real Annotation Costs
Cited by 17 (3 self)
The goal of active learning is to minimize the cost of training an accurate model by allowing the learner to choose which instances are labeled for training. However, most research in active learning to date has assumed that the cost of acquiring labels is the same for all instances. In domains where labeling costs may vary, a reduction in the number of labeled instances does not guarantee a reduction in cost. To better understand the nature of actual labeling costs in such domains, we present a detailed empirical study of active learning with annotation costs in four real-world domains involving human annotators.
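The point that fewer labels need not mean lower cost can be seen in a minimal sketch. The per-instance costs and the two query sets below are made-up numbers for illustration only; they are not data from the study.

```python
def total_cost(costs, queried):
    """Sum the annotation cost of every queried instance."""
    return sum(costs[name] for name in queried)

# Hypothetical annotation costs (e.g. minutes of annotator time per instance).
costs = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 5.0}

few_expensive = ["d"]          # one label, but a costly one
many_cheap = ["a", "b", "c"]   # three labels, each cheap

cost_few = total_cost(costs, few_expensive)
cost_many = total_cost(costs, many_cheap)
```

Under these assumed costs, the query set with fewer instances is the more expensive one, which is why counting labels alone can mislead when costs vary.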
Multiple-instance active learning
- In Advances in Neural Information Processing Systems (NIPS), 2008
Cited by 15 (5 self)
We present a framework for active learning in the multiple-instance (MI) setting. In an MI learning problem, instances are naturally organized into bags and it is the bags, instead of individual instances, that are labeled for training. MI learners assume that every instance in a bag labeled negative is actually negative, whereas at least one instance in a bag labeled positive is actually positive. We consider the particular case in which an MI learner is allowed to selectively query unlabeled instances from positive bags. This approach is well motivated in domains in which it is inexpensive to acquire bag labels and possible, but expensive, to acquire instance labels. We describe a method for learning from labels at mixed levels of granularity, and introduce two active query selection strategies motivated by the MI setting. Our experiments show that learning from instance labels can significantly improve the performance of a basic MI learning algorithm in two multiple-instance domains: content-based image retrieval and text classification.
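The standard MI assumption stated in this abstract can be written down in a few lines. This is only a sketch of the labeling rule, not the paper's learning method; the function names and the (instances, label) bag representation are hypothetical.

```python
def bag_label(instance_labels):
    """A bag is positive iff at least one of its instances is positive."""
    return any(instance_labels)

def queryable_instances(bags):
    """Collect instances whose labels are ambiguous under the MI assumption.

    `bags` is a list of (instances, bag_is_positive) pairs (an assumed
    representation). Instances in negative bags are known to be negative,
    so only instances from positive bags are candidates for a label query.
    """
    return [x for instances, positive in bags if positive for x in instances]

bags = [(["x1", "x2"], True), (["x3"], False)]
candidates = queryable_instances(bags)
```

Here only `x1` and `x2` are worth querying, since the negative bag already determines the label of `x3`.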
On the relation between multi-instance learning and semi-supervised learning
- The 24th International Conference on Machine Learning, 2007
Cited by 12 (2 self)
Multi-instance learning and semi-supervised learning are different branches of machine learning. The former attempts to learn from a training set consisting of labeled bags, each containing many unlabeled instances; the latter tries to exploit abundant unlabeled instances when learning with a small number of labeled examples. In this paper, we establish a bridge between these two branches by showing that multi-instance learning can be viewed as a special case of semi-supervised learning. Based on this recognition, we propose the MissSVM algorithm, which addresses multi-instance learning using a special semi-supervised support vector machine. Experiments show that solving multi-instance problems from the view of semi-supervised learning is feasible, and the MissSVM algorithm is competitive with state-of-the-art multi-instance learning algorithms.
Multi-instance learning by treating instances as non-I.I.D. samples
- In Proceedings of the 26th International Conference on Machine Learning, 2009
Cited by 6 (1 self)
Previous studies on multi-instance learning typically treated instances in the bags as independently and identically distributed. The instances in a bag, however, are rarely independent in real tasks, and better performance can be expected if the instances are treated in a non-i.i.d. way that exploits relations among instances. In this paper, we propose two simple yet effective methods. In the first method, we explicitly map every bag to an undirected graph and design a graph kernel for distinguishing the positive and negative bags. In the second method, we implicitly construct graphs by deriving affinity matrices and propose an efficient graph kernel considering the clique information. The effectiveness of the proposed methods is validated by experiments.
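One way to picture "mapping a bag to an undirected graph" is a distance-threshold graph over the bag's instances. The abstract does not say how edges are chosen, so the thresholding rule, the `eps` parameter, and the function name below are assumptions made for illustration.

```python
import math

def bag_to_graph(bag, eps):
    """Return the undirected edge set of a bag as pairs of instance indices.

    Assumed rule: two instances are connected when their Euclidean
    distance is strictly below `eps`. Each edge (i, j) has i < j.
    """
    edges = set()
    for i in range(len(bag)):
        for j in range(i + 1, len(bag)):
            if math.dist(bag[i], bag[j]) < eps:
                edges.add((i, j))
    return edges

# A toy bag of three 2-D instances: two close together, one far away.
bag = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = bag_to_graph(bag, eps=1.5)
```

With this toy bag, only the two nearby instances are linked, so the graph records which instances are related rather than treating all three as independent.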
Salience Assignment for Multiple-Instance Regression
Cited by 5 (0 self)
We present a Multiple-Instance Learning (MIL) algorithm for determining the salience of each item in each bag with respect to the bag’s real-valued label. We use an alternating-projections constrained optimization approach to simultaneously learn a regression model and estimate all salience values. We evaluate this algorithm on a significant real-world problem, crop yield modeling, and demonstrate that it provides more extensive, intuitive, and stable salience models than Primary-Instance Regression, which selects a single relevant item from each bag.
Curious Machines: Active Learning with Structured Instances
- 2008
Cited by 5 (1 self)
Supervised machine learning is a branch of artificial intelligence concerned with automatically inducing predictive models from labeled data. Such learning approaches are useful for many interesting real-world applications, but particularly shine for tasks involving the automatic organization, extraction, and retrieval of information from large collections of data (e.g., text, images, and other digital media). In traditional supervised learning, one uses “labeled” training data to induce a model. However, labeled instances for real-world applications are often difficult, expensive, or time consuming to obtain. Consider a complex task such as extracting key person and organization names from text documents. While gathering large amounts of unlabeled documents for these tasks is often relatively easy (e.g., from the World Wide Web), labeling these texts usually requires experienced human annotators with specific domain knowledge and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources. This
Marginalized multi-instance kernels
- In Proceedings of International Joint Conference on Artificial Intelligence, 2007
Cited by 5 (0 self)
Support vector machines (SVMs) have been highly successful in many machine learning problems. Recently, they have also been used for multi-instance (MI) learning by employing a kernel that is defined directly on the bags. As only the bags (but not the instances) have known labels, this MI kernel implicitly assumes all instances in the bag to be equally important. However, a fundamental property of MI learning is that not all instances in a positive bag necessarily belong to the positive class, and thus different instances in the same bag should have different contributions to the kernel. In this paper, we address this instance label ambiguity by using the method of marginalized kernels. The method first assumes that all the instance labels are available and defines a label-dependent kernel on the instances. By integrating out the unknown instance labels, a marginalized kernel defined on the bags can then be obtained. A desirable property is that this kernel weights the instance pairs by the consistencies of their probabilistic instance labels. Experiments on both classification and regression data sets show that this marginalized MI kernel, when used in a standard SVM, performs consistently better than the original MI kernel. It also outperforms a number of traditional MI learning methods.
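The idea of weighting instance pairs by label consistency can be sketched as follows. This is one simple reading of the abstract, not the paper's exact formulation: it assumes binary instance labels that are independent across instances, instance-level positive-class probabilities that are already given, an RBF base kernel, and a label-dependent kernel that is nonzero only when two labels agree. All function names are hypothetical.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def label_agreement(p, q):
    """Probability that two independent binary labels are equal,
    given their positive-class probabilities p and q."""
    return p * q + (1.0 - p) * (1.0 - q)

def marginalized_bag_kernel(bag_a, probs_a, bag_b, probs_b, gamma=1.0):
    """Sum the base kernel over all cross-bag instance pairs, each pair
    weighted by the probability that its two instance labels agree."""
    total = 0.0
    for x, p in zip(bag_a, probs_a):
        for y, q in zip(bag_b, probs_b):
            total += label_agreement(p, q) * rbf(x, y, gamma)
    return total
```

Under these assumptions, a pair of instances whose labels are both confidently positive (or both confidently negative) contributes its full base-kernel value, while a pair with conflicting labels contributes almost nothing.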
MILD: Multiple-Instance Learning via Disambiguation
- 2009
Cited by 2 (1 self)
In multiple-instance learning (MIL), an individual example is called an instance and a bag contains one or more instances. The class labels available in the training set are associated with bags rather than instances. A bag is labeled positive if at least one of its instances is positive; otherwise, the bag is labeled negative. Since a positive bag may contain some negative instances in addition to one or more positive instances, the true labels for the instances in a positive bag may or may not be the same as the corresponding bag label and, consequently, the instance labels are inherently ambiguous. In this paper, we propose a very efficient and robust MIL method, called MILD (Multiple-Instance Learning via Disambiguation), for general MIL problems. First, we propose a novel disambiguation method to identify the true positive instances in the positive bags. Second, we propose two feature representation schemes, one for instance-level classification and the other for bag-level classification, to convert the MIL problem into a standard single-instance learning (SIL) problem that can be solved by well-known SIL algorithms, such as the support vector machine. Third, an inductive semi-supervised learning method is proposed for MIL. We evaluate our methods extensively on several challenging MIL applications to demonstrate their promising efficiency, robustness and accuracy.
Structure-Sensitive Manifold Ranking for Video Concept Detection
Cited by 1 (1 self)
Pairwise similarity of samples is an essential factor in graph propagation based semi-supervised learning methods. Usually it is estimated based on Euclidean distance. However, the structural assumption, which is a basic assumption in these methods, has not been taken into consideration in the normal pairwise similarity measure. In this paper, we propose a novel graph-based learning approach, named Structure-Sensitive Manifold Ranking (SSMR), based on a structure-sensitive similarity measure. Instead of using distance only, SSMR takes local distribution differences into account to more accurately measure pairwise similarity. Furthermore, we show that SSMR can also be deduced from a partial differential equation based anisotropic diffusion. Experiments conducted on the TRECVID dataset show that this approach significantly outperforms existing graph-based semi-supervised learning methods for video semantic concept detection.
Localized Content-Based Image Retrieval Through Evidence Region Identification
Cited by 1 (0 self)
Over the past decade, multiple-instance learning (MIL) has been successfully utilized to model the localized content-based image retrieval (CBIR) problem, in which a bag corresponds to an image and an instance corresponds to a region in the image. However, existing feature representation schemes are not effective enough to describe the bags in MIL, which hinders the adaptation of sophisticated single-instance learning (SIL) methods for MIL problems. In this paper, we first propose an evidence region (or evidence instance) identification method to identify the evidence regions supporting the labels of the images (i.e., bags). Then, based on the identified evidence regions, a very effective feature representation scheme, which is also very computationally efficient and robust to labeling noise, is proposed to describe the bags. As a result, the MIL problem is converted into a standard SIL problem and a support vector machine (SVM) can be easily adapted for localized CBIR. Experimental results on two challenging data sets show that our method, called EC-SVM, can outperform the state-of-the-art methods in terms of accuracy, robustness and efficiency.

