Results 1 - 10 of 28
Efficiently Scaling Up Video Annotation with Crowdsourced Marketplaces
Cited by 30 (1 self)
Abstract. Accurately annotating entities in video is labor intensive and expensive. As the quantity of online video grows, traditional solutions to this task are unable to scale to meet the needs of researchers with limited budgets. Current practice provides a temporary solution by paying dedicated workers to label a fraction of the total frames and otherwise settling for linear interpolation. As budgets and scale require sparser key frames, the assumption of linearity fails and labels become inaccurate. To address this problem we have created a public framework for dividing the work of labeling video data into micro-tasks that can be completed by huge labor pools available through crowdsourced marketplaces. By extracting pixel-based features from manually labeled entities, we are able to leverage more sophisticated interpolation between key frames to maximize performance given a budget. Finally, by validating the power of our framework on difficult, real-world data sets we demonstrate an inherent trade-off between the mix of human and cloud computing used vs. the accuracy and cost of the labeling.
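The linear-interpolation baseline this abstract criticizes is easy to sketch. The (x, y, w, h) box representation and the key-frame spacing below are illustrative assumptions, not the framework's actual format:

```python
def lerp_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes at fraction t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

def interpolate_track(keyframes):
    """Fill every frame between labeled key frames by linear interpolation.

    keyframes: dict mapping frame index -> (x, y, w, h) box.
    Returns a dict covering every frame from the first to the last key frame.
    """
    frames = sorted(keyframes)
    track = {}
    for f0, f1 in zip(frames, frames[1:]):
        span = f1 - f0
        for f in range(f0, f1):
            track[f] = lerp_box(keyframes[f0], keyframes[f1], (f - f0) / span)
    track[frames[-1]] = keyframes[frames[-1]]
    return track

# Sparse key frames 10 frames apart; the object moves right at constant speed.
track = interpolate_track({0: (0, 0, 20, 20), 10: (100, 0, 20, 20)})
print(track[5])  # (50.0, 0.0, 20.0, 20.0): halfway between the key frames
```

When motion between key frames is not linear (the failure mode the paper targets), these interpolated boxes drift off the object, which is why the authors substitute feature-driven interpolation.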
Cost-Sensitive Active Visual Category Learning
Cited by 18 (0 self)
Figure 1: Overview of the proposed approach. (a) Labeled examples to build models; (b) unlabeled and partially labeled examples to survey; (c) actively chosen queries sent to annotators. (a) We learn object categories from multi-label images, with a mixture of weak and strong labels. (b) The active selection function surveys unlabeled and partially labeled images and, for each candidate annotation, predicts the tradeoff between its informativeness vs. the manual effort it would cost to obtain. (c) The most promising annotations are requested and used to update the current classifier. Are larger image training sets necessarily better for recognition? The accuracies of most current object recognition methods steadily improve as more and more labeled training data is made available. However, this requires manually collecting and possibly further annotating image examples, which is an expensive endeavor. Though the protocol of learning models from carefully gathered images has proven fruitful, it is too expensive to perpetuate in the long term. Active learning strategies have the potential to reduce this burden by selecting only the most informative examples to label.
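The informativeness-vs.-effort tradeoff in step (b) can be sketched as a benefit-per-cost ranking under a budget. The candidate names, scores, and costs below are invented for illustration; the paper learns both quantities from data rather than taking them as given:

```python
def select_annotations(candidates, budget):
    """Greedily pick candidate annotations by predicted benefit per unit cost.

    candidates: list of (name, informativeness, effort_cost) tuples, with
    both quantities in comparable units (an assumption of this sketch).
    Returns the chosen names, in ranked order, whose total cost fits the budget.
    """
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen, spent = [], 0.0
    for name, info, cost in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

picks = select_annotations(
    [("full-segmentation", 9.0, 6.0),   # strong label, expensive to obtain
     ("image-tag", 2.0, 0.5),           # weak label, cheap
     ("bounding-box", 4.0, 2.0)],
    budget=3.0)
print(picks)  # ['image-tag', 'bounding-box']
```

The point of the cost-sensitive view is visible in the example: the most informative annotation in isolation (the full segmentation) is skipped because two cheaper labels deliver more information per unit of annotator effort.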
Incremental Relabeling for Active Learning with Noisy Crowdsourced Annotations
- IEEE Intl. Conf. on Social Computing, 2011
Cited by 12 (3 self)
Crowdsourcing has become a popular approach for annotating the large quantities of data required to train machine learning algorithms. However, obtaining labels in this manner poses two important challenges. First, naively labeling all of the data can be prohibitively expensive. Second, a significant fraction of the annotations can be incorrect due to carelessness or the limited domain expertise of crowdsourced workers. Active learning provides a natural formulation to address the former issue by affordably selecting an appropriate subset of instances to label. Unfortunately, most active learning strategies are myopic and sensitive to label noise, which leads to poorly trained classifiers. We propose an active learning method that is specifically designed to be robust to such noise. We present an application of our technique in the domain of activity recognition for eldercare and validate the proposed approach using both simulated and real-world experiments on Amazon Mechanical Turk.
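A minimal stand-in for the relabeling decision at the heart of this line of work: keep requesting labels for an instance while the majority vote is unconvincing. The 0.7 threshold and the majority-fraction criterion are assumptions of this sketch, not the paper's actual criterion:

```python
from collections import Counter

def needs_relabel(labels, threshold=0.7):
    """Decide whether an instance should receive another crowdsourced label.

    Relabel while the majority label's empirical fraction is below
    `threshold` -- a simple proxy for annotation uncertainty.
    """
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(labels) < threshold

print(needs_relabel(["walk", "walk", "sit"]))   # True: 2/3 < 0.7, still noisy
print(needs_relabel(["walk", "walk", "walk"]))  # False: unanimous
```

Incremental relabeling spends the budget where it matters: contested instances get extra votes, while clear-cut ones are labeled once and left alone.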
Active frame selection for label propagation in videos
- In ECCV
Cited by 11 (0 self)
Abstract. Manually segmenting and labeling objects in video sequences is quite tedious, yet such annotations are valuable for learning-based approaches to object and activity recognition. While automatic label propagation can help, existing methods simply propagate annotations from arbitrarily selected frames (e.g., the first one) and so may fail to best leverage the human effort invested. We define an active frame selection problem: select k frames for manual labeling, such that automatic pixel-level label propagation can proceed with minimal expected error. We propose a solution that directly ties a joint frame selection criterion to the predicted errors of a flow-based random field propagation model. It selects the set of k frames that together minimize the total mislabeling risk over the entire sequence. We derive an efficient dynamic programming solution to optimize the criterion. Further, we show how to automatically determine how many total frames k should be labeled in order to minimize the total manual effort spent labeling and correcting propagation errors. We demonstrate our method's clear advantages over several baselines, saving hours of human effort per video.
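The objective can be illustrated with a toy risk model. The distance-to-nearest-key-frame cost below is an assumed stand-in for the paper's flow-based error predictions, and the exhaustive search replaces their dynamic program (which is what makes the real method scale to long videos):

```python
from itertools import combinations

def total_risk(selected, n):
    """Proxy mislabeling risk: each frame costs its distance to the nearest
    manually labeled frame (a crude substitute for predicted flow error)."""
    return sum(min(abs(f - s) for s in selected) for f in range(n))

def best_k_frames(n, k):
    """Exhaustively pick the k frames minimizing total risk.

    Fine for tiny n; the paper's dynamic program finds the same kind of
    jointly optimal set efficiently for real video lengths.
    """
    return min(combinations(range(n), k), key=lambda sel: total_risk(sel, n))

print(best_k_frames(10, 2))  # (2, 7): spread out, not frames 0 and 1
```

Even this toy version shows why joint selection beats the "label the first frame" default: the optimal pair sits in the interior of the sequence so that no frame is far from a labeled one.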
Dynamic Processing Allocation in Video
Cited by 3 (0 self)
Large stores of digital video pose severe computational challenges to existing video analysis algorithms. In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. To accomplish this, we develop a new algorithm to direct processing. This algorithm approximates the optimal solution efficiently. We apply our algorithm to problems in background subtraction and face detection and show in experiments that this leads to significant improvements over baseline algorithms.
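The "direct expensive processing with cheap results" idea can be sketched as uncertainty-driven allocation over a chain of frames. The entropy criterion and the simple neighbor smoothing below are assumptions of this sketch standing in for the paper's chain-model inference:

```python
import math

def allocate_expensive_passes(cheap_probs, m):
    """Spend the expensive detector on the m frames where the cheap
    detector is most uncertain (binary entropy), smoothed along the chain
    so that a confident neighbor lowers a frame's priority.
    """
    def entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log(p) + (1 - p) * math.log(1 - p))

    h = [entropy(p) for p in cheap_probs]
    n = len(h)
    # Average each frame's uncertainty with its chain neighbors.
    smoothed = [(h[i] + 0.5 * (h[max(i - 1, 0)] + h[min(i + 1, n - 1)])) / 2
                for i in range(n)]
    return sorted(range(n), key=lambda i: smoothed[i], reverse=True)[:m]

# Cheap per-frame detection probabilities; frames 1 and 2 are ambiguous.
frames = allocate_expensive_passes([0.95, 0.5, 0.55, 0.9, 0.1], m=2)
print(sorted(frames))  # [1, 2]
```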
Active detection via adaptive submodularity
- In ICML, 2014
Cited by 3 (1 self)
Efficient detection of multiple object instances is one of the fundamental challenges in computer vision. For certain object categories, even the best automatic systems are as yet unable to produce high-quality detection results, and fully manual annotation would be an expensive process. How can detection algorithms interplay with human expert annotators? To make the best use of scarce (human) labeling resources, one needs to decide when to invoke the expert, such that the best possible performance can be achieved while requiring a minimum amount of supervision. In this paper, we propose a principled approach to active object detection, and show that for a rich class of base detector algorithms, one can derive a natural sequential decision problem for deciding when to invoke expert supervision. We further show that the objective function satisfies adaptive submodularity, which allows us to derive strong performance guarantees for our algorithm. We demonstrate the proposed algorithm on three real-world tasks, including a problem for biodiversity monitoring from micro UAVs in the Sumatra rain forest. Our results show that active detection not only outperforms its passive counterpart; for certain tasks, it also works significantly better than straightforward application of existing active learning techniques. To the best of our knowledge, our approach is the first to rigorously address the active detection problem from both empirical and theoretical perspectives.
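The submodularity argument rests on the classic greedy coverage scheme, which can be shown on a toy problem. The windows and object ids below are made up; the paper's contribution is extending the (1 - 1/e) greedy guarantee to the adaptive setting where each query's outcome informs the next:

```python
def greedy_cover(candidates, k):
    """Greedily pick up to k candidates maximizing marginal coverage gain.

    Coverage of sets is submodular (diminishing returns), which is what
    lets greedy selection come within a constant factor of optimal.
    candidates: dict mapping window name -> set of covered object ids.
    """
    covered, picked = set(), []
    for _ in range(k):
        best = max(candidates, key=lambda w: len(candidates[w] - covered))
        if not candidates[w] - covered if False else not (candidates[best] - covered):
            break  # no candidate adds anything new
        picked.append(best)
        covered |= candidates[best]
    return picked, covered

windows = {"w1": {1, 2, 3}, "w2": {3, 4}, "w3": {5}, "w4": {1, 2}}
picked, covered = greedy_cover(windows, k=2)
print(picked, covered)  # ['w1', 'w2'] {1, 2, 3, 4}
```

After picking w1, the marginal gains of w2 and w3 tie at one new object each; greedy takes the first, and w4 (fully redundant with w1) is never chosen.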
From Semi-Supervised to Transfer Counting of Crowds
Cited by 3 (0 self)
Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) Instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively. (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. (3) Labelled data from other scenes are employed to further alleviate the burden for data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with the ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.
Active Inference for Retrieval in Camera Networks
Cited by 2 (1 self)
We address the problem of searching camera network videos to retrieve frames containing specified individuals. We show the benefit of utilizing a learned probabilistic model that captures dependencies among the cameras. In addition, we develop an active inference framework that can request human input at inference time, directing human attention to the portions of the videos whose correct annotation would provide the biggest performance improvements. Our primary contribution is to show that by mapping video frames in a camera network onto a graphical model, we can apply collective classification and active inference algorithms to significantly increase the performance of the retrieval system, while minimizing the number of human annotations required.
Perceptual annotation: Measuring human vision to improve computer vision
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 2 (0 self)
Abstract—For many problems in computer vision, human learners are considerably better than machines. Humans possess highly accurate internal recognition and learning mechanisms that are not yet understood, and they frequently have access to more extensive training data through a lifetime of unbiased experience with the visual world. We propose to use visual psychophysics to directly leverage the abilities of human subjects to build better machine learning systems. First, we use an advanced online psychometric testing platform to make new kinds of annotation data available for learning. Second, we develop a technique for harnessing these new kinds of information – “perceptual annotations” – for support vector machines. A key intuition for this approach is that while it may remain infeasible to dramatically increase the amount of data and high-quality labels available for the training of a given system, measuring the exemplar-by-exemplar difficulty and pattern of errors of human annotators can provide important information for regularizing the solution of the system at hand. A case study for the problem of face detection demonstrates that this approach yields state-of-the-art results on the challenging FDDB data set.
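The core mechanism, weighting training examples by a per-exemplar human-derived difficulty signal, can be sketched with a weighted perceptron rather than the paper's SVM. The data, weights, and learning rate here are all illustrative assumptions:

```python
def weighted_perceptron(data, weights, epochs=20, lr=0.1):
    """Perceptron whose updates are scaled by a per-example weight -- a
    minimal stand-in for training a classifier with perceptual annotations.

    data: list of (features, label) pairs with label in {-1, +1}.
    weights: one nonnegative difficulty weight per example.
    """
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for (x, y), c in zip(data, weights):
            # Misclassified (or on the boundary): update, scaled by weight c.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b

data = [((1.0, 1.0), 1), ((2.0, 2.0), 1),
        ((-1.0, -1.0), -1), ((-2.0, -1.5), -1)]
human_difficulty = [1.0, 0.5, 1.0, 0.5]  # e.g. derived from human error rates
w, b = weighted_perceptron(data, human_difficulty)
print(all((sum(wi * xi for wi, xi in zip(w, x)) + b) * y > 0
          for x, y in data))  # True: all examples separated
```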
Crowdsourcing and Its Applications in Computer Vision
2011
Cited by 2 (0 self)
Crowdsourcing has emerged in the past decade as a popular model for online distributed problem solving and production. The creation of Amazon Mechanical Turk (MTurk) as a “micro-task” marketplace has facilitated this growth by connecting willing workers with available tasks. Within computer vision, MTurk has proven especially useful in large-scale image label collection, as many computer vision algorithms require substantial amounts of training data. In this survey, we discuss different types of worker incentives, various considerations for MTurk task design, methods for annotation quality analysis, and cost-effective ways of obtaining labels in a selective manner. We present several examples of how MTurk is being utilized in the computer vision community. Finally, we discuss the implications that MTurk usage will have on future computer vision research.
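The simplest of the annotation-quality-analysis methods this survey covers is majority-vote aggregation of redundant worker labels; a minimal sketch (the image names and votes are invented):

```python
from collections import Counter

def aggregate(worker_labels):
    """Majority-vote label aggregation: per item, keep the most common
    label among the redundant crowdsourced votes."""
    return {item: Counter(votes).most_common(1)[0][0]
            for item, votes in worker_labels.items()}

votes = {"img1": ["cat", "cat", "dog"],
         "img2": ["dog", "dog", "dog"]}
print(aggregate(votes))  # {'img1': 'cat', 'img2': 'dog'}
```

Majority voting treats all workers as equally reliable; the more sophisticated schemes surveyed weight votes by estimated worker accuracy instead.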