Results 1 – 4 of 4
Cross Domain Distribution Adaptation via Kernel Mapping
Abstract

Cited by 7 (2 self)
When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distribution of target-domain and source-domain data into a common kernel space, and utilize a sample selection strategy to draw the conditional probabilities of the two domains closer. We formally show that under the kernel-mapping space, the difference in distributions between the two domains is bounded, and the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.
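The idea of comparing marginal distributions after mapping both domains into a common kernel space can be illustrated with the Maximum Mean Discrepancy (MMD), a standard kernel-space distance between two sample sets. This is a minimal illustrative sketch of measuring domain difference in an RKHS, not the paper's actual adaptive kernel method or sample selection strategy:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    """Squared Maximum Mean Discrepancy between source samples Xs and
    target samples Xt. A small value means the two marginal
    distributions are close in the kernel-induced feature space."""
    Kss = rbf_kernel(Xs, Xs, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()
```

In a transfer setting, one would pick a kernel (or learn one) that makes this quantity small, so that a classifier trained on the mapped source data transfers to the target.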
Universal Learning over Related Distributions and Adaptive Graph Transduction
Abstract

Cited by 1 (0 self)
Abstract. The basic assumption that “training and test data are drawn from the same distribution” is often violated in reality. In this paper, we propose one common solution to cover various scenarios of learning under “different but related distributions” in a single framework. Explicit examples include (a) sample selection bias between training and testing data, (b) transfer learning with no labeled data in the target domain, and (c) noisy or uncertain training data. The main motivation is that one could ideally solve as many problems as possible with a single approach. The proposed solution extends graph transduction using the maximum margin principle over unlabeled data. The error of the proposed method is bounded under reasonable assumptions even when the training and testing distributions are different. Experimental results demonstrate that the proposed method improves traditional graph transduction by as much as 15% in accuracy and AUC in all common situations of distribution difference. Most importantly, it outperforms, by up to 10% in accuracy, several state-of-the-art approaches proposed to solve specific categories of distribution difference, i.e., BRSD [1] for sample selection bias, CDSC [2] for transfer learning, etc. The main claim is that adaptive graph transduction is a general and competitive method for handling distribution differences implicitly, without knowing or worrying about the exact type. These include at least sample selection bias, transfer learning, uncertainty mining, and similar settings that have not yet been studied. The source code and datasets are available from the authors.
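The graph-transduction machinery this abstract builds on can be illustrated with plain iterative label propagation over a similarity graph: labels diffuse from labeled to unlabeled nodes along graph edges. The paper adds a maximum-margin criterion on top of this; the sketch below shows only the basic propagation step, with a hypothetical toy graph:

```python
import numpy as np

def label_propagation(W, y, labeled, n_iter=100):
    """Propagate labels over a symmetric similarity graph W.

    y: initial label vector (+1/-1 for labeled nodes, 0 for unlabeled).
    labeled: boolean mask of labeled nodes, clamped at every step.
    """
    d = W.sum(axis=1)
    P = W / d[:, None]            # row-normalized transition matrix
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = P @ f                 # diffuse labels to neighbours
        f[labeled] = y[labeled]   # clamp the known labels
    return np.sign(f)
```

On a graph with two tightly connected clusters joined by a weak edge, a single label in each cluster suffices to label every node.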
UNIVERSITY OF WISCONSIN–MADISON
Abstract
In many real-world learning scenarios, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning (SSL) is the machine learning paradigm concerned with utilizing unlabeled data to try to build better classifiers and regressors. Unlabeled data is a powerful resource, yet SSL can be difficult to apply in practice. The objective of this dissertation is to move the field toward more practical and robust SSL. This is accomplished by several key contributions. First, we introduce the online (and active) semi-supervised learning setting, which considers large amounts of mostly unlabeled data arriving constantly over time. An online SSL classifier must be able to make efficient predictions at any moment and update itself in response to labeled and unlabeled data. Previously, almost all SSL assumed a fixed dataset was available before training began, and receiving new data meant retraining a potentially slow model. We present two families of online semi-supervised learners that reformulate the popular manifold and cluster assumptions ...
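The predict-then-update protocol described here (efficient prediction at any moment, incremental absorption of both labeled and unlabeled points) can be sketched with a toy streaming learner. This is only an interface illustration using a nearest-neighbour self-training heuristic, not the dissertation's manifold- or cluster-based learners:

```python
import numpy as np

class OnlineSemiSupervised:
    """Toy online learner: each arriving point is classified
    immediately, then stored. Unlabeled points inherit a provisional
    label from their nearest stored neighbour (a simple self-training
    heuristic chosen here for brevity)."""

    def __init__(self):
        self.X, self.y = [], []

    def predict(self, x):
        if not self.X:
            return None                      # nothing seen yet
        d = [np.linalg.norm(np.asarray(x, float) - p) for p in self.X]
        return self.y[int(np.argmin(d))]

    def update(self, x, label=None):
        # Predict before storing, so unlabeled data gets the current
        # model's guess as its provisional label.
        pred = self.predict(x)
        self.X.append(np.asarray(x, float))
        self.y.append(label if label is not None else pred)
```

The key property is that both `predict` and `update` are cheap per point, so the model never needs batch retraining when new data arrives.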
Learning Aspect Models with Partially Labeled Data
Abstract
In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this work is to take advantage of the large amount of available unlabeled data, together with the set of labeled examples, to learn latent models whose structure and underlying hypotheses more accurately reflect the document generation process, compared to other mixture-based generative models. We present a semi-supervised variant of the Probabilistic Latent Semantic Analysis (PLSA) model [1]. In our approach, we try to capture the data-mislabeling errors which occur during the training of our model. This is done by iteratively assigning class labels to unlabeled examples using the current aspect model and re-estimating the probabilities of the mislabeling errors. We perform experiments on the 20 Newsgroups, WebKB and Reuters document collections, as well as on a real-world dataset from a Business Group of Xerox, and show the effectiveness of our approach compared to a semi-supervised version of Naive Bayes, two other semi-supervised versions of PLSA, and to transductive Support Vector Machines.
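The iterative scheme described — assign class labels to unlabeled documents with the current model, then re-estimate the model — is an instance of semi-supervised EM. The sketch below uses a multinomial mixture as a stand-in for the PLSA-style aspect model (it omits the paper's mislabeling-error probabilities), purely to illustrate the E/M loop over partially labeled count data:

```python
import numpy as np

def semisup_em(X, y, n_classes, n_iter=20, alpha=1e-2):
    """Semi-supervised EM for a multinomial mixture.

    X: (docs, words) count matrix; y: class index per doc, -1 if
    unlabeled. Labeled docs keep hard responsibilities; unlabeled
    docs get soft posteriors re-estimated each iteration.
    """
    n, _ = X.shape
    R = np.full((n, n_classes), 1.0 / n_classes)   # responsibilities
    lab = y >= 0
    R[lab] = 0.0
    R[lab, y[lab]] = 1.0
    for _ in range(n_iter):
        # M-step: class priors and smoothed word distributions
        pi = R.sum(0) / n
        theta = (R.T @ X) + alpha
        theta /= theta.sum(1, keepdims=True)
        # E-step: recompute posteriors for unlabeled docs only
        logp = np.log(pi) + X @ np.log(theta).T
        logp -= logp.max(1, keepdims=True)         # numerical stability
        post = np.exp(logp)
        post /= post.sum(1, keepdims=True)
        R[~lab] = post[~lab]
    return R.argmax(1)
```

With one labeled document per class and cleanly separated vocabularies, the loop labels the remaining documents correctly after a few iterations.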