Results 1-10 of 247
A Survey on Transfer Learning
Cited by 443 (22 self)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multi-task learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
Analysis of representations for domain adaptation
In NIPS, 2007
Cited by 165 (11 self)
A domain is a distribution D on an instance set X. Domain adaptation of a classifier. A classification task. Source domain (D_S)
Posterior regularization for structured latent variable models
Journal of Machine Learning Research, 2010
Cited by 135 (8 self)
We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large-scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction,
Domain Adaptation via Transfer Component Analysis
Cited by 101 (18 self)
Domain adaptation solves a learning problem in a target domain by utilizing the training data in a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components, data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is that we propose a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components. Furthermore, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
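The abstract above relies on Maximum Mean Discrepancy (MMD) as the distance between domain distributions. As a rough illustration of that quantity only (not of TCA itself), the following sketch computes a biased empirical estimate of squared MMD with an RBF kernel. The function names, the `gamma` bandwidth and the synthetic Gaussian samples are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distances between every row of a and every row of b.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(xs, xt, gamma=1.0):
    """Biased empirical estimate of squared MMD between samples xs and xt."""
    k_ss = rbf_kernel(xs, xs, gamma).mean()
    k_tt = rbf_kernel(xt, xt, gamma).mean()
    k_st = rbf_kernel(xs, xt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Hypothetical data: one source sample and two candidate target samples.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as source
target_far = rng.normal(3.0, 1.0, size=(200, 2))   # mean-shifted distribution

mmd_near = mmd2(source, target_near)
mmd_far = mmd2(source, target_far)
print(f"MMD^2 to matching domain: {mmd_near:.4f}")
print(f"MMD^2 to shifted domain:  {mmd_far:.4f}")
```

Under these assumptions the shifted target should yield the larger value, which is the kind of gap a learned representation would aim to shrink.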
Domain adaptation for object recognition: An unsupervised approach
In ICCV
Cited by 97 (3 self)
Adapting the classifier trained on a source domain to recognize instances from a new target domain is an important problem that is receiving recent attention. In this paper, we present one of the first studies on unsupervised domain adaptation in the context of object recognition, where we have labeled data only from the source domain (and therefore do not have correspondences between object categories across domains). Motivated by incremental learning, we create intermediate representations of data between the two domains by viewing the generative subspaces (of same dimension) created from these domains as points on the Grassmann manifold, and sampling points along the geodesic between them to obtain subspaces that provide a meaningful description of the underlying domain shift. We then obtain the projections of labeled source domain data onto these subspaces, from which a discriminative classifier is learnt to classify projected data from the target domain. We discuss extensions of our approach for semi-supervised adaptation, and for cases with multiple source and target domains, and report competitive results on standard datasets.
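The abstract above treats equal-dimensional subspaces as points on the Grassmann manifold. One ingredient of that view can be sketched briefly: the principal angles between two subspaces, whose Euclidean norm gives a geodesic distance between them. This is only an illustration of that building block, not the paper's sampling procedure. The PCA-based subspaces, the synthetic data and the rotation used as a "domain shift" are all assumptions made for this example.

```python
import numpy as np

def pca_basis(x, d):
    """Orthonormal basis (as columns) of the top-d principal subspace of x."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:d].T

def principal_angles(basis_a, basis_b):
    """Principal angles (radians) between subspaces with orthonormal bases."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    # Clip to guard against round-off pushing cosines slightly above 1.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical source data with most variance along the first two axes.
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 1.0, 0.5, 0.1])
source = rng.normal(size=(300, 5)) * scales

# Hypothetical domain shift: rotate by 30 degrees in the plane of axes 0 and 2.
theta = np.pi / 6
rotation = np.eye(5)
rotation[0, 0] = rotation[2, 2] = np.cos(theta)
rotation[0, 2], rotation[2, 0] = -np.sin(theta), np.sin(theta)
target = source @ rotation.T

basis_s = pca_basis(source, 2)
basis_t = pca_basis(target, 2)
angles = principal_angles(basis_s, basis_t)
geodesic_distance = np.linalg.norm(angles)
print("principal angles (radians):", angles)
print("geodesic distance:", geodesic_distance)
```

A distance of zero would mean the two domains share the same principal subspace; larger values indicate a longer geodesic between them.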
Geodesic flow kernel for unsupervised domain adaptation
In CVPR, 2012
Cited by 92 (6 self)
In real-world applications of visual recognition, many factors, such as pose, illumination, or image quality, can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.
Visual event recognition in videos by learning from web data
In CVPR, IEEE, 2010
Cited by 83 (18 self)
We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distances between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web domain and consumer domain). For each pyramid level and each type of local features, we train a set of SVM classifiers based on the combined training set from two domains using multiple base kernels of different kernel types and parameters, which are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), to learn an adapted classifier based on multiple base kernels and the pre-learned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data.
Domain adaptation for largescale sentiment classification: A deep learning approach
In Proceedings of the Twenty-eighth International Conference on Machine Learning, ICML, 2011
Cited by 78 (7 self)
The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.
Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions
Annals of Statistics, 40(2):1171
Cited by 63 (11 self)
We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation X of the sum of an (approximately) low rank matrix Θ⋆ with a second matrix Γ⋆ endowed with a complementary form of low-dimensional structure; this setup includes many statistical models of interest, including factor analysis, multi-task regression and robust covariance estimation. We derive a general theorem that bounds the Frobenius norm error for an estimate of the pair (Θ⋆, Γ⋆) obtained by solving a convex optimization problem that combines the nuclear norm with a general decomposable regularizer. Our results use a “spikiness” condition that is related to, but milder than, singular vector incoherence. We specialize our general result to two cases that have been studied in past work: low rank plus an entrywise sparse matrix, and low rank plus a columnwise sparse matrix. For both models, our theory yields non-asymptotic Frobenius error bounds for both deterministic and stochastic noise matrices, and applies to matrices Θ⋆ that can be exactly or approximately low rank, and matrices Γ⋆ that can be exactly or approximately sparse. Moreover, for the case of stochastic noise matrices and the identity observation operator, we establish matching lower bounds on the minimax error. The sharpness of our non-asymptotic predictions is confirmed by numerical simulations.
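To make the "low rank plus entrywise sparse" case above more concrete, here is a minimal sketch of a convex program that combines the nuclear norm with an l1 regularizer. It assumes the identity observation operator. It is solved by alternating exact block updates (singular value thresholding and entrywise soft-thresholding). This is a simplified stand-in and not the paper's estimator: it omits the spikiness constraint. The solver, the regularization weights `lam` and `mu`, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(m, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1 norm."""
    return np.sign(m) * np.maximum(np.abs(m) - tau, 0.0)

def svd_threshold(m, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def objective(y, low, sparse, lam, mu):
    """0.5 * ||y - low - sparse||_F^2 + lam * ||low||_* + mu * ||sparse||_1."""
    fit = 0.5 * np.sum((y - low - sparse) ** 2)
    nuclear = np.linalg.svd(low, compute_uv=False).sum()
    return fit + lam * nuclear + mu * np.abs(sparse).sum()

def decompose(y, lam, mu, n_iter=200):
    """Alternate exact minimization over the low-rank and sparse blocks."""
    low = np.zeros_like(y)
    sparse = np.zeros_like(y)
    for _ in range(n_iter):
        low = svd_threshold(y - sparse, lam)     # exact update of the low-rank block
        sparse = soft_threshold(y - low, mu)     # exact update of the sparse block
    return low, sparse

# Hypothetical data: rank-2 matrix plus a few large spikes plus small noise.
rng = np.random.default_rng(0)
true_low = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 40))
true_sparse = np.zeros((40, 40))
spikes = rng.choice(40 * 40, size=40, replace=False)
true_sparse.flat[spikes] = 10.0 * rng.choice([-1.0, 1.0], size=40)
y = true_low + true_sparse + 0.1 * rng.normal(size=(40, 40))

lam, mu = 2.0, 0.5  # illustrative weights, not tuned by any theory
low_hat, sparse_hat = decompose(y, lam, mu)
print("objective at zero:    ", objective(y, np.zeros_like(y), np.zeros_like(y), lam, mu))
print("objective at solution:", objective(y, low_hat, sparse_hat, lam, mu))
```

Each block update minimizes the objective exactly with the other block held fixed, so the objective never increases from one iteration to the next.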