A Survey on Transfer Learning
Cited by 59 (8 self)
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning, sample selection bias and covariate shift. We also explore some potential future issues in transfer learning research.
Feature selection by transfer learning with linear regularized models
Lecture Notes in Artificial Intelligence, 2009
Cited by 5 (3 self)
Abstract. This paper presents a novel feature selection method for the classification of high-dimensional data, such as those produced by microarrays. It includes partial supervision to smoothly favor the selection of some dimensions (genes) on a new dataset to be classified. The dimensions to be favored are previously selected from similar datasets in large microarray databases, hence performing inductive transfer learning at the feature level. This technique relies on a feature selection method embedded within a regularized linear model estimation. A practical approximation of this technique reduces to linear SVM learning with iterative input rescaling. The scaling factors depend on the selected dimensions from the related datasets. The final selection may depart from those whenever necessary to optimize the classification objective. Experiments on several microarray datasets show that the proposed method improves both the stability of the selected gene lists with respect to sampling variation and the classification performance.
Cross Domain Distribution Adaptation via Kernel Mapping
Cited by 4 (1 self)
When labeled examples are limited and difficult to obtain, transfer learning employs knowledge from a source domain to improve learning accuracy in the target domain. However, the assumption made by existing approaches, that the marginal and conditional probabilities are directly related between source and target domains, has limited applicability in either the original space or its linear transformations. To solve this problem, we propose an adaptive kernel approach that maps the marginal distributions of target-domain and source-domain data into a common kernel space, and utilizes a sample selection strategy to draw the conditional probabilities of the two domains closer. We formally show that, in the kernel-mapping space, the difference in distributions between the two domains is bounded, and that the prediction error of the proposed approach can also be bounded. Experimental results demonstrate that the proposed method outperforms both traditional inductive classifiers and the state-of-the-art boosting-based transfer algorithms on most domains, including text categorization and web page ratings. In particular, it can achieve around 10% higher accuracy than other approaches for the text categorization problem. The source code and datasets are available from the authors.
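To make the idea of a "difference in distributions" in a kernel space concrete, the sketch below computes the squared maximum mean discrepancy (MMD) between two samples under an RBF kernel. MMD is a standard kernel-based distance used here purely for illustration; the abstract does not say which measure the paper uses. The function names, the `gamma` value and the toy data are all assumptions.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))

# Toy data (hypothetical): one target sample overlaps the source, one is shifted.
source = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (0.05, 0.2)]
near_target = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1)]
far_target = [(3.0, 3.0), (3.1, 2.9), (2.8, 3.1), (3.05, 3.2)]

print("source vs. near target:", mmd_squared(source, near_target))
print("source vs. far target: ", mmd_squared(source, far_target))
```

A small value suggests the two samples look alike in the kernel space, while a large value signals a shift between the domains.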
Feature Relevance Network-Based Transfer Learning for Indoor Location Estimation
Abstract: We present a new machine learning framework for indoor location estimation. In many cases, locations could be easily estimated using various traditional positioning methods and conventional machine learning approaches based on signalling devices, e.g., access points (APs). When there exist environmental changes, however, such traditional methods cannot be employed because the data distribution changes. To circumvent this difficulty, we introduce a feature relevance network-based method, which focuses on interrelatedness among features. Feature relevance networks are connected graphs representing concurrency of the signalling devices such as APs. In the newly created relevance network, a test instance and the prototype of a location are expanded until convergence. The expansion cost corresponds to the distance between the test instance and the prototype. Unlike other methods, our model is nonparametric, making no assumptions about signal distributions. The proposed method is applied to the 2007 IEEE International Conference on Data Mining Data Mining Contest Task #2 (transfer learning), which is a typical example of a situation where the training and test datasets have been gathered during different periods. Using the proposed method, we achieve an estimation accuracy of 0.3238, which is better than the best result of the contest. Index Terms: Feature relevance networks, indoor location estimation, transfer learning.
Gaussian Process for Dimensionality Reduction in Transfer Learning
Dimensionality reduction has been considered one of the most significant tools for data analysis. In general, supervised information is helpful for dimensionality reduction. However, in typical real applications, supervised information in multiple source tasks may be available, while the data of the target task are unlabeled. This scenario raises an interesting problem: how to guide the dimensionality reduction for the unlabeled target data by exploiting useful knowledge, such as label information, from multiple source tasks. In this paper, we propose a new method for dimensionality reduction in the transfer learning setting. Unlike traditional paradigms where the useful knowledge from multiple source tasks is transferred through a distance metric, our proposal first converts the dimensionality reduction problem into integral regression problems in parallel. A Gaussian process is then employed to learn the underlying relationship between the original data and the reduced data. Such a relationship can be appropriately transferred to the target task by exploiting the prediction ability of the Gaussian process model and inventing different kinds of regularizers. Extensive experiments on both synthetic and real data sets show the effectiveness of our method.
Feature-based Inductive Transfer Learning through Minimum Encoding
This paper proposes an Extended Minimum Description Length Principle (EMDLP) for feature-based inductive transfer learning, in which both the source and the target data sets contain class labels and relevant features are transferred from the source domain to the target one. Despite numerous works on this topic, few of them have a solid theoretical framework and are parameter-free. Our EMDLP overcomes these flaws and allows us to evaluate the inferiority of the results of transfer learning with the sum of the code lengths of five components: the corresponding two hypotheses, the two data sets with the help of the hypotheses, and the set of the transferred features. We design a code book to build the connections between the source and the target tasks. Extensive experiments using both real and artificial data sets show that EMDLP is robust against noise and achieves better classification accuracy than the state-of-the-art methods.
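The five-component sum described in this abstract can be written schematically as follows. The notation is assumed here for illustration and does not come from the paper: $H_s$ and $H_t$ are the source and target hypotheses, $D_s$ and $D_t$ the two data sets, $F$ the set of transferred features, and $L(\cdot)$ a code length.

```latex
L_{\text{total}} = L(H_s) + L(H_t) + L(D_s \mid H_s) + L(D_t \mid H_t) + L(F)
```

Under a minimum description length criterion, a shorter total code length would indicate a better transfer result.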
Ranking Function Adaptation with Boosting Trees
2011
Machine learned ranking functions have been successful in web search engines. With the increasing demand for effective ranking functions for different search domains, we have seen a big bottleneck, i.e., the problem of insufficient labeled training data, which has significantly slowed the development and deployment of machine learned ranking functions for different domains. There are two possible approaches to address this problem: (1) combining labeled training data from similar domains with the small target-domain labeled data for training or (2) using pairwise preference data extracted from user clickthrough logs for the target domain for training. In this paper, we propose a new approach called tree based ranking function adaptation (“Trada”) to effectively utilize these data sources for training cross-domain ranking functions. Tree adaptation assumes that ranking functions are trained with the Stochastic Gradient Boosting Trees method, a gradient boosting method on regression trees. It takes such a ranking function from one domain and tunes its tree based structure with a small amount of training data from the target domain. Its unique features are that (1) it can automatically identify the part of the model that needs adjustment for the new domain, and (2) it can appropriately weigh training examples considering both local and global distributions. Based on a novel pairwise loss function that we developed for pairwise learning, the basic tree adaptation algorithm is also extended (“Pairwise Trada”) to utilize the pairwise preference data from the target domain
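As a rough illustration of tuning a tree trained on one domain with a small amount of target data, the sketch below adapts only the leaf responses of a single decision stump. Each leaf value is blended with the mean of the target labels that reach it, weighted by example counts. This is a deliberately simplified assumption and not the Trada algorithm itself, which the abstract says also adjusts tree structure. The names `Leaf`, `Stump`, `adapt_leaf`, `adapt_stump` and `target_weight` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    value: float    # response learned on the source domain
    n_source: int   # number of source examples that reached this leaf

@dataclass
class Stump:
    feature: int      # index of the feature used for the split
    threshold: float  # examples with x[feature] <= threshold go left
    left: Leaf
    right: Leaf

def adapt_leaf(leaf, target_labels, target_weight=1.0):
    """Blend the source response with the target mean, weighted by counts.

    target_weight is assumed to be positive.
    """
    n_target = len(target_labels)
    if n_target == 0:
        return leaf.value  # no target evidence: keep the source response
    target_mean = sum(target_labels) / n_target
    w_source = leaf.n_source
    w_target = target_weight * n_target
    return (w_source * leaf.value + w_target * target_mean) / (w_source + w_target)

def adapt_stump(stump, target_X, target_y, target_weight=1.0):
    """Route target examples through the stump and adapt both leaf responses."""
    left_labels = [y for x, y in zip(target_X, target_y)
                   if x[stump.feature] <= stump.threshold]
    right_labels = [y for x, y in zip(target_X, target_y)
                    if x[stump.feature] > stump.threshold]
    return Stump(
        stump.feature,
        stump.threshold,
        Leaf(adapt_leaf(stump.left, left_labels, target_weight), stump.left.n_source),
        Leaf(adapt_leaf(stump.right, right_labels, target_weight), stump.right.n_source),
    )

# Hypothetical source-domain stump and a handful of target-domain examples.
source_stump = Stump(feature=0, threshold=0.5,
                     left=Leaf(value=1.0, n_source=100),
                     right=Leaf(value=3.0, n_source=4))
target_X = [[0.2], [0.4], [0.9], [0.8]]
target_y = [2.0, 2.0, 1.0, 1.0]

adapted = adapt_stump(source_stump, target_X, target_y)
print(adapted.left.value, adapted.right.value)
```

With count-based weights, a leaf backed by many source examples barely moves, while a leaf with little source support shifts noticeably toward the target data.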

