Results 1  10
of
134
Analysis of representations for domain adaptation
 In NIPS
, 2007
"... Domain is a distribution D on an instance set X Domain adaptation of a classifier A classification task Source domain (DS) ..."
Abstract

Cited by 165 (11 self)
 Add to MetaCart
(Show Context)
Domain is a distribution D on an instance set X Domain adaptation of a classifier A classification task Source domain (DS)
Learning bounds for domain adaptation
 In Advances in Neural Information Processing Systems
, 2008
"... Empirical risk minimization offers wellknown learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to different target domain with very little training data. ..."
Abstract

Cited by 80 (8 self)
 Add to MetaCart
Empirical risk minimization offers wellknown learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to different target domain with very little training data. In this work we give uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risk. The bounds explicitly model the inherent tradeoff between training on a large but inaccurate source data set and a small but accurate target training set. Our theory also gives results when we have multiple source domains, each of which may have a different number of instances, and we exhibit cases in which minimizing a nonuniform combination of source risks can achieve much lower target error than standard empirical risk minimization. 1
A kernel method for the two sample problem
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 19
, 2007
"... We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert ..."
Abstract

Cited by 72 (19 self)
 Add to MetaCart
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (eg. a Banach space). We apply our twosample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
Learning from timechanging data with adaptive windowing
 In SIAM International Conference on Data Mining
, 2007
"... We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window it ..."
Abstract

Cited by 66 (20 self)
 Add to MetaCart
(Show Context)
We present a new approach for dealing with distribution change and concept drift when learning from data sequences that may vary with time. We use sliding windows whose size, instead of being fixed a priori, is recomputed online according to the rate of change observed from the data in the window itself. This delivers the user or programmer from having to guess a timescale for change. Contrary to many related works, we provide rigorous guarantees of performance, as bounds on the rates of false positives and false negatives. Using ideas from data stream algorithmics, we develop a time and memoryefficient version of this algorithm, called ADWIN2. We show how to combine ADWIN2 with the Naïve Bayes (NB) predictor, in two ways: one, using it to monitor the error rate of the current model and declare when revision is necessary and, two, putting it inside the NB predictor to maintain uptodate estimations of conditional probabilities in the data. We test our approach using synthetic and real data streams and compare them to both fixedsize and variablesize window strategies with good results.
Domain Adaptation: Learning Bounds and Algorithms
"... This paper addresses the general problem of domain adaptation which arises in a variety of applications where the distribution of the labeled sample available somewhat differs from that of the test data. Building on previous work by BenDavid et al. (2007), we introduce a novel distance between dist ..."
Abstract

Cited by 43 (7 self)
 Add to MetaCart
This paper addresses the general problem of domain adaptation which arises in a variety of applications where the distribution of the labeled sample available somewhat differs from that of the test data. Building on previous work by BenDavid et al. (2007), we introduce a novel distance between distributions, discrepancy distance, that is tailored to adaptation problems with arbitrary loss functions. We give Rademacher complexity bounds for estimating the discrepancy distance from finite samples for different loss functions. Using this distance, we derive new generalization bounds for domain adaptation for a wide family of loss functions. We also present a series of novel adaptation bounds for large classes of regularizationbased algorithms, including support vector machines and kernel ridge regression based on the empirical discrepancy. This motivates our analysis of the problem of minimizing the empirical discrepancy for various loss functions for which we also give several algorithms. We report the results of preliminary experiments that demonstrate the benefits of our discrepancy minimization algorithms for domain adaptation. 1
Supporting network coordinates on PlanetLab
 In WORLDS
, 2005
"... Largescale distributed applications need latency information to make networkaware routing decisions. Collecting these measurements, however, can impose a high burden. Network coordinates are a scalable and efficient way to supply nodes with uptodate latency estimates. We present our experience o ..."
Abstract

Cited by 29 (2 self)
 Add to MetaCart
(Show Context)
Largescale distributed applications need latency information to make networkaware routing decisions. Collecting these measurements, however, can impose a high burden. Network coordinates are a scalable and efficient way to supply nodes with uptodate latency estimates. We present our experience of maintaining network coordinates on PlanetLab. We present two different APIs for accessing coordinates: a perapplication library, which takes advantage of applicationlevel traffic, and a standalone service, which is shared across applications. Our results show that statistical filtering of latency samples improves accuracy and stability and that a small number of neighbors is sufficient when updating coordinates. 1
An informationtheoretic approach to detecting changes in multidimensional data streams
 In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications
, 2006
"... Abstract An important problem in processing large data streams is detecting changes in the underlying distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general,informationthe ..."
Abstract

Cited by 28 (1 self)
 Add to MetaCart
(Show Context)
Abstract An important problem in processing large data streams is detecting changes in the underlying distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general,informationtheoretic approach to the change detection problem, which works for multidimensional as well as categorical data. We use relative entropy, also called the KullbackLeiblerdistance, to measure the difference between two given distributions. The KLdistance is known to be related to the optimal error in determining whether the two distributions are the sameand draws on fundamental results in hypothesis testing. The KLdistance also generalizes traditional distance measures in statistics, and has invariance properties that make it ideally suitedfor comparing distributions. Our scheme is general; it is nonparametric and requires no assumptions on the underlyingdistributions. It employs a statistical inference procedure based on the theory of bootstrapping, which allows us to determine whether our measurements are statistically significant. The schemeis also quite flexible from a practical perspective; it can be implemented using any spatial partitioning scheme that scales well with dimensionality. In addition to providing change detections,our method generalizes Kulldorff's spatial scan statistic, allowing us to quantitatively identify specific regions in space where large changes have occurred.We provide a detailed experimental study that demonstrates the generality and efficiency of our approach with different kinds of multidimensional datasets, both synthetic and real. 1 Introduction We are collecting and storing data in unprecedented quantities and varietiesstreams, images, audio, text, metadata descriptions, and even simple numbers. Over time, these data streams change as the underlying processes that generate them change. Some changes are spurious and pertain to glitches in the data. Some are genuine, caused by changes in the underlying distributions. Some changes are gradual and some are more precipitous. We would like to detect changes in a variety of settings:
On Appropriate Assumptions to Mine Data Streams: Analysis and Practice
"... Recent years have witnessed an increasing number of studies in stream mining, which aim at building an accurate model for continuously arriving data. Somehow most existing work makes the implicit assumption that the training data and the yettocome testing data are always sampled from the “same dis ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
(Show Context)
Recent years have witnessed an increasing number of studies in stream mining, which aim at building an accurate model for continuously arriving data. Somehow most existing work makes the implicit assumption that the training data and the yettocome testing data are always sampled from the “same distribution”, and yet this “same distribution” evolves over time. We demonstrate that this may not be true, and one actually may never know either “how ” or “when ” the distribution changes. Thus, a model that fits well on the observed distribution can have unsatisfactory accuracy on the incoming data. Practically, one can just assume the bare minimum that learning from observed data is better than both random guessing and always predicting exactly the same class label. Importantly, we formally and
Changepoint detection in timeseries data by direct densityratio estimation
 Proceedings of 2009 SIAM International Conference on Data Mining (SDM2009
, 2009
"... Changepoint detection is the problem of discovering time points at which properties of timeseries data change. This covers a broad range of realworld problems and has been actively discussed in the community of statistics and data mining. In this paper, we present a novel nonparametric approach ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
(Show Context)
Changepoint detection is the problem of discovering time points at which properties of timeseries data change. This covers a broad range of realworld problems and has been actively discussed in the community of statistics and data mining. In this paper, we present a novel nonparametric approach to detecting the change of probability distributions of sequence data. Our key idea is to estimate the ratio of probability densities, not the probability densities themselves. This formulation allows us to avoid nonparametric density estimation, which is known to be a difficult problem. We provide a changepoint detection algorithm based on direct densityratio estimation that can be computed very efficiently in an online manner. The usefulness of the proposed method is demonstrated through experiments using artificial and real datasets.
Impossibility Theorems for Domain Adaptation
"... The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning under such circumstances depends on similarities between the two data distributions. We study assumption ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
(Show Context)
The domain adaptation problem in machine learning occurs when the test data generating distribution differs from the one that generates the training data. It is clear that the success of learning under such circumstances depends on similarities between the two data distributions. We study assumptions about the relationship between the two distributions that one needed for domain adaptation learning to succeed. We analyze the assumptions in an agnostic PACstyle learning model for a the setting in which the learner can access a labeled training data sample and an unlabeled sample generated by the test data distribution. We focus on three assumptions: (i) similarity between the unlabeled distributions, (ii) existence of a classifier in the hypothesis class with low error on both training and testing distributions, and (iii) the covariate shift assumption. I.e., the assumption that the conditioned label distribution (for each data point) is the same for both the training and test distributions. We show that without either assumption (i) or (ii), the combination of the remaining assumptions is not sufficient to guarantee successful learning. Our negative results hold with respect to any domain adaptation learning algorithm, as long as it does not have access to target labeled examples. In particular, we provide formal proofs that the popular covariate shift assumption is rather weak and does not relieve the necessity of the other assumptions. We also discuss the intuitively appealing