Results 1-10 of 27
M.: Change-point detection in time-series data by relative density-ratio estimation. arXiv 1203.0453, 2012
"... Abstract. The objective of changepoint detection is to discover abrupt property changes lying behind timeseries data. In this paper, we present a novel statistical changepoint detection algorithm that is based on nonparametric divergence estimation between two retrospective segments. Our method u ..."
Abstract

Cited by 20 (13 self)
 Add to MetaCart
(Show Context)
Abstract. The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm that is based on non-parametric divergence estimation between two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, and it is accurately and efficiently estimated by a method of direct density-ratio estimation. Through experiments on real-world human-activity sensing, speech, and Twitter datasets, we demonstrate the usefulness of the proposed method.
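The direct density-ratio idea in this abstract is compact enough to sketch. Below is a minimal RuLSIF-style illustration, not the authors' implementation: the kernel bandwidth `sigma`, regularizer `lam`, and window sizes are illustrative choices (the paper selects hyper-parameters by cross-validation), and the change score is the estimated alpha-relative Pearson divergence between two adjacent sliding windows.

```python
import numpy as np

def rulsif_pe_divergence(x_nu, x_de, alpha=0.1, sigma=1.0, lam=0.1):
    """Estimate the alpha-relative Pearson divergence PE_alpha(P_nu || P_de)
    by direct relative density-ratio estimation (RuLSIF-style sketch)."""
    centers = x_nu                       # Gaussian kernel centers at numerator samples
    def K(x):                            # (len(x), n) kernel design matrix
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    K_nu, K_de = K(x_nu), K(x_de)
    n = len(centers)
    H = alpha * (K_nu.T @ K_nu) / len(x_nu) + (1 - alpha) * (K_de.T @ K_de) / len(x_de)
    h = K_nu.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(n), h)   # ridge-regularized LS fit
    r_nu, r_de = K_nu @ theta, K_de @ theta           # fitted ratio values
    # Plug-in estimate of the alpha-relative Pearson divergence
    return (-alpha / 2 * (r_nu ** 2).mean()
            - (1 - alpha) / 2 * (r_de ** 2).mean()
            + r_nu.mean() - 0.5)

# Change-point score: divergence between two adjacent retrospective windows
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
before = series[150:200, None]                    # window just before the change at t=200
after_ = series[200:250, None]                    # window just after
far1, far2 = series[50:100, None], series[100:150, None]  # both pre-change
score_change = rulsif_pe_divergence(after_, before)
score_nochange = rulsif_pe_divergence(far2, far1)
```

The score spikes across the simulated change at t=200 and stays near zero between two pre-change windows.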
No Bias Left Behind: Covariate Shift Adaptation for Discriminative 3D Pose Estimation
"... Abstract. Discriminative,or(structured)prediction,methodshaveproved effective for variety of problems in computer vision; a notable exampleis 3Dmonocular pose estimation. Allmethodstodate, however,relied on an assumption thattraining (source) and test (target) datacome from thesameunderlyingjointdis ..."
Abstract

Cited by 8 (3 self)
 Add to MetaCart
(Show Context)
Abstract. Discriminative, or (structured) prediction, methods have proved effective for a variety of problems in computer vision; a notable example is 3D monocular pose estimation. All methods to date, however, relied on an assumption that training (source) and test (target) data come from the same underlying joint distribution. In many real cases, including standard datasets, this assumption is flawed. In the presence of training set bias, the learning results in a biased model whose performance degrades on the (target) test set. Under the assumption of covariate shift we propose an unsupervised domain adaptation approach to address this problem. The approach takes the form of training instance reweighting, where the weights are assigned based on the ratio of training and test marginals evaluated at the samples. Learning with the resulting weighted training samples alleviates the bias in the learned models. We show the efficacy of our approach by proposing weighted variants of Kernel Regression (KR) and Twin Gaussian Processes (TGP). We show that our weighted variants outperform their unweighted counterparts and improve on the state-of-the-art performance in the public (HumanEva) dataset.
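The training-instance-reweighting scheme described here can be sketched for a toy regression problem. The sketch below is illustrative, not the paper's KR/TGP variants: it uses naive ratio-of-KDEs weights (the density-ratio literature argues direct ratio estimation is more stable than this two-step route) and a closed-form weighted linear fit.

```python
import numpy as np

def kde_ratio_weights(x_tr, x_te, bw=0.5):
    """Naive importance weights w(x) = p_te(x) / p_tr(x) from two Gaussian KDEs.
    The shared normalization constant cancels in the ratio."""
    def kde(points, centers):
        d2 = (points[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2 * bw ** 2)).mean(axis=1)
    return kde(x_tr, x_te) / (kde(x_tr, x_tr) + 1e-12)

def weighted_linear_fit(x, y, w, lam=1e-6):
    """Closed-form importance-weighted ridge fit of y ~ a*x + b."""
    X = np.column_stack([x, np.ones_like(x)])
    A = X.T @ (w[:, None] * X) + lam * np.eye(2)
    return np.linalg.solve(A, X.T @ (w * y))

rng = np.random.default_rng(1)
x_tr = rng.normal(0.0, 1.0, 500)          # training inputs centered at 0
x_te = rng.normal(1.5, 0.3, 500)          # test inputs shifted right
f = lambda x: x ** 2                      # true curve; the linear model is biased
y_tr = f(x_tr) + 0.1 * rng.normal(size=500)

w = kde_ratio_weights(x_tr, x_te)
a_u, b_u = weighted_linear_fit(x_tr, y_tr, np.ones_like(x_tr))  # unweighted
a_w, b_w = weighted_linear_fit(x_tr, y_tr, w)                   # shift-adjusted
err_u = np.mean((a_u * x_te + b_u - f(x_te)) ** 2)
err_w = np.mean((a_w * x_te + b_w - f(x_te)) ** 2)
```

Reweighting pulls the linear fit toward the test region, sharply reducing test error under the training-set bias.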
Direct Divergence Approximation between Probability Distributions and Its Applications in Machine Learning
"... Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes such as twosample homogeneity testing, changepoint detection, and class ..."
Abstract

Cited by 7 (7 self)
 Add to MetaCart
(Show Context)
Abstract. Approximating a divergence between two probability distributions from their samples is a fundamental challenge in statistics, information theory, and machine learning. A divergence approximator can be used for various purposes such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of a divergence between the joint distribution and the product of marginals can be used for independence testing, which has a wide range of applications including feature selection and extraction, clustering, object matching, independent component analysis, and causal direction estimation. In this paper, we review recent advances in divergence approximation. Our emphasis is that directly approximating the divergence without estimating probability distributions is more sensible than a naive two-step approach of first estimating probability distributions and then approximating the divergence. Furthermore, despite the overwhelming popularity of the Kullback-Leibler divergence as a divergence measure, we argue that alternatives such as the Pearson divergence, the relative Pearson divergence, and the L2-distance are more useful in practice because of their computationally efficient approximability, high numerical stability, and superior robustness against outliers.
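The "direct" strategy this abstract advocates is easy to sketch for the Pearson divergence: fit the ratio r = p/q by regularized least squares against a kernel basis, with no intermediate density estimates. Hyper-parameters `sigma` and `lam` below are illustrative, not tuned; in practice they would be chosen by cross-validation.

```python
import numpy as np

def pearson_divergence(x_p, x_q, sigma=1.0, lam=0.1):
    """Direct plug-in estimate of PE(p||q) = (1/2) * Int q(x) (p(x)/q(x) - 1)^2 dx.
    The ratio r = p/q is fitted by regularized least squares on a Gaussian
    kernel basis; no separate density estimates are ever formed."""
    centers = x_p[:100]                      # subset of numerator samples as centers
    def K(x):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    K_p, K_q = K(x_p), K(x_q)
    H = K_q.T @ K_q / len(x_q)               # empirical (1/2) Int q r^2 term
    h = K_p.mean(axis=0)                     # empirical Int p r term
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    r_p, r_q = K_p @ theta, K_q @ theta
    return r_p.mean() - 0.5 * (r_q ** 2).mean() - 0.5

rng = np.random.default_rng(2)
same_a = rng.normal(0, 1, (500, 1))
same_b = rng.normal(0, 1, (500, 1))
shifted = rng.normal(1.0, 1, (500, 1))
pe_same = pearson_divergence(same_a, same_b)   # near zero for identical distributions
pe_diff = pearson_divergence(shifted, same_b)  # clearly positive under a shift
```

The same estimate serves directly as a two-sample homogeneity statistic, one of the uses the abstract lists.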
Robust learning under uncertain test distributions: Relating covariate shift to model misspecification
In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014
"... Abstract Many learning situations involve learning the conditional distribution ppyxq when the training instances are drawn from the training distribution p tr pxq, even though it will later be used to predict for instances drawn from a different test distribution p te pxq. Most current approaches ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Many learning situations involve learning the conditional distribution p(y|x) when the training instances are drawn from the training distribution p_tr(x), even though it will later be used to predict for instances drawn from a different test distribution p_te(x). Most current approaches focus on learning how to reweight the training examples, to make them resemble the test distribution. However, reweighting does not always help, because (we show that) the test error also depends on the correctness of the underlying model class. This paper analyses this situation by viewing the problem of learning under changing distributions as a game between a learner and an adversary. We characterize when such reweighting is needed, and also provide an algorithm, robust covariate shift adjustment (RCSA), that provides relevant weights. Our empirical studies, on UCI datasets and a real-world cancer prognostic prediction dataset, show that our analysis applies, and that our RCSA works effectively.
B-test: A non-parametric, low variance kernel two-sample test
In Advances in Neural Information Processing Systems, 2013
"... We propose a family of maximum mean discrepancy (MMD) kernel twosample tests that have low sample complexity and are consistent. The test has a hyperparameter that allows one to control the tradeoff between sample complexity and computational time. Our family of tests, which we denote as Btests, ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Abstract. We propose a family of maximum mean discrepancy (MMD) kernel two-sample tests that have low sample complexity and are consistent. The test has a hyperparameter that allows one to control the trade-off between sample complexity and computational time. Our family of tests, which we denote as B-tests, is both computationally and statistically efficient, combining favorable properties of previously proposed MMD two-sample tests. It does so by better leveraging samples to produce low variance estimates in the finite sample case, while avoiding a quadratic number of kernel evaluations and complex null-hypothesis approximation as would be required by tests relying on one-sample U-statistics. The B-test uses a smaller than quadratic number of kernel evaluations and avoids completely the computational burden of complex null-hypothesis approximation, while maintaining consistency and probabilistically conservative thresholds on Type I error. Finally, recent results of combining multiple kernels transfer seamlessly to our hypothesis test, allowing a further increase in discriminative power and decrease in sample complexity.
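The block-averaged statistic can be sketched in a few lines. This is a minimal illustration, not the paper's calibrated procedure: it uses a Gaussian kernel with a fixed, untuned bandwidth, and the standard error over blocks stands in for the asymptotically normal null calibration.

```python
import numpy as np

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased MMD^2 estimate between equal-size samples x, y (Gaussian kernel)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    n = len(x)
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    off = ~np.eye(n, dtype=bool)                 # drop diagonal (i = j) terms
    return kxx[off].mean() + kyy[off].mean() - 2 * kxy.mean()

def b_test_stat(x, y, block_size):
    """B-test statistic: average of unbiased MMD^2 over disjoint blocks.
    Each block costs O(block_size^2) kernel evaluations, so the whole
    statistic is sub-quadratic in the total sample size."""
    n = (len(x) // block_size) * block_size
    stats = [mmd2_unbiased(x[i:i + block_size], y[i:i + block_size])
             for i in range(0, n, block_size)]
    # The average of independent block statistics is asymptotically normal
    # under H0, so a z-score style threshold applies.
    return np.mean(stats), np.std(stats, ddof=1) / np.sqrt(len(stats))

rng = np.random.default_rng(4)
p = rng.normal(0, 1, (500, 1))
q_same = rng.normal(0, 1, (500, 1))
q_diff = rng.normal(0.5, 1, (500, 1))
stat_same, se_same = b_test_stat(p, q_same, block_size=50)
stat_diff, se_diff = b_test_stat(p, q_diff, block_size=50)
```

Shrinking the block size toward pairs recovers the linear-time test; growing it toward the full sample recovers the quadratic-time U-statistic, which is the trade-off the hyperparameter controls.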
B-tests: Low Variance Kernel Two-Sample Tests
 in "Neural Information Processing Systems", Lake Tahoe, United States
, 2013
"... A family of maximum mean discrepancy (MMD) kernel twosample tests is introduced. Members of the test family are called Blocktests or Btests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the tradeoff between test ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
(Show Context)
Abstract. A family of maximum mean discrepancy (MMD) kernel two-sample tests is introduced. Members of the test family are called Block-tests or B-tests, since the test statistic is an average over MMDs computed on subsets of the samples. The choice of block size allows control over the trade-off between test power and computation time. In this respect, the B-test family combines favorable properties of previously proposed MMD two-sample tests: B-tests are more powerful than a linear time test where blocks are just pairs of samples, yet they are more computationally efficient than a quadratic time test where a single large block incorporating all the samples is used to compute a U-statistic. A further important advantage of the B-tests is their asymptotically Normal null distribution: this is by contrast with the U-statistic, which is degenerate under the null hypothesis, and for which estimates of the null distribution are computationally demanding. Recent results on kernel selection for hypothesis testing transfer seamlessly to the B-tests, yielding a means to optimize test power via kernel choice.
GeSmart: A Gestural Activity Recognition Model for Predicting Behavioral Health
"... Abstract—To promote independent living for elderly population activity recognition based approaches have been investigated deeply to infer the activities of daily living (ADLs) and instrumental activities of daily living (IADLs). Deriving and integrating the gestural activities (such as talking, ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Abstract. To promote independent living for the elderly population, activity recognition based approaches have been investigated in depth to infer activities of daily living (ADLs) and instrumental activities of daily living (IADLs). Deriving and integrating gestural activities (such as talking, coughing, and deglutition) along with activity recognition approaches can not only help identify the daily activities or social interaction of older adults but also provide unique insights into their long-term health care, wellness management, and ambulatory conditions. Gestural activities (GAs), in general, help identify fine-grained physiological symptoms and chronic psychological conditions which are not directly observable from traditional activities of daily living. In this paper, we propose GeSmart, an energy-efficient wearable smart earring based GA recognition model for detecting a combination of speech and non-speech events. To capture the GAs we propose to use only the accelerometer sensor inside our smart earring, due to its energy-efficient operation and ubiquitous presence in everyday wearable devices. We present initial results and insights based on a C4.5 classification algorithm to infer the infrequent GAs. Subsequently, we propose a novel change-point detection based hybrid classification method exploiting the emerging patterns in a variety of GAs to detect and infer infrequent GAs. Experimental results based on real data traces collected from 10 users demonstrate that this approach improves the accuracy of GA classification by over 23%, compared to previously proposed pure classification-based solutions. We also note that the accelerometer-based earrings are surprisingly informative and energy efficient (by 2.3 times) for identifying different types of GAs. Keywords: smart jewelry, behavioral health, change-point detection, energy efficiency, cognitive computing.
Kernel Methods for Unsupervised Domain Adaptation
Ph.D. thesis, 2015
"... This thesis concludes a wonderful fouryear journey at USC. I would like to take the chance to express my sincere gratitude to my amazing mentors and friends during my Ph.D. training. First and foremost I would like to thank my adviser, Prof. Fei Sha, without whom there would be no single page of th ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Abstract. This thesis concludes a wonderful four-year journey at USC. I would like to take the chance to express my sincere gratitude to my amazing mentors and friends during my Ph.D. training. First and foremost I would like to thank my adviser, Prof. Fei Sha, without whom there would be no single page of this thesis. Fei is smart, knowledgeable, and inspiring. Being truly fortunate, I got an enormous amount of guidance and support from him, financially, academically, and emotionally. He consistently and persuasively conveyed the spirit of adventure in research and academia, which I appreciate very much and from which my interest in trying out the faculty life started. On one hand, Fei is tough and sets a high standard for my research at "home", the TEDS lab he leads. On the other hand, Fei is enthusiastically supportive when I reach out to conferences and the job market. These combined make a wonderful mix. I cherish every mind-blowing discussion with him, which sometimes lasted for hours. I would like to thank our long-term collaborator, Prof. Kristen Grauman, whom I see as my other academic adviser. Like Fei, she has set such a great model for me to follow on the road of becoming a good researcher. She is a deep thinker, a fantastic writer, and a hardworking professor. I will never forget how she praised our good work, how she hesitated on my poor ...
Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation
Letter, communicated by Yongjia Song
"... We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we directly learn the network structure change by estimating the ratio of Ma ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract. We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we directly learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which highly contributes to enhancing interpretability. Furthermore, computation of the normalization term, a critical bottleneck of the naive approach, can be remarkably mitigated. We also give the dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method.
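For Gaussian Markov networks with quadratic (pairwise) features, the density-ratio formulation can be sketched directly. The following is a minimal KLIEP-style proximal-gradient illustration, not the letter's algorithm or its dual: `lam`, `lr`, and `steps` are illustrative hyper-parameters, and the normalization term is estimated empirically on the reference sample.

```python
import numpy as np
from itertools import combinations

def pairwise_feats(X):
    """Quadratic Markov-network features: x_i^2 and x_i * x_j for i < j."""
    pairs = list(combinations(range(X.shape[1]), 2))
    diag = X ** 2
    offd = np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1)
    return np.hstack([diag, offd]), pairs

def sparse_ratio_change(Xp, Xq, lam=0.05, lr=0.05, steps=500):
    """Fit log(p/q)(x) = theta . phi(x) - log Z(theta), with Z estimated on
    the Q sample, by L1-penalized proximal gradient ascent. Nonzero entries
    of theta flag the factors that changed between the two networks."""
    Fp, pairs = pairwise_feats(Xp)
    Fq, _ = pairwise_feats(Xq)
    theta = np.zeros(Fp.shape[1])
    for _ in range(steps):
        s = Fq @ theta
        wq = np.exp(s - s.max())
        wq /= wq.sum()                        # softmax weights on Q samples
        grad = Fp.mean(axis=0) - wq @ Fq      # gradient of mean_P log r(x; theta)
        theta += lr * grad
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
    return theta, pairs

rng = np.random.default_rng(5)
d = 4
Xq = rng.normal(size=(500, d))                       # reference: independent dims
cov = np.eye(d); cov[0, 1] = cov[1, 0] = 0.8         # single changed interaction
Xp = rng.multivariate_normal(np.zeros(d), cov, size=500)
theta, pairs = sparse_ratio_change(Xp, Xq)
changed_edge = pairs[int(np.argmax(np.abs(theta[d:])))]   # expected: (0, 1)
```

Only the ratio is ever normalized, on the observed Q sample, which is the point the abstract makes about avoiding each network's own intractable normalization term.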