Online Clustering of Processes
Cited by 12 (11 self)
The problem of online clustering is considered in the case where each data point is a sequence generated by a stationary ergodic process. Data arrive in an online fashion, so that the sample received at every time step is either a continuation of some previously received sequence or a new sequence. The dependence between the sequences can be arbitrary. No parametric or independence assumptions are made; the only assumption is that the marginal distribution of each sequence is stationary and ergodic. A novel, computationally efficient algorithm is proposed and is shown to be asymptotically consistent (under a natural notion of consistency). The performance of the proposed algorithm is evaluated on simulated data, as well as on real datasets (motion classification).
Reducing statistical time-series problems to binary classification
 in ‘Neural Information Processing Systems (NIPS)’, Lake Tahoe, Nevada, United States
, 2012
Locating Changes in Highly Dependent Data with Unknown Number of Change Points
Cited by 2 (1 self)
The problem of multiple change point estimation is considered for sequences with an unknown number of change points. A consistency framework is suggested that is suitable for highly dependent time series, and an asymptotically consistent algorithm is proposed. For consistency to be established, the only assumption required is that the data are generated by stationary ergodic time-series distributions. No modeling, independence or parametric assumptions are made; the data are allowed to be dependent and the dependence can be of arbitrary form. The theoretical result is complemented with experimental evaluations.
Nonparametric multiple change point estimation in highly dependent time series
 In Neural Information Processing Systems (NIPS), Lake Tahoe
Cited by 2 (2 self)
Given a heterogeneous time-series sample, it is required to find the points in time (called change points) where the probability distribution generating the data has changed. The data are assumed to have been generated by arbitrary, unknown, stationary ergodic distributions. No modelling, independence or mixing assumptions are made. A novel, computationally efficient, nonparametric method is proposed, and is shown to be asymptotically consistent in this general framework; the theoretical results are complemented with experimental evaluations.
A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems
Cited by 1 (0 self)
A metric between time-series distributions is proposed that can be evaluated using binary classification methods, which were originally developed to work on i.i.d. data. It is shown how this metric can be used for solving statistical problems that are seemingly unrelated to classification and concern highly dependent time series. Specifically, the problems of time-series clustering, homogeneity testing and the three-sample problem are addressed. Universal consistency of the resulting algorithms is proven under most general assumptions. The theoretical results are illustrated with experiments on synthetic and real-world data. Keywords: time series, reductions, stationary ergodic, clustering, metrics between probability distributions.
Uniform hypothesis testing for finite-valued stationary processes
, 2012
Cited by 1 (0 self)
Given a discrete-valued sample X1,...,Xn, we wish to decide whether it was generated by a distribution belonging to a family H0 or by a distribution belonging to a family H1. In this work, we assume that all distributions are stationary ergodic, and do not make any further assumptions (e.g. no independence or mixing rate assumptions). We would like to have a test whose probability of error (both Types I and II) is uniformly bounded. More precisely, we require that for each ε there exists a sample size n such that the probability of error is upper-bounded by ε for samples longer than n. We find some necessary and some sufficient conditions on H0 and H1 under which a consistent test (with this notion of consistency) exists. These conditions are topological, with respect to the topology of distributional distance.
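The uniform consistency requirement stated in this abstract can be written out as a formula. The notation below is an assumption made for illustration and is not taken from the paper: ψ_n denotes a test mapping a sample of length n to 0 (accept H0) or 1 (accept H1), and ρ ranges over the distributions in each family.

```latex
\[
\forall \varepsilon > 0 \;\; \exists N \;\; \forall n > N:\quad
\sup_{\rho \in H_0} \rho\bigl(\psi_n(X_1,\dots,X_n) = 1\bigr) \le \varepsilon
\quad\text{and}\quad
\sup_{\rho \in H_1} \rho\bigl(\psi_n(X_1,\dots,X_n) = 0\bigr) \le \varepsilon .
\]
```

The suprema are what make the bound uniform: a single sample size N must work for every distribution in H0 and H1, rather than one N per distribution.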
Asymptotically consistent estimation of the number of change points in highly dependent time series
The problem of change point estimation is considered in a general framework where the data are generated by arbitrary unknown stationary ergodic process distributions. This means that the data may have long-range dependencies of an arbitrary form. In this context the consistent estimation of the number of change points is provably impossible. A formulation is proposed which overcomes this obstacle: it is possible to find the correct number of change points at the expense of introducing the additional constraint that the correct number of process distributions that generate the data is provided. This additional parameter has a natural interpretation in many real-world applications. It turns out that in this formulation change point estimation can be reduced to time series clustering. Based on this reduction, an algorithm is proposed that finds the number of change points and locates the changes. This algorithm is shown to be asymptotically consistent. The theoretical results are complemented with empirical evaluations.
ASYMPTOTIC STATISTICAL ANALYSIS OF STATIONARY ERGODIC TIME SERIES
It is shown how to construct asymptotically consistent efficient algorithms for various statistical problems concerning stationary ergodic time series. The considered problems include clustering, hypothesis testing, change-point estimation and others. The presented approach is based on empirical estimates of the distributional distance. Some open problems are also discussed.
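The abstract does not define the distributional distance or its estimate, so the following is only a minimal sketch under stated assumptions: the samples are finite-alphabet sequences, the distance is approximated by a weighted sum of absolute differences between empirical frequencies of words of length m, the weights are 2^(-m), and the sum is truncated at a hypothetical `max_len`. The function names are illustrative.

```python
from collections import Counter


def word_frequencies(seq, m):
    """Empirical frequency of each length-m word (as a tuple) in seq."""
    n_windows = len(seq) - m + 1
    if n_windows <= 0:
        return {}
    counts = Counter(tuple(seq[i:i + m]) for i in range(n_windows))
    return {word: c / n_windows for word, c in counts.items()}


def empirical_distance(x, y, max_len=3):
    """Truncated, weighted sum of frequency differences (assumed form).

    Weights 2**(-m) and the cut-off max_len are assumptions of this
    sketch, not values taken from the abstract.
    """
    total = 0.0
    for m in range(1, max_len + 1):
        fx = word_frequencies(x, m)
        fy = word_frequencies(y, m)
        words = set(fx) | set(fy)
        diff = sum(abs(fx.get(w, 0.0) - fy.get(w, 0.0)) for w in words)
        total += 2.0 ** (-m) * diff
    return total


if __name__ == "__main__":
    alternating = [0, 1] * 50
    constant = [0] * 100
    print(empirical_distance(alternating, alternating))
    print(empirical_distance(alternating, constant))
```

Only frequencies of observed words are needed, so the cost is linear in the sample length for each word length considered.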
ONLINE CLUSTERING OF PROCESSES
Setup: We have a growing body of sequences of data. Each sequence is generated by one of k unknown discrete-time stochastic processes. The number k of distributions is known. Data are observed in an online fashion: → New samples arrive at every time step; they are either continuations of previously received sequences or new sequences.
Goal: Cluster the sequences at every time step.
CONSISTENCY In general it is hard to give a precise definition of “correct clustering”. However, a natural notion of correct clustering exists in the considered setting: sequences generated by the same process distribution should be grouped together. Asymptotic Consistency: A clustering algorithm is (asymptotically) consistent if, with probability 1, for each N ∈ ℕ, from some time on the first N observed sequences are clustered correctly.
ASSUMPTIONS ON DATA • Data are revealed in an arbitrary fashion. • Our only assumption is that the distributions generating the data are stationary ergodic. → The samples are allowed to be dependent and the dependence can be arbitrary, or even adversarial. No assumptions such as i.i.d., Markov etc. are made. Remark: In the time-series literature, it is typically assumed that the distributions generating the data have a known form, e.g. Gaussian, HMMs etc., and that the samples are independent.
MAIN THEORETICAL RESULT Theorem: There exists an online clustering algorithm that is asymptotically consistent provided that the distributions generating the data are stationary and ergodic.
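The notion of correct clustering above (sequences from the same process distribution are grouped together, with k known) can be illustrated with a small sketch. This is not the online algorithm whose existence the theorem asserts; the text here does not describe that algorithm. It is a simple offline farthest-point clustering, and the distance function, its 2^(-m) weights, the `max_len` cut-off and all names are assumptions made for illustration.

```python
from collections import Counter


def word_freqs(seq, m):
    """Empirical frequencies of length-m words in seq."""
    n = len(seq) - m + 1
    if n <= 0:
        return {}
    counts = Counter(tuple(seq[i:i + m]) for i in range(n))
    return {w: c / n for w, c in counts.items()}


def distance(x, y, max_len=3):
    """Assumed distance: weighted sum of word-frequency differences."""
    total = 0.0
    for m in range(1, max_len + 1):
        fx, fy = word_freqs(x, m), word_freqs(y, m)
        words = set(fx) | set(fy)
        total += 2.0 ** (-m) * sum(
            abs(fx.get(w, 0.0) - fy.get(w, 0.0)) for w in words
        )
    return total


def cluster(sequences, k, dist=distance):
    """Offline sketch: farthest-point centres, then nearest-centre labels.

    Requires 1 <= k <= len(sequences).
    """
    centres = [0]  # start from the first sequence
    while len(centres) < k:
        # Next centre: the sequence farthest from all centres chosen so far.
        candidates = [i for i in range(len(sequences)) if i not in centres]
        centres.append(max(
            candidates,
            key=lambda i: min(dist(sequences[i], sequences[c]) for c in centres),
        ))
    # Label each sequence by the index of its nearest centre.
    return [
        min(range(k), key=lambda j: dist(s, sequences[centres[j]]))
        for s in sequences
    ]


if __name__ == "__main__":
    data = [[0, 1] * 50, [0] * 100, [0, 1] * 40, [0] * 80]
    print(cluster(data, k=2))
```

In an online setting one would have to recompute or update the clustering as sequences grow and new ones arrive; this sketch only covers a single snapshot of the data.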