Results 1–10 of 25
Online robust PCA via stochastic optimization
 in Adv. Neural Info. Proc. Sys. (NIPS)
, 2013
Cited by 20 (2 self)
Robust PCA methods are typically based on batch optimization and have to load all the samples into memory during optimization. This prevents them from efficiently processing big data. In this paper, we develop an Online Robust PCA (ORPCA) that processes one sample per time instance and hence its memory cost is independent of the number of samples, significantly enhancing the computation and storage efficiency. The proposed ORPCA is based on stochastic optimization of an equivalent reformulation of the batch RPCA. Indeed, we show that ORPCA provides a sequence of subspace estimations converging to the optimum of its batch counterpart and hence is provably robust to sparse corruption. Moreover, ORPCA can naturally be applied to tracking dynamic subspaces. Comprehensive simulations on subspace recovery and tracking demonstrate the robustness and efficiency advantages of ORPCA over online PCA and batch RPCA methods.
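The one-sample-at-a-time update behind this idea can be sketched in a few lines. This is a rough illustration of the stochastic-optimization recipe (split each sample into a subspace part and a sparse outlier, then refresh the basis from running second-moment statistics), not the authors' exact ORPCA algorithm; the function name, the regularizers lam1/lam2, and the inner alternation count are choices made here for the sketch.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def orpca_step(z, L, A, B, lam1=0.1, lam2=0.1, n_inner=20):
    """One online robust-PCA update: project sample z onto basis L,
    split it into coefficients r and sparse outliers e, then refresh
    the basis from the accumulated statistics A, B.  Memory cost is
    independent of how many samples have been seen."""
    d, k = L.shape
    r = np.zeros(k)
    e = np.zeros(d)
    G = L.T @ L + lam1 * np.eye(k)
    for _ in range(n_inner):                     # alternate r / e
        r = np.linalg.solve(G, L.T @ (z - e))    # ridge fit of coefficients
        e = soft(z - L @ r, lam2)                # sparse outlier estimate
    A += np.outer(r, r)                          # running second moments
    B += np.outer(z - e, r)
    L = B @ np.linalg.inv(A + lam1 * np.eye(k))  # basis refresh
    return L, A, B, r, e
```

Feeding samples through `orpca_step` one by one keeps only the d×k basis and two small statistics matrices in memory, which is the point of the online formulation.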
An online algorithm for separating sparse and low-dimensional signal sequences from their sum
 IEEE Trans. Signal Process
Cited by 10 (8 self)
Abstract—This paper designs and extensively evaluates an online algorithm, called practical recursive projected compressive sensing (Prac-ReProCS), for recovering a time sequence of sparse vectors and a time sequence of dense vectors from their sum, when the dense vectors lie in a slowly changing low-dimensional subspace of the full space. A key application where this problem occurs is real-time video layering, where the goal is to separate a video sequence on the fly into a slowly changing background sequence and a sparse foreground sequence that consists of one or more moving regions/objects. Prac-ReProCS is a practical modification of its theoretical counterpart, which was analyzed in our recent work. An extension to the undersampled case is also developed. Extensive experimental comparisons demonstrating the advantage of the approach over existing batch and recursive methods, for both simulated and real videos, are shown. Index Terms—Online robust PCA, recursive sparse recovery, large but structured noise, compressed sensing.
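A caricature of one recursive-projected-CS step may help fix ideas: project the frame onto the orthogonal complement of the current background subspace (which approximately nulls the dense low-dimensional part), estimate the foreground support, and refine by least squares. Note this sketch replaces the paper's ℓ1 compressive-sensing recovery with simple thresholding, and the function name and threshold are choices made here, not the authors' code.

```python
import numpy as np

def reprocs_step(m, U, thresh):
    """One toy recursive-projected-sensing step on frame m, given an
    orthonormal basis U for the (slowly changing) background subspace.
    Thresholding stands in for the paper's sparse-recovery step."""
    d = m.shape[0]
    Phi = np.eye(d) - U @ U.T      # projector onto the complement of span(U)
    y = Phi @ m                    # ~ Phi @ s: the sparse part seen through Phi
    T = np.abs(y) > thresh         # crude foreground support estimate
    s = np.zeros(d)
    s[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]  # LS refinement on T
    return s, m - s                # foreground, background estimates
```

When the background truly lies in span(U) and the foreground entries are large relative to the threshold, the support is found exactly and the least-squares step recovers the foreground exactly.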
Decentralized sparsity-regularized rank minimization: Algorithms and applications
 IEEE Trans. Signal Process
, 2013
Cited by 7 (5 self)
Abstract—Given a limited number of entries from the superposition of a low-rank matrix plus the product of a known compression matrix times a sparse matrix, recovery of the low-rank and sparse components is a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. This paper develops algorithms for decentralized sparsity-regularized rank minimization over networks, when the nuclear and ℓ1-norms are used as surrogates to the rank and nonzero entry counts of the sought matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized processing is viable, non-separability of the singular-value sum challenges its decentralized minimization. To overcome this limitation, leveraging an alternative characterization of the nuclear norm yields a separable, yet non-convex cost minimized via the alternating-direction method of multipliers. Interestingly, if the decentralized (non-convex) estimator converges, under certain conditions it provably attains the global optimum of its centralized counterpart. As a result, this paper bridges the performance gap between centralized and in-network decentralized, sparsity-regularized rank minimization. This, in turn, facilitates (stable) recovery of the low-rank and sparse model matrices through reduced-complexity per-node computations and affordable message passing among single-hop neighbors. Several application domains are outlined to highlight the generality and impact of the proposed framework. These include unveiling traffic anomalies in backbone networks, and predicting network-wide path latencies. Simulations with synthetic and real network data confirm the convergence of the novel decentralized algorithm, and its centralized performance guarantees. Index Terms—Decentralized optimization, sparsity, nuclear norm, low rank, networks, Lasso, matrix completion.
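The "alternative characterization of the nuclear norm" invoked here is the standard factorization identity ||X||_* = min over X = PQᵀ of (||P||_F² + ||Q||_F²)/2, whose Frobenius-norm terms separate across nodes, unlike the singular-value sum. A quick numerical check of the identity (illustrative only; the optimal factors are built from the SVD):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))

# Nuclear norm directly: sum of singular values.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
nuc = s.sum()

# Balanced factorization P = U*sqrt(s), Q = V*sqrt(s) attains the bound,
# so (||P||_F^2 + ||Q||_F^2)/2 equals ||X||_* at the minimizer.
P = U * np.sqrt(s)
Q = Vt.T * np.sqrt(s)
assert np.allclose(P @ Q.T, X)             # valid factorization of X
surrogate = 0.5 * (np.linalg.norm(P, 'fro')**2 + np.linalg.norm(Q, 'fro')**2)
assert np.isclose(surrogate, nuc)          # matches the nuclear norm
```

The price of separability is non-convexity in (P, Q), which is why the abstract's convergence-implies-global-optimum guarantee is the interesting part.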
Subspace learning and imputation for streaming big data matrices and tensors
 IEEE Trans. Signal Process
, 2015
Cited by 5 (2 self)
Abstract—Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with “Big Data” analytics. However, increasingly noisy, heterogeneous, and incomplete datasets, as well as the need for real-time processing of streaming data, pose major challenges to this end. In this context, the present paper brings the benefits of rank minimization to scalable imputation of missing data, via tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from incomplete streaming data. For low-rank matrix data, a subspace estimator is proposed based on an exponentially weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, real-time algorithms with complementary strengths are developed, and their convergence is established under simplifying technical assumptions. In a stationary setting, the asymptotic estimates obtained offer the well-documented performance guarantees of the batch nuclear-norm-regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm is developed to obtain multi-way decompositions of low-rank tensors with missing entries and perform imputation as a byproduct. Simulated tests with both synthetic as well as real Internet and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed algorithms, and their superior performance relative to state-of-the-art alternatives. Index Terms—Low rank, matrix and tensor completion, missing data, streaming analytics, subspace tracking.
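The streaming-with-missing-entries setting can be illustrated with a minimal per-sample update: fit the observed entries of each incoming vector with the current basis, then correct the basis toward the residual. This is a generic incremental subspace tracker under the sketch's own choices (fixed rank, a plain gradient step with re-orthonormalization, step size `step`), not the exponentially weighted nuclear-norm estimator of the paper.

```python
import numpy as np

def subspace_step(y, omega, U, step=0.1):
    """One streaming update with missing data: regress the observed
    entries y[omega] onto the rows U[omega] of the current rank-k basis,
    form the residual on the observed entries, and take a rank-one
    gradient step on U followed by re-orthonormalization."""
    Uo = U[omega]                              # observed rows of the basis
    w, *_ = np.linalg.lstsq(Uo, y[omega], rcond=None)
    r = np.zeros(U.shape[0])
    r[omega] = y[omega] - Uo @ w               # residual where observed
    U = U + step * np.outer(r, w)              # rank-one basis correction
    return np.linalg.qr(U)[0], w               # keep the basis orthonormal
```

Run on a stream of vectors drawn from a fixed subspace, the basis drifts toward that subspace; the coefficient vector `w` doubles as the imputation of the unobserved entries via `U @ w`.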
Robust PCA with partial subspace knowledge
 in IEEE Intl. Symp. on Information Theory (ISIT)
, 2014
Cited by 5 (2 self)
Abstract—In recent work, robust principal components analysis (PCA) has been posed as the problem of recovering a low-rank matrix L and a sparse matrix S from their sum, M := L + S, and a provably exact convex optimization solution called PCP has been proposed. This work studies the following problem. Suppose that we have partial knowledge about the column space of the low-rank matrix L. Can we use this information to improve the PCP solution, i.e., allow recovery under weaker assumptions? We propose here a simple but useful modification of the PCP idea, called modified-PCP, that allows us to use this knowledge. We derive its correctness result, which shows that, when the available subspace knowledge is accurate, modified-PCP indeed requires significantly weaker incoherence assumptions than PCP. Extensive simulations are also used to illustrate this. Comparisons with PCP and other existing work are shown for a stylized real application as well. Finally, we explain how this problem naturally occurs in many applications involving time series data, i.e., in what is called the online or recursive robust PCA problem. A corollary for this case is also given.
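The PCP program referred to here, min ||L||_* + λ||S||_1 subject to M = L + S, is commonly solved by an augmented-Lagrangian alternation of singular-value thresholding and soft-thresholding. The sketch below is that generic baseline solver, without the partial-subspace modification this paper proposes; the defaults λ = 1/√max(m, n) and the μ schedule are conventional choices, not taken from the paper.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding: prox of t*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def svt(X, t):
    """Singular value thresholding: prox of t*||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, t)) @ Vt

def pcp(M, lam=None, mu=None, n_iter=300):
    """Principal Component Pursuit via a basic augmented-Lagrangian
    alternation:  min ||L||_* + lam*||S||_1  s.t.  M = L + S."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else 1.25 / np.linalg.norm(M, 2)
    Y = np.zeros_like(M)                      # dual variable
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = soft(M - L + Y / mu, lam / mu)    # sparse update
        Y = Y + mu * (M - L - S)              # dual ascent on M = L + S
        mu = min(mu * 1.05, 1e7)              # gradually tighten
    return L, S
```

Modified-PCP changes the nuclear-norm term to exploit the known part of the column space; with accurate prior knowledge this relaxes the incoherence conditions the plain program above needs.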
Giannakis, “Rank minimization for subspace tracking from incomplete data”
 in Proc. IEEE Int. Conf. on Acoust., Speech Signal Process
, 2013
Cited by 3 (1 self)
Extracting latent low-dimensional structure from high-dimensional data is of paramount importance in timely inference tasks encountered with ‘Big Data’ analytics. However, increasingly noisy, heterogeneous, and incomplete datasets as well as the need for real-time processing pose major challenges towards achieving this goal. In this context, the fresh look advocated here brings the benefits of rank minimization to tracking low-dimensional subspaces from incomplete data. Leveraging the low dimensionality of the subspace sought, a novel estimator is proposed based on an exponentially weighted least-squares criterion regularized with the nuclear norm. After recasting the non-separable nuclear norm into a form amenable to online optimization, a real-time algorithm is developed and its convergence established under simplifying technical assumptions. The novel subspace tracker can asymptotically offer the well-documented performance guarantees of the batch nuclear-norm-regularized estimator. Simulated tests with real Internet data confirm the efficacy of the proposed algorithm in tracking the traffic subspace, and its superior performance relative to state-of-the-art alternatives. Index Terms—Low rank, online algorithm, matrix completion.
Giannakis, “Load curve data cleansing and imputation via sparsity and low rank”
 IEEE Trans. Smart Grid
, 2013
Cited by 2 (1 self)
Abstract—The smart grid vision is to build an intelligent power network with an unprecedented level of situational awareness and controllability over its services and infrastructure. This paper advocates statistical inference methods to robustify power monitoring tasks against the outlier effects owing to faulty readings and malicious attacks, as well as against missing data due to privacy concerns and communication errors. In this context, a novel load cleansing and imputation scheme is developed leveraging the low intrinsic dimensionality of spatiotemporal load profiles and the sparse nature of “bad data.” A robust estimator based on principal components pursuit (PCP) is adopted, which effects a twofold sparsity-promoting regularization through an ℓ1-norm of the outliers and the nuclear norm of the nominal load profiles. Upon recasting the non-separable nuclear norm into a form amenable to decentralized optimization, a distributed (D-)PCP algorithm is developed to carry out the imputation and cleansing tasks using networked devices comprising the so-termed advanced metering infrastructure. If D-PCP converges and a qualification inequality is satisfied, the novel distributed estimator provably attains the performance of its centralized PCP counterpart, which has access to all network-wide data. Computer simulations and tests with real load curve data corroborate the convergence and effectiveness of the novel D-PCP algorithm. Index Terms—Advanced metering infrastructure, distributed algorithms, load curve cleansing and imputation, principal components pursuit, smart grid.
Big Data Analytics in Future Internet of Things
Cited by 1 (0 self)
Current research on the Internet of Things (IoT) mainly focuses on how to enable general objects to see, hear, and smell the physical world for themselves, and to connect them so they can share their observations. In this paper, we argue that being connected is not enough; beyond that, general objects should have the capability to learn, think, and understand the physical world by themselves. On the other hand, the future IoT will be highly populated by large numbers of heterogeneous networked embedded devices, which are generating massive or big data in an explosive fashion. Although there is a broad consensus on the great importance of big data analytics in IoT, to date only limited results, especially on the mathematical foundations, have been obtained. These practical needs impel us to propose a systematic tutorial on the development of effective algorithms for big data analytics in the future IoT, grouped into four classes: 1) heterogeneous data processing, 2) nonlinear data processing, 3) high-dimensional data processing, and 4) distributed and parallel data processing. We envision the presented research as a mere baby step in a potentially fruitful research direction. We hope that this article, with its interdisciplinary perspectives, will stimulate more interest in the research and development of practical and effective algorithms for specific IoT applications, to enable smart resource allocation, automatic network operation, and intelligent service provisioning. Index Terms—Internet of Things, big/massive data analytics, heterogeneous/nonlinear/high-dimensional/distributed and parallel data processing.
Generalized Low Rank Models
, 2014
Cited by 1 (1 self)
Principal components analysis (PCA) is a well-known technique for approximating a data set represented by a matrix by a low-rank matrix. Here, we extend the idea of PCA to handle arbitrary data sets consisting of numerical, Boolean, categorical, ordinal, and other data types. This framework encompasses many well-known techniques in data analysis, such as nonnegative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization. The method handles heterogeneous data sets, and leads to coherent schemes for compressing, denoising, and imputing missing entries across all data types simultaneously. It also admits a number of interesting interpretations of the low-rank factors, which allow clustering of examples or of features. We propose several parallel algorithms for fitting generalized low rank models, and describe implementations and numerical results. This manuscript is a draft. Comments sent to udell@stanford.edu are welcome.
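The simplest instance of a generalized low rank model, quadratic loss with quadratic regularization on both factors, reduces to alternating ridge regressions, each available in closed form. The sketch below shows only that special case (function name, `gamma`, and iteration count are choices made here); the framework's generality comes from swapping in other losses and regularizers per data type.

```python
import numpy as np

def glrm_quadratic(A, k, gamma=0.1, n_iter=50, seed=0):
    """Alternating ridge-regression fit of a rank-k model A ~ X @ Y:
    minimize ||A - X Y||_F^2 + gamma*(||X||_F^2 + ||Y||_F^2)
    over X (m x k) and Y (k x n), each subproblem in closed form."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    I = gamma * np.eye(k)
    for _ in range(n_iter):
        X = np.linalg.solve(Y @ Y.T + I, Y @ A.T).T   # ridge fit of X
        Y = np.linalg.solve(X.T @ X + I, X.T @ A)     # ridge fit of Y
    return X, Y
```

Replacing the squared loss column-wise with, e.g., a hinge or ordinal loss gives the Boolean and ordinal variants the abstract describes, at the cost of losing the closed-form updates.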