
## Dynamic anomalography: Tracking network anomalies via sparsity and low rank (2013)

Citations: 22 (10 self)

### Citations

3461 | The Elements of Statistical Learning - Hastie, Tibshirani, et al. - 2001 |

1430 | An introduction to compressive sampling
- Candès, Wakin
- 2008
Citation Context ...n the low-rank property of the traffic matrix and the sparsity of the anomalies, the fresh look advocated here permeates benefits from rank minimization [8], [9], [11], and compressive sampling [10], [12], to perform dynamic anomalography. The aim is to construct a map of network anomalies in real time, that offers a succinct depiction of the network ‘health state’ across both the flow and time dimens... |

1385 | Decoding by linear programming
- Candes, Tao
- 2005
Citation Context ...zing on the low-rank property of the traffic matrix and the sparsity of the anomalies, the fresh look advocated here permeates benefits from rank minimization [8], [9], [11], and compressive sampling [10], [12], to perform dynamic anomalography. The aim is to construct a map of network anomalies in real time, that offers a succinct depiction of the network ‘health state’ across both the flow and time ... |

1049 | A fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
Citation Context ...implifying technical assumptions in Section IV-B. For situations where reducing computational complexity is critical, an online stochastic gradient algorithm based on Nesterov’s acceleration technique [5], [29] is developed as well (Section V-A). The possibility of implementing the anomaly trackers in a distributed fashion is further outlined in Section V-B, where several directions for future researc... |

864 | Exact matrix completion via convex optimization
- Candes, Recht
- 2009
Citation Context ...to the measurement horizon. Capitalizing on the low-rank property of the traffic matrix and the sparsity of the anomalies, the fresh look advocated here permeates benefits from rank minimization [8], [9], [11], and compressive sampling [10], [12], to perform dynamic anomalography. The aim is to construct a map of network anomalies in real time, that offers a succinct depiction of the network ‘health ... |

561 | Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization
- Recht, Fazel, et al.
- 2007
Citation Context ...NP-hard to optimize [16], [28]. Typically, the nuclear norm ‖X‖∗ and the ℓ1-norm ‖A‖1 are adopted as surrogates, since they are the closest convex approximants to rank(X) and ‖A‖0, respectively [10], [30], [36]. Accordingly, one solves (P1) min_{X,A} (1/2)‖PΩ(Y − X − RA)‖2F + λ∗‖X‖∗ + λ1‖A‖1 (4) where λ∗, λ1 ≥ 0 are rank- and sparsity-controlling parameters. When an estimate σ̂2v of the noise variance is ... |
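The surrogates quoted in this context are both handled by simple proximal operators: entrywise soft-thresholding for the ℓ1-norm, and soft-thresholding of singular values for the nuclear norm. A minimal numerical sketch (illustrative, not code from the cited paper):

```python
import numpy as np

def soft_threshold(A, tau):
    # Proximal operator of tau * ||A||_1: shrink each entry toward zero
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(X, tau):
    # Proximal operator of tau * ||X||_*: soft-threshold the singular values
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt
```

These two maps are the building blocks of essentially every convex solver for problems of the form (P1).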

560 | Robust principal component analysis
- Candès, Li, et al.
- 2011
Citation Context ...tive to the measurement horizon. Capitalizing on the low-rank property of the traffic matrix and the sparsity of the anomalies, the fresh look advocated here permeates benefits from rank minimization [8], [9], [11], and compressive sampling [10], [12], to perform dynamic anomalography. The aim is to construct a map of network anomalies in real time, that offers a succinct depiction of the network ‘he... |

554 | A singular value thresholding algorithm for matrix completion
- Cai, Candès, et al.
- 2010
Citation Context ...ly solved by alternating minimization over A (which corresponds to Lasso) and X. The minimizations with respect to X can be carried out using the iterative singular-value thresholding (SVT) algorithm [7]. Note that with full data, SVT requires only a single SVD computation. In the presence of missing data however, the SVT algorithm may require several SVD computations until convergence, rendering the... |
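The alternating scheme in this context has closed-form block updates in the simplest setting. A toy sketch under the simplifying assumptions of full data (Ω covers every entry) and R = I, so the X-update is one SVT and the A-update is elementwise soft-thresholding (illustrative, not the paper's solver):

```python
import numpy as np

def soft_threshold(A, tau):
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def alternating_pcp(Y, lam_star, lam1, iters=50):
    # Alternate exact block minimizers of
    #   (1/2)||Y - X - A||_F^2 + lam_star*||X||_* + lam1*||A||_1
    X = np.zeros_like(Y)
    A = np.zeros_like(Y)
    for _ in range(iters):
        X = svt(Y - A, lam_star)          # nuclear-norm prox of the residual
        A = soft_threshold(Y - X, lam1)   # l1 prox of the residual
    return X, A
```

Since each block update is an exact minimizer, the objective is monotonically non-increasing; with missing data the X-step loses this closed form, which is the complexity issue the context points out.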

553 | Sparse approximate solutions to linear systems
- Natarajan
- 1995
Citation Context ... minimize the rank of X, and the number of nonzero entries of A measured by its ℓ0-(pseudo) norm. Unfortunately, albeit natural, both rank and ℓ0-norm criteria are in general NP-hard to optimize [16], [28]. Typically, the nuclear norm ‖X‖∗ and the ℓ1-norm ‖A‖1 are adopted as surrogates, since they are the closest convex approximants to rank(X) and ‖A‖0, respectively [10], [30], [36]. Accordingly, one s... |

475 | Just relax: convex programming methods for identifying sparse signals in noise
- Tropp
- 2006
Citation Context ...d to optimize [16], [28]. Typically, the nuclear norm ‖X‖∗ and the ℓ1-norm ‖A‖1 are adopted as surrogates, since they are the closest convex approximants to rank(X) and ‖A‖0, respectively [10], [30], [36]. Accordingly, one solves (P1) min_{X,A} (1/2)‖PΩ(Y − X − RA)‖2F + λ∗‖X‖∗ + λ1‖A‖1 (4) where λ∗, λ1 ≥ 0 are rank- and sparsity-controlling parameters. When an estimate σ̂2v of the noise variance is availa... |

362 | Diagnosing network-wide traffic anomalies
- Lakhina, Crovella, et al.
- 2004
Citation Context ... flows, and periodic behavior across time [21], [41]. Exploiting the low-rank structure of the anomaly-free traffic matrix, a landmark principal component analysis (PCA)-based method was put forth in [20] to identify network anomalies; see also [27] for a distributed implementation. A limitation of the algorithm in [20] is that it cannot identify the anomalous flows. Most importantly, [20] has not exp... |

346 | Fundamentals of Adaptive Filtering - Sayed - 2003 |

325 | Online learning for matrix factorization and sparse coding
- Mairal, Bach, et al.
Citation Context ...ession matrices F[t]Rt, and thus compromise the uniqueness of the Lasso solutions. This also increases the likelihood that ∇2Ĉt(P) = λ∗t I_{Lρ} + (1/t)∑_{τ=1}^{t} (q[τ]q′[τ]) ⊗ Ωτ ⪰ c I_{Lρ} holds. As argued in [23], if needed one could incorporate additional regularization terms in the cost function to enforce a4) and a5). Before moving on to the proof, a remark is in order. Remark 3 (Performance guarantees): I... |

296 | Convergence of a block coordinate descent method for nondifferentiable minimization - Tseng |

296 | Asymptotic Statistics - van der Vaart - 2000 |

295 | A method of solving a convex programming problem with convergence rate O(1/k²)
- Nesterov
- 1983
Citation Context ...fying technical assumptions in Section IV-B. For situations where reducing computational complexity is critical, an online stochastic gradient algorithm based on Nesterov’s acceleration technique [5], [29] is developed as well (Section V-A). The possibility of implementing the anomaly trackers in a distributed fashion is further outlined in Section V-B, where several directions for future research are ... |
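The Nesterov acceleration cited here amounts to taking the proximal-gradient step at an extrapolated point and updating a momentum scalar, which yields the O(1/k²) rate in the title. A FISTA-style sketch (generic and illustrative; the function names and the test problem are not from the paper):

```python
import numpy as np

def fista(grad, prox, x0, L, iters=200):
    # Accelerated proximal gradient: O(1/k^2) objective decay for convex problems.
    # grad: gradient of the smooth part; prox(v, step): prox of the nonsmooth part;
    # L: Lipschitz constant of grad.
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x_new = prox(y - grad(y) / L, 1.0 / L)         # forward-backward step at y
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```

With `prox` set to the identity this reduces to Nesterov's accelerated gradient method for smooth minimization.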

253 | Matrix completion with noise
- Candes, Plan
- 2010
Citation Context ...e measurement horizon. Capitalizing on the low-rank property of the traffic matrix and the sparsity of the anomalies, the fresh look advocated here permeates benefits from rank minimization [8], [9], [11], and compressive sampling [10], [12], to perform dynamic anomalography. The aim is to construct a map of network anomalies in real time, that offers a succinct depiction of the network ‘health state’... |

231 | Projection approximation subspace tracking
- Yang
- 1995
Citation Context ...l gradient descent on the Grassmannian manifold of subspaces was put forth in [4]. The second-order RLS-type algorithm in [15] extends the seminal projection approximation subspace tracking algorithm [39] to handle missing data. When outliers are present, robust counterparts can be found in [14], [18], [26]. Relative to all aforementioned works, the estimation problem here is more challenging due to t... |

225 | Rank-sparsity incoherence for matrix decomposition
- Chandrasekaran, Sanghavi, et al.
Citation Context ...ling (CS); see e.g., [10] and the tutorial account [12]. The decomposition Y = X+A corresponds to principal component pursuit (PCP), also referred to as robust principal component analysis (PCA) [8], [13]. PCP was adopted for network anomaly detection using flow (not link traffic) measurements in [2]. For the idealized noise-free setti... |

165 | Structural Analysis of Network Traffic Flows. - Lakhina, Papagiannaki, et al. - 2004 |

98 | Anomaly Detection in IP Networks.
- Thottan, Ji
- 2003
Citation Context ... which in this case remains invariant during the three weeks of data acquisition [1]. Even though Y is “constructed” here from flow measurements, link loads can be typically acquired from SNMP traces [35]. The available OD flows are incomplete due to problems in the data collection process. In addition, flows can be modeled as the superposition of “clean” plus anomalous traffic, i.e., the sum of some ... |

92 | Compressive principal component pursuit
- Wright, Ganesh, et al.
- 2013
Citation Context ... λ∗‖X‖∗ + λ1‖A‖1 (4) where λ∗, λ1 ≥ 0 are rank- and sparsity-controlling parameters. When an estimate σ̂2v of the noise variance is available, guidelines for selecting λ∗ and λ1 have been proposed in [42]. Being convex, (P1) is appealing, and it is known to attain good performance in theory and practice [25]. Also (3) and its estimator (P1) are quite general, as discussed in the ensuing remark. Remark ... |

82 | Network anomography
- Zhang, Ge, et al.
- 2005
Citation Context ...lies, as well as their tracking capabilities when traffic routes are slowly time-varying, and the network monitoring station acquires incomplete link traffic measurements (Section VI). Different from [40] which employs a two-step batch procedure to learn the nominal traffic subspace first, and then unveil anomalies via ℓ1-norm minimization, the approach here estimates both quantities jointly and attai... |

78 | Online identification and tracking of subspaces from highly incomplete information.
- Balzano, Nowak, et al.
- 2010
Citation Context ...PΩt(Pqt + at + vt), t = 1, 2, . . .. In the absence of sparse ‘outliers’ {at}∞t=1, an online algorithm based on incremental gradient descent on the Grassmannian manifold of subspaces was put forth in [4]. The second-order RLS-type algorithm in [15] extends the seminal projection approximation subspace tracking algorithm [39] to handle missing data. When outliers are present, robust counterparts can b... |

78 | Spatio-temporal compressive sensing and Internet traffic matrices
- Zhang, Roughan, et al.
- 2009
Citation Context ...e, that is, the intuitive low-rank property of the traffic matrix in the absence of anomalies, which is mainly due to common temporal patterns across OD flows, and periodic behavior across time [21], [41]. Exploiting the low-rank structure of the anomaly-free traffic matrix, a landmark principal component analysis (PCA)-based method was put forth in [20] to identify network anomalies; see also [27] fo... |

72 | Parallel stochastic gradient algorithms for large-scale matrix completion. Optimization Online
- Recht, Ré
- 2011
Citation Context ... A. A separable low-rank regularization. To address (c2) [along with (c3) as it will become clear in Section IV], consider the following alternative characterization of the nuclear norm [30], [31]: ‖X‖∗ := min_{P,Q} (1/2){‖P‖2F + ‖Q‖2F}, s.t. X = PQ′. (5) The optimization (5) is over all possible bilinear factorizations of X, so that the number of columns ρ of P and Q is also a variable. Lever... |
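The variational characterization (5) quoted in this context is tight at the SVD-based factorization P = UΣ^{1/2}, Q = VΣ^{1/2}, where there the half-sum of squared Frobenius norms equals the nuclear norm exactly. A quick numerical check (illustrative, with a randomly generated X):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5)) @ rng.standard_normal((5, 4))  # generic 6x4 matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U * np.sqrt(s)        # P = U * Sigma^{1/2} (scales columns of U)
Q = Vt.T * np.sqrt(s)     # Q = V * Sigma^{1/2}, so X = P @ Q.T

nuclear = s.sum()                                                  # ||X||_*
half_frob = 0.5 * (np.linalg.norm(P, "fro") ** 2
                   + np.linalg.norm(Q, "fro") ** 2)
# At this factorization the minimum in (5) is attained: nuclear == half_frob
```

This is what makes (5) useful for separable, parallelizable solvers: the nonsmooth nuclear norm is traded for smooth Frobenius penalties on the factors.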

70 | Adaptive Signal Processing Algorithms: Stability and Performance
- Solo, Kong
- 1995
Citation Context ...a new datum is acquired, anomaly estimates are formed via the least-absolute shrinkage and selection operator (Lasso), e.g., [17, p. 68], and the low-rank nominal traffic subspace is refined using RLS [34]. Convergence analysis is provided under simplifying technical assumptions in Section IV-B. For situations where reducing computational complexity is critical, an online stochastic gradient algorithm b... |
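The RLS refinement mentioned in this context follows the classical exponentially weighted recursion. A scalar-output sketch with forgetting factor beta (illustrative names; this is the textbook RLS update, not the paper's subspace tracker):

```python
import numpy as np

def rls_step(w, Pinv, x, y, beta=0.99):
    # One exponentially weighted RLS update for the linear model y ~ w @ x.
    # Pinv tracks the (scaled) inverse of the input correlation matrix.
    Px = Pinv @ x
    k = Px / (beta + x @ Px)          # gain vector
    e = y - w @ x                     # a priori prediction error
    w = w + k * e                     # weight update
    Pinv = (Pinv - np.outer(k, Px)) / beta
    return w, Pinv
```

With beta = 1 and noiseless data the recursion converges to the least-squares solution; beta < 1 discounts old data, which is what enables tracking a slowly varying subspace.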

61 | Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions - Agarwal, Negahban, et al. - 2012 |

41 | Complexity of quantifier elimination in the theory of algebraically closed fields.
- Chistov, Grigor’ev
- 1984
Citation Context ...ell as minimize the rank of X, and the number of nonzero entries of A measured by its ℓ0-(pseudo) norm. Unfortunately, albeit natural, both rank and ℓ0-norm criteria are in general NP-hard to optimize [16], [28]. Typically, the nuclear norm ‖X‖∗ and the ℓ1-norm ‖A‖1 are adopted as surrogates, since they are the closest convex approximants to rank(X) and ‖A‖0, respectively [10], [30], [36]. Accordingly,... |

33 | Incremental gradient on the grassmannian for online foreground and background separation in subsampled video
- He, Balzano, et al.
- 2012
Citation Context ...r RLS-type algorithm in [15] extends the seminal projection approximation subspace tracking algorithm [39] to handle missing data. When outliers are present, robust counterparts can be found in [14], [18], [26]. Relative to all aforementioned works, the estimation problem here is more challenging due to the presence of the fat (compression) matrix Rt; see [25] for fundamental identifiability issues re... |

26 | Multivariate online anomaly detection using kernel recursive least squares
- Ahmed, Coates, et al.
- 2007
Citation Context ...ons (Section II). Special focus will be placed on devising online (adaptive) algorithms that are capable of efficiently processing link measurements and tracking network anomalies ‘on the fly’; see also [3] for a ‘model-free’ approach that relies on the kernel recursive LS (RLS) algorithm. Accordingly, the novel online estimator entails an exponentially-weighted least-squares (LS) cost regularized with ... |

18 | Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies - Mardani, Mateos, Giannakis - 2013 |

16 | Recursive sparse recovery in large but correlated noise
- Qiu, Vaswani
- 2011
Citation Context ...iability of {X,A}; see [25] for early results. Going back to the CS paradigm, even when X is nonzero one could envision a variant where the measurements are corrupted with correlated (low-rank) noise [14]. Last but not least, when A = 0F×T and Y is noisy, the recovery of X subject to a rank constraint is nothing but PCA – arguably, the workhorse of high-dimensional data analytics. This same formulation... |

14 | Robust PCA as bilinear decomposition with outlier-sparsity regularization - Mateos, Giannakis - 2012 |

13 | Fast distributed gradient methods (arXiv:1112.2972v1)
- Jakovetic, Xavier, et al.
- 2011
Citation Context ...nd speed of convergence, developing and studying algorithms for distributed optimization based on Nesterov’s acceleration techniques emerges as an exciting and rather pristine research direction; see [19] for early ... [Fig. 1: Synthetic network topology] |

11 | PETRELS: Subspace estimation and tracking from partial observations
- Chi, Eldar, et al.
- 2012
Citation Context ...absence of sparse ‘outliers’ {at}∞t=1, an online algorithm based on incremental gradient descent on the Grassmannian manifold of subspaces was put forth in [4]. The second-order RLS-type algorithm in [15] extends the seminal projection approximation subspace tracking algorithm [39] to handle missing data. When outliers are present, robust counterparts can be found in [14], [18], [26]. Relative to all ... |

10 | Robust traffic anomaly detection with principal component pursuit - Abdelkefi, Jiang, et al. - 2010 |

7 | Distributed principal component analysis on networks via directed graphical models
- Meng, Wiesel, et al.
- 2012
Citation Context ...], [41]. Exploiting the low-rank structure of the anomaly-free traffic matrix, a landmark principal component analysis (PCA)-based method was put forth in [20] to identify network anomalies; see also [27] for a distributed implementation. A limitation of the algorithm in [20] is that it cannot identify the anomalous flows. Most importantly, [20] has not exploited the sparsity of anomalies across flows... |

7 | A case-study of the accuracy of SNMP measurements
- Roughan
Citation Context ...he network operator’s perspective. SNMP packets may be dropped for instance, if some links become congested, rendering link count information for those links more important, as well as less available [32]. To model missing link measurements, collect the tuples (l, t) associated with the available observations yl,t in the set Ω ⊆ {1, 2, ..., L} × {1, 2, ..., T}. Introducing the matrices Y := [yl,t], V :... |
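The sampling operator PΩ implied by this context simply keeps the observed (l, t) link-time entries and zeroes the rest. A minimal sketch that builds the mask from the tuple set Ω (illustrative names):

```python
import numpy as np

def P_Omega(M, omega):
    # Keep only the observed (l, t) entries listed in omega; zero the rest
    mask = np.zeros(M.shape, dtype=bool)
    for (l, t) in omega:
        mask[l, t] = True
    return np.where(mask, M, 0.0)
```

This is the operator appearing in the data-fit term ‖PΩ(Y − X − RA)‖²_F of (P1): unobserved entries contribute nothing to the residual.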

5 | Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization - Recht, Fazel, et al. - 2010 |

4 | In-network sparsity regularized rank minimization: Applications and algorithms
- Mardani, Mateos, Giannakis
- 2013
Citation Context ...ns method of multipliers (AD-MoM) as the basic tool to carry out distributed optimization, a general framework for in-network sparsity-regularized rank minimization was put forth in a companion paper [24]. In the context of network anomaly detection, results therein are encouraging yet there is ample room for improvement and immediate venues for future research open up. For instance, the distributed a... |

2 | Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies
- Mardani, Mateos, et al.
Citation Context ...2v of the noise variance is available, guidelines for selecting λ∗ and λ1 have been proposed in [42]. Being convex, (P1) is appealing, and it is known to attain good performance in theory and practice [25]. Also (3) and its estimator (P1) are quite general, as discussed in the ensuing remark. Remark 1 (Subsumed paradigms): When there is no missing data and X = 0L×T , one is left with an under-determine... |

1 | Theory and Practice of Recursive Identification, 2nd ed.
- Ljung, Söderström
- 1983
Citation Context ...I. C. Proof of Proposition 3 The main steps of the proof are inspired by [23], which studies convergence of an online dictionary learning algorithm using the theory of martingale sequences; see e.g., [22]. However, relative to [23] the problem here introduces several distinct elements including: i) missing data with a time-varying pattern Ωt; ii) a non-convex bilinear term where the tall subspace matr... |