Results 1-10 of 78
Anatomy of a Large European IXP
Cited by 55 (16 self)
The largest IXPs carry traffic volumes in the petabyte range on a daily basis, similar to what some of the largest global ISPs reportedly handle. This little-known fact is due to a few hundred member ASes exchanging traffic with one another over the IXP’s infrastructure. This paper reports on a first-of-its-kind, in-depth analysis of one of the largest IXPs worldwide based on nine months’ worth of sFlow records collected at that IXP in 2011. A main finding of our study is that the number of actual peering links at this single IXP exceeds the total number of AS links of the peer-peer type in the entire Internet known as of 2010! To explain such a surprisingly rich peering fabric, we examine in detail this IXP’s ecosystem and highlight the diversity of networks that are members at this IXP and connect there with other member ASes for reasons that are similarly diverse, but can be partially inferred from their business types and observed traffic patterns. In the process, we investigate this IXP’s traffic matrix and illustrate what its temporal and structural properties can tell us about the member ASes that generated the traffic in the first place. While our results suggest that these large IXPs can be viewed as a microcosm of the Internet ecosystem itself, they also argue for a reassessment of the mental picture that our community has about this ecosystem.
Compressive sensing over graphs
in Proc. IEEE INFOCOM, 2011
Cited by 32 (3 self)
In this paper, motivated by network inference and tomography applications, we study the problem of compressive sensing for sparse signal vectors over graphs. In particular, we are interested in recovering sparse vectors representing the properties of the edges of a graph. Unlike existing compressive sensing results, the collective additive measurements we are allowed to take must follow connected paths over the underlying graph. For a sufficiently connected graph with n nodes, it is shown that, using O(k log(n)) path measurements, we are able to recover any k-sparse link vector (with no more than k nonzero elements), even though the measurements have to follow the graph path constraints. We mainly show that the computationally efficient ℓ1 minimization can provide theoretical guarantees for inferring such k-sparse vectors with O(k log(n)) path measurements from the graph.
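For reference, the ℓ1 minimization mentioned in this abstract is commonly written in the standard basis-pursuit form below. This is a generic sketch rather than the paper's exact formulation. The notation is assumed for illustration: x is the unknown k-sparse link vector over m links, A is a 0/1 matrix whose rows mark the links traversed by each measured path, and y holds the additive path measurements.

```latex
\min_{x \in \mathbb{R}^{m}} \; \|x\|_{1}
\quad \text{subject to} \quad y = A x
```

The graph constraint enters only through A: each row must correspond to a connected path, which is what distinguishes this setting from compressive sensing with unrestricted measurement matrices.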
Scalable Tensor Factorizations with Missing Data
SIAM International Conference on Data Mining, 2010
Cited by 25 (1 self)
The problem of missing data is ubiquitous in domains such as biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks, all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.
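The weighted least squares idea in this abstract can be illustrated with a minimal NumPy sketch for a 3-way tensor. This is not the authors' implementation: the function names, the binary mask `W` (1 where an entry is known, 0 where it is missing), and the factor matrices `A`, `B`, `C` are assumptions made for the example. The loss and its gradients are what a first-order optimizer would consume.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices (each with R columns)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def weighted_cp_loss(X, W, A, B, C):
    """Half the squared error over known entries only.

    W is a hypothetical binary mask: 1 where X is known, 0 where it is missing.
    """
    residual = W * (X - cp_reconstruct(A, B, C))
    return 0.5 * np.sum(residual ** 2)

def weighted_cp_gradients(X, W, A, B, C):
    """Gradients of weighted_cp_loss with respect to A, B and C."""
    # Masked residual: zero wherever an entry is missing.
    T = W * (cp_reconstruct(A, B, C) - X)
    grad_A = np.einsum("ijk,jr,kr->ir", T, B, C)
    grad_B = np.einsum("ijk,ir,kr->jr", T, A, C)
    grad_C = np.einsum("ijk,ir,jr->kr", T, A, B)
    return grad_A, grad_B, grad_C
```

Because the mask zeroes the residual at missing positions, whatever placeholder values sit in those positions of `X` have no effect on the loss or the gradients.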
Dynamic anomalography: Tracking network anomalies via sparsity and low rank
2013
Cited by 24 (10 self)
In the backbone of large-scale networks, origin-to-destination (OD) traffic flows experience abrupt unusual changes known as traffic volume anomalies, which can result in congestion and limit the extent to which end-user quality of service requirements are met. As a means of maintaining seamless end-user experience in dynamic environments, as well as for ensuring network security, this paper deals with a crucial network monitoring task termed dynamic anomalography. Given link traffic measurements (noisy superpositions of unobserved OD flows) periodically acquired by backbone routers, the goal is to construct an estimated map of anomalies in real time, and thus summarize the network ‘health state’ along both the flow and time dimensions. Leveraging the low intrinsic dimensionality of OD flows and the sparse nature of anomalies, a novel online estimator is proposed based on an exponentially weighted least-squares criterion regularized with the sparsity-promoting ℓ1-norm of the anomalies, and the nuclear norm of the nominal traffic matrix. After recasting the nonseparable nuclear norm into a form amenable to online optimization, a real-time algorithm for dynamic anomalography is developed and its convergence established under simplifying technical assumptions. For operational conditions where computational complexity reductions are at a premium, a lightweight stochastic gradient algorithm based on Nesterov’s acceleration technique is developed as well. Comprehensive numerical tests with both synthetic and real network data corroborate the effectiveness of the proposed online algorithms and their tracking capabilities, and demonstrate that they outperform state-of-the-art approaches developed to diagnose traffic anomalies.
Recovery of low-rank plus compressed sparse matrices with application to unveiling traffic anomalies
IEEE TRANS. INFO. THEORY, 2013
Cited by 21 (5 self)
Given the noiseless superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, the goal of this paper is to establish deterministic conditions under which exact recovery of the low-rank and sparse components becomes possible. This fundamental identifiability issue arises with traffic anomaly detection in backbone networks, and subsumes compressed sensing as well as the timely low-rank plus sparse matrix recovery tasks encountered in matrix decomposition problems. Leveraging the ability of ℓ1 and nuclear norms to recover sparse and low-rank matrices, a convex program is formulated to estimate the unknowns. Analysis and simulations confirm that the said convex program can recover the unknowns for sufficiently low-rank and sparse enough components, along with a compression matrix possessing an isometry property when restricted to operate on sparse vectors. When the low-rank, sparse, and compression matrices are drawn from certain random ensembles, it is established that exact recovery is possible with high probability. First-order algorithms are developed to solve the nonsmooth convex optimization problem with provable iteration complexity guarantees. Insightful tests with synthetic and real network data corroborate the effectiveness of the novel approach in unveiling traffic anomalies across flows and time, and its ability to outperform existing alternatives.
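A convex program of the kind this abstract describes can be sketched as follows. The notation is assumed for illustration and the paper's exact statement may differ: Y is the observed matrix, X the low-rank component, A the sparse component, R the known fat compression matrix, and λ > 0 a tuning weight.

```latex
\min_{X,\,A} \; \|X\|_{*} + \lambda \|A\|_{1}
\quad \text{subject to} \quad Y = X + R A
```

Here \|X\|_{*} denotes the nuclear norm (the sum of the singular values of X) and \|A\|_{1} the sum of the absolute values of the entries of A. The constraint restates the noiseless superposition from the first sentence of the abstract.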
Exploiting temporal stability and low-rank structure for localization in mobile networks
in: MobiCom ’10, Proceedings of the sixteenth annual international conference on Mobile computing and networking, ACM, 2010
Cited by 19 (4 self)
Localization is a fundamental operation for many wireless networks. While GPS is widely used for location determination, it is unavailable in many environments either due to its high cost or the lack of line of sight to the satellites (e.g., indoors, under the ground, or in a downtown canyon). The limitations of GPS have motivated researchers to develop many localization schemes to infer locations based on measured wireless signals. However, most of these existing schemes focus on localization in static wireless networks. As many wireless networks are mobile (e.g., mobile sensor networks, disaster recovery networks, and vehicular networks), we focus on localization in mobile networks in this paper. We analyze real mobility traces and find that they exhibit temporal stability and low-rank structure. Motivated by this observation, we develop three novel localization schemes to accurately determine locations in mobile networks: (i) Low Rank based Localization (LRL), which exploits the low-rank structure in mobility, (ii) Temporal Stability based Localization (TSL), which leverages the temporal stability, and (iii) Temporal Stability and Low Rank based Localization (TSLRL), which incorporates both the temporal stability and the low-rank structure. These localization schemes are general and can leverage either mere connectivity (i.e., range-free localization) or distance estimation between neighbors (i.e., range-based localization). Using extensive simulations and testbed experiments, we show that our new schemes significantly outperform state-of-the-art localization schemes under a wide range of scenarios and are robust to measurement errors.
Efficient Network-wide Flow Record Generation
Cited by 14 (3 self)
Experiments on diverse topics such as network measurement, management and security are routinely conducted using empirical flow export traces. However, the availability of empirical flow traces from operational networks is limited and frequently comes with significant restrictions. Furthermore, empirical traces typically lack critical metadata (e.g., labeled anomalies), which reduces their utility in certain contexts. In this paper, we describe fs: a first-of-its-kind tool for automatically generating representative flow export records as well as basic SNMP-like router interface counts. fs generates measurements for a target network topology with specified traffic characteristics. The resulting records for each router in the topology have byte, packet and flow characteristics that are representative of what would be seen in a live network. fs also includes the ability to inject different types of anomalous events that have precisely defined characteristics, thereby enabling evaluation of proposed attack and anomaly detection methods. We validate fs by comparing it with the ns-2 simulator, which targets accurate recreation of packet-level dynamics in small network topologies. We show that data generated by fs are virtually identical to those generated by ns-2, except over small time scales (below 1 second). We also show that fs is highly efficient, thus enabling test sets to be created for large topologies. Finally, we demonstrate the utility of fs through an assessment of anomaly detection algorithms, highlighting the need for flexible, scalable generation of network-wide measurement data with known ground truth.
ASTUTE: Detecting a Different Class of Traffic Anomalies
2010
Cited by 11 (0 self)
When many flows are multiplexed on a non-saturated link, their volume changes over short timescales tend to cancel each other out, making the average change across flows close to zero. This equilibrium property holds if the flows are nearly independent, and it is violated by traffic changes caused by several, potentially small, correlated flows. Many traffic anomalies (both malicious and benign) fit this description. Based on this observation, we exploit equilibrium to design a computationally simple detection method for correlated anomalous flows. We compare our new method to two well-known techniques on three network links. We manually classify the anomalies detected by the three methods, and discover that our method uncovers a different class of anomalies than previous techniques do.
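The equilibrium property in this abstract (per-flow volume changes averaging out to roughly zero) can be illustrated with a small Python sketch. This is a simplified standardized-mean check inspired by the description, not the paper's exact detector. The function names, the dictionary layout (flow key to volume per time bin), and the threshold of 3.0 are all assumptions made for the example.

```python
import math
from statistics import mean, stdev

def equilibrium_score(prev_volumes, curr_volumes):
    """Standardized mean of per-flow volume changes between two consecutive bins.

    Each argument maps a flow key to its volume in that bin; a flow absent
    from a bin is treated as having volume 0 there (an assumption).
    """
    flows = set(prev_volumes) | set(curr_volumes)
    deltas = [curr_volumes.get(f, 0) - prev_volumes.get(f, 0) for f in flows]
    if len(deltas) < 2:
        return 0.0  # not enough flows to estimate a spread
    avg = mean(deltas)
    spread = stdev(deltas)
    if spread == 0:
        # All flows changed by the same amount: perfectly correlated unless zero.
        return 0.0 if avg == 0 else math.copysign(math.inf, avg)
    return avg / spread * math.sqrt(len(deltas))

def violates_equilibrium(prev_volumes, curr_volumes, threshold=3.0):
    """Flag a bin pair whose score is far from zero (threshold is illustrative)."""
    return abs(equilibrium_score(prev_volumes, curr_volumes)) > threshold
```

With nearly independent flows the increases and decreases cancel and the score stays near zero. When many flows shift in the same direction at once, even by small amounts, the mean change becomes large relative to its spread and the bin pair is flagged.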
Inferring Visibility: Who’s (Not) Talking to Whom?
Cited by 8 (3 self)
Consider this simple question: how can a network operator identify the set of routes that pass through its network? Answering this question is surprisingly hard: BGP only informs an operator about a limited set of routes. By observing traffic, an operator can only conclude that a particular route passes through its network, but not that a route does not pass through its network. We approach this problem as one of statistical inference, bringing varying levels of additional information to bear: (1) the existence of traffic, and (2) the limited set of publicly available routing tables. We show that the difficulty depends critically on the position of the network in the overall Internet topology, and that the operators with the greatest incentive to solve this problem are those for which the problem is hardest. Nonetheless, we show that suitable application of nonparametric inference techniques can solve this problem quite accurately. For certain networks, traffic existence information yields good accuracy, while for other networks an accurate approach uses the ‘distance’ between prefixes, according to a new network distance metric that we define. We then show how solving this problem leads to improved solutions for a particular application: traffic matrix completion.
Inferring Gas Consumption and Pollution Emission of Vehicles throughout a City
2014
Cited by 8 (3 self)
This paper instantly infers the gas consumption and pollution emission of vehicles traveling on a city’s road network in the current time slot, using GPS trajectories from a sample of vehicles (e.g., taxicabs). The knowledge can be used to suggest cost-efficient driving routes and to identify road segments where gas has been wasted significantly. The instant estimation of the emissions from vehicles can enable pollution alerts and help diagnose the root cause of air pollution in the long run. In our method, we first compute the travel speed of each road segment using the GPS trajectories received recently. As many road segments are not traversed by trajectories (i.e., data sparsity), we propose a Travel Speed Estimation (TSE) model based on a context-aware matrix factorization approach. TSE leverages features learned from other data sources, e.g., map data and historical trajectories, to deal with the data sparsity problem. We then propose a Traffic Volume Inference (TVI) model to infer the number of vehicles passing each road segment per minute. TVI is an unsupervised Bayesian Network that incorporates multiple factors, such as travel speed, weather conditions and geographical features of a road. Given the travel speed and traffic volume of a road segment, gas consumption and emissions can be calculated based on existing environmental theories. We evaluate our method based on extensive experiments using GPS trajectories generated by over 32,000 taxis in Beijing over a period of two months. The results demonstrate the advantages of our method over baselines, validating the contribution of its components and yielding interesting discoveries for the benefit of society.