Impact of traffic mix on caching performance in a content-centric network
In IEEE Workshop on Emerging Design Choices in Name-Oriented Networking, 2012
Cited by 31 (1 self)
For a realistic traffic mix, we evaluate the hit rates attained in a two-layer cache hierarchy designed to reduce Internet bandwidth requirements. The model identifies four main types of content (web, file sharing, user-generated content and video on demand), distinguished in terms of their traffic shares, their population and object sizes and their popularity distributions. Results demonstrate that caching VoD in access routers offers a highly favorable bandwidth-memory tradeoff but that the other types of content would likely be more efficiently handled in very large capacity storage devices in the core. Evaluations are based on a simple approximation for LRU cache performance that proves highly accurate in relevant configurations.
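The abstract does not say which LRU approximation it relies on. As an illustration only, the sketch below uses one widely known candidate, the Che approximation: an object with request probability q_i has hit rate 1 - exp(-q_i * t_C), where the characteristic time t_C solves sum_i (1 - exp(-q_i * t_C)) = C for a cache holding C objects. The function names, the Zipf popularity model and the bisection solver are assumptions made for this example, not details taken from the paper.

```python
import math

def zipf_popularity(n_objects, alpha):
    """Normalized Zipf request probabilities q_i proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def characteristic_time(popularity, cache_size, tol=1e-9):
    """Solve sum_i (1 - exp(-q_i * t)) = cache_size for t by bisection."""
    if not 0 < cache_size < len(popularity):
        raise ValueError("cache_size must be between 0 and the number of objects")

    def occupancy(t):
        # Expected number of distinct objects requested within a window of length t.
        return sum(1.0 - math.exp(-q * t) for q in popularity)

    lo, hi = 0.0, 1.0
    while occupancy(hi) < cache_size:  # grow the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if occupancy(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def lru_hit_rates(popularity, cache_size):
    """Per-object hit rates h_i = 1 - exp(-q_i * t_C), plus the overall hit rate."""
    t_c = characteristic_time(popularity, cache_size)
    per_object = [1.0 - math.exp(-q * t_c) for q in popularity]
    overall = sum(q * h for q, h in zip(popularity, per_object))
    return per_object, overall
```

For example, `lru_hit_rates(zipf_popularity(1000, 0.8), 100)` returns the per-object and overall hit rates for a hypothetical catalogue of 1000 equally sized objects and a cache of 100 objects. The catalogue size, exponent and cache size here are placeholders, not values from the paper.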
Internet Video Delivery in YouTube: From Traffic Measurements to Quality of Experience
2013
Cited by 5 (1 self)
This chapter investigates HTTP video streaming over the Internet for the YouTube platform. YouTube is used as a concrete example and case study for video delivery over the Internet, since it is not only the most popular online video platform, but also generates a large share of traffic on today’s Internet. We describe the YouTube infrastructure as well as the underlying mechanisms for optimizing content delivery. Such mechanisms include server selection via DNS as well as application-layer traffic management. Furthermore, the impact of delivery via the Internet on the user-experienced quality (QoE) of YouTube video streaming is quantified. In this context, different QoE monitoring approaches are qualitatively compared and evaluated in terms of the accuracy of QoE estimation.
Bias Correction in Small Sample from Big Data
Cited by 5 (3 self)
This paper discusses the bias problem when estimating the population size of big data, such as online social networks (OSNs), using simple random walk. Unlike the traditional estimation problem, where the sample size is not very small relative to the data size, in big data a sample that is small relative to the data size is already very large and costly to obtain. When small samples are used, the bias is no longer negligible. This paper shows analytically that the relative bias can be approximated by the reciprocal of the number of collisions, and introduces a bias-correction estimator on that basis. The result is further supported by both simulation studies and the real Twitter network, which contains 41.7 million nodes.
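To make the role of collisions concrete, here is a minimal sketch of the classic birthday-paradox size estimate, N ≈ n(n-1)/(2C), for n draws with C colliding pairs. It assumes uniform sampling with replacement, which is a simplification: the paper works with simple random walk samples, and its actual estimator and bias correction are not reproduced here. The helper names are hypothetical, and applying the quoted 1/C approximation to this uniform setting is itself an assumption.

```python
from collections import Counter

def count_collisions(sample):
    """Number of unordered pairs of draws that hit the same item."""
    return sum(k * (k - 1) // 2 for k in Counter(sample).values())

def estimate_population_size(sample):
    """Birthday-paradox estimate N ~ n(n-1) / (2C) for uniform draws with replacement."""
    n = len(sample)
    collisions = count_collisions(sample)
    if collisions == 0:
        raise ValueError("no collisions observed; sample is too small to estimate N")
    return n * (n - 1) / (2 * collisions), collisions

def approximate_relative_bias(collisions):
    """Relative bias ~ 1/C, the approximation quoted in the abstract."""
    return 1.0 / collisions
```

With few collisions, 1/C is large, which matches the abstract's point that the bias cannot be ignored for small samples.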
Graph Size Estimation
Cited by 3 (2 self)
Many online networks are not fully known and are often studied via sampling. Random Walk (RW) based techniques are the current state of the art for estimating nodal attributes and local graph properties, but estimating global properties remains a challenge. In this paper, we are interested in a fundamental property of this type: the graph size N, i.e., the number of its nodes. Existing methods for estimating N (i) are inefficient and (ii) cannot be easily used with RW sampling due to dependence between successive samples. In this paper, we address both problems. First, we propose IE (Induced Edges), an efficient technique for estimating N from an independence sample of the graph’s nodes. IE exploits the edges induced on the sampled nodes. Second, we introduce SafetyMargin, a method that corrects estimators for dependence in RW samples. Finally, we combine these two stand-alone techniques to obtain an RW-based graph size estimator. We evaluate our approach in simulations on a wide range of real-life topologies, and on several samples of Facebook. IE with SafetyMargin typically requires at least 10 times fewer samples than the state-of-the-art techniques (over 100 times in the case of Facebook) for the same estimation error.
Keywords: graph size estimation, network sampling, random walk, online social networks, measurement
YouTube Live and Twitch: A tour of user-generated live streaming systems
In Proc. of ACM MMSys (dataset track)
Cited by 3 (1 self)
User-generated live video streaming systems are services that allow anybody to broadcast a video stream over the Internet. These Over-The-Top services have recently gained popularity, in particular with e-sport, and can now be seen as competitors of traditional cable TV. In this paper, we present a dataset for further work on these systems. This dataset contains data on the two main user-generated live streaming systems: Twitch and the live service of YouTube. We collected three months of traces of these services from January to April 2014. Our dataset includes, at five-minute intervals, the identifier of the online broadcaster, the number of people watching the stream, and various other media information. In this paper, we introduce the dataset and make a preliminary study to show the size of the dataset and its potential. We first show that both systems generate significant traffic with frequent peaks at more than 1 Tbps. Thanks to more than a million unique uploaders, Twitch is in particular able to offer a rich service at any time. Our second main observation is that the popularity of these channels is more heterogeneous than what has been observed in other services gathering user-generated content.
Detect Inflated Follower Numbers in OSN Using Star Sampling
Cited by 1 (0 self)
The properties of online social networks (OSNs) are of great interest to the general public as well as IT professionals. Often the raw data are not available and the summaries released by the service providers are sketchy. Thus sampling is needed to reveal the hidden properties of the underlying data. While uniform random sampling is often preferred, some properties such as the top bloggers need to be obtained using PPS (probability proportional to size) sampling. Although PPS sampling can be approximated using simple random walk, it is not efficient because only one sample is taken in every step. This paper introduces an efficient sampling method, called star sampling, that takes all the neighbours as valid samples. It is more efficient than random walk sampling by a factor of the average degree. We derive the estimator and its variance, and verify the result locally using six large real networks where the ground truth is known and the estimations can be evaluated. Then we apply our method to Weibo, the Chinese version of Twitter, whose properties are rarely studied despite its enormous size and influence. Along with other conventional metrics such as size and degree distributions, we demonstrate that star sampling can identify ten thousand top bloggers efficiently. In general, the estimated follower number is consistent with the claimed number, but there are cases where the follower numbers are inflated by a factor of up to 132.
Index Terms: Online social network, sampling, graph sampling, Weibo.
Discover Hidden Web Properties by Random Walk on Bipartite Graph
Cited by 1 (1 self)
This paper proposes to use random walk to discover the properties of deep web data sources that are hidden behind searchable interfaces. The properties, such as the average degree and population size of both documents and terms, are of interest to the general public, and find applications in business intelligence, data integration and deep web crawling. We show that simple random walk (RW) can outperform uniform random (UR) sampling, even disregarding the high cost of uniform random sampling. We prove that in the idealized case when the degrees follow Zipf’s law, the sample size of UR sampling needs to grow in the order of $O(N/\ln^2 N)$ with the corpus size N, while the sample size of RW sampling grows logarithmically. The Reuters corpus is used to demonstrate that the term degrees resemble a power-law distribution, thus RW is better than UR sampling. On the other hand, document degrees have a lognormal distribution and exhibit a smaller variance, therefore UR sampling is slightly better.
A Quantitative Study of Video Duplicate Levels in
The popularity of video sharing services has increased exponentially in recent years, but this popularity is accompanied by challenges associated with the tremendous scale of user bases and massive amounts of video data. A known inefficiency of video sharing services with user-uploaded content is widespread video duplication. These duplicate videos are often of different aspect ratios, can contain overlays or additional borders, or can be excerpted from a longer, original video, and thus can be difficult to detect. The proliferation of duplicate videos can have an impact at many levels, and accurate assessment of duplicate levels is a critical step toward mitigating their effects on both video sharing services and network infrastructure. In this work, we combine video sampling methods, automated video comparison techniques, and manual validation to estimate duplicate levels within large collections of videos. The combined strategies yield a 31.7% estimated video duplicate ratio across all YouTube videos, with 24.0% of storage occupied by duplicates. These high duplicate ratios motivate the need for further examination of the systems-level tradeoffs associated with video deduplication versus storing large numbers of duplicates.
How Much to Coordinate? — Optimizing In-Network Caching in Content-Centric Networks
In content-centric networks, it is challenging to optimally provision in-network storage to cache contents so as to balance the trade-offs between network performance and provisioning cost. To address this problem, we first propose a holistic model for intra-domain networks to characterize the network performance of routing contents to clients and the network cost incurred by globally coordinating the in-network storage capability. We then derive the optimal strategy for provisioning the storage capability that optimizes the overall network performance and cost, and analyze the performance gains via numerical evaluations on real network topologies. Our results reveal interesting phenomena; for instance, different ranges of the Zipf exponent can lead to opposite optimal strategies, and the trade-offs between network performance and provisioning cost have a great impact on the stability of the optimal strategy. We also demonstrate that the optimal strategy can achieve significant gains in both load reduction at origin servers and routing performance. Moreover, given an optimal coordination level $\ell^*$, we design a routing-aware content placement (RACP) algorithm that runs on a centralized server. The algorithm computes and assigns contents for each CCN router to store so as to minimize the overall routing cost, e.g., transmission delay or hop counts, of delivering contents to clients. Extensive simulations using a large-scale trace dataset collected from a commercial 3G network in China demonstrate that our caching scheme can achieve a 4% to 22% latency reduction on average over state-of-the-art caching mechanisms.
Index Terms: in-network caching, content-centric networks, coordinated caching
Estimating IPv4 Address Space Usage with Capture-Recapture
Preprint. ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.