The Failure Trace Archive: Enabling the Comparison of Failure Measurements and Models of Distributed Systems, 2013
Cited by 8 (2 self)
With the increasing presence, scale, and complexity of distributed systems, resource failures are becoming an important and practical topic of computer science research. While numerous failure models and failure-aware algorithms exist, their comparison has been hampered by the lack of public failure data sets and data processing tools. To facilitate the design, validation, and comparison of fault-tolerant models and algorithms, we have created the Failure Trace Archive (FTA), an online, public repository of failure traces collected from diverse parallel and distributed systems. In this work, we first describe the design of the archive, in particular of the standard FTA data format, and the design of a toolbox that facilitates automated analysis of trace data sets. We also discuss the use of the FTA for various current and future purposes. Second, after applying the toolbox to over fifteen failure traces collected from distributed systems used in various application domains (e.g., HPC, Internet operation, and various online applications), we present a comparative analysis of failures in various distributed systems. Our analysis presents various statistical insights and typical statistical modeling results for the availability of individual resources in various distributed systems. The analysis results underline the need for public availability of trace data from different distributed systems. Last, we show how different interpretations of the meaning of failure data can result in different conclusions for failure modeling and job scheduling in distributed systems. Our results for different interpretations show evidence that existing failure-aware algorithms may need to be revisited when applied to general rather than domain-specific distributed systems.
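As a rough illustration of the per-resource availability metric the abstract refers to, the following Python sketch computes the fraction of an observation window during which one resource was available, given a list of unavailability intervals. The `Event` record, its fields, and the `availability` helper are hypothetical simplifications for illustration only; they are not the actual FTA data format or toolbox API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical simplified failure record (NOT the FTA schema):
    resource `node` is unavailable from `start` to `end` (seconds)."""
    node: str
    start: float
    end: float


def availability(events, node, trace_start, trace_end):
    """Fraction of [trace_start, trace_end) during which `node` was available.

    Unavailability intervals are clipped to the observation window and
    overlapping intervals are merged, so downtime is not double-counted.
    """
    intervals = sorted(
        (max(e.start, trace_start), min(e.end, trace_end))
        for e in events
        if e.node == node
    )
    downtime = 0.0
    cur_start = cur_end = None
    for s, e in intervals:
        if e <= s:
            continue  # empty after clipping: the event lies outside the window
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged interval: close it out.
            if cur_end is not None:
                downtime += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        downtime += cur_end - cur_start
    return 1.0 - downtime / (trace_end - trace_start)


events = [Event("a", 10, 20), Event("a", 15, 30), Event("a", 50, 60)]
print(availability(events, "a", 0, 100))  # 30 of 100 time units are downtime
```

Merging overlapping intervals matters because traces may report several concurrent failure events for the same resource; summing them naively would underestimate availability.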
Scheduling Jobs in the Cloud Using On-demand and Reserved Instances
Cited by 7 (4 self)
Abstract. Deploying applications in leased cloud infrastructure is increasingly considered by a variety of business and service integrators. However, the challenge of selecting the leasing strategy (larger or faster instances? on-demand or reserved instances? etc.) and of configuring the leasing strategy with appropriate scheduling policies is still daunting for the (potential) cloud user. In this work, we investigate leasing strategies and their policies from a broker's perspective. We propose CoH, a family of Cloud-based, online Hybrid scheduling policies that minimize rental cost by making use of both on-demand and reserved instances. We formulate the resource provisioning and job allocation policies as Integer Programming problems. As the policies need to be executed online, we limit the time spent searching for the optimal solution of the integer program, compare the obtained solution with various heuristics-based policies, and automatically pick the best one. We show, via simulation and using multiple real-world traces, that the hybrid leasing policy can obtain significantly lower cost than typical heuristics-based policies.
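To make the on-demand versus reserved trade-off concrete, here is a small Python sketch that exhaustively chooses how many reserved instances to buy for a known hourly demand profile. This is not CoH: the abstract describes an online Integer Programming formulation with a time limit, whereas this sketch assumes perfect knowledge of future demand and a simplified, hypothetical pricing model (reserved instances pay an upfront fee plus an hourly rate for every hour of the period; overflow demand runs on on-demand instances).

```python
def best_reservation(demand, upfront, reserved_rate, on_demand_rate):
    """Choose how many reserved instances to buy for a demand profile.

    demand: instances needed in each hour of the reservation period.
    Assumed (illustrative) cost model: each reserved instance costs
    `upfront` plus `reserved_rate` for every hour of the period, used or
    not; demand above the reserved pool is served by on-demand instances
    at `on_demand_rate` per instance-hour.
    Returns (num_reserved, total_cost); ties go to fewer reservations.
    """
    hours = len(demand)

    def cost(r):
        reserved = r * (upfront + reserved_rate * hours)
        overflow = sum(max(d - r, 0) for d in demand)  # on-demand instance-hours
        return reserved + on_demand_rate * overflow

    # Reserving more than the peak demand can never help, so search 0..peak.
    # min() returns the first minimum, i.e., the smallest r among ties.
    candidates = ((r, cost(r)) for r in range(max(demand, default=0) + 1))
    return min(candidates, key=lambda rc: rc[1])


# Steady load of 2 instances with a short burst to 5.
demand = [2] * 10 + [5] * 2
print(best_reservation(demand, upfront=20, reserved_rate=1, on_demand_rate=4))  # (2, 88)
```

The result shows the intuition behind hybrid leasing: reserving for the steady base load and covering bursts on demand is cheaper here than either pure strategy.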
Understanding and Recommending Play Relationships in Online Social Gaming
Cited by 5 (4 self)
Abstract—Online Social Networking (OSN) applications such as Facebook's communication and Zynga's gaming platforms serve hundreds of millions of users. To understand and model the relationships between these users, social network graphs are extracted from running OSN applications and subsequently processed using social and complex network analysis tools. In this paper, we focus on the application domain of Online Social Games (OSGs) and deploy a formalism for extracting graphs from large datasets. Our formalism covers notions such as game participation, adversarial relationships, and match outcomes, and allows filtering out "weak" links based on one or more threshold values. Using two novel large-scale OSG datasets, we investigate a range of threshold values and their influence on the resulting OSG graph properties. We discuss how an analysis of multiple graphs, obtained through different extraction rules, could be used in an algorithm to improve matchmaking for players.
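The threshold-based extraction idea can be sketched in a few lines of Python: count how often each pair of players interacts (as teammates, as opponents, or both) and keep only pairs whose count reaches a threshold. The match-record format, the `extract_graph` function, and the relation names are hypothetical and chosen for illustration; this is not the paper's formalism, which also covers match outcomes.

```python
from collections import Counter
from itertools import combinations


def extract_graph(matches, threshold, relation="any"):
    """Build an undirected player graph from match records.

    matches: list of matches; each match is a list of teams, each team a
    list of player ids (hypothetical input format).
    relation: "teammate" counts co-membership in a team, "opponent"
    counts facing each other, "any" counts both.
    A pair is kept as an edge only if its interaction count is at least
    `threshold`, which filters out "weak" links.
    """
    if relation not in ("teammate", "opponent", "any"):
        raise ValueError(f"unknown relation: {relation}")
    counts = Counter()
    for teams in matches:
        for i, team in enumerate(teams):
            if relation in ("teammate", "any"):
                # Every unordered pair within the same team.
                for a, b in combinations(sorted(team), 2):
                    counts[(a, b)] += 1
            if relation in ("opponent", "any"):
                # Every pair across this team and each later team.
                for other in teams[i + 1:]:
                    for a in team:
                        for b in other:
                            counts[tuple(sorted((a, b)))] += 1
    return {pair for pair, n in counts.items() if n >= threshold}


matches = [
    [["a", "b"], ["c", "d"]],
    [["a", "b"], ["c", "e"]],
    [["a", "c"], ["b", "d"]],
]
print(extract_graph(matches, threshold=2, relation="teammate"))  # {('a', 'b')}
```

Re-running the extraction over a range of `threshold` values is one simple way to observe how link filtering changes graph properties such as edge count.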
IaaS Cloud Benchmarking: Approaches, Challenges, and Experience (Invited Paper)
Cited by 3 (1 self)
Abstract—Infrastructure-as-a-Service (IaaS) cloud computing is an emerging commercial infrastructure paradigm under which clients (users) can lease resources when and for as long as needed, under a cost model that reflects the actual usage of resources by the client. For IaaS clouds to become mainstream technology and for current cost models to become more client-friendly, benchmarking and comparing the non-functional system properties of various IaaS clouds is important, especially for cloud users. In this article we focus on the IaaS cloud-specific elements of benchmarking, from a user's perspective. We propose a generic approach for IaaS cloud benchmarking, discuss numerous challenges in developing this approach, and summarize our experience towards benchmarking IaaS clouds. We argue for an experimental approach that requires, among other things, new techniques for experiment compression, new benchmarking methods that go beyond black-box and isolated-user testing, new benchmark designs that are domain-specific, and new metrics for elasticity and variability.
Extending the Capabilities of Mobile Devices for Online Social Applications through Cloud Offloading
Cited by 2 (0 self)
Abstract—Handheld devices are becoming an attractive option for users to interact with their social network through online social applications. We are witnessing a rapid adoption of smarter devices all around us, which brings with it orders of magnitude more heterogeneity. Thus, researchers in the field of distributed systems are faced with new challenges: how can performance be optimized for devices that are so diverse in terms of energy consumption, processing power, and communication capabilities? My PhD research focuses on this challenge, adopting techniques for offloading operations from mobile devices to more powerful cloud-based infrastructure, and makes a three-fold contribution. First, we have characterized and modeled workloads of online social applications, and empirically validated the models using traces of hundreds of real applications. Second, we are currently investigating offloading mechanisms, including: communication
An Analysis of Implicit Social Networks in Multiplayer Online Games
Cited by 2 (2 self)
Abstract—For many networked games, such as the Defense of the Ancients and StarCraft series, the unofficial leagues created by players themselves greatly enhance the user experience and extend the success of each game. Understanding the social structure that players of these games implicitly form helps to create innovative gaming services to the benefit of both players and game operators. But how can this implicit social structure be extracted and analysed? We address this question by first proposing a formalism consisting of various ways to map interactions to social structure, and then applying it to real-world data collected from three different game genres. We analyse the implications of these mappings for in-game and gaming-related services, ranging from network- and socially-aware matchmaking of players to an investigation of social network robustness against player departure.
LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
Cited by 1 (0 self)
ABSTRACT In this paper we introduce LDBC Graphalytics, a new industrial-grade benchmark for graph analysis platforms. It consists of six deterministic algorithms, standard datasets, synthetic dataset generators, and reference output that enable the objective comparison of graph analysis platforms. Its test harness produces deep metrics that quantify multiple kinds of system scalability, such as horizontal/vertical and weak/strong, and of robustness, such as failures and performance variability. The benchmark comes with open-source software for generating data and monitoring performance. We describe and analyze six implementations of the benchmark (three from the community, three from industry), providing insights into the strengths and weaknesses of the platforms. Key to our contribution, vendors perform the tuning and benchmarking of their platforms.
Composition of the doctoral committee, 2015
"... to obtain the degree of doctor ..."
TraceBench: An Open Data Set for Trace-Oriented Monitoring
User request trace-oriented monitoring is an effective method to improve the reliability of cloud systems. However, traces are difficult to obtain, which hinders the development of trace-oriented monitoring research. In this paper, we, for the first time, release a fine-grained, user-request-centric open trace data set, called TraceBench, collected on a real-world cloud storage system deployed in a real environment. During collection, many aspects are considered in order to simulate different scenarios, including cluster size, request type, and workload speed. Besides recording traces while the monitored system runs normally, we also collect traces with faults injected. Using a mature injection tool, we introduce 14 faults, including function faults and performance faults. The traces in TraceBench are grouped into files, each corresponding to a certain scenario. The whole collection effort lasted more than half a year, resulting in more than 360,000 traces in 361 files. In addition, we employ several applications based on TraceBench, which validate its usefulness for the field of trace-oriented monitoring.
Supervisor
... are an important type of distributed application and have millions of users. Traditionally, MMOGs are hosted on dedicated clusters distributed globally. With the advent of cloud computing, MMOGs such as Zynga's are increasingly run on cloud resources, through the use of cloud technology and innovation. Massivizing MMOGs on clouds is the focus of my PhD research. My main contributions are: 1) analyzing and modeling various MMOG workloads, including those of social and traditional real-time games, 2) designing and implementing a cost-efficient and reliable cloud-based MMOG platform, 3) designing and implementing a scalable MMOG system that employs domain-specific scaling techniques to support the real-time strategy games of the future, and 4) developing experimental prototypes and tools to evaluate the proposed research via real-world experimentation and simulation, and applying it to a popular real-world application. In this article, I introduce my research progress and my future plans.