A Monitoring Sensor Management System for Grid Environments (2000)

by Brian Tierney, Brian Crowley, Dan Gunter, Jason Lee, Mary Thompson
Proceedings of the IEEE High Performance Distributed Computing Conference (HPDC-9)
http://www-didc.lbl.gov/papers/CC_JAMM.pdf

Abstract:

Large distributed systems such as Computational Grids require that a large amount of monitoring data be collected for a variety of tasks such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. Ensuring that all necessary monitoring is turned on and that data is being collected can be a very tedious and error-prone task. We have developed an agent-based system to automate the execution of monitoring sensors and the collection of event data. The ability to monitor and manage distributed computing components is critical for enabling high-performance distributed computing. Monitoring data is needed to determine the source of performance problems and to tune the system for better performance. Fault detection and recovery mechanisms need monitoring data to determine whether a server is down, and whether to restart the server or redirect service requests elsewhere. A performance prediction service
