by Brian Tierney, Brian Crowley, Dan Gunter, Jason Lee, Mary Thompson
Proceedings of the IEEE High Performance Distributed Computing Conference (HPDC-9)
http://www-didc.lbl.gov/papers/CC_JAMM.pdf
Abstract:
Large distributed systems such as Computational Grids require that a large amount of monitoring data be collected for a variety of tasks such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. Ensuring that all necessary monitoring is turned on and that data is being collected can be a very tedious and error-prone task. We have developed an agent-based system to automate the execution of monitoring sensors and the collection of event data. The ability to monitor and manage distributed computing components is critical for enabling high-performance distributed computing. Monitoring data is needed to determine the source of performance problems and to tune the system for better performance. Fault detection and recovery mechanisms need monitoring data to determine if a server is down, and whether to restart the server or redirect service requests elsewhere. A performance prediction service