Schemes for Fault Identification in Communication Networks
IEEE/ACM Transactions on Networking, 1995
Cited by 83 (2 self)
As networks evolve, become larger and more complex, the need for advanced Fault Management capabilities becomes more critical. Failures in a telecommunication network are unavoidable. Quick detection and recovery make the network more robust and increase the confidence in the services it provides. Usually a single fault in a large communication network results in a large number of fault indications and it is not always easy to isolate the primary source of failure. In this paper we propose a system model suitable for fault localization which takes into account the dependencies between the different objects in the telecommunication system. Based on that model we design an algorithm for alarm correlation and fault localization and analyze its performance. We also propose and analyze the performance of two algorithms for fault localization which are suitable when the independent failure assumption for the system under observation is valid. Finally we examine the importance of the informat...
Increasing Robustness of Fault Localization Through Analysis of Lost, Spurious, and Positive Symptoms
2002
Cited by 26 (5 self)
This paper utilizes belief networks to implement fault localization in communication systems taking into account comprehensive information about the system behavior. Most previous work on this subject performs fault localization based solely on the information about malfunctioning system components (i.e., negative symptoms). In this paper, we show that positive information, i.e., the lack of any disorder in some system components, may be used to improve the accuracy of this process. The technique presented in this paper allows lost and spurious symptoms to be incorporated in the analysis. We show through simulation that in a noisy network environment the analysis of lost and spurious symptoms increases the robustness of fault localization with belief networks. We also demonstrate that belief networks yield high accuracy even for approximate probability input data and therefore are a promising model for non-deterministic fault localization.
End-to-end Service Failure Diagnosis Using Belief Networks
In Proc. Network Operation and Management Symposium, 2002
Cited by 20 (5 self)
We present fault localization techniques suitable for diagnosing end-to-end service problems in communication systems with complex topologies. We refine a layered system model that represents relationships between services and functions offered between neighboring protocol layers. In a given layer, an end-to-end service between two hosts may be provided using multiple host-to-host services offered in this layer between two hosts on the end-to-end path. Relationships among end-to-end and host-to-host services form a bipartite probabilistic dependency graph whose structure depends on the network topology in the corresponding protocol layer. When an end-to-end service fails or experiences performance problems it is important to efficiently find the responsible host-to-host services. Finding the most probable explanation (MPE) of the observed symptoms is NP-hard. We propose two fault localization techniques based on Pearl's iterative algorithms for singly connected belief networks. The probabilistic dependency graph is transformed into a belief network, and then the approximations based on Pearl's algorithms and the exact bucket tree elimination algorithm are designed and evaluated through an extensive simulation study.
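As a rough illustration of why finding the MPE is expensive, the sketch below enumerates every fault assignment on a tiny bipartite dependency graph. It is not the method of the paper: it is plain exhaustive search, neither Pearl's algorithms nor bucket tree elimination. The noisy-OR combination rule, the service names, and all probabilities are assumptions made up for the example.

```python
from itertools import product

# Hypothetical prior failure probabilities of host-to-host services (faults).
PRIOR = {"A-B": 0.01, "B-C": 0.05, "C-D": 0.02}

# Hypothetical bipartite dependency graph: each end-to-end service (symptom)
# maps to the host-to-host services on its path, with the assumed probability
# that a failure of that service causes the end-to-end symptom.
CAUSES = {
    "A->C": {"A-B": 0.9, "B-C": 0.9},
    "B->D": {"B-C": 0.9, "C-D": 0.9},
    "A->B": {"A-B": 0.9},
}


def joint_probability(state, observed):
    """P(fault assignment, observed symptoms) under an assumed noisy-OR model."""
    p = 1.0
    for fault, failed in state.items():
        p *= PRIOR[fault] if failed else 1.0 - PRIOR[fault]
    for symptom, seen in observed.items():
        # Noisy-OR: the symptom stays silent only if every failed cause
        # independently fails to trigger it.
        p_silent = 1.0
        for fault, strength in CAUSES[symptom].items():
            if state[fault]:
                p_silent *= 1.0 - strength
        p *= (1.0 - p_silent) if seen else p_silent
    return p


def most_probable_explanation(observed):
    """Enumerate all 2^n fault assignments and keep the most probable one."""
    faults = sorted(PRIOR)
    best_state, best_p = None, -1.0
    for bits in product([False, True], repeat=len(faults)):
        state = dict(zip(faults, bits))
        p = joint_probability(state, observed)
        if p > best_p:
            best_state, best_p = state, p
    return {f for f, failed in best_state.items() if failed}, best_p
```

With the symptoms `{"A->C": True, "B->D": True, "A->B": False}`, the search returns `{"B-C"}`, the one service shared by both failing paths. The loop visits 2^n assignments for n host-to-host services, which is what makes approximate inference attractive on larger topologies.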
The present and future of event correlation: A need for end-to-end service fault localization
2001
Cited by 15 (3 self)
the observable malfunctioning of the managed system. Until recently, fault localization efforts concentrated mostly on diagnosing faults related to the availability of network resources in the lowest layers of the protocol stack. Modern enterprise environments require that fault diagnosis be performed in integrated fashion in multiple layers of the protocol stack and that it include diagnosing performance problems. This paper reviews the existing approaches to fault localization and presents its new facets revealed by the demands of modern enterprise systems. We also present end-to-end service failure diagnosis as a critical step towards multi-layer fault localization in an enterprise environment.
Fault Isolation based on Decision-Theoretic Troubleshooting
1996
Cited by 12 (0 self)
A decision-theoretic approach for fault isolation in broadband networks is presented. Our approach considers faults due to software and hardware as well as performance degradation and configuration problems. Belief networks are used to represent the relationships among various network entities. During a troubleshooting session, the network manager iteratively derives a sequence of tests based on the conditional probabilities, computed from statistics gathered (via alarms and tests) about the state of the network, and the costs associated with testing entities. An online dynamic programming technique is used to get the optimal sequence of tests. A system prototype that was implemented based on data from the XUNET testbed is also described. Keywords: Fault Isolation, Fault Management, Decision-Theoretic Troubleshooting, Broadband Networks February 15, 1996. Center for Telecommunications Research Tech. Rep. CU/CTR/TR 442-96-08 Contact author: Jean-François Huard CTR/Columbia University ...
Probabilistic fault diagnosis in communication systems through incremental hypothesis updating
2004
An Adaptable Network COntrol and Reporting System (ANCORS)
In Integrated Network Management, 1999
Cited by 8 (1 self)
We present ANCORS, an adaptable network control and reporting system that merges technology from network management and distributed simulation to provide a unified paradigm for assessing, controlling, and designing active networks. ANCORS introduces a framework to assist in managing the substantial complexities of software reuse and scalability in active network environments. Specifically, ANCORS provides an extensible approach to the dynamic integration, management, and runtime assessment of various network protocols in live network operations. We present some of the advantages that can be obtained by merging technology from network management, distributed simulation, and active networking, and describe how ANCORS leverages complementary elements of each. We also introduce an ANCORS facility called the active network daemon anetd, which supports the deployment and system management of a large class of legacy software and newer active network applications under the ANCORS framework. La...
Modeling Correlated Alarms in Network Management Systems
In Western Simulation Multiconference, 1996
Cited by 7 (1 self)
We introduce a new model for describing the behavior of complex interconnection networks in the presence of faults. Using our model it is possible to obtain the detailed behavior of a faulty telecommunication network by specifying some parameters that describe its physical characteristics. We describe our model, we show how it can be applied to study some real network management structures, and we give some experimental data to evaluate its relation to existing standard analytical models. 1 Introduction Modern telecommunication networks have grown in complexity beyond the point where they can be managed by a manual process alone. This complexity gave rise to automated network management (NM) systems that interact with network elements (NEs) to gain data about their behavior and performance, and to control their behavior by setting operational parameters. Several protocols have been proposed for the interaction between NEs and NM systems. At present, the Simple Network Management Proto...
Distributed fault localization in hierarchically routed networks
In Int’l Wksp on Distributed Systems: Operations and Management, 2002
Cited by 6 (4 self)
Probabilistic inference was shown effective in non-deterministic diagnosis of end-to-end service failures. To overcome the exponential complexity of the exact inference algorithms in fault propagation models represented by graphs with undirected loops, Pearl’s iterative algorithms for polytrees were used as an approximation schema. The approximation made it possible to diagnose end-to-end service failures in network topologies composed of tens of nodes. This paper proposes a distributed algorithm that increases the admissible network size by an order of magnitude. The algorithm divides the computational effort and system knowledge among multiple, hierarchically organized managers. The cooperation among managers is illustrated with examples, and the results of a preliminary performance study are presented.
Non-deterministic Event-driven Fault Diagnosis through Incremental Hypothesis Updating
In Integrated Network Management VIII, G. Goldszmidt and ..., 2003
Cited by 4 (2 self)
This paper presents a non-deterministic event-driven fault localization technique, which uses a probabilistic symptom-fault map as a fault propagation model. The technique isolates the most probable set of faults through incremental updating of the symptom explanation hypothesis. At any time, it provides a set of alternative hypotheses, each of which is a complete explanation of the set of symptoms observed thus far. The hypotheses are ranked according to a measure of their “goodness”. The technique allows multiple simultaneous independent faults to be identified and incorporates both negative and positive symptoms in the analysis. As shown in a simulation study, the technique is resilient both to noise in the symptom data and to the inaccuracies of the probabilistic fault propagation model.
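The idea of keeping a set of complete explanations and updating it one symptom at a time can be sketched as follows. This is a simplified illustration, not the algorithm of the paper: it handles only negative symptoms, ignores the causal probabilities in the symptom-fault map, and ranks hypotheses by the product of prior fault probabilities, which is an assumed stand-in for the paper's "goodness" measure. All symptom names, fault names, and numbers are hypothetical.

```python
# Hypothetical symptom-fault map: for each symptom, the faults that may cause it.
SYMPTOM_FAULTS = {
    "s1": {"f1", "f2"},
    "s2": {"f2", "f3"},
    "s3": {"f3"},
}
# Hypothetical prior fault probabilities, used here only to rank hypotheses.
PRIOR = {"f1": 0.1, "f2": 0.05, "f3": 0.2}


def update(hypotheses, symptom):
    """Return hypotheses that each explain every symptom seen so far."""
    candidates = SYMPTOM_FAULTS[symptom]
    updated = set()
    for h in hypotheses:
        if h & candidates:
            updated.add(h)  # h already explains the new symptom
        else:
            # Extend h with each fault that could have caused the symptom.
            for fault in candidates:
                updated.add(h | {fault})
    # Drop hypotheses that strictly contain another one (keep minimal sets).
    return {h for h in updated if not any(o < h for o in updated)}


def goodness(h):
    """Assumed ranking measure: product of prior probabilities of the faults."""
    score = 1.0
    for fault in h:
        score *= PRIOR[fault]
    return score


def diagnose(symptoms):
    """Process symptoms in arrival order and return hypotheses, best first."""
    hypotheses = {frozenset()}
    for s in symptoms:
        hypotheses = update(hypotheses, s)
    return sorted(hypotheses, key=goodness, reverse=True)
```

After `s1` and `s2` the top-ranked hypothesis is the single fault `f2`; once `s3` arrives, `f2` alone no longer explains everything, and the ranking becomes `{f1, f3}` followed by `{f2, f3}`. Every hypothesis in the list explains all symptoms seen so far, so a ranked answer is available after each event.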

