Results 1 - 10 of 31
A parallel approach to XML parsing
- In The 7th IEEE/ACM International Conference on Grid Computing, 2006
Abstract - Cited by 26 (4 self)
A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has a reputation for poor performance, and a number of optimizations have been developed to address this performance problem from different perspectives, none of which have been entirely satisfactory. In this paper, we present a seemingly quixotic, but novel approach: parallel XML parsing. Parallel XML parsing leverages the growing prevalence of multicore architectures in all sectors of the computer market, and yields significant performance improvements. This paper presents our design and implementation of parallel XML parsing. Our design consists of an initial preparsing phase to determine the structure of the XML document, followed by a full, parallel parse. The results of the preparsing phase are used to help partition the XML document for data-parallel processing. Our parallel parsing phase is a modification of the libxml2 [1] XML parser, which shows that our approach applies to real-world, production-quality parsers. Our empirical study shows that our parallel XML parsing algorithm can improve XML parsing performance significantly and scales well.
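The two-phase design described in this abstract (a lightweight preparse that finds element boundaries, then a data-parallel full parse of the resulting partitions) can be sketched as follows. This is a toy Python illustration, not the authors' modified libxml2; the regex-based preparser assumes a document without comments, CDATA sections, or processing instructions, and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

TAG = re.compile(r'<(/?)[^\s/>]+[^>]*?(/?)>')

def preparse(doc):
    """Phase 1: a cheap structural scan recording only the byte spans of
    the root's direct children -- just enough to partition the document."""
    spans, depth, start = [], 0, None
    for m in TAG.finditer(doc):
        is_close, is_selfclose = m.group(1) == '/', m.group(2) == '/'
        if is_close:
            depth -= 1
            if depth == 1:                   # a depth-1 child just ended
                spans.append((start, m.end()))
        elif is_selfclose:
            if depth == 1:
                spans.append((m.start(), m.end()))
        else:
            if depth == 1:                   # a depth-1 child just started
                start = m.start()
            depth += 1
    return spans

def parallel_parse(doc, workers=4):
    """Phase 2: parse each partition independently; each task here simply
    returns its subtree's root tag."""
    chunks = [doc[a:b] for a, b in preparse(doc)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: ET.fromstring(c).tag, chunks))
```

Real speedup requires parsers that run outside the interpreter lock (or separate processes); the point of the sketch is only the shape of the two phases.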
A static load-balancing scheme for parallel XML parsing on multicore CPUs
- In CCGrid’07 (IEEE International Symposium on Cluster Computing and the Grid), Rio de Janeiro, 2007
Abstract - Cited by 20 (4 self)
A number of techniques to improve the parsing performance of XML have been developed. Generally, however, these techniques have limited impact on the construction of a DOM tree, which can be a significant bottleneck. Meanwhile, the trend in hardware technology is toward an increasing number of cores per CPU. As we have shown in previous work, these cores can be used to parse XML in parallel, resulting in significant speedups. In this paper, we introduce a new static partitioning and load-balancing mechanism. By using a static, global approach, we reduce synchronization and load-balancing overhead, thus improving performance over dynamic schemes for a large class of XML documents. Our approach leverages libxml2 without modification, which reduces development effort and shows that our approach is applicable to real-world, production parsers. Our scheme works well with Sun’s Niagara class of CMT architectures, and shows that multiple hardware threads can be effectively used for XML parsing.
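A static, global load-balancing pass of the kind this abstract describes can be approximated by a one-shot contiguous partition of preparsed chunk sizes, computed before any parsing thread starts, so no runtime synchronization is needed. The sketch below is a hypothetical simplification, not the paper's actual algorithm:

```python
def static_partition(sizes, workers):
    """Assign consecutive chunks to each worker so every worker receives
    roughly total/workers bytes; computed once, up front, with no locking."""
    target = sum(sizes) / workers
    parts, current, acc = [], [], 0
    for i, size in enumerate(sizes):
        current.append(i)
        acc += size
        # Close this worker's share once it reaches the target, but always
        # leave at least one share for the last worker.
        if acc >= target and len(parts) < workers - 1:
            parts.append(current)
            current, acc = [], 0
    parts.append(current)
    return parts
```

Because the assignment is fixed before parsing begins, skew in the document structure translates directly into idle workers; the paper's contribution is choosing the split globally so that this rarely happens in practice.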
XML screamer: An integrated approach to high performance XML parsing, validation and deserialization
- In 15th International World Wide Web Conference, 2006
Abstract - Cited by 20 (0 self)
This paper describes an experimental system in which customized high-performance XML parsers are prepared using parser generation and compilation techniques. Parsing is integrated with Schema-based validation and deserialization, and the resulting validating processors are shown to be as fast as, or in many cases significantly faster than, traditional nonvalidating parsers. High performance is achieved by integration across layers of software that are traditionally separate, by avoiding unnecessary data copying and transformation, and by careful attention to detail in the generated code. The effect of API design on XML performance is also briefly discussed.
Differential Deserialization for Optimized SOAP Performance
- In Proc. of the Int’l Conference for High Performance Computing, Networking, and Storage, 2005
Abstract - Cited by 18 (0 self)
SOAP, a simple, robust, and extensible protocol for the exchange of messages, is the most widely used communication protocol in the Web services model. SOAP’s XML-based message format hinders its performance, thus making it unsuitable in high-performance scientific applications. The deserialization of SOAP messages, which includes processing of XML data and conversion of strings to in-memory data types, is the major performance bottleneck in a SOAP message exchange. This paper presents and evaluates a new optimization technique for removing this bottleneck. This technique, called differential deserialization (DDS), exploits the similarities between incoming messages to reduce deserialization time. Differential deserialization is fully SOAP-compliant and requires no changes to a SOAP client. A performance study demonstrates that DDS can result in a significant performance improvement for some Web services.
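The core DDS idea of reusing earlier deserialization work when a new message matches a previous one reduces, in its simplest form, to a cache keyed by the raw message bytes. The sketch below is a whole-message variant in Python with made-up names; the actual system matches and re-deserializes at a finer granularity, reusing partial results when messages are merely similar.

```python
import hashlib
import json

class DifferentialDeserializer:
    """Cache full deserialization results by a checksum of the raw bytes,
    so a byte-identical message skips the expensive parse entirely."""

    def __init__(self, parse):
        self.parse = parse      # the expensive deserializer being bypassed
        self.cache = {}
        self.hits = 0

    def deserialize(self, raw: bytes):
        key = hashlib.sha256(raw).digest()
        if key in self.cache:
            self.hits += 1      # matched a previous message: no parsing done
            return self.cache[key]
        value = self.parse(raw)
        self.cache[key] = value
        return value
```

Keying on whole messages only pays off when messages repeat verbatim; DDS's contribution is handling the similar-but-not-identical case, which this sketch deliberately omits.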
An Efficient Service Oriented Architecture for Heterogeneous and Dynamic Wireless Sensor Networks
Abstract - Cited by 17 (2 self)
The purpose of this work is to bridge the gap between high-end networked devices and wireless networks of ubiquitous and resource-constrained sensors and actuators by extensively applying Service-Oriented Architecture (SOA) patterns. We present a multi-level approach that implements existing SOA standards on higher tiers, and propose a novel protocol stack, WSN-SOA, which brings the benefits of SOA to low-capacity nodes without the overhead of XML-based technologies. This solution fully supports network dynamicity, auto-configuration, service discovery, device heterogeneity, and interoperability with legacy architectures. As a proof of concept, we have studied a surveillance scenario in which the detection of an intruder, conducted within the range of a network of wireless sensors (e.g., MICAz from Crossbow), leads to the automatic triggering of tracking activities by a Linux-powered network camera and of alerts and video streams toward a control room.
Constructing Finite State Automata for High-Performance XML Web Services
- 2004
Abstract - Cited by 16 (5 self)
This paper describes a validating XML parsing method based on deterministic finite state automata (DFA). XML parsing and validation is performed by a schema-specific XML parser that encodes the admissible parsing states as a DFA. This DFA is automatically constructed from the XML schemas of XML messages using a code generator. A two-level DFA architecture is used to increase efficiency and to reduce the generated code size. The lower-level DFA efficiently parses syntactically well-formed XML messages. The higher-level DFA validates the messages and produces application events associated with transitions in the DFA. Two example case studies are presented and performance results are given to demonstrate that the approach supports the implementation of high-performance Web services.
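The table-driven style this abstract describes can be illustrated with the validating (higher-level) automaton alone: parsing events drive lookups in a transition table, and any missing entry is a validation failure. The table below is hand-written for a toy content model; the paper generates such tables from XML Schema definitions, and all names here are illustrative.

```python
# Transition table for the toy content model <person><name/><age/></person>,
# indexed by (state, event). A real generator derives this from the schema.
DFA = {
    (0, 'start:person'): 1,
    (1, 'start:name'):   2,
    (2, 'end:name'):     3,
    (3, 'start:age'):    4,
    (4, 'end:age'):      5,
    (5, 'end:person'):   6,
}
ACCEPTING = {6}

def validate(events):
    """Run the event stream through the table; a missing entry means there
    is no admissible transition, i.e. the document violates the schema."""
    state = 0
    for event in events:
        state = DFA.get((state, event))
        if state is None:
            return False
    return state in ACCEPTING
```

Because every step is a single table lookup, the generated parser does no backtracking and no per-element dispatch logic, which is where the performance claim comes from.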
Engelen, “A Table-Driven Streaming XML Parsing Methodology for High-Performance Web Services”
- In IEEE International Conference on Web Services (ICWS’06), 2006
An End-to-end Web Services-based Infrastructure for Biomedical Applications
- In 6th IEEE/ACM International Workshop on Grid Computing, 2005
Abstract - Cited by 7 (2 self)
Service-oriented architectures hold a lot of promise for grid-enabling scientific applications. In recent times, Web services have gained wide-spread acceptance in the Grid community as the standard way of exposing application functionality to end-users. Web services-based architectures provide accessibility via a multitude of clients, and the ability to enable composition of data and applications in novel ways for facilitating innovation across scientific disciplines. However, issues of diverse data formats and styles which hinder interoperability and integration must be addressed. Providing Web service wrappers for legacy applications alleviates many problems because of the exchange of strongly typed data, defined and validated using XML schemas, that can be used by workflow tools for application integration. In this paper, we describe the end-to-end architecture of such a system for biomedical applications that are part of the National Biomedical Computation Resource (NBCR). We present the technical challenges in setting up such an infrastructure, and discuss in detail the back-end resource management, application services, user interfaces, and the security infrastructure for the same. We also evaluate our prototype infrastructure, discuss some of its shortcomings, and outline the future work that may be required to address them.
An Adaptive, Fast, and Safe XML Parser Based on Byte Sequences Memorization
Abstract - Cited by 7 (0 self)
XML (Extensible Markup Language) processing can incur significant runtime overhead in XML-based infrastructural middleware such as Web service application servers. This paper proposes a novel mechanism for efficiently processing similar XML documents. Given a new XML document as a byte sequence, the XML parser proposed in this paper normally avoids syntactic analysis and simply matches the document against previously processed ones, reusing those results. Our parser is adaptive, since it partially parses and then remembers XML document fragments that it has not met before. Moreover, it is safe, since its partial parsing correctly checks the well-formedness of documents. Our implementation of the proposed parser complies with the JSR 63 standard of the Java API for XML Processing (JAXP) 1.1 specification. We evaluated Deltarser performance with messages using Google Web services. Compared to Piccolo (and Apache Xerces), it parses 35% (106%) faster in a server-side use-case scenario, and 73% (126%) faster in a client-side use-case scenario.
ParaXML: A Parallel XML Processing Model on the Multicore CPUs
Abstract - Cited by 5 (1 self)
XML has emerged as the de facto standard interoperable data format for web service, database, and document processing systems. The processing of XML documents, however, has been recognized as the performance bottleneck in those systems; as a result, the demand for high-performance XML processing grows rapidly. On the hardware front, multicore processors are increasingly available in desktop computing machines, with quad-core chips shipping now and 16-core systems expected within two or three years. Unfortunately, almost all present XML processing algorithms still use a serial processing model, and are thus unable to take advantage of multicore resources. We believe a parallel XML processing model should be a cost-effective solution to the XML performance issue in the multicore era. In this paper, we present a general-purpose parallel XML processing model, ParaXML, designed for multicore CPUs. Generally speaking, ParaXML treats the XML document as a general tree structure and the XML processing task as an extension of the parallel tree traversal algorithms used for classic discrete optimization problems. XML processing, however, has quite distinct characteristics from classic discrete optimization problems, thus demanding special treatment and fine-grained tuning techniques. ParaXML internally adopts a fine-grained work-stealing scheme to dynamically control the load balance among the parallel-running threads, and a novel approach is also introduced to trace the stealing actions and the running results to facilitate the reduction of those parallel-running results. In addition, ParaXML provides tuning options, particularly for large XML documents, to control the trade-off between parallelism gain and task-partitioning overhead.
To show the feasibility and effectiveness of the ParaXML model, we demonstrate our parallel implementations of three fundamental XML processing tasks based on ParaXML: traversal, serializing, and parsing. The empirical study in this paper shows that those parallel implementations substantially improved the performance and scale well on a multicore machine.
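The work-stealing scheme sketched in this abstract, where each thread drains its own deque and steals from another when it runs dry, can be illustrated on a plain tree reduction. This Python sketch uses one coarse lock and a pending-work counter for termination; real work-stealing runtimes use per-deque synchronization, and (because of the interpreter lock) this version shows the scheduling, not the speedup. All names are hypothetical.

```python
import threading
from collections import deque

class Node:
    def __init__(self, value, children=()):
        self.value, self.children = value, list(children)

def parallel_sum(root, workers=4):
    """Reduce a tree with work stealing: each worker owns a deque, pushes
    children locally, takes its own work LIFO, and steals FIFO when idle."""
    deques = [deque() for _ in range(workers)]
    deques[0].append(root)
    lock = threading.Lock()      # one coarse lock; real stealers lock per deque
    total = [0]
    pending = [1]                # nodes queued but not yet processed
    done = threading.Event()

    def worker(i):
        while not done.is_set():
            node = None
            with lock:
                if deques[i]:
                    node = deques[i].pop()       # own work, newest first
                else:
                    for victim in deques:        # steal a victim's oldest work
                        if victim:
                            node = victim.popleft()
                            break
            if node is None:
                continue                         # idle; retry until all done
            with lock:
                total[0] += node.value
                deques[i].extend(node.children)
                pending[0] += len(node.children) - 1
                if pending[0] == 0:              # last node consumed: stop all
                    done.set()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The pending counter (nodes queued minus nodes processed) reaching zero is what signals global termination; it plays a role loosely analogous to the tracing of stealing actions and results that the abstract mentions for reducing the per-thread partial results.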