Results 1 - 10
of
280
Simple fast algorithms for the editing distance between trees and related problems
- SIAM J. COMPUT
, 1989
"... Ordered labeled trees are trees in which the left-to-right order among siblings is. significant. The distance between two ordered trees is considered to be the weighted number of edit operations (insert, delete, and modify) to transform one tree to another. The problem of approximate tree matching i ..."
Abstract
-
Cited by 405 (12 self)
- Add to MetaCart
(Show Context)
Ordered labeled trees are trees in which the left-to-right order among siblings is. significant. The distance between two ordered trees is considered to be the weighted number of edit operations (insert, delete, and modify) to transform one tree to another. The problem of approximate tree matching is also considered. Specifically, algorithms are designed to answer the following kinds of questions: 1. What is the distance between two trees? 2. What is the minimum distance between T and T when zero or more subtrees can be removed from T2 3. Let the pruning of a tree at node n mean removing all the descendants of node n. The analogous question for prunings as for subtrees is answered. A dynamic programming algorithm is presented to solve the three questions in sequential time O(I Tll x IT2lxmin (depth ( Tt), leaves ( T)) x min (depth(T2), leaves(T2))) and space O(Ir, x lT21) compared with o(I T,I IT=I x(depth(T)): x (depth(T2))) for the best previous published algorithm due to Tai [J. Assoc. Comput. Mach., 26 (1979), pp. 422-433]. Further, the algorithm presented here can be parallelized to give
Recognition of Shapes by Editing Their Shock Graphs
, 2004
"... This paper presents a novel framework for the recognition of objects based on their silhouettes. The main idea is to measure the distance between two shapes as the minimum extent of deformation necessary for one shape to match the other. Since the space of deformations is very high-dimensional, thr ..."
Abstract
-
Cited by 204 (8 self)
- Add to MetaCart
(Show Context)
This paper presents a novel framework for the recognition of objects based on their silhouettes. The main idea is to measure the distance between two shapes as the minimum extent of deformation necessary for one shape to match the other. Since the space of deformations is very high-dimensional, three steps are taken to make the search practical: 1) define an equivalence class for shapes based on shock-graph topology, 2) define an equivalence class for deformation paths based on shock-graph transitions, and 3) avoid complexity-increasing deformation paths by moving toward shock-graph degeneracy. Despite these steps, which tremendously reduce the search requirement, there still remain numerous deformation paths to consider. To that end, we employ an edit-distance algorithm for shock graphs that finds the optimal deformation path in polynomial time. The proposed approach gives intuitive correspondences for a variety of shapes and is robust in the presence of a wide range of visual transformations. The recognition rates on two distinct databases of 99 and 216 shapes each indicate highly successful within category matches (100 percent in top three matches), which render the framework potentially usable in a range of shape-based recognition applications.
Detecting Changes in XML Documents
- In ICDE
, 2001
"... We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volume of XML data. Because of the context, our algorithm has to be very efficient in terms of s ..."
Abstract
-
Cited by 159 (3 self)
- Add to MetaCart
(Show Context)
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volume of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of "quality". Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs in average in linear time vs. quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NPhard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the "optimal" in terms of quality. Finally we present experiments on a small sample of XML pages found on the Web. 1
Archiving scientific data
- In ACM SIGMOD
, 2002
"... Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierachical format and has a key structure that provides a canonical identification for each elemen ..."
Abstract
-
Cited by 143 (12 self)
- Add to MetaCart
Archiving is important for scientific data, where it is necessary to record all past versions of a database in order to verify findings based upon a specific version. Much scientific data is held in a hierachical format and has a key structure that provides a canonical identification for each element of the hierarchy. In this article, we exploit these properties to develop an archiving technique that is both efficient in its use of space and preserves the continuity of elements through versions of the database, something that is not provided by traditional minimum-edit-distance diff approaches. The approach also uses timestamps. All versions of the data are merged into one hierarchy where an element appearing in multiple versions is stored only once along with a timestamp. By identifying the semantic continuity of elements and merging them into one data structure, our technique is capable of providing meaningful change descriptions, the archive allows us to easily answer certain temporal queries such as retrieval of any specific version from the archive and finding the history of an element. This is in contrast with approaches that store a sequence of deltas where such operations may require undoing a large number of changes or significant reasoning with the
X-Diff: An Effective Change Detection Algorithm for XML Documents
"... XML has become the de facto standard format for web publishing and data transportation. Since online information changes frequently, being able to quickly detect changes in XML documents is important to Internet query systems, search engines, and continuous query systems. Previous work in change det ..."
Abstract
-
Cited by 132 (0 self)
- Add to MetaCart
XML has become the de facto standard format for web publishing and data transportation. Since online information changes frequently, being able to quickly detect changes in XML documents is important to Internet query systems, search engines, and continuous query systems. Previous work in change detection on XML, or other hierarchically structured documents, used an ordered tree model, in which left-to-right order among siblings is important and it can affect the change result. This paper argues that an unordered model (only ancestor relationships are significant) is more suitable for most database applications. Using an unordered model, change detection is substantially harder than using the ordered model, but the change result that it generates is more accurate. This paper proposes X-Diff, an effective algorithm that integrates key XML structure characteristics with standard tree-to-tree correction techniques. The algorithm is analyzed and compared with XyDiff [CAM02], a published XML diff algorithm. An experimental evaluation on both algorithms is provided.
Identifying Syntactic Differences Between Two Programs
- Software - Practice and Experience
, 1991
"... this paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the tree-matching algorithm and the synchronous pretty-printing technique are described. Experience with the comparator for the C languag ..."
Abstract
-
Cited by 119 (0 self)
- Add to MetaCart
this paper is organized into five sections, as follows. The internal form of a program, which is a variant of a parse tree, is discussed in the next section. Then the tree-matching algorithm and the synchronous pretty-printing technique are described. Experience with the comparator for the C language and some performance measurements are also presented. The last section discusses related work and concludes this paper
Identifying the Semantic and Textual Differences Between Two Versions of a Program
- Proceedings of the ACM SIGPLAN 90 Conference on Programming Language Design and Implementation
, 1990
"... Text-based file comparators (e.g., the Unix utility diff), are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describe ..."
Abstract
-
Cited by 109 (5 self)
- Add to MetaCart
Text-based file comparators (e.g., the Unix utility diff), are very general tools that can be applied to arbitrary files. However, using such tools to compare programs can be unsatisfactory because their only notion of change is based on program text rather than program behavior. This paper describes a technique for comparing two versions of a program, determining which program components represent changes, and classifying each changed component as representing either a semantic or a textual change. ######################## This work was supported in part by the Defense Advanced Research Projects Agency, monitored by the Office of Naval Research under contract N00014-88-K, by the National Science Foundation under grant CCR8958530, and by grants from Xerox, Kodak, and Cray. Author's address: Computer Sciences Department, Univ. of Wisconsin, 1210 W. Dayton St., Madison, WI 53706. Permission to copy without fee all or part of this material is granted provided that the copies are not made...
Evaluating Structural Similarity in XML Documents
, 2002
"... XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe th ..."
Abstract
-
Cited by 104 (0 self)
- Add to MetaCart
XML documents on the web are often found without DTDs, particularly when these documents have been created from legacy HTML. Yet having knowledge of the DTD can be valuable in querying and manipulating such documents. Recent work (cf. [10]) has given us a means to (re-)construct a DTD to describe the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, we would wish to partition the collection into smaller sets of "similar" documents, and then induce a separate DTD for each such set. It is this partitioning problem that we address in this paper. Given two
XMIDDLE: A Data-Sharing Middleware for Mobile Computing
- Int. Journal on Personal and Wireless Communications
, 2002
"... Abstract. An increasing number of distributed applications will be written for mobile hosts, such as laptop computers, third generation mobile phones, personal digital assistants, watches and the like. Application engineers have to deal with a new set of problems caused by mobility, such as low band ..."
Abstract
-
Cited by 98 (10 self)
- Add to MetaCart
(Show Context)
Abstract. An increasing number of distributed applications will be written for mobile hosts, such as laptop computers, third generation mobile phones, personal digital assistants, watches and the like. Application engineers have to deal with a new set of problems caused by mobility, such as low bandwidth, context changes or loss of connectivity. During disconnection, users will typically update local replicas of shared data independently from each other. The resulting inconsistent replicas need to be reconciled upon re-connection. To support building mobile applications that use both replication and reconciliation over ad-hoc networks, we have designed xmiddle, a mobile computing middleware. In this paper we describe xmiddle and show how it uses reflection capabilities to allow application engineers to influence replication and reconciliation techniques. xmiddle enables the transparent sharing of XML documents across heterogeneous mobile hosts, allowing on-line and off-line access to data. We describe xmiddle using a collaborative e-shopping case study on mobile clients.
Computing the edit-distance between unrooted ordered trees
- In Proceedings of the 6th annual European Symposium on Algorithms (ESA
, 1998
"... Abstract. An ordered tree is a tree in which each node’s incident edges are cyclically ordered; think of the tree as being embedded in the plane. Let A and B be two ordered trees. The edit distance between A and B is the minimum cost of a sequence of operations (contract an edge, uncontract an edge, ..."
Abstract
-
Cited by 98 (0 self)
- Add to MetaCart
Abstract. An ordered tree is a tree in which each node’s incident edges are cyclically ordered; think of the tree as being embedded in the plane. Let A and B be two ordered trees. The edit distance between A and B is the minimum cost of a sequence of operations (contract an edge, uncontract an edge, modify the label of an edge) needed to transform A into B. WegiveanO(n 3 log n) algorithm to compute the edit distance between two ordered trees. 1