@MISC{Mozes08somelower, author = {Shay Mozes}, title = {Some Lower and Upper Bounds for Tree Edit Distance}, year = {2008} }

Abstract

In this report I describe my results on the Tree Edit Distance problem [13, 27]. The edit distance between two ordered rooted trees with vertex labels is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. Tree Edit Distance has applications in many fields such as computer vision, computational biology and compiler optimization. I describe an algorithm that computes the edit distance between two trees of sizes n and m, where m < n, and runs in O(nm²(1 + log(n/m))) = O(n³) time and O(nm) space. The previously best known algorithm for this problem, which is due to Philip Klein [22], runs in O(m²n log n) = O(n³ log n) time and O(mn) space. Next, a matching lower bound is proved for the family of decomposition strategy algorithms, which includes the previous fastest algorithms for this problem. The best previously known lower bound for this family was Ω(n² log² n). Finally, I describe recent results on the Longest Common Subtree problem. This is an interesting special case of Tree Edit Distance in which only insertions and deletions are considered (i.e., the cost of all relabeling operations is infinite, and the cost of any insertion or deletion is 1). I describe a few algorithms for this problem, the fastest of which runs in O(Lr log r log log m) time, where L is the size of the LCS (L ≤ m) and r is the number of pairs of vertices with matching labels, one from each tree (r ≤ nm). These algorithms combine techniques from sparse string LCS (Longest Common Subsequence) with Tree Edit Distance algorithms. The tree edit distance paper [13] is joint work with Erik Demaine, Benjamin Rossman