A System for Approximate Tree Matching
, 1992
Cited by 58 (10 self)
Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology, programming compilation and natural language processing. Many of the applications involve comparing trees or retrieving/extracting information from a repository of trees. Examples include classification of unknown patterns, analysis of newly sequenced RNA structures, semantic taxonomy for dictionary definitions, generation of interpreters for nonprocedural programming languages, and automatic error recovery and correction for programming languages. Previous systems use exact matching (or generalized regular expression matching) for tree comparison. This paper presents a system, called Approximate-Tree-By-Example (ATBE), which allows inexact matching of trees. The ATBE system interacts with the user through a simple, but powerful query language; graphical devices a...
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
, 2000
Cited by 52 (4 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
Approximate Tree Pattern Matching
- In Pattern Matching Algorithms
, 1995
Cited by 44 (1 self)
this article proceeds on the assumption that this question has a negative response. In particular, we discuss the best known algorithms for tree editing and several variations having to do with subtree removal, variable length don't cares, and alignment. We discuss both sequential and parallel algorithms. We present negative results having to do with unordered trees (trees whose sibling order is arbitrary) and a few approximation algorithms. Finally, we discuss the problem of finding commonalities among a set of trees.
Approximate Tree Matching in the Presence of Variable Length Don't Cares
- Journal of Algorithms
, 1993
Cited by 37 (7 self)
Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable-length don't cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if "com*er" is the pattern, then the "*" would substitute for the substring "put" when matching the data string "computer". Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, "com*er" matches "counter" within distance 1 (representing the cost of removing the "m" from "com*er" and having the "*" substitute for "unt"). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(de...
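The string case described in this abstract can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the paper's algorithm: the VLDC symbol is written as `*`, every insertion, deletion and substitution costs 1, the pattern must match the whole data string, and the function name `vldc_distance` is hypothetical.

```python
def vldc_distance(pattern: str, text: str, wildcard: str = "*") -> int:
    """Edit distance between pattern and text, where each wildcard in the
    pattern may stand for zero or more text symbols at no cost.
    Unit costs are an assumption made for this sketch."""
    rows, cols = len(pattern) + 1, len(text) + 1
    # dp[i][j] = cheapest way to match pattern[:i] against text[:j]
    dp = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        dp[0][j] = j  # empty pattern: every text symbol must be inserted
    for i in range(1, rows):
        is_wild = pattern[i - 1] == wildcard
        # against empty text, a wildcard costs nothing, any other symbol is deleted
        dp[i][0] = dp[i - 1][0] + (0 if is_wild else 1)
        for j in range(1, cols):
            if is_wild:
                # wildcard matches nothing, or absorbs text[j-1] for free
                dp[i][j] = min(dp[i - 1][j], dp[i][j - 1])
            else:
                mismatch = 0 if pattern[i - 1] == text[j - 1] else 1
                dp[i][j] = min(
                    dp[i - 1][j] + 1,             # delete pattern symbol
                    dp[i][j - 1] + 1,             # insert text symbol
                    dp[i - 1][j - 1] + mismatch,  # match or substitute
                )
    return dp[-1][-1]


print(vldc_distance("com*er", "computer"))  # 0: "*" stands for "put"
print(vldc_distance("com*er", "counter"))   # 1: drop "m", "*" stands for "unt"
```

The two calls reproduce the examples from the abstract. The tree algorithms in the paper generalize this idea and are considerably more involved.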
An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
Cited by 34 (5 self)
Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology and natural language processing. We consider a substructure of an ordered labeled tree T to be a connected subgraph of T. Given two ordered labeled trees T1 and T2 and an integer d, the largest approximately common substructure problem is to find a substructure U1 of T1 and a substructure U2 of T2 such that U1 is within edit distance d of U2 and where there does not exist any other substructure V1 of T1 and V2 of T2 such that V1 and V2 satisfy the distance constraint and the sum of the sizes of V1 and V2 is greater than the sum of the sizes of U1 and U2. We present a dynamic programming algorithm to solve this problem, which runs as fast as the fastest known algorithm for computing the edit distance of two trees when the distance allowed in the common substruc...
Landscapes - Complex Optimization Problems and Biopolymer Structures
- Computers Chem
, 1993
Cited by 30 (16 self)
The evolution of RNA molecules in replication assays, viroids and RNA viruses can be viewed as an adaptation process on a 'fitness' landscape. The dynamics of evolution is hence tightly linked to the structure of the underlying landscape. Global features of landscapes can be described by statistical measures like number of optima, lengths of walks, and correlation functions. The evolution of a quasispecies on such landscapes exhibits three dynamical regimes depending on the replication fidelity: Above the "localization threshold" the population is centered around a (local) optimum. Between localization and "dispersion threshold" the population is still centered around a consensus sequence, which, however, changes in time. For very large mutation rates the population spreads in sequence space like a gas. The critical mutation rates separating the three domains depend strongly on characteristic properties of the fitness landscapes. Statistical characteristics of RNA landscapes are acces...
An optimal decomposition algorithm for tree edit distance
- In Proceedings of the 34th International Colloquium on Automata, Languages and Programming (ICALP)
, 2007
Cited by 28 (2 self)
The edit distance between two ordered rooted trees with vertex labels is the minimum cost of transforming one tree into the other by a sequence of elementary operations consisting of deleting and relabeling existing nodes, as well as inserting new nodes. In this paper, we present a worst-case O(n^3)-time algorithm for this problem, improving the previous best O(n^3 log n)-time algorithm [9]. Our result requires a novel adaptive strategy for deciding how a dynamic program divides into subproblems, together with a deeper understanding of the previous algorithms for the problem. We prove the optimality of our algorithm among the family of decomposition strategy algorithms (which also includes the previous fastest algorithms) by tightening the known lower bound of Ω(n^2 log^2 n) [6] to Ω(n^3), matching our algorithm's running time. Furthermore, we obtain matching upper and lower bounds of Θ(nm^2 (1 + log(n/m))) when the two trees have sizes m and n where m < n.
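The definition of tree edit distance given in this abstract can be made concrete with the textbook forest recurrence. The sketch below is not the optimal decomposition algorithm of the paper. It is a plain memoized recursion on rightmost roots, assuming unit cost for every delete, insert and relabel. The helper names `tree`, `forest_size`, `forest_distance` and `tree_edit_distance` are hypothetical.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) pair."""
    return (label, tuple(children))


def forest_size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return forest_size(f2)  # insert every remaining node
    if not f2:
        return forest_size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # delete the rightmost root of f1; its children take its place
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # insert the rightmost root of f2
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # map the two rightmost roots onto each other, relabeling if needed
    relabel = (
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + (label1 != label2)
    )
    return min(delete, insert, relabel)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


t1 = tree("a", tree("b"), tree("c"))
t2 = tree("a", tree("x", tree("b"), tree("c")))
print(tree_edit_distance(t1, t2))  # 1: insert "x" between "a" and its children
```

Memoizing on whole subforests keeps the code short but makes no attempt at the running-time bounds discussed in the abstract, which depend on choosing the decomposition direction adaptively.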
Enabling web browsers to augment web sites’ filtering and sorting functionalities
- ACM Symposium on User Interface Software and Technology (UIST)
, 2006
Cited by 26 (4 self)
Existing augmentations of web pages are mostly small cosmetic changes (e.g., removing ads) and minor addition of third-party content (e.g., product prices from competing sites). None leverages the structured data presented in web pages. This paper describes Sifter, a web browser extension that can augment a web site with advanced filtering and sorting functionality. These added features work inside the site’s own pages, preserving the site’s presentational style, as if the site itself has implemented the features. Sifter contains an algorithm that scrapes structured data out of web pages while usually requiring no user intervention. We tested Sifter on real web sites and real users and found that people could use Sifter to perform sophisticated queries and high-level analyses on sizable data collections on the Web. We propose that web sites can be similarly augmented with other sophisticated data-centric functionality, giving users new benefits over the existing Web. ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces – Graphical user interfaces (GUI).
Mobile Computing Middleware
- In Advanced lectures on networking
, 2002
Cited by 25 (1 self)
Recent advances in wireless networking technologies and the growing success of mobile computing devices, such as laptop computers, third generation mobile phones, personal digital assistants, watches and the like, are enabling new classes of applications that present challenging problems to designers. Mobile devices face temporary loss of network connectivity when they move; they are likely to have scarce resources, such as low battery power, slow CPU speed and little memory; they are required to react to frequent and unannounced changes in the environment, such as high variability of network bandwidth and of resource availability. To support designers building mobile applications, research in the field of middleware systems has proliferated. Middleware aims at facilitating communication and coordination of distributed components, concealing complexity raised by mobility from application engineers as much as possible. In this survey, we examine characteristics of mobile distributed systems and distinguish them from their fixed counterparts.
Approximate Tree Embedding for Querying XML Data
- In ACM SIGIR Workshop on XML and Information Retrieval
, 2000
Cited by 24 (2 self)
Querying heterogeneous collections of data-centric XML documents requires a combination of database languages and concepts used in information retrieval, in particular similarity search and ranking. In this paper we present an approach to find approximate answers to formal user queries. We reduce the problem of answering queries against XML document collections to the well-known unordered tree inclusion problem. We extend this problem to an optimization problem by applying a cost model to the embeddings. Thereby we are able to determine how close parts of the XML document match a user query. We present an efficient algorithm that finds all approximate matches and ranks them according to their similarity to the query.

