Results 1 - 10 of 82
Applying Model Management to Classical Meta Data Problems, 2003
"... Model management is a new approach to meta data management that offers a higher level programming interface than current techniques. The main abstractions are models (e.g., schemas, interface definitions) and mappings between models. It treats these abstractions as bulk objects and offers such ..."
Abstract
-
Cited by 259 (21 self)
- Add to MetaCart
Model management is a new approach to meta data management that offers a higher level programming interface than current techniques. The main abstractions are models (e.g., schemas, interface definitions) and mappings between models. It treats these abstractions as bulk objects and offers such operators as Match, Merge, Diff, Compose, Apply, and ModelGen. This paper extends earlier treatments of these operators and applies them to three classical meta data management problems: schema integration, schema evolution, and round-trip engineering.
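The abstract names the operator set without showing signatures. Below is a minimal Python sketch of what interfaces for Match, Merge, Diff, and Compose might look like (Apply and ModelGen are omitted for brevity). The class names, the name-equality matcher, and the set-based representation are illustrative assumptions, not the paper's formal definitions.

```python
# Illustrative sketch of model-management operators as Python signatures.
# These are assumptions for exposition, not the paper's formal semantics.
from dataclasses import dataclass, field

@dataclass
class Model:
    """A schema-like artifact: a name plus a set of element names."""
    name: str
    elements: set = field(default_factory=set)

@dataclass
class Mapping:
    """Correspondences between elements of two models."""
    source: Model
    target: Model
    pairs: set = field(default_factory=set)  # {(src_elem, tgt_elem), ...}

def match(a: Model, b: Model) -> Mapping:
    """Match: heuristically align two models.
    Here: a trivial name-equality matcher."""
    return Mapping(a, b, {(e, e) for e in a.elements & b.elements})

def merge(a: Model, b: Model, m: Mapping) -> Model:
    """Merge: combine two models, unifying elements related by m
    (matched pairs share names in this toy representation)."""
    return Model(f"{a.name}+{b.name}", a.elements | b.elements)

def diff(a: Model, m: Mapping) -> Model:
    """Diff: the part of model a not covered by mapping m."""
    covered = {s for s, _ in m.pairs}
    return Model(f"{a.name}-diff", a.elements - covered)

def compose(m1: Mapping, m2: Mapping) -> Mapping:
    """Compose: chain an a->b and a b->c mapping into an a->c mapping."""
    pairs = {(s, t2) for s, t1 in m1.pairs
             for u, t2 in m2.pairs if t1 == u}
    return Mapping(m1.source, m2.target, pairs)
```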
Meaningful change detection in structured data. In SIGMOD '97: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, 1997
"... Abstract Detecting changes by comparing data snapshots is an important requirement for di erence queries, active databases, and version and con guration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This is a much m ..."
Abstract
-
Cited by 144 (7 self)
- Add to MetaCart
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This is a much more challenging problem than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional atomic insert, delete, and update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. This allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to minimal descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.
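To make the reduction concrete, here is a small sketch of the node-matching step using a minimum-cost bipartite assignment via scipy. This is a simplification: the paper uses a minimum-cost edge cover (in which one node may cover several), and the node labels and unit costs below are invented for illustration.

```python
# Sketch: pair nodes of two tree snapshots by min-cost bipartite
# assignment (a simplification of the paper's min-cost edge cover).
import numpy as np
from scipy.optimize import linear_sum_assignment

old_nodes = ["chapter", "section", "figure"]
new_nodes = ["chapter", "section", "table"]

def cost(a: str, b: str) -> float:
    """Cheap to match identical labels, expensive otherwise."""
    return 0.0 if a == b else 1.0

C = np.array([[cost(a, b) for b in new_nodes] for a in old_nodes])
rows, cols = linear_sum_assignment(C)  # Hungarian-style solver

for i, j in zip(rows, cols):
    kind = "update" if C[i, j] else "keep"
    print(f"{kind}: {old_nodes[i]} -> {new_nodes[j]}")
# In a full diff, unmatched nodes become inserts/deletes; moves and
# copies are then inferred from matched pairs whose parents differ.
```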
An Efficient Index Structure for String Databases. In VLDB, 2001
"... We consider the problem of substring searching in large databases. Typical applications of this problem are genetic data, web data, and event sequences. Since the size of such databases grows exponentially, it becomes impractical to use inmemory algorithms for these problems. In this paper, we ..."
Abstract
-
Cited by 81 (9 self)
- Add to MetaCart
(Show Context)
We consider the problem of substring searching in large databases. Typical applications of this problem are genetic data, web data, and event sequences. Since the size of such databases grows exponentially, it becomes impractical to use in-memory algorithms for these problems. In this paper, we propose to map the substrings of the data into an integer space with the help of wavelet coefficients. Later, we index these coefficients using MBRs (Minimum Bounding Rectangles). We define a distance function which is a lower bound to the actual edit distance between strings. We experiment with both nearest neighbor queries and range queries. The results show that our technique prunes a significant amount of the database (typically 50-95%), thus reducing both the disk I/O cost and the CPU cost significantly.
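The key idea is filter-and-refine: a cheap distance that never exceeds the true edit distance prunes candidates before the expensive dynamic program runs. The sketch below keeps that idea using plain character-frequency vectors; the wavelet transform and the MBR index layer from the paper are omitted, and the example data is invented.

```python
# Filter-and-refine sketch: a frequency-vector distance lower-bounds
# edit distance, so it can safely prune before the DP verification.
from collections import Counter

def freq_lower_bound(s: str, t: str) -> int:
    """max(#insertions needed, #deletions needed) by character counts;
    never exceeds the true edit distance."""
    fs, ft = Counter(s), Counter(t)
    pos = sum(max(0, fs[c] - ft[c]) for c in fs)
    neg = sum(max(0, ft[c] - fs[c]) for c in ft)
    return max(pos, neg)

def edit_distance(s: str, t: str) -> int:
    """Standard O(|s||t|) Levenshtein DP (the expensive refinement)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def range_query(query: str, db: list, radius: int) -> list:
    """Return db strings within edit distance `radius` of `query`."""
    candidates = [s for s in db if freq_lower_bound(query, s) <= radius]
    return [s for s in candidates if edit_distance(query, s) <= radius]

# "TTTTT" is pruned by the lower bound alone; the rest are verified.
print(range_query("ACGTT", ["ACGTA", "TTTTT", "ACGT"], radius=1))
```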
A methodology for clustering XML documents by structure. Information Systems, 2006
"... The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
The processing and management of XML data are popular research issues. However, operations based on the structure of XML data have not received strong attention. These operations involve, among others, the grouping of structurally similar XML documents. Such grouping results from the application of clustering methods with distances that estimate the similarity between tree structures. This paper presents a framework for clustering XML documents by structure. Modeling the XML documents as rooted ordered labeled trees, we study the usage of structural distance metrics in hierarchical clustering algorithms to detect groups of structurally similar XML documents. We suggest the usage of structural summaries for trees to improve the performance of the distance calculation and at the same time to maintain or even improve its quality. Our approach is tested using a prototype testbed.
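A small sketch of the pipeline shape, clustering documents by a structural distance. As a stand-in for the paper's tree edit distance and structural summaries, each document is summarized here by its set of root-to-node label paths and compared with a Jaccard distance; the documents and the cut threshold are invented for illustration.

```python
# Sketch: structural distance between XML documents feeding a
# hierarchical clustering (path-set Jaccard stands in for the paper's
# tree edit distance; the summaries here are deliberately crude).
import xml.etree.ElementTree as ET
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster

def label_paths(xml_text: str) -> frozenset:
    """All root-to-node tag paths, e.g. 'book/author'."""
    paths = set()
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)
    walk(ET.fromstring(xml_text), "")
    return frozenset(paths)

docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<recipe><ingredient/><step/></recipe>",
]
summaries = [label_paths(d) for d in docs]

def jaccard(a, b):
    return 1.0 - len(a & b) / len(a | b)

# Condensed distance matrix in the pairwise order scipy expects.
dists = [jaccard(a, b) for a, b in combinations(summaries, 2)]
clusters = fcluster(linkage(dists, method="average"), t=0.5,
                    criterion="distance")
print(clusters)  # e.g. [1 1 2]: the two book documents group together
```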
Wooki: a P2P Wiki-based Collaborative Writing Tool. In 8th International Conference on Web Information Systems Engineering (WISE 2007)
"... Wiki systems are becoming an important part of the information system of many organisations and communities. This introduce the issue of the data availability in case of failure, heavy load or off-line access. We propose to replicate wiki page ..."
Abstract
-
Cited by 38 (18 self)
- Add to MetaCart
Wiki systems are becoming an important part of the information system of many organisations and communities. This introduces the issue of data availability in case of failure, heavy load, or off-line access. We propose to replicate wiki pages across a P2P network of wiki engines. We address the problem of consistency of replicated wiki pages in the context of a P2P wiki system. In this paper, we present the architecture and the underlying algorithms of the Wooki system. Compared to traditional wikis, Wooki is a P2P wiki that scales, delivers better performance, and allows off-line access.
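The abstract does not spell out the consistency algorithm, so the sketch below is not Wooki's actual method (Wooki builds on a WOOT-style approach). It illustrates the general shape of the problem with a simplified RGA-style replicated sequence: every insert names the element it was typed after, and concurrent inserts after the same element are ordered deterministically so all replicas converge. Everything here is an illustrative assumption.

```python
# Illustrative sketch (NOT Wooki's actual algorithm): an RGA-style
# replicated sequence in which concurrent inserts converge by a
# deterministic (lamport, site) ordering.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Id:
    lamport: int
    site: int
    def __lt__(self, other):
        return (self.lamport, self.site) < (other.lamport, other.site)

@dataclass
class Op:
    id: Id                # unique id of the inserted element
    after: Optional[Id]   # element it was typed after (None = head)
    char: str

def integrate(doc: list, op: Op) -> None:
    """Apply a remote insert; deterministic for concurrent ops."""
    i = 0
    if op.after is not None:
        i = next(k for k, (eid, _) in enumerate(doc) if eid == op.after) + 1
    # Skip concurrent elements with larger ids (their descendants have
    # larger lamport clocks still), so every replica picks the same spot.
    while i < len(doc) and op.id < doc[i][0]:
        i += 1
    doc.insert(i, (op.id, op.char))

# Two sites concurrently insert at the head; delivery order differs,
# yet both replicas converge to the same document.
a, b = Id(1, 1), Id(1, 2)
ops = [Op(a, None, "x"), Op(b, None, "y")]
r1, r2 = [], []
for op in ops:
    integrate(r1, op)
for op in reversed(ops):
    integrate(r2, op)
assert r1 == r2
print("".join(ch for _, ch in r1))  # "yx" on both replicas
```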
Theoretical and practical improvements on the RMQ-problem, with applications to LCA and LCE. In Proc. CPM, Volume 4009 of LNCS, 2006
"... The Range-Minimum-Query-Problem is to preprocess an array such that the position of the minimum element between two specified indices can be obtained efficiently. We present a direct algorithm for the general RMQ-problem with linear preprocessing time and constant query time, without making use of ..."
Abstract
-
Cited by 34 (9 self)
- Add to MetaCart
(Show Context)
The Range-Minimum-Query-Problem is to preprocess an array such that the position of the minimum element between two specified indices can be obtained efficiently. We present a direct algorithm for the general RMQ-problem with linear preprocessing time and constant query time, without making use of any dynamic data structure. It consumes less than half of the space that is needed by the method by Berkman and Vishkin. We use our new algorithm for RMQ to improve on LCA-computation for binary trees, and further give a constant-time LCE-algorithm solely based on arrays. Both LCA and LCE have important applications, e.g., in computational biology. Experimental studies show that our new method is almost twice as fast in practice as previous approaches, and asymptotically slower variants of the constant-time algorithms perform even better for today’s common problem sizes.
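For context, here is the classic sparse-table RMQ baseline: O(n log n) preprocessing and O(1) queries by covering any range with two overlapping power-of-two blocks. This is not the authors' algorithm, which achieves linear preprocessing time and less space; the array in the usage line is invented.

```python
# Classic sparse-table RMQ (the baseline the paper improves on):
# O(n log n) preprocessing, O(1) query. Not the authors' method.
class SparseTableRMQ:
    def __init__(self, a):
        self.a = a
        n = len(a)
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        # table[k][i] = index of the minimum in a[i : i + 2**k]
        self.table = [list(range(n))]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[k - 1], 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                l, r = prev[i], prev[i + half]
                row.append(l if a[l] <= a[r] else r)
            self.table.append(row)
            k += 1

    def query(self, i, j):
        """Index of the minimum of a[i..j] (inclusive), in O(1):
        two overlapping power-of-two blocks cover the range."""
        k = self.log[j - i + 1]
        l, r = self.table[k][i], self.table[k][j - (1 << k) + 1]
        return l if self.a[l] <= self.a[r] else r

rmq = SparseTableRMQ([3, 1, 4, 1, 5, 9, 2, 6])
assert rmq.query(2, 7) == 3   # a[3] == 1 is the minimum of a[2..7]
```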
Ontology-Based Information Extraction: An Introduction and a Survey of Current Approaches
"... ..."
(Show Context)
Similarity metric for XML documents. In Proc. of Workshop on Knowledge and Experience Management, 2003
"... Since XML documents can be represented as trees, Based on traditional tree edit distance, this paper presents structural similarity metric for XML documents,which is based on edge constraint, path constraint, and inclusive path constraint, and similarity metric based on machine learning with node co ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Since XML documents can be represented as trees, this paper builds on traditional tree edit distance to present structural similarity metrics for XML documents based on edge constraints, path constraints, and inclusive path constraints, as well as a similarity metric based on machine learning with node costs. This extends the scope of XML document search and improves its recall and precision.
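A rough sketch of the flavor of edge- and path-constrained similarity: compare two trees by their shared parent-child edges and by their shared root-to-node paths. The paper's exact formulas and the learned node costs are not reproduced; the Dice coefficient and the sample documents below are illustrative stand-ins.

```python
# Sketch of edge-constraint vs. path-constraint similarity for XML
# trees (illustrative stand-ins for the paper's exact definitions).
import xml.etree.ElementTree as ET

def edges_and_paths(xml_text: str):
    root = ET.fromstring(xml_text)
    edges, paths = set(), set()
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            edges.add((node.tag, child.tag))
            walk(child, path)
    walk(root, "")
    return edges, paths

def dice(a: set, b: set) -> float:
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

doc1 = "<book><title/><author><name/></author></book>"
doc2 = "<book><title/><author/></book>"
e1, p1 = edges_and_paths(doc1)
e2, p2 = edges_and_paths(doc2)
print(f"edge similarity: {dice(e1, e2):.2f}")  # shared parent-child edges
print(f"path similarity: {dice(p1, p2):.2f}")  # shared root-to-node paths
```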
Measuring Similarity of Large Software Systems Based on Source Code Correspondence. In: Product Focused Software Process Improvement, Volume 3547 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg (2005), 530-544. http://www.springerlink.com/content/h6m2vg5c3ejk38l4
"... Abstract. It is an important and intriguing issue to know the quantitative similarity of large software systems. In this paper, a similarity metric between two sets of source code files based on the correspondence of overall source code lines is proposed. A Software similarity MeAsurement Tool SMAT ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
It is an important and intriguing issue to know the quantitative similarity of large software systems. In this paper, a similarity metric between two sets of source code files, based on the correspondence of overall source code lines, is proposed. A Software similarity MeAsurement Tool, SMAT, was developed and applied to various versions of an operating system (BSD UNIX). The resulting similarity valuations clearly revealed the evolutionary history characteristics of the BSD UNIX operating system.
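A small sketch of line-correspondence similarity in the spirit of the paper: count lines two files share in corresponding order and normalize by total size. It uses difflib from the Python standard library; SMAT's exact matching rules are not reproduced, and the two file versions are invented.

```python
# Line-correspondence similarity sketch (not SMAT's exact rules):
# 2 * matched_lines / (lines_a + lines_b), in [0, 1].
from difflib import SequenceMatcher

def line_similarity(src_a: str, src_b: str) -> float:
    a, b = src_a.splitlines(), src_b.splitlines()
    matched = sum(size for _, _, size in
                  SequenceMatcher(None, a, b).get_matching_blocks())
    return 2 * matched / (len(a) + len(b)) if a or b else 1.0

v1 = 'int main() {\n    puts("hello");\n    return 0;\n}\n'
v2 = 'int main() {\n    puts("hello, world");\n    return 0;\n}\n'
print(f"{line_similarity(v1, v2):.2f}")  # 3 of 4 lines correspond: 0.75
```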
Data-driven Multilingual Coreference Resolution using Resolver Stacking
"... This paper describes our contribution to the CoNLL 2012 Shared Task. 1 We present a novel decoding algorithm for coreference resolution which is combined with a standard pair-wise coreference resolver in a stacking approach. The stacked decoders are evaluated on the three languages of the Shared Tas ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
(Show Context)
This paper describes our contribution to the CoNLL 2012 Shared Task. We present a novel decoding algorithm for coreference resolution which is combined with a standard pair-wise coreference resolver in a stacking approach. The stacked decoders are evaluated on the three languages of the Shared Task. We obtain an official overall score of 58.25, which is the second highest in the Shared Task.
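The abstract gives no algorithmic detail, so the following is only a generic illustration of the stacking idea, not the authors' decoder: a first-stage pairwise classifier scores mention pairs, and a second-stage model consumes those scores as an extra feature. The data and features are synthetic, and a faithful setup would use out-of-fold stage-1 predictions to avoid leakage.

```python
# Generic stacking illustration (synthetic data; not the paper's decoder):
# stage-1 pairwise scores become an input feature for stage 2.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))            # mention-pair features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 1 = coreferent (synthetic rule)

# Stage 1: standard pair-wise coreference classifier.
stage1 = LogisticRegression().fit(X[:100], y[:100])
scores = stage1.predict_proba(X)[:, 1:]  # its confidence, as a column

# Stage 2: the stacked model sees original features plus stage-1 scores.
stacked = np.hstack([X, scores])
stage2 = LogisticRegression().fit(stacked[:100], y[:100])
print("stage-1 acc:", stage1.score(X[100:], y[100:]))
print("stacked acc:", stage2.score(stacked[100:], y[100:]))
```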