Data Integration and Data Exchange: It’s Really About Time
Cited by 3 (2 self)
With the deluge in the amount and variety of data in the world, it is rare for data that describes an entity to be completely contained and managed by a single data source. As a consequence, there is often great value in combining data about an entity from multiple sources, and also from versions of data reported by the same source over time. Data integration in which multiple dimensions of time may be expressed explicitly (e.g., as part of the data itself) or implicitly (e.g., the publication date of a data source), must be performed with great care. This is because each data source contains only partial (time-specific) knowledge about an entity, and thus their collective knowledge about the entity may contain conflicts that need to be resolved. In this paper, we call for a formal framework for data integration and data exchange across time that would facilitate the creation of consistent and integrated longitudinal knowledge about entities. We call such longitudinal knowledge of an entity its when-provenance, which intuitively corresponds to when one knows what one knows about the entity. We believe that the vision and research directions described in this paper will serve to instigate the research and development of the next-generation data integration and data exchange system, where both data and time can be reasoned on equal footing.
SliceSort: Efficient Sorting of Hierarchical Data
Sorting is a fundamental operation in data processing. While the problem of sorting flat data records has been extensively studied, there is very little work on sorting hierarchical data such as XML documents. Existing hierarchy-aware sorting approaches for hierarchical data are based on creating sorted subtrees as initial sorted runs and merging sorted subtrees to create the sorted output, using either explicit pointers or absolute node key comparisons for merging subtrees. In this paper, we propose SliceSort, a novel, level-wise sorting technique for hierarchical data that avoids the drawbacks of subtree-based sorting techniques. Our experimental performance evaluation shows that SliceSort outperforms the state-of-the-art approach, HErMeS, by up to 27%.
Preference-aware Integration of Temporal Data
A complete description of an entity is rarely contained in a single data source, but rather, it is often distributed across different data sources. Applications based on personal electronic health records, sentiment analysis, and financial records all illustrate that significant value can be derived from integrated, consistent, and queryable profiles of entities from different sources. Even more so, such integrated profiles are considerably enhanced if temporal information from different sources is carefully accounted for. We develop a simple and yet versatile operator, called PRAWN, that is typically called as a final step of an entity integration workflow. PRAWN is capable of consistently integrating and resolving temporal conflicts in data that may contain multiple dimensions of time based on a set of preference rules specified by a user (hence the name PRAWN for preference-aware union). In the event that not all conflicts can be resolved through preferences, one can enumerate each possible consistent interpretation of the result returned by PRAWN at a given time point through a polynomial-delay algorithm. In addition to providing algorithms for implementing PRAWN, we study and establish several desirable properties of PRAWN. First, PRAWN produces the same temporally integrated outcome, modulo representation of time, regardless of the order in which data sources are integrated. Second, PRAWN can be customized to integrate temporal data for different applications by specifying application-specific preference rules. Third, we show experimentally that our implementation of PRAWN is feasible on both “small” and “big” data platforms in that it is efficient in both storage and execution time. Finally, we demonstrate a fundamental advantage of PRAWN: we illustrate that standard query languages can be immediately used to pose useful temporal queries over the integrated and resolved entity repository.
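The idea of resolving temporal conflicts through preferences, independently of the order in which sources are integrated, can be illustrated with a toy sketch. This is not the PRAWN algorithm from the paper: the `Fact` record, the `integrate` function, the single time dimension with integer time points, and the "lower rank wins" preference rule are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """Hypothetical record: a value that holds over the interval [start, end)."""
    entity: str
    attribute: str
    value: str
    start: int  # inclusive time point
    end: int    # exclusive time point
    source: str


def integrate(facts, rank):
    """Toy preference-based union (an assumption, not the paper's operator).

    For every (entity, attribute, time point), keep only the values reported
    by the most-preferred source(s); a lower rank means more preferred.
    A result set with more than one value marks an unresolved conflict.
    """
    candidates = defaultdict(list)
    for fact in facts:
        for t in range(fact.start, fact.end):
            candidates[(fact.entity, fact.attribute, t)].append(fact)

    resolved = {}
    for key, group in candidates.items():
        best = min(rank[f.source] for f in group)
        resolved[key] = {f.value for f in group if rank[f.source] == best}
    return resolved


# Illustrative data: two sources disagree about an employer during 2012-2013.
facts = [
    Fact("alice", "employer", "IBM", 2010, 2014, "hr"),
    Fact("alice", "employer", "Acme", 2012, 2016, "web"),
]
rank = {"hr": 0, "web": 1}  # assumed preference rule: trust "hr" over "web"
result = integrate(facts, rank)
```

Because the outcome at each time point depends only on the set of candidate facts and their ranks, feeding the facts in a different order yields the same result in this sketch. Where two equally ranked sources disagree, both values are kept, which loosely mirrors the abstract's notion of several consistent interpretations remaining at a time point.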