Provenance-Aware Storage Systems
2006
Cited by 175 (20 self)
A Provenance-Aware Storage System (PASS) is a storage system that automatically collects and maintains provenance, or lineage: the complete history or ancestry of an item. We discuss the advantages of treating provenance as metadata collected and maintained by the storage system, rather than as manual annotations stored in a separately administered database. We describe a PASS implementation, discussing the challenges it presents, the performance cost it incurs, and the new functionality it enables. We show that with reasonable overhead, we can provide useful functionality not available in today’s file systems or provenance management systems.
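To make "the complete history or ancestry of an item" concrete, here is a minimal sketch of an ancestry query over recorded lineage. It is not the PASS implementation: the `parents` mapping, the `ancestry` function, and the file names are all hypothetical, and a real system would record these relationships automatically rather than by hand.

```python
from typing import Dict, List, Set

# Hypothetical lineage records: each item maps to the items it was derived from.
parents: Dict[str, List[str]] = {
    "plot.png": ["results.csv", "plot.py"],
    "results.csv": ["raw.dat", "analyze"],
}


def ancestry(item: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every direct and indirect ancestor of `item`."""
    seen: Set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


full_history = ancestry("plot.png", parents)
```

Here `full_history` contains both the direct inputs of `plot.png` and the inputs of `results.csv`, which is the kind of transitive question a manually annotated database makes hard to answer reliably.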
A Framework for Collecting Provenance in Data-Centric Scientific Workflows
In ICWS, 2006
Cited by 37 (5 self)
The increasing ability of the earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid and web services. The focus of our work is on provenance collection for these workflows, which is necessary to validate the workflow and to determine the quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection, while a performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight-service workflow using 271 data products).
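The loosely coupled publish-subscribe idea described above can be sketched as follows. This is an illustrative in-process toy, not the framework from the paper: `ProvenanceBus`, the `"provenance"` topic, the `regrid_service` function, and the activity fields are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Activity = Dict[str, str]


class ProvenanceBus:
    """Hypothetical broker that decouples services from provenance consumers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Activity], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Activity], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, activity: Activity) -> None:
        # Deliver the activity to every handler registered for this topic.
        for handler in self._subscribers.get(topic, []):
            handler(activity)


bus = ProvenanceBus()
provenance_log: List[Activity] = []
bus.subscribe("provenance", provenance_log.append)


def regrid_service(input_id: str) -> str:
    """Stand-in for a workflow service; its only change is one publish call."""
    output_id = input_id + ".regridded"
    bus.publish(
        "provenance",
        {"service": "regrid_service", "used": input_id, "generated": output_id},
    )
    return output_id


result = regrid_service("sensor-042")
```

The service only knows about the bus, not about who stores or queries the provenance, which is one way to keep the modification burden on service authors small.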
Data Provenance: A Categorization of Existing Approaches
2007
Cited by 11 (3 self)
In many application areas, such as e-science and data warehousing, detailed information about the origin of data is required. This kind of information is often referred to as data provenance or data lineage. The provenance of a data item includes information about the processes and source data items that led to its creation and current representation. The diversity of data representation models and application domains has led to a number of more or less formal definitions of provenance. Most of them are limited to a special application domain, data representation model, or data processing facility. Not surprisingly, the associated implementations are also restricted to some application domain and depend on a special data model. In this paper we give a survey of data provenance models and prototypes, present a general categorization scheme for provenance models, and use this categorization scheme to study the properties of the existing approaches. This categorization enables us to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general. Besides the categorization of provenance types, it is important to include the storage, transformation, and query requirements for the different kinds of provenance information and application domains in our considerations. The analysis of existing approaches will assist us in revealing open research problems in the area of data provenance.
Survey on Security Issues in Cloud Computing and Associated Mitigation Techniques
Cited by 10 (1 self)
Cloud computing has the potential to eliminate the need to set up high-cost computing infrastructure for the IT-based solutions and services that industry uses. It promises to provide a flexible IT architecture, accessible through the internet from lightweight portable devices. This would allow a multi-fold increase in the capacity and capabilities of existing and new software. In a cloud computing environment, all data resides on a set of networked resources, enabling the data to be accessed through virtual machines. Since these data centres may be located in any part of the world, beyond the reach and control of users, there are multifarious security and privacy challenges that need to be understood and addressed. Nor can one rule out the possibility of a server breakdown, which has been witnessed quite often in recent times. This extensive survey paper aims to elaborate on and analyze the numerous unresolved issues threatening the adoption and diffusion of cloud computing and affecting the various stakeholders associated with it.
A New Perspective on Semantics of Data Provenance
Cited by 4 (0 self)
Data provenance refers to the “origin”, “lineage”, and “source” of data. In this work, we examine provenance from a semantics perspective and present the W7 model, an ontological model of data provenance. In the W7 model, provenance is conceptualized as a combination of seven interconnected elements: “what”, “when”, “where”, “how”, “who”, “which”, and “why”. Each of these components may be used to track events that affect data during its lifetime. The W7 model is general and extensible enough to capture provenance semantics for data in different domains. Using the example of Wikipedia, we illustrate how the W7 model can capture domain- or application-specific provenance.
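As a rough illustration of the seven elements, the sketch below models one provenance event as a record with one field per element. Only the seven element names come from the abstract; the `W7Record` class, the field types, the choice to make everything but "what" optional, and the sample values are assumptions made for this example, not the ontology defined in the paper.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class W7Record:
    """Hypothetical flat record with one slot per W7 element."""

    what: str
    when: Optional[str] = None
    where: Optional[str] = None
    how: Optional[str] = None
    who: Optional[str] = None
    which: Optional[str] = None
    why: Optional[str] = None


def missing_elements(record: W7Record) -> List[str]:
    """List the W7 elements that were not captured for this event."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Made-up example loosely inspired by a wiki page edit.
edit = W7Record(
    what="revision of an article",
    when="2007-03-01T12:00:00Z",
    who="ExampleEditor",
    why="fix a citation",
)
gaps = missing_elements(edit)
```

A check like `missing_elements` shows one practical use of a fixed set of elements: it makes incomplete provenance visible instead of silently absent.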
Exploring Provenance in a Distributed Job Execution System
- In Proc. of the International Provenance and Annotation Workshop (IPAW
Cited by 3 (0 self)
We examine provenance in the context of a distributed job execution system. It is crucial to capture provenance information during the execution of a job in a distributed environment because this information is often lost once the job has finished. In this paper we discuss the type of information that is available within a distributed job execution system, how to capture such information, and what burdens capturing it places on the user and the system. We identify what we think is the key data that must be captured and discuss the collection of provenance in the Quill++ project of Condor. Our conclusion is that it is possible to capture important provenance information in a distributed job execution system with relatively little intrusion on the user or the system.
FlowRecommender: A Workflow Recommendation Technique for Process Provenance
Cited by 2 (0 self)
Increasingly complicated workflow systems necessitate the development of automated workflow recommendation techniques, which can not only speed up the workflow construction process but also reduce the errors that may be made. Existing workflow recommendation systems are quite limited in that they cannot produce a correct recommendation of the next node if the upstream nodes or sub-paths that determine the occurrence of this node are not immediately connected to it. To address this drawback, we propose in this paper a new workflow recommendation technique, called FlowRecommender. FlowRecommender features a more robust exploration capability to identify the upstream dependency patterns that are essential to the accuracy of workflow recommendation. These patterns are registered offline to ensure highly efficient online workflow recommendation. The experimental results confirm the promising effectiveness and efficiency of FlowRecommender.
Clear and Precise Specification of Ecological Data Management Processes and Dataset Provenance
Cited by 2 (2 self)
With the availability of powerful computational and communication systems, scientists now readily access large, complicated derived datasets and build on those results to produce, through further processing, yet other derived datasets of interest. The scientific processes used to create such datasets must be clearly documented so that scientists can evaluate their soundness, reproduce the results, and build upon them in responsible and appropriate ways. Here, we present the concept of an analytic web, which defines the scientific processes employed and details the exact application of those processes in creating derived datasets. The work described here is similar to work often referred to as “scientific workflow,” but emphasizes the need for a semantically rich, rigorously defined process definition language. We illustrate the information that comprises an analytic web for a scientific process that measures and analyzes the flux of water through a forested watershed. This is a complex and demanding scientific process that illustrates the benefits of using a semantically rich, executable
The Probabilistic Provenance Graph
Cited by 2 (0 self)
Previous provenance models have assumed that there is complete certainty in the provenance relationships. But what if this assumption does not hold? In this work, we propose a probabilistic provenance graph (PPG) model to characterize scenarios where provenance relationships are uncertain. We describe two motivating examples. The first example demonstrates the uncertainty associated with the provenance of an email. The second example demonstrates and characterizes the uncertainty associated with the provenance of statements in documents.
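One simple way to picture uncertain provenance relationships is to attach a probability to each derivation edge. The sketch below is not the PPG model from the paper: the edge names, the probabilities, and in particular the assumption that edges are independent (so a chain's probability is the product of its edges) are all choices made for this illustration.

```python
from math import prod
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (derived item, candidate source)

# Made-up probabilities that each derivation relationship actually holds.
edges: Dict[Edge, float] = {
    ("email", "alice_draft"): 0.9,
    ("alice_draft", "meeting_notes"): 0.5,
    ("email", "bob_forward"): 0.1,
}


def path_probability(path: List[str], edges: Dict[Edge, float]) -> float:
    """Probability that a whole derivation chain holds, assuming independent edges."""
    return prod(edges[(child, parent)] for child, parent in zip(path, path[1:]))


chain = path_probability(["email", "alice_draft", "meeting_notes"], edges)
```

With these numbers, the chain from `email` back to `meeting_notes` is less certain than either of its individual links, which is the kind of effect a model with fully certain relationships cannot express.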
Identifying and Explaining Map Imperfections Through Knowledge Provenance Visualization
2007
Cited by 1 (1 self)
Applications deployed on cyber-infrastructures often rely on multiple data sources and distributed compute resources to access, process, and derive results. When application results are maps, unintentional imperfections can be introduced into the map generation process for several reasons, including the use of low-quality datasets, the use of data filtering techniques incompatible with the kind of map to be generated, or the use of inappropriate mapping parameters, e.g., low-resolution gridding parameters. Without some means of accessing and visualizing the provenance associated with map generation processes, i.e., metadata about the information sources and methods used to derive the map, it may be impossible for most scientists to discern whether or not a map is of the required quality. Probe-It! is a tool that provides provenance visualization for results from cyber-infrastructure-based applications, including maps. In this paper, we describe a quantitative user study on how Probe-It! can help scientists discriminate between quality maps and maps with known imperfections. Fifteen active scientists from five domains, with different levels of expertise with regard to gravity data and GIS, participated in the study. The study demonstrates that a very small percentage of the scientists can identify imperfections using maps without the help of knowledge provenance. The study also demonstrates that most scientists, whether GIS experts, subject matter experts (i.e., experts on gravity data maps), or neither, can identify and explain several kinds of map imperfections when using maps together with knowledge provenance visualization.