M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In Symp. on Mass Storage Systems, pages 119--134. IEEE, 2000.
....In addition to dQUOB, there are several software systems that support processing near the data server to either reduce network traffic or distribute computational load. Two such systems from the University of Maryland are MOCHA (middleware based on code shipping architecture) [34] and DataCutter [35]. Each provides the capability to move filtering code close to the data, but the decision about where to place the code is made by the application. In situations where many applications compete for the server's resources, placing the filtering code on the server may actually slow performance rather ....
M.D. Beynon, R. Ferreira, T. Kurc, A. Sussman, J. Saltz, DataCutter: Middleware for filtering very large scientific datasets on archival storage systems, in: Proceedings of the 2000.
....that potentially execute near the data storage devices. PS presents a nice programming model for data intensive applications; however, they lack the flexibility to provide application specific interfaces and the user has no control over the placement decisions of the system. DataCutter [1], developed at the University of Maryland, is middleware used to explore and analyze scientific datasets stored on archival storage systems across a wide-area network. DataCutter provides a query based interface with support for accessing subsets of datasets and for performing user defined ....
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000.
....and resource sharing among autonomous groups at an unprecedented scale. Scientific and commercial data are undoubtedly recognized as invaluable resources, and several new systems have been proposed to facilitate wide area information sharing in a secure, flexible and cost-effective way [6, 7, 10, 15]. On the other hand, even though network bandwidth is not at scarcity, its cost is not negligible either. When also considering latencies due to the speed of light, it makes sense to move data processing close to the storage servers. Lerna: Haunt of a gigantic monster with nine heads (Greek ....
....files depending on the desirable degree of flexibility and efficiency. 4 A Multi Layer Map Warehouse Online generation of geographical, astronomical or biomedical maps generally can consume a large amount of processing and bandwidth resources depending on the size of the datasets involved [5, 10, 30]. In the past, the problem of handling terabyte sized datasets has been kept manageable by i) limiting the accessible number of layers to one, and ii) producing raster images of all the supported resolutions offline so that map rendering would be reduced to data retrieval. In that sense, large scale ....
[Article contains additional citation context not shown here]
Beynon, M., Ferreira, R., Kurc, T., Sussman, A., and Saltz, J. DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems. In IEEE Symposium on Mass Storage Systems (College Park, MD, Mar. 2000), pp. 119--133.
....projects. Experience with these applications will motivate further refinements and additions to the services described here. We are already planning extensions, such as automated replica management, community based access control, automated buffer size negotiation, and server side data reduction [17]. Acknowledgements We are grateful to Marcus Thiebaux and Soonwook Hwang for their work characterizing the performance of LDAP servers; to Brian Toonen, who helped to optimize the GridFTP code; to Gail Pieper, Laura Pearlman and Ewa Deelman for comments on this paper; and to the many colleagues ....
M. Beynon, R. Ferreira, T. Kurc, A. Sussman, and Saltz, J., "DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems," Proc. 8th Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, 2000, 119-133.
....For example, large distributed simulations [6] can require access to many large computational resources at one time. On line experiments [38] require that computational resources be available when the experiment is being conducted, and processing pipelines such as data transfer [21], data analysis [25, 3] and visualization pipelines [8] require simultaneous access to a balanced resource set. Given that each of the resources in question may be owned and operated by a different provider, establishing a single SLA across all of the desired resources is not possible. One solution to this problem is to ....
M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In IEEE Symposium on Mass Storage Systems, pages 119--134, 2000.
....projects. Experience with these applications will motivate further refinements and additions to the services described here. We are already planning extensions, such as automated replica management, community based access control, automated buffer size negotiation, and server side data reduction [14]. Acknowledgements We are grateful to Marcus Thiebaux and Soonwook Hwang for their work characterizing the performance of LDAP servers; to Brian Toonen, who helped to optimize the GridFTP code; to Gail Pieper, Laura Pearlman and Ewa Deelman for comments on this paper; and to the many colleagues ....
M. Beynon, R. Ferreira, T. Kurc, A. Sussman, and Saltz, J., "DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems," Proc. 8th Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems,
....file systems, Unitree, HPSS and database objects. However, it does not fully support the optimizations implemented in MPI-IO. Shoshani et al. [28, 29] describe an architecture for optimizing access to large volumes of scientific data stored on tapes. The Active Data Repository [17] and DataCutter [4] optimize storage, retrieval, and processing of very large multidimensional datasets. The main difference between our work and other efforts in I/O is that SDM aims to combine the good features of parallel file I/O and databases, whereas other efforts focus on either parallel I/O or data ....
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems. In Proceedings of the Eighth Goddard Conference on Mass Storage Systems and Technologies, March 2000.
....resources. Storage resources: Mechanisms are required for putting and getting files. Third party and high performance (e.g. striped) transfers are useful [61]. So are mechanisms for reading and writing subsets of a file and/or executing remote data selection or reduction functions [14]. Management mechanisms that allow control over the resources allocated to data transfers (space, disk bandwidth, network bandwidth, CPU) are useful, as are advance reservation mechanisms. Enquiry functions are needed for determining hardware and software characteristics as well as relevant load ....
Beynon, M., Ferreira, R., Kurc, T., Sussman, A. and Saltz, J., DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems. In Proc. 8th Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, 2000, 119-133.
....application: data flows into local servers from each sensor, which create on disk sensor logs. At some later point, this data is shipped to a central repository, which scientists then connect to and execute large aggregate and subset queries to facilitate their data exploration. The DataCutter [4] system exemplifies current research in this area: it focuses on providing an efficient platform in which scientists can collect data from multiple repositories and then efficiently aggregate and subset that data across multiple dimensions. This model is very different from the interactive sensor ....
M. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000 Mass Storage Systems Conference. IEEE Computer Society Press, March 2000.
....[36] and remote I/O [34]. However, providing support for efficient subsetting and processing of very large scientific datasets stored in archival storage systems in a distributed environment remains a challenging research issue. We have developed a middleware infrastructure, called DataCutter [4], [6], [8], that enables processing of scientific datasets stored in archival storage systems in a distributed, heterogeneous environment. In this section we describe an implementation of the Virtual Microscope server using the DataCutter infrastructure. We compare the DataCutter implementation of VM ....
....client using the standard Virtual Microscope client server protocol. 1) Serving Digitized Microscopy Images from an Archival Storage System: We have implemented a simple data server for digitized microscopy images, stored in the IBM HPSS archival storage system at the University of Maryland [4] [6]. The HPSS setup has 10 terabytes (1TB is 1000GB) of tape storage space, 500GB of disk cache, and is accessed through a 10 node IBM SP multicomputer. One node of the SP is used to run the filter that carries out index lookup, and the client was run on a SUN workstation connected to the SP node ....
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the Eighth Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, pages 119--133. National Aeronautics and Space Administration, Mar. 2000. NASA/CP 2000-209888.
....that make up the query processing structure. In this work, we expect the application developer to present to the system a functional description of the query type to be registered. This assumption is also made by other frameworks that require the functional decomposition of a complex computation [3, 12, 31, 32]. The execution chain of a query type q_i is represented by a directed acyclic graph G_i(V, E), referred to as a query graph. A vertex represents a function primitive and an edge corresponds to a data dependency between the two primitives sharing the edge. An edge is marked with a cacheable ....
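As an illustration of the query graph described in the excerpt above, a minimal Python sketch follows. It is not taken from the cited paper: the primitive names (`read`, `clip`, `zoom`, `composite`) are hypothetical, and the boolean edge flag is only an assumed stand-in for the "cacheable" marking that the excerpt mentions but truncates. The sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive an execution order that respects the data dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical query graph: each key is a function primitive (a vertex);
# each value is the set of primitives whose output it consumes (incoming edges).
query_graph = {
    "read": set(),
    "clip": {"read"},
    "zoom": {"clip"},
    "composite": {"clip", "zoom"},
}

# Assumed edge annotation (producer, consumer) -> flag standing in for the
# "cacheable" marking; the exact semantics are not given in the excerpt.
cacheable = {
    ("read", "clip"): True,
    ("clip", "zoom"): False,
    ("clip", "composite"): False,
    ("zoom", "composite"): True,
}


def execution_order(graph):
    """Return an order in which every primitive runs after its dependencies.

    graphlib raises CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


def cacheable_edges(flags):
    """Return the edges whose flag is set, sorted for stable output."""
    return sorted(edge for edge, flag in flags.items() if flag)
```

Calling `execution_order(query_graph)` yields the primitives in dependency order, and `cacheable_edges(cacheable)` lists the edges carrying the assumed flag.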
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the Eighth Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, pages 119--133, College Park, MD, March 2000.
....party application developer that wants to be able to exchange data with the library). The interface functions provide information that allows the meta library to inquire about the location (processor and local address) of data distributed by a given data parallel library. DataCutter. DataCutter [12, 11] is an application framework, under development at University of Maryland, that provides support for developing data intensive applications that make use of scientific datasets in remote archival storage systems across a wide area network. To make efficient use of distributed shared resources, the ....
....DataCutter also provides support for subsetting very large datasets through multi dimensional range queries. It uses a multi level hierarchical indexing scheme, based on R tree indexing methods, to ensure scalability to very large datasets. The basic ideas underlying the filter stream model [12] are to (1) constrain application components to allow for location independence, which is necessary for execution in a distributed environment, and (2) expose application communication patterns and resource requirements, allowing a runtime system to aid in efficient execution. The programming ....
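The multi-dimensional range query mentioned in the excerpt above can be sketched as a bounding-box intersection test. This is an illustrative simplification, not DataCutter's implementation: it scans a flat dictionary of chunks linearly instead of using the hierarchical R-tree index the excerpt describes, and the names `Box`, `intersects`, and `select_chunks` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box with inclusive lower and upper corners."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap in every dimension."""
    return all(
        a_lo <= b_hi and b_lo <= a_hi
        for a_lo, a_hi, b_lo, b_hi in zip(a.lo, a.hi, b.lo, b.hi)
    )


def select_chunks(chunks: Dict[str, Box], query: Box) -> List[str]:
    """Return the names of the chunks whose bounding box overlaps the query.

    A linear scan stands in for the R-tree lookup; an index would prune
    whole groups of chunks instead of testing each one.
    """
    return sorted(name for name, box in chunks.items() if intersects(box, query))
```

For example, with three 10x10 chunks tiling part of a 2-D dataset, a query box that straddles two of them selects only those two; the remaining chunk never needs to be read from storage.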
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In MASS
....developing applications in distributed, heterogeneous environments. In this model, the processing structure of an application is represented as multiple objects that interact with each other by moving data and control information. We have developed a component based framework, called DataCutter [11,12], for developing data intensive applications in a distributed environment. The framework is built upon prior work in our Active Disks [2,63] and Active Data Repository projects. As was described in the previous section, the ADR framework aims to realize performance gains by executing ....
....environments by allowing decomposition of application specific reduction operations into a set of interacting components, which we refer to as filters. The goal is to achieve performance improvements by providing the flexibility to (1) place components among storage and compute nodes in a system [11], and (2) instantiate and run multiple copies of a group of components or copies of individual components in parallel [13]. The middleware we have developed provides two core services: an indexing service for subsetting of datasets via range queries, and a filtering service for instantiating ....
[Article contains additional citation context not shown here]
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the Eighth Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, pages 119--133. National Aeronautics and Space Administration, Mar.
....progresses from the data source(s) to the client, and on the ability to move all or part of its computations to other machines that are well suited for the computation. Recent research on programming models for developing applications in the Grid has converged on the use of component based models [2, 4, 5, 6, 7, 9, 10], in which an application is composed of multiple interacting computational objects. In the DataCutter project [4, 5, 8] we are developing a framework, called filter stream programming, for developing data intensive applications in a distributed environment. This model represents components of a ....
....well suited for the computation. Recent research on programming models for developing applications in the Grid has converged on the use of component based models [2, 4, 5, 6, 7, 9, 10] in which an application is composed of multiple interacting computational objects. In the DataCutter project [4, 5, 8], we are developing a framework, called filter stream programming, for developing data intensive applications in a distributed environment. This model represents components of a data-intensive application as a set of filters, which are designed to be efficient in their use of resources. Data ....
[Article contains additional citation context not shown here]
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000 Mass Storage Systems Conference, College Park, MD, March 2000.
....rather than an attempt to extract parallelism. 4. A Prototype Infrastructure In this section, we describe a prototype infrastructure implementation that provides support for execution of applications developed using the filter stream framework. This work is part of the DataCutter project [6], that provides services for subsetting and processing multi dimensional datasets stored on archival storage systems. 4.1. Filters A filter is specified by the code to execute, and a description of the input and output streams it will use. Currently, filter code is expressed using a C ....
....specified in the query. The resulting image blocks are directly sent to the client. The client viewer assembles and displays the image blocks from each of the backend processes to form the query output. 5.2. Filter Implementation The filter decomposition used for the Virtual Microscope system [6] is shown in Figure 5. This filter pipeline structure is natural for query response applications. The figure only depicts the main dataflow path of image data through the system; other low volume streams related to the client-server protocol are not shown for clarity. The thickness of the stream ....
[Article contains additional citation context not shown here]
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific
....rather than an attempt to extract parallelism. 4. A Prototype Infrastructure In this section, we describe a prototype infrastructure implementation that provides support for execution of applications developed using the filter stream framework. This work is part of the DataCutter project [6], that provides services for subsetting and processing multi dimensional datasets stored on archival storage systems. 4.1. Filters A filter is specified by the code to execute, and a description of the input and output streams it will use. Currently, filter code is expressed using a C language ....
....specified in the query. The resulting image blocks are directly sent to the client. The client viewer assembles and displays the image blocks from each of the backend processes to form the query output. 5.2. Filter Implementation The filter decomposition used for the Virtual Microscope system [6] is shown in Figure 5. This filter pipeline structure is natural for query response applications. The figure only depicts the main dataflow path of image data through the system; other low volume streams related to the client-server protocol are not shown for clarity. The thickness of the stream ....
[Article contains additional citation context not shown here]
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000 Mass Storage Systems Conference, College Park, MD, March 2000. IEEE Computer Society Press. To appear.
No context found.
M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In Symp. on Mass Storage Systems, pages 119--134. IEEE, 2000.
No context found.
M.D. Beynon, R. Ferreira, T. Kurc, A. Sussman, J. Saltz. "DataCutter: Middleware for filtering very large scientific datasets on archival storage systems". In Proceedings of the 2000 Mass Storage Systems Conference, pages 119-133. College Park, MD, March 2000. IEEE Computer Society Press
No context found.
M. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000.
No context found.
M. Beynon, R. Ferreira, T. Kurc, and J. Saltz, "DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems," in The Eighth Goddard Conference on Mass Storage Systems and Technologies/17th IEEE Symposium on Mass Storage Systems, College Park, Maryland, USA, March 2000.
No context found.
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proc. Mass Storage Systems Conference, pages 119--133, College Park, MD, March 2000. IEEE Computer Society Press.
No context found.
Beynon, M., Ferreira, R., Kurc, T. M., Sussman, A., and Saltz, J. H. DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems. In IEEE Symposium on Mass Storage Systems (College Park, MD, Mar. 2000), pp. 119--134.
No context found.
M. Beynon, R. Ferreira, T. M. Kurc, A. Sussman, and J. H. Saltz. Datacutter: Middleware for filtering very large scientific datasets on archival storage systems. In IEEE Symposium on Mass Storage Systems, pages 119--134, 2000.
No context found.
M. D. Beynon, R. Ferreira, T. Kurc, A. Sussman, and J. Saltz. DataCutter: Middleware for filtering very large scientific datasets on archival storage systems. In Proceedings of the 2000.