T. Barclay, R. Barnes, J. Gray, P. Sundaresan, Loading Databases Using Dataflow Parallelism, SIGMOD Record, 23(4), December 1994.

This paper is cited in the following contexts:
Researching System Administration - Anderson

....of parallel databases found in the literature, including Gamma [CABK88], Volcano [Gra90], and Bubba [DGS88]. These systems all use similar techniques to distribute data among processes. Both the Gamma split table, Volcano exchange operators, and a generalized split table known as a river in [BBGS94] are used to move data between producers and consumers in a distributed memory machine. However, all use static data partitioning techniques, such as hash partitioning, range partitioning, or round robin. These functions all do not adapt at run time to load variations among consumers. Current ....

Tom Barclay, Robert Barnes, Jim Gray, and Prakash Sundaresan. Loading Databases Using Dataflow Parallelism. SIGMOD Record (ACM Special Interest Group on Management of Data), 23(4):72--83, December 1994.


SNOWBALL: Scalable Storage on Networks of Workstations .. - Vingralek, Breitbart, .. (1995)   (5 citations)

....of nodes, and their size is likely to grow in the future. For this reason the management of such complex systems becomes increasingly difficult. Thus, developing software to automate management of networks of workstations and provide a logical view of a monolithic system is extremely important [BBGS94, ACP95]. In this paper we design a distributed data manager that exploits the aggregate I/O resources of networks of workstations, encapsulates the distributed nature of the underlying model, and guarantees close to optimal load distribution among the nodes of a NOW. An important application class that ....

T. Barclay, R. Barnes, J. Gray, P. Sundaresan, Loading Databases Using Dataflow Parallelism, Sigmod Record, Vol. 23, No. 4, 1994.


Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson..

....a number of parallel databases found in the literature, including Gamma [15], Volcano [24], and Bubba [20]. These systems all use similar techniques to distribute data among processes. Both the Gamma split table, Volcano exchange operators, and a generalized split table known as a river in [5], are used to move data between producers and consumers in a distributed memory machine; however, all use static data partitioning techniques, such as hash partitioning, range partitioning, or round robin. These functions all do not adapt at run time to load variations among consumers. Current ....

.... ultimate goal is to write applications to the River interface that not only have robust performance, but also can continue operation under machine failure, similar to work in other dynamic programming environments [8, 43]. Some form of automatic check pointing may be the solution, as suggested in [5]. We also believe River is well suited to a large class of external, distributed applications, including traditional scientific codes and perhaps multimedia programs as well. Some evidence for this exists in the literature about Volcano [51], where scientific data intensive applications are ....

T. Barclay, R. Barnes, J. Gray, and P. Sundaresan. Loading Databases Using Dataflow Parallelism. SIGMOD Record (ACM Special Interest Group on Management of Data), 23(4):72--83, December 1994.


Cluster I/O with River: Making the Fast Case Common - Arpaci-Dusseau, Anderson..

....are a number of parallel databases found in the literature, including Gamma [14], Volcano [23], and Bubba [19]. These systems all use similar techniques to distribute data among processes. Both the Gamma split table, Volcano exchange operators, and a generalized split table known as a river in [5], are used to move data between producers and consumers in a distributed memory machine; however, all use static data partitioning techniques, such as hash partitioning, range partitioning, or round robin. These functions all do not adapt at run time to load variations among consumers. Current ....

T. Barclay, R. Barnes, J. Gray, and P. Sundaresan. Loading Databases Using Dataflow Parallelism. SIGMOD Record (ACM Special Interest Group on Management of Data), 23(4):72--83, December 1994.


SNOWBALL: Scalable Storage on Networks of Workstations .. - Vingralek, Breitbart, .. (1995)   (5 citations)

....of nodes, and their size is likely to grow in the future. For this reason the management of such complex systems becomes increasingly difficult. Thus, developing software to automate management of networks of workstations and provide a logical view of a monolithic system is extremely important [BBGS94, ACP95]. In this paper we design a distributed file manager that exploits the aggregate I/O resources of networks of workstations, encapsulates the distributed nature of the underlying model, and guarantees close to optimal load distribution among the nodes of a NOW. An important application class that ....

T. Barclay, R. Barnes, J. Gray, P. Sundaresan, Loading Databases Using Dataflow Parallelism, Sigmod Record, Vol. 23, No. 4, 1994.


Structure and Performance of Decision Support Algorithms.. - Uysal, Acharya, Saltz (1998)   (5 citations)

....in experimental systems, e.g. the E programming language used in EXODUS [41], Genesis [12], and Starburst [25]. Operators implemented in this model are called iterators, streams, synchronous pipelines, row sources, or similar names in the lingo of commercial systems. According to Barclay et al. [11], relational databases are ideally suited to dataflow approach and that the database community has adopted a dataflow approach to describe and implement parallel algorithms. The stream based programming model proposed for Active Disks closely resembles the operator iterator based model used ....

....optimizing I/O intensive algorithms is often a matter of setting up efficient pipelines where each stage performs some processing on the data being read from disk and passes it on to the next stage [1, 7]. As a result, dataflow based models have been proposed by several researchers. Barclay et al. [11] proposed a dataflow based technique for parallelizing the loading of a large database. Similar techniques are used by the Gamma [17] and Volcano [20] parallel databases. Recently, Arpaci-Dusseau et al. [8] have proposed a dataflow based programming model for scheduling I/O intensive tasks on ....

T. Barclay, R. Barnes, J. Gray, and P. Sundaresan. Loading databases using dataflow parallelism. SIGMOD Record, 23(4):72--83, 1994.


Designing and Mining Multi-Terabyte Astronomy.. - Szalay, Kunszt.. (2000)   (32 citations)  Self-citation (Gray)

....the more general dataflow programming model in which data flows from storage through various processing steps. Each step is amenable to partition parallelism. The underlying system manages the creation and processing of the flows. This programming style has evolved both in the database community [4, 5, 9] and in the scientific programming community with PVM and MPI [8]. This has evolved to a general programming model as typified by a river system [2, 3, 4]. We propose to let astronomers construct dataflow graphs where the nodes consume one or more data streams, filter and combine the data, and ....

....The underlying system manages the creation and processing of the flows. This programming style has evolved both in the database community [4, 5, 9] and in the scientific programming community with PVM and MPI [8]. This has evolved to a general programming model as typified by a river system [2, 3, 4]. We propose to let astronomers construct dataflow graphs where the nodes consume one or more data streams, filter and combine the data, and then produce one or more result streams. The outputs of these rivers either go back to the database or to visualization programs. These dataflow graphs ....

Barclay, T., Barnes, R., Gray, J., Sundaresan, P., "Loading Databases Using Dataflow Parallelism." SIGMOD Record 23(4): 72-83 (1994)


Performance Availability for Networks of Workstations - Arpaci-Dusseau (1999)   (4 citations)  Self-citation (Gray)

....3.2.2 Parallel Databases: Large scale I/O operations are common not only in parallel file systems but in parallel database systems as well. There are a number of parallel databases found in the literature, including Gamma [41], Volcano [56], the parallel load prototype from the Digital Rdb project [14], and Bubba [35]. Many of these systems are based on similar data flow techniques, where parallel queries are described as a directed graph that connect different sequential data operators. Gamma: Gamma is a parallel database system developed at Wisconsin [41]. The initial prototype was developed ....

....first extracts field values from the record. Then it compares these values to values in the split table to pick a destination stream. The split table can be a range partitioning, a hash partitioning, a round robin, or even a replication (in which input records are sent to all sink operators) [14], page 2, paragraph 7. Once again, these static techniques do not provide performance availability, and will run at the rate of the slowest sink. As the authors themselves state: If different nodes have different speeds and different amounts of memory, then it is no longer straight forward ....

[Article contains additional citation context not shown here]

Tom Barclay, Robert Barnes, Jim Gray, and Prakash Sundaresan. Loading Databases Using Dataflow Parallelism. SIGMOD Record (ACM Special Interest Group on Management of Data), 23(4):72--83, December 1994.
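
The split step described in the excerpt above (extract a field value, consult the split table, pick a destination stream) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the implementation from Barclay et al. or from River: every name here (split, make_range_split, hash_split, make_round_robin_split, replicate_split) is hypothetical, records are assumed to be Python dicts, and output streams are modelled as in-memory lists.

```python
import bisect
import zlib
from itertools import count


def make_range_split(boundaries):
    """Range partitioning: 'boundaries' is a sorted list of split points (assumed input)."""
    def pick(key, n_streams):
        # Keys below boundaries[0] go to stream 0, and so on; clamp to the last stream.
        return [min(bisect.bisect_right(boundaries, key), n_streams - 1)]
    return pick


def hash_split(key, n_streams):
    """Hash partitioning: a deterministic hash of the key picks the stream."""
    return [zlib.crc32(repr(key).encode("utf-8")) % n_streams]


def make_round_robin_split():
    """Round robin: ignore the key and cycle through the streams in order."""
    counter = count()
    def pick(key, n_streams):
        return [next(counter) % n_streams]
    return pick


def replicate_split(key, n_streams):
    """Replication: every record is sent to all sink streams."""
    return list(range(n_streams))


def split(records, key_field, pick, n_streams):
    """Route each record to the stream(s) chosen by the split function 'pick'."""
    streams = [[] for _ in range(n_streams)]
    for record in records:
        key = record[key_field]              # extract the field value
        for dest in pick(key, n_streams):    # consult the split table
            streams[dest].append(record)     # send to the destination stream
    return streams


if __name__ == "__main__":
    rows = [{"id": i} for i in range(6)]
    for name, fn in [("range", make_range_split([2, 4])),
                     ("hash", hash_split),
                     ("round robin", make_round_robin_split()),
                     ("replicate", replicate_split)]:
        print(name, [[r["id"] for r in s] for s in split(rows, "id", fn, 3)])
```

All four choices are static in the sense the excerpt criticises: the destination depends only on the record or its arrival order, never on how quickly each sink is consuming, so throughput is bounded by the slowest sink.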


Designing and Mining Multi-Terabyte Astronomy Archives.. - Alexander Szalay (2000)   (32 citations)  Self-citation (Gray)

No context found.

T. Barclay, R. Barnes, J. Gray, P. Sundaresan, "Loading Databases Using Dataflow Parallelism," SIGMOD Record 23(4): 72-83 (1994)


Research Issues in Data Warehousing - Wu, Buchmann (1997)   (18 citations)

No context found.

T. Barclay, R. Barnes, J. Gray, P. Sundaresan, Loading Databases Using Dataflow Parallelism, SIGMOD Record, 23(4), December 1994.
