30 citations found.
P. Ranganathan, et al., "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors", Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 307-318, 1998.

This paper is cited in the following contexts:

First 50 documents

Active Disk Architecture for Databases - Riedel et al. (1992)   (4 citations)  (Correct)

.... on disk support for selectively returning individual records to the host, at the cost of changing the physical layout of the data [Tamura99]. Work at Digital, Compaq, and Rice University considered the memory and processor performance of large decision support systems, among other workloads [Barroso98, Ranganathan98]. The authors conclude that I/O performance is not the primary determinant of large system performance, as improvements in disk arrays and software structures allow sufficient disk bandwidth to be added to a system until the processors are the bottleneck. Our work on Active Disks takes advantage ....

Ranganathan, P., Gharachorloo, K., Adve, S.V. and Barroso, L.A. "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors" ASPLOS, October 1998.


Using Destination-Set Prediction to Improve the.. - Martin et al. (2003)   (1 citation)  (Correct)

....the bandwidth efficiency of a directory protocol. This latency/bandwidth tradeoff is especially important for the commercial workloads that dominate the current use of multiprocessor servers, since many of these workloads exhibit high cache miss rates and a large fraction of cache-to-cache misses [5, 18, 30]. One emerging approach for improving this latency/bandwidth tradeoff is destination-set prediction. Multicast Snooping [7] reduces bandwidth compared to broadcast snooping by multicasting a coherence request to a predicted destination set. If the destination set is sufficient (e.g., includes ....

....in the next section, these workloads have a larger percentage of indirections, providing ample opportunity for destination-set prediction to improve their performance. 2.3 Cache-to-Cache Misses Prior studies have shown that commercial workloads incur a large fraction of cache-to-cache misses [5, 18, 30]. Our results, shown in the rightmost column of Table 2 (column 7), corroborate these previous results by finding that 35-96% of all L2 cache misses for our commercial workloads would suffer from indirection in a directory protocol. The high miss rate and high rate of indirections in commercial ....

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. Adve, and L. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 307--318, Oct. 1998.


Compressibility Characteristics of Address/Data Transfers in - Commercial Workloads..   (Correct)

....mimics a parts ordering database system. The transactions include entering and delivering part orders, recording payments, checking the status of orders, and monitoring the level of stock at the warehouses. Some studies on the cache memory access characteristics of OLTP workloads can be found in [2, 18, 10]. For our study, we used TPC-C traces from a 4P system. In addition to the above-mentioned 32-bit systems, we also used a TPC-C trace from a 4P 64-bit system for understanding the impact of compression schemes on 64-bit architectures. 4.2 Simulation Tools and Studies To perform the ....

P. Ranganathan, K. Gharachorloo, et al., "Performance of Database Workloads on Shared Memory Systems with Out-of-Order Processors," Proceedings of the Eighth International Conference on Architecture Support for Programming Languages and Operating Systems, Oct. 1998.


Architectural Support For User-Level Input/Output - Schaelicke (2001)   (Correct)

....commercial transactions performed over networks leads to higher performance requirements for database engines as well as to a growing market for commercial server systems. Databases, like Web and file servers, achieve high I/O throughput by overlapping I/O requests of independent transactions [9][85]. Commercial database servers usually run on shared-memory or clustered multiprocessor systems with a large number of disks. Hundreds of disks are required not only to store the database tables but also to provide sufficient parallelism and redundancy in the storage system to allow efficient ....

P. Ranganathan et al., "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors," Proc. 8th Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), ACM Press, New York, N.Y., 1998, pp. 307-318.


Workload Characterization of Java Server Applications on.. - Seshadri, John, Mericas (2002)   (3 citations)  (Correct)

....benchmarks from SPECint2000, a suite of more traditional workloads. We run these benchmarks on two IBM PowerPC microarchitectures, the RS64 III and the POWER3 II. 2. Related Work Commercial workloads have been increasing in importance, and efforts have been made to understand their behavior [2,11,8,7,16,1]. Most of these studies have been focused on applications written in C or C++, in particular OLTP, DSS, and web server applications. Java has also been a popular subject of research. The majority of Java studies use SPECjvm98 [17,9,15], which is a client benchmark suite. SPECjvm98 has been ....

P. Ranganathan, K. Gharachorloo, S.V. Adve and L.A. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998, pp. 307-318.


Scalability and Resource Usage of an OLAP Benchmark on.. - Taufer, Stricker, Weber (2002)   (Correct)

.... in general do not result in large data transfers. Therefore, large-scale OLTP setups are much harder to generate than a large-scale OLAP job, and less interesting for computer architects. The accurate behavior of database workload for OLTP jobs is addressed for shared-memory multiprocessors in [14]. We still use the TPC-D [21] benchmark as a representative of OLAP applications for historical reasons but could easily migrate our approach to TPC-H or TPC-R, and the qualitative aspects of this article would remain the same if these more recent benchmarks were used. Some of the newer work in ....

P. Ranganathan, K. Gharachorloo, S. Adve, and L. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proc. of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1998.


Compressibility Characteristics of Address/Data Transfers in.. - Kant, Iyer   (Correct)

....the status of orders, and monitoring the level of stock at the warehouses. The setup of the TPC-C benchmark basically consists of a single system on which both client and server functionalities are run. Some studies on the cache memory access characteristics of OLTP workloads can be found in [2, 18, 10]. For our study, we used TPC-C traces from a 4P system. In addition to the above-mentioned 32-bit systems, we also used a TPC-C trace from a 4P 64-bit system for understanding the impact of compression schemes on 64-bit architectures. 4.2 Simulation Tools and Studies To perform the ....

P. Ranganathan, K. Gharachorloo, et al., "Performance of Database Workloads on Shared Memory Systems with Out-of-Order Processors," Proceedings of the Eighth International Conference on Architecture Support for Programming Languages and Operating Systems, Oct. 1998.


Performance and Memory-Access Characterization of - Data Mining Applications   (Correct)

....datasets. 1. Introduction Recent years have seen a significant increase in the use of computer systems for commercial, as opposed to scientific, applications. In response to this change, many recent papers have examined the performance and memory access characteristics of commercial applications [17, 2, 7, 8, 10, 9, 15, 5], allowing the characteristics of commercial applications to help guide design for more cost-effective computer systems. These studies, however, have focused almost exclusively on the SQL database queries used in the TPC-B, TPC-C, and TPC-D benchmark programs [18]. Exceptions include [3, 2, 8, 5, ....

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 12--23, 1998.


Workload Characterization of Multithreaded Java Servers on.. - Seshadri, Mericas (2001)   (Correct)

....on two IBM PowerPC microarchitectures, the RS64 III and the POWER3 II, to determine the performance characteristics of multithreaded Java server applications. 2. Related Work Commercial workloads have been increasing in importance, and efforts have been made to understand their behavior [2,11,8,7,16,1]. Most of these studies have been focused on applications written in C or C++, in particular OLTP, DSS, and web server applications. Java has also been a popular subject of research. The majority of Java studies use SPECjvm98 [17,9,15], which is a client benchmark suite. SPECjvm98 has been ....

P. Ranganathan, K. Gharachorloo, S.V. Adve and L.A. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.


Improving Index Performance through Prefetching - Chen, Gibbons (2000)   (11 citations)  (Correct)

.... While database researchers have historically focused on the importance of this latter form of caching (also known as the buffer pool), recent studies have demonstrated that even on traditional disk-oriented databases, roughly 50% or more of execution time is often wasted due to SRAM cache misses [1, 2, 10, 18]. For main memory databases, it is even clearer that SRAM cache performance is crucial [19]. Hence several recent studies have revisited core database algorithms in an effort to make them more cache friendly [5, 17, 19, 20, 21]. ....

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In Proceedings of the 8th ASPLOS, pages 307--318, Oct. 1998.


Comparative Evaluation of Latency-Tolerating and -Reducing.. - Grahn, Stenström (2000)   (1 citation)  (Correct)

....and performance. They found, as we do, that the occupancy in the home node can be a performance bottleneck when prefetching is used. Migratory sharing is quite common in many applications beyond numerically intensive ones such as in SPLASH. Recently, Ranganathan et al. [33] showed that migratory sharing is a dominant performance bottleneck in OLTP workloads on multiprocessors. For such workloads, we would expect that software-only protocol implementations augmented with migratory optimization techniques perform quite well in comparison with hardware-centric ....

....implementation is fairly aggressive and the SPLASH-2 applications have very little migratory sharing. We expect to see larger benefits of the migratory optimization in less aggressive software-only directory implementations and for applications with more migratory sharing, e.g., OLTP applications [33]. Release consistency, a technique which does not affect the protocol execution overhead, manages to hide all write stall time for both software-only and hardware-only directory protocols. Since the write stall time is longer for software-only than for hardware-only directory protocols, the ....

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso, "Performance of database workloads on shared-memory systems with out-of-order processors," in Proc. 8th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pp. 307-318, October 1998.


Towards A Simplified Database Workload For Computer.. - Keeton, Patterson (2000)   (1 citation)  (Correct)

....processing, decision support, microbenchmark, and performance evaluation. 1. INTRODUCTION In the last five to ten years, several studies have explored the architectural characteristics of online transaction processing (OLTP) database workloads [3] [7] [8] [9] [16] [17] [18] [19] [22] [23] [24] [26] [27] [Footnote 1: This work was performed as part of the author's dissertation research. The author's present address is: Storage Systems Program, Hewlett-Packard Laboratories, 1501 Page Mill Road, M/S 1U-13, Palo Alto, CA 94304-1126. Her current email address is kkeeton@hpl.hp.com.] [33] and ....

....dissertation research. The author's present address is: Storage Systems Program, Hewlett-Packard Laboratories, 1501 Page Mill Road, M/S 1U-13, Palo Alto, CA 94304-1126. Her current email address is kkeeton@hpl.hp.com. [33] and decision support (DSS) database workloads [3] [5] [15] [17] [18] [23] [32]. These studies used standard workloads defined by the Transaction Processing Performance Council (TPC), namely TPC-B and TPC-C for OLTP [10] and TPC-D, TPC-H and TPC-R for DSS [10] [30] [31]. Although these benchmarks specify well-defined workloads, they pose several challenges for the ....

[Article contains additional citation context not shown here]

P. Ranganathan, et al. "Performance of database workloads on shared-memory systems with out-of-order processors," In Proc. of ASPLOS-VIII, October 1998.


Computer Architecture Support for Database Applications - Keeton (1999)   (3 citations)  (Correct)

....of a representative query, due to the complexity of simulation. For instance, Ranganathan et al. report simulating approximately 200 million instructions of one standard query (Q6 of the TPC-D DSS benchmark) for their evaluation of an in-memory database on out-of-order processor multiprocessors [83]. These 200 million instructions would take a fraction of a second on their 1 GHz simulated processor. This kernel analysis may be limiting, as query behavior often varies over the course of the query. Even if an entire query is simulated, execution variations may not be observed if the datasets ....

....and two update queries. We focus on a representative set of the read-only queries chosen based on the variety of operations performed, query complexity and query duration. Several other studies have presented performance analyses of TPC-D queries (e.g., Q1, Q4, Q5, Q6, Q8, Q13 for [13], Q6 for [83], and Q3, Q6, and Q12 for [106]). Since the four-processor system described in [13] is comparable to our system, we attempt to match this query set closely, for comparability of results. Our query set includes Q1, Q4, Q5, Q6, Q8, and Q11. We substitute Q11 for Q13 because Q13's short duration ....

[Article contains additional citation context not shown here]

P. Ranganathan, et al. "Performance of database workloads on shared-memory systems with out-of-order processors," Proc. of ASPLOS-VIII, October 1998.


Piranha: A Scalable Architecture Based on.. - Barroso.. (2000)   (53 citations)  (Correct)

.... and high communication miss rates which are characteristic for such workloads [4]. Second, multiple instruction issue and out-of-order execution provide only small gains for workloads such as OLTP due to the data-dependent nature of the computation and the lack of instruction-level parallelism [35]. Third, commercial workloads do not have any use for the high-performance floating point and multimedia functionality that is implemented in modern microprocessors. Therefore, it is not uncommon for a high-end microprocessor to be stalling most of the time while executing commercial workloads, ....

....has already been referenced in earlier sections. We further discuss some of the previous work pertinent to database workloads and CMP in this section. There have been a large number of recent studies of database applications (both OLTP and DSS) due to the increasing importance of these workloads [4,7,8,12,21,27,28,34,35,36,42,46]. To the best of our knowledge, this is the first paper that provides a detailed evaluation of database workloads in the context of chip multiprocessing. Ranganathan et al. [35] study user-level traces of database workloads in the context of wide-issue out-of-order processors, and show that the ....

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. Adve, and L. A. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 307-318, October 1998.


Analysis of Commercial Workload on SMP Multiprocessors - Zhang, Zhu, Du (1999)   (1 citation)  (Correct)

....for performance evaluation. In addition, the experiments for case studies have been shown to be highly time consuming. So far, all published results on performance evaluation of commercial workloads on SMP multiprocessors are experimentally oriented and measurement-based case studies (see e.g. [7,8,10]). If performance models are tractable and sufficiently accurate for the evaluation of commercial workloads, it is certainly cost effective and complementary to the measurement-based experiments. In this paper, we present an analytical model to study the SMP architectural impacts on performance ....

P. Ranganathan, et al., Performance of database workloads on shared-memory systems with out-of-order processors, in: Proc. 8th ASPLOS, 1998, pp. 307-318.


Active Disks - Remote Execution for Network-Attached Storage - Riedel (1999)   (18 citations)  (Correct)

.... work by members of this group has studied the performance of relational database code on modern multiprocessors in the context of a transaction processing workload [Keeton98a]. A group at Rice and Compaq has provided a similar analysis for both decision support and transaction processing workloads [Ranganathan98]. Both of these studies focus strictly on the detailed processor performance, seeing the behavior of the input/output system as secondary. 9.3.3 SmartSTOR Work at IBM Almaden and Berkeley analyzed the performance of the TPC-D decision support queries and found that single-table acceleration was ....

Ranganathan, P., Gharachorloo, K., Adve, S.V. and Barroso, L.A. "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors" ASPLOS, October 1998.


Using Switch Directories to Speed Up Cache-to-Cache Transfers .. - Ravi Iyer Laxmi (2000)   (1 citation)  (Correct)

....grants MIP 9622740, CCR9810205, and an IBM Partnership Award. This work was done while Ravi Iyer was at Texas A&M University. .... directory lookup, several message transfers over the interconnect and coherence controller occupancies. This affects the performance of commercial workloads such as TPC-C [6, 12] as well as several scientific applications [8, 4]. In this paper, we propose a directory caching technique to reduce the cache-to-cache transfer latency in CC-NUMA multiprocessors running scientific and commercial workloads. By embedding a small fast SRAM directory cache within each switch, ....

P. Ranganathan et al., "Performance of Database Workloads on Shared Memory Systems with Out-of-Order Processors," Proceedings of 8th Conference on Architectural Support for Parallel Languages and Operating Systems, 1998.


Improving Performance of Load-Store Sequences for.. - Nilsson, Dahlgren (1999)   (1 citation)  (Correct)

....Barroso et al. showed in [1] that up to 90% of the total execution time of an OLTP application consists of stall time due to instruction and data cache misses. Furthermore, Ranganathan et al. pointed out the existence of migratory shared data as one possible origin of this performance bottleneck [2]. Migratory sharing [3] occurs when shared data tend to be manipulated by only one processor at a time. This manipulation can often be characterized by a load followed by a store to the same block by the same processor: a load-store sequence. Several previous studies have proposed and explored ....

....instruction overhead for some applications. The compiler-based prefetching algorithm has so far not been applied to database workloads or operating systems. For database workloads such as DSS and OLTP, several studies have gained profound knowledge on their memory behavior and performance issues [1,2,14-16]. Ranganathan et al. [2] first recognized substantial amounts of migratory sharing in OLTP. While they used traces from a commercial database server (Oracle 7.3.2), they had no potential to relate their findings to the source code and they did not include the operating system execution in their ....

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso, "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors," Proc. of ASPLOS-8, Oct. 1998, pp. 307--318.


DBMSs On A Modern Processor: Where Does Time Go? - Ailamaki, DeWitt, Hill, Wood (1999)   (3 citations)  (Correct)

....running OLTP workloads. In the past two years, several interesting studies evaluated database workloads, mostly on multiprocessor platforms. Most of these studies evaluate OLTP workloads [4] [13] [10], a few evaluate decision support (DSS) workloads [11], and there are some studies that use both [2][16]. All of the studies agree that the DBMS behavior depends upon the nature of the workload (DSS or OLTP), that DSS workloads benefit more from out-of-order processors with increased instruction-level parallelism than OLTP, and that memory stalls are a major bottleneck. Although the list of ....

....System A did not use the index to execute this query. Although the workload is much simpler than TPC benchmarks [5], the computation time is usually less than half the execution time; thus, the processor spends most of the time stalled. Similar results have been presented for OLTP [21] [10] and DSS [16] workloads, although none of the studies measured more than one DBMS. The high processor stall time indicates the importance of further analyzing the query execution time. Even as processor clocks become faster, stall times are not expected to become much smaller because memory access times do not ....

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. Adve, and L. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1998.


DBMSs on modern processors: Where does time go? - Ailamaki, DeWitt, Hill, Wood (1999)   (3 citations)  (Correct)

....to cache misses when running OLTP workloads. In the past two years, several interesting studies evaluated database workloads, mostly on multiprocessor platforms. Most of these studies evaluate OLTP workloads [4] [13] [10], a few evaluate DSS workloads [11], and there are some studies that use both [2][16]. All of the studies agree that the DBMS behavior depends upon the workload, that DSS workloads benefit more than OLTP from out-of-order processors with increased instruction-level parallelism, and that the memory stalls are a major bottleneck. Although the list of references presented here is not ....

....System A did not use the index to execute this query. Although the workload is much simpler than TPC benchmarks [5], the computation time is usually less than half the execution time; thus, the processor spends most of the time stalled. Similar results have been presented for OLTP [21] [10] and DSS [16] workloads, although none of the studies measured more than one DBMS. The high processor stall time indicates the importance of further analyzing the query execution time. Even as processor clock rates .... [Figure: Indexed range selection; legend: Computation, Memory stalls, Branch ....]

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. Adve, and L. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1998.


Impact of Chip-Level Integration on Performance of OLTP.. - Luiz André Barroso (2000)   (9 citations)  Self-citation (Gharachorloo Barroso)   (Correct)

....latencies. The smaller gains in the multiprocessor case are due to the longer remote miss latencies that are more difficult to hide with out-of-order execution. Our results for the relative gain from wide-issue out-of-order execution are consistent with results presented by Ranganathan et al. [16] based on simulations of user-level traces of OLTP. Our results, which are based on full system simulations of OLTP including kernel activity, show similarly small gains due .... [Figure: Normalized Execution Time; legend: LocStall ....]

....trade-offs that arise in the integration of various system-level modules onto the processor chip, and quantifying the performance gains from such integration in the context of OLTP workloads. There have been a large number of recent studies of OLTP due to the increasing importance of this workload [1, 2, 3, 5, 8, 11, 12, 15, 16, 18]. Many of these studies emphasize the importance of memory system behavior on OLTP performance. Barroso et al. [1] provide performance results for various off-chip L2 cache sizes, and recommend the use of large (8MB) direct-mapped off-chip caches. This recommendation is consistent with our obser ....

[Article contains additional citation context not shown here]

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso. Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors. In Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998.


Store Memory-Level Parallelism Optimizations for Commercial .. - Yuan Chou Lawrence (2005)   (Correct)

No context found.

P. Ranganathan, et al., "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors", Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 307-318, 1998.


Permission to Make Digital Or Hard Copies of All Or Part.. - Personal Or Classroom   (Correct)

No context found.

P. Ranganathan, K. Gharachorloo, S. Adve, and L. A. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. In Proceedings of the Eighth Symposium on Architectural Support for Programming Languages and Operating Systems, pages 307--318, Oct. 1998.


Using Interaction Costs for Microarchitectural.. - Fields, Bodik, Hill..   (Correct)

No context found.

P. Ranganathan, K. Gharachorloo, S. V. Adve, and L. A. Barroso. Performance of database workloads on shared-memory systems with out-of-order processors. Oct 1998.


IEEE 34 Computer - Benchmarking Internet Servers   (Correct)

No context found.

P. Ranganathan et al., "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors," Proc. 8th Int'l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS 98), ACM Press, 1998, pp. 144-156.


CiteSeer.IST - Copyright Penn State and NEC