Results 1 - 10
of
46
Hybrid transactional memory
- IN PROC. 12TH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS (ASPLOSXII
, 2006
"... Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But “bounded ” and “best-effort” hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are consid ..."
Abstract
-
Cited by 91 (10 self)
- Add to MetaCart
Transactional memory (TM) promises to substantially reduce the difficulty of writing correct, efficient, and scalable concurrent programs. But “bounded ” and “best-effort” hardware TM proposals impose unreasonable constraints on programmers, while more flexible software TM implementations are considered too slow. Proposals for supporting “unbounded” transactions in hardware entail significantly higher complexity and risk than best-effort designs. We introduce Hybrid Transactional Memory (HyTM), an approach to implementing TM in software so that it can use best-effort hardware TM (HTM) to boost performance but does not depend on HTM. Thus programmers can develop and test transactional programs in existing systems today, and can enjoy the performance benefits of HTM support when it becomes available. We describe our prototype HyTM system, comprising a compiler and a library. The compiler allows a transaction to be attempted using best-effort HTM, and retried using the software library if it fails. We have used our prototype to “transactify ” part of the Berkeley DB system, as well as several benchmarks. By disabling the optional use of HTM, we can run all of these tests on existing systems. Furthermore, by using a simulated multiprocessor with HTM support, we demonstrate the viability of the HyTM approach: it can provide performance and scalability approaching that of an unbounded HTM implementation, without the need to support all transactions with complicated HTM support.
LogTM-SE: Decoupling Hardware Transactional Memory from Caches
- In HPCA 13
, 2007
"... This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction’s readand write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory “in place ” after sa ..."
Abstract
-
Cited by 60 (10 self)
- Add to MetaCart
This paper proposes a hardware transactional memory (HTM) system called LogTM Signature Edition (LogTM-SE). LogTM-SE uses signatures to summarize a transaction’s readand write-sets and detects conflicts on coherence requests (eager conflict detection). Transactions update memory “in place ” after saving the old value in a per-thread memory log (eager version management). Finally, a transaction commits locally by clearing its signature, resetting the log pointer, etc., while aborts must undo the log. LogTM-SE achieves two key benefits. First, signatures and logs can be implemented without changes to highly-optimized cache arrays because LogTM-SE never moves cached data, changes a block’s cache state, or flash clears bits in the cache. Second, transactions are more easily virtualized because signatures and logs are software accessible, allowing the operating system and runtime to save and restore this state. In particular, LogTM-SE allows cache victimization, unbounded nesting (both open and closed), thread context switching and migration, and paging. 1
Privatization techniques for software transactional memory
- In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing (PODC’07
"... Early implementations of software transactional memory (STM) assumed that sharable data would be accessed only within transactions. Memory may appear inconsistent in programs that violate this assumption, even when program logic would seem to make extra-transactional accesses safe. Designing STM sys ..."
Abstract
-
Cited by 50 (17 self)
- Add to MetaCart
Early implementations of software transactional memory (STM) assumed that sharable data would be accessed only within transactions. Memory may appear inconsistent in programs that violate this assumption, even when program logic would seem to make extra-transactional accesses safe. Designing STM systems that avoid such inconsistency has been dubbed the privatization problem. We argue that privatization comprises a pair of symmetric subproblems: private operations may fail to see updates made by transactions that have committed but not yet completed; conversely, transactions that are doomed but have not yet aborted may see updates made by private code, causing them to perform erroneous, externally visible operations. We explain how these problems arise in different styles of STM, present strategies to address them, and discuss their implementation tradeoffs. We also propose a taxonomy of contracts between the system and the user, analogous to programmer-centric memory consistency models, which allow us to classify programs based on their privatization requirements. Finally, we present empirical comparisons of several privatization strategies. Our results suggest that the best strategy may depend on application characteristics.
High-level small-step operational semantics for transactions (technical companion
, 2007
"... Software transactions have received significant attention as a way to simplify shared-memory concurrent programming, but insufficient focus has been given to the precise meaning of software transactions or their interaction with other language features. This work begins to rectify that situation by ..."
Abstract
-
Cited by 44 (3 self)
- Add to MetaCart
Software transactions have received significant attention as a way to simplify shared-memory concurrent programming, but insufficient focus has been given to the precise meaning of software transactions or their interaction with other language features. This work begins to rectify that situation by presenting a family of formal languages that model a wide variety of behaviors for software transactions. These languages abstract away implementation details of transactional memory, providing high-level definitions suitable for programming languages. We use small-step semantics in order to represent explicitly the interleaved execution of threads that is necessary to investigate pertinent issues. We demonstrate the value of our core approach to modeling transactions by investigating two issues in depth. First, we consider parallel nesting, in which parallelism and transactions can nest arbitrarily. Second, we present multiple models for weak isolation, in which nontransactional code can violate the isolation of a transaction. For both, type-and-effect systems let us soundly and statically restrict what computation can occur inside or outside a transaction. We prove some key language-equivalence theorems to confirm that under sufficient static restrictions, in particular that each mutable memory location is used outside transactions or inside transactions (but not both), no program can determine whether the language implementation uses weak isolation or strong isolation.
Enforcing isolation and ordering in STM
- In the Proceedings of the Conf. on Programming Language Design and Implementation
, 2007
"... Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can resu ..."
Abstract
-
Cited by 41 (6 self)
- Add to MetaCart
Transactional memory provides a new concurrency control mechanism that avoids many of the pitfalls of lock-based synchronization. High-performance software transactional memory (STM) implementations thus far provide weak atomicity: Accessing shared data both inside and outside a transaction can result in unexpected, implementation-dependent behavior. To guarantee isolation and consistent ordering in such a system, programmers are expected to enclose all shared-memory accesses inside transactions. A system that provides strong atomicity guarantees isolation even in the presence of threads that access shared data outside transactions. A strongly-atomic system also orders transactions with conflicting non-transactional memory operations in a consistent manner. In this paper, we discuss some surprising pitfalls of weak atomicity, and we present an STM system that avoids these problems
Making the fast case common and the uncommon case simple in unbounded transactional memory
- In ISCA
, 2007
"... Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the soon-to-be-ubiquitous multi-core designs. Several recent proposals have extended the original bounded transactional memory ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
Hardware transactional memory has great potential to simplify the creation of correct and efficient multithreaded programs, allowing programmers to exploit more effectively the soon-to-be-ubiquitous multi-core designs. Several recent proposals have extended the original bounded transactional memory to unbounded transactional memory, a crucial step toward transactions becoming a generalpurpose primitive. Unfortunately, supporting the concurrent execution of an unbounded number of unbounded transactions is challenging, and as a result, many proposed implementations are complex. This paper explores a different approach. First, we introduce the permissions-only cache to extend the bound at which transactions overflow to allow the fast, bounded case to be used as frequently as possible. Second, we propose ONETM to simplify the implementation of unbounded transactional memory by bounding the concurrency
Revisiting the sequential programming model for multi-core
- In Proceedings of the 40th Annual ACM/IEEE International Symposium on Microarchitecture
, 2007
"... Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debugging of the program, and efforts to avoid race ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debugging of the program, and efforts to avoid race conditions, deadlocks, and other problems associated with parallel programming. To address these costs, other approaches, such as automatic thread extraction, have been explored. Unfortunately, the amount of parallelism that has been automatically extracted is generally insufficient to keep many cores busy. This paper argues that this lack of parallelism is not an intrinsic limitation of the sequential programming model, but rather occurs for two reasons. First, there exists no framework for automatic thread extraction that brings together key existing state-of-the-art compiler and hardware techniques. This paper shows that such a framework can yield scalable parallelization on several SPEC CINT2000 benchmarks. Second, existing sequential programming languages force programmers to define a single legal program outcome, rather than allowing for a range of legal outcomes. This paper shows that natural extensions to the sequential programming model enable parallelization for the remainder of the SPEC CINT2000 suite. Our experience demonstrates that, by changing only 60 source code lines, all of the C benchmarks in the SPEC CINT2000 suite were parallelizable by automatic thread extraction. This process, constrained by the limits of modern optimizing compilers, yielded a speedup of 454 % on these applications. 1
JudoSTM: A Dynamic Binary-Rewriting Approach to Software Transactional Memory
- In PACT ’07: Proc. of the 16th Intl. Conf. on Parallel Architecture and Compilation Techniques (PACT 2007), Brasov
, 2007
"... With the advent of chip-multiprocessors, we are faced with the challenge of parallelizing performance-critical software. Transactional memory (TM) has emerged as a promising programming model allowing programmers to focus on parallelism rather than maintaining correctness and avoiding deadlock. Many ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
With the advent of chip-multiprocessors, we are faced with the challenge of parallelizing performance-critical software. Transactional memory (TM) has emerged as a promising programming model allowing programmers to focus on parallelism rather than maintaining correctness and avoiding deadlock. Many implementations of hardware, software, and hybrid support for TM have been proposed; of these, software-only implementations (STMs) are especially compelling since they can be used with current commodity hardware. However, in addition to higher overheads, many existing STM systems are limited to either managed languages or intrusive APIs. Furthermore, transactions in STMs cannot normally contain calls to unobservable code such as shared libraries or system calls. In this paper we present JudoSTM, a novel dynamic binary-rewriting approach to implementing STM that supports C and C++ code. Furthermore, by using value-based conflict detection, JudoSTM additionally supports the transactional execution of both (i) irreversible system calls and (ii) library functions that may contain locks. We significantly lower overhead through several novel optimizations that improve the quality of rewritten code and reduce the cost of conflict detection and buffering. We show that our approach performs comparably to Rochester’s RSTM library-based implementation—demonstrating that a dynamic binary-rewriting approach to implementing STM is an interesting alternative. 1.
TxLinux: Using and Managing Hardware Transactional Memory in an Operating System
- In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles
, 2007
"... TxLinux is a variant of Linux that is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive, and the first to manage HTM in the scheduler. This paper describes and measures TxLinux and discusses two innovations in detail: cooperation between locks and t ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
TxLinux is a variant of Linux that is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive, and the first to manage HTM in the scheduler. This paper describes and measures TxLinux and discusses two innovations in detail: cooperation between locks and transactions, and the integration of transactions with the OS scheduler. Mixing locks and transactions requires a new primitive, cooperative transactional spinlocks (cxspinlocks) that allow locks and transactions to protect the same data while maintaining the advantages of both synchronization primitives. Cxspinlocks allow the system to attempt execution of critical regions with transactions and automatically roll back to use locking if the region performs I/O. Integrating the scheduler with HTM eliminates priority inversion. On a series of real-world benchmarks TxLinux has similar performance to Linux, exposing concurrency with as many as 32 concurrent threads on 32 CPUs in the same critical region.
An Integrated Hardware-Software Approach to Flexible Transactional Memory
, 2006
"... There has been considerable recent interest in both hardware and software transactional memory (TM). We present an intermediate approach, in which hardware serves to accelerate a TM implementation controlled fundamentally by software. Specifically, we describe an alert on update mechanism (AOU) that ..."
Abstract
-
Cited by 24 (8 self)
- Add to MetaCart
There has been considerable recent interest in both hardware and software transactional memory (TM). We present an intermediate approach, in which hardware serves to accelerate a TM implementation controlled fundamentally by software. Specifically, we describe an alert on update mechanism (AOU) that allows a thread to receive fast, asynchronous notification when previously-identified lines are written by other threads, and a programmable data isolation mechanism (PDI) that allows a thread to hide its speculative writes from other threads, ignoring conflicts, until software decides to make them visible. These mechanisms reduce bookkeeping, validation, and copying overheads without constraining software policy on a host of design decisions. We have used AOU and PDI to implement a hardwareaccelerated software transactional memory system we call RTM.

