Parallel programming has become increasingly important over recent decades, both as a programming skill and as a research topic. Multicore computers and mobile devices are part of our daily lives, and computer clusters support research in many fields worldwide. Distributed shared memory (DSM) is a prevalent programming paradigm in cluster computing, allowing programmers to address all available memory on a cluster as one large, contiguous address space. Even with the support of DSM, however, data races remain a major hurdle in most applications, and existing lock-based concurrency mechanisms are difficult to use correctly and unsuited to many tasks.
Transactional memory (TM) places the burden of coherence and concurrency on the system rather than on the programmer, who is left with the simple task of marking a block of code as a transaction. As an attractive alternative to lock-based synchronization, it likewise shifts the burden of concurrency-related code optimization from the programmer to the system. Transactional memory has, however, traditionally relied on centralized concurrency protocols that are inherently unsuitable for large distributed settings, raising the question of whether transactional memory can be implemented scalably for large distributed systems.
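To make the transactional programming model concrete, the following is a minimal, illustrative Python sketch (not code from this thesis) of the optimistic read-validate-commit pattern that transactional memory systems are commonly built on: a transaction reads a snapshot, does speculative work, and commits only if no conflicting commit happened in the meantime. The names `TVar` and `atomically` are assumptions chosen for illustration.

```python
import threading

class TVar:
    """A toy transactional variable: a value paired with a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # guards snapshot and commit only

def atomically(tvar, update):
    """Optimistically apply `update` to `tvar`, retrying on conflict.

    Takes a consistent snapshot, computes the new value speculatively
    (outside any lock), then commits only if the version is unchanged,
    i.e. no other transaction committed in between.
    """
    while True:
        with tvar._lock:
            snapshot_value = tvar.value
            snapshot_version = tvar.version
        new_value = update(snapshot_value)        # speculative work
        with tvar._lock:
            if tvar.version == snapshot_version:  # no conflicting commit
                tvar.value = new_value
                tvar.version += 1
                return new_value
        # conflict detected: another transaction committed first; retry

# Usage: four threads each increment the same counter 1000 times.
counter = TVar(0)

def worker():
    for _ in range(1000):
        atomically(counter, lambda v: v + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 4000: no increments are lost despite contention
```

The programmer only specifies *what* should happen atomically (the `update` function); conflict detection and retry are handled by the system, which is the division of labor the paragraph above describes.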
In this thesis, a transactional memory extension of distributed shared memory is presented and evaluated as an alternative to lock-based synchronization. On a synthetic random-access benchmark, throughput scales significantly up to at least 256 cores across 32 DSM nodes, with transactional memory outperforming cohort locking in all configurations above 16 cores. Local decision making and a distributed coherence protocol are shown to be essential to this scalability.