Tom Vander Aa. Exascale Matrix Factorization: Machine Learning on Supercomputers to Find New Drugs. EuroHPC Summit Week 2019.
Parallel programming has become increasingly important, both as a programming skill and as a research topic, over the last few decades. Multicore computers and mobile devices are part of our daily lives, and computer clusters facilitate research in many areas all over the world. Distributed shared memory (DSM) is a prevalent programming paradigm in cluster computing, allowing programmers to address all available memory on a computer cluster as one large, contiguous address space. Even with the support of DSM, data races pose a major hurdle for programmers in most applications, and existing lock-based concurrency mechanisms are complicated to use and not suitable for all tasks.
Transactional memory (TM) places the burden of coherence and concurrency on the system rather than on the programmer, leaving the programmer with the simple task of defining a transactional code block. As an attractive alternative to lock-based synchronization, it also shifts the burden of code optimization from the programmer to the system. However, transactional memory has traditionally relied on centralized concurrency protocols that are inherently unsuitable for large distributed settings, raising the question of whether transactional memory can be implemented in a scalable manner for large distributed systems.
In this thesis, a transactional memory extension of distributed shared memory is presented and compared as an alternative to lock-based synchronization. A synthetic random access algorithm shows significant throughput scaling for up to at least 256 cores across 32 DSM nodes, with transactional memory outperforming cohort locking for all cases above 16 cores. The benefits of local decision making and a distributed coherence protocol are also shown to be of utmost importance.
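The programming model described in the abstract, where the programmer only marks a transactional block and the system handles conflict detection and retry, can be illustrated with a toy software transactional memory sketch. This is not the thesis's implementation; the `TVar`, `Transaction`, and `atomically` names are illustrative, and a single global commit lock stands in for the distributed coherence protocol discussed above.

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter (illustrative)."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Transaction:
    """Buffers reads (with observed versions) and writes until commit."""
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> pending new value

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-writes
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value

# A single global lock serializes commits; a real distributed TM would
# replace this with a distributed coherence protocol.
_commit_lock = threading.Lock()

def atomically(fn):
    """Run fn optimistically; retry if a concurrent commit invalidated a read."""
    while True:
        tx = Transaction()
        result = fn(tx)
        with _commit_lock:
            if all(tvar.version == v for tvar, v in tx.reads.items()):
                for tvar, value in tx.writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Validation failed: some TVar changed since we read it, so retry.

# Usage: transfer between two accounts with no per-account locks in user code.
a, b = TVar(100), TVar(0)

def transfer(tx, amount=30):
    tx.write(a, tx.read(a) - amount)
    tx.write(b, tx.read(b) + amount)

atomically(transfer)
print(a.value, b.value)  # 70 30
```

The user code contains no locking at all: `transfer` is ordinary sequential code, and `atomically` supplies isolation and retry, which is the shift of burden from programmer to system that the abstract describes.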
Imen Chakroun, Tom Vander Aa, Thomas J. Ashby. Guidelines for enhancing data locality in selected machine learning algorithms. Intelligent Data Analysis, vol. 23, no. 5, pp. 1003-1020, 2019.
Arenaz M., Martorell X. (2019) Parallelware Tools: An Experimental Evaluation on POWER Systems. In: Weiland M., Juckeland G., Alam S., Jagode H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_27
Imen Chakroun, Tom Vander Aa, Tom Ashby. Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms. 4th International Conference on Big Data Analytics, Data Mining and Computational Intelligence (BIGDACI2019). IADIS. 2019.
Tom Vander Aa, Imen Chakroun, Thomas J. Ashby. SMURFF: a High-Performance Framework for Matrix Factorization. 1st IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2019). IEEE. 2019.
Tom Vander Aa, Imen Chakroun, Tom Ashby, Jaak Simm, Adam Arany, Yves Moreau, Thanh Le Van, José Felipe Golib Dzib, Jörg Wegner, Vladimir Chupakhin, Hugo Ceulemans, Roel Wuyts, Wilfried Verachtert. SMURFF: a High-Performance Framework for Matrix Factorization Methods (extended abstract). 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn 2019). 2019.
Toledo, L., Peña, A.J., Catalán, S., Valero-Lara, P. Tasking in accelerators: performance evaluation. In: 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Gold Coast, Australia, 5-7 December 2019. Institute of Electrical and Electronics Engineers (IEEE), 2019, pp. 127-132.