Software distributed shared memory (DSM) systems have been one of the main areas of research in the high-performance computing community. One of the many implementations of such systems is Argo, a page-based, user-space DSM built on top of MPI. Researchers have dedicated considerable effort to making Argo easier to use and to alleviating some of the shortcomings that hurt its performance and scaling. However, several issues remain to be addressed, one of them being the simplistic distribution of pages across the nodes of a cluster. Since Argo works at page granularity, the placement of pages in a distributed system is of significant importance to performance, because it determines the extent of remote memory accesses. To ensure high performance, it is essential to employ memory allocation policies that place data across the distributed memory modules intelligently, thus reducing latencies and increasing memory bandwidth. In this thesis, we incorporate several page placement policies into Argo and evaluate their impact on performance with a set of benchmarks ported to that programming model.
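To illustrate why page placement determines the extent of remote memory accesses, the following is a minimal sketch that compares two generic placement policies, block (contiguous) and cyclic (round-robin). It is not Argo's implementation or API: the function names, the 4 KiB page size, and the toy access pattern are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size, not taken from Argo


def block_home(page, total_pages, nodes):
    """Block placement: each node is home to one contiguous chunk of pages."""
    pages_per_node = -(-total_pages // nodes)  # ceiling division
    return page // pages_per_node


def cyclic_home(page, nodes):
    """Cyclic placement: pages are dealt out to nodes round-robin."""
    return page % nodes


def remote_fraction(accesses, home_of):
    """Fraction of accesses whose page has a home node other than the accessor.

    accesses: list of (node, address) pairs
    home_of:  function mapping a page index to its home node
    """
    remote = sum(
        1 for node, addr in accesses if home_of(addr // PAGE_SIZE) != node
    )
    return remote / len(accesses)


# Hypothetical workload: 4 nodes, 16 pages, and each node touches only
# its own contiguous quarter of the global address space.
NODES, TOTAL_PAGES = 4, 16
accesses = [(p // 4, p * PAGE_SIZE) for p in range(TOTAL_PAGES)]

block_remote = remote_fraction(
    accesses, lambda p: block_home(p, TOTAL_PAGES, NODES)
)
cyclic_remote = remote_fraction(accesses, lambda p: cyclic_home(p, NODES))

print(f"block:  {block_remote:.2f} of accesses are remote")
print(f"cyclic: {cyclic_remote:.2f} of accesses are remote")
```

For this particular access pattern, block placement keeps every access local, while cyclic placement sends most of them to a remote node. A workload with a different pattern could favor the opposite choice, which is why the choice of policy is worth evaluating per benchmark.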
Daniel Castro, Alexandro Baldassin, João Barreto and Paolo Romano. SPHT: Scalable Persistent Hardware Transactions. 19th USENIX Conference on File and Storage Technologies (FAST'21).
P. Ekemark, Y. Yao, A. Ros, K. Sagonas and S. Kaxiras, "TSOPER: Efficient Coherence-Based Strict Persistency," 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 125-138, doi: 10.1109/HPCA51647.2021.00021.
Antonio J. Peña. A Software Ecosystem to Save Money in DRAM and Increase Performance with Optane DIMMs. Intel HPC+AI Pavilion. 2020.
Tom Vander Aa, Xiangju Qin, Paul Blomstedt, Roel Wuyts, Wilfried Verachtert, Samuel Kaski. A High-Performance Implementation of Bayesian Matrix Factorization with Limited Communication. International Conference on Computational Science (ICCS 2020).
This paper is included in the Public Health Emergency #COVID19 Initiative repository
Jaume Bosch, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé. Asynchronous runtime with distributed manager for task-based programming models. Parallel Computing, Volume 97, 2020. https://doi.org/10.1016/j.parco.2020.102664
Gureya, D., Neto, J., Karimi, R., Barreto, J., Bhatotia, P., Quema, V., Rodrigues, R., Romano, P., Vlassov, V. Bandwidth-Aware Page Placement in NUMA. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, 2020, pp. 546-556. doi: 10.1109/IPDPS47924.2020.00063
Antonio J. Peña. EPEEC’s Advances toward Programming Productivity for Heterogeneity at Large Scale. EuroExaScale 2020 (HiPEAC 2020 Conference).
Pavanakumar Mohanamuraly and Gabriel Staffelbach. 2020. Hardware Locality-Aware Partitioning and Dynamic Load-Balancing of Unstructured Meshes for Large-Scale Scientific Applications. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC ’20). Association for Computing Machinery, New York, NY, USA, Article 7, 1–10. DOI: https://doi.org/10.1145/3394277.3401851