Thesis/dissertation

Software distributed shared memory (DSM) systems have long been a major area of research in the high-performance computing community. One of the many implementations of such systems is Argo, a page-based, user-space DSM built on top of MPI. Researchers have dedicated considerable effort to making Argo easier to use and to alleviating some of the shortcomings that hurt its performance and scalability. However, several issues remain to be addressed, one of which is the simplistic distribution of pages across the nodes of a cluster. Because Argo operates at page granularity, the placement of pages in the distributed system is of significant importance to performance: it determines how many memory accesses must be served remotely. To ensure high performance, it is essential to employ memory allocation policies that place data intelligently across the distributed memory modules, thus reducing latency and increasing memory bandwidth. In this thesis, we incorporate several page placement policies into Argo and evaluate their impact on performance with a set of benchmarks ported to that programming model.
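To illustrate why page placement determines the share of remote accesses, here is a minimal Python sketch. It assumes a hypothetical, perfectly partitioned workload in which each node touches only its own contiguous share of the pages, and it compares two generic static placement schemes, block and cyclic (round-robin). These are textbook examples, not necessarily the policies evaluated in the thesis, and the function names are hypothetical rather than part of Argo's API.

```python
from typing import Callable

# Hypothetical placement policy signature: (page, num_pages, num_nodes) -> home node
Policy = Callable[[int, int, int], int]


def block_home(page: int, num_pages: int, num_nodes: int) -> int:
    """Block placement: contiguous chunks of pages are homed on consecutive nodes."""
    return page * num_nodes // num_pages


def cyclic_home(page: int, num_pages: int, num_nodes: int) -> int:
    """Cyclic (round-robin) placement: page i is homed on node i mod num_nodes."""
    return page % num_nodes


def remote_fraction(policy: Policy, num_pages: int, num_nodes: int) -> float:
    """Fraction of page accesses that are remote, assuming (hypothetically) that
    node n only accesses the n-th contiguous chunk of pages, once per page."""
    remote = 0
    for page in range(num_pages):
        accessor = page * num_nodes // num_pages  # node that uses this page
        if policy(page, num_pages, num_nodes) != accessor:
            remote += 1
    return remote / num_pages


if __name__ == "__main__":
    for name, policy in [("block", block_home), ("cyclic", cyclic_home)]:
        print(name, remote_fraction(policy, num_pages=1024, num_nodes=4))
```

Under this assumed access pattern, block placement makes every access local, while cyclic placement leaves only one page in four on the accessing node, so three quarters of the accesses are remote. Real applications rarely partition this cleanly, which is why adaptive or access-aware policies can pay off.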


Publication in Conference Proceedings/Workshop

Daniel Castro, Alexandro Baldassin, João Barreto and Paolo Romano. SPHT: Scalable Persistent Hardware Transactions. 19th USENIX Conference on File and Storage Technologies (FAST'21).


Publication in Conference Proceedings/Workshop

P. Ekemark, Y. Yao, A. Ros, K. Sagonas and S. Kaxiras, "TSOPER: Efficient Coherence-Based Strict Persistency," 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 125-138, doi: 10.1109/HPCA51647.2021.00021.


DOI: 10.1109/HPCA51647.2021.00021
Oral presentation

Antonio J. Peña. A Software Ecosystem to Save Money in DRAM and Increase Performance with Optane DIMMs. Intel HPC+AI Pavilion. 2020.


Publication in Conference Proceedings/Workshop

Tom Vander Aa, Xiangju Qin, Paul Blomstedt, Roel Wuyts, Wilfried Verachtert, Samuel Kaski. A High-Performance Implementation of Bayesian Matrix Factorization with Limited Communication. International Conference on Computational Science (ICCS 2020).

This paper is included in the Public Health Emergency #COVID19 Initiative repository


DOI: 10.1007/978-3-030-50433-5_1
Article in journal

Jaume Bosch, Carlos Álvarez, Daniel Jiménez-González, Xavier Martorell, Eduard Ayguadé. Asynchronous runtime with distributed manager for task-based programming models. Parallel Computing, Volume 97, 2020. https://doi.org/10.1016/j.parco.2020.102664


DOI: 10.1016/j.parco.2020.102664
Publication in Conference Proceedings/Workshop

Gureya, D., Neto, J., Karimi, R., Barreto, J., Bhatotia, P., Quema, V., Rodrigues, R., Romano, P., Vlassov, V. Bandwidth-Aware Page Placement in NUMA. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, 2020, pp. 546-556. doi: 10.1109/IPDPS47924.2020.00063


DOI: 10.1109/IPDPS47924.2020.00063
Oral presentation

Antonio J. Peña. EPEEC’s Advances toward Programming Productivity for Heterogeneity at Large Scale. EuroExaScale 2020 (HiPEAC 2020 Conference).


Publication in Conference Proceedings/Workshop

Pavanakumar Mohanamuraly and Gabriel Staffelbach. 2020. Hardware Locality-Aware Partitioning and Dynamic Load-Balancing of Unstructured Meshes for Large-Scale Scientific Applications. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC ’20). Association for Computing Machinery, New York, NY, USA, Article 7, 1–10. DOI: https://doi.org/10.1145/3394277.3401851


DOI: 10.1145/3394277.3401851
Other

Manuel Aranz, Xavier Martorell and Antonio J. Peña. Programming Guidelines for Parallel Computing. EPEEC. May 2020.