The main components of the programming environment for exascale, carefully selected for the EPEEC project, are introduced in the figure below. Their proposed new features, also shown in the figure, have been identified as the developments necessary to bring high programming productivity, high execution efficiency and scalability, energy awareness, and smooth composability/interoperability to the overall programming environment.

[Figure: Main components of the EPEEC programming environment and their proposed new features]

The five selected components are described below:

Parallelware

The Parallelware software developed by Appentra provides advanced capabilities for static analysis of programs, which are used in the related products Parallelware Trainer and Parallelware Analyzer. At the start of the project, the Parallelware software supports OpenMP 4.5 and OpenACC 2.5, the main directive-based standards for intra-node parallel programming in (pre)exascale systems. Feedback received from end-users highlighted that Parallelware lacks support for the tasking paradigm, for asynchronous execution, and for the C++ and Fortran programming languages. By the end of the EPEEC project, Parallelware will be enhanced with new support for C++ and Fortran built from scratch on top of the CLANG and FLANG front-ends of the LLVM infrastructure, as well as with new support for OmpSs and for the latest features of OpenMP and OpenACC.
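As a brief illustration of the directive-based programming these standards enable, the sketch below parallelises a SAXPY kernel with OpenMP for the host and with OpenACC for an accelerator; the kernel, function names, and data clauses are illustrative choices of ours, not Parallelware output.

    #include <cstddef>
    #include <vector>

    // SAXPY on the host with OpenMP: the loop iterations are independent,
    // so they can be distributed across threads.
    void saxpy_omp(float a, const std::vector<float>& x, std::vector<float>& y) {
        #pragma omp parallel for
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = a * x[i] + y[i];
    }

    // The same kernel offloaded to an accelerator with OpenACC; the data
    // clauses make the host/device transfers explicit.
    void saxpy_acc(float a, const float* x, float* y, int n) {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

Tools in the Parallelware family assist developers in identifying such parallelisable loops and in inserting the corresponding directives.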

OmpSs

The OmpSs programming model is implemented at BSC by means of the Mercurium compiler and the Nanos++ runtime system. A brand-new runtime system written from scratch, codenamed Nanos 6, is currently under development, intended to overcome code degradation and prevent future maintainability burdens. Both Mercurium and Nanos++ are currently at TRL9: they have been deployed in the Tier-0 PRACE facility Marenostrum, and a considerable number of known users in external institutions (e.g., CINECA, LRZ, JSC, Herta Security, and Vimar) seek support through the corresponding mailing list at BSC. Nanos 6, on the other hand, is currently at TRL3. Since Nanos++ is fully operational, it may well serve as the base for the new developments, which would then be ported to a mature Nanos 6 at a later stage. This will be done if, by the beginning of the project, Nanos 6 has not yet attained TRL7, or if it is not expected to reach TRL8 by the end of the project. Directive-based acceleration by means of OpenACC syntax in OmpSs (i.e., OmpSs+OpenACC) is currently at TRL3, since an early prototype with limited functionality, developed on top of the OmpSs CUDA back-end, has already validated the idea. OpenMP offloading support, on the other hand, is at TRL2, with only the concept formulated. Similarly, the remaining proposed features, i.e., tasking in accelerators, heterogeneous memory management, and support for the latest C++ and Fortran features, are at TRL2 as well.
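As an illustration of the tasking paradigm targeted by these developments, the sketch below shows OmpSs-style tasks with explicit data dependences, using the classic #pragma omp task in/out syntax accepted by Mercurium; the pipeline, produce, and consume names, block sizes, and kernel bodies are illustrative assumptions of ours, not EPEEC code.

    // The array-section syntax blocks[i][0;BS] (start;length) expresses the
    // data each task touches; the runtime (Nanos++ / Nanos 6) derives the
    // dependence graph and schedules ready tasks asynchronously.
    constexpr int N  = 4;     // number of blocks (illustrative)
    constexpr int BS = 256;   // block size (illustrative)

    void produce(double* block) {            // fills one block
        for (int j = 0; j < BS; ++j) block[j] = j;
    }

    void consume(const double* block) {      // reads one block
        volatile double sum = 0.0;
        for (int j = 0; j < BS; ++j) sum += block[j];
    }

    void pipeline(double blocks[N][BS]) {
        for (int i = 0; i < N; ++i) {
            #pragma omp task out(blocks[i][0;BS])
            produce(blocks[i]);

            #pragma omp task in(blocks[i][0;BS])
            consume(blocks[i]);
        }
        #pragma omp taskwait   // wait for all generated tasks
    }

The consumer task for block i is only released once the matching producer task has completed, without any explicit synchronisation in the application code.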

GASPI

GASPI stands for Global Address Space Programming Interface and is a Partitioned Global Address Space (PGAS) API. It targets extreme scalability, high flexibility, and failure tolerance for parallel computing environments. GASPI aims to initiate a paradigm shift from bulk-synchronous two-sided communication patterns towards an asynchronous communication and execution model. To that end, GASPI leverages remote completion and one-sided RDMA-driven communication in a Partitioned Global Address Space. This asynchronous communication allows computation and communication to overlap fully. The main design idea of GASPI is to offer a lightweight API ensuring high performance, flexibility, and failure tolerance. GPI-2 is an open-source implementation of the GASPI standard, freely available to application developers and researchers, and is already at production level (TRL9). However, in a fast-changing, innovative environment such as HPC, further developments are needed to incorporate hardware changes, such as the use of accelerators, and to improve the ease of use of GPI, which is currently an obstacle to wider adoption of the programming model. EPEEC provides solutions for both challenges by addressing heterogeneous compute systems and promoting the composability of the GASPI and OmpSs programming models. Traditionally, GPI has been used in the visualisation and seismic imaging domains; in recent years it has been ported to several scientific applications within publicly funded projects.
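The sketch below illustrates the one-sided, notification-based communication style described above, using GPI-2 calls as specified by the GASPI standard; the segment id, offsets, message size, and queue choice are illustrative assumptions of ours.

    #include <GASPI.h>
    #include <cstring>

    int main() {
        gaspi_proc_init(GASPI_BLOCK);

        gaspi_rank_t rank, nranks;
        gaspi_proc_rank(&rank);
        gaspi_proc_num(&nranks);

        // One globally registered segment per rank (id and size are illustrative).
        const gaspi_segment_id_t seg = 0;
        gaspi_segment_create(seg, 1 << 20, GASPI_GROUP_ALL,
                             GASPI_BLOCK, GASPI_MEM_INITIALIZED);

        if (rank == 0 && nranks > 1) {
            gaspi_pointer_t ptr;
            gaspi_segment_ptr(seg, &ptr);
            std::memset(ptr, 42, 1024);

            // One-sided RDMA write plus remote notification: the target learns
            // about completion without posting any matching receive.
            gaspi_write_notify(seg, 0, /*target rank*/ 1, seg, 0, 1024,
                               /*notification id*/ 0, /*value*/ 1,
                               /*queue*/ 0, GASPI_BLOCK);
            gaspi_wait(0, GASPI_BLOCK);          // local completion of the queue
        } else if (rank == 1) {
            gaspi_notification_id_t got;
            gaspi_notify_waitsome(seg, 0, 1, &got, GASPI_BLOCK);
            gaspi_notification_t val;
            gaspi_notify_reset(seg, got, &val);  // data is now visible locally
        }

        gaspi_proc_term(GASPI_BLOCK);
        return 0;
    }

Because the write and its notification are queued asynchronously, rank 0 is free to continue computing while the transfer is in flight, which is the overlap of computation and communication that GASPI promotes.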

ArgoDSM

ArgoDSM is a modern page-based distributed shared virtual-memory system that operates in user space, first released in 2016 (the first publication describing ArgoDSM appeared in HPDC 2015). It is based on recent advances in cache coherence protocols and synchronisation algorithms developed at Uppsala University, and it supports third-party network layers. ArgoDSM offers: (1) POSIX Threads compatibility (it runs pthreads programs on clusters); (2) minimal effort to scale a typical pthreads program from one node to thousands of nodes; (3) user-space operation, as ArgoDSM is a library on top of an RDMA-capable network layer; and (4) release consistency (RC) and sequential consistency for data-race-free programs. ArgoDSM is freely available for research, evaluation, and education, and is available under a custom license for commercial use. It is distributed by Eta Scale and is currently at TRL6, with its prototype implementations being evaluated by commercial entities under real-world conditions.
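A hedged sketch of the ArgoDSM programming style is given below: data allocated in the global address space is accessed like ordinary shared memory by the threads on every node. The API names (argo::init, argo::conew_array, argo::barrier, and related calls), the header name, and the work-partitioning scheme follow our reading of the ArgoDSM documentation and should be treated as assumptions rather than a definitive reference.

    #include <cstddef>
    #include "argo.hpp"   // ArgoDSM main header (name assumed)

    int main() {
        argo::init(512 * 1024 * 1024);            // size of the global address space (assumed signature)

        const std::size_t n = 1 << 20;
        int* data = argo::conew_array<int>(n);    // collectively allocated, globally visible array

        // Each node initialises its own chunk; pthreads or OpenMP threads can
        // further subdivide the work inside a node.
        const std::size_t nodes = argo::number_of_nodes();
        const std::size_t id    = argo::node_id();
        const std::size_t chunk = n / nodes;
        for (std::size_t i = id * chunk; i < (id + 1) * chunk; ++i)
            data[i] = static_cast<int>(i);

        argo::barrier();                          // synchronisation point under release consistency

        argo::codelete_array(data);
        argo::finalize();
        return 0;
    }

Under release consistency, writes to the shared array only need to become visible to other nodes at synchronisation points such as the barrier, which is what allows the coherence protocol to operate efficiently over RDMA.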

BSC performance tools

The BSC performance tools, namely Extrae, Paraver, and Dimemas, are currently widely deployed and extensively used by institutions such as NCAR or NASA-AMES, and in large projects such as DEEP-EST or the POP Centre of Excellence, to name only a few. All software components have been deployed on pre-exascale machines, including at least Marenostrum, and hence they are considered to be at TRL9. The new features to be included as part of this project, such as OpenACC support (please see sections 1.3.4 and 3.1), are at TRL2 (concept formulated).