Productivity and high performance usually do not go together very well. High-performance codes are typically low-level: hard to understand, heavily optimized and fine-tuned, and usually written with a specific system in mind. CUDA code can run blazingly fast on Nvidia GPUs, but only on Nvidia GPUs. Maintaining such codes or porting them to other platforms is difficult.
The EPEEC project develops tools and techniques to make high-performance codes easier to write, more maintainable and more portable. In this news article we show the benefits of these tools and techniques for the SMURFF application from IMEC.
Automatic checking and parallelization using Appentra’s Parallelware
Appentra’s tools can verify and enhance the OpenMP and OpenACC directives in parallel codes. Running the Parallelware tool on the SMURFF application code produced several suggestions that made the OpenMP directives clearer, and it increased performance by explicitly enabling SIMD parallelization.
Fig. 1: OpenMP directive automatically added by Appentra's Parallelware
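As an illustration, the sketch below shows the kind of directive this involves. The dot-product loop is a hypothetical example in the spirit of SMURFF's matrix factorization kernels, not the actual code or Parallelware's actual output; an explicit simd directive with a reduction clause makes the vectorization intent unambiguous to both the compiler and the reader.

```cpp
#include <cstddef>

// Hypothetical hot loop: a dot product of two latent vectors, as found in
// matrix factorization kernels. The explicit simd directive and reduction
// clause tell the compiler it is safe to vectorize the accumulation.
double dot(const double *u, const double *v, std::size_t n) {
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i)
        sum += u[i] * v[i];
    return sum;
}
```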
Universal task-based parallelization with OmpSs
"Tasks everywhere" is a powerful technique that allows you to express parallelism in the same way at all levels (core level, node level and even system level). EPEEC develops technology to better offload tasks to accelerators such as FPGAs and GPUs, and to run tasks automatically on multiple nodes of a supercomputer without writing a single line of MPI. Tasking in SMURFF allowed us to create a second level of parallelism, written in compact taskloop form, resulting in better load balancing and higher performance.
Fig. 2: Code simplification using taskloop. Old, complicated code (top) versus much simpler code (bottom) with the same performance
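To give an idea of the simplification, here is a minimal sketch of the taskloop form, assuming a per-item update kernel; the Sample type and update_sample() function are hypothetical placeholders, not SMURFF's actual code.

```cpp
#include <cstddef>
#include <vector>

struct Sample { /* per-item model state (placeholder) */ };

void update_sample(Sample &s);  // hypothetical per-item update kernel

// Called from inside an existing parallel region or task, this forms the
// second level of parallelism. A single taskloop directive replaces manual
// chunking of the iteration space and explicit per-chunk task creation:
// the runtime partitions the iterations into tasks and balances the load.
void update_all(std::vector<Sample> &samples) {
    #pragma omp taskloop grainsize(16)
    for (std::size_t i = 0; i < samples.size(); ++i)
        update_sample(samples[i]);
}
```

The grainsize clause here is illustrative; in practice the chunk size is tuned for the application or left to the runtime.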