New recommendations and opportunities regarding vectorization and use of SIMD instructions

Current processors incorporate increasingly powerful vectorization and SIMD (Single Instruction, Multiple Data) capabilities, and it is a challenge for programmers to understand their codes well enough to decide where vectorization can be applied.

In the context of our EPEEC project, Appentra and BSC have developed a set of rules, recommendations and opportunities that programmers can take into account in their daily work. In addition, Appentra’s Parallelware Analyzer has been extended to use the application’s internal representation to find such opportunities and report them to the programmer, thus increasing the programmer’s productivity. The following figure shows the parallel programming phases:

Parallel programming phases
Figure: Parallel programming phases

The following listing shows how the Parallelware Analyzer “pwreport” tool is used to obtain the code locations where the programmer can focus, with opportunities for parallelization and optimization:

Listing: pwreport output highlighting opportunities for parallelization and optimization

Other features include determining whether a particular loop nest can be vectorized, including in situations where some of the variables need explicit synchronization support, such as the reductions on “dot” and “sum” shown in the following figure:

Additionally, Parallelware Analyzer is able to determine the benefits of loop interchange or loop fission to favor or enable vectorization. It detects invariant computations and conditional statements that can be hoisted out of vectorizable loops, and it determines whether loop nests may benefit from hybrid parallelization and vectorization. It also provides recommendations on when to use SMP parallelism or when offloading execution to GPUs is preferable, along with other useful suggestions such as adding “restrict” to pointer function parameters or annotating pure functions with the OpenMP SIMD directive.