One of the hardware-related challenges identified by EPEEC is the significant increase in the number of compute nodes in future HPC systems compared to current systems, which brings the potential for serious congestion problems due to the higher network traffic they generate. This topic is addressed by Comprex, which is deeply integrated into the GASPI programming model; together with ArgoDSM, OmpSs, and OmpSs@cluster, GASPI forms the EPEEC programming model.
To leverage the power of an HPC cluster, data must be sent between the physically separated compute nodes. With an increasing number of nodes, however, the communication overhead grows as well. Communication takes time and reduces the efficiency of the HPC system, as the compute nodes wait for it to finish. An HPC system can even end up spending more time on communication than on computation, which means the system is underutilized. One of our goals is to make communication as efficient and fast as possible, as this saves time and energy, which ultimately reduces costs and benefits the environment.
There are many techniques to make communication as efficient as possible, so that the computing part does not need to wait for it. Compression is one of these techniques: it reduces the volume of the data, measured in bytes, that needs to be communicated. Lossy compression describes methods that irreversibly reduce the information content of the original data. Although this is prohibitive for some applications, there are important applications that tolerate highly lossy compression. One such application is the training of deep neural networks (DNNs), e.g. for computer vision and language processing. During the training of DNNs, each compute node calculates a different update for the joint, global DNN model. These updates need to be incorporated into the global model, which means the nodes must communicate their updates and agree on a new global model. It has been shown that the communication of these updates can be compressed to below 1% of the original data volume without degrading the training.
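A widely used family of schemes that achieves such compression rates is top-k sparsification with residual accumulation: only the largest-magnitude entries of the update are sent, and the dropped remainder is kept locally and added to the next update, so no information is lost permanently, only delayed. The following NumPy sketch illustrates the idea; all names are illustrative and do not reflect the Comprex API.

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of entries.

    Returns (indices, values): a sparse representation that is far
    smaller than the dense gradient when ratio is small (e.g. 1%).
    """
    k = max(1, int(ratio * grad.size))
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def topk_decompress(idx, vals, shape):
    """Rebuild a dense array with zeros everywhere except at idx."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = vals
    return flat.reshape(shape)

class ResidualCompressor:
    """Accumulates the dropped entries locally (error feedback),
    so every gradient component is eventually communicated."""
    def __init__(self, shape, ratio=0.01):
        self.residual = np.zeros(shape)
        self.ratio = ratio

    def compress(self, grad):
        acc = grad + self.residual
        idx, vals = topk_compress(acc, self.ratio)
        sent = topk_decompress(idx, vals, acc.shape)
        self.residual = acc - sent   # keep what was not sent
        return idx, vals
```

With ratio=0.01 only 1% of the entries (plus their indices) travel over the network, matching the compression rates mentioned above.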
We combine lossy compression schemes with our communication library GaspiCxx (https://github.com/cc-hpc-itwm/GaspiCxx), which uses our GPI communication backend (https://github.com/cc-hpc-itwm/GPI-2). The big advantage of GaspiCxx is that it can overlap communication and computation, so that the communication overhead is hidden behind the computations. Using this concept, we can also hide the overhead of the compression step and fully benefit from the reduced data volume. As shown in Figure 1, it is possible to reduce the time needed to communicate data by more than one order of magnitude. The compression schemes yield larger savings at higher data volumes, which is ideal for bigger problems such as DNN training.
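The overlap of communication and computation can be sketched as a simple double-buffering pipeline: while the compressed update for one batch is still in flight, the node already computes the gradient for the next batch. The Python sketch below shows only this generic pattern with a background thread; the callables are hypothetical placeholders and do not represent the GaspiCxx or Comprex API.

```python
from concurrent.futures import ThreadPoolExecutor

def train_loop_overlapped(batches, compute_grad, compress, send):
    """Overlap sketch: the send of the compressed update for batch i
    runs in the background while the gradient for batch i+1 is being
    computed, hiding both compression and communication latency."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for batch in batches:
            grad = compute_grad(batch)   # compute step (placeholder)
            if pending is not None:
                pending.result()         # wait only if the transfer lags
            pending = pool.submit(send, compress(grad))  # async send
        if pending is not None:
            pending.result()             # drain the last transfer
```

In a real setting the asynchronous send would be a one-sided GASPI transfer; here a worker thread stands in for it to keep the sketch self-contained.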
Figure 1: Communication times for different data volumes and communication schemes. The percentage refers to the fraction of the original data volume that is actually transmitted. The data was measured on 32 nodes with a 10 Gbit/s Ethernet connection in the Amazon AWS Cloud.