Reducing network traffic for Exascale programming

The aim of our work is to support a scalable, integrated environment for distributed programming models. One of its components is the GASPI programming specification and its implementation, GPI. Within the scope of EPEEC, we are enhancing the GASPI parallel programming model to increase its scalability. One extension is the compression of the data exchanged between compute nodes, which reduces network congestion in data-intensive applications distributed across a large number of nodes. A first prototype compresses the communication data of a machine learning use case.

The motivation for designing and developing the compression library comes both from the general goal of making GASPI communication more productive and from the concrete needs of applications in one particular research field: machine learning.

Figure 1: Benefits of data compression, using the example of gradient compression in deep learning applications. (Figure from: Yujun Lin, Song Han, Huizi Mao, Yu Wang and William J. Dally. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. ICLR 2018)

Machine Learning Algorithms:

A machine learning algorithm extracts meaningful information from input data and makes predictions based on it. For example, it could predict a patient's risk of developing a specific illness from the patient's clinical history and vital signs. The main component of a machine learning algorithm is a statistical model with many (sometimes millions of) tunable parameters. A major task is to find suitable values for these parameters, which is done by a procedure referred to as optimization.
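To make the notion of optimization concrete, here is a minimal sketch that fits the two parameters of a tiny linear model by gradient descent, a common optimization method whose gradient vectors are what distributed training exchanges between nodes. The model, the mean-squared-error loss, the learning rate and the function names are illustrative assumptions, not taken from the EPEEC use case.

```python
# Minimal sketch: fit the parameters w of a linear model y ~ w . x by
# plain gradient descent on the mean squared error. Model, loss, learning
# rate and function names are illustrative assumptions.

def gradient(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    g = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj / n
    return g

def gradient_descent(xs, ys, steps=500, lr=0.1):
    """Start from zero and repeatedly step against the gradient."""
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(w, xs, ys)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

if __name__ == "__main__":
    # Synthetic data generated exactly from the parameters [2.0, -1.0].
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    ys = [2.0, -1.0, 1.0, 3.0]
    print(gradient_descent(xs, ys))
```

In a real model the vector `w` has millions of entries, so the gradient computed in every step is just as large.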

One of the necessary operations in a distributed machine learning optimization algorithm is the exchange of gradient vectors between nodes. Although these vectors are large, only a small part of their content is essential to the optimization procedure. Therefore, the application only needs to transfer a small subset of the vector entries.
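The exchange itself can be pictured with a small in-process simulation, assuming a data-parallel setup in which every node computes a gradient on its own data and all nodes then need the element-wise average. Nodes are plain Python lists here; the network transfer (for example via GASPI) is not modeled, and the function names are hypothetical.

```python
# Minimal sketch of the gradient exchange in data-parallel training,
# simulated in one process: each "node" contributes a local gradient and
# the exchange yields the element-wise average. Names are hypothetical.

def average_gradients(local_grads):
    """Element-wise mean of the gradient vectors from all nodes."""
    n_nodes = len(local_grads)
    length = len(local_grads[0])
    if any(len(g) != length for g in local_grads):
        raise ValueError("all gradient vectors must have the same length")
    return [sum(g[i] for g in local_grads) / n_nodes for i in range(length)]

def dense_bytes(length, bytes_per_value=4):
    """Payload of one uncompressed gradient vector (float32 by default)."""
    return length * bytes_per_value

if __name__ == "__main__":
    node_grads = [[1.0, -2.0, 0.0], [3.0, 0.0, 1.0], [2.0, -1.0, 2.0]]
    print(average_gradients(node_grads))
    # A model with one million parameters sends about 4 MB per node and step.
    print(dense_bytes(1_000_000))
```

Because every node ships its full vector in every step, this dense traffic grows with both model size and node count, which is what compression aims to cut.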

Thus, to make both the communication itself and the application as a whole more efficient, the basic idea is to reduce the gradient vectors to their essential content on the sender side and to send them in a compressed format. On the receiver side, the compressed vector is expanded back to its original size. The crucial points are identifying which content is essential and implementing the compression efficiently.
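The sender-side reduction and receiver-side expansion can be sketched as follows. Here "essential content" is assumed to mean the k entries with the largest magnitude (top-k sparsification, as in the Deep Gradient Compression paper cited in Figure 1); Comprex may use a different criterion, and `compress` and `expand` are hypothetical names, not the Comprex API.

```python
# Minimal sketch of gradient compression by top-k sparsification.
# ASSUMPTION: the k largest-magnitude entries are the "essential" ones.
# compress/expand are hypothetical names, not the Comprex API.

def compress(vec, k):
    """Sender side: keep the k largest-magnitude entries as (index, value) pairs."""
    k = min(k, len(vec))
    top = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    top.sort()  # transmit indices in ascending order
    return len(vec), [(i, vec[i]) for i in top]

def expand(length, pairs):
    """Receiver side: rebuild a dense vector; dropped entries become 0.0."""
    dense = [0.0] * length
    for i, v in pairs:
        dense[i] = v
    return dense

if __name__ == "__main__":
    grad = [0.01, -0.8, 0.002, 0.5, -0.03, 0.0, 0.9, -0.001]
    length, pairs = compress(grad, k=3)
    print(pairs)                  # [(1, -0.8), (3, 0.5), (6, 0.9)]
    print(expand(length, pairs))  # [0.0, -0.8, 0.0, 0.5, 0.0, 0.0, 0.9, 0.0]
```

Assuming 32-bit indices and 32-bit values, the payload drops from about 4n bytes for a dense vector of length n to about 8k bytes, so the saving is large whenever k is much smaller than n. The entries dropped in one step are simply lost in this sketch.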

A first prototype of the compression library, called Comprex, has been implemented together with a simple machine learning use case and can be found in our GitHub repository:

Tests on Fraunhofer’s GPU cluster CARME are in preparation: