Several applications are considered in EPEEC: AVBP (Cerfacs, a numerical simulation framework for the study of fluid dynamics and combustion problems), DIOGENeS (Inria, a numerical simulation framework for the study of nanoscale light-matter interaction problems), OSIRIS (INESC-ID, a numerical simulation framework for the study of plasma physics problems), Quantum ESPRESSO (Cineca, a set of numerical tools for the study of electronic properties of materials) and SMURFF (IMEC, a Bayesian matrix factorization framework for building recommender systems with applications to life sciences). This news item focuses on the SMURFF application.
Recommender Systems (RS) have become very common in recent years and are useful in various real-life applications. The best-known examples are probably movie suggestions on Netflix and book suggestions on Amazon. However, they can also be used in less obvious areas such as drug discovery, where a key problem is the identification of candidate molecules that affect proteins associated with diseases. Bayesian Matrix Factorization (MF) is a recommender technique that gives high-quality recommendations but is computationally intensive and thus more challenging to apply to large datasets.
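To make the idea of Bayesian matrix factorization more concrete, here is a minimal sketch in plain Python. It is not SMURFF code and does not use the SMURFF API. It assumes a toy model in which each observed entry is a noisy dot product of a row factor and a column factor, with zero-mean Gaussian priors on the factors, and it draws the factors one component at a time with a Gibbs sampler. The function names (`gibbs_bmf`, `make_toy_data`) and the fixed hyperparameters `alpha` (noise precision) and `lam` (prior precision) are assumptions made for illustration; full Bayesian treatments typically also place priors on these hyperparameters.

```python
import math
import random


def gibbs_bmf(observed, n_rows, n_cols, rank=2, alpha=25.0, lam=1.0,
              n_samples=200, burn_in=100, seed=0):
    """Toy component-wise Gibbs sampler for Bayesian matrix factorization.

    observed: dict mapping (row, col) -> value for the known entries.
    Returns a dict mapping every (row, col) to its posterior-mean prediction.
    Hyperparameters alpha and lam are kept fixed (an illustrative assumption).
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    # Index the observed entries by row and by column.
    by_row = [[] for _ in range(n_rows)]
    by_col = [[] for _ in range(n_cols)]
    for (i, j), r in observed.items():
        by_row[i].append((j, r))
        by_col[j].append((i, r))

    def resample(own, other, entries_per_item):
        # Draw each factor component from its Gaussian conditional posterior.
        for i, entries in enumerate(entries_per_item):
            for k in range(rank):
                precision = lam
                weighted = 0.0
                for j, r in entries:
                    # Prediction with component k's contribution left out.
                    rest = sum(own[i][l] * other[j][l]
                               for l in range(rank) if l != k)
                    precision += alpha * other[j][k] ** 2
                    weighted += alpha * other[j][k] * (r - rest)
                own[i][k] = rng.gauss(weighted / precision,
                                      1.0 / math.sqrt(precision))

    totals = {(i, j): 0.0 for i in range(n_rows) for j in range(n_cols)}
    for it in range(burn_in + n_samples):
        resample(U, V, by_row)
        resample(V, U, by_col)
        if it >= burn_in:
            # Average predictions (not factors) over the retained samples.
            for (i, j) in totals:
                totals[(i, j)] += sum(U[i][k] * V[j][k] for k in range(rank))
    return {ij: t / n_samples for ij, t in totals.items()}


def make_toy_data(n_rows=8, n_cols=6, rank=2, noise=0.2, keep=0.7, seed=1):
    """Synthetic low-rank matrix with noise; only a fraction is observed."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    observed = {}
    for i in range(n_rows):
        for j in range(n_cols):
            if rng.random() < keep:
                clean = sum(a * b for a, b in zip(A[i], B[j]))
                observed[(i, j)] = clean + rng.gauss(0.0, noise)
    return observed


if __name__ == "__main__":
    data = make_toy_data()
    predictions = gibbs_bmf(data, n_rows=8, n_cols=6)
    missing = [ij for ij in predictions if ij not in data]
    print("predicted values for unobserved entries:")
    for ij in missing:
        print(ij, round(predictions[ij], 3))
```

Even in this toy form, every sweep revisits all observed entries for every factor component, and many sweeps are needed to average over the posterior. This is why the method becomes computationally demanding on large datasets.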
SMURFF, developed in the EPEEC project, is a high-performance, feature-rich framework for composing and constructing different Bayesian matrix factorization methods. The framework also allows several matrices to be combined, so that more information, and more types of information, can be incorporated into the model. SMURFF takes advantage of several EPEEC technologies. In 2019, at the EuroHPC summit, we presented a distributed implementation based on a partitioned global address space using GASPI. We showed how the HPC infrastructure and the HPC implementations were crucial to reaching insights, and how these insights helped the pharma industry in its drug discovery process. SMURFF has also been used in drug discovery for so-called virtual screens. Virtual screening (see Figure 1) is a computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme. For this use case, we successfully applied OmpSs with OpenACC-accelerated kernels, giving a 30× speedup with GPU acceleration, and we used the Parallelware tools for vectorization and multicore acceleration, achieving performance on par with code hand-written by experts.
Figure 1. Virtual screening pipeline