UPMARC Workshop on Task-Based Parallel Programming
StarPU: Exploiting heterogeneous, accelerator-based multicore clusters
Dr. Samuel Thibault, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Inria Bordeaux -- Sud-Ouest, France.
Abstract. Heterogeneous accelerator-based architectures are increasingly common in production HPC clusters. Their nodes combine multicore CPUs with accelerators such as GPUs, providing an unprecedented amount of processing power per node, and new accelerators such as the Intel MIC are being deployed. Dealing with machines whose computing power is so unevenly distributed across processing units has thus become one of the biggest challenges in HPC. To fully tap into the potential of these heterogeneous machines, pure offloading approaches, which consist of running an application on regular processors while offloading part of the code to accelerators, are not sufficient.
This talk will present the StarPU project, which aims to provide task-based applications with portable, optimized performance on clusters of heterogeneous multicore+accelerator machines. The goal is to relieve the programmer of the technical aspects of data management and task scheduling, while applying theoretical task scheduling algorithms to actual application execution to improve performance. StarPU also provides performance feedback through task profiling and trace analysis. This approach has been used successfully, for instance, to integrate within a few weeks the PLASMA (CPUs) and MAGMA (GPUs) Cholesky, QR and LU factorizations into a cluster CPU+GPU implementation whose efficiency is very close to peak performance.
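To make the contrast with pure offloading concrete, here is a minimal sketch of one classic list-scheduling heuristic: each task goes to the worker that would finish it earliest. It is not StarPU code and does not use the StarPU API. The `Worker` class, the `schedule` function, the 3 CPU + 1 GPU machine and the per-task cost estimates are all hypothetical, chosen only for illustration. Tasks are assumed independent, and data transfer time is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    kind: str                      # "cpu" or "gpu" (hypothetical worker kinds)
    busy_until: float = 0.0        # time at which this worker becomes free
    assigned: list = field(default_factory=list)


def schedule(tasks, workers):
    """Greedy earliest-finish-time placement of independent tasks.

    tasks   -- list of (task_name, {worker_kind: estimated_duration})
    workers -- list of Worker objects; their state is updated in place
    Returns the makespan (time at which the last worker finishes).
    """
    for name, cost in tasks:
        # Only workers with an implementation (a cost estimate) are candidates.
        candidates = [w for w in workers if w.kind in cost]
        # Pick the worker that would complete this task the earliest.
        best = min(candidates, key=lambda w: w.busy_until + cost[w.kind])
        best.busy_until += cost[best.kind]
        best.assigned.append(name)
    return max(w.busy_until for w in workers)


# Assumed estimates: each task takes 4 time units on a CPU core, 1 on the GPU.
tasks = [(f"task{i}", {"cpu": 4.0, "gpu": 1.0}) for i in range(12)]

# Pure offloading: every task is sent to the single GPU.
offload = schedule(tasks, [Worker("gpu0", "gpu")])

# Heterogeneous scheduling: the same tasks may also run on three CPU cores.
machine = [Worker(f"cpu{i}", "cpu") for i in range(3)] + [Worker("gpu0", "gpu")]
hybrid = schedule(tasks, machine)

print("pure offloading makespan:", offload)
print("CPU+GPU makespan:", hybrid)
for w in machine:
    print(w.name, "ran", len(w.assigned), "tasks")
```

With these assumed costs, the slower CPU cores still absorb part of the work and shorten the overall makespan. A real runtime would additionally have to account for task dependencies, data transfers between memory nodes, and measured rather than assumed execution times.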