Chunks and Tasks
The Chunks and Tasks programming model simplifies the parallelization of algorithms by using separate modules for data access and task execution. The key idea is that the programmer defines her algorithm in terms of well-defined tasks that access data via a chunk management system. This approach is particularly appealing when the problem can be solved using a dynamic hierarchic algorithm. Examples of applications where dynamic hierarchic algorithms are used include blocked sparse matrix algebra and multipole methods, both of which are important in large-scale electronic structure calculations. Another example is finite difference methods on adaptive grids.
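To make the notion of a dynamic hierarchic algorithm more concrete, the sketch below shows how blocked sparse matrix multiplication decomposes recursively over a quadtree representation. This is our own illustration in plain C++, not code from the Chunks and Tasks library: in the actual model, each recursive call would become a task and each submatrix a chunk.

```cpp
#include <array>
#include <memory>
#include <utility>

// Illustrative quadtree matrix: an internal node has four children in a
// 2x2 layout; a null child represents a zero block, which is where the
// sparsity, and hence the dynamically shaped task tree, comes from. For
// brevity, leaves hold a single value (a 1x1 block) rather than a dense
// submatrix. A and B are assumed to be partitioned conformally.
struct QuadMatrix {
    std::array<std::unique_ptr<QuadMatrix>, 4> child; // row-major: child[2*i+j]
    double value = 0.0; // used only at leaves
    bool isLeaf = false;
};

using MatPtr = std::unique_ptr<QuadMatrix>;

// A += B, reusing A's storage. In the Chunks and Tasks model this would
// be a task operating on chunks rather than on pointers.
MatPtr add(MatPtr a, MatPtr b) {
    if (!a) return b;
    if (!b) return a;
    if (a->isLeaf) { a->value += b->value; return a; }
    for (int k = 0; k < 4; ++k)
        a->child[k] = add(std::move(a->child[k]), std::move(b->child[k]));
    return a;
}

// C = A * B by recursive block multiplication. Each call would be a task
// and each submatrix a chunk, letting a runtime execute independent
// branches of the recursion in parallel.
MatPtr multiply(const QuadMatrix* A, const QuadMatrix* B) {
    if (!A || !B) return nullptr; // a factor is a zero block: no task needed
    auto C = std::make_unique<QuadMatrix>();
    if (A->isLeaf) {
        C->isLeaf = true;
        C->value = A->value * B->value;
        return C;
    }
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) // C_ij = A_i0*B_0j + A_i1*B_1j
            C->child[2*i+j] =
                add(multiply(A->child[2*i+0].get(), B->child[j].get()),
                    multiply(A->child[2*i+1].get(), B->child[2+j].get()));
    return C;
}
```

The shape of the recursion, and hence of the task tree, depends on where the zero blocks are, which is why such algorithms are called dynamic: the work to be done is not known until the computation unfolds.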
Interface
When writing a program using the Chunks and Tasks programming model, all the programmer needs to know is the Chunks and Tasks interface. This interface consists of a set of functions available to the programmer that allow for the creation of new tasks and access to data through chunk objects. This is a great advantage: once the algorithm has been implemented using the Chunks and Tasks interface, it can be executed on a multitude of different platforms without changing the source code. Different platforms utilizing different types of parallelism and storage devices can be used as long as a Chunks and Tasks library implementation exists for the platform in question. Optimization related to specific technical issues, such as the best way to store a chunk, can be left to the implementation of the Chunks and Tasks framework. Thus, once your algorithm has been formulated using the Chunks and Tasks interface, you can benefit from future optimizations and hardware developments without rewriting your program.
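As a rough illustration of the shape of such an interface, the sketch below shows a task that accesses its input and output data only through chunk identifiers handed out by a chunk service, never through raw pointers. All names here (ChunkService, registerChunk, getChunk, and so on) are hypothetical stand-ins of our own invention, not the actual Chunks and Tasks interface; see chunks-and-tasks.org for the real one.

```cpp
#include <cstdio>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A deliberately tiny, sequential stand-in for a Chunks-and-Tasks-style
// interface, just to show the programming model: tasks never touch raw
// memory directly; they receive chunk identifiers and go through the
// chunk service for all data access.
using ChunkID = int;
using ChunkData = std::vector<double>;

class ChunkService {
    std::map<ChunkID, ChunkData> store_;
    ChunkID next_ = 0;
public:
    // Register a new chunk; the caller gets back an identifier, not a pointer.
    ChunkID registerChunk(ChunkData data) {
        store_[next_] = std::move(data);
        return next_++;
    }
    // Read-only access to a chunk's data via its identifier.
    const ChunkData& getChunk(ChunkID id) const { return store_.at(id); }
};

// A task is a unit of work defined entirely in terms of chunk identifiers,
// so a runtime is free to schedule it on any node that holds, or can
// fetch, the input chunks.
ChunkID addVectorsTask(ChunkService& cs, ChunkID a, ChunkID b) {
    const ChunkData& x = cs.getChunk(a);
    const ChunkData& y = cs.getChunk(b);
    ChunkData sum(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) sum[i] = x[i] + y[i];
    return cs.registerChunk(std::move(sum)); // the output is itself a chunk
}

int main() {
    ChunkService cs;
    ChunkID a = cs.registerChunk({1.0, 2.0, 3.0});
    ChunkID b = cs.registerChunk({4.0, 5.0, 6.0});
    ChunkID c = addVectorsTask(cs, a, b);
    for (double v : cs.getChunk(c)) std::printf("%g ", v); // prints 5 7 9
    std::printf("\n");
}
```

Because the task knows nothing about where its chunks physically reside, the same task code can be run by a sequential library for debugging or by a distributed-memory library for production, which is precisely the portability advantage described above.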
Implementation
Our current parallel Chunks and Tasks library implementation is written in C++ and uses the work-stealing parallel programming paradigm to achieve good scalability with respect to the number of compute nodes. We have implemented a task scheduler and a chunk manager as two separate modules. Both modules use the Message Passing Interface (MPI), and communication conflicts are avoided by using separate MPI communicators. Chunks are stored distributed over the memory of the compute nodes, and data locality is achieved by making sure that chunks created by a task running on a particular node are stored on that same node. Thus, communication is needed only when a chunk is referenced by a task executed on another compute node. To further reduce the need for communication, each compute node uses a chunk cache that keeps copies of the most recently used chunks.
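The chunk cache mentioned above can be pictured as a least-recently-used (LRU) cache keyed on chunk identifiers, as in the sketch below. This is only an illustration of the idea under the assumption of LRU replacement; the actual implementation in the library may differ.

```cpp
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative least-recently-used (LRU) cache of chunk copies, keyed on
// chunk identifiers. A sketch of the idea described above, not the actual
// Chunks and Tasks library code.
using ChunkID = int;
using ChunkData = std::vector<double>;

class ChunkCache {
    std::size_t capacity_;
    std::list<ChunkID> order_; // front = most recently used
    std::unordered_map<ChunkID,
        std::pair<ChunkData, std::list<ChunkID>::iterator>> map_;
public:
    explicit ChunkCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached copy, or nullptr on a miss (in which
    // case the chunk would be fetched over MPI from the node that owns it).
    const ChunkData* lookup(ChunkID id) {
        auto it = map_.find(id);
        if (it == map_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second.second); // now MRU
        return &it->second.first;
    }

    // Insert a copy of a chunk fetched from another node, evicting the
    // least recently used entry if the cache is full.
    void insert(ChunkID id, ChunkData data) {
        if (map_.count(id)) return;
        if (map_.size() >= capacity_) {
            map_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(id);
        map_[id] = {std::move(data), order_.begin()};
    }
};
```

On a miss, the runtime would fetch the chunk from the owning node and insert a copy, so that repeated accesses to the same chunk from one node are served locally; because chunks are immutable once registered, such copies never need to be invalidated, only evicted.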
More info and source code
More information, as well as source code for two different Chunks and Tasks library implementations, is available at chunks-and-tasks.org.
Participants
- Emanuel Rubensson (Associate Professor, Docent)
- Elias Rudberg (Researcher)
- Anastasia Kruchinina (PhD student)
- Anton Artemov (PhD student)