This page is a copy of research/scientific_computing/former/hava (Wed, 31 Aug 2022 15:00:52)
High-performance computing applications on various architectures
Markus Nordén, Henrik Löf, Sverker Holmgren, Jarmo Rantakokko, Michael Thuné, Dan Wallin
Research
Large shared-memory computers often have a non-uniform memory architecture (NUMA), i.e., the memory is distributed among the CPUs, and the latency of a memory access depends on whether it is an access to local memory (close to the CPU performing the access) or remote memory (somewhere else in the computer). If a majority of the memory accesses of a program are to local memory, it is said to show good geographical locality.
In this project, we study performance with respect to geographical locality for typical programs in the field of computational science. We measure how strongly geographical locality influences overall performance, and study different strategies for improving it.
In the example below, Figure 1 shows a pulse that moves diagonally through the square. The computational grid is refined close to the pulse in order to improve accuracy. As the pulse moves, the refined region moves with it, which makes it necessary to repartition the responsibility for the computations between the CPUs. When doing that, geographical locality is usually lost. In Figures 2 and 3, the square is shown from above. The background color shows which CPU is responsible for the computations, the size of each circle shows the size of a block, and the white and black pie slices show the shares of local and remote memory accesses, respectively.
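The loss of locality under repartitioning can be illustrated with a minimal sketch. It is not the project's code: the one-dimensional block layout, the integer work weights, the helper names `partition` and `local_fraction`, and the assumption that data stays on the node of the CPU that first owned it (first-touch placement without migration) are all introduced here for illustration.

```python
def partition(weights, n_cpus):
    """Assign contiguous blocks to CPUs so that each CPU gets roughly equal work.

    Hypothetical helper: a block goes to the CPU whose share of the total
    work contains the block's midpoint. Weights are assumed to be integers.
    """
    total = sum(weights)
    owners, done = [], 0
    for w in weights:
        # Midpoint of this block, scaled by 2 to stay in integer arithmetic.
        owner = ((2 * done + w) * n_cpus) // (2 * total)
        owners.append(min(owner, n_cpus - 1))
        done += w
    return owners


def local_fraction(home, owners, weights):
    """Share of work-weighted accesses whose data lives on the owner's node."""
    local = sum(w for h, o, w in zip(home, owners, weights) if h == o)
    return local / sum(weights)


# Work per block: the refined region (weight 4) follows the pulse.
weights_before = [4, 4, 1, 1, 1, 1, 1, 1]
weights_after = [1, 1, 1, 1, 1, 1, 4, 4]

# Assumption: first-touch placement, so data stays on its initial owner's node.
home = partition(weights_before, n_cpus=2)
owners = partition(weights_after, n_cpus=2)

before = local_fraction(home, home, weights_before)
after = local_fraction(home, owners, weights_after)
print(before, round(after, 2))
```

In this toy setup all accesses are local before the pulse moves, and about 71% are local after the work is rebalanced, because four blocks are now computed by a CPU that does not hold their data. A real adaptive solver would see a different figure depending on the grid, the partitioner, and whether pages are migrated.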
Apart from various locality optimizations, we also study communication overheads for parallel implementations of computational methods. Recent advances in microprocessor technology have made it possible to place several processor cores on a single chip to form a tightly coupled shared-memory multiprocessor. These so-called chip multiprocessors (CMPs) can communicate through on-chip cache memories, resulting in very low latencies and bandwidths that are orders of magnitude higher than on previous single-core architectures. We study how to exploit these new features to increase the performance of some well-known parallel algorithms.
Publications
Refereed
Theses
- Multithreaded PDE Solvers on Non-Uniform Memory Architectures. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology no. 224, Acta Universitatis Upsaliensis, Uppsala, 2006. (fulltext).
- Iterative and Adaptive PDE Solvers for Shared Memory Architectures (Swedish title: Iterativa och adaptiva PDE-lösare för parallelldatorer med gemensam minnesorganisation). Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology no. 218, Acta Universitatis Upsaliensis, Uppsala, 2006. (fulltext).
- Parallelizing the Method of Conjugate Gradients for Shared Memory Architectures. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology no. 2004-005, Uppsala University, 2004. (fulltext).
- Parallel PDE Solvers on cc-NUMA Systems. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology no. 2004-002, Uppsala University, 2004. (fulltext).