Department of Information Technology

This page is a copy of research/scientific_computing/former/hava (Wed, 31 Aug 2022 15:00:52)

High-performance computing applications on various architectures

Markus Nordén, Henrik Löf, Sverker Holmgren, Jarmo Rantakokko, Michael Thuné, Dan Wallin

Research

Large shared-memory computers often have a non-uniform memory architecture (NUMA), i.e., the memory is distributed among the CPUs, and the latency of a memory access depends on whether it is an access to local memory (close to the CPU performing the access) or to remote memory (somewhere else in the computer). If a majority of the memory accesses of a program are to local memory, the program is said to show good geographical locality.

In this project, we study performance with respect to geographical locality for typical programs in the field of computational science. We measure how strongly geographical locality influences overall performance, and study different strategies for improving it.

In the example below, Figure 1 shows a pulse that moves diagonally through the square. The computational grid is refined close to the pulse in order to improve accuracy. This makes it necessary to repartition the responsibility for the computations among the CPUs, and when doing so, geographical locality is usually lost. In Figures 2 and 3, the square is shown from above. The background color shows which CPU is responsible for the computations, the size of each circle indicates the size of a block, and the white and black pie slices show the proportions of local and remote memory accesses.

[Figure 1: puls.gif]

[Figures 2 and 3: numa.gif, uma.gif]

Apart from various locality optimizations, we also study communication overheads for parallel implementations of computational methods. Recent advances in microprocessor technology have made it possible to place several processor cores on a single chip, forming a tightly coupled shared-memory multiprocessor. These so-called chip multiprocessors (CMPs) can communicate through on-chip cache memories, resulting in very low latencies and bandwidths orders of magnitude higher than on previous single-core architectures. We study how to exploit these new features to increase the performance of some well-known parallel algorithms.

Publications

Refereed

Theses

Updated  2022-08-31 15:00:52 by Victor Kuismin.