Multicore Programming Frameworks
Motivation
The shift to universal parallelism has dramatically increased the complexity of developing software that exploits modern hardware efficiently. The difficulties arise from the need to leverage parallel hardware through explicit program concurrency, but they are often most apparent in the resulting need to minimize communication by managing memory system and network usage. Unfortunately, today's programming methodologies do a very poor job of exposing both optimization costs and optimization opportunities to the programmer. The result is a severe lack of performance portability, which both increases the cost of software development and prevents existing software from leveraging emerging hardware to its fullest.
The goal of this project is to develop programming frameworks that can overcome this hurdle by leveraging the domain-specific, high-level application information available from the programmer together with a detailed knowledge of the hardware. Combining these two sources of information will let us provide performance portability and efficient development through improved programmability and optimization.
Long Term Goals
- Understand the interactions between programming models, application domains, and hardware.
- Build frameworks for providing performance portability and efficient implementation of real-world applications.
- Investigate how to leverage high-level program information for optimizing parallelization and communication.
Expected Results
- Develop a suite of benchmarks across a range of programming models to enable comparisons of ease of implementation and optimization.
- Build an efficient, performance-portable framework for solving PDEs using radial basis function approximation methods.
- Develop a task-based framework to leverage high-level program structure to optimize parallelism and data movement.
- Leverage runtime hardware performance information in a task-based programming framework to efficiently create and manage tasks.
- Demonstrate optimization of regular (data-parallel, static) and irregular (data-dependent, runtime) application parallelism.
Approach
Our approach to investigating programming frameworks is highly application-centric and aims to identify the key issues in developing and optimizing performance-portable applications.
- Identify and implement key benchmark applications.
- Compare application implementations across different programming frameworks to gain insight into programming and optimization challenges.
- Extend existing frameworks and develop new ones as needed to explore new optimizations.
- Leverage tools from the architecture group to understand and optimize performance automatically.
Results
Software
- SuperGlue. A library for data-dependency driven task parallelism.
- DuctTeip. A library for distributed data-dependency driven task parallelism.
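To illustrate what data-dependency driven task parallelism means, the following is a minimal serial sketch of dependency tracking through data versioning, the idea named in the publication titles below. It is not the SuperGlue or DuctTeip API: the names `Handle`, `Task`, `Runtime`, `submit` and `run` are hypothetical, and the "scheduler" deliberately picks the most recently submitted ready task to show that declared data accesses, not submission order, determine when a task may run.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Handle:
    """Shared data tracked by version counters (hypothetical name)."""
    name: str
    version: int = 0        # number of completed accesses
    scheduled: int = 0      # number of accesses submitted so far
    read_required: int = 0  # version a reader must wait for (set by last write)


@dataclass
class Task:
    name: str
    fn: Callable[[], None]
    requirements: List[Tuple[Handle, int]] = field(default_factory=list)

    def ready(self) -> bool:
        # A task may run once every handle has reached the required version.
        return all(h.version >= v for h, v in self.requirements)


class Runtime:
    """Serial stand-in for a task runtime (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.pending: List[Task] = []
        self.log: List[str] = []

    def submit(self, name, fn, reads=(), writes=()):
        # List each handle in either reads or writes, not both;
        # a write access here also covers read-modify-write.
        task = Task(name, fn)
        for h in reads:
            # Readers wait only for the latest preceding write.
            task.requirements.append((h, h.read_required))
            h.scheduled += 1
        for h in writes:
            # Writers wait for all preceding accesses, reads included.
            task.requirements.append((h, h.scheduled))
            h.scheduled += 1
            h.read_required = h.scheduled
        self.pending.append(task)

    def run(self) -> None:
        while self.pending:
            # Pick the most recently submitted task that is ready.
            task = next((t for t in reversed(self.pending) if t.ready()), None)
            if task is None:
                raise RuntimeError("no ready task: inconsistent access declarations")
            self.pending.remove(task)
            task.fn()
            self.log.append(task.name)
            for h, _ in task.requirements:
                h.version += 1  # completing an access bumps the version


values = {"a": 0, "b": 0, "c": 0}
a, b, c = Handle("a"), Handle("b"), Handle("c")

rt = Runtime()
rt.submit("init_a", lambda: values.update(a=1), writes=[a])
rt.submit("init_b", lambda: values.update(b=2), writes=[b])
rt.submit("sum", lambda: values.update(c=values["a"] + values["b"]),
          reads=[a, b], writes=[c])
rt.submit("double_a", lambda: values.update(a=values["a"] * 2), writes=[a])
rt.run()

print(rt.log)
print(values)
```

In this sketch the two initialization tasks are independent and may run in either order, `sum` waits for both of them, and `double_a` waits until `sum` has finished reading `a`. A real runtime would execute ready tasks concurrently on multiple threads; the version bookkeeping is the part the sketch is meant to show.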
Refereed publications
- DuctTeip: An efficient programming model for distributed task-based parallel computing. In Parallel Computing, volume 90, 2019. (DOI, fulltext:postprint).
- SuperGlue: A shared memory framework using data versioning for dependency-aware task-based parallelization. In SIAM Journal on Scientific Computing, volume 37, pp C617-C642, 2015. (DOI, fulltext:print).
- A scalable RBF–FD method for atmospheric flow. In Journal of Computational Physics, volume 298, pp 406-422, 2015. (DOI, fulltext:postprint).
- Resource-aware task scheduling. In ACM Transactions on Embedded Computing Systems, volume 14, number 1, pp 5:1-25, 2015. (DOI, fulltext).
- Programming models based on data versioning for dependency-aware task-based parallelisation. In Proc. 15th International Conference on Computational Science and Engineering, pp 275-280, IEEE Computer Society, Los Alamitos, CA, 2012. (DOI).
- Using hardware transactional memory for high-performance computing. In Proc. 25th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, pp 1660-1667, IEEE, Piscataway, NJ, 2011. (DOI).
- Information quality testing. In Perspectives in Business Informatics Research, volume 64 of Lecture Notes in Business Information Processing, pp 14-26, Springer-Verlag, Berlin, 2010. (DOI).
- Analysis and visualization of information quality of technical documentation. In Proc. 4th European Conference on Information Management and Evaluation, pp 388-396, Academic Conferences, Reading, UK, 2010.
- An efficient task-based approach for solving the n-body problem on multicore architectures. PARA 2010: State of the Art in Scientific and Parallel Computing, University of Iceland, Reykjavík, 2010. (fulltext:postprint).
- Current practice in mobile learning: A survey of research method and purpose. In Proc. 8th World Conference on Mobile and Contextual Learning, pp 103-111, University of Central Florida, Orlando, FL, 2009.
- Sharing experience from three initiatives in mobile learning: Lessons learned. In Proc. 17th International Conference on Computers in Education, pp 613-617, Asia-Pacific Society for Computers in Education, Jhongli City, Taiwan, 2009.
- Thinking ahead in mobile learning projects: A survey on risk assessment. In Proc. 8th International Conference on Perspectives in Business Informatics Research, pp 57-66, Kristianstad Academic Press, Sweden, 2009.
- A meta-model describing the development process of mobile learning. In Advances in Web Based Learning – ICWL 2009, volume 5686 of Lecture Notes in Computer Science, pp 454-463, Springer-Verlag, Berlin, 2009. (DOI).
Theses
- Advances in Task-Based Parallel Programming for Distributed Memory Architectures. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology nr 1621, Acta Universitatis Upsaliensis, Uppsala, 2018. (fulltext, preview image).
- Scientific Computing on Multicore Architectures. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology nr 1139, Acta Universitatis Upsaliensis, Uppsala, 2014. (fulltext, preview image).
- Leveraging multicore processors for scientific computing. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2012-006, Uppsala University, 2012. (fulltext).
Other publications
- Distributed dynamic load balancing for task parallel programming. 2018. (arXiv:1801.04582).
- DuctTeip: A task-based parallel programming framework for distributed memory architectures. Technical report / Department of Information Technology, Uppsala University nr 2016-010, 2016. (fulltext).
- A task parallel implementation of an RBF-generated finite difference method for the shallow water equations on the sphere. Technical report / Department of Information Technology, Uppsala University nr 2014-011, 2014. (fulltext).
- SuperGlue: A shared memory framework using data versioning for dependency-aware task-based parallelization. Technical report / Department of Information Technology, Uppsala University nr 2014-010, 2014. (fulltext).
- A task parallel implementation of a scattered node stencil-based solver for the shallow water equations. In Proc. 6th Swedish Workshop on Multi-Core Computing, pp 33-36, Halmstad University, Halmstad, Sweden, 2013.
- Resource-aware task scheduling. In 4th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures (PARMA), p 6, Tech. Univ. Berlin, Germany, 2013. (fulltext:postprint).
- A simple model for tuning tasks. In Proc. 4th Swedish Workshop on Multi-Core Computing, pp 45-49, Linköping University, Linköping, Sweden, 2011.
- Early results using hardware transactional memory for high-performance computing applications. In Proc. 3rd Swedish Workshop on Multi-Core Computing, pp 93-97, Chalmers University of Technology, Göteborg, Sweden, 2010. (fulltext:postprint).
- Dealing with stakeholders in mobile learning: A study of three initiatives. In Proc. 32nd Information Systems Research Seminar in Scandinavia, pp A72:1-14, Molde University College, Norway, 2009.
Presentations
- Scientific Computing in Sweden, Toward A Unified Task-based Parallel Programming Interface, October 19-20, 2016.
- PP 2014, A Parallel Scattered Node Finite Difference Scheme for the Shallow Water Equations on a Sphere, February 18, 2014.
- PP 2014, A Task-Based Parallel Programming Framework with Modularity, Scalability and Adaptability Features, February 21, 2014.
- MCC 2013, A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, November 25, 2013.
- PARMA 2013, Resource-aware task scheduling, January 23, 2013.
- IEEE International Conference on Computational Science and Engineering, Programming models based on data versioning for dependency-aware task-based parallelisation, December 5-7, 2012.
- Algorithmy 2012, A parallel implementation of a radial basis functions method using data dependent tasks, September 9-14, 2012.
- SIAM Conference on Parallel Processing for Scientific Computing 2012, Managing dependencies in a task parallel framework, February 15-17, 2012.
- MTAAP 2011, Using hardware transactional memory for high-performance computing, May 20, 2011.
- PARA 2010, An efficient task-based approach for solving the n-body problem on multicore architectures, June 6-9, 2010.
Current staff
Senior: Sverker Holmgren, Elisabeth Larsson (Contact)
Ph.D.: Martin Tillenius, Afshin Zafari
Former staff
Morgan Ericsson, Linnaeus University, Växjö, Sweden.
Marcus Holm, UPPMAX, Uppsala University.