Distributed Computing Applications (DCA) Research Group
Modern computational experiments often require large computing and storage facilities, efficient data analysis of very large datasets and sophisticated software capable of leveraging distributed and heterogeneous infrastructures such as (hybrid) clouds.
The DCA research group is an interdisciplinary arena for researchers interested in large-scale distributed and data-intensive computing, data science and computational science and engineering software. A current focus area is the development of methods and software that efficiently, flexibly and reliably leverage cloud computing environments to solve science and engineering problems both in academia and industry.
The DCA group participates in the eSSENCE strategic collaboration on eScience. We take an active role in developing cloud computing infrastructure for scientists in the Nordics, with leading roles in the SNIC Science Cloud project and participation in the NeIC Glenna project.
DCA is also part of the HASTE project, Hierarchical Analysis of Temporal and Spatial Image Data.
Participants
- Andreas Hellander (Associate Professor in Scientific Computing)
- Salman Toor (Assistant Professor in Scientific Computing, group coordinator)
- Carl Nettelblad (Associate Professor in Scientific Computing)
- Sverker Holmgren (Professor in Scientific Computing)
- Ola Spjuth (Associate Professor in Pharmaceutical Biosciences)
- Prashant Singh (Postdoc Researcher)
- Ben Blamey (Postdoc Researcher)
- Marco Capuccini (PhD student)
- Kristiina Ausmees (PhD student)
- Fredrik Wrede (PhD student)
- Mona Babikir (MSc Student)
- Albin Stjerna (MSc Student)
- Andy Ishak (MSc Student)
- Tony Wang (MSc Student)
- Oliver Stein (MSc Student)
- Aleksander Okonski (MSc Student)
- Virakraingsei Hai Socheat (MSc Student)
- Preechakorn Torruangwatthana (MSc Student)
- Alieu Jallow (MSc Student)
Industry Outreach and Commercialization
Alongside its research, the DCA group actively works to turn research results into viable industrial solutions. Scaleout Systems is a spin-off company with strong research support from the DCA group that offers production-grade solutions. These solutions are based on cloud-native AI, in line with the Cloud 3.0 vision.
Current Projects
- Smart and automatic systems for computationally-driven scientific discovery
- Distributed Machine Learning with confidence
- Large-scale computational experiments in Systems Biology
- HarmonicIO: A Stream-based Solution for Scalable Data Analysis
- GROOT: Implementing Security Rules, Safeguards, and IDS tools for Private Cloud Infrastructures
Completed projects
- Scientific applications on GRID systems
- Compute- and data-intensive applications in Bioinformatics
- Cost-aware Application Development and Management using CLOUD-METRIC
Education
We offer MSc courses in applied cloud computing and data-intensive computing ("Big Data"):
- Applied Cloud Computing (Period 1)
- Large Datasets for Scientific Applications (Period 3)
PhD Courses
We will offer PhD courses again in late 2019 or early 2020.
Past course offerings:
Available Projects
We are always looking for motivated students to join the group. Please do not hesitate to contact us and discuss possibilities. Currently, we have the following MSc. thesis projects, but projects can be formulated based on your interests as well.
- Artificial Intelligence for Distributed Infrastructures
- Intelligent Resource Management for Processing Large Data Streams
- Privacy-preserved Distributed Analysis for Sensitive Datasets
Open Position(s)
Publications
- B. Blamey, A. Hellander, S. Toor. Apache Spark Streaming, Kafka and HarmonicIO: A Performance and Architecture Comparison for Enterprise and Scientific Computing. Submitted to the IEEE Cloud 2019 http://conferences.computer.org/cloud/2019/
- B. Blamey, F. Wrede, J. Karlsson, A. Hellander, S. Toor. Adapting The Secretary Hiring Problem for Optimal Hot-Cold Tier Placement under Top-K Workloads. Submitted to the 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGrid 2019) https://www.ccgrid2019.org/pages/submission.html
- P. Torruangwatthana, H. Wieslander, B. Blamey, A. Hellander, S. Toor. HarmonicIO: Scalable Data Stream Processing for Scientific Datasets. ISSN: 2159-6190. ISBN: 978-1-5386-7235-8. http://doi.ieeecomputersociety.org/10.1109/CLOUD.2018.00126
- K. Ausmees, A. John, S. Toor, A. Hellander, C. Nettelblad. BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data. BMC Bioinformatics 2018, 19:240, https://doi.org/10.1186/s12859-018-2241-z
- L. Ahmed, V. Georgiev, M. Capuccini, S. Toor, W. Schaal, E. Laure and O. Spjuth. Efficient iterative virtual screening with Apache Spark and conformal prediction. Journal of Cheminformatics, doi: 10.1186/s13321-018-0265-z. Publication date: 2018/12.
- T. Bell, E. Fernandez, K. Happonen, D. Still, S. Toor, K. Yazdi. Book: The Crossroads of Cloud and HPC, Chapter: OpenStack and Research Cloud Federation. ISBN-10: 1978244703, ISBN-13: 978-1978244702 Paperback: 88 pages, Publisher: CreateSpace Independent Publishing Platform; 1 edition (October 19, 2017), Language: English.
- S. Toor, M. Lindberg, I. Fallman, A. Vallin, O. Mohill, P. Freyhult, L. Nilsson, M. Agback, L. Viklund, H. Zazzi, O. Spjuth, M. Capuccini, J. Moller, D. Murtagh and A. Hellander. SNIC Science Cloud (SSC): A National-scale Cloud Infrastructure for Swedish Academia. Accepted in 13th IEEE International Conference on eScience, doi: 10.1109/eScience.2017.35, ISBN: 978-1-5386-2686-3.
- A. Jallow, A. Hellander and S. Toor. Cost-aware Application Development and Management using CLOUD-METRIC. Proceedings of the 7th International Conference on Cloud Computing and Services Science - Volume 1: CLOSER, doi: 10.5220/0006307505150522, isbn: 978-989-758-243-1.
- L. Ahmed, V. Georgiev, M. Capuccini, S. Toor, W. Schaal, E. Laure and O. Spjuth. Efficient Iterative Virtual Screening with Apache Spark and Conformal Prediction. Submitted to Journal of Cheminformatics, special collection: Novel applications of machine learning in cheminformatics.
- B. Drawert, M. Trogdon, S. Toor, L. Petzold, A. Hellander. MOLNs: A cloud appliance for interactive, reproducible and scalable spatial stochastic computational experiments, SIAM J. Sci. Comput., 38(3), C179–C202. (24 pages). DOI:10.1137/15M1014784.
- L. Osmani, S. Toor, M. Komu, M. J. Kortelainen, T. Lindén, J. White, R. Khan, P. Eerola, S. Tarkoma. Secure Cloud Connectivity for Scientific Applications. Accepted in IEEE/ACM Transactions on Services Computing, SI - Cloud Services Meet Big Data, Volume: PP, Issue: 99, doi: 10.1109/TSC.2015.2469292
- B. Mahjani, S. Toor, C. Nettelblad, S. Holmgren. A flexible computational framework using R and Map-Reduce for permutation tests of massive genetic analysis of complex traits. Accepted in IEEE Transactions on Services Computing, SI - Cloud Services Meet Big Data, Volume: PP, Issue: 99, doi: 10.1109/TSC.2015.2469292.
- J. White, S. Toor, P. Eerola, T. Lindén, L. Osmani and S. Tarkoma. Dynamic Provisioning of Resources in a Hybrid Infrastructure. Accepted in International Symposium on Grids and Clouds 2014, 23-28 March 2014.
- S. Toor, L. Osmani, P. Eerola, O. Kraemer, T. Lindén, S. Tarkoma, J. White. A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system. Journal of Physics: Conference Series 513 062047 doi:10.1088/1742-6596/513/6/062047.
- A. Andrejev, S. Toor, A. Hellander, S. Holmgren, T. Risch, Scientific analysis by queries in extended SPARQL over a scalable e-Science data store, Proc. 9th IEEE International Conference on e-Science, pp. 98-106, 2013. doi: 10.1109/eScience.2013.19
- S. Toor, R. Toebbicke, M. Zotes Resines, S. Holmgren, Investigating an Open Source Cloud Storage Infrastructure for CERN-specific Data Analysis, Proc. 7th International Conference on Networking, Architecture, and Storage (NAS), pp. 84-88, 2012. doi: 10.1109/NAS.2012.14
- P.-O. Östberg, A. Hellander, B. Drawert, E. Elmroth, S. Holmgren and L. Petzold, Abstractions for Scaling eScience Applications to Distributed Computing Environments; A StratUm Integration Case Study in Molecular Systems Biology, Proceedings of BIOINFORMATICS 2012, International Conference on Bioinformatics Models, Methods, and Algorithms, pp. 290-294, 2012.
- P-O Östberg, A. Hellander, B. Drawert, E. Elmroth, S. Holmgren, L. Petzold, Reducing Complexity in Management of eScience Computations, Proceedings of CCGrid 2012 - The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 845-852, 2012
- J. K. Nilsen, S. Toor, Z. Nagy, B. Mohn, A. Read. Performance and stability of the Chelonia storage system. In: Proc. International Symposium on Grids and Clouds 2012. ISGC 2012. Trieste, Italy: SISSA; 2012. p. 009:1-14. Proceedings of Science, 153.
- J. K. Nilsen, S. Toor, Z. Nagy, and A. Read. Chelonia: A self-healing, replicated storage system. Journal of Physics: Conference Series, 331(6):062019, 2011.
- S. Toor, M. Sabesan, S. Holmgren, T. Risch, A Scalable Architecture for e-Science Data Management, 2011 IEEE 7th International Conference on E-Science, pp. 210-217, 5-8 Dec. 2011. doi: 10.1109/eScience.2011.37
- M. Jayawardena, C. Nettelblad, S. Toor, P-O. Östberg, E. Elmroth and S. Holmgren. A Grid-Enabled Problem Solving Environment for QTL Analysis in R. Accepted for publication in Proc. 2nd International Conference on Bioinformatics and Computational Biology (BICoB 2010), 2010.
- J. K. Nilsen, S. Toor, Zs. Nagy and B. Mohn. Chelonia - A Self-healing Storage Cloud. In M. Bubak, M. Turala, and K. Wiatr, editors, CGW´09 Proceedings, Krakow, 2 2010. ACC CYFRONET AGH. ISBN 978-83-61433-01-9.
- S. Toor, B. Mohn, S. Holmgren. Case-Study for Different Models of Resource Brokering in Grid Systems. Technical Report no. 2010-009, Department of Information Technology, Uppsala University.
- J. K. Nilsen, S. Toor, Z. Nagy and A. Read. Chelonia: A self-healing, replicated storage system. Submitted to the International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2010.
- E. Elmroth, S. Holmgren, J. Lindemann, S. Toor, and P-O. Östberg. Empowering a Flexible Application Portal with a SOA-based Grid Job Management Framework. Accepted for publication in Proc. 9th Workshop on State-of-the-art in Scientific and Parallel Computing (PARA 2008), Lecture Notes in Computer Science, Springer-Verlag.
- Grid-enabling an efficient algorithm for demanding global optimization problems in genetic analysis. Mahen Jayawardena and Sverker Holmgren. In Proc. 3rd International Conference on e-Science and Grid Computing, pp 205-212, IEEE, Piscataway, NJ, 2007.
Technical Reports and Theses
- S. Toor, B. Mohn, D. Cameron, S. Holmgren. Case-Study for Different Models of Resource Brokering in Grid Systems. Technical Report no. 2010-009, Department of Information Technology, Uppsala University.
- Parallel algorithms and implementations for genetic analysis of quantitative traits. Mahen Jayawardena. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2007-005, 2007. (Supervised by Prof. Sverker Holmgren)
- An e-Science Approach to Genetic Analysis of Quantitative Traits. Mahen Jayawardena. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology nr 708, Acta Universitatis Upsaliensis, Uppsala, 2010. (Supervised by Prof. Sverker Holmgren)
- Managing Applications and Data in Distributed Computing Infrastructures. Salman Toor. Licentiate thesis, IT licentiate theses / Uppsala University, Department of Information Technology nr 2010-003, 2010. (Supervised by Prof. Sverker Holmgren)
- Managing Applications and Data in Distributed Computing Infrastructures. Salman Toor. Ph.D. thesis, Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214; 940. (Supervised by Prof. Sverker Holmgren)
Supervised M.Sc. Theses and Project Reports
- S. Toor, A Grid Portal Implementation for Genetic Mapping of Multiple QTL. MSc Thesis in Scientific Computing, Uppsala University, 2007. (Supervised by Prof. Sverker Holmgren)
- A. Jallow, CLOUD-METRIC: A Cost Effective Application Development Framework for Cloud Infrastructures. MSc Thesis in Computer Science, Uppsala University, 2016. (Supervised by Prof. Salman Toor)
- P. Torruangwatthana, S3DA: A Stream-based Solution for Scalable Data Analysis. MSc Thesis in Computer Science, Uppsala University, 2016. (Supervised by Prof. Salman Toor)
- F. Wrede, An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations. MSc Thesis in Bioinformatics, Uppsala University, 2016. (Supervised by Prof. Andreas Hellander)
- V. H. Socheat, Automatic and scalable cloud framework for parametric studies using scientific applications. MSc Thesis in Computer Science, Uppsala University, 2016. (Supervised by Prof. Andreas Hellander)
- A. Okonski, GROOT: Infrastructure Security as a Service (ISaaS). MSc Thesis in Computer Science, Uppsala University, 2017. (Supervised by Prof. Salman Toor)
Software
MOLNs: A Virtual Experimentation Platform for Systems Biology (and more)
StochSS: Stochastic Simulation Service, developed in collaboration with the Petzold and Krintz labs at UCSB.
Chelonia Storage, developed in collaboration with NorduGrid.
CLOUD-METRIC
GROOT
Contact Information
Collaborators
Cloud and Grid Computing Group at Umeå University
Örjan Carlborg's group at the Swedish University of Agricultural Sciences (SLU)
Computational System Biology group at the Scientific Computing Division of Uppsala University
NorduGrid Collaboration
Uppsala DataBase Laboratory
Nordic e-Infrastructure Collaboration (NeIC)
Glenna Project