This page is a copy of research/scientific_computing/project/genomics/QTLaaS (Wed, 31 Aug 2022 15:01:11)
Software-as-a-Service for Analysis of Quantitative Trait Loci
Sub-project of: Computational Genomics
Participants
- Sverker Holmgren
- Behrang Mahjani (PhD student. Principal advisor: S. Holmgren.)
- Salman Toor (PhD in Scientific Computing)
- Carl Nettelblad (Assistant Professor in Scientific Computing)
Summary
We have developed QTL as a Service (QTLaaS), built on the PruneDIRECT algorithm. QTLaaS automatically deploys an R cluster for running PruneDIRECT, or any other statistical analysis in R, on the infrastructure of your choice.
First, the user installs Ansible. The Ansible node then orchestrates the computational environment using our code. The code for automatically deploying our architecture on any infrastructure is available via:
https://github.com/QTLaaS/QTLaaS
Three files are required for this method: ansible_install.sh, setup_var.yml, and spark_deployment.yml.
- Install Ansible using the bash script in the file ansible_install.sh. Configure the Ansible hosts file.
- Modify the environment variables available in the file: setup_var.yml, if needed.
- To deploy, run spark_deployment.yml as root; this playbook contains the installation instructions for all the components of our architecture. For example: ansible-playbook -s spark_deployment.yml, where -s is the sudo flag.
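Taken together, the three steps might look like the following sketch. The deploy_qtlaas wrapper is hypothetical (not part of the repository) and assumes the QTLaaS repository has been cloned into the current directory:

```shell
# Hypothetical wrapper around the three deployment steps; assumes the
# QTLaaS repository has been cloned and we are inside its directory.
deploy_qtlaas() {
    bash ansible_install.sh                    # 1. install Ansible
    # 2. review setup_var.yml here and adjust variables if needed
    ansible-playbook -s spark_deployment.yml   # 3. deploy (-s = sudo)
}
```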
We will soon provide a demo through this web page using SNIC cloud resources, so that any user can try QTLaaS on a few nodes in our cloud setting. For larger computations, one can download QTLaaS from the GitHub repository; it automatically deploys the desired number of nodes on an infrastructure.
Setup details
- Set up at least three nodes: one for the Ansible Master, one for the Spark Master, and at least one for a Spark Worker.
- Install Ansible using the bash script in the file: ansible_install.sh.
- Add the IP addresses/hostnames of the Spark Master and Spark Workers to /etc/hosts on the Ansible Master node.
- Generate an SSH key pair and copy its public part to ~/.ssh/authorized_keys on all the Spark nodes.
- Edit /etc/ansible/hosts using the example-hosts-file available in the repository. (Add [sparkmaster] followed by the name of the Spark Master node on the next line; add [sparkworker] followed by the names of the Spark Workers on the following lines, one per line.)
- Modify the environment variables available in the file: setup_var.yml, if needed.
- Run ansible-playbook -s spark_deployment.yml, where -s is the sudo flag.
- Make sure the following ports are open on the Spark Master node: 60060 (Jupyter Hub), 7077 (Spark context), and 8080 (Spark web UI).
- Now you can access the following services: http://sparkmaster:60060 and http://sparkmaster:8080
- Use the example-sparkR file to verify that your setup is working.
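For reference, the name-resolution and inventory edits in the steps above might look like this (addresses and hostnames are placeholders, modelled on the example-hosts-file in the repository):

```
# /etc/hosts on the Ansible Master (placeholder addresses)
192.168.1.10  sparkmaster
192.168.1.11  worker1
192.168.1.12  worker2

# /etc/ansible/hosts
[sparkmaster]
sparkmaster

[sparkworker]
worker1
worker2
```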
After all the steps above, Jupyter, Spark Master, and R will be installed on the Spark Master node, and Spark Worker and R will be installed on all Spark Worker nodes.
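The key-distribution step above can also be scripted. The push_key helper below is a hypothetical sketch; ssh-copy-id appends the given public key to ~/.ssh/authorized_keys on each host:

```shell
# Hypothetical helper: copy one public key to every Spark node so the
# Ansible Master can log in without a password. Hostnames are
# placeholders taken from the inventory.
push_key() {
    local pubkey="$1"; shift
    for host in "$@"; do
        ssh-copy-id -i "$pubkey" "$host"
    done
}
# usage: push_key ~/.ssh/id_rsa.pub sparkmaster worker1 worker2
```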
How to add nodes
To add new nodes to an already configured cluster, simply add the new hosts to the Ansible hosts file under the [sparkworker] tag.
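A minimal sketch of that edit, where add_worker is a hypothetical helper (GNU sed is assumed for the in-place -i flag):

```shell
# Hypothetical helper: insert a new worker hostname directly under the
# [sparkworker] header of an Ansible inventory file (GNU sed assumed).
add_worker() {
    local inventory="$1" host="$2"
    sed -i "/^\[sparkworker\]/a ${host}" "$inventory"
}
# usage: add_worker /etc/ansible/hosts worker3
```

After updating the inventory, presumably the playbook is re-run (ansible-playbook -s spark_deployment.yml) to provision the new workers; consult the repository for the exact procedure.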
Demo Service
You can access a demo service of QTLaaS at the following URL:
http://130.238.28.241:60060/
This demo is intended for testing purposes only and runs with a limited amount of resources.