SoS as a script organizer and executor
Script of Scripts (SoS) is both an interactive notebook and a bioinformatics workflow system that we use in the lab for our daily computational research.
Please follow this document to install SoS with JupyterLab and Docker.
How to use this tutorial
This tutorial demonstrates how SoS can be used to bring together many otherwise scattered scripts and provide a unified command interface for running them. The document should be self-explanatory. The notebook source, sos_meta_script.ipynb, can be found here (accessible only to lab members), in case you want to run the code in addition to reviewing it below.
Parameter setting
[global]
# a scalar parameter; override on the command line with --n
parameter: n = 1.0
# a list parameter; override on the command line with --beta
parameter: beta = [1.0,2.0,3.0]
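To make the role of the two `parameter:` lines concrete, here is a rough Python analogy using `argparse`. It is only a sketch of the idea (a default value that a command-line option can override), not how SoS itself is implemented. The helper name `parse_parameters` is hypothetical, and the sketch assumes that the option type follows the type of the default value (float for `n`, list of floats for `beta`).

```python
import argparse


def parse_parameters(argv):
    """Hypothetical stand-in for the [global] parameter definitions."""
    parser = argparse.ArgumentParser()
    # Analogous to `parameter: n = 1.0` (assumed: type taken from the default)
    parser.add_argument("--n", type=float, default=1.0)
    # Analogous to `parameter: beta = [1.0,2.0,3.0]`; accepts one or more values
    parser.add_argument("--beta", type=float, nargs="+", default=[1.0, 2.0, 3.0])
    return parser.parse_args(argv)


# No options given: the defaults are used
defaults = parse_parameters([])
print(defaults.n, defaults.beta)

# Overriding beta with a single value still yields a list
overridden = parse_parameters(["--beta", "666"])
print(overridden.n, overridden.beta)
```

Under these assumptions, a single value passed to `--beta` stays a one-element list, which is why steps that expect a list keep working when only one value is supplied.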
Some Bash code
# Print the value of n with bash
[print_n]
bash: expand = '${ }'
echo ${n}
Some other Bash code
# Print the value of beta with bash
[print_beta]
bash: expand = '${ }'
echo ${beta}
Some Python code
# Print log(beta) with Python
[log_beta]
python: expand = '${ }'
import numpy as np
print(np.log(${beta}))
Some R code
# Print exp(n) with R
[exp_n]
R: expand = '${ }'
print(exp(${n}))
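Every step above uses `expand = '${ }'`, which tells SoS to substitute the value of an expression wherever `${...}` appears in the script before the script is handed to Bash, Python, or R. The following Python snippet is a simplified emulation of that substitution, meant only to show what the script text looks like after expansion. It is not SoS's actual implementation, and `expand_script` is a hypothetical helper name.

```python
import re


def expand_script(template, params):
    """Replace each ${expr} in template with the text of the evaluated expr.

    Simplified emulation for illustration; uses eval, so only feed it
    trusted expressions.
    """
    def substitute(match):
        # Evaluate the expression using the workflow parameters as variables
        value = eval(match.group(1), {}, dict(params))
        return str(value)

    return re.sub(r"\$\{(.*?)\}", substitute, template)


params = {"n": 1.0, "beta": [1.0, 2.0, 3.0]}

# Body of the print_n step after expansion
print(expand_script("echo ${n}", params))
# Body of the log_beta step after expansion
print(expand_script("print(np.log(${beta}))", params))
```

The second line printed is `print(np.log([1.0, 2.0, 3.0]))`: the list is written out as a Python list literal, which is why the `log_beta` step can pass `${beta}` directly to `np.log`.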
The SoS meta-script command interface
sos run sos_meta_script.ipynb -h
Run the script
sos run sos_meta_script.ipynb print_n
sos run sos_meta_script.ipynb print_n --n 666
sos run sos_meta_script.ipynb print_beta
sos run sos_meta_script.ipynb print_beta --beta 666
sos run sos_meta_script.ipynb log_beta
sos run sos_meta_script.ipynb log_beta --beta 2.7183
sos run sos_meta_script.ipynb exp_n
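As a sanity check on the commands above, the numbers computed by the `log_beta` and `exp_n` steps can be reproduced in plain Python. This sketch uses the standard `math` module instead of numpy or R, and the function names simply mirror the step names; it assumes that `--beta 2.7183` reaches the step as the one-element list `[2.7183]`.

```python
import math


def log_beta(beta):
    """Natural log of each element, as in the log_beta step."""
    return [math.log(b) for b in beta]


def exp_n(n):
    """Exponential of n, as in the exp_n step."""
    return math.exp(n)


# Default beta: values near 0, 0.693 and 1.099
print(log_beta([1.0, 2.0, 3.0]))
# beta overridden with 2.7183 (roughly e): value close to 1
print(log_beta([2.7183]))
# Default n = 1.0: value close to 2.718
print(exp_n(1.0))
```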
Use SoS on a High Performance Computing (HPC) cluster
Please check out this notebook for an example of using SoS to submit jobs on our HPC cluster.
Additional SoS workflow and notebook examples
- Learn the basic usage of SoS Workflow from these examples (you can find and run the first two at http://sosworkflows.com):
- You can try to reproduce this example on your computer (source code here). In particular, note how multiple samples are processed in parallel (group_by in SoS) and how intermediate results can be visualized within the workflow notebook. Also note how Docker containers are used to execute the workflow, which avoids installing all software dependencies and helps ensure reproducible results.