SoS as a script organizer and executor

Script of Scripts (SoS) is both an interactive notebook environment and a bioinformatics workflow system; we use it in the lab for our daily computational research.

Please follow this document to install SoS with JupyterLab and Docker.

How to use this tutorial

This tutorial demonstrates how SoS can be used to bring together many otherwise scattered scripts and provide a unified command interface for running them. This document should be self-explanatory. Its source, sos_meta_script.ipynb, can be found here (accessible only to lab members) if you want to run the code in addition to reading it below.

Parameter setting

[global]
# parameter 1
parameter: n = 1.0
# parameter 2
parameter: beta = [1.0,2.0,3.0]
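The help output later in this document lists these options as `--n 1.0 (as float)` and `--beta 1.0 2.0 3.0 (as list)`, and the example runs show `--n 666` arriving as `666.0` and `--beta 666` as `[666.0]`. In other words, the type of each default value decides how command-line strings are converted. The plain-Python `argparse` sketch below imitates that behavior for illustration only; it is not how SoS implements `parameter:`, and `build_parser` is a hypothetical helper.

```python
import argparse

# Defaults copied from the [global] section above
DEFAULTS = {"n": 1.0, "beta": [1.0, 2.0, 3.0]}

def build_parser(defaults: dict) -> argparse.ArgumentParser:
    """Hypothetical helper: derive each option's type from its default value."""
    parser = argparse.ArgumentParser()
    for name, default in defaults.items():
        if isinstance(default, list):
            # List default: accept one or more values, each converted to the element type
            elem_type = type(default[0]) if default else str
            parser.add_argument(f"--{name}", nargs="+", type=elem_type, default=default)
        else:
            # Scalar default: the option takes a single value of the same type
            parser.add_argument(f"--{name}", type=type(default), default=default)
    return parser

parser = build_parser(DEFAULTS)
args_default = parser.parse_args([])
print(args_default.n, args_default.beta)  # 1.0 [1.0, 2.0, 3.0]
args_override = parser.parse_args(["--n", "666", "--beta", "666"])
print(args_override.n, args_override.beta)  # 666.0 [666.0]
```

Keying the conversion on the default's type is what lets a single `666` on the command line become a one-element list of floats.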

Some Bash code

# Print the value of n with bash
[print_n]
bash: expand = '${ }'
    echo ${n}

Some other Bash code

# Print the value of beta with bash
[print_beta]
bash: expand = '${ }'
    echo ${beta}

Some Python code

# Print log(beta) with Python
[log_beta]
python: expand = '${ }'
    import numpy as np
    print(np.log(${beta}))

Some R code

# Print exp(n) with R
[exp_n]
R: expand = '${ }'
    print(exp(${n}))
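All four steps use `expand = '${ }'`, so SoS replaces each `${ }` expression with the text of its value before the script reaches bash, Python or R. Bash therefore receives `echo 1.0` rather than a reference to a shell variable `n`, and the list `beta` is written out as `[1.0, 2.0, 3.0]`, which bash echoes verbatim and Python reads back as a list literal for `np.log`. The sketch below reproduces only that substitution step with a regular expression; it handles bare variable names only, whereas SoS's own interpolation is more general (it also accepts Python expressions), and `expand` here is a hypothetical helper, not the SoS API.

```python
import re

# Values of the [global] parameters defined above
PARAMS = {"n": 1.0, "beta": [1.0, 2.0, 3.0]}

def expand(script: str, namespace: dict) -> str:
    """Replace each ${name} with str(namespace[name]).

    Simplified sketch: only bare variable names are recognized.
    """
    return re.sub(
        r"\$\{\s*([A-Za-z_]\w*)\s*\}",
        lambda m: str(namespace[m.group(1)]),
        script,
    )

# Script bodies of the four steps, before expansion
steps = {
    "print_n": "echo ${n}",
    "print_beta": "echo ${beta}",
    "log_beta": "import numpy as np\nprint(np.log(${beta}))",
    "exp_n": "print(exp(${n}))",
}
for name, body in steps.items():
    # Show the text each interpreter would actually receive
    print(f"--- {name}\n{expand(body, PARAMS)}")
```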

The SoS meta-script command interface

sos run sos_meta_script.ipynb -h

usage: sos run sos_meta_script.ipynb
               [workflow_name | -t targets] [options] [workflow_options]
  workflow_name:        Single or combined workflows defined in this script
  targets:              One or more targets to generate
  options:              Single-hyphen sos parameters (see "sos run -h" for details)
  workflow_options:     Double-hyphen workflow-specific parameters

Workflows:
  print_n
  print_beta
  log_beta
  exp_n

Global Workflow Options:
  --n 1.0 (as float)
                        parameter 1
  --beta 1.0 2.0 3.0 (as list)
                        parameter 2

Sections
  print_n:              Print the value of n with bash
  print_beta:           Print the value of beta with bash
  log_beta:             Print log(beta) with Python
  exp_n:                Print exp(n) with R
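The descriptions listed under Sections match the comment lines placed just above each section header in the script (for example `# Print the value of n with bash` above `[print_n]`). The sketch below pairs such comments with their headers in a short excerpt of the script, to illustrate where the help text comes from; `section_descriptions` is a hypothetical helper, and SoS's own parser is not reproduced here.

```python
import re

# Excerpt of the script above, embedded as a string for illustration
SCRIPT = """\
# Print the value of n with bash
[print_n]
bash: expand = '${ }'
    echo ${n}

# Print log(beta) with Python
[log_beta]
python: expand = '${ }'
    import numpy as np
    print(np.log(${beta}))
"""

def section_descriptions(text: str) -> dict:
    """Map each [section] header to the comment line directly above it."""
    descriptions = {}
    previous = ""
    for line in text.splitlines():
        header = re.fullmatch(r"\[(\w+)\]", line.strip())
        if header and previous.startswith("#"):
            descriptions[header.group(1)] = previous.lstrip("#").strip()
        previous = line.strip()
    return descriptions

# Print in a two-column layout similar to the help output
for name, desc in section_descriptions(SCRIPT).items():
    print(f"  {name + ':':<22}{desc}")
```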

Run the script

sos run sos_meta_script.ipynb print_n

INFO: Running print_n: Print the value of n with bash
1.0
INFO: print_n is completed.
INFO: Workflow print_n (ID=w4bcbb8958466f710) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb print_n --n 666

INFO: Running print_n: Print the value of n with bash
666.0
INFO: print_n is completed.
INFO: Workflow print_n (ID=we094e7d433abb2ad) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb print_beta

INFO: Running print_beta: Print the value of beta with bash
[1.0, 2.0, 3.0]
INFO: print_beta is completed.
INFO: Workflow print_beta (ID=w78fa93e094c77376) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb print_beta --beta 666

INFO: Running print_beta: Print the value of beta with bash
[666.0]
INFO: print_beta is completed.
INFO: Workflow print_beta (ID=w573651170106b134) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb log_beta

INFO: Running log_beta: Print log(beta) with Python
[0.         0.69314718 1.09861229]
INFO: log_beta is completed.
INFO: Workflow log_beta (ID=w077f5b194f70ad1b) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb log_beta --beta 2.7183

INFO: Running log_beta: Print log(beta) with Python
[1.00000668]
INFO: log_beta is completed.
INFO: Workflow log_beta (ID=wcba2081d2ed3ef96) is executed successfully with 1 completed step.

sos run sos_meta_script.ipynb exp_n

INFO: Running exp_n: Print exp(n) with R
[1] 2.718282
INFO: exp_n is completed.
INFO: Workflow exp_n (ID=w68dec66676a24f9f) is executed successfully with 1 completed step.
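The numbers printed by these runs can be cross-checked with Python's standard library. `np.log` is applied element-wise, `2.7183` is slightly larger than e (about 2.718281828), so its logarithm is slightly larger than 1, and R prints 7 significant digits by default, which is why `exp(1)` appears as `2.718282`.

```python
import math

# Element-wise log of the default beta, rounded like NumPy's 8-decimal display
print([round(math.log(x), 8) for x in [1.0, 2.0, 3.0]])  # [0.0, 0.69314718, 1.09861229]

# 2.7183 exceeds e by a tiny amount, so log(2.7183) is just above 1
print(f"{math.e:.9f}")            # 2.718281828
print(f"{math.log(2.7183):.8f}")  # 1.00000668

# exp(n) for the default n = 1.0, shown to 7 significant digits as R does
print(f"{math.exp(1.0):.7g}")     # 2.718282
```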

Use SoS on a High Performance Computing (HPC) cluster

Please check out this notebook for an example of using SoS to submit jobs to our HPC cluster.

Additional SoS workflow and notebook examples

  1. Learn the basic usage of SoS Workflow from these examples (you can find and run the first two at http://sosworkflows.com):
  2. You can try to reproduce this example on your computer (source code here). In particular, note how multiple samples are processed in parallel (group_by in SoS) and how intermediate results can be visualized within the workflow notebook. Also note how Docker containers are used to execute the workflow, which avoids installing all software dependencies locally and helps ensure reproducible results.
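In SoS, `group_by` splits a step's input into groups that are processed as separate substeps, which can then run in parallel. The sketch below is a plain-Python analogy of that idea, not SoS code: `group_by` and `process` are hypothetical helpers, the sample names are made up, and a thread pool stands in for SoS's own executor.

```python
from concurrent.futures import ThreadPoolExecutor

def group_by(items, size):
    """Split items into consecutive groups of `size` (the last group may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process(group):
    """Hypothetical per-group task; a real step would run a tool on these samples."""
    return "+".join(group)

samples = ["sample1", "sample2", "sample3", "sample4"]  # hypothetical sample names
groups = group_by(samples, 2)
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() runs the groups concurrently but returns results in input order
    results = list(pool.map(process, groups))
print(groups)   # [['sample1', 'sample2'], ['sample3', 'sample4']]
print(results)  # ['sample1+sample2', 'sample3+sample4']
```

Because each group is independent, the groups can be handed to any pool of workers; on a cluster the same pattern becomes one job per group.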