Launchers
Contents
Launchers¶
SmartSim interfaces with a number of backends called launchers that are responsible for constructing jobs based on run parameters and launching them onto a system.
The launchers allow SmartSim users to interact with their system programmatically through a python interface. Because of this, SmartSim users do not have to leave the Jupyter Notebook, Python REPL, or Python script to launch, query, and interact with their jobs.
- SmartSim currently supports 5 launchers:
local
: for single-node, workstation, or laptopslurm
: for systems using the Slurm schedulerpbs
: for systems using the PBSpro schedulercobalt
: for systems using the Cobalt schedulerlsf
: for systems using the LSF schedulerauto
: have SmartSim auto-detect the launcher to use.
To specify a specific launcher, one argument needs to be provided
to the Experiment
initialization.
from smartsim import Experiment
exp = Experiment("name-of-experiment", launcher="local") # local launcher
exp = Experiment("name-of-experiment", launcher="slurm") # Slurm launcher
exp = Experiment("name-of-experiment", launcher="pbs") # PBSpro launcher
exp = Experiment("name-of-experiment", launcher="cobalt") # Cobalt launcher
exp = Experiment("name-of-experiment", launcher="lsf") # LSF launcher
exp = Experiment("name-of-experiment", launcher="auto") # auto-detect launcher
Local¶
The local launcher can be used on laptops, workstations and single nodes of supercomputer and cluster systems. Through launching locally, users can prototype workflows and quickly scale them to larger systems with minimal changes.
As with all launchers in SmartSim, the local launcher supports asynchronous execution meaning once entities have been launched the main thread of execution is not blocked. Daemon threads that manage currently running jobs will be created when active jobs are present within SmartSim.
The local launcher uses the psutil library to execute and monitor user-created jobs.
Running Locally¶
The local launcher supports the base RunSettings API which can be used to run executables as well as run executables with arbitrary launch binaries like mpiexec.
The local launcher is the default launcher for all Experiment
instances.
The local launcher does not support batch launching. Ensembles are always executed in parallel but launched sequentially.
Slurm¶
The Slurm launcher works directly with the Slurm scheduler to launch, query,
monitor and stop applications. During the course of an Experiment
,
launched entities can be queried for status, completion, and errors.
The amount of communication between SmartSim and Slurm can be tuned
for specific guidelines of different sites by setting the
value for jm_interval
in the SmartSim configuration file.
To use the Slurm launcher, specify at Experiment
initialization:
from smartsim import Experiment
exp = Experiment("NAMD-worklfow", launcher="slurm")
Running on Slurm¶
- The Slurm launcher supports three types of
RunSettings
: - As well as batch settings for
sbatch
through:
Both supported RunSettings
types above can be added
to a SbatchSettings
batch workload through Ensemble
creation.
Getting Allocations¶
Slurm supports a number of user facing features that other schedulers
do not. For this reason, an extra module smartsim.slurm can be
used to obtain allocations to launch on and release them after
Experiment
completion.
from smartsim.wlm import slurm
alloc = slurm.get_allocation(nodes=1)
The ID of the allocation is returned as a string to the user so that they can specify what entities should run on which allocations obtained by SmartSim.
Additional arguments that would have been passed to the salloc
command can be passed through the options
argument in a dictionary.
Anything passed to the options will be processed as a Slurm argument and appended to the salloc command with the appropriate prefix (e.g. - or –).
- For arguments without a value, pass None as the value:
exclusive=None
from smartsim.wlm import slurm
salloc_options = {
"C": "haswell",
"partition": "debug",
"exclusive": None
}
alloc_id = slurm.get_slurm_allocation(nodes=128,
time="10:00:00",
options=salloc_options)
The above code would generate a salloc
command like:
salloc -N 5 -C haswell --partition debug --time 10:00:00 --exclusive
Releasing Allocations¶
The smartsim.slurm interface also supports releasing allocations obtained in an experiment.
The example below releases the allocation in the example above.
from smartsim.wlm import slurm
salloc_options = {
"C": "haswell",
"partition": "debug",
"exclusive": None
}
alloc_id = slurm.get_slurm_allocation(nodes=128,
time="10:00:00",
options=salloc_options)
# <experiment code goes here>
slurm.release_slurm_allocation(alloc_id)
PBSPro¶
Like the Slurm launcher, the PBSPro launcher works directly with the PBSPro scheduler to launch, query, monitor and stop applications.
The amount of communication between SmartSim and PBSPro can be tuned
for specific guidelines of different sites by setting the
value for jm_interval
in the SmartSim configuration file.
To use the PBSpro launcher, specify at Experiment
initialization:
from smartsim import Experiment
exp = Experiment("LAMMPS-melt", launcher="pbs")
Running on PBSpro¶
- The PBSpro launcher supports three types of
RunSettings
: - As well as batch settings for
qsub
through:
Both supported RunSettings
types above can be added
to a QsubBatchSettings
batch workload through Ensemble
creation.
Cobalt¶
The Cobalt Launcher works just like the PBSPro launcher and is compatible with ALPS and OpenMPI workloads as well.
To use the Cobalt launcher, specify at Experiment
initialization:
from smartsim import Experiment
exp = Experiment("MOM6-double-gyre", launcher="cobalt")
Running on Cobalt¶
- The Cobalt launcher supports three types of
RunSettings
: - As well as batch settings for
qsub
through:
Both supported RunSettings
types above can be added
to a CobaltBatchSettings
batch workload through Ensemble
creation.
LSF¶
The LSF Launcher works like the PBSPro and Cobalt launchers and is compatible with LSF and OpenMPI workloads.
To use the LSF launcher, specify at Experiment
initialization:
from smartsim import Experiment
exp = Experiment("MOM6-double-gyre", launcher="lsf")
Running on LSF¶
- The LSF launcher supports three types of
RunSettings
: - As well as batch settings for
bsub
through:
Both supported RunSettings
types above can be added
to a BsubBatchSettings
batch workload through Ensemble
creation.