SmartSim API

Experiment

Experiment.__init__(name[, exp_path, launcher])

Initialize an Experiment instance

Experiment.start(*args[, block, summary, ...])

Start passed instances using Experiment launcher

Experiment.stop(*args)

Stop specific instances launched by this Experiment

Experiment.create_ensemble(name[, params, ...])

Create an Ensemble of Model instances

Experiment.create_model(name, run_settings)

Create a general purpose Model

Experiment.create_database([port, db_nodes, ...])

Initialize an Orchestrator database

Experiment.create_run_settings(exe[, ...])

Create a RunSettings instance.

Experiment.create_batch_settings([nodes, ...])

Create a BatchSettings instance

Experiment.generate(*args[, tag, overwrite])

Generate the file structure for an Experiment

Experiment.poll([interval, verbose, ...])

Monitor jobs through logging to stdout.

Experiment.finished(entity)

Query if a job has completed.

Experiment.get_status(*args)

Query the status of launched instances

Experiment.reconnect_orchestrator(checkpoint)

Reconnect to a running Orchestrator

Experiment.summary([format])

Return a summary of the Experiment

class Experiment(name, exp_path=None, launcher='local')[source]

Bases: object

Experiments are the Python user interface for SmartSim.

Experiment is a factory class that creates stages of a workflow and manages their execution.

The instances created by an Experiment represent executable code that is either user-specified, like the Model instance created by Experiment.create_model, or pre-configured, like the Orchestrator instance created by Experiment.create_database.

Experiment methods that accept a variable list of arguments, such as Experiment.start or Experiment.stop, accept any number of the instances created by the Experiment.

In general, the Experiment class is designed to be initialized once and utilized throughout runtime.

Initialize an Experiment instance

With the default settings, the Experiment will use the local launcher, which will start all Experiment created instances on the localhost.

Example of initializing an Experiment with the local launcher

exp = Experiment(name="my_exp", launcher="local")

SmartSim supports multiple launchers, which can also be specified based on the type of system you are running on.

exp = Experiment(name="my_exp", launcher="slurm")

If you wish your driver script and Experiment to be run across multiple systems with different schedulers (workload managers), you can also use the auto argument to have the Experiment guess which launcher to use based on the binaries and libraries installed on the system.

exp = Experiment(name="my_exp", launcher="auto")

The Experiment path will default to the current working directory and if the Experiment.generate method is called, a directory with the Experiment name will be created to house the output from the Experiment.

Parameters
  • name (str) – name for the Experiment

  • exp_path (str, optional) – path to location of Experiment directory if generated

  • launcher (str, optional) – type of launcher being used, options are “slurm”, “pbs”, “cobalt”, “lsf”, or “local”. If set to “auto”, an attempt will be made to find an available launcher on the system. Defaults to “local”

create_batch_settings(nodes=1, time='', queue='', account='', batch_args=None, **kwargs)[source]

Create a BatchSettings instance

Batch settings parameterize batch workloads. The result of this function can be passed to the Ensemble initialization.

The batch_args parameter can be used to pass a dictionary of additional batch command arguments that aren't supported through the SmartSim interface:

# i.e. for Slurm
batch_args = {
    "distribution": "block",
    "exclusive": None
}
bs = exp.create_batch_settings(nodes=3,
                               time="10:00:00",
                               batch_args=batch_args)
bs.set_account("default")
Parameters
  • nodes (int, optional) – number of nodes for batch job, defaults to 1

  • time (str, optional) – length of batch job, defaults to “”

  • queue (str, optional) – queue or partition (if slurm), defaults to “”

  • account (str, optional) – user account name for batch system, defaults to “”

  • batch_args (dict[str, str], optional) – additional batch arguments, defaults to None

Returns

a newly created BatchSettings instance

Return type

BatchSettings

Raises

SmartSimError – if batch creation fails

create_database(port=6379, db_nodes=1, batch=False, hosts=None, run_command='auto', interface='ipogif0', account=None, time=None, queue=None, single_cmd=True, **kwargs)[source]

Initialize an Orchestrator database

The Orchestrator database is a key-value store based on Redis that can be launched together with other Experiment created instances for online data storage.

When launched, Orchestrator can be used to communicate data between Fortran, Python, C, and C++ applications.

Machine learning models in PyTorch, TensorFlow, and ONNX (e.g. scikit-learn) can also be stored within the Orchestrator database, where they can be called remotely and executed on CPU or GPU where the database is hosted.

To enable a SmartSim Model to communicate with the database, the workload must utilize the SmartRedis clients. For more information on the database and the SmartRedis clients, see the documentation at www.craylabs.org
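
For reference, a minimal sketch of launching and later stopping a standalone database (the experiment name is illustrative, and the network interface is system specific):

exp = Experiment("db_example", launcher="slurm")
db = exp.create_database(port=6379, db_nodes=1, interface="ipogif0")
exp.start(db)
# ... launch workloads that communicate with the database through SmartRedis ...
exp.stop(db)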

Parameters
  • port (int, optional) – TCP/IP port, defaults to 6379

  • db_nodes (int, optional) – number of database shards, defaults to 1

  • batch (bool, optional) – run as a batch workload, defaults to False

  • hosts (list[str], optional) – specify hosts to launch on, defaults to None

  • run_command (str, optional) – specify launch binary or detect automatically, defaults to “auto”

  • interface (str, optional) – Network interface, defaults to “ipogif0”

  • account (str, optional) – account to run batch on, defaults to None

  • time (str, optional) – walltime for batch ‘HH:MM:SS’ format, defaults to None

  • queue (str, optional) – queue to run the batch on, defaults to None

  • single_cmd (bool, optional) – run all shards with one (MPMD) command, defaults to True

Raises
  • SmartSimError – if detection of launcher or of run command fails

  • SmartSimError – if user indicated an incompatible run command for the launcher

Returns

Orchestrator

Return type

Orchestrator or derived class

create_ensemble(name, params=None, batch_settings=None, run_settings=None, replicas=None, perm_strategy='all_perm', **kwargs)[source]

Create an Ensemble of Model instances

Ensembles can be launched sequentially or as a batch if using a non-local launcher (e.g. Slurm).

Ensembles require one of the following combinations of arguments

  • run_settings and params

  • run_settings and replicas

  • batch_settings

  • batch_settings, run_settings, and params

  • batch_settings, run_settings, and replicas

If given solely batch settings, an empty ensemble will be created that models can be added to manually through Ensemble.add_model(). The entire ensemble will launch as one batch.

Provided batch and run settings, either params or replicas must be passed and the entire ensemble will launch as a single batch.

Provided solely run settings, either params or replicas must be passed and the ensemble members will each launch sequentially.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.
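
As a sketch, the following expands two parameters into four Model members using the default “all_perm” strategy (the parameter names and script are illustrative):

params = {"learning_rate": [0.01, 0.001], "batch_size": [32, 64]}
rs = exp.create_run_settings(exe="python", exe_args=["train.py"])
ensemble = exp.create_ensemble("training_ensemble",
                               params=params,
                               run_settings=rs,
                               perm_strategy="all_perm")
exp.start(ensemble)  # members launch sequentially (no batch settings given)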

Parameters
  • name (str) – name of the ensemble

  • params (dict[str, Any]) – parameters to expand into Model members

  • batch_settings (BatchSettings) – describes settings for Ensemble as batch workload

  • run_settings (RunSettings) – describes how each Model should be executed

  • replicas (int) – number of replicas to create

  • perm_strategy (str, optional) – strategy for expanding params into Model instances from params argument options are “all_perm”, “stepped”, “random” or a callable function. Default is “all_perm”.

Raises

SmartSimError – if initialization fails

Returns

Ensemble instance

Return type

Ensemble

create_model(name, run_settings, params=None, path=None, enable_key_prefixing=False)[source]

Create a general purpose Model

The Model class is the most general encapsulation of executable code in SmartSim. Model instances are named references to pieces of a workflow that can be parameterized, and executed.

Model instances can be launched sequentially or as a batch by adding them into an Ensemble.

Parameters supplied in the params argument can be written into configuration files supplied at runtime to the model through Model.attach_generator_files. params can also be turned into executable arguments by calling Model.params_to_args

By default, Model instances will be executed in the current working directory if no path argument is supplied. If a Model instance is passed to Experiment.generate, a directory within the Experiment directory will be created to house the input and output files from the model.

Example initialization of a Model instance

from smartsim import Experiment

exp = Experiment("pytorch_experiment", launcher="auto")  # name and launcher are illustrative
run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
model = exp.create_model("pytorch_model", run_settings)

# adding parameters to a model
run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
train_params = {
    "batch": 32,
    "epoch": 10,
    "lr": 0.001
}
model = exp.create_model("pytorch_model", run_settings, params=train_params)
model.attach_generator_files(to_configure="./train.cfg")
exp.generate(model)

New in 0.4.0, Model instances can be co-located with an Orchestrator database shard through Model.colocate_db. This will launch a single Orchestrator instance on each compute host used by the (possibly distributed) application. This is useful for performant online inference or processing at runtime.
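
A sketch of the co-located pattern (the keyword arguments shown for Model.colocate_db are illustrative and may differ between SmartSim versions):

model = exp.create_model("colocated_model", run_settings)
model.colocate_db(port=6780, ifname="lo")  # assumed kwargs; one DB shard per compute host
exp.start(model)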

Parameters
  • name (str) – name of the model

  • run_settings (RunSettings) – defines how Model should be run

  • params (dict, optional) – model parameters for writing into configuration files

  • path (str, optional) – path to where the model should be executed at runtime

  • enable_key_prefixing (bool, optional) – If True, data sent to the Orchestrator using SmartRedis from this Model will be prefixed with the Model name. Defaults to False.

Raises

SmartSimError – if initialization fails

Returns

the created Model

Return type

Model

create_run_settings(exe, exe_args=None, run_command='auto', run_args=None, env_vars=None, container=None, **kwargs)[source]

Create a RunSettings instance.

run_command=”auto” will attempt to automatically match a run command on the system with a RunSettings class in SmartSim. If found, the class corresponding to that run_command will be created and returned.

If the local launcher is being used, auto detection will be turned off.

If a recognized run command is passed, the RunSettings instance will be a child class such as SrunSettings.

If the run command is not supported by SmartSim, the base RunSettings class will be created and returned, and the specified run_command and run_args will be evaluated literally (see the sketch after the run command list below).

Run Commands with implemented helper classes:
  • aprun (ALPS)

  • srun (SLURM)

  • mpirun (OpenMPI)

  • jsrun (LSF)
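
For example, on a Slurm system run_command="auto" would typically resolve to srun and return an SrunSettings instance, while an unrecognized command falls back to the base RunSettings class (the custom binary name below is hypothetical):

settings = exp.create_run_settings(exe="echo", exe_args=["hello"])   # auto-detects srun on Slurm
custom = exp.create_run_settings(exe="echo", exe_args=["hello"],
                                 run_command="my_launcher")          # base RunSettings, passed literally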

Parameters
  • run_command (str) – command to run the executable

  • exe (str) – executable to run

  • exe_args (list[str], optional) – arguments to pass to the executable

  • run_args (list[str], optional) – arguments to pass to the run_command

  • env_vars (dict[str, str], optional) – environment variables to pass to the executable

Returns

the created RunSettings

Return type

RunSettings

finished(entity)[source]

Query if a job has completed.

An instance of Model or Ensemble can be passed as an argument.

Passing an Orchestrator will return an error, as a database deployment is never finished until stopped by the user.

Parameters

entity (Model | Ensemble) – object launched by this Experiment

Returns

True if job has completed, False otherwise

Return type

bool

Raises

SmartSimError – if entity has not been launched by this Experiment

generate(*args, tag=None, overwrite=False)[source]

Generate the file structure for an Experiment

Experiment.generate creates directories for each instance passed, in order to organize Experiments that launch many instances.

If files or directories are attached to Model objects using Model.attach_generator_files(), those files or directories will be symlinked, copied, or configured and written into the created directory for that instance.

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the generate method.
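
A short sketch, assuming model and ensemble were created by this Experiment:

exp.generate(model, ensemble, overwrite=True)  # creates a directory per instance under the Experiment path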

Parameters
  • tag (str, optional) – tag used in to_configure generator files

  • overwrite (bool, optional) – overwrite existing folders and contents, defaults to False

get_status(*args)[source]

Query the status of launched instances

Return a smartsim.status string representing the status of the launched instance.

exp.get_status(model)

As with other Experiment methods, multiple instances of varying types can be passed, and all statuses will be returned at once.

statuses = exp.get_status(model, ensemble, orchestrator)
assert all([status == smartsim.status.STATUS_COMPLETED for status in statuses])
Returns

status of the instances passed as arguments

Return type

list[str]

Raises

SmartSimError – if status retrieval fails

poll(interval=10, verbose=True, kill_on_interrupt=True)[source]

Monitor jobs through logging to stdout.

This method should only be used if jobs were launched with Experiment.start(block=False)

The interval specified controls how often the logging is performed, not how often the polling occurs. By default, internal polling is set to every second for local launcher jobs and every 10 seconds for all other launchers.

If internal polling needs to be slower or faster based on system or site standards, set the SMARTSIM_JM_INTERVAL environment variable to control the internal polling interval for SmartSim.

For more verbose logging output, the SMARTSIM_LOG_LEVEL environment variable can be set to debug

If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.
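
A minimal sketch of a non-blocking launch followed by polling:

exp.start(model, block=False)
exp.poll(interval=5, verbose=True, kill_on_interrupt=True)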

Parameters
  • interval (int, optional) – frequency (in seconds) of logging to stdout, defaults to 10 seconds

  • verbose (bool, optional) – set verbosity, defaults to True

  • kill_on_interrupt (bool, optional) – flag for killing jobs when SIGINT is received

Raises

SmartSimError

reconnect_orchestrator(checkpoint)[source]

Reconnect to a running Orchestrator

This method can be used to connect to an Orchestrator deployment that was launched by a previous Experiment. This can be helpful when separate runs of an Experiment wish to use the same Orchestrator instance currently running on a system.
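
A sketch, assuming the reconnected Orchestrator is returned so it can be passed to other Experiment methods (the checkpoint path is illustrative):

db = exp.reconnect_orchestrator("/path/to/previous/exp/smartsim_db.dat")
exp.stop(db)  # e.g. shut the database down once all consumers are finished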

Parameters

checkpoint (str) – the smartsim_db.dat file created when an Orchestrator is launched

start(*args, block=True, summary=False, kill_on_interrupt=True)[source]

Start passed instances using Experiment launcher

Any Model, Ensemble, or Orchestrator instance created by the Experiment can be passed as an argument to the start method.

exp = Experiment(name="my_exp", launcher="slurm")
settings = exp.create_run_settings(exe="./path/to/binary")
model = exp.create_model("my_model", settings)
exp.start(model)

Multiple instances can also be passed to the start method at once, no matter which type of instance they are. These will all be launched together.

exp.start(model_1, model_2, db, ensemble, block=True)
# alternatively
stage_1 = [model_1, model_2, db, ensemble]
exp.start(*stage_1, block=True)

If block==True the Experiment will poll the launched instances at runtime until all non-database jobs have completed. Database jobs must be killed by the user by passing them to Experiment.stop. This allows for multiple stages of a workflow to produce to and consume from the same Orchestrator database.

If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.

Parameters
  • block (bool, optional) – block execution until all non-database jobs are finished, defaults to True

  • summary (bool, optional) – print a launch summary prior to launch, defaults to False

  • kill_on_interrupt (bool, optional) – flag for killing jobs when ^C (SIGINT) signal is received.

stop(*args)[source]

Stop specific instances launched by this Experiment

Instances of Model, Ensemble and Orchestrator can all be passed as arguments to the stop method.

Whichever launcher was specified at Experiment initialization will be used to stop the instance. For example, when using the Slurm launcher, this equates to running scancel on the instance.

Example

exp.stop(model)
# multiple
exp.stop(model_1, model_2, db, ensemble)
Raises
  • TypeError – if wrong type

  • SmartSimError – if stop request fails

summary(format='github')[source]

Return a summary of the Experiment

The summary will show each instance that has been launched and completed in this Experiment
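
For example (any tabulate table format may be passed):

print(exp.summary())                 # GitHub-flavored table (default)
print(exp.summary(format="plain"))   # plain-text table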

Parameters

format (str, optional) – the style in which the summary table is formatted, for a full list of styles see: https://github.com/astanin/python-tabulate#table-format, defaults to “github”

Returns

tabulate string of Experiment history

Return type

str

Settings

Settings are provided to Model and Ensemble objects to specify how a job should be executed. Some are meant for specific launchers; for example, SbatchSettings is solely meant for systems using Slurm as a workload manager. MpirunSettings, for OpenMPI-based jobs, is supported by Slurm, PBSPro, and Cobalt.

Types of Settings:

RunSettings(exe[, exe_args, run_command, ...])

Run parameters for a Model

SrunSettings(exe[, exe_args, run_args, ...])

Initialize run parameters for a slurm job with srun

AprunSettings(exe[, exe_args, run_args, ...])

Settings to run job with aprun command

MpirunSettings(exe[, exe_args, run_args, ...])

Settings to run job with mpirun command (OpenMPI)

MpiexecSettings(exe[, exe_args, run_args, ...])

Settings to run job with mpiexec command (OpenMPI)

OrterunSettings(exe[, exe_args, run_args, ...])

Settings to run job with orterun command (OpenMPI)

JsrunSettings(exe[, exe_args, run_args, ...])

Settings to run job with jsrun command

SbatchSettings([nodes, time, account, ...])

Specify run parameters for a Slurm batch job

QsubBatchSettings([nodes, ncpus, time, ...])

Specify qsub batch parameters for a job

CobaltBatchSettings([nodes, time, queue, ...])

Specify settings for a Cobalt qsub batch launch

BsubBatchSettings([nodes, time, project, ...])

Specify bsub batch parameters for a job

Settings objects can accept a container object that defines a container runtime, image, and arguments to use for the workload. Below is a list of supported container runtimes.

Types of Containers:

Singularity(*args, **kwargs)

Singularity (apptainer) container type.

RunSettings

When running SmartSim on laptops and single node workstations, the base RunSettings object is used to parameterize jobs. RunSettings include a run_command parameter for local launches that utilize a parallel launch binary like mpirun, mpiexec, and others.

RunSettings.add_exe_args(args)

Add executable arguments to executable

RunSettings.update_env(env_vars)

Update the job environment variables

class RunSettings(exe, exe_args=None, run_command='', run_args=None, env_vars=None, container=None, **kwargs)[source]

Run parameters for a Model

The base RunSettings class should only be used with the local launcher on single-node systems, workstations, or laptops.

If no run_command is specified, the executable will be launched locally.

run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command, e.g. run_args = {“-np”: 2} becomes “-np 2”.

Example initialization

rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
Parameters
  • exe (str) – executable to run

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_command (str, optional) – launch binary (e.g. “srun”), defaults to empty str

  • run_args (dict[str, str], optional) – arguments for run command (e.g. -np for mpiexec), defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

  • container (Container, optional) – container type for workload (e.g. “singularity”), defaults to None

add_exe_args(args)[source]

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Build environment variable string

Returns

formatted list of strings to export variables

Return type

list[str]

format_run_args()[source]

Return formatted run arguments

For RunSettings, the run arguments are passed literally with no formatting.

Returns

list run arguments for these settings

Return type

list[str]

make_mpmd(settings)[source]

Make job an MPMD job

Parameters

settings (RunSettings) – RunSettings instance

reserved_run_args: set[str] = {}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)[source]

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)[source]

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)[source]

Copy executable file to allocated compute nodes

Parameters

dest_path (str | None) – Path to copy an executable file

set_cpu_bindings(bindings)[source]

Set the cores to which MPI processes are bound

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_task(cpus_per_task)[source]

Set the number of cpus per task

Parameters

cpus_per_task (int) – number of cpus per task

set_excluded_hosts(host_list)[source]

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

set_hostlist_from_file(file_path)[source]

Use the contents of a file to specify the hostlist for this job

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)[source]

Set the amount of memory required per node in megabytes

Parameters

memory_per_node (int) – Number of megabytes per node

set_mpmd_preamble(preamble_lines)[source]

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)[source]

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)[source]

Set the job to run in quiet mode

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)[source]

Set a task mapping

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)[source]

Set the number of tasks to launch

Parameters

tasks (int) – number of tasks to launch

set_tasks_per_node(tasks_per_node)[source]

Set the number of tasks per node

Parameters

tasks_per_node (int) – number of tasks to launch per node

set_time(hours=0, minutes=0, seconds=0)[source]

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)[source]

Set the job to run in verbose mode

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)[source]

Set the formatted walltime

Parameters

walltime (str) – Time in format required by launcher

update_env(env_vars)[source]

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.
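
A brief sketch (variable names and values are illustrative):

rs = RunSettings("echo", "hello", run_command="mpirun")
rs.update_env({"OMP_NUM_THREADS": 4, "LOGGING": "verbose"})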

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

SrunSettings

SrunSettings can be used for running on existing allocations, running jobs in interactive allocations, and for adding srun steps to a batch.
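
A minimal sketch of an srun step on an existing allocation (the allocation ID is illustrative):

srun = SrunSettings(exe="echo", exe_args="hello world", alloc="123456")
srun.set_nodes(1)
srun.set_tasks_per_node(4)
srun.set_walltime("00:10:00")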

SrunSettings.set_nodes(nodes)

Set the number of nodes

SrunSettings.set_tasks(tasks)

Set the number of tasks for this job

SrunSettings.set_tasks_per_node(tasks_per_node)

Set the number of tasks for this job

SrunSettings.set_walltime(walltime)

Set the walltime of the job

SrunSettings.set_hostlist(host_list)

Specify the hostlist for this job

SrunSettings.set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

SrunSettings.set_cpus_per_task(cpus_per_task)

Set the number of cpus to use per task

SrunSettings.add_exe_args(args)

Add executable arguments to executable

SrunSettings.format_run_args()

Return a list of slurm formatted run arguments

SrunSettings.format_env_vars()

Build bash compatible environment variable string for Slurm

SrunSettings.update_env(env_vars)

Update the job environment variables

class SrunSettings(exe, exe_args=None, run_args=None, env_vars=None, alloc=None, **kwargs)[source]

Initialize run parameters for a slurm job with srun

SrunSettings should only be used on Slurm based systems.

If an allocation is specified, the instance receiving these run parameters will launch on that allocation.

Parameters
  • exe (str) – executable to run

  • exe_args (list[str] | str, optional) – executable arguments, defaults to None

  • run_args (dict[str, str | None], optional) – srun arguments without dashes, defaults to None

  • env_vars (dict[str, str], optional) – environment variables for job, defaults to None

  • alloc (str, optional) – allocation ID if running on existing alloc, defaults to None

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_comma_sep_env_vars()[source]

Build environment variable string for Slurm

Slurm takes exports in comma-separated lists. The list starts with all so as not to disturb the rest of the environment. For more information on this, see the Slurm documentation for srun.

Returns

the formatted string of environment variables

Return type

tuple[str, list[str]]

format_env_vars()[source]

Build bash compatible environment variable string for Slurm

Returns

the formatted string of environment variables

Return type

list[str]

format_run_args()[source]

Return a list of slurm formatted run arguments

Returns

list of slurm arguments for these settings

Return type

list[str]

make_mpmd(srun_settings)[source]

Make a mpmd workload by combining two srun commands

This connects the two settings to be executed with a single Model instance

Parameters

srun_settings (SrunSettings) – SrunSettings instance

reserved_run_args: set[str] = {'D', 'chdir'}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)[source]

Copy executable file to allocated compute nodes

This sets --bcast

Parameters

dest_path (str | None) – Path to copy an executable file

set_cpu_bindings(bindings)[source]

Bind by setting CPU masks on tasks

This sets --cpu-bind using the map_cpu:<list> option

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_task(cpus_per_task)[source]

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters

cpus_per_task (int) – number of cpus to use per task

set_excluded_hosts(host_list)[source]

Specify a list of hosts to exclude for launching this job

Parameters

host_list (list[str]) – hosts to exclude

Raises

TypeError

set_hostlist(host_list)[source]

Specify the hostlist for this job

This sets --nodelist

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_hostlist_from_file(file_path)[source]

Use the contents of a file to set the node list

This sets --nodefile

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)[source]

Specify the real memory required per node

This sets --mem in megabytes

Parameters

memory_per_node (int) – Amount of memory per node in megabytes

set_mpmd_preamble(preamble_lines)

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)[source]

Set the number of nodes

Effectively this is setting: srun --nodes <num_nodes>

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)[source]

Set the job to run in quiet mode

This sets --quiet

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)

Set a task mapping

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)[source]

Set the number of tasks for this job

This sets --ntasks

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)[source]

Set the number of tasks for this job

This sets --ntasks-per-node

Parameters

tasks_per_node (int) – number of tasks per node

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)[source]

Set the job to run in verbose mode

This sets --verbose

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

Parameters

walltime (str) – wall time

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

AprunSettings

AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro and Cobalt WLM systems.

AprunSettings can be used in interactive sessions (on an allocation) and within batch launches (e.g. QsubBatchSettings)
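
A minimal sketch (the executable name is illustrative):

aprun = AprunSettings(exe="./my_app", exe_args="--verbose")
aprun.set_tasks(8)            # --pes
aprun.set_tasks_per_node(4)   # --pes-per-node
aprun.set_cpus_per_task(2)    # --cpus-per-pe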

AprunSettings.set_cpus_per_task(cpus_per_task)

Set the number of cpus to use per task

AprunSettings.set_hostlist(host_list)

Specify the hostlist for this job

AprunSettings.set_tasks(tasks)

Set the number of tasks for this job

AprunSettings.set_tasks_per_node(tasks_per_node)

Set the number of tasks for this job

AprunSettings.make_mpmd(aprun_settings)

Make job an MPMD job

AprunSettings.add_exe_args(args)

Add executable arguments to executable

AprunSettings.format_run_args()

Return a list of ALPS formatted run arguments

AprunSettings.format_env_vars()

Format the environment variables for aprun

AprunSettings.update_env(env_vars)

Update the job environment variables

class AprunSettings(exe, exe_args=None, run_args=None, env_vars=None, **kwargs)[source]

Settings to run job with aprun command

AprunSettings can be used for both the pbs and cobalt launchers.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Format the environment variables for aprun

Returns

list of env vars

Return type

list[str]

format_run_args()[source]

Return a list of ALPS formatted run arguments

Returns

list of ALPS arguments for these settings

Return type

list[str]

make_mpmd(aprun_settings)[source]

Make job an MPMD job

This method combines two AprunSettings into a single MPMD command joined with ‘:’

Parameters

aprun_settings (AprunSettings) – AprunSettings instance

reserved_run_args: set[str] = {}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)

Copy executable file to allocated compute nodes

Parameters

dest_path (str | None) – Path to copy an executable file

set_cpu_bindings(bindings)[source]

Specifies the cores to which MPI processes are bound

This sets --cpu-binding

Parameters

bindings (list[int] | int) – List of cpu numbers

set_cpus_per_task(cpus_per_task)[source]

Set the number of cpus to use per task

This sets --cpus-per-pe

Parameters

cpus_per_task (int) – number of cpus to use per task

set_excluded_hosts(host_list)[source]

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

Raises

TypeError – if not str or list of str

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_hostlist_from_file(file_path)[source]

Use the contents of a file to set the node list

This sets --node-list-file

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)[source]

Specify the real memory required per node

This sets --memory-per-pe in megabytes

Parameters

memory_per_node (int) – Per PE memory limit in megabytes

set_mpmd_preamble(preamble_lines)

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)[source]

Set the job to run in quiet mode

This sets --quiet

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)

Set a task mapping

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)[source]

Set the number of tasks for this job

This sets --pes

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)[source]

Set the number of tasks for this job

This sets --pes-per-node

Parameters

tasks_per_node (int) – number of tasks per node

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)[source]

Set the job to run in verbose mode

This sets --debug arg to the highest level

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)[source]

Set the walltime of the job

Walltime is given in total number of seconds

Parameters

walltime (str) – wall time

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

JsrunSettings

JsrunSettings can be used on any system that supports the IBM LSF launcher.

JsrunSettings can be used in interactive sessions (on an allocation) and within batch launches (i.e. BsubBatchSettings)
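
A minimal sketch of a resource-set layout (the executable name and counts are illustrative):

jsrun = JsrunSettings(exe="./my_app")
jsrun.set_num_rs(2)           # --nrs
jsrun.set_rs_per_host(1)      # --rs_per_host
jsrun.set_cpus_per_rs(21)     # --cpu_per_rs
jsrun.set_gpus_per_rs(3)      # --gpu_per_rs
jsrun.set_tasks_per_rs(21)    # --tasks_per_rs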

JsrunSettings.set_num_rs(num_rs)

Set the number of resource sets to use

JsrunSettings.set_cpus_per_rs(cpus_per_rs)

Set the number of cpus to use per resource set

JsrunSettings.set_gpus_per_rs(gpus_per_rs)

Set the number of gpus to use per resource set

JsrunSettings.set_rs_per_host(rs_per_host)

Set the number of resource sets to use per host

JsrunSettings.set_tasks(tasks)

Set the number of tasks for this job

JsrunSettings.set_tasks_per_rs(tasks_per_rs)

Set the number of tasks per resource set

JsrunSettings.set_binding(binding)

Set binding

JsrunSettings.make_mpmd([jsrun_settings])

Make step an MPMD (or SPMD) job.

JsrunSettings.set_mpmd_preamble(preamble_lines)

Set preamble used in ERF file.

JsrunSettings.update_env(env_vars)

Update the job environment variables

JsrunSettings.set_erf_sets(erf_sets)

Set resource sets used for ERF (SPMD or MPMD) steps.

JsrunSettings.format_env_vars()

Format environment variables.

JsrunSettings.format_run_args()

Return a list of LSF formatted run arguments

class JsrunSettings(exe, exe_args=None, run_args=None, env_vars=None, **kwargs)[source]

Settings to run job with jsrun command

JsrunSettings should only be used on LSF-based systems.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()[source]

Format environment variables. Each variable needs to be passed with --env. If a variable is set to None, its value is propagated from the current environment.

Returns

formatted list of strings to export variables

Return type

list[str]

format_run_args()[source]

Return a list of LSF formatted run arguments

Returns

list of LSF arguments for these settings

Return type

list[str]

make_mpmd(jsrun_settings=None)[source]

Make step an MPMD (or SPMD) job.

This method will activate job execution through an ERF file.

Optionally, this method adds an instance of JsrunSettings to the list of settings to be launched in the same ERF file.

Parameters

jsrun_settings (JsrunSettings, optional) – JsrunSettings instance, defaults to None

reserved_run_args: set[str] = {'chdir', 'h'}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)[source]

Set binding

This sets --bind

Parameters

binding (str) – Binding, e.g. packed:21

set_broadcast(dest_path=None)

Copy executable file to allocated compute nodes

Parameters

dest_path (str | None) – Path to copy an executable file

set_cpu_bindings(bindings)

Set the cores to which MPI processes are bound

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_rs(cpus_per_rs)[source]

Set the number of cpus to use per resource set

This sets --cpu_per_rs

Parameters

cpus_per_rs (int or str) – number of cpus to use per resource set or ALL_CPUS

set_cpus_per_task(cpus_per_task)[source]

Set the number of cpus per tasks.

This function is an alias for set_cpus_per_rs.

Parameters

cpus_per_task (int) – number of cpus per resource set

set_erf_sets(erf_sets)[source]

Set resource sets used for ERF (SPMD or MPMD) steps.

erf_sets is a dictionary used to fill the ERF line representing these settings, e.g. {“host”: “1”, “cpu”: “{0:21}, {21:21}”, “gpu”: “*”} can be used to specify rank (or rank_count), hosts, cpus, gpus, and memory. The key rank is used to give specific ranks, as in {“rank”: “1, 2, 5”}, while the key rank_count is used to specify the count only, as in {“rank_count”: “3”}. If both are specified, only rank is used.
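
For example, assuming js is a JsrunSettings instance:

js.set_erf_sets({"rank_count": "3",
                 "host": "1",
                 "cpu": "{0:21}, {21:21}",
                 "gpu": "*"})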

Parameters

erf_sets (dict[str, str]) – dictionary of resources

set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

set_gpus_per_rs(gpus_per_rs)[source]

Set the number of gpus to use per resource set

This sets --gpu_per_rs

Parameters

gpus_per_rs (int or str) – number of gpus to use per resource set or ALL_GPUS

set_hostlist(host_list)

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

set_hostlist_from_file(file_path)

Use the contents of a file to specify the hostlist for this job

Parameters

file_path (str) – Path to the hostlist file

set_individual_output(suffix=None)[source]

Set individual std output.

This sets --stdio_mode individual and inserts the suffix into the output name. The resulting output name will be self.name + suffix + .out.

Parameters

suffix (str, optional) – Optional suffix to add to output file names, it can contain %j, %h, %p, or %t, as specified by jsrun options.

set_memory_per_node(memory_per_node)[source]

Specify the number of megabytes of memory to assign to a resource set

Alias for set_memory_per_rs.

Parameters

memory_per_node (int) – Number of megabytes per rs

set_memory_per_rs(memory_per_rs)[source]

Specify the number of megabytes of memory to assign to a resource set

This sets --memory_per_rs

Parameters

memory_per_rs (int) – Number of megabytes per rs

set_mpmd_preamble(preamble_lines)[source]

Set preamble used in ERF file. Typical lines include oversubscribe-cpu : allow or overlapping-rs : allow. Can be used to set launch_distribution. If it is not present, it will be inferred from the settings, or set to packed by default.

Parameters

preamble_lines (list[str]) – lines to put at the beginning of the ERF file.

set_nodes(nodes)

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_num_rs(num_rs)[source]

Set the number of resource sets to use

This sets --nrs.

Parameters

num_rs (int or str) – Number of resource sets or ALL_HOSTS

set_quiet_launch(quiet)

Set the job to run in quiet mode

Parameters

quiet (bool) – Whether the job should be run quietly

set_rs_per_host(rs_per_host)[source]

Set the number of resource sets to use per host

This sets --rs_per_host

Parameters

rs_per_host (int) – number of resource sets to use per host

set_task_map(task_mapping)

Set a task mapping

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)[source]

Set the number of tasks for this job

This sets --np

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)[source]

Set the number of tasks per resource set.

This function is an alias for set_tasks_per_rs.

Parameters

tasks_per_node (int) – number of tasks per resource set

set_tasks_per_rs(tasks_per_rs)[source]

Set the number of tasks per resource set

This sets --tasks_per_rs

Parameters

tasks_per_rs (int) – number of tasks per resource set

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)

Set the job to run in verbose mode

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)

Set the formatted walltime

Parameters

walltime (str) – Time in format required by launcher

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

MpirunSettings

MpirunSettings are for launching with OpenMPI. MpirunSettings are supported on Slurm, PBSPro, and Cobalt.
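
A minimal sketch combining two OpenMPI launches into one MPMD step (the executable names are illustrative):

mpirun = MpirunSettings(exe="./app_one")
mpirun.set_tasks(4)            # --n
second = MpirunSettings(exe="./app_two")
second.set_tasks(4)
mpirun.make_mpmd(second)       # both run under a single Model instance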

MpirunSettings.set_cpus_per_task(cpus_per_task)

Set the number of tasks for this job

MpirunSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

MpirunSettings.set_tasks(tasks)

Set the number of tasks for this job

MpirunSettings.set_task_map(task_mapping)

Set mpirun task mapping

MpirunSettings.make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

MpirunSettings.add_exe_args(args)

Add executable arguments to executable

MpirunSettings.format_run_args()

Return a list of OpenMPI formatted run arguments

MpirunSettings.format_env_vars()

Format the environment variables for mpirun

MpirunSettings.update_env(env_vars)

Update the job environment variables

class MpirunSettings(exe, exe_args=None, run_args=None, env_vars=None, **kwargs)[source]

Settings to run job with mpirun command (OpenMPI)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpirun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()

Format the environment variables for mpirun

Returns

list of env vars

Return type

list[str]

format_run_args()

Return a list of OpenMPI formatted run arguments

Returns

list of OpenMPI arguments for these settings

Return type

list[str]

make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters

mpirun_settings (MpirunSettings) – MpirunSettings instance

reserved_run_args: set[str] = {'wd', 'wdir'}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters

dest_path (str | None) – Destination path (Ignored)

set_cpu_bindings(bindings)

Set the cores to which MPI processes are bound

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_task(cpus_per_task)

Set the number of tasks for this job

This sets --cpus-per-proc

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters

cpus_per_task (int) – number of tasks

set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

set_hostlist(host_list)

Set the hostlist for the mpirun command

This sets --host

Parameters

host_list (str | list[str]) – list of host names

Raises

TypeError – if not str or list of str

set_hostlist_from_file(file_path)

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)

Set the amount of memory required per node in megabytes

Parameters

memory_per_node (int) – Number of megabytes per node

set_mpmd_preamble(preamble_lines)

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)

Set the job to run in quiet mode

This sets --quiet

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)

Set the number of tasks for this job

This sets --n

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)

Set the number of tasks per node

Parameters

tasks_per_node (int) – number of tasks to launch per node

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)

Set the job to run in verbose mode

This sets --verbose

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters

walltime (str) – number-like string of seconds that the job will run

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

MpiexecSettings

MpiexecSettings are for launching with OpenMPI’s mpiexec. MpiexecSettings are supported on Slurm, PBSPro, and Cobalt.

MpiexecSettings.set_cpus_per_task(cpus_per_task)

Set the number of tasks for this job

MpiexecSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

MpiexecSettings.set_tasks(tasks)

Set the number of tasks for this job

MpiexecSettings.set_task_map(task_mapping)

Set mpirun task mapping

MpiexecSettings.make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

MpiexecSettings.add_exe_args(args)

Add executable arguments to executable

MpiexecSettings.format_run_args()

Return a list of OpenMPI formatted run arguments

MpiexecSettings.format_env_vars()

Format the environment variables for mpirun

MpiexecSettings.update_env(env_vars)

Update the job environment variables

class MpiexecSettings(exe, exe_args=None, run_args=None, env_vars=None, **kwargs)[source]

Settings to run job with mpiexec command (OpenMPI)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into mpiexec arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None
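
A minimal sketch of constructing MpiexecSettings, assuming the class is importable from smartsim.settings; the executable, arguments, and host names are placeholders.

from smartsim.settings import MpiexecSettings

# run_args entries are converted to mpiexec flags and prefixed with --
settings = MpiexecSettings(
    "my_app",
    exe_args="--input in.dat",
    run_args={"npernode": 4},
    env_vars={"OMP_NUM_THREADS": "2"},
)
settings.set_tasks(8)
settings.set_hostlist(["node001", "node002"])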

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()

Format the environment variables for mpirun

Returns

list of env vars

Return type

list[str]

format_run_args()

Return a list of OpenMPI formatted run arguments

Returns

list of OpenMPI arguments for these settings

Return type

list[str]

make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters

mpirun_settings (MpirunSettings) – MpirunSettings instance

reserved_run_args: set[str] = {'wd', 'wdir'}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters

dest_path (str | None) – Destination path (Ignored)

set_cpu_bindings(bindings)

Set the cores to which MPI processes are bound

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_task(cpus_per_task)

Set the number of cpus per task

This sets --cpus-per-proc

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters

cpus_per_task (int) – number of cpus per task

set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

set_hostlist(host_list)

Set the hostlist for the mpirun command

This sets --host

Parameters

host_list (str | list[str]) – list of host names

Raises

TypeError – if not str or list of str

set_hostlist_from_file(file_path)

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)

Set the amount of memory required per node in megabytes

Parameters

memory_per_node (int) – Number of megabytes per node

set_mpmd_preamble(preamble_lines)

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)

Set the job to run in quiet mode

This sets --quiet

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)

Set the number of tasks for this job

This sets --n

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)

Set the number of tasks per node

Parameters

tasks_per_node (int) – number of tasks to launch per node

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)

Set the job to run in verbose mode

This sets --verbose

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters

walltime (str) – number of seconds the job will run, given as a string

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings

OrterunSettings

OrterunSettings are for launching with OpenMPI’s orterun. OrterunSettings are supported on Slurm, PBSpro, and Cobalt.

OrterunSettings.set_cpus_per_task(cpus_per_task)

Set the number of cpus per task

OrterunSettings.set_hostlist(host_list)

Set the hostlist for the mpirun command

OrterunSettings.set_tasks(tasks)

Set the number of tasks for this job

OrterunSettings.set_task_map(task_mapping)

Set mpirun task mapping

OrterunSettings.make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

OrterunSettings.add_exe_args(args)

Add executable arguments to executable

OrterunSettings.format_run_args()

Return a list of OpenMPI formatted run arguments

OrterunSettings.format_env_vars()

Format the environment variables for mpirun

OrterunSettings.update_env(env_vars)

Update the job environment variables

class OrterunSettings(exe, exe_args=None, run_args=None, env_vars=None, **kwargs)[source]

Settings to run job with orterun command (OpenMPI)

Note that environment variables can be passed with a None value to signify that they should be exported from the current environment

Any arguments passed in the run_args dict will be converted into orterun arguments and prefixed with --. Values of None can be provided for arguments that do not have values.

Parameters
  • exe (str) – executable

  • exe_args (str | list[str], optional) – executable arguments, defaults to None

  • run_args (dict[str, str], optional) – arguments for run command, defaults to None

  • env_vars (dict[str, str], optional) – environment vars to launch job with, defaults to None

add_exe_args(args)

Add executable arguments to executable

Parameters

args (str | list[str]) – executable arguments

Raises

TypeError – if exe args are not strings

format_env_vars()

Format the environment variables for mpirun

Returns

list of env vars

Return type

list[str]

format_run_args()

Return a list of OpenMPI formatted run arguments

Returns

list of OpenMPI arguments for these settings

Return type

list[str]

make_mpmd(mpirun_settings)

Make a mpmd workload by combining two mpirun commands

This connects the two settings to be executed with a single Model instance

Parameters

mpirun_settings (MpirunSettings) – MpirunSettings instance

reserved_run_args: set[str] = {'wd', 'wdir'}
property run_command

Return the launch binary used to launch the executable

Attempt to expand the path to the executable if possible

Returns

launch binary e.g. mpiexec

Type

str | None

set(arg, value=None, condition=True)

Allows users to set individual run arguments.

A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.

A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.

Basic Usage

rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]

Slurm Example with Conditional Setting

import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug",
       condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
Parameters
  • arg (str) – name of the argument

  • value (str | None) – value of the argument

  • condition – set the argument if condition evaluates to True

set_binding(binding)

Set binding

Parameters

binding (str) – Binding

set_broadcast(dest_path=None)

Copy the specified executable(s) to remote machines

This sets --preload-binary

Parameters

dest_path (str | None) – Destination path (Ignored)

set_cpu_bindings(bindings)

Set the cores to which MPI processes are bound

Parameters

bindings (list[int] | int) – List specifying the cores to which MPI processes are bound

set_cpus_per_task(cpus_per_task)

Set the number of cpus per task

This sets --cpus-per-proc

Note: this option has been deprecated in OpenMPI 4.0+ and will soon be replaced.

Parameters

cpus_per_task (int) – number of cpus per task

set_excluded_hosts(host_list)

Specify a list of hosts to exclude for launching this job

Parameters

host_list (str | list[str]) – hosts to exclude

set_hostlist(host_list)

Set the hostlist for the mpirun command

This sets --host

Parameters

host_list (str | list[str]) – list of host names

Raises

TypeError – if not str or list of str

set_hostlist_from_file(file_path)

Use the contents of a file to set the hostlist

This sets --hostfile

Parameters

file_path (str) – Path to the hostlist file

set_memory_per_node(memory_per_node)

Set the amount of memory required per node in megabytes

Parameters

memory_per_node (int) – Number of megabytes per node

set_mpmd_preamble(preamble_lines)

Set preamble to a file to make a job MPMD

Parameters

preamble_lines (list[str]) – lines to put at the beginning of a file.

set_nodes(nodes)

Set the number of nodes

Parameters

nodes (int) – number of nodes to run with

set_quiet_launch(quiet)

Set the job to run in quiet mode

This sets --quiet

Parameters

quiet (bool) – Whether the job should be run quietly

set_task_map(task_mapping)

Set mpirun task mapping

this sets --map-by <mapping>

For examples, see the man page for mpirun

Parameters

task_mapping (str) – task mapping

set_tasks(tasks)

Set the number of tasks for this job

This sets --n

Parameters

tasks (int) – number of tasks

set_tasks_per_node(tasks_per_node)

Set the number of tasks per node

Parameters

tasks_per_node (int) – number of tasks to launch per node

set_time(hours=0, minutes=0, seconds=0)

Automatically format and set wall time

Parameters
  • hours (int) – number of hours to run job

  • minutes (int) – number of minutes to run job

  • seconds (int) – number of seconds to run job

set_verbose_launch(verbose)

Set the job to run in verbose mode

This sets --verbose

Parameters

verbose (bool) – Whether the job should be run verbosely

set_walltime(walltime)

Set the maximum number of seconds that a job will run

This sets --timeout

Parameters

walltime (str) – number of seconds the job will run, given as a string

update_env(env_vars)

Update the job environment variables

To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for Slurm, or -V for PBS/aprun.

Parameters

env_vars (dict[str, Union[str, int, float, bool]]) – environment variables to update or add

Raises

TypeError – if env_vars values cannot be coerced to strings


SbatchSettings

SbatchSettings are used for launching batches onto Slurm WLM systems.

SbatchSettings.set_account(account)

Set the account for this batch job

SbatchSettings.set_batch_command(command)

Set the command used to launch the batch

SbatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

SbatchSettings.set_hostlist(host_list)

Specify the hostlist for this job

SbatchSettings.set_partition(partition)

Set the partition for the batch job

SbatchSettings.set_queue(queue)

alias for set_partition

SbatchSettings.set_walltime(walltime)

Set the walltime of the job

SbatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class SbatchSettings(nodes=None, time='', account=None, batch_args=None, **kwargs)[source]

Specify run parameters for a Slurm batch job

Slurm sbatch arguments can be written into batch_args as a dictionary. e.g. {‘ntasks’: 1}

If the argument doesn’t have a parameter, put None as the value. e.g. {‘exclusive’: None}

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present

Parameters
  • nodes (int, optional) – number of nodes, defaults to None

  • time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours

  • account (str, optional) – account for job, defaults to None

  • batch_args (dict[str, str], optional) – extra batch arguments, defaults to None
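
A minimal sketch of creating SbatchSettings for an Ensemble or Orchestrator batch launch, assuming SbatchSettings is importable from smartsim.settings; the account and partition names are placeholders.

from smartsim.settings import SbatchSettings

# a batch_args value of None produces a flag without a value (e.g. --exclusive)
sbatch = SbatchSettings(nodes=4, time="01:00:00", account="my_account",
                        batch_args={"exclusive": None})
sbatch.set_partition("debug")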

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str or list[str]) – lines to add to preamble.

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

batch arguments for Sbatch

Return type

list[str]

set_account(account)[source]

Set the account for this batch job

Parameters

account (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_cpus_per_task(cpus_per_task)[source]

Set the number of cpus to use per task

This sets --cpus-per-task

Parameters

cpus_per_task (int) – number of cpus to use per task

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

Parameters

num_nodes (int) – number of nodes

set_partition(partition)[source]

Set the partition for the batch job

Parameters

partition (str) – partition name

set_queue(queue)[source]

alias for set_partition

Sets the partition for the slurm batch job

Parameters

queue (str) – the partition to run the batch job on

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

Parameters

walltime (str) – wall time

QsubBatchSettings

QsubBatchSettings are used to configure jobs that should be launched as a batch on PBSPro systems.

QsubBatchSettings.set_account(account)

Set the account for this batch job

QsubBatchSettings.set_batch_command(command)

Set the command used to launch the batch

QsubBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

QsubBatchSettings.set_ncpus(num_cpus)

Set the number of cpus obtained in each node.

QsubBatchSettings.set_queue(queue)

Set the queue for the batch job

QsubBatchSettings.set_resource(...)

Set a resource value for the Qsub batch

QsubBatchSettings.set_walltime(walltime)

Set the walltime of the job

QsubBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class QsubBatchSettings(nodes=None, ncpus=None, time=None, queue=None, account=None, resources=None, batch_args=None, **kwargs)[source]

Specify qsub batch parameters for a job

The nodes and ncpus arguments are used to create the select statement for PBS if a select statement is not included in resources. If both are supplied, the select statement provided in resources will override the nodes and ncpus values.

Parameters
  • nodes (int, optional) – number of nodes for batch, defaults to None

  • ncpus (int, optional) – number of cpus per node, defaults to None

  • time (str, optional) – walltime for batch job, defaults to None

  • queue (str, optional) – queue to run batch in, defaults to None

  • account (str, optional) – account for batch launch, defaults to None

  • resources (dict[str, str], optional) – overrides for resource arguments, defaults to None

  • batch_args (dict[str, str], optional) – overrides for PBS batch arguments, defaults to None
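
A minimal sketch of creating QsubBatchSettings, assuming the class is importable from smartsim.settings; the queue name and select statement are placeholders.

from smartsim.settings import QsubBatchSettings

qsub = QsubBatchSettings(nodes=2, ncpus=36, time="02:00:00", queue="workq")
# an explicit select resource overrides the nodes and ncpus arguments
qsub.set_resource("select", "2:ncpus=36")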

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str or list[str]) – lines to add to preamble.

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

batch arguments for Qsub

Return type

list[str]

Raises

ValueError – if options are supplied without values

set_account(account)[source]

Set the account for this batch job

Parameters

account (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_ncpus(num_cpus)[source]

Set the number of cpus obtained in each node.

If a select argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters

num_cpus (int) – number of cpus per node in select

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

If a select argument is provided in QsubBatchSettings.resources this value will be overridden

Parameters

num_nodes (int) – number of nodes

set_queue(queue)[source]

Set the queue for the batch job

Parameters

queue (str) – queue name

set_resource(resource_name, value)[source]

Set a resource value for the Qsub batch

If a select statement is provided, the nodes and ncpus arguments will be overridden. Likewise, a walltime resource overrides the walltime argument.

Parameters
  • resource_name (str) – name of resource, e.g. walltime

  • value (str) – value

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

If a walltime argument is provided in QsubBatchSettings.resources, then this value will be overridden

Parameters

walltime (str) – wall time

CobaltBatchSettings

CobaltBatchSettings are used to configure jobs that should be launched as a batch on Cobalt Systems. They closely mimic that of the QsubBatchSettings for PBSPro.

CobaltBatchSettings.set_account(account)

Set the account for this batch job

CobaltBatchSettings.set_batch_command(command)

Set the command used to launch the batch

CobaltBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

CobaltBatchSettings.set_queue(queue)

Set the queue for the batch job

CobaltBatchSettings.set_walltime(walltime)

Set the walltime of the job

CobaltBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class CobaltBatchSettings(nodes=None, time='', queue=None, account=None, batch_args=None, **kwargs)[source]

Specify settings for a Cobalt qsub batch launch

If the argument doesn’t have a parameter, put None as the value. e.g. {‘exclusive’: None}

Initialization values provided (nodes, time, account) will overwrite the same arguments in batch_args if present

Parameters
  • nodes (int, optional) – number of nodes, defaults to None

  • time (str, optional) – walltime for job, e.g. “10:00:00” for 10 hours, defaults to empty str

  • queue (str, optional) – queue to launch job in, defaults to None

  • account (str, optional) – account for job, defaults to None

  • batch_args (dict[str, str], optional) – extra batch arguments, defaults to None
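
A minimal sketch of creating CobaltBatchSettings, assuming the class is importable from smartsim.settings; the queue and account names are placeholders.

from smartsim.settings import CobaltBatchSettings

cqsub = CobaltBatchSettings(nodes=8, time="00:30:00", queue="debug-flat-quad",
                            account="my_project")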

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str or list[str]) – lines to add to preamble.

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

list of batch arguments for Cobalt qsub

Return type

list[str]

set_account(account)[source]

Set the account for this batch job

Parameters

account (str) – account id

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

Parameters

num_nodes (int) – number of nodes

set_queue(queue)[source]

Set the queue for the batch job

Parameters

queue (str) – queue name

set_tasks(num_tasks)[source]

Set total number of processes to start

Parameters

num_tasks (int) – number of processes

set_walltime(walltime)[source]

Set the walltime of the job

format = “HH:MM:SS”

Cobalt walltime can also be specified as a number of minutes.

Parameters

walltime (str) – wall time

BsubBatchSettings

BsubBatchSettings are used to configure jobs that should be launched as a batch on LSF systems.

BsubBatchSettings.set_walltime(walltime)

Set the walltime

BsubBatchSettings.set_smts(smts)

Set SMTs

BsubBatchSettings.set_project(project)

Set the project

BsubBatchSettings.set_nodes(num_nodes)

Set the number of nodes for this batch job

BsubBatchSettings.set_expert_mode_req(...)

Set allocation for expert mode.

BsubBatchSettings.set_hostlist(host_list)

Specify the hostlist for this job

BsubBatchSettings.set_tasks(tasks)

Set the number of tasks for this job

BsubBatchSettings.format_batch_args()

Get the formatted batch arguments for a preview

class BsubBatchSettings(nodes=None, time=None, project=None, batch_args=None, smts=None, **kwargs)[source]

Specify bsub batch parameters for a job

Parameters
  • nodes (int, optional) – number of nodes for batch, defaults to None

  • time (str, optional) – walltime for batch job in format hh:mm, defaults to None

  • project (str, optional) – project for batch launch, defaults to None

  • batch_args (dict[str, str], optional) – overrides for LSF batch arguments, defaults to None

  • smts (int, optional) – SMTs, defaults to None
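
A minimal sketch of creating BsubBatchSettings, assuming the class is importable from smartsim.settings; the project name and task count are placeholders.

from smartsim.settings import BsubBatchSettings

bsub = BsubBatchSettings(nodes=2, time="01:00", project="my_project", smts=4)
bsub.set_tasks(84)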

add_preamble(lines)

Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.

Parameters

lines (str or list[str]) – lines to add to preamble.

property batch_cmd

Return the batch command

Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.

Returns

batch command

Type

str

format_batch_args()[source]

Get the formatted batch arguments for a preview

Returns

list of batch arguments for Bsub

Return type

list[str]

set_account(account)[source]

Set the project

This function is an alias for set_project.

Parameters

account (str) – project name

set_batch_command(command)

Set the command used to launch the batch e.g. sbatch

Parameters

command (str) – batch command

set_expert_mode_req(res_req, slots)[source]

Set allocation for expert mode. This will activate expert mode (-csm) and disregard all other allocation options.

This sets -csm -n slots -R res_req

set_hostlist(host_list)[source]

Specify the hostlist for this job

Parameters

host_list (str | list[str]) – hosts to launch on

Raises

TypeError – if not str or list of str

set_nodes(num_nodes)[source]

Set the number of nodes for this batch job

This sets -nnodes.

Parameters

num_nodes (int) – number of nodes

set_project(project)[source]

Set the project

This sets -P.

Parameters

project (str) – project name

set_queue(queue)[source]

Set the queue for this job

Parameters

queue (str) – The queue to submit the job on

set_smts(smts)[source]

Set SMTs

This sets -alloc_flags. If the user sets SMT explicitly through -alloc_flags, then that takes precedence.

Parameters

smts (int) – SMT (e.g on Summit: 1, 2, or 4)

set_tasks(tasks)[source]

Set the number of tasks for this job

This sets -n

Parameters

tasks (int) – number of tasks

set_walltime(walltime)[source]

Set the walltime

This sets -W.

Parameters

walltime (str) – Time in hh:mm format, e.g. “10:00” for 10 hours. If time is supplied in hh:mm:ss format, seconds will be ignored and the walltime will be set as hh:mm.

Singularity

Singularity is a type of Container that can be passed to a RunSettings class or child class to enable running the workload in a container.

class Singularity(*args, **kwargs)[source]

Singularity (apptainer) container type. To be passed into a RunSettings class initializer or Experiment.create_run_settings.

Note

Singularity integration is currently tested with Apptainer 1.0 with slurm and PBS workload managers only.

Also, note that user-defined bind paths (mount argument) may be disabled by a system administrator

Parameters
  • image (str) – local or remote path to container image, e.g. docker://sylabsio/lolcow

  • args (str | list[str], optional) – arguments to ‘singularity exec’ command

  • mount (str | list[str] | dict[str, str], optional) – paths to mount (bind) from host machine into image.
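
A minimal sketch of running a workload inside a Singularity container, assuming Singularity is importable from smartsim.settings.containers and that the container is passed to Experiment.create_run_settings through a container keyword; the image and mount path are placeholders.

from smartsim import Experiment
from smartsim.settings.containers import Singularity

exp = Experiment("container_exp", launcher="slurm")
container = Singularity("docker://sylabsio/lolcow", mount="/lus/scratch")

# the container is attached to the run settings used by a Model
rs = exp.create_run_settings("python3", exe_args="script.py", container=container)
model = exp.create_model("containerized_model", rs)
exp.start(model)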

Orchestrator

Orchestrator

class Orchestrator(port=6379, interface='lo', launcher='local', run_command='auto', db_nodes=1, batch=False, hosts=None, account=None, time=None, alloc=None, single_cmd=False, **kwargs)[source]

The Orchestrator is an in-memory database that can be launched alongside entities in SmartSim. Data can be transferred between entities by using one of the Python, C, C++ or Fortran clients within an entity.

Initialize an Orchestrator reference for local launch

Parameters
  • port (int, optional) – TCP/IP port, defaults to 6379

  • interface (str, optional) – network interface, defaults to “lo”

Extra configurations for RedisAI

See https://oss.redislabs.com/redisai/configuration/

Parameters
  • threads_per_queue (int, optional) – threads per GPU device

  • inter_op_threads (int, optional) – threads across CPU operations

  • intra_op_threads (int, optional) – threads per CPU operation
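
A minimal sketch of launching a standalone Orchestrator and reading back its address, assuming Orchestrator is importable from smartsim.database; the launcher, interface name, and node count are system dependent and illustrative here.

from smartsim import Experiment
from smartsim.database import Orchestrator

exp = Experiment("db_exp", launcher="slurm")
db = Orchestrator(port=6780, interface="ipogif0", db_nodes=3, launcher="slurm")
exp.start(db)

print(db.get_address())
exp.stop(db)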

property batch
enable_checkpoints(frequency)[source]

Sets the database’s save configuration to save the DB every ‘frequency’ seconds given that at least one write operation against the DB occurred in that time. For example, if frequency is 900, then the database will save to disk after 900 seconds if there is at least 1 change to the dataset.

Parameters

frequency (int) – the given number of seconds before the DB saves

get_address()[source]

Return database addresses

Returns

addresses

Return type

list[str]

Raises

SmartSimError – If database address cannot be found or is not active

property hosts

Return the hostnames of orchestrator instance hosts

Note that this will only be populated after the orchestrator has been launched by SmartSim.

Returns

hostnames

Return type

list[str]

is_active()[source]

Check if the database is active

Returns

True if database is active, False otherwise

Return type

bool

property num_shards

Return the number of DB shards contained in the orchestrator. This might differ from the number of DBNode objects, as each DBNode may start more than one shard (e.g. with MPMD).

Returns

num_shards

Return type

int

remove_stale_files()[source]

Can be used to remove database files of a previous launch

set_batch_arg(arg, value)[source]

Set a batch argument the orchestrator should launch with

Some commonly used arguments such as --job-name are used by SmartSim and will not be allowed to be set.

Parameters
  • arg (str) – batch argument to set e.g. “exclusive”

  • value (str | None) – batch param - set to None if no param value

Raises

SmartSimError – if orchestrator not launching as batch

set_cpus(num_cpus)[source]

Set the number of CPUs available to each database shard

This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.

Parameters

num_cpus (int) – number of cpus to set

set_db_conf(key, value)[source]

Set any valid configuration at runtime without the need to restart the database. All configuration parameters that are set are immediately loaded by the database and will take effect starting with the next command executed.

Parameters
  • key (str) – the configuration parameter

  • value (str) – the database configuration parameter’s new value

set_eviction_strategy(strategy)[source]

Sets how the database will select what to remove when ‘maxmemory’ is reached. The default is noeviction.

Parameters

strategy (str) – The max memory policy to use e.g. “volatile-lru”, “allkeys-lru”, etc.

Raises
  • SmartSimError – If ‘strategy’ is an invalid maxmemory policy

  • SmartSimError – If database is not active

set_hosts(host_list)[source]

Specify the hosts for the Orchestrator to launch on

Parameters

host_list (str, list[str]) – list of host (compute node names)

Raises

TypeError – if wrong type

set_max_clients(clients=50000)[source]

Sets the max number of connected clients at the same time. When the number of DB shards contained in the orchestrator is more than two, then every node will use two connections, one incoming and another outgoing.

Parameters

clients (int, optional) – the maximum number of connected clients

set_max_memory(mem)[source]

Sets the max memory configuration. By default there is no memory limit. Setting max memory to zero also results in no memory limit. Once a limit is surpassed, keys will be removed according to the eviction strategy. The specified memory size is case insensitive and supports the typical forms:

  • 1k => 1000 bytes

  • 1kb => 1024 bytes

  • 1m => 1000000 bytes

  • 1mb => 1024*1024 bytes

  • 1g => 1000000000 bytes

  • 1gb => 1024*1024*1024 bytes

Parameters

mem (str) – the desired max memory size e.g. 3gb

Raises
  • SmartSimError – If ‘mem’ is an invalid memory value

  • SmartSimError – If database is not active

set_max_message_size(size=1073741824)[source]

Sets the database’s memory size limit for bulk requests, which are elements representing single strings. The default is 1 gigabyte. Message size must be greater than or equal to 1mb. The specified memory size should be an integer that represents the number of bytes. For example, to set the max message size to 1gb, use 1024*1024*1024.

Parameters

size (int, optional) – maximum message size in bytes

set_path(new_path)
set_run_arg(arg, value)[source]

Set a run argument the orchestrator should launch each node with (it will be passed to jrun)

Some commonly used arguments are used by SmartSim and will not be allowed to be set. For example, “n”, “N”, etc.

Parameters
  • arg (str) – run argument to set

  • value (str | None) – run parameter - set to None if no parameter value

set_walltime(walltime)[source]

Set the batch walltime of the orchestrator

Note: This will only affect orchestrators launched as a batch

Parameters

walltime (str) – amount of time e.g. 10 hours is 10:00:00

Raises

SmartSimError – if orchestrator isn’t launching as batch

property type

Return the name of the class

Model

Model.__init__(name, params, path, run_settings)

Initialize a Model

Model.attach_generator_files([to_copy, ...])

Attach files to an entity for generation

Model.colocate_db([port, db_cpus, ...])

Colocate an Orchestrator instance with this Model at runtime.

Model.params_to_args()

Convert parameters to command line arguments and update run settings.

Model.register_incoming_entity(incoming_entity)

Register future communication between entities.

Model.enable_key_prefixing()

If called, the entity will prefix its keys with its own model name

Model.disable_key_prefixing()

If called, the entity will not prefix its keys with its own model name

Model.query_key_prefixing()

Inquire as to whether this entity will prefix its keys with its name

class Model(name, params, path, run_settings, params_as_args=None)[source]

Bases: smartsim.entity.entity.SmartSimEntity

Initialize a Model

Parameters
  • name (str) – name of the model

  • params (dict) – model parameters for writing into configuration files or to be passed as command line arguments to executable.

  • path (str) – path to output, error, and configuration files

  • run_settings (RunSettings) – launcher settings specified in the experiment

  • params_as_args (list[str]) – list of parameters which have to be interpreted as command line arguments to be added to run_settings

add_function(name, function=None, device='CPU', devices_per_node=1)[source]

TorchScript function to launch with this Model instance

Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of this Model instance.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

Parameters
  • name (str) – key to store function under

  • function (str, optional) – TorchScript function code

  • device (str, optional) – device for script execution, defaults to “CPU”

  • devices_per_node (int) – number of devices on each host

add_ml_model(name, backend, model=None, model_path=None, device='CPU', devices_per_node=1, batch_size=0, min_batch_size=0, tag='', inputs=None, outputs=None)[source]

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

One of either model (in memory representation) or model_path (file) must be provided

Parameters
  • name (str) – key to store model under

  • model (byte string, optional) – model in memory

  • model_path (str, optional) – path to serialized model file

  • backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)

  • device (str, optional) – name of device for execution, defaults to “CPU”

  • batch_size (int, optional) – batch size for execution, defaults to 0

  • min_batch_size (int, optional) – minimum batch size for model execution, defaults to 0

  • tag (str, optional) – additional tag for model information, defaults to “”

  • inputs (list[str], optional) – model inputs (TF only), defaults to None

  • outputs (list[str], optional) – model outputs (TF only), defaults to None

add_script(name, script=None, script_path=None, device='CPU', devices_per_node=1)[source]

TorchScript to launch with this Model instance

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters
  • name (str) – key to store script under

  • script (str, optional) – TorchScript code

  • script_path (str, optional) – path to TorchScript code

  • device (str, optional) – device for script execution, defaults to “CPU”

  • devices_per_node (int) – number of devices on each host
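
A minimal sketch of attaching a TorchScript string to a Model; the key name and script body are illustrative, and model refers to an entity created earlier with Experiment.create_model.

torch_script = r"""
def normalize(t):
    return (t - t.mean()) / t.std()
"""
model.add_script("normalize", script=torch_script, device="CPU")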

attach_generator_files(to_copy=None, to_symlink=None, to_configure=None)[source]

Attach files to an entity for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters
  • to_copy (list, optional) – files to copy, defaults to []

  • to_symlink (list, optional) – files to symlink, defaults to []

  • to_configure (list, optional) – input files with tagged parameters, defaults to []
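
A minimal sketch of attaching a tagged input file, assuming model and exp were created earlier and that input.conf is a hypothetical template containing tagged fields such as THERMO = ;10;.

# parameters in model.params replace the tagged fields during generation
model.attach_generator_files(to_configure=["./input.conf"])
exp.generate(model, overwrite=True)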

colocate_db(port=6379, db_cpus=1, limit_app_cpus=True, ifname='lo', debug=False, **kwargs)[source]

Colocate an Orchestrator instance with this Model at runtime.

This method will initialize settings which add an unsharded (not connected) database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface or Unix Domain sockets (UDS coming soon).

Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.

ex. kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2  # keydb only
}

Generally these don’t need to be changed.

Parameters
  • port (int, optional) – port to use for orchestrator database, defaults to 6379

  • db_cpus (int, optional) – number of cpus to use for orchestrator, defaults to 1

  • limit_app_cpus (bool, optional) – whether to limit the number of cpus used by the app, defaults to True

  • ifname (str, optional) – interface to use for orchestrator, defaults to “lo”

  • debug (bool, optional) – launch Model with extra debug information about the co-located db

  • kwargs (dict, optional) – additional keyword arguments to pass to the orchestrator database
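
A minimal sketch of colocating a database with a previously created Model, assuming model and exp exist; the interface name is system dependent.

model.colocate_db(port=6780, db_cpus=2, limit_app_cpus=True, ifname="ipogif0")
exp.start(model)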

property colocated

Return True if this Model will run with a colocated Orchestrator

disable_key_prefixing()[source]

If called, the entity will not prefix its keys with its own model name

enable_key_prefixing()[source]

If called, the entity will prefix its keys with its own model name

params_to_args()[source]

Convert parameters to command line arguments and update run settings.

query_key_prefixing()[source]

Inquire as to whether this entity will prefix its keys with its name

register_incoming_entity(incoming_entity)[source]

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Parameters

incoming_entity (SmartSimEntity) – The entity that data will be received from

Raises

SmartSimError – if incoming entity has already been registered

property type

Return the name of the class

Ensemble

Ensemble.__init__(name, params[, ...])

Initialize an Ensemble of Model instances.

Ensemble.models

Ensemble.add_model(model)

Add a model to this ensemble

Ensemble.attach_generator_files([to_copy, ...])

Attach files to each model within the ensemble for generation

Ensemble.register_incoming_entity(...)

Register future communication between entities.

Ensemble.enable_key_prefixing()

If called, all models within this ensemble will prefix their keys with their own model names.

Ensemble.query_key_prefixing()

Inquire as to whether each model within the ensemble will prefix its keys

class Ensemble(name, params, params_as_args=None, batch_settings=None, run_settings=None, perm_strat='all_perm', **kwargs)[source]

Bases: smartsim.entity.entityList.EntityList

Ensemble is a group of Model instances that can be treated as a reference to a single instance.

Initialize an Ensemble of Model instances.

The kwargs argument can be used to pass custom input parameters to the permutation strategy.

Parameters
  • name (str) – name of the ensemble

  • params (dict[str, Any]) – parameters to expand into Model members

  • params_as_args – list of params which should be used as command line arguments to the Model member executables and not written to generator files

  • batch_settings (BatchSettings, optional) – describes settings for Ensemble as batch workload

  • run_settings (RunSettings, optional) – describes how each Model should be executed

  • replicas (int, optional) – number of Model replicas to create - a keyword argument of kwargs

  • perm_strategy (str) – strategy for expanding params into Model instances; options are “all_perm”, “stepped”, “random”, or a callable function. Defaults to “all_perm”.

Returns

Ensemble instance

Return type

Ensemble
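
A minimal sketch of building an Ensemble through Experiment.create_ensemble; the parameter names and file path are placeholders.

from smartsim import Experiment

exp = Experiment("ens_exp", launcher="slurm")
rs = exp.create_run_settings("python3", exe_args="model.py")

# the default "all_perm" strategy expands every combination of params
# into a Model member of the ensemble
ensemble = exp.create_ensemble("training",
                               params={"learning_rate": [0.01, 0.001]},
                               run_settings=rs)
ensemble.attach_generator_files(to_configure=["./model_config.in"])
exp.generate(ensemble)
exp.start(ensemble)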

add_function(name, function=None, device='CPU', devices_per_node=1)[source]

TorchScript function to launch with every entity belonging to this ensemble

Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of every entity belonging to this ensemble.

For converged orchestrators, the add_script() method should be used.

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

Parameters
  • name (str) – key to store function under

  • function (str, optional) – TorchScript function code

  • device (str, optional) – device for script execution, defaults to “CPU”

  • devices_per_node (int) – number of devices on each host

add_ml_model(name, backend, model=None, model_path=None, device='CPU', devices_per_node=1, batch_size=0, min_batch_size=0, tag='', inputs=None, outputs=None)[source]

A TF, TF-lite, PT, or ONNX model to load into the DB at runtime

Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

One of either model (in memory representation) or model_path (file) must be provided

Parameters
  • name (str) – key to store model under

  • model (str | bytes | None) – model in memory

  • model_path (str, optional) – path to serialized model file

  • backend (str) – name of the backend (TORCH, TF, TFLITE, ONNX)

  • device (str, optional) – name of device for execution, defaults to “CPU”

  • batch_size (int, optional) – batch size for execution, defaults to 0

  • min_batch_size (int, optional) – minimum batch size for model execution, defaults to 0

  • tag (str, optional) – additional tag for model information, defaults to “”

  • inputs (list[str], optional) – model inputs (TF only), defaults to None

  • outputs (list[str], optional) – model outputs (TF only), defaults to None

add_model(model)[source]

Add a model to this ensemble

Parameters

model (Model) – model instance to be added

Raises
  • TypeError – if model is not an instance of Model

  • EntityExistsError – if model already exists in this ensemble

add_script(name, script=None, script_path=None, device='CPU', devices_per_node=1)[source]

TorchScript to launch with every entity belonging to this ensemble

Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble

Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.

Setting devices_per_node=N, with N greater than one will result in the model being stored in the first N devices of type device.

One of either script (in memory string representation) or script_path (file) must be provided

Parameters
  • name (str) – key to store script under

  • script (str, optional) – TorchScript code

  • script_path (str, optional) – path to TorchScript code

  • device (str, optional) – device for script execution, defaults to “CPU”

  • devices_per_node (int) – number of devices on each host

attach_generator_files(to_copy=None, to_symlink=None, to_configure=None)[source]

Attach files to each model within the ensemble for generation

Attach files needed for the entity that, upon generation, will be located in the path of the entity.

During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.

Files “to_configure” are text based model input files where parameters for the model are set. Note that only models support the “to_configure” field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;

Parameters
  • to_copy (list, optional) – files to copy, defaults to []

  • to_symlink (list, optional) – files to symlink, defaults to []

  • to_configure (list, optional) – input files with tagged parameters, defaults to []

enable_key_prefixing()[source]

If called, all models within this ensemble will prefix their keys with their own model names.

query_key_prefixing()[source]

Inquire as to whether each model within the ensemble will prefix its keys

Returns

True if all models have key prefixing enabled, False otherwise

Return type

bool

register_incoming_entity(incoming_entity)[source]

Register future communication between entities.

Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity

Only python clients can have multiple incoming connections

Parameters

incoming_entity (SmartSimEntity) – The entity that data will be received from

property type

Return the name of the class

Machine Learning

SmartSim includes built-in utilities for supporting TensorFlow, Keras, and Pytorch.

TensorFlow

SmartSim includes built-in utilities for supporting TensorFlow and Keras in training and inference.

freeze_model(model, output_dir, file_name)

Freeze a Keras or TensorFlow Graph

freeze_model(model, output_dir, file_name)[source]

Freeze a Keras or TensorFlow Graph

To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model_from_file() method.

This utility function provides everything users need to take a trained model and put it inside an orchestrator instance.

Parameters
  • model (tf.Module) – TensorFlow or Keras model

  • output_dir (str) – output dir to save model file to

  • file_name (str) – name of model file to create

Returns

path to model file, model input layer names, model output layer names

Return type

str, list[str], list[str]
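
A minimal sketch connecting freeze_model to a SmartSim Model, assuming freeze_model is importable from smartsim.ml.tf and that model is an entity created earlier with Experiment.create_model; the Keras network is a toy placeholder.

from tensorflow import keras
from smartsim.ml.tf import freeze_model

# toy network, purely illustrative
net = keras.Sequential([keras.layers.Dense(4, input_shape=(2,), name="dense")])

model_path, inputs, outputs = freeze_model(net, ".", "net.pb")

# the frozen graph can then be loaded into the orchestrator at runtime
model.add_ml_model("net", backend="TF", model_path=model_path,
                   device="CPU", inputs=inputs, outputs=outputs)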

serialize_model(model)[source]

Serialize a Keras or TensorFlow Graph

To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model() method.

This utility function provides everything users need to take a trained model and put it inside an orchestrator instance.

Parameters

model (tf.Module) – TensorFlow or Keras model

Returns

serialized model, model input layer names, model output layer names

Return type

str, list[str], list[str]

class StaticDataGenerator(**kwargs)[source]

Bases: smartsim.ml.data.StaticDataDownloader, tensorflow.python.keras.utils.data_utils.Sequence

A class to download a dataset from the DB.

Details about parameters and features of this class can be found in the documentation of StaticDataDownloader, of which it is just a TensorFlow-specialized sub-class.

init_samples(sources=None)

Initialize samples (and targets, if needed).

This function will not return until samples have been downloaded from all sources.

Parameters

sources (list[tuple], optional) – List of sources as defined in init_sources, defaults to None, in which case sources will be initialized, unless self.sources is already set

init_sources()

Initialize list of data sources based on incoming entities and self.sub_indices.

Each source is represented as a tuple (producer_name, sub_index). Before init_samples() is called, the user can modify the list. Once init_samples() is called, all data is downloaded and batches can be obtained with iter(). The list of all sources is stored as self.sources.

Raises
  • ValueError – If self.uploader_info is set to auto but no uploader_name is specified.

  • ValueError – If self.uploader_info is not set to auto or manual.

property need_targets

Compute if targets have to be downloaded.

Returns

Whether targets (or labels) should be downloaded

Return type

bool

on_epoch_end()[source]

Callback called at the end of each training epoch

If self.shuffle is set to True, data is shuffled.

class DynamicDataGenerator(**kwargs)[source]

Bases: smartsim.ml.data.DynamicDataDownloader, smartsim.ml.tf.data.StaticDataGenerator

A class to download batches from the DB.

Details about parameters and features of this class can be found in the documentation of DynamicDataDownloader, of which it is just a TensorFlow-specialized sub-class.

init_samples(sources=None)

Initialize samples (and targets, if needed).

This function will not return until at least one batch worth of data has been downloaded.

Parameters

sources (list[tuple], optional) – List of sources as defined in init_sources, defaults to None, in which case sources will be initialized, unless self.sources is already set

init_sources()

Initialize list of data sources based on incoming entities and self.sub_indices.

Each source is represented as a tuple (producer_name, sub_index). Before init_samples() is called, the user can modify the list. Once init_samples() is called, all data is downloaded and batches can be obtained with iter(). The list of all sources is stored as self.sources.

Raises
  • ValueError – If self.uploader_info is set to auto but no uploader_name is specified.

  • ValueError – If self.uploader_info is not set to auto or manual.

property need_targets

Compute if targets have to be downloaded.

Returns

Whether targets (or labels) should be downloaded

Return type

bool

on_epoch_end()[source]

Callback called at the end of each training epoch

Update data (the DB is queried for new batches) and if self.shuffle is set to True, data is also shuffled.

update_data()

Update data.

Fetch new batches (if available) from the DB. Also shuffle list of samples if self.shuffle is set to True.

PyTorch

SmartSim includes built-in utilities for supporting PyTorch in training and inference.

class StaticDataGenerator(**kwargs)[source]

Bases: smartsim.ml.data.StaticDataDownloader, torch.utils.data.dataset.IterableDataset

A class to download a dataset from the DB.

Details about parameters and features of this class can be found in the documentation of StaticDataDownloader, of which it is just a PyTorch-specialized sub-class.

Note that if the StaticDataGenerator has to be used through a DataLoader, init_samples must be set to False, as sources and samples will be initialized by the DataLoader workers.

init_samples(sources=None)

Initialize samples (and targets, if needed).

This function will not return until samples have been downloaded from all sources.

Parameters

sources (list[tuple], optional) – List of sources as defined in init_sources, defaults to None, in which case sources will be initialized, unless self.sources is already set

init_sources()

Initialize list of data sources based on incoming entities and self.sub_indices.

Each source is represented as a tuple (producer_name, sub_index). Before init_samples() is called, the user can modify the list. Once init_samples() is called, all data is downloaded and batches can be obtained with iter(). The list of all sources is stored as self.sources.

Raises
  • ValueError – If self.uploader_info is set to auto but no uploader_name is specified.

  • ValueError – If self.uploader_info is not set to auto or manual.

property need_targets

Compute if targets have to be downloaded.

Returns

Whether targets (or labels) should be downloaded

Return type

bool

class DynamicDataGenerator(**kwargs)[source]

Bases: smartsim.ml.data.DynamicDataDownloader, smartsim.ml.torch.data.StaticDataGenerator

A class to download batches from the DB.

Details about parameters and features of this class can be found in the documentation of DynamicDataDownloader, of which it is just a PyTorch-specialized sub-class.

Note that if the DynamicDataGenerator has to be used through a DataLoader, init_samples must be set to False, as sources and samples will be initialized by the DataLoader workers.

init_samples(sources=None)

Initialize samples (and targets, if needed).

This function will not return until at least one batch worth of data has been downloaded.

Parameters

sources (list[tuple], optional) – List of sources as defined in init_sources, defaults to None, in which case sources will be initialized, unless self.sources is already set

init_sources()

Initialize list of data sources based on incoming entities and self.sub_indices.

Each source is represented as a tuple (producer_name, sub_index). Before init_samples() is called, the user can modify the list. Once init_samples() is called, all data is downloaded and batches can be obtained with iter(). The list of all sources is stored as self.sources.

Raises
  • ValueError – If self.uploader_info is set to auto but no uploader_name is specified.

  • ValueError – If self.uploader_info is not set to auto or manual.

property need_targets

Compute if targets have to be downloaded.

Returns

Whether targets (or labels) should be downloaded

Return type

bool

update_data()

Update data.

Fetch new batches (if available) from the DB. Also shuffle list of samples if self.shuffle is set to True.

class DataLoader(dataset: smartsim.ml.torch.data.StaticDataGenerator, **kwargs)[source]

Bases: torch.utils.data.dataloader.DataLoader

DataLoader to be used as a wrapper of StaticDataGenerator or DynamicDataGenerator

This is just a sub-class of torch.utils.data.DataLoader which sets up sources of a data generator correctly. DataLoader parameters such as num_workers can be passed at initialization. batch_size should always be set to None.
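
A minimal sketch of wrapping a DynamicDataGenerator with the SmartSim DataLoader, assuming both are importable from smartsim.ml.torch; as noted above, init_samples is set to False so the DataLoader workers initialize sources and samples, batch_size is set to None, and any additional downloader arguments (such as data sources) are omitted here.

from smartsim.ml.torch import DataLoader, DynamicDataGenerator

generator = DynamicDataGenerator(init_samples=False)
loader = DataLoader(generator, batch_size=None, num_workers=2)

for batch in loader:
    pass  # training step goes here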

Slurm

get_allocation([nodes, time, account, options])

Request an allocation

release_allocation(alloc_id)

Free an allocation's resources
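
A minimal sketch of acquiring and releasing a Slurm allocation through the smartsim.slurm module; the node count, walltime, and account are placeholders.

from smartsim import slurm

alloc = slurm.get_allocation(nodes=4, time="01:00:00", account="my_account")

# ... launch SmartSim entities on the allocation ...

slurm.release_allocation(alloc)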

Ray

RayCluster is used to launch a Ray cluster and can be launched as a batch or in an interactive allocation.

class RayCluster(name, path='/usr/local/src/SmartSim/doc', ray_port=6789, ray_args=None, num_nodes=1, run_args=None, batch_args=None, launcher='local', batch=False, time='01:00:00', interface='ipogif0', alloc=None, run_command=None, host_list=None, password='auto', **kwargs)[source]

Bases: smartsim.entity.entityList.EntityList

Entity used to run a Ray cluster on a given number of hosts. One Ray node is launched on each host, and the first host is used to launch the head node.

Parameters
  • name (str) – The name of the entity.

  • path (str) – Path to output, error, and configuration files

  • ray_port (int) – Port at which the head node will be running.

  • ray_args (dict[str,str]) – Arguments to be passed to Ray executable.

  • num_nodes (int) – Number of hosts, includes 1 head node and all worker nodes.

  • run_args (dict[str,str]) – Arguments to pass to launcher to specify details such as partition or time.

  • batch_args (dict[str,str]) – Additional batch arguments passed to launcher when running batch jobs.

  • launcher (str) – Name of launcher to use for starting the cluster.

  • interface (str) – Name of network interface the cluster nodes should bind to.

  • alloc (int) – ID of allocation to run on, if obtained with smartsim.slurm.get_allocation

  • batch (bool) – Whether cluster should be launched as batch file, ignored when launcher is local

  • time (str) – The walltime the cluster will be running for

  • run_command (str) – Specify launch binary, defaults to automatic selection.

  • host_list (str | list[str]) – hosts to launch on, defaults to None. Optional unless launching with OpenMPI.

  • password (str) – Password to use for the Redis server, which is passed as --redis_password to ray start. Can be set to auto (a strong password will be generated internally), a string (which will be used as the password), or None (the default Ray password will be used). Defaults to auto.
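
A minimal sketch of launching a RayCluster, assuming the class is importable from smartsim.exp.ray; the launcher, interface name, and node count are placeholders.

from smartsim import Experiment
from smartsim.exp.ray import RayCluster

exp = Experiment("ray_exp", launcher="slurm")
cluster = RayCluster(name="ray_cluster", num_nodes=3, launcher="slurm",
                     interface="ipogif0", time="01:00:00")
exp.generate(cluster)
exp.start(cluster, block=False)

print(cluster.get_dashboard_address())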

property batch
get_dashboard_address()[source]

Returns dashboard address

The format is <head_ip>:<dashboard_port>

Returns

Dashboard address

Return type

str

get_head_address()[source]

Return address of head node

If address has not been initialized, returns None

Returns

Address of head node

Return type

str

set_hosts(host_list, launcher)[source]

Specify the hosts for the RayCluster to launch on. This is optional, unless run_command is mpirun.

Parameters

host_list (str | list[str]) – list of hosts (compute node names)

Raises

TypeError – if wrong type

set_path(new_path)
property type

Return the name of the class