SmartSim API#
Experiment#
Experiment: Initialize an Experiment instance.
Experiment.start: Start passed instances using the Experiment launcher.
Experiment.stop: Stop specific instances launched by this Experiment.
Experiment.create_ensemble: Create an Ensemble of Model instances.
Experiment.create_model: Create a general purpose Model.
Experiment.create_database: Initialize an Orchestrator database.
Experiment.create_run_settings: Create a RunSettings instance.
Experiment.create_batch_settings: Create a BatchSettings instance.
Experiment.generate: Generate the file structure for an Experiment.
Experiment.poll: Monitor jobs through logging to stdout.
Experiment.finished: Query if a job has completed.
Experiment.get_status: Query the status of launched entity instances.
Experiment.reconnect_orchestrator: Reconnect to a running Orchestrator.
Experiment.preview: Preview entity information prior to launch.
Experiment.summary: Return a summary of the Experiment.
Experiment.telemetry: Return the telemetry configuration for this entity.
- class Experiment(name: str, exp_path: str | None = None, launcher: str = 'local')[source]#
Bases: object
Experiment is a factory class that creates stages of a workflow and manages their execution.
The instances created by an Experiment represent executable code that is either user-specified, like the Model instance created by Experiment.create_model, or pre-configured, like the Orchestrator instance created by Experiment.create_database.
Experiment methods that accept a variable list of arguments, such as Experiment.start or Experiment.stop, accept any number of the instances created by the Experiment.
In general, the Experiment class is designed to be initialized once and utilized throughout runtime.
Initialize an Experiment instance.
With the default settings, the Experiment will use the local launcher, which will start all Experiment created instances on the localhost.
Example of initializing an Experiment with the local launcher
exp = Experiment(name="my_exp", launcher="local")
SmartSim supports multiple launchers, which can also be specified based on the type of system you are running on.
exp = Experiment(name="my_exp", launcher="slurm")
If you want your Experiment driver script to run across multiple systems with different schedulers (workload managers), you can also use the auto argument to have the Experiment detect which launcher to use based on system-installed binaries and libraries.
exp = Experiment(name="my_exp", launcher="auto")
The Experiment path will default to the current working directory. If the Experiment.generate method is called, a directory with the Experiment name will be created to house the output from the Experiment.
- Parameters:
name (str) – name for the Experiment
exp_path (Optional[str], default: None) – path to location of Experiment directory
launcher (str, default: 'local') – type of launcher being used, options are "slurm", "pbs", "lsf", "sge", or "local". If set to "auto", an attempt will be made to find an available launcher on the system.
- create_batch_settings(nodes: int = 1, time: str = '', queue: str = '', account: str = '', batch_args: Dict[str, str] | None = None, **kwargs: Any) smartsim.settings.base.BatchSettings [source]#
Create a BatchSettings instance.
Batch settings parameterize batch workloads. The result of this function can be passed to the Ensemble initialization.
The batch_args parameter can be used to pass in a dictionary of additional batch command arguments that aren't supported through the SmartSim interface.
# i.e. for Slurm
batch_args = {
    "distribution": "block",
    "exclusive": None
}
bs = exp.create_batch_settings(nodes=3,
                               time="10:00:00",
                               batch_args=batch_args)
bs.set_account("default")
- Parameters:
nodes (int, default: 1) – number of nodes for batch job
time (str, default: '') – length of batch job
queue (str, default: '') – queue or partition (if slurm)
account (str, default: '') – user account name for batch system
batch_args (Optional[Dict[str, str]], default: None) – additional batch arguments
- Return type:
BatchSettings
- Returns:
a newly created BatchSettings instance
- Raises:
SmartSimError – if batch creation fails
- create_database(port: int = 6379, path: str | None = None, db_nodes: int = 1, batch: bool = False, hosts: str | List[str] | None = None, run_command: str = 'auto', interface: str | List[str] = 'ipogif0', account: str | None = None, time: str | None = None, queue: str | None = None, single_cmd: bool = True, db_identifier: str = 'orchestrator', **kwargs: Any) smartsim.database.orchestrator.Orchestrator [source]#
Initialize an Orchestrator database.
The Orchestrator database is a key-value store based on Redis that can be launched together with other Experiment-created instances for online data storage.
When launched, the Orchestrator can be used to communicate data between Fortran, Python, C, and C++ applications.
Machine learning models in PyTorch, TensorFlow, and ONNX (i.e. scikit-learn) can also be stored within the Orchestrator database, where they can be called remotely and executed on CPU or GPU where the database is hosted.
To enable a SmartSim Model to communicate with the database, the workload must utilize the SmartRedis clients. For more information on the database and SmartRedis clients, see the documentation at https://www.craylabs.org/docs/smartredis.html
- Parameters:
port (int, default: 6379) – TCP/IP port
db_nodes (int, default: 1) – number of database shards
batch (bool, default: False) – run as a batch workload
hosts (Union[str, List[str], None], default: None) – specify hosts to launch on
run_command (str, default: 'auto') – specify launch binary or detect automatically
interface (Union[str, List[str]], default: 'ipogif0') – network interface
account (Optional[str], default: None) – account to run batch on
time (Optional[str], default: None) – walltime for batch in 'HH:MM:SS' format
queue (Optional[str], default: None) – queue to run the batch on
single_cmd (bool, default: True) – run all shards with one (MPMD) command
db_identifier (str, default: 'orchestrator') – an identifier to distinguish this orchestrator in multiple-database experiments
- Raises:
SmartSimError – if detection of launcher or of run command fails
SmartSimError – if user indicated an incompatible run command for the launcher
- Return type:
Orchestrator
- Returns:
Orchestrator or derived class
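A brief usage sketch follows; the port, interface, account, and walltime values are illustrative assumptions, not values taken from this reference.
exp = Experiment("db_example", launcher="slurm")
db = exp.create_database(port=6780,
                         db_nodes=3,
                         batch=True,
                         time="01:00:00",
                         account="my_account",
                         interface="ipogif0",
                         db_identifier="my_db")
exp.start(db)
# ... applications connect to the database through the SmartRedis clients ...
exp.stop(db)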
- create_ensemble(name: str, params: Dict[str, Any] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, replicas: int | None = None, perm_strategy: str = 'all_perm', path: str | None = None, **kwargs: Any) smartsim.entity.ensemble.Ensemble [source]#
Create an Ensemble of Model instances.
Ensembles can be launched sequentially or as a batch if using a non-local launcher, e.g. slurm.
Ensembles require one of the following combinations of arguments:
run_settings and params
run_settings and replicas
batch_settings
batch_settings, run_settings, and params
batch_settings, run_settings, and replicas
If given solely batch settings, an empty ensemble will be created that Models can be added to manually through Ensemble.add_model(). The entire Ensemble will launch as one batch.
Provided batch and run settings, either params or replicas must be passed, and the entire ensemble will launch as a single batch.
Provided solely run settings, either params or replicas must be passed, and the Ensemble members will each launch sequentially.
The kwargs argument can be used to pass custom input parameters to the permutation strategy.
- Parameters:
name (str) – name of the Ensemble
params (Optional[Dict[str, Any]], default: None) – parameters to expand into Model members
batch_settings (Optional[BatchSettings], default: None) – describes settings for Ensemble as batch workload
run_settings (Optional[RunSettings], default: None) – describes how each Model should be executed
replicas (Optional[int], default: None) – number of replicas to create
perm_strategy (str, default: 'all_perm') – strategy for expanding params into Model instances from the params argument; options are "all_perm", "step", "random", or a callable function.
- Raises:
SmartSimError – if initialization fails
- Return type:
Ensemble
- Returns:
Ensemble instance
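A brief sketch of building an Ensemble from params and run settings; the executable, parameter names, and values are illustrative assumptions.
rs = exp.create_run_settings(exe="python", exe_args="sim.py")
sweep = {"learning_rate": [0.01, 0.001], "epochs": [10, 20]}
# "all_perm" expands the two parameters into 4 Model members
ensemble = exp.create_ensemble("sweep",
                               params=sweep,
                               run_settings=rs,
                               perm_strategy="all_perm")
exp.generate(ensemble)
exp.start(ensemble)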
- create_model(name: str, run_settings: smartsim.settings.base.RunSettings, params: Dict[str, Any] | None = None, path: str | None = None, enable_key_prefixing: bool = False, batch_settings: smartsim.settings.base.BatchSettings | None = None) smartsim.entity.model.Model [source]#
Create a general purpose Model.
The Model class is the most general encapsulation of executable code in SmartSim. Model instances are named references to pieces of a workflow that can be parameterized and executed. Model instances can be launched sequentially, as a batch job, or as a group by adding them into an Ensemble.
All Models require a reference to run settings to specify which executable to launch, as well as provide options for how to launch the executable with the underlying WLM. Furthermore, a reference to batch settings can be added to launch the Model as a batch job through Experiment.start. If a Model with a reference to a set of batch settings is added to a larger entity with its own set of batch settings (e.g. an Ensemble), the batch settings of the larger entity will take precedence and the batch settings of the Model will be strategically ignored.
Parameters supplied in the params argument can be written into configuration files supplied at runtime to the Model through Model.attach_generator_files. params can also be turned into executable arguments by calling Model.params_to_args.
By default, Model instances will be executed in the exp_path/model_name directory if no path argument is supplied. If a Model instance is passed to Experiment.generate, a directory within the Experiment directory will be created to house the input and output files from the Model.
Example initialization of a Model instance
from smartsim import Experiment

run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
model = exp.create_model("pytorch_model", run_settings)

# adding parameters to a model
run_settings = exp.create_run_settings("python", "run_pytorch_model.py")
train_params = {
    "batch": 32,
    "epoch": 10,
    "lr": 0.001
}
model = exp.create_model("pytorch_model", run_settings, params=train_params)
model.attach_generator_files(to_configure="./train.cfg")
exp.generate(model)
New in 0.4.0, Model instances can be colocated with an Orchestrator database shard through Model.colocate_db. This will launch a single Orchestrator instance on each compute host used by the (possibly distributed) application. This is useful for performant online inference or processing at runtime.
New in 0.4.2, Model instances can now be colocated with an Orchestrator database over either TCP or UDS using the Model.colocate_db_tcp or Model.colocate_db_uds method, respectively. The original Model.colocate_db method is now deprecated, but remains as an alias for Model.colocate_db_tcp for backward compatibility.
- Parameters:
name (str) – name of the Model
run_settings (RunSettings) – defines how Model should be run
params (Optional[Dict[str, Any]], default: None) – Model parameters for writing into configuration files
path (Optional[str], default: None) – path to where the Model should be executed at runtime
enable_key_prefixing (bool, default: False) – If True, data sent to the Orchestrator using SmartRedis from this Model will be prefixed with the Model name.
batch_settings (Optional[BatchSettings], default: None) – Settings to run Model individually as a batch job.
- Raises:
SmartSimError – if initialization fails
- Return type:
Model
- Returns:
the created Model
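A brief sketch of the colocation workflow described above; the port and db_cpus values are illustrative assumptions, and Model.colocate_db_tcp is documented in the Model API.
model = exp.create_model("colocated_model", run_settings)
# attach a database shard to every compute host used by the model
model.colocate_db_tcp(port=6780, db_cpus=1)
exp.start(model)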
- create_run_settings(exe: str, exe_args: List[str] | None = None, run_command: str = 'auto', run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, container: smartsim.settings.containers.Container | None = None, **kwargs: Any) smartsim.settings.base.RunSettings [source]#
Create a RunSettings instance.
run_command="auto" will attempt to automatically match a run command on the system with a RunSettings class in SmartSim. If found, the class corresponding to that run_command will be created and returned.
If the local launcher is being used, auto detection will be turned off.
If a recognized run command is passed, the RunSettings instance will be a child class such as SrunSettings.
If the run command is not supported by SmartSim, the base RunSettings class will be created and returned, and the specified run_command and run_args will be evaluated literally.
- Run Commands with implemented helper classes:
aprun (ALPS)
srun (SLURM)
mpirun (OpenMPI)
jsrun (LSF)
- Parameters:
run_command (str, default: 'auto') – command to run the executable
exe (str) – executable to run
exe_args (Optional[List[str]], default: None) – arguments to pass to the executable
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments to pass to the run_command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables to pass to the executable
container (Optional[Container], default: None) – if execution environment is containerized
- Return type:
RunSettings
- Returns:
the created RunSettings
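A brief sketch follows; the srun-style run_args names assume that "auto" resolves to srun on a Slurm system, and the executable and values are illustrative.
rs = exp.create_run_settings(exe="./my_app",
                             exe_args=["--input", "data.json"],
                             run_command="auto",
                             run_args={"nodes": 2, "ntasks": 8},
                             env_vars={"OMP_NUM_THREADS": "4"})
model = exp.create_model("my_app", rs)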
- finished(entity: smartsim.entity.entity.SmartSimEntity) bool [source]#
Query if a job has completed.
An instance of Model or Ensemble can be passed as an argument.
Passing an Orchestrator will return an error, as a database deployment is never finished until stopped by the user.
- Parameters:
entity (SmartSimEntity) – object launched by this Experiment
- Return type:
bool
- Returns:
True if the job has finished, False otherwise
- Raises:
SmartSimError – if entity has not been launched by this Experiment
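A brief sketch of a manual wait loop using finished; the five second sleep is an arbitrary choice.
import time

exp.start(model, block=False)
while not exp.finished(model):
    time.sleep(5)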
- generate(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity], tag: str | None = None, overwrite: bool = False, verbose: bool = False) None [source]#
Generate the file structure for an Experiment.
Experiment.generate creates directories for each entity passed to organize Experiments that launch many entities.
If files or directories are attached to Model objects using Model.attach_generator_files(), those files or directories will be symlinked, copied, or configured and written into the created directory for that instance.
Instances of Model, Ensemble, and Orchestrator can all be passed as arguments to the generate method.
- Parameters:
tag (Optional[str], default: None) – tag used in to_configure generator files
overwrite (bool, default: False) – overwrite existing folders and contents
verbose (bool, default: False) – log parameter settings to stdout
- Return type:
None
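A brief sketch; the configuration file path is an illustrative assumption.
model = exp.create_model("my_model", run_settings)
model.attach_generator_files(to_configure="./in.cfg")
# creates exp_path/my_model and writes the configured file into it
exp.generate(model, overwrite=True)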
- get_status(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity]) List[smartsim.status.SmartSimStatus] [source]#
Query the status of launched entity instances
Return a smartsim.status string representing the status of the launched instance.
exp.get_status(model)
As with any Experiment method, multiple instances of varying types can be passed, and all statuses will be returned at once.
statuses = exp.get_status(model, ensemble, orchestrator)
complete = [s == smartsim.status.STATUS_COMPLETED for s in statuses]
assert all(complete)
- Return type:
List[SmartSimStatus]
- Returns:
status of the instances passed as arguments
- Raises:
SmartSimError – if status retrieval fails
- poll(interval: int = 10, verbose: bool = True, kill_on_interrupt: bool = True) None [source]#
Monitor jobs through logging to stdout.
This method should only be used if jobs were launched with Experiment.start(block=False).
The interval specified controls how often the logging is performed, not how often the polling occurs. By default, internal polling is set to every second for local launcher jobs and every 10 seconds for all other launchers.
If internal polling needs to be slower or faster based on system or site standards, set the SMARTSIM_JM_INTERVAL environment variable to control the internal polling interval for SmartSim.
For more verbose logging output, the SMARTSIM_LOG_LEVEL environment variable can be set to debug.
If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.
- Parameters:
interval (int, default: 10) – frequency (in seconds) of logging to stdout
verbose (bool, default: True) – set verbosity
kill_on_interrupt (bool, default: True) – flag for killing jobs when SIGINT is received
- Raises:
SmartSimError – if poll request fails
- Return type:
None
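A brief sketch of a non-blocking launch followed by polling; the 30 second interval is an arbitrary choice.
exp.start(model, ensemble, block=False)
# log statuses every 30 seconds until all non-database jobs finish
exp.poll(interval=30, verbose=True, kill_on_interrupt=True)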
- preview(*args: Any, verbosity_level: smartsim._core.control.previewrenderer.Verbosity = Verbosity.INFO, output_format: smartsim._core.control.previewrenderer.Format = Format.PLAINTEXT, output_filename: str | None = None) None [source]#
Preview entity information prior to launch. This method aggregates multiple pieces of information to give users insight into what and how entities will be launched. Any instance of Model, Ensemble, or Orchestrator created by the Experiment can be passed as an argument to the preview method.
- Verbosity levels:
info: Display user-defined fields and entities.
debug: Display user-defined fields and entities, and auto-generated fields.
developer: Display user-defined fields and entities, auto-generated fields, and run commands.
- Parameters:
verbosity_level (Verbosity, default: Verbosity.INFO) – verbosity level specified by user, defaults to info.
output_format (Format, default: Format.PLAINTEXT) – Set output format. The only accepted output format is plain_text. Defaults to plain_text.
output_filename (Optional[str], default: None) – Specify name of file and extension to write preview data to. If no output filename is set, the preview will be output to stdout. Defaults to None.
- Return type:
None
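A brief sketch; the output filename is an illustrative assumption.
# write a plain text preview of the entities to a file instead of stdout
exp.preview(model, db, output_filename="preview.txt")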
- reconnect_orchestrator(checkpoint: str) smartsim.database.orchestrator.Orchestrator [source]#
Reconnect to a running Orchestrator.
This method can be used to connect to an Orchestrator deployment that was launched by a previous Experiment. This can be helpful in the case where separate runs of an Experiment wish to use the same Orchestrator instance currently running on a system.
- Parameters:
checkpoint (str) – the smartsim_db.dat file created when an Orchestrator is launched
- Return type:
Orchestrator
- start(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity], block: bool = True, summary: bool = False, kill_on_interrupt: bool = True) None [source]#
Start passed instances using Experiment launcher
Any Model, Ensemble, or Orchestrator instance created by the Experiment can be passed as an argument to the start method.
exp = Experiment(name="my_exp", launcher="slurm")
settings = exp.create_run_settings(exe="./path/to/binary")
model = exp.create_model("my_model", settings)
exp.start(model)
Multiple entity instances can also be passed to the start method at once no matter which type of instance they are. These will all be launched together.
exp.start(model_1, model_2, db, ensemble, block=True)

# alternatively
stage_1 = [model_1, model_2, db, ensemble]
exp.start(*stage_1, block=True)
If block==True, the Experiment will poll the launched instances at runtime until all non-database jobs have completed. Database jobs must be killed by the user by passing them to Experiment.stop. This allows for multiple stages of a workflow to produce to and consume from the same Orchestrator database.
If kill_on_interrupt=True, then all jobs launched by this experiment are guaranteed to be killed when ^C (SIGINT) signal is received. If kill_on_interrupt=False, then it is not guaranteed that all jobs launched by this experiment will be killed, and the zombie processes will need to be manually killed.
- Parameters:
block (bool, default: True) – block execution until all non-database jobs are finished
summary (bool, default: False) – print a launch summary prior to launch
kill_on_interrupt (bool, default: True) – flag for killing jobs when ^C (SIGINT) signal is received.
- Return type:
None
- stop(*args: smartsim.entity.entity.SmartSimEntity | smartsim.entity.entityList.EntitySequence[smartsim.entity.entity.SmartSimEntity]) None [source]#
Stop specific instances launched by this Experiment.
Instances of Model, Ensemble, and Orchestrator can all be passed as arguments to the stop method.
Whichever launcher was specified at Experiment initialization will be used to stop the instance. For example, when using the slurm launcher, this equates to running scancel on the instance.
Example
exp.stop(model)

# multiple
exp.stop(model_1, model_2, db, ensemble)
- Parameters:
args (Union[SmartSimEntity, EntitySequence[SmartSimEntity]]) – One or more SmartSimEntity or EntitySequence objects.
- Raises:
TypeError – if wrong type
SmartSimError – if stop request fails
- Return type:
None
- summary(style: str = 'github') str [source]#
Return a summary of the Experiment.
The summary will show each instance that has been launched and completed in this Experiment.
- Parameters:
style (str, default: 'github') – the style in which the summary table is formatted; for a full list of styles, see the table-format section of astanin/python-tabulate
- Return type:
str
- Returns:
tabulate string of Experiment history
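A brief sketch of printing the summary after a blocking run.
exp.start(model, block=True)
print(exp.summary(style="github"))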
- property telemetry: TelemetryConfiguration#
Return the telemetry configuration for this entity.
- Returns:
configuration of telemetry for this entity
Settings#
Settings are provided to Model and Ensemble objects to provide parameters for how a job should be executed. Some are specifically meant for certain launchers; for example, SbatchSettings is solely meant for systems using Slurm as a workload manager. MpirunSettings for OpenMPI based jobs is supported by Slurm and PBSPro.
Types of Settings:
RunSettings: Run parameters for a Model
SrunSettings: Initialize run parameters for a slurm job with srun
AprunSettings: Settings to run job with aprun command
MpirunSettings: Settings to run job with mpirun command
MpiexecSettings: Settings to run job with mpiexec command
OrterunSettings: Settings to run job with orterun command
JsrunSettings: Settings to run job with jsrun command
DragonRunSettings: Initialize run parameters for a Dragon process
SbatchSettings: Specify run parameters for a Slurm batch job
QsubBatchSettings: Specify qsub batch parameters for a job
BsubBatchSettings: Specify bsub batch parameters for a job
Settings objects can accept a container object that defines a container runtime, image, and arguments to use for the workload. Below is a list of supported container runtimes.
Types of Containers:
Singularity: Singularity (apptainer) container type.
RunSettings#
When running SmartSim on laptops and single node workstations, the base RunSettings object is used to parameterize jobs. RunSettings include a run_command parameter for local launches that utilize a parallel launch binary like mpirun, mpiexec, and others.
RunSettings.add_exe_args: Add executable arguments to executable
RunSettings.update_env: Update the job environment variables
- class RunSettings(exe: str, exe_args: str | List[str] | None = None, run_command: str = '', run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, container: smartsim.settings.containers.Container | None = None, **_kwargs: Any) None [source]#
Run parameters for a Model.
The base RunSettings class should only be used with the local launcher on single node workstations or laptops.
If no run_command is specified, the executable will be launched locally.
run_args passed as a dict will be interpreted literally for local RunSettings and added directly to the run_command, e.g. run_args = {"-np": 2} will be "-np 2"
Example initialization
rs = RunSettings("echo", "hello", "mpirun", run_args={"-np": "2"})
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_command (str, default: '') – launch binary (e.g. "srun")
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command (e.g. -np for mpiexec)
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
container (Optional[Container], default: None) – container type for workload (e.g. "singularity")
- add_exe_args(args: str | List[str]) None [source]#
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] [source]#
Build environment variable string
- Return type:
List
[str
]- Returns:
formatted list of strings to export variables
- format_run_args() List[str] [source]#
Return formatted run arguments
For
RunSettings
, the run arguments are passed literally with no formatting.- Return type:
List
[str
]- Returns:
list run arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make job an MPMD job
- Parameters:
settings (
RunSettings
) –RunSettings
instance- Return type:
None
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None [source]#
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python")
rs.set("an-arg", "a-val")
rs.set("a-flag")
rs.format_run_args()  # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (str) – name of the argument
value (Optional[str], default: None) – value of the argument
condition – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None [source]#
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None [source]#
Copy executable file to allocated compute nodes
- Parameters:
dest_path (
Optional
[str
], default:None
) – Path to copy an executable file- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus per task
- Parameters:
cpus_per_task (
int
) – number of cpus per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Return type:
None
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to specify the hostlist for this job
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None [source]#
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None [source]#
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks to launch
- Parameters:
tasks (
int
) – number of tasks to launch- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None [source]#
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the formatted walltime
- Parameters:
walltime (
str
) – Time in format required by launcher
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None [source]#
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the add_exe_args() method. For example, --export=ALL for slurm, or -V for PBS/aprun.
- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
SrunSettings#
SrunSettings
can be used for running on existing allocations,
running jobs in interactive allocations, and for adding srun
steps to a batch.
SrunSettings.set_nodes: Set the number of nodes
SrunSettings.set_node_feature: Specify the node feature for this job
SrunSettings.set_tasks: Set the number of tasks for this job
SrunSettings.set_tasks_per_node: Set the number of tasks per node for this job
SrunSettings.set_walltime: Set the walltime of the job
SrunSettings.set_hostlist: Specify the hostlist for this job
SrunSettings.set_excluded_hosts: Specify a list of hosts to exclude for launching this job
SrunSettings.set_cpus_per_task: Set the number of cpus to use per task
SrunSettings.add_exe_args: Add executable arguments to executable
SrunSettings.format_run_args: Return a list of slurm formatted run arguments
SrunSettings.format_env_vars: Build bash compatible environment variable string for Slurm
SrunSettings.format_comma_sep_env_vars: Build environment variable string for Slurm
SrunSettings.update_env: Update the job environment variables
- class SrunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, alloc: str | None = None, **kwargs: Any) None [source]#
Initialize run parameters for a slurm job with srun.
SrunSettings should only be used on Slurm based systems.
If an allocation is specified, the instance receiving these run parameters will launch on that allocation.
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – srun arguments without dashes
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job
alloc (Optional[str], default: None) – allocation ID if running on existing alloc
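A brief sketch; the executable, allocation ID, and node/task counts are illustrative assumptions, and exp is an Experiment created as in the examples above.
srun = SrunSettings(exe="./my_mpi_app", alloc="123456")
srun.set_nodes(2)
srun.set_tasks(32)
srun.set_walltime("00:10:00")
model = exp.create_model("mpi_app", srun)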
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- check_env_vars() None [source]#
Warn a user trying to set a variable which is set in the environment
Given Slurm’s env var precedence, trying to export a variable which is already present in the environment will not work.
- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_comma_sep_env_vars() Tuple[str, List[str]] [source]#
Build environment variable string for Slurm
Slurm takes exports in comma separated lists. The list starts with "all" so as not to disturb the rest of the environment. For more information on this, see the Slurm documentation for srun.
- Return type:
Tuple
[str
,List
[str
]]- Returns:
the formatted string of environment variables
- format_env_vars() List[str] [source]#
Build bash compatible environment variable string for Slurm
- Return type:
List
[str
]- Returns:
the formatted string of environment variables
- format_run_args() List[str] [source]#
Return a list of slurm formatted run arguments
- Return type:
List
[str
]- Returns:
list of slurm arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make an MPMD workload by combining two srun commands.
This connects the two settings to be executed with a single Model instance.
- Parameters:
settings (
RunSettings
) – SrunSettings instance- Return type:
None
- reserved_run_args: set[str] = {'D', 'chdir'}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket rs = SrunSettings("echo", "hello") rs.set_tasks(1) rs.set("exclusive") # Only set this argument if condition param evals True # Otherwise log and NOP rs.set("partition", "debug", condition=socket.gethostname()=="testing-system") rs.format_run_args() # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system" # otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None [source]#
Copy executable file to allocated compute nodes
This sets
--bcast
- Parameters:
dest_path (
Optional
[str
], default:None
) – Path to copy an executable file- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Bind by setting CPU masks on tasks
This sets
--cpu-bind
using themap_cpu:<list>
option- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets
--cpus-per-task
- Parameters:
cpus_per_task – number of cpus to use per task
- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Raises:
TypeError –
- Return type:
None
- set_het_group(het_group: Iterable[int]) None [source]#
Set the heterogeneous group for this job
This sets --het-group
- Parameters:
het_group (
Iterable
[int
]) – list of heterogeneous groups- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
This sets
--nodelist
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to set the node list
This sets
--nodefile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Specify the real memory required per node
This sets
--mem
in megabytes- Parameters:
memory_per_node (
int
) – Amount of memory per node in megabytes- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
This sets
-C
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
Effectively this is setting:
srun --nodes <num_nodes>
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks for this job
This sets
--ntasks
- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks for this job
This sets
--ntasks-per-node
- Parameters:
tasks_per_node (
int
) – number of tasks per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the walltime of the job
format = “HH:MM:SS”
- Parameters:
walltime (
str
) – wall time- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
AprunSettings#
AprunSettings can be used on any system that supports the Cray ALPS layer. SmartSim supports using AprunSettings on PBSPro WLM systems.
AprunSettings can be used in interactive sessions (on allocation) and within batch launches (e.g., QsubBatchSettings).
AprunSettings.set_cpus_per_task: Set the number of cpus to use per task
AprunSettings.set_hostlist: Specify the hostlist for this job
AprunSettings.set_tasks: Set the number of tasks for this job
AprunSettings.set_tasks_per_node: Set the number of tasks per node for this job
AprunSettings.make_mpmd: Make job an MPMD job
AprunSettings.add_exe_args: Add executable arguments to executable
AprunSettings.format_run_args: Return a list of ALPS formatted run arguments
AprunSettings.format_env_vars: Format the environment variables for aprun
AprunSettings.update_env: Update the job environment variables
- class AprunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any)[source]#
Settings to run job with aprun command.
AprunSettings can be used for the pbs launcher.
- Parameters:
exe (str) – executable
exe_args (Union[str, List[str], None], default: None) – executable arguments
run_args (Optional[Dict[str, Union[int, str, float, None]]], default: None) – arguments for run command
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment vars to launch job with
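A brief sketch; the executable and task counts are illustrative assumptions, and exp is an Experiment created as in the earlier examples.
aprun = AprunSettings(exe="./my_app", exe_args=["--steps", "100"])
aprun.set_tasks(64)
aprun.set_tasks_per_node(32)
model = exp.create_model("alps_app", aprun)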
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] [source]#
Format the environment variables for aprun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] [source]#
Return a list of ALPS formatted run arguments
- Return type:
List
[str
]- Returns:
list of ALPS arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make job an MPMD job
This method combines two
AprunSettings
into a single MPMD command joined with ‘:’- Parameters:
settings (
RunSettings
) –AprunSettings
instance- Return type:
None
- reserved_run_args: set[str] = {}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket rs = SrunSettings("echo", "hello") rs.set_tasks(1) rs.set("exclusive") # Only set this argument if condition param evals True # Otherwise log and NOP rs.set("partition", "debug", condition=socket.gethostname()=="testing-system") rs.format_run_args() # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system" # otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy executable file to allocated compute nodes
- Parameters:
dest_path (
Optional
[str
], default:None
) – Path to copy an executable file- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None [source]#
Specifies the cores to which MPI processes are bound
This sets
--cpu-binding
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List of cpu numbers- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets
--cpus-per-pe
- Parameters:
cpus_per_task (
int
) – number of cpus to use per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None [source]#
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None [source]#
Use the contents of a file to set the node list
This sets
--node-list-file
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Specify the real memory required per node
This sets
--memory-per-pe
in megabytes- Parameters:
memory_per_node (
int
) – Per PE memory limit in megabytes- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None [source]#
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks for this job
This sets
--pes
- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks for this job
This sets
--pes-per-node
- Parameters:
tasks_per_node (
int
) – number of tasks per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None [source]#
Set the job to run in verbose mode
This sets
--debug
arg to the highest level- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the walltime of the job
Walltime is given in total number of seconds
- Parameters:
walltime (
str
) – wall time- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
DragonRunSettings#
DragonRunSettings can be used on systems that support Slurm or PBS, if Dragon is available in the Python environment (see _dragon_install for instructions on how to install it through smart).
DragonRunSettings can be used in interactive sessions (on allocation) and within batch launches (i.e. SbatchSettings or QsubBatchSettings, for Slurm and PBS sessions, respectively).
DragonRunSettings.set_nodes: Set the number of nodes
DragonRunSettings.set_tasks_per_node: Set the number of tasks for this job
- class DragonRunSettings(exe: str, exe_args: str | List[str] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Initialize run parameters for a Dragon process
DragonRunSettings should only be used on systems where Dragon is available and installed in the current environment.
If an allocation is specified, the instance receiving these run parameters will launch on that allocation.
- Parameters:
exe (str) – executable to run
exe_args (Union[str, List[str], None], default: None) – executable arguments, defaults to None
env_vars (Optional[Dict[str, Optional[str]]], default: None) – environment variables for job, defaults to None
alloc – allocation ID if running on existing alloc, defaults to None
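A brief sketch; the executable and counts are illustrative assumptions, and exp is an Experiment created as in the earlier examples.
drun = DragonRunSettings(exe="./my_app")
drun.set_nodes(2)
drun.set_tasks_per_node(8)
model = exp.create_model("dragon_app", drun)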
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Build environment variable string
- Return type:
List
[str
]- Returns:
formatted list of strings to export variables
- format_run_args() List[str] #
Return formatted run arguments
For
RunSettings
, the run arguments are passed literally with no formatting.- Return type:
List
[str
]- Returns:
list run arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make job an MPMD job
- Parameters:
settings (
RunSettings
) –RunSettings
instance- Return type:
None
- reserved_run_args: set[str] = {}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
Conditional expressions may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket rs = SrunSettings("echo", "hello") rs.set_tasks(1) rs.set("exclusive") # Only set this argument if condition param evals True # Otherwise log and NOP rs.set("partition", "debug", condition=socket.gethostname()=="testing-system") rs.format_run_args() # returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system" # otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy executable file to allocated compute nodes
- Parameters:
dest_path (
Optional
[str
], default:None
) – Path to copy an executable file- Return type:
None
- set_cpu_affinity(devices: List[int]) None [source]#
Set the CPU affinity for this job
- Parameters:
devices (
List
[int
]) – list of CPU indices to execute on- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound
- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of cpus per task
- Parameters:
cpus_per_task (
int
) – number of cpus per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_gpu_affinity(devices: List[int]) None [source]#
Set the GPU affinity for this job
- Parameters:
devices (
List
[int
]) – list of GPU indices to execute on.- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to specify the hostlist for this job
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None [source]#
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – a collection of strings representing the required node features. Currently supported node features are: “gpu”- Return type:
None
- set_nodes(nodes: int) None [source]#
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks to launch
- Parameters:
tasks (
int
) – number of tasks to launch- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks for this job
- Parameters:
tasks_per_node (
int
) – number of tasks per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the formatted walltime
- Parameters:
walltime (
str
) – Time in format required by launcher
- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
JsrunSettings#
JsrunSettings
can be used on any system that supports the
IBM LSF launcher.
JsrunSettings can be used in an interactive session (on an allocation) and within batch launches (i.e. with BsubBatchSettings).
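A brief, illustrative sketch of configuring resource sets with the setters documented below (the executable name and the specific values are placeholders):
from smartsim.settings import JsrunSettings

jrs = JsrunSettings(exe="./my_app")   # placeholder executable
jrs.set_num_rs(4)                     # --nrs 4
jrs.set_cpus_per_rs(21)               # --cpu_per_rs 21
jrs.set_gpus_per_rs(3)                # --gpu_per_rs 3
jrs.set_tasks_per_rs(1)               # --tasks_per_rs 1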
|
Set the number of resource sets to use |
|
Set the number of cpus to use per resource set |
|
Set the number of gpus to use per resource set |
|
Set the number of resource sets to use per host |
|
Set the number of tasks for this job |
|
Set the number of tasks per resource set |
|
Set binding |
|
Make step an MPMD (or SPMD) job. |
|
Set preamble used in ERF file. |
|
Update the job environment variables |
|
Set resource sets used for ERF (SPMD or MPMD) steps. |
Format environment variables. |
|
Return a list of LSF formatted run arguments |
- class JsrunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **_kwargs: Any) None [source]#
Settings to run job with
jsrun
commandJsrunSettings
should only be used on LSF-based systems.- Parameters:
exe (
str
) – executableexe_args (
Union
[str
,List
[str
],None
], default:None
) – executable argumentsrun_args (
Optional
[Dict
[str
,Union
[int
,str
,float
,None
]]], default:None
) – arguments for run commandenv_vars (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] [source]#
Format environment variables. Each variable needs to be passed with
--env
. If a variable is set toNone
, its value is propagated from the current environment.- Return type:
List
[str
]- Returns:
formatted list of strings to export variables
- format_run_args() List[str] [source]#
Return a list of LSF formatted run arguments
- Return type:
List
[str
]- Returns:
list of LSF arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None [source]#
Make step an MPMD (or SPMD) job.
This method will activate job execution through an ERF file.
Optionally, this method adds an instance of
JsrunSettings
to the list of settings to be launched in the same ERF file.- Parameters:
settings (
RunSettings
) –JsrunSettings
instance- Return type:
None
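For illustration, a small sketch combining two JsrunSettings instances into one MPMD launch (executable names are placeholders):
from smartsim.settings import JsrunSettings

app_a = JsrunSettings("exe_a")   # placeholder executables
app_a.set_tasks(2)
app_b = JsrunSettings("exe_b")
app_b.set_tasks(4)
app_a.make_mpmd(app_b)           # both steps are described in a single ERF file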
- reserved_run_args: set[str] = {'chdir', 'h'}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None [source]#
Set binding
This sets
--bind
- Parameters:
binding (
str
) – Binding, e.g. packed:21- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy executable file to allocated compute nodes
- Parameters:
dest_path (
Optional
[str
], default:None
) – Path to copy an executable file- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound- Return type:
None
- set_cpus_per_rs(cpus_per_rs: int) None [source]#
Set the number of cpus to use per resource set
This sets
--cpu_per_rs
- Parameters:
cpus_per_rs (
int
) – number of cpus to use per resource set or ALL_CPUS- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus per task.
This function is an alias for set_cpus_per_rs.
- Parameters:
cpus_per_task (
int
) – number of cpus per resource set- Return type:
None
- set_erf_sets(erf_sets: Dict[str, str]) None [source]#
Set resource sets used for ERF (SPMD or MPMD) steps.
erf_sets
is a dictionary used to fill the ERF line representing these settings, e.g. {“host”: “1”, “cpu”: “{0:21}, {21:21}”, “gpu”: “*”} can be used to specify rank (or rank_count), hosts, cpus, gpus, and memory. The key rank is used to give specific ranks, as in {“rank”: “1, 2, 5”}, while the key rank_count is used to specify the count only, as in {“rank_count”: “3”}. If both are specified, only rank is used.- Parameters:
erf_sets – dictionary of resources
- Return type:
None
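For illustration, using the dictionary form described above (all values are placeholders):
jrs = JsrunSettings("./my_app")   # placeholder executable
jrs.set_erf_sets({
    "rank_count": "2",
    "host": "1",
    "cpu": "{0:21},{21:21}",
    "gpu": "*",
})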
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_gpus_per_rs(gpus_per_rs: int) None [source]#
Set the number of gpus to use per resource set
This sets
--gpu_per_rs
- Parameters:
gpus_per_rs (
int
) – number of gpus to use per resource set or ALL_GPUS- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to specify the hostlist for this job
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_individual_output(suffix: str | None = None) None [source]#
Set individual std output.
This sets
--stdio_mode individual
and inserts the suffix into the output name. The resulting output name will beself.name + suffix + .out
.- Parameters:
suffix (
Optional
[str
], default:None
) – Optional suffix to add to output file names, it can contain %j, %h, %p, or %t, as specified by jsrun options.- Return type:
None
- set_memory_per_node(memory_per_node: int) None [source]#
Specify the number of megabytes of memory to assign to a resource set
Alias for set_memory_per_rs.
- Parameters:
memory_per_node (
int
) – Number of megabytes per rs- Return type:
None
- set_memory_per_rs(memory_per_rs: int) None [source]#
Specify the number of megabytes of memory to assign to a resource set
This sets
--memory_per_rs
- Parameters:
memory_per_rs (
int
) – Number of megabytes per rs- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None [source]#
Set preamble used in ERF file. Typical lines include oversubscribe-cpu : allow or overlapping-rs : allow. Can be used to set launch_distribution. If it is not present, it will be inferred from the settings, or set to packed by default.
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of the ERF file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_num_rs(num_rs: str | int) None [source]#
Set the number of resource sets to use
This sets
--nrs
.- Parameters:
num_rs (
Union
[str
,int
]) – Number of resource sets or ALL_HOSTS- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_rs_per_host(rs_per_host: int) None [source]#
Set the number of resource sets to use per host
This sets
--rs_per_host
- Parameters:
rs_per_host (
int
) – number of resource sets to use per host- Return type:
None
- set_task_map(task_mapping: str) None #
Set a task mapping
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None [source]#
Set the number of tasks for this job
This sets
--np
- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None [source]#
Set the number of tasks per resource set.
This function is an alias for set_tasks_per_rs.
- Parameters:
tasks_per_node (
int
) – number of tasks per resource set- Return type:
None
- set_tasks_per_rs(tasks_per_rs: int) None [source]#
Set the number of tasks per resource set
This sets
--tasks_per_rs
- Parameters:
tasks_per_rs (
int
) – number of tasks per resource set- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the formatted walltime
- Parameters:
walltime (
str
) – Time in format required by launcher``- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
MpirunSettings#
MpirunSettings
are for launching with OpenMPI. MpirunSettings
are
supported on Slurm and PBSpro.
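As an illustrative sketch (the executable and host names are placeholders; the oversubscribe flag is only an example of a valueless run argument):
from smartsim.settings import MpirunSettings

mpirun = MpirunSettings(exe="./my_app", run_args={"oversubscribe": None})
mpirun.set_tasks(8)                          # -n 8
mpirun.set_hostlist(["node001", "node002"])  # --host node001,node002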
|
Set the number of tasks for this job |
|
Set the hostlist for the |
|
Set the number of tasks for this job |
|
Set |
|
Make a mpmd workload by combining two |
Add executable arguments to executable |
|
Return a list of MPI-standard formatted run arguments |
|
Format the environment variables for mpirun |
|
|
Update the job environment variables |
- class MpirunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with
mpirun
command (MPI-standard)Note that environment variables can be passed with a None value to signify that they should be exported from the current environment
Any arguments passed in the
run_args
dict will be converted intompirun
arguments and prefixed with--
. Values of None can be provided for arguments that do not have values.- Parameters:
exe (
str
) – executableexe_args (
Union
[str
,List
[str
],None
], default:None
) – executable argumentsrun_args (
Optional
[Dict
[str
,Union
[int
,str
,float
,None
]]], default:None
) – arguments for run commandenv_vars (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two mpirun commands. This connects the two settings to be executed with a single Model instance.
- Parameters:
settings (
RunSettings
) – MpirunSettings instance- Return type:
None
- reserved_run_args: set[str] = {'wd', 'wdir'}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets
--preload-binary
- Parameters:
dest_path (
Optional
[str
], default:None
) – Destination path (Ignored)- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets
--bind-to
for MPI compliant implementations- Parameters:
bind_type (
str
) – binding type- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of cpus per task for this job
This sets
--cpus-per-proc
for MPI compliant implementationsnote: this option has been deprecated in openMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (
int
) – number of cpus per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the
mpirun
commandThis sets
--host
- Parameters:
host_list (
Union
[str
,List
[str
]]) – list of host names- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets
--hostfile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set
mpirun
task mappingthis sets
--map-by <mapping>
For examples, see the man page for
mpirun
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets
-n
for MPI compliant implementations- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets
--timeout
- Parameters:
walltime (
str
) – number-like string of seconds that the job will run- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
MpiexecSettings#
MpiexecSettings
are for launching with OpenMPI’s mpiexec
. MpiexecSettings are supported on Slurm and PBSpro.
|
Set the number of tasks for this job |
|
Set the hostlist for the |
|
Set the number of tasks for this job |
|
Set |
|
Make a mpmd workload by combining two |
Add executable arguments to executable |
|
Return a list of MPI-standard formatted run arguments |
|
Format the environment variables for mpirun |
|
|
Update the job environment variables |
- class MpiexecSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with
mpiexec
command (MPI-standard)Note that environment variables can be passed with a None value to signify that they should be exported from the current environment
Any arguments passed in the
run_args
dict will be converted intompiexec
arguments and prefixed with--
. Values of None can be provided for arguments that do not have values.- Parameters:
exe (
str
) – executableexe_args (
Union
[str
,List
[str
],None
], default:None
) – executable argumentsrun_args (
Optional
[Dict
[str
,Union
[int
,str
,float
,None
]]], default:None
) – arguments for run commandenv_vars (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two mpiexec commands. This connects the two settings to be executed with a single Model instance.
- Parameters:
settings (
RunSettings
) – MpiexecSettings instance- Return type:
None
- reserved_run_args: set[str] = {'wd', 'wdir'}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets
--preload-binary
- Parameters:
dest_path (
Optional
[str
], default:None
) – Destination path (Ignored)- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets
--bind-to
for MPI compliant implementations- Parameters:
bind_type (
str
) – binding type- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of cpus per task for this job
This sets
--cpus-per-proc
for MPI compliant implementationsnote: this option has been deprecated in openMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (
int
) – number of cpus per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the
mpirun
commandThis sets
--host
- Parameters:
host_list (
Union
[str
,List
[str
]]) – list of host names- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets
--hostfile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set
mpirun
task mappingthis sets
--map-by <mapping>
For examples, see the man page for
mpirun
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets
-n
for MPI compliant implementations- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets
--timeout
- Parameters:
walltime (
str
) – number-like string of seconds that the job will run- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
OrterunSettings#
OrterunSettings
are for launching with OpenMPI’s orterun
. OrterunSettings
are
supported on Slurm and PBSpro.
|
Set the number of tasks for this job |
|
Set the hostlist for the |
|
Set the number of tasks for this job |
|
Set |
|
Make a mpmd workload by combining two |
Add executable arguments to executable |
|
Return a list of MPI-standard formatted run arguments |
|
Format the environment variables for mpirun |
|
|
Update the job environment variables |
- class OrterunSettings(exe: str, exe_args: str | List[str] | None = None, run_args: Dict[str, int | str | float | None] | None = None, env_vars: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Settings to run job with
orterun
command (MPI-standard)Note that environment variables can be passed with a None value to signify that they should be exported from the current environment
Any arguments passed in the
run_args
dict will be converted intoorterun
arguments and prefixed with--
. Values of None can be provided for arguments that do not have values.- Parameters:
exe (
str
) – executableexe_args (
Union
[str
,List
[str
],None
], default:None
) – executable argumentsrun_args (
Optional
[Dict
[str
,Union
[int
,str
,float
,None
]]], default:None
) – arguments for run commandenv_vars (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – environment vars to launch job with
- add_exe_args(args: str | List[str]) None #
Add executable arguments to executable
- Parameters:
args (
Union
[str
,List
[str
]]) – executable arguments- Return type:
None
- colocated_db_settings: t.Optional[t.Dict[str, t.Union[bool, int, str, None, t.List[str], t.Iterable[t.Union[int, t.Iterable[int]]], t.List[DBModel], t.List[DBScript], t.Dict[str, t.Union[int, None]], t.Dict[str, str]]]]#
- property env_vars: Dict[str, str | None]#
Return an immutable list of attached environment variables.
- Returns:
attached environment variables
- property exe_args: str | List[str]#
Return an immutable list of attached executable arguments.
- Returns:
attached executable arguments
- format_env_vars() List[str] #
Format the environment variables for mpirun
- Return type:
List
[str
]- Returns:
list of env vars
- format_run_args() List[str] #
Return a list of MPI-standard formatted run arguments
- Return type:
List
[str
]- Returns:
list of MPI-standard arguments for these settings
- make_mpmd(settings: smartsim.settings.base.RunSettings) None #
Make an MPMD workload by combining two orterun commands. This connects the two settings to be executed with a single Model instance.
- Parameters:
settings (
RunSettings
) – OrterunSettings instance- Return type:
None
- reserved_run_args: set[str] = {'wd', 'wdir'}#
- property run_args: Dict[str, int | str | float | None]#
Return an immutable list of attached run arguments.
- Returns:
attached run arguments
- property run_command: str | None#
Return the launch binary used to launch the executable
Attempt to expand the path to the executable if possible
- Returns:
launch binary e.g. mpiexec
- set(arg: str, value: str | None = None, condition: bool = True) None #
Allows users to set individual run arguments.
A method that allows users to set run arguments after object instantiation. Does basic formatting such as stripping leading dashes. If the argument has been set previously, this method will log a warning but ultimately comply.
A conditional expression may be passed to the condition parameter. If the expression evaluates to True, the argument will be set. If not, an info message is logged and no further operation is performed.
Basic Usage
rs = RunSettings("python") rs.set("an-arg", "a-val") rs.set("a-flag") rs.format_run_args() # returns ["an-arg", "a-val", "a-flag", "None"]
Slurm Example with Conditional Setting
import socket

rs = SrunSettings("echo", "hello")
rs.set_tasks(1)
rs.set("exclusive")

# Only set this argument if condition param evals True
# Otherwise log and NOP
rs.set("partition", "debug", condition=socket.gethostname()=="testing-system")

rs.format_run_args()
# returns ["exclusive", "None", "partition", "debug"] iff socket.gethostname()=="testing-system"
# otherwise returns ["exclusive", "None"]
- Parameters:
arg (
str
) – name of the argumentvalue (
Optional
[str
], default:None
) – value of the argument
condition (bool, default: True) – set the argument if condition evaluates to True
- Return type:
None
- set_binding(binding: str) None #
Set binding
- Parameters:
binding (
str
) – Binding- Return type:
None
- set_broadcast(dest_path: str | None = None) None #
Copy the specified executable(s) to remote machines
This sets
--preload-binary
- Parameters:
dest_path (
Optional
[str
], default:None
) – Destination path (Ignored)- Return type:
None
- set_cpu_binding_type(bind_type: str) None #
Specifies the cores to which MPI processes are bound
This sets
--bind-to
for MPI compliant implementations- Parameters:
bind_type (
str
) – binding type- Return type:
None
- set_cpu_bindings(bindings: int | List[int]) None #
Set the cores to which MPI processes are bound
- Parameters:
bindings (
Union
[int
,List
[int
]]) – List specifying the cores to which MPI processes are bound- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None #
Set the number of cpus per task for this job
This sets
--cpus-per-proc
for MPI compliant implementationsnote: this option has been deprecated in openMPI 4.0+ and will soon be replaced.
- Parameters:
cpus_per_task (
int
) – number of cpus per task- Return type:
None
- set_excluded_hosts(host_list: str | List[str]) None #
Specify a list of hosts to exclude for launching this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to exclude- Return type:
None
- set_hostlist(host_list: str | List[str]) None #
Set the hostlist for the
mpirun
commandThis sets
--host
- Parameters:
host_list (
Union
[str
,List
[str
]]) – list of host names- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_hostlist_from_file(file_path: str) None #
Use the contents of a file to set the hostlist
This sets
--hostfile
- Parameters:
file_path (
str
) – Path to the hostlist file- Return type:
None
- set_memory_per_node(memory_per_node: int) None #
Set the amount of memory required per node in megabytes
- Parameters:
memory_per_node (
int
) – Number of megabytes per node- Return type:
None
- set_mpmd_preamble(preamble_lines: List[str]) None #
Set preamble to a file to make a job MPMD
- Parameters:
preamble_lines (
List
[str
]) – lines to put at the beginning of a file.- Return type:
None
- set_node_feature(feature_list: str | List[str]) None #
Specify the node feature for this job
- Parameters:
feature_list (
Union
[str
,List
[str
]]) – node feature to launch on- Return type:
None
- set_nodes(nodes: int) None #
Set the number of nodes
- Parameters:
nodes (
int
) – number of nodes to run with- Return type:
None
- set_quiet_launch(quiet: bool) None #
Set the job to run in quiet mode
This sets
--quiet
- Parameters:
quiet (
bool
) – Whether the job should be run quietly- Return type:
None
- set_task_map(task_mapping: str) None #
Set
mpirun
task mappingthis sets
--map-by <mapping>
For examples, see the man page for
mpirun
- Parameters:
task_mapping (
str
) – task mapping- Return type:
None
- set_tasks(tasks: int) None #
Set the number of tasks for this job
This sets
-n
for MPI compliant implementations- Parameters:
tasks (
int
) – number of tasks- Return type:
None
- set_tasks_per_node(tasks_per_node: int) None #
Set the number of tasks per node
- Parameters:
tasks_per_node (
int
) – number of tasks to launch per node- Return type:
None
- set_time(hours: int = 0, minutes: int = 0, seconds: int = 0) None #
Automatically format and set wall time
- Parameters:
hours (
int
, default:0
) – number of hours to run jobminutes (
int
, default:0
) – number of minutes to run jobseconds (
int
, default:0
) – number of seconds to run job
- Return type:
None
- set_verbose_launch(verbose: bool) None #
Set the job to run in verbose mode
This sets
--verbose
- Parameters:
verbose (
bool
) – Whether the job should be run verbosely- Return type:
None
- set_walltime(walltime: str) None #
Set the maximum number of seconds that a job will run
This sets
--timeout
- Parameters:
walltime (
str
) – number-like string of seconds that the job will run- Return type:
None
- update_env(env_vars: Dict[str, str | int | float | bool]) None #
Update the job environment variables
To fully inherit the current user environment, add the workload-manager-specific flag to the launch command through the
add_exe_args()
method. For example,--export=ALL
for slurm, or-V
for PBS/aprun.- Parameters:
env_vars (
Dict
[str
,Union
[str
,int
,float
,bool
]]) – environment variables to update or add- Raises:
TypeError – if env_vars values cannot be coerced to strings
- Return type:
None
SbatchSettings#
SbatchSettings
are used for launching batches onto Slurm
WLM systems.
|
Set the account for this batch job |
|
Set the command used to launch the batch e.g. |
|
Set the number of nodes for this batch job |
|
Specify the hostlist for this job |
|
Set the partition for the batch job |
|
alias for set_partition |
|
Set the walltime of the job |
Get the formatted batch arguments for a preview |
- class SbatchSettings(nodes: int | None = None, time: str = '', account: str | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any) None [source]#
Specify run parameters for a Slurm batch job
Slurm sbatch arguments can be written into
batch_args
as a dictionary. e.g. {‘ntasks’: 1}If the argument doesn’t have a parameter, put None as the value. e.g. {‘exclusive’: None}
Initialization values provided (nodes, time, account) will overwrite the same arguments in
batch_args
if present- Parameters:
nodes (
Optional
[int
], default:None
) – number of nodestime (
str
, default:''
) – walltime for job, e.g. “10:00:00” for 10 hoursaccount (
Optional
[str
], default:None
) – account for jobbatch_args (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – extra batch arguments
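A minimal sketch of building Slurm batch settings (the account and partition names are placeholders):
from smartsim.settings import SbatchSettings

sbatch = SbatchSettings(nodes=4, time="10:00:00", account="A-1234",
                        batch_args={"exclusive": None})
sbatch.set_partition("debug")   # placeholder partition name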
- add_preamble(lines: List[str]) None #
Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.
- Parameters:
lines – lines to add to preamble.
- Return type:
None
- property batch_args: Dict[str, str | None]#
Retrieve attached batch arguments
- Returns:
attached batch arguments
- property batch_cmd: str#
Return the batch command
Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.
- Returns:
batch command
- format_batch_args() List[str] [source]#
Get the formatted batch arguments for a preview
- Return type:
List
[str
]- Returns:
batch arguments for Sbatch
- property preamble: Iterable[str]#
Return an iterable of preamble clauses to be prepended to the batch file
- Returns:
attached preamble clauses
- set_account(account: str) None [source]#
Set the account for this batch job
- Parameters:
account (
str
) – account id- Return type:
None
- set_batch_command(command: str) None #
Set the command used to launch the batch e.g.
sbatch
- Parameters:
command (
str
) – batch command- Return type:
None
- set_cpus_per_task(cpus_per_task: int) None [source]#
Set the number of cpus to use per task
This sets
--cpus-per-task
- Parameters:
cpus_per_task – number of cpus to use per task
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_nodes(num_nodes: int) None [source]#
Set the number of nodes for this batch job
- Parameters:
num_nodes (
int
) – number of nodes- Return type:
None
- set_partition(partition: str) None [source]#
Set the partition for the batch job
- Parameters:
partition (
str
) – partition name- Return type:
None
QsubBatchSettings#
QsubBatchSettings
are used to configure jobs that should
be launched as a batch on PBSPro systems.
|
Set the account for this batch job |
|
Set the command used to launch the batch e.g. |
|
Set the number of nodes for this batch job |
|
Set the number of cpus obtained in each node. |
|
Set the queue for the batch job |
Set a resource value for the Qsub batch |
|
|
Set the walltime of the job |
Get the formatted batch arguments for a preview |
- class QsubBatchSettings(nodes: int | None = None, ncpus: int | None = None, time: str | None = None, queue: str | None = None, account: str | None = None, resources: Dict[str, str | int] | None = None, batch_args: Dict[str, str | None] | None = None, **kwargs: Any)[source]#
Specify
qsub
batch parameters for a jobnodes
, andncpus
are used to create the select statement for PBS if a select statement is not included in theresources
. If both are supplied the value for select statement supplied inresources
will override.- Parameters:
nodes (
Optional
[int
], default:None
) – number of nodes for batchncpus (
Optional
[int
], default:None
) – number of cpus per nodetime (
Optional
[str
], default:None
) – walltime for batch jobqueue (
Optional
[str
], default:None
) – queue to run batch inaccount (
Optional
[str
], default:None
) – account for batch launchresources (
Optional
[Dict
[str
,Union
[str
,int
]]], default:None
) – overrides for resource argumentsbatch_args (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – overrides for PBS batch arguments
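A minimal sketch (the queue name and the place resource are illustrative placeholders):
from smartsim.settings import QsubBatchSettings

qsub = QsubBatchSettings(nodes=2, ncpus=36, time="01:00:00", queue="debug")
qsub.set_resource("place", "scatter")   # add a PBS resource; 'place' is an illustrative example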
- add_preamble(lines: List[str]) None #
Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.
- Parameters:
lines – lines to add to preamble.
- Return type:
None
- property batch_args: Dict[str, str | None]#
Retrieve attached batch arguments
- Returns:
attached batch arguments
- property batch_cmd: str#
Return the batch command
Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.
- Returns:
batch command
- format_batch_args() List[str] [source]#
Get the formatted batch arguments for a preview
- Return type:
List
[str
]- Returns:
batch arguments for Qsub
- Raises:
ValueError – if options are supplied without values
- property preamble: Iterable[str]#
Return an iterable of preamble clauses to be prepended to the batch file
- Returns:
attached preamble clauses
- property resources: Dict[str, str | int]#
- set_account(account: str) None [source]#
Set the account for this batch job
- Parameters:
account – account id
- Return type:
None
- set_batch_command(command: str) None #
Set the command used to launch the batch e.g.
sbatch
- Parameters:
command (
str
) – batch command- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_ncpus(num_cpus: int | str) None [source]#
Set the number of cpus obtained in each node.
If a select argument is provided in
QsubBatchSettings.resources
, then this value will be overridden- Parameters:
num_cpus (
Union
[int
,str
]) – number of cpus per node in select- Return type:
None
- set_nodes(num_nodes: int) None [source]#
Set the number of nodes for this batch job
In PBS, ‘select’ is the more primitive way of describing how many nodes to allocate for the job. ‘nodes’ is equivalent to ‘select’ with a ‘place’ statement. Assuming that only advanced users would use ‘set_resource’ instead, defining the number of nodes here sets the ‘nodes’ resource.
- Parameters:
num_nodes (
int
) – number of nodes- Return type:
None
- set_queue(queue: str) None [source]#
Set the queue for the batch job
- Parameters:
queue (
str
) – queue name- Return type:
None
- set_resource(resource_name: str, value: str | int) None [source]#
Set a resource value for the Qsub batch
If a select statement is provided, the nodes and ncpus arguments will be overridden. Likewise for Walltime
- Parameters:
resource_name (
str
) – name of resource, e.g. walltimevalue (
Union
[str
,int
]) – value
- Return type:
None
BsubBatchSettings#
BsubBatchSettings
are used to configure jobs that should
be launched as a batch on LSF systems.
|
Set the walltime |
Set SMTs |
|
|
Set the project |
|
Set the number of nodes for this batch job |
Set allocation for expert mode. |
|
|
Specify the hostlist for this job |
|
Set the number of tasks for this job |
Get the formatted batch arguments for a preview |
- class BsubBatchSettings(nodes: int | None = None, time: str | None = None, project: str | None = None, batch_args: Dict[str, str | None] | None = None, smts: int = 0, **kwargs: Any) None [source]#
Specify
bsub
batch parameters for a job- Parameters:
nodes (
Optional
[int
], default:None
) – number of nodes for batchtime (
Optional
[str
], default:None
) – walltime for batch job in format hh:mmproject (
Optional
[str
], default:None
) – project for batch launchbatch_args (
Optional
[Dict
[str
,Optional
[str
]]], default:None
) – overrides for LSF batch argumentssmts (
int
, default:0
) – SMTs
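A minimal sketch (the project and queue names are placeholders):
from smartsim.settings import BsubBatchSettings

bsub = BsubBatchSettings(nodes=4, time="01:00", project="ABC123", smts=4)
bsub.set_queue("batch")   # placeholder queue name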
- add_preamble(lines: List[str]) None #
Add lines to the batch file preamble. The lines are just written (unmodified) at the beginning of the batch file (after the WLM directives) and can be used to e.g. start virtual environments before running the executables.
- Parameters:
lines – lines to add to preamble.
- Return type:
None
- property batch_args: Dict[str, str | None]#
Retrieve attached batch arguments
- Returns:
attached batch arguments
- property batch_cmd: str#
Return the batch command
Tests to see if we can expand the batch command path. If we can, then returns the expanded batch command. If we cannot, returns the batch command as is.
- Returns:
batch command
- format_batch_args() List[str] [source]#
Get the formatted batch arguments for a preview
- Return type:
List
[str
]- Returns:
list of batch arguments for Qsub
- property preamble: Iterable[str]#
Return an iterable of preamble clauses to be prepended to the batch file
- Returns:
attached preamble clauses
- set_account(account: str) None [source]#
Set the project
this function is an alias for set_project.
- Parameters:
account (
str
) – project name- Return type:
None
- set_batch_command(command: str) None #
Set the command used to launch the batch e.g.
sbatch
- Parameters:
command (
str
) – batch command- Return type:
None
- set_expert_mode_req(res_req: str, slots: int) None [source]#
Set allocation for expert mode. This will activate expert mode (
-csm
) and disregard all other allocation options.This sets
-csm -n slots -R res_req
- Parameters:
res_req (
str
) – specific resource requirementsslots (
int
) – number of resources to allocate
- Return type:
None
- set_hostlist(host_list: str | List[str]) None [source]#
Specify the hostlist for this job
- Parameters:
host_list (
Union
[str
,List
[str
]]) – hosts to launch on- Raises:
TypeError – if not str or list of str
- Return type:
None
- set_nodes(num_nodes: int) None [source]#
Set the number of nodes for this batch job
This sets
-nnodes
.- Parameters:
num_nodes – number of nodes
- Return type:
None
- set_project(project: str) None [source]#
Set the project
This sets
-P
.- Parameters:
project – project name
- Return type:
None
- set_queue(queue: str) None [source]#
Set the queue for this job
- Parameters:
queue (
str
) – The queue to submit the job on- Return type:
None
- set_smts(smts: int) None [source]#
Set SMTs
This sets
-alloc_flags
. If the user sets SMT explicitly through-alloc_flags
, then that takes precedence.- Parameters:
smts (
int
) – SMT (e.g on Summit: 1, 2, or 4)- Return type:
None
Singularity#
Singularity
is a type of Container
that can be passed to a
RunSettings
class or child class to enable running the workload in a
container.
- class Singularity(*args: Any, **kwargs: Any) None [source]#
Singularity (apptainer) container type. To be passed into a
RunSettings
class initializer orExperiment.create_run_settings
.Note
Singularity integration is currently tested with Apptainer 1.0 with slurm and PBS workload managers only.
Also, note that user-defined bind paths (
mount
argument) may be disabled by a system administrator- Parameters:
image – local or remote path to container image, e.g.
docker://sylabsio/lolcow
args (
Any
) – arguments to ‘singularity exec’ commandmount – paths to mount (bind) from host machine into image.
Orchestrator#
|
Initialize an |
Return the DB identifier, which is common to a DB and all of its nodes |
|
Return the number of DB shards contained in the Orchestrator. |
|
Read only property for the number of nodes an |
|
Return the hostnames of Orchestrator instance hosts |
|
Clear hosts or reset them to last user choice |
|
Can be used to remove database files of a previous launch |
|
Return database addresses |
|
Check if the database is active |
|
|
Set the number of CPUs available to each database shard |
|
Set the batch walltime of the orchestrator |
|
Specify the hosts for the |
|
Set a batch argument the orchestrator should launch with |
|
Set a run argument the orchestrator should launch each node with (it will be passed to jrun) |
|
Sets the database's save configuration to save the DB every 'frequency' seconds given that at least one write operation against the DB occurred in that time. |
Sets the max memory configuration. |
|
|
Sets how the database will select what to remove when 'maxmemory' is reached. |
|
Sets the max number of connected clients at the same time. |
Sets the database's memory size limit for bulk requests, which are elements representing single strings. |
|
|
Set any valid configuration at runtime without the need to restart the database. |
Return the telemetry configuration for this entity. |
|
Get the path to the checkpoint file for this Orchestrator |
|
Property indicating whether or not the entity sequence should be launched as a batch job |
Orchestrator#
- class Orchestrator(path: str | None = '/usr/local/src/SmartSim/doc', port: int = 6379, interface: str | List[str] = 'lo', launcher: str = 'local', run_command: str = 'auto', db_nodes: int = 1, batch: bool = False, hosts: str | List[str] | None = None, account: str | None = None, time: str | None = None, alloc: str | None = None, single_cmd: bool = False, *, threads_per_queue: int | None = None, inter_op_threads: int | None = None, intra_op_threads: int | None = None, db_identifier: str = 'orchestrator', **kwargs: Any) None [source]#
The Orchestrator is an in-memory database that can be launched alongside entities in SmartSim. Data can be transferred between entities by using one of the Python, C, C++ or Fortran clients within an entity.
Initialize an
Orchestrator
reference for local launchExtra configurations for RedisAI
- Parameters:
path (
Optional
[str
], default:'/usr/local/src/SmartSim/doc'
) – path to location ofOrchestrator
directoryport (
int
, default:6379
) – TCP/IP portinterface (
Union
[str
,List
[str
]], default:'lo'
) – network interface(s)launcher (
str
, default:'local'
) – type of launcher being used, options are “slurm”, “pbs”, “lsf”, or “local”. If set to “auto”, an attempt will be made to find an available launcher on the system.run_command (
str
, default:'auto'
) – specify launch binary or detect automaticallydb_nodes (
int
, default:1
) – number of database shardsbatch (
bool
, default:False
) – run as a batch workloadhosts (
Union
[str
,List
[str
],None
], default:None
) – specify hosts to launch onaccount (
Optional
[str
], default:None
) – account to run batch ontime (
Optional
[str
], default:None
) – walltime for batch ‘HH:MM:SS’ formatalloc (
Optional
[str
], default:None
) – allocation to launch database onsingle_cmd (
bool
, default:False
) – run all shards with one (MPMD) commandthreads_per_queue (
Optional
[int
], default:None
) – threads per GPU deviceinter_op_threads (
Optional
[int
], default:None
) – threads across CPU operationsintra_op_threads (
Optional
[int
], default:None
) – threads per CPU operationdb_identifier (
str
, default:'orchestrator'
) – an identifier to distinguish this orchestrator in multiple-database experiments
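A minimal sketch of launching a clustered Orchestrator through an Experiment (the network interface name is a system-specific placeholder):
from smartsim import Experiment
from smartsim.database import Orchestrator

exp = Experiment("db_example", launcher="slurm")
db = Orchestrator(db_nodes=3, port=6780, interface="ipogif0",
                  launcher="slurm", run_command="srun")
exp.start(db)
print(db.get_address())   # addresses of the launched shards
exp.stop(db)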
- property batch: bool#
Property indicating whether or not the entity sequence should be launched as a batch job
- Returns:
True
if entity sequence should be launched as a batch job,False
if the members will be launched individually.
- property checkpoint_file: str#
Get the path to the checkpoint file for this Orchestrator
- Returns:
Path to the checkpoint file if it exists, otherwise None
- property db_identifier: str#
Return the DB identifier, which is common to a DB and all of its nodes
- Returns:
DB identifier
- property db_models: Iterable[smartsim.entity.DBModel]#
Return an immutable collection of attached models
- property db_nodes: int#
Read only property for the number of nodes an
Orchestrator
is launched across. Notice that SmartSim currently assumes that each shard will be launched on its own node. Therefore this property is currently an alias to thenum_shards
attribute.- Returns:
Number of database nodes
- property db_scripts: Iterable[smartsim.entity.DBScript]#
Return an immutable collection of attached scripts
- enable_checkpoints(frequency: int) None [source]#
Sets the database’s save configuration to save the DB every ‘frequency’ seconds given that at least one write operation against the DB occurred in that time. E.g., if frequency is 900, then the database will save to disk after 900 seconds if there is at least 1 change to the dataset.
- Parameters:
frequency (
int
) – the given number of seconds before the DB saves- Return type:
None
- get_address() List[str] [source]#
Return database addresses
- Return type:
List
[str
]- Returns:
addresses
- Raises:
SmartSimError – If database address cannot be found or is not active
- property hosts: List[str]#
Return the hostnames of Orchestrator instance hosts
Note that this will only be populated after the orchestrator has been launched by SmartSim.
- Returns:
the hostnames of Orchestrator instance hosts
- is_active() bool [source]#
Check if the database is active
- Return type:
bool
- Returns:
True if database is active, False otherwise
- property num_shards: int#
Return the number of DB shards contained in the Orchestrator. This might differ from the number of
DBNode
objects, as eachDBNode
may start more than one shard (e.g. with MPMD).- Returns:
the number of DB shards contained in the Orchestrator
- remove_stale_files() None [source]#
Can be used to remove database files of a previous launch
- Return type:
None
- set_batch_arg(arg: str, value: str | None = None) None [source]#
Set a batch argument the orchestrator should launch with
Some commonly used arguments such as --job-name are used by SmartSim and will not be allowed to be set.
- Parameters:
arg (
str
) – batch argument to set e.g. “exclusive”value (
Optional
[str
], default:None
) – batch param - set to None if no param value
- Raises:
SmartSimError – if orchestrator not launching as batch
- Return type:
None
- set_cpus(num_cpus: int) None [source]#
Set the number of CPUs available to each database shard
This effectively will determine how many cpus can be used for compute threads, background threads, and network I/O.
- Parameters:
num_cpus (
int
) – number of cpus to set- Return type:
None
- set_db_conf(key: str, value: str) None [source]#
Set any valid configuration at runtime without the need to restart the database. All configuration parameters that are set are immediately loaded by the database and will take effect starting with the next command executed.
- Parameters:
key (
str
) – the configuration parametervalue (
str
) – the database configuration parameter’s new value
- Return type:
None
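For illustration, assuming db refers to an active Orchestrator, any valid database configuration parameter can be updated at runtime (the parameter below is only an example):
db.set_db_conf("maxmemory-samples", "7")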
- set_eviction_strategy(strategy: str) None [source]#
Sets how the database will select what to remove when ‘maxmemory’ is reached. The default is noeviction.
- Parameters:
strategy (
str
) – The max memory policy to use e.g. “volatile-lru”, “allkeys-lru”, etc.- Raises:
SmartSimError – If ‘strategy’ is an invalid maxmemory policy
SmartSimError – If database is not active
- Return type:
None
- set_hosts(host_list: List[str] | str) None [source]#
Specify the hosts for the
Orchestrator
to launch on- Parameters:
host_list (
Union
[List
[str
],str
]) – list of hosts (compute node names)
TypeError – if wrong type
- Return type:
None
- set_max_clients(clients: int = 50000) None [source]#
Sets the maximum number of simultaneously connected clients. When the Orchestrator contains more than two DB shards, each node uses two connections: one incoming and one outgoing.
- Parameters:
clients (
int
, default:50000
) – the maximum number of connected clients- Return type:
None
- set_max_memory(mem: str) None [source]#
Sets the max memory configuration. By default there is no memory limit. Setting max memory to zero also results in no memory limit. Once a limit is surpassed, keys will be removed according to the eviction strategy. The specified memory size is case insensitive and supports the typical forms of:
1k => 1000 bytes
1kb => 1024 bytes
1m => 1000000 bytes
1mb => 1024*1024 bytes
1g => 1000000000 bytes
1gb => 1024*1024*1024 bytes
- Parameters:
mem (
str
) – the desired max memory size e.g. 3gb- Raises:
SmartSimError – If ‘mem’ is an invalid memory value
SmartSimError – If database is not active
- Return type:
None
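A short sketch combining the memory limit with an eviction strategy (assuming db is an active Orchestrator):

db.set_max_memory("3gb")                 # limit memory usage to 3 GB
db.set_eviction_strategy("allkeys-lru")  # evict least-recently-used keys once the limit is hit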
- set_max_message_size(size: int = 1073741824) None [source]#
Sets the database’s memory size limit for bulk requests, which are elements representing single strings. The default is 1 gigabyte. Message size must be greater than or equal to 1mb. The specified memory size should be an integer that represents the number of bytes. For example, to set the max message size to 1gb, use 1024*1024*1024.
- Parameters:
size (
int
, default:1073741824
) – maximum message size in bytes- Return type:
None
- set_path(new_path: str) None #
- Return type:
None
- set_run_arg(arg: str, value: str | None = None) None [source]#
Set a run argument the orchestrator should launch each node with (it will be passed to jrun)
Some commonly used arguments (e.g. "n", "N") are managed by SmartSim and cannot be set here.
- Parameters:
arg (
str
) – run argument to setvalue (
Optional
[str
], default:None
) – run parameter - set to None if no parameter value
- Return type:
None
- set_walltime(walltime: str) None [source]#
Set the batch walltime of the orchestrator
Note: This will only affect orchestrators launched as a batch
- Parameters:
walltime (
str
) – amount of time e.g. 10 hours is 10:00:00- Raises:
SmartSimError – if orchestrator isn’t launching as batch
- Return type:
None
- property telemetry: TelemetryConfiguration#
Return the telemetry configuration for this entity.
- Returns:
configuration of telemetry for this entity
- property type: str#
Return the name of the class
Model#
|
Initialize a |
|
Attach files to an entity for generation |
|
An alias for |
|
Colocate an Orchestrator instance with this Model over TCP/IP. |
|
Colocate an Orchestrator instance with this Model over UDS. |
Return True if this Model will run with a colocated Orchestrator |
|
|
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime |
|
TorchScript to launch with this Model instance |
|
TorchScript function to launch with this Model instance |
Convert parameters to command line arguments and update run settings. |
|
|
Register future communication between entities. |
If called, the entity will prefix its keys with its own model name |
|
If called, the entity will not prefix its keys with its own model name |
|
Inquire as to whether this entity will prefix its keys with its name |
Model#
- class Model(name: str, params: Dict[str, str], run_settings: smartsim.settings.base.RunSettings, path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None)[source]#
Bases:
SmartSimEntity
Initialize a
Model
- Parameters:
name (
str
) – name of the modelparams (
Dict
[str
,str
]) – model parameters for writing into configuration files or to be passed as command line arguments to executable.path (
Optional
[str
], default:'/usr/local/src/SmartSim/doc'
) – path to output, error, and configuration filesrun_settings (
RunSettings
) – launcher settings specified in the experimentparams_as_args (
Optional
[List
[str
]], default:None
) – list of parameters which have to be interpreted as command line arguments to be added to run_settingsbatch_settings (
Optional
[BatchSettings
], default:None
) – Launcher settings for running the individual model as a batch job
- add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript function to launch with this Model instance
Each script function added to the model will be loaded into a non-converged orchestrator prior to the execution of this Model instance.
For converged orchestrators, the
add_script()
method should be used.Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.
Setting
devices_per_node=N
, with N greater than one will result in the function being stored in the first N devices of type device
.- Parameters:
name (
str
) – key to store function underfunction (
Optional
[str
], default:None
) – TorchScript function codedevice (
str
, default:'CPU'
) – device for script executiondevices_per_node (
int
, default:1
) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.first_device (
int
, default:0
) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
- Return type:
None
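A minimal sketch of attaching a TorchScript function; the function body and key name are illustrative, and model is assumed to be a Model created by Experiment.create_model:

torch_fn = """
def normalizer(tensor):
    return tensor / tensor.norm()
"""
model.add_function("normalizer", function=torch_fn, device="CPU")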
- add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None [source]#
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance
One of either model (in memory representation) or model_path (file) must be provided
- Parameters:
name (
str
) – key to store model underbackend (
str
) – name of the backend (TORCH, TF, TFLITE, ONNX)model (
Optional
[bytes
], default:None
) – A model in memory (only supported for non-colocated orchestrators)model_path (
Optional
[str
], default:None
) – serialized modeldevice (
str
, default:'CPU'
) – name of device for executiondevices_per_node (
int
, default:1
) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.first_device (
int
, default:0
) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.batch_size (
int
, default:0
) – batch size for executionmin_batch_size (
int
, default:0
) – minimum batch size for model executionmin_batch_timeout (
int
, default:0
) – time to wait for minimum batch sizetag (
str
, default:''
) – additional tag for model informationinputs (
Optional
[List
[str
]], default:None
) – model inputs (TF only)outputs (
Optional
[List
[str
]], default:None
) – model outputs (TF only)
- Return type:
None
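For example, a hedged sketch attaching a serialized TorchScript model from disk (the key and file path are placeholders):

model.add_ml_model(
    name="cnn",              # key the model is stored under in the DB
    backend="TORCH",
    model_path="./cnn.pt",   # placeholder path to a serialized model file
    device="GPU",
    devices_per_node=1,
    batch_size=32,
)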
- add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript to launch with this Model instance
Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of this Model instance
Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.
Setting
devices_per_node=N
, with N greater than one will result in the script being stored in the first N devices of typedevice
; alternatively, settingfirst_device=M
will result in the script being stored on nodes M through M + N - 1.One of either script (in memory string representation) or script_path (file) must be provided
- Parameters:
name (
str
) – key to store script underscript (
Optional
[str
], default:None
) – TorchScript code (only supported for non-colocated orchestrators)script_path (
Optional
[str
], default:None
) – path to TorchScript codedevice (
str
, default:'CPU'
) – device for script executiondevices_per_node (
int
, default:1
) – The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.first_device (
int
, default:0
) – The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
- Return type:
None
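A brief sketch (placeholder names) attaching TorchScript source from a file:

model.add_script(
    name="preprocessor",            # key the script is stored under in the DB
    script_path="./preprocess.py",  # placeholder path to TorchScript source
    device="CPU",
)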
- attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None [source]#
Attach files to an entity for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity. Invoking this method after files have already been attached will overwrite the previous list of entity files.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files "to_configure" are text-based model input files where parameters for the model are set. Note that only models support the "to_configure" field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;
- Parameters:
to_copy (
Optional
[List
[str
]], default:None
) – files to copyto_symlink (
Optional
[List
[str
]], default:None
) – files to symlinkto_configure (
Optional
[List
[str
]], default:None
) – input files with tagged parameters
- Return type:
None
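An illustrative sketch with placeholder file names, showing all three modes together (calling exp.generate(model) afterwards writes the files into the model's directory):

model.attach_generator_files(
    to_copy=["./input.dat"],         # copied into the model's run directory
    to_symlink=["./mesh.h5"],        # symlinked to avoid duplicating large files
    to_configure=["./params.conf"],  # tagged values such as ;LR; are replaced at generation
)
exp.generate(model)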
- property attached_files_table: str#
Return a list of attached files as a plain text table
- Returns:
String version of table
- colocate_db(*args: Any, **kwargs: Any) None [source]#
An alias for
Model.colocate_db_tcp
- Return type:
None
- colocate_db_tcp(port: int = 6379, ifname: str | list[str] = 'lo', db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None [source]#
Colocate an Orchestrator instance with this Model over TCP/IP.
This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface.
Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.
For example:
kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2  # keydb only
}
Generally these don’t need to be changed.
- Parameters:
port (
int
, default:6379
) – port to use for orchestrator databaseifname (
Union
[str
,list
[str
]], default:'lo'
) – interface to use for orchestratordb_cpus (
int
, default:1
) – number of cpus to use for orchestratorcustom_pinning (
Optional
[Iterable
[Union
[int
,Iterable
[int
]]]], default:None
) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinningdebug (
bool
, default:False
) – launch Model with extra debug information about the colocated dbkwargs (
Any
) – additional keyword arguments to pass to the orchestrator database
- Return type:
None
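A hedged sketch (port and pinning values are illustrative; model is assumed to be a Model created by Experiment.create_model):

model.colocate_db_tcp(
    port=6780,              # placeholder port for the colocated DB
    db_cpus=2,
    custom_pinning=[0, 1],  # pin the DB to the first two cores
)
exp.start(model)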
- colocate_db_uds(unix_socket: str = '/tmp/redis.socket', socket_permissions: int = 755, db_cpus: int = 1, custom_pinning: Iterable[int | Iterable[int]] | None = None, debug: bool = False, db_identifier: str = '', **kwargs: Any) None [source]#
Colocate an Orchestrator instance with this Model over UDS.
This method will initialize settings which add an unsharded database to this Model instance. Only this Model will be able to communicate with this colocated database by using Unix Domain sockets.
Extra parameters for the db can be passed through kwargs. This includes many performance, caching and inference settings.
example_kwargs = {
    "maxclients": 100000,
    "threads_per_queue": 1,
    "inter_op_threads": 1,
    "intra_op_threads": 1,
    "server_threads": 2  # keydb only
}
Generally these don’t need to be changed.
- Parameters:
unix_socket (
str
, default:'/tmp/redis.socket'
) – path to where the socket file will be createdsocket_permissions (
int
, default:755
) – permissions for the socketfiledb_cpus (
int
, default:1
) – number of cpus to use for orchestratorcustom_pinning (
Optional
[Iterable
[Union
[int
,Iterable
[int
]]]], default:None
) – CPUs to pin the orchestrator to. Passing an empty iterable disables pinningdebug (
bool
, default:False
) – launch Model with extra debug information about the colocated dbkwargs (
Any
) – additional keyword arguments to pass to the orchestrator database
- Return type:
None
- property colocated: bool#
Return True if this Model will run with a colocated Orchestrator
- Returns:
True if the Model will run with a colocated Orchestrator, False otherwise
- property db_models: Iterable[DBModel]#
Retrieve an immutable collection of attached models
- Returns:
Return an immutable collection of attached models
- property db_scripts: Iterable[DBScript]#
Retrieve an immutable collection of attached scripts
- Returns:
Return an immutable collection of attached scripts
- disable_key_prefixing() None [source]#
If called, the entity will not prefix its keys with its own model name
- Return type:
None
- enable_key_prefixing() None [source]#
If called, the entity will prefix its keys with its own model name
- Return type:
None
- params_to_args() None [source]#
Convert parameters to command line arguments and update run settings.
- Return type:
None
- print_attached_files() None [source]#
Print a table of the attached files to stdout
- Return type:
None
- query_key_prefixing() bool [source]#
Inquire as to whether this entity will prefix its keys with its name
- Return type:
bool
- Returns:
Return True if entity will prefix its keys with its name
- register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None [source]#
Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
- Parameters:
incoming_entity (
SmartSimEntity
) – The entity that data will be received from- Raises:
SmartSimError – if incoming entity has already been registered
- Return type:
None
- property type: str#
Return the name of the class
Ensemble#
|
Initialize an Ensemble of Model instances. |
|
Add a model to this ensemble |
|
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime |
|
TorchScript to launch with every entity belonging to this ensemble |
|
TorchScript function to launch with every entity belonging to this ensemble |
|
Attach files to each model within the ensemble for generation |
If called, each model within this ensemble will prefix its key with its own model name. |
|
An alias for a shallow copy of the |
|
Inquire as to whether each model within the ensemble will prefix their keys |
|
Register future communication between entities. |
Ensemble#
- class Ensemble(name: str, params: Dict[str, Any], path: str | None = '/usr/local/src/SmartSim/doc', params_as_args: List[str] | None = None, batch_settings: smartsim.settings.base.BatchSettings | None = None, run_settings: smartsim.settings.base.RunSettings | None = None, perm_strat: str = 'all_perm', **kwargs: Any) None [source]#
Bases:
EntityList
[Model
]Ensemble
is a group ofModel
instances that can be treated as a reference to a single instance.Initialize an Ensemble of Model instances.
The kwargs argument can be used to pass custom input parameters to the permutation strategy.
- Parameters:
name (
str
) – name of the ensembleparams (
Dict
[str
,Any
]) – parameters to expand intoModel
membersparams_as_args (
Optional
[List
[str
]], default:None
) – list of params that should be used as command line arguments to theModel
member executables and not written to generator filesbatch_settings (
Optional
[BatchSettings
], default:None
) – describes settings forEnsemble
as batch workloadrun_settings (
Optional
[RunSettings
], default:None
) – describes how eachModel
should be executedreplicas – number of
Model
replicas to create - a keyword argument of kwargsperm_strategy – strategy for expanding
params
intoModel
instances from the params argument. Options are “all_perm”, “step”, “random”, or a callable function.
- Returns:
Ensemble
instance
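For example, a minimal sketch that expands two parameters into four Model members using the "all_perm" strategy (executable and parameter names are placeholders):

from smartsim.settings import RunSettings

rs = RunSettings(exe="python", exe_args="sim.py")   # placeholder executable
ensemble = Ensemble(
    "lr-sweep",
    params={"LR": ["0.01", "0.001"], "BATCH": ["32", "64"]},
    params_as_args=["LR", "BATCH"],
    run_settings=rs,
    perm_strat="all_perm",   # 2 x 2 = 4 Model members
)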
- add_function(name: str, function: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript function to launch with every entity belonging to this ensemble
Each script function added will be loaded into a non-converged orchestrator prior to the execution of every entity belonging to this ensemble.
For converged orchestrators, the
add_script()
method should be used.Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.
Setting
devices_per_node=N
, with N greater than one will result in the script being stored in the first N devices of typedevice
; alternatively, settingfirst_device=M
will result in the script being stored on nodes M through M + N - 1.- Parameters:
name (
str
) – key to store function underfunction (
Optional
[str
], default:None
) – TorchScript codedevice (
str
, default:'CPU'
) – device for script executiondevices_per_node (
int
, default:1
) – number of devices on each hostfirst_device (
int
, default:0
) – first device to use on each host
- Return type:
None
- add_ml_model(name: str, backend: str, model: bytes | None = None, model_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0, batch_size: int = 0, min_batch_size: int = 0, min_batch_timeout: int = 0, tag: str = '', inputs: List[str] | None = None, outputs: List[str] | None = None) None [source]#
A TF, TF-lite, PT, or ONNX model to load into the DB at runtime
Each ML Model added will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble
One of either model (in memory representation) or model_path (file) must be provided
- Parameters:
name (
str
) – key to store model undermodel (
Optional
[bytes
], default:None
) – model in memorymodel_path (
Optional
[str
], default:None
) – serialized modelbackend (
str
) – name of the backend (TORCH, TF, TFLITE, ONNX)device (
str
, default:'CPU'
) – name of device for executiondevices_per_node (
int
, default:1
) – number of GPUs per node in multiGPU nodesfirst_device (
int
, default:0
) – first device in multi-GPU nodes to use for execution, defaults to 0; ignored if devices_per_node is 1batch_size (
int
, default:0
) – batch size for executionmin_batch_size (
int
, default:0
) – minimum batch size for model executionmin_batch_timeout (
int
, default:0
) – time to wait for minimum batch sizetag (
str
, default:''
) – additional tag for model informationinputs (
Optional
[List
[str
]], default:None
) – model inputs (TF only)outputs (
Optional
[List
[str
]], default:None
) – model outputs (TF only)
- Return type:
None
- add_model(model: smartsim.entity.model.Model) None [source]#
Add a model to this ensemble
- Parameters:
model (
Model
) – model instance to be added- Raises:
TypeError – if model is not an instance of
Model
EntityExistsError – if model already exists in this ensemble
- Return type:
None
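As a sketch (the names here are hypothetical), a manually created Model can be appended to an existing ensemble:

extra_model = exp.create_model("extra-run", run_settings=rs)
ensemble.add_model(extra_model)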
- add_script(name: str, script: str | None = None, script_path: str | None = None, device: str = 'CPU', devices_per_node: int = 1, first_device: int = 0) None [source]#
TorchScript to launch with every entity belonging to this ensemble
Each script added to the model will be loaded into an orchestrator (converged or not) prior to the execution of every entity belonging to this ensemble
Device selection is either “GPU” or “CPU”. If many devices are present, a number can be passed for specification e.g. “GPU:1”.
Setting
devices_per_node=N
, with N greater than one will result in the script being stored in the first N devices of type device
.One of either script (in memory string representation) or script_path (file) must be provided
- Parameters:
name (
str
) – key to store script underscript (
Optional
[str
], default:None
) – TorchScript codescript_path (
Optional
[str
], default:None
) – path to TorchScript codedevice (
str
, default:'CPU'
) – device for script executiondevices_per_node (
int
, default:1
) – number of devices on each hostfirst_device (
int
, default:0
) – first device to use on each host
- Return type:
None
- attach_generator_files(to_copy: List[str] | None = None, to_symlink: List[str] | None = None, to_configure: List[str] | None = None) None [source]#
Attach files to each model within the ensemble for generation
Attach files needed for the entity that, upon generation, will be located in the path of the entity.
During generation, files “to_copy” are copied into the path of the entity, and files “to_symlink” are symlinked into the path of the entity.
Files "to_configure" are text-based model input files where parameters for the model are set. Note that only models support the "to_configure" field. These files must have fields tagged that correspond to the values the user would like to change. The tag is settable but defaults to a semicolon e.g. THERMO = ;10;
- Parameters:
to_copy (
Optional
[List
[str
]], default:None
) – files to copyto_symlink (
Optional
[List
[str
]], default:None
) – files to symlinkto_configure (
Optional
[List
[str
]], default:None
) – input files with tagged parameters
- Return type:
None
- property attached_files_table: str#
Return a plain-text table with information about files attached to models belonging to this ensemble.
- Returns:
A table of all files attached to all models
- property batch: bool#
Property indicating whether or not the entity sequence should be launched as a batch job
- Returns:
True
if entity sequence should be launched as a batch job,False
if the members will be launched individually.
- property db_models: Iterable[smartsim.entity.DBModel]#
Return an immutable collection of attached models
- property db_scripts: Iterable[smartsim.entity.DBScript]#
Return an immutable collection of attached scripts
- enable_key_prefixing() None [source]#
If called, each model within this ensemble will prefix its key with its own model name.
- Return type:
None
- query_key_prefixing() bool [source]#
Inquire as to whether each model within the ensemble will prefix their keys
- Return type:
bool
- Returns:
True if all models have key prefixing enabled, False otherwise
- register_incoming_entity(incoming_entity: smartsim.entity.entity.SmartSimEntity) None [source]#
Register future communication between entities.
Registers the named data sources that this entity has access to by storing the key_prefix associated with that entity
Only python clients can have multiple incoming connections
- Parameters:
incoming_entity (
SmartSimEntity
) – The entity that data will be received from- Return type:
None
- property type: str#
Return the name of the class
Machine Learning#
SmartSim includes built-in utilities for supporting TensorFlow, Keras, and PyTorch.
TensorFlow#
SmartSim includes built-in utilities for supporting TensorFlow and Keras in training and inference.
- freeze_model(model: keras.src.models.model.Model, output_dir: str, file_name: str) Tuple[str, List[str], List[str]] [source]#
Freeze a Keras or TensorFlow Graph
To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model_from_file() method.
This utility function provides everything users need to take a trained model and put it inside an
orchestrator
instance- Parameters:
model (
Model
) – TensorFlow or Keras modeloutput_dir (
str
) – output dir to save model file tofile_name (
str
) – name of model file to create
- Return type:
Tuple
[str
,List
[str
],List
[str
]]- Returns:
path to model file, model input layer names, model output layer names
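A hedged usage sketch (net is assumed to be a trained Keras model; file names are placeholders):

from smartsim.ml.tf import freeze_model

model_path, inputs, outputs = freeze_model(net, output_dir=".", file_name="net.pb")
# the returned values are what smartredis expects, e.g.
# client.set_model_from_file("net", model_path, "TF", device="CPU",
#                            inputs=inputs, outputs=outputs)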
- serialize_model(model: keras.src.models.model.Model) Tuple[str, List[str], List[str]] [source]#
Serialize a Keras or TensorFlow Graph
To use a Keras or TensorFlow model in SmartSim, the model must be frozen and the inputs and outputs provided to the smartredis.client.set_model() method.
This utility function provides everything users need to take a trained model and put it inside an
orchestrator
instance.- Parameters:
model (
Model
) – TensorFlow or Keras model- Return type:
Tuple
[str
,List
[str
],List
[str
]]- Returns:
serialized model, model input layer names, model output layer names
- class StaticDataGenerator(**kwargs: Any) None [source]#
Bases:
_TFDataGenerationCommon
A class to download a dataset from the DB.
Details about parameters and features of this class can be found in the documentation of
DataDownloader
, of which it is just a TensorFlow-specialized sub-class with dynamic=False.- init_samples(init_trials: int = -1, wait_interval: float = 10.0) None #
Initialize samples (and targets, if needed).
A new attempt to download samples will be made every wait_interval seconds (10 by default), for up to
init_trials
times.- Parameters:
init_trials (
int
, default:-1
) – maximum number of attempts to fetch data- Return type:
None
- property need_targets: bool#
Compute if targets have to be downloaded.
- Returns:
Whether targets (or labels) should be downloaded
- property num_batches#
Number of batches in the PyDataset.
- Returns:
The number of batches in the PyDataset or None to indicate that the dataset is infinite.
- on_epoch_begin()#
Method called at the beginning of every epoch.
- on_epoch_end() None #
Callback called at the end of each training epoch
If self.shuffle is set to True, data is shuffled.
- Return type:
None
- class DynamicDataGenerator(**kwargs: Any) None [source]#
Bases:
_TFDataGenerationCommon
A class to download batches from the DB.
Details about parameters and features of this class can be found in the documentation of
DataDownloader
, of which it is just a TensorFlow-specialized sub-class with dynamic=True.- init_samples(init_trials: int = -1, wait_interval: float = 10.0) None #
Initialize samples (and targets, if needed).
A new attempt to download samples will be made every wait_interval seconds (10 by default), for up to
init_trials
times.- Parameters:
init_trials (
int
, default:-1
) – maximum number of attempts to fetch data- Return type:
None
- property need_targets: bool#
Compute if targets have to be downloaded.
- Returns:
Whether targets (or labels) should be downloaded
- property num_batches#
Number of batches in the PyDataset.
- Returns:
The number of batches in the PyDataset or None to indicate that the dataset is infinite.
- on_epoch_begin()#
Method called at the beginning of every epoch.
PyTorch#
SmartSim includes built-in utilities for supporting PyTorch in training and inference.
- class StaticDataGenerator(**kwargs: Any) None [source]#
Bases:
_TorchDataGenerationCommon
A class to download a dataset from the DB.
Details about parameters and features of this class can be found in the documentation of
DataDownloader
, of which it is just a PyTorch-specialized sub-class with dynamic=False and init_samples=False.When used in the DataLoader defined in this class, samples are initialized automatically before training. Other data loaders using this generator should implement the same behavior.
- init_samples(init_trials: int = -1, wait_interval: float = 10.0) None #
Initialize samples (and targets, if needed).
A new attempt to download samples will be made every wait_interval seconds (10 by default), for up to
init_trials
times.- Parameters:
init_trials (
int
, default:-1
) – maximum number of attempts to fetch data- Return type:
None
- property need_targets: bool#
Compute if targets have to be downloaded.
- Returns:
Whether targets (or labels) should be downloaded
- class DynamicDataGenerator(**kwargs: Any) None [source]#
Bases:
_TorchDataGenerationCommon
A class to download batches from the DB.
Details about parameters and features of this class can be found in the documentation of
DataDownloader
, of which it is just a PyTorch-specialized sub-class with dynamic=True and init_samples=False.When used in the DataLoader defined in this class, samples are initialized automatically before training. Other data loaders using this generator should implement the same behavior.
- init_samples(init_trials: int = -1, wait_interval: float = 10.0) None #
Initialize samples (and targets, if needed).
A new attempt to download samples will be made every wait_interval seconds (10 by default), for up to
init_trials
times.- Parameters:
init_trials (
int
, default:-1
) – maximum number of attempts to fetch data- Return type:
None
- property need_targets: bool#
Compute if targets have to be downloaded.
- Returns:
Whether targets (or labels) should be downloaded
- class DataLoader(dataset: smartsim.ml.torch.data._TorchDataGenerationCommon, **kwargs: Any) None [source]#
Bases:
DataLoader
DataLoader to be used as a wrapper of StaticDataGenerator or DynamicDataGenerator
This is just a sub-class of
torch.utils.data.DataLoader
which sets up sources of a data generator correctly. DataLoader parameters such as num_workers can be passed at initialization. batch_size should always be set to None.
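An illustrative sketch of wiring the pieces together; the generator keyword arguments shown are only a subset, and any required data-source arguments documented under DataDownloader still apply:

from smartsim.ml.torch import DataLoader, DynamicDataGenerator

generator = DynamicDataGenerator(cluster=False, shuffle=True, batch_size=32)
loader = DataLoader(generator, batch_size=None, num_workers=2)  # batch_size must stay None

for samples, targets in loader:
    ...  # training step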
Slurm#
|
Request an allocation |
|
Free an allocation's resources |
|
Check that there are sufficient resources in the provided Slurm partitions. |
Returns the default partition from Slurm |
|
Get the name of the nodes used in a slurm allocation. |
|
Get the name of queue in a slurm allocation. |
|
Get the number of tasks in a slurm allocation. |
|
Get the number of tasks per each node in a slurm allocation. |
- get_allocation(nodes: int = 1, time: str | None = None, account: str | None = None, options: Dict[str, str] | None = None) str [source]#
Request an allocation
This function requests an allocation with the specified arguments. Anything passed to the options will be processed as a Slurm argument and appended to the salloc command with the appropriate prefix (e.g. “-” or “--”).
The options can be used to pass extra settings to the workload manager such as the following for Slurm:
nodelist=”nid00004”
For arguments without a value, pass None or an empty string as the value. For Slurm:
exclusive=None
- Parameters:
nodes (
int
, default:1
) – number of nodes for the allocationtime (
Optional
[str
], default:None
) – wall time of the allocation, HH:MM:SS formataccount (
Optional
[str
], default:None
) – account id for allocationoptions (
Optional
[Dict
[str
,str
]], default:None
) – additional options for the slurm wlm
- Raises:
LauncherError – if the allocation is not successful
- Return type:
str
- Returns:
the id of the allocation
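For example, a hedged sketch (account and partition names are placeholders; the helpers are assumed to be importable from smartsim.wlm):

from smartsim.wlm import slurm

alloc = slurm.get_allocation(
    nodes=4,
    time="01:00:00",
    account="my-account",                             # placeholder account
    options={"partition": "gpu", "exclusive": None},  # value-less flags take None
)
# ... run an Experiment with launcher="slurm" on the allocation ...
slurm.release_allocation(alloc)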
- get_default_partition() str [source]#
Returns the default partition from Slurm
This default partition is assumed to be the partition with a star following its partition name in sinfo output
- Return type:
str
- Returns:
the name of the default partition
- get_hosts() List[str] [source]#
Get the name of the nodes used in a slurm allocation.
Note
This method requires access to
scontrol
from the node on which it is run- Return type:
List
[str
]- Returns:
Names of the host nodes
- Raises:
LauncherError – Could not access
scontrol
SmartSimError –
SLURM_JOB_NODELIST
is not set
- get_queue() str [source]#
Get the name of queue in a slurm allocation.
- Return type:
str
- Returns:
The name of the queue
- Raises:
SmartSimError –
SLURM_JOB_PARTITION
is not set
- get_tasks() int [source]#
Get the number of tasks in a slurm allocation.
- Return type:
int
- Returns:
The number of tasks in the allocation
- Raises:
SmartSimError –
SLURM_NTASKS
is not set
- get_tasks_per_node() Dict[str, int] [source]#
Get the number of tasks per each node in a slurm allocation.
Note
This method requires access to
scontrol
from the node on which it is run- Return type:
Dict
[str
,int
]- Returns:
Map of nodes to number of tasks on that node
- Raises:
SmartSimError –
SLURM_TASKS_PER_NODE
is not set
- release_allocation(alloc_id: str) None [source]#
Free an allocation’s resources
- Parameters:
alloc_id (
str
) – allocation id- Raises:
LauncherError – if allocation could not be freed
- Return type:
None
- validate(nodes: int = 1, ppn: int = 1, partition: str | None = None) bool [source]#
Check that there are sufficient resources in the provided Slurm partitions.
If no partition is provided, the default partition is found and used.
- Parameters:
nodes (
int
, default:1
) – Override the default node count to validateppn (
int
, default:1
) – Override the default processes per node to validatepartition (
Optional
[str
], default:None
) – partition to validate
- Raises:
LauncherError
- Return type:
bool
- Returns:
True if resources are available, False otherwise
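A short sketch (the partition name is a placeholder, and slurm refers to the same helper module as in the get_allocation example above):

if slurm.validate(nodes=8, ppn=16, partition="debug"):
    alloc = slurm.get_allocation(nodes=8, options={"partition": "debug"})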