Ensemble#
Overview#
A SmartSim Ensemble enables users to run a group of computational tasks together in an
Experiment workflow. An Ensemble is comprised of multiple Model objects,
where each Ensemble member (SmartSim Model) represents an individual application.
An Ensemble can be managed as a single entity and
launched with other Models and Orchestrators to construct AI-enabled workflows.
The Ensemble API offers key features, including methods to:
Attach Configuration Files for use at Ensemble runtime.
Load AI Models (TF, TF-lite, PT, or ONNX) into the Orchestrator at Ensemble runtime.
Load TorchScripts into the Orchestrator at Ensemble runtime.
Prevent Data Collisions within the Ensemble, which allows for reuse of application code.
To create a SmartSim Ensemble, use the Experiment.create_ensemble API function. When
initializing an Ensemble, consider one of the three creation strategies explained
in the Initialization section.
SmartSim manages Ensemble instances through the Experiment API by providing functions to
launch, monitor, and stop applications.
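As a quick orientation, the following is a minimal sketch of that lifecycle: it creates a small Ensemble of replicas, launches it, queries the status of its members, and stops it. The entity name and executable path are placeholders, and the snippet is an illustrative sketch of the Experiment functions named above rather than a complete workflow.
from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Describe how each Ensemble member is executed (placeholder executable path)
rs = exp.create_run_settings(exe="path/to/example_simulation_program")

# Create a small Ensemble of two identical members
ensemble = exp.create_ensemble("overview-ensemble", run_settings=rs, replicas=2)

# Launch the Ensemble and wait for all members to finish
exp.start(ensemble, block=True)

# Monitor: query the status of each Ensemble member
print(exp.get_status(ensemble))

# Stop the Ensemble (a no-op here since the members already completed)
exp.stop(ensemble)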
Initialization#
Overview#
The Experiment API is responsible for initializing all workflow entities.
An Ensemble is created using the Experiment.create_ensemble factory method, and users can customize the
Ensemble creation via the factory method parameters.
The factory method arguments for Ensemble creation can be found in the Experiment API
under the create_ensemble docstring.
By using specific combinations of the factory method arguments, users can tailor
the creation of an Ensemble to align with one of the following creation strategies:
Parameter Expansion: Generate a variable-sized set of unique simulation instances configured with user-defined input parameters.
Replica Creation: Generate a specified number of Model replicas.
Manually: Attach pre-configured Models to an Ensemble to manage as a single unit.
Parameter Expansion#
Parameter expansion is a technique that allows users to set parameter values per Ensemble member.
This is done by specifying input to the params and perm_strategy factory method arguments during
Ensemble creation (Experiment.create_ensemble). Users may control how the params values
are applied to the Ensemble through the perm_strategy argument. The perm_strategy argument
accepts three values listed below.
Parameter Expansion Strategy Options:
"all_perm": Generate all possible parameter permutations for an exhaustive exploration. This means that every possible combination of parameters will be used in the Ensemble.
"step": Create parameter sets by collecting identically indexed values across parameter lists. This allows for discrete combinations of parameters for Models.
"random": Enable random selection from predefined parameter spaces, offering a stochastic approach. This means that the parameters will be chosen randomly for each Model, which can be useful for exploring a wide range of possibilities. A hedged sketch of this strategy follows below.
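The sketch below illustrates the "random" strategy under stated assumptions: the parameter values, entity name, and executable path are placeholders, and the n_models keyword used to cap the number of randomly drawn members is our assumption about the keyword forwarded to the random strategy; consult the create_ensemble docstring for the exact argument.
from smartsim import Experiment

exp = Experiment("getting-started", launcher="auto")
rs = exp.create_run_settings(exe="path/to/example_simulation_program")

# Parameter space to sample from (placeholder values)
params = {
    "thermo": [10, 20, 30, 40],
    "steps": [100, 200, 300, 400]
}

# Draw four random parameter combinations (n_models keyword assumed)
ensemble = exp.create_ensemble(
    "random-ensemble",
    run_settings=rs,
    params=params,
    perm_strategy="random",
    n_models=4
)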
Examples#
This subsection contains two examples of Ensemble parameter expansion. The
first example illustrates parameter expansion using two parameters
while the second example demonstrates parameter expansion with two
parameters along with the launch of the Ensemble as a batch workload.
Example 1 : Parameter Expansion Using all_perm Strategy
In this example an Ensemble of four Model entities is created by expanding two parameters using the all_perm strategy. All of the Models in the Ensemble share the same RunSettings and only differ in the value of the params assigned to each member. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings
rs = exp.create_run_settings(exe="path/to/example_simulation_program")

# Create the parameters to expand to the Ensemble members
params = {
    "name": ["Ellie", "John"],
    "parameter": [2, 11]
}

# Initialize the Ensemble by specifying RunSettings, the params and "all_perm"
ensemble = exp.create_ensemble("model_member", run_settings=rs, params=params, perm_strategy="all_perm")

Begin by initializing a RunSettings object to apply to all Ensemble members:

# Initialize a RunSettings
rs = exp.create_run_settings(exe="path/to/example_simulation_program")

Next, define the parameters that will be applied to the Ensemble:

# Create the parameters to expand to the Ensemble members
params = {
    "name": ["Ellie", "John"],
    "parameter": [2, 11]
}

Finally, initialize an Ensemble by specifying the RunSettings, params and perm_strategy="all_perm":

# Initialize the Ensemble by specifying RunSettings, the params and "all_perm"
ensemble = exp.create_ensemble("model_member", run_settings=rs, params=params, perm_strategy="all_perm")

By specifying perm_strategy="all_perm", all permutations of the params will be calculated and distributed across Ensemble members. Here there are four permutations of the params values:

ensemble member 1: ["Ellie", 2]
ensemble member 2: ["Ellie", 11]
ensemble member 3: ["John", 2]
ensemble member 4: ["John", 11]
Example 2 : Parameter Expansion Using step Strategy with the Ensemble Configured For Batch Launching
In this example an Ensemble of two Model entities is created by expanding two parameters using the step strategy. All of the Models in the Ensemble share the same RunSettings and only differ in the value of the params assigned to each member. Lastly, the Ensemble is submitted as a batch workload. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a BatchSettings
bs = exp.create_batch_settings(nodes=2, time="10:00:00")

# Initialize and configure RunSettings
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")
rs.set_nodes(1)

# Create the parameters to expand to the Ensemble members
params = {
    "name": ["Ellie", "John"],
    "parameter": [2, 11]
}

# Initialize the Ensemble by specifying RunSettings, BatchSettings, the params and "step"
ensemble = exp.create_ensemble("ensemble", run_settings=rs, batch_settings=bs, params=params, perm_strategy="step")

Begin by initializing and configuring a BatchSettings object to run the Ensemble instance:

# Initialize a BatchSettings
bs = exp.create_batch_settings(nodes=2, time="10:00:00")

The above BatchSettings object will instruct SmartSim to run the Ensemble on two nodes with a timeout of 10 hours.

Next, initialize a RunSettings object to apply to all Ensemble members:

# Initialize and configure RunSettings
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")
rs.set_nodes(1)

Next, define the parameters to include in the Ensemble:

# Create the parameters to expand to the Ensemble members
params = {
    "name": ["Ellie", "John"],
    "parameter": [2, 11]
}

Finally, initialize an Ensemble by passing in the RunSettings, BatchSettings, params and perm_strategy="step":

# Initialize the Ensemble by specifying RunSettings, BatchSettings, the params and "step"
ensemble = exp.create_ensemble("ensemble", run_settings=rs, batch_settings=bs, params=params, perm_strategy="step")

When specifying perm_strategy="step", the params sets are created by collecting identically indexed values across the param value lists.

ensemble member 1: ["Ellie", 2]
ensemble member 2: ["John", 11]
Replicas#
A replica strategy involves the creation of identical Models within an Ensemble.
This strategy is particularly useful for applications that have some inherent randomness.
Users may use the replicas factory method argument to create a specified number of identical
Model members during Ensemble creation (Experiment.create_ensemble).
Examples#
This subsection contains two examples of using the replicas creation strategy. The
first example illustrates creating four Ensemble member clones
while the second example demonstrates creating four Ensemble
member clones along with the launch of the Ensemble as a batch workload.
Example 1 : Ensemble creation with replicas strategy
In this example an Ensemble of four identical Model members is created by specifying the number of clones to create via the replicas argument. All of the Models in the Ensemble share the same RunSettings. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")

# Initialize the Ensemble by specifying the number of replicas and RunSettings
ensemble = exp.create_ensemble("ensemble-replica", replicas=4, run_settings=rs)

To create an Ensemble of identical Models, begin by initializing a RunSettings object:

# Initialize a RunSettings object
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")

Initialize the Ensemble by specifying the RunSettings object and the number of clones to replicas:

# Initialize the Ensemble by specifying the number of replicas and RunSettings
ensemble = exp.create_ensemble("ensemble-replica", replicas=4, run_settings=rs)

By passing in replicas=4, four identical Ensemble members will be initialized.
Example 2 : Ensemble Creation with Replicas Strategy and Ensemble Batch Launching
In this example an Ensemble of four Model entities is created by specifying the number of clones to create via the replicas argument. All of the Models in the Ensemble share the same RunSettings and the Ensemble is submitted as a batch workload. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a BatchSettings object
bs = exp.create_batch_settings(nodes=4, time="10:00:00")

# Initialize and configure a RunSettings object
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")
rs.set_nodes(4)

# Initialize an Ensemble
ensemble = exp.create_ensemble("ensemble-replica", replicas=4, run_settings=rs, batch_settings=bs)

To launch the Ensemble of identical Models as a batch job, begin by initializing a BatchSettings object:

# Initialize a BatchSettings object
bs = exp.create_batch_settings(nodes=4, time="10:00:00")

The above BatchSettings object will instruct SmartSim to run the Ensemble on four nodes with a timeout of 10 hours.

Next, create a RunSettings object to apply to all Model replicas:

# Initialize and configure a RunSettings object
rs = exp.create_run_settings(exe="python", exe_args="path/to/application_script.py")
rs.set_nodes(4)

Initialize the Ensemble by specifying the RunSettings object, BatchSettings object and the number of clones to replicas:

# Initialize an Ensemble
ensemble = exp.create_ensemble("ensemble-replica", replicas=4, run_settings=rs, batch_settings=bs)

By passing in replicas=4, four identical Ensemble members will be initialized.
Manually Append#
Manually appending Models to an Ensemble offers an in-depth level of customization in Ensemble design.
This approach is favorable when users have distinct requirements for individual Models, such as variations
in parameters, run settings, or different types of simulations.
Examples#
This subsection contains an example of creating an Ensemble by manually appending Models.
The example illustrates attaching two SmartSim Models to the Ensemble.
The Ensemble is submitted as a batch workload.
Example 1 : Append Models to an Ensemble and Launch as a Batch Job
In this example, we append Models to an Ensemble for batch job execution. To do this, we first initialize an Ensemble with a BatchSettings object. Then, we manually create Models and add each to the Ensemble using the Ensemble.add_model function. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize BatchSettings
bs = exp.create_batch_settings(nodes=10, time="01:00:00")

# Initialize Ensemble
ensemble = exp.create_ensemble("ensemble-append", batch_settings=bs)

# Initialize RunSettings for Model 1
srun_settings_1 = exp.create_run_settings(exe="python", exe_args="path/to/application_script_1.py")
# Initialize RunSettings for Model 2
srun_settings_2 = exp.create_run_settings(exe="python", exe_args="path/to/application_script_2.py")

# Initialize Model 1 with RunSettings 1
model_1 = exp.create_model(name="model_1", run_settings=srun_settings_1)
# Initialize Model 2 with RunSettings 2
model_2 = exp.create_model(name="model_2", run_settings=srun_settings_2)

# Add Model member to Ensemble
ensemble.add_model(model_1)
# Add Model member to Ensemble
ensemble.add_model(model_2)

To create an empty Ensemble to append Models to, initialize the Ensemble with a batch settings object:

# Initialize BatchSettings
bs = exp.create_batch_settings(nodes=10, time="01:00:00")

# Initialize Ensemble
ensemble = exp.create_ensemble("ensemble-append", batch_settings=bs)

Next, create the Models to append to the Ensemble:

# Initialize RunSettings for Model 1
srun_settings_1 = exp.create_run_settings(exe="python", exe_args="path/to/application_script_1.py")
# Initialize RunSettings for Model 2
srun_settings_2 = exp.create_run_settings(exe="python", exe_args="path/to/application_script_2.py")

# Initialize Model 1 with RunSettings 1
model_1 = exp.create_model(name="model_1", run_settings=srun_settings_1)
# Initialize Model 2 with RunSettings 2
model_2 = exp.create_model(name="model_2", run_settings=srun_settings_2)

Finally, append the Model objects to the Ensemble:

# Add Model member to Ensemble
ensemble.add_model(model_1)
# Add Model member to Ensemble
ensemble.add_model(model_2)

The new Ensemble is comprised of the two appended Model members.
Files#
Overview#
Ensemble members often depend on external files (e.g., training datasets, evaluation datasets)
to operate as intended. Users can instruct SmartSim to copy, symlink, or manipulate external files
prior to an Ensemble launch via the Ensemble.attach_generator_files function. Attached files
will be applied to all Ensemble members.
Note
Multiple calls to Ensemble.attach_generator_files will overwrite previous file configurations
on the Ensemble.
To attach a file to an Ensemble for use at runtime, provide one of the following arguments to the
Ensemble.attach_generator_files function:
to_copy (t.Optional[t.List[str]] = None): Files that are copied into the path of the Ensemble members.
to_symlink (t.Optional[t.List[str]] = None): Files that are symlinked into the path of the Ensemble members. A symlink, or symbolic link, is a file that points to another file or directory, allowing you to access that file as if it were located in the same directory as the symlink.
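As a brief, hedged illustration of the two arguments above, the sketch below copies a small input file and symlinks a larger dataset into the run directory of every Ensemble member. The file paths, entity name, and executable path are placeholders.
from smartsim import Experiment

exp = Experiment("getting-started", launcher="auto")
rs = exp.create_run_settings(exe="path/to/example_simulation_program")
ensemble = exp.create_ensemble("file-ensemble", run_settings=rs, replicas=2)

# Copy a small input file and symlink a large dataset into each member's directory
ensemble.attach_generator_files(
    to_copy=["path/to/input_settings.json"],
    to_symlink=["path/to/large_training_dataset"]
)

# The attached files are placed when the member directories are generated
exp.generate(ensemble)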
To specify a template file in order to programmatically replace specified parameters during generation
of Ensemble member directories, pass the following value to the Ensemble.attach_generator_files function:
to_configure (t.Optional[t.List[str]] = None): This parameter is designed for text-based Ensemble member input files. During directory generation for Ensemble members, the attached files are parsed and the tagged parameters are replaced with the params values applied to each Ensemble member, following the Ensemble creation strategy. Tagged parameters are placeholders in the text that are replaced with the actual parameter values during the directory generation process. The default tag is a semicolon (e.g., THERMO = ;THERMO;).
In the Example subsection, we provide an example using the to_configure parameter
of Ensemble.attach_generator_files.
See also
To add a file to a single Model that will be appended to an Ensemble, refer to the Files
section of the Model documentation.
Example#
This example demonstrates how to attach a text file to an Ensemble for parameter replacement.
This is accomplished using the params parameter of
the Experiment.create_ensemble factory function and the to_configure parameter
of Ensemble.attach_generator_files. The source code example is available in the dropdown below for
convenient execution and customization.
Example Driver Script Source Code
from smartsim import Experiment
# Initialize the Experiment
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="python", exe_args="/path/to/application.py")
# Initialize an Ensemble object via replicas strategy
example_ensemble = exp.create_ensemble("ensemble", run_settings=ensemble_settings, replicas=2, params={"THERMO":1})
# Attach the file to the Ensemble instance
example_ensemble.attach_generator_files(to_configure="path/to/params_inputs.txt")
# Generate the Ensemble directory
exp.generate(example_ensemble)
# Launch the Ensemble
exp.start(example_ensemble)
In this example, we have a text file named params_inputs.txt. Within the text is the parameter THERMO
that is required by each Ensemble member at runtime:
THERMO = ;THERMO;
In order to have the tagged parameter ;THERMO; replaced with a usable value at runtime, two steps are required:
1. The THERMO variable must be included in the Experiment.create_ensemble factory method as part of the params parameter.
2. The file containing the tagged parameter ;THERMO;, params_inputs.txt, must be attached to the Ensemble via the Ensemble.attach_generator_files method as part of the to_configure parameter.
To encapsulate our application within an Ensemble, we must create an Experiment instance
to gain access to the Experiment factory method that creates the Ensemble.
Begin by importing the Experiment module and initializing an Experiment:
from smartsim import Experiment

# Initialize the Experiment
exp = Experiment("getting-started", launcher="auto")
To create our Ensemble, we are using the replicas initialization strategy.
Begin by creating a simple RunSettings object that specifies the executable
to run and the path to the application script:
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="python", exe_args="/path/to/application.py")
Next, initialize an Ensemble object with Experiment.create_ensemble
by passing in ensemble_settings, params={"THERMO":1} and replicas=2:
# Initialize an Ensemble object via replicas strategy
example_ensemble = exp.create_ensemble("ensemble", run_settings=ensemble_settings, replicas=2, params={"THERMO":1})
We now have an Ensemble instance named example_ensemble. Attach the above text file
to the Ensemble for use at entity runtime. To do so, we use the
Ensemble.attach_generator_files function and specify the to_configure
parameter with the path to the text file, params_inputs.txt:
# Attach the file to the Ensemble instance
example_ensemble.attach_generator_files(to_configure="path/to/params_inputs.txt")
To create an isolated directory for the Ensemble member outputs and configuration files, invoke Experiment.generate via the
Experiment instance exp with example_ensemble as an input parameter:
# Generate the Ensemble directory
exp.generate(example_ensemble)
After invoking Experiment.generate, the attached generator files will be available for the
application when exp.start(example_ensemble) is called.
# Launch the Ensemble
exp.start(example_ensemble)
The contents of params_inputs.txt after Ensemble completion are:
THERMO = 1
ML Models and Scripts#
Overview#
SmartSim users have the capability to load ML models and TorchScripts into an Orchestrator
within the Experiment script for use within Ensemble members. Functions
accessible through an Ensemble object support loading ML models (TensorFlow, TensorFlow-lite,
PyTorch, and ONNX) and TorchScripts into standalone or colocated Orchestrators before
application runtime.
See also
To add an ML model or TorchScript to a single Model that will be appended to an
Ensemble, refer to the ML Models and Scripts
section of the Model documentation.
Depending on the planned storage method of the ML model, there are two distinct
approaches to load it into the Orchestrator: from memory or from file.
Warning
Uploading an ML model from memory is solely supported for
standalone Orchestrators. To upload an ML model to a colocated Orchestrator, users
must save the ML model to disk and upload from file.
Depending on the planned storage method of the TorchScript, there are three distinct
approaches to load it into the Orchestrator: from memory, from file, or from string.
Warning
Uploading a TorchScript from memory is solely supported for
standalone Orchestrators. To upload a TorchScript to a colocated Orchestrator, users
must upload it from file or from string.
Once an ML model or TorchScript is loaded into the Orchestrator, Ensemble members can
leverage ML capabilities by utilizing the SmartSim client (SmartRedis)
to execute the stored ML models or TorchScripts.
AI Models#
When configuring an Ensemble, users can instruct SmartSim to load
Machine Learning (ML) models dynamically to the Orchestrator (colocated or standalone). ML models added
are loaded into the Orchestrator prior to the execution of the Ensemble. To load an ML model
to the Orchestrator, SmartSim users can serialize and provide the ML model in-memory or specify the file path
via the Ensemble.add_ml_model function. The supported ML frameworks are TensorFlow,
TensorFlow-lite, PyTorch, and ONNX.
Users must serialize TensorFlow ML models before sending to an Orchestrator from memory
or from file. To save a TensorFlow model to memory, SmartSim offers the serialize_model
function. This function returns the TF model as a byte string with the names of the
input and output layers, which will be required upon uploading. To save a TF model to disk,
SmartSim offers the freeze_model function which returns the path to the serialized
TF model file with the names of the input and output layers. Additional TF model serialization
information and examples can be found in the ML Features section of SmartSim.
Note
Uploading an ML model from memory is only supported for standalone Orchestrators.
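For orientation, the sketch below shows the two serialization helpers side by side on a small placeholder Keras model, mirroring the calls used in the examples that follow; the model definition, output directory, and file name are placeholders.
from tensorflow import keras
from smartsim.ml.tf import serialize_model, freeze_model

# A tiny placeholder Keras model
model = keras.Sequential([keras.layers.Dense(4, input_shape=(4,))])

# In-memory: returns the model as a byte string plus the input/output layer names
model_bytes, inputs, outputs = serialize_model(model)

# On disk: writes a frozen model file and returns its path plus the layer names
model_path, inputs, outputs = freeze_model(model, "path/to/output_dir", "model.pb")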
When attaching an ML model using Ensemble.add_ml_model, the
following arguments are offered to customize storage and execution:
name (str): name to reference the ML model in the Orchestrator.
backend (str): name of the backend (TORCH, TF, TFLITE, ONNX).
model (t.Optional[str] = None): An ML model in memory (only supported for non-colocated Orchestrators).
model_path (t.Optional[str] = None): path to the serialized ML model file.
device (t.Literal[“CPU”, “GPU”] = “CPU”): name of device for execution, defaults to “CPU”.
devices_per_node (int = 1): The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int = 0): The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
batch_size (int = 0): batch size for execution, defaults to 0.
min_batch_size (int = 0): minimum batch size for ML model execution, defaults to 0.
min_batch_timeout (int = 0): time to wait for minimum batch size, defaults to 0.
tag (str = “”): additional tag for ML model information, defaults to “”.
inputs (t.Optional[t.List[str]] = None): ML model inputs (TF only), defaults to None.
outputs (t.Optional[t.List[str]] = None): ML model outputs (TF only), defaults to None.
See also
To add an ML model to a single Model that will be appended to an
Ensemble, refer to the AI Models
section of the Model documentation.
Example: Attach an In-Memory ML Model#
This example demonstrates how to attach an in-memory ML model to a SmartSim Ensemble
to load into an Orchestrator at Ensemble runtime. The source code example is
available in the dropdown below for convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Input
class Net(keras.Model):
def __init__(self):
super(Net, self).__init__(name="cnn")
self.conv = Conv2D(1, 3, 1)
def call(self, x):
y = self.conv(x)
return y
def create_tf_cnn():
"""Create an in-memory Keras CNN for example purposes
"""
from smartsim.ml.tf import serialize_model
n = Net()
input_shape = (3,3,1)
inputs = Input(input_shape)
outputs = n(inputs)
model = keras.Model(inputs=inputs, outputs=outputs, name=n.name)
return serialize_model(model)
# Serialize and save TF model
model, inputs, outputs = create_tf_cnn()
# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="path/to/example_simulation_program")
# Initialize an Ensemble object
ensemble_instance = exp.create_ensemble("ensemble_name", run_settings=ensemble_settings)
# Attach the in-memory ML model to the SmartSim Ensemble
ensemble_instance.add_ml_model(name="cnn", backend="TF", model=model, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)
Note
This example assumes:
an Orchestrator is launched prior to the Ensemble execution
an initialized Ensemble named ensemble_instance exists within the Experiment workflow
a TensorFlow-based ML model was serialized using serialize_model, which returns the ML model as a byte string with the names of the input and output layers
Attach the ML Model to a SmartSim Ensemble
In this example, we have a serialized TensorFlow-based ML model that was saved to a byte string stored under model.
Additionally, the serialize_model function returned the names of the input and output layers stored under
inputs and outputs. Assuming an initialized Ensemble named ensemble_instance exists, we add the byte string TensorFlow model using
Ensemble.add_ml_model:
# Attach the in-memory ML model to the SmartSim Ensemble
ensemble_instance.add_ml_model(name="cnn", backend="TF", model=model, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)
In the above ensemble_instance.add_ml_model code snippet, we offer the following arguments:
name ("cnn"): A name to reference the ML model in the Orchestrator.
backend ("TF"): Indicating that the ML model is a TensorFlow model.
model (model): The in-memory representation of the TensorFlow model.
device (“GPU”): Specifying the device for ML model execution.
devices_per_node (2): Use two GPUs per node.
first_device (0): Start with 0 index GPU.
inputs (inputs): The name of the ML model input nodes (TensorFlow only).
outputs (outputs): The name of the ML model output nodes (TensorFlow only).
Warning
Calling exp.start(ensemble_instance) prior to the launch of an Orchestrator will result in
a failed attempt to load the ML model to a non-existent standalone Orchestrator.
When the Ensemble is started via Experiment.start, the ML model will be loaded to the
launched standalone Orchestrator. The ML model can then be executed on the Orchestrator via a SmartSim
client (SmartRedis) within the application code.
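The following is a hedged sketch of what that application-side execution might look like with SmartRedis: it places an input tensor, runs the ML model stored under the key "cnn", and retrieves the result. The tensor keys and the input shape (chosen to match the example's 3x3x1 CNN) are placeholders.
from smartredis import Client
import numpy as np

# Connect to the standalone Orchestrator (cluster=False for a single shard)
client = Client(cluster=False)

# Place an input tensor shaped for the example CNN (batch of 1, 3x3x1)
client.put_tensor("cnn_input", np.random.rand(1, 3, 3, 1).astype(np.float32))

# Execute the ML model stored under the key "cnn" and fetch the output
client.run_model("cnn", inputs=["cnn_input"], outputs=["cnn_output"])
prediction = client.get_tensor("cnn_output")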
Example: Attach an ML Model From File#
This example demonstrates how to attach a ML model from file to a SmartSim Ensemble
to load into an Orchestrator at Ensemble runtime. The source code example is
available in the dropdown below for convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Input
class Net(keras.Model):
def __init__(self):
super(Net, self).__init__(name="cnn")
self.conv = Conv2D(1, 3, 1)
def call(self, x):
y = self.conv(x)
return y
def save_tf_cnn(path, file_name):
"""Create a Keras CNN and save to file for example purposes"""
from smartsim.ml.tf import freeze_model
n = Net()
input_shape = (3, 3, 1)
n.build(input_shape=(None, *input_shape))
inputs = Input(input_shape)
outputs = n(inputs)
model = keras.Model(inputs=inputs, outputs=outputs, name=n.name)
return freeze_model(model, path, file_name)
# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="path/to/example_simulation_program")
# Initialize an Ensemble object
ensemble_instance = exp.create_ensemble("ensemble_name", run_settings=ensemble_settings)
# Serialize and save TF model to file
model_file, inputs, outputs = save_tf_cnn(ensemble_instance.path, "model.pb")
# Attach ML model file to Ensemble
ensemble_instance.add_ml_model(name="cnn", backend="TF", model_path=model_file, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)
Note
This example assumes:
a standalone Orchestrator is launched prior to Ensemble execution
an initialized Ensemble named ensemble_instance exists within the Experiment workflow
a TensorFlow-based ML model was serialized using freeze_model, which returns the path to the serialized model file and the names of the input and output layers
Attach the ML Model to a SmartSim Ensemble
In this example, we have a serialized TensorFlow-based ML model that was saved to disk, with the file path stored under model_file.
Additionally, the freeze_model function returned the names of the input and output layers stored under
inputs and outputs. Assuming an initialized Ensemble named ensemble_instance exists, we add a TensorFlow model using
the Ensemble.add_ml_model function and specify the ML model path via the parameter model_path:
# Attach ML model file to Ensemble
ensemble_instance.add_ml_model(name="cnn", backend="TF", model_path=model_file, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)
In the above ensemble_instance.add_ml_model code snippet, we offer the following arguments:
name ("cnn"): A name to reference the ML model in the Orchestrator.
backend ("TF"): Indicating that the ML model is a TensorFlow model.
model_path (model_file): The path to the serialized ML model file.
device (“GPU”): Specifying the device for ML model execution.
devices_per_node (2): Use two GPUs per node.
first_device (0): Start with 0 index GPU.
inputs (inputs): The name of the ML model input nodes (TensorFlow only).
outputs (outputs): The name of the ML model output nodes (TensorFlow only).
Warning
Calling exp.start(ensemble_instance) prior to instantiation of an Orchestrator will result in
a failed attempt to load the ML model to a non-existent Orchestrator.
When the Ensemble is started via Experiment.start, the ML model will be loaded to the
launched Orchestrator. The ML model can then be executed on the Orchestrator via a SmartSim
client (SmartRedis) within the application executable.
TorchScripts#
When configuring an Ensemble, users can instruct SmartSim to load TorchScripts dynamically
to the Orchestrator. The TorchScripts become available for each Ensemble member upon being loaded
into the Orchestrator prior to the execution of the Ensemble. SmartSim users may upload
a single TorchScript function via Ensemble.add_function or alternatively upload a script
containing multiple functions via Ensemble.add_script. To load a TorchScript to the
Orchestrator, SmartSim users can follow one of the following processes:
- Define a TorchScript Function In-Memory: Use Ensemble.add_function to instruct SmartSim to load an in-memory TorchScript to the Orchestrator.
- Define Multiple TorchScript Functions From File: Provide a file path to Ensemble.add_script to instruct SmartSim to load the TorchScript from file to the Orchestrator.
- Define a TorchScript Function as String: Provide a function string to Ensemble.add_script to instruct SmartSim to load a raw string as a TorchScript function to the Orchestrator.
Note
Uploading a TorchScript from memory using Ensemble.add_function
is only supported for standalone Orchestrators. Users uploading
TorchScripts to colocated Orchestrators should instead use the function Ensemble.add_script
to upload from file or as a string.
Each function also provides flexible device selection, allowing users to choose the device on which the TorchScript is executed, "GPU" or "CPU". In environments with multiple devices, the number of devices and the first device to use can be specified via the devices_per_node and first_device parameters.
Note
If device=GPU is specified when attaching a TorchScript function to an Ensemble, SmartSim
will execute the TorchScript on a GPU. However, TorchScripts loaded to an Orchestrator are
executed on the Orchestrator compute resources. Therefore, users must make sure that the device
specified is available on the Orchestrator compute resources. For example, if a user
specifies device=GPU but launches the Orchestrator on CPU-only nodes,
the TorchScript will not be executed on a GPU as instructed.
Continue or select the respective process link to learn more on how each function (Ensemble.add_script and Ensemble.add_function)
dynamically loads TorchScripts to the Orchestrator.
See also
To add a TorchScript to a single Model that will be appended to an
Ensemble, refer to the TorchScripts
section of the Model documentation.
Attach an In-Memory TorchScript#
Users can define TorchScript functions within the Experiment driver script
to attach to an Ensemble. This feature is supported by Ensemble.add_function.
Warning
Ensemble.add_function does not support loading in-memory TorchScript functions to a colocated Orchestrator.
If you would like to load a TorchScript function to a colocated Orchestrator, define the function
as a raw string or load from file.
When specifying an in-memory TorchScript function using Ensemble.add_function, the
following arguments are offered:
name (str): reference name for the script inside of the Orchestrator.
function (t.Optional[str] = None): TorchScript function code.
device (t.Literal[“CPU”, “GPU”] = “CPU”): device for script execution, defaults to “CPU”.
devices_per_node (int = 1): The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int = 0): The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
Example: Load an In-Memory TorchScript Function#
This example walks through the steps of instructing SmartSim to load an in-memory TorchScript function
to a standalone Orchestrator. The source code example is available in the dropdown below for
convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
def timestwo(x):
return 2*x
# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="path/to/example_simulation_program")
# Initialize an Ensemble object
ensemble_instance = exp.create_ensemble("ensemble_name", run_settings=ensemble_settings)
# Attach TorchScript to Ensemble
ensemble_instance.add_function(name="example_func", function=timestwo, device="GPU", devices_per_node=2, first_device=0)
Note
The example assumes:
a standalone Orchestrator is launched prior to Ensemble execution
an initialized Ensemble named ensemble_instance exists within the Experiment workflow
Define an In-Memory TorchScript Function
To begin, define an in-memory TorchScript function within the Python driver script. For the purpose of the example, we add a simple TorchScript function, timestwo:
def timestwo(x):
    return 2*x
Attach the In-Memory TorchScript Function to a SmartSim Ensemble
We use the Ensemble.add_function function to instruct SmartSim to load the TorchScript function timestwo
onto the launched standalone Orchestrator. Specify the function timestwo to the function
parameter:
# Attach TorchScript to Ensemble
ensemble_instance.add_function(name="example_func", function=timestwo, device="GPU", devices_per_node=2, first_device=0)
In the above ensemble_instance.add_function code snippet, we offer the following arguments:
name ("example_func"): A name to uniquely identify the TorchScript within the Orchestrator.
function (timestwo): The TorchScript function defined in the Python driver script.
device (“GPU”): Specifying the device for TorchScript execution.
devices_per_node (2): Use two GPUs per node.
first_device (0): Start with 0 index GPU.
Warning
Calling exp.start(ensemble_instance) prior to instantiation of an Orchestrator will result in
a failed attempt to load the TorchScript to a non-existent Orchestrator.
When the Ensemble is started via Experiment.start, the TorchScript function will be loaded to the
standalone Orchestrator. The function can then be executed on the Orchestrator via a SmartSim
client (SmartRedis) within the application code.
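The following is a hedged sketch of that application-side execution with SmartRedis: the script is referenced by the key "example_func" and the function by its name, timestwo. The tensor keys and values are placeholders.
from smartredis import Client
import numpy as np

client = Client(cluster=False)

# Place an input tensor for the TorchScript function
client.put_tensor("script_input", np.array([1.0, 2.0, 3.0], dtype=np.float32))

# Run the function "timestwo" stored under the key "example_func"
client.run_script("example_func", "timestwo", inputs=["script_input"], outputs=["script_output"])
doubled = client.get_tensor("script_output")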
Attach a TorchScript From File#
Users can attach TorchScript functions from a file to an Ensemble and upload them to a
colocated or standalone Orchestrator. This functionality is supported by the Ensemble.add_script
function’s script_path parameter.
When specifying a TorchScript using Ensemble.add_script, the
following arguments are offered:
name (str): Reference name for the script inside of the Orchestrator.
script (t.Optional[str] = None): TorchScript code (only supported for non-colocated Orchestrators).
script_path (t.Optional[str] = None): path to TorchScript code.
device (t.Literal[“CPU”, “GPU”] = “CPU”): device for script execution, defaults to “CPU”.
devices_per_node (int = 1): The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int = 0): The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
Example: Loading a TorchScript From File#
This example walks through the steps of instructing SmartSim to load a TorchScript from file
to an Orchestrator. The source code example is available in the dropdown below for
convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="path/to/example_simulation_program")
# Initialize an Ensemble object
ensemble_instance = exp.create_ensemble("ensemble_name", run_settings=ensemble_settings)
# Attach TorchScript to Ensemble
ensemble_instance.add_script(name="example_script", script_path="path/to/torchscript.py", device="GPU", devices_per_node=2, first_device=0)
Note
This example assumes:
an Orchestrator is launched prior to Ensemble execution
an initialized Ensemble named ensemble_instance exists within the Experiment workflow
Define a TorchScript Script
For the example, we create the Python script torchscript.py. The file contains multiple simple torch functions, shown below:
def negate(x):
return torch.neg(x)
def random(x, y):
return torch.randn(x, y)
def pos(z):
return torch.positive(z)
Attach the TorchScript Script to a SmartSim Ensemble
Assuming an initialized Ensemble named ensemble_instance exists, we add a TorchScript script using
the Ensemble.add_script function and specify the script path via the parameter script_path:

# Attach TorchScript to Ensemble
ensemble_instance.add_script(name="example_script", script_path="path/to/torchscript.py", device="GPU", devices_per_node=2, first_device=0)

In the above ensemble_instance.add_script code snippet, we offer the following arguments:
name ("example_script"): Reference name for the script inside of the Orchestrator.
script_path ("path/to/torchscript.py"): Path to the script file.
device (“GPU”): device for script execution.
devices_per_node (2): Use two GPUs per node.
first_device (0): Start with 0 index GPU.
Warning
Calling exp.start(ensemble_instance) prior to instantiation of an Orchestrator will result in
a failed attempt to load the TorchScript to a non-existent Orchestrator.
When ensemble_instance is started via Experiment.start, the TorchScript will be loaded from file to the
Orchestrator that is launched prior to the start of ensemble_instance.
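Similarly, the hedged sketch below shows an application calling one of the functions defined in torchscript.py: the script is referenced by the key "example_script" and the function by its name, here negate. The tensor keys and values are placeholders.
from smartredis import Client
import numpy as np

client = Client(cluster=False)

# Input tensor for the "negate" function defined in torchscript.py
client.put_tensor("neg_input", np.array([1.0, -2.0, 3.0], dtype=np.float32))

# Run "negate" from the script stored under the key "example_script"
client.run_script("example_script", "negate", inputs=["neg_input"], outputs=["neg_output"])
negated = client.get_tensor("neg_output")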
Define TorchScripts as Raw String#
Users can upload TorchScript functions from string to send to a colocated or
standalone Orchestrator. This feature is supported by the
Ensemble.add_script function’s script parameter.
When specifying a TorchScript using Ensemble.add_script, the
following arguments are offered:
name (str): Reference name for the script inside of the Orchestrator.
script (t.Optional[str] = None): String of function code (e.g. TorchScript code string).
script_path (t.Optional[str] = None): path to TorchScript code.
device (t.Literal[“CPU”, “GPU”] = “CPU”): device for script execution, defaults to “CPU”.
devices_per_node (int = 1): The number of GPU devices available on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
first_device (int = 0): The first GPU device to use on the host. This parameter only applies to GPU devices and will be ignored if device is specified as CPU.
Example: Load a TorchScript From String#
This example walks through the steps of instructing SmartSim to load a TorchScript function
from string to an Orchestrator before the execution of the associated Ensemble.
The source code example is available in the dropdown below for convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")
# Initialize a RunSettings object
ensemble_settings = exp.create_run_settings(exe="path/to/executable/simulation")
# Initialize an Ensemble object
ensemble_instance = exp.create_ensemble("ensemble_name", run_settings=ensemble_settings)
# TorchScript string
torch_script_str = "def negate(x):\n\treturn torch.neg(x)\n"
# Attach TorchScript to Ensemble
ensemble_instance.add_script(name="example_script", script=torch_script_str, device="GPU", devices_per_node=2, first_device=0)
Note
This example assumes:
an Orchestrator is launched prior to Ensemble execution
an initialized Ensemble named ensemble_instance exists within the Experiment workflow
Define a String TorchScript
Define the TorchScript code as a variable in the Python driver script:
# TorchScript string
torch_script_str = "def negate(x):\n\treturn torch.neg(x)\n"
Attach the TorchScript Function to a SmartSim Ensemble
Assuming an initialized Ensemble named ensemble_instance exists, we add a TorchScript using
the Ensemble.add_script function and specify the variable torch_script_str to the parameter
script:
# Attach TorchScript to Ensemble
ensemble_instance.add_script(name="example_script", script=torch_script_str, device="GPU", devices_per_node=2, first_device=0)
In the above ensemble_instance.add_script code snippet, we offer the following arguments:
name (“example_script”): key to store script under.
script (torch_script_str): TorchScript code.
device (“GPU”): device for script execution.
devices_per_node (2): Use two GPUs per node.
first_device (0): Start with 0 index GPU.
Warning
Calling exp.start(ensemble_instance) prior to instantiation of an Orchestrator will result in
a failed attempt to load the TorchScript to a non-existent Orchestrator.
When the Ensemble is started via Experiment.start, the TorchScript will be loaded to the
Orchestrator that is launched prior to the start of the Ensemble.
Data Collision Prevention#
Overview#
When multiple Ensemble members use the same code to send and access their respective data
in the Orchestrator, key overlapping can occur, leading to inadvertent data access
between Ensemble members. To address this, SmartSim supports key prefixing
through Ensemble.enable_key_prefixing which enables key prefixing for all
Ensemble members. For example, during an Ensemble simulation with prefixing enabled, SmartSim will add
the Ensemble member name as a prefix to the keys sent to the Orchestrator.
Enabling key prefixing eliminates issues related to key overlapping, allowing Ensemble
members to use the same code without issue.
The key components of SmartSim Ensemble prefixing functionality include:
Sending Data to the Orchestrator: Users can send data to an Orchestrator with the Ensemble member name prepended to the data name by utilizing SmartSim Ensemble functions.
Retrieving Data From the Orchestrator: Users can instruct a Client to prepend an Ensemble member name to a key during data retrieval, polling, or checks for existence on the Orchestrator through SmartRedis Client functions. However, the entity interaction must be registered using Ensemble or Model functions.
See also
For information on prefixing Client functions, visit the Client functions page of the Model
documentation.
For example, assume you have an Ensemble that was initialized using the replicas creation strategy.
Two identical Models, named ensemble_0 and ensemble_1, were created that use the same executable application
within an Ensemble named ensemble. In the application code you use the function Client.put_tensor("tensor_0", data).
Without key prefixing enabled, the slower member will overwrite the data from the faster simulation.
With Ensemble key prefixing turned on, ensemble_0 and ensemble_1 can access
their tensor “tensor_0” by name without overwriting or accessing the other Model’s “tensor_0” tensor.
In this scenario, the two tensors placed in the Orchestrator are named ensemble_0.tensor_0 and ensemble_1.tensor_0.
Ensemble Functions#
An Ensemble object supports two prefixing functions: Ensemble.enable_key_prefixing and
Ensemble.register_incoming_entity. For more information on each function, reference the
Ensemble API docs.
To enable prefixing on an Ensemble, users must use the Ensemble.enable_key_prefixing
function in the Experiment driver script. This function activates prefixing for tensors,
Datasets, and lists sent to an Orchestrator for all Ensemble members. This function
also enables access to prefixing Client functions within the Ensemble members, with the exception of
the Client.set_data_source function, for which enable_key_prefixing is not required.
Note
ML model and script prefixing is not automatically enabled through Ensemble.enable_key_prefixing.
Prefixing must be enabled within the Ensemble by calling the use_model_ensemble_prefix method
on the Client embedded within the member application.
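Below is a minimal sketch of what that looks like inside an Ensemble member's application code, assuming key prefixing was enabled on the Ensemble in the driver script:
from smartredis import Client

# Initialize a Client inside the member application
client = Client(cluster=False)

# Also prefix ML model and TorchScript keys with this member's name
client.use_model_ensemble_prefix(True)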
Users can enable the SmartRedis Client to interact with prefixed data, ML models and TorchScripts
using Client.set_data_source. However, for SmartSim to recognize the producer entity name
passed to the function within an application, the producer entity must be registered on the consumer
entity using Ensemble.register_incoming_entity.
If a consumer Ensemble member requests data sent to the Orchestrator by other Ensemble members, the producer members must be
registered on the consumer member. To access Ensemble members, SmartSim offers the Ensemble.models attribute, which returns
a list of Ensemble members. Below we demonstrate registering producer members on a consumer member:
# Names of the producer Ensemble members
list_of_ensemble_names = ["producer_0", "producer_1", "producer_2"]

# Ensemble.models returns a list of members, so build a name-to-member lookup
members_by_name = {member.name: member for member in ensemble.models}

# Grab the consumer Ensemble member
ensemble_member = members_by_name["producer_3"]

# Register the producer members on the consumer member
for name in list_of_ensemble_names:
    ensemble_member.register_incoming_entity(members_by_name[name])
For examples demonstrating how to retrieve data within the entity application that produced
the data, visit the Model Copy/Rename/Delete Operations subsection.
Example: Ensemble Key Prefixing#
In this example, we create an Ensemble comprised of two Models that use identical code
to send data to a standalone Orchestrator. To prevent key collisions and ensure data
integrity, we enable key prefixing on the Ensemble, which automatically
prepends the Ensemble member name to the data sent to the Orchestrator. After the
Ensemble completes, we launch a consumer Model within the Experiment driver script
to demonstrate accessing prefixed data sent to the Orchestrator by Ensemble members.
This example consists of three Python scripts:
Application Producer Script: This script is encapsulated in a SmartSim Ensemble within the Experiment driver script. Prefixing is enabled on the Ensemble. The producer script puts NumPy tensors on an Orchestrator launched in the Experiment driver script. The Ensemble creates two identical Ensemble members. The producer script is executed in both Ensemble members to send two prefixed tensors to the Orchestrator. The source code example is available in the dropdown below for convenient customization.
Application Producer Script Source Code
from smartredis import Client
import numpy as np
# Initialize a Client
client = Client(cluster=False)
# Create NumPy array
array = np.array([1, 2, 3, 4])
# Use SmartRedis Client to place tensor in standalone Orchestrator
client.put_tensor("tensor", array)
Application Consumer Script: This script is encapsulated within a SmartSim Model in the Experiment driver script. The script requests the prefixed tensors placed by the producer script. The source code example is available in the dropdown below for convenient customization.
Application Consumer Script Source Code
from smartredis import Client, LLInfo
# Initialize a Client
client = Client(cluster=False)
# Set the data source
client.set_data_source("producer_0")
# Check if the tensor exists
tensor_1 = client.poll_tensor("tensor", 100, 100)
# Set the data source
client.set_data_source("producer_1")
# Check if the tensor exists
tensor_2 = client.poll_tensor("tensor", 100, 100)
client.log_data(LLInfo, f"producer_0.tensor was found: {tensor_1}")
client.log_data(LLInfo, f"producer_1.tensor was found: {tensor_2}")
Experiment Driver Script: The driver script launches the Orchestrator, the Ensemble (which sends prefixed keys to the Orchestrator), and the Model (which requests prefixed keys from the Orchestrator). The Experiment driver script is the centralized spot that controls the workflow. The source code example is available in the dropdown below for convenient execution and customization.
Experiment Driver Script Source Code
from smartsim import Experiment
from smartsim.log import get_logger
logger = get_logger("Experiment Log")
# Initialize the Experiment
exp = Experiment("getting-started", launcher="auto")
# Initialize a standalone Orchestrator
standalone_orch = exp.create_database(db_nodes=1)
# Initialize a RunSettings object for Ensemble
ensemble_settings = exp.create_run_settings(exe="/path/to/executable_producer_simulation")
# Initialize Ensemble
producer_ensemble = exp.create_ensemble("producer", run_settings=ensemble_settings, replicas=2)
# Enable key prefixing for Ensemble members
producer_ensemble.enable_key_prefixing()
# Initialize a RunSettings object for Model
model_settings = exp.create_run_settings(exe="/path/to/executable_consumer_simulation")
# Initialize Model
consumer_model = exp.create_model("consumer", model_settings)
# Generate SmartSim entity folder tree
exp.generate(standalone_orch, producer_ensemble, consumer_model, overwrite=True)
# Launch Orchestrator
exp.start(standalone_orch, summary=True)
# Launch Ensemble
exp.start(producer_ensemble, block=True, summary=True)
# Register Ensemble members on consumer Model
for model in producer_ensemble:
consumer_model.register_incoming_entity(model)
# Launch consumer Model
exp.start(consumer_model, block=True, summary=True)
# Clobber Orchestrator
exp.stop(standalone_orch)
The Application Producer Script#
In the Experiment driver script, we instruct SmartSim to create an Ensemble comprised of
two duplicate members that execute this producer script. In the producer script, a SmartRedis Client sends a
tensor to the Orchestrator. Since the Ensemble members are identical and therefore use the same
application code, two tensors are sent to the Orchestrator. Without prefixing enabled on the Ensemble,
the keys would collide and one tensor would overwrite the other. To prevent this, we enable key prefixing on the Ensemble in the driver script
via Ensemble.enable_key_prefixing. When the producer script is executed by each Ensemble member, a
tensor is sent to the Orchestrator with the Ensemble member name prepended to the tensor name.
Here we provide the producer script that is applied to the Ensemble members:
from smartredis import Client
import numpy as np

# Initialize a Client
client = Client(cluster=False)

# Create NumPy array
array = np.array([1, 2, 3, 4])
# Use SmartRedis Client to place tensor in standalone Orchestrator
client.put_tensor("tensor", array)
After the completion of Ensemble members producer_0 and producer_1, the contents of the Orchestrator are:
1) "producer_0.tensor"
2) "producer_1.tensor"
The Application Consumer Script#
In the Experiment driver script, we initialize a consumer Model that encapsulates
the consumer application to request the tensors produced from the Ensemble. To do
so, we use SmartRedis key prefixing functionality to instruct the SmartRedis Client
to append the name of an Ensemble member to the key name.
See also
For more information on Client prefixing functions, visit the Client functions
subsection of the Model documentation.
To begin, specify the imports and initialize a SmartRedis Client:
from smartredis import Client, LLInfo

# Initialize a Client
client = Client(cluster=False)
To retrieve the tensor from the first Ensemble member, named producer_0, use
Client.set_data_source. Specify the name of the first Ensemble member
as an argument to the function. This instructs SmartSim to prepend the Ensemble member name to the key
when searching for data on the Orchestrator. When Client.poll_tensor is executed,
the SmartRedis client will poll for the key producer_0.tensor:
# Set the data source
client.set_data_source("producer_0")
# Check if the tensor exists
tensor_1 = client.poll_tensor("tensor", 100, 100)
Follow the same steps as above, but change the data source name to the name
of the second Ensemble member (producer_1):
# Set the data source
client.set_data_source("producer_1")
# Check if the tensor exists
tensor_2 = client.poll_tensor("tensor", 100, 100)
We log the boolean return values to verify that the tensors were found:
client.log_data(LLInfo, f"producer_0.tensor was found: {tensor_1}")
client.log_data(LLInfo, f"producer_1.tensor was found: {tensor_2}")
When the Experiment driver script is executed, the following output will appear in consumer.out:
Default@11-46-05:producer_0.tensor was found: True
Default@11-46-05:producer_1.tensor was found: True
Warning
For SmartSim to recognize the Ensemble member names as a valid data source
to Client.set_data_source, you must register each Ensemble member
on the consumer Model in the driver script via Model.register_incoming_entity.
We demonstrate this in the Experiment driver script section of the example.
The Experiment Script#
The Experiment driver script manages all workflow components and utilizes the producer and consumer
application scripts. In the example, the Experiment:
launches a standalone Orchestrator
launches an Ensemble via the replicas initialization strategy
launches a consumer Model
clobbers the Orchestrator
To begin, add the necessary imports, initialize an Experiment instance and initialize the
standalone Orchestrator:
from smartsim import Experiment
from smartsim.log import get_logger

logger = get_logger("Experiment Log")
# Initialize the Experiment
exp = Experiment("getting-started", launcher="auto")

# Initialize a standalone Orchestrator
standalone_orch = exp.create_database(db_nodes=1)
We are now set up to discuss key prefixing within the Experiment driver script.
To create an Ensemble using the replicas strategy, begin by initializing a RunSettings
object to apply to all Ensemble members. Specify the path to the application
producer script:
# Initialize a RunSettings object for Ensemble
ensemble_settings = exp.create_run_settings(exe="/path/to/executable_producer_simulation")
Next, initialize an Ensemble by specifying ensemble_settings and the number of Model replicas to create:
# Initialize Ensemble
producer_ensemble = exp.create_ensemble("producer", run_settings=ensemble_settings, replicas=2)
Instruct SmartSim to prefix all tensors sent to the Orchestrator from the Ensemble via Ensemble.enable_key_prefixing:
# Enable key prefixing for Ensemble members
producer_ensemble.enable_key_prefixing()
Next, initialize the consumer Model. The consumer Model application requests
the prefixed tensors produced by the Ensemble:
# Initialize a RunSettings object for Model
model_settings = exp.create_run_settings(exe="/path/to/executable_consumer_simulation")
# Initialize Model
consumer_model = exp.create_model("consumer", model_settings)
Next, organize the SmartSim entity output files into a single Experiment folder:
# Generate SmartSim entity folder tree
exp.generate(standalone_orch, producer_ensemble, consumer_model, overwrite=True)
Launch the Orchestrator:
# Launch Orchestrator
exp.start(standalone_orch, summary=True)
Launch the Ensemble:
# Launch Ensemble
exp.start(producer_ensemble, block=True, summary=True)
Set block=True so that Experiment.start waits until the last Ensemble member has finished before continuing.
The consumer Model application script uses Client.set_data_source which
accepts the Ensemble member names when searching for prefixed
keys in the Orchestrator. In order for SmartSim to recognize the Ensemble
member names as a valid data source in the consumer Model, we must register
the entity interaction:
# Register Ensemble members on consumer Model
for model in producer_ensemble:
    consumer_model.register_incoming_entity(model)
Launch the consumer Model:
# Launch consumer Model
exp.start(consumer_model, block=True, summary=True)
To finish, tear down the standalone Orchestrator:
# Clobber Orchestrator
exp.stop(standalone_orch)