Model#

Overview#

SmartSim Model objects enable users to execute computational tasks in an Experiment workflow, such as launching compiled applications, running scripts, or performing general computational operations. A Model can be launched with other SmartSim Model(s) and Orchestrator(s) to build AI-enabled workflows. With the SmartSim Client (SmartRedis), data can be transferred from a Model to the Orchestrator for use in an ML model (TF, TF-lite, PyTorch, or ONNX), online training process, or additional Model applications. SmartSim Clients (SmartRedis) are available in Python, C, C++, or Fortran.

To initialize a SmartSim Model, use the Experiment.create_model factory method. When creating a Model, a RunSettings object must be provided. A RunSettings object specifies the Model executable (e.g. the full path to a compiled binary) as well as executable arguments and launch parameters. These specifications include launch commands (e.g. srun, aprun, mpiexec, etc), compute resource requirements, and application command-line arguments.

Once a Model instance has been initialized, users have access to the Model API functions to further configure the Model. The Model API functions provide users with the following capabilities:

  • Attach files to the Model for use at entity runtime

  • Colocate an Orchestrator with the Model

  • Attach an ML model or TorchScript to the Model for loading into an Orchestrator at Model launch

  • Enable Model key prefixing to prevent data collisions

Once the Model has been configured and launched, a user can leverage an Orchestrator within a Model through two strategies:

  • Connect to a Standalone Orchestrator

    When a Model is launched, it does not use or share compute resources on the same host (computer/server) where a SmartSim Orchestrator is running. Instead, it is launched on its own compute resources specified by the RunSettings object. The Model can connect via a SmartRedis Client to a launched standalone Orchestrator.

  • Connect to a Colocated Orchestrator

    When the colocated Model is started, SmartSim launches an Orchestrator on the Model compute nodes prior to the Model execution. The Model can then connect to the colocated Orchestrator via a SmartRedis Client.

Note

For the Client connection to be successful from within the Model application, the SmartSim Orchestrator must be launched prior to the start of the Model.
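For example, a minimal driver-script ordering sketch (assuming a standalone Orchestrator created via Experiment.create_database; entity names and port are illustrative):

# Start the Orchestrator before any Model that connects to it
db = exp.create_database(port=6780, interface="lo")
exp.start(db)

# Launch the Model; its SmartRedis Client can now connect
exp.start(model_instance, block=True)

# Tear down the Orchestrator once the Model completes
exp.stop(db)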

Note

A Model can be launched without an Orchestrator if data transfer and ML capabilities are not required.

SmartSim manages Model instances through the Experiment API by providing functions to launch, monitor, and stop applications. Additionally, a Model can be launched individually or as a group via an Ensemble.
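For example, a minimal sketch of the launch, monitor, and stop cycle (assuming an initialized Experiment exp and Model model_instance, as created in the next section):

# Launch the Model without blocking, query its status, then stop it
exp.start(model_instance, block=False)
print(exp.get_status(model_instance))
exp.stop(model_instance)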

Initialization#

Overview#

The Experiment is responsible for initializing all SmartSim entities. A Model is created using the Experiment.create_model factory method, and users can customize the Model via the factory method parameters.

The key initializer arguments for Model creation can be found in the Experiment API under the create_model docstring.

A name and RunSettings reference are required to initialize a Model. Optionally, include a BatchSettings object to specify workload manager batch launching.

Note

BatchSettings attached to a Model are ignored when the Model is executed as part of an Ensemble.

The params factory method parameter for Model creation allows a user to define simulation parameters and values through a dictionary. Using Model file functions, users can write these parameters to a file in the Model working directory.
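For example, a minimal sketch (parameter names illustrative):

# Define simulation parameters as a dictionary at Model creation
model = exp.create_model("sim", model_settings, params={"STEPS": 100, "THERMO": 1})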

When a Model instance is passed to Experiment.generate, a directory within the Experiment directory is created to store input and output files from the Model.

Note

It is strongly recommended to invoke Experiment.generate on the Model instance before launching the Model. If a path is not specified during Experiment.create_model, calling Experiment.generate with the Model instance will result in SmartSim generating a Model directory within the Experiment directory. This directory will be used to store the Model outputs and attached files.

Example#

In this example, we provide a demonstration of how to initialize and launch a Model within an Experiment workflow. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment

# Init Experiment and specify to launch locally in this example
exp = Experiment(name="getting-started", launcher="local")

# Initialize RunSettings
model_settings = exp.create_run_settings(exe="echo", exe_args="Hello World")

# Initialize Model instance
model_instance = exp.create_model(name="example-model", run_settings=model_settings)

# Generate Model directory
exp.generate(model_instance)

# Launch Model
exp.start(model_instance)

All workflow entities are initialized through the Experiment API. Consequently, initializing a SmartSim Experiment is a prerequisite for Model initialization.

To initialize an instance of the Experiment class, import the SmartSim Experiment module and invoke the Experiment constructor with a name and launcher:

from smartsim import Experiment

# Init Experiment and specify to launch locally in this example
exp = Experiment(name="getting-started", launcher="local")

A Model requires a RunSettings object to specify how the Model should be executed within the workflow. We use the Experiment instance exp to call the factory method Experiment.create_run_settings and initialize a RunSettings object. Finally, we specify the executable “echo” with the executable argument “Hello World”:

# Initialize RunSettings
model_settings = exp.create_run_settings(exe="echo", exe_args="Hello World")

See also

For more information on RunSettings objects, reference the RunSettings documentation.

We now have a RunSettings instance named model_settings that contains all of the information required to launch our application. Pass a name and the run settings instance to the create_model factory method:

# Initialize Model instance
model_instance = exp.create_model(name="example-model", run_settings=model_settings)

To create an isolated output directory for the Model, invoke Experiment.generate on the Model model_instance:

# Generate Model directory
exp.generate(model_instance)

Note

The Experiment.generate step is optional; however, this step organizes the Experiment entity output files into individual entity folders within the Experiment folder. Continue in the example for information on Model output generation or visit the Output and Error Files section.

All entities are launched, monitored and stopped by the Experiment instance. To start the Model, invoke Experiment.start on model_instance:

# Launch Model
exp.start(model_instance)

When the Experiment driver script is executed, two files from the model_instance will be created in the generated Model subdirectory:

  1. model_instance.out : this file will hold outputs produced by the model_instance workload.

  2. model_instance.err : this file will hold any errors that occurred during model_instance execution.

Colocated Orchestrator#

A SmartSim Model has the capability to share compute node(s) with a SmartSim Orchestrator in a deployment known as a colocated Orchestrator. In this scenario, the Orchestrator and Model share compute resources. To achieve this, users need to initialize a Model instance using the Experiment.create_model function and then utilize one of the three functions listed below to colocate an Orchestrator with the Model. This instructs SmartSim to launch an Orchestrator on the application compute node(s) before the Model execution.

There are three different Model API functions to colocate a Model:

  • Model.colocate_db_tcp: Colocate an Orchestrator instance and establish client communication using TCP/IP.

  • Model.colocate_db_uds: Colocate an Orchestrator instance and establish client communication using Unix domain sockets (UDS).

  • Model.colocate_db: (deprecated) An alias for Model.colocate_db_tcp.

Each function initializes an unsharded Orchestrator accessible only to the Model processes on the same compute node. When the Model is started, the Orchestrator will be launched on the same compute resource as the Model. Only the colocated Model may communicate with the Orchestrator via a SmartRedis Client, using the loopback TCP interface or Unix domain sockets. Extra parameters for the Orchestrator can be passed into the colocate functions above via kwargs.

example_kwargs = {
    "maxclients": 100000,      # maximum number of simultaneous client connections
    "threads_per_queue": 1,    # threads per device queue for ML execution
    "inter_op_threads": 1,     # threads used for parallelism between ML operations
    "intra_op_threads": 1      # threads used for parallelism within an ML operation
}
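For example, a minimal sketch passing these kwargs through Model.colocate_db_tcp (port and interface values illustrative):

# Colocate an Orchestrator with the Model over the loopback TCP interface
model_instance.colocate_db_tcp(port=6780, ifname="lo", **example_kwargs)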

For a walkthrough of how to colocate a Model, navigate to the Colocated Orchestrator for instructions.

For users aiming to optimize performance, SmartSim offers the flexibility to specify the processor IDs to which the colocated Orchestrator should be pinned. This is achieved with the custom_pinning argument, recognized by both Model.colocate_db_uds and Model.colocate_db_tcp. On systems where specific processors support ML model and TorchScript execution, users can employ the custom_pinning argument to designate those processor IDs, ensuring that the specified processors are used when executing ML models or TorchScripts on the colocated Orchestrator. Users may also use the custom_pinning argument to avoid reserved processors by specifying an available processor ID or a list of available processor IDs.
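For example, a minimal sketch (processor IDs illustrative and system-dependent):

# Pin the colocated Orchestrator to the first four processors on each node
model_instance.colocate_db_uds(db_cpus=4, custom_pinning=[0, 1, 2, 3])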

Files#

Overview#

Applications often depend on external files (e.g. training datasets, evaluation datasets, etc) to operate as intended. Users can instruct SmartSim to copy, symlink, or manipulate external files prior to a Model launch via the Model.attach_generator_files function.

Note

Multiple calls to Model.attach_generator_files will overwrite previous file configurations in the Model.

To set up the run directory for the Model, users should pass the list of files to Model.attach_generator_files using the following arguments:

  • to_copy (t.Optional[t.List[str]] = None): Files that are copied into the path of the Model.

  • to_symlink (t.Optional[t.List[str]] = None): Files that are symlinked into the path of the Model.
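For example, a minimal sketch (file paths hypothetical):

# Copy a small input file and symlink a large dataset into the Model directory
model_instance.attach_generator_files(
    to_copy=["./inputs/config.yaml"],
    to_symlink=["./data/training_set"],
)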

User-formatted files can be attached using the to_configure argument. These files will be modified during Model generation to replace tagged sections in the user-formatted files with values from the params initializer argument used during Model creation:

  • to_configure (t.Optional[t.List[str]] = None): Designed for text-based Model input files, “to_configure” is exclusive to the Model. During Model directory generation, the attached files are parsed and specified tagged parameters are replaced with the params values that were specified in the Experiment.create_model factory method of the Model. The default tag is a semicolon (e.g., THERMO = ;THERMO;).

In the Example subsection, we provide an example using the value to_configure within attach_generator_files.

Example#

This example demonstrates how to attach a file to a Model for parameter replacement at the time of Model directory generation. This is accomplished using the params function parameter in Experiment.create_model and the to_configure function parameter in Model.attach_generator_files. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/executable/simulation")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings, params={"THERMO":1})

# Attach the file to the Model instance
model_instance.attach_generator_files(to_configure="path/to/params_inputs.txt")

# Store model_instance outputs within the Experiment directory named getting-started
exp.generate(model_instance)

# Launch the Model
exp.start(model_instance)

In this example, we have a text file named params_inputs.txt. Within the text file is the parameter THERMO, which is required by the Model application at runtime:

THERMO = ;THERMO;

In order to have the tagged parameter ;THERMO; replaced with a usable value at runtime, two steps are required:

  1. The THERMO variable must be included in the Experiment.create_model factory method as part of the params initializer argument.

  2. The file containing the tagged parameter ;THERMO;, params_inputs.txt, must be attached to the Model via the Model.attach_generator_files method as part of the to_configure function parameter.

To encapsulate our application within a Model, we must first create an Experiment instance. Begin by importing the Experiment module and initializing an Experiment:

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

A SmartSim Model requires a RunSettings object to specify the Model executable (e.g. the full path to a compiled binary) as well as executable arguments and launch parameters. Create a simple RunSettings object and specify the path to the executable via the exe argument:

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/executable/simulation")

See also

To read more on SmartSim RunSettings objects, reference the RunSettings documentation.

Next, initialize a Model object via Experiment.create_model. Pass in the model_settings instance and the params value:

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings, params={"THERMO":1})

We now have a Model instance named model_instance. Attach the text file, params_inputs.txt, to the Model for use at entity runtime. To do so, use the Model.attach_generator_files function and specify the to_configure parameter with the path to the text file, params_inputs.txt:

# Attach the file to the Model instance
model_instance.attach_generator_files(to_configure="path/to/params_inputs.txt")

To create an isolated directory for the Model outputs and configuration files, invoke Experiment.generate with model_instance as an input parameter:

# Store model_instance outputs within the Experiment directory named getting-started
exp.generate(model_instance)

The contents of getting-started/model_name/params_inputs.txt at runtime are:

THERMO = 1

Output and Error Files#

By default, SmartSim stores the standard output and error of the Model in two files:

  • <model_name>.out

  • <model_name>.err

The files are created in the working directory of the Model, and the filenames directly match the Model name. The <model_name>.out file logs standard outputs and the <model_name>.err logs errors for debugging.

Note

Invoking Experiment.generate(model) will create a directory model_name/ and will store the two files within that directory. You can also specify a path for these files using the path parameter when invoking Experiment.create_model.
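For example, a minimal sketch (path hypothetical):

# Write the Model output and error files to a custom directory
model = exp.create_model("model_name", model_settings, path="/scratch/user/model_run")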

ML Models and Scripts#

Overview#

SmartSim users have the capability to load ML models and TorchScripts into an Orchestrator within the Experiment script for use within a Model. Functions accessible through a Model object support loading ML models (TensorFlow, TensorFlow-lite, PyTorch, and ONNX) and TorchScripts into standalone or colocated Orchestrator(s) before application runtime.

Users can follow two processes to load an ML model to the Orchestrator:

  • From Memory: Provide the in-memory ML model to the model parameter of Model.add_ml_model.

  • From File: Provide the file path to the model_path parameter of Model.add_ml_model.

Warning

Uploading an ML model from memory is solely supported for standalone Orchestrator(s). To upload an ML model to a colocated Orchestrator, users must save the ML model to disk and upload from file.

Users can follow three processes to load a TorchScript to the Orchestrator:

  • From Memory: Provide a TorchScript function to Model.add_function.

  • From File: Provide the path to a TorchScript file to the script_path parameter of Model.add_script.

  • From String: Provide the TorchScript code as a string to the script parameter of Model.add_script.

Warning

Uploading a TorchScript from memory is solely supported for standalone Orchestrator(s). To upload a TorchScript to a colocated Orchestrator, users must upload from file or from string.

Once an ML model or TorchScript is loaded into the Orchestrator, Model objects can leverage ML capabilities by utilizing the SmartSim Client (SmartRedis) to execute the stored ML models and TorchScripts.

AI Models#

When configuring a Model, users can instruct SmartSim to load Machine Learning (ML) models to the Orchestrator. ML models added are loaded into the Orchestrator prior to the execution of the Model. To load an ML model to the Orchestrator, SmartSim users can provide the ML model in-memory or specify the file path when using the Model.add_ml_model function. Note that loading an ML model from memory is supported solely for standalone Orchestrator(s). The supported ML frameworks are TensorFlow, TensorFlow-lite, PyTorch, and ONNX.

The arguments that customize the storage and execution of an ML model can be found in the Model API under the add_ml_model docstring.

Example: Attach an In-Memory ML Model#

This example demonstrates how to attach an in-memory ML model to a SmartSim Model to load into an Orchestrator at Model runtime. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Input

class Net(keras.Model):
    def __init__(self):
        super(Net, self).__init__(name="cnn")
        self.conv = Conv2D(1, 3, 1)

    def call(self, x):
        y = self.conv(x)
        return y

def create_tf_cnn():
    """Create an in-memory Keras CNN for example purposes

    """
    from smartsim.ml.tf import serialize_model
    n = Net()
    input_shape = (3,3,1)
    inputs = Input(input_shape)
    outputs = n(inputs)
    model = keras.Model(inputs=inputs, outputs=outputs, name=n.name)

    return serialize_model(model)

# Serialize the in-memory TF model
model, inputs, outputs = create_tf_cnn()

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/example_simulation_program")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings)

# Attach the in-memory ML model to the SmartSim Model
model_instance.add_ml_model(name="cnn", backend="TF", model=model, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)

Note

This example assumes:

  • an Orchestrator is launched prior to the Model execution (colocated or standalone)

  • an initialized Model named model_instance exists within the Experiment workflow

  • a TensorFlow-based ML model was serialized using serialize_model, which returns the ML model as a byte string along with the names of the input and output layers

Attach the ML Model to a SmartSim Model

In this example, we have a serialized TensorFlow-based ML model stored as a byte string in the variable model. Additionally, the serialize_model function returned the names of the input and output layers, stored under inputs and outputs. Assuming an initialized Model named model_instance exists, we add the in-memory TensorFlow model using the Model.add_ml_model function and pass the in-memory ML model to the function parameter model:

# Attach the in-memory ML model to the SmartSim Model
model_instance.add_ml_model(name="cnn", backend="TF", model=model, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)

In the above model_instance.add_ml_model code snippet, we pass in the following arguments:

  • name (“cnn”): A name to reference the ML model in the Orchestrator.

  • backend (“TF”): Indicating that the ML model is a TensorFlow model.

  • model (model): The in-memory representation of the TensorFlow model.

  • device (“GPU”): Specifying the device for ML model execution.

  • devices_per_node (2): Use two GPUs per node.

  • first_device (0): Start with 0 index GPU.

  • inputs (inputs): The name of the ML model input nodes (TensorFlow only).

  • outputs (outputs): The name of the ML model output nodes (TensorFlow only).

Warning

Calling exp.start(model_instance) prior to instantiation of an Orchestrator will result in a failed attempt to load the ML model to a non-existent Orchestrator.

When the Model is started via Experiment.start, the ML model will be loaded to the launched Orchestrator. The ML model can then be executed on the Orchestrator via a SmartSim Client (SmartRedis) within the application code.
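As an application-side sketch of this execution step (tensor names and shapes are hypothetical; assumes the Orchestrator address is available to the Client, e.g. via the SSDB environment variable):

import numpy as np
from smartredis import Client

# Connect to the launched Orchestrator
client = Client(cluster=False)

# Send an input tensor, execute the stored ML model, and fetch the result
client.put_tensor("input", np.random.rand(1, 3, 3, 1).astype(np.float32))
client.run_model("cnn", inputs=["input"], outputs=["output"])
prediction = client.get_tensor("output")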

Example: Attach an ML Model From File#

This example demonstrates how to attach an ML model from file to a SmartSim Model to load into an Orchestrator at Model runtime. The source code example is available in the dropdown below for convenient execution and customization.

Note

SmartSim supports loading ML models from file to both standalone and colocated Orchestrator(s); loading from memory is supported for standalone Orchestrator(s) only (see the previous example). This example assumes a standalone Orchestrator.

Example Driver Script Source Code
from smartsim import Experiment
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Input

class Net(keras.Model):
    def __init__(self):
        super(Net, self).__init__(name="cnn")
        self.conv = Conv2D(1, 3, 1)

    def call(self, x):
        y = self.conv(x)
        return y

def save_tf_cnn(path, file_name):
    """Create a Keras CNN and save to file for example purposes"""
    from smartsim.ml.tf import freeze_model

    n = Net()
    input_shape = (3, 3, 1)
    n.build(input_shape=(None, *input_shape))
    inputs = Input(input_shape)
    outputs = n(inputs)
    model = keras.Model(inputs=inputs, outputs=outputs, name=n.name)

    return freeze_model(model, path, file_name)

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/example_simulation_program")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings)

# Get and save TF model
model_file, inputs, outputs = save_tf_cnn(model_instance.path, "model.pb")

# Attach the from file ML model to the SmartSim Model
model_instance.add_ml_model(name="cnn", backend="TF", model_path=model_file, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)

Note

This example assumes:

  • a standalone Orchestrator is launched prior to the Model execution

  • an initialized Model named model_instance exists within the Experiment workflow

  • a TensorFlow-based ML model was serialized using freeze_model, which returns the path to the serialized model file and the names of the input and output layers

Attach the ML Model to a SmartSim Model

In this example, we have a TensorFlow-based ML model that was saved to disk, with the file path stored in the variable model_file. Additionally, the freeze_model function returned the names of the input and output layers, stored under inputs and outputs. Assuming an initialized Model named model_instance exists, we add the TensorFlow model using the Model.add_ml_model function and pass the TensorFlow model file path to the parameter model_path:

# Attach the from file ML model to the SmartSim Model
model_instance.add_ml_model(name="cnn", backend="TF", model_path=model_file, device="GPU", devices_per_node=2, first_device=0, inputs=inputs, outputs=outputs)

In the above model_instance.add_ml_model code snippet, we pass in the following arguments:

  • name (“cnn”): A name to reference the ML model in the Orchestrator.

  • backend (“TF”): Indicating that the ML model is a TensorFlow model.

  • model_path (model_file): The path to the serialized ML model file.

  • device (“GPU”): Specifying the device for ML model execution.

  • devices_per_node (2): Use two GPUs per node.

  • first_device (0): Start with 0 index GPU.

  • inputs (inputs): The name of the ML model input nodes (TensorFlow only).

  • outputs (outputs): The name of the ML model output nodes (TensorFlow only).

Warning

Calling exp.start(model_instance) prior to instantiation of an Orchestrator will result in a failed attempt to load the ML model to a non-existent Orchestrator.

When the Model is started via Experiment.start, the ML model will be loaded to the launched standalone Orchestrator. The ML model can then be executed on the Orchestrator via a SmartSim Client (SmartRedis) within the application code.

TorchScripts#

When configuring a Model, users can instruct SmartSim to load TorchScripts to the Orchestrator. TorchScripts added are loaded into the Orchestrator prior to the execution of the Model. To load a TorchScript to the Orchestrator, SmartSim users can follow one of the following processes:

  • Attach an In-Memory TorchScript via Model.add_function

  • Attach a TorchScript From File via Model.add_script

  • Define TorchScripts as Raw String via Model.add_script

Note

SmartSim does not support loading in-memory TorchScript functions to colocated Orchestrator(s). Users should instead load TorchScripts to a colocated Orchestrator from file or as a raw string.

Continue or select a process link to learn more about how each function (Model.add_script and Model.add_function) loads TorchScripts to launched Orchestrator(s).

Attach an In-Memory TorchScript#

Users can define TorchScript functions within the Python driver script to attach to a Model. This feature is supported by Model.add_function, which provides flexible device selection, allowing users to choose the device on which the TorchScript is executed: “GPU” or “CPU”. In environments with multiple devices, the number of devices to use on each node can be specified via the devices_per_node function parameter, and the first device to use via the first_device parameter.

Warning

Model.add_function does not support loading in-memory TorchScript functions to a colocated Orchestrator. If you would like to load a TorchScript function to a colocated Orchestrator, define the function as a raw string or load from file.

The arguments that customize the execution of an in-memory TorchScript function can be found in the Model API under the add_function docstring.

Example: Load an In-Memory TorchScript Function#

This example walks through the steps of instructing SmartSim to load an in-memory TorchScript function to a standalone Orchestrator. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment

def timestwo(x):
    return 2*x

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/example_simulation_program")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings)

# Append TorchScript function to Model
model_instance.add_function(name="example_func", function=timestwo, device="GPU", devices_per_node=2, first_device=0)

Note

The example assumes:

  • a standalone Orchestrator is launched prior to the Model execution

  • an initialized Model named model_instance exists within the Experiment workflow

Define an In-Memory TorchScript Function

To begin, define an in-memory TorchScript function within the Experiment driver script. For the purpose of the example, we add a simple TorchScript function named timestwo:

def timestwo(x):
    return 2*x

Attach the In-Memory TorchScript Function to a SmartSim Model

We use the Model.add_function function to instruct SmartSim to load the TorchScript function timestwo onto the launched standalone Orchestrator, passing timestwo to the function parameter:

# Append TorchScript function to Model
model_instance.add_function(name="example_func", function=timestwo, device="GPU", devices_per_node=2, first_device=0)

In the above model_instance.add_function code snippet, we input the following arguments:

  • name (“example_func”): A name to uniquely identify the TorchScript within the Orchestrator.

  • function (timestwo): The TorchScript function defined in the Python driver script.

  • device (“GPU”): Specifying the device for TorchScript execution.

  • devices_per_node (2): Use two GPUs per node.

  • first_device (0): Start with 0 index GPU.

Warning

Calling exp.start(model_instance) prior to instantiation of an Orchestrator will result in a failed attempt to load the TorchScript to a non-existent Orchestrator.

When the Model is started via Experiment.start, the TorchScript function will be loaded to the standalone Orchestrator. The function can then be executed on the Orchestrator via a SmartSim Client (SmartRedis) within the application code.
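As an application-side sketch of this execution step (tensor names hypothetical; assumes a connected SmartRedis Client as in the earlier ML model example):

# Send an input tensor and execute the loaded TorchScript function
client.put_tensor("x", np.array([1.0, 2.0], dtype=np.float32))
client.run_script("example_func", "timestwo", inputs=["x"], outputs=["x_doubled"])
doubled = client.get_tensor("x_doubled")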

Attach a TorchScript From File#

Users can attach TorchScript functions from a file to a Model and upload them to a colocated or standalone Orchestrator. This functionality is supported by the Model.add_script function’s script_path parameter. The function supports flexible device selection, allowing users to choose between “GPU” or “CPU” via the device parameter. In environments with multiple devices, the number of devices to use on each node can be specified via the devices_per_node parameter.

The arguments that customize the storage and execution of a TorchScript script can be found in the Model API under the add_script docstring.

Example: Load a TorchScript From File#

This example walks through the steps of instructing SmartSim to load a TorchScript from file to a launched Orchestrator. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code

from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/example_simulation_program")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings)

# Attach TorchScript to Model
model_instance.add_script(name="example_script", script_path="path/to/torchscript.py", device="GPU", devices_per_node=2, first_device=0)

Note

This example assumes:

  • an Orchestrator is launched prior to the Model execution (colocated or standalone)

  • an initialized Model named model_instance exists within the Experiment workflow

Define a TorchScript Script

For the example, we create the Python script torchscript.py. The file contains a simple torch function shown below:

def negate(x):
    return torch.neg(x)

Attach the TorchScript Script to a SmartSim Model

Assuming an initialized Model named model_instance exists, we add the TorchScript script using Model.add_script by passing the script path to the script_path parameter:

# Attach TorchScript to Model
model_instance.add_script(name="example_script", script_path="path/to/torchscript.py", device="GPU", devices_per_node=2, first_device=0)

In the above model_instance.add_script code snippet, we include the following arguments:

  • name (“example_script”): Reference name for the script inside of the Orchestrator.

  • script_path (“path/to/torchscript.py”): Path to the script file.

  • device (“GPU”): Device for script execution.

  • devices_per_node (2): Use two GPUs per node.

  • first_device (0): Start with 0 index GPU.

Warning

Calling exp.start(model_instance) prior to instantiation of an Orchestrator will result in a failed attempt to load the TorchScript to a non-existent Orchestrator.

When model_instance is started via Experiment.start, the TorchScript will be loaded from file to the Orchestrator that is launched prior to the start of model_instance. The function can then be executed on the Orchestrator via a SmartSim Client (SmartRedis) within the application code.

Define TorchScripts as Raw String#

Users can upload TorchScript functions from string to colocated or standalone Orchestrator(s). This feature is supported by the Model.add_script function’s script parameter. The function supports flexible device selection, allowing users to choose between “GPU” or “CPU” via the device parameter. In environments with multiple devices, the number of devices to use on each node can be specified via the devices_per_node parameter.

The arguments that customize the storage and execution of a TorchScript script can be found in the Model API under the add_script docstring.

Example: Load a TorchScript From String#

This example walks through the steps of instructing SmartSim to load a TorchScript from string to an Orchestrator. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Initialize a RunSettings object
model_settings = exp.create_run_settings(exe="path/to/executable/simulation")

# Initialize a Model object
model_instance = exp.create_model("model_name", model_settings)

# TorchScript string
torch_script_str = "def negate(x):\n\treturn torch.neg(x)\n"

# Attach TorchScript to Model
model_instance.add_script(name="example_script", script=torch_script_str, device="GPU", devices_per_node=2, first_device=0)

Note

This example assumes:

  • an Orchestrator is launched prior to the Model execution (standalone or colocated)

  • an initialized Model named model_instance exists within the Experiment workflow

Define a String TorchScript

Define the TorchScript code as a variable in the Experiment driver script:

# TorchScript string
torch_script_str = "def negate(x):\n\treturn torch.neg(x)\n"

Attach the TorchScript Function to a SmartSim Model

Assuming an initialized Model named model_instance exists, we add the TorchScript using the Model.add_script function and pass the variable torch_script_str to the parameter script:

# Attach TorchScript to Model
model_instance.add_script(name="example_script", script=torch_script_str, device="GPU", devices_per_node=2, first_device=0)

In the above model_instance.add_script code snippet, we offer the following arguments:

  • name (“example_script”): Key to store the script under.

  • script (torch_script_str): TorchScript code.

  • device (“GPU”): Device for script execution.

  • devices_per_node (2): Use two GPUs per node.

  • first_device (0): Start with 0 index GPU.

Warning

Calling exp.start(model_instance) prior to instantiation of an Orchestrator will result in a failed attempt to load the TorchScript to a non-existent Orchestrator.

When the Model is started via Experiment.start, the TorchScript will be loaded to the Orchestrator that is launched prior to the start of the Model.

Data Collision Prevention#

Overview#

If an Experiment consists of multiple Model(s) that use the same key names to reference information in the Orchestrator, the names used to reference data, ML models, and scripts will be identical. Without the use of SmartSim and SmartRedis prefix methods, Model(s) will inadvertently access or overwrite each other’s data. To prevent this, the SmartSim Model object supports key prefixing, which prepends the name of the Model to the keys sent to the Orchestrator, creating unique key names. With prefixing enabled, collisions are avoided and Model(s) can use the same key names within their applications.

The key components of SmartSim Model prefixing functionality include:

  1. Sending Data to the Orchestrator: Users can send data to an Orchestrator with the Model name prepended to the data name through SmartSim Model functions and SmartRedis Client functions.

  2. Retrieving Data from the Orchestrator: Users can instruct a Client to prepend a Model name to a key during data retrieval, polling, or check for existence on the Orchestrator through SmartRedis Client functions.

For example, assume you have two Model(s) in an Experiment, named model_0 and model_1. In each application code you use the function Client.put_tensor("tensor_0", data) to send a tensor named “tensor_0” to the same Orchestrator. With Model key prefixing turned on, the model_0 and model_1 applications can access their respective tensor “tensor_0” by name without overwriting or accessing the other Model(s) “tensor_0” tensor. In this scenario, the two tensors placed in the Orchestrator are model_0.tensor_0 and model_1.tensor_0.

Enabling and Disabling#

SmartSim provides support for toggling prefixing on a Model for tensors, Datasets, lists, ML models, and scripts. Prefixing functions from the SmartSim Model API and SmartRedis Client API rely on each other to fully support SmartSim key prefixing. For example, to use the Client prefixing functions, a user must enable prefixing on the Model through Model.enable_key_prefixing. This function enables and activates prefixing for tensors, Datasets and lists placed in an Orchestrator by the Model. This configuration can be toggled within the Model application through Client functions, such as disabling tensor prefixing via Client.use_tensor_ensemble_prefix(False).

The interaction between the prefixing SmartSim Model functions and SmartRedis Client functions is documented below.

Model Functions#

A Model object supports two prefixing functions: Model.enable_key_prefixing and Model.register_incoming_entity.

To enable prefixing on a Model, users must use the Model.enable_key_prefixing function in the Experiment driver script. The key components of this function include:

  • Activates prefixing for tensors, Datasets, and lists sent to an Orchestrator from within the Model application.

  • Enables access to prefixing Client functions within the Model application. This excludes the Client.set_data_source function, for which enable_key_prefixing is not required.

Note

ML model and script prefixing is not automatically enabled through Model.enable_key_prefixing; it must instead be enabled within the Model application using Client.use_model_ensemble_prefix.

Users can enable a SmartRedis Client to interact with prefixed data, ML models, and TorchScripts within a Model application by specifying the producer entity name to Client.set_data_source. However, for SmartSim to recognize the entity name within the application, the producer entity must be registered on the consumer entity using Model.register_incoming_entity. This also applies to scenarios where the Model attempts to access data placed by itself. For more information on Client.set_data_source, visit the Client Functions section.

Client Functions#

A Client object supports five prefixing functions: Client.use_tensor_ensemble_prefix, Client.use_dataset_ensemble_prefix, Client.use_list_ensemble_prefix, Client.use_model_ensemble_prefix and Client.set_data_source.

To enable or disable SmartRedis data structure prefixing for tensors, Datasets, aggregation lists, ML models and scripts, SmartRedis Client offers functions per data structure:

  • Tensor: Client.use_tensor_ensemble_prefix

  • Dataset: Client.use_dataset_ensemble_prefix

  • Aggregation lists: Client.use_list_ensemble_prefix

  • ML models/scripts: Client.use_model_ensemble_prefix

Warning

To access the Client prefixing functions, prefixing must be enabled on the Model through Model.enable_key_prefixing. This function activates prefixing for tensors, Datasets and lists.

Users can enable the SmartSim Client to interact with prefixed data, ML models and TorchScripts using the Client.set_data_source function. To leverage this capability:

  1. Use Model.register_incoming_entity on the Model intending to interact with prefixed data in the Orchestrator placed by a separate Model.

  2. Pass the SmartSim entity (e.g., another Model) to Model.register_incoming_entity in order to reference the Model prefix in the application code.

  3. In the Model application, instruct the Client to prepend the specified Model name during key searches using Client.set_data_source("model_name").
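Putting these steps together, a minimal driver-script sketch (entity and settings names illustrative):

# Producer Model writes prefixed keys; consumer Model reads them
producer = exp.create_model("model_1", producer_settings)
consumer = exp.create_model("model_2", consumer_settings)

# Prepend "model_1" to keys the producer sends to the Orchestrator
producer.enable_key_prefixing()

# Allow the consumer application to call Client.set_data_source("model_1")
consumer.register_incoming_entity(producer)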

Examples are provided below that show the use of these Client methods in conjunction with the SmartSim key prefixing Model API functions.

Put/Set Operations#

In the following tabs we provide snippets of driver script and application code to demonstrate activating and deactivating prefixing for tensors, Datasets, lists, ML models and scripts using SmartRedis put/get semantics.

Activate Tensor Prefixing in the Driver Script

To activate prefixing on a Model in the driver script, a user must use the function Model.enable_key_prefixing. This functionality ensures that the Model name is prepended to each tensor name sent to the Orchestrator from within the Model executable code. The source code example is available in the dropdown below for convenient execution and customization.

Example Driver Script Source Code
from smartsim import Experiment

# Initialize the Experiment and set the launcher to auto
exp = Experiment("getting-started", launcher="auto")

# Create the run settings for the Model
model_settings = exp.create_run_settings(exe="path/to/executable/simulation")

# Create a Model instance named 'model'
model = exp.create_model("model_name", model_settings)
# Enable tensor, Dataset and list prefixing on the 'model' instance
model.enable_key_prefixing()

In the driver script snippet below, we take an initialized Model and activate tensor prefixing through the enable_key_prefixing function:

# Create the run settings for the Model
model_settings = exp.create_run_settings(exe="path/to/executable/simulation")

# Create a Model instance named 'model'
model = exp.create_model("model_name", model_settings)
# Enable tensor, Dataset and list prefixing on the 'model' instance
model.enable_key_prefixing()

In the Model application, two tensors named tensor_1 and tensor_2 are sent to a launched Orchestrator. The contents of the Orchestrator after Model completion are:

1) "model_name.tensor_1"
2) "model_name.tensor_2"

You will notice that the Model name model_name has been prepended to each tensor name and stored in the Orchestrator.

Activate Tensor Prefixing in the Application

Users can further configure tensor prefixing in the application by using the Client function use_tensor_ensemble_prefix. By specifying a boolean value to the function, users can turn prefixing on and off.

Note

To have access to Client.use_tensor_ensemble_prefix, prefixing must be enabled on the Model in the driver script via Model.enable_key_prefixing.

In the application snippet below, we demonstrate enabling and disabling tensor prefixing:

# Disable key prefixing
client.use_tensor_ensemble_prefix(False)
# Place a tensor in the Orchestrator
client.put_tensor("tensor_1", np.array([1, 2, 3, 4]))
# Enable key prefixing
client.use_tensor_ensemble_prefix(True)
# Place a tensor in the Orchestrator
client.put_tensor("tensor_2", np.array([5, 6, 7, 8]))

In the application, two tensors named tensor_1 and tensor_2 are sent to a launched Orchestrator. The contents of the Orchestrator after Model completion are:

1) "tensor_1"
2) "model_name.tensor_2"

You will notice that the Model name model_name is not prefixed to tensor_1 since we disabled tensor prefixing before sending the tensor to the Orchestrator. However, when we enabled tensor prefixing and sent the second tensor, the Model name was prefixed to tensor_2.

Get Operations#

In the following sections, we walk through snippets of application code to demonstrate the retrieval of prefixed tensors, Datasets, lists, ML models, and scripts using SmartRedis put/get semantics. The examples demonstrate retrieval within the same application where the data structures were placed, as well as scenarios where data structures are placed by separate applications.

Retrieve a Tensor Placed by the Same Application

SmartSim supports retrieving prefixed tensors sent to the Orchestrator from within the same application where the tensor was placed. To achieve this, users must provide the Model name that stored the tensor to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model in the driver script, passing the Model itself as the argument.

As an example, we placed a prefixed tensor on the Orchestrator within a Model named model_1. The Orchestrator contents are:

1) "model_1.tensor_name"

Note

In the driver script, after initializing the Model instance named model_1, we execute model_1.register_incoming_entity(model_1). By passing the Model instance to itself, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

In the application snippet below, we demonstrate retrieving the tensor:

# Set the name to prepend to key searches
client.set_data_source("model_1")
# Retrieve the prefixed tensor
tensor_data = client.get_tensor("tensor_name")
# Log the tensor data
client.log_data(LLInfo, f"The tensor value is: {tensor_data}")

In the model_1.out file, the Client will log the message:

Default@00-00-00:The tensor value is: [1 2 3 4]

Retrieve a Tensor Placed by an External Application

SmartSim supports retrieving prefixed tensors sent to the Orchestrator by separate Model(s). To achieve this, users need to provide the Model name that stored the tensor to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model responsible for the search and pass the Model instance that stored the data in the driver script.

In the example, a Model named model_1 has placed a tensor in a standalone Orchestrator with prefixing enabled on the Model. The contents of the Orchestrator are as follows:

1) "model_1.tensor_name"

We create a separate Model, named model_2, with the executable application code below.

Note

In the driver script, after initializing the Model instance named model_2, we execute model_2.register_incoming_entity(model_1). By passing the producer Model instance to the consumer Model, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

Here we retrieve the stored tensor named tensor_name:

# Set the Model source name
client.set_data_source("model_1")
# Retrieve the prefixed tensor
tensor_data = client.get_tensor("tensor_name")
# Log the tensor data
client.log_data(LLInfo, f"The tensor value is: {tensor_data}")

In the model_2.out file, the Client will log the message:

Default@00-00-00:The tensor value is: [1 2 3 4]

Run Operations#

In the following sections, we walk through snippets of application code to demonstrate executing prefixed ML models and scripts using SmartRedis run semantics. The examples demonstrate execution within the same application where the ML model and script were placed, as well as scenarios where the ML model and script are placed by separate applications.

Access ML Models From within the Application

SmartSim supports executing prefixed ML models with prefixed tensors sent to the Orchestrator from within the same application where the ML model was placed. To achieve this, users must provide the Model name that stored the ML model and input tensors to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model, passing the Model itself as the argument.

As an example, we placed a prefixed ML model and tensor on the Orchestrator within a Model named model_1. The Orchestrator contents are:

1) "model_1.mnist_cnn"
2) "model_1.mnist_images"

Note

In the driver script, after initializing the Model instance named model_1, we execute model_1.register_incoming_entity(model_1). By passing the Model instance to itself, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

In the application snippet below, we demonstrate running the ML model:

# Set the Model source name
client.set_data_source("model_1")
# Run the ML model
client.run_model(name="mnist_cnn", inputs=["mnist_images"], outputs=["Identity"])

The Orchestrator now contains prefixed output tensors:

1) "model_1.Identity"
2) "model_1.mnist_cnn"
3) "model_1.mnist_images"

Note

The output tensors are prefixed because we executed model_1.enable_key_prefixing in the driver script which enables and activates prefixing for tensors, Datasets and lists.

Access ML Models Loaded From an External Application

SmartSim supports executing prefixed ML models with prefixed tensors sent to the Orchestrator by separate Model(s). To achieve this, users need to provide the Model name that stored the ML model and tensor to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model responsible for the search and pass the Model instance that stored the data.

In the example, a Model named model_1 has placed a ML model and tensor in a standalone Orchestrator with prefixing enabled on the Model. The contents of the Orchestrator are as follows:

1) "model_1.mnist_cnn"
2) "model_1.mnist_images"

We create a separate Model, named model_2, with the executable application code below.

Note

In the driver script, after initializing the Model instance named model_2, we execute model_2.register_incoming_entity(model_1). By passing the producer Model instance to the consumer Model, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

In the application snippet below, we demonstrate running the ML model:

# Set the Model source name
client.set_data_source("model_1")
# Run the ML model
client.run_model(name="mnist_cnn", inputs=["mnist_images"], outputs=["Identity"])

The Orchestrator now contains prefixed output tensors:

1) "model_2.Identity"
2) "model_1.mnist_cnn"
3) "model_1.mnist_images"

Note

The output tensors are prefixed because we executed model_2.enable_key_prefixing in the driver script which enables and activates prefixing for tensors, Datasets and lists.

Copy/Rename/Delete Operations#

In the following sections, we walk through snippets of application code to demonstrate the copy, rename and delete operations on prefixed tensors, Datasets, lists, ML models, and scripts. The examples demonstrate these operations within the same script where the data structures were placed, as well as scenarios where data structures are placed by separate scripts.

Copy/Rename/Delete Operations on Tensors in The Same Application

SmartSim supports copy/rename/delete operations on prefixed tensors sent to the Orchestrator from within the same application where the tensor was placed. To achieve this, users must provide the Model name that stored the tensor to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model, passing the Model itself as the argument.

As an example, we placed a prefixed tensor on the Orchestrator within a Model named model_1. The Orchestrator contents are:

1) "model_1.tensor"

Note

In the driver script, after initializing the Model instance named model_1, we execute model_1.register_incoming_entity(model_1). By passing the Model instance to itself, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

To rename the tensor in the Orchestrator, we provide the Model’s own name to Client.set_data_source and then execute the function rename_tensor:

# Set the Model source name
client.set_data_source("model_1")
# Rename the tensor
client.rename_tensor("tensor", "renamed_tensor")

Because prefixing is enabled on the Model via enable_key_prefixing in the driver script, SmartSim will keep the prefix on the tensor but replace the tensor name as shown in the Orchestrator:

1) "model_1.renamed_tensor"

Next, we copy the prefixed tensor to a new destination:

client.copy_tensor("renamed_tensor", "copied_tensor")

Since tensor prefixing is enabled on the Client, the copied_tensor is prefixed:

1) "model_1.renamed_tensor"
2) "model_1.copied_tensor"

Next, delete renamed_tensor:

client.delete_tensor("renamed_tensor")

The contents of the Orchestrator are:

1) "model_1.copied_tensor"

Copy/Rename/Delete Operations on Tensors Placed by an External Application

SmartSim supports copy/rename/delete operations on prefixed tensors sent to the Orchestrator by separate Model(s). To achieve this, users need to provide the Model name that stored the tensor to Client.set_data_source. This action instructs the Client to prepend the Model name to all key searches. For SmartSim to recognize the Model name as a data source, users must execute the Model.register_incoming_entity function on the Model responsible for the search and pass the Model instance that stored the data.

In the example, a Model named model_1 has placed a tensor in a standalone Orchestrator with prefixing enabled on the Model. The Orchestrator contents are:

1) "model_1.tensor"

Note

In the driver script, after initializing the Model instance named model_2, we execute model_2.register_incoming_entity(model_1). By passing the producer Model instance to the consumer Model, we instruct SmartSim to recognize the name of model_1 as a valid data source for subsequent use in Client.set_data_source.

From within a separate Model named model_2, we perform basic copy/rename/delete operations. To instruct the Client to prepend a Model name to all key searches, use the Client.set_data_source function. Specify the Model name model_1 that placed the tensor in the Orchestrator:

# Set the Model source name
client.set_data_source("model_1")

To rename the tensor in the Orchestrator, we provide the tensor name:

client.rename_tensor("tensor", "renamed_tensor")

SmartSim will replace the prefix with the current Model name since prefixing is enabled on the current Model. The contents of the Orchestrator are:

1) "model_2.renamed_tensor"

Note

In the driver script, we also register model_2 as an entity on itself via model_2.register_incoming_entity(model_2). This way we can use Client.set_data_source to interact with prefixed data placed by model_2.

Next, we copy the prefixed tensor to a new destination:

# Set the Model source name
client.set_data_source("model_2")
# Copy the tensor data
client.copy_tensor("renamed_tensor", "copied_tensor")

The Orchestrator contents are:

1) "model_2.renamed_tensor"
2) "model_2.copied_tensor"

Next, delete copied_tensor by specifying the name:

client.delete_tensor("copied_tensor")

The contents of the Orchestrator are:

1) "model_2.renamed_tensor"