SmartRedis Integration Guide #214

Merged (5 commits) on Jun 22, 2022
132 changes: 75 additions & 57 deletions doc/sr_integration.rst
========
Overview
========

This document provides some general guidelines to integrate the SmartRedis client
into existing simulation codebases. Developers of these simulation codebases will
need to identify the exact places to add the code; generally, SmartRedis calls
will only need to be added in two places:

1. Initialization
2. Main loop
+++++++++++++++++++
Creating the client
+++++++++++++++++++

The SmartRedis client must be initialized before it can be used to communicate
with the orchestrator. In the C++ and Python versions of the client, this is done
when creating a new client. In the C and Fortran clients, an `initialize`
method must be called.

C++::

Python::

Fortran::

    use smartredis_client, only : client_type
    include "enum_fortran.inc"

    type(client_type) :: client
    return_code = client%initialize(use_cluster)
    if (return_code /= SRNoError) stop 'Error in initialization'

C::

    #include "client.h"
    #include "sr_enums.h"

    void* client = NULL;
    SRError return_code = SmartRedisCClient(use_cluster, &client);
    if (return_code != SRNoError) {
        return -1;
    }

All these methods have only one configurable parameter, indicated in the
above cases by the variable `use_cluster`. If this parameter is true,
then the client expects to be able to communicate with an orchestrator with
three or more shards.

++++++++++++++++++++++++++++++++++++++++++
(Parallel Programs): Creating unique names
++++++++++++++++++++++++++++++++++++++++++

For parallel applications, each rank or thread that is communicating with
the orchestrator will likely need to create a unique prefix for names to prevent
another rank or thread from inadvertently overwriting data. This prefix should be
used when creating the name of any tensor, dataset, or model that needs to be
unique to a given rank. (Note: for models run within SmartSim, additional
prefixing may be done by the client when running an ensemble and/or using
multiple data sources.) Any identifier can be used, though the MPI rank number
(or equivalent identifier) is typically a useful, unique choice.

C++::

    const std::string name_prefix = std::format("{:06}_", rank_id);

Python::

    name_prefix = f"{rank_id:06d}_"

Fortran::

    character(len=7) :: name_prefix
    write(name_prefix,'(I6.6,A)') rank_id, '_'

C::

    char name_prefix[8];
    sprintf(name_prefix, "%06d_", rank_id);
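Wrapped in a helper, the same zero-padded prefix logic looks like the following Python sketch (the `make_name_prefix` helper is illustrative, not part of the SmartRedis API):

```python
def make_name_prefix(rank_id: int) -> str:
    """Build a zero-padded, rank-unique prefix for tensor/dataset/model names."""
    # Six digits matches the fixed-width formats in the snippets above.
    return f"{rank_id:06d}_"

# e.g. prefixing a tensor name for MPI rank 42
tensor_name = make_name_prefix(42) + "temperature"  # → "000042_temperature"
```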

++++++++++++++++++++++++++
Storing scripts and models
++++++++++++++++++++++++++

The last task that typically needs to be done is to store models or scripts
that will be used later in the simulation. When using a clustered orchestrator,
this only needs to be done by one client (unless each rank requires a different
model or script). MPI rank 0 is often a convenient choice for setting models and
scripts.

C++::

    if (root_client) {
        client.set_model_from_file(model_name, model_file, backend, device);
    }

Python::

    if root_client:
        client.set_model_from_file(model_name, model_file, backend, device)

Fortran::

    if (root_client) then
        return_code = client%set_model_from_file(model_name, model_file, backend, device)
        if (return_code /= SRNoError) stop 'Error setting model'
    endif

C::

    if (root_client) {
        return_code = set_model_from_file(client, model_name, model_file, backend, device);
        if (return_code != SRNoError) {
            return -1;
        }
    }
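The root-only pattern above can be exercised without a live orchestrator. The following Python sketch uses a stand-in client (`FakeClient` is purely illustrative; only the `set_model_from_file` call shape mirrors the real API):

```python
class FakeClient:
    """Stand-in for a SmartRedis client that records model uploads."""
    def __init__(self):
        self.models = {}

    def set_model_from_file(self, name, path, backend, device):
        self.models[name] = (path, backend, device)

def register_model(client, rank):
    # Only the root rank uploads the shared model; every rank can still run it.
    if rank == 0:
        client.set_model_from_file("example_model_name", "path/to/model.pt",
                                   "TORCH", "CPU")

client = FakeClient()
for rank in range(4):  # simulate four MPI ranks sharing one orchestrator
    register_model(client, rank)
# Exactly one upload occurs, no matter how many ranks participate.
```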

=========
Main loop
=========

Within the main loop of the code (e.g. every timestep or iteration of a solver),
the developer typically uses the SmartRedis client methods to implement a workflow
which may include receiving data, sending data, running a script or model, and/or
retrieving a result. These workflows are covered extensively in the walkthroughs
for the Fortran, C++, and Python clients and in the integrations with MOM6, OpenFOAM,
LAMMPS, and others.

Generally though, developers are advised to:

1. Find locations where file I/O would normally happen and either augment
   or replace code to use the SmartRedis client and store the data in the
   orchestrator
2. Use the `name_prefix` created during initialization to avoid accidental
   writes/reads from different clients
3. Use the SmartSim `dataset` type when using clients representing decomposed
   subdomains to make the retrieval/use of the data more performant
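The guidelines above can be sketched as one timestep of a main loop. This Python sketch substitutes a stand-in client (`FakeClient` and its toy halving model are purely illustrative; only the `put_tensor`/`run_model`/`unpack_tensor` names mirror the real client methods):

```python
class FakeClient:
    """Stand-in for a SmartRedis client, so the loop structure can be shown
    without a running orchestrator."""
    def __init__(self):
        self.tensors = {}

    def put_tensor(self, name, data):
        self.tensors[name] = data

    def run_model(self, model_name, inputs, outputs):
        # Toy "model": halve every input value (purely illustrative).
        for src, dst in zip(inputs, outputs):
            self.tensors[dst] = [x / 2 for x in self.tensors[src]]

    def unpack_tensor(self, name):
        # The real client copies into an existing array; returning is simpler here.
        return self.tensors[name]

def timestep(client, prefix, temperature):
    # 1. Send the state, 2. run the model, 3. retrieve the result.
    in_name, out_name = prefix + "temperature", prefix + "temperature_out"
    client.put_tensor(in_name, temperature)
    client.run_model("example_model_name", [in_name], [out_name])
    return client.unpack_tensor(out_name)

client = FakeClient()
result = timestep(client, "000000_", [300.0, 400.0])  # → [150.0, 200.0]
```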
============
Full example
============

The following pseudocode is used to demonstrate various aspects of instrumenting an
existing simulation code with SmartRedis. This code is representative of solving
the time-evolving heat equation, but we will augment it using an ML model to
provide a preconditioning step each iteration and post the state of the simulation
to the orchestrator. ::

    program main

        ! Initialize the model, setup MPI, communications, read input files
        call initialize_model(temperature, number_of_timesteps)

        main_loop: do i=1,number_of_timesteps


Following the guidelines from above, the first step is to initialize the client
and create a unique identifier for the given processor. This should be done
within roughly the same portion of the code where the rest of the model
performs the initialization of other components. ::

    ! Import SmartRedis modules
    use smartredis_client, only : client_type

    ! Declare a new variable called client and a string to hold a unique
    ! prefix for names
    type(client_type) :: smartredis_client
    character(len=7) :: name_prefix
    integer :: mpi_rank, mpi_code, smartredis_code

    ! Note adding use_cluster as an additional runtime argument for SmartRedis
    call initialize_model(temperature, number_of_timesteps, use_cluster)
    smartredis_code = smartredis_client%initialize(use_cluster)
    call MPI_Comm_rank(MPI_COMM_WORLD, mpi_rank, mpi_code)
    ! Build the prefix for all tensors set in this model
    write(name_prefix,'(I6.6,A)') mpi_rank, '_'

    ! Assume all ranks will use the same machine learning model, so no need to
    ! add the prefix to the model name
    if (mpi_rank==0) smartredis_code = smartredis_client%set_model_from_file("example_model_name", "path/to/model.pt", "TORCH", "gpu")

Next, add the calls in the main loop to send the temperature to the orchestrator ::

    character(len=30), dimension(1) :: model_input, model_output

    main_loop: do i=1,number_of_timesteps

        ! Write the current state of the simulation to a file
        call write_current_state(temperature)
        model_input(1) = name_prefix//"temperature"
        model_output(1) = name_prefix//"temperature_out"
        smartredis_code = smartredis_client%put_tensor(model_input(1), temperature)

        ! Run the machine learning model
        smartredis_code = smartredis_client%run_model("example_model_name", model_input, model_output)
        ! The following line overwrites the prognostic temperature array
        smartredis_code = smartredis_client%unpack_tensor(model_output(1), temperature)


    enddo

This model will now use the client every timestep to put the
temperature array in the orchestrator, instruct the orchestrator to call
a machine learning model for prediction/inference, and unpack the resulting
inference into the existing temperature array. For more complex examples,
please see some of the integrations in the SmartSim Zoo or feel free to
contact the team at [email protected]
2 changes: 1 addition & 1 deletion tutorials/getting_started/getting_started.ipynb
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is to initialize an `Experiment` instance. The `Experiment` must be provided a name. This name can be any string, but it is best practice to give it a meaningful name as a broad title for what types of models the experiment will be supervising. For our purposes, our `Experiment` will be named `\"getting-started\"`.\n",
"\n",
"The `Experiment` also needs to have a `launcher` specified. Launchers provide SmartSim the ability to construct and execute complex workloads on HPC systems with schedulers (workload managers) like Slurm, or PBS. SmartSim currently supports\n",
" * `slurm`\n",
2 changes: 1 addition & 1 deletion tutorials/ml_inference/Inference-in-SmartSim.ipynb
"\n",
"This tutorial shows how to use trained PyTorch, TensorFlow, and ONNX (format) models, written in Python, directly in HPC workloads written in Fortran, C, C++ and Python.\n",
"\n",
"The example simulation here is written in Python for brevity; however, the inference API in SmartRedis is the same (besides extra parameters for compiled languages) across all clients. \n"
]
},
{