From c29074b43e590c2c6aaff8fa3cd7751f6e5e1916 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 17 Oct 2023 17:56:18 -0500 Subject: [PATCH 1/4] publishing branch --- doc/changelog.rst | 3 + doc/clients/c-plus.rst | 29 +++- doc/clients/c.rst | 28 +++- doc/clients/fortran.rst | 28 +++- doc/clients/python.rst | 31 ++++- doc/data_structures.rst | 177 +++++++++++++----------- doc/examples/cpp_api_examples.rst | 7 +- doc/examples/fortran_api_examples.rst | 12 +- doc/examples/python_api_examples.rst | 7 +- src/python/module/smartredis/dataset.py | 64 ++++----- 10 files changed, 256 insertions(+), 130 deletions(-) diff --git a/doc/changelog.rst b/doc/changelog.rst index 444bf3c8e..7491e48fb 100644 --- a/doc/changelog.rst +++ b/doc/changelog.rst @@ -8,6 +8,7 @@ To be released at some future point in time Description +- Updated docs for Client and Dataset APIs - Added coverage to SmartRedis Python API functions - Added name retrieval function to the DataSet object - Moved testing of examples to on-commit testing in CI/CD pipeline @@ -23,6 +24,7 @@ Description Detailed Notes +- Updated docs to specify differences with SmartRedis APIs - Added tests to increase Python code coverage - Added a function to the DataSet class and added a test - Moved testing of examples to on-commit testing in CI/CD pipeline (PR412_) @@ -35,6 +37,7 @@ Detailed Notes - Create CONTRIBUTIONS.md file that points to the contribution guideline for both SmartSim and SmartRedis (PR395_) - Migrated to ConfigOptions-based Client construction, adding multiple database support (PR353_) +.. _PR .. _PR414: https://github.com/CrayLabs/SmartRedis/pull/414 .. _PR411: https://github.com/CrayLabs/SmartRedis/pull/411 .. 
_PR412: https://github.com/CrayLabs/SmartRedis/pull/412 diff --git a/doc/clients/c-plus.rst b/doc/clients/c-plus.rst index 4ed831470..85d0260e4 100644 --- a/doc/clients/c-plus.rst +++ b/doc/clients/c-plus.rst @@ -1,10 +1,26 @@ -*** -C++ -*** +******** +C++ APIs +******** + +The following page provides a comprehensive overview of the SmartRedis C++ +APIs, which include the **Client API** and **Dataset API**. +Further explanation and details of each are presented below. Client API ========== +The Client API is purpose-built for interaction with the back-end database, +which extends the capabilities of the Redis in-memory data store. +It's important to note that the SmartRedis Client API is the exclusive +means for altering, transmitting, and receiving data within the in-memory +database. More specifically, the Client API is responsible for both +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. +It also handles the transmission and reception of +the aforementioned data structures in addition to the :ref:`Dataset ` +data structure. Creating and modifying the ``DataSet`` object +is confined to local operation by the DataSet API. + .. doxygenclass:: SmartRedis::Client :project: cpp_client :members: @@ -14,6 +30,13 @@ Client API Dataset API =========== +The C++ DataSet API enables a user to manage a group of tensors +and associated metadata within a data structure called a ``DataSet`` object. +The DataSet API operates independently of the database and solely +maintains the dataset object **in-memory**. The actual interaction with the Redis database, +where a snapshot of the DataSet object is sent, is handled by the Client API. For more +information on the ``DataSet`` object, click :ref:`here `. + ..
doxygenclass:: SmartRedis::DataSet :project: cpp_client :members: diff --git a/doc/clients/c.rst b/doc/clients/c.rst index 354e8d6c0..403ee4ec6 100644 --- a/doc/clients/c.rst +++ b/doc/clients/c.rst @@ -1,11 +1,26 @@ +******* +C APIs +******* -*** - C -*** +The following page provides a comprehensive overview of the SmartRedis C +APIs, which include the **Client API** and **Dataset API**. +Further explanation and details of each are presented below. Client API ========== +The Client API is purpose-built for interaction with the back-end database, +which extends the capabilities of the Redis in-memory data store. +It's important to note that the SmartRedis Client API is the exclusive +means for altering, transmitting, and receiving data within the in-memory +database. More specifically, the Client API is responsible for both +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. +It also handles the transmission and reception of +the aforementioned data structures in addition to the :ref:`Dataset ` +data structure. Creating and modifying the ``DataSet`` object +is confined to local operation by the DataSet API. + .. doxygenfile:: c_client.h :project: c_client @@ -13,6 +28,13 @@ Client API Dataset API =========== +The C DataSet API enables a user to manage a group of tensors +and associated metadata within a data structure called a ``DataSet`` object. +The DataSet API operates independently of the database and solely +maintains the dataset object **in-memory**. The actual interaction with the Redis database, +where a snapshot of the DataSet object is sent, is handled by the Client API. For more +information on the ``DataSet`` object, click :ref:`here `. + ..
doxygenfile:: c_dataset.h :project: c_client diff --git a/doc/clients/fortran.rst b/doc/clients/fortran.rst index fdde448fe..d22160b65 100644 --- a/doc/clients/fortran.rst +++ b/doc/clients/fortran.rst @@ -1,11 +1,26 @@ +************ +Fortran APIs +************ -******* -Fortran -******* +The following page provides a comprehensive overview of the SmartRedis Fortran +APIs, which include the **Client API** and **Dataset API**. +Further explanation and details of each are presented below. Client API ========== +The Client API is purpose-built for interaction with the back-end database, +which extends the capabilities of the Redis in-memory data store. +It's important to note that the SmartRedis Client API is the exclusive +means for altering, transmitting, and receiving data within the in-memory +database. More specifically, the Client API is responsible for both +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. +It also handles the transmission and reception of +the aforementioned data structures in addition to the :ref:`Dataset ` +data structure. Creating and modifying the ``DataSet`` object +is confined to local operation by the DataSet API. + The following are overloaded interfaces which support 32/64-bit ``real`` and 8, 16, 32, and 64-bit ``integer`` tensors @@ -17,6 +32,13 @@ The following are overloaded interfaces which support Dataset API =========== +The Fortran DataSet API enables a user to manage a group of tensors +and associated metadata within a data structure called a ``DataSet`` object. +The DataSet API operates independently of the database and solely +maintains the dataset object **in-memory**. The actual interaction with the Redis database, +where a snapshot of the DataSet object is sent, is handled by the Client API. For more +information on the ``DataSet`` object, click :ref:`here `.
+ The following are overloaded interfaces which support 32/64-bit ``real`` and 8, 16, 32, and 64-bit ``integer`` tensors diff --git a/doc/clients/python.rst b/doc/clients/python.rst index b82fdf2f2..d0bf04e05 100644 --- a/doc/clients/python.rst +++ b/doc/clients/python.rst @@ -1,13 +1,31 @@ -****** -Python -****** +*********** +Python APIs +*********** + +The following page provides a comprehensive overview of the SmartRedis Python +APIs, which include the **Client API**, **Dataset API**, and **Logging API**. +Further explanation and details of each are presented below. Client API ========== +The Client API is purpose-built for interaction with the back-end database, +which extends the capabilities of the Redis in-memory data store. +It's important to note that the SmartRedis Client API is the exclusive +means for altering, transmitting, and receiving data within the in-memory +database. More specifically, the Client API is responsible for both +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. +It also handles the transmission and reception of +the aforementioned data structures in addition to the :ref:`Dataset ` +data structure. Creating and modifying the ``DataSet`` object +is confined to local operation by the DataSet API. + Client Class Method Overview ---------------------------- +Below is a short + .. currentmodule:: smartredis .. autosummary:: @@ -86,6 +104,13 @@ Client Class Method Detailed View DataSet API =========== +The Python DataSet API enables a user to manage a group of tensors +and associated metadata within a data structure called a ``DataSet`` object. +The DataSet API operates independently of the database and solely +maintains the dataset object **in-memory**. The actual interaction with the Redis database, +where a snapshot of the DataSet object is sent, is handled by the Client API. For more +information on the ``DataSet`` object, click :ref:`here `.
+ Dataset Class Method Overview ----------------------------- diff --git a/doc/data_structures.rst b/doc/data_structures.rst index fb547c4bd..dcc13523c 100644 --- a/doc/data_structures.rst +++ b/doc/data_structures.rst @@ -2,22 +2,24 @@ Data Structures *************** -RedisAI defines three new data structures to be -used in redis databases: tensor, model, and script. -In addition, SmartRedis defines an additional data -structure ``DataSet``. In this section, the SmartRedis -API for interacting with these data structures -will be described, and when applicable, -comments on performance and best practices will be made. - -In general, concepts and capabilities will be -demonstrated for the Python and C++ API. -The C and Fortran function signatures closely -resemble the C++ API, and as a result, -they are not discussed in detail in the interest -of brevity. For more detailed explanations of the C -and Fortran API, refer to the documentation pages for those -clients. +SmartSim defines three data structures designed for use within back-end databases: + +* ``Tensor`` : represents an n-dimensional array of values. +* ``Model`` : represents a computational ML model for one of the supported backend frameworks. +* ``Script`` : represents a TorchScript program. + +In addition, SmartRedis defines a data +structure named ``DataSet`` that enables a user to manage a group of tensors +and associated metadata in-memory. In this section, we will provide an explanation +of the SmartRedis API used to interact with these four data structures, +along with relevant insights on performance and best practices. + +We illustrate concepts and capabilities of the Python +and C++ SmartRedis APIs. The C and Fortran function signatures closely +mirror the C++ API, and for brevity, we won't delve +into the two extensively. For more comprehensive explanations of +the C and Fortran SmartRedis APIs, please consult the respective documentation +pages. .. 
_data_structures_tensor: @@ -27,7 +29,7 @@ Tensor An n-dimensional tensor is used by RedisAI to store and manipulate numerical data. SmartRedis provides functions to -put a key and tensor pair into the Redis database and retrieve +put a key and tensor pair into the back-end database and retrieve a tensor associated with a key from the database. .. note:: @@ -88,7 +90,7 @@ Retrieving ---------- The C++, C, and Fortran clients provide two methods for retrieving -tensors from the Redis database. The first method is referred to +tensors from the back-end database. The first method is referred to as *unpacking* a tensor. When a tensor is retrieved via ``unpack_tensor()``, the memory space to store the retrieved tensor data is provided by the user. This has the advantage @@ -177,16 +179,17 @@ SmartSim ensemble capabilities. Dataset ======= -In many situations, a ``Client`` might be tasked with sending a -group of tensors and metadata which are closely related and -naturally grouped into a collection for future retrieval. -The ``DataSet`` object stages these items so that they can be -more efficiently placed in the redis database and can later be -retrieved with the name given to the ``DataSet``. +When dealing with multi-modal data or complex data sets, +you may have different types of tensors (e.g., images, text embeddings, +numerical data) and metadata for each data point. Grouping them into a +collection represents each data point as a cohesive unit. +The ``DataSet`` data structure provides this functionality to stage tensors and metadata + **in-memory** via the ``DataSet API``. After the creation of a +``DataSet`` object, the grouped data can be efficiently stored in the back-end database +by the ``Client API`` and subsequently retrieved using the assigned ``DataSet`` name. +In the upcoming sections, we outline the process of building, sending, and retrieving a ``DataSet``. Listed below are the supported tensor and metadata types. 
-In the following sections, building, sending, and retrieving -a ``DataSet`` will be described. .. list-table:: Supported Data Types :widths: 25 25 25 @@ -230,36 +233,40 @@ a ``DataSet`` will be described. - - X -Sending -------- +Build and Send a DataSet +------------------------ -When building a ``DataSet`` to be stored in the database, -a user can add any combination of tensors and metadata. -To add a tensor to the ``DataSet``, the user simply uses -the ``DataSet.add_tensor()`` function defined in -each language. The ``DataSet.add_tensor()`` parameters are the same -as ``Client.put_tensor()``, and as a result, details of the function -signatures will not be reiterated here. +When building a ``DataSet`` object in-memory, +a user can group various combinations of tensors and metadata that +conform to the supported data types in the table above. For example, +to include a tensor in a ``DataSet`` object, use the ``DataSet.add_tensor()`` +function in a supported language. The SmartRedis DataSet API functions +are available in C, C++, Python, and Fortran. The DataSet API, including the ``DataSet.add_tensor()`` function, +operates independently of the database and solely +maintains the dataset object. Storing the dataset in the back-end +database is done via the Client API ``put_dataset()`` method. .. note:: - ``DataSet.add_tensor()`` copies the tensor data - provided by the user to eliminate errors from user-provided - data being cleared or deallocated. This additional memory - will be freed when the DataSet - object is destroyed. + The ``DataSet.add_tensor()`` function copies user-provided + tensor data; this prevents potential issues arising from the user's + data being cleared or deallocated. Any additional memory allocated + for this purpose will be released when the DataSet object is deleted + or no longer in use.
-Metadata can be added to the ``DataSet`` with the +Metadata can be added to an in-memory ``DataSet`` object with the ``DataSet.add_meta_scalar()`` and ``DataSet.add_meta_string()`` -functions. As the aforementioned function names suggest, -there are separate functions to add metadata that is a scalar -(e.g. double) and a string. For both functions, the first -function input is the name of the metadata field. This field -name is an internal ``DataSet`` identifier for the metadata -value(s) that is used for future retrieval, and because it -is an internal identifier, the user does not have to worry -about any key conflicts in the database (i.e. multiple ``DataSet`` -can have the same metadata field names). To clarify these -and future descriptions, the C++ interface for adding +functions. As indicated by the function names, distinct functions +exist for adding scalar metadata (e.g., double) and string metadata. +For both functions, the first input +parameter is the name of the metadata field. +The field name serves as an internal identifier within the ``DataSet`` +for grouped metadata values. It's used to retrieve metadata in the future. +Since it's an internal identifier, users don't need to be concerned +about conflicts with keys in the database. In other words, multiple +``DataSet`` objects can use the same metadata field names without causing +issues because these names are managed within the ``DataSet`` and won't +interfere with external database keys. To provide an implementation example, +the C++ interface for adding metadata is shown below: .. code-block:: cpp @@ -277,50 +284,58 @@ metadata is shown below: When adding a scalar or string metadata value, the value is copied by the ``DataSet``, and as a result, the user does not need to ensure that the metadata values provided -are still in memory after they have been added. +are still in-memory. 
In other words, +the ``DataSet`` handles the memory management of these metadata values, +and you don't need to retain or manage the original copies separately +once they have been included in the ``DataSet`` object. Additionally, multiple metadata values can be added to a -single field, and the default behavior is to append the value to the -existing field. In this way, the ``DataSet`` metadata supports -one-dimensional arrays, but the entries in the array must be added -iteratively by the user. Also, note that in the above C++ example, +single field name, and the default behavior is to append the value to the +field if it exists, or to create the field if it does not. This behavior allows the ``DataSet`` metadata +to function like one-dimensional arrays. However, if you would like to add +multiple metadata values to a field name, you will need to add them one by +one in an iterative manner. + +Also, note that in the above C++ example, the metadata scalar type must be specified with a ``SRMetaDataType`` enum value, and similar requirements exist for C and Fortran ``DataSet`` implementations. Finally, the ``DataSet`` object is sent to the database using the ``Client.put_dataset()`` function, which is uniform across all clients. +To emphasize once more, all interactions with the back-end database are handled by +the Client API, not the DataSet API. -Retrieving ----------- +Retrieving a DataSet +-------------------- In all clients, the ``DataSet`` is retrieved with a single function call to ``Client.get_dataset()``, which requires only the name of the ``DataSet`` (i.e. the name used in the constructor of the ``DataSet`` when it was -built and placed in the database). ``Client.get_dataset()`` -returns to the user a DataSet object or a pointer to a -DataSet object that can be used to access all of the +built and placed in the database by the Client API).
``Client.get_dataset()`` +returns to the user a ``DataSet`` object or a pointer to a +``DataSet`` object from the database that is used to access all of the dataset tensors and metadata. -The functions for retrieving tensors from ``DataSet`` +The functions for retrieving tensors from an in-memory ``DataSet`` object are identical to the functions provided by ``Client``, and the same return values and memory management -paradigm is followed. As a result, please refer to +paradigm is followed. As a result, please refer to the previous section for details on tensor retrieve function calls. -There are two functions for retrieving metadata: +There are two functions for retrieving metadata from a ``DataSet`` object in-memory: ``get_meta_scalars()`` and ``get_meta_strings()``. As the names suggest, the first function is used for retrieving numerical metadata values, and the second is for retrieving metadata string -values. The metadata retrieval function prototypes +values. The metadata retrieval function prototypes vary across the clients based on programming language constraints, and as a result, please refer to the ``DataSet`` API documentation -for a description of input parameters and memory management. It is +for a description of input parameters and memory management. It is important to note, however, that all functions require the name of the -metadata field to be retrieved, and this name is the same name that +metadata field to be retrieved. This name is the same name that was used when constructing the metadata field with ``add_meta_scalar()`` and ``add_meta_string()`` functions. @@ -331,6 +346,8 @@ SmartRedis also supports an advanced API for working with aggregate lists of DataSets; details may be found :ref:`here <_advanced_topics_dataset_aggregation>`. +.. 
_data_structures_model: + Model ===== @@ -341,8 +358,8 @@ RedisAI supports PyTorch, TensorFlow, TensorFlow Lite, and ONNX backends, and specifying the backend to be used is done through the ``Client`` function calls. -Sending -------- +Build and Send a Model +---------------------- A model is placed in the database through the ``Client.set_model()`` function. While data types may differ, the function parameters @@ -372,14 +389,14 @@ documentation or the RedisAI documentation for a description of each parameter. .. note:: - With a Redis cluster configuration, ``Client.set_model()`` + With a clustered Redis backend configuration, ``Client.set_model()`` will distribute a copy of the model to each database node in the cluster. As a result, the model that has been placed in the cluster with ``Client.set_model()`` will not be addressable directly with the Redis CLI because of key manipulation that is required to accomplish this distribution. Despite the internal key - manipulation, models in a Redis cluster that have been + manipulation, models in a clustered Redis backend that have been set through the SmartRedis ``Client`` can be accessed and run through the SmartRedis ``Client`` API using the name provided to ``set_model()``. The user @@ -401,7 +418,7 @@ A model can be retrieved from the database using the type varies between languages, only the model name that was used with ``Client.set_model()`` is needed to reference the model in the database. Note that -in a Redis cluster configuration, only one copy of the +in a clustered Redis backend configuration, only one copy of the model is returned to the user. .. note:: @@ -416,7 +433,7 @@ Executing A model can be executed using the ``Client.run_model()`` function. The only required inputs to execute a model are the model name, a list of input tensor names, and a list of output tensor names. 
-If using a Redis cluster configuration, a copy of the model +If using a clustered Redis backend configuration, a copy of the model referenced by the provided name will be chosen based on data locality. It is worth noting that the names of input and output tensors will be altered with ensemble member identifications if the SmartSim @@ -464,13 +481,15 @@ via ``Client.set_model_multigpu()``. it must have been set via ``Client.set_model_multigpu()``. The ``first_gpu`` and ``num_gpus`` parameters must be constant across both calls. +.. _data_structures_script: + Script ====== Data processing is an essential step in most machine learning workflows. For this reason, RedisAI provides the ability to evaluate PyTorch programs using the hardware -co-located with the Redis database (either CPU or GPU). +co-located with the back-end database (either CPU or GPU). The SmartRedis ``Client`` provides functions for users to place a script in the database, retrieve a script from the database, and run a script. @@ -493,14 +512,14 @@ need to be provided by the user. const std::string_view& script); .. note:: - With a Redis cluster configuration, ``Client.set_script()`` + With a clustered Redis backend configuration, ``Client.set_script()`` will distribute a copy of the script to each database node in the cluster. As a result, the script that has been placed in the cluster with ``Client.set_script()`` will not be addressable directly with the Redis CLI because of key manipulation that is required to accomplish this distribution. Despite the internal key - manipulation, scripts in a Redis cluster that have been + manipulation, scripts in a clustered Redis backend that have been set through the SmartRedis ``Client`` can be accessed and run through the SmartRedis ``Client`` API using the name provided to ``set_script()``. 
The user @@ -522,7 +541,7 @@ A script can be retrieved from the database using the type varies between languages, only the script name that was used with ``Client.set_script()`` is needed to reference the script in the database. Note that -in a Redis cluster configuration, only one copy of the +in a clustered Redis backend configuration, only one copy of the script is returned to the user. .. note:: @@ -538,7 +557,7 @@ A script can be executed using the ``Client.run_script()`` function. The only required inputs to execute a script are the script name, the name of the function in the script to execute, a list of input tensor names, and a list of output tensor names. -If using a Redis cluster configuration, a copy of the script +If using a clustered Redis backend configuration, a copy of the script referenced by the provided name will be chosen based on data locality. It is worth noting that the names of input and output tensors will be altered with ensemble member identifications if the SmartSim @@ -583,4 +602,4 @@ via ``Client.set_script_multigpu()``. In order for a script to be executed via ``Client.run_script_multigpu()``, or deleted via ``Client.delete_script_multigpu()``, it must have been set via ``Client.set_script_multigpu()``. The - ``first_gpu`` and ``num_gpus`` parameters must be constant across both calls. + ``first_gpu`` and ``num_gpus`` parameters must be constant across both calls. \ No newline at end of file diff --git a/doc/examples/cpp_api_examples.rst b/doc/examples/cpp_api_examples.rst index cdaa8fd78..39e75e5d7 100644 --- a/doc/examples/cpp_api_examples.rst +++ b/doc/examples/cpp_api_examples.rst @@ -33,7 +33,10 @@ SmartRedis C++ client API. DataSets ======== -The C++ client can store and retrieve tensors and metadata in datasets. +The C++ ``Client`` API stores and retrieves datasets from the Redis database. The C++ +``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object.
+To reiterate, the actual interaction with the Redis database, +where a snapshot of the ``DataSet`` object is sent, is handled by the Client API. For further information about datasets, please refer to the :ref:`Dataset section of the Data Structures documentation page `. @@ -97,4 +100,4 @@ source code is also shown. .. literalinclude:: ../../examples/common/mnist_data/data_processing_script.txt :linenos: :language: Python - :lines: 15-20 + :lines: 15-20 \ No newline at end of file diff --git a/doc/examples/fortran_api_examples.rst b/doc/examples/fortran_api_examples.rst index e25537547..3693951be 100644 --- a/doc/examples/fortran_api_examples.rst +++ b/doc/examples/fortran_api_examples.rst @@ -107,9 +107,13 @@ into a different array. Datasets ======== -The following code snippet shows how to use the Fortran -Client to store and retrieve dataset tensors and -dataset metadata scalars. +The Fortran ``Client`` API stores and retrieves datasets from the Redis database. The Fortran +``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object. +To reiterate, the actual interaction with the Redis database, +where a snapshot of the ``DataSet`` object is sent, is handled by the Client API. + +The code below shows how to store and retrieve tensors and metadata +that belong to a ``DataSet``. .. literalinclude:: ../../examples/serial/fortran/smartredis_dataset.F90 :linenos: @@ -382,4 +386,4 @@ Python Pre-Processing: .. literalinclude:: ../../examples/common/mnist_data/data_processing_script.txt :linenos: :language: Python - :lines: 15-20 + :lines: 15-20 \ No newline at end of file diff --git a/doc/examples/python_api_examples.rst b/doc/examples/python_api_examples.rst index c1c154ed4..0bc07d25e 100644 --- a/doc/examples/python_api_examples.rst +++ b/doc/examples/python_api_examples.rst @@ -37,7 +37,10 @@ and do not require any other data types. Datasets ======== -The Python client can store and retrieve tensors and metadata in datasets.
+The Python ``Client`` API stores and retrieves datasets from the Redis database. The Python +``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object. +To reiterate, the actual interaction with the Redis database, +where a snapshot of the ``DataSet`` object is sent, is handled by the Client API. For further information about datasets, please refer to the :ref:`Dataset section of the Data Structures documentation page `. @@ -92,4 +95,4 @@ looks like this: .. literalinclude:: ../../examples/serial/python/data_processing_script.txt :language: python - :linenos: + :linenos: \ No newline at end of file diff --git a/src/python/module/smartredis/dataset.py b/src/python/module/smartredis/dataset.py index a9be73b81..809bf9c05 100644 --- a/src/python/module/smartredis/dataset.py +++ b/src/python/module/smartredis/dataset.py @@ -59,15 +59,16 @@ def _data(self) -> PyDataset: @staticmethod def from_pybind(dataset: PyDataset) -> "Dataset": - """Initialize a Dataset object from - a PyDataset object + """Initialize a Dataset object from a PyDataset object + + Create a new Dataset object using the data and properties + of a PyDataset object as the initial values.
:param dataset: The pybind PyDataset object to use for construction :type dataset: PyDataset - :return: The newly constructed Dataset from - the PyDataset - :rtype: Dataset + :return: The newly constructed Dataset object + :rtype: Dataset object """ typecheck(dataset, "dataset", PyDataset) new_dataset = Dataset(dataset.get_name()) @@ -96,9 +97,9 @@ def set_data(self, dataset: PyDataset) -> None: @exception_handler def add_tensor(self, name: str, data: np.ndarray) -> None: - """Add a named tensor to this dataset - - :param name: tensor name + """Add a named multi-dimensional data array (tensor) to this dataset + + :param name: name associated with the tensor data :type name: str :param data: tensor data :type data: np.ndarray @@ -112,7 +113,7 @@ def add_tensor(self, name: str, data: np.ndarray) -> None: def get_tensor(self, name: str) -> np.ndarray: """Get a tensor from the Dataset - :param name: name of the tensor to get + :param name: name of the tensor :type name: str :return: a numpy array of tensor data :rtype: np.ndarray @@ -131,16 +132,15 @@ def get_name(self) -> str: @exception_handler def add_meta_scalar(self, name: str, data: t.Union[int, float]) -> None: - """Add metadata scalar field (non-string) with value to the DataSet + """Add scalar (non-string) metadata to a field name if it exists; + otherwise, create and add - If the field does not exist, it will be created. - If the field exists, the value - will be appended to existing field. + If the field name exists, append the scalar metadata; otherwise, + create the field within the DataSet object and add the scalar metadata.
- :param name: The name used to reference the metadata - field + :param name: The name used to reference the scalar metadata field :type name: str - :param data: a scalar + :param data: scalar metadata input :type data: int | float """ typecheck(name, "name", str) @@ -155,15 +155,14 @@ def add_meta_scalar(self, name: str, data: t.Union[int, float]) -> None: @exception_handler def add_meta_string(self, name: str, data: str) -> None: - """Add metadata string field with value to the DataSet + """Add string metadata to a field name if it exists; otherwise, create and add - If the field does not exist, it will be created - If the field exists the value will - be appended to existing field. + If the field name exists, append the string metadata; otherwise, + create the field within the DataSet object and add the string metadata. - :param name: The name used to reference the metadata field + :param name: The name used to reference the string metadata field :type name: str - :param data: The string to add to the field + :param data: string metadata input :type data: str """ typecheck(name, "name", str) @@ -172,10 +171,9 @@ def add_meta_string(self, name: str, data: str) -> None: @exception_handler def get_meta_scalars(self, name: str) -> t.Union[t.List[int], t.List[float]]: - """Get the metadata scalar field values from the DataSet + """Get the scalar values from the DataSet assigned to a field name - :param name: The name used to reference the metadata - field in the DataSet + :param name: The field name to retrieve from :type name: str :rtype: list[int] | list[float] """ @@ -184,10 +182,9 @@ def get_meta_scalars(self, name: str) -> t.Union[t.List[int], t.List[float]]: @exception_handler def get_meta_strings(self, name: str) -> t.List[str]: - """Get the metadata scalar field values from the DataSet + """Get the string values from the DataSet assigned to a field name - :param name: The name used to reference the metadata - field in the DataSet + :param name: The field name 
to retrieve from :type name: str :rtype: list[str] """ @@ -196,16 +193,16 @@ def get_meta_strings(self, name: str) -> t.List[str]: @exception_handler def get_metadata_field_names(self) -> t.List[str]: - """Get the names of all metadata scalars and strings from the DataSet + """Get all field names from the DataSet - :return: a list of metadata field names + :return: a list of all metadata field names :rtype: list[str] """ return self._data.get_metadata_field_names() @exception_handler def get_metadata_field_type(self, name: str) -> t.Type: - """Get the names of all metadata scalars and strings from the DataSet + """Get the type of metadata for a field name (scalar or string) :param name: The name used to reference the metadata field in the DataSet @@ -234,6 +231,9 @@ def get_tensor_type(self, name: str) -> t.Type: def get_tensor_names(self) -> t.List[str]: """Get the names of all tensors in the DataSet + Tensor names are used to assign a name to a tensor, + which is distinct from the names of fields. 
+ :return: a list of tensor names :rtype: list[str] """ @@ -243,6 +243,8 @@ def get_tensor_names(self) -> t.List[str]: def get_tensor_dims(self, name: str) -> t.List[int]: """Get the dimensions of a tensor in the DataSet + :param name: name associated to the tensor data + :type name: str :return: a list of the tensor dimensions :rtype: list[int] """ From 2efc596f4acf91b7d5a334a306de413e0ac76f17 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 17 Oct 2023 18:19:35 -0500 Subject: [PATCH 2/4] pushing rst link edits --- doc/advanced_topics.rst | 4 ++-- doc/changelog.rst | 4 ++-- doc/clients/c-plus.rst | 12 ++++++------ doc/clients/c.rst | 12 ++++++------ doc/clients/fortran.rst | 12 ++++++------ doc/clients/python.rst | 12 ++++++------ doc/data_structures.rst | 10 +++++----- 7 files changed, 33 insertions(+), 33 deletions(-) diff --git a/doc/advanced_topics.rst b/doc/advanced_topics.rst index 50be73c7e..a84d090af 100644 --- a/doc/advanced_topics.rst +++ b/doc/advanced_topics.rst @@ -5,7 +5,7 @@ Advanced Topics This page of documentation is reserved for advanced topics that may not be needed for all users. -.. _advanced_topics_dataset_aggregation: +.. _advanced-topics-dataset-aggregation: Dataset Aggregation =================== @@ -125,7 +125,7 @@ lead to race conditions: // Delete an aggregation list void delete_list(const std::string& list_name); -.. _advanced_topics_dataset_aggregation: +.. 
_advanced-topics-dataset-aggregation: Multiple Database Support ========================= diff --git a/doc/changelog.rst b/doc/changelog.rst index 7491e48fb..e4ae328b2 100644 --- a/doc/changelog.rst +++ b/doc/changelog.rst @@ -24,7 +24,7 @@ Description Detailed Notes -- Updated docs to specify differences with SmartRedis APIs +- Updated docs to specify differences with SmartRedis APIs and data structures - Added tests to increase Python code coverage - Added a function to the DataSet class and added a test - Moved testing of examples to on-commit testing in CI/CD pipeline (PR412_) @@ -37,7 +37,7 @@ Detailed Notes - Create CONTRIBUTIONS.md file that points to the contribution guideline for both SmartSim and SmartRedis (PR395_) - Migrated to ConfigOptions-based Client construction, adding multiple database support (PR353_) -.. _PR +.. _PR416: https://github.com/CrayLabs/SmartRedis/pull/416 .. _PR414: https://github.com/CrayLabs/SmartRedis/pull/414 .. _PR411: https://github.com/CrayLabs/SmartRedis/pull/411 .. _PR412: https://github.com/CrayLabs/SmartRedis/pull/412 diff --git a/doc/clients/c-plus.rst b/doc/clients/c-plus.rst index 85d0260e4..c3417568e 100644 --- a/doc/clients/c-plus.rst +++ b/doc/clients/c-plus.rst @@ -3,7 +3,7 @@ C++ APIs ******** The following page provides a comprehensive overview of the SmartRedis C++ -APIs, which include the **Client API** and **Dataset API**. +APIs, which include the ``Client API`` and ``Dataset API``. Further explanation and details of each are presented below. Client API @@ -12,12 +12,12 @@ Client API The Client API is purpose-built for interaction with the back-end database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive -means for altering, transmitting, and receiving data within the in-memory +means for altering, transmitting, and receiving data within the backend database. 
More specifically, the Client API is responsible for both -creating and modifying data structures, which encompass :ref:`Models `, -:ref:`Scripts `, and :ref:`Tensors `. +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. It also handles the transmission and reception of -the aforementioned data structures in addition to :ref:`Dataset ` +the aforementioned data structures in addition to :ref:`Dataset ` data structure. Creating and modifying the ``DataSet`` object is confined to local operation by the DataSet API. @@ -35,7 +35,7 @@ and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely maintains the dataset object **in-memory**. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more -information on the ``DataSet`` object, click :ref:`here `. +information on the ``DataSet`` object, click :ref:`here `. .. doxygenclass:: SmartRedis::DataSet :project: cpp_client diff --git a/doc/clients/c.rst b/doc/clients/c.rst index 403ee4ec6..50e70f699 100644 --- a/doc/clients/c.rst +++ b/doc/clients/c.rst @@ -3,7 +3,7 @@ C APIs ******* The following page provides a comprehensive overview of the SmartRedis C -APIs, which include the **Client API** and **Dataset API**. +APIs, which include the ``Client API`` and ``Dataset API``. Further explanation and details of each are presented below. Client API @@ -12,12 +12,12 @@ Client API The Client API is purpose-built for interaction with the back-end database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive -means for altering, transmitting, and receiving data within the in-memory +means for altering, transmitting, and receiving data within the backend database. 
More specifically, the Client API is responsible for both -creating and modifying data structures, which encompass :ref:`Models `, -:ref:`Scripts `, and :ref:`Tensors `. +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. It also handles the transmission and reception of -the aforementioned data structures in addition to :ref:`Dataset ` +the aforementioned data structures in addition to :ref:`Dataset ` data structure. Creating and modifying the ``DataSet`` object is confined to local operation by the DataSet API. @@ -33,7 +33,7 @@ and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely maintains the dataset object **in-memory**. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more -information on the ``DataSet`` object, click :ref:`here `. +information on the ``DataSet`` object, click :ref:`here `. .. doxygenfile:: c_dataset.h :project: c_client diff --git a/doc/clients/fortran.rst b/doc/clients/fortran.rst index d22160b65..7c814fb11 100644 --- a/doc/clients/fortran.rst +++ b/doc/clients/fortran.rst @@ -3,7 +3,7 @@ Fortran APIs ************ The following page provides a comprehensive overview of the SmartRedis Fortran -APIs, which include the **Client API** and **Dataset API**. +APIs, which include the ``Client API`` and ``Dataset API``. Further explanation and details of each are presented below. Client API @@ -12,12 +12,12 @@ Client API The Client API is purpose-built for interaction with the back-end database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive -means for altering, transmitting, and receiving data within the in-memory +means for altering, transmitting, and receiving data within the backend database. 
More specifically, the Client API is responsible for both -creating and modifying data structures, which encompass :ref:`Models `, -:ref:`Scripts `, and :ref:`Tensors `. +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. It also handles the transmission and reception of -the aforementioned data structures in addition to :ref:`Dataset ` +the aforementioned data structures in addition to :ref:`Dataset ` data structure. Creating and modifying the ``DataSet`` object is confined to local operation by the DataSet API. @@ -37,7 +37,7 @@ and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely maintains the dataset object **in-memory**. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more -information on the ``DataSet`` object, click :ref:`here `. +information on the ``DataSet`` object, click :ref:`here `. The following are overloaded interfaces which support 32/64-bit ``real`` and 8, 16, 32, and 64-bit diff --git a/doc/clients/python.rst b/doc/clients/python.rst index d0bf04e05..06ca3f977 100644 --- a/doc/clients/python.rst +++ b/doc/clients/python.rst @@ -3,7 +3,7 @@ Python APIs *********** The following page provides a comprehensive overview of the SmartRedis Python -APIs, which include the **Client API**, **Dataset API**, and **Logging API**. +APIs, which include the ``Client API``, ``Dataset API``, and ``Logging API``. Further explanation and details of each are presented below. Client API @@ -12,12 +12,12 @@ Client API The Client API is purpose-built for interaction with the back-end database, which extends the capabilities of the Redis in-memory data store. 
It's important to note that the SmartRedis Client API is the exclusive -means for altering, transmitting, and receiving data within the in-memory +means for altering, transmitting, and receiving data within the backend database. More specifically, the Client API is responsible for both -creating and modifying data structures, which encompass :ref:`Models `, -:ref:`Scripts `, and :ref:`Tensors `. +creating and modifying data structures, which encompass :ref:`Models `, +:ref:`Scripts `, and :ref:`Tensors `. It also handles the transmission and reception of -the aforementioned data structures in addition to :ref:`Dataset ` +the aforementioned data structures in addition to :ref:`Dataset ` data structure. Creating and modifying the ``DataSet`` object is confined to local operation by the DataSet API. @@ -109,7 +109,7 @@ and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely maintains the dataset object **in-memory**. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more -information on the ``DataSet`` object, click :ref:`here `. +information on the ``DataSet`` object, click :ref:`here `. Dataset Class Method Overview ----------------------------- diff --git a/doc/data_structures.rst b/doc/data_structures.rst index dcc13523c..3c2318164 100644 --- a/doc/data_structures.rst +++ b/doc/data_structures.rst @@ -22,7 +22,7 @@ the C and Fortran SmartRedis APIs, please consult the respective documentation pages. -.. _data_structures_tensor: +.. _data-structures-tensor: Tensor ====== @@ -174,7 +174,7 @@ Note that all of the client ``get_tensor()`` functions will internally modify the provided tensor name if the client is being used with SmartSim ensemble capabilities. -.. _data_structures_dataset: +.. 
_data-structures-dataset: Dataset ======= @@ -344,9 +344,9 @@ Aggregating SmartRedis also supports an advanced API for working with aggregate lists of DataSets; details may be found -:ref:`here <_advanced_topics_dataset_aggregation>`. +:ref:`here `. -.. _data_structures_model: +.. _data-structures-model: Model ===== @@ -481,7 +481,7 @@ via ``Client.set_model_multigpu()``. it must have been set via ``Client.set_model_multigpu()``. The ``first_gpu`` and ``num_gpus`` parameters must be constant across both calls. -.. _data_structures_script: +.. _data-structures-script: Script ====== From eee853273a376f6f5445301e57907c104763644b Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 18 Oct 2023 12:29:31 -0500 Subject: [PATCH 3/4] addressing bills comments --- doc/changelog.rst | 6 +-- doc/clients/c-plus.rst | 6 +-- doc/clients/c.rst | 8 ++-- doc/clients/fortran.rst | 8 ++-- doc/clients/python.rst | 8 ++-- doc/data_structures.rst | 51 +++++++++++-------------- doc/examples/cpp_api_examples.rst | 8 ++-- doc/examples/fortran_api_examples.rst | 16 ++++---- doc/examples/python_api_examples.rst | 10 ++--- src/python/module/smartredis/dataset.py | 7 +--- 10 files changed, 58 insertions(+), 70 deletions(-) diff --git a/doc/changelog.rst b/doc/changelog.rst index b49bceb35..25ffc9e45 100644 --- a/doc/changelog.rst +++ b/doc/changelog.rst @@ -8,7 +8,7 @@ To be released at some future point in time Description -- Updated docs for Client and Dataset APIs +- Updated Client and Dataset documentation - Expanded list of allowed characters in the SSDB address - Added coverage to SmartRedis Python API functions - Added name retrieval function to the DataSet object @@ -25,9 +25,7 @@ Description Detailed Notes -- Updated docs to specify differences with SmartRedis APIs and data structures -- Added tests to increase Python code coverage -- Added a function to the DataSet class and added a test +- Updated the Client and Dataset API documentation to clarify which interacts with the 
backend db (PR416_) - The SSDB address can now include '-' and '_' as special characters in the name. This gives users more options for naming the UDS socket file (PR415_) - Added tests to increase Python code coverage (PR414_) - Moved testing of examples to on-commit testing in CI/CD pipeline (PR412_) diff --git a/doc/clients/c-plus.rst b/doc/clients/c-plus.rst index c3417568e..67844c8db 100644 --- a/doc/clients/c-plus.rst +++ b/doc/clients/c-plus.rst @@ -3,13 +3,13 @@ C++ APIs ******** The following page provides a comprehensive overview of the SmartRedis C++ -APIs, which include the ``Client API`` and ``Dataset API``. +Client and Dataset APIs. Further explanation and details of each are presented below. Client API ========== -The Client API is purpose-built for interaction with the back-end database, +The Client API is purpose-built for interaction with the backend database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive means for altering, transmitting, and receiving data within the backend @@ -33,7 +33,7 @@ Dataset API The C++ DataSet API enables a user to manage a group of tensors and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely -maintains the dataset object **in-memory**. The actual interaction with the Redis database, +maintains the dataset object in-memory. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more information on the ``DataSet`` object, click :ref:`here `. diff --git a/doc/clients/c.rst b/doc/clients/c.rst index 50e70f699..91b3ebb29 100644 --- a/doc/clients/c.rst +++ b/doc/clients/c.rst @@ -3,13 +3,13 @@ C APIs ******* The following page provides a comprehensive overview of the SmartRedis C -APIs, which include the ``Client API`` and ``Dataset API``. +Client and Dataset APIs. 
Further explanation and details of each are presented below. Client API ========== -The Client API is purpose-built for interaction with the back-end database, +The Client API is purpose-built for interaction with the backend database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive means for altering, transmitting, and receiving data within the backend @@ -28,10 +28,10 @@ is confined to local operation by the DataSet API. Dataset API =========== -The C++ DataSet API enables a user to manage a group of tensors +The C DataSet API enables a user to manage a group of tensors and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely -maintains the dataset object **in-memory**. The actual interaction with the Redis database, +maintains the dataset object in-memory. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more information on the ``DataSet`` object, click :ref:`here `. diff --git a/doc/clients/fortran.rst b/doc/clients/fortran.rst index 7c814fb11..0a31c2a3f 100644 --- a/doc/clients/fortran.rst +++ b/doc/clients/fortran.rst @@ -3,13 +3,13 @@ Fortran APIs ************ The following page provides a comprehensive overview of the SmartRedis Fortran -APIs, which include the ``Client API`` and ``Dataset API``. +Client and Dataset APIs. Further explanation and details of each are presented below. Client API ========== -The Client API is purpose-built for interaction with the back-end database, +The Client API is purpose-built for interaction with the backend database, which extends the capabilities of the Redis in-memory data store. 
It's important to note that the SmartRedis Client API is the exclusive means for altering, transmitting, and receiving data within the backend @@ -32,10 +32,10 @@ The following are overloaded interfaces which support Dataset API =========== -The C++ DataSet API enables a user to manage a group of tensors +The Fortran DataSet API enables a user to manage a group of tensors and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely -maintains the dataset object **in-memory**. The actual interaction with the Redis database, +maintains the dataset object in-memory. The actual interaction with the Redis database, where a snapshot of the DataSet object is sent, is handled by the Client API. For more information on the ``DataSet`` object, click :ref:`here `. diff --git a/doc/clients/python.rst b/doc/clients/python.rst index 06ca3f977..442037535 100644 --- a/doc/clients/python.rst +++ b/doc/clients/python.rst @@ -3,13 +3,13 @@ Python APIs *********** The following page provides a comprehensive overview of the SmartRedis Python -APIs, which include the ``Client API``, ``Dataset API``, and ``Logging API``. +Client, DataSet and Logging APIs. Further explanation and details of each are presented below. Client API ========== -The Client API is purpose-built for interaction with the back-end database, +The Client API is purpose-built for interaction with the backend database, which extends the capabilities of the Redis in-memory data store. It's important to note that the SmartRedis Client API is the exclusive means for altering, transmitting, and receiving data within the backend @@ -24,8 +24,6 @@ is confined to local operation by the DataSet API. Client Class Method Overview ---------------------------- -Below is a short - .. currentmodule:: smartredis .. 
autosummary:: @@ -104,7 +102,7 @@ Client Class Method Detailed View DataSet API =========== -The C++ DataSet API enables a user to manage a group of tensors +The Python DataSet API enables a user to manage a group of tensors and associated metadata within a datastructure called a ``DataSet`` object. The DataSet API operates independently of the database and solely maintains the dataset object **in-memory**. The actual interaction with the Redis database, diff --git a/doc/data_structures.rst b/doc/data_structures.rst index 3c2318164..84a23369d 100644 --- a/doc/data_structures.rst +++ b/doc/data_structures.rst @@ -2,7 +2,7 @@ Data Structures *************** -SmartSim defines three data structures designed for use within back-end databases: +SmartSim defines three primary data structures designed for use within backend databases: * ``Tensor`` : represents an n-dimensional array of values. * ``Model`` : represents a computational ML model for one of the supported backend frameworks. @@ -17,9 +17,8 @@ along with relevant insights on performance and best practices. We illustrate concepts and capabilities of the Python and C++ SmartRedis APIs. The C and Fortran function signatures closely mirror the C++ API, and for brevity, we won't delve -into the two extensively. For more comprehensive explanations of -the C and Fortran SmartRedis APIs, please consult the respective documentation -pages. +into them. For a full discussion of the C and Fortran APIs, +please refer to their respective documentation pages. .. _data-structures-tensor: @@ -29,7 +28,7 @@ Tensor An n-dimensional tensor is used by RedisAI to store and manipulate numerical data. SmartRedis provides functions to -put a key and tensor pair into the back-end database and retrieve +put a key and tensor pair into the backend database and retrieve a tensor associated with a key from the database. .. 
note:: @@ -90,7 +89,7 @@ Retrieving ---------- The C++, C, and Fortran clients provide two methods for retrieving -tensors from the back-end database. The first method is referred to +tensors from the backend database. The first method is referred to as *unpacking* a tensor. When a tensor is retrieved via ``unpack_tensor()``, the memory space to store the retrieved tensor data is provided by the user. This has the advantage @@ -180,12 +179,12 @@ Dataset ======= When dealing with multi-modal data or complex data sets, -you may have different types of tensors (e.g., images, text embeddings, +one may have different types of tensors (e.g., images, text embeddings, numerical data) and metadata for each data point. Grouping them into a collection represents each data point as a cohesive unit. The ``DataSet`` data structure provides this functionality to stage tensors and metadata **in-memory** via the ``DataSet API``. After the creation of a -``DataSet`` object, the grouped data can be efficiently stored in the back-end database +``DataSet`` object, the grouped data can be efficiently stored in the backend database by the ``Client API`` and subsequently retrieved using the assigned ``DataSet`` name. In the upcoming sections, we outline the process of building, sending, and retrieving a ``DataSet``. @@ -239,24 +238,23 @@ Build and Send a DataSet When building a ``DataSet`` object in-memory, a user can group various combinations of tensors and metadata that constrain to the supported data types in the table above. To illustrate, -to include a tensor in a ``DataSet`` object, use the ``DataSet.add_tensor()`` -function in a supported language. The SmartRedis DataSet API functions -are available in C, C++, Python, and Fortran. The DataSet API or ``DataSet.add_tensor()`` function, +tensors can be inserted into a ``dataset`` object via the ``Dataset.add_tensor()`` method. +The SmartRedis DataSet API functions +are available in C, C++, Python, and Fortran. 
The ``DataSet.add_tensor()`` function, operates independently of the database and solely -maintains the dataset object. Storing the dataset in the back-end +maintains the dataset object. Storing the dataset in the backend database is done via the Client API ``put_dataset()`` method. .. note:: The ``DataSet.add_tensor()`` function copies user-provided tensor data; this prevents potential issues arising from the user's data being cleared or deallocated. Any additional memory allocated - for this purpose will be released when the DataSet object is deleted + for this purpose will be released when the DataSet object is destroyed or no longer in use. Metadata can be added to an in-memory ``DataSet`` object with the ``DataSet.add_meta_scalar()`` and ``DataSet.add_meta_string()`` -functions. As indicated by the function names, distinct functions -exist for adding scalar metadata (e.g., double) and string metadata. +functions. Methods exist for adding scalar metadata (e.g., double) and string metadata. For both functions, the first input parameter is the name of the metadata field. The field name serves as an internal identifier within the ``DataSet`` @@ -265,8 +263,7 @@ Since it's an internal identifier, users don't need to be concerned about conflicts with keys in the database. In other words, multiple ``DataSet`` objects can use the same metadata field names without causing issues because these names are managed within the ``DataSet`` and won't -interfere with external database keys. To provide an implementation example, -the C++ interface for adding +interfere with external database keys. The C++ interface for adding metadata is shown below: .. code-block:: cpp @@ -290,19 +287,17 @@ and you don't need to retain or manage the original copies separately once they have been included in the ``DataSet`` object. 
Additionally, multiple metadata values can be added to a single field name, and the default behavior is to append the value to the -field name if it exists, create if not. This behavior allows the ``DataSet`` metadata -to function like one-dimensional arrays. However, if you would like to add -multiple metadata values to a field name, you will need to add them one by -one in an iterative manner. +field name (creating the field if not already present). This behavior allows the ``DataSet`` metadata +to function like one-dimensional arrays. Also, note that in the above C++ example, the metadata scalar type must be specified with a -``SRMetaDataType`` enum value, and similar +``SRMetaDataType`` enum value; similar requirements exist for C and Fortran ``DataSet`` implementations. Finally, the ``DataSet`` object is sent to the database using the ``Client.put_dataset()`` function, which is uniform across all clients. -To emphasize once more, all interactions with the back-end database are handle by +To emphasize once more, all interactions with the backend database are handled by the Client API, not the DataSet API. @@ -314,8 +309,8 @@ function call to ``Client.get_dataset()``, which requires only the name of the ``DataSet`` (i.e. the name used in the constructor of the ``DataSet`` when it was built and placed in the database by the Client API). ``Client.get_dataset()`` -returns to the user a ``DataSet`` object or a pointer to a -``DataSet`` object from the database that is used to access all of the +returns to the user a ``DataSet`` object (in C, a pointer to a +``DataSet`` object) from the database that is used to access all of the dataset tensors and metadata. The functions for retrieving tensors from an in-memory ``DataSet`` object @@ -325,8 +320,8 @@ paradigm is followed. As a result, please refer to the previous section for details on tensor retrieve function calls. 
-There are two functions for retrieving metadata from a ``DataSet`` object in-memory: -``get_meta_scalars()`` and ``get_meta_strings()``. +There are four functions for retrieving metadata from a ``DataSet`` object in-memory: +``get_meta_scalars()``, ``get_meta_strings()``, ``get_metadata_field_names()`` and ``get_metadata_field_type()``. As the names suggest, the first function is used for retrieving numerical metadata values, and the second is for retrieving metadata string @@ -489,7 +484,7 @@ Script Data processing is an essential step in most machine learning workflows. For this reason, RedisAI provides the ability to evaluate PyTorch programs using the hardware -co-located with the back-end database (either CPU or GPU). +co-located with the backend database (either CPU or GPU). The SmartRedis ``Client`` provides functions for users to place a script in the database, retrieve a script from the database, and run a script. diff --git a/doc/examples/cpp_api_examples.rst b/doc/examples/cpp_api_examples.rst index 39e75e5d7..a24efa003 100644 --- a/doc/examples/cpp_api_examples.rst +++ b/doc/examples/cpp_api_examples.rst @@ -17,8 +17,8 @@ SmartRedis ``DataSet`` API is also provided. .. note:: The C++ API examples are written - to connect to a Redis cluster database. Update the - ``Client`` constructor call to connect to a Redis non-cluster database. + to connect to a clustered backend database. Update the + ``Client`` constructor call to connect to a non-clustered backend database. Tensors ======= @@ -33,9 +33,9 @@ SmartRedis C++ client API. DataSets ======== -The C++ ``Client`` API stores and retrieve datasets from the Redis database. The C++ +The C++ ``Client`` API stores and retrieves datasets from the backend database. The C++ ``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object. 
-To reiterate, the actual interaction with the redis database, +To reiterate, the actual interaction with the backend database, where a snapshot of the ``DataSet`` object is sent, is handled by the Client API. For further information about datasets, please refer to the :ref:`Dataset section of the Data Structures documentation page `. diff --git a/doc/examples/fortran_api_examples.rst b/doc/examples/fortran_api_examples.rst index 3693951be..9011db3c0 100644 --- a/doc/examples/fortran_api_examples.rst +++ b/doc/examples/fortran_api_examples.rst @@ -13,19 +13,19 @@ SmartRedis ``DataSet`` API is also provided. .. note:: The Fortran API examples rely on the ``SSDB`` environment - variable being set to the address and port of the Redis database. + variable being set to the address and port of the backend database. .. note:: The Fortran API examples are written - to connect to a Redis cluster database. Update the - ``Client`` constructor call to connect to a non-cluster Redis instance. + to connect to a clustered backend database. Update the + ``Client`` constructor call to connect to a non-clustered backend instance. Tensors ======= The SmartRedis Fortran client is used to communicate between -a Fortran client and the Redis database. In this example, +a Fortran client and the backend database. In this example, the client will be used to send an array to the database and then unpack the data into another Fortran array. @@ -107,9 +107,9 @@ into a different array. Datasets ======== -The Fortran ``Client`` API stores and retrieve datasets from the Redis database. The Fortran +The Fortran ``Client`` API stores and retrieves datasets from the backend database. The Fortran ``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object. -To reiterate, the actual interaction with the redis database, +To reiterate, the actual interaction with the backend database, where a snapshot of the ``DataSet`` object is sent, is handled by the Client API. 
The code below shows how to store and retrieve tensors and metadata @@ -304,14 +304,14 @@ constructed by including a suffix based on MPI tasks. The subroutine, in place of an actual simulation, next generates an array of random numbers and puts this array -into the Redis database. +into the backend database. .. code-block:: Fortran call random_number(array) call client%put_tensor(in_key, array, shape(array)) -The Redis database can now be called to run preprocessing +The backend database can now be called to run preprocessing scripts on these data. .. code-block:: Fortran diff --git a/doc/examples/python_api_examples.rst b/doc/examples/python_api_examples.rst index 0bc07d25e..1adf575ab 100644 --- a/doc/examples/python_api_examples.rst +++ b/doc/examples/python_api_examples.rst @@ -18,13 +18,13 @@ SmartRedis ``DataSet`` API is also provided. .. note:: The Python API examples are written - to connect to a Redis cluster database. Update the - ``Client`` constructor call to connect to a Redis non-cluster database. + to connect to a clustered backend database. Update the + ``Client`` constructor call to connect to a non-clustered backend database. Tensors ======= The Python client has the ability to send and receive tensors from -the Redis database. The tensors are stored in the Redis database +the backend database. The tensors are stored in the backend database as RedisAI data structures. Additionally, Python client API functions involving tensor data are compatible with Numpy arrays and do not require any other data types. @@ -37,9 +37,9 @@ and do not require any other data types. Datasets ======== -The Python ``Client`` API stores and retrieve datasets from the Redis database. The Python +The Python ``Client`` API stores and retrieves datasets from the backend database. The Python ``DataSet`` API can store and retrieve tensors and metadata from an in-memory ``DataSet`` object. 
-To reiterate, the actual interaction with the redis database,
+To reiterate, the actual interaction with the backend database,
 where a snapshot of the ``DataSet`` object is sent, is handled by the Client API.
 For further information about datasets, please refer to the
 :ref:`Dataset section of the Data Structures documentation page `.
diff --git a/src/python/module/smartredis/dataset.py b/src/python/module/smartredis/dataset.py
index 809bf9c05..0816d11c8 100644
--- a/src/python/module/smartredis/dataset.py
+++ b/src/python/module/smartredis/dataset.py
@@ -98,7 +98,7 @@ def set_data(self, dataset: PyDataset) -> None:
     @exception_handler
     def add_tensor(self, name: str, data: np.ndarray) -> None:
         """Add a named multi-dimensional data array (tensor) to this dataset
-
+
         :param name: name associated to the tensor data
         :type name: str
         :param data: tensor data
@@ -113,7 +113,7 @@ def add_tensor(self, name: str, data: np.ndarray) -> None:
     def get_tensor(self, name: str) -> np.ndarray:
         """Get a tensor from the Dataset

-        :param name: name of the tensor
+        :param name: name of the tensor to get
         :type name: str
         :return: a numpy array of tensor data
         :rtype: np.ndarray
@@ -231,9 +231,6 @@ def get_tensor_type(self, name: str) -> t.Type:
     def get_tensor_names(self) -> t.List[str]:
         """Get the names of all tensors in the DataSet

-        Tensor names are used to assign a name to a tensor,
-        which is distinct from the names of fields.
-
         :return: a list of tensor names
         :rtype: list[str]
         """
From a282c3fcc61127b96235606a5315d811e88d56de Mon Sep 17 00:00:00 2001
From: Amanda Richardson
Date: Thu, 19 Oct 2023 14:04:47 -0500
Subject: [PATCH 4/4] addressing Bills comments

---
 doc/data_structures.rst | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/doc/data_structures.rst b/doc/data_structures.rst
index 84a23369d..bd4441e86 100644
--- a/doc/data_structures.rst
+++ b/doc/data_structures.rst
@@ -183,7 +183,7 @@ one may have different types of tensors (e.g., images, text embeddings,
 numerical data) and metadata for each data point. Grouping them into a
 collection represents each data point as a cohesive unit. The ``DataSet``
 data structure provides this functionality to stage tensors and metadata
- **in-memory** via the ``DataSet API``. After the creation of a
+in-memory via the ``DataSet API``. After the creation of a
 ``DataSet`` object, the grouped data can be efficiently stored in the backend database
 by the ``Client API`` and subsequently retrieved using the assigned ``DataSet`` name.
 In the upcoming sections, we outline the process of building, sending, and retrieving a ``DataSet``.
@@ -249,8 +249,7 @@ database is done via the Client API ``put_dataset()`` method.
     The ``DataSet.add_tensor()`` function copies user-provided
     tensor data; this prevents potential issues arising from the user's
     data being cleared or deallocated. Any additional memory allocated
-    for this purpose will be released when the DataSet object is destroyed
-    or no longer in use.
+    for this purpose will be released when the DataSet object is destroyed.

 Metadata can be added to an in-memory ``DataSet`` object with the
 ``DataSet.add_meta_scalar()`` and ``DataSet.add_meta_string()``
@@ -320,12 +319,14 @@ paradigm is followed. As a result, please refer to
 the previous section for details on tensor retrieve
 function calls.

-There are four functions for retrieving metadata from a ``DataSet`` object in-memory:
-``get_meta_scalars()``, ``get_meta_strings()``, ``get_metadata_field_names()`` and ``get_metadata_field_type()``.
-As the names suggest, the first function
-is used for retrieving numerical metadata values,
-and the second is for retrieving metadata string
-values. The metadata retrieval function prototypes
+There are four functions for retrieving metadata information from a ``DataSet`` object in-memory:
+``get_meta_scalars()``, ``get_meta_strings()``, ``get_metadata_field_names()``
+and ``get_metadata_field_type()``. As the names suggest, the ``get_meta_scalars()`` function
+is used for retrieving numerical metadata values, while the ``get_meta_strings()`` function
+is for retrieving metadata string values. The ``get_metadata_field_names()`` function
+retrieves a list of all metadata field names in the ``DataSet`` object. Lastly,
+the ``get_metadata_field_type()`` function returns the type (scalar or string) of the metadata
+attached to the specified field name. The metadata retrieval function prototypes
 vary across the clients based on programming language constraints,
 and as a result, please refer to the ``DataSet`` API documentation
 for a description of input parameters and memory management. It is