Update docs for new release #97

Merged: 3 commits, Nov 27, 2023
Changes from 2 commits
13 changes: 13 additions & 0 deletions docs/embeddings_support.md
@@ -0,0 +1,13 @@
# Support for embeddings for RAG (Retrieval Augmented Generation)

Support for getting embeddings for RAG use-cases has been implemented. The OpenAI ada-002 model is currently supported for generating embeddings from input data. For embedding outputs, the return type hint needs to be set to `Embedding[np.ndarray]`. Adding align statements to steer embedding model behaviour is not yet implemented, but is on the roadmap.


## Example
```python
import numpy as np

import tanuki
from tanuki.models.embedding import Embedding  # assumed import path; may vary by version

@tanuki.patch
def score_sentiment(input: str) -> Embedding[np.ndarray]:
    """
    Scores the input between 0-10
    """
```
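
The embedding output can then be consumed like a NumPy vector in downstream retrieval code. Below is a minimal usage sketch, assuming the patched call returns a NumPy-compatible vector at runtime; the sentences and variable names are illustrative only:

```python
import numpy as np

# Hypothetical downstream use: compare two embedded inputs by cosine similarity.
emb_a = np.asarray(score_sentiment("The service was excellent"))
emb_b = np.asarray(score_sentiment("Fantastic support experience"))

# Cosine similarity of the two embedding vectors (higher = more similar).
similarity = float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
print(f"cosine similarity: {similarity:.3f}")
```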
41 changes: 41 additions & 0 deletions docs/function_configurability.md
@@ -0,0 +1,41 @@
# Function configurability

The following optional arguments are currently supported for function configurability:
* environment_id (int, default = 0): The environment id. Used for fetching the correct finetuned models.
* ignore_finetune_fetching (boolean, default = False): Whether to ignore fetching finetuned models. If set to True, OpenAI will not be queried for finetuned models during the first call, which reduces initial startup latency.
* ignore_finetuning (boolean, default = False): Whether to ignore finetuning the models altogether. If set to True, the teacher model will always be used. The data is still saved, however, in case it is needed for finetuning in the future.
* ignore_data_storage (boolean, default = False): Whether to ignore storing the data. If set to True, the data will not be stored in the finetune dataset and the align statements will not be saved (align statements are still used for aligning outputs, so model performance is not affected). This improves latency, as communication with data storage is minimised.

**NB** - Configurations can be passed only to the `@tanuki.patch` decorator, using keyword arguments. If you have any additional configurability needs, feel free to open an issue or implement it yourself and open a PR.

## Examples

### Default function
```python
@tanuki.patch
def some_function(input: TypedInput) -> TypedOutput:
    """(Optional) Include the description of how your function will be used."""

@tanuki.align
def test_some_function(example_typed_input: TypedInput,
                       example_typed_output: TypedOutput):

    assert some_function(example_typed_input) == example_typed_output
```
### Function with configurations (fastest inference latency)
```python
@tanuki.patch(environment_id = 1,
              ignore_finetune_fetching = True,
              ignore_finetuning = True,
              ignore_data_storage = True)
def some_function(input: TypedInput) -> TypedOutput:
    """(Optional) Include the description of how your function will be used."""

@tanuki.align
def test_some_function(example_typed_input: TypedInput,
                       example_typed_output: TypedOutput):

    assert some_function(example_typed_input) == example_typed_output
```
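
For a more concrete picture, here is a hypothetical instance of the configured template above; the function name, `Literal` return type, and align asserts are illustrative assumptions rather than library-provided examples:

```python
from typing import Literal

import tanuki

# Latency-optimised configuration: skip finetune fetching, finetuning,
# and data-storage communications, so only the teacher model is queried.
@tanuki.patch(environment_id = 1,
              ignore_finetune_fetching = True,
              ignore_finetuning = True,
              ignore_data_storage = True)
def classify_sentiment(review: str) -> Literal["Positive", "Negative", "Neutral"]:
    """Classifies the sentiment of a product review."""

@tanuki.align
def test_classify_sentiment():
    assert classify_sentiment("I love this product!") == "Positive"
    assert classify_sentiment("It broke after one day.") == "Negative"
```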
10 changes: 10 additions & 0 deletions readme.md
@@ -2,6 +2,11 @@

The easiest way to build scalable, LLM-powered functions and applications that get cheaper and faster the more you use them.

## Release
[27/11] Support for [embeddings](https://github.com/monkeypatch/tanuki.py/blob/update_docs/docs/embeddings_support.md) and [function configurability](https://github.com/monkeypatch/tanuki.py/blob/update_docs/docs/function_configurability.md) has been released!
* Use embeddings to integrate Tanuki with downstream RAG implementations using the OpenAI ada-002 model.
* Function configurability allows you to configure Tanuki function executions to ignore certain implemented aspects (finetuning, data-storage communications) for improved latency and serverless integrations.

## Contents

<!-- TOC start (generated with https://github.com/derlin/bitdowntoc) -->
@@ -43,6 +48,7 @@ def test_some_function(example_typed_input: TypedInput,

- **Easy and seamless integration** - Add LLM augmented functions to any workflow within seconds. Decorate a function stub with `@tanuki.patch` and optionally add type hints and docstrings to guide the execution. That’s it.
- **Type aware** - Ensure that the outputs of the LLM adhere to the type constraints of the function (Python Base types, Pydantic classes, Literals, Generics etc) to guard against bugs or unexpected side-effects of using LLMs.
- **RAG support** - Seamlessly get embedding outputs for downstream RAG (Retrieval Augmented Generation) implementations. Output embeddings can then be easily stored and used for relevant document retrieval to reduce cost & latency and improve performance on long-form content.
- **Aligned outputs** - LLMs are unreliable, which makes them difficult to use in place of classically programmed functions. Using simple assert statements in a function decorated with `@tanuki.align`, you can align the behaviour of your patched function to what you expect.
- **Lower cost and latency** - Achieve up to 90% lower cost and 80% lower latency with increased usage. The package will take care of model training, MLOps and DataOps efforts to improve LLM capabilities through distillation.
- **Batteries included** - No remote dependencies other than OpenAI.
@@ -101,6 +107,9 @@ if __name__ == "__main__":
```

<!-- TOC --><a name="how-it-works"></a>

See [here](https://github.com/monkeypatch/tanuki.py/blob/update_docs/docs/function_configurability.md) for the configuration options available for patched Tanuki functions.

## How It Works

When you call a tanuki-patched function during development, an LLM in an n-shot configuration is invoked to generate the typed response.
@@ -175,6 +184,7 @@ if __name__ == "__main__":

To see more examples using Tanuki for different use cases (including how to integrate with FastAPI), have a look at [examples](https://github.com/monkeypatch/tanuki.py/tree/master/examples).

For embedding outputs for RAG support, see [here](https://github.com/monkeypatch/tanuki.py/blob/update_docs/docs/embeddings_support.md).

<!-- TOC --><a name="test-driven-alignment"></a>
## Test-Driven Alignment
2 changes: 1 addition & 1 deletion src/tanuki/__init__.py
@@ -256,7 +256,7 @@ def patch(patchable_func=None,
patchable_func: The function to be patched, should always be set to None. This is used here to allow for keyword arguments or no arguments to be passed to the decorator
environment_id (int): The environment id. Used for fetching correct finetuned models
ignore_finetune_fetching (bool): Whether to ignore fetching finetuned models.
- If set to False, during the first call openai will not be queried for finetuned models, which reduces initial startup latency
+ If set to True, during the first call openai will not be queried for finetuned models, which reduces initial startup latency
ignore_finetuning (bool): Whether to ignore finetuning the models altogether. If set to True the teacher model will always be used.
The data is still saved, however, in case it is needed for finetuning in the future
ignore_data_storage (bool): Whether to ignore storing the data.