diff --git a/sdk/python/foundation-models/system/inference/text-generation/falcon-safe-online-deployment.ipynb b/sdk/python/foundation-models/system/inference/text-generation/falcon-safe-online-deployment.ipynb index 01c2aea8aa3..6c3cff3094e 100644 --- a/sdk/python/foundation-models/system/inference/text-generation/falcon-safe-online-deployment.ipynb +++ b/sdk/python/foundation-models/system/inference/text-generation/falcon-safe-online-deployment.ipynb @@ -3,43 +3,37 @@ { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "# How to create an Azure AI Content Safety enabled Falcon online endpoint (Preview)\n", "### This notebook will walk you through the steps to create an __Azure AI Content Safety__ enabled __Falcon__ online endpoint.\n", "### This notebook is under preview\n", "### The steps are:\n", "1. Create an __Azure AI Content Safety__ resource for moderating the request from user and response from the __Falcon__ online endpoint.\n", - "2. Create a new __Azure AI Content Safety__ enabled __Falcon__ online endpoint with a custom score.py which will integrate with the __Azure AI Content Safety__ resource to moderate the response from the __Falcon__ model and the request from the user, but to make the custom score.py to successfully authenticated to the __Azure AI Content Safety__ resource, we have 2 options:\n", - " 1. __UAI__, recommended but more complex approach, is to create a User Assigned Identity (UAI) and assign appropriate roles to the UAI. Then, the custom score.py can obtain the access token of the UAI from the AAD server to access the Azure AI Content Safety resource. Use [this notebook](aacs-prepare-uai.ipynb) to create UAI account for step 3 below\n", - " 2. __Environment variable__, simpler but less secure approach, is to just pass the access key of the Azure AI Content Safety resource to the custom score.py via environment variable, then the custom score.py can use the key directly to access the Azure AI Content Safety resource, this option is less secure than the first option, if someone in your org has access to the endpoint, he/she can get the access key from the environment variable and use it to access the Azure AI Content Safety resource.\n", - " " - ] + "2. Create a new __Azure AI Content Safety__ enabled __Falcon__ online endpoint with a custom score.py which will integrate with the __Azure AI Content Safety__ resource to moderate the response from the __Falcon__ model and the request from the user, but to make the custom score.py to successfully authenticated to the __Azure AI Content Safety__ resource, is to create a User Assigned Identity (UAI) and assign appropriate roles to the UAI. Then, the custom score.py can obtain the access token of the UAI from the AAD server to access the Azure AI Content Safety resource. Use [this notebook](aacs-prepare-uai.ipynb) to create UAI account for step 3 below" + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "### 1. Prerequisites\n", "#### 1.1 Check List:\n", "- [x] You have created a new Python virtual environment for this notebook.\n", "- [x] The identity you are using to execute this notebook(yourself or your VM) need to have the __Contributor__ role on the resource group where the AML Workspace your specified is located, because this notebook will create an Azure AI Content Safety resource using that identity." - ] + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 1.2 Assign variables for the workspace and deployment" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "# The public registry name contains Falcon models\n", "registry_name = \"azureml\"\n", @@ -60,21 +54,25 @@ "\n", "# UAI to be used for endpoint if you choose to use UAI as authentication method\n", "uai_name = \"\" # default to \"aacs-uai\" in prepare uai notebook" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764185734 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 1.3 Install Dependencies(as needed)" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "# uncomment the following lines to install the required packages\n", "# %pip install azure-identity==1.13.0\n", @@ -82,21 +80,21 @@ "# %pip install azure-ai-ml==1.8.0\n", "# %pip install azure-mgmt-msi==7.0.0\n", "# %pip install azure-mgmt-authorization==3.0.0" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 1.4 Get credential" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n", "\n", @@ -107,21 +105,25 @@ "except Exception as ex:\n", " # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work\n", " credential = InteractiveBrowserCredential()" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764189401 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 1.5 Configure workspace " - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.ai.ml import MLClient\n", "\n", @@ -141,25 +143,29 @@ "workspace = ml_client.workspace_name\n", "\n", "print(f\"Connected to workspace {workspace}\")" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764193416 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 1.6 Assign variables for Azure Content Safety\n", "Currently, Azure AI Content Safety is in a limited set of regions:\n", "\n", "\n", "__NOTE__: before you choose the region to deploy the Azure AI Content Safety, please be aware that your data will be transferred to the region you choose and by selecting a region outside your current location, you may be allowing the transmission of your data to regions outside your jurisdiction. It is important to note that data protection and privacy laws may vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with the selection of a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient\n", "\n", @@ -191,21 +197,25 @@ "print(\n", " f\"Choose a new Azure AI Content Safety resource in {aacs_location} with SKU {aacs_sku_name}\"\n", ")" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764197848 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "### 2. Create Azure AI Content Safety" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties\n", "\n", @@ -257,34 +267,34 @@ " f\"AACS name is {aacs.name}, use this name in UAI preparation notebook to create UAI.\"\n", ")\n", "print(f\"AACS endpoint is {aacs_endpoint}\")\n", - "print(f\"AACS ResourceId is {aacs_resource_id}\")\n", - "\n", - "aacs_access_key = acs_client.accounts.list_keys(\n", - " resource_group_name=resource_group, account_name=aacs.name\n", - ").key1" - ] + "print(f\"AACS ResourceId is {aacs_resource_id}\")" + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764200966 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "### 3. Create Azure AI Content Safety enabled Falcon online endpoint" - ] + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 3.1 Check if Falcon model is available in the AML registry." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "reg_client = MLClient(\n", " credential,\n", @@ -305,21 +315,25 @@ " print(\n", " f\"Using model name: {curated_model.name}, version: {curated_model.version}, id: {curated_model.id} for inferencing\"\n", " )" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764228889 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 3.2 Check if UAI is used" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "uai_id = \"\"\n", "uai_client_id = \"\"\n", @@ -334,30 +348,34 @@ " uai_resource = msi_client.user_assigned_identities.get(resource_group, uai_name)\n", " uai_id = uai_resource.id\n", " uai_client_id = uai_resource.client_id" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764234090 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### 3.3 Create Falcon online endpoint\n", "This step may take a few minutes." - ] + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### Create endpoint" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.ai.ml.entities import (\n", " ManagedOnlineEndpoint,\n", @@ -379,9 +397,7 @@ " identity=IdentityConfiguration(\n", " type=\"user_assigned\",\n", " user_assigned_identities=[ManagedIdentityConfiguration(resource_id=uai_id)],\n", - " )\n", - " if uai_id != \"\"\n", - " else None,\n", + " ),\n", " )\n", "\n", " # Trigger the endpoint creation\n", @@ -392,58 +408,63 @@ " raise RuntimeError(\n", " f\"Endpoint creation failed. Detailed Response:\\n{err}\"\n", " ) from err" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764241993 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "##### 3.4 Deploy Falcon model" - ] + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "#### Create deployment" - ] + ], + "metadata": {} }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "Initialize deployment parameters" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "REQUEST_TIMEOUT_MS = 90000\n", - "CODE = \"./llama-files/score/default\"\n", - "SCORE = \"score.py\" # score.py is the entry script for the model\n", "\n", "deployment_env_vars = {\n", " \"CONTENT_SAFETY_ACCOUNT_NAME\": aacs_name,\n", " \"CONTENT_SAFETY_ENDPOINT\": aacs_endpoint,\n", - " \"CONTENT_SAFETY_KEY\": aacs_access_key if uai_client_id == \"\" else None,\n", " \"CONTENT_SAFETY_THRESHOLD\": content_severity_threshold,\n", " \"SUBSCRIPTION_ID\": subscription_id,\n", " \"RESOURCE_GROUP_NAME\": resource_group,\n", " \"UAI_CLIENT_ID\": uai_client_id,\n", "}" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706764245348 + } + } }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "from azure.ai.ml.entities import (\n", " CodeConfiguration,\n", @@ -452,14 +473,12 @@ " ProbeSettings,\n", ")\n", "\n", - "code_configuration = CodeConfiguration(code=CODE, scoring_script=SCORE)\n", "deployment = ManagedOnlineDeployment(\n", " name=deployment_name,\n", " endpoint_name=endpoint_name,\n", " model=curated_model.id,\n", " instance_type=sku_name,\n", " instance_count=1,\n", - " code_configuration=code_configuration,\n", " environment_variables=deployment_env_vars,\n", " request_settings=OnlineRequestSettings(request_timeout_ms=REQUEST_TIMEOUT_MS),\n", " liveness_probe=ProbeSettings(\n", @@ -484,21 +503,25 @@ " raise RuntimeError(\n", " f\"Deployment creation failed. Detailed Response:\\n{err}\"\n", " ) from err" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706766837699 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "### 4. Test the Safety Enabled Falcon online endpoint." - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "import os\n", "\n", @@ -506,21 +529,25 @@ "os.makedirs(test_src_dir, exist_ok=True)\n", "print(f\"test script directory: {test_src_dir}\")\n", "sample_data = os.path.join(test_src_dir, \"sample-request.json\")" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706766982789 + } + } }, { "attachments": {}, "cell_type": "markdown", - "metadata": {}, "source": [ "##### Choose request from following cells based on the scenario you want to test" - ] + ], + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "## Successful response\n", "\n", @@ -544,13 +571,17 @@ " },\n", " f,\n", " )" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": { + "gather": { + "logged": 1706766986877 + } + } }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "## Blocked request/response due to hateful content\n", "\n", @@ -573,41 +604,60 @@ " },\n", " f,\n", " )" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": {} }, { "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], "source": [ "ml_client.online_endpoints.invoke(\n", " endpoint_name=endpoint_name,\n", " deployment_name=deployment_name,\n", " request_file=sample_data,\n", ")" - ] + ], + "outputs": [], + "execution_count": null, + "metadata": {} } ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "name": "python310-sdkv2", "language": "python", - "name": "python3" + "display_name": "Python 3.10 - SDK v2" }, "language_info": { + "name": "python", + "version": "3.10.11", + "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.16" + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "microsoft": { + "ms_spell_check": { + "ms_spell_check_language": "en" + }, + "host": { + "AzureML": { + "notebookHasBeenCompleted": true + } + } + }, + "kernel_info": { + "name": "python310-sdkv2" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 1 -} +} \ No newline at end of file diff --git a/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb b/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb index 297354f34a5..3617f669029 100644 --- a/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb +++ b/sdk/python/foundation-models/system/inference/text-generation/llama-safe-online-deployment.ipynb @@ -1,737 +1,740 @@ { - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# How to create an Azure AI Content Safety enabled Llama 2 online endpoint (Preview)\n", - "### This notebook will walk you through the steps to create an __Azure AI Content Safety__ enabled __Llama 2__ online endpoint.\n", - "### This notebook is under preview\n", - "### The steps are:\n", - "1. Create an __Azure AI Content Safety__ resource for moderating the request from user and response from the __Llama 2__ online endpoint.\n", - "2. Create a new __Azure AI Content Safety__ enabled __Llama 2__ online endpoint with a custom score.py which will integrate with the __Azure AI Content Safety__ resource to moderate the response from the __Llama 2__ model and the request from the user, but to make the custom score.py to successfully authenticated to the __Azure AI Content Safety__ resource, we have 2 options:\n", - " 1. __UAI__, recommended but more complex approach, is to create a User Assigned Identity (UAI) and assign appropriate roles to the UAI. Then, the custom score.py can obtain the access token of the UAI from the AAD server to access the Azure AI Content Safety resource. Use [this notebook](aacs-prepare-uai.ipynb) to create UAI account for step 3 below\n", - " 2. __Environment variable__, simpler but less secure approach, is to just pass the access key of the Azure AI Content Safety resource to the custom score.py via environment variable, then the custom score.py can use the key directly to access the Azure AI Content Safety resource, this option is less secure than the first option, if someone in your org has access to the endpoint, he/she can get the access key from the environment variable and use it to access the Azure AI Content Safety resource.\n", - " " - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 1. Prerequisites\n", - "#### 1.1 Check List:\n", - "- [x] You have created a new Python virtual environment for this notebook.\n", - "- [x] The identity you are using to execute this notebook(yourself or your VM) need to have the __Contributor__ role on the resource group where the AML Workspace your specified is located, because this notebook will create an Azure AI Content Safety resource using that identity." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1.2 Assign variables for the workspace and deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The public registry name contains Llama 2 models\n", - "registry_name = \"azureml-meta\"\n", - "\n", - "# Name of the Llama 2 model to be deployed\n", - "# available_llama_models_text_generation = [\"Llama-2-7b\", \"Llama-2-13b\", \"Llama-2-70b\"]\n", - "# available_llama_models_chat_complete = [\"Llama-2-7b-chat\", \"Llama-2-13b-chat\", \"Llama-2-70b-chat\"]\n", - "model_name = \"Llama-2-7b\"\n", - "\n", - "endpoint_name = f\"{model_name}-test-ep\" # Replace with your endpoint name\n", - "deployment_name = \"llama\" # Replace with your deployment name, lower case only!!!\n", - "sku_name = \"Standard_NC24s_v3\" # Name of the sku(instance type) Check the model-list(can be found in the parent folder(inference)) to get the most optimal sku for your model (Default: Standard_DS2_v2)\n", - "\n", - "# The severity level that will trigger response be blocked\n", - "# Please reference Azure AI content documentation for more details\n", - "# https://learn.microsoft.com/en-us/azure/cognitive-services/content-safety/concepts/harm-categories\n", - "content_severity_threshold = \"2\"\n", - "\n", - "# UAI to be used for endpoint if you choose to use UAI as authentication method\n", - "uai_name = \"\" # default to \"aacs-uai\" in prepare uai notebook" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1.3 Install Dependencies(as needed)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# uncomment the following lines to install the required packages\n", - "# %pip install azure-identity==1.13.0\n", - "# %pip install azure-mgmt-cognitiveservices==13.4.0\n", - "# %pip install azure-ai-ml==1.8.0\n", - "# %pip install azure-mgmt-msi==7.0.0\n", - "# %pip install azure-mgmt-authorization==3.0.0" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1.4 Get credential" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n", - "\n", - "try:\n", - " credential = DefaultAzureCredential()\n", - " # Check if given credential can get token successfully.\n", - " credential.get_token(\"https://management.azure.com/.default\")\n", - "except Exception as ex:\n", - " # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work\n", - " credential = InteractiveBrowserCredential()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1.5 Configure workspace " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.ai.ml import MLClient\n", - "\n", - "try:\n", - " ml_client = MLClient.from_config(credential=credential)\n", - "except Exception as ex:\n", - " # enter details of your AML workspace\n", - " subscription_id = \"\"\n", - " resource_group = \"\"\n", - " workspace = \"\"\n", - "\n", - " # get a handle to the workspace\n", - " ml_client = MLClient(credential, subscription_id, resource_group, workspace)\n", - "\n", - "subscription_id = ml_client.subscription_id\n", - "resource_group = ml_client.resource_group_name\n", - "workspace = ml_client.workspace_name\n", - "\n", - "print(f\"Connected to workspace {workspace}\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 1.6 Assign variables for Azure Content Safety\n", - "Currently, Azure AI Content Safety is in a limited set of regions:\n", - "\n", - "\n", - "__NOTE__: before you choose the region to deploy the Azure AI Content Safety, please be aware that your data will be transferred to the region you choose and by selecting a region outside your current location, you may be allowing the transmission of your data to regions outside your jurisdiction. It is important to note that data protection and privacy laws may vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with the selection of a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient\n", - "\n", - "acs_client = CognitiveServicesManagementClient(credential, subscription_id)\n", - "\n", - "\n", - "# settings for the Azure AI Content Safety resource\n", - "# we will choose existing AACS resource if it exists, otherwise create a new one\n", - "# name of azure ai content safety resource, has to be unique\n", - "import time\n", - "\n", - "aacs_name = f\"{endpoint_name}-aacs-{str(time.time()).replace('.','')}\"\n", - "available_aacs_locations = [\"east us\", \"west europe\"]\n", - "\n", - "# create a new Cognitive Services Account\n", - "kind = \"ContentSafety\"\n", - "aacs_sku_name = \"S0\"\n", - "aacs_location = available_aacs_locations[0]\n", - "\n", - "\n", - "print(\"Available SKUs:\")\n", - "aacs_skus = acs_client.resource_skus.list()\n", - "print(\"SKU Name\\tSKU Tier\\tLocations\")\n", - "for sku in aacs_skus:\n", - " if sku.kind == \"ContentSafety\":\n", - " locations = \",\".join(sku.locations)\n", - " print(sku.name + \"\\t\" + sku.tier + \"\\t\" + locations)\n", - "\n", - "print(\n", - " f\"Choose a new Azure AI Content Safety resource in {aacs_location} with SKU {aacs_sku_name}\"\n", - ")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 2. Create Azure AI Content Safety" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties\n", - "\n", - "\n", - "parameters = Account(\n", - " sku=Sku(name=aacs_sku_name),\n", - " kind=kind,\n", - " location=aacs_location,\n", - " properties=AccountProperties(\n", - " custom_sub_domain_name=aacs_name, public_network_access=\"Enabled\"\n", - " ),\n", - ")\n", - "# How many seconds to wait between checking the status of an async operation.\n", - "wait_time = 10\n", - "\n", - "\n", - "def find_acs(accounts):\n", - " return next(\n", - " x\n", - " for x in accounts\n", - " if x.kind == \"ContentSafety\"\n", - " and x.location == aacs_location\n", - " and x.sku.name == aacs_sku_name\n", - " )\n", - "\n", - "\n", - "try:\n", - " # check if AACS exists\n", - " aacs = acs_client.accounts.get(resource_group, aacs_name)\n", - " print(f\"Found existing Azure AI content safety Account {aacs.name}.\")\n", - "except:\n", - " try:\n", - " # check if there is an existing AACS resource within same resource group\n", - " aacs = find_acs(acs_client.accounts.list_by_resource_group(resource_group))\n", - " print(\n", - " f\"Found existing Azure AI content safety Account {aacs.name} in resource group {resource_group}.\"\n", - " )\n", - " except:\n", - " print(f\"Creating Azure AI content safety Account {aacs_name}.\")\n", - " acs_client.accounts.begin_create(resource_group, aacs_name, parameters).wait()\n", - " print(\"Resource created.\")\n", - " aacs = acs_client.accounts.get(resource_group, aacs_name)\n", - "\n", - "\n", - "aacs_endpoint = aacs.properties.endpoint\n", - "aacs_resource_id = aacs.id\n", - "aacs_name = aacs.name\n", - "print(\n", - " f\"AACS name is {aacs.name}, use this name in UAI preparation notebook to create UAI.\"\n", - ")\n", - "print(f\"AACS endpoint is {aacs_endpoint}\")\n", - "print(f\"AACS ResourceId is {aacs_resource_id}\")\n", - "\n", - "aacs_access_key = acs_client.accounts.list_keys(\n", - " resource_group_name=resource_group, account_name=aacs.name\n", - ").key1" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 3. Create Azure AI Content Safety enabled Llama 2 online endpoint" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.1 Check if Llama 2 model is available in the AML registry." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "reg_client = MLClient(\n", - " credential,\n", - " subscription_id=subscription_id,\n", - " resource_group_name=resource_group,\n", - " registry_name=registry_name,\n", - ")\n", - "version_list = list(\n", - " reg_client.models.list(model_name)\n", - ") # list available versions of the model\n", - "llama_model = None\n", - "\n", - "# If specific inference environments are tagged for the model\n", - "inference_envs_exist = False\n", - "\n", - "if len(version_list) == 0:\n", - " raise Exception(f\"No model named {model_name} found in registry\")\n", - "else:\n", - " model_version = version_list[0].version\n", - " llama_model = reg_client.models.get(model_name, model_version)\n", - " if (\n", - " \"inference_supported_envs\" in llama_model.tags\n", - " and len(llama_model.tags[\"inference_supported_envs\"]) >= 1\n", - " ):\n", - " inference_envs_exist = True\n", - " print(\n", - " f\"Using model name: {llama_model.name}, version: {llama_model.version}, id: {llama_model.id} for inferencing\"\n", - " )" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.2 Check if UAI is used" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "uai_id = \"\"\n", - "uai_client_id = \"\"\n", - "if uai_name != \"\":\n", - " from azure.mgmt.msi import ManagedServiceIdentityClient\n", - " from azure.mgmt.msi.models import Identity\n", - "\n", - " msi_client = ManagedServiceIdentityClient(\n", - " subscription_id=subscription_id,\n", - " credential=credential,\n", - " )\n", - " uai_resource = msi_client.user_assigned_identities.get(resource_group, uai_name)\n", - " uai_id = uai_resource.id\n", - " uai_client_id = uai_resource.client_id" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.3 Create Llama 2 online endpoint\n", - "This step may take a few minutes." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create endpoint" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.ai.ml.entities import (\n", - " ManagedOnlineEndpoint,\n", - " IdentityConfiguration,\n", - " ManagedIdentityConfiguration,\n", - ")\n", - "\n", - "# Check if the endpoint already exists in the workspace\n", - "try:\n", - " endpoint = ml_client.online_endpoints.get(endpoint_name)\n", - " print(\"---Endpoint already exists---\")\n", - "except:\n", - " # Create an online endpoint if it doesn't exist\n", - "\n", - " # Define the endpoint\n", - " endpoint = ManagedOnlineEndpoint(\n", - " name=endpoint_name,\n", - " description=\"Test endpoint for model\",\n", - " identity=IdentityConfiguration(\n", - " type=\"user_assigned\",\n", - " user_assigned_identities=[ManagedIdentityConfiguration(resource_id=uai_id)],\n", - " )\n", - " if uai_id != \"\"\n", - " else None,\n", - " )\n", - "\n", - " # Trigger the endpoint creation\n", - " try:\n", - " ml_client.begin_create_or_update(endpoint).wait()\n", - " print(\"\\n---Endpoint created successfully---\\n\")\n", - " except Exception as err:\n", - " raise RuntimeError(\n", - " f\"Endpoint creation failed. Detailed Response:\\n{err}\"\n", - " ) from err" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### 3.4 Setup Deployment Parameters\n", - "\n", - "We utilize an optimized __foundation-model-inference__ container for model scoring. This container is designed to deliver high throughput and low latency. In this section, we introduce several environment variables that can be adjusted to customize a deployment for either high throughput or low latency scenarios.\n", - "\n", - "- __ENGINE_NAME__: Used to choose the inferencing framework to use in the scoring script. For Llama-2 models, if ENGINE_NAME = 'mii' the container will inference with the new [DeepSpeed-FastGen](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen). Alternatively if ENGINE_NAME = 'vllm' the container will inference with [vLLM](https://vllm.readthedocs.io/en/latest/), which is also the default.\n", - "- __WORKER_COUNT__: The number of workers to use for inferencing. This is used as a proxy for the number of concurrent requests that the server should handle.\n", - "- __TENSOR_PARALLEL__: The number of GPUs to use for tensor parallelism.\n", - "- __NUM_REPLICAS__: The number of model instances to load for the deployment. This is used to increase throughput by loading multiple models on multiple GPUs, if the model is small enough to fit.\n", - "\n", - "`NUM_REPLICAS` and `TENSOR_PARALLEL` work hand-in-hand to determine the most optimal configuration to increase the throughput for the deployment without degrading too much on the latency. The total number of GPUs used for inference will be `NUM_REPLICAS` * `TENSOR_PARALLEL`. For example, if `NUM_REPLICAS` = 2 and `TENSOR_PARALLEL` = 2, then 4 GPUs will be used for inference.\n", - "\n", - "Ensure that the model you are deploying is small enough to fit on the number of GPUs you are using, specified by `TENSOR_PARALLEL`. For instance, if there are 4 GPUs available, and `TENSOR_PARALLEL` = 2, then the model must be small enough to fit on 2 GPUs. If the model is too large, then the deployment will fail. \n", - "\n", - "__NOTE__: \n", - "- `NUM_REPLICAS` is currently only supported by the vLLM engine.\n", - "- DeepSpeed MII Engine is only supported on A100 / H100 GPUs.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "REQUEST_TIMEOUT_MS = 90000 # the timeout for each request in milliseconds\n", - "MAX_CONCURRENT_REQUESTS = (\n", - " 128 # the maximum number of concurrent requests supported by the endpoint\n", - ")\n", - "\n", - "acs_env_vars = {\n", - " \"CONTENT_SAFETY_ACCOUNT_NAME\": aacs_name,\n", - " \"CONTENT_SAFETY_ENDPOINT\": aacs_endpoint,\n", - " \"CONTENT_SAFETY_KEY\": aacs_access_key if uai_client_id == \"\" else None,\n", - " \"CONTENT_SAFETY_THRESHOLD\": content_severity_threshold,\n", - " \"SUBSCRIPTION_ID\": subscription_id,\n", - " \"RESOURCE_GROUP_NAME\": resource_group,\n", - " \"UAI_CLIENT_ID\": uai_client_id,\n", - "}\n", - "\n", - "fm_container_default_env_vars = {\n", - " \"WORKER_COUNT\": MAX_CONCURRENT_REQUESTS,\n", - " \"TENSOR_PARALLEL\": 2,\n", - " \"NUM_REPLICAS\": 2,\n", - "}\n", - "\n", - "deployment_env_vars = {**fm_container_default_env_vars, **acs_env_vars}\n", - "\n", - "# Uncomment the following lines to use DeepSpeed FastGen engine (experimental)\n", - "# mii_fastgen_env_vars = {\n", - "# \"ENGINE_NAME\": \"mii\",\n", - "# \"WORKER_COUNT\": MAX_CONCURRENT_REQUESTS,\n", - "# }\n", - "# deployment_env_vars = {**mii_fastgen_env_vars, **acs_env_vars}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### 3.5 Deploy Llama 2 model\n", - "This step may take a few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azure.ai.ml.entities import (\n", - " OnlineRequestSettings,\n", - " CodeConfiguration,\n", - " ManagedOnlineDeployment,\n", - " ProbeSettings,\n", - ")\n", - "\n", - "# For inference environments vLLM and DS MII, the scoring script is baked into the container\n", - "code_configuration = (\n", - " CodeConfiguration(code=\"./llama-files/score/default/\", scoring_script=\"score.py\")\n", - " if not inference_envs_exist\n", - " else None\n", - ")\n", - "\n", - "deployment = ManagedOnlineDeployment(\n", - " name=deployment_name,\n", - " endpoint_name=endpoint_name,\n", - " model=llama_model.id,\n", - " instance_type=sku_name,\n", - " instance_count=1,\n", - " code_configuration=code_configuration,\n", - " environment_variables=deployment_env_vars,\n", - " request_settings=OnlineRequestSettings(\n", - " request_timeout_ms=REQUEST_TIMEOUT_MS,\n", - " max_concurrent_requests_per_instance=MAX_CONCURRENT_REQUESTS,\n", - " ),\n", - " liveness_probe=ProbeSettings(\n", - " failure_threshold=30,\n", - " success_threshold=1,\n", - " period=100,\n", - " initial_delay=500,\n", - " ),\n", - " readiness_probe=ProbeSettings(\n", - " failure_threshold=30,\n", - " success_threshold=1,\n", - " period=100,\n", - " initial_delay=500,\n", - " ),\n", - ")\n", - "\n", - "# Trigger the deployment creation\n", - "try:\n", - " ml_client.begin_create_or_update(deployment).wait()\n", - " print(\"\\n---Deployment created successfully---\\n\")\n", - "except Exception as err:\n", - " raise RuntimeError(\n", - " f\"Deployment creation failed. Detailed Response:\\n{err}\"\n", - " ) from err" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 4. Test the Safety Enabled Llama 2 online endpoint." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "test_src_dir = \"./safety-llama-test\"\n", - "os.makedirs(test_src_dir, exist_ok=True)\n", - "print(f\"test script directory: {test_src_dir}\")\n", - "sample_data = os.path.join(test_src_dir, \"sample-request.json\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "##### Choose request from following 4 cells based on the Llama 2 model type you chosen and the scenario you want to test" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## For text-generation models (without -chat suffix)\n", - "## Successful response\n", - "\n", - "import json\n", - "\n", - "with open(sample_data, \"w\") as f:\n", - " json.dump(\n", - " {\n", - " \"input_data\": {\n", - " \"input_string\": [\n", - " \"Hello\",\n", - " \"My name is John and I have a dog.\",\n", - " ],\n", - " \"parameters\": {\n", - " \"temperature\": 0.6,\n", - " \"top_p\": 0.6,\n", - " \"max_new_tokens\": 256,\n", - " \"do_sample\": True,\n", - " },\n", - " }\n", - " },\n", - " f,\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## For text-generation models (without -chat suffix)\n", - "## Blocked request/response due to hateful content\n", - "\n", - "import json\n", - "\n", - "with open(sample_data, \"w\") as f:\n", - " json.dump(\n", - " {\n", - " \"input_data\": {\n", - " \"input_string\": [\n", - " \"The dog was given a eutanasa injection due to their severed leg bleding profusely from deep lacarations to the lower extremities, exposing tisssue and nerve.\", # This line contains hateful message and endpoint will return empty string as response\n", - " ],\n", - " \"parameters\": {\n", - " \"temperature\": 0.6,\n", - " \"top_p\": 0.6,\n", - " \"max_new_tokens\": 256,\n", - " \"do_sample\": True,\n", - " },\n", - " }\n", - " },\n", - " f,\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## For chat-complete models (with \"-chat\" suffix)\n", - "## Successful request/response\n", - "\n", - "import json\n", - "\n", - "with open(sample_data, \"w\") as f:\n", - " json.dump(\n", - " {\n", - " \"input_data\": {\n", - " \"input_string\": [\n", - " {\n", - " \"role\": \"user\",\n", - " \"content\": \"What is the tallest building in the world?\",\n", - " },\n", - " {\n", - " \"role\": \"assistant\",\n", - " \"content\": \"As of 2021, the Burj Khalifa in Dubai, United Arab Emirates is the tallest building in the world, standing at a height of 828 meters (2,722 feet). It was completed in 2010 and has 163 floors. The Burj Khalifa is not only the tallest building in the world but also holds several other records, such as the highest occupied floor, highest outdoor observation deck, elevator with the longest travel distance, and the tallest freestanding structure in the world.\",\n", - " },\n", - " {\"role\": \"user\", \"content\": \"and in Africa?\"},\n", - " {\n", - " \"role\": \"assistant\",\n", - " \"content\": \"In Africa, the tallest building is the Carlton Centre, located in Johannesburg, South Africa. It stands at a height of 50 floors and 223 meters (730 feet). The CarltonDefault Centre was completed in 1973 and was the tallest building in Africa for many years until the construction of the Leonardo, a 55-story skyscraper in Sandton, Johannesburg, which was completed in 2019 and stands at a height of 230 meters (755 feet). Other notable tall buildings in Africa include the Ponte City Apartments in Johannesburg, the John Hancock Center in Lagos, Nigeria, and the Alpha II Building in Abidjan, Ivory Coast\",\n", - " },\n", - " {\"role\": \"user\", \"content\": \"and in Europe?\"},\n", - " ],\n", - " \"parameters\": {\n", - " \"temperature\": 0.9,\n", - " \"top_p\": 0.6,\n", - " \"do_sample\": True,\n", - " \"max_new_tokens\": 100,\n", - " },\n", - " }\n", - " },\n", - " f,\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "## For chat-complete models (with \"-chat\" suffix)\n", - "## Blocked request due to hateful content\n", - "\n", - "import json\n", - "\n", - "with open(sample_data, \"w\") as f:\n", - " json.dump(\n", - " {\n", - " \"input_data\": {\n", - " \"input_string\": [\n", - " {\n", - " \"role\": \"user\",\n", - " \"content\": \"A 51-year-old man was found dead in his car. There were blood stains on the dashboard and windscreen. At autopsy, a deep, oblique, long incised injury was found on the front of the neck. It turns out that he died by suicide.\",\n", - " },\n", - " ],\n", - " \"parameters\": {\n", - " \"temperature\": 0.9,\n", - " \"top_p\": 0.6,\n", - " \"do_sample\": True,\n", - " \"max_new_tokens\": 100,\n", - " },\n", - " }\n", - " },\n", - " f,\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ml_client.online_endpoints.invoke(\n", - " endpoint_name=endpoint_name,\n", - " deployment_name=deployment_name,\n", - " request_file=sample_data,\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.17" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "# How to create an Azure AI Content Safety enabled Llama 2 online endpoint (Preview)\n", + "### This notebook will walk you through the steps to create an __Azure AI Content Safety__ enabled __Llama 2__ online endpoint.\n", + "### This notebook is under preview\n", + "### The steps are:\n", + "1. Create an __Azure AI Content Safety__ resource for moderating the request from user and response from the __Llama 2__ online endpoint.\n", + "2. Create a new __Azure AI Content Safety__ enabled __Llama 2__ online endpoint with a custom score.py which will integrate with the __Azure AI Content Safety__ resource to moderate the response from the __Llama 2__ model and the request from the user, but to make the custom score.py to successfully authenticated to the __Azure AI Content Safety__ resource, is to create a User Assigned Identity (UAI) and assign appropriate roles to the UAI. Then, the custom score.py can obtain the access token of the UAI from the AAD server to access the Azure AI Content Safety resource. Use [this notebook](aacs-prepare-uai.ipynb) to create UAI account for step 3 below" + ], + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "### 1. Prerequisites\n", + "#### 1.1 Check List:\n", + "- [x] You have created a new Python virtual environment for this notebook.\n", + "- [x] The identity you are using to execute this notebook(yourself or your VM) need to have the __Contributor__ role on the resource group where the AML Workspace your specified is located, because this notebook will create an Azure AI Content Safety resource using that identity." + ], + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 1.2 Assign variables for the workspace and deployment" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "# The public registry name contains Llama 2 models\n", + "registry_name = \"azureml-meta\"\n", + "\n", + "# Name of the Llama 2 model to be deployed\n", + "# available_llama_models_text_generation = [\"Llama-2-7b\", \"Llama-2-13b\", \"Llama-2-70b\"]\n", + "# available_llama_models_chat_complete = [\"Llama-2-7b-chat\", \"Llama-2-13b-chat\", \"Llama-2-70b-chat\"]\n", + "model_name = \"Llama-2-7b\"\n", + "\n", + "endpoint_name = f\"{model_name}-test-ep\" # Replace with your endpoint name\n", + "deployment_name = \"llama\" # Replace with your deployment name, lower case only!!!\n", + "sku_name = \"Standard_NC24s_v3\" # Name of the sku(instance type) Check the model-list(can be found in the parent folder(inference)) to get the most optimal sku for your model (Default: Standard_DS2_v2)\n", + "\n", + "# The severity level that will trigger response be blocked\n", + "# Please reference Azure AI content documentation for more details\n", + "# https://learn.microsoft.com/en-us/azure/cognitive-services/content-safety/concepts/harm-categories\n", + "content_severity_threshold = \"2\"\n", + "\n", + "# UAI to be used for endpoint if you choose to use UAI as authentication method\n", + "uai_name = \"\" # default to \"aacs-uai\" in prepare uai notebook" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 1.3 Install Dependencies(as needed)" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "# uncomment the following lines to install the required packages\n", + "# %pip install azure-identity==1.13.0\n", + "# %pip install azure-mgmt-cognitiveservices==13.4.0\n", + "# %pip install azure-ai-ml==1.8.0\n", + "# %pip install azure-mgmt-msi==7.0.0\n", + "# %pip install azure-mgmt-authorization==3.0.0" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 1.4 Get credential" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n", + "\n", + "try:\n", + " credential = DefaultAzureCredential()\n", + " # Check if given credential can get token successfully.\n", + " credential.get_token(\"https://management.azure.com/.default\")\n", + "except Exception as ex:\n", + " # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential not work\n", + " credential = InteractiveBrowserCredential()" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 1.5 Configure workspace " + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.ai.ml import MLClient\n", + "\n", + "try:\n", + " ml_client = MLClient.from_config(credential=credential)\n", + "except Exception as ex:\n", + " # enter details of your AML workspace\n", + " subscription_id = \"\"\n", + " resource_group = \"\"\n", + " workspace = \"\"\n", + "\n", + " # get a handle to the workspace\n", + " ml_client = MLClient(credential, subscription_id, resource_group, workspace)\n", + "\n", + "subscription_id = ml_client.subscription_id\n", + "resource_group = ml_client.resource_group_name\n", + "workspace = ml_client.workspace_name\n", + "\n", + "print(f\"Connected to workspace {workspace}\")" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 1.6 Assign variables for Azure Content Safety\n", + "Currently, Azure AI Content Safety is in a limited set of regions:\n", + "\n", + "\n", + "__NOTE__: before you choose the region to deploy the Azure AI Content Safety, please be aware that your data will be transferred to the region you choose and by selecting a region outside your current location, you may be allowing the transmission of your data to regions outside your jurisdiction. It is important to note that data protection and privacy laws may vary between jurisdictions. Before proceeding, we strongly advise you to familiarize yourself with the local laws and regulations governing data transfer and ensure that you are legally permitted to transmit your data to an overseas location for processing. By continuing with the selection of a different region, you acknowledge that you have understood and accepted any potential risks associated with such data transmission. Please proceed with caution." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient\n", + "\n", + "acs_client = CognitiveServicesManagementClient(credential, subscription_id)\n", + "\n", + "\n", + "# settings for the Azure AI Content Safety resource\n", + "# we will choose existing AACS resource if it exists, otherwise create a new one\n", + "# name of azure ai content safety resource, has to be unique\n", + "import time\n", + "\n", + "aacs_name = f\"{endpoint_name}-aacs-{str(time.time()).replace('.','')}\"\n", + "available_aacs_locations = [\"east us\", \"west europe\"]\n", + "\n", + "# create a new Cognitive Services Account\n", + "kind = \"ContentSafety\"\n", + "aacs_sku_name = \"S0\"\n", + "aacs_location = available_aacs_locations[0]\n", + "\n", + "\n", + "print(\"Available SKUs:\")\n", + "aacs_skus = acs_client.resource_skus.list()\n", + "print(\"SKU Name\\tSKU Tier\\tLocations\")\n", + "for sku in aacs_skus:\n", + " if sku.kind == \"ContentSafety\":\n", + " locations = \",\".join(sku.locations)\n", + " print(sku.name + \"\\t\" + sku.tier + \"\\t\" + locations)\n", + "\n", + "print(\n", + " f\"Choose a new Azure AI Content Safety resource in {aacs_location} with SKU {aacs_sku_name}\"\n", + ")" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "### 2. Create Azure AI Content Safety" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.mgmt.cognitiveservices.models import Account, Sku, AccountProperties\n", + "\n", + "\n", + "parameters = Account(\n", + " sku=Sku(name=aacs_sku_name),\n", + " kind=kind,\n", + " location=aacs_location,\n", + " properties=AccountProperties(\n", + " custom_sub_domain_name=aacs_name, public_network_access=\"Enabled\"\n", + " ),\n", + ")\n", + "# How many seconds to wait between checking the status of an async operation.\n", + "wait_time = 10\n", + "\n", + "\n", + "def find_acs(accounts):\n", + " return next(\n", + " x\n", + " for x in accounts\n", + " if x.kind == \"ContentSafety\"\n", + " and x.location == aacs_location\n", + " and x.sku.name == aacs_sku_name\n", + " )\n", + "\n", + "\n", + "try:\n", + " # check if AACS exists\n", + " aacs = acs_client.accounts.get(resource_group, aacs_name)\n", + " print(f\"Found existing Azure AI content safety Account {aacs.name}.\")\n", + "except:\n", + " try:\n", + " # check if there is an existing AACS resource within same resource group\n", + " aacs = find_acs(acs_client.accounts.list_by_resource_group(resource_group))\n", + " print(\n", + " f\"Found existing Azure AI content safety Account {aacs.name} in resource group {resource_group}.\"\n", + " )\n", + " except:\n", + " print(f\"Creating Azure AI content safety Account {aacs_name}.\")\n", + " acs_client.accounts.begin_create(resource_group, aacs_name, parameters).wait()\n", + " print(\"Resource created.\")\n", + " aacs = acs_client.accounts.get(resource_group, aacs_name)\n", + "\n", + "\n", + "aacs_endpoint = aacs.properties.endpoint\n", + "aacs_resource_id = aacs.id\n", + "aacs_name = aacs.name\n", + "print(\n", + " f\"AACS name is {aacs.name}, use this name in UAI preparation notebook to create UAI.\"\n", + ")\n", + "print(f\"AACS endpoint is {aacs_endpoint}\")\n", + "print(f\"AACS ResourceId is {aacs_resource_id}\")" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "### 3. Create Azure AI Content Safety enabled Llama 2 online endpoint" + ], + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 3.1 Check if Llama 2 model is available in the AML registry." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "reg_client = MLClient(\n", + " credential,\n", + " subscription_id=subscription_id,\n", + " resource_group_name=resource_group,\n", + " registry_name=registry_name,\n", + ")\n", + "version_list = list(\n", + " reg_client.models.list(model_name)\n", + ") # list available versions of the model\n", + "llama_model = None\n", + "\n", + "# If specific inference environments are tagged for the model\n", + "inference_envs_exist = False\n", + "\n", + "if len(version_list) == 0:\n", + " raise Exception(f\"No model named {model_name} found in registry\")\n", + "else:\n", + " model_version = version_list[0].version\n", + " llama_model = reg_client.models.get(model_name, model_version)\n", + " if (\n", + " \"inference_supported_envs\" in llama_model.tags\n", + " and len(llama_model.tags[\"inference_supported_envs\"]) >= 1\n", + " ):\n", + " inference_envs_exist = True\n", + " print(\n", + " f\"Using model name: {llama_model.name}, version: {llama_model.version}, id: {llama_model.id} for inferencing\"\n", + " )" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 3.2 Check if UAI is used" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "uai_id = \"\"\n", + "uai_client_id = \"\"\n", + "if uai_name != \"\":\n", + " from azure.mgmt.msi import ManagedServiceIdentityClient\n", + " from azure.mgmt.msi.models import Identity\n", + "\n", + " msi_client = ManagedServiceIdentityClient(\n", + " subscription_id=subscription_id,\n", + " credential=credential,\n", + " )\n", + " uai_resource = msi_client.user_assigned_identities.get(resource_group, uai_name)\n", + " uai_id = uai_resource.id\n", + " uai_client_id = uai_resource.client_id" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### 3.3 Create Llama 2 online endpoint\n", + "This step may take a few minutes." + ], + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "#### Create endpoint" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.ai.ml.entities import (\n", + " ManagedOnlineEndpoint,\n", + " IdentityConfiguration,\n", + " ManagedIdentityConfiguration,\n", + ")\n", + "\n", + "# Check if the endpoint already exists in the workspace\n", + "try:\n", + " endpoint = ml_client.online_endpoints.get(endpoint_name)\n", + " print(\"---Endpoint already exists---\")\n", + "except:\n", + " # Create an online endpoint if it doesn't exist\n", + "\n", + " # Define the endpoint\n", + " endpoint = ManagedOnlineEndpoint(\n", + " name=endpoint_name,\n", + " description=\"Test endpoint for model\",\n", + " identity=IdentityConfiguration(\n", + " type=\"user_assigned\",\n", + " user_assigned_identities=[ManagedIdentityConfiguration(resource_id=uai_id)],\n", + " )\n", + " if uai_id != \"\"\n", + " else None,\n", + " )\n", + "\n", + " # Trigger the endpoint creation\n", + " try:\n", + " ml_client.begin_create_or_update(endpoint).wait()\n", + " print(\"\\n---Endpoint created successfully---\\n\")\n", + " except Exception as err:\n", + " raise RuntimeError(\n", + " f\"Endpoint creation failed. Detailed Response:\\n{err}\"\n", + " ) from err" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "#### 3.4 Setup Deployment Parameters\n", + "\n", + "We utilize an optimized __foundation-model-inference__ container for model scoring. This container is designed to deliver high throughput and low latency. In this section, we introduce several environment variables that can be adjusted to customize a deployment for either high throughput or low latency scenarios.\n", + "\n", + "- __ENGINE_NAME__: Used to choose the inferencing framework to use in the scoring script. For Llama-2 models, if ENGINE_NAME = 'mii' the container will inference with the new [DeepSpeed-FastGen](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen). Alternatively if ENGINE_NAME = 'vllm' the container will inference with [vLLM](https://vllm.readthedocs.io/en/latest/), which is also the default.\n", + "- __WORKER_COUNT__: The number of workers to use for inferencing. This is used as a proxy for the number of concurrent requests that the server should handle.\n", + "- __TENSOR_PARALLEL__: The number of GPUs to use for tensor parallelism.\n", + "- __NUM_REPLICAS__: The number of model instances to load for the deployment. This is used to increase throughput by loading multiple models on multiple GPUs, if the model is small enough to fit.\n", + "\n", + "`NUM_REPLICAS` and `TENSOR_PARALLEL` work hand-in-hand to determine the most optimal configuration to increase the throughput for the deployment without degrading too much on the latency. The total number of GPUs used for inference will be `NUM_REPLICAS` * `TENSOR_PARALLEL`. For example, if `NUM_REPLICAS` = 2 and `TENSOR_PARALLEL` = 2, then 4 GPUs will be used for inference.\n", + "\n", + "Ensure that the model you are deploying is small enough to fit on the number of GPUs you are using, specified by `TENSOR_PARALLEL`. For instance, if there are 4 GPUs available, and `TENSOR_PARALLEL` = 2, then the model must be small enough to fit on 2 GPUs. If the model is too large, then the deployment will fail. \n", + "\n", + "__NOTE__: \n", + "- `NUM_REPLICAS` is currently only supported by the vLLM engine.\n", + "- DeepSpeed MII Engine is only supported on A100 / H100 GPUs.\n" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "REQUEST_TIMEOUT_MS = 90000 # the timeout for each request in milliseconds\n", + "MAX_CONCURRENT_REQUESTS = (\n", + " 128 # the maximum number of concurrent requests supported by the endpoint\n", + ")\n", + "\n", + "acs_env_vars = {\n", + " \"CONTENT_SAFETY_ACCOUNT_NAME\": aacs_name,\n", + " \"CONTENT_SAFETY_ENDPOINT\": aacs_endpoint,\n", + " \"CONTENT_SAFETY_THRESHOLD\": content_severity_threshold,\n", + " \"SUBSCRIPTION_ID\": subscription_id,\n", + " \"RESOURCE_GROUP_NAME\": resource_group,\n", + " \"UAI_CLIENT_ID\": uai_client_id,\n", + "}\n", + "\n", + "fm_container_default_env_vars = {\n", + " \"WORKER_COUNT\": MAX_CONCURRENT_REQUESTS,\n", + " \"TENSOR_PARALLEL\": 2,\n", + " \"NUM_REPLICAS\": 2,\n", + "}\n", + "\n", + "deployment_env_vars = {**fm_container_default_env_vars, **acs_env_vars}\n", + "\n", + "# Uncomment the following lines to use DeepSpeed FastGen engine (experimental)\n", + "# mii_fastgen_env_vars = {\n", + "# \"ENGINE_NAME\": \"mii\",\n", + "# \"WORKER_COUNT\": MAX_CONCURRENT_REQUESTS,\n", + "# }\n", + "# deployment_env_vars = {**mii_fastgen_env_vars, **acs_env_vars}" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "markdown", + "source": [ + "##### 3.5 Deploy Llama 2 model\n", + "This step may take a few minutes." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "from azure.ai.ml.entities import (\n", + " OnlineRequestSettings,\n", + " CodeConfiguration,\n", + " ManagedOnlineDeployment,\n", + " ProbeSettings,\n", + ")\n", + "\n", + "# For inference environments vLLM and DS MII, the scoring script is baked into the container\n", + "code_configuration = (\n", + " CodeConfiguration(code=\"./llama-files/score/default/\", scoring_script=\"score.py\")\n", + " if not inference_envs_exist\n", + " else None\n", + ")\n", + "\n", + "deployment = ManagedOnlineDeployment(\n", + " name=deployment_name,\n", + " endpoint_name=endpoint_name,\n", + " model=llama_model.id,\n", + " instance_type=sku_name,\n", + " instance_count=1,\n", + " code_configuration=code_configuration,\n", + " environment_variables=deployment_env_vars,\n", + " request_settings=OnlineRequestSettings(\n", + " request_timeout_ms=REQUEST_TIMEOUT_MS,\n", + " max_concurrent_requests_per_instance=MAX_CONCURRENT_REQUESTS,\n", + " ),\n", + " liveness_probe=ProbeSettings(\n", + " failure_threshold=30,\n", + " success_threshold=1,\n", + " period=100,\n", + " initial_delay=500,\n", + " ),\n", + " readiness_probe=ProbeSettings(\n", + " failure_threshold=30,\n", + " success_threshold=1,\n", + " period=100,\n", + " initial_delay=500,\n", + " ),\n", + ")\n", + "\n", + "# Trigger the deployment creation\n", + "try:\n", + " ml_client.begin_create_or_update(deployment).wait()\n", + " print(\"\\n---Deployment created successfully---\\n\")\n", + "except Exception as err:\n", + " raise RuntimeError(\n", + " f\"Deployment creation failed. Detailed Response:\\n{err}\"\n", + " ) from err" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "### 4. Test the Safety Enabled Llama 2 online endpoint." + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "\n", + "test_src_dir = \"./safety-llama-test\"\n", + "os.makedirs(test_src_dir, exist_ok=True)\n", + "print(f\"test script directory: {test_src_dir}\")\n", + "sample_data = os.path.join(test_src_dir, \"sample-request.json\")" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "attachments": {}, + "cell_type": "markdown", + "source": [ + "##### Choose request from following 4 cells based on the Llama 2 model type you chosen and the scenario you want to test" + ], + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "## For text-generation models (without -chat suffix)\n", + "## Successful response\n", + "\n", + "import json\n", + "\n", + "with open(sample_data, \"w\") as f:\n", + " json.dump(\n", + " {\n", + " \"input_data\": {\n", + " \"input_string\": [\n", + " \"Hello\",\n", + " \"My name is John and I have a dog.\",\n", + " ],\n", + " \"parameters\": {\n", + " \"temperature\": 0.6,\n", + " \"top_p\": 0.6,\n", + " \"max_new_tokens\": 256,\n", + " \"do_sample\": True,\n", + " },\n", + " }\n", + " },\n", + " f,\n", + " )" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "## For text-generation models (without -chat suffix)\n", + "## Blocked request/response due to hateful content\n", + "\n", + "import json\n", + "\n", + "with open(sample_data, \"w\") as f:\n", + " json.dump(\n", + " {\n", + " \"input_data\": {\n", + " \"input_string\": [\n", + " \"The dog was given a eutanasa injection due to their severed leg bleding profusely from deep lacarations to the lower extremities, exposing tisssue and nerve.\", # This line contains hateful message and endpoint will return empty string as response\n", + " ],\n", + " \"parameters\": {\n", + " \"temperature\": 0.6,\n", + " \"top_p\": 0.6,\n", + " \"max_new_tokens\": 256,\n", + " \"do_sample\": True,\n", + " },\n", + " }\n", + " },\n", + " f,\n", + " )" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "## For chat-complete models (with \"-chat\" suffix)\n", + "## Successful request/response\n", + "\n", + "import json\n", + "\n", + "with open(sample_data, \"w\") as f:\n", + " json.dump(\n", + " {\n", + " \"input_data\": {\n", + " \"input_string\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"What is the tallest building in the world?\",\n", + " },\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\": \"As of 2021, the Burj Khalifa in Dubai, United Arab Emirates is the tallest building in the world, standing at a height of 828 meters (2,722 feet). It was completed in 2010 and has 163 floors. The Burj Khalifa is not only the tallest building in the world but also holds several other records, such as the highest occupied floor, highest outdoor observation deck, elevator with the longest travel distance, and the tallest freestanding structure in the world.\",\n", + " },\n", + " {\"role\": \"user\", \"content\": \"and in Africa?\"},\n", + " {\n", + " \"role\": \"assistant\",\n", + " \"content\": \"In Africa, the tallest building is the Carlton Centre, located in Johannesburg, South Africa. It stands at a height of 50 floors and 223 meters (730 feet). The CarltonDefault Centre was completed in 1973 and was the tallest building in Africa for many years until the construction of the Leonardo, a 55-story skyscraper in Sandton, Johannesburg, which was completed in 2019 and stands at a height of 230 meters (755 feet). Other notable tall buildings in Africa include the Ponte City Apartments in Johannesburg, the John Hancock Center in Lagos, Nigeria, and the Alpha II Building in Abidjan, Ivory Coast\",\n", + " },\n", + " {\"role\": \"user\", \"content\": \"and in Europe?\"},\n", + " ],\n", + " \"parameters\": {\n", + " \"temperature\": 0.9,\n", + " \"top_p\": 0.6,\n", + " \"do_sample\": True,\n", + " \"max_new_tokens\": 100,\n", + " },\n", + " }\n", + " },\n", + " f,\n", + " )" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "## For chat-complete models (with \"-chat\" suffix)\n", + "## Blocked request due to hateful content\n", + "\n", + "import json\n", + "\n", + "with open(sample_data, \"w\") as f:\n", + " json.dump(\n", + " {\n", + " \"input_data\": {\n", + " \"input_string\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"A 51-year-old man was found dead in his car. There were blood stains on the dashboard and windscreen. At autopsy, a deep, oblique, long incised injury was found on the front of the neck. It turns out that he died by suicide.\",\n", + " },\n", + " ],\n", + " \"parameters\": {\n", + " \"temperature\": 0.9,\n", + " \"top_p\": 0.6,\n", + " \"do_sample\": True,\n", + " \"max_new_tokens\": 100,\n", + " },\n", + " }\n", + " },\n", + " f,\n", + " )" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [ + "ml_client.online_endpoints.invoke(\n", + " endpoint_name=endpoint_name,\n", + " deployment_name=deployment_name,\n", + " request_file=sample_data,\n", + ")" + ], + "outputs": [], + "execution_count": null, + "metadata": {} + }, + { + "cell_type": "code", + "source": [], + "outputs": [], + "execution_count": null, + "metadata": {} + } + ], + "metadata": { + "kernelspec": { + "name": "python3", + "language": "python", + "display_name": "Python 3 (ipykernel)" + }, + "language_info": { + "name": "python", + "version": "3.8.5", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + }, + "microsoft": { + "ms_spell_check": { + "ms_spell_check_language": "en" + } + }, + "kernel_info": { + "name": "python3" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file