update docs notebook
iulusoy committed Oct 4, 2024
1 parent 4a1003e commit 76627e6
Showing 2 changed files with 95 additions and 23 deletions.
4 changes: 2 additions & 2 deletions ammico/notebooks/DemoNotebook_ammico.ipynb
@@ -170,7 +170,7 @@
"source": [
"image_dict = ammico.find_files(\n",
" # path=\"/content/drive/MyDrive/misinformation-data/\",\n",
" path=\"data-test/\",\n",
" path=str(data_path),\n",
" limit=15,\n",
")"
]
@@ -1434,7 +1434,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "ammico",
"language": "python",
"name": "python3"
},
114 changes: 93 additions & 21 deletions docs/source/notebooks/DemoNotebook_ammico.ipynb
@@ -18,17 +18,21 @@
"metadata": {},
"outputs": [],
"source": [
"# if running on google colab\n",
"# if running on google colab\\\n",
"# PLEASE RUN THIS ONLY AS CPU RUNTIME\n",
"# for a GPU runtime, there are conflicts with pre-installed packages - \n",
"# you first need to uninstall them (prepare a clean environment with no pre-installs) and then install ammico\n",
"# flake8-noqa-cell\n",
"\n",
"if \"google.colab\" in str(get_ipython()):\n",
" # update python version\n",
" # install setuptools\n",
" # %pip install setuptools==61 -qqq\n",
" # uninstall some pre-installed packages due to incompatibility\n",
" %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y -qqq\n",
" %pip uninstall --yes tensorflow-probability dopamine-rl lida pandas-gbq torchaudio torchdata torchtext orbax-checkpoint flex-y jax jaxlib -qqq\n",
" # install ammico\n",
" %pip install git+https://github.com/ssciwr/ammico.git -qqq\n",
" # install older version of jax to support transformers use of diffusers\n",
" # mount google drive for data and API key\n",
" from google.colab import drive\n",
"\n",
@@ -92,12 +96,12 @@
"outputs": [],
"source": [
"import os\n",
"# jax also sometimes leads to problems on google colab\n",
"# if this is the case, try restarting the kernel and executing this \n",
"# and the above two code cells again\n",
"import ammico\n",
"# for displaying a progress bar\n",
"from tqdm import tqdm\n",
"# to get the reference data for text_dict\n",
"import importlib_resources\n",
"pkg = importlib_resources.files(\"ammico\")"
"from tqdm import tqdm"
]
},
{
@@ -151,7 +155,7 @@
"| `limit` | `int` | maximum number of files to read (defaults to `20`, for all images set to `None` or `-1`) |\n",
"| `random_seed` | `str` | the random seed for shuffling the images; applies when only a few images are read and the selection should be preserved (defaults to `None`) |\n",
"\n",
"The `find_files` function returns a nested dict that contains the file ids and the paths to the files and is empty otherwise. This dict is filled step by step with more data as each detector class is run on the data (see below).\n",
"The `find_files` function returns a nested dictionary that contains the file ids and the paths to the files and is empty otherwise. This dict is filled step by step with more data as each detector class is run on the data (see below).\n",
"\n",
"If you downloaded the test dataset above, you can directly provide the path you already set for the test directory, `data_path`. The below cell is already set up for the test dataset.\n",
"\n",
@@ -183,9 +187,9 @@
"\n",
"If you want to run an analysis using the EmotionDetector detector type, you have first have to respond to an ethical disclosure statement. This disclosure statement ensures that you only use the full capabilities of the EmotionDetector after you have been made aware of its shortcomings.\n",
"\n",
"For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this envorinment variable, add it to your shell via your `.profile` or `.bashr` file.\n",
"For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_disclosure`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_disclosure, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n",
"\n",
"If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification dependend on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected."
"If the disclosure statement is accepted, the EmotionDetector will perform age, gender and race/ethnicity classification depending on the provided thresholds. If the disclosure is rejected, only the presence of faces and emotion (if not wearing a mask) is detected."
]
},
{
@@ -203,6 +207,34 @@
"_ = ammico.ethical_disclosure(accept_disclosure=accept_disclosure)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Privacy disclosure statement\n",
"\n",
"If you want to run an analysis using the TextDetector detector type, you have first have to respond to a privacy disclosure statement. This disclosure statement ensures that you are aware that your data will be sent to google cloud vision servers for analysis.\n",
"\n",
"For this, answer \"yes\" or \"no\" to the below prompt. This will set an environment variable with the name given as in `accept_privacy`. To re-run the disclosure prompt, unset the variable by uncommenting the line `os.environ.pop(accept_privacy, None)`. To permanently set this environment variable, add it to your shell via your `.profile` or `.bashr` file.\n",
"\n",
"If the privacy disclosure statement is accepted, the TextDetector will perform the text extraction, translation and if selected, analysis. If the privacy disclosure is rejected, no text processing will be carried out and you cannot use the TextDetector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# respond to the privacy disclosure statement\n",
"# this will set an environment variable for you\n",
"# if you do not want to re-accept the privacy disclosure every time, you can set this environment variable in your shell\n",
"# to re-set the environment variable, uncomment the below line\n",
"accept_privacy = \"PRIVACY_AMMICO\"\n",
"# os.environ.pop(accept_privacy, None)\n",
"_ = ammico.privacy_disclosure(accept_privacy=accept_privacy)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -253,9 +285,17 @@
"metadata": {},
"outputs": [],
"source": [
"# set the thresholds for the emotion detection\n",
"emotion_threshold = 50 # this is the default value for the detection confidence\n",
"# the lowest possible value is 0\n",
"# the highest possible value is 100\n",
"race_threshold = 50\n",
"gender_threshold = 50\n",
"for num, key in tqdm(enumerate(image_dict.keys()), total=len(image_dict)): # loop through all images\n",
" image_dict[key] = ammico.EmotionDetector(image_dict[key]).analyse_image() # analyse image with EmotionDetector and update dict\n",
" \n",
" image_dict[key] = ammico.EmotionDetector(image_dict[key],\n",
" emotion_threshold=emotion_threshold,\n",
" race_threshold=race_threshold,\n",
" gender_threshold=gender_threshold).analyse_image() # analyse image with EmotionDetector and update dict\n",
" if num % dump_every == 0 or num == len(image_dict) - 1: # save results every dump_every to dump_file\n",
" image_df = ammico.get_dataframe(image_dict)\n",
" image_df.to_csv(dump_file)"
@@ -405,8 +445,7 @@
"metadata": {},
"outputs": [],
"source": [
"csv_path = pkg / \"data\" / \"ref\" / \"test.csv\"\n",
"ta = ammico.TextAnalyzer(csv_path=str(csv_path), column_key=\"text\")"
"ta = ammico.TextAnalyzer(csv_path=\"../data/ref/test.csv\", column_key=\"text\")"
]
},
{
@@ -530,6 +569,17 @@
" image_df.to_csv(dump_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write output to csv\n",
"image_df = ammico.get_dataframe(image_dict)\n",
"image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -759,7 +809,7 @@
"# analysis_type can be \n",
"# \"summary\",\n",
"# \"questions\",\n",
"# \"summary_and_questions\".\n"
"# \"summary_and_questions\"."
]
},
{
@@ -806,7 +856,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also ask sequential questions if you pass the argument `cosequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. "
"You can also ask sequential questions if you pass the argument `consequential_questions=True`. This means that the answers to previous questions will be passed as context to the next question. However, this method will work a bit slower, because for each image the answers to the questions will not be calculated simultaneously, but sequentially. "
]
},
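A hedged sketch of such a sequential run follows; `SummaryDetector` with `analysis_type` mirrors the pattern shown above, but treat `list_of_questions` and the exact constructor signature as assumptions and check the ammico documentation for your installed version:

```python
# sketch only - list_of_questions and the constructor signature are assumptions
list_of_questions = [
    "How many persons are in the picture?",
    "What are they doing?",  # receives the previous answer as context
]
for key in tqdm(image_dict.keys()):
    image_dict[key] = ammico.SummaryDetector(
        image_dict[key],
        analysis_type="questions",
        list_of_questions=list_of_questions,
        consequential_questions=True,  # pass previous answers as context
    ).analyse_image()
```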
{
@@ -840,6 +890,17 @@
"image_dict"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write output to csv\n",
"image_df = ammico.get_dataframe(image_dict)\n",
"image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -855,7 +916,7 @@
"\n",
"From the seven facial expressions, an overall dominating emotion category is identified: negative, positive, or neutral emotion. These are defined with the facial expressions angry, disgust, fear and sad for the negative category, happy for the positive category, and surprise and neutral for the neutral category.\n",
"\n",
"A similar threshold as for the emotion recognition is set for the race/ethnicity and gender detection, `race_threshold` and `gender_threshold`, with the default set to 50% so that a confidence for race / gender above 0.5 only will return a value in the analysis.\n",
"A similar threshold as for the emotion recognition is set for the race/ethnicity and gender detection, `race_threshold` and `gender_threshold`, with the default set to 50% so that a confidence for race / gender above 0.5 only will return a value in the analysis. \n",
"\n",
"For age unfortunately no confidence value is accessible so that no threshold values can be set for this type of analysis. The [reported MAE of the model is ± 4.65](https://sefiks.com/2019/02/13/apparent-age-and-gender-prediction-in-keras/).\n",
"\n",
@@ -876,6 +937,17 @@
" accept_disclosure=\"DISCLOSURE_AMMICO\").analyse_image()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write output to csv\n",
"image_df = ammico.get_dataframe(image_dict)\n",
"image_df.to_csv(\"/content/drive/MyDrive/misinformation-data/data_out.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -986,7 +1058,7 @@
"source": [
"The images are then processed and stored in a numerical representation, a tensor. These tensors do not change for the same image and same model - so if you run this analysis once, and save the tensors giving a path with the keyword `path_to_save_tensors`, a file with filename `.<Number_of_images>_<model_name>_saved_features_image.pt` will be placed there.\n",
"\n",
"This can save you time if you want to analyse same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file. Any subsequent query of the model will run in a fraction of the time than it run in initially."
"This can save you time if you want to analyse the same images with the same model but different questions. To run using the saved tensors, execute the below code giving the path and name of the tensor file. Any subsequent query of the model will run in a fraction of the time than it run in initially."
]
},
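A hedged sketch of re-running with saved tensors; the `MultimodalSearch` object and `parse_data` call mirror how this section uses them, but the keyword `path_to_load_tensors` is an assumption inferred from `path_to_save_tensors` - consult the ammico documentation for the exact name:

```python
# sketch only - the loading keyword is an assumption mirroring path_to_save_tensors
my_obj = ammico.MultimodalSearch(image_dict)
(
    model,
    vis_processors,
    txt_processors,
    image_keys,
    image_names,
    features_image_stacked,
) = my_obj.parse_data(
    model_type="blip",
    path_to_load_tensors=".15_blip_saved_features_image.pt",  # file from the first run
)
```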
{
Expand Down Expand Up @@ -1050,7 +1122,7 @@
"You can filter your results in 3 different ways:\n",
"- `filter_number_of_images` limits the number of images found. That is, if the parameter `filter_number_of_images = 10`, then the first 10 images that best match the query will be shown. The other images ranks will be set to `None` and the similarity value to `0`.\n",
"- `filter_val_limit` limits the output of images with a similarity value not bigger than `filter_val_limit`. That is, if the parameter `filter_val_limit = 0.2`, all images with similarity less than 0.2 will be discarded.\n",
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_simularity_value - best_simularity_value_in_current_search)/best_simularity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
"- `filter_rel_error` (percentage) limits the output of images with a similarity value not bigger than `100 * abs(current_similarity_value - best_similarity_value_in_current_search)/best_similarity_value_in_current_search < filter_rel_error`. That is, if we set filter_rel_error = 30, it means that if the top1 image have 0.5 similarity value, we discard all image with similarity less than 0.35."
]
},
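A small worked example of the `filter_rel_error` criterion in plain Python (no ammico calls), using the numbers from the text:

```python
# plain-Python illustration of the filter_rel_error cutoff described above
best_similarity = 0.5   # similarity of the top-ranked image in this search
filter_rel_error = 30   # maximum allowed relative error in percent

for current_similarity in [0.50, 0.40, 0.34]:
    rel_error = 100 * abs(current_similarity - best_similarity) / best_similarity
    verdict = "kept" if rel_error < filter_rel_error else "discarded"
    print(f"similarity {current_similarity:.2f}: relative error {rel_error:.0f}% -> {verdict}")

# the cutoff works out to best_similarity * (1 - filter_rel_error / 100) = 0.5 * 0.7 = 0.35
```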
{
@@ -1174,7 +1246,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images querys, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images."
"Then using the same output function you can add the `itm=True` argument to output the new image order. Remember that for images queries, an error will be thrown with `itm=True` argument. You can also add the `image_gradcam_with_itm` along with `itm=True` argument to output the heat maps of the calculated images."
]
},
{
@@ -1199,7 +1271,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert the dictionary of dictionarys into a dictionary with lists:"
"Convert the dictionary of dictionaries into a dictionary with lists:"
]
},
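A generic illustration of this conversion in plain Python; ammico may provide a helper for this step, so treat this as a sketch of the idea rather than the exact call the next cell makes:

```python
# plain-Python sketch: collect each per-image value under its key
from collections import defaultdict

dict_of_lists = defaultdict(list)
for image_data in image_dict.values():     # one sub-dict per image
    for key, value in image_data.items():
        dict_of_lists[key].append(value)
dict_of_lists = dict(dict_of_lists)
```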
{
@@ -1367,7 +1439,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.11.9"
}
},
"nbformat": 4,
