Revert "Pre calculated Machine energy profiles (#76)"
This reverts commit 7ed0e7e.
ArneTR committed Jun 18, 2024
1 parent 7ed0e7e commit 5552acc
Showing 19 changed files with 495 additions and 30,569 deletions.
8 changes: 2 additions & 6 deletions .github/workflows/data-json-test.yml
@@ -12,12 +12,12 @@ on:
# Reason being that we pull our ML model and this could have changed in the meantime
- cron: '22 4 * * 6'
workflow_dispatch:

jobs:
test-data-output-action:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v3
with:
fetch-depth: 0

@@ -30,10 +30,6 @@ jobs:
with:
node-version: '18'

- name: Sleep 2
run: |
sleep 2
- name: Node Setup Energy Measurment
id: data-node-setup
uses: ./
34 changes: 0 additions & 34 deletions .github/workflows/overhead_test.yml

This file was deleted.

29 changes: 9 additions & 20 deletions .github/workflows/test.yml
@@ -20,12 +20,14 @@ permissions:

jobs:
test-action:
runs-on: ${{ matrix.os }}
continue-on-error: false
strategy:
fail-fast: true
matrix:
os: [ubuntu-22.04, ubuntu-24.04, ubuntu-20.04]
runs-on: ubuntu-latest
# runs-on: ${matrix.os}
# continue-on-error: false
# strategy:
# fail-fast: true
# matrix:
# os: [ubuntu-latest, ubuntu-24.04, ubuntu-20.04, ubuntu-18.04, ubuntu-20.04-16core]
# # It might be the case
steps:
- uses: actions/checkout@v4
with:
@@ -43,11 +45,7 @@ jobs:
- name: Sleep step
run: sleep 2

- name: Dump ECO-CI CPU Step before
run: |
cat /tmp/eco-ci/cpu-util-step.txt
- name: Dump ECO-CI CPU before
- name: Dump ECO-CI CPU
run: |
cat /tmp/eco-ci/cpu-util-total.txt
@@ -58,15 +56,6 @@
task: get-measurement
label: "Sleep 3s"

- name: Dump ECO-CI CPU Step actual processed
run: |
cat /tmp/eco-ci/cpu-util-temp.txt
- name: Dump ECO-CI Energy Step actual processed
run: |
cat /tmp/eco-ci/energy-step.txt
- name: Filesystem
run: timeout 10s ls -alhR /usr/lib
continue-on-error: true
5 changes: 0 additions & 5 deletions .gitlab-ci.yml.example
@@ -12,11 +12,6 @@ test-job:
# Generate one here for example: https://www.freecodeformat.com/validate-uuid-guid.php
#- export ECO_CI_COMPANY_UUID="YOUR COMPANY UUID"
#- export ECO_CI_PROJECT_UUID="YOUR PROJECT UUID"
#- export ECO_CI_MACHINE_UUID="YOUR MACHINE UUID"

# Change this to your machine, if you are not using the default.
# https://docs.gitlab.com/ee/ci/runners/hosted_runners/linux.html
#- export MACHINE_POWER_DATA="gitlab_EPYC_7B12_saas-linux-small-amd64.txt"

- !reference [.initialize_energy_estimator, script]
- !reference [.start_measurement, script]
61 changes: 7 additions & 54 deletions README.md
@@ -3,19 +3,13 @@
Eco-CI is a project aimed at estimating energy consumption in continuous integration (CI) environments. It provides functionality to calculate the energy consumption of CI jobs based on the power consumption characteristics of the underlying hardware.


## Requirements
The following packages are expected (an example install command is sketched below):
- `curl`
- `jq`
- `awk`
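
On a Debian/Ubuntu based runner these could be installed, for example, like this (package names may differ on other distributions):

```bash
sudo apt-get update && sudo apt-get install -y curl jq gawk
```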

## Usage

Eco-CI supports both GitHub and GitLab as CI platforms. When you integrate it into your pipeline, you must call the start-measurement script to begin collecting power consumption data, then call the get-measurement script each time you wish to make a spot measurement. When you call get-measurement, you can also assign a label to it to identify the measurement more easily. At the end, call display-results to see all the measurement results, the overall total usage, and to export the data.

Follow the instructions below to integrate Eco-CI into your CI pipeline.
Follow the instructions below to integrate Eco-CI into your CI pipeline:

### GitHub:
### Github:
To use Eco-CI in your GitHub workflow, call it with the relevant task name (start-measurement, get-measurement, or display-results). Here is a sample workflow that runs some Python tests with Eco-CI integrated.

```yaml
@@ -90,7 +84,7 @@ jobs:

```
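
For orientation, here is a condensed sketch of the three-step flow described above; the version tag, step names, and the test command are illustrative placeholders, not prescriptions:

```yaml
jobs:
  tests-with-eco-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Start energy measurement
        uses: green-coding-solutions/eco-ci-energy-estimation@v3
        with:
          task: start-measurement

      - name: Run tests                  # illustrative workload
        run: python -m pytest

      - name: Record measurement for the test step
        uses: green-coding-solutions/eco-ci-energy-estimation@v3
        with:
          task: get-measurement
          label: "pytest"                # free-form label for this spot measurement

      - name: Show results
        uses: green-coding-solutions/eco-ci-energy-estimation@v3
        with:
          task: display-results
```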

#### GitHub Action Mandatory and Optional Variables:
#### Github Action Mandatory and Optional Variables:

- `task`: (required) (options are `start-measurement`, `get-measurement`, `display-results`)
- `start-measurement` - Initializes the action and starts the measurement. This must be called exactly once per job.
@@ -110,6 +104,9 @@ jobs:
- Get the CO2 grid intensity for the location from https://www.electricitymaps.com/
- Estimates the amount of carbon the measurement has produced
- `display-table`: (optional) (default: true)
- call during the `display-results` step to either show or hide the energy reading table results in the output
- `display-graph`: (optional) (default: true)
- We use an ASCII charting library written in Go (https://github.com/guptarohit/asciigraph). GitHub-hosted runner images come with Go, so we do not install it. If you are using a private runner instance, however, your machine may not have Go installed, and this will not work. As we want to minimize what we install on private runner machines so as not to interfere with your setup, we will not install Go. Therefore, you will need to call `start-measurement` with the `display-graph` flag set to false (as sketched below), which will skip the installation of this Go library.
- `display-badge`: (optional) (default: true)
- used with display-results
- Shows the badge for the ci run during display-results step
@@ -213,40 +210,6 @@ jobs:
task: start-measurement
```
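
As a rough sketch of how the optional inputs above can be combined (values are illustrative; `display-graph` takes effect at `start-measurement`, while the table and badge flags apply to `display-results`, and `send-data` would typically be set consistently on every Eco-CI step):

```yaml
      - name: Start measurement without graph support (e.g. on a private runner)
        uses: green-coding-solutions/eco-ci-energy-estimation@v3
        with:
          task: start-measurement
          display-graph: false   # skips the Go-based asciigraph dependency

      # ... workload and get-measurement steps ...

      - name: Show results without table, badge or data upload
        uses: green-coding-solutions/eco-ci-energy-estimation@v3
        with:
          task: display-results
          display-table: false
          display-badge: false
          send-data: false       # do not send metrics to metrics.green-coding.io
```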

### Support for dedicated runners / non-standard machines

This plugin is primarily designed for the [GitHub Shared Runners](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources) and comes with their energy values already pre-calculated.

All the values for supported machines are found in the [power-data](https://github.com/green-coding-solutions/eco-ci-energy-estimation/tree/main/power-data) folder.

The heavy lifting to generate these values is done by [Cloud Energy](https://github.com/green-coding-solutions/cloud-energy) (see below for details).

If you want to support a custom machine, you need to create one of these files and load it into Eco-CI.

Here is an example command to create the power data for the basic **4 CPU** GitHub Shared Runner (at the time of writing, 13 June 2024).

`python3 xgb.py --tdp 280 --cpu-threads 128 --cpu-cores=64 --cpu-make "amd" --release-year=2021 --ram 512 --cpu-freq=2450 --cpu-chips=1 --vhost-ratio=0.03125 --dump-hashmap > github_EPYC_7763_4_CPU_shared.sh`

The following would be the command for [GitLab Shared Runners](https://docs.gitlab.com/ee/ci/runners/hosted_runners/linux.html) (at the time of writing, 13 June 2024):

`python3 xgb.py --tdp 240 --cpu-threads 128 --cpu-cores=64 --cpu-make "amd" --release-year=2021 --ram 512 --cpu-freq=2250 --cpu-chips=1 --vhost-ratio=0.015625 --dump-hashmap > gitlab_EPYC_7B12_saas-linux-small-amd64.txt`

GitLab uses an AMD EPYC 7B12 according to [our findings](https://www.green-coding.io/case-studies/cpu-utilization-usefulness/).


These commands show how the machine specs must be supplied to [Cloud Energy](https://github.com/green-coding-solutions/cloud-energy); since the runners are shared, you also need to supply the splitting ratio that is used.

Since GitHub, for instance, uses an `AMD EPYC 7763`, which only comes with 64 cores and 128 threads, and gives you **4 CPUs**, the assumption is that the splitting factor is `4/128 = 0.03125`.
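
As a quick sanity check of that arithmetic (using the numbers assumed above), something like this can be run before passing `--vhost-ratio`:

```bash
# allocated vCPUs of the shared runner / hardware threads of the host CPU (SMT on)
awk 'BEGIN { print 4 / 128 }'   # prints 0.03125 -> --vhost-ratio=0.03125
```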

One uncertainty is whether Hyper-Threading / SMT is turned on or off, but we believe it is reasonable to assume that shared runners have it turned on, as it generally increases throughput and performance in shared environments.

If you have trouble finding out the splitting factor for your system, open an issue! We are happy to help!

Once you have the file ready, we are happy to merge it in through a PR! In future versions we also plan to include a loading mechanism so you can ingest a file from your repository without having to upstream it with us. But since this is a community open-source plugin, upstreaming is preferred, right? :)

### GitLab:
To use Eco-CI in your GitLab pipeline, you must first include a reference to the eco-ci-gitlab.yml file as such:
```
@@ -327,19 +290,16 @@ test-job:
+ The plugin is tested on:
+ `ubuntu-latest` (22.04 at the time of writing)
+ `ubuntu-24.04`
+ [Autoscaling Github Runners](https://docs.github.com/en/actions/using-github-hosted-runners/about-larger-runners/managing-larger-runners#configuring-autoscaling-for-larger-runners) are not supported
+ It is known to not work on `ubuntu-20.04` ([See here](https://github.com/green-coding-solutions/eco-ci-energy-estimation/issues/72))
+ Also Windows and macOS are currently not supported.

- If you have your pipelines split over multiple VMs (often the case with many jobs), you have to treat each VM as a separate machine for the purposes of measuring and setting up Eco-CI.

- The underlying [Cloud Energy](https://github.com/green-coding-solutions/cloud-energy) model requires the CPU to have a fixed frequency setting. This is typical for cloud testing and is the case on GitHub, for instance, but not always in other CIs.

See also our [work on analysing fixed frequency in Cloud Providers and CI/CD](https://www.green-coding.io/case-studies/cpu-utilization-usefulness/)

- The XGBoost model is trained on the [SPECpower](https://www.spec.org/power_ssj2008/results/) database, which was mostly collected on compute machines. Results will be off for servers outside the big cloud vendors, as well as for memory-heavy machines or machines that rely more heavily on their GPUs for computations.

### Note on the integration / Auto-Updates
### Note on the integration
- If you want the extension to automatically update within a version number, use the convenient @vX form.
+ `uses: green-coding-solutions/eco-ci-energy-estimation@v3 # will pick the latest minor v3.x`
+ In case of a major change from @v3 to @v4 you need to upgrade manually. The upside: if you use Dependabot, it will create a PR for you, as it understands the notation.
@@ -353,10 +313,3 @@ See also our [work on analysing fixed frequency in Cloud Providers and CI/CD](ht
+ We do **not** recommend this, as it might contain beta features. We recommend using the releases and tagged versions only.


### Testing

For local testing you can just run in the docker container of your choice, directly from the root of the repository:
```bash
docker run --rm -it -v ./:/tmp/data:ro invent-registry.kde.org/sysadmin/ci-images/suse-qt67:latest bash /tmp/data/local_ci.example.sh
```
81 changes: 69 additions & 12 deletions action.yml
@@ -12,10 +12,6 @@ inputs:
description: 'Label for the get-measurement task, to mark what this measurement correlates to in your workflow'
default: null
required: false
machine-power-data:
description: 'The file to read the machine power data from. Default will be 4 core AMD EPYC 7763 Github Runner'
default: "github_EPYC_7763_4_CPU_shared.sh"
required: false
send-data:
description: 'Send metrics data to metrics.green-coding.io to create and display badge, and see an overview of the energy of your CI runs. Set to false to send no data.'
default: true
@@ -28,16 +24,20 @@ inputs:
description: 'Show the energy reading results in a table during display-results step'
default: true
required: false
display-graph:
description: 'Show the graph of the energy use over time during display-results step'
default: true
required: false
display-badge:
description: 'Shows the badge for the ci run during display-results step'
default: true
required: false
pr-comment:
description: 'Add a comment to the PR with the results during display-results step'
default: false
gh-api-base:
api-base:
description: 'Base URL of the Github API to send data to. Default is api.github.com, but can be changed to your hostname if you have Github Enterprise'
default: ${{ github.api_url }}
default: 'api.github.com'
required: false
company-uuid:
description: 'If you want to add data to the CarbonDB you can set this to your company UUID'
@@ -75,16 +75,73 @@ runs:
name: Setup
shell: bash
run: |
${{github.action_path}}/scripts/setup.sh initialize "${{inputs.machine-power-data}}"
if command -v python3 &>/dev/null; then
echo "Python is already installed."
else
echo "Python is not installed. Installing..."
apt-get update
apt-get install -y python3.12 python3.12-venv || apt-get install -y python3.10 python3.10-venv
echo "Python has been installed."
fi
python_version=$(python3 --version 2>&1)
python_major_version=$(python3 -c 'import sys; print(sys.version_info[0])')
python_minor_version=$(python3 -c 'import sys; print(sys.version_info[1])')
python_cache_path="/tmp/eco-ci/venv/lib/python${python_major_version}.${python_minor_version}/site-packages"
echo "python_cache_path=$python_cache_path" >> $GITHUB_OUTPUT
# call the initialize function of setup.sh
${{github.action_path}}/scripts/setup.sh initialize -g ${{inputs.display-graph}}
# To identify the hash for our cache we cannot use the classic mechanism of
# hashFiles('/tmp/eco-ci/spec-power-model/requirements.txt')
# hashFiles is restricted to ONLY work in the GITHUB_WORKSPACE, which belongs to the calling action
# therefore we need to construct the hash ourselves beforehand and save it to an output variable
- if: inputs.task == 'start-measurement' && env.ECO_CI_INIT != 'DONE'
name: Hash requirements file
id: hash-requirements
shell: bash
run: echo "myhash=$(md5sum /tmp/eco-ci/spec-power-model/requirements.txt | cut -d ' ' -f1)" >> $GITHUB_OUTPUT;

- if: inputs.task == 'start-measurement' && env.ECO_CI_INIT != 'DONE'
name: Cache pip packages
id: cache-pip
uses: actions/cache@v4
env:
cache-name: cache-pip-packages
with:
# pip packages are cached in the venv site-packages path computed during setup
path: ${{ steps.initialize.outputs.python_cache_path }}
key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ steps.hash-requirements.outputs.myhash }}
restore-keys: |
${{ runner.os }}-build-${{ env.cache-name }}-${{ steps.hash-requirements.outputs.myhash }}
- if: inputs.task == 'start-measurement' && env.ECO_CI_INIT != 'DONE' && steps.cache-pip.outputs.cache-hit == 'true'
name: Inform about cache hit
continue-on-error: true
shell: bash
run: |
echo "Cache hit succeeded! 😀"
- if: inputs.task == 'start-measurement' && env.ECO_CI_INIT != 'DONE' && steps.cache-pip.outputs.cache-hit != 'true'
name: Inform about cache miss
continue-on-error: true
shell: bash
run: |
echo "Cache hit failed! ❌"
- if: inputs.task == 'start-measurement'
name: Starting measurement
shell: bash
# if the measurement is started for the first time, the reporter might not have run yet
# we prefer this over manual starting / stopping as it is less error-prone for users
run: |
${{github.action_path}}/scripts/setup.sh setup_python
if ${{inputs.send-data}}; then
curl_response=$(curl -s -H "Authorization: Bearer ${{github.token}}" ${{ inputs.gh-api-base }}/repos/${{ github.repository }}/actions/workflows)
curl_response=$(curl -s -H "Authorization: Bearer ${{github.token}}" ${{ github.api_url }}/repos/${{ github.repository }}/actions/workflows)
workflow_id=$(echo $curl_response | jq '.workflows[] | select(.name == "${{ github.workflow }}") | .id')
${{github.action_path}}/scripts/vars.sh add_var "WORKFLOW_ID" $workflow_id
else
@@ -111,7 +168,7 @@ runs:
id: run-total-model
shell: bash
run: |
${{github.action_path}}/scripts/display_results.sh -dt ${{inputs.display-table}} -db ${{inputs.display-badge}} -b "${{inputs.branch}}" -r ${{ github.run_id }} -R "${{ github.repository }}" -sd ${{inputs.send-data}} -sc ${{inputs.show-carbon}} -s "github"
${{github.action_path}}/scripts/display_results.sh -dt ${{inputs.display-table}} -dg ${{inputs.display-graph}} -db ${{inputs.display-badge}} -b "${{inputs.branch}}" -r ${{ github.run_id }} -R "${{ github.repository }}" -sd ${{inputs.send-data}} -sc ${{inputs.show-carbon}} -s "github"
cat "/tmp/eco-ci/output.txt" >> $GITHUB_STEP_SUMMARY
total_data_file="/tmp/eco-ci/total-data.json"
echo "data-total-json=$(cat $total_data_file)" >> $GITHUB_OUTPUT
@@ -123,7 +180,7 @@ runs:
env:
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
COMMENTS=$(curl -s -H "Authorization: Bearer ${{github.token}}" "${{ inputs.gh-api-base }}/repos/${{ github.repository }}/issues/$PR_NUMBER/comments")
COMMENTS=$(curl -s -H "Authorization: Bearer ${{github.token}}" "https://${{inputs.api-base}}/repos/${{ github.repository }}/issues/$PR_NUMBER/comments")
echo "$COMMENTS" | jq -c --arg username "github-actions[bot]" '.[] | select(.user.login == $username and (.body | index("Eco-CI") // false))' | while read -r comment; do
COMMENT_ID=$(echo "$comment" | jq -r '.id')
@@ -136,12 +193,12 @@ runs:
$INNER_BODY
</details>" '{"body": $body}')
curl -s -H "Authorization: Bearer ${{github.token}}" -X PATCH -d "$PAYLOAD" "${{ inputs.gh-api-base }}/repos/${{ github.repository }}/issues/comments/$COMMENT_ID"
curl -s -H "Authorization: Bearer ${{github.token}}" -X PATCH -d "$PAYLOAD" "https://${{inputs.api-base}}/repos/${{ github.repository }}/issues/comments/$COMMENT_ID"
echo "Comment $COMMENT_ID collapsed."
done
NEW_COMMENT=$(cat "/tmp/eco-ci/output-pr.txt" | jq -Rs '.')
API_URL="${{ inputs.gh-api-base }}/repos/${{ github.repository }}/issues/${PR_NUMBER}/comments"
API_URL="https://${{inputs.api-base}}/repos/${{ github.repository }}/issues/${PR_NUMBER}/comments"
curl -X POST -H "Authorization: Bearer ${{github.token}}" -d @- $API_URL <<EOF
{
"body": $NEW_COMMENT