server, plugin: enhance storage stats for IOPS #10034

Merged
shwstppr merged 12 commits into apache:4.20 from add-storagestats-iops-4.20
Jan 7, 2025


shwstppr
Contributor

@shwstppr shwstppr commented Dec 4, 2024

Description

Adds a framework-layer change that allows IOPS stats to be retrieved and stored for storage pools. A custom PrimaryDataStoreDriver can implement the getStorageIopsStats method to return IOPS stats. Plugins that report only used IOPS can instead override the existing getUsedIops method. For testing purposes, the simulator hypervisor plugin now returns capacity and used IOPS for a pool. For local storage pools, an iostat-based implementation returns the currently used IOPS. The StoragePoolResponse class has been updated to return IOPS values, which allows the UI to show them in the different storage-pool-related views and APIs.
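The relationship between the two driver methods can be illustrated with a small, language-neutral sketch. It is written in Python rather than Java purely for brevity and is not the PR's code: the class names, method signatures, return shapes, and the "combined call first, used-only call as fallback" order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class PrimaryDriver:
    """Hypothetical stand-in for a primary storage driver plugin."""

    def get_storage_iops_stats(self, pool_id: str) -> Optional[Tuple[int, int]]:
        # Assumed shape: (capacity_iops, used_iops), or None when unsupported.
        return None

    def get_used_iops(self, pool_id: str) -> Optional[int]:
        # Assumed shape: used IOPS only, or None when unsupported.
        return None


@dataclass
class PoolIopsStats:
    capacity_iops: Optional[int] = None
    used_iops: Optional[int] = None


def collect_iops_stats(driver: PrimaryDriver, pool_id: str) -> PoolIopsStats:
    """Prefer the combined stats call; fall back to the used-only call."""
    stats = driver.get_storage_iops_stats(pool_id)
    if stats is not None:
        capacity, used = stats
        return PoolIopsStats(capacity_iops=capacity, used_iops=used)
    return PoolIopsStats(used_iops=driver.get_used_iops(pool_id))


class SimulatorLikeDriver(PrimaryDriver):
    """Reports both capacity and used IOPS (numbers are made up)."""

    def get_storage_iops_stats(self, pool_id):
        return (10000, 2500)


class UsedOnlyDriver(PrimaryDriver):
    """Reports only used IOPS (number is made up)."""

    def get_used_iops(self, pool_id):
        return 1200


if __name__ == "__main__":
    for drv in (SimulatorLikeDriver(), UsedOnlyDriver(), PrimaryDriver()):
        print(type(drv).__name__, collect_iops_stats(drv, "pool-1"))
```

A driver that implements neither method simply yields empty stats in this sketch, so a caller can leave the corresponding response fields unset.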

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

With local storage (used IOPS returned by the listStoragePools API):

(localcloud) 🐱 > list storagepools scope=HOST
{
  "count": 1,
  "storagepool": [
    {
      "clusterid": "3c21973f-de72-4196-af20-a8d13bd043a2",
      "clustername": "p1-c1",
      "created": "2024-12-03T06:31:52+0000",
      "disksizeallocated": 0,
      "disksizetotal": 20386414592,
      "disksizeused": 10423214080,
      "hasannotations": false,
      "hypervisor": "KVM",
      "id": "124992d7-8710-4c58-bafd-24efa7b79570",
      "ipaddress": "10.0.32.118",
      "name": "ol8.localdomain-local-124992d7",
      "overprovisionfactor": "2.0",
      "path": "/var/lib/libvirt/images",
      "podid": "42791654-5ace-4f1e-89e4-62bc2c3e6396",
      "podname": "Pod1",
      "provider": "DefaultPrimary",
      "scope": "HOST",
      "state": "Up",
      "storagecapabilities": {
        "VOLUME_SNAPSHOT_QUIESCEVM": "false"
      },
      "type": "Filesystem",
      "usediops": 2600140,
      "zoneid": "7fd98beb-22f0-4fb7-9740-8ec6290c7500",
      "zonename": "pr488-t11753-kvm-ol8"
    }
  ]
}
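As a rough illustration of consuming this response, the sketch below reads the IOPS fields from a trimmed copy of the output above. The `usediops` key appears in the output; `capacityiops` does not appear in this sample, so treating it as an optional key is an assumption, and the helper name `summarize_iops` is hypothetical.

```python
import json

# Trimmed copy of the listStoragePools output shown above.
response_text = """
{
  "count": 1,
  "storagepool": [
    {
      "name": "ol8.localdomain-local-124992d7",
      "scope": "HOST",
      "usediops": 2600140
    }
  ]
}
"""


def summarize_iops(response: dict) -> list:
    """Return (name, used_iops, capacity_iops) per pool; missing keys become None."""
    rows = []
    for pool in response.get("storagepool", []):
        rows.append((pool["name"], pool.get("usediops"), pool.get("capacityiops")))
    return rows


rows = summarize_iops(json.loads(response_text))
for name, used, capacity in rows:
    print(f"{name}: used={used} capacity={capacity}")
```

Using `dict.get` keeps the helper tolerant of pools whose driver reports only one of the two values.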

How did you try to break this feature and the system with this change?

Adds a framework-layer change to allow retrieving and storing IOPS stats for storage pools. A custom `PrimaryDataStoreDriver` can implement the `getStorageIopsStats` method to return IOPS stats. The existing `getUsedIops` method can also be overridden by such plugins when only used IOPS is returned.
For testing purposes, an implementation has been added to the simulator hypervisor plugin to return capacity and used IOPS for a pool.
For local storage pools, an implementation has been added that uses iostat to return the currently used IOPS.
The StoragePoolResponse class has been updated to return IOPS values, which allows showing them in the UI for the different storage-pool-related views and APIs.

Signed-off-by: Abhishek Kumar <[email protected]>
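The commit message says the local-storage path uses iostat but does not show how its output is interpreted. The following is only a sketch of one way to extract a per-device transfers-per-second (`tps`) figure from `iostat -d` style text. The sample text and its numbers are illustrative rather than captured from a real host, the function name `parse_tps` is hypothetical, and whether the PR reads the `tps` column at all is an assumption.

```python
from typing import Optional

# Illustrative text in the layout printed by `iostat -d 1 2` (numbers made up).
# The first report covers the time since boot; the second covers the interval.
SAMPLE = """\
Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda              12.50       100.00       200.00         0.00     123456     654321          0
dm-0              3.25        10.00        20.00         0.00       1000       2000          0

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sda              40.00       300.00       500.00         0.00        300        500          0
dm-0              1.00         4.00         8.00         0.00          4          8          0
"""


def parse_tps(output: str, device: str) -> Optional[float]:
    """Return the tps value from the last report for `device`, or None."""
    tps_col = None
    result = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Locate the column by header name instead of a fixed position.
            tps_col = fields.index("tps") if "tps" in fields else None
        elif tps_col is not None and fields[0] == device and len(fields) > tps_col:
            result = float(fields[tps_col])  # later reports overwrite earlier ones
    return result


if __name__ == "__main__":
    print(parse_tps(SAMPLE, "sda"))
```

Taking the last report matters because the first report from iostat averages over the whole uptime, which says little about current load.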
@shwstppr
Contributor Author

shwstppr commented Dec 4, 2024

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 11707

Signed-off-by: Abhishek Kumar <[email protected]>

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 62.98343% with 67 lines in your changes missing coverage. Please review.

Project coverage is 16.05%. Comparing base (9231c1c) to head (56e8660).
Report is 28 commits behind head on 4.20.

Files with missing lines | Patch % | Lines
...oud/hypervisor/kvm/storage/LibvirtStoragePool.java | 20.00% | 12 Missing ⚠️
...src/main/java/com/cloud/server/StatsCollector.java | 50.00% | 7 Missing and 4 partials ⚠️
...e/cloudstack/api/response/StoragePoolResponse.java | 0.00% | 6 Missing ⚠️
...cloudstack/storage/datastore/db/StoragePoolVO.java | 0.00% | 6 Missing ⚠️
...m/cloud/hypervisor/kvm/storage/KVMStoragePool.java | 0.00% | 6 Missing ⚠️
...ava/com/cloud/simulator/dao/MockVolumeDaoImpl.java | 0.00% | 6 Missing ⚠️
...java/com/cloud/api/query/vo/StoragePoolJoinVO.java | 0.00% | 6 Missing ⚠️
...ain/java/com/cloud/storage/StorageManagerImpl.java | 64.28% | 5 Missing ⚠️
.../subsystem/api/storage/PrimaryDataStoreDriver.java | 0.00% | 3 Missing ⚠️
...om/cloud/agent/manager/MockStorageManagerImpl.java | 0.00% | 3 Missing ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##               4.20   #10034      +/-   ##
============================================
+ Coverage     16.03%   16.05%   +0.01%     
- Complexity    12815    12840      +25     
============================================
  Files          5637     5637              
  Lines        493501   493650     +149     
  Branches      59829    59838       +9     
============================================
+ Hits          79130    79231     +101     
- Misses       405595   405636      +41     
- Partials       8776     8783       +7     
Flag | Coverage Δ
uitests | 4.02% <ø> (-0.01%) ⬇️
unittests | 16.89% <62.98%> (+0.01%) ⬆️


Signed-off-by: Abhishek Kumar <[email protected]>
Signed-off-by: Abhishek Kumar <[email protected]>
@shwstppr
Contributor Author

shwstppr commented Dec 4, 2024

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Signed-off-by: Abhishek Kumar <[email protected]>
Contributor

@sureshanaparti sureshanaparti left a comment


clgtm

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11711

Signed-off-by: Abhishek Kumar <[email protected]>

github-actions bot commented Dec 4, 2024

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@rohityadavcloud rohityadavcloud added this to the 4.20.1 milestone Dec 4, 2024
@shwstppr
Contributor Author

shwstppr commented Dec 5, 2024

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Signed-off-by: Abhishek Kumar <[email protected]>
@rohityadavcloud rohityadavcloud marked this pull request as ready for review December 16, 2024 13:25
@rohityadavcloud
Member

@blueorangutan package

@blueorangutan

@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11822

@shwstppr
Contributor Author

@blueorangutan test

@blueorangutan

@shwstppr a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11920)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51214 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t11920-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@shwstppr
Contributor Author

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11921

@shwstppr
Contributor Author

@blueorangutan test matrix

@blueorangutan

@shwstppr a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-11996)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51526 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t11996-kvm-ol8.zip
Smoke tests completed. 141 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@blueorangutan

[SF] Trillian test result (tid-11997)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 54551 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t11997-kvm-ubuntu22.zip
Smoke tests completed. 139 look OK, 2 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_oobm_multiple_mgmt_server_ownership | Failure | 31.79 | test_outofbandmanagement.py
test_hostha_enable_ha_when_host_disabled | Error | 3.09 | test_hostha_kvm.py
test_hostha_enable_ha_when_host_in_maintenance | Error | 302.18 | test_hostha_kvm.py

@blueorangutan

[SF] Trillian test result (tid-11998)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 56601 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t11998-vmware-70u3.zip
Smoke tests completed. 133 look OK, 8 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_DeployVmAntiAffinityGroup_in_project | Error | 131.25 | test_affinity_groups_projects.py
test_DeployVmAntiAffinityGroup | Error | 45.95 | test_affinity_groups.py
test_03_deploy_and_scale_kubernetes_cluster | Failure | 48.20 | test_kubernetes_clusters.py
test_08_upgrade_kubernetes_ha_cluster | Failure | 0.08 | test_kubernetes_clusters.py
test_01_deployVMInSharedNetwork | Error | 155.31 | test_network.py
test_03_nic_multiple_vmware | Error | 490.75 | test_nic.py
test_01_non_strict_host_anti_affinity | Failure | 158.26 | test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity | Error | 100.98 | test_nonstrict_affinity_group.py
test_02_restore_vm_with_disk_offering | Error | 62.13 | test_restore_vm.py
test_03_restore_vm_with_disk_offering_custom_size | Error | 50.88 | test_restore_vm.py
ContextSuite context=TestMigrateVMStrictTags>:setup | Error | 0.00 | test_vm_strict_host_tags.py

@blueorangutan

[SF] Trillian test result (tid-11999)
Environment: xcpng82 (x2), Advanced Networking with Mgmt server ol9
Total time taken: 76324 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t11999-xcpng82.zip
Smoke tests completed. 134 look OK, 7 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_01_condensed_drs_algorithm | Failure | 173.93 | test_cluster_drs.py
test_02_balanced_drs_algorithm | Failure | 184.03 | test_cluster_drs.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces | Error | 499.49 | test_internal_lb.py
ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 7.40 | test_network.py
test_01_non_strict_host_anti_affinity | Error | 223.62 | test_nonstrict_affinity_group.py
test_02_non_strict_host_affinity | Error | 108.97 | test_nonstrict_affinity_group.py
test_02_create_volume | Error | 5.26 | test_resource_names.py
test_05_scale_vm_dont_allow_disk_offering_change | Failure | 70.52 | test_scale_vm.py
test_01_volume_usage | Error | 96.88 | test_usage.py

@shwstppr
Contributor Author

The test failures are unrelated to this change and look similar to those in the test matrix for the health check PR. Re-running.

@blueorangutan package

@blueorangutan

@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11962

@shwstppr
Contributor Author

shwstppr commented Jan 2, 2025

@blueorangutan test matrix

@blueorangutan

@shwstppr a [SL] Trillian-Jenkins matrix job (EL8 mgmt + EL8 KVM, Ubuntu22 mgmt + Ubuntu22 KVM, EL8 mgmt + VMware 7.0u3, EL9 mgmt + XCP-ng 8.2 ) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-12023)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 54713 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t12023-kvm-ol8.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_05_vmschedule_test_e2e | Failure | 362.07 | test_vm_schedule.py

@blueorangutan

[SF] Trillian test result (tid-12024)
Environment: kvm-ubuntu22 (x2), Advanced Networking with Mgmt server u22
Total time taken: 56540 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t12024-kvm-ubuntu22.zip
Smoke tests completed. 140 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_oobm_multiple_mgmt_server_ownership | Failure | 31.78 | test_outofbandmanagement.py

@blueorangutan

[SF] Trillian test result (tid-12025)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 63014 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10034-t12025-vmware-70u3.zip
Smoke tests completed. 138 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test | Result | Time (s) | Test File
test_01_deployVMInSharedNetwork | Error | 157.32 | test_network.py
test_02_restore_vm_with_disk_offering | Error | 61.28 | test_restore_vm.py
test_03_restore_vm_with_disk_offering_custom_size | Error | 55.11 | test_restore_vm.py
test_02_list_cpvm_vm | Failure | 0.04 | test_ssvm.py
test_04_cpvm_internals | Failure | 0.04 | test_ssvm.py
test_12_destroy_cpvm | Error | 6.28 | test_ssvm.py

@shwstppr
Contributor Author

shwstppr commented Jan 7, 2025

Merging based on manual testing and LGTMs:
#10034 (review)

The test failures are unrelated and can also be seen in the health check PR.

@shwstppr shwstppr merged commit bd488c4 into apache:4.20 Jan 7, 2025
26 checks passed
@shwstppr shwstppr deleted the add-storagestats-iops-4.20 branch January 7, 2025 11:47
DaanHoogland added a commit that referenced this pull request Jan 8, 2025
* 4.20:
  merge errors fixed
  Restrict the migration of volumes attached to VMs in Starting state (#9725)
  server, plugin: enhance storage stats for IOPS (#10034)
  Introducing granular command timeouts global setting (#9659)
  Improve logging to include more identifiable information (#9873)
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Jan 10, 2025