Can't start the exporter prometheus #236

Closed

faguayot opened this issue Jun 24, 2021 · 12 comments
@faguayot

Describe the bug
We can't start the collection of data with Prometheus on the new Harvest releases 21.05.2 and the pre-release 21.05.3. On the previous version, the collector ran without errors.
Environment

  • Harvest version: harvest version 21.05.2-1 (commit ce091de) (build date 2021-06-14T20:31:09+0530) linux/amd64 or harvest version 21.05.3-1 (commit 63f0b11) (build date 2021-06-23T22:11:00+0530) linux/amd64

  • Command line arguments used: bin/harvest start --config two.yml

  • OS: RHEL 8.2

  • Install method: yum

  • ONTAP Version: 9.5 and 9.7

To Reproduce
Here is the output of the command running in foreground mode; it shows the behaviour of the three different versions. 21.05.1 runs correctly, while the other two do not.

====================================
HARVEST 21.05.1
====================================

[root@harvest20 harvest]# bin/harvest -v
harvest version 21.05.1-1 (commit 2211c00) (build date 2021-05-21T01:28:12+0530) linux/amd64
[root@harvest20 harvest]# bin/harvest start --config two.yml --foreground
set debug mode ON (starting poller in foreground otherwise is unsafe)
starting in foreground, enter CTRL+C or close terminal to stop poller
2021/06/24 01:18:42 (info ) : options config: two.yml
2021/06/24 01:18:42 (info ) (poller) (sces1p1_01): started in foreground [pid=1792]
2021/06/24 01:18:45 (info ) (poller) (sces1p1_01): poller start-up complete
2021/06/24 01:18:45 (info ) (poller) (sces1p1_01): updated status, up collectors: 41 (of 41), up exporters: 1 (of 1)
2021/06/24 01:18:45 (info ) (collector) (Zapi:SnapMirror): no [SnapMirror] instances on system, entering standby mode
2021/06/24 01:18:45 (info ) (collector) (Zapi:SnapMirror): no [SnapMirror] instances on system, entering standby mode
2021/06/24 01:18:45 (info ) (collector) (ZapiPerf:Path): no [Path] instances on system, entering standby mode
2021/06/24 01:18:45 (info ) (collector) (ZapiPerf:Path): recovered from standby mode, back to normal schedule
2021/06/24 01:18:45 (warning) (collector) (ZapiPerf:Path): lagging behind schedule 60.63µs
2021/06/24 01:18:46 (info ) (collector) (Zapi:Lun): no [Lun] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLSizer): no [WAFLSizer] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLSizer): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:WAFLSizer): lagging behind schedule 61.529µs
^X2021/06/24 01:18:46 (error ) (collector) (ZapiPerf:FCVI): instance request: api request rejected => Counter collection is disabled
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:FCVI): no [FCVI] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:FCVI): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:FCVI): lagging behind schedule 74.465µs
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLCompBin): no [WAFLCompBin] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLCompBin): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:WAFLCompBin): lagging behind schedule 75.41µs
2021/06/24 01:18:46 (info ) (collector) (Zapi:Lun): no [Lun] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLAggr): no [WAFLAggr] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:WAFLAggr): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:WAFLAggr): lagging behind schedule 56.449µs
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:CIFSvserver): no [CIFSvserver] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:CIFSvserver): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:CIFSvserver): lagging behind schedule 185.021µs
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:ObjectStoreClient): no [ObjectStoreClient] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:ObjectStoreClient): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:ObjectStoreClient): lagging behind schedule 59.669µs
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:ISCSI): no [ISCSI] instances on system, entering standby mode
2021/06/24 01:18:46 (info ) (collector) (ZapiPerf:ISCSI): recovered from standby mode, back to normal schedule
2021/06/24 01:18:46 (warning) (collector) (ZapiPerf:ISCSI): lagging behind schedule 64.995µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv3Node): no [NFSv3Node] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv3Node): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv3Node): lagging behind schedule 354.255µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv4Node): no [NFSv4Node] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv4Node): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv4Node): lagging behind schedule 46.863µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:ExtCacheObj): no [ExtCacheObj] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:ExtCacheObj): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:ExtCacheObj): lagging behind schedule 188.952µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv3): no [NFSv3] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv3): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv3): lagging behind schedule 73.488µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:CIFSNode): no [CIFSNode] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:CIFSNode): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:CIFSNode): lagging behind schedule 57.672µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv41Node): no [NFSv41Node] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv41Node): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv41Node): lagging behind schedule 62.548µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv4): no [NFSv4] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv4): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv4): lagging behind schedule 64.687µs
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv41): no [NFSv41] instances on system, entering standby mode
2021/06/24 01:18:47 (info ) (collector) (ZapiPerf:NFSv41): recovered from standby mode, back to normal schedule
2021/06/24 01:18:47 (warning) (collector) (ZapiPerf:NFSv41): lagging behind schedule 82.519µs

 ====================================
 HARVEST 21.05.2
 ====================================

[root@harvest20 harvest]# bin/harvest -v
harvest version 21.05.2-1 (commit ce091de) (build date 2021-06-14T20:31:09+0530) linux/amd64
[root@harvest20 harvest]# bin/harvest start --config two.yml --foreground
set debug mode ON (starting poller in foreground otherwise is unsafe)
starting in foreground, enter CTRL+C or close terminal to stop poller
1:20AM INF command-line-arguments/poller.go:159 > log level used: info Poller=sces1p1_01
1:20AM INF command-line-arguments/poller.go:160 > options config: two.yml Poller=sces1p1_01
1:20AM INF command-line-arguments/poller.go:191 > started in foreground [pid=1895] Poller=sces1p1_01
1:20AM INF command-line-arguments/poller.go:293 > poller start-up complete Poller=sces1p1_01
1:20AM INF command-line-arguments/poller.go:417 > updated status, up collectors: 41 (of 41), up exporters: 1 (of 1) Poller=sces1p1_01
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [SnapMirror] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:SnapMirror
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [Path] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:Path
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:Path
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 336.887µs Poller=sces1p1_01 collector=ZapiPerf:Path
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [SnapMirror] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:SnapMirror
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [WAFLSizer] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 262.112µs Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [WAFLCompBin] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 81.755µs Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [Lun] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:Lun
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [ObjectStoreClient] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 171.365µs Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [ISCSI] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 112.106µs Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [CIFSvserver] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 275.845µs Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [CIFSNode] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 114.169µs Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv3] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 92.334µs Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [WAFLAggr] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 85.435µs Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [Lun] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:Lun
1:20AM ERR goharvest2/cmd/collectors/zapiperf/zapiperf.go:1163 > instance request error="api request rejected => Counter collection is disabled" Poller=sces1p1_01 collector=ZapiPerf:FCVI stack=[{"func":"New","line":"35","source":"errors.go"},{"func":"(*Client).invoke","line":"402","source":"client.go"},{"func":"(*Client).InvokeBatchWithTimers","line":"280","source":"client.go"},{"func":"(*Client).InvokeBatchRequest","line":"253","source":"client.go"},{"func":"(*ZapiPerf).PollInstance","line":"1162","source":"zapiperf.go"},{"func":"(*task).Run","line":"60","source":"schedule.go"},{"func":"(*AbstractCollector).Start","line":"270","source":"collector.go"},{"func":"goexit","line":"1371","source":"asm_amd64.s"}]
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [FCVI] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 95.872µs Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv4] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 139.563µs Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv3Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 130.995µs Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv41Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 121.992µs Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv4Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 138.336µs Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [ExtCacheObj] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 144.945µs Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:20AM INF goharvest2/cmd/poller/collector/collector.go:296 > no [NFSv41] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv41
1:20AM INF goharvest2/cmd/poller/collector/collector.go:318 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv41
1:20AM WRN goharvest2/cmd/poller/collector/collector.go:387 > lagging behind schedule 96.71µs Poller=sces1p1_01 collector=ZapiPerf:NFSv41

====================================
HARVEST 21.05.3
====================================

[root@harvest20 harvest]# bin/harvest -v
harvest version 21.05.3-1 (commit 63f0b11) (build date 2021-06-23T22:11:00+0530) linux/amd64
[root@harvest20 harvest]# bin/harvest start --config two.yml --foreground
set debug mode ON (starting poller in foreground otherwise is unsafe)
configuration error => Poller does not exist sces1p1_01
starting in foreground, enter CTRL+C or close terminal to stop poller
1:21AM INF command-line-arguments/poller.go:154 > log level used: info Poller=sces1p1_01
1:21AM INF command-line-arguments/poller.go:155 > options config: two.yml Poller=sces1p1_01
1:21AM INF command-line-arguments/poller.go:180 > started in foreground [pid=2017] Poller=sces1p1_01
1:21AM INF command-line-arguments/poller.go:282 > poller start-up complete Poller=sces1p1_01
1:21AM INF command-line-arguments/poller.go:406 > updated status, up collectors: 41 (of 41), up exporters: 1 (of 1) Poller=sces1p1_01
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [SnapMirror] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:SnapMirror
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [SnapMirror] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:SnapMirror
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [Path] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:Path
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:Path
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 99.318µs Poller=sces1p1_01 collector=ZapiPerf:Path
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [WAFLSizer] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 134.251µs Poller=sces1p1_01 collector=ZapiPerf:WAFLSizer
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [WAFLCompBin] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 118.276µs Poller=sces1p1_01 collector=ZapiPerf:WAFLCompBin
1:21AM ERR goharvest2/cmd/collectors/zapiperf/zapiperf.go:1170 > instance request error="api request rejected => Counter collection is disabled" Poller=sces1p1_01 collector=ZapiPerf:FCVI stack=[{"func":"New","line":"35","source":"errors.go"},{"func":"(*Client).invoke","line":"403","source":"client.go"},{"func":"(*Client).InvokeBatchWithTimers","line":"281","source":"client.go"},{"func":"(*Client).InvokeBatchRequest","line":"254","source":"client.go"},{"func":"(*ZapiPerf).PollInstance","line":"1169","source":"zapiperf.go"},{"func":"(*task).Run","line":"60","source":"schedule.go"},{"func":"(*AbstractCollector).Start","line":"269","source":"collector.go"},{"func":"goexit","line":"1371","source":"asm_amd64.s"}]
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [FCVI] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 75.559µs Poller=sces1p1_01 collector=ZapiPerf:FCVI
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [WAFLAggr] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 81.203µs Poller=sces1p1_01 collector=ZapiPerf:WAFLAggr
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [Lun] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:Lun
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [ObjectStoreClient] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 133.832µs Poller=sces1p1_01 collector=ZapiPerf:ObjectStoreClient
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [CIFSvserver] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 5.773295ms Poller=sces1p1_01 collector=ZapiPerf:CIFSvserver
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [ISCSI] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 83.647µs Poller=sces1p1_01 collector=ZapiPerf:ISCSI
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [ExtCacheObj] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 115.696µs Poller=sces1p1_01 collector=ZapiPerf:ExtCacheObj
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [CIFSNode] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 156.771µs Poller=sces1p1_01 collector=ZapiPerf:CIFSNode
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [Lun] instances on system, entering standby mode Poller=sces1p1_01 collector=Zapi:Lun
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv4Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 80.038µs Poller=sces1p1_01 collector=ZapiPerf:NFSv4Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv3Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 686.415µs Poller=sces1p1_01 collector=ZapiPerf:NFSv3Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv41Node] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 190.802µs Poller=sces1p1_01 collector=ZapiPerf:NFSv41Node
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv3] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 327.095µs Poller=sces1p1_01 collector=ZapiPerf:NFSv3
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv4] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 81.35µs Poller=sces1p1_01 collector=ZapiPerf:NFSv4
1:21AM INF goharvest2/cmd/poller/collector/collector.go:295 > no [NFSv41] instances on system, entering standby mode Poller=sces1p1_01 collector=ZapiPerf:NFSv41
1:21AM INF goharvest2/cmd/poller/collector/collector.go:317 > recovered from standby mode, back to normal schedule Poller=sces1p1_01 collector=ZapiPerf:NFSv41
1:21AM WRN goharvest2/cmd/poller/collector/collector.go:386 > lagging behind schedule 120.266µs Poller=sces1p1_01 collector=ZapiPerf:NFSv41

@cgrinds
Collaborator

cgrinds commented Jun 24, 2021

@faguayot appreciate all the details - let's focus on 21.05.3 for now.
When starting Harvest in foreground mode, Harvest also enables debug=true,
and when debug is enabled, the Prometheus exporter won't export data.

Let's try:

  1. clear logs for sces1p1_01 via rm /var/log/poller_sces1p1_01.log
  2. cd to /opt/harvest
  3. Run that one poller bin/harvest --config two.yml start sces1p1_01
  4. Paste the results of ps aux | grep poller
  5. Paste or attach the log output of /var/log/poller_sces1p1_01.log
  6. Paste the results of bin/harvest --config two.yml status
    This will show you the PromPort - wait a couple of minutes, and then use that port to curl the metrics endpoint like so:
    curl -s 'http://127.0.0.1:<PORT>/metrics'

@faguayot
Author

Hello Chris,

Here is an image with the status about 6-7 minutes after I started the process.
[screenshot]

This image shows the poller process running.
[screenshot]

Since I upgraded Harvest from 21.05.1, I've seen these traces in the logs showing that the exporter isn't running.

{"level":"info","Poller":"sces1p1_01","caller":"command-line-arguments/poller.go:406","time":"2021-06-25T09:50:32+02:00","message":"updated status, up collectors: 41 (of 41), up exporters: 0 (of 0)"}

Here is the poller log file.
poller_sces1p1_01.log

This is our configuration for the pollers, exporters, and the defaults. I've changed the extension from yml to log because otherwise I can't upload it.
two.log

@rahulguptajss
Contributor

@faguayot your configuration in two.yml is not correct with respect to the port: the prometheus_port setting is not supported.
You can refer to https://github.com/NetApp/harvest/blob/main/harvest.yml for port configuration.
Alternatively, 21.05.3 adds another way of configuring ports (#172). Documentation: https://github.com/NetApp/harvest/blob/release/21.05.3/cmd/exporters/prometheus/README.md. The documentation in the main branch will be updated today.
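For example, a Prometheus exporter section in harvest.yml looks roughly like this (a sketch; the exporter names and port values below are placeholders, not your exact config):

Exporters:
  prometheus:
    exporter: Prometheus
    port: 12990              # one fixed port for this exporter

  # or, starting with 21.05.3, let Harvest assign each poller a free port from a range:
  prometheus-range:
    exporter: Prometheus
    port_range: 2000-2030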

@cgrinds
Collaborator

cgrinds commented Jun 25, 2021

@faguayot as @rahulguptajss mentioned, prometheus_port is not a supported key/value. Take a look at port_range and see if that's a better fit.

@faguayot
Author

Thanks @rahulguptajss. I was looking for something like the port_range: 2000-2030 parameter because I knew you would implement it in the future, but I couldn't find it, and I didn't know that prometheus_port was no longer supported.

Another thing: in this new version it seems I have to set the exporter on every poller, instead of defining the exporter once in the Defaults configuration and having every poller take that value by default. When the exporter is configured both in Defaults and in the specific cluster's configuration, the more specific cluster configuration should take precedence. That is how the Defaults configuration behaves now (if I am not wrong), and it was the same for Harvest 1.6.
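To illustrate what I mean, roughly (a sketch; the poller name, address, and exporter name are placeholders):

Defaults:
  collectors:
    - Zapi
    - ZapiPerf
  exporters:
    - prometheus           # inherited by every poller that doesn't define its own

Pollers:
  sces1p1_01:
    datacenter: dc-01
    addr: 10.0.0.1
    exporters:             # more specific: should override the Defaults value for this poller
      - prometheus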

I think it is important to know which configuration parameters can be used in each version and which parameters will be removed, because if something is removed we don't know what happened until we spend time troubleshooting, and we drive ourselves crazy trying to find the problem, as happened to me.

Thanks to both.

@faguayot
Author

In the release notes for version 21.05.2, it says:

Add workload counters to ZapiPerf #9

Do we need to do something to see these counters? Maybe something needs to change in the ZapiPerf configuration.

Could you help me with this? I moved to the 21.05.2 and 21.05.3 versions to get these counters.

@rahulguptajss
Contributor

@faguayot
Author

Perfect, after I uncommented those lines the workload metrics are written to Prometheus. Is there documentation about what exactly each metric is?
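For reference, the lines I uncommented are the workload object templates in conf/zapiperf/default.yaml; roughly (an excerpt as far as I recall, other objects elided):

objects:
  # ... other objects ...
  Workload:               workload.yaml
  WorkloadDetail:         workload_detail.yaml
  WorkloadVolume:         workload_volume.yaml
  WorkloadDetailVolume:   workload_detail_volume.yaml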

Thanks.

@cgrinds
Collaborator

cgrinds commented Jun 29, 2021

Glad that's working for you @faguayot.

Documentation is a great question - ZAPIs are somewhat self-documenting and Harvest includes some rudimentary tools to help surface the ZAPI metadata as well as actual data. Here's an example relevant to your workload metrics question. And yes, this needs improvement - in the meantime, hopefully this will give you a bit more knowledge on how to dig deeper.

bin/harvest zapi
...
Examples:
  harvest zapi -p infinity show apis                             Query cluster infinity for available APIs
  harvest zapi -p infinity show attrs --api volume-get-iter      Query cluster infinity for volume-get-iter metrics
                                                                 Typically APIs suffixed with 'get-iter' have interesting metrics 
  harvest zapi -p infinity show data --api volume-get-iter       Query cluster infinity and print attribute tree of volume-get-iter

Let's use workload_detail_volume as an example.

The output of the command below shows us the metadata (including a description) of each of the counters listed in workload_detail_volume.yaml. For example, we can see below that service_time is "The workload's average service time per visit to the service center."

# example
bin/harvest --config harvest.openlab.yml zapi -p umeng_aff300 show counters --object workload_detail_volume
connected to umeng-aff300-05-06 (NetApp Release 9.7P7: Thu Aug 27 20:57:05 UTC 2020)
[counters]                            -                                   *
  [counter-info]                      -                                   *
    [desc]                            - Determines whether or not service center-based statistics are in the latency path.
    [is-deprecated]                   -                               false
    [name]                            -                     in_latency_path
    [privilege-level]                 -                            advanced
    [properties]                      -                  raw,no-zero-values
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            - Name of the workload_detail_volume instance
    [is-deprecated]                   -                               false
    [is-key]                          -                                true
    [name]                            -                       instance_name
    [privilege-level]                 -                            advanced
    [properties]                      -                   string,no-display
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            - UUID for the workload_detail_volume instance
    [is-deprecated]                   -                               false
    [name]                            -                       instance_uuid
    [privilege-level]                 -                            advanced
    [properties]                      -                   string,no-display
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            -                    System node name
    [is-deprecated]                   -                               false
    [is-key]                          -                                true
    [name]                            -                           node_name
    [privilege-level]                 -                            advanced
    [properties]                      -                   string,no-display
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            -                      System node id
    [is-deprecated]                   -                               false
    [name]                            -                           node_uuid
    [privilege-level]                 -                            advanced
    [properties]                      -                   string,no-display
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            - Ontap process that provided this instance
    [is-deprecated]                   -                               false
    [is-key]                          -                                true
    [name]                            -                        process_name
    [privilege-level]                 -                                diag
    [properties]                      -                              string
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [desc]                            -    Name of the associated resource.
    [is-deprecated]                   -                               false
    [name]                            -                       resource_name
    [privilege-level]                 -                            advanced
    [properties]                      -                              string
    [unit]                            -                                none
  [counter-info]                      -                                   *
    [base-counter]                    -                              visits
    [desc]                            - The workload's average service time per visit to the service center.
    [is-deprecated]                   -                               false
    [name]                            -                        service_time
    [privilege-level]                 -                            advanced
    [properties]                      -              average,no-zero-values
    [unit]                            -                            microsec
  [counter-info]                      -                                   *
    [desc]                            - The number of visits that the workload made to the service center; measured in visits per second.
    [is-deprecated]                   -                               false
    [name]                            -                              visits
    [privilege-level]                 -                            advanced
    [properties]                      -                 rate,no-zero-values
    [unit]                            -                             per_sec
  [counter-info]                      -                                   *
    [base-counter]                    -                              visits
    [desc]                            - The workload's average wait time per visit to the service center.
    [is-deprecated]                   -                               false
    [name]                            -                           wait_time
    [privilege-level]                 -                            advanced
    [properties]                      -              average,no-zero-values
    [unit]                            -                            microsec

Other ZAPI documentation

@faguayot
Author

Hello @cgrinds,

Thanks for this info; it is useful for understanding what each counter is. The zapi tool's help doesn't show the "show counters" command, so we didn't know about it.

I've upgraded Harvest to the latest release and I've seen some errors for the Volume and Disk ZAPI objects.

harvest version 21.05.3-2 (commit b482aff) (build date 2021-06-28T21:08:05+0530) linux/amd64
Here are the errors:
[screenshot of errors]

[screenshot of errors]

Do you know why this could be happening?

@cgrinds
Collaborator

cgrinds commented Jun 30, 2021

@faguayot glad it helped - the zapi tool needs to make its arguments more obvious. counters is there, but it's hard to see in the show line:

[screenshot]

I'll take a look at the errors - are they persistent or transient? Do they go away on the next poll? Do you see anything earlier in the logs about context deadline exceeded (Client.Timeout or context cancellation while reading...)?

There's a similar conversation in Slack about the same errors; there, the problem was a client timeout. Because the ZAPI times out, you get these skip-instance messages.

You can try increasing the client_timeout by editing conf/zapi/default.yaml and adding this at line 9:

client_timeout: 60

This increases the ZAPI timeout from 10s to 60s.
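The top of the file would then look roughly like this (a sketch; other keys elided, only client_timeout is added):

# conf/zapi/default.yaml
collector: Zapi
client_timeout: 60     # raises the ZAPI client timeout from the 10s default to 60s
# schedule, objects, etc. continue unchanged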

@faguayot
Author

faguayot commented Jul 8, 2021

Hello @cgrinds

Regarding the context deadline exceeded (Client.Timeout or context cancellation while reading...) issue, I increased the
client_timeout from 30 to 60. We had this problem in the past and @vgratian recommended we increase the timeout to 30s.

With this change, the problem seems to be solved.

Thanks.
Best regards!

@cgrinds cgrinds closed this as completed Jul 12, 2021