Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API] [PFNs] improve observability of view function and simulate usage #11696

Merged
merged 8 commits into from
Jan 26, 2024

Conversation

bchocho
Copy link
Contributor

@bchocho bchocho commented Jan 18, 2024

Description

On PFNs, periodically (every minute) log the top-9 gas-using view functions and simulate transactions. This gives observability to the PFNs usage.

Utilizes a small cache that is reset every minute. Alternatively, such high-cardinality data would not work with prometheus.

Requires minimal changes to propagate gas usage.

Test Plan

Deploy PFN locally, observe log messages.

Copy link

trunk-io bot commented Jan 18, 2024

⏱️ 9h 40m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 2h 33m 🟩🟩🟩🟩
windows-build 1h 46m 🟩🟩🟩🟩🟩
rust-move-unit-coverage 1h 27m 🟩🟩🟩
rust-lints 44m 🟩🟩🟩🟩🟩
rust-smoke-tests 43m 🟥
run-tests-main-branch 30m 🟩🟩🟩🟩🟥 (+2 more)
execution-performance / single-node-performance 20m 🟩
check 20m 🟩🟩🟩🟥🟩
forge-e2e-test / forge 17m 🟩
forge-compat-test / forge 15m 🟩
general-lints 13m 🟩🟩🟩🟩🟩
rust-images / rust-all 9m 🟩
check-dynamic-deps 9m 🟩🟩🟩🟩🟩
cli-e2e-tests / run-cli-tests 7m 🟩
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 1m 🟩🟩🟩🟩🟩 (+2 more)
node-api-compatibility-tests / node-api-compatibility-tests 51s 🟩
file_change_determinator 51s 🟩🟩🟩🟩🟩
permission-check 30s 🟩🟩🟩🟩🟩 (+2 more)
permission-check 19s 🟩🟩🟩🟩🟩 (+1 more)
permission-check 18s 🟩🟩🟩🟩🟩 (+2 more)
permission-check 17s 🟩🟩🟩🟩🟩 (+2 more)
file_change_determinator 10s 🟩
execution-performance / file_change_determinator 8s 🟩
execution-performance / sequential-execution-performance 8s 🟩
determine-docker-build-metadata 8s 🟩
execution-performance / parallel-execution-performance 7s 🟩
permission-check 4s 🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
rust-smoke-tests 43m 31m +38%

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (d38c66b) 70.9% compared to head (0758558) 70.9%.
Report is 6 commits behind head on main.

Files Patch % Lines
aptos-move/aptos-vm/src/aptos_vm.rs 96.0% 2 Missing ⚠️
types/src/on_chain_config/execution_config.rs 77.7% 2 Missing ⚠️
aptos-move/block-executor/src/limit_processor.rs 95.8% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11696   +/-   ##
=======================================
  Coverage    70.9%    70.9%           
=======================================
  Files         791      791           
  Lines      180312   180406   +94     
=======================================
+ Hits       127990   128067   +77     
- Misses      52322    52339   +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bchocho bchocho force-pushed the brian/api-better-observability branch from 89ce918 to 9102ca2 Compare January 19, 2024 21:27
@bchocho bchocho changed the title brian/api better observability [API] [PFNs] improve observability of view function and simulate usage Jan 19, 2024
@bchocho bchocho marked this pull request as ready for review January 19, 2024 21:33
@bchocho bchocho requested a review from sitalkedia January 19, 2024 21:33
) -> Result<Vec<Vec<u8>>> {
) -> Result<(Vec<Vec<u8>>, u64)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, it would be preferred to be a struct, so the outputs are known names. That way it won't require someone to read the code at the end of the function to determine it's the gas used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, we can have something similar to TransactionOutput or VMOutput structs

entry_function.module(),
&entry_function.function().into(),
),
TransactionPayload::Multisig(_) => "MultiSig".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MultiSig has an entry function tied to it, so I would probably add that here as well. It's in the inner type.

@@ -1220,6 +1220,19 @@ impl TransactionsApi {
_ => ExecutionStatus::MiscellaneousError(None),
};

let stats_key = match txn.payload() {
TransactionPayload::Script(_) => "Script".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add the script hash here

@bchocho bchocho force-pushed the brian/api-better-observability branch from 002ce11 to 20eab1c Compare January 22, 2024 23:36
@bchocho bchocho requested a review from gregnazario January 23, 2024 00:13
@bchocho
Copy link
Contributor Author

bchocho commented Jan 23, 2024

@gregnazario @georgemitenkov thanks! Ready for a new review -- I've added a proper struct, also making changes so view functions that fail will still get gas usage logged.

Copy link
Contributor

@JoshLind JoshLind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me -- unblocking 😄

@bchocho bchocho enabled auto-merge (squash) January 25, 2024 19:44

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@bchocho bchocho force-pushed the brian/api-better-observability branch from 20eab1c to 0758558 Compare January 26, 2024 01:32

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 075855844b2b53ce4152f095becd9e12db2d7556

two traffics test: inner traffic : committed: 7146 txn/s, latency: 5335 ms, (p50: 4800 ms, p90: 6900 ms, p99: 11400 ms), latency samples: 3094420
two traffics test : committed: 100 txn/s, latency: 2342 ms, (p50: 2200 ms, p90: 2600 ms, p99: 7100 ms), latency samples: 1700
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.219, avg: 0.197", "QsPosToProposal: max: 0.156, avg: 0.147", "ConsensusProposalToOrdered: max: 0.601, avg: 0.577", "ConsensusOrderedToCommit: max: 0.544, avg: 0.508", "ConsensusProposalToCommit: max: 1.118, avg: 1.086"]
Max round gap was 1 [limit 4] at version 1398057. Max no progress secs was 5.085525 [limit 10] at version 1398057.
Test Ok

Copy link
Contributor

✅ Forge suite compat success on aptos-node-v1.8.3 ==> 075855844b2b53ce4152f095becd9e12db2d7556

Compatibility test results for aptos-node-v1.8.3 ==> 075855844b2b53ce4152f095becd9e12db2d7556 (PR)
1. Check liveness of validators at old version: aptos-node-v1.8.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4948 txn/s, latency: 6631 ms, (p50: 6900 ms, p90: 9200 ms, p99: 13200 ms), latency samples: 183100
2. Upgrading first Validator to new version: 075855844b2b53ce4152f095becd9e12db2d7556
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1780 txn/s, latency: 15809 ms, (p50: 19200 ms, p90: 21900 ms, p99: 22600 ms), latency samples: 92560
3. Upgrading rest of first batch to new version: 075855844b2b53ce4152f095becd9e12db2d7556
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1572 txn/s, latency: 18443 ms, (p50: 19900 ms, p90: 22200 ms, p99: 22800 ms), latency samples: 81780
4. upgrading second batch to new version: 075855844b2b53ce4152f095becd9e12db2d7556
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3497 txn/s, latency: 9041 ms, (p50: 9900 ms, p90: 12900 ms, p99: 13300 ms), latency samples: 139880
5. check swarm health
Compatibility test for aptos-node-v1.8.3 ==> 075855844b2b53ce4152f095becd9e12db2d7556 passed
Test Ok

@bchocho bchocho merged commit 1ac0b8a into main Jan 26, 2024
47 checks passed
@bchocho bchocho deleted the brian/api-better-observability branch January 26, 2024 02:17
bchocho added a commit that referenced this pull request Jan 26, 2024
#11696)

### Description

On PFNs, periodically (every minute) log the top-9 gas-using view functions and simulate transactions. This gives observability to the PFNs usage.

Utilizes a small cache that is reset every minute. Alternatively, such high-cardinality data would not work with prometheus.

Requires minimal changes to propagate gas usage.

### Test Plan

Deploy PFN locally, observe log messages.
bchocho added a commit that referenced this pull request Jan 30, 2024
#11696)

### Description

On PFNs, periodically (every minute) log the top-9 gas-using view functions and simulate transactions. This gives observability to the PFNs usage.

Utilizes a small cache that is reset every minute. Alternatively, such high-cardinality data would not work with prometheus.

Requires minimal changes to propagate gas usage.

### Test Plan

Deploy PFN locally, observe log messages.
bchocho added a commit that referenced this pull request Jan 30, 2024
#11696) (#11791)

### Description

On PFNs, periodically (every minute) log the top-9 gas-using view functions and simulate transactions. This gives observability to the PFNs usage.

Utilizes a small cache that is reset every minute. Alternatively, such high-cardinality data would not work with prometheus.

Requires minimal changes to propagate gas usage.

### Test Plan

Deploy PFN locally, observe log messages.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants