Use application latest height in query_status
#2021
relayer/src/chain/cosmos.rs
Outdated
/// Query the application status
fn query_application_status(&self) -> Result<StatusResponse, Error> {
    crate::time!("query_application_status");
    crate::telemetry!(query, self.id(), "query_application_status");

    let status = self.status()?;
    // query the chain status
    let status = self.chain_status()?;

    // query the application status
    let abci_status = self
        .block_on(self.rpc_client.abci_info())
        .map_err(|e| Error::rpc(self.config.rpc_addr.clone(), e))?;

    // has the application been updated with latest block?
    let time = if abci_status.last_block_height == status.sync_info.latest_block_height {
        // yes, use the chain latest block time
        status.sync_info.latest_block_time
    } else {
        // no, retrieve the time of the header at application latest height
        self.block_on(self.rpc_client.commit(abci_status.last_block_height))
            .map_err(|e| Error::rpc(self.config.rpc_addr.clone(), e))?
            .signed_header
            .header
            .time
    };
This is the main change in this PR. It requires three RPCs instead of one, and it happens on every IBC send packet event, which affects relayer performance and puts extra load on the full node. So I am a bit hesitant to move this forward until performance tests are done and/or we can get all the info in one RPC.
I opened tendermint/tendermint#8248 to get input from tendermint team.
I am also wary of increasing the pressure on the node, especially for every SendPacket event.
Are we seeing this only when issuing queries manually or is it also happening during relaying? If only the former, I would vote for waiting until we can get away with a single RPC call or until something like this is implemented.
This is seen during relaying and can cause failures for any query. In one extreme case reported by Mircea (see issue), hermes was not relaying at all on a channel because it could not spawn workers due to "connection not found".
That case was addressed with a partial fix, similar to the one in this PR but without the time fix (i.e. we would only fix the height to be the application height). When I did some tests with custom chains (with increased block execution times), the lack of the time fix would sometimes cause a "header in the future" chain error for client updates.
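The height and time selection discussed in this thread can be modelled in isolation. This is a minimal sketch, not the Hermes implementation: `ChainStatus`, `AppInfo`, `AppStatus`, and the `header_time_at` closure are hypothetical stand-ins for the results of the status, ABCI info, and commit RPCs, and timestamps are simplified to integers.

```rust
/// Hypothetical stand-in for the sync info returned by the node's status RPC.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ChainStatus {
    pub latest_block_height: u64,
    pub latest_block_time: u64, // simplified timestamp (assumption)
}

/// Hypothetical stand-in for the ABCI info response.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AppInfo {
    pub last_block_height: u64,
}

/// Height and time that the relayer would use for its queries.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AppStatus {
    pub height: u64,
    pub time: u64,
}

/// Always report the application height. Reuse the chain's latest block time
/// only when the application has caught up; otherwise look up the time of the
/// header at the application height via `header_time_at` (the extra RPC).
pub fn application_status<F>(
    chain: ChainStatus,
    app: AppInfo,
    header_time_at: F,
) -> Result<AppStatus, String>
where
    F: FnOnce(u64) -> Result<u64, String>,
{
    let time = if app.last_block_height == chain.latest_block_height {
        chain.latest_block_time
    } else {
        header_time_at(app.last_block_height)?
    };

    Ok(AppStatus {
        height: app.last_block_height,
        time,
    })
}
```

In this model the partial fix corresponds to returning `chain.latest_block_time` unconditionally, which pairs the application height with the time of a later block.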
I did some basic benchmarking with two RPC nodes:
Both found from here: https://chain-registry.netlify.app I looked at these 5 endpoints:
These numbers are preliminary and may not reflect the performance of other operators' endpoints. Basic takeaway: If we can, we should avoid the
Data
The numbers are in milliseconds.
Thanks @adizere for this! For some reason I don't see the
Good point. The
We need to reproduce the problem reliably. I haven't yet checked Anca's instructions from here, but that's probably the best place to start from.
let blocks = self
    .block_on(
        self.rpc_client
            .blockchain(abci_info.last_block_height, abci_info.last_block_height),
Note to reviewers:
- we query /blockchain?minHeight=last_block_height&maxHeight=last_block_height to fetch the metadata for a single block
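The single-block query from the note above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `BlockMeta`, `single_block_query_path`, and `single_block_meta` are hypothetical names, and the metadata entry is reduced to a height and a simplified timestamp.

```rust
/// Hypothetical, simplified block metadata entry (assumption: the real
/// response carries more fields than these two).
#[derive(Debug, Clone, PartialEq)]
pub struct BlockMeta {
    pub height: u64,
    pub time: u64, // simplified timestamp (assumption)
}

/// Build the query path with the same height as both lower and upper bound,
/// so the node returns metadata for exactly one block.
pub fn single_block_query_path(height: u64) -> String {
    format!("/blockchain?minHeight={h}&maxHeight={h}", h = height)
}

/// Extract the one expected entry and check that it is at the requested height.
pub fn single_block_meta(metas: &[BlockMeta], height: u64) -> Result<&BlockMeta, String> {
    match metas {
        [meta] if meta.height == height => Ok(meta),
        [meta] => Err(format!(
            "expected block at height {}, got height {}",
            height, meta.height
        )),
        [] => Err(format!("no block metadata returned for height {}", height)),
        _ => Err(format!(
            "expected one block at height {}, got {} entries",
            height,
            metas.len()
        )),
    }
}
```

Checking the returned height defensively keeps a malformed or unexpected response from silently supplying the wrong block time.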
// Check that transaction indexing is enabled
if status.node_info.other.tx_index != TxIndexStatus::On {
    return Err(Error::tx_indexing_disabled(chain_id.clone()));
}

// Check that the chain identifier matches the network name
Note: This change has little to do with the present PR. It's just an additional non-critical health check that can reveal misconfigurations in config.toml.
I believe I am still seeing the same issue, when running the script against Gaia v6.0.3 using a patched Tendermint:
Detailed instructions for reproducing the issue and results:
Great instruction for testing Romain, thank you for sharing that! I was able to see the same issue as you. E.g.:
After digging into it, I realize that To summarize, I think the purpose of
That being said, this PR only replaces the use of blockchain height with application height specifically for
The script retrieves the blockchain height then queries the connection with that height. If the height is higher than the application query state height you see that error. I used this script because in the original issue the command used was
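The race that the script exercises can be modelled with a toy node. This is a sketch under assumptions, not the script itself: `Node`, `query_connection`, and the two helper functions are hypothetical, and the error string is illustrative only.

```rust
/// Toy model of a node whose application state may lag the blockchain height.
#[derive(Debug, Clone, Copy)]
pub struct Node {
    pub chain_height: u64,
    pub app_height: u64,
}

impl Node {
    /// Assumption for this model: the application can only answer queries at
    /// heights it has already executed.
    pub fn query_connection(&self, height: u64) -> Result<u64, String> {
        if height > self.app_height {
            Err(format!(
                "cannot query height {}: application is at height {}",
                height, self.app_height
            ))
        } else {
            Ok(height)
        }
    }
}

/// What the script does: read the blockchain height, then query at it.
pub fn query_with_chain_height(node: &Node) -> Result<u64, String> {
    node.query_connection(node.chain_height)
}

/// What this PR aims for: query at the application height instead.
pub fn query_with_app_height(node: &Node) -> Result<u64, String> {
    node.query_connection(node.app_height)
}
```

With `chain_height` one block ahead of `app_height`, the first helper fails and the second succeeds, which matches the behaviour described above.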
Alright so I've done just that a couple times, both against To me this is good to merge :)
Thank you Anca for kicking off the fix and your guidance! Amazing work Romain with testing! LGTM 🚀
* Test with abci_status
* Fix status also
* Remove redundant code
* Move changes in chain query_application_status()
* Cargo fmt
* Added /blockchain-based implementation for query_application_status
* Documenting impl of query_application_status.
* changelog
* Cleanup

Co-authored-by: Adi Seredinschi <[email protected]>
Closes: #1970