Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed parentchain extrinsic submission would block subsequent extrinsic execution #970

Open
Kailai-Wang opened this issue Sep 2, 2022 · 10 comments
Labels
F2-bug Something isn't working P1-asap S4-ExtrinsicSender Parentchain Component

Comments

@Kailai-Wang
Copy link
Contributor

Hi, we encountered this problem during our development where we mistakenly used i32 instead of u8 type when building the call indexes that should be sent to the parentchain.

Then we saw an error upon sending to the parentchain, which is expected:

[2022-09-01T03:25:48Z WARN  integritee_service::ocall_bridge::worker_on_chain_ocall] send call..."0x3902840054d1f38ed4ceb09f507422c1ff77038cf82bfd5954e5358f549882d49c241e2200b463411d2ef0d207004266629e0aa8e2b96ab3daad1e72eeefa5ad626b86c82e6593ef1d19422788f75bf8622a6aa09beb9fcf20b4c6691152ea6d8876e7c30c0064004000000000000000d43593c715fdd31c61141abd04a99fd6822c8558854ccde39a5684e7a56da27d"
[2022-09-01T03:25:48Z ERROR ws::handler] WS Error <Custom(Extrinsic("extrinsic error code 1002: Verification Error: Runtime error: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed\nWASM backtrace:\n\n    0: 0x35b912 - <unknown>!rust_begin_unwind\n    1: 0x3e96 - <unknown>!core::panicking::panic_fmt::ha53e328e2bd8426b\n    2: 0xb736e - <unknown>!TaggedTransactionQueue_validate_transaction\n: RuntimeApi(\"Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed\\nWASM backtrace:\\n\\n    0: 0x35b912 - <unknown>!rust_begin_unwind\\n    1: 0x3e96 - <unknown>!core::panicking::panic_fmt::ha53e328e2bd8426b\\n    2: 0xb736e - <unknown>!TaggedTransactionQueue_validate_transaction\\n\")"))>

However, all subsequent extrinsics after this couldn't be executed on the parentchain anymore, e.g. no more ProposedSidechainBlock event since then:
image

We dig the code a bit and reckon there might be more than one problem here:

The ticket in our repo: litentry/heima#1036
It doesn't matter if we use ocall_api directly or extrinsic_callbacks, they are routed to use send_extrinsic eventually.

@haerdib
Copy link
Contributor

haerdib commented Sep 8, 2022

I believe that's a serious issue. I haven't had the time to look into this yet, but we should do so soon. Not sure if it's a bug, but marking it as such until we know better. Thanks for investigating!

@haerdib haerdib added F2-bug Something isn't working S4-ExtrinsicSender Parentchain Component P1-asap labels Sep 8, 2022
@Kailai-Wang
Copy link
Contributor Author

Thanks for looking!

@echevrier echevrier self-assigned this Sep 15, 2022
@echevrier
Copy link
Contributor

echevrier commented Sep 28, 2022

send_extrinsic

if let Err(e) = api.send_extrinsic(call.to_hex(), XtStatus::Ready) {
returns an error if something went wrong in the Websocket connection, not if the extrinsics is invalid.

The extrinsic error is in the return message. In your case here https://github.com/scs/substrate-api-client/blob/b91d78b53fdc3a05b3c1d1e7fe0fd6b8cfc0dd17/src/std/rpc/ws_client/mod.rs#L265.
The error is written in the log file but is not thrown or catch somewhere in the code:
[2022-09-28T08:29:10Z ERROR ws::handler] WS Error <Custom(Extrinsic("extrinsic error code 1002: Verification Error: Runtime error: Execution failed: Execution aborted due to trap: wasm trap: wasm unreachableinstruction executed\nWASM backtrace:\n\n 0: 0x2a300d - <unknown>!rust_begin_unwind\n 1: 0x3e43 - <unknown>!core::panicking::panic_fmt::h038028a9926e2771\n 2: 0x14d38c - <unknown>!TaggedTransactionQueue_validate_transaction\n: Runtime error: Execution failed: Execution aborted due to trap: wasm trap: wasmunreachableinstruction executed\nWASM backtrace:\n\n 0: 0x2a300d - <unknown>!rust_begin_unwind\n 1: 0x3e43 - <unknown>!core::panicking::panic_fmt::h038028a9926e2771\n 2: 0x14d38c - <unknown>!TaggedTransactionQueue_validate_transaction\n"))>

As you have already stated, subsequent extrinsics are unable to get executed because of their wrong nonce. They all land in the future pool and will not be executed.

The nonce is set when the extrinsic is created :

but I noticed that a later created extrinsic is sent before we get the returned ready message for the "invalid" extrinsics.

To fix this bug we need more investigation.

@echevrier
Copy link
Contributor

This will be solved by #1027. I will close this issue for now.

@jingleizhang
Copy link

Hey @echevrier @haerdib!

It seems #1027 doesn't solve this issue. We'd better reopen this one. What do you think?

Thanks!

@clangenb
Copy link
Contributor

clangenb commented Dec 7, 2022

Yes, I actually don't see how #1027 should have solved the issue.

@clangenb clangenb reopened this Dec 7, 2022
@haerdib
Copy link
Contributor

haerdib commented Jan 5, 2023

Some news from the api-client side:
With tag v0.7.0 which the recent polkadot branches base on, at the very least the future status should return an error: https://github.com/scs/substrate-api-client/blob/f46480f5c4939fdb17f7755711070fe143b31f8a/examples/example_custom_nonce.rs#L58-L75

Can you check if this issue still persists with the new polkadot branch version?

@ziming-zung
Copy link

ziming-zung commented Jan 6, 2023

Some news from the api-client side: With tag v0.7.0 which the recent polkadot branches base on, at the very least the future status should return an error: https://github.com/scs/substrate-api-client/blob/f46480f5c4939fdb17f7755711070fe143b31f8a/examples/example_custom_nonce.rs#L58-L75

Can you check if this issue still persists with the new polkadot branch version?

I believe this issue still persist.

  • If the extrinsics is invalid, we do nothing.
    if let Err(e) = api.send_extrinsic(call.to_hex(), XtStatus::Ready) {
    error!("Could not send extrsinic to node: {:?}", e);
    }
  • The nonce is set when the extrinsic is created.
    let extrinsics_buffer: Vec<OpaqueExtrinsic> = calls
    .iter()
    .map(|call| {
    let extrinsic_params = ParentchainExtrinsicParams::new(
    runtime_spec_version,
    runtime_transaction_version,
    nonce_value,
    self.genesis_hash,
    params_builder,
    );
    let xt = compose_extrinsic_offline!(self.signer.clone(), call, extrinsic_params)
    .encode();
    nonce_value += 1;
    xt
    })
    .map(|xt| {
    OpaqueExtrinsic::from_bytes(&xt)
    .expect("A previously encoded extrinsic has valid codec; qed.")
    })
    .collect();
    *nonce_lock = Nonce(nonce_value);

    So I think this maybe needs a mechanism(redesign the implementation of caching nonce?) or a callback function to reset the nonce cache.

@haerdib
Copy link
Contributor

haerdib commented Jan 6, 2023

As long as the sending of the xt happens on the untrusted side, a redesign will be hard. A callback sounds more doable. It will require quite some design though, as the following needs to be covered:

  • if the node accepts Future all extrinsic sent after the failed one, will be executed once the missing nonce xt has been sent. So a simple reset will possibly render quite some xt useless.. e.g. will result in a duplicate nonce error.
  • which error should result in a none reset? As of now, a future status will also be returned as an error. That should not result in a reset. Same might go for other errors. One would need to investigate.

@clangenb
Copy link
Contributor

We are prioritizing #1238 over this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F2-bug Something isn't working P1-asap S4-ExtrinsicSender Parentchain Component
Projects
None yet
Development

No branches or pull requests

7 participants