Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernel lookup api method #3000

Merged
merged 5 commits into from
Aug 30, 2019
Merged

Kernel lookup api method #3000

merged 5 commits into from
Aug 30, 2019

Conversation

jaspervdm
Copy link
Contributor

@jaspervdm jaspervdm commented Aug 25, 2019

This PR adds an API call to look up an on-chain kernel and the height of the block it is included in.
It has an optional minimum search height. If provided (and close to the actual height), the request will be really fast (<10ms). Without this parameter the request can currently take up to 2.5s. Seeing as in the worst case scenario this time will increase linearly with the chain head height (which keeps increasing forever), it is recommended to always set the minimum height parameter.

Requesting review from @antiochp since I am not very experienced with MMRs and as a result some implementation details might be sub-optimal.

Closes #2460

Use cases:

Usage:
v1/chain/kernels/XXX?min_height=YYY&max_height=ZZZ where XXX is the excess commitment. The min_height and max_height parameters are both optional. If not supplied, min_height will be set to 0 and max_height will be set to the head of the chain. The method will start at the block height max_height and traverse the kernel MMR backwards, until either the kernel is found or min_height is reached.

Example:
Request:

/v1/chain/kernels/0939fe3dc6a35350da91c6288138b7a257e0c0322eae30bda3938229d649e2e642?max_height=324300

Response:

{
  "tx_kernel": {
    "features": {
      "Plain": {
        "fee": 12000000
      }
    },
    "excess": "0939fe3dc6a35350da91c6288138b7a257e0c0322eae30bda3938229d649e2e642",
    "excess_sig": "c6f269b98f611c03cde361467daa5c27109dea0e8b28a3189646a4e1e9ad96fe92ea86b65dc76151686c5a4f3922202bd298b665ec80ba2b1e43fb46df34f6a4"
  },
  "height": 324016,
  "mmr_index": 1946151
}

@jaspervdm jaspervdm requested a review from antiochp August 25, 2019 23:25
@DavidBurkett
Copy link
Contributor

Why not index these in lmdb?

@antiochp
Copy link
Member

Why not index these in lmdb?

We do this for outputs. There is also a related PR #2903 that proposes storing the height per output in lmdb.

We came to the conclusion in previous conversations that -

  • we need fast random access for all outputs (node needs to verify spent outputs for every tx/block)
  • we do not necessarily need fast random access for all kernels
    • a node only cares about a limited subset of kernels (based on connected wallets)
    • relative kernels and scriptless script use cases are assumed to be relatively rare

So given the tradeoff between storage costs for lots of kernels and slower lookups - we erred on the side of slower lookups.

We are comfortable maintaining an index of all unspent outputs in lmdb as the UTXO set will not grow indefinitely.
We are less comfortable indexing all kernels as this set will grow indefinitely (and vast majority of this index will never be accessed or used).

That said - a given node probably has a reasonable idea of which kernels it is interested in locally. so it may make sense to store/cache these in lmdb based on prior access.
Or maybe it makes more sense to cache these at the wallet level (in the wallet lmdb instance).

@antiochp
Copy link
Member

Thinking a bit more about this. I'm torn about the "index or not to index" question.

@yeastplume
Copy link
Member

yeastplume commented Aug 26, 2019

Couple of things (that ties in with a few thoughts I've had on wallet improvements)

  1. Caching a min height in a wallet along with each transaction should be trivial, but this will obviously only be useful where the tx log and data is intact. (Think something will have to be done here anyhow as I don't think it's currently storing kernel IDs in a handy place). When restoring, checking the chain, etc, the wallet should still be retrieving by output anyhow so shouldn't make much of a difference.
  2. Another approach to the original issue is to simply have the wallet scan outputs as part of normal operation (i.e. perform the wallet_check command more or less continually, like most other wallets do for their chains). Wallets could store a birthdate block, keep track of the last block scanned between operations and start checks from last_block - sync horizon). I think this will be needed at some point anyhow for 100% wallet data robustness, so we will have to figure out how to keep this operation efficient.

@jaspervdm
Copy link
Contributor Author

@antiochp Hmm I hadn't considered the relative kernel locks usecase. But even then increasing the storage requirement of kernels by ~40% might be a lot, especially since we can never prune them.

Do you want me to already accomodate for this usecase? That means moving some of the logic into the Chain, and probably providing a flag that allows to scan backwards in the MMR instead of forwards.

@yeastplume Right, the kernel lookup would only be used for unconfirmed transactions, not for restores as they happen on the output level rather than the tx level.

Another approach to the original issue is to simply have the wallet scan outputs as part of normal operation

Yes, I think incremental scanning of the utxo set is a good idea anyway. But I'm not sure if checking the tx confirmation status by looking at the presence of the inputs in the utxo set is ideal. If the user is using the seed on multiple different wallets at the same time, it could happen that the wrong tx gets marked as confirmed.

@yeastplume
Copy link
Member

Yep, agreed that kernel lookup is the best way to confirm transactions, we can get this implemented then look at incremental scanning as a near-term improvement.

@antiochp
Copy link
Member

Do you want me to already accomodate for this usecase? That means moving some of the logic into the Chain, and probably providing a flag that allows to scan backwards in the MMR instead of forwards.

Do we want to always scan backwards through the MMR? Presumably if we are confirming txs they will be younger txs, present in more recent blocks?

@antiochp
Copy link
Member

But even then increasing the storage requirement of kernels by ~40% might be a lot, especially since we can never prune them.

I'm not sure I follow what you are saying here - you mean in terms of the lmdb index/storage (if we stored the index in the db)?

@jaspervdm
Copy link
Contributor Author

Do we want to always scan backwards through the MMR? Presumably if we are confirming txs they will be younger txs, present in more recent blocks?

Makes sense, will change it.

@jaspervdm
Copy link
Contributor Author

I'm not sure I follow what you are saying here - you mean in terms of the lmdb index/storage (if we stored the index in the db)?

Yes if we store an index we'd need 40% more storage per kernel.

Thinking about it some more, a malicious user could significantly slow down initial sync by creating lots of relative lock height kernels that rely on very old kernels. What if we only create the index during initial sync and delete it afterwards?

@antiochp
Copy link
Member

Yeah I think there may be reasons why we would want to temporarily create an index during initial sync. But I don't think we want to assume this index is available for general usage after sync.

I would not be overly concerned about supporting relative kernels today - we can optimize initial fast sync later as you suggest.

We don't want to add 40% to the permanent storage costs today in anticipation of something that may be very rare.

@garyyu
Copy link
Contributor

garyyu commented Aug 27, 2019

Thinking a bit more about this. I'm torn about the "index or not to index" question.

Same here :-) My left hand vote Yes for index but right hand vote for No index 😸

There could be a middle way on this:

  • Use index for recent kernels (pruning those indexes when blocks pruned, and no pruning if it's an archived mode node).
  • Use slow searching for far old kernels (if the estimated height is beyond the STATE_SYNC_THRESHOLD and not found in indexed kernels).

The fast indexed kernel querying will cover 99% transaction status querying requirement, since in most cases people will query status immediately (or at least in two days) after a transaction posted.

(As reference, here is my trying on a pure indexing solution: gottstech#3 and gottstech#13. if working as a wallet service backend, the config item pruning_kernel_index can be changed as false.)

@antiochp
Copy link
Member

@jaspervdm Is this ready for final review?

@jaspervdm
Copy link
Contributor Author

@antiochp I think it is! I just updated the description to reflect the latest changes.

@@ -260,6 +260,29 @@ impl TxHashSet {
.elements_from_insertion_index(start_index, max_count)
}

/// Find a kernel with a given excess. Work backwards from `max_index` to `min_index`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might make more sense in the rewindable_kernel_view (rather than the txhashset).
Maybe a separate PR to refactor this.

@antiochp
Copy link
Member

Looks pretty fast to me. This is the coinbase kernel for the genesis block -

time curl "http://127.0.0.1:3413/v1/chain/kernels/096385d86c5cfda718aa0b7295be0adf7e5ac051edfe130593a2a257f09f78a3b1"
{
  "tx_kernel": {
    "features": "Coinbase",
    "excess": "096385d86c5cfda718aa0b7295be0adf7e5ac051edfe130593a2a257f09f78a3b1",
    "excess_sig": "142a6a482a0c64ba71ba216ba5361612696fc76fe8d5c03c79fae01cab29d050ed7567e9a6d4fbbc1b6c0bf4434b8450dafa3f29b31612fda815f6e4b2bcfd43"
  },
  "height": 0,
  "mmr_index": 1
}
curl   0.01s user 0.01s system 1% cpu 1.049 total

Copy link
Member

@antiochp antiochp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@antiochp antiochp merged commit 30156cd into mimblewimble:master Aug 30, 2019
@lehnberg lehnberg added this to the 2.1.0 milestone Aug 30, 2019
@jaspervdm jaspervdm deleted the api_kernel branch December 29, 2019 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Look up kernel status
6 participants