test: drains a channel to crash the daemon #2688

Merged
rustyrussell merged 13 commits into ElementsProject:master on Jun 11, 2019

m-schmoock
Collaborator

m-schmoock commented May 29, 2019

Test to reproducibly crash a remote channeld

This adds a complicated test that demonstrates how to crash a remote channeld by intentionally draining a channel to a technical minimum value 'twice'.

  • The test sets up an [l1] -> [l2] channel with 1 million satoshi.
  • It drains it once by having l1 send l2 the exact maximum value (976559200msat) that still gets around the routing and HTLC fee checks. The steps are commented in the source.
  • It can and will drain a second time, this time with a lower required HTLC commitment fee. I don't know why, but when the channel is already very low, or maybe when the payment is very small, the required HTLC commitment fee is lower than normal (spendable:13440800msat htlc_fee:10860sat => second drain amount:2580800msat).
  • Now the channel is in a broken state where the following three things can happen:
    • If l2 is a c-lightning node and tries to send a normal payment of e.g. 100000sat (100k sat), channeld immediately crashes with the stacktrace below.
    • If l2 first sends a smaller payment back, e.g. just 10000sat (10k sat), this unlocks the broken state. The channel is then operable again.
    • If l2 is an LND node, it rejects 'bigger' payments in this channel until a small enough one has been made first.
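The second drain amount quoted above is just the reported `spendable_msat` minus the HTLC fee, converted from sat to msat. A minimal sketch of that arithmetic using the numbers from this test run; the helper name is hypothetical and only reproduces the quoted calculation:

```python
def second_drain_amount_msat(spendable_msat: int, htlc_fee_sat: int) -> int:
    """Amount left to drain once the HTLC fee (given in sat) is set aside.

    Hypothetical helper: it only reproduces the arithmetic in the description.
    """
    fee_msat = htlc_fee_sat * 1000  # 1 sat = 1000 msat
    if fee_msat > spendable_msat:
        return 0  # nothing left once the fee is covered
    return spendable_msat - fee_msat


# Values from the description: spendable:13440800msat, htlc_fee:10860sat
print(second_drain_amount_msat(13440800, 10860))  # 2580800
```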

Notes

  • l2 crashing will result in a unilateral close.
  • the issue poses potential security and network health risks.
  • The exact routing and HTLC fee values are different on mainnet, but the same thing is possible there, as I found with the channel drain plugin.
  • An easy way to demonstrate it outside of the testsuite is to use my work-in-progress drain and fill channel plugin (lightningd/plugins#22): simply drain a channel 'twice' and try to use it afterwards. The drain plugin splits the amount into chunks to fit inbound capacities and estimates the required HTLC fee for the last chunk.

Stacktrace

lightning_channeld: channeld/channeld.c:1382: handle_peer_commit_sig: Assertion `can_funder_afford_feerate(peer->channel, peer->channel->view[LOCAL] .feerate_per_kw)' failed.
FATAL SIGNAL 6 (version v0.7.0-397-gd803275)
backtrace: common/daemon.c:45 (send_backtrace) 0x562b824fa13f
backtrace: common/daemon.c:53 (crashdump) 0x562b824fa18f
backtrace: (null):0 ((null)) 0x7f1028c3c8af
backtrace: (null):0 ((null)) 0x7f1028c3c82f
backtrace: (null):0 ((null)) 0x7f1028c27671
backtrace: (null):0 ((null)) 0x7f1028c27547
backtrace: (null):0 ((null)) 0x7f1028c34db5
backtrace: channeld/channeld.c:1380 (handle_peer_commit_sig) 0x562b824e984d
backtrace: channeld/channeld.c:1819 (peer_in) 0x562b824eadcd
backtrace: channeld/channeld.c:3102 (main) 0x562b824ee14b
backtrace: (null):0 ((null)) 0x7f1028c28ce2
backtrace: (null):0 ((null)) 0x562b824e57fd
backtrace: (null):0 ((null)) 0xffffffffffffffff
lightning_channeld: FATAL SIGNAL 6 (version v0.7.0-397-gd803275)

@rustyrussell
Contributor

So much to unpack here!

First, "spendable_msat" is unreliable, as you've discovered. It's too simplistic for a funder, since we have to pay for tx fees for the enlarged tx as well (known issue, but fairly easy to fix).

Secondly, even when I change riskfactor to 0, it can't find a route. That's because we consider our own fee. I did that because we want to prioritize cheap channels over expensive ones, even though we're not paying it ourselves. But correctness comes first, so I think I need to change that (it's hard to have both, unfortunately).

Let me fix those two first, then get to the Real Issue.

@m-schmoock
Collaborator Author

m-schmoock commented May 30, 2019

@rustyrussell

  • does the test produce the error that can be seen on Travis? I wanted to make the negative test reliable...
  • if we send a payment to a directly connected peer, do routing fees even apply? Or is the first or last hop always free of charge?
  • can we reflect the exact 'Capacity exceeded' amount in the exception text, so a plugin doesn't have to use trial and error until the HTLC fees fit?

@ZmnSCPxj
Contributor

ZmnSCPxj commented May 30, 2019

if we send a payment to a direct connected peer, do routing fees even apply? Or is the first or last hop always free of charge?

First hop is always free (the last hop is free only if it is the first hop), where hop refers to a channel.

The routing algorithm weighs the first hop as part of the cost of the route, but it should not.

(fixing this would have also fixed #2119 without weakening the route randomization fuzz from 90% to 5%)
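To make "first hop is always free" concrete: each forwarding node charges a fee for the channel it forwards *out* on, so the fee of the first channel, which starts at our own node, is never charged. A minimal sketch, assuming the BOLT #7 fee formula `fee_base_msat + amount_to_forward * fee_proportional_millionths / 1000000` with integer division; the `Hop` type and function name are hypothetical and are not c-lightning's routing code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    """One channel on the route, with the fee its sending node advertises."""
    fee_base_msat: int
    fee_proportional_millionths: int

    def fee(self, amount_to_forward_msat: int) -> int:
        # BOLT #7 forwarding fee, integer division (rounded down)
        return (self.fee_base_msat
                + amount_to_forward_msat * self.fee_proportional_millionths // 1_000_000)


def amount_into_first_hop(route: List[Hop], deliver_msat: int) -> int:
    """Amount we must put on route[0] so the destination gets deliver_msat.

    Walk backwards: the node at the start of route[i] (i >= 1) takes
    route[i].fee() on what it forwards. route[0] starts at our own node,
    so its fee is never added.
    """
    amount = deliver_msat
    for hop in reversed(route[1:]):
        amount += hop.fee(amount)
    return amount


direct = [Hop(1000, 10)]
print(amount_into_first_hop(direct, 500_000))      # 500000: no fee on our own channel
two_hops = [Hop(1000, 10), Hop(1000, 10)]
print(amount_into_first_hop(two_hops, 1_000_000))  # 1001010
```

A route-cost function that adds `route[0].fee()` to the amount would be modeling the "first hop as part of the cost of the route" behavior described above.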

@rustyrussell
Contributor

Ok, I got nerd sniped and spent almost the whole day making spendable_msat accurate. Then I fixed routing. Still didn't get to the crash, that's tomorrow's job!

But trust me, everything else (work-related) is on hold until this is fixed...

@m-schmoock
Collaborator Author

Great. But how can you make spendable accurate without knowing the dynamic parameters (fees, risk, fuzz, htlc fee ...) in advance?

@ZmnSCPxj
Contributor

ZmnSCPxj commented May 30, 2019

Great. But how can you make spendable accurate without knowing the dynamic parameters (fees, risk, fuzz, htlc fee ...) in advance?

spendable_msat includes what you spend on fees.

fuzz and riskfactor simply affect what fees we pay. This is limited by your selected maxfeepercent and/or exemptfee, at least in the pay plugin.
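As a rough illustration of how `maxfeepercent` and `exemptfee` bound the fee: the pay documentation describes `exemptfee` as letting small fees skip the `maxfeepercent` check. A minimal sketch of that acceptance rule, assuming the documented defaults of `maxfeepercent=0.5` and `exemptfee=5000` msat; the exact boundary handling (`<=` here) is our assumption, so check the `lightning-pay` man page for your version. The function name is hypothetical:

```python
def fee_acceptable(amount_msat: int, fee_msat: int,
                   maxfeepercent: float = 0.5, exemptfee_msat: int = 5000) -> bool:
    """Sketch of the pay plugin's fee limit (hypothetical helper).

    Fees up to exemptfee skip the percentage check; otherwise the fee must
    not exceed maxfeepercent percent of the amount being paid.
    """
    if fee_msat <= exemptfee_msat:
        return True
    return fee_msat * 100 <= amount_msat * maxfeepercent


print(fee_acceptable(100_000, 4000))    # True: below exemptfee
print(fee_acceptable(1_000_000, 6000))  # False: 0.6% > 0.5%
print(fee_acceptable(2_000_000, 6000))  # True: 0.3%
```

Whatever fuzz and riskfactor do to route selection, a route whose fee fails this kind of check is rejected, which is why they do not change what counts as spendable.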

@m-schmoock
Collaborator Author

m-schmoock commented May 30, 2019

In a certain way (depending on the viewpoint), the opposite of spendable (receivable) is accurate: the fill command of the drain plugin is able to fill up our amount to exactly the total minus reserves. That meant the remote got down to just its reserve when the initiator of the payment was not the remote but someone else (in the case of fill, the local node via another outbound channel). Not sure if that makes sense to you (or me :)).

@m-schmoock
Collaborator Author

@rustyrussell What do you think about passing the exceeded amounts into the exception so they can be used by upper layers, as in #2691?

@rustyrussell
Contributor

OK, how's this?

I spent a lot of time fixing spendable_msat; it's not perfect but it's much better than it was. Then I actually fixed the bug you found. Finally, I explained what was happening with your test.

Phew!

@m-schmoock
Collaborator Author

@rustyrussell Wow, nice that you tracked it down so fast and precisely! I was not really able to understand what was going wrong, only that it was going wrong. I will fix up some Travis complaints so we can get this polished and finished quickly. I can then also make my drain plugin work more efficiently :D

Michael

@m-schmoock
Collaborator Author

We still have channeld/test/run-commit_tx failing, plus some valgrind memory complaints ( https://travis-ci.org/ElementsProject/lightning/jobs/539676341 ):

lightning.git (test_drainage_crash)]$ channeld/test/run-commit_tx
local_payment_basepoint: 034f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa
remote_payment_basepoint: 032c0b7cf95324a07d05398b240174dc0c2be444d96b159aa6c7f7b1e668680991
# obscured commitment transaction number = 0x2bb038521914 ^ 42
local_funding_privkey: 30ff4956bbdd3222d44cc5e8a1261dab1e07957bdac5ae88fe3261ef321f37490101
local_funding_pubkey: 023da092f6980e58d2c037173180e9a465476026ee50f96695963e8efe436f54eb
remote_funding_pubkey: 030e9f7b623d2ccc7c9bd44d66d5ce21ce504c0acf6385a132cec6d3c39fa711c1
local_secretkey: bb13b121cdc357cd2e608b0aea294afca36e2b34cf958e2e6451a2f27469449101
localkey: 030d417a46946384f88d5f3337267c5e579765875dc4daca813e21734b140639e7
remotekey: 0394854aa6eab5b2a8122cc726e9dded053a2184d88256816826d6231c068d4a5b
local_htlckey: 030d417a46946384f88d5f3337267c5e579765875dc4daca813e21734b140639e7
remote_htlckey: 0394854aa6eab5b2a8122cc726e9dded053a2184d88256816826d6231c068d4a5b
local_delayedkey: 03fd5960528dc152014952efdb702a88f71e3c1653b2314431701ec77e57fde83c
remote_revocation_key: 0212a140cd0c6539d07cd08dfe09984dec3251ea808b892efeac3ede9402bf2b19
# funding wscript = 5221023da092f6980e58d2c037173180e9a465476026ee50f96695963e8efe436f54eb21030e9f7b623d2ccc7c9bd44d66d5ce21ce504c0acf6385a132cec6d3c39fa711c152ae

name: simple commitment tx with no HTLCs
to_local_msat: 7000000000
to_remote_msat: 3000000000
local_feerate_per_kw: 15000
# base commitment transaction fee = 10860sat
# actual commitment transaction fee = 10860
# to-local amount 6989140sat wscript 63210212a140cd0c6539d07cd08dfe09984dec3251ea808b892efeac3ede9402bf2b1967029000b2752103fd5960528dc152014952efdb702a88f71e3c1653b2314431701ec77e57fde83c68ac
# to-remote amount 3000000sat P2WPKH(0394854aa6eab5b2a8122cc726e9dded053a2184d88256816826d6231c068d4a5b)
remote_signature = 3045022100f51d2e566a70ba740fc5d8c0f07b9b93d2ed741c3c0860c613173de7d39e7968022041376d520e9c0e1ad52248ddf4b22e12be8763007df977253ef45a4ca3bdb7c001
# local_signature = 3044022051b75c73198c6deee1a875871c3961832909acd297c6b908d59e3319e5185a46022055c419379c5051a78d00dbbce11b5b664a0c22815fbcc6fcef6b1937c383693901
output commit_tx: 02000000000101bef67e4e2fb9ddeeb3461973cd4c62abb35050b1add772995b820b584a488489000000000038b02b8002c0c62d0000000000160014ccf1af2f2aabee14bb40fa3851ab2301de84311054a56a00000000002200204adb4e2f00643db396dd120d4e7dc17625f5f2c11a40d857accc862d6b7dd80e0400473044022051b75c73198c6deee1a875871c3961832909acd297c6b908d59e3319e5185a46022055c419379c5051a78d00dbbce11b5b664a0c22815fbcc6fcef6b1937c383693901483045022100f51d2e566a70ba740fc5d8c0f07b9b93d2ed741c3c0860c613173de7d39e7968022041376d520e9c0e1ad52248ddf4b22e12be8763007df977253ef45a4ca3bdb7c001475221023da092f6980e58d2c037173180e9a465476026ee50f96695963e8efe436f54eb21030e9f7b623d2ccc7c9bd44d66d5ce21ce504c0acf6385a132cec6d3c39fa711c152ae3e195220
num_htlcs: 0

name: commitment tx with all 5 htlcs untrimmed (minimum feerate)
to_local_msat: 6988000000
to_remote_msat: 3000000000
local_feerate_per_kw: 0
htlc_is_trimmed called!
Aborted (core dumped)
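The `base commitment transaction fee = 10860sat` line in the run-commit_tx output above follows the BOLT #3 fee formula for the original (pre-anchor-outputs) commitment format: the weight is 724 plus 172 per untrimmed HTLC output, and the fee is `feerate_per_kw * weight / 1000`, rounded down. A minimal sketch; the constant and function names are ours:

```python
BASE_COMMITMENT_WEIGHT = 724  # BOLT #3: commitment tx with no HTLC outputs
PER_HTLC_OUTPUT_WEIGHT = 172  # BOLT #3: added per untrimmed HTLC output


def commitment_fee_sat(feerate_per_kw: int, num_untrimmed_htlcs: int) -> int:
    """Commitment transaction fee in satoshis, rounded down as BOLT #3 specifies."""
    weight = BASE_COMMITMENT_WEIGHT + PER_HTLC_OUTPUT_WEIGHT * num_untrimmed_htlcs
    return feerate_per_kw * weight // 1000


print(commitment_fee_sat(15000, 0))  # 10860, matching the test vector output
print(commitment_fee_sat(15000, 1))  # 13440: one more output costs the funder 2580 sat
```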

@m-schmoock
Collaborator Author

m-schmoock commented May 31, 2019

Is there a way to easily --fixup code that has been moved by subsequent commits? We should add @pytest.mark.xfail(strict=True) to 22e58f0f (test: drains a channel to crash the daemon).

edit: I did it with git rebase -i master, then break and git commit --fixup on top. I don't know if I hijacked the commit history that way.

m-schmoock force-pushed the test_drainage_crash branch from 495cc5e to ee019f8 on May 31, 2019 21:24
@rustyrussell
Contributor

Rebasing is fine: I hit end of day Friday here and had to push what I had. Good point about the test breakage; it just needs common/htlc_trim.o added to the Makefile.

@m-schmoock
Collaborator Author

@cdecker this is now green for tests. I will rebuild drain on this base to further test it. But I think you could have a look at this now.

@m-schmoock
Collaborator Author

@rustyrussell

  • So, LND definitely has the same problem, but without crashing. We should tell @Roasbeef at the next meeting at the latest.
  • I'm currently working with the patchset to find out how it performs in detail; not 100% sure yet.

@rustyrussell
Contributor

It's a choice though; we could choose not to send the final tx which would drive the channel into this state.

rustyrussell added this to the 0.7.1 milestone on Jun 2, 2019
m-schmoock force-pushed the test_drainage_crash branch from 8e0ccfe to 958748a on June 2, 2019 21:29
@m-schmoock
Collaborator Author

@rustyrussell Hm, I made a funny testcase where I initially send just their_reserves_msat + 1 from l1 to l2. After that, l2 told me via RPC that it now has 1msat spendable. Is this correct, or should spendable remain at 0msat at l2 until enough has been sent that HTLC fees can be covered for the way back?

@rustyrussell
Copy link
Contributor

Yes, spendable_msat still isn't perfect, especially in corner cases. But to be honest, the complexity of calculating that field is getting out of hand; it now really belongs in a plugin.

I had to stop somewhere :)

m-schmoock force-pushed the test_drainage_crash branch from 958748a to ce0f903 on June 3, 2019 12:19
@m-schmoock
Collaborator Author

m-schmoock commented Jun 3, 2019

@rustyrussell

  • Since 'Capacity exceeded' when using spendable_msat at first hop can still happen, can you consider passing the exceeded amount to the local exception text ( Pass htlc amount exceeded to exception #2691 ). I can finish this PR if you are okay with this procedure. I don't understand why/when spendable seems to account for HTLC and sometimes not, maybe you can explain or is my observation faulty?
  • Or is there a way for a plugin to calculate the current required HTLC fees?
  • Can you elaborate on the new meaning of spendable_msat: what is accounted for in which situation, and what has to be accounted for by a plugin? It seems to me that when I have a full channel, spendable_msat accounts for HTLC fees, and when it's low, it does not...
  • Can you elaborate on the new listchannels fields htlc_minimum_msat and htlc_maximum_msat? When I have a channel in both directions, it seems the htlc_maximum_msat for the receiving direction does not account for channel reserves. Is this correct?

@rustyrussell
Contributor

@rustyrussell

* Since 'Capacity exceeded' when using `spendable_msat` at first hop can still happen, can you consider passing the exceeded amount to the local exception text ( #2691 ). I can finish this PR if you are okay with this procedure.

Yes, I think adding that feedback for local failures is an excellent idea.

I don't understand why/when spendable seems to account for HTLC and sometimes not, maybe you can explain or is my observation faulty?

If the amount would create a new output (i.e. the HTLC output is not trimmed), the funder needs to pay more onchain fees. This means there's a corner case if the funder can't afford that: we can't send an HTLC which would not be trimmed.
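A minimal sketch of that trimming rule, following BOLT #3 for the original (pre-anchor-outputs) commitment format: on the commitment owner's transaction, an HTLC is trimmed if its amount is below the owner's `dust_limit_satoshis` plus the fee of the second-stage transaction that would spend it (HTLC-timeout, weight 663, for offered HTLCs; HTLC-success, weight 703, for received ones). Only an untrimmed HTLC adds an output, and therefore extra fee for the funder. The function names are hypothetical, and the 546 sat dust limit and 15000 sat/kw feerate in the usage lines are illustrative values, not the ones from this test:

```python
HTLC_TIMEOUT_WEIGHT = 663  # BOLT #3: second-stage tx for an offered HTLC
HTLC_SUCCESS_WEIGHT = 703  # BOLT #3: second-stage tx for a received HTLC
BASE_COMMITMENT_WEIGHT = 724
PER_HTLC_OUTPUT_WEIGHT = 172


def htlc_is_trimmed(amount_msat: int, dust_limit_sat: int,
                    feerate_per_kw: int, offered: bool) -> bool:
    """True if the HTLC gets no output on the owner's commitment tx."""
    weight = HTLC_TIMEOUT_WEIGHT if offered else HTLC_SUCCESS_WEIGHT
    second_stage_fee_sat = feerate_per_kw * weight // 1000
    return amount_msat < (dust_limit_sat + second_stage_fee_sat) * 1000


def extra_funder_fee_sat(amount_msat: int, dust_limit_sat: int, feerate_per_kw: int,
                         offered: bool, num_untrimmed_htlcs: int) -> int:
    """Additional commitment fee the funder pays if this HTLC is added."""
    if htlc_is_trimmed(amount_msat, dust_limit_sat, feerate_per_kw, offered):
        return 0  # no new output, so no extra fee

    def fee(n: int) -> int:
        return feerate_per_kw * (BASE_COMMITMENT_WEIGHT + PER_HTLC_OUTPUT_WEIGHT * n) // 1000

    return fee(num_untrimmed_htlcs + 1) - fee(num_untrimmed_htlcs)


print(htlc_is_trimmed(10_000_000, 546, 15000, offered=True))  # True
print(extra_funder_fee_sat(100_000_000, 546, 15000, True, 0))  # 2580
```

If the funder's balance above its reserve is smaller than that extra fee, only HTLCs small enough to be trimmed can still be added, which is the corner case described above.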

* Or is there a way for a plugin to calculate the current required HTLC fees?

A plugin could do everything we're doing here, basically, but I'm not proposing a plugin. What I'm saying is that if we didn't have the spendable_msat field already, I would not add all this code and would tell people to write a plugin.

* Can you elaborate on the new meaning of `spendable_msat` what is accounted for in what situation and what has to be accounted for by a plugin? It seems to me that when I have a full channel,  `spendable_msat` accounts for HTLC fees, when its low, it does not...

There are no HTLC fees if the HTLC is small enough to be trimmed.

* Can you elaborate on the new `listchannels` fields `htlc_minimum_msat` and `htlc_maximum_msat`? When I have a channel in both directions, it seems the `htlc_maximum_msat` for the receiving direction does not account for channel reserves. Is this correct?

That's whatever the peer tells us in the channel_update message. The max field is optional, so if the peer doesn't tell us anything we just use the total channel size (we have no idea what their reserve is).

m-schmoock force-pushed the test_drainage_crash branch from ce0f903 to c6bd7fe on June 4, 2019 08:59
@m-schmoock
Collaborator Author

m-schmoock commented Jun 4, 2019

@rustyrussell Okay, thanks for all the clarification. We can consider this PR ready, as it fixes the crash, improves route finding with the correct amount, and adds tests. I did another rebase to resolve conflicts; let's wait for Travis, and we need a review from @cdecker.

We can finish #2691 in the aftermath.

Update: We have a test that does not complete after the rebase: tests/test_plugin.py::test_htlc_accepted_hook_fail (update: unrelated, addressed separately)

m-schmoock force-pushed the test_drainage_crash branch from c6bd7fe to 6ebcf68 on June 6, 2019 07:07
@m-schmoock
Collaborator Author

m-schmoock commented Jun 6, 2019

Update: rebased again; now tests/test_pay.py::test_forward_local_failed_stats seems to time out or crash something, though not always on my local machine either... ¯\_(°-°)_/¯ unsure if related

m-schmoock force-pushed the test_drainage_crash branch 3 times, most recently from f8ef455 to cc9e115 on June 9, 2019 10:33
@rustyrussell
Contributor

Consider this?

diff --git a/channeld/full_channel.c b/channeld/full_channel.c
index c6aadc3a3..ee14eb310 100644
--- a/channeld/full_channel.c
+++ b/channeld/full_channel.c
@@ -540,7 +543,10 @@ static enum channel_add_err add_htlc(struct channel *channel,
 					    adding,
 					    removing,
 					    channel->funder);
-			if (htlc_fee) *htlc_fee = fee;  /* set fee output pointer if given */
+			/* set fee output pointer if given (they want max) */
+			if (htlc_fee && amount_sat_greater(fee, *htlc_fee))
+				*htlc_fee = fee;
+
 			if (amount_msat_less_sat(balance, fee)) {
 				status_trace("Funder could not afford own fee %s with %s above reserve",
 					     type_to_string(tmpctx,
@@ -556,7 +562,9 @@ static enum channel_add_err add_htlc(struct channel *channel,
 					    adding,
 					    removing,
 					    !channel->funder);
-			if (htlc_fee) *htlc_fee = fee;  /* set fee output pointer if given */
+			/* set fee output pointer if given (they want max) */
+			if (htlc_fee && amount_sat_greater(fee, *htlc_fee))
+				*htlc_fee = fee;
 			if (amount_msat_less_sat(balance, fee)) {
 				status_trace("Funder could not afford peer's fee %s with %s above reserve",
 					     type_to_string(tmpctx,

Member

cdecker left a comment

Wow, that was a lot to unpack, thanks to @rustyrussell for the detailed analysis, and the fixes 👍

ACK cc9e115

spendable_l2_bak = spendable_l2
while spendable_l1_bak == spendable_l1 or spendable_l2_bak == spendable_l2:
    spendable_l1 = l1.rpc.listpeers()['peers'][0]['channels'][0]['spendable_msat']
    spendable_l2 = l2.rpc.listpeers()['peers'][0]['channels'][0]['spendable_msat']
Member

Might want to reduce the RPC pressure by using wait_for here.
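A sketch of the `wait_for` suggestion applied to the loop above: poll a predicate until both `spendable_msat` values have changed, backing off between polls and giving up after a timeout, instead of calling the RPC in a tight loop. The c-lightning test suite has its own `wait_for` helper; the local function below is only a stand-in with a similar shape so the example runs on its own, and `wait_spendable_changed` is a hypothetical name:

```python
import time
from typing import Callable


def wait_for(success: Callable[[], bool], timeout: float = 10.0) -> None:
    """Stand-in for the test suite's helper: poll with exponential back-off."""
    deadline = time.time() + timeout
    interval = 0.01
    while not success():
        remaining = deadline - time.time()
        if remaining <= 0:
            raise ValueError("Timeout while waiting for {}".format(success))
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, 1.0)


def wait_spendable_changed(get_l1: Callable[[], int], get_l2: Callable[[], int],
                           before_l1: int, before_l2: int,
                           timeout: float = 10.0) -> None:
    # Same condition as the loop: keep waiting while either value is unchanged.
    wait_for(lambda: get_l1() != before_l1 and get_l2() != before_l2, timeout)
```

In the test itself, `get_l1` would be something like `lambda: l1.rpc.listpeers()['peers'][0]['channels'][0]['spendable_msat']`, and likewise for l2.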

Collaborator Author

Yes, but it's not in the final test after the refactor. The following commit fixes it:

1aea063 pytest: extract separate tests that spendable_msat is accurate.

If you still insist I can rewrite the history (test has been moved also).

@m-schmoock
Collaborator Author

Okay, I will do another update with the feedback. Hopefully we can get this merged.

m-schmoock and others added 11 commits June 11, 2019 14:34
This is where payment tests should go.  Also mark it xfail for the moment,
and remove developer-only tag (propagating gossip is only 60 seconds, which
is OK).

Signed-off-by: Rusty Russell <[email protected]>
Turns out we needed more comprehensive testing; we ended up with three
separate tests.  To avoid changing test_channel_drainage as we fix
spendable_msat, I substituted raw numbers there.

The first is a variation of the existing tests, testing we can't
exceed spendable_msat, and we can pay it, both ways.

The second is with a larger amount, which triggers a different problem.

The final is with a giant channel, which tests our 2^32-1 msat cap.

Signed-off-by: Rusty Russell <[email protected]>
Take into account the fee we'd have to pay if we're the funder, and
also drop to 0 if the amount is less than the smallest HTLC the peer
will accept.

Signed-off-by: Rusty Russell <[email protected]>
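A rough sketch of the rule this commit describes: start from our balance above the reserve the peer requires of us, subtract the commitment fee we would owe as funder once the HTLC is added, drop to zero if what remains is below the peer's `htlc_minimum_msat`, and cap at the 2^32-1 msat limit exercised by the giant-channel test. The function and parameter names are hypothetical, and the sketch ignores the trimmed/untrimmed subtleties discussed earlier; it is not c-lightning's actual implementation:

```python
MAX_HTLC_MSAT = 2**32 - 1  # per-HTLC cap exercised by the giant-channel test


def spendable_msat(balance_msat: int, reserve_sat: int, we_are_funder: bool,
                   commit_fee_sat: int, peer_htlc_minimum_msat: int) -> int:
    """Sketch: how much we could send in a single HTLC right now (hypothetical)."""
    spendable = balance_msat - reserve_sat * 1000
    if we_are_funder:
        # As funder we must also pay the commitment fee including the new HTLC.
        spendable -= commit_fee_sat * 1000
    if spendable < peer_htlc_minimum_msat:
        return 0  # the peer would reject anything we could still afford
    return min(spendable, MAX_HTLC_MSAT)


print(spendable_msat(1_000_000_000, 10_000, True, 13_440, 1000))  # 976560000
```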
This means there's now a semantic difference between the default `fromid`
and setting `fromid` explicitly to our own node_id.  In the default case,
it means we don't charge ourselves fees on the route.

This means we can spend the full channel balance.

We still want to consider the pricing of local channels, however:
there's a *reason* to discount one over another, and that is to bias
things.  So we add the first-hop fee to the *risk* value instead.

Signed-off-by: Rusty Russell <[email protected]>
The current calculation ignores them, which is unrealistic.

Signed-off-by: Rusty Russell <[email protected]>
Subtracting both arbitrarily reduces our capacity, even for ourselves
since the routing logic uses this maximum.

I also changed 'advertise' to 'advertize', since we use American
spelling.

Signed-off-by: Rusty Russell <[email protected]>
m-schmoock force-pushed the test_drainage_crash branch from cc9e115 to 6c222bc on June 11, 2019 12:45
We track whether each change is affordable as we go;
test_channel_drainage got us so close that the difference mattered; we
hit an assert when we tried to commit the tx and realized we couldn't
afford it.

We should not be trying to add an HTLC if it will result in the funder
being unable to afford it on either the local *or remote* commitments.

Note the test still "fails" because it refuses to send the final
payment.

Signed-off-by: Rusty Russell <[email protected]>
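The fix in this commit comes down to checking affordability on *both* commitment transactions before adding the HTLC, and (as in the diff Rusty proposed above) reporting the larger of the two fees. A minimal sketch under those assumptions; the `CommitmentView` type and `check_add_htlc` function are hypothetical and much simpler than `add_htlc()` in channeld/full_channel.c:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CommitmentView:
    """Funder's balance above reserve, and the fee due if the HTLC is added."""
    funder_balance_above_reserve_msat: int
    fee_with_htlc_sat: int


def check_add_htlc(local: CommitmentView, remote: CommitmentView) -> Tuple[bool, int]:
    """Refuse the HTLC unless the funder can pay the fee on both commitments.

    Returns (ok, max_fee_sat); keeping the maximum mirrors the diff above,
    which only overwrites *htlc_fee when the new fee is greater.
    """
    max_fee_sat = max(local.fee_with_htlc_sat, remote.fee_with_htlc_sat)
    for view in (local, remote):
        if view.funder_balance_above_reserve_msat < view.fee_with_htlc_sat * 1000:
            return False, max_fee_sat
    return True, max_fee_sat


# Affordable on the local commitment but not on the remote one: refuse.
print(check_add_htlc(CommitmentView(20_000_000, 13_440),
                     CommitmentView(13_000_000, 13_440)))  # (False, 13440)
```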
Remove gratuitous prints, add explanations of what's going on,
and demonstrate that we can add a final trimmed HTLC but not
a non-trimmed one.

Signed-off-by: Rusty Russell <[email protected]>
m-schmoock force-pushed the test_drainage_crash branch from 6c222bc to c69beb6 on June 11, 2019 12:51
@cdecker
Member

cdecker commented Jun 11, 2019

ACK c69beb6

Will wait for @rustyrussell to give it one more round and let him merge 👍

rustyrussell merged commit db22d2b into ElementsProject:master on Jun 11, 2019
m-schmoock deleted the test_drainage_crash branch on June 18, 2019 16:09
4 participants