Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Own implementation of Bech32::u5 type, upgrade of bech32 dependency #3201

Closed
wants to merge 2 commits into from

Conversation

optout21
Copy link
Contributor

@optout21 optout21 commented Jul 23, 2024

Own implementation of u5 type and conversions.
Upgrade of bech32 dependency, from 0.9.1 to 0.11.0.

Fixes #3176 and #3195 . Obsoletes #3181 .

Changes summary:

  • Introduced lightning::util::bech32::u5 type, for holding a 32-bit value, taken over from bech32 v0.9.1 (This type is no longer available in bech32` crate)
  • It has simple type conversions (from/to u8), Bech32 character encoding/decoding (from/to char), ...
  • ... unpacking from packed bytes and packing to bytes (iterative), traits for conversions.
  • External crate bech32 is still used directly in a few places; it has been upgraded to 0.11.0.

More details in #3195 .

Potential further improvements:

  • Encapsulate checksum functionality into util/bech32.rs, by calling into external bech32 implementation
  • Add HRP decoding/encoding functionality to bech32/mod.rs, by calling into external crate, or better, by own implementation (simple string parsing / concatenation)
  • Move tagged parsing logic to bech32/mod.rs
  • Add test vectors from lightning invoice spec (BOLT11)
  • Isolate bech32 functionality into a standalone crate (e.g. bech32-ligthning).
  • Remove dependency to external crate completely

TODO:

  • fix fuzz target
  • fmt
  • squash

Copy link

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 94.52888% with 18 lines in your changes missing coverage. Please review.

Project coverage is 91.36%. Comparing base (78c0eaa) to head (c8279fe).
Report is 206 commits behind head on main.

Files Patch % Lines
lightning/src/util/bech32.rs 97.09% 8 Missing ⚠️
lightning-invoice/src/de.rs 84.00% 2 Missing and 2 partials ⚠️
lightning/src/offers/parse.rs 63.63% 2 Missing and 2 partials ⚠️
lightning-invoice/src/ser.rs 83.33% 0 Missing and 1 partial ⚠️
lightning/src/util/test_utils.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3201      +/-   ##
==========================================
+ Coverage   89.80%   91.36%   +1.56%     
==========================================
  Files         121      123       +2     
  Lines      100045   113913   +13868     
  Branches   100045   113913   +13868     
==========================================
+ Hits        89845   104078   +14233     
+ Misses       7533     7289     -244     
+ Partials     2667     2546     -121     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@optout21 optout21 force-pushed the bech32-own branch 5 times, most recently from ae1b49d to dd02b21 Compare July 23, 2024 15:14
@arik-so
Copy link
Contributor

arik-so commented Jul 24, 2024

Also, do you think this could possibly be split up into two commits? First the introduction of the new type, and then switching to it? Or would it cause too much havoc with check_commits and needing to add and then remove allow_unused?

@arik-so arik-so added this to the 0.0.124 milestone Jul 24, 2024
@optout21
Copy link
Contributor Author

Also, do you think this could possibly be split up into two commits? First the introduction of the new type, and then switching to it? Or would it cause too much havoc with check_commits and needing to add and then remove allow_unused?

It is possible to split up into two commits, it helps review, but check_commits will probably complain, and it should be squashed before merging. Or did you have in mind 2 separate PRs? Also, what would be the benefit? Since the addition is in a new file, it is quite easy to separate during review.

@optout21
Copy link
Contributor Author

It is possible to split up into two commits, ...

Actually, it's not really possible to separate the external upgrade and the new own implementation:

  • if external crate is upgraded, code has to be adopted throughout, which does require the new own implementation
  • if the old version of the external crate is kept, the new implementation has to be adapted to the new version, which does not make much sense.

@optout21 optout21 force-pushed the bech32-own branch 3 times, most recently from ee27f1a to a4e40fc Compare July 25, 2024 15:54
@optout21 optout21 marked this pull request as ready for review July 25, 2024 16:59
@tnull tnull mentioned this pull request Jul 30, 2024
@apoelstra
Copy link

I'm a little confused about the motivation for this PR. Upstream you complained that Fe32 did not have Ord or Default, but if I remove those impls from your u5 nothing fails to compile in lightning except your test for Default being zero. It appears that you use these impls in lightning-invoice within ser.rs in a couple places:

One is to implement extend_from_stretch generically over T: Default. But actually you only call this function with u8 and u5, and when you call it with u5 the semantics are slightly different (in that you have this Optional return but you always call .expect on it). A short-diff fix for this is

diff --git a/lightning-invoice/src/ser.rs b/lightning-invoice/src/ser.rs
index 7317b5cc3..4e9bce317 100644
--- a/lightning-invoice/src/ser.rs
+++ b/lightning-invoice/src/ser.rs
@@ -177,6 +177,22 @@ impl Display for SiPrefix {
 	}
 }
 
+fn encode_int_be_base32_fixed(int: u64, size: usize) -> Vec<u5> {
+	let base = 32u64;
+
+	let mut out_vec = Vec::<u5>::new();
+
+	let mut rem_int = int;
+	for _ in 0..size {
+		out_vec.push(u5::try_from_u8((rem_int % base) as u8).expect("always <32"));
+		rem_int /= base;
+	}
+	assert_eq!(rem_int, 0, "input {} did not fit into {} u5 words (remainder {})", int, size, rem_int);
+
+	out_vec.reverse();
+	out_vec
+}
+
 fn encode_int_be_base32(int: u64) -> Vec<u5> {
 	let base = 32u64;
 
@@ -250,10 +266,7 @@ impl ToBase32 for RawDataPart {
 impl ToBase32 for PositiveTimestamp {
 	fn write_base32<W: WriteBase32>(&self, writer: &mut W) -> Result<(), <W as WriteBase32>::Err> {
 		// FIXME: use writer for int encoding
-		writer.write(
-			&try_stretch(encode_int_be_base32(self.as_unix_timestamp()), 7)
-				.expect("Can't be longer due than 7 u5s due to timestamp bounds")
-		)
+		writer.write(&encode_int_be_base32_fixed(self.as_unix_timestamp() , 7))
 	}
 }
 
@@ -415,10 +428,12 @@ impl ToBase32 for TaggedField {
 			assert!(len < 1024, "Every tagged field data can be at most 1023 bytes long.");
 
 			writer.write_u5(u5::try_from_u8(tag).expect("invalid tag, not in 0..32"))?;
-			writer.write(&try_stretch(
-				encode_int_be_base32(len as u64),
-				2
-			).expect("Can't be longer than 2, see assert above."))?;
+			writer.write(&encode_int_be_base32_fixed(len as u64, 2))?;
+
+			writer.write(&[
+                            u5::try_from_u8((len / 32) as u8).unwrap(),
+                            u5::try_from_u8((len % 32) as u8).unwrap(),
+                        ])?;
 			payload.write_base32(writer)
 		}

But of course a proper fix would stop using Vec unnecessarily and get rid of these unwraps. For example you could use the upstream ByteIterExt trait then call uint64.to_be_bytes().iter().copied().bytes_to_fes() and then copy the output of that into a fixed-size array. (Or just collect if you want to keep the Vec and address it later, which might be easier to review.)

Then for Ord, it seems this is needed to derive Ord for RawTaggedField. But this is a two-variant enum, and while annoying, it is very easy to manually impl Ord for this:

diff --git a/lightning-invoice/src/lib.rs b/lightning-invoice/src/lib.rs
index 063ae8f40..e6fc5b1bc 100644
--- a/lightning-invoice/src/lib.rs
+++ b/lightning-invoice/src/lib.rs
@@ -419,7 +419,7 @@ impl From<Currency> for Network {
 /// Tagged field which may have an unknown tag
 ///
 /// This is not exported to bindings users as we don't currently support TaggedField
-#[derive(Clone, Debug, Hash, Eq, PartialEq, Ord, PartialOrd)]
+#[derive(Clone, Debug, Hash, Eq, PartialEq)]
 pub enum RawTaggedField {
 	/// Parsed tagged field with known tag
 	KnownSemantics(TaggedField),
@@ -427,6 +427,23 @@ pub enum RawTaggedField {
 	UnknownSemantics(Vec<u5>),
 }
 
+impl PartialOrd for RawTaggedField {
+    fn partial_cmp(&self, other: &Self) -> Option<core::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Ord for RawTaggedField {
+    fn cmp(&self, other: &Self) -> core::cmp::Ordering {
+        match (self, other) {
+            (RawTaggedField::KnownSemantics(..), RawTaggedField::UnknownSemantics(..)) => core::cmp::Ordering::Less,
+            (RawTaggedField::UnknownSemantics(..), RawTaggedField::KnownSemantics(..)) => core::cmp::Ordering::Greater,
+            (RawTaggedField::KnownSemantics(ref a), RawTaggedField::KnownSemantics(ref b)) => a.cmp(b),
+            (RawTaggedField::UnknownSemantics(ref a), RawTaggedField::UnknownSemantics(ref b)) => a.iter().map(|a| a.as_u8()).cmp(b.iter().map(|b| b.as_u8())),
+        }
+    }
+}
+
 /// Tagged field with known tag
 ///
 /// For descriptions of the enum values please refer to the enclosed type's docs.

And both of these changes could be done in commits independent of and prior to any bech32 changes, meaning that their review can be isolated from the rest of your changes.

Also, even if you did want to create your own u5 type, why reimplement things like conversion of u5strings to bytestrings when those exist upstream and can be easily implemented in terms of Fe32? Similar for the char conversions.

@optout21
Copy link
Contributor Author

optout21 commented Aug 9, 2024

Thanks for looking into this in detail!

You are right that the Default requirement could worked around, but I think the Default trait makes sense for a numerical value, and it doesn't degrade code quality.

For Ord: correct, it is needed for the tagged values in lightning invoices, for the unparsed parts (UnknownSemantics). Ord and PartialOrd could be implmented as you suggest, but even in your implementation it converts to u8 and does the comparison. I think it's just simpler if u5 natively support it. (My -- now discarded -- solution was to wrap Fe32 in a struct with Ord implemented, and use that inside UnknownSemantics.)

If my goal was to keep using Fe32 these suggestions would make it possible.

Also, even if you did want to create your own u5 type, why reimplement things like conversion of u5strings to bytestrings when those exist upstream and can be easily implemented in terms of Fe32? Similar for the char conversions.

You are right, and I had a version like that. However, after weighing the conversions and adaptations needed vs. the code/logic to be duplicated, I arrived at the conclusion that it's easier to essentially duplicate some logic in the way suitable for the LDK project. Lightning invoice parsing is a basic operation for LDK, and I think it makes sense to have a well separated, minimal dependency module within LDK for lightning parsing, I'd like to move into that direction. In general I try to achieve high code reuse, but in this case maybe it's more practical to go towards more self-sufficient solution with no external dependency.

I have written up my concerns in the issue #3195 .

But I'm listening to your opinion as well. I may revisit the version without code duplication and check the shortcoming again ( #3181 )

@apoelstra
Copy link

For Ord: correct, it is needed for the tagged values in lightning invoices, for the unparsed parts (UnknownSemantics). Ord and PartialOrd could be implmented as you suggest, but even in your implementation it converts to u8 and does the comparison. I think it's just simpler if u5 natively support it.

My suggestion does not convert to u8 anywhere.

And no, it is not "simpler" to implement a nonsensical ordering on a field element when there is a standard notion of "ordered field" which does not apply to the field in question.

this case maybe it's more practical to go towards more self-sufficient solution with no external dependency.

You intend to write your own bech32 checksum encoding and verification code? I will switch to the other issue to avoid splitting the conversation but this is a dramatic failure of the bech32 API if it is unable to do basic checksumming.

@optout21 optout21 marked this pull request as draft August 12, 2024 08:29
@optout21
Copy link
Contributor Author

Put back to daft for now, depends on #3234, and it is still open whether to go the route of #3181 .

@optout21
Copy link
Contributor Author

Pivoted back to reusing rust-bech32 as much as possible, now in #3244 (following #3234 ).

@optout21 optout21 closed this Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade bech32 dependency
3 participants