Megular Expressions #263

Open: MarcoPolo wants to merge 9 commits into master

@MarcoPolo (Contributor) commented Jan 16, 2025

A very simple regular expression matcher for Multiaddr components. It supports capturing values and matches in linear time (no backtracking).

The core logic is about 100 LOC. The sugar to make it nicer to use is about another 100 LOC.
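
A toy sketch of the technique described above, assuming protocol-name strings stand in for Multiaddr components and leaving captures out. Each pattern is a function that builds an NFA state continuing at next (the same shape as the Optional snippet reviewed in this PR), and matching advances every live state in lockstep, so no path is ever retried. MatchState, Val, Or, Optional, ZeroOrMore, Compile, and Match here are hypothetical re-creations for illustration, not the meg package's code.

```go
package main

type stateKind int

const (
	matchProto stateKind = iota // consume one component whose protocol equals proto
	split                       // epsilon branch to both next and nextSplit
	done                        // accepting state
)

// MatchState is one NFA node (hypothetical, not meg's type).
type MatchState struct {
	kind      stateKind
	proto     string
	next      *MatchState
	nextSplit *MatchState
}

// Pattern builds a state that matches the pattern and then continues at next.
type Pattern func(next *MatchState) *MatchState

// Val matches exactly one component with the given protocol name.
func Val(proto string) Pattern {
	return func(next *MatchState) *MatchState {
		return &MatchState{kind: matchProto, proto: proto, next: next}
	}
}

// Or branches to either alternative; both continue at the same next state.
func Or(a, b Pattern) Pattern {
	return func(next *MatchState) *MatchState {
		return &MatchState{kind: split, next: a(next), nextSplit: b(next)}
	}
}

// Optional either matches p or skips straight to next.
func Optional(p Pattern) Pattern {
	return func(next *MatchState) *MatchState {
		return &MatchState{kind: split, next: p(next), nextSplit: next}
	}
}

// ZeroOrMore returns to its own split state after each repetition of p.
func ZeroOrMore(p Pattern) Pattern {
	return func(next *MatchState) *MatchState {
		s := &MatchState{kind: split, nextSplit: next}
		s.next = p(s)
		return s
	}
}

// Compile chains the patterns right to left, ending in the accepting state.
func Compile(ps ...Pattern) *MatchState {
	s := &MatchState{kind: done}
	for i := len(ps) - 1; i >= 0; i-- {
		s = ps[i](s)
	}
	return s
}

// addState adds s plus everything reachable through split edges; the set
// doubles as a visited set, so split cycles cannot recurse forever.
func addState(set map[*MatchState]bool, s *MatchState) {
	if set[s] {
		return
	}
	set[s] = true
	if s.kind == split {
		addState(set, s.next)
		addState(set, s.nextSplit)
	}
}

// Match advances all live states together over the input; no backtracking.
func Match(start *MatchState, protos []string) bool {
	current := map[*MatchState]bool{}
	addState(current, start)
	for _, proto := range protos {
		next := map[*MatchState]bool{}
		for s := range current {
			if s.kind == matchProto && s.proto == proto {
				addState(next, s.next)
			}
		}
		current = next
	}
	for s := range current {
		if s.kind == done {
			return true
		}
	}
	return false
}
```

Each input component is checked against each live state at most once, so the cost is O(len(input) × number of states): linear in the input for a fixed pattern.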

Motivation

If we are going to treat Multiaddrs as an encoding, parsing them needs to be more ergonomic. Right now we have many hand-written parsers built on ForEach, several of which are subtly wrong. This package should be able to replace them, making them cleaner and, most importantly, obviously correct.

In draft while I try this out in go-libp2p.

Example

Parsing a WebTransport Multiaddr m.

var dnsName string
var ip4Addr string
var ip6Addr string
var udpPort string
var certHashesStr []string
matched, err := m.Match(
  meg.Or(
    meg.CaptureVal(ma.P_IP4, &ip4Addr),
    meg.CaptureVal(ma.P_IP6, &ip6Addr),
    meg.CaptureVal(ma.P_DNS4, &dnsName),
    meg.CaptureVal(ma.P_DNS6, &dnsName),
    meg.CaptureVal(ma.P_DNS, &dnsName),
  ),
  meg.CaptureVal(ma.P_UDP, &udpPort),
  meg.Val(ma.P_QUIC_V1),
  meg.Optional(
    meg.CaptureVal(ma.P_SNI, &wtAddr.sni),
  ),
  meg.Val(ma.P_WEBTRANSPORT),
  meg.CaptureZeroOrMore(ma.P_CERTHASH, &certHashesStr),
)
if err != nil {
  return webtransportAddr{}, err
}
if !matched {
  return webtransportAddr{}, errNotQUICAddr
}
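
Read as a regular expression over protocol names, the pattern above accepts (ip4|ip6|dns4|dns6|dns)/udp/quic-v1(/sni)?/webtransport(/certhash)*. The sketch below checks only that shape with Go's standard regexp package, which uses RE2 semantics and so also matches in linear time without backtracking. looksLikeWebTransport is a hypothetical helper: unlike the meg pattern it captures no values, and it takes a slice of protocol names rather than a Multiaddr.

```go
package main

import (
	"regexp"
	"strings"
)

// webtransportShape mirrors the structure of the meg pattern, written over
// protocol names joined with "/".
var webtransportShape = regexp.MustCompile(
	`^/(ip4|ip6|dns4|dns6|dns)/udp/quic-v1(/sni)?/webtransport(/certhash)*$`)

// looksLikeWebTransport checks only the sequence of protocol names,
// not the component values.
func looksLikeWebTransport(protoNames []string) bool {
	return webtransportShape.MatchString("/" + strings.Join(protoNames, "/"))
}
```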

@MarcoPolo requested a review from sukunrt January 16, 2025 21:57
@MarcoPolo force-pushed the marco/match-and-capture branch 2 times, most recently from 3fbfcdd to 126f9ef, January 20, 2025 18:13
@MarcoPolo marked this pull request as ready for review January 21, 2025 20:04
@sukunrt (Member) left a comment

This looks very useful to me, modulo comments. See how much nicer the IsWebRTCDirectMultiaddr method is now compared to master.

I think we should mark this package Experimental / Alpha and start using it in go-libp2p. Once we have some experience with it, we can remove the Experimental tag; until then, we can keep iterating toward a satisfactory API.

meg/sugar.go (outdated)
import (
"errors"
)

Member

I like the API. Some other things I'd like are:

* Any, which matches any protocol.
* Matching the pattern somewhere in the middle, not necessarily at the start.
* Any()*, which matches everything from that point onward, so we can match IP, TCP, port, and then the rest.

Contributor Author

We have Any now. Here's an example that captures an IP address and port and then matches Any*:

found, err := m.Match(
    CaptureAddrPort(&network, &addrPort),
    meg.ZeroOrMore(meg.Any),
)
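
A toy sketch of what Any and ZeroOrMore(Any) add, assuming a simplified matcher over protocol names; elem, match, val, and anyStar are hypothetical names, not the meg API. An empty proto stands in for Any, a star element stands in for ZeroOrMore(Any), and prefixing a pattern with Any* gives the "match somewhere in the middle" behavior requested earlier in this thread. The matcher keeps the set of reachable input positions instead of backtracking.

```go
package main

// elem is one step of a toy pattern over protocol names.
type elem struct {
	proto string // protocol name to match; "" matches any protocol (Any)
	star  bool   // if true, match zero or more consecutive components
}

func (e elem) accepts(proto string) bool {
	return e.proto == "" || e.proto == proto
}

// match tracks every input position reachable after each pattern step.
func match(pattern []elem, input []string) bool {
	positions := map[int]bool{0: true}
	for _, e := range pattern {
		next := map[int]bool{}
		for pos := range positions {
			if e.star {
				next[pos] = true // zero repetitions
				for p := pos; p < len(input) && e.accepts(input[p]); p++ {
					next[p+1] = true // one more repetition
				}
			} else if pos < len(input) && e.accepts(input[pos]) {
				next[pos+1] = true
			}
		}
		positions = next
	}
	return positions[len(input)]
}

// val matches one component with the given protocol name.
func val(proto string) elem { return elem{proto: proto} }

// anyStar plays the role of ZeroOrMore(Any).
var anyStar = elem{star: true}
```

For example, []elem{val("ip4"), val("tcp"), anyStar} accepts ip4/tcp with or without trailing components, and []elem{anyStar, val("tcp"), anyStar} accepts any address containing a tcp component.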

meg/meg.go (outdated)
Comment on lines 82 to 83
if s.capture != nil {
cm[s.capture] = append(cm[s.capture], c.Value())

Member

I think it's more powerful to capture the entire Component rather than just its Value. That way, some callers won't have to convert from the string form back to the Component form.
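
A minimal sketch of the difference, assuming a hypothetical Component type that carries a protocol name, a string value, and a raw byte form (the real go-multiaddr Component type differs): capturing only the value discards the binary form, while capturing the whole component keeps both.

```go
package main

// Component is a hypothetical stand-in for a multiaddr component.
type Component struct {
	Protocol string // e.g. "certhash"
	Value    string // human-readable form
	Raw      []byte // binary form
}

// captureValues keeps only the string form; a caller that needs the
// binary form must parse the string again.
func captureValues(cs []Component, proto string) []string {
	var out []string
	for _, c := range cs {
		if c.Protocol == proto {
			out = append(out, c.Value)
		}
	}
	return out
}

// captureComponents keeps the whole component, so Value and Raw are both
// available without any conversion.
func captureComponents(cs []Component, proto string) []Component {
	var out []Component
	for _, c := range cs {
		if c.Protocol == proto {
			out = append(out, c)
		}
	}
	return out
}
```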

Contributor Author

This is done in #269

util.go (outdated)
Comment on lines 128 to 129
func (m Multiaddr) Match(p ...meg.Pattern) (bool, error) {
s := meg.PatternToMatchState(p...)
return meg.Match(s, m)
}

Member

Is there a reason to make this a method as opposed to:

MatchPattern(m Multiaddr, p ...meg.Pattern)

Contributor Author

No reason. I think it's a bit more ergonomic as a method, but I don't have a strong preference.

Member

I expect most users won't want to run s := meg.PatternToMatchState(p...) every time.

Contributor Author

I don't follow. They would just do:

var m Multiaddr
// ...
m.Match(<some pattern>)

There's no direct use of meg.PatternToMatchState.
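
The per-call cost in this thread mirrors a familiar Go idiom. As an analogy only, with the standard regexp package standing in for meg: compiling a pattern does more work than a single small match, so the usual shape is to compile once at package level and reuse the result. matchPrecompiled, matchCompilingEachCall, and joinProtos are hypothetical helpers.

```go
package main

import (
	"regexp"
	"strings"
)

// tcpShape is compiled once, when the package is initialized.
var tcpShape = regexp.MustCompile(`^/(ip4|ip6)/tcp$`)

// joinProtos renders protocol names as "/ip4/tcp" so a regexp can scan them.
func joinProtos(protoNames []string) string {
	return "/" + strings.Join(protoNames, "/")
}

// matchPrecompiled reuses the compiled pattern on every call.
func matchPrecompiled(protoNames []string) bool {
	return tcpShape.MatchString(joinProtos(protoNames))
}

// matchCompilingEachCall recompiles on every call, similar in spirit to a
// Match method that converts its patterns to match state each time.
func matchCompilingEachCall(protoNames []string) bool {
	re := regexp.MustCompile(`^/(ip4|ip6)/tcp$`)
	return re.MatchString(joinProtos(protoNames))
}
```

Both helpers return the same answers; they differ only in when the compile step happens. Whether meg's conversion is costly enough to matter is not settled in this thread.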

@MarcoPolo force-pushed the marco/multiaddr-refactor branch from 493f175 to 47c55fc, February 6, 2025 19:36
@MarcoPolo force-pushed the marco/match-and-capture branch from 126f9ef to 6bbe24b, February 6, 2025 19:47
@sukunrt changed the base branch from marco/multiaddr-refactor to master February 13, 2025 10:40
x/meg/sugar.go (outdated)
Comment on lines 122 to 240
func Optional(s Pattern) Pattern {
return func(next *MatchState) *MatchState {
return &MatchState{
kind: split,
next: s(next),
nextSplit: next,
}
}
}

Member

@sukunrt, Feb 13, 2025

Do we want Optional, or do we want MatchIfExists? By MatchIfExists I mean that if the component is present, we always match it. Optional, if I understand correctly, may skip the component when it is also a prefix of the components that follow. MatchIfExists might simplify the implementation.
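
A toy sketch of the two semantics, assuming a position-set matcher over protocol names; elemKind, elem, and matchSeq are hypothetical names, not meg code. optional keeps both the skip path and the consume path alive, so it can leave a component for a later required step; ifExists consumes the component whenever it is present, which is what a possessive quantifier (x?+ in PCRE or Java) does in engines that support one.

```go
package main

// elemKind says how a pattern step treats its protocol.
type elemKind int

const (
	required elemKind = iota // the component must be present
	optional                 // consumed or skipped; both paths stay alive
	ifExists                 // consumed whenever present, skipped only when absent
)

type elem struct {
	proto string
	kind  elemKind
}

// matchSeq keeps the set of input positions reachable after each step, so
// alternatives are explored side by side rather than by backtracking.
func matchSeq(pattern []elem, input []string) bool {
	positions := map[int]bool{0: true}
	for _, e := range pattern {
		next := map[int]bool{}
		for pos := range positions {
			present := pos < len(input) && input[pos] == e.proto
			switch e.kind {
			case required:
				if present {
					next[pos+1] = true
				}
			case optional:
				next[pos] = true // skip path
				if present {
					next[pos+1] = true // consume path
				}
			case ifExists:
				if present {
					next[pos+1] = true
				} else {
					next[pos] = true
				}
			}
		}
		positions = next
	}
	return positions[len(input)]
}
```

With optional sni followed by required sni, the input [sni] matches; with ifExists in place of optional it does not, because the only sni has already been consumed. When the following step is a different protocol (for example webtransport), the two behave the same.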

Contributor Author

What's the regex equivalent of MatchIfExists?

Member

Not sure there is one. I think the expectation is to rewrite such regexes.

Implementation-wise, if the next character doesn't match, we still move the state ahead without moving the character index ahead.

Contributor Author

> Implementation wise, if the next character doesn't match we still move the state ahead without moving the character index ahead.

This sounds a bit like Optional.

Overall, I think we do want Optional. For example, SNI is an optional field in many multiaddrs.

@MarcoPolo force-pushed the marco/match-and-capture branch from 2a8b8af to be1c5ad, February 20, 2025 01:21
@sukunrt (Member) left a comment

Just one unaddressed comment:
#263 (comment)

MarcoPolo and others added 7 commits February 25, 2025 16:17
Support captures

export some things

wip thinking about public API

Think about exposing meg as a public API

doc comments

Finish rename

Add helper for meg and add test

add comment for devs
twice as fast without the copy
* much cheaper copies of captures

* Add a benchmark

* allocate to a slice. Use indexes as handles

* cleanup

* Add nocapture loop benchmark

It's really fast. No surprise

* cleanup

* nits
* Use Matchable interface

* Add Bytes to Matchable interface

* feat(x/meg): Support capturing bytes

* Export CaptureWithF

Can be used by more specific capturers (e.g capture net.AddrIP)

* Support Any match, RawValue, and multiple Concatenations

* Add CaptureAddrPort
@MarcoPolo force-pushed the marco/match-and-capture branch from c14016b to ae47e22, February 26, 2025 00:18
@p-shahi mentioned this pull request Feb 26, 2025
@MarcoPolo force-pushed the marco/match-and-capture branch from ad1932c to 0c5383d, February 26, 2025 02:44