Fix compatibility with some IoT devices using avahi 0.8-rc1 #27

Merged 3 commits on Jun 23, 2022

@DerAndereAndi DerAndereAndi commented Jun 7, 2022

This fixes browse, lookup and register not working properly with some devices running avahi 0.8-rc1.

The problem came up with Elli Charger wallboxes, which run avahi 0.8-rc1. This library could not see them, and the devices did not see services announced by this library. Running avahi 0.8-rc1 on a Linux machine (Ubuntu) does not show the same effect, so something specific to this device is causing it.

Raising the IPv4 TTL to 255 and the IPv6 hop limit to 255 solves this. Avahi has used these values for almost 18 years without changing them.
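
For illustration, a minimal sketch of the socket option involved. The review further down refers to the `SetMulticastTTL` call added by this PR; the sketch below instead uses only Go's standard `syscall` package on a Unix-like system, which is an assumption made to keep it self-contained and is not the code from this PR. The names `setMulticastTTL`, `ttlRoundTrip` and `mdnsIPTTL` are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// mdnsIPTTL is the IP TTL value discussed in this thread.
const mdnsIPTTL = 255

// setMulticastTTL sets IP_MULTICAST_TTL on an IPv4 UDP socket and returns
// the value the kernel reports afterwards.
// Assumption: Unix-like system (the syscall signatures differ on Windows).
func setMulticastTTL(conn *net.UDPConn, ttl int) (int, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	var got int
	var sockErr error
	ctrlErr := raw.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IP, syscall.IP_MULTICAST_TTL, ttl)
		if sockErr == nil {
			got, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_IP, syscall.IP_MULTICAST_TTL)
		}
	})
	if ctrlErr != nil {
		return 0, ctrlErr
	}
	return got, sockErr
}

// ttlRoundTrip opens a throwaway IPv4 UDP socket, applies the TTL and
// returns what was read back. Hypothetical helper for this example only.
func ttlRoundTrip(ttl int) (int, error) {
	conn, err := net.ListenUDP("udp4", &net.UDPAddr{IP: net.IPv4zero})
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	return setMulticastTTL(conn, ttl)
}

func main() {
	got, err := ttlRoundTrip(mdnsIPTTL)
	if err != nil {
		fmt.Println("could not set multicast TTL:", err)
		return
	}
	fmt.Println("IP_MULTICAST_TTL is now", got)
}
```

The option only affects outgoing multicast packets on that one socket, so a library would apply it to each multicast connection it opens.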

@DerAndereAndi
Author

Hi everyone, thanks for continuing the work with zeroconf!

I had an issue with IoT devices running avahi 0.8-rc1 not responding to requests and also not seeing a service created with this library. When I used tcpdump to compare against other devices on the network that did find this IoT device, the only difference turned out to be the TTL, which was set to 1 before this patch. Now everything works fine.

@MarcoPolo MarcoPolo self-requested a review June 17, 2022 17:25
@BigLep

BigLep commented Jun 17, 2022

Related PR: grandcat#108

@MarcoPolo

@DerAndereAndi Thanks! A couple questions:

  1. Does this fail on the proper 0.8 release?
  2. Can you elaborate a bit on whether this is a bug in this repo or a workaround for avahi 0.8-rc1?
  3. Why 255?
  4. Any relevant specs here we can reference?

@DerAndereAndi
Author

Hi @MarcoPolo

thanks for your questions.

  1. I haven't tested this with earlier releases. The problem came up when trying to connect to an EVSE (EV charger) that uses avahi 0.8-rc1. The other devices I have use version 0.6 and don't have that issue.
  2. To me this looks like a bug, though I haven't checked the RFC or the avahi code itself.
  3. I used Wireshark with multiple other IoT devices that use the EEBUS protocol (which I am implementing as open source). They all send with a TTL of 255 and could all see that device, but this library couldn't. The device also couldn't see the mDNS service created with this library, which was likewise solved by setting the TTL to 255.
  4. I haven't checked the RFC, as my primary goal for now was to find the issue and make it work, so that the EVSE can be used with the evcc software: github.com/evcc-io/evcc/

I am happy to help with anything I can, but so far I haven't been that deep into mDNS and this implementation.

@marten-seemann

It looks like the SetMulticastTTL call you're adding is the IP TTL, not the DNS TTL (see https://datatracker.ietf.org/doc/html/rfc6762#section-2).

Wireshark is also complaining about the TTL (which appears to be 1 by default):

[image]

I'm not sure I understand why it should be 255 though. RFC 3171 doesn't say anything about that (it just defines what the Local Network Control Block is: https://datatracker.ietf.org/doc/html/rfc3171#section-2).

I'm still not 100% convinced that 255 is the correct value, as our old mDNS implementation also uses a TTL of 1.
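
To make the IP TTL versus DNS TTL distinction concrete: the IP TTL (IPv4) or hop limit (IPv6) is a single byte in the IP header that routers decrement, whereas the DNS TTL is a cache lifetime in seconds carried inside each resource record. A minimal sketch that reads the IP-level field from a raw packet follows; `ipTTLField` is a hypothetical helper and not part of this library.

```go
package main

import (
	"errors"
	"fmt"
)

// ipTTLField returns the TTL (IPv4) or hop limit (IPv6) byte of a raw IP
// packet. It only looks at the fixed part of the IP header and never at
// the DNS payload, where the unrelated record TTLs live.
func ipTTLField(packet []byte) (int, error) {
	if len(packet) == 0 {
		return 0, errors.New("empty packet")
	}
	switch packet[0] >> 4 { // the IP version is the high nibble of byte 0
	case 4:
		if len(packet) < 20 {
			return 0, errors.New("truncated IPv4 header")
		}
		return int(packet[8]), nil // IPv4: TTL is at byte offset 8
	case 6:
		if len(packet) < 40 {
			return 0, errors.New("truncated IPv6 header")
		}
		return int(packet[7]), nil // IPv6: hop limit is at byte offset 7
	default:
		return 0, errors.New("not an IPv4 or IPv6 packet")
	}
}

func main() {
	// Hand-built minimal IPv4 header, for demonstration only.
	hdr := make([]byte, 20)
	hdr[0] = 0x45 // version 4, header length of 5 32-bit words
	hdr[8] = 255  // the TTL value discussed in this thread
	ttl, err := ipTTLField(hdr)
	fmt.Println(ttl, err)
}
```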

@marten-seemann

The Multicast DNS RFC has a section about IP TTLs: https://datatracker.ietf.org/doc/html/rfc6762#section-11

"All Multicast DNS responses (including responses sent via unicast) SHOULD be sent with IP TTL set to 255. This is recommended to provide backwards-compatibility with older Multicast DNS queriers (implementing a draft version of this document, posted in February 2004) that check the IP TTL on reception to determine whether the packet originated on the local link. These older queriers discard all packets with TTLs other than 255."

Note that this only applies to responses, and not to queries. I'd assume it's fair to say that we don't really care about interop with implementations of a draft version from 18 years ago.
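
The receive-side check described in the quoted RFC text can be sketched as follows. This is a simplified model under stated assumptions, not code from Avahi or from this library: `ttlOnArrival` pretends each router just subtracts one from the TTL, and `legacyQuerierAccepts` is a hypothetical name for the "discard everything other than 255" rule. Because routers decrement the TTL, a packet that still carries 255 on arrival cannot have crossed a router.

```go
package main

import "fmt"

// requiredTTL is the value the legacy check quoted from RFC 6762,
// section 11 expects to see on reception.
const requiredTTL = 255

// ttlOnArrival models routers decrementing the IP TTL once per hop.
// Simplification: a real router drops the packet once the TTL is used up;
// here the value is just clamped at zero.
func ttlOnArrival(sentTTL, routerHops int) int {
	ttl := sentTTL - routerHops
	if ttl < 0 {
		return 0
	}
	return ttl
}

// legacyQuerierAccepts mimics an old querier that discards every packet
// whose IP TTL is not 255 on reception.
func legacyQuerierAccepts(receivedTTL int) bool {
	return receivedTTL == requiredTTL
}

func main() {
	for _, sent := range []int{1, 254, 255} {
		fmt.Printf("sent with TTL %d on the local link: accepted=%v\n",
			sent, legacyQuerierAccepts(ttlOnArrival(sent, 0)))
	}
	fmt.Printf("sent with TTL 255 across one router: accepted=%v\n",
		legacyQuerierAccepts(ttlOnArrival(255, 1)))
}
```

Under this model a sender using TTL 1 is rejected even though it sits on the same link, which is the situation a library sending with the default TTL would be in.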


@marten-seemann marten-seemann left a comment

On the other hand, all the mDNS traffic I'm observing on my computer / local network has the TTL set to 255, so this is probably fine.

@DerAndereAndi Do we also need to set the Hop Limit on IPv6?

@DerAndereAndi
Author

I did some more testing with the resolve client from the examples (examples/resolv/client.go), changing the service to _ship._tcp, which is the service type required for EEBUS devices.

Running without the TTL change results in:

> go run examples/resolv/client.go
2022/06/19 11:22:43 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:22:43 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}
2022/06/19 11:22:53 No more entries.

Now running with the TTL change:

> go run examples/resolv/client.go
2022/06/19 11:24:17 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:24:17 &{{Elli-Wallbox-2019A0OV8H _ship._tcp [] local _ship._tcp.local. Elli-Wallbox-2019A0OV8H._ship._tcp.local. _services._dns-sd._udp.local.} wallbox-2019A0OV8H.local. 4712 [model=Wallbox type=Wallbox brand=Elli ski=46b9642e684fc5274187487aad35a0508e32de3e register=false path=/ship/ id=Elli-Wallbox-2019A0OV8H txtvers=1 org.freedesktop.Avahi.cookie=413265520] 120 [192.168.1.14] [fe80::40b2:61cb:4f44:4ae7]}
2022/06/19 11:24:17 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}
2022/06/19 11:24:27 No more entries.

Now using avahi 0.8 on Ubuntu and running it via:

> avahi-publish-service --version
avahi-publish-service 0.8
> avahi-publish-service -s Demo10 _ship._tcp 4713 textvers=1 path=/ship/ id=Demo-1ce34ca905f7013a ski=1c1f59ac545fdcc9dc1a085bfee5c94ef1348da2 brand=Demo model=Demo_Hems type=EnergyManagementSystem register=true
Established under name 'Demo10'

Running the resolve client without the TTL change:

> go run examples/resolv/client.go
2022/06/19 11:26:54 &{{EVCC_HEMS_01 _ship._tcp [] local _ship._tcp.local. EVCC_HEMS_01._ship._tcp.local. _services._dns-sd._udp.local.} primarypi.local. 4712 [txtvers=1 path=/ship/ id=EVCC-3ce84c490507212f ski=bc1e89fc545fdc09db1a0152f6e4c9cef2388fa1 brand=EVCC model=EVCC_HEMS_01 type=EnergyManagementSystem register=true] 3200 [192.168.1.8] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:26:54 &{{Demo10 _ship._tcp [] local _ship._tcp.local. Demo10._ship._tcp.local. _services._dns-sd._udp.local.} primary.local. 4713 [textvers=1 path=/ship/ id=Demo-1ce34ca905f7013a ski=1c1f59ac545fdcc9dc1a085bfee5c94ef1348da2 brand=Demo model=Demo_Hems type=EnergyManagementSystem register=true] 120 [192.168.1.9] [2003:d2:f726:7200:e65f:1ff:fe54:65e3]}
2022/06/19 11:26:55 &{{SEMPSHIPGW _ship._tcp [] local _ship._tcp.local. SEMPSHIPGW._ship._tcp.local. _services._dns-sd._udp.local.} SMA3002857654.local. 4712 [model=Sunny Home Manager 2.0 type=Energy Manager brand=SMA ski=501d74013e68ea2038613512d32963b4e9f5a836 register=false path=/ship/ id=SEMPSHIPGW txtvers=1] 120 [192.168.1.3] []}

The Demo service appears. So contrary to my earlier statement, this doesn't seem to be a general issue with avahi 0.8. Still, all of these devices (and more that I have) use a TTL of 255 when I check them with Wireshark.

@DerAndereAndi
Author

@marten-seemann I haven't seen any issues with that yet, and I also didn't see anything in Wireshark that hints at this. But I might have overlooked something.

@marten-seemann

Thank you @DerAndereAndi. As I've said in #27 (review), I think this change is fine.

I'm just wondering if we need to make the equivalent change for IPv6. Would be good to fix this once and for all.

@MarcoPolo

https://man7.org/linux/man-pages/man7/ip.7.html

IP_MULTICAST_TTL (since Linux 1.2)
Set or read the time-to-live value of outgoing multicast
packets for this socket. It is very important for
multicast packets to set the smallest TTL possible. The
default is 1 which means that multicast packets don't
leave the local network unless the user program explicitly
requests it. Argument is an integer.

According to the man page we want this to be as small as possible?

@DerAndereAndi
Author

I tried multiple IP TTLs with this device; only 255 finds the device reliably. Even 254 does not work. They must be doing something with routing locally on the device that causes this. Since the Avahi code has set the IP TTL to 255 ever since it was introduced 18 years ago, they surely would never have run into this issue with their setup.

The question for me is, if one would want the number to be as low as possible: what is the benefit of doing it differently than the "de-facto" standard implementation? Right now I only see a downside, of the library not working with this specific device that is sold in the thousands.

@MarcoPolo

The question for me is, if one would want the number to be as low as possible: what is the benefit of doing it differently than the "de-facto" standard implementation? Right now I only see a downside, of the library not working with this specific device that is sold in the thousands.

I think I agree with you here. I'm just confused in general why the manpage would recommend the exact opposite of what is seen in practice.

Do you happen to know if this is an issue with IPv6 as well? My guess is that IPv6 implementations would be a bit newer and not have this issue.

@DerAndereAndi
Author

I just disabled IPv4 and tested with IPv6 only. And you are right, this device also requires raising the hop limit on IPv6, so I added this.
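
The IPv6 counterpart of the multicast TTL is the multicast hop limit. A minimal sketch of setting it, assuming a Unix-like system and using only Go's standard `syscall` package rather than whatever call the PR itself uses; `hopLimitRoundTrip` is a hypothetical helper, and it reports `false` when no IPv6 socket can be opened on the host.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
)

// hopLimitRoundTrip opens a throwaway IPv6 UDP socket, sets
// IPV6_MULTICAST_HOPS and returns the value read back from the kernel.
// The bool is false when no IPv6 socket could be opened on this host.
// Assumption: Unix-like system (the syscall signatures differ on Windows).
func hopLimitRoundTrip(hops int) (int, bool, error) {
	conn, listenErr := net.ListenUDP("udp6", &net.UDPAddr{IP: net.IPv6unspecified})
	if listenErr != nil {
		return 0, false, nil // treated as "IPv6 not available here"
	}
	defer conn.Close()

	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, true, err
	}
	var got int
	var sockErr error
	ctrlErr := raw.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_MULTICAST_HOPS, hops)
		if sockErr == nil {
			got, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_MULTICAST_HOPS)
		}
	})
	if ctrlErr != nil {
		return 0, true, ctrlErr
	}
	return got, true, sockErr
}

func main() {
	got, ok, err := hopLimitRoundTrip(255)
	switch {
	case !ok:
		fmt.Println("IPv6 sockets are not available on this host")
	case err != nil:
		fmt.Println("could not set multicast hop limit:", err)
	default:
		fmt.Println("IPV6_MULTICAST_HOPS is now", got)
	}
}
```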

@DerAndereAndi DerAndereAndi changed the title from "Fix compatibility with avahi 0.8-rc1" to "Fix compatibility with some IoT devices using avahi 0.8-rc1" on Jun 23, 2022

@marten-seemann marten-seemann left a comment

As much as I dislike setting a TTL of 255, it seems like this is what all mDNS implementations do, and that it actually fixes a problem with certain devices.

@marten-seemann

Thanks for digging into this @DerAndereAndi!

@marten-seemann marten-seemann merged commit af1f1d3 into libp2p:master Jun 23, 2022
@andig

andig commented Jun 23, 2022

Thanks for merging @marten-seemann. What I can't wrap my head around is why this is necessary at all? This is the IP TTL and that is imho the number of hops. Inside the local network there would typically be a single hop only. Are we patching network stack issues here or what is the likely root cause?

@marten-seemann

The root cause is most likely the one described in the RFC (see #27 (comment)): Terribly broken and horribly outdated mDNS implementations.

@DerAndereAndi DerAndereAndi deleted the bugfix/avahi-0.8-browse branch June 23, 2022 11:38