
CRITICAL: IPFS Companion exposes issue where Slate Gateway URLs are not resolving when the IPFS Companion Extension is enabled. #342

Closed
jimmylee opened this issue Sep 29, 2020 · 28 comments
Assignees
Labels
Bug Something we want to fix.

Comments

@jimmylee
Contributor

jimmylee commented Sep 29, 2020

This issue was reported by @momack2. She reported that with the IPFS Companion extension enabled on Google Chrome, you could not view any of the image assets on https://slate.host.

The original screenshot in the report included this:

[screenshot from the original report]

  • I was able to deduce that the issue is not Metamask related.
  • a1.jpg is a reference to an old default avatar we had that we have since removed.

Reproduction

I was able to reproduce this bug by using Google Chrome with the IPFS Companion extension.

Upon further investigation I was able to deduce that:

  • Slate is still functional.
  • All IPFS gateway URLs fail to resolve.

You don't need to be running https://slate.host to reproduce this issue; you can just try to visit these URLs:

@jimmylee jimmylee added the Bug Something we want to fix. label Sep 29, 2020
@jimmylee jimmylee self-assigned this Sep 29, 2020
@sanderpick

Hey @jimmylee - My guess is that IPFS Companion naively parses outbound GET requests, snagging any that contain /ipfs.

@jimmylee
Contributor Author

Thanks @sanderpick, pinging @lidel here in case that helps with the diagnosis

@olizilla

olizilla commented Sep 29, 2020

IPFS Companion will redirect URLs that are valid IPFS addresses to the local IPFS daemon. That's the core feature. Are the images announced to the DHT? Are there any discoverable provider records?
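
(For reference, provider records can be checked from any machine with go-ipfs installed. A rough sketch, using one of the failing CIDs mentioned later in this thread:)

# ask the DHT which peers claim to provide this CID
$ ipfs dht findprovs bafkreibp4qw5qq3bzgx5fbcz3bvznyc2xyjeevn3hhbjav35dl5fy7ew54
# no output after a minute or two suggests no provider records are discoverable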

@lidel

lidel commented Sep 29, 2020

I played with Slate a bit and it works fine with Companion and my local go-ipfs 0.7.0. 💚 If I upload a well-known file, the content is read from IPFS instead of slate.textile.io/ipfs/.. and I get content integrity guarantees for free:

[screenshot: 2020-09-29--21-26-27]

💔 Uploading unique stuff that's not on the IPFS network already just hangs.

IIUC all this is expected behavior: you have IPFS Companion enabled, and by default it will load content from your local IPFS node. If the content is not there yet, it may take time for your node to find it and fetch it.

So.. it sounds like the problem you are experiencing is slow content discovery when using a local IPFS node?
@jimmylee what type of local node do you have? Are you on a slow network?
@sanderpick is Textile's gateway running go-ipfs 0.7.0? Are you providing records to the network?

@jimmylee
Contributor Author

(1) @momack2 was the first to report this issue; hopefully she can provide more details if her bandwidth permits.

(2) I'm on Wi-Fi with this speed:

[screenshot: Wi-Fi speed test]

Tagging @carsonfarmer so he might be able to provide details if bandwidth permits.

@jimmylee
Contributor Author

Node Type:

[screenshots: IPFS Companion node type settings]

I am using the out-of-the-box config for IPFS Companion. No custom settings, @lidel.

@lidel

lidel commented Sep 29, 2020

Ok, after a longer look, I believe this is not a problem with Companion, but with the discoverability of data on Textile's nodes.

Not only is my local node unable to find the CIDs you provided in the first comment:

$ ipfs refs -r bafkreibp4qw5qq3bzgx5fbcz3bvznyc2xyjeevn3hhbjav35dl5fy7ew54

But none of the public gateways can find them either, for example:

They hang forever.

To confirm it's a content discovery issue, you can download this file and import it to Slate – you will see it works fine, even with Companion and a local node.

@olizilla

@lidel an idle thought: Companion could look up a dnsaddr record for the domain when it encounters IPFS URLs. So slate.host could have a TXT record like /dnsaddr/slate.host/tcp/4001/ipfs/QmNodeyNodeNode, and while Companion redirects the requests to the local daemon, it could also try to connect to the suggested peer, to help with situations where it is difficult to publish all the provider records to the DHT.
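
(For illustration, the existing dnsaddr convention puts the multiaddr in a DNS TXT record under a _dnsaddr subdomain. A hedged sketch, reusing the hypothetical peer ID from above and a placeholder public IP:)

# hypothetical TXT record value: "dnsaddr=/ip4/<public-ip>/tcp/4001/p2p/QmNodeyNodeNode"
$ dig +short TXT _dnsaddr.slate.host
# Companion (or a user) could then connect to the suggested peer directly:
$ ipfs swarm connect /ip4/<public-ip>/tcp/4001/p2p/QmNodeyNodeNode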

@jimmylee
Contributor Author

@lidel @olizilla thank you for the next level debugging 🌸 I appreciate it!

@carsonfarmer

@lidel they appear to work perfectly fine for me? Even "fresh" Slate CIDs.

@jimmylee
Contributor Author

[screenshot: Slate images loading]

A few other images started working 👀

@sanderpick

@lidel we're still on ipfs/go-ipfs:v0.6.0. The node is announcing records, but is currently NATed... which may account for the long discovery times. We're going to attach a public IP routed to that node's swarm port.

@jimmylee jimmylee changed the title CRITICAL: IPFS Companion Prevents Slate Gateway URLs from working. CRITICAL: IPFS Companion exposes issue where Slate Gateway URLs are not resolving when the IPFS Companion Extension is enabled. Sep 29, 2020
@lidel

lidel commented Sep 29, 2020

@olizilla yes.. that would be really elegant. It won't be an easy fix, as there is no Web API for dnsaddr lookup, nor do we expose it in IPFS APIs. I created ipfs/ipfs-companion#925 to track this idea.

@carsonfarmer @jimmylee yeah, now I see them too. Looks like a really slow discovery for some reason.

If I add a unique CID to Slate, my local node is unable to find it unless I execute a preload call to https://node1.preload.ipfs.io/api/v0/refs?r=true&arg=<cid>. I suspect the preload nodes are peered with Textile, and that is how I was able to get the files onto the network.
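
(For anyone reproducing this: the preload call is just an HTTP request against the preload node's API, roughly as follows.)

# ask the preload node to fetch the whole DAG for <cid>; depending on the API version
# this may need to be sent as a POST rather than a GET
$ curl "https://node1.preload.ipfs.io/api/v0/refs?r=true&arg=<cid>"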

@sanderpick ah.. yeah, that would explain the above observation. Those nodes need to be publicly dialable for other nodes behind NAT to be able to reach them.

@sanderpick

Sounds good! It's now publicly visible (telnet 40.76.153.74 4001) but not announcing there yet. We'll schedule some downtime tomorrow morning to get that going and update the node to ipfs/go-ipfs:v0.7.0.

@sanderpick

Reporting back here. That node is now running ipfs/go-ipfs:v0.7.0 and is announcing a public IP:

Swarm announcing /ip4/40.76.153.74/tcp/4001

@lidel

lidel commented Oct 1, 2020

I've added https://slate.textile.io/ipfs/bafkreid5w43amr736etsba7jkpyqlh4tb5powe3ftssd7f5ws5dkfeekqe but my local node is unable to find it via DHT (it has been looking for over 10 minutes).

@sanderpick what is the PeerID of that machine?
I want to confirm it is dialable from behind NAT (and that ipfs dht findpeer returns the address you mentioned).

@sanderpick

PeerID: QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd.

From my local node,

⋊> ~ ipfs dht findpeer QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd
/ip4/40.76.153.74/tcp/4001
⋊> ~ ipfs swarm connect /ip4/40.76.153.74/tcp/4001/p2p/QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd
connect QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd success

But even after connecting, ipfs get /ipfs/bafkreid5w43amr736etsba7jkpyqlh4tb5powe3ftssd7f5ws5dkfeekqe hangs and hangs. More investigation needed.

@jimmylee
Contributor Author

I need to double check if this is still a problem, I'll verify and ping the necessary parties again.

@lidel

lidel commented Nov 12, 2020

@jimmylee For what it's worth, I still experience the problem with content discovery of newly added content 😿

For example, when redirects in IPFS Companion are enabled, my local IPFS node is unable to find content from:

What is concerning is that it fails to find the content even if I manually connect to the peer provided by @sanderpick:

$ ipfs swarm connect /p2p/QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd
connect QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd success

A quick fix is to disable IPFS integration for the slate.host website:

[screenshot: disabling IPFS integration for slate.host in Companion]

..but we really need to figure out the content discovery problem.
Without this working, Slate is effectively just a centralized website.

Note that loading content from a local IPFS node works fine on other websites that use content-addressed assets, such as Audius. @jessicaschilling did a case study on Audius and can put you in touch with them if you'd like to compare notes on the backend setup related to the way content is provided to the IPFS network.

@sanderpick

Hey @lidel @jimmylee sorry to have dropped the ball here. I just popped open this node's logs and saw a number of the errors shown in this issue:

2020-11-04T23:58:33.322Z	ERROR	dht	ignoring incoming dht message while not in server mode

However, the config is definitely not set up to be running in client mode. Maybe it's related to high CPU, as mentioned in the issue above.

In any case, I also noticed that "DisableNatPortMap" was set to true. Out of curiosity, I flipped that to false. After restarting the node, I can ipfs get Slate CIDs and browse the site with IPFS Companion on. This node runs in a Kubernetes pod, so maybe something special is going on with the networking that requires the NAT port map. Another possibility is that the node just needed a restart to reduce CPU... in which case this probably isn't a permanent fix.
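
(For reference, that flag lives under the Swarm section of the go-ipfs config and can be flipped with something like the following; a restart of the daemon is needed for it to take effect:)

# re-enable NAT port mapping (UPnP/NAT-PMP), then restart the daemon
$ ipfs config --json Swarm.DisableNatPortMap false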

@lidel

lidel commented Nov 13, 2020

I get mixed results. I can confirm the content routing issue seems to be gone for the links I posted (content was found fast and loads fine from my local node), however other ones (e.g. https://slate.host/bitgraves/september) still struggle to find the content (ipfs dht findprovs bafybeiensbi2qyx2fpyjhl32deplv264ewf2ceo5duypfmx6ykraf7nc3u returns nothing).

Gut feeling: perhaps it only works when you are directly connected to the node in the pod?
If your local node was connected to the node in the pod, then you were able to cache the content from my links, and then your local node started providing it to the network. This would explain why common links posted here work for me (pod + your laptop), but not the other stuff (only pod).

@sanderpick some ideas to try:

  • See if Reprovider.Strategy and Reprovider.Interval are set to all and 12h or something else (a rough sketch of the relevant commands follows after this list)
  • If you are running behind NAT
    • Try.. not doing that on the server, if possible :-)
      (not familiar with Kubernetes enough to tell if it's feasible).

    • If you need to run behind NAT,

      • Set Routing.Type to dhtclient to avoid ignoring incoming dht message while not in server mode in logs (it sounds like Kubernetes NAT causes your node to switch into client mode anyway).
      • Try to manually forward swarm ports, then try adding the public IP+port to the Addresses.Announce list to ensure a publicly dialable address is published to the DHT. This should help if your network topology is too complex for go-ipfs to infer its own publicly dialable address for some reason.
  • Did you apply server profile?
  • Mind sharing full config? Perhaps something sticks out.
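
A rough sketch of the commands behind those suggestions (hedged; exact values depend on the deployment, and the announce address below is just the public IP mentioned earlier in this thread):

# check the current reprovider settings (defaults are "all" and "12h")
$ ipfs config Reprovider.Strategy
$ ipfs config Reprovider.Interval
# run the DHT in client mode to avoid the "not in server mode" errors behind NAT
$ ipfs config Routing.Type dhtclient
# publish a publicly dialable address to the DHT
$ ipfs config --json Addresses.Announce '["/ip4/40.76.153.74/tcp/4001"]'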

@sanderpick

Gut feeling: perhaps it only works when you are directly connected to the node in the pod?

After a restart (low CPU), I was able to browse Slate pages with a fresh local node (no direct connection and no caching).

* See if `Reprovider.Strategy` and `Reprovider.Interval` are set to `all` and `12h` or something else

Yep, not changed from the default.

* If you are running behind NAT

No.

  * If you need to run behind NAT,      
    * Set [Routing.Type](https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#routing) to `dhtclient` to avoid `ignoring incoming dht message while not in server mode` in logs (it sounds like Kubernetes NAT causes your node to switch into client mode anyway).

It appears that when the node comes under very high load (providing a huge number of CIDs), the connectivity suffers, and we see these ignoring incoming dht message while not in server mode errors. I'll compare results with a staging setup that has many fewer CIDs.

    * Try to [manually forward swarm ports](https://kubernetes.io/docs/tutorials/stateless-application/expose-external-ip-address/), then try [adding public IP+port to `Addresses.Announce` list](https://discuss.ipfs.io/t/how-to-add-external-ip-to-ipfs-swarm-announcing-list/4647/3?u=lidel) to ensure publicly dialable address is published to DHT. This should help if your network topology is too complex for go-ipfs to infer its own publicly dialable address for some reason.

The public IP is included in Addresses.Announce as mentioned above.

* Did you apply `server` profile?

Partially. We can't use the address filters with Kubernetes. I can manually pluck the filters that will work, but I doubt that will solve the connectivity issue. DisableNatPortMap had previously been set to true, but as mentioned above I flipped it... though again I doubt this is related.

* Mind sharing full config? Perhaps something sticks out.

Sure, https://gist.github.com/sanderpick/7bd7eb045b31f17e14a80a408e4a1b10

@sanderpick

Idea from @jsign: since this machine is only pinning buckets, would it be possible to only provide recursively pinned CIDs? That may reduce the load, and since a connection should then exist (at least intermittently), the other CIDs will be indirectly discoverable.

In any case, I think we need some config tweaking to handle huge numbers of CIDs. Any recommendations @lidel, @aschmahmann, @hsanjuan?
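
(If that direction is taken, the reprovider strategy itself is a one-line config change; for example, the roots strategy that @jsign experiments with below announces only the root CIDs of pins:)

# announce only pin roots instead of every block, to reduce provide load
$ ipfs config Reprovider.Strategy roots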

@jsign

jsign commented Dec 15, 2020

Here are some experiments testing the theory that DHT reproviding is the cause of the issue.

Using the pinned reproviding strategy:
[CPU usage graph]
Approximately 35 hours to reach the CPU limit.

Using the roots reproviding strategy:
[CPU usage graph]
Some days to roughly reach the CPU limit.

So it seems that using roots alleviates the issue, but we still hit the limit.

While still running with roots, I took a CPU pprof profile; the hot path was:
[CPU profile screenshot]
So it looks like most of the CPU usage is related to querying peers in the DHT, which might also confirm that it is related to reproviding?
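
(For reproducibility: a profile like this can be captured from the daemon's pprof endpoint on the API port, assuming the default port 5001:)

# capture a 30-second CPU profile from the go-ipfs API's pprof endpoint
$ curl -o cpu.pprof "http://127.0.0.1:5001/debug/pprof/profile?seconds=30"
# inspect the hottest functions
$ go tool pprof -top cpu.pprof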

Some extra facts about this IPFS node below.

Stats:

NumObjects: 6163728
RepoSize:   1059479615925
StorageMax: 1000000000000
RepoPath:   /data/ipfs
Version:    fs-repo@10

The number of pins with --type=recursive is 122138.
As shown in the CPU usage history, the node has a vCPU limit (are these enough resources for an IPFS node of this size?).
The config is quite default-ish (apart from the reproviding strategy changes), so it might be non-optimal; don't assume an optimized one.

@jsign

jsign commented Dec 15, 2020

@ribasushi recommended trying out a quite reasonable config change: disabling QUIC.
We did, and now we'll wait some hours; I'll report back here on how it worked out.
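
(A sketch of that change, hedged since the exact knob depends on the go-ipfs version:)

# on go-ipfs versions that have the Swarm.Transports config, QUIC can be toggled directly
$ ipfs config --json Swarm.Transports.Network.QUIC false
# on older versions, remove the /quic entries from Addresses.Swarm instead, then restart the daemon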

@momack2

momack2 commented Dec 17, 2020

@aschmahmann and @jacobheun as an FYI for the roots pinning profiles (these are gold, @jsign!)

@aschmahmann

aschmahmann commented Dec 18, 2020

@momack2 we're in touch. I'm pretty sure those profiles have very little to do with providing/advertising since we are "finding providers" in that profile.

My guess is that this has to do with Textile having many IPNS over PubSub channels and periodically querying the DHT to find out if anyone new has joined the channel. I suspect Textile is pushing this feature harder than most, and with hundreds of topics per node it is doing a lot of crawling. They're running some tests now where the periodic search for new PubSub peers is disabled, to see if that's really the issue. Once we've confirmed that, we can discuss what the options are.

Note: I don't currently have a good guess as to why switching from providing pins to roots would be any less work for this machine since I don't think they're likely to be able to reprovide even 100k pin roots within the default 12 hr period.

@kuzdogan

kuzdogan commented Jan 28, 2022

Hey @lidel @jimmylee sorry to have dropped the ball here. I just popped open this node's logs and saw a number of the errors shown in this issue:

2020-11-04T23:58:33.322Z	ERROR	dht	ignoring incoming dht message while not in server mode

However, the config is definitely not set up to be running in client mode. Maybe it's related to high CPU, as mentioned in the issue above.

In any case, I also noticed that "DisableNatPortMap" was set to true. Out of curiosity, I flipped that to false. After restarting the node, I can ipfs get Slate CIDs and browse the site with IPFS Companion on. This node runs in a Kubernetes pod, so maybe something special is going on with the networking that requires the NAT port map. Another possibility is that the node just needed a restart to reduce CPU... in which case this probably isn't a permanent fix.

We are experiencing massive resource consumption on our nodes. It seemed to increase when we started getting the error above. We tried running with the server profile via ipfs init --profile server, but it didn't help. We are also using the pinned strategy.

Anyone found a solution to this?

Our config: https://gist.github.com/kuzdogan/c1d69dabafc8286f31afc8bb988099b8
