Duplicate Data increases with the number of nodes serving the file. #4588

Closed

natewalck opened this issue Jan 17, 2018 · 7 comments

@natewalck

Version information:

ipfs version --all
go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/linux
Golang version: go1.9.2

Type: Bug

Severity: Medium

Description:

When running ipfs get for a given file, the amount of duplicate data the test node receives grows with the number of nodes that have the file and supply it. This leads to lower performance and a massive amount of wasted bandwidth.

Test Setup:
Testing was done on EC2 medium instances that all lived on the same subnet. The bootstrap list was updated to ensure all nodes could find each other correctly.

ipfs swarm peers was used to confirm that the nodes were connected before doing ipfs get on the test file.

Before testing each file size, the ipfs daemon was stopped, the .ipfs repo was deleted, and the node was re-provisioned using ipfs init and the ipfs bootstrap add command. This ensured no data was cached and that the stats covered only the test file in question.
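
A minimal sketch of that reset, assuming the daemon is launched directly from a shell; the bootstrap multiaddr and peer ID are placeholders for Node 1's actual values:

```sh
# Stop the daemon (however it was started) and wipe the repo so nothing is left cached
killall ipfs
rm -rf ~/.ipfs

# Re-provision the node and point it at the test swarm
ipfs init
ipfs bootstrap add /ip4/10.0.0.1/tcp/4001/ipfs/QmNode1PeerIDPlaceholder   # placeholder address for Node 1
ipfs daemon &

# Confirm connectivity to the other test nodes before running ipfs get
ipfs swarm peers
```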

Test files used were as follows:
5.1GB - sintel_4k.mov - QmWntgau1qWJh7hos91e6CqEzSWfaSn7permky8A3WJEnS
1.1GB - Sintel.2010.1080p.mkv - QmUwZFGPptdF5ZG58EdozjDSXYugPsxe1MwPZFQ4vZmAsb
649MB - Sintel.2010.720p.mkv - QmcdSfr63CHZ3sJkubrozeRmT4bo2DqpD8DKPFfhNby4FB

Test files can be found here: http://download.blender.org/durian/movies/

Replication procedure:

  1. Configure a fresh ipfs node
  2. Add the test file to the node (Node 1)
  3. Configure another ipfs node (Node N)
  4. Run ipfs get HASHHERE on Node N
  5. Record output of ipfs stats bw and ipfs bitswap stat
  6. Repeat steps 3-5 until Node 6 is retrieving the test file from Nodes 1-5 (which have each done an ipfs get on the test file over the course of this testing); the commands for steps 4-5 are sketched below.
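
For concreteness, steps 4 and 5 on each new node look roughly like this (using the 1.1GB test file's hash from the list above):

```sh
# Step 4: fetch the test file (hash of the 1.1GB Sintel.2010.1080p.mkv listed above)
ipfs get QmUwZFGPptdF5ZG58EdozjDSXYugPsxe1MwPZFQ4vZmAsb

# Step 5: record bandwidth and bitswap statistics
ipfs stats bw       # TotalIn / TotalOut for the node
ipfs bitswap stat   # blocks received, duplicate blocks, duplicate data
```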

See full dataset here: https://gist.github.com/natewalck/c739b57b1e90dfe2092344f78bf7de78

The duplicate data received grew linearly with the number of nodes serving the test file. For instance, if Node 3 retrieved the test file from Nodes 1 and 2, the duplicate data could be expected to be about 100% of the file size. If Node 4 retrieved data from Nodes 1, 2 and 3, the expected duplicate data was around 200% of the file size.

iftop was used to validate that the actual traffic coming into the node matched the TotalIn value reported by ipfs stats bw.
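
The cross-check was along these lines; the interface name is an assumption about the test hosts, and 4001 is the default IPFS swarm port:

```sh
# What the node itself reports
ipfs stats bw

# Independent view of traffic on the wire (interface name will vary per host)
sudo iftop -i eth0 -f "port 4001"
```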

The chart below plots duplicate data received against the number of serving nodes for each of the test files. As you can see, the relationship is almost perfectly linear.

[Chart: duplicate data received vs. number of serving nodes, for each of the three test files]

I'm not sure if the situation is better for small files (it probably impacts the transfer of smaller files to a lesser degree due to their size), but this seems like a rather large issue for big files.

One use case for IPFS is a distributed yum/software repo. With the current bitswap/wantlist performance, it would be difficult to host RPMs and serve them out to clients in a performant fashion.

I'm not sure where to start looking to optimize this, but I wanted to investigate it and provide some data. Is it possible this is caused by a node requesting the same blocks from every node on its wantlist, receiving those blocks from all of the nodes at nearly the same time, and then requesting the next blocks from all of the nodes again, and so on?

Thanks for all the work you are doing on IPFS, it is a fantastic project! :)

@jessepeterson

I'm working with Nate on this problem as well. As a (possibly) related observation from looking into this: I still seem to get duplicated data when I change maxProvidersPerRequest from its default of 3 down to 1 and recompile. I'm probably misunderstanding something, but shouldn't that permit only one potential provider per block and thus completely rule out duplicate blocks?

@leerspace
Contributor

leerspace commented Jan 17, 2018

Related to (or duplicate of?) #3802, #3786 and #1750

edit: added #3802 to this list

@natewalck
Author

@leerspace I agree. It would be nice if these 4 issues could be summarized and squashed into one issue for each underlying improvement. It seems like this issue has been around for a bit, but might have gotten buried over time.

@kvm2116
Contributor

kvm2116 commented Mar 19, 2018

Is there any update here? Has anyone been able to fix the duplicate blocks issue?
I am building an application on top of IPFS and I am running into the duplicate blocks issue (leading to poor performance for the application).

If this issue has been fixed, which release should I download?
If not, how can I contribute?

@momack2
Contributor

momack2 commented May 29, 2020

We've made a lot of improvements to Bitswap that rolled out in the go-ipfs 0.5 release (#6782), addressing this exact "duplicate data" performance challenge, so I'm closing this issue to redirect the conversation/evaluation there. =]

@momack2 momack2 closed this as completed May 29, 2020
@christroutner

@momack2 were those improvements ever pushed to js-ipfs?

@momack2
Contributor

momack2 commented Dec 31, 2021

Not sure - @dirkmc @achingbrain - do you know? If not, is there an open issue for someone to pick this up?
