Duplicate Data increases with the number of nodes serving the file. #4588
I'm working with Nate on this problem as well. One possibly related thing I found when looking into this: I still seem to get duplicated data when I change the
@leerspace I agree. It would be nice if these 4 issues could be summarized and squashed into one issue for each underlying improvement. It seems like this issue has been around for a bit, but might have gotten buried over time.
Is there any update here? Has anyone been able to fix the duplicate blocks issue? If this issue has been fixed, where should I download it?
We've made a lot of improvements to Bitswap that rolled out in the go-ipfs 0.5 release (#6782) -- addressing this exact "duplicate data" performance challenge, so closing this issue to redirect the conversation/evaluation there. =]
@momack2 were those improvements ever pushed to js-ipfs? |
Not sure - @dirkmc @achingbrain - do you know? If not, is there an open issue for someone to pick this up? |
Version information:
Type: Bug
Severity: Medium
Description:
When running `ipfs get` on a given file, the more nodes that have the file and supply it to the test node, the more duplicate data the node receives. This leads to lower performance and massive amounts of bandwidth waste.

Test Setup:
Testing was done with EC2 medium instances that all lived on the same subnet. The bootstrap list was updated to ensure all nodes could find each other correctly. `ipfs swarm peers` was used to confirm that the nodes were connected before doing `ipfs get` on the test file.

Before testing different file sizes, the ipfs daemon was stopped, `.ipfs` was deleted, and the node was re-provisioned using `ipfs init` and the `ipfs bootstrap add` command. This ensured no files were cached and the stats covered only the test file in question.

Test files used were as follows:
5.1GB - sintel_4k.mov - QmWntgau1qWJh7hos91e6CqEzSWfaSn7permky8A3WJEnS
1.1GB - Sintel.2010.1080p.mkv - QmUwZFGPptdF5ZG58EdozjDSXYugPsxe1MwPZFQ4vZmAsb
649MB - Sintel.2010.720p.mkv - QmcdSfr63CHZ3sJkubrozeRmT4bo2DqpD8DKPFfhNby4FB
Test files can be found here: http://download.blender.org/durian/movies/
Replication procedure:
1. Run `ipfs get HASHHERE` on Node N.
2. Record `ipfs stats bw` and `ipfs bitswap stat` on that node.

(Each node only performed a single `ipfs get` on the test file over the course of this testing.)

See full dataset here: https://gist.github.com/natewalck/c739b57b1e90dfe2092344f78bf7de78
For each node that the test file was retrieved from, the duplicate data rate increased in a linear fashion with the number of nodes that served the file. For instance, if Node 3 retrieved the test file from Nodes 1 and 2, the duplicate data can be expected to be 100% of the file size. If Node 4 retrieves data from Nodes 1, 2 and 3, the expected duplicate data is around 200% of the file size.
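The linear trend above can be captured with a toy model (my own sketch for illustration, not anything from the bitswap code): if every serving node ends up sending every block, a node fetching from n peers receives the file n times, so duplicate data is (n − 1) × file size.

```python
# Toy model of the observed trend (an assumption for illustration,
# not actual bitswap logic): every serving peer returns every block,
# so duplicate data grows linearly with the number of serving nodes.

def expected_duplicate_mb(file_mb: float, serving_nodes: int) -> float:
    """Duplicate data received when `serving_nodes` peers each send the whole file."""
    return (serving_nodes - 1) * file_mb

# 649 MB test file (Sintel.2010.720p.mkv from the report above)
for n in range(1, 5):
    dup = expected_duplicate_mb(649, n)
    print(f"servers={n} duplicate={dup:.0f} MB ({dup / 649:.0%} of file size)")
```

This reproduces the numbers in the paragraph above: two serving nodes give ~100% duplicate data, three give ~200%, and so on.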
`iftop` was used to validate that the actual traffic incoming to the node matched the data observed in `TotalIn` from the `ipfs stats bw` command.

In the chart below, each of the test files is compared as number of nodes vs. duplicate data received. As you can see, it is almost completely linear.
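The chart's duplicate-data figure can be derived from the raw measurements (a sketch under the assumption that `TotalIn` is dominated by block traffic for the test file, ignoring protocol overhead):

```python
# Sketch: derive the "duplicate data" percentage from `TotalIn` as reported
# by `ipfs stats bw`. Assumes TotalIn is dominated by block traffic for the
# test file; protocol chatter is ignored. The sample reading is hypothetical.

def duplicate_ratio(total_in_mb: float, file_mb: float) -> float:
    """Fraction of the file size that arrived as duplicate data."""
    return (total_in_mb - file_mb) / file_mb

# Hypothetical reading: 1947 MB received while fetching the 649 MB file
print(f"{duplicate_ratio(1947, 649):.0%} duplicate")  # three full copies -> 200%
```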
I'm not sure if the situation is better for small files (it probably impacts the transfer of smaller files to a lesser degree due to their size), but this seems like a rather large issue for big files.
One use case for ipfs is a distributed yum/software repo. With the current bitswap/wantlist performance, it would be difficult to host rpms and serve them out to clients in a performant fashion.
I'm not sure where to start looking to optimize this issue, but I wanted to investigate it and provide some data. Is it possible this is caused by a node requesting all of the same blocks from other nodes in its wantlist, receiving the blocks from all of the nodes at nearly the same time and then requesting the same block yet again to all the nodes, etc?
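The failure mode hypothesized above (a want broadcast to every connected peer, with each peer answering before any cancel takes effect) can be sketched as a toy simulation. None of this is the real bitswap implementation; it just models the behavior the question describes, and shows it would produce exactly the linear duplicate growth observed:

```python
import collections

# Toy simulation of the hypothesized behavior: the fetching node wants each
# block, every connected peer sees the want and sends the block, and the
# cancel arrives too late to stop any of the responses.

def simulate_fetch(num_blocks: int, num_peers: int) -> tuple[int, int]:
    """Return (useful_blocks, duplicate_blocks) received by the fetching node."""
    received = collections.Counter()
    for block in range(num_blocks):
        for _peer in range(num_peers):  # every peer answers the broadcast want
            received[block] += 1
    useful = sum(1 for count in received.values() if count >= 1)
    duplicates = sum(count - 1 for count in received.values())
    return useful, duplicates

for peers in (1, 2, 3):
    useful, dup = simulate_fetch(num_blocks=100, num_peers=peers)
    print(f"peers={peers}: useful blocks={useful}, duplicate blocks={dup}")
```

With this model, duplicates are (peers − 1) × blocks, matching the linear trend in the data above. A fix would presumably involve splitting wants across peers or cancelling wants promptly once a block arrives.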
Thanks for all the work you are doing on IPFS, it is a fantastic project! :)