fix: Do not load all of a DAG into memory when pinning #2372
Given a `CID`, the `dag._getRecursive` method returns a list of all descendants of the node with the passed `CID`. This can cause enormous memory usage when importing large datasets. Where this method is invoked, the results are either a) discarded or b) used to calculate the `CID`s of the nodes, which is then bad for memory *and* CPU usage. This PR removes the buffering and `CID` recalculating for a nice speedup when adding large datasets. fixes #2310
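To illustrate the difference described above, here is a minimal sketch, not the code from this PR. It assumes a toy in-memory block store where plain strings stand in for `CID`s; `getNode`, `getRecursive`, and `walkDag` are hypothetical helpers written for this example rather than the js-ipfs API. The buffering version holds every descendant node in memory at once, while the walking version holds one node at a time and reports child `CID`s taken from the parent's links, so nothing has to be re-hashed.

```javascript
'use strict'

// Hypothetical in-memory DAG: each key stands in for a CID,
// each value lists the keys of its children.
const blocks = new Map([
  ['root', { links: ['a', 'b'] }],
  ['a', { links: ['c'] }],
  ['b', { links: [] }],
  ['c', { links: [] }]
])

// Stand-in for fetching a node by CID (assumed helper, not the js-ipfs API).
async function getNode (cid) {
  const node = blocks.get(cid)
  if (!node) throw new Error(`missing block ${cid}`)
  return node
}

// Buffering approach: every descendant node is collected into one array
// before anything is returned, so memory grows with the size of the DAG.
async function getRecursive (cid) {
  const node = await getNode(cid)
  let nodes = [node]
  for (const link of node.links) {
    nodes = nodes.concat(await getRecursive(link))
  }
  return nodes
}

// Walking approach: only one node is held at a time. Child CIDs are read
// from the parent's links, so they are never recalculated from node data.
// The `seen` set stores CID strings only, not node contents.
async function walkDag (cid, onChildCid) {
  const pending = [cid]
  const seen = new Set([cid])
  while (pending.length > 0) {
    const node = await getNode(pending.pop())
    for (const link of node.links) {
      if (seen.has(link)) continue
      seen.add(link)
      onChildCid(link)
      pending.push(link)
    }
  }
}
```

A caller that only needs the descendant `CID`s could pass a callback such as `(cid) => indirectKeys.add(cid)` to `walkDag` and never touch the node bodies.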
Holy wow 3s faster than go-ipfs!
.aegir.js
Outdated
@@ -8,7 +8,7 @@ const ipfsdServer = IPFSFactory.createServer()
const preloadNode = MockPreloadNode.createNode()

module.exports = {
- bundlesize: { maxSize: '689kB' },
+ bundlesize: { maxSize: '756KB' },
I'm not sure this change is needed. In this branch I'm getting the following:
⨎ npx aegir build --bundlesize
Child
1466 modules
Child
1466 modules
PASS ./dist/index.min.js: 682.3KB < maxSize 756KB (gzip)
Will revert since we're disabling it anyway
@@ -25,7 +25,7 @@ jobs:
include:
- stage: check
script:
- - npx aegir build --bundlesize
+ # - npx aegir build --bundlesize
Note to self: we need a sister PR that re-enables this after it is merged.
src/core/components/pin.js
Outdated
q.push({ cid })
}

function getIndirectKeys (callback) {
You need to pass the `preload` option from `pin.ls` here and then pass it to `walkDag`.
Done!
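The review request above amounts to threading one option through two function calls. The sketch below shows that shape under stated assumptions: `dagGet`, `walkDag`, and `getIndirectKeys` are simplified, hypothetical stand-ins with invented signatures, not the functions in `src/core/components/pin.js`. The only point is that the `preload` value supplied by the caller reaches every fetch made during the walk.

```javascript
'use strict'

// Hypothetical stand-in for a DAG fetch that records the options it receives.
const calls = []
function dagGet (cid, options) {
  calls.push({ cid, preload: options.preload })
  return { links: cid === 'root' ? ['child'] : [] }
}

// Hypothetical walker: forwards `preload` to every fetch it makes.
function walkDag ({ cid, preload = false, onCid = () => {} }) {
  const pending = [cid]
  while (pending.length > 0) {
    const node = dagGet(pending.pop(), { preload })
    for (const link of node.links) {
      onCid(link)
      pending.push(link)
    }
  }
}

// Hypothetical caller modelled on the review comment: the option given to
// the listing function is handed to the walker instead of being dropped.
function getIndirectKeys (rootCids, { preload }) {
  const indirect = new Set()
  for (const cid of rootCids) {
    walkDag({ cid, preload, onCid: (child) => indirect.add(child) })
  }
  return indirect
}
```

Calling `getIndirectKeys(['root'], { preload: false })` in this sketch makes every recorded `dagGet` call carry `preload: false`.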
For certain datasets under certain conditions. Not cracking out the 🍾 just yet...
LGTM 👍
Port of #2372 into gc branch to ease merging
Given a `CID`, the `dag._getRecursive` method returns a list of all descendants of the node with the passed `CID`. This can cause enormous memory usage when importing large datasets.
Where this method is invoked, the results are either a) discarded or b) used to calculate the `CID`s of the nodes, which is then bad for memory and CPU usage.
This PR removes the buffering and `CID` recalculating for a nice speedup when adding large datasets.
In my (non-representative, may need all the other unfinished async/iterator stuff) testing, importing a folder of 4MB files totalling about 5GB, with content from `/dev/urandom`, into a fresh repo with a daemon running in the background is now:
Which is nice.
fixes #2310