[RFC/Braindump] Change how blocks are stored to allow for filesystem features (reflink, dedup, etc...) #10016
Closed · Labels: kind/enhancement, need/triage
Hi,
#8198 is closely related to this, though it relies on OS features and a blockstore implementing them. My proposal takes the opposite angle on the problem. Both would achieve the same goal (data deduplication).
Currently, adding a file to IPFS chunks it and adds those individual blocks to the data folder. The filestore datastore partly does what's proposed here, though its design - not storing the file itself - runs counter to this proposal. A version that does what's proposed here would be the filestore with the added ability to store the full data too. While very redundant at first glance, this allows filesystem features (reflink, deduplication) to kick in and eliminate that redundancy.
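To make the "filesystem features kick in" part concrete, here is a minimal shell illustration (not IPFS code, and the filenames are made up): on a reflink-capable filesystem such as btrfs or XFS, GNU `cp --reflink` makes the copy share extents with the original, so the "redundant" full copy costs almost no extra space. `--reflink=auto` falls back to a plain copy elsewhere, so the command is safe to run anywhere.

```shell
# Create a stand-in for the source file (hypothetical name).
printf 'example block data' > big_file

# On btrfs/XFS this shares extents with big_file instead of duplicating data;
# on other filesystems --reflink=auto silently falls back to a normal copy.
cp --reflink=auto big_file block_copy

# Either way the copy is byte-for-byte identical to the source.
cmp -s big_file block_copy && echo "identical"
```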
Some command outputs to help clarify the intent. Let's say I have a big file, `big_file`, with the following blake3 hash (this matters for the example later on, when the file is in the IPFS cache folder):

b792e5e711008fcf90548189bbe4771aea9c1a9751066c866e05637124791ade

Now if we add this file:

ipfs add big_file

we get a CID (side note: the exact hash doesn't matter, I'm just using blake3 because I like it). Within the IPFS data folder we should now have the file

bafyb4idj7r6ilfrlve7hwgx2kqgoquh6snhgkr2srpmlj32zgx6jwy5kci

with the blake3 hash b792e5e711008fcf90548189bbe4771aea9c1a9751066c866e05637124791ade, thus matching the source file: it's a copy.

All of the above just uses commands to visualize the intended output. None of this exists in IPFS yet. But if it did, it would give us all the features filesystems have to offer!
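The "it's a copy" claim above can be checked mechanically by hashing both files in a streaming fashion. A minimal sketch (not IPFS code): `hashlib.sha256` stands in for blake3 here, since blake3 is not in the Python standard library, and the paths in the comment are hypothetical.

```python
import hashlib

def file_digest(path: str) -> str:
    """Hash a file in 1 MiB chunks; sha256 stands in for blake3 here."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# The stored file under the IPFS data folder should be a byte-for-byte copy
# of the source, so the two digests should match (paths are hypothetical):
# assert file_digest("big_file") == file_digest(".ipfs/blocks/bafyb4i...")
```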
One thing making this slightly complicated is the missing metadata: for each chunk you need to know its offset within the file and its CID. The `ipfs add --nocopy` command is probably doing something like this already (I haven't looked at its code), so technically this shouldn't be that difficult to solve. I can see two possible ways that seem clean to me.

Option 1: A file per chunk
Adding a file under this proposed mechanism would create a folder with a name in this structure:
<cid>_<num_of_blocks>_metadata
(so: bafyb4idj7r6ilfrlve7hwgx2kqgoquh6snhgkr2srpmlj32zgx6jwy5kci_43580_metadata)

The num_of_blocks in the name tells Kubo how many blocks the file has. If a count of all files within that folder equals the number in the folder name (43580 in this case), then Kubo can assume the whole file is known to it.

This folder would contain one entry per chunk in the following format (note: only filenames, no file content):
<cid>_<chunk_begin_in_bytes>_<chunk_end_in_bytes>
Which would make it look like:
This structure makes it quick to determine which chunks of the file are known to the IPFS instance: a simple file-exists check answers that for any specific chunk.
The hash of a specific chunk is part of its CID, so a chunk's integrity can also be verified if needed.
Knowing which chunks are missing is slightly more complicated in this option, as one would need to parse all the filenames to build up a structure describing which chunks are absent.
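Under these naming rules, both checks reduce to cheap filesystem operations. A minimal sketch of how they might look (not Kubo code; the function names and layout are my own invention, following the `<cid>_<num_of_blocks>_metadata` and `<cid>_<chunk_begin>_<chunk_end>` conventions above):

```python
import os

def option1_is_complete(metadata_dir: str) -> bool:
    """The folder is named <cid>_<num_of_blocks>_metadata; completeness is
    just comparing the file count inside against <num_of_blocks>."""
    name = os.path.basename(metadata_dir.rstrip("/"))
    _cid, num_blocks, suffix = name.rsplit("_", 2)
    assert suffix == "metadata"
    return len(os.listdir(metadata_dir)) == int(num_blocks)

def option1_has_chunk(metadata_dir: str, chunk_cid: str,
                      begin: int, end: int) -> bool:
    """Per-chunk presence is a single file-exists check on the
    <cid>_<chunk_begin>_<chunk_end> filename."""
    return os.path.exists(os.path.join(metadata_dir,
                                       f"{chunk_cid}_{begin}_{end}"))
```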
Option 2: Single metadata file
Alternatively, a single metadata file with its name in this format:
<cid>_<num_of_blocks>_metadata
The naming logic is the same as in option 1 above, just as a file instead of a folder. The file content would carry one line of information per block, in this format:
<index>,<cid>,<chunk_begin_in_bytes>,<chunk_end_in_bytes>
Which would look somewhat like this:
The index defines the individual block's position, which also means the file doesn't have to be sorted; any order would be perfectly valid too:

This is done to make appending to the file possible while downloading the source file: a block that arrives out of order can simply be appended to the metadata file in this design.
Checking whether the file is complete can be done by counting the lines in the metadata file. If that count matches the number in the metadata filename (43580 in this case), then you can assume the whole file and all its blocks are known to Kubo.
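A sketch of the corresponding check for this option (again not Kubo code; the function name is hypothetical): parsing the `<index>,<cid>,<chunk_begin>,<chunk_end>` lines gives both the completeness answer and, as a bonus, the exact set of missing block indices, which option 1 made awkward:

```python
import os

def option2_missing_blocks(metadata_path: str) -> set:
    """Each line is <index>,<cid>,<chunk_begin>,<chunk_end>; lines may have
    been appended out of order. The filename <cid>_<num_of_blocks>_metadata
    says how many lines a complete file must have. Returns the missing
    indices; an empty set means the whole file is known."""
    _cid, num_blocks, _suffix = os.path.basename(metadata_path).rsplit("_", 2)
    with open(metadata_path) as f:
        seen = {int(line.split(",", 1)[0]) for line in f if line.strip()}
    return set(range(int(num_blocks))) - seen
```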
Is the whole file known?
As a side note to both options: what I'm describing here are weak checks, based on a count, to determine whether a file is complete. Those checks could be stronger; blake3 streaming verification might be interesting here.
Both options are lightweight
The metadata is the only data added to make a file work for IPFS, and its size is on the order of kilobytes. The metadata for a single block would be:
bafyb4idj7r6ilfrlve7hwgx2kqgoquh6snhgkr2srpmlj32zgx6jwy5kci_0000000000000_0000000000000
Say that's - rounded - 90 bytes, and that's being generous, with offsets in the range of terabytes and the lengthy bafy CID. A file with 10000 such blocks would have a metadata overhead of (90 * 10000) ≈ 879 kilobytes. With the default chunker (which creates 256 KiB blocks), ~10000 blocks correspond to a file of ~2.5 GB.
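The arithmetic behind those figures, as a quick back-of-the-envelope check (the 90-byte entry size and block counts are the estimates from the text, not measured values):

```python
# Rough overhead estimate: ~90 bytes of metadata per block (generous).
bytes_per_entry = 90
num_blocks = 10_000

overhead_kib = bytes_per_entry * num_blocks / 1024
print(f"metadata overhead: ~{overhead_kib:.0f} KiB")   # ~879 KiB

# The default chunker produces 256 KiB blocks, so 10,000 blocks cover:
file_size_gib = num_blocks * 256 * 1024 / 1024**3
print(f"file size: ~{file_size_gib:.2f} GiB")          # ~2.44 GiB
```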
Opinions?
Let me know what you think of this idea!
Are there better ways perhaps?