Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(iroh-bytes): Rewrite the blob store to use redb #2051

Merged
merged 153 commits into from
Mar 19, 2024
Merged

Conversation

rklaehn
Copy link
Contributor

@rklaehn rklaehn commented Feb 29, 2024

Description

Rewrite the blob store to use redb

This does a number of things:

  1. rewrite the blob store to use redb.
  2. inline small data blobs as well as small outboards in the redb db
  3. properly handle remotes lying about sizes
  4. make sure all redb interactions, which can be IO, are done in an actor on a thread

It also adds a consistency check of the store as a feature.

Since this is a totally new new store format, it provides functions to import from a flat store, import_flat_store rather than having a migration. This just takes the paths of some flat store directories and adsorbs all the data in that location.

Notes & open questions

  • Open question: should the inline data and inline outboard store be combined into one?
  • Todo: audit the places where the actor terminates. If it does, the store is dead. We don't want to do this unless there is something really bad going on, on the other hand it does not make much sense to continue with a broken store.

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

Use Self::POSTCARD_MAX_SIZE for fixed size
...to allow for stores that use io even for the lookup, such as a redb store.

Also use iroh_base::Hash instead of blake3::Hash, and make gc delete use batches.
# Conflicts:
#	iroh-bytes/src/get/db.rs
#	iroh/src/downloader/get.rs
# Conflicts:
#	iroh/src/client.rs
#	iroh/src/node.rs
#	iroh/src/rpc_protocol.rs
Also align existing comments with reality
The split is now an impl detail of the batch writer
tx.send(export_file_copy(temp_tag, path, size, target, progress))
.ok();
});
if path == target {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if we call export_file with the same external path but different blob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you would break it as well. But that's on you then. When using external files the user promises to keep the files unchanged.

If you do
export <hash1> foo.bin
export <hash2> foo.bin

you are basically violating that promise.

Copy link
Contributor

@ppodolsky ppodolsky Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it is a very poor man solution, but may it worth to add nested size comparision here to detect 99.99% of such cases and add a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, there are a million ways how you can modify files and void the contract. This would just avoid one, and only in some cases.

what you could do is to check if the target is in the set of all external files and in that case refuse. It would be costly though, and it would still not help with symlinks and hardliks.

Copy link
Contributor

@ppodolsky ppodolsky Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant cases when you export file to existing file path, not all possible cases. I can easily imagine cases when people do something like iroh export-file <doc>/<very-complicated-key-name-of-large-file>.zip ~/archive.zip with different files and warning would cover most of such cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just a warning if you are going to override something? Would make sense in any case, and when you use it from code you would not pay for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, warning would be nice

rklaehn added 4 commits March 14, 2024 11:45
## Description

refactor(iroh-bytes): Weak entry map

Get rid of some of the drop complexity by storing a weak handle map. The
handles themselves have a strong reference, so they are still easy to
use.

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.
# Conflicts:
#	iroh-cli/src/commands/blob.rs
#	iroh-cli/src/commands/doctor.rs
#	iroh-cli/src/commands/start.rs
#	iroh-cli/tests/cli.rs
#	iroh/Cargo.toml

impl Store {
/// Get the complete state of an entry, both in memory and in redb.
#[cfg(test)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make the whole module cfg[test]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used from the cli tests, which are not in the same crate.

# Conflicts:
#	iroh-cli/src/commands/start.rs
#	iroh/examples/rpc.rs
#	iroh/src/node.rs
@rklaehn rklaehn added this pull request to the merge queue Mar 19, 2024
Merged via the queue into main with commit 980b53d Mar 19, 2024
20 checks passed
@rklaehn rklaehn deleted the bao-file branch March 19, 2024 09:47
matheus23 pushed a commit that referenced this pull request Nov 14, 2024
## Description

Rewrite the blob store to use redb

This does a number of things:
1. rewrite the blob store to use redb.
2. inline small data blobs as well as small outboards in the redb db
3. properly handle remotes lying about sizes
4. make sure all redb interactions, which can be IO, are done in an
actor on a thread

It also adds a consistency check of the store as a feature.

Since this is a totally new new store format, it provides functions to
import from a flat store, `import_flat_store` rather than having a
migration. This just takes the paths of some flat store directories and
adsorbs all the data in that location.

## Notes & open questions

- Open question: should the inline data and inline outboard store be
combined into one?
- Todo: audit the places where the actor terminates. If it does, the
store is dead. We don't want to do this unless there is something really
bad going on, on the other hand it does not make much sense to continue
with a broken store.

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants