Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add builder for unixfs dags #12

Merged
merged 20 commits into from
Oct 18, 2021
Merged

Add builder for unixfs dags #12

merged 20 commits into from
Oct 18, 2021

Conversation

willscott
Copy link
Collaborator

No description provided.

// * we assume we are using CIDv1, which has implied that the leaf
// data nodes are stored as raw bytes.
// ref: https://github.com/ipfs/go-mfs/blob/1b1fd06cff048caabeddb02d4dbf22d2274c7971/file.go#L50
func BuildUnixFSFile(r io.Reader, chunker string, ls *ipld.LinkSystem) (ipld.Link, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's also a way we could make a NodeAssembler type somewhere that has a constructor taking a chunker string, ls *ipld.LinkSystem, and then we can have its AssignBytes method call this?

It wouldn't be memory-efficient of course, but the fact that it works at all would be nice.

More importantly, it would get us within a centimeter of adding an AssignFromReader method to that type, or something of that nature, which we could later let people feature-detect for.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to do it at that level, or would we prefer the assembler a level up where it could handle either file or directory assembly (switching on being either given bytes or a map listing of nodes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Still kinda wanna do something about this but don't have an exact fullbaked idea of how I'd want it to look, either, so it's definitely not something to hold up the rest of this PR landing on.)

@warpfork
Copy link
Member

👍 for getting even more of the dots to be connected!

@willscott willscott marked this pull request as ready for review October 10, 2021 23:20
data/builder/directory.go Outdated Show resolved Hide resolved
data/builder/directory.go Outdated Show resolved Hide resolved
data/builder/directory.go Outdated Show resolved Hide resolved
data/builder/directory.go Outdated Show resolved Hide resolved
}

// a shard entry is either another shard, or a direct link.
type entry struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading comprehension check for me: did this not end up using any of the other types or code for dealing with sharding, which we already had for the read direction?

(Not necessarily offering judgement, just surprised if so, enough that I'm wanting to doublecheck my reading.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. it can potentially overlap the shard type here with the _UnixFSHAMTShard in hamt, but
this would end up as the builder type rather than that type itself, and would need to do the full serialization here, since the structure in hamt expects a fully serialized pbnode substrate to exist that it's providing a view over.

Copy link
Collaborator

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mostly concerned there could be bugs in the HAMT code -- see note about log2

I think the most important thing I'd like to know is how are we proving this is generating equivalent outputs. Is this manual? What testing have done so far? Can we automate somehow?

data/builder/dirshard.go Outdated Show resolved Hide resolved

func (s *shard) add(lnk hamtLink) error {
// get the bucket for lnk
bucket, err := lnk.hash.Slice(s.depth*s.size, s.size)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure you're missing a log2 somewhere here. There's a lot of calculation of log2 in the HAMT sharding code, and it's cause even thought we want 256 width, that only means 8-bits per item.

there are tests in the go-unixfs implementation -- how hard would it be to substantially copy them to very you have the same result?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I'm totally wrong -- that's part of why I'm wondering if we have a way to equivalence test this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple tests. you were right on this one 😨


// BuildUnixFSFile creates a dag of ipld Nodes representing file data.
// This recreates the functionality previously found in
// github.com/ipfs/go-unixfs/importer/balanced, but tailored to the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guece we can live without trickle dag for the moment... I wonder if we even ever use this.

// go-unixfsnode & ipld-prime data layout of nodes.
// We make some assumptions in building files with this builder to reduce
// complexity, namely:
// * we assume we are using CIDv1, which has implied that the leaf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love this part

@willscott
Copy link
Collaborator Author

This now depends on
multiformats/go-multihash#150
ipld/go-ipld-prime#266

@willscott
Copy link
Collaborator Author

i've done some basic sanity checks. i want to wire this up in the downstream ipld/go-car#246 and put some car-equivalence fixtures there in a setting where there's easier access to interact with the data format here

Copy link
Member

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no complaints here :)

(I'd disclaim I'm not an expert unixfsv1 reviewer, either. But it seems to be going in the right direction, at least, afaict.)

Copy link
Contributor

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering whether the Must() calls could realistically lead to panics. If the field should always be set, why is it optional to begin with?

data/builder/directory.go Outdated Show resolved Hide resolved
data/builder/file.go Outdated Show resolved Hide resolved

// rough estimates on expected sizes
var roughLinkBlockSize = 1 << 13 // 8KB
var roughLinkSize = 34 + 8 + 5 // sha256 multihash + size + no name + protobuf framing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make these constants? otherwise I'm wondering whether you modify them anywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are copied directly from how they're defined in the comment above this block.

data/builder/dirshard.go Show resolved Hide resolved
bitfield "github.com/ipfs/go-bitfield"
"github.com/ipfs/go-unixfsnode/data"
"github.com/ipfs/go-unixfsnode/hamt"
dagpb "github.com/ipld/go-codec-dagpb"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's all starting to come together with past work! :)

data/builder/directory.go Outdated Show resolved Hide resolved
data/builder/file.go Outdated Show resolved Hide resolved
@willscott willscott merged commit e0bbe4a into main Oct 18, 2021
@mvdan mvdan deleted the unixfsbuilder branch January 20, 2022 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants