Skip to content

Commit

Permalink
IPLD Prime In IPFS: Target Merge Branch (#7976)
Browse files Browse the repository at this point in the history
* feat: switch to using go-ipld-prime for codecs, path resolution, and the `dag put/get` commands
* fix: `dag put/get` not roundtripping due to an extra new line being added (#3503)

More detailed information is in the CHANGELOG.md file. Very high level:
* IPLD codecs (and their plugins) must use go-ipld-prime
* Added support for the dag-json codec
* `dag get/put` use IPLD codec names from the multicodec table
* `dag get` defaults to dag-json output instead of json, but may output with other codecs
* Data model pathing can be achieved using the /ipld prefix. For example, you can use `/ipld/QmFoo/Links/0/Hash` to traverse through a DagPB node
* With `dag get/put` the DagPB field names have been changed to match the ones in the protobuf listed in the specification

Co-authored-by: hannahhoward <[email protected]>
Co-authored-by: Daniel Martí <[email protected]>
Co-authored-by: acruikshank <[email protected]>
Co-authored-by: Steven Allen <[email protected]>
Co-authored-by: Will Scott <[email protected]>
Co-authored-by: Will Scott <[email protected]>
Co-authored-by: Rod Vagg <[email protected]>
Co-authored-by: Adin Schmahmann <[email protected]>
Co-authored-by: Eric Myhre <[email protected]>
  • Loading branch information
9 people authored Aug 17, 2021
1 parent 360aff4 commit f63a997
Show file tree
Hide file tree
Showing 19 changed files with 511 additions and 222 deletions.
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,28 @@
# go-ipfs changelog

## v0.10.0 TBD

**IPLD Levels Up**

The handling of data serialization as well as many aspects of DAG traversal and pathing have been migrated from older libraries, including [go-merkledag](https://github.com/ipfs/go-merkledag) and [go-ipld-format](https://github.com/ipfs/go-ipld-format) to the new **[go-ipld-prime](https://github.com/ipld/go-ipld-prime)** library and its components. This allows us to use many of the newer tools afforded by go-ipld-prime, stricter and more uniform codec implementations, support for additional (pluggable) codecs, and some minor performance improvements.

This is significant refactor of a core component that touches many parts of IPFS, and does come with some **breaking changes**:

* **IPLD plugins**:
* The `PluginIPLD` interface has been changed to utilize go-ipld-prime. There is a demonstration of the change in the [bundled git plugin](./plugin/plugins/git/).
* **The semantics of `dag put` and `dag get` change**:
* `dag get` now takes the `format` option which accepts a multicodec name used to encode the output. By default this is `dag-json`. Users may notice differences from the previously plain Go JSON output, particularly where bytes are concerned which are now encoded using a form similar to CIDs: `{"/":{"bytes":"unpadded-base64-bytes"}}` rather than the previously Go-specific plain padded base64 string. See the [dag-json specification](https://ipld.io/specs/codecs/dag-json/spec/) for an explanation of these forms.
* `dag get` no longer prints an additional new-line character at the end of the encoded block output. This means that the output as presented by `dag get` are the exact bytes of the requested node. A round-trip of such bytes back in through `dag put` using the same codec should result in the same CID.
* `dag put` uses the `input-enc` option to specify the multicodec name of the format data is being provided in, and the `format` option to specify the multicodec multicodec name of the format the data should be stored in. These formerly defaulted to `json` and `cbor` respectively. They now default to `dag-json` and `dag-cbor` respectively but may be changed to any supported codec (bundled or loaded via plugin) by its [multicodec name](https://github.com/multiformats/multicodec/blob/master/table.csv).
* The `json` and `cbor` multicodec names (as used by `input-enc` and `format` options) are now no longer aliases for `dag-json` and `dag-cbor` respectively. Instead, they now refer to their proper [multicodec](https://github.com/multiformats/multicodec/blob/master/table.csv) types. `cbor` refers to a plain CBOR format, which will not encode CIDs and does not have strict deterministic encoding rules. `json` is a plain JSON format, which also won't encode CIDs and will encode bytes in the Go-specific padded base64 string format rather than the dag-json method of byte encoding. See https://ipld.io/specs/codecs/ for more information on IPLD codecs.
* The **dag-pb codec**, which is used to encode UnixFS data for IPFS, is now represented in a form via the `dag` API that mirrors the protobuf schema used to define the binary format and unifies the implementations and specification of dag-pb across the IPLD and IPFS stacks. Previously, additional layers of code within IPFS between protobuf serialization and UnixFS handling for file and directory data, obscured the forms that are described by the protobuf representation. Much of this code has now been replaced and there are fewer layers of transformation. This means that interacting with dag-pb data via the `dag` API will use different forms:
* Previously, using `dag get` on a dag-pb block would present the block serialized as JSON as `{"data":"padded-base64-bytes","links":[{"Name":"foo","Size":100,"Cid":{"/":"Qm..."}},...]}`.
* Using the dag-pb data model specification and the new default dag-json codec for output, this will now be serialized as: `{"Data":{"/":{"bytes":"unpadded-base64-bytes"}},"Links":[{"Name":"foo","Tsize":100,"Hash":{"/":"Qm..."}},...]}`. Aside from the change in byte formatting, most field names have changed: `data` → `Data`, `links` → `Links`, `Size` → `Tsize`, `Cid` → `Hash`. Note that this output can be changed now using the `--format` option to specify an alternative codec.
* Using `dag put` and a `format` option of `dag-pb` now requires that the input conform to this dag-pb specified form. Previously, input using `{"data":"...","links":[...]}` was accepted, now it must be `{"Data":"...","Links":[...]}`.
* Previously it was not possible to use paths to navigate to any of these properties of a dag-pb node, the only possible paths were named links, e.g. `dag get QmFoo/NamedLink` where `NamedLink` was one of the links whose name was `NamedLink`. This functionality remains the same, but by prefixing the path with `/ipld/` we enter data model pathing semantics and can `dag get /ipld/QmFoo/Links/0/Hash` to navigate to links or `/ipld/QmFoo/Data` to simply retrieve the data section of the node, for example.
* See the [dag-pb specification](https://ipld.io/specs/codecs/dag-pb/) for details on the codec and its data model representation.
* See this [detailed write-up](https://github.com/ipld/ipld/blob/master/design/tricky-choices/dag-pb-forms-impl-and-use.md) for further background on these changes.

## v0.9.1 2021-07-20

This is a small bug fix release resolving the following issues:
Expand Down
9 changes: 6 additions & 3 deletions core/commands/dag/dag.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,10 +77,10 @@ into an object of the specified format.
cmds.FileArg("object data", true, true, "The object to put").EnableStdin(),
},
Options: []cmds.Option{
cmds.StringOption("format", "f", "Format that the object will be added as.").WithDefault("cbor"),
cmds.StringOption("input-enc", "Format that the input object will be.").WithDefault("json"),
cmds.StringOption("format", "f", "Format that the object will be added as.").WithDefault("dag-cbor"),
cmds.StringOption("input-enc", "Format that the input object will be.").WithDefault("dag-json"),
cmds.BoolOption("pin", "Pin this object when adding."),
cmds.StringOption("hash", "Hash function to use").WithDefault(""),
cmds.StringOption("hash", "Hash function to use").WithDefault("sha2-256"),
},
Run: dagPut,
Type: OutputObject{},
Expand Down Expand Up @@ -108,6 +108,9 @@ format.
Arguments: []cmds.Argument{
cmds.StringArg("ref", true, false, "The object to get").EnableStdin(),
},
Options: []cmds.Option{
cmds.StringOption("format", "f", "Format that the object will be serialized as.").WithDefault("dag-json"),
},
Run: dagGet,
}

Expand Down
45 changes: 39 additions & 6 deletions core/commands/dag/get.go
Original file line number Diff line number Diff line change
@@ -1,11 +1,18 @@
package dagcmd

import (
"strings"
"fmt"
"io"

"github.com/ipfs/go-ipfs/core/commands/cmdenv"
ipldlegacy "github.com/ipfs/go-ipld-legacy"
"github.com/ipfs/interface-go-ipfs-core/path"

"github.com/ipld/go-ipld-prime"
"github.com/ipld/go-ipld-prime/multicodec"
"github.com/ipld/go-ipld-prime/traversal"
mc "github.com/multiformats/go-multicodec"

cmds "github.com/ipfs/go-ipfs-cmds"
)

Expand All @@ -15,6 +22,12 @@ func dagGet(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
return err
}

format, _ := req.Options["format"].(string)
var fCodec mc.Code
if err := fCodec.Set(format); err != nil {
return err
}

rp, err := api.ResolvePath(req.Context, path.New(req.Arguments[0]))
if err != nil {
return err
Expand All @@ -25,14 +38,34 @@ func dagGet(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
return err
}

var out interface{} = obj
universal, ok := obj.(ipldlegacy.UniversalNode)
if !ok {
return fmt.Errorf("%T is not a valid IPLD node", obj)
}

finalNode := universal.(ipld.Node)

if len(rp.Remainder()) > 0 {
rem := strings.Split(rp.Remainder(), "/")
final, _, err := obj.Resolve(rem)
remainderPath := ipld.ParsePath(rp.Remainder())

finalNode, err = traversal.Get(finalNode, remainderPath)
if err != nil {
return err
}
out = final
}
return cmds.EmitOnce(res, &out)

encoder, err := multicodec.LookupEncoder(uint64(fCodec))
if err != nil {
return fmt.Errorf("invalid encoding: %s - %s", format, err)
}

r, w := io.Pipe()
go func() {
defer w.Close()
if err := encoder(finalNode, w); err != nil {
_ = w.CloseWithError(err)
}
}()

return res.Emit(r)
}
85 changes: 64 additions & 21 deletions core/commands/dag/put.go
Original file line number Diff line number Diff line change
@@ -1,16 +1,28 @@
package dagcmd

import (
"bytes"
"fmt"
"math"

blocks "github.com/ipfs/go-block-format"
"github.com/ipfs/go-cid"
"github.com/ipfs/go-ipfs/core/commands/cmdenv"
"github.com/ipfs/go-ipfs/core/coredag"
ipldlegacy "github.com/ipfs/go-ipld-legacy"
"github.com/ipld/go-ipld-prime/multicodec"
basicnode "github.com/ipld/go-ipld-prime/node/basic"

cmds "github.com/ipfs/go-ipfs-cmds"
files "github.com/ipfs/go-ipfs-files"
ipld "github.com/ipfs/go-ipld-format"
mh "github.com/multiformats/go-multihash"
mc "github.com/multiformats/go-multicodec"

// Expected minimal set of available format/ienc codecs.
_ "github.com/ipld/go-codec-dagpb"
_ "github.com/ipld/go-ipld-prime/codec/cbor"
_ "github.com/ipld/go-ipld-prime/codec/dagcbor"
_ "github.com/ipld/go-ipld-prime/codec/dagjson"
_ "github.com/ipld/go-ipld-prime/codec/json"
_ "github.com/ipld/go-ipld-prime/codec/raw"
)

func dagPut(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) error {
Expand All @@ -24,16 +36,33 @@ func dagPut(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
hash, _ := req.Options["hash"].(string)
dopin, _ := req.Options["pin"].(bool)

// mhType tells inputParser which hash should be used. MaxUint64 means 'use
// default hash' (sha256 for cbor, sha1 for git..)
mhType := uint64(math.MaxUint64)
var icodec mc.Code
if err := icodec.Set(ienc); err != nil {
return err
}
var fcodec mc.Code
if err := fcodec.Set(format); err != nil {
return err
}
var mhType mc.Code
if err := mhType.Set(hash); err != nil {
return err
}

if hash != "" {
var ok bool
mhType, ok = mh.Names[hash]
if !ok {
return fmt.Errorf("%s in not a valid multihash name", hash)
}
cidPrefix := cid.Prefix{
Version: 1,
Codec: uint64(fcodec),
MhType: uint64(mhType),
MhLength: -1,
}

decoder, err := multicodec.LookupDecoder(uint64(icodec))
if err != nil {
return err
}
encoder, err := multicodec.LookupEncoder(uint64(fcodec))
if err != nil {
return err
}

var adder ipld.NodeAdder = api.Dag()
Expand All @@ -48,22 +77,36 @@ func dagPut(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) e
if file == nil {
return fmt.Errorf("expected a regular file")
}
nds, err := coredag.ParseInputs(ienc, format, file, mhType, -1)

node := basicnode.Prototype.Any.NewBuilder()
if err := decoder(node, file); err != nil {
return err
}
n := node.Build()

bd := bytes.NewBuffer([]byte{})
if err := encoder(n, bd); err != nil {
return err
}

blockCid, err := cidPrefix.Sum(bd.Bytes())
if err != nil {
return err
}
blk, err := blocks.NewBlockWithCid(bd.Bytes(), blockCid)
if err != nil {
return err
}
if len(nds) == 0 {
return fmt.Errorf("no node returned from ParseInputs")
ln := ipldlegacy.LegacyNode{
Block: blk,
Node: n,
}

for _, nd := range nds {
err := b.Add(req.Context, nd)
if err != nil {
return err
}
if err := b.Add(req.Context, &ln); err != nil {
return err
}

cid := nds[0].Cid()
cid := ln.Cid()
if err := res.Emit(&OutputObject{Cid: cid}); err != nil {
return err
}
Expand Down
27 changes: 14 additions & 13 deletions core/core.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,14 +17,14 @@ import (
"github.com/ipfs/go-ipfs-pinner"

bserv "github.com/ipfs/go-blockservice"
"github.com/ipfs/go-fetcher"
"github.com/ipfs/go-graphsync"
bstore "github.com/ipfs/go-ipfs-blockstore"
exchange "github.com/ipfs/go-ipfs-exchange-interface"
"github.com/ipfs/go-ipfs-provider"
ipld "github.com/ipfs/go-ipld-format"
logging "github.com/ipfs/go-log"
mfs "github.com/ipfs/go-mfs"
resolver "github.com/ipfs/go-path/resolver"
goprocess "github.com/jbenet/goprocess"
connmgr "github.com/libp2p/go-libp2p-core/connmgr"
ic "github.com/libp2p/go-libp2p-core/crypto"
Expand Down Expand Up @@ -70,18 +70,19 @@ type IpfsNode struct {
PNetFingerprint libp2p.PNetFingerprint `optional:"true"` // fingerprint of private network

// Services
Peerstore pstore.Peerstore `optional:"true"` // storage for other Peer instances
Blockstore bstore.GCBlockstore // the block store (lower level)
Filestore *filestore.Filestore `optional:"true"` // the filestore blockstore
BaseBlocks node.BaseBlocks // the raw blockstore, no filestore wrapping
GCLocker bstore.GCLocker // the locker used to protect the blockstore during gc
Blocks bserv.BlockService // the block service, get/add blocks.
DAG ipld.DAGService // the merkle dag service, get/add objects.
Resolver *resolver.Resolver // the path resolution system
Reporter *metrics.BandwidthCounter `optional:"true"`
Discovery discovery.Service `optional:"true"`
FilesRoot *mfs.Root
RecordValidator record.Validator
Peerstore pstore.Peerstore `optional:"true"` // storage for other Peer instances
Blockstore bstore.GCBlockstore // the block store (lower level)
Filestore *filestore.Filestore `optional:"true"` // the filestore blockstore
BaseBlocks node.BaseBlocks // the raw blockstore, no filestore wrapping
GCLocker bstore.GCLocker // the locker used to protect the blockstore during gc
Blocks bserv.BlockService // the block service, get/add blocks.
DAG ipld.DAGService // the merkle dag service, get/add objects.
IPLDFetcherFactory fetcher.Factory `name:"ipldFetcher"` // fetcher that paths over the IPLD data model
UnixFSFetcherFactory fetcher.Factory `name:"unixfsFetcher"` // fetcher that interprets UnixFS data
Reporter *metrics.BandwidthCounter `optional:"true"`
Discovery discovery.Service `optional:"true"`
FilesRoot *mfs.Root
RecordValidator record.Validator

// Online
PeerHost p2phost.Host `optional:"true"` // the network host (server+client)
Expand Down
30 changes: 17 additions & 13 deletions core/coreapi/coreapi.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,11 +19,12 @@ import (
"fmt"

bserv "github.com/ipfs/go-blockservice"
"github.com/ipfs/go-ipfs-blockstore"
"github.com/ipfs/go-ipfs-exchange-interface"
"github.com/ipfs/go-fetcher"
blockstore "github.com/ipfs/go-ipfs-blockstore"
exchange "github.com/ipfs/go-ipfs-exchange-interface"
offlinexch "github.com/ipfs/go-ipfs-exchange-offline"
"github.com/ipfs/go-ipfs-pinner"
"github.com/ipfs/go-ipfs-provider"
pin "github.com/ipfs/go-ipfs-pinner"
provider "github.com/ipfs/go-ipfs-provider"
offlineroute "github.com/ipfs/go-ipfs-routing/offline"
ipld "github.com/ipfs/go-ipld-format"
dag "github.com/ipfs/go-merkledag"
Expand Down Expand Up @@ -55,13 +56,14 @@ type CoreAPI struct {
baseBlocks blockstore.Blockstore
pinning pin.Pinner

blocks bserv.BlockService
dag ipld.DAGService

peerstore pstore.Peerstore
peerHost p2phost.Host
recordValidator record.Validator
exchange exchange.Interface
blocks bserv.BlockService
dag ipld.DAGService
ipldFetcherFactory fetcher.Factory
unixFSFetcherFactory fetcher.Factory
peerstore pstore.Peerstore
peerHost p2phost.Host
recordValidator record.Validator
exchange exchange.Interface

namesys namesys.NameSystem
routing routing.Routing
Expand Down Expand Up @@ -167,8 +169,10 @@ func (api *CoreAPI) WithOptions(opts ...options.ApiOption) (coreiface.CoreAPI, e
baseBlocks: n.BaseBlocks,
pinning: n.Pinning,

blocks: n.Blocks,
dag: n.DAG,
blocks: n.Blocks,
dag: n.DAG,
ipldFetcherFactory: n.IPLDFetcherFactory,
unixFSFetcherFactory: n.UnixFSFetcherFactory,

peerstore: n.Peerstore,
peerHost: n.PeerHost,
Expand Down
24 changes: 10 additions & 14 deletions core/coreapi/path.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,10 +8,10 @@ import (
"github.com/ipfs/go-namesys/resolve"

"github.com/ipfs/go-cid"
"github.com/ipfs/go-fetcher"
ipld "github.com/ipfs/go-ipld-format"
ipfspath "github.com/ipfs/go-path"
"github.com/ipfs/go-path/resolver"
uio "github.com/ipfs/go-unixfs/io"
ipfspathresolver "github.com/ipfs/go-path/resolver"
coreiface "github.com/ipfs/interface-go-ipfs-core"
path "github.com/ipfs/interface-go-ipfs-core/path"
)
Expand Down Expand Up @@ -49,23 +49,19 @@ func (api *CoreAPI) ResolvePath(ctx context.Context, p path.Path) (path.Resolved
return nil, err
}

var resolveOnce resolver.ResolveOnce

switch ipath.Segments()[0] {
case "ipfs":
resolveOnce = uio.ResolveUnixfsOnce
case "ipld":
resolveOnce = resolver.ResolveSingle
default:
if ipath.Segments()[0] != "ipfs" && ipath.Segments()[0] != "ipld" {
return nil, fmt.Errorf("unsupported path namespace: %s", p.Namespace())
}

r := &resolver.Resolver{
DAG: api.dag,
ResolveOnce: resolveOnce,
var dataFetcher fetcher.Factory
if ipath.Segments()[0] == "ipld" {
dataFetcher = api.ipldFetcherFactory
} else {
dataFetcher = api.unixFSFetcherFactory
}
resolver := ipfspathresolver.NewBasicResolver(dataFetcher)

node, rest, err := r.ResolveToLastNode(ctx, ipath)
node, rest, err := resolver.ResolveToLastNode(ctx, ipath)
if err != nil {
return nil, err
}
Expand Down
Loading

0 comments on commit f63a997

Please sign in to comment.