
Add functional support with go-ipfs#5516 #7

Merged: 7 commits merged into ipfs:master on Nov 28, 2018

Conversation

@hinshun (Contributor) commented Oct 4, 2018

Hi there, we wanted to experiment with IPFS, but we also want to back a few nodes with S3 to ensure long-term availability of the blocks we publish.

I switched the implementation to use the stable aws-sdk-go instead, and had to use dep for vendoring because of issues with gx and gx-go:

  • Cannot import packages with no buildable Go packages at the root
  • Cannot import packages with dummy packages like golang.org/x/tools testing with fruits.io/banana, etc

@ghost ghost assigned Stebalien Oct 23, 2018
@ghost ghost added the status/in-progress In progress label Oct 23, 2018
@Stebalien (Member)

Yeah, I don't think we can reasonably do anything but vendor in this case. I've checked in the vendored files and un-vendored go-datastore (and friends).

@Stebalien (Member) left a comment

Using a standard s3 library will make this so much easier to maintain. Thanks!

s3.go Outdated
return res, true
}
index -= len(resp.Contents)
if index < 0 {
@Stebalien (Member)

This should be impossible, right?

@hinshun (Contributor, Author)

Actually, I think I found a bug... suppose q.Limit = 10, q.Offset = 11 and the bucket has more than 11 objects. Then index = 11, and len(resp.Contents) > 11 so index - len(resp.Contents) < 0.

@hinshun (Contributor, Author)

False alarm, the scenario I mentioned was incorrect.

I think the only way for index < 0 is if aws-sdk-go or S3 returns an invalid payload containing more than 1000 items.

s3.go Outdated
defer close(errChan)

for _, k := range putKeys {
go func(k ds.Key, op batchOp) {
@Stebalien (Member)

This is going to launch quite a few goroutines. Unfortunately, this isn't as "free" as the go documentation would lead you to believe. I'd prefer to spawn N goroutines and then feed them with a channel.

@hinshun (Contributor, Author) commented Oct 24, 2018

As in it will take up memory on the stack? I'm okay with switching the concurrency pattern.

@Stebalien (Member)

That and it'll cost the scheduler something. I mean, really, it's probably not that big of an issue. There are just better ways to do this that aren't that complicated.
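
For illustration, a minimal sketch of the pattern being suggested: spawn a fixed number of workers and feed them over a channel, instead of launching one goroutine per key. batchOp is the PR's type; runOps, putOp, and the worker count are assumed names used only for this sketch, not the PR's actual code.

import "sync"

// Sketch only: N workers drain a jobs channel rather than one goroutine per key.
// putOp stands in for the S3 PUT/DELETE performed per batch operation.
func runOps(ops []batchOp, workers int, putOp func(batchOp) error) error {
	jobs := make(chan batchOp)
	errs := make(chan error, len(ops)) // buffered so workers never block on error sends

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for op := range jobs {
				if err := putOp(op); err != nil {
					errs <- err
				}
			}
		}()
	}

	for _, op := range ops {
		jobs <- op
	}
	close(jobs) // workers exit once the channel drains

	wg.Wait() // don't return while writes are still in flight
	close(errs)
	for err := range errs {
		return err // report the first error, if any
	}
	return nil
}

This keeps the goroutine count bounded regardless of batch size.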

s3.go Outdated
for i := 0; i < errChanSize; i++ {
err := <-errChan
if err != nil {
return err
@Stebalien (Member)

This will allow us to return while we're still writing. If we do that and then call close on the datastore, we'll close the limiter channel causing outstanding writes to panic.

Personally, I'd fix this by both:

  1. Not using a global "limiter" channel. There's really no reason to make it global.
  2. Making sure to wait for all outstanding goroutines to finish (not absolutely necessary but considered the "nice" thing to do).

@hinshun (Contributor, Author)

Good catch.
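
A rough sketch of both suggested fixes, keeping the PR's goroutine-per-op shape for contrast with the worker pool above. The semaphore is local to the call rather than a field on the datastore, and a WaitGroup guarantees nothing is still writing when we return; commitOps, batchOp, putOp, and the concurrency bound are assumptions for this sketch.

import "sync"

// Sketch only: the limiter is a local semaphore, so closing the datastore can
// never close a channel that writers still use, and wg.Wait() keeps us from
// returning before every outstanding write has finished.
func commitOps(ops []batchOp, putOp func(batchOp) error) error {
	sem := make(chan struct{}, 16) // at most 16 concurrent S3 calls (arbitrary bound)
	errs := make(chan error, len(ops))

	var wg sync.WaitGroup
	for _, op := range ops {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(op batchOp) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			errs <- putOp(op)
		}(op)
	}

	wg.Wait() // all goroutines done before we inspect errors
	close(errs)
	for err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}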

@hinshun (Contributor, Author) commented Oct 24, 2018

As a side note, the docker registry implements a lot of its s3 driver with multipart uploads, but since IPFS blocks are strictly <= 4MB, I think we don't need resumability or resiliency against failing PUTs.

@Stebalien (Member)

> As a side note, the docker registry implements a lot of its s3 driver with multipart uploads, but since IPFS blocks are strictly <= 4MB, I think we don't need resumability or resiliency against failing PUTs.

Technically blocks can be any size. For example, we support importing git objects directly as IPLD objects and, unfortunately, git doesn't do any chunking.

However, bitswap won't handle large blocks (and other parts of the system may refuse to operate), so it's probably fine if the S3 datastore doesn't work with them (for now). This may need to change eventually, but we can handle that later if it becomes a problem.

@Stebalien (Member) left a comment

Partial review (still need to look over the changes a bit)

s3.go Outdated
res := dsq.Result{Entry: dsq.Entry{Key: kval}}
return res, true
}
index -= len(resp.Contents)
@Stebalien (Member)

Actually, I think this logic is wrong. Shouldn't this go above the ListObjectsV2 call? That is, if index >= len(resp.Contents), we want to skip those contents.

@hinshun (Contributor, Author)

You're right, not sure why I had it down there.
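
A hedged sketch of the reordered logic, using aws-sdk-go's ListObjectsV2; the helper name and the bucket/prefix parameters are assumptions, not the PR's actual code. The point is that whole pages are skipped before indexing into a page's Contents, so index can never go negative.

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// Sketch only: consume whole pages while the remaining offset still covers
// them, then index into the page the offset lands in.
func keyAtOffset(client *s3.S3, bucket, prefix string, offset int) (string, error) {
	index := offset
	var token *string
	for {
		resp, err := client.ListObjectsV2(&s3.ListObjectsV2Input{
			Bucket:            aws.String(bucket),
			Prefix:            aws.String(prefix),
			ContinuationToken: token,
		})
		if err != nil {
			return "", err
		}
		if index < len(resp.Contents) {
			return *resp.Contents[index].Key, nil // offset lands inside this page
		}
		// This whole page is consumed by the offset; move on to the next one.
		index -= len(resp.Contents)
		if resp.NextContinuationToken == nil {
			return "", fmt.Errorf("offset %d is past the end of the listing", offset)
		}
		token = resp.NextContinuationToken
	}
}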

s3.go Outdated
go func(k ds.Key, op batchOp) {
b.s.limiter <- struct{}{}
var wg sync.WaitGroup
wg.Add(b.workers)
@Stebalien (Member)

Ideally, we'd use workers := min(b.workers, numJobs).
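
For example (a sketch; b.ops stands for however many operations the batch holds, which in the PR may be split across put and delete keys):

// Cap the worker count at the number of queued operations so a small batch
// doesn't spawn idle goroutines. (Go only gained a builtin min for ints in
// 1.21, so code of this era needs the explicit comparison.)
numJobs := len(b.ops)
workers := b.workers
if numJobs < workers {
	workers = numJobs
}
wg.Add(workers)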

@Stebalien Stebalien merged commit 6111fec into ipfs:master Nov 28, 2018
@ghost ghost removed the status/in-progress In progress label Nov 28, 2018