Add functional support with go-ipfs#5516 #7
Conversation
Yeah, I don't think we can reasonably do anything but vendor in this case. I've checked-in the vendored files and un-vendored go-datastore (and friends).
Using a standard s3 library will make this so much easier to maintain. Thanks!
s3.go
Outdated
	return res, true
}
index -= len(resp.Contents)
if index < 0 {
This should be impossible, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think I found a bug... suppose q.Limit = 10, q.Offset = 11, and the bucket has more than 11 objects. Then index = 11 and len(resp.Contents) > 11, so index - len(resp.Contents) < 0.
False alarm, the scenario I mentioned was incorrect.
I think the only way for index < 0 is if the aws-go-sdk or s3 returns an invalid payload containing more than 1000 items.
s3.go
Outdated
defer close(errChan)

for _, k := range putKeys {
	go func(k ds.Key, op batchOp) {
This is going to launch quite a few goroutines. Unfortunately, this isn't as "free" as the go documentation would lead you to believe. I'd prefer to spawn N goroutines and then feed them with a channel.
As in it will take up memory on the stack? I'm okay with switching the concurrency pattern.
That and it'll cost the scheduler something. I mean, really, it's probably not that big of an issue. There are just better ways to do this that aren't that complicated.
s3.go
Outdated
for i := 0; i < errChanSize; i++ {
	err := <-errChan
	if err != nil {
		return err
This will allow us to return while we're still writing. If we do that and then call close on the datastore, we'll close the limiter channel causing outstanding writes to panic.
Personally, I'd fix this by both:
- Not using a global "limiter" channel. There's really no reason to make it global.
- Making sure to wait for all outstanding goroutines to finish (not absolutely necessary but considered the "nice" thing to do).
Good catch.
As a side note, the docker registry implements a lot of its s3 driver with multipart uploads, but since IPFS blocks are strictly <= 4MB, I think we don't need resumability or resiliency against failing PUTs.
Technically blocks can be any size. For example, we support importing git objects directly as IPLD objects and, unfortunately, git doesn't do any chunking. However, bitswap won't handle large blocks (and other parts of the system may refuse to operate), so it's probably fine if the S3 datastore doesn't work with them (for now). This may need to change eventually, but we can handle that later if it becomes a problem.
Partial review (still need to look over the changes a bit)
s3.go
Outdated
res := dsq.Result{Entry: dsq.Entry{Key: kval}}
	return res, true
}
index -= len(resp.Contents)
Actually, I think this logic is wrong. Shouldn't this go above the ListObjectsV2 call? That is, if index >= len(resp.Contents), we want to skip those contents.
You're right, not sure why I had it down there.
s3.go
Outdated
go func(k ds.Key, op batchOp) {
	b.s.limiter <- struct{}{}
var wg sync.WaitGroup
wg.Add(b.workers)
Ideally, we'd use workers := min(b.workers, numJobs).
Hi there, we wanted to experiment with IPFS but we wanted to back a few nodes with S3 to ensure long term availability of blocks we publish.
I switched the implementation to use the stable aws-sdk-go instead, and had to use dep for vendoring because of issues with gx and gx-go: golang.org/x/tools testing with fruits.io/banana, etc.