-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalizing upload code for minio #1035
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1035 +/- ##
==========================================
- Coverage 53.05% 52.95% -0.11%
==========================================
Files 80 80
Lines 2731 2744 +13
==========================================
+ Hits 1449 1453 +4
- Misses 1163 1170 +7
- Partials 119 121 +2
Continue to review full report at Codecov.
|
f44e7b8
to
05d8047
Compare
@marpio I was able to get down the memory consumption without going through GoGetFetcher and get minio passing on windows also. Let me know what do you think. |
@manugupt1 this will make the tests work but won't solve the underlying problem. |
true! Let me see if there is a way to get this down lower for zip and bracket it to the maximum size of the zip. |
We actually have to do better than that b/c some module are big - i.e. cockroachdb so we should use only a small buffer or even none and stream directly if possible. |
This is what I am thinking: About measurement: What do you think? |
Also, @marpio I have been thinking a lot more. If you look at the minio docs, we would not ever require more than 600MB for zip if we do not provide a length. Granted that there are situations, where we can get by by < 600MB, but for cockroachdb 600MB is the max we can go if not provided a length. Correct me if I am wrong here! |
we definitely do not have to remove minio, if we manage to reduce memory consumption considerably, otherwise it won't be a workable solution for anyone.
Yep, not more but 600MiB is still way more than it can/should be if we want to serve more than few requests, no? As I said before, we can try extending |
Let me look there! I was trying not to touch that part of the code. Maybe during this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manugupt1 thanks for picking this back up! overall looks really nice, but I left some comments 👍
Also, this is really gonna help me in #1058, so I'm looking forward to this going in 🎉
@@ -23,17 +23,17 @@ func (v *storageImpl) Delete(ctx context.Context, module, version string) error | |||
|
|||
versionedPath := v.versionLocation(module, version) | |||
|
|||
modPath := fmt.Sprintf("%s/go.mod", versionedPath) | |||
modPath := fmt.Sprintf("%s.mod", versionedPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here and below regarding versionedPath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That is why I mentioned it is a breaking change :(
The reason being: https://github.com/gomods/athens/pull/1035/files#diff-617c9c1a2f86faafdb72e42ccd20a502R46 where config.PackageVersionPath
has its own function of storing it and minio will use the moduploader which uses this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manugupt1 Just few small comments and one big question about versionLocation
func.
} | ||
|
||
func (s *storageImpl) versionLocation(module, version string) string { | ||
return fmt.Sprintf("%s/%s", module, version) | ||
return fmt.Sprintf("%s/@v/%s", module, version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to change that? Wouldn't that make an existing deployment re-download everything again b/c Exists
wouldn't find stored modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That is why I mentioned it is a breaking change :(
The reason being: https://github.com/gomods/athens/pull/1035/files#diff-617c9c1a2f86faafdb72e42ccd20a502R46 where config.PackageVersionPath
has its own function of storing it and minio will use the moduploader which uses this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite surprised by how much has to change just so that a helper function works for Minio as well.
I'm not sure if a lot of these changes are necessary or not, so would love to hear some feedback. But nice work on making such changes not too crazy :)
My biggest two concerns are:
- The uploader function might have to change soon anyway (or even be removed)
- I can't see why we have to include "Size" and not just do -1 for all modules. What's the harm in that? It would save us a lot of complexity. And if we truly need to, maybe it's best to have the upload mechanism for Minio be untied to other storage uploads.
// Zip represents zip file of a specific version and it's size | ||
type Zip struct { | ||
Reader io.ReadCloser | ||
Size int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the size actually required by Minio? What's the benefit of including size? I'd imagine That's something Minio should do itself once we give it a big blob. For example, I imagine S3/GCS calculate the size for you.
// Version represents a version of a module and contains .mod file, a .info file and zip file of a specific version | ||
type Version struct { | ||
Mod []byte | ||
Zip io.ReadCloser | ||
Zip Zip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a pointer, because copying by value will still copy by reference the ReadCloser
) | ||
|
||
// Saver saves module metadata and its source to underlying storage | ||
type Saver interface { | ||
Save(ctx context.Context, module, version string, mod []byte, zip io.Reader, info []byte) error | ||
Save(ctx context.Context, module, version string, mod []byte, zip Zip, info []byte) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, should also be a pointer to Zip so that people are aware they're "sharing" a Zip struct and not copying it.
@@ -14,16 +15,30 @@ import ( | |||
const numFiles = 3 | |||
|
|||
// Uploader takes a stream and saves it to the blob store under a given path | |||
type Uploader func(ctx context.Context, path, contentType string, stream io.Reader) error | |||
type Uploader func(ctx context.Context, path, contentType string, stream Stream) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can all Storage save by path
? Shouldn't the uploader take a module, version
and then the storage itself determines how to combine them into a path, or maybe not combine them at all? This would be very necessary when we want to fix module.Checker
to be able to tell us how many files a particular version has, or how many versions a particular module has.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw the comment above is out of this PR's scope, but just wanted to make peeps aware of potential changes to it so that we don't break everyone twice
func (s *storageImpl) upload(ctx context.Context, path, contentType string, stream moduploader.Stream) error { | ||
const op errors.Op = "minio.upload" | ||
if stream.Size > 600*mb { | ||
stream.Size = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to my question below, why can't we just put -1 for all the things? This would guarantee the same behavior for all modules
I think it might be best to implement this solution in our pkg/storage/minio and avoid touching a lot of our codebase minio/minio-go#848 (comment) I'm not sure yet if we need to use module.Uploader or avoid it completely. That's something to think about. But it might be worth creating a new issue for the above, and maybe a new PR. Up to you! |
Closing as discussed on Slack! |
What is the problem I am trying to address?
Makes minio-storage implementation with other implementations like S3.
How is the fix applied?
Refactor the code to use modUploader and fix it in other places.
NB: It is a breaking change, so it has to be taken care while integrating this change.