Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalizing upload code for minio #1035

Closed
wants to merge 39 commits into from

Conversation

manugupt1
Copy link
Member

@manugupt1 manugupt1 commented Jan 8, 2019

What is the problem I am trying to address?

Makes minio-storage implementation with other implementations like S3.

How is the fix applied?

Refactor the code to use modUploader and fix it in other places.

NB: It is a breaking change, so it has to be taken care while integrating this change.

@manugupt1 manugupt1 requested a review from a team as a code owner January 8, 2019 06:51
pkg/storage/minio/saver.go Outdated Show resolved Hide resolved
@manugupt1 manugupt1 changed the title [WIP] Start generalizing upload code for minio Generalizing upload code for minio Jan 13, 2019
@codecov-io
Copy link

codecov-io commented Jan 13, 2019

Codecov Report

Merging #1035 into master will decrease coverage by 0.1%.
The diff coverage is 79.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1035      +/-   ##
==========================================
- Coverage   53.05%   52.95%   -0.11%     
==========================================
  Files          80       80              
  Lines        2731     2744      +13     
==========================================
+ Hits         1449     1453       +4     
- Misses       1163     1170       +7     
- Partials      119      121       +2
Impacted Files Coverage Δ
pkg/module/zip_read_closer.go 54.16% <ø> (-33.34%) ⬇️
pkg/storage/gcp/saver.go 0% <0%> (ø) ⬆️
pkg/storage/mongo/cataloger.go 81.81% <100%> (+0.56%) ⬆️
pkg/storage/fs/saver.go 51.61% <100%> (ø) ⬆️
pkg/storage/minio/getter.go 79.59% <100%> (ø) ⬆️
pkg/storage/minio/minio.go 60.86% <100%> (ø) ⬆️
pkg/storage/minio/lister.go 90.62% <100%> (+2.16%) ⬆️
pkg/storage/minio/checker.go 80% <100%> (ø) ⬆️
pkg/storage/minio/deleter.go 55.55% <100%> (ø) ⬆️
pkg/storage/module/upload.go 97.56% <100%> (+0.33%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d379c20...12f2102. Read the comment docs.

pkg/storage/minio/saver.go Outdated Show resolved Hide resolved
@manugupt1 manugupt1 changed the title Generalizing upload code for minio [WIP] Generalizing upload code for minio Jan 15, 2019
@manugupt1 manugupt1 changed the title [WIP] Generalizing upload code for minio Generalizing upload code for minio Jan 16, 2019
@manugupt1
Copy link
Member Author

@marpio I was able to get down the memory consumption without going through GoGetFetcher and get minio passing on windows also.

Let me know what do you think.

@marpio
Copy link
Member

marpio commented Jan 16, 2019

@manugupt1 this will make the tests work but won't solve the underlying problem.
You're still passing -1 in case of zip which makes minio allocate 600MiB per mod@ver. Imagine now that athens serves 20 requests concurrently - this means 20*600MiB. This is way to much in my opinion...
Btw the issue is not OS dependent - it will happen on Linux as well. It's just that the Travis build server has more free memory.

@manugupt1
Copy link
Member Author

true! Let me see if there is a way to get this down lower for zip and bracket it to the maximum size of the zip.

@marpio
Copy link
Member

marpio commented Jan 16, 2019

...and bracket it to the maximum size of the zip.

We actually have to do better than that b/c some module are big - i.e. cockroachdb so we should use only a small buffer or even none and stream directly if possible.
I took a glance at minio code and I think passing the size should help but it would be awesome if we could measure memory consumption somehow.

@manugupt1
Copy link
Member Author

manugupt1 commented Jan 17, 2019

This is what I am thinking:
The concern that you are really expressing is at the tail end of the spectrum. As of now, I do not think anyone uses Minio and anyone having low RAM would be an even lower probability IMO. While I am not advocating for this PR to go through, I honestly do not think that we should remove Minio for the very minority of the users who are on very old and limited hardware. Because by removing that, we will be actually removing a feature that is potentially useful for the other 90% of the user base.

About measurement:
Would not the observability dashboard help to measure memory consumption?
I can do it on my computer as well to measure change in memory before and after to see what it is.

What do you think?

@manugupt1
Copy link
Member Author

Also, @marpio I have been thinking a lot more. If you look at the minio docs, we would not ever require more than 600MB for zip if we do not provide a length. Granted that there are situations, where we can get by by < 600MB, but for cockroachdb 600MB is the max we can go if not provided a length. Correct me if I am wrong here!

@marpio
Copy link
Member

marpio commented Jan 17, 2019

@manugupt1

I honestly do not think that we should remove Minio for the very minority of the users who are on very old and limited hardware. Because by removing that, we will be actually removing a feature that is potentially useful for the other 90% of the user base.

we definitely do not have to remove minio, if we manage to reduce memory consumption considerably, otherwise it won't be a workable solution for anyone.

If you look at the minio docs, we would not ever require more than 600MB for zip if we do not provide a length

Yep, not more but 600MiB is still way more than it can/should be if we want to serve more than few requests, no?

As I said before, we can try extending storage.Version by the size of the zip inside goGetFetcher because there we don't need to read the whole stream. I'm pretty optimistic that this will help us in this regard.

@manugupt1
Copy link
Member Author

Let me look there! I was trying not to touch that part of the code. Maybe during this week.

@manugupt1 manugupt1 changed the title [WIP] Generalizing upload code for minio Generalizing upload code for minio Feb 19, 2019
Copy link
Member

@arschles arschles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manugupt1 thanks for picking this back up! overall looks really nice, but I left some comments 👍

Also, this is really gonna help me in #1058, so I'm looking forward to this going in 🎉

pkg/download/protocol_test.go Outdated Show resolved Hide resolved
pkg/storage/compliance/tests.go Outdated Show resolved Hide resolved
pkg/storage/minio/cataloger.go Outdated Show resolved Hide resolved
pkg/storage/minio/checker.go Show resolved Hide resolved
@@ -23,17 +23,17 @@ func (v *storageImpl) Delete(ctx context.Context, module, version string) error

versionedPath := v.versionLocation(module, version)

modPath := fmt.Sprintf("%s/go.mod", versionedPath)
modPath := fmt.Sprintf("%s.mod", versionedPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here and below regarding versionedPath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That is why I mentioned it is a breaking change :(
The reason being: https://github.com/gomods/athens/pull/1035/files#diff-617c9c1a2f86faafdb72e42ccd20a502R46 where config.PackageVersionPath has its own function of storing it and minio will use the moduploader which uses this function.

pkg/storage/mongo/cataloger.go Outdated Show resolved Hide resolved
pkg/storage/zip.go Outdated Show resolved Hide resolved
Copy link
Member

@marpio marpio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manugupt1 Just few small comments and one big question about versionLocation func.

}

func (s *storageImpl) versionLocation(module, version string) string {
return fmt.Sprintf("%s/%s", module, version)
return fmt.Sprintf("%s/@v/%s", module, version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to change that? Wouldn't that make an existing deployment re-download everything again b/c Exists wouldn't find stored modules?

Copy link
Member Author

@manugupt1 manugupt1 Feb 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That is why I mentioned it is a breaking change :(
The reason being: https://github.com/gomods/athens/pull/1035/files#diff-617c9c1a2f86faafdb72e42ccd20a502R46 where config.PackageVersionPath has its own function of storing it and minio will use the moduploader which uses this function.

pkg/storage/minio/saver.go Outdated Show resolved Hide resolved
pkg/storage/zip.go Outdated Show resolved Hide resolved
@manugupt1 manugupt1 requested a review from arschles February 23, 2019 22:27
@arschles arschles removed this from the v0.3.0 milestone Feb 26, 2019
Copy link
Contributor

@marwan-at-work marwan-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite surprised by how much has to change just so that a helper function works for Minio as well.

I'm not sure if a lot of these changes are necessary or not, so would love to hear some feedback. But nice work on making such changes not too crazy :)

My biggest two concerns are:

  1. The uploader function might have to change soon anyway (or even be removed)
  2. I can't see why we have to include "Size" and not just do -1 for all modules. What's the harm in that? It would save us a lot of complexity. And if we truly need to, maybe it's best to have the upload mechanism for Minio be untied to other storage uploads.

// Zip represents zip file of a specific version and it's size
type Zip struct {
Reader io.ReadCloser
Size int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the size actually required by Minio? What's the benefit of including size? I'd imagine That's something Minio should do itself once we give it a big blob. For example, I imagine S3/GCS calculate the size for you.

// Version represents a version of a module and contains .mod file, a .info file and zip file of a specific version
type Version struct {
Mod []byte
Zip io.ReadCloser
Zip Zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a pointer, because copying by value will still copy by reference the ReadCloser

)

// Saver saves module metadata and its source to underlying storage
type Saver interface {
Save(ctx context.Context, module, version string, mod []byte, zip io.Reader, info []byte) error
Save(ctx context.Context, module, version string, mod []byte, zip Zip, info []byte) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, should also be a pointer to Zip so that people are aware they're "sharing" a Zip struct and not copying it.

@@ -14,16 +15,30 @@ import (
const numFiles = 3

// Uploader takes a stream and saves it to the blob store under a given path
type Uploader func(ctx context.Context, path, contentType string, stream io.Reader) error
type Uploader func(ctx context.Context, path, contentType string, stream Stream) error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all Storage save by path? Shouldn't the uploader take a module, version and then the storage itself determines how to combine them into a path, or maybe not combine them at all? This would be very necessary when we want to fix module.Checker to be able to tell us how many files a particular version has, or how many versions a particular module has.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw the comment above is out of this PR's scope, but just wanted to make peeps aware of potential changes to it so that we don't break everyone twice

func (s *storageImpl) upload(ctx context.Context, path, contentType string, stream moduploader.Stream) error {
const op errors.Op = "minio.upload"
if stream.Size > 600*mb {
stream.Size = -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to my question below, why can't we just put -1 for all the things? This would guarantee the same behavior for all modules

@marwan-at-work
Copy link
Contributor

I think it might be best to implement this solution in our pkg/storage/minio and avoid touching a lot of our codebase minio/minio-go#848 (comment)

I'm not sure yet if we need to use module.Uploader or avoid it completely. That's something to think about. But it might be worth creating a new issue for the above, and maybe a new PR. Up to you!

@manugupt1
Copy link
Member Author

Closing as discussed on Slack!

@manugupt1 manugupt1 closed this Mar 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants