
Potential Bottleneck in Usage of io.Copy for Large File Downloads from S3 #2662

Closed
jeremy-green opened this issue Jun 24, 2019 · 2 comments · Fixed by #2858
Labels
enhancement Change to SDK that resolves an issue improves existing functionality.

Comments

@jeremy-green

When attempting to download a file from S3 using the SDK, I noticed downloads taking a very long time. After enabling logging of the HTTP requests to S3 and adding log statements before and after the download call, I noticed a large gap between when the HTTP request was logged and when the download finished. I dug further into the SDK and added logging specifically within download.go, around lines 407 and 413.

I used time.Now and time.Since to see how long the HTTP calls took and how long the calls to io.Copy took. It turns out that the calls to io.Copy took the longest to execute. In the SDK, I updated the following code to output how long each call was taking:

var n int64
var err error
for retry := 0; retry <= d.partBodyMaxRetries; retry++ {
  var resp *s3.GetObjectOutput
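  // Time the GetObject HTTP request for this part.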
  gStart := time.Now()
  resp, err = d.cfg.S3.GetObjectWithContext(d.ctx, in, d.cfg.RequestOptions...)
  gEnd := time.Since(gStart)
  if err != nil {
    return err
  }
  d.setTotalBytes(resp) // Set total if not yet set.

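  // Time how long io.Copy takes to write the part body into the chunk buffer.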
  start := time.Now()
  n, err = io.Copy(&chunk, resp.Body)
  elapsed := time.Since(start)

  fmt.Println("GetObjectWithContext", gEnd, *resp.ContentRange)
  fmt.Println("Copy", elapsed, *resp.ContentRange)

  resp.Body.Close()
  if err == nil {
    break
  }

  chunk.cur = 0
  logMessage(d.cfg.S3, aws.LogDebugWithRequestRetries,
    fmt.Sprintf("DEBUG: object part body download interrupted %s, err, %v, retrying attempt %d",
      aws.StringValue(in.Key), err, retry))
}

Decreasing the PartSize and increasing the Concurrency helps to relieve the bottleneck.
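
For reference, here's a rough sketch of how those values can be overridden when constructing the s3manager downloader (the bucket, key, and the specific numbers are placeholders, not tuned recommendations):

package main

import (
  "log"
  "os"

  "github.com/aws/aws-sdk-go/aws"
  "github.com/aws/aws-sdk-go/aws/session"
  "github.com/aws/aws-sdk-go/service/s3"
  "github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
  sess := session.Must(session.NewSession())

  // Override the downloader defaults (5 MB parts, concurrency of 5).
  downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
    d.PartSize = 1 * 1024 * 1024 // smaller parts
    d.Concurrency = 10           // more parts in flight
  })

  f, err := os.Create("large-file")
  if err != nil {
    log.Fatal(err)
  }
  defer f.Close()

  n, err := downloader.Download(f, &s3.GetObjectInput{
    Bucket: aws.String("my-bucket"), // placeholder
    Key:    aws.String("my-key"),    // placeholder
  })
  if err != nil {
    log.Fatal(err)
  }
  log.Printf("downloaded %d bytes", n)
}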

Go isn't my primary language so let me know if there's any information that I'm missing.

Version of AWS SDK for Go?

1.16.34

Version of Go (go version)?

go version go1.12.1 linux/amd64

What issue did you see?

io.Copy takes seconds to run when copying each downloaded part of a file from S3.

Steps to reproduce

  • Upload a large file to S3
  • Turn on HTTP logging (see the sketch after this list)
  • Attempt to download the file from S3
  • Add in debug code and reattempt to download
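
For the HTTP logging step, a minimal sketch of how it can be enabled on the session (aws.LogDebug logs each request; LogDebugWithHTTPBody would also dump the bodies):

package main

import (
  "github.com/aws/aws-sdk-go/aws"
  "github.com/aws/aws-sdk-go/aws/session"
)

func main() {
  // Log every HTTP request the SDK makes, including each ranged GetObject
  // that the downloader issues per part.
  sess := session.Must(session.NewSession(&aws.Config{
    LogLevel: aws.LogLevel(aws.LogDebug),
  }))
  _ = sess // build the s3manager.Downloader from this session as usual
}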

If you have a runnable example, please include it.
https://gist.github.com/jeremy-green/d09f80e8d5ca62798488fcd00e885a22

@diehlaws self-assigned this Jun 25, 2019
@diehlaws added the investigating and enhancement labels and removed the investigating label Jun 25, 2019

diehlaws commented Jul 2, 2019

Hi @jeremy-green, thanks for reaching out to us about this, and I apologize for the long delay in response from our end. I'm able to reproduce the described behavior. During my tests I found that, using the default PartSize and Concurrency values, the SDK consistently downloaded data from S3 at between 38 and 58 MB/s but copied that same data to disk at between 0.64 and 0.73 MB/s.

I've brought up this behavior of the SDK in this week's sprint, and we were able to identify areas for improvement in how s3manager downloads parts for an object from S3 and writes them locally. We should have a fix out for this soon - once I have more information on this matter I'll update the issue accordingly.

@jeremy-green
Author

Great! Thanks for the update!!
