stream decompression instead of buffering #2018
This needs some serious edge-case tests. Some concerns: if some middleware, somewhere after this middleware, also switches the request body, this
@aldas It is any replacer's responsibility to call Close on the underlying Body. The current strategy, which consumes the entire req.Body before calling the next middleware, introduces serious latency.
i := pool.Get()
gr, ok := i.(*gzip.Reader)
if !ok {
return echo.NewHTTPError(http.StatusInternalServerError, i.(error).Error())
Since you're trying to read the request body, wouldn't this be a 4xx?
This was as is. Regardless, I believe this is correct.
Since sync.Pool returns an interface{}, we have no choice but to type-assert the value to the underlying type to work with it.
This middleware creates a sync.Pool to amortise allocation of gzip.Reader objects. If for some reason the pool returns something we aren't expecting, then something is wrong with the lib, not the request. Hence the 500.
An argument could be made that this might even be a good spot to panic, but I don't want to introduce unneeded changes in this PR.
@aldas is there anything blocking this PR? I would like to get this through for use at my company.
I do not know, I have not tried it yet, only looked diagonally at the code and wrote my initial thoughts. If it is so time critical, you can copy the modified middleware code and use it already today, then switch to the Echo-provided version when it gets merged and released.
@davidmdm I am struggling here. How did you test this? I can not see
This is the example I am using:
func main() {
e := echo.New()
e.Use(middleware.Logger())
e.Use(middleware.Recover())
e.Use(middleware.Decompress())
e.POST("/", func(c echo.Context) error {
body, err := ioutil.ReadAll(c.Request().Body)
if err != nil {
return err
}
// sleep so that multiple request goroutines are running at once and exercise the pool
// watch -n 0.1 curl -v -i http://localhost:8080 -H'Content-Encoding: gzip' --data-binary @51kb.csv.gz
time.Sleep(500 * time.Millisecond)
return c.String(http.StatusOK, string(body))
})
log.Fatal(e.Start(":8080"))
}
I'm debugging what happens to the request body, and so far I know that the http.Server will close the request body here but that
Why not just do this?
defer func() {
gr.Close()
pool.Put(i)
b.Close()
}()
c.Request().Body = gr
return next(c)
@aldas I hadn't realised that the Close the http.Server performs is not on the userland-exposed request body but on an underlying w.reqBody. Your proposed simplification works perfectly.
Codecov Report
@@ Coverage Diff @@
## master #2018 +/- ##
==========================================
+ Coverage 91.23% 91.32% +0.09%
==========================================
Files 33 33
Lines 2887 2871 -16
==========================================
- Hits 2634 2622 -12
+ Misses 161 159 -2
+ Partials 92 90 -2
Haven't worked through the changes, but supporting streaming decompression is very welcome.
@lammel Anything holding up this PR?
LGTM
If someone needs to test this by hand.
func main() {
e := echo.New()
e.Use(middleware.Logger())
e.Use(middleware.Recover())
e.Use(middleware.Decompress())
e.POST("/", func(c echo.Context) error {
body, err := io.ReadAll(c.Request().Body)
if err != nil {
return err
}
return c.String(http.StatusOK, string(body))
})
log.Fatal(e.Start(":8080"))
}
echo '{ "mydummy" : "json" }' | gzip > body.gz
curl -v http://localhost:8080 -H'Content-Encoding: gzip' --data-binary @body.gz
@davidmdm Thanks for your contribution!
The current implementation of Decompress decompresses the entire request body into a bytes.Buffer and sets request.Body to that buffer. For very large payloads this creates latency and high memory usage.
For example, should the user not wish to process the request because it is invalid in some way (bad header, invalid query parameters, etc.), the current middleware will still read in the whole body before passing control to the user's handler.
This fix instead wraps the gzip reader in a custom closer that the user reads from directly. Once the user or the native Go http library closes the request, everything is cleaned up via the custom closeFunc.
If the user for any reason chooses not to read from the body, the middleware will not have wasted time reading the request body in the first place, and should the user decide to write the stream of data to some other destination, no content will have been buffered in memory.