Memory issue when linking to a large file #1439

Closed
darkpixel opened this issue Apr 28, 2022 · 7 comments

darkpixel commented Apr 28, 2022

  • Shlink Version: 3.1.0
  • PHP Version: 8.1.5
  • How do you serve Shlink: Kubernetes cluster using shlink:stable docker image
  • Database engine used: Postgres

Summary

When trying to create a link to a large file, Shlink says:
An error occurred while creating the URL :(

And the following error appears in the log:

```
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 365891616 bytes) in /etc/shlink/vendor/guzzlehttp/psr7/src/Stream.php on line 99
[2022-04-28 15:17:43 *63.0]	ERROR	php_swoole_server_rshutdown() (ERRNO 503): Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 365891616 bytes) in /etc/shlink/vendor/guzzlehttp/psr7/src/Stream.php on line 99
```

The file in question is ~349 MB.

I have tried it with "Validate URL" checked and unchecked and it doesn't change anything.

Linking to smaller files (~100 MB) and other random URLs works without a problem.

The fix appears to be increasing the memory_limit in php.ini, but modifying the official docker image is a bit of a pain: I basically have to create my own Dockerfile to generate an image based on shlink:stable, upload it somewhere, and then pull it down into the cluster.

I see three potential solutions:

  1. Make docker-entrypoint.sh set the PHP memory_limit from an environment variable (maybe using sed to update the value in php.ini) so users can configure it themselves
  2. Increase the PHP memory_limit by statically assigning a larger value in php.ini
  3. If Shlink is downloading the URL to do some sort of processing, stop doing that (maybe via a checkbox similar to "Validate URL")
darkpixel added the bug label Apr 28, 2022

acelaya commented Apr 28, 2022

Shlink should not try to download the whole body, especially if you have disabled URL validation.

A possible mitigation for now is to disable both validation and title resolution. If either of those is enabled, Shlink will probably try to download the URL.

One easy way to disable title resolution is to provide a title explicitly.

I will look into ways to mitigate this.

I like the idea of allowing a custom memory limit to be provided via env vars, but not as a solution for this; rather as an independent nice-to-have.

acelaya added this to the 3.1.1 milestone Apr 28, 2022
darkpixel (Author) commented

I worked around it by tweaking the entrypoint to use sed to increase the memory limit before running the normal /etc/shlink/docker-entrypoint.sh:

```yaml
containers:
- name: shlink
  image: shlinkio/shlink:stable
  # Bump PHP's memory_limit with sed, then hand off to the stock entrypoint
  command: ["/bin/sh", "-c"]
  args: ["sed -i \"s/memory_limit=256M/memory_limit=512M/\" /usr/local/etc/php/conf.d/php.ini ; chmod +x /etc/shlink/docker-entrypoint.sh ; /etc/shlink/docker-entrypoint.sh"]
```

Maybe Shlink could do an initial HEAD request for the URL and inspect the Content-Type header to see whether it's HTML or application/octet-stream... although I'm not sure that would work, as some sites probably block HEAD requests.
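
For illustration, a rough sketch of that idea using Guzzle (the client from the stack trace above); this is hypothetical code with a placeholder URL, not anything Shlink actually does:

```php
<?php

use GuzzleHttp\Client;

// Sketch only: probe the URL with a HEAD request and inspect the declared
// content type before deciding whether to download anything at all.
$client = new Client(['timeout' => 10]);

$response = $client->head('https://example.com/big-file.bin');

if (str_contains($response->getHeaderLine('Content-Type'), 'text/html')) {
    // Looks like an HTML page, so a follow-up GET to resolve the <title>
    // should stay small
    $html = (string) $client->get('https://example.com/big-file.bin')->getBody();
}
```

(A server that rejects HEAD, say with a 405, would need a GET fallback, which is the concern raised above.)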


acelaya commented May 1, 2022

> although I'm not sure if that would work as some sites probably block HEAD requests

Could you elaborate on this? Why do you say some sites block HEAD requests?


acelaya commented May 1, 2022

This is what I have for now: https://github.com/shlinkio/shlink/pull/1440/files

Not completely relevant for this bug, but for context to understand the pull request: Shlink tries to internally optimize the number of requests made to the long URL, so if you have both URL validation and title resolution enabled, it tries to do both with a single request.

The pull request above changes the logic a bit so that Shlink makes a HEAD request instead of a GET when title resolution is not enabled or the title has been provided.

Now, the problem is when the title has to be resolved (whether validation is enabled or not; that should not affect the approach). The options I have come up with so far are these:

  • Make a HEAD request first and, based on the content type, optionally follow up with a GET request to resolve the title.
    The problem is that this would add some extra latency to short URL creation, for an option (auto-resolve titles) that many people have probably enabled. Then again, it is disabled by default, so maybe it's not that critical.
  • Try to stream the request and stop reading once the headers have been read, and potentially the `<title />` tag as well, if needed.
    This would be my preferred option, but I'm not sure whether it's even possible, whether the HTTP client used by Shlink supports it, or how much it depends on how the remote server is set up. It needs further investigation.
  • Dump the downloaded body to the filesystem instead of keeping it in memory (see the sketch after this list).
    This would mitigate the memory allocation issue, but large files would still take a long time to validate.
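
As a hedged sketch of that last option: Guzzle can already spool a response body to disk via its sink request option (placeholder URL and file handling, not proposed Shlink code):

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

// Sketch only: 'sink' makes Guzzle write the response body straight to a
// file, so memory usage stays flat no matter how large the download is.
$client = new Client();
$tmpFile = tempnam(sys_get_temp_dir(), 'shlink_');

$client->get('https://example.com/big-file.bin', [
    RequestOptions::SINK => $tmpFile,
]);

// ... validate whatever needs validating, then clean up ...
unlink($tmpFile);
```

Memory stays bounded, but the time cost of a ~349 MB download is unchanged, which is why the streaming option above looks more promising.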

Related to this issue, I have opened a poll to better understand how users configure their Shlink instances: #1442


acelaya commented May 1, 2022

OK, the option I wanted to implement is possible. The HTTP client supports it, and it does not depend on the server.

With the streaming option enabled, I can download just the headers, and if the content type is not HTML, I don't even try to continue downloading. And even if it is HTML, I can download chunks of the body until the body tag is present, and then stop.

That should solve the problem.
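
For illustration, this is roughly what the streaming approach can look like with Guzzle's stream request option (a sketch under my own assumptions, not the code that was actually merged; see the PR linked above for that):

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

// Sketch only: with 'stream' => true, Guzzle does not buffer the body up
// front. Headers are available immediately, and the body can be pulled in
// small chunks. The URL is a placeholder.
$client = new Client();
$response = $client->get('https://example.com/some-page', [
    RequestOptions::STREAM => true,
]);

if (str_contains($response->getHeaderLine('Content-Type'), 'text/html')) {
    $body = $response->getBody();
    $buffer = '';
    $title = null;

    while (!$body->eof()) {
        $buffer .= $body->read(1024); // read the body 1 KB at a time

        if (preg_match('~<title[^>]*>(.*?)</title>~is', $buffer, $m)) {
            $title = html_entity_decode(trim($m[1]));
            break; // title found: stop downloading the rest of the body
        }
        if (stripos($buffer, '<body') !== false) {
            break; // reached <body> with no title: give up early
        }
    }
}
// For non-HTML responses, nothing beyond the headers is ever read.
```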


acelaya commented May 1, 2022

This has been merged and will be part of v3.1.1

acelaya closed this as completed May 1, 2022
darkpixel (Author) commented

I think it's probably rare, but I've run into a handful of sites that block HEAD requests. Regardless, I think your approach is probably the best solution.
