Memory issue when linking to a large file #1439

Closed
darkpixel opened this issue Apr 28, 2022 · 7 comments

darkpixel commented Apr 28, 2022

  • Shlink Version: 3.1.0
  • PHP Version: 8.1.5
  • How do you serve Shlink: Kubernetes cluster using shlink:stable docker image
  • Database engine used: Postgres

Summary

When trying to create a link to a large file, Shlink says:
An error occurred while creating the URL :(

And the following error appears in the log:

```
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 365891616 bytes) in /etc/shlink/vendor/guzzlehttp/psr7/src/Stream.php on line 99
[2022-04-28 15:17:43 *63.0]	ERROR	php_swoole_server_rshutdown() (ERRNO 503): Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 365891616 bytes) in /etc/shlink/vendor/guzzlehttp/psr7/src/Stream.php on line 99
```

The file in question is ~349 MB.

I have tried it with "Validate URL" checked and unchecked and it doesn't change anything.

Linking to smaller files (~100 MB) and other random URLs works without a problem.

The fix appears to be increasing the memory_limit in php.ini, but modifying the official docker image is a bit of a pain: I basically have to create my own Dockerfile to generate an image based on shlink:stable, upload it somewhere, and then pull it down into the cluster.

I see three potential solutions:

  1. Make docker-entrypoint.sh set the PHP memory_limit from an environment variable (maybe using sed to update the value in php.ini) so users can configure it themselves
  2. Increase the PHP memory_limit by statically assigning a larger value in php.ini
  3. If Shlink is downloading the URL to do some sort of processing, stop doing that (maybe via a checkbox similar to "Validate URL")
darkpixel added the bug label Apr 28, 2022

acelaya commented Apr 28, 2022

Shlink should not try to download the whole body, especially if you have disabled URL validation.

A possible mitigation for now is to disable both validation and title resolution. If either of those is enabled, Shlink will probably try to download the URL.

One easy way to disable title resolution is to provide a title explicitly.

I will look into ways to mitigate this.

I like the idea of allowing a custom memory limit to be provided via env vars, but not as a solution for this; rather as an independent nice-to-have.

acelaya added this to the 3.1.1 milestone Apr 28, 2022
darkpixel (Author) commented

I worked around it by tweaking the entrypoint to use sed to increase the memory limit before running the normal /etc/shlink/docker-entrypoint.sh:

```yaml
containers:
- name: shlink
  image: shlinkio/shlink:stable
  # Bump PHP's memory_limit with sed, then hand off to the stock entrypoint
  command: ["/bin/sh", "-c"]
  args: ["sed -i \"s/memory_limit=256M/memory_limit=512M/\" /usr/local/etc/php/conf.d/php.ini ; chmod +x /etc/shlink/docker-entrypoint.sh ; /etc/shlink/docker-entrypoint.sh"]
```

Maybe Shlink could do an initial HEAD request for the URL and inspect the Content-Type header to see whether it's HTML or application/octet-stream... although I'm not sure that would work, as some sites probably block HEAD requests.
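
For illustration, a rough sketch of that idea using Guzzle (the client from the stack trace above); this is hypothetical code with a placeholder URL, not anything Shlink actually does:

```php
<?php

use GuzzleHttp\Client;

// Sketch only: probe the URL with a HEAD request and inspect the declared
// content type before deciding whether to download anything at all.
$client = new Client(['timeout' => 10]);

$response = $client->head('https://example.com/big-file.bin');

if (str_contains($response->getHeaderLine('Content-Type'), 'text/html')) {
    // Looks like an HTML page, so a follow-up GET to resolve the <title>
    // should stay small
    $html = (string) $client->get('https://example.com/big-file.bin')->getBody();
}
```

(A server that rejects HEAD, say with a 405, would need a GET fallback, which is the concern raised above.)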


acelaya commented May 1, 2022

> although I'm not sure if that would work as some sites probably block HEAD requests

Could you elaborate on this? Why do you say some sites block HEAD requests?


acelaya commented May 1, 2022

This is what I have for now: https://github.com/shlinkio/shlink/pull/1440/files

Not completely relevant for this bug, but for context to understand the pull request: Shlink tries to internally optimize the number of requests made to the long URL, so if you have both URL validation and title resolution enabled, it tries to do both with a single request.

The pull request above changes the logic a bit so that Shlink makes a HEAD request instead of a GET when title resolution is not enabled or the title has been provided.

Now, the problem is when the title has to be resolved (whether validation is enabled or not; that should not affect the approach). The options I have come up with so far are these:

  • Make a HEAD request first and, based on the content type, optionally follow up with a GET request to resolve the title.
    The problem is that this would add some extra latency to short URL creation, for an option (auto-resolve titles) that many people have probably enabled. Then again, it is disabled by default, so maybe it's not that critical.
  • Try to stream the request and stop reading once the headers have been read, and potentially the `<title />` tag as well, if needed.
    This would be my preferred option, but I'm not sure whether it's even possible, whether the HTTP client used by Shlink supports it, or how much it depends on how the remote server is set up. It needs further investigation.
  • Dump the downloaded body to the filesystem instead of keeping it in memory (see the sketch after this list).
    This would mitigate the memory allocation issue, but large files would still take a long time to validate.
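
As a hedged sketch of that last option: Guzzle can already spool a response body to disk via its sink request option (placeholder URL and file handling, not proposed Shlink code):

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

// Sketch only: 'sink' makes Guzzle write the response body straight to a
// file, so memory usage stays flat no matter how large the download is.
$client = new Client();
$tmpFile = tempnam(sys_get_temp_dir(), 'shlink_');

$client->get('https://example.com/big-file.bin', [
    RequestOptions::SINK => $tmpFile,
]);

// ... validate whatever needs validating, then clean up ...
unlink($tmpFile);
```

Memory stays bounded, but the time cost of a ~349 MB download is unchanged, which is why the streaming option above looks more promising.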

Related to this issue, I have opened a poll to better understand how users configure their Shlink instances: #1442


acelaya commented May 1, 2022

OK, the option I wanted to implement is possible. The HTTP client supports it, and it does not depend on the server.

With the streaming option enabled, I can download just the headers, and if the content type is not HTML, I don't even try to continue downloading. And even if it is HTML, I can download chunks of the body until the body tag is present, and then stop.

That should solve the problem.
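
For illustration, this is roughly what the streaming approach can look like with Guzzle's stream request option (a sketch under my own assumptions, not the code that was actually merged; see the PR linked above for that):

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\RequestOptions;

// Sketch only: with 'stream' => true, Guzzle does not buffer the body up
// front. Headers are available immediately, and the body can be pulled in
// small chunks. The URL is a placeholder.
$client = new Client();
$response = $client->get('https://example.com/some-page', [
    RequestOptions::STREAM => true,
]);

if (str_contains($response->getHeaderLine('Content-Type'), 'text/html')) {
    $body = $response->getBody();
    $buffer = '';
    $title = null;

    while (!$body->eof()) {
        $buffer .= $body->read(1024); // read the body 1 KB at a time

        if (preg_match('~<title[^>]*>(.*?)</title>~is', $buffer, $m)) {
            $title = html_entity_decode(trim($m[1]));
            break; // title found: stop downloading the rest of the body
        }
        if (stripos($buffer, '<body') !== false) {
            break; // reached <body> with no title: give up early
        }
    }
}
// For non-HTML responses, nothing beyond the headers is ever read.
```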


acelaya commented May 1, 2022

This has been merged and will be part of v3.1.1

acelaya closed this as completed May 1, 2022
darkpixel (Author) commented

I think it's probably rare, but I've run into a handful of sites that block HEAD requests. Regardless, I think your approach is probably the best solution.
