Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient check for empty directory in Shrine::Storage::FileSystem#clean #365

Closed
adamniedzielski opened this issue Mar 14, 2019 · 11 comments

Comments

@adamniedzielski
Copy link
Contributor

Brief Description

Thank you for creating this great gem! We’ve been using it in production at @liefery for over half a year now and we’re happy with how extensible it is (especially with moving image processing to the background).

I noticed that a background worker which calls Shrine::Attacher.delete(data) takes a lot of time to execute (up to 2-3 minutes) and can allocate a lot of memory (up to 400-500 MB).

I narrowed down the problem to cleaning empty directories in Shrine::Storage::FileSystem. I can see that it’s possible to disable it completely and that’s what we will do in the first place. However, I was wondering if it’s possible to improve the performance so we can re-enable it later.

Shrine::Storage::FileSystem#clean contains pathname.children.empty (

if pathname.children.empty? && pathname != directory
). That's a very inefficient way to check for an empty directory, because it loads the pathnames of all sub-directories into memory. In directories with hundreds of thousands of sub-directories (our case) it takes a long time and consumes a lot of memory.

For us, the additional problem is that we use GlusterFS (network filesystem) so it's even slower.

Expected behavior

Checking for empty directory doesn't require loading directory content into memory.

Actual behavior

Checking for empty directory loads directory content into memory.

Our configuration

# in initializer

require "shrine"
require "shrine/storage/file_system"

Shrine.storages = {
  cache: Shrine::Storage::FileSystem.new("public", prefix: "system/cache"),
  store: Shrine::Storage::FileSystem.new("public", prefix: "system")
}

# [...]
Shrine.plugin :backgrounding

Shrine::Attacher.promote { |data| PromoteUploadWorker.perform_async(data) }
Shrine::Attacher.delete { |data| DeleteUploadWorker.perform_async(data) }

# worker

class DeleteUploadWorker
  sidekiq_options queue: :default, retry: true

  def perform(data)
    Shrine::Attacher.delete(data)
  end
end

System configuration

Ruby version: ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin17]

Shrine version: 2.16.0

@adamniedzielski
Copy link
Contributor Author

I have a few ideas how this could be improved:

  1. Ruby 2.4 introduced Dir.empty? method (https://ruby-doc.org/core-2.4.0/Dir.html#method-c-empty-3F). That's really straightforward and readable. I didn't look how it's implemented (but hopefully better 😆). A quick test in our production system confirmed that it's much faster and doesn't allocate additional memory.
  2. As far as I can see, Shrine currently supports Ruby 2.3. That's of course a problem here, because we can't use Dir.empty? in a straightforward way. What are the plans with respect to supporting Ruby 2.3?
  3. I think that it's possible to implement Dir.empty? in Ruby 2.3 using a bit hacky approach with Dir#read (https://ruby-doc.org/core-2.3.0/Dir.html#method-i-read). An empty directory returns [".", "..", nil] on Linux and [nil] on Windows.
  4. If you want to continue supporting Ruby 2.3 it's possible to add a check for Ruby version and use Dir.empty? in Ruby 2.4+ and the "hacky way" in 2.3. Or Dir.empty? in Ruby 2.4+ and the inefficient implementation in Ruby 2.3.

What do you think?

@janko
Copy link
Member

janko commented Mar 14, 2019

Thank you for such a detailed report!

TIL about Dir.empty?, thanks for pointing it out and trying it out, I agree we should definitely use it. At the moment I'm a bit swamped with other things, so I'd gladly receive a pull request for this.

Regarding Ruby 2.3, it seems that it will be EOL'd at the end of this month already, so we can drop support for it in Shrine.

@RKushnir
Copy link

RKushnir commented Mar 14, 2019

I think, something like this could work:

def empty_dir?(path)
  Dir.foreach(path) { |x| return false unless ['.', '..'].include?(x) }
  true
end

adamniedzielski added a commit to adamniedzielski/shrine that referenced this issue Mar 18, 2019
Fixes shrinerb#365

Dir.empty? does not load all the child directories into memory.
This improves the performance for directories with a lot of
children.
@texpert
Copy link
Contributor

texpert commented Jul 19, 2019

@janko , sorry for intervention into this closed issue, but there are still a lot of applications, using Ruby 2.3.x. In fact, AWS Elastic Beanstalk is still supporting all the versions starting from Ruby 1.9.

It's a pity and major inconvenience to not be able to use new Shrine versions since 2.16 with Ruby 2.3, given the impressive list of Shrine's new features, enhancements and fixes.

So, is it possible a implementation of Dir.empty? workaround for older Rubies, described by @adamniedzielski with Dir#read method?

@jrochkind
Copy link
Contributor

Curious what's keeping you from upgrading to ruby 2.4, when 2.3 is out of support? 2.3=>2.4 is normally a pretty painless upgrade.

@texpert
Copy link
Contributor

texpert commented Jul 19, 2019

@jrochkind , at least Rails 3.2.x isn't working wit Ruby 2.4.x mostly because of the old version of json gem. Moreover, extensive use of FixNum is adding a lot of incompatibilities, because of the unifying of FixNum and BigInt into Integer in 2.4.

@jrochkind
Copy link
Contributor

jrochkind commented Jul 20, 2019

Ah, Rails 3.2, that would explain it. Rails 3.2 is also end-of-lifed, receiving no support, not even security updates, from Rails maintainers.

I understand that some are stuck on it anyway.

I also understand when janko decides there's a very real cost to supporting this ancient unsupported versions, and we can't afford to.

I wonder if you wanted to prepare a PR back-porting the fix into what could be like a, I dunno, shrine 2.16.1, if janko would be amenable to doing such a release. It's still confusing on the maintainer's end to keep track of such things. Running very old unsupported software is indeed challenging.

@janko
Copy link
Member

janko commented Jul 20, 2019

When I dropped support for Ruby 2.3, I thought that was a completely fine support strategy, considering that Ruby 2.3 was EOL and that high backwards compatibility is maintained in 2.x versions. I haven't had any report (until now) of someone being stuck on Ruby 2.3 without the ability to upgrade. I would like for Shrine to still support Rails 3.2, so I'm open for adding back support for Ruby 2.3.

However, I would like to first understand better why people on Rails 3.2 cannot upgrade to Ruby 2.4. @texpert Do you have some GH issues you could link me to? I would like to know more about the problem with older json gem versions, that sounds like a real blocker.

On the other hand, Fixnum & Bignum classes being deprecated should not be a blocker, because they were merely deprecated, any code using them will still continue to work, right? I'm aware that it would be annoying to see those deprecation warnings, but you can silence them with the ruby-warning gem by Jeremy Evans.

require "warning"
Warning.ignore([:fixnum, :bignum])

@janko
Copy link
Member

janko commented Jul 20, 2019

The only commit I was able to find was this one, so it appears there was zero cost in maintaining Ruby 2.3 support before this fix. And @RKushnir's 2.3-compatible approach looks straightforward.

I will add Ruby 2.3 support back to Shrine. The latest Shrine version is 2.19.0, but the master is now for 3.0.0, so I will add Ruby 2.3 compatibility to master and backport it to a new 2.19.1 release.

I will also bring back Ruby 2.3 support in image_processing. I think that should be all – down, content_disposition, tus-ruby-server and uppy-s3_multipart gems all still support Ruby 2.3.

janko added a commit that referenced this issue Jul 20, 2019
Even though Ruby 2.3 is EOL, people still on Rails 3.2.x are stuck with
Ruby 2.3, because Ruby 2.4 has some backwards incompatible changes with
the JSON gem, and possibly other problems as well.

#365 (comment)

Since it's really not a problem for us to maintain Ruby 2.3
compatibility (at least for now), we add back support for Ruby 2.3. This
will be backported into Shrine 2.19.1 as well.
@janko
Copy link
Member

janko commented Jul 20, 2019

Done ✅

  • released Shrine 2.19.1 with support for Ruby 2.3
  • Shrine 3.x will continue to support Ruby 2.3
  • added back Ruby 2.3 support to ImageProcessing gem and released 1.9.1

@texpert
Copy link
Contributor

texpert commented Jul 22, 2019

Thank you very much, @janko for re-enabling Ruby 2.3 support!!

As for GH issues, this is the issue of json gem - ruby/json#308. They've fixed compatibility with Ruby 2.4.x in json version 2.0.0, but the Rails team refused to backport changes to unsupported Rails 3.2.x - see Rails GH issue - rails/rails#27450.

d-yuicuonlab pushed a commit to palettecloud/shrine that referenced this issue Jan 24, 2020
Even though Ruby 2.3 is EOL, people still on Rails 3.2.x are stuck with
Ruby 2.3, because Ruby 2.4 has some backwards incompatible changes with
the JSON gem, and possibly other problems as well.

shrinerb#365 (comment)

Since it's really not a problem for us to maintain Ruby 2.3
compatibility (at least for now), we add back support for Ruby 2.3. This
will be backported into Shrine 2.19.1 as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants