-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory leak when writing #67
Comments
Hi @edsu I doubt if you're seeing a leak, it's probably an issue with caching and the GC. I tried this program:
I put 10,000 copies of 100 50MB ruby-vips closes files on GC. Ruby will normally GC when its memory is getting low, but this can take a while to happen, and with libraries like vips which hold sometimes large objects outside the ruby heap, you can exhaust main memory before ruby notices. It's also easy to run out of file descriptors: most systems limit you to 1000 files open at once and this is an easy thing to hit with ruby. If you GC on every file open you kill performance, so ruby-vips compromises and requests a GC every 100 file operations. vips also keeps a cache of the most recent 1,000 operations in case one is reused. That's probably not what's causing your high memory use though. I would guess you are processing fewer, much larger tiffs and the GC is not triggering. Try inserting a |
I have the same problem. Writers allocate memory that is never released. Try the following example with a large jpeg, like your 10000x10000px wtc.jpg and it will have several gigabytes of memory usage in no time, although a single write only allocates ~60 MB: #!/usr/bin/env ruby
require 'rubygems'
require 'vips'
include VIPS
repeat = 1000
def rss
`ps -o rss= -p #{$$}`.chomp.to_i/1024
end
GC.start
puts "RSS at startup with gems loaded: %d MB" % [rss]
# you can give vips argument flags to ruby-vips programs, eg.
# --vips-progress, make sure we don't try to load those
filenames = ARGV.reject { |arg| arg.start_with? "-" }
repeat.times do |i|
filenames.each do |filename|
img = Image.jpeg filename, :sequential => true
img = JPEGWriter.new(img, quality: 50)
img.write('test.jpg')
print "\rIteration: %-8d RSS: %6d MB File: %-32s".freeze % [i+1, rss, filename]
end
end
puts
GC.start
puts "RSS at exit: %d MB" % [rss] Tested with MRI 2.2.4, ruby-vips f680013 and vips 8.2.2 with libjpeg-turbo 1.4.2 on Mac OS X 10.11.4. Other writers like PNG and WEBP have the same issue, I chose JPEG because it's very fast. Occasionally I see drops in RSS eg. from 6 GB down to 3 GB before climbing up again, I think this is the GC.start in the writer class that's triggered every 100 writes. Example Output:
|
If I add a |
I replaced the module VIPS
class Writer
def write_gc(path)
GC.start full_mark: false, immediate_sweep: false
write_internal path
end
end
end
|
This is interesting: If I remove the
|
I didn't know about Every 100 times might be too long as well, this could maybe come down. |
I tried your nice test program with #!/usr/bin/env ruby
require 'vips8'
repeat = 2000
def rss
`ps -o rss= -p #{$$}`.chomp.to_i/1024
end
GC.start
puts "RSS at startup with gems loaded: %d MB" % [rss]
repeat.times do |i|
ARGV.each do |filename|
img = Vips::Image.new_from_file filename, :access => :sequential
img.write_to_file 'test.jpg', :Q => 50
print "\rIteration: %-8d RSS: %6d MB File: %-32s".freeze % [i+1, rss, filename]
end
end
puts
GC.start
puts "RSS at exit: %d MB" % [rss] And then:
|
I tried a few experiments. With the ruby-vips 0.3.9 gem, 500 iterations and a 2k x 1k image I see:
With a lazy GC, it becomes:
With lazy GC plus GC every 10 writes I see:
And with lazy GC plus GC every write I see:
GC every write, even a lazy GC, will probably have a bad effect on performance for larger Ruby programs. Does lazy GC every 10 writes seem like a reasonable compromise? |
The If this were to be added as a temporary workaround it would probably make sense to do some benchmarking with smaller image sizes to see the impacts. In my understanding minor GC (full_mark: false) has very little overhead compared to full GC. |
In my opinion there's nothing wrong in the binding, it's just a consequence of the mark-sweep Ruby GC. libvips can only free memory when objects are unreffed, Ruby will only unref objects on GC, and Ruby only runs a GC when it begins to fill its own heap. This must be a problem for any Ruby gem which manipulates large objects. I wonder how they handle this issue? I suppose one solution would be to have an explicit We could also GC on every write, but throttle to no more than one GC per second or one GC every 10 writes, whichever is sooner. That might give more 'expected' behaviour. |
Looks like |
I think it would make sense to have the choice of either a block based api with automatic cleanup at the end of the block and manual cleanup if this is not possible. This is much like ruby's File and Tempfile classes work. # block based
Image.jpeg filename, sequential: true do |img|
img.write('foo.jpg')
end
# manual cleanup
img = Image.jpeg filename, sequential: true
img.write('bar.jpg')
img.close # or free / destroy / destroy! This would allow to control cleanup when it's important but could still work without changes for existing users, who would get cleanup at next gc. Btw. I'd prefer I think this would be much cleaner than manually messing with the gc. |
I think the block API would only be useful in trivial cases. For example:
Would not free correctly, since the image generated by the
which seems very ugly. |
That would be weird. But maybe we could track all allocations inside the block? |
That's a good idea, maybe we could. libvips has a thing to send a signal to downstream images (images which depend on this image) which it uses for cache invalidation. We could maybe use it to ask all downstream Ruby objects to It would need a small API addition. I'll have a look. |
Great, let me know if you have something to try out… |
I've uploaded 0.3.12 including this slight change to the GC. Thanks! |
These params are only supported since the generational gc was introduced in ruby 2.1. So if the gem should support ruby 1.9 and 2.0, there need to be a version check. |
This should work as a check on at least MRI 1.9+ and jruby: RUBY_ENGINE == "ruby" && RUBY_VERSION.to_f >= 2.1 This could be assigned to a class variable or constant for quick lookup. |
Oh dear, I was too hasty pushing a new version of the gem. I thought from the docs that unknown GC options would be ignored on older Rubys. I've added something to not use the incremental options on older GCs, as per your suggestion. 89988dc#diff-18805b26f50826ce81fb2ba8430f9ee8R17 Travis is complaining now, I'm not certain why. I'll check tomorrow. |
Looks like it doesn't find the libvips-dev package, maybe the ubuntu universe source is not active or |
OK, travis is now passing with 1.9.3 / 2.0 / 2.1. I've updated the gem again, 0.3.13 now. |
The test programs above should now work without too much memory being used. Please reopen if they don't. |
@jcupitt Can you please re-open the issue? I had some time to test 0.3.13 this evening and the new strategy is not very effective at reducing memory usage. I have done some benchmarking on ruby 2.2.4 with different GC strategies and image sizes and have come to the following conclusion:
In the benchmark results So my recommendation would be to do something like this: @@generational_gc = RUBY_ENGINE == "ruby" && RUBY_VERSION.to_f >= 2.1
@@gc_interval = 100
@@gc_countdown = @@gc_interval
def write_gc(path)
if @@generational_gc
GC.start full_mark: false
else
@@gc_countdown -= 1
if @@gc_countdown < 0
@@gc_countdown = @@gc_interval
GC.start
end
end
write_internal path
end I have also tested ruby 2.1.8 and 2.3.0 which behave very similar, although 2.1 has higher and 2.3 lower memory usage than 2.2. |
on ruby2.1 and later, do an incremental GC on every write on older ruby, full GC every 100 writes see #67
Hi, I've made your suggested change, thank you very much for investigating this again. This time I'll wait a few days before updating the gem to let things settle. I wonder about the incremental GC. Presumably it just sweeps the young object pool, in which case you could have large images which the GC moved to the old object pool which would never get collected, or not until something else forced a full GC. I wonder if ruby2.1 should do a full GC every 100 writes as well? On the other hand it makes me very uncomfortable to be putting too much GC tuning into this gem. It seems very fragile: a small change to the GC in Ruby could throw all of these settings off. Perhaps your proposal is the best compromise. |
I did a quick test on master and it's now working fine for me. |
You could try to use ruby-vips8, it has much better memory management. I've successfully used it to do batch processing of thousands of images. |
Interesting, I didn't know this existed. Is the API backwards compatible? |
Not really, but it uses dynamic bindings using gobject-instrospection, so you always have access to all features in libvips, without adding any code to the gem. The nice thing is that you can get documentation for all operations through the |
@ioquatix, could you post a sample program that shows the bad memory behaviour? I've tried a few simple loops over images here and it seems to mostly behave itself. A #close method would be useful, but would be a lot of work. We'd need a proxy object between ruby-vips and libvips which noted the image state, so all code that looked at the image class would need reworking.
http://www.rubydoc.info/gems/ruby-vips8/0.1.0 http://www.rubydoc.info/gems/ruby-vips8/0.1.0/Vips It's quite a bit nicer to use than ruby-vips. It has a full set of operator overloads, a much better load/save system, an operation cache, better constant handling and better docs. Plus the entire thing is only 1,500 lines of Ruby, whereas ruby-vips was 10,000 lines of C. |
Yeah, I'll probably move to ruby-vips8 at some point soon. |
There was a blog post as well: http://libvips.blogspot.co.uk/2016/01/ruby-vips-is-dead-long-live-ruby-vips8.html It runs over some of the changes. |
I've made v1.0 of Here's a test program: #!/usr/bin/ruby
require 'vips'
ARGV.each do |filename|
next if File.extname(filename) != '.jpg'
puts "processing #{filename} ..."
im = Vips::Image.new_from_file filename, :access => :sequential
im.write_to_file 'x_' + filename
end If I make a directory with 10,000 jpg files, run it, and watch in The README has some notes on moving code to the new API, it should be very easy. |
This is a good solution. |
@jcupitt I'm still seeing some amount of memory bloat with ruby-vips 1.0.4. I have a script that goes through ~3000 assets (tiff, jpeg, svg, pdf) and converts them to jpeg. If I run the script without manual gc the memory usage steadily grows up to some hundred MB, if I do a full I am running with |
I've done some more runs and monitored memory usage. This might just be how the ruby's GC works. Without manually forcing GC, the memory usage always grows to somewhere around 500-600 MB and then GC kicks in and memory usage drops back to around 160 MB. So it is freeing memory just very lazily. Tested on ruby 2.3.3, I'll try again on 2.4.0 although I don't think it changed much in terms of GC. |
Hi @felixbuenemann, nice to hear from you again. Yes, I think this is just how Ruby's generational GC works. ruby-vips is doing this just after every write, for reference: https://github.com/jcupitt/ruby-vips/blob/master/lib/vips/image.rb#L359 So for ruby2.1 and later, it queues a sweep of recent objects after every write. I think the idea is to leave a full sweep for Ruby itself or the application to do, since that can be very expensive. |
on ruby2.1 and later, do an incremental GC on every write on older ruby, full GC every 100 writes see libvips/ruby-vips#67
Coming full circle, back to this issue, I'm still seeing some pretty crazy memory usage. I'm using my fork of There is a memsize operation (e.g. I've been playing around with the great memory leak sample given above: https://github.com/ioquatix/vips-thumbnail/blob/master/examples/memory/leak.rb In the worst case, it basically keeps on chewing through data, which is never returned, even after the final GC:
I would have expected calling
The only thing that works is GC after each iteration:
Thoughts? |
Okay, I was playing around with it a bit more. It turns out if you inform the GC of the buffer size, it will do a better job.
What we need is a way to tell the GC the correct buffer size/memory usage. |
@ioquatix How did you arrive at the 30 MiB buffer size and shouldn't the value in release be negative? Btw. a shortcut for specifying KiB, MiB etc.: |
@felixbuenemann I just took a random stab based on the 4MiB JPEG + 32MiB frame buffer (decoded). Yeah, it should be negative. I'll fix it. But it was just a typo, it still works in the actual code. |
Hey all- Thanks you for your hard works on the ruby-vips gem. I'm also seeing a strange case when I load pdfs.. I'm converting each page to its own image. After browsing through this thread I removed the write to see if that's where the memory was globbing but it's actually in the read. It looks like every call to pdfload is storing the whole pdf in memory. Not sure if this is a GC issue or this is the right thread to post it.
In this case, I don't actually see a GC happen. The file is 298 pages and 5.5mb
|
That is definitely some kind of memory explosion. |
Could it be the libvips cache? Try: Vips::cache_set_max 0 It could also be your libvips not being configured with pdfload and falling back to converting via imagemagick. At the bash prompt, try:
To check that you have a pdf loader. I guess it could also be a bug in poppler --- we'd need a test image to verify that. |
We're doing the same thing as @jalevin, and if I don't manually call |
I had a go at improving the behaviour of the PDF loader with file handles, if anyone has time to test. It has some other useful improvements as well around default cache sizing. |
That improvement to the PDF loader has been merged to git master libvips. I made a small benchmark: #!/usr/bin/ruby
require "vips"
im = Vips::Image.pdfload ARGV[0]
n_pages = im.get "n-pages"
n_iterations = ARGV[1].to_i
n_iterations.times do |i|
p = Random.rand n_pages
puts "iteration #{i} ..."
im = Vips::Image.pdfload ARGV[0], page: p, n: 1, access: :sequential
im.write_to_file "page.png"
end If I run like this:
And watch in I realize this is a lot of memory, but at least it's stable. File handle use stays low, at least. Perhaps something can be done to make it stabilize at a lower level, but that should probably be in a new issue. |
I should have said, that high memory use is just for the PDF loader. If I try: #!/usr/bin/ruby
require "vips"
n_iterations = ARGV[1].to_i
n_iterations.times do |i|
im = Vips::Image.new_from_file ARGV[0], access: :sequential
im.write_to_file "page.jpg"
end ie. repeatedly convert a single file to JPG, I see:
HEIC and PNG are hilariously slow. PDF looks good here, but that's converting page 0, the title page, which just has a couple of lines of text on. A complex page with a lot of graphics would look very different. |
I apologize if this is a very basic question. I'm using ruby-vips in what I think is a very simplistic way to process a few thousand TIFF files and convert them to PNG. It does convert files to PNG but fairly rapidly gobbles up all available memory:
Is there something I'm doing wrong here? I noticed that if I comment out the write it is able to successfully read in all the images, so it appears to be caused by writing the PNG file?
The text was updated successfully, but these errors were encountered: