memory leak when writing #67

Closed
edsu opened this issue Jun 25, 2015 · 51 comments

@edsu

edsu commented Jun 25, 2015

I apologize if this is a very basic question. I'm using ruby-vips in what I think is a very simplistic way to process a few thousand TIFF files and convert them to PNG. It does convert files to PNG but fairly rapidly gobbles up all available memory:

require 'vips'

Dir.foreach('image_dir') do |file|
  next if File.extname(file) != '.tif'
  # Dir.foreach yields bare names, so build the full path
  tiff = File.join('image_dir', file)
  png = tiff.sub(/\.tif\z/, '.png')
  img = VIPS::Image.new tiff
  img.write png
end

Is there something I'm doing wrong here? I noticed that if I comment out the write it is able to successfully read in all the images, so it appears to be caused by writing the PNG file?

@jcupitt
Member

jcupitt commented Jun 26, 2015

Hi @edsu I doubt if you're seeing a leak, it's probably an issue with caching and the GC. I tried this program:

#!/usr/bin/ruby

require 'rubygems'
require 'vips'

file_n = 0
Dir.glob('image_dir/*.tif') do |filename|
  file_n += 1
  if file_n % 100 == 1
      puts "processing file #{file_n} ..."
  end

  png = filename.sub '.tif', '.png'
  img = VIPS::Image.new filename
  img.write png
end

I put 10,000 copies of cramps.tif into image_dir/, ran the program and watched MEM in top. I saw:

100 50MB
300 80MB
600 150MB
1000 182MB
1600 230MB
2100 250MB
2400 280MB
2700 260MB
4200 260MB
8400 230MB

ruby-vips closes files on GC. Ruby will normally GC when its memory is getting low, but this can take a while to happen, and with libraries like vips which hold sometimes large objects outside the ruby heap, you can exhaust main memory before ruby notices. It's also easy to run out of file descriptors: most systems limit you to 1000 files open at once and this is an easy thing to hit with ruby.

If you GC on every file open you kill performance, so ruby-vips compromises and requests a GC every 100 file operations.

vips also keeps a cache of the most recent 1,000 operations in case one is reused. That's probably not what's causing your high memory use though.

I would guess you are processing fewer, much larger tiffs and the GC is not triggering. Try inserting a GC.start into your loop and see if that fixes it.
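As an illustration of the "GC every N operations" compromise described above, here is a minimal sketch in plain Ruby. The `PeriodicGC` class and its `tick` method are hypothetical names for this example and are not part of ruby-vips; the loop stands in for a batch of image conversions.

```ruby
# Hypothetical helper, not part of ruby-vips: request a full GC
# once every `interval` operations.
class PeriodicGC
  attr_reader :runs

  def initialize(interval = 100)
    @interval = interval
    @count = 0
    @runs = 0
  end

  # Call once per file processed. Returns true when a GC was requested.
  def tick
    @count += 1
    return false unless (@count % @interval).zero?

    GC.start
    @runs += 1
    true
  end
end

periodic_gc = PeriodicGC.new(100)
250.times { periodic_gc.tick } # stands in for 250 image conversions
puts periodic_gc.runs
```

A smaller interval trades some throughput for a lower memory peak, which is the trade-off to tune when the input files are few but large.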

@felixbuenemann
Contributor

I have the same problem. Writers allocate memory that is never released. Try the following example with a large jpeg, like your 10000x10000px wtc.jpg and it will have several gigabytes of memory usage in no time, although a single write only allocates ~60 MB:

#!/usr/bin/env ruby

require 'rubygems'
require 'vips'

include VIPS

repeat = 1000

def rss
  `ps -o rss= -p #{$$}`.chomp.to_i/1024
end

GC.start
puts "RSS at startup with gems loaded: %d MB" % [rss]

# you can give vips argument flags to ruby-vips programs, eg.
# --vips-progress, make sure we don't try to load those
filenames = ARGV.reject { |arg| arg.start_with? "-" }

repeat.times do |i|
  filenames.each do |filename|
    img = Image.jpeg filename, :sequential => true

    img = JPEGWriter.new(img, quality: 50)
    img.write('test.jpg')

    print "\rIteration: %-8d RSS: %6d MB File: %-32s".freeze % [i+1, rss, filename]
  end
end
puts

GC.start
puts "RSS at exit: %d MB" % [rss]

Tested with MRI 2.2.4, ruby-vips f680013 and vips 8.2.2 with libjpeg-turbo 1.4.2 on Mac OS X 10.11.4.

Other writers like PNG and WEBP have the same issue, I chose JPEG because it's very fast.

Occasionally I see drops in RSS eg. from 6 GB down to 3 GB before climbing up again, I think this is the GC.start in the writer class that's triggered every 100 writes.

Example Output:

RSS at startup with gems loaded: 32 MB
Iteration: 1000     RSS:   6232 MB File: earth10k.jpg

@felixbuenemann
Contributor

If I add a GC.start after each write, memory usage stays between 220 and 255 MB RSS.

@felixbuenemann
Contributor

I replaced the VIPS::Writer#write_gc method with a different implementation that does a minor, lazy GC, and it keeps RSS steady at ~189 MB:

module VIPS
  class Writer
    def write_gc(path)
      GC.start full_mark: false, immediate_sweep: false
      write_internal path
    end
  end
end

RSS at startup with gems loaded: 32 MB
Iteration: 1000     RSS:    190 MB File: earth10k.jpg

@felixbuenemann
Contributor

This is interesting: If I remove the GC.start completely from write_gc, ruby seems to never deallocate memory from writers:

RSS at startup with gems loaded: 32 MB
Iteration: 1000     RSS:   9329 MB File: earth10k.jpg
# Ruby spends a huge time here running GC.start at exit
RSS at exit: 1496 MB

@jcupitt
Member

jcupitt commented Jan 17, 2016

I didn't know about full_mark: false, immediate_sweep: false. Do you think it would be a good idea to use these options in write_gc?

Every 100 times might be too long as well, this could maybe come down.

@jcupitt
Member

jcupitt commented Jan 17, 2016

I tried your nice test program with ruby-vips8 and it runs in a steady amount of memory, though only because it's forced to GC on every call into libvips because of a reference counting issue in ruby-gnome2 :-(

#!/usr/bin/env ruby

require 'vips8'

repeat = 2000

def rss
  `ps -o rss= -p #{$$}`.chomp.to_i/1024
end

GC.start
puts "RSS at startup with gems loaded: %d MB" % [rss]

repeat.times do |i|
  ARGV.each do |filename|
    img = Vips::Image.new_from_file filename, :access => :sequential
    img.write_to_file 'test.jpg', :Q => 50

    print "\rIteration: %-8d RSS: %6d MB File: %-32s".freeze % [i+1, rss, filename]
  end
end
puts

GC.start
puts "RSS at exit: %d MB" % [rss]

And then:

$ ./write8.rb k2.jpg 
RSS at startup with gems loaded: 15 MB
Vips::call running with a buggy gobject-introspection
  as a workaround, this gem will GC on every call to libvips
  fixing this bug will speed things up a lot
  http://sourceforge.net/p/ruby-gnome2/mailman/message/34555949/
Iteration: 47       RSS:    145 MB File: k2.jpg                          
Iteration: 71       RSS:    144 MB File: k2.jpg                          
Iteration: 97       RSS:    144 MB File: k2.jpg                          
Iteration: 126      RSS:    145 MB File: k2.jpg                          
Iteration: 157      RSS:    144 MB File: k2.jpg                          
Iteration: 198      RSS:    144 MB File: k2.jpg                          
Iteration: 232      RSS:    144 MB File: k2.jpg      
etc. etc.

@jcupitt
Member

jcupitt commented Jan 17, 2016

I tried a few experiments. With the ruby-vips 0.3.9 gem, 500 iterations and a 2k x 1k image I see:

$ time ./write.rb k2.jpg 
RSS at startup with gems loaded: 29 MB
Iteration: 500      RSS:   1228 MB File: k2.jpg                          
real    0m33.947s
user    0m26.832s
sys     0m18.784s

With a lazy GC, it becomes:

Iteration: 500      RSS:   1661 MB File: k2.jpg                          
real    0m38.993s

With lazy GC plus GC every 10 writes I see:

Iteration: 500      RSS:    284 MB File: k2.jpg                          
real    0m24.337s

And with lazy GC plus GC every write I see:

Iteration: 500      RSS:     84 MB File: k2.jpg                          
real    0m20.761s

GC every write, even a lazy GC, will probably have a bad effect on performance for larger Ruby programs. Does lazy GC every 10 writes seem like a reasonable compromise?

@felixbuenemann
Contributor

The GC.start in write_gc is supposed to protect against running out of file descriptors, not clean up memory allocations, so it's really a hack to use it for that purpose. I'm not familiar with the Ruby C extension API, but my guess would be that something is wrong with how the C bindings interact with the Ruby VM's GC, so it's not freeing memory automatically.

If this were to be added as a temporary workaround it would probably make sense to do some benchmarking with smaller image sizes to see the impacts. In my understanding minor GC (full_mark: false) has very little overhead compared to full GC.

@jcupitt
Member

jcupitt commented Jan 17, 2016

In my opinion there's nothing wrong in the binding, it's just a consequence of the mark-sweep Ruby GC. libvips can only free memory when objects are unreffed, Ruby will only unref objects on GC, and Ruby only runs a GC when it begins to fill its own heap.

This must be a problem for any Ruby gem which manipulates large objects. I wonder how they handle this issue?

I suppose one solution would be to have an explicit .free method which you can call if you wish to release memory associated with an image. It would unref the underlying vips object and set the pointer nil, leaving the Ruby image object in a 'zombie' state, then any subsequent operations on that image would throw an exception. This would be fiddly and error-prone to use though.

We could also GC on every write, but throttle to no more than one GC per second or one GC every 10 writes, whichever is sooner. That might give more 'expected' behaviour.
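A minimal sketch of one way to read that throttling idea: trigger a GC after a write when at least one second has passed since the last GC, or when 10 writes have accumulated, whichever comes first. This interpretation, the `ThrottledGC` class, and its parameter names are assumptions made for illustration; none of it is ruby-vips code. The clock is injectable so the policy can be exercised without sleeping.

```ruby
# Hypothetical sketch, not ruby-vips code: GC after a write when either
# `min_interval` seconds have elapsed since the last GC or `max_writes`
# writes have accumulated, whichever happens first.
class ThrottledGC
  attr_reader :gc_runs

  def initialize(min_interval: 1.0, max_writes: 10, clock: nil)
    @min_interval = min_interval
    @max_writes = max_writes
    # Injectable clock so the policy can be tested without sleeping.
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @last_gc = @clock.call
    @writes = 0
    @gc_runs = 0
  end

  # Call after each write. Returns true if a GC was triggered.
  def after_write
    @writes += 1
    now = @clock.call
    return false if @writes < @max_writes && (now - @last_gc) < @min_interval

    GC.start
    @gc_runs += 1
    @writes = 0
    @last_gc = now
    true
  end
end

fake_time = 0.0
throttle = ThrottledGC.new(clock: -> { fake_time })
25.times { throttle.after_write } # no time passes: GC on writes 10 and 20
fake_time += 2.0
throttle.after_write              # more than a second elapsed: GC again
puts throttle.gc_runs
```

With a policy like this, a fast loop over small images is bounded by the write count, while a slow loop over large images is bounded by the timer.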

@jcupitt
Member

jcupitt commented Jan 17, 2016

Looks like rmagick has a .destroy! method for this purpose:

https://rmagick.github.io/image1.html#destroy_bang

@felixbuenemann
Contributor

I think it would make sense to offer both a block-based API with automatic cleanup at the end of the block, and manual cleanup for cases where a block is not possible. This is much like how Ruby's File and Tempfile classes work.

# block based
Image.jpeg filename, sequential: true do |img|
  img.write('foo.jpg')
end
# manual cleanup
img = Image.jpeg filename, sequential: true
img.write('bar.jpg')
img.close # or free / destroy / destroy!

This would allow control over cleanup when it's important, but would still work without changes for existing users, who would get cleanup at the next GC.

Btw. I'd prefer close or free over destroy, because destroy could be mistaken to mean deleting the image file, if the image was opened from a file. After freeing, the image should probably throw an exception if any methods are called on it, and have a method to check for this state, similar to rmagick's destroyed?. rmagick also allows inspect, which makes sense, because REPLs tend to call it automatically.

I think this would be much cleaner than manually messing with the gc.
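To make the proposal concrete, here is a minimal sketch of the File-style pattern in plain Ruby. `ManagedImage`, `ClosedError` and every method on the class are hypothetical stand-ins invented for this example; they are not ruby-vips API, and `write` only returns a placeholder string where a real binding would save the image.

```ruby
# Hypothetical stand-in for an image handle. None of these names are
# ruby-vips API; a real binding would unref the native object in #close.
class ManagedImage
  class ClosedError < StandardError; end

  # With a block: yield the image and always close it afterwards.
  # Without a block: return the image and leave cleanup to the caller.
  def self.open(filename)
    image = new(filename)
    return image unless block_given?

    begin
      yield image
    ensure
      image.close
    end
  end

  def initialize(filename)
    @filename = filename
    @closed = false
  end

  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end

  def write(path)
    raise ClosedError, "image #{@filename} has been closed" if closed?

    "wrote #{path}" # placeholder for the real save
  end

  # Stays usable after close, since REPLs call inspect automatically.
  def inspect
    "#<ManagedImage #{@filename}#{' (closed)' if closed?}>"
  end
end

kept = nil
ManagedImage.open("in.jpg") do |img|
  img.write("out.jpg")
  kept = img
end
puts kept.closed?
```

The `ensure` clause means the handle is released even if the block raises, which is the main advantage over calling close by hand.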

@jcupitt
Member

jcupitt commented Jan 17, 2016

I think the block API would only be useful in trivial cases. For example:

Vips::Image.new_from_file filename do |image|
    image.invert.write_to_file output_filename
end

Would not free correctly, since the image generated by the .invert would not have .destroy! called on it. I suppose you'd have to write:

Vips::Image.new_from_file filename do |image|
    image.invert do |image2|
        image2.write_to_file output_filename
    end
end

which seems very ugly.

@felixbuenemann
Contributor

That would be weird. But maybe we could track all allocations inside the block?

@jcupitt
Member

jcupitt commented Jan 17, 2016

That's a good idea, maybe we could.

libvips has a thing to send a signal to downstream images (images which depend on this image) which it uses for cache invalidation. We could maybe use it to ask all downstream Ruby objects to .destroy! themselves.

It would need a small API addition. I'll have a look.

@felixbuenemann
Contributor

Great, let me know if you have something to try out…

jcupitt added a commit that referenced this issue Jan 17, 2016
@jcupitt
Member

jcupitt commented Jan 17, 2016

I've uploaded 0.3.12 including this slight change to the GC. Thanks!

@felixbuenemann
Contributor

These params have only been supported since the generational GC was introduced in Ruby 2.1. So if the gem should support Ruby 1.9 and 2.0, there needs to be a version check.

@felixbuenemann
Contributor

This should work as a check on at least MRI 1.9+ and jruby:

RUBY_ENGINE == "ruby" && RUBY_VERSION.to_f >= 2.1

This could be assigned to a class variable or constant for quick lookup.
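The float comparison above gives the right answer for the 2.1 threshold. As an alternative sketch, the same check can compare version segments with `Gem::Version`, which avoids a general pitfall of float parsing: a hypothetical "2.10.0" parses as 2.1 and would fail a check such as `>= 2.2`. The `generational_gc?` helper and the `GENERATIONAL_GC` constant are names made up for this example.

```ruby
require 'rubygems'

# Sketch: detect MRI 2.1+ (generational GC) by comparing version
# segments instead of parsing RUBY_VERSION as a float.
# The helper and constant names are hypothetical.
def generational_gc?(engine = RUBY_ENGINE, version = RUBY_VERSION)
  engine == "ruby" && Gem::Version.new(version) >= Gem::Version.new("2.1")
end

# Evaluate once at load time for quick lookup.
GENERATIONAL_GC = generational_gc?

puts GENERATIONAL_GC
puts generational_gc?("ruby", "2.0.0")
puts generational_gc?("jruby", "9.0.0")
```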

@jcupitt
Member

jcupitt commented Jan 17, 2016

Oh dear, I was too hasty pushing a new version of the gem. I thought from the docs that unknown GC options would be ignored on older Rubies.

I've added something to not use the incremental options on older GCs, as per your suggestion.

89988dc#diff-18805b26f50826ce81fb2ba8430f9ee8R17

Travis is complaining now, I'm not certain why. I'll check tomorrow.

@felixbuenemann
Contributor

Looks like it doesn't find the libvips-dev package, maybe the ubuntu universe source is not active or apt-get update is needed first.

@jcupitt
Member

jcupitt commented Jan 18, 2016

OK, travis is now passing with 1.9.3 / 2.0 / 2.1.

I've updated the gem again, 0.3.13 now.

@jcupitt
Member

jcupitt commented Jan 18, 2016

The test programs above should now work without too much memory being used. Please reopen if they don't.

@jcupitt jcupitt closed this as completed Jan 18, 2016
@felixbuenemann
Contributor

@jcupitt Can you please re-open the issue? I had some time to test 0.3.13 this evening and the new strategy is not very effective at reducing memory usage.

I have done some benchmarking on ruby 2.2.4 with different GC strategies and image sizes and have come to the following conclusion:

  • Overhead for minor GCs is very low, so it's feasible to do a GC on every write
  • Major GC stalls the process and causes quite a slowdown, so it should be avoided if possible
  • immediate_sweep: false increased memory usage without positive effects on performance, so the default of immediate_sweep: true should be used for both minor and major GC
  • Doing manual GC actually improves performance, probably due to less memory management

In the benchmark results, no gc means no manual GC was triggered from write_gc, minor gc means full_mark: false, and lazy sweep means immediate_sweep: false.

So my recommendation would be to do something like this:

@@generational_gc = RUBY_ENGINE == "ruby" && RUBY_VERSION.to_f >= 2.1

@@gc_interval = 100
@@gc_countdown = @@gc_interval

def write_gc(path)
  if @@generational_gc 
    GC.start full_mark: false
  else
    @@gc_countdown -= 1
    if @@gc_countdown < 0 
      @@gc_countdown = @@gc_interval
      GC.start
    end
  end

  write_internal path
end

I have also tested ruby 2.1.8 and 2.3.0, which behave very similarly, although 2.1 has higher and 2.3 lower memory usage than 2.2.

jcupitt added a commit that referenced this issue Jan 21, 2016
on ruby2.1 and later, do an incremental GC on every write
on older ruby, full GC every 100 writes

see #67
@jcupitt
Member

jcupitt commented Jan 21, 2016

Hi, I've made your suggested change, thank you very much for investigating this again. This time I'll wait a few days before updating the gem to let things settle.

I wonder about the incremental GC. Presumably it just sweeps the young object pool, in which case you could have large images which the GC moved to the old object pool which would never get collected, or not until something else forced a full GC.

I wonder if ruby2.1 should do a full GC every 100 writes as well?

On the other hand it makes me very uncomfortable to be putting too much GC tuning into this gem. It seems very fragile: a small change to the GC in Ruby could throw all of these settings off. Perhaps your proposal is the best compromise.
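One way to observe the difference between the two kinds of collection is to read the counters in `GC.stat`. The sketch below assumes CRuby 2.1 or later, where `:minor_gc_count` and `:major_gc_count` are available; the `gc_counts` helper is a hypothetical name. A request made with `full_mark: false` is normally served by a minor GC, although the VM may decide to run a major one instead, so only the full `GC.start` is guaranteed to move the major counter.

```ruby
# Sketch, assuming CRuby 2.1+: GC.stat keeps separate counters for
# minor (young objects only) and major (full mark) collections.
def gc_counts
  { minor: GC.stat(:minor_gc_count), major: GC.stat(:major_gc_count) }
end

before = gc_counts
GC.start(full_mark: false) # request a minor GC
after_minor = gc_counts
GC.start                   # full mark, also examines old objects
after_major = gc_counts

puts "minor GCs during minor request: #{after_minor[:minor] - before[:minor]}"
puts "major GCs during minor request: #{after_minor[:major] - before[:major]}"
puts "major GCs during full request:  #{after_major[:major] - after_minor[:major]}"
```

Logging these counters alongside RSS in a batch job would show whether memory only drops when the major counter moves, which is the scenario the question above is worried about.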

@jcupitt jcupitt reopened this Jan 21, 2016
@felixbuenemann
Contributor

I did a quick test on master and it's now working fine for me.

@felixbuenemann
Contributor

felixbuenemann commented May 13, 2016

You could try to use ruby-vips8, it has much better memory management. I've successfully used it to do batch processing of thousands of images.

@ioquatix
Member

Interesting, I didn't know this existed. Is the API backwards compatible?

@felixbuenemann
Contributor

Not really, but it uses dynamic bindings via gobject-introspection, so you always have access to all features in libvips, without adding any code to the gem.

The nice thing is that you can get documentation for all operations through the vips cli utility or by looking at the vips reference manual.

@jcupitt
Member

jcupitt commented May 13, 2016

@ioquatix, could you post a sample program that shows the bad memory behaviour? I've tried a few simple loops over images here and it seems to mostly behave itself.

A #close method would be useful, but would be a lot of work. We'd need a proxy object between ruby-vips and libvips which noted the image state, so all code that looked at the image class would need reworking.

ruby-vips8 is supposed to be where current dev effort is going. It has sections which go into the reasoning behind this new binding and explain how it works:

http://www.rubydoc.info/gems/ruby-vips8/0.1.0

http://www.rubydoc.info/gems/ruby-vips8/0.1.0/Vips

It's quite a bit nicer to use than ruby-vips. It has a full set of operator overloads, a much better load/save system, an operation cache, better constant handling and better docs. Plus the entire thing is only 1,500 lines of Ruby, whereas ruby-vips was 10,000 lines of C.

@ioquatix
Member

Yeah, I'll probably move to ruby-vips8 at some point soon.

@jcupitt
Member

jcupitt commented May 13, 2016

There was a blog post as well:

http://libvips.blogspot.co.uk/2016/01/ruby-vips-is-dead-long-live-ruby-vips8.html

It runs over some of the changes.

@jcupitt
Member

jcupitt commented Jun 7, 2016

I've made v1.0 of ruby-vips, it's just ruby-vips8 renamed. The old gem is still there and maintained if you pin your version to 0.3.

Here's a test program:

#!/usr/bin/ruby 

require 'vips'

ARGV.each do |filename|
    next if File.extname(filename) != '.jpg'
    puts "processing #{filename} ..."
    im = Vips::Image.new_from_file filename, :access => :sequential
    im.write_to_file 'x_' + filename
end

If I make a directory with 10,000 jpg files, run it, and watch in top, I see a pretty steady 150 - 200MB of memory use. It wobbles up and down a bit as the GC triggers.

The README has some notes on moving code to the new API, it should be very easy.

@jcupitt jcupitt closed this as completed Jun 7, 2016
@ioquatix
Member

This is a good solution.

@felixbuenemann
Contributor

felixbuenemann commented Mar 9, 2017

@jcupitt I'm still seeing some amount of memory bloat with ruby-vips 1.0.4.

I have a script that goes through ~3000 assets (tiff, jpeg, svg, pdf) and converts them to jpeg.

If I run the script without manual GC, the memory usage steadily grows to several hundred MB. If I do a full GC.start or GC.start(full_mark: true, immediate_sweep: false) after each image is processed, the memory usage stays fixed at around 140 MB.

I am running with Vips.cache_set_max 0 in case that matters.

@felixbuenemann
Contributor

I've done some more runs and monitored memory usage. This might just be how Ruby's GC works.

Without manually forcing GC, the memory usage always grows to somewhere around 500-600 MB, and then GC kicks in and memory usage drops back to around 160 MB. So it is freeing memory, just very lazily.

Tested on ruby 2.3.3, I'll try again on 2.4.0 although I don't think it changed much in terms of GC.

@jcupitt
Member

jcupitt commented Mar 10, 2017

Hi @felixbuenemann, nice to hear from you again.

Yes, I think this is just how Ruby's generational GC works. ruby-vips is doing this just after every write, for reference:

https://github.com/jcupitt/ruby-vips/blob/master/lib/vips/image.rb#L359

So for ruby2.1 and later, it queues a sweep of recent objects after every write. I think the idea is to leave a full sweep for Ruby itself or the application to do, since that can be very expensive.

oleksandrbyk added a commit to oleksandrbyk/olek-ruby-vips that referenced this issue Feb 6, 2019
oleksandrbyk added a commit to oleksandrbyk/olek-ruby-vips that referenced this issue Feb 6, 2019
on ruby2.1 and later, do an incremental GC on every write
on older ruby, full GC every 100 writes

see libvips/ruby-vips#67
@ioquatix
Member

ioquatix commented Mar 24, 2020

Coming full circle, back to this issue, I'm still seeing some pretty crazy memory usage.

I'm using my fork of ruby-vips in which I've also removed the GC.start after write operations (because it shouldn't be necessary). However, even inserting this doesn't always fix the memory usage issues in production.

There is a memsize operation (e.g. ObjectSpace.memsize_of) in CRuby which is used to report the actual memory size of a pointer/value. Otherwise, as you say, the GC doesn't know the impact/heap usage of an image. However, I can't see a way to expose this via FFI which is unfortunate.

I've been playing around with the great memory leak sample given above:

https://github.com/ioquatix/vips-thumbnail/blob/master/examples/memory/leak.rb

In the worst case, it basically keeps on chewing through data, which is never returned, even after the final GC:

koyoko% ./leak.rb
RSS at startup with gems loaded: 35 MB
Iteration: 100      RSS:    731 MB File: ../../spec/vips/thumbnail/IMG_8537.jpg
RSS at exit: 639 MB

I would have expected calling img.close (vips_image_invalidate_all) should release memory:

koyoko% ./leak.rb --close  
RSS at startup with gems loaded: 35 MB
Iteration: 100      RSS:    727 MB File: ../../spec/vips/thumbnail/IMG_8537.jpg
RSS at exit: 641 MB

The only thing that works is GC after each iteration:

koyoko% ./leak.rb --gc-each
RSS at startup with gems loaded: 35 MB
Iteration: 100      RSS:    103 MB File: ../../spec/vips/thumbnail/IMG_8537.jpg
RSS at exit: 98 MB

Thoughts?

@ioquatix
Member

ioquatix commented Mar 24, 2020

Okay, I was playing around with it a bit more. It turns out if you inform the GC of the buffer size, it will do a better job.

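require 'ffi'

module GC
	extend FFI::Library
	ffi_lib FFI::CURRENT_PROCESS
	
	# CRuby C API: tell the GC about memory allocated outside the Ruby heap.
	attach_function :rb_gc_adjust_memory_usage, [:ssize_t], :void
end

def acquire(img)
	GC.rb_gc_adjust_memory_usage(30*1024*1024)
	ObjectSpace.define_finalizer(img, self.method(:release))
end

# Finalizers are called with the object id of the collected object.
def release(_object_id = nil)
	GC.rb_gc_adjust_memory_usage(-30*1024*1024)
end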

What we need is a way to tell the GC the correct buffer size/memory usage.
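As a rough illustration of where such a number could come from, the size of a fully decoded image can be bounded from its dimensions: width × height × bands × bytes per sample. The sketch below is plain arithmetic with hypothetical dimensions; `decoded_bytes` and `MIB` are names invented here. It is an upper-bound assumption and may well overestimate what libvips actually holds, since libvips can stream pixels rather than keep a whole image in memory.

```ruby
MIB = 1 << 20 # 1 MiB in bytes

# Upper-bound estimate of a fully decoded image buffer, in bytes.
# Hypothetical helper for illustration only.
def decoded_bytes(width, height, bands, bytes_per_sample = 1)
  width * height * bands * bytes_per_sample
end

# Hypothetical 4000 x 3000 pixel, 3-band, 8-bit image.
bytes = decoded_bytes(4000, 3000, 3)
puts bytes
puts((bytes.to_f / MIB).round(1))
```

An estimate like this could be passed to the GC hint in place of a fixed constant, with the same value negated when the image is released.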

@ioquatix ioquatix reopened this Mar 24, 2020
@felixbuenemann
Contributor

@ioquatix How did you arrive at the 30 MiB buffer size and shouldn't the value in release be negative?

Btw. a shortcut for specifying KiB, MiB etc.: 1<<10 KiB, 1<<20 MiB, 1<<30 GiB.

@ioquatix
Member

@felixbuenemann I just took a random stab based on the 4MiB JPEG + 32MiB frame buffer (decoded).

Yeah, it should be negative. I'll fix it. But it was just a typo, it still works in the actual code.

@jalevin

jalevin commented May 27, 2020

Hey all, thank you for your hard work on the ruby-vips gem.

I'm also seeing a strange case when I load PDFs. I'm converting each page to its own image. After browsing through this thread I removed the write to see if that's where the memory was being gobbled up, but it's actually in the read. It looks like every call to pdfload is storing the whole PDF in memory. I'm not sure if this is a GC issue, or whether this is the right thread to post it in.

    page_count = Vips::Image.pdfload("test.pdf", access: :sequential).get("n-pages")
    page_count.times do |p|
      im = Vips::Image.pdfload("test.pdf", page: p, n: 1, access: :sequential)
      im.write_to_file("buffer/test.#{p}.png")
    end

In this case, I don't actually see a GC happen. The file is 298 pages and 5.5 MB.

starting mem: 39mb
page: 0 - Memory: 52.38671875mb
page: 1 - Memory: 57.9375mb
page: 2 - Memory: 63.46875mb
-snip-
page: 297 - Memory: 1701.53125mb

@ioquatix
Member

That is definitely some kind of memory explosion.

@jcupitt
Member

jcupitt commented May 27, 2020

Could it be the libvips cache? Try:

Vips::cache_set_max 0

It could also be your libvips not being configured with pdfload and falling back to converting via imagemagick. At the bash prompt, try:

vips pdfload

To check that you have a pdf loader. I guess it could also be a bug in poppler --- we'd need a test image to verify that.

@tpendragon

tpendragon commented Aug 6, 2020

We're doing the same thing as @jalevin, and if I don't manually call GC.start every 10 iterations or so we run out of file handles for large PDFs. An explicit close would be super helpful. The handles still go up with Vips::cache_set_max 0.
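For anyone trying to confirm that file handles are accumulating between GCs, a small diagnostic can count the descriptors the current process holds. This is a Linux-only sketch that reads /proc/self/fd and returns nil elsewhere; `open_fd_count` is a hypothetical helper name. The count includes the descriptor used to list the directory itself, so it is the change between samples that matters, not the absolute number.

```ruby
# Linux-only sketch: count the file descriptors open in this process
# by listing /proc/self/fd. Returns nil where /proc is unavailable.
def open_fd_count
  dir = "/proc/self/fd"
  return nil unless File.directory?(dir)

  (Dir.entries(dir) - %w[. ..]).size
end

before = open_fd_count
GC.start
after = open_fd_count
puts "open fds before GC: #{before.inspect}, after GC: #{after.inspect}"
```

Printing this once per page in a PDF loop would show whether the count climbs steadily and only falls when a GC runs.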

@jcupitt
Member

jcupitt commented Aug 7, 2020

I had a go at improving the behaviour of the PDF loader with file handles, if anyone has time to test.

libvips/libvips#1770

It has some other useful improvements as well around default cache sizing.

@jcupitt
Member

jcupitt commented Sep 27, 2020

That improvement to the PDF loader has been merged to git master libvips. I made a small benchmark:

#!/usr/bin/ruby

require "vips"

im = Vips::Image.pdfload ARGV[0]
n_pages = im.get "n-pages"
n_iterations = ARGV[1].to_i

n_iterations.times do |i|
  p = Random.rand n_pages
  puts "iteration #{i} ..."
  im = Vips::Image.pdfload ARGV[0], page: p, n: 1, access: :sequential
  im.write_to_file "page.png"
end

If I run like this:

john@yingna ~/try $ /usr/bin/time -f %M:%e ./pdf-convert.rb ~/pics/nipguide.pdf 
iteration 1 ...
...
iteration 9999 ...
2553748:397.33

And watch in top, memory use stabilises after about 3,000 pages at 2.3 GB. After that, it wobbles up and down between 2.3 and 2.4 GB, presumably due to memory fragmentation.

I realize this is a lot of memory, but at least it's stable. File handle use stays low, at least.

Perhaps something can be done to make it stabilize at a lower level, but that should probably be in a new issue.
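For reading the benchmark line above: with GNU time, `%M` is the peak resident set size in kilobytes and `%e` is the elapsed wall-clock time in seconds. A small sketch that converts such a line into friendlier units follows; `parse_time_output` is a hypothetical helper written for this example.

```ruby
# Parse "<peak RSS in KB>:<elapsed seconds>", as printed by
# GNU time with the format string %M:%e.
def parse_time_output(line)
  rss_kb, seconds = line.strip.split(":")
  {
    rss_mib: (rss_kb.to_i / 1024.0).round(1),
    rss_gib: (rss_kb.to_i / 1024.0 / 1024.0).round(2),
    seconds: seconds.to_f
  }
end

p parse_time_output("2553748:397.33")
```

For the run above this works out to a peak of roughly 2.4 GiB, in line with the level seen in top.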

@jcupitt jcupitt closed this as completed Sep 27, 2020
@jcupitt
Member

jcupitt commented Sep 27, 2020

I should have said, that high memory use is just for the PDF loader. If I try:

#!/usr/bin/ruby

require "vips"

n_iterations = ARGV[1].to_i

n_iterations.times do |i|
  im = Vips::Image.new_from_file ARGV[0], access: :sequential
  im.write_to_file "page.jpg"
end

ie. repeatedly convert a single file to JPG, I see:

john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.jpg 5000
227880:108.03
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.tif 5000
253604:109.82
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.png 5000
216716:485.39
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.ppm 5000
74152:72.03
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.webp 5000
102676:71.26
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/k2.heic 5000
793964:1374.04
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/lion.svg 5000
104920:43.98
john@yingna ~/try $ /usr/bin/time -f %M:%e ./convert.rb ~/pics/nipguide.pdf 5000
71404:21.33

HEIC and PNG are hilariously slow. PDF looks good here, but that's converting page 0, the title page, which just has a couple of lines of text on it. A complex page with a lot of graphics would look very different.
