Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use varint gem to speed up decoding? #268

Closed
zanker opened this issue Sep 25, 2015 · 4 comments
Closed

Use varint gem to speed up decoding? #268

zanker opened this issue Sep 25, 2015 · 4 comments

Comments

@zanker
Copy link
Contributor

zanker commented Sep 25, 2015

The protobuf gem https://github.com/codekitchen/ruby-protocol-buffers has a C extension called varint. If you take a look at a trace after the enum fix is applied

Measure Mode: wall_time
Thread ID: 70271059223040
Fiber ID: 70271070182440
Total: 3.308983
Sort by: self_time

 %self      total      self      wait     child     calls  name
 10.16      0.736     0.336     0.000     0.400   140000   <Class::Protobuf::Decoder>#read_varint
  4.92      0.182     0.163     0.000     0.019    94000   Kernel#respond_to?
  4.62      1.148     0.153     0.000     0.995    72000   <Class::Protobuf::Decoder>#read_field
  3.33      0.501     0.110     0.000     0.391    72000   <Class::Protobuf::Decoder>#read_key
  3.25      0.308     0.108     0.000     0.201    72000   Protobuf::Message::Fields::ClassMethods#get_field
  2.70      0.156     0.089     0.000     0.066   166500   IO::generic_readable#readbyte
  2.37      0.078     0.078     0.000     0.000   410500   Fixnum#&
  2.34      0.119     0.077     0.000     0.042   171500   Numeric#nonzero?
  2.29      0.076     0.076     0.000     0.000   144000   Protobuf::Field::BaseField#repeated?
  2.12      0.070     0.070     0.000     0.000    69000   Protobuf::Field::BaseField#setter
  2.05      0.068     0.068     0.000     0.000     1500   Kernel#caller
  2.00      0.066     0.066     0.000     0.000   166500   StringIO#getbyte
  1.69      0.056     0.056     0.000     0.000    72000   Protobuf::Message::Fields::ClassMethods#field_store
  1.53      0.093     0.050     0.000     0.042    72000   Protobuf::Field::BaseField#packed?
  1.41      0.102     0.047     0.000     0.055    51000   Kernel#dup
  1.37      0.045     0.045     0.000     0.000    47000   StringIO#read
  1.34      0.138     0.044     0.000     0.093    36003   Class#new
  1.34      0.285     0.044     0.000     0.241    43000   <Class::Protobuf::Decoder>#read_length_delimited
  1.34      0.165     0.044     0.000     0.121    30000   <Class::Protobuf::Enum>#fetch
  1.26      0.042     0.042     0.000     0.000   172000   Fixnum#zero?
  1.16      0.099     0.039     0.000     0.061    25000   Protobuf::Field::StringField#decode
  1.11      0.061     0.037     0.000     0.025    17501   Protobuf::Message#initialize
  1.06      0.035     0.035     0.000     0.000    89501   StringIO#eof?
  1.03      0.034     0.034     0.000     0.000   166500   Fixnum#<<

That's the top time spent on decoding. It's super easy to patch this in if the gem is available

require 'varint/varint'
module Protobuf
  class FastVarint
    extend ::Varint
  end

  class Decoder
    def self.read_varint(stream)
      FastVarint.decode(stream)
    end
  end
end

This is something that could be offered automatically if the gem is available, it patches it in otherwise it just uses pure Ruby. I understand if you'd rather not include this, since it's a third party C library. If you're interested, I can wrap this up into a PR. Using it resulted in a 25% speedup.

@jjowdy
Copy link

jjowdy commented Sep 26, 2015

read_varint is pretty simple, so I'm a bit surprised the C code is that much faster. Certainly small things add up, though.

A few changes that might help a little:

  1. Skip the multiply operation. Just rename index to shift_amount, then say shift_amount += 7 at the end of each iteration.
  2. I'm not sure if it's significant, but my guess is doing a direct comparison to 0x80 may be a bit faster than the nonzero? method call; e.g, (byte & 0x80) == 0x80.

@zanker
Copy link
Contributor Author

zanker commented Sep 26, 2015

That does help kind of, but not by any appreciable amount when I tested compared to replacing it with C. You're describing is what ruby-protocol-buffers does, which I tested as well:

    def decode(io)
      int_val = 0
      shift = 0
      loop do
        raise(DecodeError, "too many bytes when decoding varint") if shift >= 64
        byte = io.getbyte
        int_val |= (byte & 0b0111_1111) << shift
        shift += 7
        return int_val if (byte & 0b1000_0000) == 0
      end
   end

I also tried replacing nonzero? and zero? calls with > 0 and == 0 to see if that helped, and it didn't do much.

@abrandoned
Copy link
Contributor

would be interested in a Varint PR that still allows the pure ruby version to be usable without any hacks; our primary deployment VM is JRuby which is why we make so many "pure ruby" updates

@zanker
Copy link
Contributor Author

zanker commented Sep 27, 2015

Yup no problem, we have JRuby apps as well. I'll whip up a PR tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants