Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make AbstractMessage public in Ruby #12550

Closed
jez opened this issue Apr 25, 2023 · 8 comments
Closed

Make AbstractMessage public in Ruby #12550

jez opened this issue Apr 25, 2023 · 8 comments
Assignees
Labels
discussion inactive Denotes the issue/PR has not seen activity in the last 90 days. ruby

Comments

@jez
Copy link

jez commented Apr 25, 2023

What language does this apply to?

If it's a proto syntax change, is it for proto2 or proto3?
If it's about generated code change, what programming language?

Ruby

Describe the problem you are trying to solve.

In this change the AbstractMessage parent class was added for Ruby code. The motivation was to be a pure performance win, exploiting how classes and objects are laid out in the Ruby code.

But as it happens, having a common base class that has all these methods defined is particularly convenient. When using an optional static type checker in a Ruby project, it's very easy if you can write Google::Protobuf::AbstractMessage in a type signature, and know that it has all these methods. For example, using Sorbet, it would be possible to write a signature like

sig do
  params(
    proto_class: T.class_of(Google::Protobuf::AbstractMessage),
    bytes: String
  )
  .returns(Google::Protobuf::AbstractMessage)
end
def decode_and_log(proto_class, bytes)
  puts("Decoding with class: #{proto_class.descriptor.name}")
  proto_class.decode(bytes)
end

Right now it's annoying to write signatures for these methods, because methods like descriptor and decode aren't actually defined on the only other common type in the inheritance hierarchy, MessageExts::ClassMethods:

[dev] main:0> Google::Protobuf::MessageExts::ClassMethods.instance_methods
=> []

So there's not a convenient type you can write—but there would be if AbstractMessage were not a private constant.

Describe the solution you'd like

Can we remove private_constant :AbstractMessage?

private_constant :AbstractMessage

Describe alternatives you've considered

An alternative would be to actually define the methods on MessageExts::ClassMethods.

In the Sorbet type system this is slightly less ergonomic to use, MessageExts and MessageExts::ClassMethods look like completely orthogonal modules to the type system. (For example, it doesn't know that MessageExts::ClassMethods has all the methods that a normal class would have, like .name or .ancestors.) That means you have to go around writing things like T.all(Google::Protobuf::MessageExts, Class) which is confusing.

It's a lot easier to just mention the exact superclass you want in your type annotation, which in this case is AbstractMessage.

Additional context
Add any other context or screenshots about the feature request here.

@jez jez added the untriaged auto added to all issues by default when created. label Apr 25, 2023
@zhangskz zhangskz added ruby discussion and removed untriaged auto added to all issues by default when created. labels Apr 25, 2023
@haberman
Copy link
Member

haberman commented Apr 25, 2023

The thing we are really trying to avoid here is giving the impression that protobuf messages have a common base class where common functionality goes.

If we followed the common expectation of base classes, then once AbstractMessage is a public base class, people could start sending us PRs to add any number of convenient methods to it.

But this is asking for conflicts, because the generated class will have accessors for every field defined in the .proto file. You could wreak havoc by creating a proto file like:

message BadForRuby {
  optional int32 to_json = 1;
  optional int32 to_proto = 2;
  optional int32 decode = 3;
  optional int32 convenient_method_added_in_pr = 4;
}

To avoid these kinds of conflicts, we want generated classes to look as much like dumb data containers as possible, with no base class, and have all interesting/common functionality come from other places.

@jez
Copy link
Author

jez commented Apr 25, 2023

@haberman thanks for the quick response, really appreciate it!

If we followed the common expectation of base classes, then once AbstractMessage is a public base class, people could start sending us PRs to add any number of convenient methods to it.

Seems feasible to make it clear where the line is drawn on contributions there, both with a comment in the code and with code review.

But this is asking for conflicts, because the generated class will have accessors for every field defined in the .proto file.

Doesn’t this problem exist regardless of whether you define methods like decode and descriptor on AbstractMessage or you pretend they’re only accessible on each concrete message class?

The PR that came before this already introduced AbstractMessage. If someone defines a proto message with fields whose names collide with names already used by AbstractMessage, they will override them, regardless of whether AbstractMessage is public or private.

Also, correct me if I'm wrong, but this only applies to instance methods, right? I'm not sure that optional int32 to_json (or any other name) will ever collide with singleton class methods on AbstractMessage. So maybe in your example you meant to use something like optional int32 to_s or optional int32 dup.

In the case of having field names collide with method names, there's even more reason to want to expose AbstractMessage:

bad_for_ruby = # ... create BadForRuby message
Google::Protobuf::AbstractMessage.instance_method(:to_s).bind(bad_for_ruby).call

This .bind(...).call pattern allows calling the real to_s method directly, avoiding problematic overrides. This pattern is used all over the place in Ruby code that has to be defensive against people overriding built-in functionality (for example, in a gem I maintain, we have a whole module full of this pattern).

So my argument is:

  • the colliding names problem is not made worse by making AbstractMessage public
  • in fact, it's actually made better, because it empowers people to work around it as they see fit

@haberman
Copy link
Member

I wish there was a way of writing this more like:

sig do
  params(
    proto_class: Google::Protobuf::MessageClass,
    bytes: String
  )
  .returns(T.instance_of)
end
def decode_and_log(proto_class, bytes)
  puts("Decoding with class: #{proto_class.descriptor.name}")
  proto_class.decode(bytes)
end

It's really the singleton methods you are interested in here, which are like instance methods on the class. The fact that the proto has a base class isn't interesting or important, except in what it says about what singleton methods are available.

On the other hand, I notice that Struct.new() in Ruby returns classes that have Struct as a base class, which is a pretty close match for what we are doing here. So maybe a public base class makes sense.

I'm not sure that optional int32 to_json (or any other name) will ever collide with singleton class methods on AbstractMessage.

I think to_json and to_proto are defined on AbstractMessage, as non-singleton methods https://github.com/protocolbuffers/protobuf/blob/main/ruby/lib/google/protobuf/message_exts.rb#L43-L49

Tested here: https://github.com/protocolbuffers/protobuf/blob/main/ruby/tests/basic.rb#L110

@jez
Copy link
Author

jez commented Apr 26, 2023

I think to_json and to_proto are defined on AbstractMessage, as non-singleton methods

ah thanks, I had only checked the C defined methods

Google::Protobuf::MessageClass

is this a class that already exists? If not, could you say more about how you thought this would be defined?

also, in your example what does T.instance_of mean?

Again, I’m pretty sure that the relevant Sorbet machinery exists to write a sig, as long as the methods are defined on some class or module that already exists.

If the singleton class methods were defined on Google::Protobuf::MessageExts::ClassMethods, that would good, but AbstractMessage would still be better.

Basically: any solution that involves defining the methods that are common to all messages on a common ancestor of all messages is what unblocks better types. That could be a parent class, or a module, but it has to be some class/module that actually is visible at runtime and actually has the methods defined on it.

Writing the type annotations is easier and involves less boilerplate when it’s a class like AbstractMessage, but if the instance methods were on MessageExts and the singleton class methods on MessageExts::ClassMethods, sorbet would be able to write types for that too (again, this approach would just add verbosity to users, but at least this is tolerable).

It's really the singleton methods you are interested in here, which are like instance methods on the class.

So I’m also interested in the instance methods, to a lesser degree. We have 120 MB of proto-generated Ruby code in our codebase, which adds about 120 + 120*4 MB to the memory used when parsing and type checking. If these methods were defined on a parent class, we could define them once and not have to make seemingly-identical definitions for all the instance and singleton class methods. This would cut down on the size of generated files, making type checking faster and use less memory.

Sorry for not bringing up this point in the original post, as I only realized it yesterday afternoon.

jez added a commit to jez/protoc-gen-rbi that referenced this issue Apr 26, 2023
In our codebase we're fighting a constant battle against the codebase's
growing size. Having a larger codebase makes basically everything
slower, from creating an initial dev environment to type checking code
to creating deploy artifacts.

Generated protobuf files end up contributing a lot to the bloat of our
codebase. These two options make protoc-gen-rbi generate less code. The
options:

- `hide_common_methods`

  This option hides the common methods like `decode_json` and `to_h`
  that protoc-gen-rbi would otherwise create. While it's most accurate
  to have those methods be defined on these classes, it's actually
  possible to get basically the same type safety by defining these
  methods elsewhere, using `T.attached_class`.

- `use_abstract_message`

  A recent `google-protobuf` PR changed the superclass of all message
  classes:

  <protocolbuffers/protobuf#10281>

  The new superclass, `Google::Protobuf::AbstractMessage`, is
  technically private. However, having a common superclass is quite
  convenient for a number of reasons, and codebases may wish forcibly
  make this class publicly visible. If that's been done, it's nice to be
  able to mention that superclass instead of the `include` and `extend`
  calls, because `AbstractMessage` is exactly defined by just an empty
  class with those two mixins, and getting rid of those calls generates
  less code.

  For more discussions on the merits of making this class public:

  <protocolbuffers/protobuf#12550>

I'm happy to split this change apart if you think it would be better
that way.

On Stripe's codebase, these changes amount to a 4% reduction in Sorbet's
peak memory usage, and a 1% improvement to time spent type checking, so
I'm quite eager to get this change or some equivalent version of it
landed.
jez added a commit to jez/protoc-gen-rbi that referenced this issue Apr 26, 2023
In our codebase we're fighting a constant battle against the codebase's
growing size. Having a larger codebase makes basically everything
slower, from creating an initial dev environment to type checking code
to creating deploy artifacts.

Generated protobuf files end up contributing a lot to the bloat of our
codebase. These two options make protoc-gen-rbi generate less code. The
options:

- `hide_common_methods`

  This option hides the common methods like `decode_json` and `to_h`
  that protoc-gen-rbi would otherwise create. While it's most accurate
  to have those methods be defined on these classes, it's actually
  possible to get basically the same type safety by defining these
  methods elsewhere, using `T.attached_class`.

- `use_abstract_message`

  A recent `google-protobuf` PR changed the superclass of all message
  classes:

  <protocolbuffers/protobuf#10281>

  The new superclass, `Google::Protobuf::AbstractMessage`, is
  technically private. However, having a common superclass is quite
  convenient for a number of reasons, and codebases may wish forcibly
  make this class publicly visible. If that's been done, it's nice to be
  able to mention that superclass instead of the `include` and `extend`
  calls, because `AbstractMessage` is exactly defined by just an empty
  class with those two mixins, and getting rid of those calls generates
  less code.

  For more discussions on the merits of making this class public:

  <protocolbuffers/protobuf#12550>

I'm happy to split this change apart if you think it would be better
that way.

On Stripe's codebase, these changes amount to a 4% reduction in Sorbet's
peak memory usage, and a 1% improvement to time spent type checking, so
I'm quite eager to get this change or some equivalent version of it
landed.
@haberman
Copy link
Member

haberman commented May 1, 2023

Basically: any solution that involves defining the methods that are common to all messages on a common ancestor of all messages is what unblocks better types.

Right, and my concern is that I don't want messages to have common methods. As much as possible, I want message instances to have accessors for the fields defined in a .proto file, and nothing more.

We do have common methods on message classes (ie. class-level methods). That's why I would favor a solution that can somehow represent that protobuf classes have common class-level methods, but not object methods.

MessageClass doesn't exist currently. I was thinking that if a class derived from T, then T.instance_of would be an instance of that class.

@jez
Copy link
Author

jez commented May 1, 2023

Right, and my concern is that I don't want messages to have common methods.

... but there already are common methods, right?

If a message defines a field named hash or to_s or to_proto, those common methods will already be shadowed by the corresponding message field. So if you did

my_message.hash

you would be calling the accessor for the proto field named hash, not calling the hash function defined by the Message_hash C function in this gem.

But if AbstractMessage were public, anyone who is already aware of this pre-existing problem with methods getting shadowed would instead be able to write

Google::Protobuf::AbstractMessage.instance_method(:hash).bind_call(my_message)

which lets the user forcibly call the real Message_hash function. So making AbstractMessage public is a way to let the shadowed methods be called even though they have already been shadowed.

The gem as written today, where AbstractMessage is private, does not actually prevent people from making the mistake that calling to_h or hash or to_json on a message class might actually access a proto field with those names. Another example: the following code is already valid—the Ruby runtime will not prevent the call to to_h simply because to_h is defined on the parent class, and the parent class is private

def count_fields(message)
  as_hash.to_h.size
  #       ^^^ expected to call `to_h` in parent, but might actually access a field
end

The user will only find out that it misbehaves the first time they happen to import a proto file with an uncommon name—whether AbstractMessage is public or private doesn't actually change this problem.

We do have common methods on message classes (ie. class-level methods). That's why I would favor a solution that can somehow represent that protobuf classes have common class-level methods, but not object methods.

Given what I've said above, I'm confused how to satisfy these criteria. The way I read this request suggests that you're nearly asking for the changes in #10281 to be reverted, which started to define common instance methods on all messages via a superclass. I am not suggesting that we add more common methods to all messages, but merely that the existing parent class be made public.

MessageClass doesn't exist currently. I was thinking that if a class derived from T, then T.instance_of would be an instance of that class.

Without getting into a full discussion of how most type systems for Ruby work (including Sorbet's type system), suffice it to say that as long as some constant like SomeClass exists, you can write T.class_of(SomeClass)[T.type_parameter(:U)] to represent "any singleton class which is equal to or a subclass of SomeClass", and then use T.type_parameter(:U) to represent "whatever an instance of that class is."

So the feature you're describing already exists, although it has different syntax from the syntax you propose.

But that syntax requires that SomeClass be a public class, otherwise you can't write the class's name in a type annotation.

Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

@github-actions github-actions bot added the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Nov 13, 2023
Copy link

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion inactive Denotes the issue/PR has not seen activity in the last 90 days. ruby
Projects
None yet
Development

No branches or pull requests

3 participants