Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby phase 2 - Workflows #96

Merged
merged 4 commits into from
Nov 18, 2024
Merged

Ruby phase 2 - Workflows #96

merged 4 commits into from
Nov 18, 2024

Conversation

cretz
Copy link
Member

@cretz cretz commented Oct 24, 2024

Summary

  • This is the proposal for the second phase of the Ruby SDK about workflows. Please review and leave comments!
  • See it rendered.
  • Much of this is inspired by the previous Temporal Ruby SDK (and its proposals which were moved to a sub directory) and our other SDKs.
  • The design of workflows and their handlers is an early one and we welcome any suggestions for alternative designs here. Nothing is concrete.
  • There are plenty of outstanding questions on the proposal, some of which need more implementation and performance measurements to decide.
  • Use this PR for comments/discussion. Don't be afraid of making too many comments; if it gets too noisy, I'll just create another PR.
  • Any points of contention will be discussed in-person by the SDK team and a decision will be made.

For general discussion beyond comments here, please also join us on #ruby-sdk in Slack.

@cretz cretz requested a review from a team October 24, 2024 12:18
* `Temporalio::Workflow` is both the class to extend and the class with class methods for calling statically to do
things.
* The `workflow_` class methods on the `Temporalio::Workflow` can only be called at class definition time and cannot
be called on `Temporalio::Workflow` directly. These are for adjusting the definition.
Copy link

@dandavison dandavison Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user writing client code writes something like

supported_languages = wf_handle.query(
  GreetingWorkflow.<TAB>

will their IDE show them all the "framework" methods (including the unavailable methods starting workflow_)? It is nice that in other languages the user's IDE only offers the methods from their own workflow interface. I think we would like to have the same be true for Ruby if possible.

Copy link
Member Author

@cretz cretz Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I am afraid this is only available at runtime (via method_missing). Ruby does not allow you to access an instance method at the class level.

I think we would like to have the same be true for Ruby if possible.

Completely agree, but I cannot think of a way to do this without some form of proxying which were trying to avoid for a few reasons (mostly because we'd have to lie about what the type actually is). And even if we did do proxying, Ruby is often not expressive enough to represent this well statically. Here may be some other options, but all have more tradeoffs than the current approach I think.

class Workflow
  # @return [self]
  def self.stub
    StubDelegator.new(self)
  end
end

# This if you want to keep accurate with typing, but no clear way to provide
# handle and args
my_wf = MyWorkflow.stub
my_wf.update('my arg') 

# This if you want better options, but you'd fail a type checker
my_wf_handle.update(MyWorkflow.stub.update, 'my arg')

# You could use blocks, but it gets confusing for users too
my_wf_handle.update(wait_for_stage: whatever) { |stub| stub.update('my arg') }

Open to suggestions here, but I fear Ruby just cannot statically express things to the IDE like this. Ruby developers are used to having to use symbols or runtime methods that are not in their IDE. Of course a user can provide YARD macros or Sorbet/RBI types to support this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've probably already implicitly explained this, but in Python this is done by implementing static things like @query decorators as functions on a workflow module. Can you (re-)summarize why an approach like that doesn't work for Ruby?

Copy link
Member Author

@cretz cretz Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python you can reference an instance method via Class.instance_method, but in Ruby you cannot. You must have an instance of a class to reference an instance method on it. FooWorkflow.my_query_method will give a NoMethodError. You can do hacky things like FooWorkflow.instance_method(:my_query_method) to get an unbounde method, but you're into reflection there not IDE help.

We will support FooWorkflow.my_query_method via method_missing approaches, but that is runtime, not definition time like IDEs need. But I'm open to ideas here. There are concepts of "stubs" we have used in Java, but they have many of their own problems.

I am investigating https://github.com/Shopify/tapioca#writing-custom-dsl-compilers to see if we may be able to write a custom DSL compiler/extension to help here.

Copy link
Member Author

@cretz cretz Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have confirmed that even for Sorbet users, custom DSL compilers don't run in IDEs by default anyways. I cannot see a way at this time to support typesafe workflow messages without a fake/proxy type of object which I am trying to avoid. Ruby users don't really expect DSLs to be well typed in their IDE (and in the Ruby world, our need to define certain methods as special like as a query is only really done via DSL).

Comment on lines 331 to 332
* ❓ Is this acceptable? This is a consequence of us wanting to share workflow context class with workflow base
class. Basically at the top of these calls there's a `if self != Workflow` check.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's pretty much fine, but... why not just have a WorkflowContext class and be explicit? Or, have that be Workflow and the other be WorkflowDefinition

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is definitely a good question and subject to debate and is the reason I put "Temporalio::Workflow is both the class to extend and the class with class methods for calling statically to do things." at the top of the controversial decisions.

So basically you have these options:

Option 1 - Together

class MyWorkflow < Temporalio::Workflow
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

Option 2 - Context class named separate

class MyWorkflow < Temporalio::Workflow
  def execute
    Temporalio::Workflow::Context.wait_condition { false }
  end
end

One benefit to this is that we do have Temporalio::Activity::Context that houses all of the activity things, but if you look at some other SDKs like Java and .NET, the class name is "activity context" but not "workflow context", so that disparity does exist elsewhere.

Option 3 - Workflow class named separate

class MyWorkflow < Temporalio::WorkflowImpl
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

I am using WorkflowImpl here because there will be a Temporalio::Workflow::Definition that represents the static definition built from the impl (and signal, query, and definition classes, same as there is already Temporalio::Activity::Definition). Open to a better name, I just struggled to come up with one.

Option 4 - Don't even require class extension

class MyWorkflow
  def execute
    Temporalio::Workflow.wait_condition { false }
  end
end

You aren't buying that much with a class extension, but it is clear and is common in Ruby and is what we do for activities.

Thoughts/preference? Any other/better options here? Mixins would suffer the same class-for-two-purposes reuse issue.

Copy link
Member

@Sushisource Sushisource Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like option 3. Clear separation, just as terse as option 1 for the vast majority of the code..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, which is basically the same as option 2 just different on who gets the "Workflow" part. I will say I like "Context" for users more than I like "WorkflowImpl" for extenders, but I do like callers of workflow utilities to only need Temporalio::Workflow instead of Temporalio::Workflow::Context (and what drove my original approach). Either way, I am starting to lean towards separating what you extend from what you invoke. Any ideas for other names though besides either "Context" for workflow utilities or "WorkflowImpl" for the extension class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really lol. Those are pretty obvious choices. Maybe WorkflowDefinition like I suggested instead of impl. I would err on making that longer since you type it way less.

Copy link
Member Author

@cretz cretz Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after some internal discussion, I think I'm landing at something like:

module Temporalio
  class Workflow
    class << self
      protected

      def workflow_name(name)
        raise 'TODO'
      end
    end

    module API
      def execute_activity(activity, blah1, blah2)
        raise 'TODO'
      end
    end

    extend API
  end
end

So basically you have protected class methods for adjusting the workflow definition, then an API set of module methods in the workflow class by default (extend in this way I think avoids YARD assuming they are part of the class, still investigating). This will allow others to write utilities for workflows using Temporalio::Workflow::API directly (or extending it themselves too). Doing a bit more investigation before accepting this approach.

EDIT: This is still no good, the IDEs and docs still see the dozens of methods in API as part of the class extending Workflow. I am looking at various options here, but I'm struggling to find a way beyond splitting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, proposal updated to require workflows extend from Temporalio::Workflow::Definition and use module functions on Temporalio::Workflow. As a consistency consequence, activities will now also have to extend from Temporalio::Activity::Definition.

* 💭 Why? It is helpful for callers/client-side to be able to reference something more explicit than a symbol, and it
also helps with making sure if the name is changed we use that name instead of the method name. Symbols can always
be used though.
* ❓ Is this acceptable or too much magic?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this kind of magic is largely fine because it's not really pretending some behavior is different, it's just advanced sugar for fetching the definition in a nice way. Works for me. The only sort of magic bit I suppose is that, if you call that, it's not invoking the method, but it would seem fairly obviously undesirable to do that anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It is not very common in Ruby to have class methods and instance methods of the same name with different behavior, though it does happen here and there. But I think our use case uniquely justifies it.

The bigger struggle is type safety integration, but Ruby type safety is in such poor state I 1) cannot find a good way to support caller-side type safety in RBS/Sorbet (unless relying on something like this which doesn't work in IDEs anyways), and 2) do not want to degrade ease of use by the majority of users using this as normal Ruby without type safety checks. Users can of course write their own signatures for these things, e.g. in RBS you might have:

class MyWorkflow < Temporalio::Workflow
  def self.my_update: -> Temporalio::Workflow::Definition::Update[MyReturnType, MyArgType]

  def my_update: (MyArgType arg) -> MyReturnType
end

I am open to any/all designs here. Requirements for others reading:

  • Workflow handlers for signal, query, and update must be easily implemented as instance methods
  • Workflow handlers for signal, query, and update must be easily usable by clients without having an instance of the workflow (we don't want fake/proxy objects where method invocation does different things)

Comment on lines 396 to 399
def self.any_of_no_raise(*futures)
# Returns future whose result is set to nil when all futures complete
# regardless of whether any future failed.
def self.all_of_no_raise(*futures)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are confusing. The wait and wait_no_raise method accomplishes the waiting & maybe raising part just fine.

From what I understand these are really more like different conditions for when any/all resolve. The first is more like any_success and the second is really all_of, where the current all_of is really try_all_of or something. IE: It looks like the originals short circuit on any failure, and these versions do not.

That or I'm not understanding them, because what's the difference between any_of_no_raise(...).wait and any_of_no_raise(...).wait_no_raise? The latter reads particularly strangely. Either way if I'm wrong or right, I think it says the names are a bit confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because what's the difference between any_of_no_raise(...).wait and any_of_no_raise(...).wait_no_raise?

No big difference if you have both of these calls together. But they have value independently (the first one returns a future the second you have chosen not to even be able to look at the failure) and have value as consistent shortcuts. I can't imagine anyone would call the latter combo, wait_no_raise will more often be used for people using singular futures they don't care about.

Most languages have forms of these. But I am open to better names though don't want to get too inconsistent with each other and don't want these to be arguments since they change the return type. Basic requirements I think are:

  • Ability to wait for first of multiple futures or raise on first failure
  • Ability to wait for first of multiple futures but do not raise on first failure
  • Ability to wait on all of multiple futures or raise on first failure
  • Ability to wait on all of multiple futures but do not raise on first failure
  • Ability to wait on a future or raise on failure
  • Ability to wait on a future and not raise on failure

This was my best attempt after many permutations to get those 6 methods in. Definitely open to clearer suggestions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would then change the names a-la my suggestions - the most important part I think is just not to use no_raise in these names since it's not really about not raising, it's about not short-circuiting in the event one fails.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would then change the names a-la my suggestions

Do you have a specific suggestion on names? So instead of:

  • any_of
  • any_of_no_raise
  • all_of
  • all_of_no_raise
  • wait
  • wait_no_raise

reading comment above, something like:

  • any_of
  • any_of_success - says success but includes failure?
  • try_all_of - fails future if any fail?
  • all_of - does not fail future if any fail, so one "x_of" doesn't ail the future and one does?
  • wait
  • wait_no_raise

Maybe something like try instead of _no_raise, e.g.

  • any_of
  • try_any_of
  • all_of
  • try_all_of
  • wait
  • try_wait

Looking for symmetry in naming between messages that fail their things and ones that don't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the try names, but I think they should actually be the short-circuiting versions. IE: Existing any_of and all_of become the try_xxx versions, and the current _no_raise versions just become any_of and all_of. Since "try" implies "try until something breaks". This also matches the Rust naming scheme for these, FWIW.

I would probably keep wait_no_raise though rather than try there just because that's more obvious. And, I think, to my point, that is actually about raising or not where the other stuff is about short-circuiting.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, to me "try" means failure is captured or doesn't fail as it otherwise would, e.g. .NET int.Parse vs int.TryParse. One fails as normal, another captures failure (same as the actual try statement in many languages).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try here is like "try them, but fail if any fails soon" whereas without it is "definitely do all the futures even if some of them fail".

But, I wouldn't cry about having it the other way around, I think that can work too w/ clear docs. I do think wait and wait_no_raise should stay that way though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, so wait, wait_no_raise settled. Any other term we can think of for "swallow error" form of the others besides try? I will update proposal with the try_ form, but definitely open to other words here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal updated to change any_of_no_raise and all_of_no_raise to try_any_of and try_all_of and clarify they do not raise when waited on.

* ❓ Is this an acceptable default? Python uses `[Etc.nprocessors, 4].max` as the max threads, but we figure it can
be unbounded here and remain bounded by the Core `max_concurrent_workflow_tasks` (TODO: expose this, it was left
off in `main`).
* ❓ Should the default share the same thread pool as activity executor?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, specifically because of some of the workflow isolation stuff that may or may not need to happen in them. Seems easier and cleaner to keep them separate without much downside.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference between thread pool and executor in this case. One benefit of reusing the thread pool (but still different executors of course) is that it is an "unbounded caching thread pool" which means the same threads could be reused across workflows and activities. The threads, once returned back to the pool, should have no side effects from its use. By having separate unbounded thread pools here, both pools will have idle threads that the other could be using unnecessarily. Definitely not an easy/hard thing to share vs keep separate because we will have lazily created global defaults for these (and already do for activity executor).

So the only argument against I can think of is maybe some kind of concern for user clarity, but my hope is that users never touch this stuff for the most part.

is that not only are Ractors not stable and may have bugs, but they also give off a warning. A workflow instance will be
in a Ractor and Core activations will be communicated.

* ❓ Should we enable this by default, disable this by default, or somehow force the user to make a choice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anyone cares about choosing between these two really. I think we need to figure out what has acceptable performance and correctness and do that. The warning / experimental status for Ractors seems like a nonstarter, though.

But, other than that, I can't really imagine why someone is coming along and deciding "Oh I really need to use Ractors and not TracePoint" for some reason. The only thing they care about is that the workflows run fast & errors get caught.

That is, of course, excepting some other weird interactions that aren't mentioned here or we don't know about.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning / experimental status for Ractors seems like a nonstarter, though.

I am not sure this is true. The Ractor gives so much benefit to ensuring safe determinism, it's basically a language-supported sandbox. And if we don't enable them by default (or force the user to choose whether to use them for now), we never can enable them by default. And they are quite decent for now for limited uses from my understanding.

I can't really imagine why someone is coming along and deciding "Oh I really need to use Ractors and not TracePoint" for some reason

This section is purely for Ractors, I think we will turn on TracePoint by default.

It is a very tough call here on whether to enable Ractors by default, because Ractors are a perfect fit for this use case and we would support turning them off (at the worker level), and if we have a non-Ractor default now, we will never be able to change to the safer Ractor default if we want to later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we never can enable them by default.

So why is that true?

I think the warning isn't so much the problem as the content of the warning that basically says "these are buggy". If that's not actually true, then cool, I think we can be fine with it, but, we should know that for sure.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why is that true?

To clarify I mean "if we want this as default, we have to do the more restrictive default now because we can't later" because you can't go from a less-restrictive default to a more-restrictive default without breaking people. Ractors prevent state sharing (e.g. a global is different in each Ractor). Also, things that are shared across Ractors must be marked as such via https://docs.ruby-lang.org/en/master/Ractor.html#method-c-make_shareable (which is an awesome feature and we already require this be done with payload and failure converters knowing they may be used inside sandbox). So turning on Ractors by default when they weren't before would break a lot of people that were, say, using their logger or OTel impl or other things people sometimes opt out of sandboxes for.

I think the warning isn't so much the problem as the content of the warning that basically says "these are buggy". If that's not actually true, then cool, I think we can be fine with it, but, we should know that for sure.

We definitely have to test here, and they may be buggy for some advanced uses and maybe not our own. But even if that's the case, defaulting to experimental thing is rough and it spits out a warning, even though plenty of Ruby users do use Ractors today. Even if we default, we have to make it very easy to change and document it very clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see. But arguably they already need to not be doing all that stuff for proper workflows anyway?

I think, if we find that we're not running into implementation bugs, it can be an OK default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see. But arguably they already need to not be doing all that stuff for proper workflows anyway?

Usually, yes, but some things people accept they need to cheat (e.g. interceptor sending exception to sentry when not replaying, or a logger that writes to disk but is not the logger we offer)

I think, if we find that we're not running into implementation bugs, it can be an OK default.

I'm leaning there too. But it will be super loud that we use them and how not to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to force users to explicitly specify whether to use Ractors or not.

If we think we might want to change the default in the future, I think it's fine to require users to explicitly specify it now and remove the requirement later (so long as we're not asking them to do that for too many things). This enables us to migrate to the more restrictive default for future new users without breaking existing users.

We did something similar for Update IIRC.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm, no SDK requires extra options for running workflows. I think if we choose Ractors we can change the default to be less restrictive if we ever had to (though I don't suspect we ever would).

We did something similar for Update IIRC.

This is not comparable I don't think because we know we will provide a default in the future but it is not yet implemented. Here, we don't know, so this isn't necessarily temporary. I think we might as well decide now, because in 5 years I don't think our decision would change.

I'm leaning towards the safer-though-technically-warns-as-experimental default that is easy to opt out of. Obviously it will be a heavily tested default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal updated to Ractors by default with easy, well-documented way to opt out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

Copy link

@josh-berry josh-berry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general non-blocking comments, no need to wait for my approval.

Comment on lines 373 to 375
* ❓ Why not just take a dependency on `async` gem? It does have a deterministic scheduler and does a decent job of
everything we want, but not only do we want to avoid transitive dependencies in our library (for versioning and
other reasons), we cannot rely on its logic into the future.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is asyncs deterministic behavior actually part of its contract or no? (I'm thinking of how we got burned by Python's change in behavior for asyncio; I imagine you are too)

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is asyncs deterministic behavior actually part of its contract or no?

Not explicitly, but most of the things it does are delegated to the underlying fiber scheduler. The other problem is not just that it is deterministic, but that it also doesn't make changes that would be incompatible (e.g. schedule two fibers in a different order in a newer version of the library).

Concerning determinism, they do say as a comment inside https://github.com/socketry/async/blob/72fae62658b68e85d53d4c6adf13d3b34108cfe8/lib/async/scheduler.rb#L428-L435 that they intentionally try to preserve determinism, but obviously a code comment is not a clear contract (and that's their scheduler, we'll have our own).

It's a tough call, because it's a full featured library with lots of utilities that leans on the deterministic fiber scheduler, and we surely don't want to recreate every construct they have already created. But maybe we don't want to encourage use of this project? Java SDK gets along just fine with its primitive Promise support. We may still want to lean on traditional Ruby things like queues, timeouts, and sleep I think, but maybe not (see #96 (comment)). Regardless, we'd have to write our own mutex (we're already writing our own future).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this would be a much easier call if determinism were an explicit design goal, but since it's not as such, I think it's probably safer to avoid the dependency for now. Perhaps we could align our own API surface with its API so we could switch later if there is user demand? That might also help reduce the cognitive overhead of learning Temporal if the user already knows async.

Copy link
Member Author

@cretz cretz Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably safer to avoid the dependency for now

To clarify, we would never have a dependency, just that it would happen to work and we could recommend its use (and we'd test that it does in our tests). Having said that, I think I agree maybe we should not even recommend its use or discuss it.

Perhaps we could align our own API surface with its API so we could switch later if there is user demand

It's too large. I think we can have Future (what they call Task) and Mutex manually implemented, and then decide whether to explicitly implement sleep and Timeout and Queue or allow reuse of the standard library ones (knowing in Ruby this isn't the same as leaning on a large set of async primitives, these are 3 very limited low-level common Ruby things that will not change behavior).

I think I'm leaning towards no async library recommendation, Future + Mutex implemented, sleep + Timeout + Queue just work from the standard library. We haven't really needed more than this in any other language.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal updated to discourage use of async library and clarify that we do support stdlib things that work in all environments.

Comment on lines 490 to 495
* `Temporalio::Worker::WorkflowTaskExecutor::ThreadPool` that accepts a thread pool which defaults to max threads
unbounded.
* There is a `default` class method with a lazily created global default.
* ❓ Is this an acceptable default? Python uses `[Etc.nprocessors, 4].max` as the max threads, but we figure it can
be unbounded here and remain bounded by the Core `max_concurrent_workflow_tasks` (TODO: expose this, it was left
off in `main`).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhmmm, unbounded thread creation seems scary to me. I think I'm OK leaving it bounded by max_concurrent_workflow_tasks as long as it's actually designed that way intentionally and it's not just something that happens by accident (e.g. it should be explicit in our code/code comments that this is the intention).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is bounded by that from Core. But that is per worker and the thread pool is shared across workers for obvious benefits. So yes, Core ensures it cannot be more than the sum of max concurrent tasks across all workers using the thread pool.

is that not only are Ractors not stable and may have bugs, but they also give off a warning. A workflow instance will be
in a Ractor and Core activations will be communicated.

* ❓ Should we enable this by default, disable this by default, or somehow force the user to make a choice

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to force users to explicitly specify whether to use Ractors or not.

If we think we might want to change the default in the future, I think it's fine to require users to explicitly specify it now and remove the requirement later (so long as we're not asking them to do that for too many things). This enables us to migrate to the more restrictive default for future new users without breaking existing users.

We did something similar for Update IIRC.

@cretz
Copy link
Member Author

cretz commented Oct 29, 2024

Updated with the following changes:

  • Workflows now must extend from Temporalio::Workflow::Definition
  • Activities now must extend from Temporalio::Activity::Definition
  • Ractor-based task execution is now on by default
  • Clarified that concurrency primitives in the standard library can be used
  • We now discourage use of the async library
  • Changed names of no-failure Future class methods

class GreetingWorkflow < Temporalio::Workflow::Definition
workflow_query_attr :language

def initialize(name)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider: getting ahead of Workflow Init and making that a default in ruby. I don't foresee a lot of downside to taking workflow args by default.

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should be the default and we chose not to make it the default in .NET even though [WorkflowInit] was present since its inception. There is a workflow_init below to facilitate opting in just like every other SDK that allows it, not make Ruby some special thing here. Would also like not to eagerly validate that the constructor arguments match the execute arguments if not opted in. And the exact workflow_init behavior likely wouldn't default anyways, instead we'd have to reflectively look at initialize and it'd optionally be provided start args only if they were accepted (which is what @dandavison considered doing instead of @workflow.init in Python before deciding otherwise). We've found requiring either "parameterless + no workflow init" or "workflow init + exact execute parameter matching" to be safer/clearer than "optional workflow init based on reflection".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBC, what I'm proposing is just to require initialize to take start workflow args always. That feels like the most normal and straightforward approach in any object-oriented setting, familiar to everyone. Your approach will dramatically reduce the number of folks who take advantage.
I understand that from your perspective, the workflow init approach is "normal." But stepping out of your Chad shoes, what we've done in other SDKs is pretty esoteric. To most users, using the constructor is normal unless they happen to have used temporal in other languages, and even they will be plenty familiar with constructors.

Copy link
Member Author

@cretz cretz Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a default parameterless constructor is esoteric nor do I think an absent constructor is esoteric. These are the defaults for the class in the language, and can/should be the default for us. What is esoteric in Ruby and most programming languages is requiring a constructor with certain parameters even if you don't need it. That's what's non-normal.

@mutex = Temporalio::Workflow::Mutex.new
end

def execute

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might want to show how args appear here.
(should they appear here, or should they just be part of initialize?)
I think having both methods take them is a bit confusing, but I can buy the convenience of having execute take them.

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(should they appear here, or should they just be part of initialize?)

Here

I think having both methods take them is a bit confusing, but I can buy the convenience of having execute take them.

Like every other SDK, and consistent with our behavior of workflow entry points

end

workflow_update_validator(:set_language)
def validate_set_language(language)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the experience if they don't keep the args in sync b/w validator and handler? I'd like them to get reasonable feedback.
suggestion: This seems like one to see if sorbet/rbs can wrangle static typesafety via a generic, perhaps by putting in a nested class.

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails at register time like every other register-time validator in every other SDK

This seems like one to see if sorbet/rbs can wrangle static typesafety via a generic, perhaps by putting in a nested class.

I can't envision how a nested class might look, regardless would like to keep this as simple as possible for regular Ruby users

activities: [GreetingActivity],
workflows: [GreetingWorkflow]
)
worker.run(shutdown_signals: ['SIGINT'])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is shutdown_signals a choice that the user needs to make, or is there a good default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a choice that a user needs to make. This is an application not a library and it's not neighborly for us to trap signals by default (or have any global side effects by default). This is already in main btw and was part of the activity PR.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to comment here since that PR is in main.
I think this is important since every user will need to set up a Worker, and this will be part of the "getting started" scenario, so I have a few questions.
Looks like we default to []. If this is a choice that every developer needs to make, why is there a default? What % of users do you think that default will be acceptable for?
Looking at your docs: https://github.com/temporalio/sdk-ruby/blob/67741fcbd9e354b9a8d03e7e923efe8844ccf1ca/temporalio/lib/temporalio/worker.rb#L82

I don't see any guidance. It's typed as String | Int in rbs, and I don't see any runtime validations. How do users know what values are acceptable or get guidance on what's normal to pass in there in what situations and get feedback if they make a mistake?

Copy link
Member Author

@cretz cretz Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a choice that every developer needs to make, why is there a default?

It is not a choice every developer will make. This is a library and many developers will prefer to use a block body or the Cancellation token as a way to control worker lifetime. Signals is but one way to stop a worker, by no means the default and by no means one that everyone will want. These workers are just things that run how the user wants, we just provide this helper to cancel the Cancellation for them on Signal.trap.

How do users know what values are acceptable or get guidance on what's normal to pass in there in what situations and get feedback if they make a mistake?

I will clarify in docs that these values are just passed to Signal.trap. That's typed in https://github.com/ruby/rbs/blob/d9000d23c2c1d032a194848562806896c54fd0af/core/signal.rbs#L92-L93 and https://github.com/sorbet/sorbet/blob/0ab8f3c29260f8ecee31aa960de3692419e2d3c3/rbi/core/signal.rbi#L100-L113 as general item (I need to add Symbol there it seems), there is no more typing guidance than that. Unsure what is expected wrt "get feedback if they made a mistake" but if https://docs.ruby-lang.org/en/master/Signal.html#method-c-trap is not documented enough in this area, we can consider contributing.

# Send query
supported_languages = wf_handle.query(
GreetingWorkflow.languages,
{ include_unsupported: false }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's our static type safety story here? Probably worth making sure that .rbi users can get good static validations..

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static typing is still under research, but our early research has shown it would not affect design (nor would we want it to for the majority of Ruby users that don't care about it and don't deserve to be affected by it), so we are not blocking this effort.

Comment on lines 225 to 230
# Exposes an attribute as a query. 💭 Why? We found this _very_ valuable in
# .NET. Think of it as attr_reader for workflows. It was also decided there
# was not enough value to have attr_accessor shortcut for update or
# attr_writer shortcut for signals since both are not commonly just
# getter/setter.
def self.workflow_query_attr(query_method)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool.
nit: I wonder if workflow_query_attr_reader would more grokkable by the ruby community, despite the partial reduncancy of query vs reader.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Can add suffix

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

def self.workflow_name(name)

# Placed above initialize to get start arguments.
def self.workflow_init

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, I wonder if we can avoid this rigmarole since we're starting fresh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied above, but I think this should be opt-in and was chosen that way in the last SDK even though we were starting fresh there too.

* Handler methods will use `RawValue` for args if told to, similar with workflow `execute`/`initialize` if
`workflow_raw_args` is set.
* `RawValue` can be returned too.
* To mark a workflow as dynamic, `workflow_dynamic` class method is called. Only one dynamic workflow can be present for

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a layering violation. The workflow itself doesn't need to know whether it's dynamic; instead, it should be registered as dynamic on the worker, which also makes it easier to select which is the dynamic workflow for the worker. Without that, if the user has two workflows registered as dynamic_workflow in the same codebase, it would barf even if only one is registered in the worker.
See https://github.com/coinbase/temporal-ruby/blob/f41efb7bb0c9e755a07382b94e00a4f31bcd33c1/lib/temporal/worker.rb#L97 for what we did in the coinbase SDK.

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow itself doesn't need to know whether it's dynamic

This is not true in any SDK nor should it be true here. Dynamic workflows/activities should be allowed to be in the same list at registration time with the rest of the workflows. We want authors to clearly mark and be aware that a workflow, activity, signal, query, or update is dynamic. While technically dynamic workflow and activity's signatures are no different, dynamic signal, query, and update signatures are different and must opt in. So all dynamics in all forms are clearer as author opt-in.

Without that, if the user has two workflows registered as dynamic_workflow in the same codebase, it would barf even if only one is registered in the worker.

This is not true here or in any SDK, this is only checked at registration time, same as in every SDK with dynamics. You can have as many as you want implemented, you can only have one dynamic per worker. There are no globals here.


## Workflow Asynchrony

Ruby does not have a built-in task/promise/future/etc concept. However it does have fibers and we can leverage those.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like this is out of date and seems to contradict your Future stuff below?

Copy link
Member Author

@cretz cretz Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The statements are accurate. Ruby does not have a built-in concept and we can leverage fibers. We add our own future later on top of fibers of course.

class ManyActivityWorkflow < Temporalio::Workflow
def execute(count)
futures = count.times.map do
Future.new do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found Temporal ruby's implicit Futures weird, so this could work well.
However, please try to make sure that return types propagate through the future for static types. Would be a shame to lose type safety for async.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static typing is still under research, but our early research has shown it would not affect design (nor would we want it to for the majority of Ruby users that don't care about it and don't deserve to be affected by it), so we are not blocking this effort.

Comment on lines 279 to 280
def info
def current_update_info
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def info
def current_update_info
def self.info
def self.current_update_info

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@cretz
Copy link
Member Author

cretz commented Nov 18, 2024

Made two minor fixes, but otherwise no outstanding/unresolved things (or even updates) for a while. Merging.

@cretz cretz merged commit 2373449 into temporalio:master Nov 18, 2024
2 checks passed
@cretz cretz deleted the ruby-phase-2 branch November 18, 2024 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants