-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refresh connector configuration between attempts / activities #20912
Comments
Adding the Platform Workflow team, as this also falls into their wheelhouse. This problem also exists today if a user updates their config mid-attempt. This story is blocked until we can talk to the platform team to learn the best way to do this. This also likely means that we cannot certify any connector which uses short-lived credentials to GA. |
I spoke with @davinchia, @colesnodgrass, and @jdpgrailsdev about this issue. We all agree that there's a platform-level problem we need to solve around the mutability of inputs between temporal activities & job attempts. It is likely that as the platform transitions away from "sync-first" to "stream-first" behavior, this will become less of an issue... but that doesn't help us solve this problem in the near term. Today, the 'sync' is the atomic operation (inclusive of all subsequent activities), rather than the 'attempt', which is why inputs changing between attempts is hard for the platform to handle. I think there are 2 potential paths forward:
Re-hydrate config (and state) at the beginning of the replication activityThis pathway involves /always/ refreshing the contents of the config and state objects, as an atomic operation, as the first step of the replication activity. If we can get this working, we'll be sure that the sync is using the latest inputs (STATE and CONFIG) from the database. We need to investigate if Temporal will get mad at us if we are changing the values of objects it expects to be controlling in memory. One way around this might be the introduction of a new small sub-activity(?) to refresh these inputs, which run before each "normal" phase of the replication workflow (check, replicate, normalize) Throw a new "restart-the-sync-now" exception@davinchia proposed an interesting idea which might be the stop-gap we need today. In the case in which the replication activity has received at least one
This specially retried sync will re-load the config (and state) that it needs from the database and be on it's way. There's a possible circle in which the CHECK the next sync does will change the config again... but that seems unlikely because of the short time period between syncs. This will look strange in the UI The path forward is to time-box an investigation into these 2 options and see which one will be faster. |
If we emit a reasonable error message in option 2, it won't be that weird for the user. "Failure Origin: Facebook has reset the oauth token, which requires the snync to restart from where it left off. It may need to retry up to 3 times." Doing option 1 on its own feels a little spooky to me, because it breaks one of the core invariants of the current system "every attempt for a job runs on the same inputs". If we want to move away from this paradigm entirely that's no problem, but just poking a hole in it for just this use case will be hard to reason about. I guess unless we entirely change the paradigm to "every attempt always fetches the new config". That'll make the job / attempt superfluous but at least is easy to understand. |
Grooming:
|
I also agree that we should choose option 1, because "every attempt always fetches the new config" is what we want, and we also want "every attempt always fetches the latest state". Right now, I think we have a bug in which Attempt 2 will re-use the original state before Attempt 1. This means that Checkpointing doesn't work. Solution 1 will solve this! |
From @cgardens: We do have a test for platform checkpointing ( Lines 150 to 152 in 0b27aa5
Re-hydrate config (and state) at the beginning of the replication activity )
|
The re-hydrate solution will also solve #17774 |
Tell us about the problem you're trying to solve
#19811 introduces the ability for connectors to update configuration throughout the sync in order to support single-use refresh tokens. However, configuration is currently loaded at the start of the sync and passed around between attempts via temporal. This means that even though the database has updated configuration, subsequent attempts will not have that data.
I also believe the same problem exists between CHECK / SYNC, if a check occurs before the sync that emits a control message, the sync that runs afterwards will not have the updated configuration.
This is particularly bad because:
Describe the solution you’d like
Subsequent sync attempts should use the latest configuration.
The text was updated successfully, but these errors were encountered: