Remove the wait for GITC response #7
Comments
This is something we were also trying to come up with a good design for. One thought I had was to have the CNM-R from GITC trigger another workflow with the same granule id, so in Cumulus you would have multiple workflows attached to a single granule. It was brought up in Cumulus office hours that this approach is problematic for the way Cumulus currently handles multi-workflow granules, as there is no way to decouple the nominal ingest workflow's granule status from something like the GIBS workflow at the granule level in the RDS. We could use the granule id with some additional identifier to tell them apart.
Does the Cumulus dashboard pull in all step function workflows like that, or is there another integration mechanism we need to know about? For example, if I just go into the AWS console, create a new step function workflow, and execute it, does that execution show up in the Cumulus dashboard? I had assumed we would also have to create an entry in Cumulus' database somewhere. Also, our operations team has had the same complaint about decoupling the ingest workflow status from the status of the post-processing workflows.
As long as the steps are returning the Cumulus granule object, the dashboard will pull the outcome of the workflow as the status of the granule. We are returning the granules object in every step so that the CMA picks it up and populates the RDS.
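For reference, a minimal sketch of what a step like that might look like in Python, assuming the cumulus-message-adapter-python package; the event shape follows the usual CMA conventions, and the task body is purely illustrative:

```python
from run_cumulus_task import run_cumulus_task


def task(event, context):
    # The CMA hands the task its payload under "input"; the granules list is
    # expected to be there from the previous step.
    granules = event["input"]["granules"]

    # ... do this step's actual work on the granules here ...

    # Returning the granules object lets the CMA record the outcome, which is
    # what keeps the dashboard/RDS status in sync with the workflow.
    return {"granules": granules}


def handler(event, context):
    # run_cumulus_task unwraps the Cumulus message, calls task(), and wraps
    # the result back into a Cumulus message for the next step.
    return run_cumulus_task(task, event, context)
```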
Ah ok, just so I make sure I understand: you're saying if the workflow triggered by the CNM-R from GITC had steps where each step was a lambda that implemented the CMA library, the Cumulus integration is taken care of for us?
Yes, and to keep the granule from messing up the nominal ingest status, we could make the granuleId slightly different so that it can be associated with a unique workflow execution in the dashboard. Ideally it's a stopgap until they fix the operational issue of tracking a granule's status across many workflows. You could have a collection and rule set up to trigger this new GITC-Response workflow whenever a CNM-R comes in, which would give you the decoupling from the nominal ingest. We are still working on integrating OPERA; what sort of response times are you seeing from GITC for CNM-R? I see you said you are up to 2000 concurrent (does this mean the minutes of waiting have to be less than the 10 minute lambda timeout?)
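As a rough sketch of the rule-based trigger idea, something along these lines could route incoming CNM-R messages to the new workflow; everything below (workflow name, collection, provider, topic ARN) is a placeholder, and the exact rule schema should be checked against your Cumulus version:

```python
# Hypothetical Cumulus rule definition for a GITC-Response workflow.
# All names and the SNS topic ARN are illustrative, not values from this thread.
gitc_response_rule = {
    "name": "gitc_cnm_response",
    "workflow": "GITCResponseWorkflow",
    "collection": {"name": "EXAMPLE_COLLECTION", "version": "1"},
    "provider": "example-provider",
    "rule": {
        "type": "sns",  # fire the workflow whenever a CNM-R arrives on the topic
        "value": "arn:aws:sns:us-west-2:000000000000:gitc-cnm-response",
    },
    "state": "ENABLED",
}
```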
Got it, we were planning on just throwing the entire CNM-R JSON response into the Cumulus internal S3 bucket, keyed by collection/granule/uuid or something similar. This would take it one step further and make it available in the Cumulus dashboard, which I agree would be nice for operations.
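A minimal sketch of that S3 layout, assuming boto3 and a made-up bucket name; the field the granule id is read from is an assumption and would need to match the actual CNM-R message:

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")


def store_cnm_response(cnm_r: dict, bucket: str = "example-cumulus-internal") -> str:
    """Persist the raw CNM-R payload keyed by collection/granule/uuid."""
    collection = cnm_r["collection"]
    granule_id = cnm_r["identifier"]  # assumed field holding the granule id
    key = f"cnm-r/{collection}/{granule_id}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(cnm_r).encode("utf-8"))
    return key
```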
I don't understand your question.
According to our Kibana metrics, the average time to process one granule (over the last 30 days, ~1.27 million images total) has been 21 minutes. The concurrency limit is set as described in throttling-queued-executions, and it tracks the number of workflows submitted for execution. We currently have the limit set to 2000 for the BrowseImageGeneration post processing workflow. Currently we allow up to 24 hours before we time out the response from GITC. It is not limited by the lambda timeout because we are using step function task tokens to cause the workflow to enter a "wait for token" state.
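For anyone following along, the callback side of the task-token pattern looks roughly like this; the field carrying the token in the CNM-R is an assumption made for illustration:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")


def handle_gitc_response(cnm_r: dict) -> None:
    """Resume the workflow parked in the "wait for token" state."""
    # Assumes the task token issued by the waitForTaskToken step travels with
    # (or is recoverable from) the CNM-R; the field name here is made up.
    token = cnm_r["taskToken"]
    status = cnm_r.get("response", {}).get("status")

    if status == "SUCCESS":
        sfn.send_task_success(taskToken=token, output=json.dumps(cnm_r))
    else:
        sfn.send_task_failure(
            taskToken=token,
            error="GITCError",
            cause=json.dumps(cnm_r.get("response", {})),
        )
```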
Sorry, I was proposing that sending the CNM-R to a new workflow that handled GITC responses, as described above, could be a solution for maintaining the functionality that the wait step currently provides.
I see, I was confused about how the system was handling concurrency! I will check this out; we have not explored throttling-queued-executions or the step function task tokens.
I understand. Your suggestion of starting another workflow on response is a good idea, but it does not directly address the reason why we want to eliminate the "wait for token" step. The reason we want to eliminate the wait step is that it couples the DAAC's ingest process to the status of the GITC system. What I mean by that is: if GITC processing is having an issue or downtime, we don't want the DAAC processing to get backed up enough that messages start getting lost. We experienced that a few times during our initial deployments with OPERA data. By removing the wait, the browse image workflow is considered "complete" once a message is successfully put onto the GITC queue.
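A sketch of what the final step might reduce to once the wait is gone, assuming boto3 and a placeholder queue URL; the payload layout is an assumption:

```python
import json

import boto3

sqs = boto3.client("sqs")

# Placeholder; in practice the GITC queue URL would come from configuration.
GITC_QUEUE_URL = "https://sqs.us-west-2.amazonaws.com/000000000000/gitc-cnm-input"


def send_cnm_to_gitc(event, context):
    """Queue the CNM for GITC and finish, with no "wait for token" step."""
    cnm = event["input"]["cnm"]              # assumed payload layout
    granules = event["input"]["granules"]

    # The step succeeds as soon as SQS accepts the message, so the DAAC-side
    # workflow no longer depends on GITC being up or keeping pace.
    sqs.send_message(QueueUrl=GITC_QUEUE_URL, MessageBody=json.dumps(cnm))

    # Return the granules object so the CMA still records the workflow outcome.
    return {"granules": granules}
```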
Plan we will be implementing during 24.2:
Testing changes in SIT, ready soon.
After some internal discussions, PO.DAAC feels that we should just eliminate the “wait for response” step entirely. We will still process GITC responses but it will be “out-of-band” and not tied to the Cumulus post processing workflow. This will have a few consequences:
The main question I want to explore is (given item #2): do we feel that GITC can handle an even higher throughput of messages if we remove the waiting?
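As a back-of-the-envelope way to frame that question, using the figures quoted earlier (2000 concurrent executions, ~21 minutes per granule including the wait) plus an assumed processing-only time, since that number was not stated:

```python
# Steady-state throughput is roughly concurrency / average workflow duration.
concurrency_limit = 2000           # BrowseImageGeneration concurrency cap
avg_minutes_with_wait = 21         # average per granule, including the wait

rate_with_wait = concurrency_limit / avg_minutes_with_wait
print(f"~{rate_with_wait:.0f} granules/minute flow to GITC with the wait in place")

# Without the wait, only the DAAC-side processing time counts against the
# concurrency limit. The value below is assumed purely for illustration.
assumed_processing_only_minutes = 5
rate_without_wait = concurrency_limit / assumed_processing_only_minutes
print(f"~{rate_without_wait:.0f} granules/minute if the wait is removed")
```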