-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[source-stripe] Customer_Balance_Transaction records are occasionally getting missed #48857
Comments
We are running on community edition so I can share a public example, but I was able to verify the CBT did exist when I manually called Stripe's API through python, so I'm fearing it could be an issue with a way I implemented it and it should have been a created only one.. |
@marcosmarxm @maxi297 I may have had an epiphany on what is going on, but I'd love for you to check my thinking to make sure it pencils out before I open up a PR fix for it. In PR 46864 I introduced support for customer balance transactions to pull incrementally instead of requiring a full resync each time. The previous implementation only tried pulling records when a customer's balance was zero, but that missed updated where offsetting CBT transactions occurred raising and then lowering the balance. It was also an intensive pull when a Stripe customer had a lot of When I coded this solution, I assumed the way UpdatedCursorIncrementalStripeSubStream would work is it would call it once per customer to pull any incremental updates on the event_type On top of that, I think the event I used for CBT is wrong and applies to cash balance transactions (which are different than customer balance transactions). I don't see any direct events for CBT. Before I revert this PR or try to fix it, I want to make sure I'm not fixing one thing and breaking another. Here was the old code in streams.py that was full syncing everything for CBT but only when there was an invoice and the balance was zero (balance zero was a bug that dropped records that created and offsetted between airbyte syncs). streams.py. Some ideas:
I'm leaning toward #1, but I want to make sure I understand #2 correctly. |
@maxi297 do you mind taking a look into this issue reported by William? |
This is my first attempt at a fix - 5395a98 |
Just to share my understanding:
The change you are proposing would change it so that we do the following in incremental mode:
That change makes sense to me in the short run but it seems like we could be at risk of losing transactions on the next incremental sync. The example would be if we have the following set of events:
In the case above on the next sync, we will query only for transactions where
Does that reasoning make sense to you? I've created an issue for this problem. In the meanwhile, I think your change makes sense. If you think you might be affected by this dataloss, I suggest that you run things in full_refresh mode for now. We will refine this issue with the team next Tuesday and see what is the priority and how we will tackle this. |
@maxi297 That makes sense. I agree, with this change, the risk of data loss is low but non-existent. Full Refresh on CBT is problematic, because of Stripe's bullshit API for CBTs where they don't emit an event on created nor do they have a list-all for all customers CBTs, either of which would solve this. We have a HubiFi customer with a Stripe Account with over 1.5M customer records. Doing a full refresh means we need make 1.5M calls every full refresh. No only does that take a good 6+ hours to complete, but it spikes Airbtyes memory consumption (I'm guessing all the customer records gets loaded into memory and the customers that are processed aren't being dereferenced thus are getting held onto). If this change works, for now I'm going to run on a custom build of the stripe connector with the commit above cherry-picked. If you guys aren't getting any complaints and aren't going to look at fixing this more probably, I may put this up as a minor PR to clean up my implementation that uses the wrong event. Just let me know how that goes next week. |
@williamkaper we will be looking at this change soon-ish. The expectation is that we will be working on moving source-stripe to our no-code framework in the coming weeks in order to simplify maintenance. It's not only for maintenance though. For example, the problem I mentioned above is already solved for free with this. So I think I would process in two steps:
Let me know if you have concerns around this plan and we'll figure these out together 💪 |
I already removed Legacy_cursor. This change seems to have resolved the
issue, but I'm looking forward to seeing what the low code Stripe looks
like!
I'll put up a PR either tomorrow or next week as an interim solution.
|
PR up: #49811 |
Connector Name
source-stripe
Connector Version
5.8.0
What step the error happened?
None
Relevant information
We are seeing CBT's that aren't getting picked up on created/updated. I would have assumed since there is a created event for customer_balance_transaction (doc here), the updated would catch it if the record was "created" after the last sync but with a created value that was before the last sync. https://github.com/williamkaper/airbyte/blob/e1217e8c7e5ce8400dc7a0eb0c25e2426734c[…]e-integrations/connectors/source-stripe/source_stripe/source.py
If that is not the case, then I'm worried that I might need to change this sync so its more like a normal balance transaction where it only has a created date and can leverage a look-back window to pick it up. Is there anyway I could get someone on the team to take a quick peek at this and weigh in?
Relevant log output
Contribute
The text was updated successfully, but these errors were encountered: