The lib analytics in Databricks not delivery 100% of calls #243

Closed
gevored opened this issue Nov 30, 2022 · 7 comments

Comments

@gevored

gevored commented Nov 30, 2022

Hello

I am trying to iterate over a table using Python in Databricks and send an identify and a track call for each row to a Python source in Segment, but I noticed that not every row/call arrives in Segment. I didn't receive any error when executing analytics.identify() or analytics.track(), only a message on the console: "analytics-python queue is full".

In my tests the data volume is 50k rows with 8 columns (properties).

Only 37k arrive in Segment.
Flow rate: 5.8k/min

Our goal is to send around 10 million (10 MM) calls per day from our data lake.

My questions:

  1. Is there a way to guarantee that every call arrives in Segment?
  2. Is there a way to increase the flow rate?

Thanks
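
To illustrate why a "queue is full" message can mean lost calls, here is a minimal stdlib-only sketch. It assumes, based only on the console message above, that the client buffers events in a bounded in-memory queue and enqueues without blocking. The function `send_all` and its parameters are hypothetical and are not part of the analytics library.

```python
import queue
import threading
import time


def send_all(rows, max_queue_size=100, consumer_delay=0.0002, block=False):
    """Push rows through a bounded queue and return (delivered, dropped).

    Hypothetical model of a producer that outpaces a background uploader.
    With block=False a full queue drops the row; with block=True the
    producer waits for free space instead.
    """
    q = queue.Queue(maxsize=max_queue_size)
    delivered = []
    dropped = 0

    def consumer():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more rows
                break
            time.sleep(consumer_delay)  # stand-in for the network upload
            delivered.append(item)

    worker = threading.Thread(target=consumer, daemon=True)
    worker.start()

    for row in rows:
        try:
            q.put(row, block=block)
        except queue.Full:
            dropped += 1  # silently lost unless the caller counts it

    q.put(None)  # blocking put, so the sentinel is never dropped
    worker.join()
    return len(delivered), dropped


if __name__ == "__main__":
    print("non-blocking:", send_all(range(300), max_queue_size=10))
    print("blocking:    ", send_all(range(300), max_queue_size=10, block=True))
```

In this model the blocking variant delivers every row at the cost of slowing the producer down to the uploader's pace, while the non-blocking variant keeps the loop fast but may drop rows whenever the queue is full.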

@nd4p90x
Collaborator

nd4p90x commented Nov 30, 2022

@gevored Is this in a prod env or are you using the debugger in app.segment.com?

@gevored
Author

gevored commented Nov 30, 2022

prod env, I am using this lib to send calls directly to Segment

@nd4p90x
Collaborator

nd4p90x commented Nov 30, 2022

Thank you for that clarification. Let us review and get back to you shortly.

@nd4p90x
Collaborator

nd4p90x commented Nov 30, 2022

@gevored Without asking for other specific info, I can only assume that you are looking at the Segment debugger to validate the messages you are sending. If this is the case, the debugger by design does not record every single entry sent to it.
[Screenshot attached: Screen Shot 2022-11-30 at 3 12 09 PM]

@gevored
Author

gevored commented Dec 2, 2022

Yes, the Debugger shows just a sample of the data, but I actually compared using the source's Schema tab, before starting and after finishing.
[Image attached]

In parallel I am using an HTTP POST, and that solution is delivering 100% of the calls, but I expected this Python lib to give a better result.
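
For readers comparing the two approaches, the HTTP route can be sketched roughly as below. The comment above does not say which endpoint was used, so this assumes Segment's HTTP batch endpoint with Basic auth (write key as username, empty password). The row field names (`user_id`, `event`, `properties`), the helper names, and the batch size of 100 are hypothetical. The sketch only builds request bodies and headers; it does not send anything.

```python
import base64
import json

# Assumed endpoint; check the HTTP API docs for your workspace region.
BATCH_URL = "https://api.segment.io/v1/batch"


def auth_header(write_key):
    """Build Basic auth headers with the write key as username, no password."""
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }


def build_batches(rows, batch_size=100):
    """Yield JSON request bodies, each holding up to batch_size track calls.

    `rows` is an iterable of dicts with hypothetical keys
    `user_id`, `event`, and optionally `properties`.
    """
    batch = []
    for row in rows:
        batch.append({
            "type": "track",
            "userId": str(row["user_id"]),
            "event": row["event"],
            "properties": row.get("properties", {}),
        })
        if len(batch) == batch_size:
            yield json.dumps({"batch": batch})
            batch = []
    if batch:  # flush the final partial batch
        yield json.dumps({"batch": batch})


if __name__ == "__main__":
    rows = [{"user_id": i, "event": "Row Exported"} for i in range(250)]
    bodies = list(build_batches(rows))
    print(len(bodies), "request bodies for", BATCH_URL)
```

Because each POST returns a status code, the caller can retry or record failures per batch, which is one plausible reason the HTTP route made it easier to confirm full delivery.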

@nd4p90x
Collaborator

nd4p90x commented Dec 13, 2022

@gevored Can you enable logging, verify that the records being sent are accurate, and let us know the results?
https://segment.com/docs/connections/sources/catalog/libraries/server/python/#logging

Thank you,
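
A minimal sketch of what enabling logging could look like, using only the standard `logging` module. The logger name `segment` follows the linked docs page and should be treated as an assumption for your installed version. The `QueueFullCounter` handler and `enable_segment_logging` helper are hypothetical additions that count how often the "queue is full" message appears.

```python
import logging


class QueueFullCounter(logging.Handler):
    """Hypothetical handler that counts 'queue is full' warnings."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record):
        if "queue is full" in record.getMessage():
            self.count += 1


def enable_segment_logging(logger_name="segment"):
    """Set the named logger to DEBUG and attach a queue-full counter."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    counter = QueueFullCounter()
    logger.addHandler(counter)
    return logger, counter


if __name__ == "__main__":
    logging.basicConfig()  # print log records to the console as well
    logger, counter = enable_segment_logging()
    # ... run the identify/track loop here ...
    print("queue-full warnings seen:", counter.count)
```

If the library logs one such warning per rejected call, comparing the counter with the number of missing rows would show whether the full queue accounts for the whole gap.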

@MichaelGHSeg
Contributor

Closing this issue. Please reopen if you keep seeing this behavior; we would like to get some logs to understand the issue.
