Add a python script as datasource #2790
Comments
Well, so the "connectors" interface is extensible, but it assumes that the source can filter, aggregate, and expose the tabular structures it makes available (tables/views). Superset asks the database through the query interface, and it's assumed that the backend aggregates and filters the specific "cut" of data. Would your use cases be geared towards altering atomic data in Python, or preaggregated data? I'd assume the former, which means you'd need to perform aggregations yourself and take over a lot of the database's functions, resulting in much more workload on the web server, which would have to see all the atomic data of the table on each query. If it's the second use case, you could hook it in easily on top of an existing datasource, as some sort of hook for a dataframe mutator function that would receive the dataframe and return another. Though I'm unclear on what this would be used for.
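To make the second case concrete, here is a minimal sketch of what such a mutator hook could look like, assuming pandas. Superset exposes no such hook today, and every name here (`add_rolling_mean`, `run_query`, the `metric` column) is illustrative, not an actual API:

```python
import pandas as pd

# Hypothetical "dataframe mutator" hook: the backend still filters and
# aggregates; the mutator only post-processes the resulting dataframe.

def add_rolling_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Receive the already-aggregated dataframe, return a mutated one."""
    out = df.copy()
    out["metric_ma_7"] = out["metric"].rolling(window=7).mean()
    return out

def run_query(backend_query, mutator=None) -> pd.DataFrame:
    df = backend_query()  # normal backend query, already aggregated
    return mutator(df) if mutator else df
```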
Thank you for your answer, @mistercrunch.
This Python script should serve as a low-level implementation of your dataframe handler, which would provide transparent access to features such as tables/views/aggregation. These features of the script should be sufficient to provide everything a normal table query needs, as far as I understand... But I might be wrong, as I haven't looked at the code at all.
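A rough sketch of the contract such a script would have to fulfil, per the constraints listed above (expose tables, filter, aggregate). This is not Superset's actual connector interface; all names here are invented for illustration:

```python
from abc import ABC, abstractmethod
import pandas as pd

class ScriptDatasource(ABC):
    """Hypothetical contract for a script-backed datasource."""

    @abstractmethod
    def tables(self):
        """Name the tabular structures (tables/views) this script exposes."""

    @abstractmethod
    def query(self, table, filters, groupby, metrics) -> pd.DataFrame:
        """Return the filtered, aggregated 'cut' of data being asked for."""
```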
I had a similar question earlier, but didn't put it as well as you. As I understand it, your request would skip the SQLAlchemy interface and feed data to the Superset frontend directly?
Notice: this issue has been closed because it has been inactive for 334 days. Feel free to comment and request that this issue be reopened.
Hi, any updates on this issue? It would be a very useful data science tool to have.
Any update on this? |
Is there any way to write and publish custom Python scripts so that they behave like any other (real-time) datasource?
For example, assume that you have some trading data.
My Python script contains a class which reads this data from the given input table and outputs another, derived table.
Now assume that my custom script runs on table T1 and produces table P1.
T1 itself may be the output of another Python script, or real(-time) database data.
P1 should be a first-class citizen in the world of tables, able to respond to SQL queries, etc.
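A minimal sketch of what I have in mind, assuming pandas and an in-memory SQLite database purely for illustration. `CustomScriptCS1` is a hypothetical class, not an existing API, and the column names (`symbol`, `qty`, `price`) are made up:

```python
import sqlite3
import pandas as pd

# Hypothetical sketch: CS1 derives P1 from T1, then P1 is registered in
# an in-memory SQLite database so it can answer ordinary SQL queries.

class CustomScriptCS1:
    def run(self, t1: pd.DataFrame) -> pd.DataFrame:
        # Example transformation: aggregate raw trades per symbol.
        return (t1.groupby("symbol", as_index=False)
                  .agg(volume=("qty", "sum"), avg_price=("price", "mean")))

t1 = pd.DataFrame({
    "symbol": ["AAPL", "AAPL", "MSFT"],
    "qty": [100, 50, 200],
    "price": [150.0, 152.0, 300.0],
})

p1 = CustomScriptCS1().run(t1)

conn = sqlite3.connect(":memory:")
p1.to_sql("P1", conn, index=False)  # P1 now answers SQL like any table
print(pd.read_sql("SELECT * FROM P1 WHERE volume > 100", conn))
```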
This functionality would allow building a robust R&D framework, not only to explore the data but also to build a pipeline of workflows in a more visual way, and to present and share them with your team.
For example, User1 would create a custom script CS1 which takes T1 as input and produces P1 as output.
User2 doesn't know whether P1 is a real table or produced (on the fly) by CS1. User2 just puts it into their dashboard, and after playing with it enough, realizes that to make better use of P1 they need to create a custom script CS2 which takes P1 and T2 as input and outputs P2 (which can be immediately visualized in the dashboard).
Here is a diagram:

A solution like this might also solve the issue of unsupported datasources for good, as it would let users write their own table generators and use those tables as inputs.
I can provide more concrete examples if these were not clear enough.
Thank you very much for such great software, but this missing feature is a real showstopper for me and my team for now.
All the Best Wishes!