-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(CLI command): Apache Superset "Factory Reset" CLI command #27207 #27221
feat(CLI command): Apache Superset "Factory Reset" CLI command #27207 #27221
Conversation
Thanks @mknadh for the PR. Any reason you didn't opt for a CLI command for removing all assets? Additionally if you're after a factory reset why not just drop all the tables in the database and start from afresh? |
Airflow used to have a "convenient" My take on this would be to document how to do it more at the DBA level -> point to a new schema, init that db, ... |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #27221 +/- ##
===========================================
+ Coverage 60.48% 83.69% +23.20%
===========================================
Files 1931 521 -1410
Lines 76236 37488 -38748
Branches 8568 0 -8568
===========================================
- Hits 46114 31376 -14738
+ Misses 28017 6112 -21905
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for local testing, but probably want to lock this down by default, and allow Superset admins to opt-in
superset/reset/api.py
Outdated
except ValidationError as error: | ||
return self.response_400(message=error.messages) | ||
try: | ||
if item.get("all") or item.get("datasets"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make a command for this?
superset/initialization/__init__.py
Outdated
@@ -221,6 +222,7 @@ def init_views(self) -> None: | |||
appbuilder.add_api(QueryRestApi) | |||
appbuilder.add_api(ReportScheduleRestApi) | |||
appbuilder.add_api(ReportExecutionLogRestApi) | |||
appbuilder.add_api(ResetRestApi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this add behind a boolean config that defaults to False
? Although useful for local dev work, this feels a little dangerous to have enabled by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is a good idea and will add it behind a feature flag, where the default behavior of the feature flag would be to disable this API (by default). That way, no one can run it against PROD, unless enabled.
CC: @john-bodley , @mistercrunch , @craig-rueda , @michael-s-molina
One more comment - I agree with @john-bodley that this would probably be better done as a CLI. |
I also agree this shouldn't be a new API. If we create a command, it should require many levels of confirmation to avoid the problem @mistercrunch mentioned. |
+1 for the suggestions to make it a cli command if anything, for reasons already stated. But what would this do which isn't already achieved by simply dropping the database? |
Could we have a config setting in |
|
|
Here are some reasons of implementing this as a REST API rather than a CLI command:
CC: @john-bodley, @mistercrunch, @craig-rueda, @michael-s-molina |
|
|
@mknadh I don't think I can agree with any of the points you listed above.
In my opinion an operation like this is too destructive and shouldn't be provided at all. I'd be a little more comfortable with it provided as a cli command to be run in development environments only, and (1) warn very clearly about the destructive nature of the operation and (2) do some sanity checks of your config to make sure it's not applied in a production-like environment (e.g. what @mistercrunch suggested) In terms of providing a rest API to do it, no, I don't see any reason why this could be desirable. You'd never want to run this in a production environment. And who should be able to perform this operation via a rest operation? Any admin user? Even in a large deployment where there are half a dozen administration staff of varying levels of familiarity with Superset or technical ability? |
@mknadh Hey, I believe you can't success to call the "reset" REST API in a real production environment since the database should lock a table if other users read/write at same time. BTW.....if you really need a new metadata of Superset, please create a new database instance and change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on converting this into a CLI command
Agreed on the CLI approach, and either (a) locking that down by default in config, or (b) having some VERY thorough confirmations before it executes ("are you sure you want to drop your database? There's no going back. Solve this math problem to prove you're serious"). |
Thank you all. Agreed and now planning to transition the current REST API to a Command-Line Interface (CLI) with the following specifications. Is this specification for this CLI looking fine? Command-Line Interface (CLI) for Factory ResetSpecifications
|
Sounds reasonable to me. It might just be good to add a confirmation (with a safe default) in the CLI to the effect of:
|
Assuming that this can only be executed by admins after layers of confirmations, I don't think we need another feature flag.
I'm curious here. Do we currently have authentication for the CLI commands? Or are assuming that only admins can execute CLI commands? The reason I'm asking is because we already have CLI commands that require admin privileges but we don't require authentication. One example is the
There are many more entities such as saved queries, CSS templates, native filters, etc. Should we change the message to say that this is just a subset of entities that will be deleted? Or even revert the message saying that we'll reset to the factory state and list what we're going to preserve (the except parts)? Are we going to preserve the example dashboards? |
I believe the number of questions and replies to this PR clearly indicates that this should have been a SIP. If the cat is out of the bag already, maybe we could still send something to the @dev list referencing this PR to give other members of the community a chance to evaluate the proposal. |
I don't think it's too late for this to be a SIP. The PR isn't yet doing what's discussed on the thread, and even if the PR were ready, I think it's fine to put a |
3fe790f
to
dacae1b
Compare
Hi @craig-rueda, @dpgaspar - Updated the pull-request by converting the superset factory-reset REST API to a CLI command. Please review and merge, when you get a chance. Thanks. |
@mknadh you'll need to run the pre-commit hook (https://superset.apache.org/docs/contributing/development/#git-hooks-1) to run the linter(s) and get this to pass CI. |
dacae1b
to
6319d4f
Compare
Looks like you'll need to run that pre-commit hook again. Then hopefully we can get this across the finish line :D |
6319d4f
to
5a12441
Compare
Pre-commit hook is successful. Please check now. CC: @rusackas |
feat(CLI command): Apache Superset "Factory Reset" CLI command #27207
SUMMARY
Over time, Apache Superset instances can accumulate large amounts of data, including charts, dashboards, saved queries, and other artifacts. There might be scenarios where users want to start fresh with a clean slate, removing all existing data.
Testing and Development: Developers and administrators often require a quick and efficient way to reset Apache Superset instances to a default state for testing purposes or when setting up development environments.
Data Privacy and Security: In some cases, there might be sensitive or confidential data stored within Apache Superset that needs to be wiped out completely to ensure data privacy and security compliance.
What is the proposal?
Implement a Factory CLI command: Develop CLI command that allows users to trigger a factory reset of Apache Superset. This CLI should be designed to delete all existing data, including charts, dashboards, saved queries, databases, and other related artifacts.
Authorization and Confirmation: Ensure that the CLI requires appropriate authorization to prevent unauthorized access. Additionally, consider implementing a confirmation mechanism to prevent accidental data loss.
Documentation and Best Practices: Provide comprehensive documentation on how to use the Factory Reset CLI, along with best practices for when and how to perform a reset. Include warnings about data loss and the irreversible nature of the operation.
Error Handling and Logging: Implement robust error handling mechanisms within the CLI to handle any unexpected errors gracefully. Additionally, log all reset operations for auditing purposes.
Integration with Configuration Management: Optionally, provide integration with configuration management tools or scripts to automate the process of resetting Apache Superset instances in a controlled and reproducible manner.
ADDITIONAL INFORMATION