Proposal: Make the _internal database persistent, customizable, and hidden #2157

Open
asg017 opened this issue Aug 24, 2023 · 3 comments

@asg017
Collaborator

asg017 commented Aug 24, 2023

The current _internal database is used by Datasette core to cache info about databases/tables/columns/foreign keys of databases in a Datasette instance. It's a temporary database created at startup, that can only be seen by the root user. See an example _internal DB here, after logging in as root.

The current _internal database has a few rough edges:

  • It's part of datasette.databases, so many plugins have to specifically exclude _internal from their queries (examples here)
  • It's only used by Datasette core and can't be used by plugins or 3rd parties
  • It's created from scratch at startup and stored in memory. Which is fine, the performance is great, but persistent storage would be nice.
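
To illustrate the first rough edge, here is a minimal sketch of the special-casing plugins have to do today. A plain dict stands in for datasette.databases, and the database names other than _internal are made up for the example:

```python
# A plain dict stands in for datasette.databases (assumption for illustration).
databases = {
    "_internal": object(),
    "fixtures": object(),
    "content": object(),
}


def user_facing_databases(databases):
    """Return the database names a plugin should operate on, skipping _internal."""
    # Every plugin currently has to remember this special case itself.
    return [name for name in databases.keys() if name != "_internal"]


print(user_facing_databases(databases))
```

If _internal were no longer part of datasette.databases, the filter above would not be needed at all.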

Additionally, it would be really nice if plugins could use this _internal database to store their own configuration, secrets, and settings. For example:

  • datasette-auth-tokens creates a _datasette_auth_tokens table to store auth token metadata. This could be moved into the _internal database to avoid writing to the guest database
  • datasette-socrata creates a socrata_imports table, which also can be in _internal
  • datasette-upload-csvs creates a _csv_progress_ table, which can be in _internal
  • datasette-write-ui wants to have the ability for users to toggle whether a table appears editable, which can be either in datasette.yaml or on-the-fly by storing config in _internal

In general, these are specific features that Datasette plugins would have access to if there was a central internal database they could read/write to:

  • Dynamic configuration. Changing the datasette.yaml file works, but restarting the server every time can be tedious. Plugins can define their own configuration table in _internal, and could read/write to it to store configuration based on user actions (cell menu click, API access, etc.)
  • Caching. If a plugin or Datasette Core needs to cache some expensive computation, they can store it inside _internal (possibly as a temporary table) instead of managing their own caching solution.
  • Audit logs. If a plugin performs some sensitive operations, they can log usage info to _internal for others to audit later.
  • Long running process status. Many plugins (datasette-upload-csvs, datasette-litestream, datasette-socrata) perform tasks that run for a really long time, and want to give continuous status updates to the user. They can store this info inside _internal
  • Safer authentication. Passwords and authentication plugins usually store credentials/hashed secrets in configuration files or environment variables, which can be difficult to handle. Now, they can store them in _internal
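
As a rough sketch of the "dynamic configuration" idea, a plugin could keep a small key/value table in the internal database. Everything here is an assumption for illustration: an in-memory sqlite3 connection stands in for _internal, and the my_plugin_config table and the helper names are hypothetical, not part of Datasette:

```python
import json
import sqlite3

# An in-memory SQLite connection stands in for the proposed _internal database.
internal = sqlite3.connect(":memory:")

# Hypothetical table name; a real plugin would pick its own.
internal.execute(
    """
    CREATE TABLE IF NOT EXISTS my_plugin_config (
        key TEXT PRIMARY KEY,
        value TEXT NOT NULL
    )
    """
)


def set_config(conn, key, value):
    """Insert or overwrite one setting, storing the value as JSON text."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO my_plugin_config (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )


def get_config(conn, key, default=None):
    """Read one setting back, returning default if it was never set."""
    row = conn.execute(
        "SELECT value FROM my_plugin_config WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else default


# A user action (for example a cell menu click) could update config on the fly.
set_config(internal, "editable_tables", ["foia_requests"])
print(get_config(internal, "editable_tables"))
```

The same pattern would work for progress rows or audit log entries; only the table layout changes.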

Proposal

  • We remove _internal from the datasette.databases property.
  • We add a new datasette.get_internal_db() method that returns the _internal database, for plugins to use
  • We add a new --internal internal.db flag. If provided, then the _internal DB will be sourced from that file, and further updates will be persisted to that file (instead of an in-memory database)
  • When creating internal.db, create a new _datasette_internal table to mark it as a "datasette internal database"
  • In datasette serve, we check the database files being served for the existence of the _datasette_internal table. If it exists, we assume the user provided that file in error and raise an error. This is to limit the chance that someone accidentally publishes their internal database to the internet. We could optionally add a --unsafe-allow-internal flag (or database plugin) that allows someone to do this if they really want to.
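
The marker-table check from the last two points could look roughly like this. It is only a sketch using the standard sqlite3 module: the function names, the marker table's single column, and the allow_internal parameter (standing in for the suggested --unsafe-allow-internal flag) are assumptions, not existing Datasette code:

```python
import sqlite3

MARKER_TABLE = "_datasette_internal"  # table name taken from the proposal


def mark_as_internal(conn):
    """Create the marker table. Its columns are not specified in the proposal."""
    with conn:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {MARKER_TABLE} (id INTEGER PRIMARY KEY)"
        )


def is_internal_database(conn):
    """Return True if the marker table exists in this database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (MARKER_TABLE,),
    ).fetchone()
    return row is not None


def check_safe_to_serve(conn, allow_internal=False):
    """Raise if conn looks like an internal database, unless explicitly allowed."""
    if is_internal_database(conn) and not allow_internal:
        raise ValueError("Refusing to serve a database marked as internal")


regular = sqlite3.connect(":memory:")
check_safe_to_serve(regular)  # passes silently: no marker table

internal = sqlite3.connect(":memory:")
mark_as_internal(internal)
try:
    check_safe_to_serve(internal)
except ValueError as ex:
    print(ex)
```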

New features unlocked with this

These features don't really need a standardized _internal table per se (plugins could currently configure their own long-term storage features if they really wanted to), but it would make it much simpler to create these kinds of features with a persistent application database.

  • datasette-comments : A plugin for commenting on rows or specific values in a database. Comment contents + threads + email notification info can be stored in _internal
  • Bookmarks: A "bookmarked" SQL query could be stored in _internal, as could the data for a URL link shortener
  • Webhooks: If a plugin wants to either consume a webhook or create a new one, they can store hashed credentials/API endpoints in _internal
@simonw
Owner

simonw commented Aug 24, 2023

We discussed this in-person this morning and these notes reflect what we talked about perfectly.

I've had so many bugs with plugins that I've written myself that have forgotten to special-case the _internal database when looping through datasette.databases.keys() - removing it from there entirely would help a lot.

Just one tiny disagreement: for datasette-comments I think having it store things in _internal could be an option, but in most cases I expect users to choose NOT to do that - because being able to join against those tables for more advanced queries is going to be super useful.

Show me all rows in foia_requests with at least one associated comment in datasette_comments.comments kind of thing.
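
A rough sketch of that kind of query, using the standard sqlite3 module with a second in-memory database attached under the name datasette_comments. The column names (table_name, row_id, body, title) and the sample rows are invented for the example; the real datasette-comments schema may look different:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second database so the comments table lives outside the main one.
conn.execute("ATTACH DATABASE ':memory:' AS datasette_comments")

conn.execute("CREATE TABLE foia_requests (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    """
    CREATE TABLE datasette_comments.comments (
        id INTEGER PRIMARY KEY,
        table_name TEXT,
        row_id INTEGER,
        body TEXT
    )
    """
)

conn.executemany(
    "INSERT INTO foia_requests (id, title) VALUES (?, ?)",
    [(1, "Request A"), (2, "Request B"), (3, "Request C")],
)
conn.executemany(
    "INSERT INTO datasette_comments.comments (table_name, row_id, body) VALUES (?, ?, ?)",
    [
        ("foia_requests", 1, "Follow up next week"),
        ("foia_requests", 1, "Response received"),
        ("foia_requests", 3, "Needs review"),
    ],
)

# All foia_requests rows that have at least one associated comment.
query = """
SELECT foia_requests.id, foia_requests.title
FROM foia_requests
WHERE EXISTS (
    SELECT 1
    FROM datasette_comments.comments AS c
    WHERE c.table_name = 'foia_requests'
      AND c.row_id = foia_requests.id
)
ORDER BY foia_requests.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This cross-database join only works when the comments tables are reachable from the same connection as the data, which is the argument for not forcing them into _internal.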

@simonw
Owner

simonw commented Aug 24, 2023

But yes, I'm a big +1 on this whole plan.

@asg017
Collaborator Author

asg017 commented Aug 31, 2023

@simonw what do you think about adding a DATASETTE_INTERNAL_DB_PATH env variable that, when defined, is the default location of the internal DB? This means when the --internal flag is NOT provided, Datasette would check to see if DATASETTE_INTERNAL_DB_PATH exists, and if so, use that as the internal database (and otherwise fall back to an ephemeral in-memory database).

My rationale: some plugins may require, or strongly encourage, a persistent internal database (datasette-comments, datasette-bookmarks, datasette-link-shortener, etc.). However, for users that have a global installation of Datasette (say from brew install or a global pip install), it would be annoying having to specify --internal every time. So instead, they can just add export DATASETTE_INTERNAL_DB_PATH="/path/to/internal.db" to their bashrc/zshrc/wherever to not have to worry about --internal.
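
The lookup order I have in mind could be sketched like this. The function name and its parameters are hypothetical, and returning None is just my shorthand here for "use an ephemeral in-memory database":

```python
import os

ENV_VAR = "DATASETTE_INTERNAL_DB_PATH"  # proposed name, not yet implemented


def resolve_internal_path(cli_internal=None, environ=None):
    """Pick the internal DB path: --internal first, then the env variable.

    Returns None when neither is set, meaning an in-memory database.
    """
    environ = os.environ if environ is None else environ
    if cli_internal:
        # An explicit --internal flag always wins.
        return cli_internal
    # An unset or empty env variable falls through to in-memory.
    return environ.get(ENV_VAR) or None


print(resolve_internal_path("cli.db", {ENV_VAR: "/path/to/internal.db"}))
print(resolve_internal_path(None, {ENV_VAR: "/path/to/internal.db"}))
print(resolve_internal_path(None, {}))
```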
