Commit

Rewrite synchronizer and registry to drop tables at maintenance just before some DB operations. Better keep track of allocated dry run tables and ensure that a dry run name does not collide with a pre-existing user table.
radeusgd committed Jun 23, 2023
1 parent 2b33adc commit 5b8fc83
Showing 13 changed files with 207 additions and 115 deletions.
7 changes: 4 additions & 3 deletions build.sbt
Original file line number Diff line number Diff line change
@@ -2058,9 +2058,10 @@ lazy val `std-database` = project
Compile / packageBin / artifactPath :=
`database-polyglot-root` / "std-database.jar",
libraryDependencies ++= Seq(
"org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
"org.xerial" % "sqlite-jdbc" % sqliteVersion,
"org.postgresql" % "postgresql" % "42.4.0"
"org.graalvm.truffle" % "truffle-api" % graalVersion % "provided",
"org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
"org.xerial" % "sqlite-jdbc" % sqliteVersion,
"org.postgresql" % "postgresql" % "42.4.0"
),
Compile / packageBin := Def.task {
val result = (Compile / packageBin).value
@@ -162,6 +162,7 @@ type Connection
Later, once nodes can have expandable arguments, we can merge this with
`tables`, marking the `include_hidden` argument as expandable.
get_tables_advanced self name_like=Nothing database=self.database schema=Nothing types=self.dialect.default_table_types all_fields=False include_hidden=False =
self.maybe_run_maintenance
types_vector = case types of
Nothing -> Nothing
_ : Vector -> types
@@ -179,6 +180,14 @@ type Connection
hidden_tables = self.hidden_table_registry.list_hidden_tables
result.filter "Name" (Filter_Condition.Not_In hidden_tables)

## PRIVATE
Checks if the table with the given name exists in the database.
table_exists : Text -> Boolean
table_exists self table_name =
# TODO make this more efficient
tables = self.get_tables_advanced name_like=Nothing database=self.database schema=Nothing types=Nothing all_fields=False include_hidden=True
tables.at "Name" . to_vector . contains table_name

## PRIVATE
Set up a query returning a Table object, which can be used to work with
data within the database or load it into memory.
@@ -201,7 +210,7 @@ type Connection
query self query alias="" = case query of
_ : Text ->
result = self.query alias=alias <|
if (all_known_table_names self).contains query then (SQL_Query.Table_Name query) else
if self.table_exists query then (SQL_Query.Table_Name query) else
SQL_Query.Raw_SQL query
result.catch SQL_Error sql_error->
case self.dialect.is_probably_a_query query of
@@ -342,13 +351,71 @@ type Connection
drop_table self table_name if_exists=False =
self.execute_update (self.dialect.generate_sql (Query.Drop_Table table_name if_exists))

## PRIVATE
Returns the base `Connection` instance.

Used, so that all internal helper functions do not need to be replicated
on the 'subclasses'.
base_connection : Connection
base_connection self = self

## PRIVATE
If no thread (including the current one) is currently running operations
on the connection, maintenance will be performed.

Currently, this consists of removing dry run tables that are no longer
used.

This method should be run by most database operations to ensure that
unused tables are cleaned at some point.

All errors are swallowed and not propagated, so it is safe to call this
method anywhere. There is no point in calling this method inside
critical sections, as it will not do anything there.
maybe_run_maintenance self =
callback _ =
Hidden_Table_Registry.run_maintenance_table_cleanup self
self.jdbc_connection.run_maintenance_action_if_possible callback

## PRIVATE
max_table_name_length : Integer | Nothing
max_table_name_length self =
reported = self.jdbc_connection.with_metadata .getMaxTableNameLength
if reported == 0 then Nothing else reported

## PRIVATE
Generates a temporary table name for the given table name, used for dry
runs.

The table name is 'stable', meaning that the same name will be returned
for the given input `table_name` on subsequent calls, unless the user
creates a clashing table in the meantime.

The table name is guaranteed to be unique for the database at the time it
is generated - this is used to ensure that the dry run tables never
overwrite pre-existing user data.
generate_dry_run_table_name : Text -> Text
generate_dry_run_table_name self table_name =
go ix =
prefix = "enso-dry-run-" + if ix == 0 then "" else ix.to_text + "-"
max_length = (self.max_table_name_length.if_nothing 60) - 1
name = prefix + table_name.take (max_length - prefix.length)
## The dry run name is ok if it is already registered (that means it
may exist in the Database, but it was created by other dry runs
and is safe to overwrite) or if it does not exist in the database.
name_ok = (self.hidden_table_registry.is_registered name) || (self.table_exists name . not)
if name_ok then name else
@Tail_Call go (ix + 1)
go 0
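
The naming loop above can be sketched outside of Enso as follows. This is only an illustration in Java, not code from the commit: the class `DryRunNames` and the two predicates standing in for `hidden_table_registry.is_registered` and `table_exists` are hypothetical, while the prefix, the fallback length of 60 and the `- 1` come from the diff.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the dry run naming loop; not the commit's implementation.
final class DryRunNames {
    static final String PREFIX = "enso-dry-run-";

    // maxTableNameLength may be null when the database reports no limit (assumed mapping of Nothing).
    static String generate(String tableName, Integer maxTableNameLength,
                           Predicate<String> isRegistered, Predicate<String> tableExists) {
        int limit = (maxTableNameLength == null ? 60 : maxTableNameLength) - 1;
        for (int ix = 0; ; ix++) {
            String prefix = PREFIX + (ix == 0 ? "" : ix + "-");
            // Keep as much of the original name as fits after the prefix.
            int keep = Math.max(0, Math.min(tableName.length(), limit - prefix.length()));
            String name = prefix + tableName.substring(0, keep);
            // A name is acceptable if an earlier dry run already owns it,
            // or if no table with that name exists in the database.
            if (isRegistered.test(name) || !tableExists.test(name)) {
                return name;
            }
        }
    }
}
```

Because a registered name is accepted before the existence check, repeated dry runs for the same input keep reusing one table, while a clashing user table pushes the name to the next index.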

## PRIVATE
Creates a Table reference that refers to a table with the given name.

Once all references to the table with this name are destroyed, the table
will be dropped.
internal_allocate_hidden_table self table_name =
ref = self.hidden_table_registry.get_reference table_name
will be marked for removal and dropped at the next maintenance.
internal_allocate_dry_run_table : Text -> Database_Table
internal_allocate_dry_run_table self table_name =
ref = self.hidden_table_registry.make_reference table_name
make_table_for_name self table_name table_name ref

## PRIVATE
@@ -369,11 +436,6 @@ make_schema_selector connection =
schemas_without_nothing = connection.schemas.filter Filter_Condition.Not_Nothing
Single_Choice values=(schemas_without_nothing.map t-> Option t t.pretty)+[Option "any schema" "Nothing"]

## PRIVATE
all_known_table_names connection =
tables = connection.get_tables_advanced name_like=Nothing database=connection.database schema=Nothing types=Nothing all_fields=False include_hidden=True
tables.at "Name" . to_vector

## PRIVATE
make_table_name_selector : Connection -> Widget
make_table_name_selector connection =
@@ -56,7 +56,7 @@ from project.Internal.Upload_Table import all
@primary_key Widget_Helpers.make_column_name_vector_selector
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Primary_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table self connection (table_name : Text) primary_key=[self.columns.first.name] temporary=False on_problems=Problem_Behavior.Report_Warning =
select_into_table self connection table_name primary_key temporary on_problems
select_into_table_implementation self connection table_name primary_key temporary on_problems

## Updates the target table with the contents of this table.

@@ -57,7 +57,7 @@ from project.Internal.Upload_Table import all
@primary_key Widget_Helpers.make_column_name_vector_selector
Table.select_into_database_table : Connection -> Text -> Vector Text | Nothing -> Boolean -> Problem_Behavior -> Database_Table ! Table_Already_Exists | Inexact_Type_Coercion | Missing_Input_Columns | Non_Unique_Primary_Key | SQL_Error | Illegal_Argument
Table.select_into_database_table self connection (table_name : Text) primary_key=[self.columns.first.name] temporary=False on_problems=Problem_Behavior.Report_Warning =
select_into_table self connection table_name primary_key temporary on_problems
select_into_table_implementation self connection table_name primary_key temporary on_problems

## Updates the target table with the contents of this table.

@@ -8,49 +8,45 @@ polyglot java import org.enso.database.dryrun.HiddenTableReferenceCounter
A reference to a hidden table that keeps it alive.

Once all references to a particular hidden table are garbage collected, the
hidden table itself will be dropped.
hidden table is marked for deletion.
type Hidden_Table_Reference
Reference (parent : Hidden_Table_Registry) (table_name : Text)

## PRIVATE
A registry that keeps track of temporary hidden tables.

These tables will all be destroyed once the connection is closed, but to
avoid creating too many, the registry tries to drop them more eagerly once
they stop being needed.
avoid creating too many, the registry may allow dropping them more eagerly.

! Concurrency

Note that this code is run in GC threads and may run into concurrency
issues. Currently, we have no synchronization primitives in Enso to prevent
this. This is just a 'heuristic' helper that tries to limit the amount of
dry-run tables living in a session, so the potential failures should not be
fatal. Nonetheless, it may be worth to revise this once we get better
concurrency support.

The potential point of failure is the two places where we `get` and later
`put` an updated `Ref`. These can happen concurrently and result in the
mapping getting inconsistent.
Moreover, the registry keeps track of which tables were created by Enso,
allowing us to avoid dropping tables with similar names that were created by
the user.
type Hidden_Table_Registry
## PRIVATE
Registry (reference_counter : HiddenTableReferenceCounter) (drop : Text -> Nothing)
Registry (reference_counter : HiddenTableReferenceCounter)

## PRIVATE
get_reference : Text -> Managed_Resource
get_reference self table_name =
make_reference : Text -> Managed_Resource
make_reference self table_name =
self.reference_counter.increment table_name
reference = Hidden_Table_Reference.Reference self table_name
Managed_Resource.register reference dispose_reference

## PRIVATE
list_hidden_tables : Vector Text
list_hidden_tables self =
self.mapping.get.keys
Vector.from_polyglot_array self.reference_counter.getKnownTables

## PRIVATE
is_registered : Text -> Boolean
is_registered self table_name =
self.reference_counter.isRegistered table_name

## PRIVATE
Creates a new hidden table registry instance.
new : Hidden_Table_Registry
new (drop : Text -> Nothing) =
Hidden_Table_Registry.Registry (HiddenTableReferenceCounter.new) drop
new =
Hidden_Table_Registry.Registry (HiddenTableReferenceCounter.new)

## PRIVATE
Utility method for disposing of references. Provided to avoid accidental
@@ -59,5 +55,19 @@ dispose_reference : Any -> Nothing
dispose_reference reference =
registry = reference.parent
new_count = registry.reference_counter.decrement reference.table_name
if new_count <= 0 then
registry.drop reference.table_name

## PRIVATE
Drops all temporary hidden tables that have been marked for removal and not
brought back to life.

This method must be run in a critical section guaranteeing that no other
operations will be performed on the associated connection in parallel.
run_maintenance_table_cleanup connection =
registry = connection.hidden_table_registry
reference_counter = registry.reference_counter

tables_scheduled_for_removal = Vector.from_polyglot_array reference_counter.getTablesScheduledForRemoval
tables_scheduled_for_removal.each table_name->
# The table may not exist if the transaction that created it was rolled back. We just ignore such cases.
connection.drop_table table_name if_exists=True
reference_counter.markAsDropped table_name
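
The Java side of `HiddenTableReferenceCounter` is not shown in this part of the diff. A minimal sketch of a counter that would support the calls used above might look like the following. The method names are the ones the Enso code calls; the class name `ReferenceCounterSketch`, the map-based storage and the error on an unbalanced `decrement` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real HiddenTableReferenceCounter may differ.
final class ReferenceCounterSketch {
    // A count of 0 means "known, but scheduled for removal".
    private final Map<String, Integer> counts = new HashMap<>();

    synchronized void increment(String name) {
        // Also revives a table that was scheduled for removal but not yet dropped.
        counts.merge(name, 1, Integer::sum);
    }

    synchronized int decrement(String name) {
        Integer current = counts.get(name);
        if (current == null || current == 0) {
            throw new IllegalStateException("No live references to " + name);
        }
        counts.put(name, current - 1);
        return current - 1;
    }

    synchronized boolean isRegistered(String name) {
        return counts.containsKey(name);
    }

    synchronized List<String> getKnownTables() {
        return new ArrayList<>(counts.keySet());
    }

    synchronized List<String> getTablesScheduledForRemoval() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == 0) result.add(entry.getKey());
        }
        return result;
    }

    synchronized void markAsDropped(String name) {
        // Forget the table only if nobody revived it in the meantime.
        counts.remove(name, 0);
    }
}
```

Keeping zero-count entries in the map is what lets `is_registered` still recognise a dry run table between the moment its last reference is collected and the next maintenance run.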
@@ -38,16 +38,9 @@ type JDBC_Connection
## PRIVATE
Runs the provided action ensuring that no other thread is working with
this Connection concurrently.

This is used to ensure that queries issued from finalizers do not
interrupt any query processing happening on the main thread. This should
also be used to wrap transactions, so that the finalizer queries do not
run in the middle of a transaction but instead are forced to wait until
the whole transaction finishes. The `OperationSynchronizer` uses a
re-entrant lock, so it is safe to use this method in nested calls.
synchronized self ~action =
callback = _ -> action
self.operation_synchronizer.runActionSynchronously callback
self.operation_synchronizer.runSynchronizedAction callback

## PRIVATE
Closes the connection releasing the underlying database resources
@@ -63,28 +56,33 @@ type JDBC_Connection
Open the connection to the database, then run the action wrapping any
SQL errors.
with_connection : (Connection -> Any) -> Any
with_connection self ~action = self.synchronized <|
with_connection self action = self.synchronized <|
handle_sql_errors <|
self.connection_resource.with action

## PRIVATE
Runs the provided callback only if no thread is currently inside a
`synchronized` critical section (including the current thread).
run_maintenance_action_if_possible : (Nothing -> Any) -> Nothing
run_maintenance_action_if_possible self callback =
self.operation_synchronizer.runMaintenanceActionIfPossible callback
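
The `OperationSynchronizer` itself is not part of this excerpt. One way to get the behaviour described here (run the callback only if no thread, including the current one, is inside a `synchronized` section) is a re-entrant lock plus a hold-count check, sketched below. The class name `SynchronizerSketch`, the `boolean` return value and the exception handling are assumptions; only the two method names come from the diff.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch; the real OperationSynchronizer may be implemented differently.
final class SynchronizerSketch {
    private final ReentrantLock lock = new ReentrantLock();

    // Runs the action in a critical section; nested calls from the same thread are allowed.
    <T> T runSynchronizedAction(Supplier<T> action) {
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the maintenance action was attempted (the return value is an assumption).
    boolean runMaintenanceActionIfPossible(Runnable maintenance) {
        if (!lock.tryLock()) {
            return false; // another thread is inside a critical section
        }
        try {
            if (lock.getHoldCount() > 1) {
                return false; // the current thread is already inside a critical section
            }
            try {
                maintenance.run();
            } catch (RuntimeException e) {
                // Swallowed on purpose: maintenance must never fail the calling operation.
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The hold-count check matters because `tryLock` on a re-entrant lock succeeds for the thread that already owns it; without it, maintenance could drop tables in the middle of the caller's own transaction.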

## PRIVATE

Open the connection to the database, then run the action passing the
database's metadata wrapping any SQL errors.
with_metadata : (DatabaseMetaData -> Any) -> Any
with_metadata self ~action = self.synchronized <|
handle_sql_errors <|
self.connection_resource.with connection->
metadata = connection.getMetaData
action metadata
with_metadata self ~action = self.with_connection connection->
metadata = connection.getMetaData
action metadata

## PRIVATE

Runs the provided action with a prepared statement, adding contextual
information to any thrown SQL errors.
with_prepared_statement : Text | SQL_Statement -> Statement_Setter -> (PreparedStatement -> Any) -> Any
with_prepared_statement self query statement_setter action = self.synchronized <|
prepare template values = self.connection_resource.with java_connection->
prepare template values = self.with_connection java_connection->
stmt = java_connection.prepareStatement template
handle_illegal_state caught_panic =
Error.throw (Illegal_Argument.Error caught_panic.payload.message)
@@ -145,7 +143,8 @@ type JDBC_Connection
running this function (so if it was off before, this method may not
change anything).
run_without_autocommit : Any -> Any
run_without_autocommit self ~action = self.synchronized <|
run_without_autocommit self ~action =
# The whole block is already `synchronized` by `with_connection`.
self.with_connection java_connection->
default_autocommit = java_connection.getAutoCommit
Managed_Resource.bracket (java_connection.setAutoCommit False) (_ -> java_connection.setAutoCommit default_autocommit) _->
@@ -223,21 +223,12 @@ type Postgres_Connection
self.connection.drop_table table_name if_exists

## PRIVATE
A helper that allows to access all tables in a database, including hidden
ones.
Returns the base `Connection` instance.

Later, once nodes can have expandable arguments, we can merge this with
`tables`, marking the `include_hidden` argument as expandable.
get_tables_advanced self name_like=Nothing database=self.database schema=Nothing types=self.dialect.default_table_types all_fields=False include_hidden=False =
self.connection.get_tables_advanced name_like database schema types all_fields include_hidden

## PRIVATE
Creates a Table reference that refers to a table with the given name.

Once all references to the table with this name are destroyed, the table
will be dropped.
internal_allocate_hidden_table self table_name =
self.connection.internal_allocate_hidden_table table_name
Used, so that all internal helper functions do not need to be replicated
on the 'subclasses'.
base_connection : Connection
base_connection self = self.connection

## PRIVATE

@@ -217,21 +217,12 @@ type SQLite_Connection
self.connection.drop_table table_name if_exists

## PRIVATE
A helper that allows to access all tables in a database, including hidden
ones.
Returns the base `Connection` instance.

Later, once nodes can have expandable arguments, we can merge this with
`tables`, marking the `include_hidden` argument as expandable.
get_tables_advanced self name_like=Nothing database=self.database schema=Nothing types=self.dialect.default_table_types all_fields=False include_hidden=False =
self.connection.get_tables_advanced name_like database schema types all_fields include_hidden

## PRIVATE
Creates a Table reference that refers to a table with the given name.

Once all references to the table with this name are destroyed, the table
will be dropped.
internal_allocate_hidden_table self table_name =
self.connection.internal_allocate_hidden_table table_name
Used, so that all internal helper functions do not need to be replicated
on the 'subclasses'.
base_connection : Connection
base_connection self = self.connection

## PRIVATE
