-
Notifications
You must be signed in to change notification settings - Fork 979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #2570: DB schema change to make extensions opaque #2593
Conversation
There must be something that I'm failing to do during the upgrade. I can create a pubnet database with schema 12, run on it for a while, and then upgrade it with my new code. And I can create a new database with schema 13 with my new code, and run on that. But if I run on schema 12 for a while and then upgrade with my new code, I can't run with the upgraded database; this happens on startup:
|
Oh, I think that was operator rather than code error on my part -- I did the two runs (with the old code and then the new code) from two different working directories, and I think that by default the "buckets" directory is placed in the current working directory. So I was using the wrong buckets directory for the run with the new code. |
Confirmed, that was my operator error, using different bucket directories for the two runs. |
Performance note: After running overnight on the pubnet with DB schema v12 (prior to this PR), I switched over to v13 (a binary with the changes from this PR), and, as it consistently has so far when I've tested upgrades on owlbear, it took ~20s to upgrade the database schema, which involved rewriting ~60K accounts records and ~120K trustlines records (that's how many have liabilities in the current database). Including that time and the ensuing catchup, it took ~2m40s for the node to sync back up on v13. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good. I have a variety of comments, many of which are similar in substance. I had so many comments on the tests that I stopped reviewing them and will review them again after they are updated.
src/database/Database.cpp
Outdated
// extensions in the future without bumping the database schema | ||
// version, writing any upgrade code, or changing the SQL that reads | ||
// and writes those tables. | ||
addTextColumnWithDefault( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not intrinsically opposed to doing this for accountdata and offers. But as it stands today, it does not look likely that those XDR types will be extended any time in the near future (I've yet to encounter a proposal that would extend either). There is a minor overhead in storage and performance. @MonsieurNicolas what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the new columns aren't being filled, but will just be NULL, perhaps that overhead will be less? (Of course, I'll implement whatever you and @MonsieurNicolas decide you prefer.)
I think I've addressed all the review comments so far, mostly by just making the suggested changes, and in a few cases proposing and implementing different ones or offering reasons to leave it as it is. I've finally marked this as ready for review, as opposed to a draft, though further review and discussion will almost certainly lead to some further changes. |
Here's a raw benchmark log from owlbear, comparing a series of ledger-entry-related [bench] tests, with a v12 run first and then a v13 run. I have not done enough to get an idea of the variance, but at a first glance nothing has jumped out at me as untoward. |
src/main/ApplicationImpl.h
Outdated
@@ -116,6 +116,8 @@ class ApplicationImpl : public Application | |||
|
|||
virtual AbstractLedgerTxnParent& getLedgerTxnRoot() override; | |||
|
|||
virtual void actBeforeDBSchemaUpgrade() override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this should be a virtual function of Database
instead of Application
? Seems like separation of concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that Application would be the one class which dictated where its clients could inject code. If actBeforeDBSchemaUpgrade()
were to move to the Database class, how are you envisioning clients creating Application instances that injected code into Database methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a virtual function ApplicationImpl::createDatabase
(see createInvariantManager
, createOverlayManager
, and createOverlayManager
for reference) and overload this in the derived class. Then you could use basically the same injection mechanism as you have here, where the derived Application
takes a functor which passes it to the derived Database
upon construction, and the functor is called before schema upgrade.
I admit it's a bit less direct. It has the disadvantage of requiring a bit more boiler-plate. But it has the advantage of having better separation of concerns and being more composable. I'm not sure how much it matters here, but I figured we should at least discuss it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha -- I had briefly mulled over trying to work out how to do that, but I was afraid it would come across as too heavyweight, and didn't get as far as noticing that we had precedent!
Though the result is more verbose, "having better separation of concerns and being more composable" is music to my ears, so I've taken your suggestion. Does the result look sane?
I'm attaching a couple "metrics" collections from running for a while on each of v12 and v13. This is my first time trying to interpret our metrics, but after looking over the database.* and ledger.* ones that appear to be measuring the database operations that involve the tables that I've changed, I'm not seeing anything that looks like a statistically |
Jon pointed out that, although we mostly expect PostgreSQL to be our production-use database, we're not sure that no one's using SQLite in production, and the latter behaves differently during the upgrade because it doesn't let us drop columns, so we NULL out that field of all rows instead, which is a much larger operation than anything we do in the PostgreSQL upgrade. Therefore, I'd better test the SQLite upgrade on the real pubnet database too. I've now done so, and on owlbear, the SQLite schema upgrade took 2m13s. I'd happened to guess based on other measurements that it would be somewhere around 2m, and asked Nicolas whether it would be alright if it were, and he said that that would be fine, so I am venturing to hope that 2m13s is fine too. |
Jon pointed out that catchup performance would provide a strongly apples-to-apples comparison between v12 and v13, since it would involve applying exactly the same buckets; he suggested the command |
16b6814
to
5ac37ea
Compare
Commits have been re-squashed. ...Non-destructively, I hope. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find actual bugs, good job.
Mostly cosmetic issues from what I see.
In particular, the commit history is quite hard to follow as it's only partially squashed.
Given the changes, I think it's probably easier to squash everything and resplit instead of propagating a subset of changes across commits (but you can try to do this as an exercise if you've never done it).
src/database/Database.cpp
Outdated
{ | ||
auto st_update = getPreparedStatement(updateStr).statement(); | ||
|
||
for (auto recT : in) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto & recT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -252,6 +258,33 @@ Database::applySchemaUpgrade(unsigned long vers) | |||
// the accountbalances index around. | |||
mSession << "DROP INDEX IF EXISTS accountbalances"; | |||
break; | |||
case 13: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's open an issue to collapse all schemas older than 12 into just 12
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; see issue #2601 .
@@ -9,10 +9,13 @@ | |||
#include "main/Application.h" | |||
#include "main/Config.h" | |||
#include "overlay/StellarXDR.h" | |||
#include "util/Decoder.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (for the future): commit description is a bit verbose, just give a short summary. If a reader wants to get the details, they can look at the actual diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I'll keep that in mind when I squash and re-split. Verbosity is a habit it might take me some time to break, though. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...I didn't actually shorten them much this time. Sorry about that.
src/database/Database.cpp
Outdated
{ | ||
// Add columns for the LedgerEntry extension to each of | ||
// the tables that stores a type of ledger entry. | ||
addTextColumn("accounts", ledgerExtName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you forgot to update schema.md
to match this latest schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all the changes got reflected in my later commit to db-schema.md
, right?
src/database/Database.cpp
Outdated
{ | ||
std::string addColumnStr("ALTER TABLE " + table + " ADD " + column + | ||
" TEXT;"); | ||
CLOG(INFO, "Database") << "Adding column with string '" << addColumnStr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why doesn't it just say "Adding column '" << column << "'"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general problem in this file: you're logging "how the sausage is made", which is too verbose for normal operators (and nobody really cares about those details)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I did it that way because I occasionally wrote syntax errors and wanted to see exactly what I'd done wrong (the soci
exceptions generally don't say). But it makes sense that for a final version, operators wouldn't care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the log messages in that file to try to constrain them to "what" and not "how".
src/ledger/LedgerTxnDataSQL.cpp
Outdated
"ON CONFLICT (accountid, dataname) DO UPDATE SET " | ||
"datavalue = excluded.datavalue, " | ||
"lastmodified = excluded.lastmodified, " | ||
"extension = excluded.extension, ledgerext " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here ledgerext = excluded.ledgerext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
src/ledger/LedgerTxnOfferSQL.cpp
Outdated
"price = excluded.price, " | ||
"flags = excluded.flags, " | ||
"lastmodified = excluded.lastmodified, " | ||
"extension = excluded.extension, ledgerext " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ledgerext = excluded.ledgerext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
src/main/Application.h
Outdated
@@ -278,8 +278,10 @@ class Application | |||
|
|||
virtual AbstractLedgerTxnParent& getLedgerTxnRoot() = 0; | |||
|
|||
// For tests to perform operations on databases while they're | |||
// still using old schemas. | |||
// Give clients the opportunity to perform operations on databases while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, I think.
@@ -214,6 +218,16 @@ class Database : NonMovableOrCopyable | |||
// Access the optional SOCI connection pool available for worker | |||
// threads. Throws an error if !canUsePool(). | |||
soci::connection_pool& getPool(); | |||
|
|||
protected: | |||
// Give clients the opportunity to perform operations on databases while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the application test hook that you removed here should just be removed from history as you never used it (it's just extra commits and churn for no reason)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -51,8 +51,8 @@ homedomain | VARCHAR(44) | (BASE64) | |||
thresholds | TEXT | (BASE64) | |||
flags | INT NOT NULL | | |||
lastmodified | INT NOT NULL | lastModifiedLedgerSeq | |||
buyingliabilities | BIGINT CHECK (buyingliabilities >= 0) | | |||
sellingliabilities | BIGINT CHECK (sellingliabilities >= 0) | | |||
extension | TEXT | Extension specific to AccountEntry (XDR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
squash this commit into the commit with the schema change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Yeah, I wasn't sure what I should do about later commits that affected multiple previous ones. I wondered whether I should try to squash them all into one and then split them up again. Should I do that before merge? |
4676839
to
669bb99
Compare
Add a mechanism, using the Factory pattern, for production code to create injection points that can call back into test-defined code (or perhaps other applications in the future; parameters used by the injected code can vary with each test or other use, including, in effect, the method called, by the use of lambdas). A test client can create one derived class of Database for each injection point or combination of injection points (into methods of class Database -- other classes could also use this mechanism for injecting code into their methods) to override with methods that do something. The derived class's constructor may take any list of parameter types; each call to createTestApplication() (or Application::create()) passes its own list of parameters of those types. The derived class may store the parameters received by its constructor, so that it is effectively called with parameters even though the production code calls a method with no parameters. This means in particular that TestApplication itself does not need to acquire new members to allow tests to store parameters to injection point methods. A single derived class of Database may be used by multiple test cases even to call, in effect, different injected methods, because the parameters passed by createTestApplication() to the constructor of the derived class of TestApplication may include lambdas. This checkin also adds one use of the new mechanism, actBeforeDBSchemaUpgrade(), by production code to create an injection point.
Introduce three generic functions for operating on database records in bulk: - selectMap() takes a function which maps a SQL record to an element of a client-defined (templated) type, and produces a function which takes a SQL SELECT statement and produces a vector of the client-defined type comprising the list of selected records mapped through the client-defined function. - updateMap() takes a function which maps an element of a client-defined (templated) type to a SQL UPDATE statement, and produces a function which generates a series of SQL UPDATE statements from a vector of the client-defined type. - selectUpdateMap() produces the composition of updateMap() following selectMap(), namely, a function which takes a SQL SELECT statement and produces for each selected record an UPDATE statement derived from a transformation applied to the record via client-defined functions which interpret the record as an element of a client-defined type and then use that interpretation to choose some update to perform on it.
Upgrade the DB schema from 12 to 13. The new schema makes all of the extensions in all ledger entry types opaque to the database code itself. Those extensions are the extension to LedgerEntry itself, which is stored in the table for each LedgerEntry type (account, trustline, data, offer), and the extensions which are specific to each type. The idea of making the extensions opaque is that they can be extended in the future without further changes to either the DB schema or the DB code, as the database itself will simply continue to treat the extensions as variable-length, opaque XDR. The cost is that we can't query on components of the extensions (which we don't do currently, and won't do with any of the extensions that we currently have planned). The general LedgerEntry extension, as well as the extensions to DataEntry and OfferEntry, are currently unused -- they are unions with the only version being 0 and no content besides that fixed version number, and they are not stored in any database columns at all. The upgrade code therefore simply creates columns for them and populates all of the cells of the new columns with base-64 encodings of opaque XDR representing that trivial tagged union. The two extensions that are already used are those specific to AccountEntry and TrustLineEntry, both of which contain the same thing, namely a Liabilities structure (which contains two fields, one for buying and one for selling). Those extensions are tagged unions with a genuine version option -- version 0 for entries that have never yet had liabilities, and then version 1 when liabilities are first created for the entry. Existing databases have NULL in the "buyingliabilities" and "sellingliabilities" columns for the version-0 entries, and non-NULL values, namely base-64 encodings of the XDR for the Liabilities structure, for the version-1 entries. Therefore, the upgrade code for AccountEntry and TrustLineEntry performs a conversion: for each record, it reads the "buyingliabilities" and "sellingliabilities" columns, which are 64-bit integers transparent to the database, and generates encoded XDR, opaque to the database, representing an extension field which, in those cases, has a Liabilities field, and has the values read from the old columns filled in. (If the old liabilities fields for a record are NULL, then the new "extension" column for that record is also NULL -- the new live database code will recognize that NULL values of those two extensions are possible.) Once all of the records have had their new "extension" columns written, the upgrade code deletes the old "buyingliabilities" and "sellingliabilities" columns.
Modify the live database code for all of the LedgerEntry types -- AccountEntry, TrustLineEntry, DataEntry, and OfferEntry -- to use database schema 13, in which the extensions for each of those individual entry types, as well as the extension for LedgerEntry itself which is also stored in the table for whatever specific type a given LedgerEntry has, are just base-64-encoded opaque XDR as far as the database is concerned, regardless of how the ledger-transaction code inteprets those extensions, now or in the future.
Test upgrades to database schema 13, in which all extensions in all ledger entry types become opaque XDR as far as the database is concerned. The tests use the test-code-injection mechanism which allow them to run arbitrary code between the creation of a database (which is done in an old schema version, MIN_SCHEMA_VERSION) and the upgrading of that database to the current schema version (SCHEMA_VERSION). They create ledger entries of each of the four types, using raw SQL so that they can create them in the old format which the live production code no longer uses. After the database upgrade completes (which it has done by the time that createTestApplication() returns), the tests validate that the contents of the ledger entries are intact and logically unchanged (although their representations in the database have changed, with some columns having been created and some having been deleted, and some contents being encoded differently).
Done; commits have been squashed and re-split. |
r+ 5801f31 |
Update DB schema to make all ledger entry extensions opaque
Resolves #2570
This pull request changes the database schema as proposed in issue #2570 by converting the extension fields of AccountEntry, TrustLineEntry, DataEntry, OfferEntry, and LedgerEntry into opaque XDR. It upgrades old databases by copying the existing transparent extensions (which are Liabilities fields in the case of AccountEntry and TrustLineEntry, and empty in the other cases) into new columns, and then deleting the column containing the transparent extensions. (Extensions that were empty had no old columns; they simply gain new ones.)
The PR introduces a test which creates a mechanism for injecting code into points during application creation, in this case, between the creation of the database (which we do with an old schema) or the loading of an old database and the upgrading of the database to the new schema. It injects code to use raw SQL to create ledger entries of all types while the database is in the old format. Then, after the application initialization has completed, and the database has been upgraded, it uses the standard interfaces to load the accounts, trustlines, key-value data, and offers, and validate that they have been upgraded as expected (accounts or trustlines which had NULL Liabilities fields will now have NULL extension fields, while those which had non-NULL Liabilities will have non-NULL opaque extensions which decode to Liabilities fields equal to the old ones).
See #2584 for conversations about a previous version of this pull request, which is made obsolete by this one (which has had its history cleaned up, as well as having had extensive revisions since PR 2584).
I'm starting out by filing this PR as a draft because I still have to do some performance investigations on large, real-world databases; possible questions include:
Checklist
clang-format
v5.0.0 (viamake format
or the Visual Studio extension)