-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.in
335 lines (281 loc) · 11.8 KB
/
README.in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
## trasco
A minimalist system for relational database schema updates.
### Features
* Declare schemas, and upgrades to those schemas.
* Reliably and atomically update databases.
* Written in pure Java 17.
* [OSGi](https://www.osgi.org/) ready.
* [JPMS](https://en.wikipedia.org/wiki/Java_Platform_Module_System) ready.
* ISC license.
* High-coverage automated test suite.
### Motivation
Most relational databases involve declaring a schema ahead of time, detailing
the tables, views, and functions. Most applications, as they evolve, will
require changes to be made to those schemas over time. The `trasco` package
provides a minimal system for performing schema upgrades.
The package assumes the use of a database with transactional DDL. That is,
DDL commands such as `CREATE TABLE`, `ALTER TABLE`, and etc, can be executed
inside transactions and committed or rolled back as necessary. If you are not
using a database that supports transactional DDL, every upgrade has the
potential to damage your database whether or not you use `trasco`;
use a better database.
Databases known to support transactional DDL at the time of writing include:
* [PostgreSQL](https://www.postgresql.org/)
* [Apache Derby](https://db.apache.org/derby/)
* [SQLite](https://www.sqlite.org/index.html)
### Building
```
$ mvn clean verify
```
### Usage
The main classes used in the `trasco` package are the `TrExecutor` and
`TrSchemaRevisionSet` class. A `TrSchemaRevisionSet` contains a complete
set of versioned revisions to a database schema, starting from the initial
database state.
The `TrSchemaRevisionSetParser` class can produce a `TrSchemaRevisionSet`
from an XML file following the [included XSD schema](com.io7m.trasco.xml.schemas/src/main/resources/com/io7m/trasco/xml/schemas/statements-1.xsd).
The `trasco` package expects you to be able to store the current database
schema version and application name somewhere in the database. Typically,
applications will create a `schema_version` table as part of the initial
schema. For example:
```
<Schemas xmlns="urn:com.io7m.trasco.database.statements:1:0">
<Schema versionCurrent="0">
<Comment>
The schema version table stores the current version of the database schema. Implementations are expected to query
this table on connecting to the database in order to ensure that the calling code is compatible with the tables in
the database.
</Comment>
<Statement><![CDATA[
create table schema_version (
version_lock char(1) not null default 'X',
version_application_id text not null,
version_number bigint not null,
constraint check_lock_primary primary key (version_lock),
constraint check_lock_locked check (version_lock = 'X')
)
]]></Statement>
</Schema>
...
```
The initial schema version `0` declares a `schema_version` table that carries
a constraint that restricts it to consisting of a single row. Storing the
application ID is useful to prevent accidental errors in environments where
multiple databases are present; if a user points the package at the wrong
database, the mistake can be caught before any upgrades are attempted.
In order to actually populate and upgrade databases, the `TrExecutor` class
consumes a `TrSchemaRevisionSet` and runs each revision update in order.
The `TrExecutor` takes pair of functions that, given a database connection,
retrieve the current schema version, and set the current schema version,
respectively. For example, a function to retrieve the schema version given
a PostgreSQL database with the table declaration above looks like:
```
private static final String DATABASE_APPLICATION_ID = ...
private static Optional<BigInteger> schemaVersionGet(
final Connection connection)
throws SQLException
{
Objects.requireNonNull(connection, "connection");
try {
final var statementText =
"SELECT version_application_id, version_number FROM schema_version";
LOG.debug("execute: {}", statementText);
try (var statement = connection.prepareStatement(statementText)) {
try (var result = statement.executeQuery()) {
if (!result.next()) {
throw new SQLException("schema_version table is empty!");
}
final var applicationId =
result.getString(1);
final var version =
result.getLong(2);
if (!Objects.equals(applicationId, DATABASE_APPLICATION_ID)) {
throw new SQLException(
String.format(
"Database application ID is %s but should be %s",
applicationId,
DATABASE_APPLICATION_ID
)
);
}
return Optional.of(valueOf(version));
}
}
} catch (final SQLException e) {
final var state = e.getSQLState();
if (state == null) {
throw e;
}
if (state.equals(PSQLState.UNDEFINED_TABLE.getState())) {
connection.rollback();
return Optional.empty();
}
throw e;
}
}
```
The function takes into account that a completely fresh database might not
have a `schema_version` table, and returns `Optional.empty()` accordingly. This
will cause the `TrExecutor` class to assume that the database is fresh, and
start upgrades from the initial revision.
The function to set the database schema version is unsurprising:
```
private static void schemaVersionSet(
final BigInteger version,
final Connection connection)
throws SQLException
{
final String statementText;
if (Objects.equals(version, BigInteger.ZERO)) {
statementText = "insert into schema_version (version_application_id, version_number) values (?, ?)";
try (var statement =
connection.prepareStatement(statementText)) {
statement.setString(1, DATABASE_APPLICATION_ID);
statement.setLong(2, version.longValueExact());
statement.execute();
}
} else {
statementText = "update schema_version set version_number = ?";
try (var statement =
connection.prepareStatement(statementText)) {
statement.setLong(1, version.longValueExact());
statement.execute();
}
}
}
```
If the schema version is `0`, we insert the schema version into the table.
Otherwise, we update the existing single row in the table.
The `TrExecutor` class should be used with connections that do not auto-commit.
The class itself will _not_ commit any changes by itself, allowing database
upgrades to be completely atomic assuming that the underlying database supports
transactional DDL. The use of the `trasco` package on databases without
transactional DDL is strongly discouraged; in fact the use of databases
without transactional DDL is strongly discouraged in any situation.
Assuming the functions given above, the `TrExecutor` class can be used:
```
TrSchemaRevisionSet revisions;
DataSource datasource;
try (var connection = dataSource.getConnection()) {
connection.setAutoCommit(false);
new TrExecutors().create(
new TrExecutorConfiguration(
Example::schemaVersionGet,
Example::schemaVersionSet,
event -> { },
revisions,
PERFORM_UPGRADES,
TrArguments.empty(),
connection
)
).execute();
connection.commit();
}
```
The `TrExecutor` class publishes events detailing its current progress. These
can be ignored (as in the example) if not needed. Additionally, the
`TrExecutor` can be told to either perform upgrades, or simply fail if the
database is not already at the correct version. Failing immediately can be
useful in the case of read-only databases, or where code simply wants to check
if a target database is at the right version or not.
If any part of the upgrade fails, the `TrExecutor` class raises an exception
and the database will be left in its original state prior to _any_ revision
upgrades.
### Parameterized Statements
Some statements may require the use of configurable parameters. For example,
PostgreSQL's full text search feature requires indexes to be created with
the name of the language used as a parameter (`'english'`, for example). Schemas
can have typed, named parameters that are supplied with values at runtime:
```
<Schemas xmlns="urn:com.io7m.trasco.database.statements:1:0">
<Parameters>
<Parameter name="number0" type="NUMERIC"/>
<Parameter name="number1" type="NUMERIC"/>
<Parameter name="number2" type="NUMERIC"/>
<Parameter name="string0" type="STRING"/>
<Parameter name="number3" type="NUMERIC"/>
</Parameters>
<Schema versionCurrent="0">
<Statement><![CDATA[
create table x (f0 integer, f1 bigint, f2 varchar(100), f3 double, f4 decimal)
]]></Statement>
<StatementParameterized>
<ParameterReferences>
<ParameterReference order="0" name="number0"/>
<ParameterReference order="1" name="number1"/>
<ParameterReference order="2" name="string0"/>
<ParameterReference order="3" name="number2"/>
<ParameterReference order="4" name="number3"/>
</ParameterReferences>
<Text><![CDATA[
insert into x values (?, ?, ?, ?, ?)
]]></Text>
</StatementParameterized>
</Schema>
</Schemas>
```
Parameters have a name and a type, and are scoped over all `Schema` declarations
in the `Schemas` set. A `StatementParameterized` is an SQL statement that
takes a list of parameter references. A `ParameterReference` refers to
a parameter declared in the `Parameters` section, and has a defined `order`.
In the example above, the value the programmer supplies to the `number0`
parameter will be passed as the first parameter to the parameterized SQL
statement.
```
new TrExecutors().create(
new TrExecutorConfiguration(
Example::schemaVersionGet,
Example::schemaVersionSet,
event -> { },
revisions,
PERFORM_UPGRADES,
new TrArguments(
Map.ofEntries(
Map.entry("number0", new TrArgumentNumeric("number0", 23)),
Map.entry("number1", new TrArgumentNumeric("number1", 23L)),
Map.entry("string0", new TrArgumentString("string0", "23")),
Map.entry("number2", new TrArgumentNumeric("number2", 23.0)),
Map.entry("number3", new TrArgumentNumeric("number3", BigDecimal.valueOf(23.0)))
)
),
connection
)
).execute();
```
Failing to provide parameters, or providing parameters of the wrong type,
is an error.
#### Parameter Interpolation
By default, `trasco` uses prepared statements to execute schema statements.
For some kinds of statements, on some databases, this will cause problems.
For example, some kinds of DDL statements on PostgreSQL (such as `CREATE INDEX`)
cannot have parameters bound using JDBC. For example:
```
try (var st = connection.prepareStatement("CREATE INDEX ? ON t (c)")) {
st.setString(1, "index_name");
st.execute();
}
```
The above code will fail with an error similar to `ERROR: there is no parameter $1`.
Unfortunately, to deal with this kind of statement, it is necessary to manually
build an SQL statement. The `trasco` package offers a `STRING_FORMATTING`
interpolation mode that safely builds query strings with the proper escapes.
Specify an `interpolation` attribute with a value of `STRING_FORMATTING`
for a particular parameterized statement, and use `%s` format string
parameters instead of the standard `?` JDBC parameters:
```
<StatementParameterized parameterInterpolation="STRING_FORMATTING">
<ParameterReferences>
<ParameterReference order="0" name="number0"/>
<ParameterReference order="1" name="number1"/>
<ParameterReference order="2" name="string0"/>
<ParameterReference order="3" name="number2"/>
<ParameterReference order="4" name="number3"/>
</ParameterReferences>
<Text><![CDATA[
insert into x values (%s, %s, %s, %s, %s)
]]></Text>
</StatementParameterized>
```
The statement will be interpolated manually using `String.format()` and
the Apache Commons Text string escape functions to safely escape characters
in strings. The resulting text will be passed to JDBC directly.