| label | synopsis | status |
|---|---|---|
| Databases | Class <code>cds.DatabaseService</code> and subclasses thereof are technical services representing persistent storage. | released |
[[toc]]
`begin()` creates a transaction from the DatabaseService. It is called automatically by `cds.tx`.

`commit()` commits all write operations executed on the transaction. It is called automatically by `cds.tx`.

`rollback()` reverts all write operations executed on the transaction. It is called automatically by `cds.tx`.
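A minimal sketch of how these hook together when using `cds.tx` (the `db.Books` entity and its data are illustrative):

```js
const cds = require('@sap/cds')

const db = await cds.connect.to('db')
await db.tx(async tx => {   // begin() starts the transaction
  await tx.run(INSERT.into('db.Books').entries({ ID: 1, title: 'Catweazle' }))
})                          // commit() on success, rollback() on error
```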
The `factory` property provides an object that implements the pool API, allowing the current DatabaseService to reuse physical database connections and thereby improve overall performance.
`acquire()` creates a connection for the current DatabaseService. It is called automatically by `begin`.
`release()` releases the connection of the current DatabaseService, allowing it to be reused by future `acquire` calls. It is called automatically by `commit` and `rollback`.
`destroy()` destroys the connection of the current DatabaseService. It can be called to completely remove a connection from the DatabaseService and must be called when the connection's transaction is in an unknown state.
`disconnect()` disconnects all connections of the current DatabaseService. This is required for databases accessed over network connections, as connections are kept alive while they sit in the pool, which prevents the process from reaching the idle state required for a graceful shutdown. It is called automatically by `cds.on('shutdown')`.
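Conceptually, that shutdown hook boils down to this sketch (it is wired up for you, so you normally don't write it yourself):

```js
cds.on('shutdown', async () => {
  await cds.db.disconnect() // close all pooled connections so the process can exit
})
```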
`infer()` uses the current DatabaseService's model to infer a provided `cds.ql.Query`. It is best to use `cds.infer` instead. (deprecated?)
`set()` sets the provided key-value pairs as variables for the current connection. The values can be accessed by using the `session_context('<key>')` function inside any query. It is called automatically by `begin`.
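A minimal sketch of both sides, assuming an SQLite-based service; the variable key is illustrative:

```js
const db = await cds.connect.to('db')
await db.tx(async tx => {
  // begin() already calls set() with the standard context variables;
  // additional key/value pairs become session variables of this connection
  await tx.set({ '$user.id': 'alice' })
  // SQL can read the value back through session_context('<key>')
  const [row] = await tx.run(`SELECT session_context('$user.id') as user_id`)
  console.log(row.user_id) // 'alice'
})
```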
`run()` executes the provided `cds.ql.Query` using the current DatabaseService.
On `INSERT`, DatabaseServices return an instance of `InsertResult`, defined as follows:

- An iterator that returns the keys of the created entries, for example:
  - Example: `[...result]` → `[{ ID: 1 }, { ID: 2 }, ...]`
  - In case of `INSERT...as(SELECT...)`, the iterator returns `{}` for each row.
- `affectedRows`: the number of inserted (root) entries, or the number of affected rows in case of `INSERT` into `SELECT`.
- `valueOf()`: returns `affectedRows`, such that comparisons like `result > 0` can be used.

::: tip
`===` can't be used, as it also compares the type.
:::
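For example, a small sketch of `run` and the returned `InsertResult` (the `db.Books` entity is illustrative):

```js
const db = await cds.connect.to('db')
const result = await db.run(
  INSERT.into('db.Books').entries([{ ID: 1 }, { ID: 2 }])
)
if (result > 0) {                   // valueOf() → affectedRows
  console.log(result.affectedRows)  // 2
  console.log([...result])          // [{ ID: 1 }, { ID: 2 }]
}
```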
Instead of opening and closing a database connection for every request, we use a pool to reuse connections. By default, the following pool configuration is used:
```json
{
  "acquireTimeoutMillis": <if (NODE_ENV='production') 1000 else 10000>,
  "evictionRunIntervalMillis": <2 * (idleTimeoutMillis || softIdleTimeoutMillis || 30000)>,
  "min": 0,
  "max": 100,
  "numTestsPerEvictionRun": <(max - min) / 3>,
  "softIdleTimeoutMillis": 30000,
  "idleTimeoutMillis": 30000,
  "testOnBorrow": true,
  "fifo": false
}
```
::: warning
This default pool configuration does not apply to `@cap-js` database implementations.
:::
The generic-pool has a built-in pool evictor, which inspects idle database connections in the pool and destroys them if they are too old.
The following parameters are provided in the pool configuration:
- `acquireTimeoutMillis`: Specifies how long to wait until an existing connection is fetched from the pool or a new connection is established.
- `evictionRunIntervalMillis`: Specifies how often to run eviction checks. In case of 0, the check is not run.
- `min`: Minimum number of database connections to keep in the pool at any given time.
  ::: warning
  This should be kept at the default 0. Otherwise, every eviction run destroys all unused connections older than `idleTimeoutMillis` and afterwards creates new connections until `min` is reached.
  :::
- `max`: Maximum number of database connections to keep in the pool at any given time.
- `numTestsPerEvictionRun`: Number of database connections to be checked with one eviction run.
- `softIdleTimeoutMillis`: Amount of time a database connection may sit idle in the pool before it is eligible for eviction, with the constraint that at least `min` connections stay in the pool. In case of -1, no connection can get evicted.
- `idleTimeoutMillis`: The minimum amount of time that a database connection may sit idle in the pool before it is eligible for eviction due to idle time. This parameter supersedes `softIdleTimeoutMillis`.
- `testOnBorrow`: Should the pool validate the database connections before giving them to the clients?
- `fifo`: If false, the most recently released resources will be the first to be allocated (stack). If true, the oldest resources will be the first to be allocated (queue). Default value: false.
Pool configuration can be adjusted by setting the `pool` option as shown in the following example:
```json
{
  "cds": {
    "requires": {
      "db": {
        "kind": "hana",
        "pool": {
          "acquireTimeoutMillis": 5000,
          "min": 0,
          "max": 100,
          "fifo": true
        }
      }
    }
  }
}
```
::: warning ❗ Warning
The parameters are very specific to the current technical setup, such as the application environment and database location. Even though we provide a default pool configuration, we expect that each application provides its own configuration based on its specific needs.
:::
The main use case of upsert is data replication. Upsert updates existing entity records from the given data or inserts new ones if they don't exist in the database.
`UPSERT` statements can be created with the `UPSERT` query API:
```js
UPSERT.into('db.Books')
  .entries({ ID: 4711, title: 'Wuthering Heights', stock: 100 })
```
`UPSERT` queries are translated into DB-native upsert statements; more specifically, they unfold to an `UPSERT` SQL statement on SAP HANA and to an `INSERT ON CONFLICT` SQL statement on SQLite.
- The upsert data must contain all key elements of the entity.
- If the upsert data is incomplete, only the given values are updated or inserted, which means the `UPSERT` statement has "PATCH semantics".
- `UPSERT` statements don't have a `where` clause. The key values of the entity that is upserted are extracted from the data.
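For instance, continuing the example above, the following sketch omits `stock`, so an existing record keeps its current stock value, while a newly created record simply gets none:

```js
await UPSERT.into('db.Books')
  .entries({ ID: 4711, title: 'Wuthering Heights, 2nd ed.' })
```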
The following actions are not performed on upsert:
- UUID key values are not generated.
- Generic CAP handlers, such as audit logging, are not invoked.
::: warning
In contrast to the Java runtime, deep upserts and delta payloads are not yet supported.
:::
The Node.js DatabaseService core class is implemented by the `@cap-js/db-service` module, with database-specific implementations in `@cap-js/hana`, `@cap-js/sqlite`, and `@cap-js/postgres`, which can be used interchangeably depending on the underlying database.
The core principle of the `@cap-js` database services is "don't look at the data". As the database services are the foundation of all CAP applications, their performance is especially important, and the heaviest work a database service has to do is handling the data.
In CAP applications, all data uses the JSON format. It would be ideal if the databases understood the same format, as this would allow CAP applications to avoid transforming the data between different formats. While SQL doesn't specify how data should be stored by the database implementing the specification, it does provide certain paradigms that require computationally heavy operations, which leads most implementations to pick heavily optimized internal data formats for improved performance. Over time the JSON format has gained wide popularity, and many modern databases now implement it as well, which allows CAP applications to convey their intentions to the database through these JSON APIs, removing the need to transform the data when reading or writing.
It is important to understand the special challenges that come with using JavaScript. As most database protocols use their internal data format to communicate with clients, the JavaScript client has to convert JavaScript types into the database's native binary format.
Probably the most basic data type for all programming languages and databases is the integer. In JavaScript this type doesn't actually exist; it is a subset of the `Number` type. Therefore, when the database driver has to convert a `Number` into an integer, it has to do more work than you might expect. As a real-world reference, the publicly available `hdb` driver is used here, which has implementations for the `int32`, `int64` and `int128` types. Here is a breakdown of the number of objects and operations each type requires before it can be sent to the database:
`int32`
- Objects: 1 (Buffer)
- Operations: 1 (Function)

`int64`
- Objects: 14 (dynamic Numbers), 40 (static Numbers), 1 (String)
- Operations: 90 (operators), 11 (Functions)

`int128`
- Objects: 22 (dynamic Numbers), 80 (static Numbers), 1 (String)
- Operations: 192 (operators), 18 (Functions)
For comparison, when these types are used in a compiled language, no operations are required at all; the value is simply passed as a pointer. The only exception is when the database and client use a different endianness, in which case one of them has to swap the bytes around.
When reading data from the database, the new implementations rely on the database responding in the JSON format, so the CAP application doesn't have to do any post-processing on the response; the OData service can take the result of the DatabaseService as is. This is achieved by using output converters that are baked into the SQL statement, allowing the database to convert its internal format into the OData-specified format of that type. An output converter is a function attached to an element of an entity or query, which enables protocol adapters to generate queries with protocol-specific converters. The output converters are database-specific, so depending on the database's internal data structure a converter might be more or less computationally intensive, or be completely omitted when the database is already OData compliant.
Another big benefit of reading the JSON format directly from the database is the ability to read deeply nested data structures. While SQL only allows a single scalar value to be selected by a subquery, converting a multi-column/row result into a single JSON result makes it possible to create database-native `expand` queries, which the database can optimize much better. In the past, the application had to send multiple requests for the different levels of `expand` queries, or convert flattened to-one `expand` queries.
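A sketch of such a nested read, assuming a `db.Books` entity with an `author` association:

```js
const books = await SELECT.from('db.Books', b => {
  b.ID, b.title, b.author(a => { a.name })
})
// → [{ ID: 1, title: '...', author: { name: '...' } }, ...]
```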
When a CAP application knows that no post-processing is required (e.g. no `after` handlers), it can skip the `JSON.parse` and `JSON.stringify` calls that are otherwise required to grant JavaScript access to the result for manipulation. The ultimate goal is not having to load the whole result into memory at all, but instead streaming the database result chunks directly into the HTTP response, allowing CAP applications to handle much larger datasets while using significantly less memory and CPU.
Using the JSON format for write operations comes with many cascading improvements.
When using the standard way of executing `INSERT` and `UPSERT` queries, the SQL has to match the data structure, which goes against the main principle of the new database services.
A simple example of the impact is as follows:
```js
await INSERT([
  { ID: 1 },
  { ID: 2, name: '' },
  { ID: 3, descr: '' }
])
```
This will actually `prepare` and `execute` the following queries:
```sql
INSERT INTO (ID) VALUES (?)
INSERT INTO (ID,name) VALUES (?,?)
INSERT INTO (ID,descr) VALUES (?,?)
```
Using the JSON format instead allows the database services to `prepare` and `execute` only the following query:
```sql
INSERT INTO (ID,name,descr,...) AS SELECT ID, name, descr,... FROM JSON_EACH(?)
```
The placeholder is provided with the `JSON.stringify` of the provided `entries`, which means the query is executed only once. This reduces the number of network round trips required to do the same amount of work and greatly improves throughput, as multiple rows fit within a single network packet and the transformation to a JSON string is much cheaper than the native transformation steps mentioned in the transform section.
Having a single JSON placeholder also makes it possible to stream a dataset through the CAP application, by taking the `req` object, which natively is a `Readable` stream, and providing it as the `entries` of the `INSERT` statement.
```js
app.post('/upload', async (req, res) => {
  try {
    await INSERT(req).into('entity')
    res.status(201)
    res.end()
  } catch (err) {
    res.status(400)
    res.end(err.message)
  }
})
```
As the JSON is converted into an intermediate table, it is also possible to improve the way `UPSERT` statements are handled. They have all the same benefits as `INSERT` queries, and a bit more:
```sql
UPSERT INTO (ID,name,createdAt,modifiedAt,...) AS
SELECT
  ID,
  new.name ? new.name : old.name, -- only update name when provided
  old.ID ? old.createdAt : $now,  -- only apply @cds.on.insert when no OLD entry exists
  old.ID ? $now : null            -- only apply @cds.on.update when an OLD entry exists
FROM JSON_EACH(?) AS NEW
JOIN OLD ON OLD.ID = NEW.ID
```
For `@cap-js/hana` specifically, there is a JSON optimization that assists SAP HANA in reusing execution plans. As HANA has a very advanced execution plan optimizer, it is very valuable to be able to reuse already existing execution plans. One kind of query was always preventing HANA from reusing existing execution plans, as the SQL query would change based on the data provided:
```js
const IDs = [{val:1},{val:2},...]
cds.ql`SELECT * FROM ${entity} WHERE ID in ${IDs}`
```
This would create a slightly different variant of the following query, depending on the number of `val`s provided:
```sql
SELECT * FROM entity WHERE ID IN (?,?,...)
```
Now this query always produces the same SQL statement, allowing HANA to reuse the existing execution plan:
```sql
SELECT * FROM entity WHERE ID IN (SELECT VAL FROM JSON_TABLE(?,'$' COLUMNS(VAL DOUBLE PATH '$.val')))
```
This documentation is not complete yet, or the APIs are not released for general availability. Stay tuned to upcoming releases for further updates.