---
label: Databases
synopsis: >
  Class <code>cds.DatabaseService</code> and subclasses thereof are technical
  services representing persistent storage.
status: released
---

# Database Services

[[toc]]

## cds.DatabaseService class {#cds-db-service}

```js
class cds.DatabaseService extends cds.Service
```

### srv.begin() → this {#db-begin}

Creates a transaction for the DatabaseService. Is called by `cds.tx` automatically.

### srv.commit() → this {#db-commit}

Commits all the write operations executed on the transaction. Is called by cds.tx automatically.

### srv.rollback() → this {#db-rollback}

Reverts all the write operations executed on the transaction. Is called by cds.tx automatically.

### srv.factory {#db-factory}

The factory property provides an object that implements the pool API, allowing the current DatabaseService to reuse physical database connections and thereby improve overall performance.
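Roughly speaking, "pool API" here means the factory contract of the generic-pool package. An illustrative sketch of the expected shape, not the actual implementation:

```js
// a sketch of the shape only, assuming the generic-pool factory contract;
// the real implementation opens and closes physical database connections
const factory = {
  options: { max: 100, min: 0, testOnBorrow: true },
  create: async () => ({ /* a new physical database connection */ }),
  destroy: async (connection) => { /* close the connection */ },
  validate: async (connection) => true // used when testOnBorrow is true
}
```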

srv.acquire () → this {#db-acquire }

Creates a connection for the current DatabaseService. Is called by begin automatically.

### srv.release() → this {#db-release}

Releases the connection of the current DatabaseService, allowing it to be reused by future acquire calls. Is called by commit and rollback automatically.

### srv.destroy() → this {#db-destroy}

Destroys the connection of the current DatabaseService, completely removing it from the DatabaseService. Must be called when the connection's transaction is in an unknown state.

### srv.disconnect() → this {#db-disconnect}

Disconnects all connections of the current DatabaseService. This is required for databases that are reached over network connections, as those connections are kept alive while inside the pool, which prevents the process from reaching the idle state required for a graceful shutdown. Is called by `cds.on('shutdown')` automatically.

### srv.infer() → this {#db-infer}

Uses the current DatabaseService's model to infer a provided cds.ql.Query. It is best to use `cds.infer` instead. (deprecated?)

### srv.set() → this {#db-set}

Sets the provided key-value pairs as variables for the current connection. The values can be accessed by using the `session_context('<key>')` function inside any query. Is called by begin automatically.

### srv.run() → this {#db-run}

Runs the provided cds.ql.Query using the current DatabaseService.
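In everyday code these methods are rarely called directly; `cds.tx` drives them. A minimal usage sketch, assuming a bootstrapped CAP app with a Books entity:

```js
const cds = require('@sap/cds')
const { INSERT, SELECT } = cds.ql

const db = await cds.connect.to('db')

// db.tx() calls begin(), and commit() or rollback(), for you
await db.tx(async tx => {
  await tx.run(INSERT.into('Books').entries({ ID: 4711, title: 'Wuthering Heights' }))
  const [book] = await tx.run(SELECT.from('Books').where({ ID: 4711 }))
})
```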

## cds.DatabaseService — Consumption {#databaseservice-consumption}

### InsertResult (Beta)

- On INSERT, DatabaseServices return an instance of InsertResult defined as follows:
  - Iterator that returns the keys of the created entries, for example:
    - Example: `[...result]` → `[{ ID: 1 }, { ID: 2 }, ...]`
    - In case of `INSERT...as(SELECT...)`, the iterator returns `{}` for each row.
  - `affectedRows`: the number of inserted (root) entries, or the number of affected rows in case of `INSERT` into `SELECT`.
  - `valueOf()`: returns `affectedRows`, so that comparisons like `result > 0` can be used.
    ::: tip
    `===` can't be used, as it also compares the type.
    :::
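A short usage sketch, assuming a Books entity with key ID:

```js
const result = await INSERT.into('Books').entries([{ ID: 1 }, { ID: 2 }])

const keys = [...result]         // → [{ ID: 1 }, { ID: 2 }]
console.log(result.affectedRows) // → 2
if (result > 0) {                // works, as valueOf() returns affectedRows
  // note: result === 2 would be false, since === also compares the type
}
```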

## cds.DatabaseService — Configuration {#databaseservice-configuration}

### Pool

Instead of opening and closing a database connection for every request, we use a pool to reuse connections. By default, the following pool configuration is used:

```jsonc
{
  "acquireTimeoutMillis": <if (NODE_ENV='production') 1000 else 10000>,
  "evictionRunIntervalMillis": <2 * (idleTimeoutMillis || softIdleTimeoutMillis || 30000)>,
  "min": 0,
  "max": 100,
  "numTestsPerEvictionRun": <(max - min) / 3>,
  "softIdleTimeoutMillis": 30000,
  "idleTimeoutMillis": 30000,
  "testOnBorrow": true,
  "fifo": false
}
```

::: warning
This default pool configuration does not apply to `@cap-js` database implementations.
:::

The generic-pool has a built-in pool evictor, which inspects idle database connections in the pool and destroys them if they are too old.

The following parameters are provided in the pool configuration:

- `acquireTimeoutMillis`: Specifies how long to wait until an existing connection is fetched from the pool or a new connection is established.

- `evictionRunIntervalMillis`: Specifies how often to run eviction checks. In case of 0, the check is not run.

- `min`: Minimum number of database connections to keep in the pool at any given time.
  ::: warning
  This should be kept at the default 0. Otherwise every eviction run destroys all unused connections older than `idleTimeoutMillis` and afterwards creates new connections until `min` is reached.
  :::

- `max`: Maximum number of database connections to keep in the pool at any given time.

- `numTestsPerEvictionRun`: Number of database connections to be checked with one eviction run.

- `softIdleTimeoutMillis`: Amount of time a database connection may sit idle in the pool before it is eligible for eviction, with the condition that at least `min` connections stay in the pool. In case of -1, no connection can get evicted.

- `idleTimeoutMillis`: The minimum amount of time that a database connection may sit idle in the pool before it is eligible for eviction due to idle time. This parameter supersedes `softIdleTimeoutMillis`.

- `testOnBorrow`: Whether the pool should validate database connections before handing them to clients.

- `fifo`: If false, the most recently released resources will be the first to be allocated (stack). If true, the oldest resources will be the first to be allocated (queue). Default value: false.
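With the defaults above, for example, `evictionRunIntervalMillis` resolves to `2 * 30000 = 60000` and `numTestsPerEvictionRun` to `(100 - 0) / 3 ≈ 33`.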

Pool configuration can be adjusted by setting the pool option as shown in the following example:

```json
{
  "cds": {
    "requires": {
      "db": {
        "kind": "hana",
        "pool": {
          "acquireTimeoutMillis": 5000,
          "min": 0,
          "max": 100,
          "fifo": true
        }
      }
    }
  }
}
```

::: warning ❗ Warning
The parameters are very specific to the current technical setup, such as the application environment and database location. Even though we provide a default pool configuration, we expect that each application provides its own configuration based on its specific needs.
:::
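As a sketch of an alternative, the same options could also be set programmatically via `cds.env`, for example in a custom server.js, assuming this runs before the database service is connected:

```js
// hypothetical programmatic variant of the "pool" configuration shown above;
// assumes a configured db service and that this code runs before cds.connect.to('db')
const cds = require('@sap/cds')
cds.env.requires.db.pool = { acquireTimeoutMillis: 5000, min: 0, max: 100, fifo: true }
```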

## cds.DatabaseService — UPSERT {#databaseservice-upsert}

The main use case of upsert is data replication. Upsert updates existing entity records from the given data or inserts new ones if they don't exist in the database.

UPSERT statements can be created with the UPSERT query API:

```js
UPSERT.into('db.Books')
  .entries({ ID: 4711, title: 'Wuthering Heights', stock: 100 })
```

UPSERT queries are translated into DB-native upsert statements; more specifically, they unfold to an UPSERT SQL statement on SAP HANA and to an INSERT ON CONFLICT SQL statement on SQLite.

- The upsert data must contain all key elements of the entity.
- If upsert data is incomplete, only the given values are updated or inserted, which means the UPSERT statement has "PATCH semantics" (see the sketch after this list).
- UPSERT statements don't have a where clause. The key values of the entity that is upserted are extracted from the data.
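A short sketch of these semantics, assuming a db.Books entity with key ID and an existing record 4711:

```js
// existing row: { ID: 4711, title: 'Wuthering Heights', stock: 100 }
await UPSERT.into('db.Books').entries({ ID: 4711, stock: 77 })
// → only stock is updated; title keeps its current value (PATCH semantics)

await UPSERT.into('db.Books').entries({ ID: 4712, title: 'Jane Eyre' })
// → no row with ID 4712 exists yet, so a new one is inserted
```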

The following actions are not performed on upsert:

- UUID key values are not generated.
- Generic CAP handlers, such as audit logging, are not invoked.

::: warning
In contrast to the Java runtime, deep upserts and delta payloads are not yet supported.
:::

## @cap-js/db-service

The Node.js DatabaseService core class is implemented by the @cap-js/db-service module, with database-specific implementations in @cap-js/hana, @cap-js/sqlite, and @cap-js/postgres, which can be used interchangeably depending on the underlying database.

### Architecture

The core principle of the @cap-js database services is "don't look at the data". As the database services are the foundation of all CAP applications, their performance is especially important, and the heaviest work a database service has to do is handling the data.

#### JSON

In CAP applications all data uses the JSON format, so it would be ideal if the databases understood the same format, as that would spare CAP applications from transforming the data between different formats. While SQL doesn't specify how data should be stored by the database implementing the specification, it does impose certain paradigms which require computationally heavy operations, so most implementations pick heavily optimized internal data formats that allow for improved performance. Over time the JSON format has gained wide popularity, and many modern databases now implement it, which allows CAP applications to convey their intentions to the database through these JSON APIs, removing the need to transform the data when reading or writing.

#### Transform {#databaseservice-architecture-transform}

It is important to understand the special challenges that come with using JavaScript. As most database protocols use their internal data format to communicate with clients, the JavaScript client has to convert JavaScript types into the database's native binary format.

Probably the most basic data type for all programming languages and databases is the integer. For JavaScript this type doesn't actually exist; it is a subset of the Number type. Therefore, when the database driver has to convert the Number type into an integer, it has to do more work than you might expect. As a real reference, the publicly available hdb driver is used here, which has implementations for the int32, int64, and int128 types. Here is a breakdown of the number of objects and operations each type requires before it can be sent to the database:

| Type   | Objects                                                | Operations                      |
|--------|--------------------------------------------------------|---------------------------------|
| int32  | 1 (Buffer)                                             | 1 (function)                    |
| int64  | 14 (dynamic Numbers), 40 (static Numbers), 1 (String)  | 90 (operators), 11 (functions)  |
| int128 | 22 (dynamic Numbers), 80 (static Numbers), 1 (String)  | 192 (operators), 18 (functions) |

For comparison, when these types are used in a compiled language, no operations are required at all; the value is simply passed as a pointer. The only exception is when the database and client use a different endianness, in which case one of them has to swap the bytes around.
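To make this tangible, the following is a simplified sketch (not the hdb implementation) of what writing a JavaScript Number as an 8-byte little-endian integer involves; since JavaScript Numbers are IEEE-754 doubles, the value first has to be split into two 32-bit halves:

```js
// simplified sketch, not the actual hdb code: a JS Number has no native
// 64-bit integer representation, so it is split into two 32-bit halves
function writeInt64LE (value) {
  const buffer = Buffer.alloc(8)
  const low = value >>> 0                       // lower 32 bits
  const high = Math.floor(value / 0x100000000)  // upper 32 bits (sign-aware)
  buffer.writeUInt32LE(low, 0)
  buffer.writeInt32LE(high, 4)
  return buffer
}

writeInt64LE(42) // <Buffer 2a 00 00 00 00 00 00 00>
```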

#### Read

When reading data from the database, the new implementations rely on the database responding in the JSON format, allowing the CAP application to skip any postprocessing of the response; the ODataService can take the result of the DatabaseService as is. This is achieved by using output converters which are baked into the SQL statement, allowing the database to convert its internal format into the OData-specified format of that type. An output converter is a function attached to an element of an entity or query, which enables protocol adapters to generate queries with protocol-specific converters. Output converters are database specific, so depending on the database's internal data structure the converter might be more or less computationally intensive, or be completely omitted when the database is OData compliant.

Another big benefit of reading JSON directly from the database is the ability to read deeply nested data structures. While SQL only allows a single scalar value to be selected by subqueries, converting a multi-column / multi-row result into a single JSON result makes it possible to create database-native expand queries, which the database can optimize much better. In the past, the application had to send multiple requests for the different levels of expand queries, or convert flattened to-one expand queries.
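For illustration, a query with an expand like the following (a sketch assuming a Books entity with an author association) unfolds into a single SQL statement whose nested result is returned as JSON:

```js
// one database round trip; the to-one expand is resolved inside the database
const books = await SELECT.from('Books', b => {
  b.ID, b.title,
  b.author(a => { a.name })
})
```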

When a CAP application knows that no post-processing is required (for example, no after handlers), it can skip the JSON.parse and JSON.stringify steps that would otherwise be required to grant JavaScript access to the result for manipulation. The ultimate goal is to not load the whole result into memory at all, but instead stream the database result chunks directly into the HTTP response, allowing CAP applications to handle much larger data sets while using significantly less memory and CPU.

#### Write

Using the JSON format for writing operations comes with many cascading improvements.

When using the standard way of executing INSERT and UPSERT queries, the SQL has to match the data structure, which goes against the main principle of the new database services.

A simple example of the impact is as follows:

```js
await INSERT([
  { ID: 1 },
  { ID: 2, name: '' },
  { ID: 3, descr: '' }
])
```

This will actually prepare and execute the following queries:

```sql
INSERT INTO (ID) VALUES (?)
INSERT INTO (ID,name) VALUES (?,?)
INSERT INTO (ID,descr) VALUES (?,?)
```

Using the JSON format instead allows the database services to prepare and execute only the following query:

```sql
INSERT INTO (ID,name,descr,...) AS SELECT ID, name, descr,... FROM JSON_EACH(?)
```

The placeholder is provided with the JSON.stringify of the provided entries, which means that the query is only executed once. This reduces the number of network round trips required to do the same amount of work and greatly improves throughput, as multiple rows fit within a single network packet and the transformation to a JSON string is much cheaper than the native transformation steps mentioned in the transform section.
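Conceptually, the three heterogeneous rows from the example above travel as a single parameter:

```js
// the entries serialized into the single JSON placeholder
const entries = [{ ID: 1 }, { ID: 2, name: '' }, { ID: 3, descr: '' }]
const parameter = JSON.stringify(entries)
// → '[{"ID":1},{"ID":2,"name":""},{"ID":3,"descr":""}]'
// one prepared statement, one execution, regardless of the number of rows
```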

Having a single JSON placeholder also makes it possible to stream a dataset through the CAP application, by taking the req object, which natively is a Readable stream, and providing it as the entries of the INSERT statement.

```js
app.post('/upload', async (req, res) => {
  try {
    await INSERT(req).into('entity')
    res.status(201)
    res.end()
  } catch (err) {
    res.status(400)
    res.end(err.message)
  }
})
```

As the JSON is converted into an intermediate table, it is also possible to improve the way UPSERT statements are handled. They get all the same benefits as INSERT queries, and a bit more.

```sql
UPSERT INTO (ID,name,createdAt,modifiedAt,...) AS
SELECT
  ID,
  new.name ? new.name : old.name, -- only update name when provided
  old.ID ? old.createdAt : $now,  -- only apply @cds.on.insert when no OLD entry exists
  old.ID ? $now : null            -- only apply @cds.on.update when an OLD entry exists
FROM JSON_EACH(?) AS NEW
JOIN OLD ON OLD.ID = NEW.ID
```

#### Match

For @cap-js/hana specifically, there is a JSON optimization that assists HANA in re-using execution plans. As HANA has a very advanced execution plan optimizer, it is very valuable to be able to re-use already existing execution plans. One kind of query was always preventing HANA from doing so, as the SQL statement would change based on the data provided:

```js
const IDs = [{ val: 1 }, { val: 2 }, ...]
cds.ql`SELECT * FROM ${entity} WHERE ID in ${IDs}`
```

This would create a slightly different variant of the following query, depending on the number of vals provided:

```sql
SELECT * FROM entity WHERE ID IN (?,?,...)
```

Now this query always produces the same SQL statement, allowing HANA to use the existing execution plan:

```sql
SELECT * FROM entity WHERE ID IN (SELECT VAL FROM JSON_TABLE(?,'$' COLUMNS(VAL DOUBLE PATH '$.val')))
```

## More to Come

This documentation is not complete yet, or the APIs are not released for general availability. Stay tuned to upcoming releases for further updates.