# Virtual Schema API Documentation
In the Exasol database, a Virtual Schema adapter script is similar to a UDF script. The adapter script defines a callback function and the Exasol core calls this function in different situations, for example when creating a Virtual Schema or during query processing. In order to communicate with the Virtual Schema adapter script, the Exasol database passes a JSON parameter containing the Virtual Schema API request to the callback function which then returns a JSON string containing the response. This page documents these JSON messages.
To get a better understanding, let's take a look at how the Exasol database processes a Virtual Schema query:

- When the core receives an SQL query on a Virtual Schema table, it first checks the capabilities of the corresponding Virtual Schema adapter. Based on that information it removes all functions and literals that are not supported by the adapter.
- Next, the Exasol core sends the query to the Virtual Schema adapter as a `PushdownRequest`.
- The Virtual Schema adapter now rewrites the query into a new SQL statement that typically invokes the Exasol importer (`IMPORT INTO ...`). For details see the `IMPORT` statement documentation. The importer statement contains the query to the external database as a string.
- Next, the Exasol database parses this statement again and invokes the importer.
- Finally, the Exasol core applies the functions that were not supported by the remote database as post-processing and returns the result to the SQL client.
Instead of the `IMPORT` statement, the adapter can also create other SQL statements. The important part is that the Virtual Schema adapter receives an SQL statement and rewrites it into another SQL statement. For example, it can create a `SELECT ... FROM VALUES` statement with an inlined result.
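The request/response cycle described above boils down to a dispatch on the `type` field of the incoming JSON document. The following sketch is illustrative only: the actual entry point depends on the adapter's UDF language, and the function and handler names are hypothetical; only the JSON shapes are prescribed by the API.

```python
import json

def handle_drop(request):
    # The drop response carries no payload beyond its type.
    return {"type": "dropVirtualSchema"}

def handle_get_capabilities(request):
    # Illustrative minimal capability set.
    return {"type": "getCapabilities", "capabilities": ["SELECTLIST_PROJECTION"]}

def adapter_call(request_json: str) -> str:
    """Hypothetical adapter callback: parse the request, dispatch on its
    type, and serialize the response back to a JSON string."""
    request = json.loads(request_json)
    handlers = {
        "dropVirtualSchema": handle_drop,
        "getCapabilities": handle_get_capabilities,
        # ... createVirtualSchema, refresh, setProperties, pushdown
    }
    response = handlers[request["type"]](request)
    return json.dumps(response)
```

A real adapter would register one handler per request type listed in the table below.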
There are the following request and response types:
| Type | Called ... |
|---|---|
| Create Virtual Schema | … for each `CREATE VIRTUAL SCHEMA ...` statement. |
| Refresh | … for each `ALTER VIRTUAL SCHEMA ... REFRESH ...` statement. |
| Set Properties | … for each `ALTER VIRTUAL SCHEMA ... SET ...` statement. |
| Drop Virtual Schema | … for each `DROP VIRTUAL SCHEMA ...` statement. |
| Get Capabilities | … whenever a virtual table is queried in a `SELECT` statement. |
| Pushdown | … whenever a virtual table is queried in a `SELECT` statement. |
We describe each of the types in the following sections.
Please note: To keep the documentation concise we define the commonly used elements in separate sections below, e.g. `schemaMetadataInfo` and `schemaMetadata`.
## Create Virtual Schema

Informs the Adapter about the request to create a Virtual Schema and asks the Adapter for the metadata (tables and columns). The Adapter is allowed to throw an exception if the user failed to provide mandatory properties or in case of any other problems (e.g. connectivity).
Request:
{
"type": "createVirtualSchema",
"schemaMetadataInfo": {
...
}
}
Response:
{
"type": "createVirtualSchema",
"schemaMetadata": {
...
}
}
Notes:

- `schemaMetadata` is mandatory. However, it is allowed to contain no tables.
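A minimal response builder for this request type might look as follows. The helper name is hypothetical; the JSON shape follows the example above, including the rule that `schemaMetadata` is mandatory but may contain an empty table list.

```python
def build_create_response(tables=None):
    """Build a createVirtualSchema response (hypothetical helper).
    "schemaMetadata" must be present, but "tables" may be empty."""
    return {
        "type": "createVirtualSchema",
        "schemaMetadata": {"tables": tables or []},
    }
```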
## Refresh

Request to refresh the metadata for the whole Virtual Schema or for specified tables.
Request:
{
"type": "refresh",
"schemaMetadataInfo": {
...
},
"requestedTables": ["T1", "T2"]
}
Notes:

- `requestedTables` is optional. If present, only the specified tables shall be refreshed. The specified tables do not have to exist; the element just tells the Adapter to update these tables (which might have been changed, deleted, added, or may not exist).
Response:
{
"type": "refresh",
"schemaMetadata": {
...
},
"requestedTables": ["T1", "T2"]
}
Notes:

- `schemaMetadata` is optional. It can be skipped if the adapter does not want to refresh (e.g. because it detected that there is no change).
- `requestedTables` must exist if and only if the element existed in the request. The values must be the same as in the request (to make sure that the Adapter only refreshed these tables).
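The echo rule for `requestedTables` can be sketched like this (a hypothetical helper, assuming the refreshed metadata has been computed elsewhere):

```python
def handle_refresh(request, refreshed_metadata):
    """Build a refresh response. "requestedTables" is echoed back
    if and only if it was present in the request."""
    response = {"type": "refresh", "schemaMetadata": refreshed_metadata}
    if "requestedTables" in request:
        response["requestedTables"] = request["requestedTables"]
    return response
```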
## Set Properties

Request to set and unset properties. The Adapter can decide whether it needs to send back new metadata. The Adapter is allowed to throw an exception if the user provided invalid properties or in case of any other problems (e.g. connectivity).
Request:
{
"type": "setProperties",
"properties": {
"JDBC_CONNECTION_STRING": "new-jdbc-connection-string",
"NEW_PROPERTY": "value of a not yet existing property",
"UNSET_PROPERTY": null
},
"schemaMetadataInfo": {
...
"properties": {
"JDBC_CONNECTION_STRING": "current-jdbc-connection-string",
"THIS_REMAINS_UNCHANGED": "not listed in the section above, so the value remains as it is"
}
}
}
The new values appear in a separate object `/properties` directly in the root element. For reference, the current property values are still reported under `/schemaMetadataInfo/properties`. Adapters must apply the properties incrementally, meaning that all properties not explicitly listed in `/properties` remain unchanged.
Response:
{
"type": "setProperties",
"schemaMetadata": {
...
}
}
Notes:

- Request: A property set to `null` means that this property is requested to be deleted. Properties set to `null` might also not have existed before.
- Response: `schemaMetadata` is optional. It only exists if the adapter wants to send back new metadata. The existing metadata are overwritten completely.
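The incremental property semantics can be sketched as a small merge function. This is a hypothetical helper, not part of the API; the protocol only defines the JSON messages.

```python
def merge_properties(current: dict, changes: dict) -> dict:
    """Apply setProperties changes incrementally: values in `changes`
    override `current`, a None (JSON null) value deletes the property,
    and properties not mentioned remain unchanged."""
    merged = dict(current)
    for key, value in changes.items():
        if value is None:
            merged.pop(key, None)  # null means: delete this property
        else:
            merged[key] = value
    return merged
```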
## Drop Virtual Schema

Informs the Adapter that a Virtual Schema is about to be dropped. The Adapter can update external dependencies if it has any. The Adapter is not expected to throw an exception; if it does, the exception is ignored.
Request:
{
"type": "dropVirtualSchema",
"schemaMetadataInfo": {
...
}
}
Response:
{
"type": "dropVirtualSchema"
}
## Get Capabilities

Requests the list of capabilities supported by the Adapter. Based on these capabilities, the database collects everything that can be pushed down in the current query and sends a pushdown request afterwards.
Request:
{
"type": "getCapabilities",
"schemaMetadataInfo": {
...
}
}
Response:
{
"type": "getCapabilities",
"capabilities": [
"ORDER_BY_COLUMN",
"AGGREGATE_SINGLE_GROUP",
"LIMIT",
"AGGREGATE_GROUP_BY_TUPLE",
"FILTER_EXPRESSIONS",
"SELECTLIST_EXPRESSIONS",
"SELECTLIST_PROJECTION",
"AGGREGATE_HAVING",
"ORDER_BY_EXPRESSION",
"AGGREGATE_GROUP_BY_EXPRESSION",
"LIMIT_WITH_OFFSET",
"AGGREGATE_GROUP_BY_COLUMN",
"FN_PRED_LESSEQUALS",
"FN_AGG_COUNT",
"LITERAL_EXACTNUMERIC",
"LITERAL_DATE",
"LITERAL_INTERVAL",
"LITERAL_TIMESTAMP_UTC",
"LITERAL_TIMESTAMP",
"LITERAL_NULL",
"LITERAL_STRING",
"LITERAL_DOUBLE",
"LITERAL_BOOL"
]
}
The set of capabilities in the example above would be sufficient to push down all aspects of the following query:
SELECT user_id, COUNT(url)
FROM vs.clicks
WHERE user_id>1
GROUP BY user_id
HAVING count(url)>1
ORDER BY user_id
LIMIT 10;
See also List of supported Capabilities.
| Capability | Prefix | Java Implementation |
|---|---|---|
| Main Capabilities | (none) | MainCapability.java |
| Literal Capabilities | `LITERAL_` | LiteralCapability.java |
| Predicate Capabilities | `FN_PRED_` | PredicateCapability.java |
| Scalar Function Capabilities | `FN_` | ScalarFunctionCapability.java |
| Aggregate Function Capabilities | `FN_AGG_` | AggregateFunctionCapability.java |
There are capabilities that come in groups, sharing the same group prefix. An example is the `FN_AGG_COUNT` group. In this case there is a base capability that needs to be set as a precondition for setting any other capability in that group. As an example, to allow push-down of `SELECT COUNT(*)` the Virtual Schema adapter needs to report `FN_AGG_COUNT` + `FN_AGG_COUNT_STAR`. Similarly, a left join only works if the VS adapter reports `JOIN` + `JOIN_TYPE_LEFT_OUTER`. If the group has more members, you can combine all of them, but the base capability is the key to enabling any other capability in that group.
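A sketch of a consistency check for such capability groups. The mapping covers only the two groups mentioned above, and the helper name is hypothetical:

```python
# Maps a group member capability to its required base capability.
# Only the two groups discussed above are listed here as examples.
GROUP_BASES = {
    "FN_AGG_COUNT_STAR": "FN_AGG_COUNT",
    "JOIN_TYPE_LEFT_OUTER": "JOIN",
}

def missing_bases(capabilities):
    """Return the base capabilities that are required by reported
    group members but missing from the reported set."""
    caps = set(capabilities)
    return sorted(
        base for cap, base in GROUP_BASES.items()
        if cap in caps and base not in caps
    )
```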
## Pushdown

Contains an abstract specification of what is to be pushed down, and requests a pushdown SQL statement from the Adapter which can be used to retrieve the requested data.
Request:
Running the following query
SELECT user_id, COUNT(url)
FROM vs.clicks
WHERE user_id>1
GROUP BY user_id
HAVING count(url)>1
ORDER BY user_id
LIMIT 10;
will produce the following request, assuming that the Adapter has all required capabilities:
{
"type": "pushdown",
"pushdownRequest": {
"type" : "select",
"aggregationType" : "group_by",
"from" :
{
"type" : "table",
"name" : "CLICKS"
},
"selectList" :
[
{
"type" : "column",
"name" : "USER_ID",
"columnNr" : 1,
"tableName" : "CLICKS"
},
{
"type" : "function_aggregate",
"name" : "count",
"arguments" :
[
{
"type" : "column",
"name" : "URL",
"columnNr" : 2,
"tableName" : "CLICKS"
}
]
}
],
"selectListDataTypes" :
[
{
"precision" : 18,
"scale" : 0,
"type" : "DECIMAL"
},
{
"precision" : 10,
"scale" : 0,
"type" : "DECIMAL"
}
],
"filter" :
{
"type" : "predicate_less",
"left" :
{
"type" : "literal_exactnumeric",
"value" : "1"
},
"right" :
{
"type" : "column",
"name" : "USER_ID",
"columnNr" : 1,
"tableName" : "CLICKS"
}
},
"groupBy" :
[
{
"type" : "column",
"name" : "USER_ID",
"columnNr" : 1,
"tableName" : "CLICKS"
}
],
"having" :
{
"type" : "predicate_less",
"left" :
{
"type" : "literal_exactnumeric",
"value" : "1"
},
"right" :
{
"type" : "function_aggregate",
"name" : "count",
"arguments" :
[
{
"type" : "column",
"name" : "URL",
"columnNr" : 2,
"tableName" : "CLICKS"
}
]
}
},
"orderBy" :
[
{
"type" : "order_by_element",
"expression" :
{
"type" : "column",
"columnNr" : 1,
"name" : "USER_ID",
"tableName" : "CLICKS"
},
"isAscending" : true,
"nullsLast" : true
}
],
"limit" :
{
"numElements" : 10
}
},
"involvedTables": [
{
"name" : "CLICKS",
"columns" :
[
{
"name" : "ID",
"dataType" :
{
"type" : "DECIMAL",
"precision" : 18,
"scale" : 0
}
},
{
"name" : "USER_ID",
"dataType" :
{
"type" : "DECIMAL",
"precision" : 18,
"scale" : 0
}
},
{
"name" : "URL",
"dataType" :
{
"type" : "VARCHAR",
"size" : 1000
}
},
{
"name" : "REQUEST_TIME",
"dataType" :
{
"type" : "TIMESTAMP"
}
}
]
}
],
"schemaMetadataInfo": {
...
}
}
Notes:

- `pushdownRequest`: Specification of what needs to be pushed down. You can think of it as a parsed SQL statement.
  - `from`: The requested from clause. This can be a table or a join.
  - `selectList`: The requested select list. There are three options for this field:
    - `selectList` is not given: This means `SELECT *`. Adapters requiring an explicit select list can use the field `from` to get the names and aliases of the tables (depth-first search on joins) and the field `involvedTables` to get the columns for each table.
    - `selectList` is an empty array: Select any column/expression. This is used, for example, if a query cannot be pushed down completely. The adapter may choose something like `SELECT TRUE` to get the correct number of rows.
    - Otherwise, `selectList` contains the requested select list elements, a list of expressions. The order of the elements matters.
  - `selectListDataTypes`: List of data types expected in the result set. This information is only provided by Exasol databases with version 8.6.0 and higher, or major version 7 with version 7.1.14 and higher; see the Exasol Data Types API Documentation. The data types are retrieved and sent by the database's compiler from the virtual schema's virtual table definitions. These definitions are stored when creating or refreshing the virtual schema.
  - `filter`: The requested filter (`WHERE` clause), a single expression.
  - `aggregationType`: An optional element, set if an aggregation is requested. Either `group_by` or `single_group` (if an aggregate function is used but there is no group by).
  - `groupBy`: The requested group by clause, a list of expressions.
  - `having`: The requested having clause, a single expression.
  - `orderBy`: The requested order-by clause, a list of `order_by_element` elements.
  - `limit`: The requested limit of the result set, with an optional offset.
- `involvedTables`: Metadata of the involved tables, encoded like in `schemaMetadata`.
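To illustrate how an adapter might consume this structure, here is a sketch that renders a tiny subset of the expression tree back to SQL text. The helper is hypothetical; a real adapter must handle every expression type for which it declared capabilities.

```python
def render(expr: dict) -> str:
    """Render a minimal subset of pushdownRequest expressions to SQL:
    columns, exact-numeric literals, and the predicate_less predicate."""
    expr_type = expr["type"]
    if expr_type == "column":
        return '"%s"' % expr["name"]
    if expr_type == "literal_exactnumeric":
        return expr["value"]
    if expr_type == "predicate_less":
        return "%s < %s" % (render(expr["left"]), render(expr["right"]))
    raise ValueError("unsupported expression type: " + expr_type)
```

Applied to the `filter` element of the example request, this produces `1 < "USER_ID"`.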
Response:
Following the example above, a valid result could look like this:
{
"type": "pushdown",
"sql": "IMPORT FROM JDBC AT 'jdbc:exa:remote-db:8563;schema=native' USER 'sys' IDENTIFIED BY 'exasol' STATEMENT 'SELECT USER_ID, count(URL) FROM NATIVE.CLICKS WHERE 1 < USER_ID GROUP BY USER_ID HAVING 1 < count(URL) ORDER BY USER_ID LIMIT 10'"
}
Notes:

- `sql`: The pushdown SQL statement. It must be either a `SELECT` or an `IMPORT` statement.
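One practical detail when generating an `IMPORT` statement: the remote query is embedded as an SQL string literal in the `STATEMENT` clause, so single quotes inside it must be doubled. A sketch, with a placeholder connection definition:

```python
def build_import(connection_sql: str, remote_query: str) -> str:
    """Embed a remote query into an IMPORT statement (hypothetical helper).
    Single quotes in the query are doubled because the query appears
    inside an SQL string literal."""
    escaped = remote_query.replace("'", "''")
    return "IMPORT FROM JDBC AT %s STATEMENT '%s'" % (connection_sql, escaped)
```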
## Embedded Commonly Used JSON Elements

The following JSON objects can be embedded in a request or response. They have a fixed structure.

### Schema Metadata Info

This document contains the most important metadata of the Virtual Schema and is sent to the adapter just "for information" with each request. It is the value of an element called `schemaMetadataInfo`.
{
"schemaMetadataInfo": {
"name": "MY_HIVE_VSCHEMA",
"adapterNotes": "<serialized adapter state>",
"properties": {
"HIVE_SERVER": "my-hive-server",
"HIVE_DB": "my-hive-db",
"HIVE_USER": "my-hive-user"
}
}
}
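An adapter typically reads its configuration from this element. Here is a sketch of a hypothetical helper that fails early on missing mandatory properties; the property names below are taken from the example above.

```python
def get_mandatory_property(schema_metadata_info: dict, name: str) -> str:
    """Fetch a mandatory user-supplied property from schemaMetadataInfo,
    raising a descriptive error if it is missing (hypothetical helper)."""
    properties = schema_metadata_info.get("properties", {})
    if name not in properties:
        raise ValueError("mandatory property %s is missing" % name)
    return properties[name]
```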
### Schema Metadata

This document is usually embedded in responses from the Adapter and informs the database about all metadata of the Virtual Schema, especially the contained virtual tables and their columns.

The Adapter can optionally store so-called `adapterNotes` on each level (schema, table, column) to remember information which might be relevant for the Adapter in the future. Adapter notes are simple strings. You can of course serialize objects into those strings, but keep in mind that the strings are embedded inside the Virtual Schema JSON protocol, which makes quoting of conflicting characters necessary.
Some options to deal with the embedding issue:

- after serialization, use Base64 encoding, or
- use a serialization that does not have conflicting characters, like a simple CSV or key-value format, or
- quote conflicting characters.

Which variant you should choose depends on considerations like the amount of data to be transmitted, the original data format, and the encoding overhead.
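The first option can be sketched as follows, assuming the adapter state is itself JSON-serializable. Base64 output never contains quotes or backslashes, so it embeds safely in the surrounding protocol:

```python
import base64
import json

def encode_adapter_notes(state: dict) -> str:
    """Serialize adapter state to JSON, then Base64-encode it so the
    result cannot conflict with the quoting of the outer JSON protocol."""
    return base64.b64encode(json.dumps(state).encode("utf-8")).decode("ascii")

def decode_adapter_notes(notes: str) -> dict:
    """Reverse of encode_adapter_notes."""
    return json.loads(base64.b64decode(notes).decode("utf-8"))
```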
In the example below, the Adapter remembers the table partitioning and the data type of a column which is not directly supported in Exasol. The Adapter has this information during push-down and can consider the table partitioning or add an appropriate cast for the column.

This example also demonstrates serialization in adapter notes via key-value encoding. As mentioned above, more sophisticated serializations are possible as long as you make sure the adapter notes are a valid string in the JSON format by encoding or quoting.
{"schemaMetadata":{
"adapterNotes": "lastRefreshed=2015-03-01 12:10:01;anotherKey=More custom schema state here",
"tables": [
{
"type": "table",
"name": "EXASOL_CUSTOMERS",
"adapterNotes": "hivePartitionColumns=CREATED,COUNTRY_ISO",
"columns": [
{
"name": "ID",
"dataType": {
"type": "DECIMAL",
"precision": 18,
"scale": 0
},
"isIdentity": true
},
{
"name": "COMPANY_NAME",
"dataType": {
"type": "VARCHAR",
"size": 1000,
"characterSet": "UTF8"
},
"default": "foo",
"isNullable": false,
"comment": "The official name of the company",
"adapterNotes": "hiveDataType=List<String>"
},
{
"name": "DISCOUNT_RATE",
"dataType": {
"type": "DOUBLE"
}
}
]
},
{
"type": "table",
"name": "TABLE_2",
"columns": [
{
"name": "COL1",
"dataType": {
"type": "DECIMAL",
"precision": 18,
"scale": 0
}
},
{
"name": "COL2",
"dataType": {
"type": "VARCHAR",
"size": 1000
}
}
]
}
]
}}
## Expressions

This section describes the expressions that can occur in a pushdown request. Expressions are consistently encoded in the following way, which allows easy and consistent parsing and serialization:
{
"type": "<type-of-expression>",
...
}
Each expression type can have any number of additional fields of arbitrary type. In the following sections we define the known expressions.
### Table

This element currently only occurs in the `from` clause.
{
"type": "table",
"name": "CLICKS",
"alias": "A"
}
Notes:

- `alias`: This is an optional property and is added if the table has an alias in the original query.
### Join

This element currently only occurs in the `from` clause.
{
"type": "join",
"join_type": "inner",
"left": {
...
},
"right" : {
...
},
"condition" : {
...
}
}
Notes:

- `join_type`: Can be `inner`, `left_outer`, `right_outer` or `full_outer`.
- `left`: This can be a `table` or a `join`.
- `right`: This can be a `table` or a `join`.
- `condition`: This can be an arbitrary expression.
### Column Lookup

A column lookup is a reference to a table column. It can reference the table directly or via an alias.
{
"type": "column",
"tableName": "T",
"tableAlias": "A",
"columnNr": 0,
"name": "ID"
}
Notes:

- `tableAlias`: This is an optional property and is added if the referenced table has an alias.
- `columnNr`: Column number in the virtual table, starting with 0.
{
"type": "order_by_element",
"expression": {
...
},
"isAscending": true,
"nullsLast": true
}
Notes:

- The field `expression` contains the expression to order by.
Refer to the Exasol Data Types API Documentation
{
"type": "literal_null"
}
{
"type": "literal_string",
"value": "my string"
}
{
"type": "literal_double",
"value": "1.2345000000000000e+02"
}
{
"type": "literal_exactnumeric",
"value": "123.45"
}
{
"type": "literal_bool",
"value": true
}
{
"type": "literal_date",
"value": "2015-12-01"
}
{
"type": "literal_timestamp",
"value": "2015-12-01 12:01:01.1234"
}
{
"type": "literal_timestamputc",
"value": "2015-12-01 12:01:01.1234"
}
{
"type": "literal_interval",
"value": "+2-01",
"dataType": {
"type": "INTERVAL",
"fromTo": "YEAR TO MONTH",
"precision": 2
}
}
{
"type": "literal_interval",
"value": "+0 00:00:02.000",
"dataType": {
"type": "INTERVAL",
"fromTo": "DAY TO SECONDS",
"precision": 2,
"fraction": 2
}
}
## Predicates

Whenever `...` appears in the examples below, it is a shortcut for an arbitrary expression.
{
"type": "predicate_and",
"expressions": [
...
]
}
The same can be used for `predicate_or`.
{
"type": "predicate_not",
"expression": {
...
}
}
The same can be used for `predicate_is_null` and `predicate_is_not_null`.
{
"type": "predicate_equal",
"left": {
...
},
"right": {
...
}
}
The same can be used for `predicate_notequal`, `predicate_less` and `predicate_lessequal`.
{
"type": "predicate_like",
"expression": {
...
},
"pattern": {
...
},
"escapeChar": {
...
}
}
Notes:

- `escapeChar` is optional.
{
"type": "predicate_like_regexp",
"expression": {
...
},
"pattern": {
...
}
}
{
"type": "predicate_between",
"expression": {
...
},
"left": {
...
},
"right": {
...
}
}
`<exp> IN (<const1>, <const2>)`
{
"type": "predicate_in_constlist",
"expression": {
...
},
"arguments": [
...
]
}
`<exp> IS JSON {VALUE | ARRAY | OBJECT | SCALAR} {WITH | WITHOUT} UNIQUE KEYS` (requires the predicate capability `IS_JSON`)
{
"type": "predicate_is_json",
"expression": {
...
},
"typeConstraint": "VALUE",
"keyUniquenessConstraint": "WITHOUT UNIQUE KEYS"
}
Notes:

- `typeConstraint` is `"VALUE"`, `"ARRAY"`, `"OBJECT"`, or `"SCALAR"`.
- `keyUniquenessConstraint` is `"WITH UNIQUE KEYS"` or `"WITHOUT UNIQUE KEYS"`.
The same can be used for the predicate type `predicate_is_not_json` (requires the predicate capability `IS_NOT_JSON`).
Refer to the Exasol Scalar Functions API Documentation
Refer to the Exasol Aggregate Functions API Documentation