CrateDB Bulkloader #3860

Merged
merged 1 commit into from
May 17, 2024
13 changes: 13 additions & 0 deletions assemblies/plugins/dist/pom.xml
@@ -999,6 +999,19 @@
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms-cratedbbulkloader</artifactId>
<version>${project.version}</version>
<type>zip</type>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>

<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms-creditcardvalidator</artifactId>
44 changes: 44 additions & 0 deletions assemblies/plugins/transforms/cratedbbulkloader/pom.xml
@@ -0,0 +1,44 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.hop</groupId>
<artifactId>hop-assemblies-plugins-transforms</artifactId>
<version>2.9.0-SNAPSHOT</version>
</parent>

<artifactId>hop-assemblies-plugins-transforms-cratedbbulkloader</artifactId>
<version>${parent.version}</version>
<packaging>pom</packaging>

<name>Hop Assemblies Plugins Transforms CrateDB bulk loader</name>
<description />

<dependencies>
<dependency>
<groupId>org.apache.hop</groupId>
<artifactId>hop-transform-cratedbbulkloader</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>

</project>
@@ -0,0 +1,49 @@
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.3 http://maven.apache.org/xsd/assembly-1.1.3.xsd">
<id>hop-assemblies-plugins-transforms-cratedbbulkloader</id>
<formats>
<format>zip</format>
</formats>
<baseDirectory>transforms/cratedbbulkloader</baseDirectory>
<files>
<file>
<source>${project.basedir}/src/main/resources/version.xml</source>
<outputDirectory>.</outputDirectory>
<filtered>true</filtered>
</file>
</files>
<fileSets>
<fileSet>
<outputDirectory>lib</outputDirectory>
<excludes>
<exclude>**/*</exclude>
</excludes>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<useProjectArtifact>false</useProjectArtifact>
<includes>
<include>org.apache.hop:hop-transform-cratedbbulkloader:jar</include>
</includes>
</dependencySet>
</dependencySets>
</assembly>
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Licensed to the Apache Software Foundation (ASF) under one or more
~ contributor license agreements. See the NOTICE file distributed with
~ this work for additional information regarding copyright ownership.
~ The ASF licenses this file to You under the Apache License, Version 2.0
~ (the "License"); you may not use this file except in compliance with
~ the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
~
-->

<version>${project.version}</version>
3 changes: 2 additions & 1 deletion assemblies/plugins/transforms/pom.xml
@@ -47,6 +47,7 @@
<module>combinationlookup</module>
<module>concatfields</module>
<module>constant</module>
<module>cratedbbulkloader</module>
<module>creditcardvalidator</module>
<module>cubeinput</module>
<module>cubeoutput</module>
@@ -177,4 +178,4 @@
<module>zipfile</module>
</modules>

</project>
</project>
42 changes: 42 additions & 0 deletions docker/integration-tests/integration-tests-cratedb.yaml
@@ -0,0 +1,42 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

services:
integration_test_database:
extends:
file: integration-tests-base.yaml
service: integration_test
depends_on:
cratedb:
condition: service_healthy
links:
- cratedb

cratedb:
image: crate:latest
ports:
- "4200"
- "5432"
healthcheck:
test: [ "CMD", "curl", "-f", "http://localhost:4200" ]
interval: 20s
timeout: 10s
retries: 6
start_period: 120s
volumes:
- ./resource/cratedb/config/crate.yml:/crate/config/crate.yml
- ../../integration-tests/cratedb/import:/home
36 changes: 36 additions & 0 deletions docker/integration-tests/resource/cratedb/config/crate.yml
@@ -0,0 +1,36 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

auth:
host_based:
enabled: true
config:
0:
user: crate
# address: _local_
method: trust
99:
method: password

network.host: _local_,_site_

# Paths
path:
logs: /data/log
data: /data/data
blobs:
path: /data/blobs
@@ -0,0 +1,98 @@
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
////
:documentationPath: /pipeline/transforms/
:language: en_US
:description: The CrateDB Bulk Loader transform loads data from Apache Hop into CrateDB using either the HTTP endpoint or the COPY command.

= image:transforms/icons/cratedb.svg[CrateDB Bulk Loader transform Icon, role="image-doc-icon"] CrateDB Bulk Loader

[%noheader,cols="3a,1a", role="table-no-borders" ]
|===
|
== Description

The CrateDB Bulk Loader transform loads data from Apache Hop to CrateDB using two different approaches:

* The https://cratedb.com/docs/crate/reference/en/5.7/sql/statements/copy-from.html#copy-from[`COPY FROM`^] command.
* The https://cratedb.com/docs/crate/reference/en/latest/interfaces/http.html#bulk-operations[CrateDB HTTP endpoint] for bulk operations.


|
== Supported Engines
[%noheader,cols="2,1a",frame=none, role="table-supported-engines"]
!===
!Hop Engine! image:check_mark.svg[Supported, 24]
!Spark! image:question_mark.svg[Maybe Supported, 24]
!Flink! image:question_mark.svg[Maybe Supported, 24]
!Dataflow! image:question_mark.svg[Maybe Supported, 24]
!===
|===
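
As a rough illustration of the HTTP approach listed above, the sketch below builds the JSON bodies for CrateDB's bulk operations, which are posted to the `/_sql` endpoint as a `stmt` with `?` placeholders plus a `bulk_args` list of rows. The function names, the sample table, and the way rows are split into batches are assumptions made for this example; they are not taken from the transform's source code.

```python
import json


def chunked(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield [list(row) for row in rows[start:start + batch_size]]


def bulk_insert_bodies(schema, table, columns, rows, batch_size):
    """Build one JSON request body per batch for a POST to /_sql.

    Hypothetical helper: it only prepares the payloads and sends nothing.
    """
    column_list = ", ".join(f'"{column}"' for column in columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = (
        f'INSERT INTO "{schema}"."{table}" ({column_list}) '
        f"VALUES ({placeholders})"
    )
    return [
        json.dumps({"stmt": stmt, "bulk_args": batch})
        for batch in chunked(rows, batch_size)
    ]


# Three rows with a batch size of two produce two request bodies.
bodies = bulk_insert_bodies(
    "doc", "people", ["id", "name"], [[1, "Ada"], [2, "Bob"], [3, "Cy"]], 2
)
```

Each body would then be sent as `application/json` to the CrateDB HTTP port (4200 by default).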

IMPORTANT: The CrateDB Bulk Loader is linked to the database type. When `COPY` mode is used, it fetches the JDBC driver from the `hop/lib/jdbc` folder.


== General Options

[options="header"]
|===
|Option|Description
|Transform name|Name of the transform.
|Target schema|The name of the target schema to write data to. This field is mandatory because CrateDB needs to know which schema to write to (`doc` and `blob` are the default schemas in CrateDB).
|Target table|The name of the target table to write data to.
|===

== Main Options

[options="header"]
|===
|Option|Description
|Connection|Name of the database connection on which the target table resides.
|Use HTTP Endpoint|Selects the mode used to load data into CrateDB. The supported modes are `HTTP Endpoint` and `COPY`; when `HTTP Endpoint` is selected, the `COPY` options are disabled, and vice versa.
|Batch size|HTTP mode writes rows in batches. Set the number of rows to send to CrateDB in a single batch; there is no default value.
|Specify database fields|Specify the mapping between stream fields and database columns.
|Stream to file|Write the current pipeline stream to a file in the local filesystem or in S3 before performing the `COPY` load.
|Local folder|Local folder in which to store the files used by the `COPY` command.

According to the CrateDB documentation, CrateDB reads files from the filesystem of its own nodes (scheme `file://`). In most cases, however, Hop runs on a different machine from CrateDB, so the usual solution is to map the remote folder (CrateDB) to a local one (Hop), for example through a volume.

In the `Local folder` field, specify the local folder where the file will be written. That folder must be linked to the remote filesystem (e.g. through a Docker volume).

Leave this field empty in other scenarios (e.g. when writing to S3).
|Read from file|Do not stream the contents of the current pipeline, but perform the `COPY` load from a pre-existing file in the local filesystem or in S3. The only supported format is `CSV` (comma-delimited).
|===
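
The `Local folder` mapping described above can be pictured with a small sketch: a file written by Hop under the local folder is translated to the path at which CrateDB sees the same file, and that path is used in a `COPY FROM` statement. The folder names, the helper name, and the exact statement the transform generates are assumptions for illustration only.

```python
from pathlib import PurePosixPath


def copy_from_statement(schema, table, local_file, local_folder, remote_folder):
    """Build a COPY FROM statement for a file shared through a mapped folder.

    local_folder is where Hop writes (hypothetical), remote_folder is where
    the same directory is mounted on the CrateDB node (hypothetical).
    """
    # Raises ValueError if local_file is not inside local_folder.
    relative = PurePosixPath(local_file).relative_to(PurePosixPath(local_folder))
    remote_path = PurePosixPath(remote_folder) / relative
    uri = f"file://{remote_path}"
    return f"COPY \"{schema}\".\"{table}\" FROM '{uri}' WITH (format = 'csv')"


statement = copy_from_statement(
    "doc", "people", "/data/import/people.csv", "/data/import", "/home"
)
```

With these sample values, the file Hop writes to `/data/import/people.csv` is referenced as `file:///home/people.csv` on the CrateDB side.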

== AWS Authentication
[options="header"]
|===
|Option|Description
|Use AWS system variables| When selected, picks up the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` values from your operating system's environment variables.
|AWS_ACCESS_KEY_ID|(if `Use AWS system variables` is unchecked) specify a value or variable for your `AWS_ACCESS_KEY_ID`.
|AWS_SECRET_ACCESS_KEY|(if `Use AWS system variables` is unchecked) specify a value or variable for your `AWS_SECRET_ACCESS_KEY`.
|===
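
The credential options above can be sketched as follows: the keys come either from the environment or from explicitly supplied values, and are then placed in an `s3://` URI of the form CrateDB accepts for `COPY FROM`. Whether the transform passes credentials to CrateDB in exactly this way is an assumption, and the function names and sample values are hypothetical. Keys are URL-encoded here because AWS secret keys can contain characters such as `/` or `+`.

```python
import os
from urllib.parse import quote


def resolve_aws_credentials(use_system_variables, access_key=None,
                            secret_key=None, env=None):
    """Return (access_key, secret_key) from the environment or explicit values."""
    env = os.environ if env is None else env
    if use_system_variables:
        access_key = env.get("AWS_ACCESS_KEY_ID")
        secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise ValueError("AWS credentials are missing")
    return access_key, secret_key


def s3_uri(bucket, key, access_key, secret_key):
    """Build an s3:// URI with URL-encoded credentials (illustrative only)."""
    encoded_access = quote(access_key, safe="")
    encoded_secret = quote(secret_key, safe="")
    return f"s3://{encoded_access}:{encoded_secret}@{bucket}/{key.lstrip('/')}"


# A fake environment is passed in so the example does not depend on the host.
fake_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "se/cr+et"}
access, secret = resolve_aws_credentials(True, env=fake_env)
uri = s3_uri("my-bucket", "import/people.csv", access, secret)
```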

== HTTP Authentication
At the moment, Hop only supports the `Basic` and `Bearer` authentication methods.

[options="header"]
|===
|Option|Description
|HTTP Login|Enter the username for HTTP authentication.
|HTTP password|Enter the password for HTTP authentication.
|===

== Fields

Map the current stream fields to the CrateDB table's columns.
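
A minimal sketch of what such a mapping amounts to, assuming it is a list of (stream field, table column) pairs: each incoming row is reordered into the column order of the target table, and stream fields that are not mapped are left out. The function and field names are hypothetical.

```python
def map_row(row, mapping):
    """Return (columns, values) for one row, following the field mapping.

    row is a dict keyed by stream field name; mapping is a list of
    (stream_field, table_column) pairs in the desired column order.
    """
    columns = [column for _, column in mapping]
    values = [row[stream_field] for stream_field, _ in mapping]
    return columns, values


columns, values = map_row(
    {"customer_id": 7, "full_name": "Ada", "ignored": True},
    [("customer_id", "id"), ("full_name", "name")],
)
```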
