- NodeJS >= 16.x
- Docker
- Turborepo (install by running npm install turbo --global)
The Airbyte Specification doc describes each step of an Airbyte Source in detail. Also read Airbyte's development guide. This repository automatically releases the sources as Docker images via GitHub Actions.
Clone the repo and copy the sources/example-source folder into a new folder
with the name of your new source. In this guide we will name our source
new-source, so we will create sources/new-source. In your new folder, update
package.json and the ExampleSource class in src/index.ts with the name of
your source.
Go back to the root folder of the repo and run npm i to install the
dependencies for all the sources, including our new-source.
The first step of a source is returning the specification of the configuration
parameters required to connect to the targeted system (e.g. API credentials,
URLs, etc). The provided Source class does this by returning the JSON-Schema
object in resources/spec.json. Update this file with your source's
configuration parameters, making sure to protect any sensitive parameters by
setting "airbyte_secret": true in their corresponding properties.
See this guide on how to preview the Airbyte UI elements generated from the specification.
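
As a rough illustration of the specification step, the sketch below shows what the object in resources/spec.json might look like for the three parameters used by example-source later in this guide. The Spec and SpecProperty interfaces and the secretFields helper are hypothetical and exist only for this example; the titles are made up.

```typescript
// Hypothetical types describing the shape of resources/spec.json.
interface SpecProperty {
  type: string;
  title: string;
  airbyte_secret?: boolean;
}

interface Spec {
  connectionSpecification: {
    $schema: string;
    type: 'object';
    required: string[];
    properties: Record<string, SpecProperty>;
  };
}

// Assumed parameters: the same three keys as the example config.json.
const spec: Spec = {
  connectionSpecification: {
    $schema: 'http://json-schema.org/draft-07/schema#',
    type: 'object',
    required: ['server_url', 'user', 'token'],
    properties: {
      server_url: {type: 'string', title: 'Server URL'},
      user: {type: 'string', title: 'User'},
      // Sensitive value: flagged so that it is treated as a secret.
      token: {type: 'string', title: 'Token', airbyte_secret: true},
    },
  },
};

// Hypothetical helper: lists the parameters flagged as secrets.
function secretFields(s: Spec): string[] {
  const props = s.connectionSpecification.properties;
  return Object.keys(props).filter(
    (name) => props[name]?.airbyte_secret === true
  );
}
```

A quick check such as secretFields(spec) can help confirm that every credential-like parameter carries the flag before you commit the file.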
After the configuration parameters are populated by the user, the source
verifies that the provided configuration is usable. This is done via the
checkConnection method in your source class. The config argument is a
dictionary of the parameters provided by the user. This method should verify
that all credentials in the configuration are valid, URLs are correct and
reachable, values are within proper ranges, etc.
The method returns a tuple of [boolean, error], where the boolean indicates
whether or not the configuration is valid, and the error is an optional
VError indicating what is invalid about the configuration. If the boolean is
true, the error should be undefined.
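
The following is a minimal sketch of the [boolean, error] contract, not the CDK's actual code. To stay self-contained it uses the built-in Error in place of VError, and the SourceConfig interface, the validateConfig helper and the specific validation rules are assumptions made for illustration.

```typescript
// Hypothetical config shape, mirroring the example-source parameters.
interface SourceConfig {
  server_url?: string;
  user?: string;
  token?: string;
}

// Stand-in for the [boolean, error] tuple; a real source would use VError.
type CheckResult = [boolean, Error | undefined];

// Pure validation of the values the user provided.
function validateConfig(config: SourceConfig): CheckResult {
  if (!config.user) {
    return [false, new Error('user must be a non-empty string')];
  }
  if (!config.token) {
    return [false, new Error('token must be a non-empty string')];
  }
  // Assumed rule: the server URL must be an http(s) URL.
  if (!config.server_url || !/^https?:\/\/\S+$/.test(config.server_url)) {
    return [false, new Error('server_url must be an http(s) URL')];
  }
  return [true, undefined];
}

// Sketch of the method on your source class.
async function checkConnection(config: SourceConfig): Promise<CheckResult> {
  const result = validateConfig(config);
  // A real source would typically also issue a cheap authenticated request
  // here to confirm the credentials work and the URL is reachable.
  return result;
}
```

Keeping the pure validation separate from the network call makes the rules easy to unit test without access to the target system.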
A source contains one or more streams, which correspond to entity types of the
system your source is fetching data from. For example, a GitHub source would
have streams for users, commits, pull requests, etc. Each stream also has its
own arbitrary state for supporting incremental mode syncs. Implement your
streams, an example of which is in the Builds class in src/stream.ts, and
include them in your source via the streams() method of your source class.
Each stream has a JSON-Schema object defining the schema of the records that
this stream will fetch. This is done in the stream's getJsonSchema() method.
The source combines the results of calling this method on every stream to
create the Airbyte Catalog for the source's discover command.
Tip: use json-to-schema-converter to help generate the JSON-Schema files for your streams.
The primaryKey property defines one or more fields of the record schema that
make up the unique key of each record.
The cursorField property defines one or more fields of the record schema that
Airbyte will check to determine whether a record is new or updated. This is
required to support syncing in incremental mode, and all our sources should
support incremental mode unless the data in the source's underlying system
doesn't have any timestamp-like fields that describe the freshness of each
record.
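
To show how the schema, primary key and cursor field relate, here is a standalone sketch of a stream. The BuildsStreamSketch class, the RecordSchema type and the uid and updated_at field names are assumptions made for illustration; a real stream extends the CDK's stream base class rather than standing alone.

```typescript
// Hypothetical, simplified schema type for this example.
interface RecordSchema {
  type: 'object';
  properties: Record<string, {type: string}>;
}

class BuildsStreamSketch {
  // Schema of the records this stream fetches.
  getJsonSchema(): RecordSchema {
    return {
      type: 'object',
      properties: {
        uid: {type: 'string'},
        status: {type: 'string'},
        updated_at: {type: 'integer'},
      },
    };
  }

  // Fields that together uniquely identify a record.
  get primaryKey(): string[] {
    return ['uid'];
  }

  // Timestamp-like field used to decide whether a record is new or updated.
  get cursorField(): string[] {
    return ['updated_at'];
  }
}
```

Both properties should only name fields that appear in the schema returned by getJsonSchema().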
The readRecords() method defines the logic for fetching data from your
source's technical system.
The getUpdatedState() method defines how to update the stream's arbitrary
state, given the current stream state and the most recent record generated from
readRecords(). The source calls this method after each record is generated.
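
The interplay between the two methods can be sketched as follows. This is a simplified model under stated assumptions: the Build and BuildsState types and the sync helper are hypothetical, and readRecords is written as a plain function over an in-memory array for brevity, whereas a real stream fetches records from the remote system.

```typescript
// Hypothetical record and state shapes.
interface Build {
  uid: string;
  updated_at: number;
}

interface BuildsState {
  cutoff: number;
}

// Returns only the records newer than the saved cutoff.
function readRecords(all: Build[], state?: BuildsState): Build[] {
  const cutoff = state ? state.cutoff : 0;
  return all.filter((build) => build.updated_at > cutoff);
}

// Advances the cutoff, never moving it backwards.
function getUpdatedState(
  current: BuildsState | undefined,
  latest: Build
): BuildsState {
  const previous = current ? current.cutoff : 0;
  return {cutoff: Math.max(previous, latest.updated_at)};
}

// Models the source: update the state after each record is generated.
function sync(
  all: Build[],
  state?: BuildsState
): {records: Build[]; state: BuildsState | undefined} {
  const records: Build[] = [];
  let current = state;
  for (const record of readRecords(all, current)) {
    records.push(record);
    current = getUpdatedState(current, record);
  }
  return {records, state: current};
}
```

Taking the maximum in getUpdatedState keeps the state correct even if the remote system returns records out of order.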
The stateCheckpointInterval property determines how often a state message is
output and persisted. For example, if the interval is 100, the stream's state
will be persisted after reading every 100 records. It is undefined by default,
meaning the state is only persisted after all streams in the source have
finished reading records. Alternatively, you can implement Stream Slicing by
overriding the streamSlices() method, but for most cases, setting a checkpoint
interval should be sufficient.
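
The effect of the interval can be illustrated with a simplified model of a single stream's read loop. The Message type and the emitWithCheckpoints function are hypothetical and are not part of the CDK; they only show when state messages would appear relative to records.

```typescript
// Hypothetical message type: recordsRead is the count at emission time.
interface Message {
  type: 'RECORD' | 'STATE';
  recordsRead: number;
}

// Emits a STATE message after every `interval` records, plus a final one
// when reading finishes (unless the last record was already checkpointed).
function emitWithCheckpoints<T>(records: T[], interval?: number): Message[] {
  const out: Message[] = [];
  let count = 0;
  let lastCheckpoint = -1;
  for (let i = 0; i < records.length; i++) {
    count++;
    out.push({type: 'RECORD', recordsRead: count});
    if (interval !== undefined && count % interval === 0) {
      out.push({type: 'STATE', recordsRead: count});
      lastCheckpoint = count;
    }
  }
  if (lastCheckpoint !== count) {
    out.push({type: 'STATE', recordsRead: count});
  }
  return out;
}
```

In this model, 250 records with an interval of 100 produce state messages after records 100, 200 and 250, while an undefined interval produces a single state message at the end.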
Each source must be tested against an Airbyte-provided Docker image that runs a
series of tests to validate all the commands of a source. Pull this image by
running docker pull airbyte/source-acceptance-test.
This test suite requires several JSON files defining a valid source
configuration and various inputs and expected outputs. The
source-acceptance-test Docker image determines the paths for these files via
the acceptance-test-config.yml file in your source folder.
First create a valid source configuration for the tests. In your source folder,
create a new folder secrets and write your configuration to
secrets/config.json. For example-source, this JSON would be:
{
  "server_url": "url",
  "user": "chris",
  "token": "token"
}
Only the user value is validated by the example-source.
Since this configuration would likely contain sensitive values, it cannot be
committed to the repo. To enable the GitHub Actions workflow to run the source
acceptance test, ask one of the Faros team members to add the configuration JSON
as a GitHub repository secret with the environment variable name
<SOURCE_NAME>_TEST_CREDS. So for new-source, the name would be
NEW_SOURCE_TEST_CREDS.
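
The naming convention above can be expressed as a small helper. The function name is hypothetical, and the handling of characters other than letters, digits and hyphens is an assumption, since this guide only gives the new-source example.

```typescript
// Hypothetical helper: derives the secret name from a source folder name.
// Assumption: every run of non-alphanumeric characters becomes one underscore.
function testCredsSecretName(sourceFolder: string): string {
  const upper = sourceFolder.toUpperCase().replace(/[^A-Z0-9]+/g, '_');
  return upper + '_TEST_CREDS';
}
```

For example, testCredsSecretName('new-source') yields NEW_SOURCE_TEST_CREDS, matching the name given above.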
The acceptance-test-config.yml file points to several other JSON files that
enable the tests for each of the source commands. See the Source Acceptance
Tests Reference for how those files are used. These files should be committed
to the repo.
Run the tests with the provided script from the root repo folder:
./scripts/source-acceptance-test.sh <source>, where <source> is the folder
name, e.g. new-source.