Bump GCS version with avro/parquet timestamp conversion (#8360)
* get date-time format from json schema

* created universal date-time converter

* implemented jsonnode transformation for avro and parquet

* removed unneeded dependency from build.gradle

* fix checkstyle

* add DateTimeUtilsTest

* add AvroRecordHelperTest

* resolve merge conflicts | fix checkstyle

* update LocalTime parsing

* added String type to avro schema for Logical Types, removed date-time conversion

* fix checkstyle

* fix checkstyle

* added static String schema, added comments

* bump version

* Bump GCS version with avro/parquet timestamp conversion

* update docs

* update docs

Co-authored-by: vmaltsev <[email protected]>
VitaliiMaltsev and vmaltsev authored Dec 2, 2021
1 parent 1286c8c commit 67f7cf0
Showing 6 changed files with 76 additions and 5 deletions.
@@ -2,7 +2,7 @@
"destinationDefinitionId": "ca8f6566-e555-4b40-943a-545bf123117a",
"name": "Google Cloud Storage (GCS)",
"dockerRepository": "airbyte/destination-gcs",
-"dockerImageTag": "0.1.3",
+"dockerImageTag": "0.1.14",
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/gcs",
"icon": "googlecloudstorage.svg"
}
@@ -43,7 +43,7 @@
- name: Google Cloud Storage (GCS)
destinationDefinitionId: ca8f6566-e555-4b40-943a-545bf123117a
dockerRepository: airbyte/destination-gcs
-dockerImageTag: 0.1.3
+dockerImageTag: 0.1.14
documentationUrl: https://docs.airbyte.io/integrations/destinations/gcs
icon: googlecloudstorage.svg
- name: Google PubSub
@@ -594,7 +594,7 @@
- "overwrite"
- "append"
supportsNamespaces: true
-- dockerImage: "airbyte/destination-gcs:0.1.3"
+- dockerImage: "airbyte/destination-gcs:0.1.14"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/destinations/gcs"
connectionSpecification:
2 changes: 1 addition & 1 deletion airbyte-integrations/connectors/destination-gcs/Dockerfile
@@ -7,5 +7,5 @@ COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar

RUN tar xf ${APPLICATION}.tar --strip-components=1

-LABEL io.airbyte.version=0.1.3
+LABEL io.airbyte.version=0.1.14
LABEL io.airbyte.name=airbyte/destination-gcs
1 change: 1 addition & 0 deletions docs/integrations/destinations/gcs.md
@@ -222,6 +222,7 @@ Under the hood, an Airbyte data stream in Json schema is first converted to an A

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
+| 0.1.14 | 2021-12-01 | [\#7732](https://github.com/airbytehq/airbyte/pull/7732) | Support timestamp in Avro and Parquet |
| 0.1.13 | 2021-11-03 | [\#7288](https://github.com/airbytehq/airbyte/issues/7288) | Support Json `additionalProperties`. |
| 0.1.2 | 2021-09-12 | [\#5720](https://github.com/airbytehq/airbyte/issues/5720) | Added configurable block size for stream. Each stream is limited to 10,000 by GCS |
| 0.1.1 | 2021-08-26 | [\#5296](https://github.com/airbytehq/airbyte/issues/5296) | Added storing gcsCsvFileLocation property for CSV format. This is used by destination-bigquery \(GCS Staging upload type\) |
72 changes: 71 additions & 1 deletion docs/understanding-airbyte/json-avro-conversion.md
@@ -14,7 +14,77 @@ When an Airbyte data stream is synced to the Avro or Parquet format (e.g. Parque
| object | record |
| array | array |

-2. Built-in Json schema formats are not mapped to Avro logical types at this moment.
+2. Built-in Json schema date-time formats will be mapped to Avro logical types
+
+**Date**
+
+The date logical type represents a date within the calendar, with no reference to a particular time zone or time of day.
+
+A date logical type annotates an Avro int, where the int stores the number of days from the unix epoch, 1 January 1970 (ISO calendar).
+
+
+```json
+{
+"type": "string",
+"format": "date"
+}
+```
+
+will become in Avro schema:
+
+```json
+{
+"type": "int",
+"logicalType": "date"
+}
+```
+
+**Time (microsecond precision)**
+
+The time-micros logical type represents a time of day, with no reference to a particular calendar, time zone or date, with a precision of one microsecond.
+
+A time-micros logical type annotates an Avro long, where the long stores the number of microseconds after midnight, 00:00:00.000000.
+
+
+```json
+{
+"type": "string",
+"format": "time"
+}
+```
+
+will become in Avro schema:
+
+```json
+{
+"type": "long",
+"logicalType": "time-micros"
+}
+```
+
+**Timestamp (microsecond precision)**
+
+The timestamp-micros logical type represents an instant on the global timeline, independent of a particular time zone or calendar, with a precision of one microsecond.
+
+A timestamp-micros logical type annotates an Avro long, where the long stores the number of microseconds from the unix epoch, 1 January 1970 00:00:00.000000 UTC.
+
+
+```json
+{
+"type": "string",
+"format": "date-time"
+}
+```
+
+will become in Avro schema:
+
+```json
+{
+"type": "long",
+"logicalType": "timestamp-micros"
+}
+```
+
3. Combined restrictions \("allOf", "anyOf", and "oneOf"\) will be converted to type unions. The corresponding Avro schema can be less stringent. For example, the following Json schema

```json

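The three mappings documented in `json-avro-conversion.md` can be illustrated with a minimal Java sketch that turns ISO-8601 strings into the int/long values the Avro `date`, `time-micros`, and `timestamp-micros` logical types store. It uses only `java.time`; the class and method names (`LogicalTypeSketch`, `toAvroDate`, `toAvroTimeMicros`, `toAvroTimestampMicros`) are hypothetical and are not the connector's `DateTimeUtils` or `AvroRecordHelper`. The commit log also mentions adding a String type to the Avro schema for logical types, so the connector's actual schema and conversion may differ from this sketch.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;

// Hypothetical sketch (not the connector's DateTimeUtils/AvroRecordHelper): converts
// JSON-schema date/time strings into the Avro primitives behind the documented logical types.
final class LogicalTypeSketch {

  private LogicalTypeSketch() {}

  // "format": "date" -> {"type": "int", "logicalType": "date"}: days since 1970-01-01 (ISO calendar).
  static int toAvroDate(String isoDate) {
    // toEpochDay() counts days from the Unix epoch; toIntExact fails loudly instead of overflowing.
    return Math.toIntExact(LocalDate.parse(isoDate).toEpochDay());
  }

  // "format": "time" -> {"type": "long", "logicalType": "time-micros"}: microseconds after midnight.
  static long toAvroTimeMicros(String isoTime) {
    // toNanoOfDay() is nanoseconds since 00:00:00; dividing by 1,000 drops sub-microsecond digits.
    return LocalTime.parse(isoTime).toNanoOfDay() / 1_000L;
  }

  // "format": "date-time" -> {"type": "long", "logicalType": "timestamp-micros"}: microseconds since the epoch, UTC.
  static long toAvroTimestampMicros(String isoDateTime) {
    // Assumption: the string carries an offset such as "Z" or "+02:00"; OffsetDateTime.parse rejects it otherwise.
    Instant instant = OffsetDateTime.parse(isoDateTime).toInstant();
    // Whole seconds scaled to micros, plus the (always non-negative) nano-of-second truncated to micros.
    return Math.addExact(Math.multiplyExact(instant.getEpochSecond(), 1_000_000L), instant.getNano() / 1_000L);
  }

  public static void main(String[] args) {
    System.out.println(toAvroDate("1970-01-02"));                      // 1
    System.out.println(toAvroTimeMicros("00:00:01.5"));                // 1500000
    System.out.println(toAvroTimestampMicros("1970-01-01T00:00:01Z")); // 1000000
  }
}
```

Integer division truncates fractional seconds beyond the sixth digit, which is the precision loss implied by the microsecond logical types. The timestamp is scaled from `getEpochSecond()` and `getNano()` directly rather than through nanosecond arithmetic, which keeps the multiplication within `long` range for dates far from 1970.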