Create VDK Confluence Data Source #3048

antoniivanov · 2024-01-24T12:11:37Z

The goal of this issue is to create a Confluence data source for the Versatile Data Kit (VDK). This data source will enable VDK to fetch and ingest data from Confluence spaces and documents.

A VDK Data Source encapsulates how a data source can be ingested.

Requirements

Generate a new plugin, vdk-confluence, using https://github.com/versatile-data-kit-dev/new-vdk-data-source

Data Source Implementation

Establish a connection to Confluence using the provided URL and tokens.
Fetch specific documents or entire spaces based on the provided IDs or keys.
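A minimal sketch of the connection and fetch step, using only the Python standard library. It assumes the Confluence REST API v1 endpoints /rest/api/content/{id} (a single page) and /rest/api/content?spaceKey=... (pages in a space), and a personal access token sent as a Bearer token, as on Confluence Data Center; Confluence Cloud instead expects basic auth with an account email plus API token. The helper names (page_request, space_request) are hypothetical and not part of VDK or the vdk-confluence plugin. The helpers only build requests, so they can be tried without a live instance.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical helpers: build Confluence REST API v1 requests without sending them.
# base_url must include any context path, e.g. "https://example.atlassian.net/wiki".


def _auth_header(token: str, username: Optional[str] = None) -> str:
    """Bearer auth for a personal access token, or basic auth for user + API token."""
    if username is None:
        return f"Bearer {token}"
    raw = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def page_request(base_url: str, page_id: str, token: str,
                 username: Optional[str] = None) -> urllib.request.Request:
    """Request one page, expanding its storage-format body and version info."""
    query = urllib.parse.urlencode({"expand": "body.storage,version"})
    url = f"{base_url.rstrip('/')}/rest/api/content/{page_id}?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": _auth_header(token, username),
                      "Accept": "application/json"})


def space_request(base_url: str, space_key: str, token: str,
                  start: int = 0, limit: int = 25,
                  username: Optional[str] = None) -> urllib.request.Request:
    """Request one batch of pages from a space; advance `start` to paginate."""
    query = urllib.parse.urlencode({
        "spaceKey": space_key, "type": "page", "start": start, "limit": limit,
        "expand": "body.storage,version",
    })
    url = f"{base_url.rstrip('/')}/rest/api/content?{query}"
    return urllib.request.Request(
        url, headers={"Authorization": _auth_header(token, username),
                      "Accept": "application/json"})
```

To actually send a request, pass it to urllib.request.urlopen and decode the body with json.load. Keeping request construction separate from I/O makes the fetch logic easy to cover in functional tests.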

Data Source Stream

Implement streams to handle subsets of Confluence data. Options:

Entire spaces as comprehensive streams.
Based on Confluence's functionality and the type of data (comments, pages, spaces, activities).
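One way to combine both options is sketched below, under stated assumptions: one stream per (space, content type) pair, each read by walking Confluence's start/limit pagination. ConfluenceStream, make_streams, read_stream and the fetch callback are hypothetical names, not VDK's stream interface; fetch is assumed to return a dict shaped like a Confluence REST v1 list response with a "results" key.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical stream model: one stream per (space, content type) pair.
# This is not VDK's own stream class; it only illustrates how streams could be split.


@dataclass(frozen=True)
class ConfluenceStream:
    space_key: str
    content_type: str  # e.g. "page" or "blogpost"

    def name(self) -> str:
        return f"{self.space_key}:{self.content_type}"


def make_streams(space_keys: Iterable[str],
                 content_types: Iterable[str] = ("page", "blogpost")) -> List[ConfluenceStream]:
    """Build the cross product of spaces and content types."""
    types = list(content_types)  # materialize so it can be reused for every space
    return [ConfluenceStream(key, t) for key in space_keys for t in types]


def read_stream(stream: ConfluenceStream,
                fetch: Callable[[str, str, int, int], Dict[str, Any]],
                limit: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield every item of a stream by walking start/limit pages.

    `fetch(space_key, content_type, start, limit)` is assumed to return
    a dict like {"results": [...], ...}.
    """
    start = 0
    while True:
        batch = fetch(stream.space_key, stream.content_type, start, limit)["results"]
        yield from batch
        if len(batch) < limit:  # a short (or empty) batch means this was the last page
            return
        start += len(batch)
```

Treating a whole space as a single stream is then just the special case of one content type per space; finer splits (comments, activities) would add further content types backed by their own endpoints.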

Data Source Payload

Structure the payload with data, metadata, and state.

Data: Extracting page content, attachments, metadata.
Metadata: Page IDs, timestamps, author information.
State: Handling versioning and history in Confluence.
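A minimal sketch of the data / metadata / state split, assuming input dicts shaped like Confluence REST v1 content fetched with expand=body.storage,version. Payload, to_payload and changed_since are hypothetical names, not VDK's own payload class. State here is simply the last ingested version number per page ID, which lets a later run skip pages that have not changed; attachments and full history are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

# Hypothetical payload shape following the data / metadata / state split in the issue.
# Input dicts are assumed to look like Confluence REST v1 content objects.


@dataclass
class Payload:
    data: Dict[str, Any]
    metadata: Dict[str, Any]
    state: Dict[str, int] = field(default_factory=dict)


def to_payload(content: Dict[str, Any]) -> Payload:
    version = content.get("version", {})
    return Payload(
        data={"title": content.get("title"),
              "body": content.get("body", {}).get("storage", {}).get("value", "")},
        metadata={"id": content["id"],
                  "type": content.get("type"),
                  "last_modified": version.get("when"),
                  # version.by is whoever made this version, not necessarily the creator
                  "last_modified_by": version.get("by", {}).get("displayName")},
        # Record the version number so the next run can skip unchanged pages.
        state={content["id"]: version.get("number", 0)},
    )


def changed_since(contents: Iterable[Dict[str, Any]],
                  last_state: Dict[str, int]) -> List[Payload]:
    """Return payloads only for pages whose version is newer than the stored one."""
    payloads = []
    for content in contents:
        number = content.get("version", {}).get("number", 0)
        if number > last_state.get(content["id"], 0):
            payloads.append(to_payload(content))
    return payloads
```

Comparing version numbers keeps incremental ingestion cheap; tracking full page history would need the separate history or version endpoints.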

Impl notes

To create a new plugin, use the cookiecutter template at https://github.com/versatile-data-kit-dev/new-vdk-data-source
There's an example in https://github.com/vmware/versatile-data-kit/tree/main/events/data-sources

See example data job: https://github.com/vmware/versatile-data-kit/tree/main/examples/confluence-data-retrieval-example

Of course, write functional tests.

duyguHsnHsn · 2024-02-09T15:26:08Z

Following stories: #3100

Objective: Develop a plugin for data sourcing from Confluence, incorporating an initial stream that comprehensively retrieves data across all Confluence spaces. Linked to: #3048 (additional tickets are added here) Signed-off-by: Duygu Hasan

antoniivanov added this to the Vector database Ingestion milestone Jan 24, 2024

stefan-pulov added the "initiative: VDK for Private AI" label (Initiative including the effort to support Private AI use cases of VMware with VDK) Jan 30, 2024

antoniivanov assigned duyguHsnHsn Feb 1, 2024

duyguHsnHsn mentioned this issue Feb 9, 2024

vdk-confluence: add data source plugin for confluence #3094

Merged

DeltaMichael closed this as completed Feb 19, 2024
