Create VDK Confluence Data Source #3048

Closed
antoniivanov opened this issue Jan 24, 2024 · 1 comment
Labels
initiative: VDK for Private AI Initiative including the effort to support Private AI usecases of VMWare with VDK

Comments


antoniivanov commented Jan 24, 2024

The goal of this issue is to create a Confluence data source for the Versatile Data Kit (VDK). This data source will enable VDK to fetch and ingest data from Confluence spaces and documents.

A VDK Data Source encapsulates how data from a source can be ingested.

Requirements

Generate a new plugin, vdk-confluence, using https://github.com/versatile-data-kit-dev/new-vdk-data-source

Data Source Implementation

Establish a connection to Confluence using the provided URL and tokens.
Fetch specific documents or entire spaces based on provided IDs or keys.
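
A minimal sketch of the connect-and-fetch step, under stated assumptions: it targets the Confluence REST content endpoint (`/rest/api/content`) with a bearer token, which may not match every Confluence deployment or auth setup. The function names (`build_content_url`, `http_get_json`, `fetch_space_pages`) are hypothetical and not part of VDK. The HTTP call is injectable so the paging logic can be tested without a live server.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def build_content_url(base_url: str, space_key: str, start: int = 0, limit: int = 25) -> str:
    """Build a paged content query for one space (assumed endpoint layout)."""
    query = urllib.parse.urlencode(
        {
            "spaceKey": space_key,
            "type": "page",
            "expand": "body.storage,version",
            "start": start,
            "limit": limit,
        }
    )
    return f"{base_url.rstrip('/')}/rest/api/content?{query}"


def http_get_json(url: str, token: str) -> dict:
    """Perform an authenticated GET and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_space_pages(
    base_url: str,
    space_key: str,
    token: str,
    get_json: Callable[[str, str], dict] = http_get_json,
    limit: int = 25,
) -> Iterator[dict]:
    """Yield every page of a space, following start/limit pagination."""
    start = 0
    while True:
        body = get_json(build_content_url(base_url, space_key, start, limit), token)
        results = body.get("results", [])
        yield from results
        # A short (or empty) batch is treated as the last one.
        if len(results) < limit:
            return
        start += len(results)
```

Fetching a single document by ID would follow the same pattern with a different URL; it is left out here to keep the sketch short.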

Data Source Stream

Implement streams to handle subsets of Confluence data. Options:

  • Entire spaces as comprehensive streams.
  • Separate streams based on Confluence's functionality and type of data (comments, pages, spaces, activities).
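
The two partitioning options above could be compared with a small sketch like the following. `StreamSpec` and both helper functions are hypothetical illustrations, not the VDK data source stream interface; they only show how stream names and scopes might be derived under each option.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Data types named in this issue; the final list is still to be decided.
DATA_TYPES = ("pages", "comments", "spaces", "activities")


@dataclass(frozen=True)
class StreamSpec:
    """Describes which subset of Confluence data one stream covers."""

    name: str
    space_key: Optional[str] = None
    data_type: Optional[str] = None


def streams_per_space(space_keys: Iterable[str]) -> List[StreamSpec]:
    """Option 1: one comprehensive stream per Confluence space."""
    return [StreamSpec(name=f"space-{key}", space_key=key) for key in space_keys]


def streams_per_data_type(data_types: Iterable[str] = DATA_TYPES) -> List[StreamSpec]:
    """Option 2: one stream per kind of data, across all spaces."""
    return [StreamSpec(name=data_type, data_type=data_type) for data_type in data_types]
```

Per-space streams keep state simple (one cursor per space), while per-type streams make it easier to ingest, say, only comments; the two could also be combined.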

Data Source Payload

Structure payload with data, metadata, and state.

  • Data: Extracting page content, attachments, metadata.
  • Metadata: Page IDs, timestamps, author information.
  • State: Handling versioning and history in Confluence.
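
As a rough illustration of the data/metadata/state split, the sketch below maps one page record to a payload. The `Payload` class and both functions are hypothetical stand-ins rather than the VDK payload type, and the input field names (`body.storage.value`, `version.number`, `version.when`, `version.by.displayName`) assume the shape returned by the Confluence REST content API with `expand=body.storage,version`.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Payload:
    """Hypothetical container mirroring the data / metadata / state split."""

    data: Dict[str, Any]
    metadata: Dict[str, Any]
    state: Dict[str, Any]


def to_payload(page: Dict[str, Any]) -> Payload:
    """Convert one Confluence page record into a payload."""
    version = page.get("version", {})
    return Payload(
        data={
            "title": page.get("title"),
            "content": page.get("body", {}).get("storage", {}).get("value", ""),
        },
        metadata={
            "page_id": page["id"],
            "updated_at": version.get("when"),
            "author": version.get("by", {}).get("displayName"),
        },
        # The version number is what a later run compares against.
        state={"page_id": page["id"], "version": version.get("number", 0)},
    )


def changed_since(page: Dict[str, Any], last_versions: Dict[str, int]) -> bool:
    """True if the page is new or has a higher version than last ingested."""
    current = page.get("version", {}).get("number", 0)
    return current > last_versions.get(page["id"], -1)
```

Keeping the version number in state is one way to handle Confluence versioning: an incremental run can skip pages whose version has not advanced. Attachments and page history would need additional handling not shown here.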

Impl notes

To create a new plugin, one can use the cookiecutter template at https://github.com/versatile-data-kit-dev/new-vdk-data-source
There is an example at https://github.com/vmware/versatile-data-kit/tree/main/events/data-sources

See example data job: https://github.com/vmware/versatile-data-kit/tree/main/examples/confluence-data-retrieval-example

And, of course, write functional tests.

@stefan-pulov stefan-pulov added the initiative: VDK for Private AI Initiative including the effort to support Private AI usecases of VMWare with VDK label Jan 30, 2024
@duyguHsnHsn

Follow-up stories: #3100

duyguHsnHsn added a commit that referenced this issue Feb 13, 2024
Objective: Develop a plugin for data sourcing from Confluence,
incorporating an initial stream that comprehensively retrieves data
across all Confluence spaces.

Linked to: #3048
(additional tickets are added here)

Signed-off-by: Duygu Hasan [email protected]