Add Pinecone Provider #35094

Merged
merged 30 commits into from
Nov 6, 2023

utkarsharma2
Contributor

This PR is part of a larger effort, presented at Airflow Summit, to add first-class integrations that support LLMOps.

This PR adds the Pinecone provider. Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. The primary objective of this provider is to offer users an alternative vector database.

Example DAG:
The PineconeIngestOperator accepts either a list of vectors or a callable that returns a list of vectors. As the example shows, each vector is a tuple of an ID, the values, and optional metadata.

PineconeIngestOperator(
    task_id="pinecone_vector_ingest",
    index_name=index_name,
    input_vectors=[
        ("id1", [1.0, 2.0, 3.0], {"key": "value"}),
        ("id2", [1.0, 2.0, 3.0]),
    ],
    namespace=namespace,
    batch_size=1,
)
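
For the callable form, a minimal sketch in plain Python may help. It does not import Airflow or Pinecone, and `build_vectors`, `resolve_vectors`, and `iter_batches` are hypothetical names used only for illustration; they are not part of the provider. It also assumes that `batch_size` controls how many vectors go into each upsert request, which this PR description does not state explicitly.

```python
from typing import Callable, Iterator, List, Tuple, Union

# A vector is (id, values) or (id, values, metadata), as in the example above.
Vector = Tuple
VectorSource = Union[List[Vector], Callable[[], List[Vector]]]


def build_vectors() -> List[Vector]:
    """Hypothetical callable that could be passed as input_vectors."""
    return [
        ("id1", [1.0, 2.0, 3.0], {"key": "value"}),
        ("id2", [1.0, 2.0, 3.0]),
    ]


def resolve_vectors(input_vectors: VectorSource) -> List[Vector]:
    """Return the list as given, or call the callable to obtain it."""
    return input_vectors() if callable(input_vectors) else input_vectors


def iter_batches(vectors: List[Vector], batch_size: int) -> Iterator[List[Vector]]:
    """Yield consecutive slices of at most batch_size vectors (assumed semantics)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(vectors), batch_size):
        yield vectors[start:start + batch_size]


# With batch_size=1, each vector ends up in its own batch.
batches = list(iter_batches(resolve_vectors(build_vectors), batch_size=1))
```

Under these assumptions, passing `input_vectors=build_vectors` to the operator would defer building the list until the callable is invoked, rather than when the DAG file is parsed.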

The mailing-list discussion of this effort is at https://lists.apache.org/thread/0d669fmy4hn29h5c0wj0ottdskd77ktp

@utkarsharma2 utkarsharma2 marked this pull request as ready for review October 30, 2023 12:44
@utkarsharma2 utkarsharma2 force-pushed the Pinecone-provider branch 2 times, most recently from fead816 to ffa9917 Compare November 1, 2023 14:21
@pankajkoti pankajkoti merged commit f493456 into apache:main Nov 6, 2023
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
---------

Co-authored-by: Josh Fell
Co-authored-by: Hussein Awala
Co-authored-by: Pankaj Singh
Co-authored-by: Pankaj
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
Labels
area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation
6 participants