
Add Cohere Provider #34921

Merged
merged 36 commits into from
Nov 6, 2023


utkarsharma2
Contributor

@utkarsharma2 utkarsharma2 commented Oct 13, 2023

This PR is part of our larger effort, presented at Airflow Summit, to add first-class integrations that support LLMOps. It adds the Cohere provider. Cohere is a platform offering a range of AI models for NLP tasks; in this iteration, we integrate with its embeddings model.

The primary objective of this provider is to give users an alternative embedding model for generating vectors from their proprietary data, a key step towards integrating with LLMs such as ChatGPT.

Example DAG:
The CohereEmbeddingOperator can accept either a list of strings or a callable returning a list of strings.

from datetime import datetime

from airflow import DAG

from airflow.providers.cohere.operators.embedding import CohereEmbeddingOperator

with DAG("example_cohere_embedding", schedule=None, start_date=datetime(2023, 1, 1), catchup=False) as dag:
    texts = [
        "On Kernel-Target Alignment. We describe a family of global optimization procedures",
        " that automatically decompose optimization problems into smaller loosely coupled",
        " problems, then combine the solutions of these with message passing algorithms.",
    ]

    def get_text():
        return texts

    CohereEmbeddingOperator(input_text=texts, task_id="embedding_via_text")
    CohereEmbeddingOperator(input_callable=get_text, task_id="embedding_via_callable")
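
To illustrate the "list of strings or callable" contract described above, here is a minimal sketch of how such an either/or input could be resolved. It does not reproduce the provider's code: the helper name `resolve_texts` and its validation rules are assumptions made for illustration, and only the parameter names `input_text` and `input_callable` come from the example DAG.

```python
from typing import Callable, List, Optional, Sequence


def resolve_texts(
    input_text: Optional[Sequence[str]] = None,
    input_callable: Optional[Callable[[], Sequence[str]]] = None,
) -> List[str]:
    """Return the texts to embed from exactly one of the two inputs.

    Hypothetical helper; not part of the Cohere provider.
    """
    # Require exactly one source so the caller's intent is unambiguous.
    if (input_text is None) == (input_callable is None):
        raise ValueError("Provide exactly one of input_text or input_callable.")

    # Call the callable lazily, only when it is the chosen source.
    texts = input_text if input_text is not None else input_callable()

    # A bare string is itself a sequence of characters, so reject it explicitly.
    if isinstance(texts, str) or not all(isinstance(t, str) for t in texts):
        raise TypeError("Expected a list of strings.")
    return list(texts)
```

Deferring the call to `input_callable` until resolution time means the texts can be computed when the task runs rather than when the DAG file is parsed, which is presumably the motivation for accepting a callable at all.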

The mailing-list discussion related to this effort can be found here: https://lists.apache.org/thread/0d669fmy4hn29h5c0wj0ottdskd77ktp

Contributor

@phanikumv phanikumv left a comment


Do we need the .svg files to be added to this PR?

@utkarsharma2
Contributor Author

Do we need the .svg files to be added to this PR?

Yes, we do need to add them. When the provider dependencies change, the breeze command output also changes, and the .svg files need to reflect that. They are also auto-generated and enforced via the update-breeze-cmd-output pre-commit hook.

@utkarsharma2 utkarsharma2 marked this pull request as ready for review October 26, 2023 10:05
@pankajastro pankajastro merged commit 7fc19d4 into apache:main Nov 6, 2023
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
* Add Cohere Provider

* Add Cohere Provider

* Move link to seealso sphinx directive

* Updated check for parameters

* Update dependency of the cohere

* Move the dag out of rst and into system tests

* Add dependency on the cohere python sdk

* Add cached_property for the cohere client

* Remove unwanted get_conn method

* Add correct label to password field

* Expose timeout, max_retries and api_url to user

* Fix documentation

* Update interface of CohereEmbeddingOperator operator

* Updated testcases

* Updated testcases

* Fix static check and docs build

* Update CONTRIBUTING.rst

Co-authored-by: Josh Fell <[email protected]>

* Update docs/apache-airflow-providers-cohere/operators/embedding.rst

Co-authored-by: Josh Fell <[email protected]>

* Update airflow/providers/cohere/operators/embedding.py

Co-authored-by: Josh Fell <[email protected]>

* Update airflow/providers/cohere/hooks/cohere.py

Co-authored-by: Hussein Awala <[email protected]>

* Update airflow/providers/cohere/CHANGELOG.rst

Co-authored-by: Hussein Awala <[email protected]>

* Address the PR comments

* Resolve conflicts

* Fix breaking tests

* Fix static checks

* Update airflow/providers/cohere/operators/embedding.py

Co-authored-by: Pankaj Singh <[email protected]>

* Fix docstring

* Add note for initial release

* Add security.rst file

* Update airflow/providers/cohere/hooks/cohere.py

* Update airflow/providers/cohere/operators/embedding.py

Co-authored-by: Josh Fell <[email protected]>

* Update docs/apache-airflow-providers-cohere/operators/embedding.rst

Co-authored-by: Josh Fell <[email protected]>

* Add ref to security.rst

* Update docs/apache-airflow-providers-cohere/security.rst

* Add /changelog.rst

* Resolve conflicts

---------

Co-authored-by: Josh Fell <[email protected]>
Co-authored-by: Hussein Awala <[email protected]>
Co-authored-by: Pankaj Singh <[email protected]>
Co-authored-by: Pankaj <[email protected]>
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
Labels
area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation