[Proof of Concept] Bundle both SDK versions in one single package #4875

Closed
wants to merge 3 commits into from


frascuchon
Member

Description

This PR is a Proof of Concept of how to bundle both versions of the SDK in a single package. There are two possible approaches. Let's look at them with some code examples:

# Importing SDK from the new argilla_v2 version.
# This could be renamed to argilla (I think) in the 2.0.0 release
import argilla_v2 as rg
import argilla_v2.v1 as rg_v1

client = rg.Argilla()
rg_v1.init()

print(rg_v1.list_datasets())
print(client.datasets.list())

# The other way is to import the new SDK version from the old one.
# This could be easier to do

import argilla as rg
import argilla.v2 as rg2

client = rg2.Argilla()
rg.init()

print(rg.list_datasets())
print(client.datasets.list())

I think the approach will depend on how the migration process will be done.

Personally, I think it makes more sense (and is more intuitive) to import the new version from the old one. Why?

  1. Feedback datasets do not require any migration. So, if users are working only with Feedback, the migration step can be skipped.
  2. The migration would happen before the upgrade. So, users can work with v2 without upgrading the whole system.
  3. The new version won't drag any old code references.

But in any case, this is a starting point to discuss the approach.

@burtenshaw
Contributor

burtenshaw commented May 24, 2024

@frascuchon Example 2 makes sense for the package referencing.

How would you foresee the migration of datasets themselves from legacy to new v2?
Would we write functionality to interpret and init Datasets for this?
Have you considered migrating in serialized/generic forms, using the new from_disk method?

@frascuchon
Member Author

How would you foresee the migration of datasets themselves from legacy to new v2?

For this, we can write a how-to guide that explains how to:

  1. Configure the Dataset depending on the legacy task (text classification, token classification...)
  2. Use the old SDK to load records and add/update using the new SDK (using the mapping attribute can be easier)
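As a rough illustration of step 2, the sketch below renames attributes of legacy records to the field and question names of a new dataset. It is only an assumption of what such a guide might show: `LegacyTextClassificationRecord` and `to_new_records` are hypothetical stand-ins rather than part of either SDK, and the SDK's own mapping feature may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LegacyTextClassificationRecord:
    """Hypothetical stand-in for a record loaded with the old SDK."""

    text: str
    annotation: Optional[str] = None


def to_new_records(legacy_records, mapping):
    """Rename legacy attributes to the names used by the new dataset.

    `mapping` goes from legacy attribute name to new field/question name.
    Attributes whose value is None (e.g. a missing annotation) are skipped.
    """
    new_records = []
    for record in legacy_records:
        converted = {}
        for source, target in mapping.items():
            value = getattr(record, source)
            if value is not None:
                converted[target] = value
        new_records.append(converted)
    return new_records


legacy = [
    LegacyTextClassificationRecord(text="great product", annotation="positive"),
    LegacyTextClassificationRecord(text="not labeled yet"),
]
# Hypothetical target names: a "text" field and a "label" question.
mapping = {"text": "text", "annotation": "label"}
records = to_new_records(legacy, mapping)
print(records)
```

In a real migration, the legacy records would come from the old SDK and the converted dictionaries would be added to a dataset configured for the matching task with the new SDK.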

The current SDK would be compatible with the latest versions of Argilla v1, so users can start working with it without the whole v2 upgrade.

Would we write functionality to interpret and init Datasets for this?

I would say not for now. We can provide some guides on configuring the Dataset and adding records. Functionality like that would need to handle different scenarios depending on the user's setup, which can be hard to tackle.

Do you have something in mind?

Have you considered migrating in serialized/generic forms, using the new from_disk method?

I see this as a good fit for scenarios where users want to migrate FeedbackDatasets to v2 iteratively. In those scenarios, users can export datasets from one server using to_disk and push them to the new server with from_disk.

@frascuchon
Member Author

Regarding this, it feels more natural to me to expose the old package through the new one. So, if we keep argilla as the module name, we can provide v2 as:

import argilla as rg
client = rg.Argilla()
...

and expose the v1 as a submodule:

import argilla.v1 as rg1

rg1.list_datasets(...)
...

I think this would be the right way. We can later discuss and tackle implementation details based on the results of this PoC.
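To make the layout concrete, here is a minimal sketch of a package whose top level exposes the new API while the legacy API lives in a `v1` submodule. It is not the PoC's implementation: the package name `demo_sdk`, the `Client` class, and the returned values are all hypothetical, and the package is written to a temporary directory so the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used so the sketch does not clash with a real install.
PACKAGE = "demo_sdk"


def build_demo_package(root: Path) -> None:
    """Write a tiny package: new API at the top level, legacy API under v1."""
    pkg = root / PACKAGE
    (pkg / "v1").mkdir(parents=True)
    # Top level: the new client-based API. It does not import the v1 submodule.
    (pkg / "__init__.py").write_text(
        "class Client:\n"
        "    def list_datasets(self):\n"
        "        return ['v2-dataset']\n"
    )
    # Submodule: the legacy module-level API, loaded only when requested.
    (pkg / "v1" / "__init__.py").write_text(
        "def list_datasets():\n"
        "    return ['v1-dataset']\n"
    )


root = Path(tempfile.mkdtemp())
build_demo_package(root)
sys.path.insert(0, str(root))

sdk = importlib.import_module(PACKAGE)
# Check whether importing the top-level package already loaded the legacy code.
legacy_loaded_early = f"{PACKAGE}.v1" in sys.modules
sdk_v1 = importlib.import_module(f"{PACKAGE}.v1")

print(sdk.Client().list_datasets())
print(sdk_v1.list_datasets())
print(legacy_loaded_early)
```

Because the top-level `__init__.py` in this sketch never imports `v1`, users who only need the new API do not load the legacy code, while `import demo_sdk.v1` remains available for those still migrating.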

@frascuchon frascuchon closed this Jun 7, 2024