[Proof of Concept] Bundle both SDK versions in one single package #4875

frascuchon · 2024-05-24T10:54:15Z

Description

This PR is a Proof of Concept of how to bundle both versions of the SDK in a single package. There are two possible approaches. Let's look at each with a code example:

# Importing SDK from the new argilla_v2 version.
# This could be renamed to argilla (I think) in the 2.0.0 release
import argilla_v2 as rg
import argilla_v2.v1 as rg_v1

client = rg.Argilla()
rg_v1.init()

print(rg_v1.list_datasets())
print(client.datasets.list())

# The other way is to import the new SDK version from the old one.
# This could be easier to do

import argilla as rg
import argilla.v2 as rg2

client = rg2.Argilla()
rg.init()

print(rg.list_datasets())
print(client.datasets.list())

I think the approach will depend on how the migration process will be done.

Personally, I think it makes more sense (and it's more intuitive) to import the new version from the old one. Why?

Feedback datasets do not require any migration. So, if users are working only with Feedback, the migration step can be skipped.
The migration would happen before the upgrade. So, users can work with v2 without upgrading the whole system.
The new version won't drag along any references to old code.

But in any case, this is a starting point to discuss the approach.
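To make the second approach concrete, here is a minimal sketch of how one package could expose a separately installed package as a submodule by registering it in `sys.modules`. It is not the code in this PR: `legacy_demo`, `argilla_sdk_v2_demo` and `expose_as_submodule` are hypothetical stand-ins built in memory so the snippet runs without either SDK installed.

```python
import importlib
import sys
import types

# Hypothetical stand-in for the separately installed new SDK distribution.
new_sdk = types.ModuleType("argilla_sdk_v2_demo")
new_sdk.Argilla = type("Argilla", (), {"version": "v2"})
sys.modules[new_sdk.__name__] = new_sdk

# Hypothetical stand-in for the legacy package that bundles the new SDK.
legacy = types.ModuleType("legacy_demo")
legacy.__path__ = []  # mark it as a package so it can own submodules
legacy.init = lambda: "v1 initialised"
sys.modules[legacy.__name__] = legacy


def expose_as_submodule(parent, name, target_name):
    """Make an installed module importable as ``<parent>.<name>``."""
    target = importlib.import_module(target_name)
    # Registering the dotted name lets ``import parent.name`` resolve
    # without a real file on disk; the attribute supports ``parent.name``.
    sys.modules[f"{parent.__name__}.{name}"] = target
    setattr(parent, name, target)
    return target


# In a real package this call would live in the parent's __init__.py.
expose_as_submodule(legacy, "v2", "argilla_sdk_v2_demo")

import legacy_demo as rg
import legacy_demo.v2 as rg2

client = rg2.Argilla()
print(rg.init(), client.version)
```

The same helper works in the other direction (exposing the old SDK under the new package), so the choice between the two approaches is about naming and migration order rather than mechanics.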


burtenshaw · 2024-05-24T11:21:21Z

@frascuchon Example 2 makes sense for the package referencing.

How would you foresee the migration of datasets themselves from legacy to new v2?
Would we write functionality to interpret and init Datasets for this?
Have you considered migrating in serialized/generic forms, using the new from_disk method?

frascuchon · 2024-05-24T11:55:51Z

How would you foresee the migration of datasets themselves from legacy to new v2?

For this, we can write a how-to guide that explains how to:

Configure the Dataset depending on the legacy task (text classification, token classification...)
Use the old SDK to load records and add/update them using the new SDK (using the mapping attribute can make this easier)
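The two steps above can be sketched roughly as follows. Plain dictionaries stand in for records from both SDKs, so nothing here is an Argilla call: `TEXT_CLASSIFICATION_MAPPING` and `map_legacy_records` are hypothetical names, and the legacy keys (`text`, `annotation`, `status`) are assumed only for illustration.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical mapping for a legacy text classification dataset:
# legacy record key -> field/question name in the new dataset.
TEXT_CLASSIFICATION_MAPPING = {"text": "text", "annotation": "label"}


def map_legacy_records(
    records: Iterable[Dict[str, Any]], mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Rename legacy keys to the new names.

    Keys without a mapping entry are dropped, and so are mapped keys
    whose value is None (for example, records not annotated yet).
    """
    migrated = []
    for record in records:
        migrated.append(
            {
                new_key: record[old_key]
                for old_key, new_key in mapping.items()
                if record.get(old_key) is not None
            }
        )
    return migrated


# Records as they might come out of the old SDK (assumed shape).
legacy_records = [
    {"text": "great product", "annotation": "positive", "status": "Validated"},
    {"text": "not sure yet", "annotation": None, "status": "Default"},
]

migrated = map_legacy_records(legacy_records, TEXT_CLASSIFICATION_MAPPING)
print(migrated)
```

A different legacy task would only need a different mapping dictionary and a matching dataset configuration on the new side.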

The current SDK would be compatible with the latest versions of Argilla v1, so users can start working with it without the whole v2 upgrade.

Would we write functionality to interpret and init Dataset's for this?

I would say not for now. We can provide some guides to configure the Dataset and add records. The functionality for that should consider different scenarios depending on the user's setup, and this can be hard to tackle.

Do you have something in mind?

Have you considered migrating in serialized/ generic forms? using the new from_disk method?

I see this as a good fit for scenarios where users want to migrate FeedbackDatasets to v2 iteratively. In those scenarios, users can export datasets from one server using to_disk and push them to the new server with from_disk.
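The export/import flow could look something like the sketch below. It does not call the SDK's to_disk or from_disk: `export_dataset` and `import_dataset` are hypothetical stand-ins that write and read a JSON file, and each "server" is just a dictionary, so the snippet only illustrates the shape of the round trip.

```python
import json
import tempfile
from pathlib import Path


def export_dataset(dataset: dict, directory: Path) -> Path:
    """Stand-in for an export step: serialize one dataset to a JSON file."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{dataset['name']}.json"
    path.write_text(json.dumps(dataset, indent=2), encoding="utf-8")
    return path


def import_dataset(path: Path, server: dict) -> dict:
    """Stand-in for an import step: load the file and register it on a server."""
    dataset = json.loads(path.read_text(encoding="utf-8"))
    server[dataset["name"]] = dataset
    return dataset


# Two in-memory "servers" (assumed shapes, for illustration only).
old_server = {
    "reviews": {
        "name": "reviews",
        "fields": ["text"],
        "records": [{"text": "great product"}],
    }
}
new_server = {}

with tempfile.TemporaryDirectory() as tmp:
    exported = export_dataset(old_server["reviews"], Path(tmp))
    import_dataset(exported, new_server)

print(sorted(new_server))
```

Because each dataset is exported and imported on its own, datasets can be moved one at a time, which matches the iterative migration described above.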

frascuchon · 2024-05-27T10:21:42Z

Regarding this, it feels more natural to me to expose the old package through the new one. So, if we keep argilla as the module name, we can provide v2 as:

import argilla as rg
client = rg.Argilla()
...

and expose the v1 as a submodule:

import argilla.v1 as rg1

rg1.list_datasets(...)
...

I think this would be the right way. We can later discuss and tackle implementation details based on the results of this PoC.

frascuchon added 2 commits May 24, 2024 12:47

chore: Include new SDK as dep under argilla.v2

a684cd5

create a PoC v2 bundle package

f43bf84

frascuchon requested review from jfcalvo, dvsrepo, damianpumar and burtenshaw May 24, 2024 10:54

[pre-commit.ci] auto fixes from pre-commit.com hooks

96eb83b

for more information, see https://pre-commit.ci

frascuchon closed this Jun 7, 2024
