Migrate historic Label Studio and "needs identification" sources #147

Open
5 tasks
josh-chamberlain opened this issue Feb 11, 2025 · 0 comments

josh-chamberlain commented Feb 11, 2025

Context

  • Pursuant to #126 (Data source ID: Retool interface), we now have data labeling infrastructure and an interface in Retool.
  • We used to do labeling in Label Studio, and we still have some labeled sources there (~500 IIRC).
  • We have ~1900 data sources with approval_status = "needs identification" because they are missing record_type, agency_described, or both.
    • They were stored there only for temporary safekeeping.
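The "needs identification" criterion above can be expressed as a small check. This is a minimal sketch, assuming each source is a plain mapping with record_type and agency_described keys and that None or a blank string counts as missing; missing_identification_fields and needs_identification are hypothetical names, not part of the project's codebase.

```python
from typing import Any, Mapping

# Fields whose absence puts a source into "needs identification" (per this issue).
REQUIRED_FIELDS = ("record_type", "agency_described")


def missing_identification_fields(source: Mapping[str, Any]) -> list[str]:
    """Return the required fields that are absent or empty on a source record.

    Hypothetical helper: treats None or an empty/whitespace-only string as missing.
    """
    missing = []
    for field in REQUIRED_FIELDS:
        value = source.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def needs_identification(source: Mapping[str, Any]) -> bool:
    """True if the source lacks record_type, agency_described, or both."""
    return bool(missing_identification_fields(source))
```

Returning the list of missing fields (rather than only a boolean) makes it easy to report which of the two fields a labeler still has to supply.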

Requirements

  • rescue the labeled data from Label Studio
    • reject labels from tasks that have multiple annotators who are not in 100% agreement
  • fill in our new db as though those labels had been made through the Retool interface
  • bring in the "needs identification" sources (as a special manually-created batch?), and set them up to run through the official labeling process.
    • delete the existing source records; they can be created anew once labeled
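The agreement rule for rescuing Label Studio labels could look roughly like this. It is a sketch under stated assumptions: tasks follow the general shape of Label Studio's JSON export (each task has an "id" and an "annotations" list, each annotation a "result" list of choice-type regions keyed by "from_name"), cancelled annotations are skipped, and agreement is judged per task, meaning every annotation must produce identical labels. normalize_annotation and split_by_agreement are hypothetical names.

```python
from typing import Any

# Normalized labels for one annotation: {control name: sorted choices}.
Labels = dict[str, tuple[str, ...]]


def normalize_annotation(annotation: dict[str, Any]) -> Labels:
    """Reduce one annotation to {from_name: sorted choices}.

    Assumes choice-type results shaped like {"from_name": ..., "value": {"choices": [...]}};
    results without "choices" are ignored.
    """
    labels: Labels = {}
    for result in annotation.get("result", []):
        choices = result.get("value", {}).get("choices")
        if choices is not None:
            labels[result["from_name"]] = tuple(sorted(choices))
    return labels


def split_by_agreement(tasks: list[dict[str, Any]]) -> tuple[dict[Any, Labels], list[Any]]:
    """Keep tasks whose annotations fully agree; reject the rest.

    Returns (accepted, rejected): accepted maps task id -> agreed labels,
    rejected lists ids of tasks whose multiple annotations disagree.
    Tasks with no non-cancelled annotations appear in neither.
    """
    accepted: dict[Any, Labels] = {}
    rejected: list[Any] = []
    for task in tasks:
        annotations = [
            normalize_annotation(a)
            for a in task.get("annotations", [])
            if not a.get("was_cancelled", False)
        ]
        if not annotations:
            continue  # never labeled; nothing to rescue
        first = annotations[0]
        if all(labels == first for labels in annotations[1:]):
            accepted[task["id"]] = first  # single annotator, or 100% agreement
        else:
            rejected.append(task["id"])
    return accepted, rejected
```

Judging agreement per field instead (for example, keeping record_type when annotators agree on it even if they disagree on agency_described) would be a looser reading of the requirement; the issue does not say which is intended.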