Model card metadata format is not preserved when loaded + saved again #2564

Closed
Wauplin opened this issue Sep 25, 2024 · 2 comments · Fixed by #2570
Labels
bug Something isn't working

Comments

@Wauplin
Contributor

Wauplin commented Sep 25, 2024

Here is an example: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct/commit/846357c7ee5e3f50575fd4294edb3d898c8ea100.

Let's try to find a way to improve this, especially for fields that haven't changed. Can't promise there is a good solution though.

Related to a Slack thread (private).

Wauplin added the bug label Sep 25, 2024
@julien-c
Member

Yes, in the initial implementation I was trying to do that. At the very least, you should keep the order of the attributes.

@hlky
Contributor

hlky commented Sep 25, 2024

diff --git a/src/huggingface_hub/repocard.py b/src/huggingface_hub/repocard.py
index f6ae591f..12c3e84e 100644
--- a/src/huggingface_hub/repocard.py
+++ b/src/huggingface_hub/repocard.py
@@ -109,7 +109,9 @@ class RepoCard:
             data_dict = {}
             self.text = content
 
-        self.data = self.card_data_class(**data_dict, ignore_metadata_errors=self.ignore_metadata_errors)
+        self.data = self.card_data_class(
+            **data_dict, ignore_metadata_errors=self.ignore_metadata_errors, original_order=list(data_dict.keys())
+        )
 
     def __str__(self):
         return self.content
diff --git a/src/huggingface_hub/repocard_data.py b/src/huggingface_hub/repocard_data.py
index b9b93aac..6b41cc57 100644
--- a/src/huggingface_hub/repocard_data.py
+++ b/src/huggingface_hub/repocard_data.py
@@ -172,8 +172,12 @@ class CardData:
     inherit from `dict` to allow this export step.
     """
 
-    def __init__(self, ignore_metadata_errors: bool = False, **kwargs):
+    def __init__(self, ignore_metadata_errors: bool = False, original_order: Optional[List[str]] = None, **kwargs):
         self.__dict__.update(kwargs)
+        if original_order:
+            self.__dict__ = {
+                k: self.__dict__[k] for k in original_order + list(set(self.__dict__.keys()) - set(original_order))
+            }
 
     def to_dict(self) -> Dict[str, Any]:
         """Converts CardData to a dict.
@@ -316,6 +320,7 @@ class ModelCardData(CardData):
         pipeline_tag: Optional[str] = None,
         tags: Optional[List[str]] = None,
         ignore_metadata_errors: bool = False,
+        original_order: Optional[List[str]] = None,
         **kwargs,
     ):
         self.base_model = base_model
@@ -347,7 +352,7 @@ class ModelCardData(CardData):
                         " some information will be lost. Use it at your own risk."
                     )
 
-        super().__init__(**kwargs)
+        super().__init__(**kwargs, original_order=original_order)
 
         if self.eval_results:
             if isinstance(self.eval_results, EvalResult):

Something like this, WDYT?
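
To look at the reordering step from the diff above in isolation, here is a small standalone sketch. It is not the library's code: `restore_key_order` is a hypothetical helper name, and it works on a plain dict rather than on `CardData.__dict__`. It differs from the patch in two ways that may be worth considering: keys missing from `original_order` are appended in their existing insertion order instead of via a `set` difference (whose iteration order is not guaranteed), and names in `original_order` that are no longer present are skipped instead of raising `KeyError` in the dict comprehension.

```python
from typing import Any, Dict, List, Optional


def restore_key_order(data: Dict[str, Any], original_order: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return a copy of `data` whose keys follow `original_order` first.

    Hypothetical helper for illustration only; not part of huggingface_hub.
    """
    if not original_order:
        return dict(data)
    # Keys from the original metadata, in their original order,
    # skipping any that no longer exist in `data`.
    known = [k for k in original_order if k in data]
    # Remaining keys keep their current insertion order (dicts are ordered in Python 3.7+).
    seen = set(known)
    extra = [k for k in data if k not in seen]
    return {k: data[k] for k in known + extra}


# Attributes in the order the constructor happened to set them (made-up sample values).
attrs = {
    "language": ["en"],
    "library_name": "transformers",
    "license": "llama3.1",
    "tags": ["meta"],
    "new_field": 1,
}
# Order of keys as they appeared in the card; "model-index" is not an attribute here.
ordered = restore_key_order(attrs, ["language", "tags", "license", "library_name", "model-index"])
print(list(ordered))
```

Because the result is an ordinary dict, assigning to an existing key afterwards (as with `card.data.license = "test"` further down) keeps that key's position.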

from huggingface_hub import ModelCard

model_card = """---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
pipeline_tag: text-generation
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
license: llama3.1
extra_gated_prompt: >-
  ### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
    - Student
    - Research Graduate
    - AI researcher
    - AI developer/engineer
    - Reporter
    - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
library_name: transformers
---
"""

card = ModelCard(model_card)
card.content

This currently returns:

"---\nlanguage:\n- en\n- de\n- fr\n- it\n- pt\n- hi\n- es\n- th\nlibrary_name: transformers\nlicense: llama3.1\npipeline_tag: text-generation\ntags:\n- facebook\n- meta\n- pytorch\n- llama\n- llama-3\nextra_gated_prompt: '### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT'\nextra_gated_fields:\n  First Name: text\n  Last Name: text\n  Date of birth: date_picker\n  Country: country\n  Affiliation: text\n  Job title:\n    type: select\n    options:\n    - Student\n    - Research Graduate\n    - AI researcher\n    - AI developer/engineer\n    - Reporter\n    - Other\n  geo: ip_location\n  ? By clicking Submit below I accept the terms of the license and acknowledge that\n    the information I provide will be collected stored processed and shared in accordance\n    with the Meta Privacy Policy\n  : checkbox\nextra_gated_description: The information you provide will be collected, stored, processed\n  and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/).\nextra_gated_button_content: Submit\n---\n"

With the patch, the original order is maintained:

"---\nlanguage:\n- en\n- de\n- fr\n- it\n- pt\n- hi\n- es\n- th\npipeline_tag: text-generation\ntags:\n- facebook\n- meta\n- pytorch\n- llama\n- llama-3\nlicense: llama3.1\nextra_gated_prompt: '### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT'\nextra_gated_fields:\n  First Name: text\n  Last Name: text\n  Date of birth: date_picker\n  Country: country\n  Affiliation: text\n  Job title:\n    type: select\n    options:\n    - Student\n    - Research Graduate\n    - AI researcher\n    - AI developer/engineer\n    - Reporter\n    - Other\n  geo: ip_location\n  ? By clicking Submit below I accept the terms of the license and acknowledge that\n    the information I provide will be collected stored processed and shared in accordance\n    with the Meta Privacy Policy\n  : checkbox\nextra_gated_description: The information you provide will be collected, stored, processed\n  and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/).\nextra_gated_button_content: Submit\nlibrary_name: transformers\n---\n"

and it is still maintained after a field is changed:

card = ModelCard(model_card)
card.data.license = "test"
card.content
"---\nlanguage:\n- en\n- de\n- fr\n- it\n- pt\n- hi\n- es\n- th\npipeline_tag: text-generation\ntags:\n- facebook\n- meta\n- pytorch\n- llama\n- llama-3\nlicense: test\nextra_gated_prompt: '### LLAMA 3.1 COMMUNITY LICENSE AGREEMENT'\nextra_gated_fields:\n  First Name: text\n  Last Name: text\n  Date of birth: date_picker\n  Country: country\n  Affiliation: text\n  Job title:\n    type: select\n    options:\n    - Student\n    - Research Graduate\n    - AI researcher\n    - AI developer/engineer\n    - Reporter\n    - Other\n  geo: ip_location\n  ? By clicking Submit below I accept the terms of the license and acknowledge that\n    the information I provide will be collected stored processed and shared in accordance\n    with the Meta Privacy Policy\n  : checkbox\nextra_gated_description: The information you provide will be collected, stored, processed\n  and shared in accordance with the [Meta Privacy Policy](https://www.facebook.com/privacy/policy/).\nextra_gated_button_content: Submit\nlibrary_name: transformers\n---\n"

The same change would also need to be added to DatasetCardData.
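
A quick way to compare key order between the input card and the `card.content` strings above, without depending on a YAML parser, is to read the top-level keys line by line. The sketch below is an illustration only: `top_level_keys` is a hypothetical helper, not part of `huggingface_hub`, and it assumes block-style front matter where top-level keys are unindented, as in the examples in this thread. The sample strings are shortened versions of the card above.

```python
import re
from typing import List

# An unindented "key:" at the start of a line (not a comment, not a list item).
_KEY = re.compile(r"^([^\s#:][^:]*):(\s|$)")


def top_level_keys(card_content: str) -> List[str]:
    """List the top-level YAML keys of a card's front matter, in file order.

    Line-based sketch; assumes block-style YAML with unindented top-level keys.
    """
    lines = card_content.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":  # end of the front matter
            break
        match = _KEY.match(line)
        if match and not line.startswith("-"):
            keys.append(match.group(1))
    return keys


# Shortened input card, in the author's original order.
before = (
    "---\nlanguage:\n- en\npipeline_tag: text-generation\ntags:\n- meta\n"
    "license: llama3.1\nextra_gated_fields:\n  First Name: text\n"
    "library_name: transformers\n---\n"
)
# Shortened output in the reordered form reported in this issue.
reordered = (
    "---\nlanguage:\n- en\nlibrary_name: transformers\nlicense: llama3.1\n"
    "pipeline_tag: text-generation\ntags:\n- meta\nextra_gated_fields:\n"
    "  First Name: text\n---\n"
)

print(top_level_keys(before))
print(top_level_keys(before) == top_level_keys(reordered))
```

A check of this shape could back a regression test for both `ModelCardData` and `DatasetCardData`: load a card, save it, and compare the two key lists.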
