"KeyError: 'document' not found and no similar keys were found. #1445

Open
LeMoussel opened this issue Nov 13, 2024 · 13 comments · May be fixed by #2478
Labels
new model Questions related to adding a new model to the benchmark

Comments

@LeMoussel

With HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1, I have the following error:

Loader not specified for model HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1, loading using sentence transformers.
Traceback (most recent call last):
  File "/home/dev/Python/AI/MTEB/mteb_fr.py", line 252, in <module>
    mteb_model = mteb.get_model(
                 ^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/models/overview.py", line 126, in get_model
    model = meta.load_model(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/model_meta.py", line 120, in load_model
    model: Encoder = loader(**kwargs)  # type: ignore
                     ^^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/model_meta.py", line 37, in sentence_transformers_loader
    return SentenceTransformerWrapper(model=model_name, revision=revision, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/models/sentence_transformer_wrapper.py", line 48, in __init__
    model_prompts = self.validate_task_to_prompt_name(self.model.prompts)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/models/wrapper.py", line 81, in validate_task_to_prompt_name
    task = mteb.get_task(task_name=task_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dev/Python/AI/MTEB/venv/lib/python3.12/site-packages/mteb/overview.py", line 318, in get_task
    raise KeyError(suggestion)
KeyError: "KeyError: 'document' not found and no similar keys were found."
@Samoed
Member

Samoed commented Nov 13, 2024

The issue is that this model specifies a prompt, but MTEB uses different prompts per task, which causes the error. Since this is an instruction model, it would be better to use it with InstructWrapper; see the e5-instruct models for an example.

@LeMoussel
Author

LeMoussel commented Nov 13, 2024

OK. I did this:

    # https://huggingface.co/jinaai/jina-embeddings-v3/discussions/75
    MODEL_NAME = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1"
    MODEL_URL = 'https://huggingface.co/HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1'

    OUTPUT_FOLDER = "results"

    mteb_model = mteb.get_model(
        MODEL_NAME,
        device="cuda" if torch.cuda.is_available() else "cpu",
    )

    tasks = mteb.get_tasks(
        tasks=TASK_LIST, languages=["fra"]
    )

    evaluation = mteb.MTEB(tasks=tasks)
    mteb_results = evaluation.run(
        mteb_model,
        eval_splits=["test"],
        output_folder=f"{OUTPUT_FOLDER}/{MODEL_NAME}",
    )

How can I use InstructWrapper in this case?

@Samoed
Member

Samoed commented Nov 13, 2024

You can run this model like this

import mteb
from mteb.models.instruct_wrapper import instruct_wrapper

mteb_model = instruct_wrapper(
    model_name_or_path="HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1",
    instruction_template="Instruct: {instruction} \n Query: ",
    attn="cccc",
    pooling_method="mean",
    mode="embedding",
    normalized=True,
)

tasks = mteb.get_tasks(
    tasks=["SciDocsRR"]
)

evaluation = mteb.MTEB(tasks=tasks)
mteb_results = evaluation.run(
    mteb_model,
)

It would be very nice if you could add this model to the models folder with the metadata filled in.
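For context, the `instruction_template` passed above is an ordinary Python format string: the wrapper fills in the task instruction and prepends the result to each query. A minimal stand-alone sketch of that string handling (the `build_prompt` helper is hypothetical, for illustration only; the real logic lives inside mteb's instruct wrapper):

```python
# Illustration only: how an instruction template like the one above
# turns a task instruction plus a query into the final model input.

INSTRUCTION_TEMPLATE = "Instruct: {instruction} \n Query: "

def build_prompt(instruction: str, query: str) -> str:
    """Prepend the formatted instruction to a query (hypothetical helper)."""
    return INSTRUCTION_TEMPLATE.format(instruction=instruction) + query

prompt = build_prompt(
    "Given a scientific paper title, retrieve related paper titles",
    "Attention Is All You Need",
)
print(prompt)
```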

@LeMoussel
Author

I'd be happy to help by adding this model to the models folder with the metadata filled in, but I'm new to MTEB. What steps should I follow?

@Samoed
Member

Samoed commented Nov 13, 2024

You should fill in the information similar to the e5_instruct models and run some tasks to ensure that this implementation matches the author's.
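One cheap sanity check for "the implementation matches the author's" is to embed the same sentences with both the reference code and the MTEB wrapper and compare the vectors. A self-contained sketch of just the comparison step (the vectors below are dummy placeholders, not real model output):

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Dummy stand-ins for "official checkpoint" vs "MTEB wrapper" embeddings.
ref = [0.12, -0.40, 0.88]
ours = [0.12, -0.41, 0.87]

# Near-identical embeddings should give a cosine similarity close to 1.
assert cosine(ref, ours) > 0.999, "wrapper output diverges from reference"
```

In practice you would also compare task scores against any numbers the authors report.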

@isaac-chung isaac-chung added the new model Questions related to adding a new model to the benchmark label Dec 24, 2024
@ayush1298
Contributor

@Samoed As the model has been added here:

HIT_TMG__KaLM_embedding_multilingual_mini_instruct_v1 = ModelMeta(
and the results have also been added to the results repo at https://github.com/embeddings-benchmark/results/tree/main/results/HIT-TMG__KaLM-embedding-multilingual-mini-instruct-v1,
can we close this issue now?

@Samoed
Member

Samoed commented Apr 2, 2025

Models in the misc file are autogenerated, and HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1 should use the instruct wrapper instead of the default sentence transformers loader.

@ayush1298
Contributor

ayush1298 commented Apr 2, 2025

So, do we have to add it to a separate file, similar to e5_instruct in https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/e5_instruct.py, and then rerun the results?

@Samoed
Member

Samoed commented Apr 2, 2025

Yes

@ayush1298
Contributor

@Samoed Should I add 'HIT_TMG__KaLM_embedding_multilingual_mini_v1' with the instruct wrapper in the new file?
Also, is it okay if I remove them from misc_models.py?

@Samoed
Member

Samoed commented Apr 2, 2025

Yes to both. I think it should use InstructSentenceTransformerWrapper.

@ayush1298
Contributor

How do we decide which to use: InstructSentenceTransformerWrapper or instruct_wrapper?

@Samoed
Member

Samoed commented Apr 2, 2025

They're both nearly the same, but InstructSentenceTransformerWrapper is more convenient to use.
