Added HIT-TMG_KaLM-embedding-multilingual-mini-instruct-v1 with instruct wrapper #2478
base: main
Conversation
@Samoed I will not be able to run the models on all tasks and add the results to the results repo. Can you do that, if possible?
Detailed analysis of results comparison. Significant differences in results were observed on the following tasks (a sketch for re-running them follows the list):
- Task type "Classification": task "EmotionClassification"
- Task type "MultilabelClassification": task "CEDRClassification"
- Task type "Clustering": task "GeoreviewClusteringP2P"
- Task type "PairClassification": task "Ocnli"
Hm. They've reported different prompts in the paper compared with what we're using. Can you update your implementation with their prompts? You could change the model to use the sentence transformer wrapper, but this is a hack, and it's not clear how to integrate their results properly. At the very least, can you try changing the prompt for 2-3 tasks directly to test whether our implementation matches?
I think only the Classification and MultilabelClassification results have some differences. For the retrieval, reranking, and STS tasks (whose results I was going to share shortly), there are no differences. Update:
I think you can create an issue for it to discuss. After that, we will decide what to do with this model.
@ayush1298 You can change Line 91 in cb2825c, similarly to get_prompt_name (Line 21 in cb2825c).
@Samoed I have modified 1 more thing, I think what I missed is the prompt given at end in paper are having same format only of: they just have given these as an example with task-specific instruction and query for each task. |
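For illustration, that shared format boils down to a single template. This paraphrases the prompts quoted below; it is not code from the paper or the PR:

PROMPT_TEMPLATE = "Instruct: {instruction} \n Query: {query}"

# A task-specific instruction plus the input text, in one shared template.
prompt = PROMPT_TEMPLATE.format(
    instruction="Identify the topic or theme of the Russian reviews.",
    query="The food was great and the staff were friendly.",
)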
"""Get the instruction/prompt to be used for encoding sentences.""" | ||
if prompts_dict and task_name in prompts_dict: |
And what if a task wants to use different instructions for queries and passages?
What should be done for that?
I think the code can be changed like this (a fuller runnable sketch follows below):

task = mteb.get_task(task_name=task_name)
prompt = task.metadata.prompts
if prompts_dict and task_name in prompts_dict:
    prompt = prompts_dict[task_name]
if isinstance(prompt, dict) and prompt_type:
    ...
if prompt:
    return prompt
...
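For clarity, here is a self-contained version of that sketch. The attribute name task.metadata.prompts and the "query"/"passage" values for prompt_type follow the snippet above and may not match the actual field names in the repo:

import mteb

def resolve_prompt(
    task_name: str,
    prompts_dict: dict | None = None,
    prompt_type: str | None = None,  # "query" or "passage"
) -> str | None:
    # Start from the task's own metadata prompts, if any.
    task = mteb.get_task(task_name=task_name)
    prompt = task.metadata.prompts
    # A user-supplied prompts_dict overrides the task metadata.
    if prompts_dict and task_name in prompts_dict:
        prompt = prompts_dict[task_name]
    # Per-type prompts: pick the entry matching the requested prompt type,
    # which addresses the query-vs-passage question above.
    if isinstance(prompt, dict):
        return prompt.get(prompt_type) if prompt_type else None
    return prompt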
"EightTagsClustering": "Instruct: Identify of headlines from social media posts in Polish into 8 categories: film, history, food, medicine, motorization, work, sport and technology \n Query: {query}", | ||
"GeoreviewClusteringP2P": "Instruct: Identify the topic or theme of the Russian reviews. \n Query: {query}", | ||
"RuSciBenchGRNTIClusteringP2P": "Instruct: Identify the topic or theme of the Russian articles. \n Query: {query}", | ||
"RuSciBenchOECDClusteringP2P": "Instruct: Identify the topic or theme of the Russian articles. \n Query: {query}", |
You shouldn't add {query}, because we append the text to the instruction.
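A minimal illustration of why, assuming the framework appends the input text to the instruction as the comment states (build_input is a hypothetical stand-in for the wrapper's behavior):

def build_input(instruction: str, text: str) -> str:
    # Hypothetical: mirrors a wrapper that appends the text to the instruction.
    return instruction + text

build_input("Instruct: Identify the topic. \n Query: ", "Great food!")
# -> "Instruct: Identify the topic. \n Query: Great food!"

build_input("Instruct: Identify the topic. \n Query: {query}", "Great food!")
# -> "Instruct: Identify the topic. \n Query: {query}Great food!"  (the placeholder leaks in verbatim)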
Fixes #1445, #2482
Added 3 models:
Code Quality
- Format the code using make lint to maintain consistent style.

Documentation

Testing
- Run tests with make test-with-coverage.
- Run make test or make test-with-coverage to ensure no existing functionality is broken.

Adding a model checklist
- The model can be loaded using mteb.get_model(model_name, revision) and mteb.get_model_meta(model_name, revision) (see the loading sketch after this checklist).
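As a quick check for that last item, something along these lines should work once the PR is merged; the model ID is assumed from the PR title and may differ:

import mteb

model_name = "HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1"  # assumed ID
meta = mteb.get_model_meta(model_name)
model = mteb.get_model(model_name, revision=meta.revision)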