results.md

| Nr | Model | FaRel | child | parent | grand-child | sibling | grand-parent | great grand-child | niece or nephew | aunt or uncle | great grand-parent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | o1-mini | 99.78 | 100.00 | 100.00 | 100.00 | 100.00 | 98.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 1 | deepseek-r1 | 99.78 | 100.00 | 100.00 | 100.00 | 98.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| 3 | o1-preview | 98.89 | 100.00 | 100.00 | 96.00 | 100.00 | 100.00 | 100.00 | 94.00 | 100.00 | 100.00 |
| 4 | qwq-32b-preview-sys | 97.56 | 100.00 | 100.00 | 100.00 | 98.00 | 100.00 | 100.00 | 94.00 | 86.00 | 100.00 |
| 5 | deepseek-v3-sys | 96.89 | 100.00 | 100.00 | 98.00 | 98.00 | 100.00 | 98.00 | 88.00 | 90.00 | 100.00 |
| 6 | qwq-32b-preview | 96.67 | 100.00 | 100.00 | 100.00 | 98.00 | 100.00 | 98.00 | 90.00 | 88.00 | 96.00 |
| 7 | deepseek-v3 | 96.44 | 100.00 | 100.00 | 100.00 | 96.00 | 100.00 | 100.00 | 82.00 | 92.00 | 98.00 |
| 8 | claude-3.5-sonnet-1022-sys | 95.78 | 100.00 | 100.00 | 100.00 | 96.00 | 100.00 | 98.00 | 92.00 | 76.00 | 100.00 |
| 9 | claude-3.5-sonnet-1022 | 93.33 | 100.00 | 100.00 | 100.00 | 92.00 | 100.00 | 98.00 | 76.00 | 74.00 | 100.00 |
| 10 | qvq-72b-preview | 91.56 | 100.00 | 100.00 | 98.00 | 92.00 | 96.00 | 86.00 | 84.00 | 76.00 | 92.00 |
| 11 | claude-3.5-sonnet-sys | 88.67 | 100.00 | 100.00 | 100.00 | 86.00 | 96.00 | 92.00 | 64.00 | 60.00 | 100.00 |
| 12 | gpt-4o-sys | 88.44 | 100.00 | 100.00 | 86.00 | 92.00 | 100.00 | 88.00 | 72.00 | 62.00 | 96.00 |
| 12 | mistral-large-2411-Q8_0 | 88.44 | 100.00 | 100.00 | 94.00 | 92.00 | 100.00 | 90.00 | 70.00 | 54.00 | 96.00 |
| 12 | Sky-T1-32B-Preview-Q8_0 | 88.44 | 100.00 | 100.00 | 80.00 | 96.00 | 98.00 | 90.00 | 82.00 | 50.00 | 100.00 |
| 15 | llama-3.1-405b-instruct-sys | 87.78 | 100.00 | 100.00 | 100.00 | 96.00 | 96.00 | 92.00 | 50.00 | 56.00 | 100.00 |
| 15 | deepseek-v2-chat-0628-Q8_0 | 87.78 | 100.00 | 100.00 | 98.00 | 86.00 | 94.00 | 94.00 | 60.00 | 60.00 | 98.00 |
| 17 | gpt-4o-2024-11-20-sys | 87.33 | 100.00 | 100.00 | 98.00 | 84.00 | 100.00 | 98.00 | 56.00 | 52.00 | 98.00 |
| 18 | gemini-pro-1.5-002 | 87.11 | 100.00 | 100.00 | 74.00 | 88.00 | 100.00 | 84.00 | 70.00 | 72.00 | 96.00 |
| 18 | minimax-01-sys | 87.11 | 100.00 | 100.00 | 84.00 | 94.00 | 100.00 | 70.00 | 62.00 | 74.00 | 100.00 |
| 20 | mistral-large-2411 | 86.89 | 100.00 | 100.00 | 68.00 | 88.00 | 98.00 | 96.00 | 64.00 | 68.00 | 100.00 |
| 20 | claude-3.5-sonnet | 86.89 | 100.00 | 100.00 | 98.00 | 80.00 | 98.00 | 94.00 | 60.00 | 56.00 | 96.00 |
| 20 | mistral-large-2 | 86.89 | 100.00 | 100.00 | 70.00 | 92.00 | 100.00 | 94.00 | 60.00 | 66.00 | 100.00 |
| 23 | gpt-4-turbo-sys | 86.67 | 100.00 | 100.00 | 94.00 | 80.00 | 94.00 | 94.00 | 54.00 | 68.00 | 96.00 |
| 23 | llama-3.3-70b-instruct-sys | 86.67 | 100.00 | 100.00 | 96.00 | 92.00 | 100.00 | 78.00 | 66.00 | 54.00 | 94.00 |
| 25 | llama-3.3-70b-instruct | 86.44 | 100.00 | 100.00 | 90.00 | 92.00 | 100.00 | 76.00 | 68.00 | 56.00 | 96.00 |
| 26 | gpt-4-turbo | 86.22 | 100.00 | 100.00 | 92.00 | 84.00 | 96.00 | 90.00 | 56.00 | 60.00 | 98.00 |
| 27 | llama-3.1-405b-instruct | 85.78 | 100.00 | 100.00 | 88.00 | 92.00 | 98.00 | 88.00 | 54.00 | 52.00 | 100.00 |
| 28 | minimax-01 | 85.56 | 100.00 | 100.00 | 96.00 | 82.00 | 100.00 | 72.00 | 52.00 | 68.00 | 100.00 |
| 29 | mistral-large-2-sys | 85.11 | 100.00 | 100.00 | 84.00 | 86.00 | 100.00 | 94.00 | 56.00 | 50.00 | 96.00 |
| 30 | gpt-4o-2024-11-20 | 84.22 | 100.00 | 100.00 | 84.00 | 78.00 | 98.00 | 82.00 | 62.00 | 56.00 | 98.00 |
| 30 | llama-3.1-nemotron-70b-instruct-sys | 84.22 | 100.00 | 100.00 | 78.00 | 90.00 | 98.00 | 88.00 | 60.00 | 46.00 | 98.00 |
| 32 | gemini-2.0-flash-exp | 84.00 | 100.00 | 100.00 | 84.00 | 78.00 | 94.00 | 86.00 | 66.00 | 50.00 | 98.00 |
| 33 | Phi-4-Q8_0-sys | 83.78 | 100.00 | 100.00 | 100.00 | 84.00 | 100.00 | 90.00 | 40.00 | 42.00 | 98.00 |
| 34 | gpt-4o | 83.11 | 100.00 | 100.00 | 84.00 | 82.00 | 98.00 | 74.00 | 62.00 | 52.00 | 96.00 |
| 35 | claude-3-opus-sys | 82.67 | 100.00 | 100.00 | 88.00 | 72.00 | 96.00 | 92.00 | 48.00 | 50.00 | 98.00 |
| 36 | grok-2-sys | 81.78 | 100.00 | 100.00 | 80.00 | 92.00 | 100.00 | 90.00 | 34.00 | 42.00 | 98.00 |
| 37 | tulu-3-70b-Q8_0-sys | 81.11 | 98.00 | 100.00 | 98.00 | 84.00 | 98.00 | 74.00 | 42.00 | 38.00 | 98.00 |
| 37 | Phi-4-Q8_0 | 81.11 | 100.00 | 100.00 | 94.00 | 74.00 | 96.00 | 86.00 | 40.00 | 42.00 | 98.00 |
| 39 | grok-2 | 80.67 | 100.00 | 100.00 | 66.00 | 96.00 | 100.00 | 84.00 | 48.00 | 40.00 | 92.00 |
| 40 | grok-beta | 80.44 | 100.00 | 100.00 | 64.00 | 94.00 | 100.00 | 86.00 | 50.00 | 32.00 | 98.00 |
| 41 | llama-3.1-nemotron-70b-instruct | 80.00 | 100.00 | 100.00 | 92.00 | 80.00 | 96.00 | 82.00 | 46.00 | 32.00 | 92.00 |
| 42 | nemotron-4-340b-instruct-Q8_0-sys | 79.78 | 100.00 | 100.00 | 86.00 | 70.00 | 98.00 | 74.00 | 46.00 | 48.00 | 96.00 |
| 42 | Athene-V2-Chat-Q8_0 | 79.78 | 100.00 | 100.00 | 82.00 | 78.00 | 98.00 | 86.00 | 32.00 | 44.00 | 98.00 |
| 44 | claude-3-opus | 78.89 | 100.00 | 100.00 | 86.00 | 72.00 | 94.00 | 90.00 | 40.00 | 32.00 | 96.00 |
| 45 | nemotron-4-340b-instruct-Q8_0 | 78.67 | 100.00 | 100.00 | 76.00 | 60.00 | 96.00 | 76.00 | 46.00 | 58.00 | 96.00 |
| 45 | tulu-3-70b-Q8_0 | 78.67 | 96.00 | 100.00 | 90.00 | 66.00 | 86.00 | 80.00 | 58.00 | 42.00 | 90.00 |
| 45 | qwen-2.5-72b-instruct | 78.67 | 100.00 | 100.00 | 60.00 | 78.00 | 96.00 | 84.00 | 42.00 | 54.00 | 94.00 |
| 48 | mistral-large-sys | 77.33 | 100.00 | 100.00 | 88.00 | 72.00 | 96.00 | 62.00 | 46.00 | 42.00 | 90.00 |
| 49 | gemini-flash-1.5-002 | 77.11 | 100.00 | 100.00 | 86.00 | 58.00 | 100.00 | 64.00 | 48.00 | 42.00 | 96.00 |
| 50 | llama-3.1-70b-instruct | 76.89 | 100.00 | 100.00 | 72.00 | 66.00 | 96.00 | 78.00 | 52.00 | 34.00 | 94.00 |
| 51 | nova-pro-v1-sys | 76.67 | 100.00 | 100.00 | 94.00 | 68.00 | 86.00 | 72.00 | 54.00 | 40.00 | 76.00 |
| 52 | nova-pro-v1 | 76.44 | 100.00 | 100.00 | 90.00 | 68.00 | 88.00 | 70.00 | 54.00 | 36.00 | 82.00 |
| 53 | llama-3.1-70b-instruct-sys | 75.11 | 100.00 | 98.00 | 76.00 | 70.00 | 90.00 | 76.00 | 44.00 | 30.00 | 92.00 |
| 53 | Meta-Llama-3-70B-Instruct.Q8_0-sys | 75.11 | 100.00 | 100.00 | 78.00 | 68.00 | 100.00 | 74.00 | 34.00 | 26.00 | 96.00 |
| 55 | gpt-4-sys | 74.44 | 100.00 | 100.00 | 90.00 | 66.00 | 96.00 | 60.00 | 46.00 | 46.00 | 66.00 |
| 56 | gemini-pro-1.5 | 74.00 | 100.00 | 100.00 | 94.00 | 74.00 | 96.00 | 58.00 | 28.00 | 28.00 | 88.00 |
| 57 | gemma-2-27b-Q5_K_M-sys | 72.44 | 100.00 | 84.00 | 86.00 | 68.00 | 90.00 | 58.00 | 50.00 | 38.00 | 78.00 |
| 58 | mistral-large | 71.33 | 100.00 | 100.00 | 100.00 | 54.00 | 92.00 | 58.00 | 48.00 | 10.00 | 80.00 |
| 58 | gemini-flash-1.5 | 71.33 | 100.00 | 100.00 | 94.00 | 56.00 | 98.00 | 62.00 | 30.00 | 18.00 | 84.00 |
| 60 | gemma-2-27b-Q5_K_M | 69.33 | 100.00 | 100.00 | 80.00 | 54.00 | 92.00 | 58.00 | 20.00 | 32.00 | 88.00 |
| 60 | mistral-nemo-sys | 69.33 | 96.00 | 100.00 | 52.00 | 76.00 | 96.00 | 54.00 | 36.00 | 26.00 | 88.00 |
| 62 | gemma-2-9b-Q8_0 | 67.33 | 100.00 | 100.00 | 82.00 | 42.00 | 92.00 | 64.00 | 20.00 | 16.00 | 90.00 |
| 63 | gemma-2-9b-Q8_0-sys | 66.67 | 100.00 | 100.00 | 84.00 | 36.00 | 92.00 | 64.00 | 16.00 | 20.00 | 88.00 |
| 64 | gpt-4 | 65.78 | 100.00 | 100.00 | 98.00 | 28.00 | 86.00 | 76.00 | 12.00 | 14.00 | 78.00 |
| 65 | mixtral-8x22b-instruct-v0.1-Q8_0 | 65.11 | 100.00 | 100.00 | 100.00 | 22.00 | 92.00 | 50.00 | 24.00 | 16.00 | 82.00 |
| 65 | Qwen2-72B-Instruct-Q8_0 | 65.11 | 100.00 | 100.00 | 86.00 | 44.00 | 88.00 | 68.00 | 22.00 | 16.00 | 62.00 |
| 67 | Mistral-Nemo-Instruct-2407-Q8_0-sys | 64.89 | 98.00 | 94.00 | 34.00 | 58.00 | 88.00 | 52.00 | 40.00 | 30.00 | 90.00 |
| 67 | mixtral-8x22b-instruct-v0.1.Q8_0-sys | 64.89 | 100.00 | 100.00 | 100.00 | 22.00 | 94.00 | 44.00 | 30.00 | 16.00 | 78.00 |
| 69 | Meta-Llama-3-70B-Instruct.Q8_0 | 64.67 | 100.00 | 100.00 | 96.00 | 34.00 | 90.00 | 44.00 | 48.00 | 16.00 | 54.00 |
| 70 | claude-3-haiku-sys | 64.00 | 100.00 | 100.00 | 80.00 | 32.00 | 94.00 | 66.00 | 16.00 | 18.00 | 70.00 |
| 71 | WizardLM-2-8x22B.Q8_0 | 63.56 | 100.00 | 98.00 | 86.00 | 24.00 | 82.00 | 54.00 | 28.00 | 20.00 | 80.00 |
| 72 | Bielik-11B-v2.3-Instruct-Q8_0-sys | 63.33 | 96.00 | 96.00 | 48.00 | 54.00 | 94.00 | 52.00 | 38.00 | 18.00 | 74.00 |
| 73 | c4ai-command-r-plus-v01.Q8_0-sys | 63.11 | 100.00 | 100.00 | 96.00 | 22.00 | 74.00 | 48.00 | 40.00 | 22.00 | 66.00 |
| 73 | c4ai-command-r-plus-v01.Q8_0 | 63.11 | 100.00 | 100.00 | 96.00 | 22.00 | 72.00 | 46.00 | 46.00 | 18.00 | 68.00 |
| 75 | phi-3-medium-4k-instruct-Q8_0 | 62.44 | 100.00 | 100.00 | 86.00 | 18.00 | 96.00 | 58.00 | 20.00 | 18.00 | 66.00 |
| 76 | mixtral-8x7b-instruct-v0.1.Q8_0 | 62.00 | 98.00 | 96.00 | 78.00 | 24.00 | 96.00 | 50.00 | 34.00 | 8.00 | 74.00 |
| 77 | deepseek-v2-chat-Q8_0 | 61.78 | 100.00 | 100.00 | 98.00 | 24.00 | 90.00 | 56.00 | 22.00 | 20.00 | 46.00 |
| 77 | internlm2_5-20b-chat-Q8_0 | 61.78 | 100.00 | 100.00 | 100.00 | 0.00 | 96.00 | 32.00 | 50.00 | 30.00 | 48.00 |
| 79 | deepseek-v2-chat-Q8_0-sys | 61.56 | 100.00 | 100.00 | 100.00 | 16.00 | 90.00 | 74.00 | 20.00 | 12.00 | 42.00 |
| 79 | qwen1_5-110b-chat-q8_0 | 61.56 | 100.00 | 100.00 | 68.00 | 26.00 | 94.00 | 40.00 | 30.00 | 18.00 | 78.00 |
| 79 | Karasu-Mixtral-8x22B-v0.1.Q8_0 | 61.56 | 100.00 | 100.00 | 94.00 | 20.00 | 88.00 | 40.00 | 26.00 | 18.00 | 68.00 |
| 82 | qwen1_5-110b-chat-q8_0-sys | 61.33 | 100.00 | 100.00 | 62.00 | 54.00 | 96.00 | 36.00 | 22.00 | 14.00 | 68.00 |
| 83 | gpt-3.5-turbo-sys | 60.89 | 100.00 | 78.00 | 76.00 | 32.00 | 90.00 | 56.00 | 18.00 | 18.00 | 80.00 |
| 83 | Mistral-Nemo-Instruct-2407-Q8_0 | 60.89 | 96.00 | 100.00 | 90.00 | 20.00 | 98.00 | 28.00 | 50.00 | 18.00 | 48.00 |
| 83 | inflection-3-productivity | 60.89 | 100.00 | 100.00 | 94.00 | 26.00 | 94.00 | 26.00 | 24.00 | 28.00 | 56.00 |
| 83 | mixtral-8x7b-instruct-v0.1.Q8_0-sys | 60.89 | 98.00 | 86.00 | 50.00 | 50.00 | 88.00 | 68.00 | 34.00 | 10.00 | 64.00 |
| 87 | mistral-nemo | 60.44 | 100.00 | 100.00 | 90.00 | 12.00 | 96.00 | 28.00 | 52.00 | 18.00 | 48.00 |
| 88 | command-r7b-12-2024 | 60.22 | 94.00 | 94.00 | 72.00 | 40.00 | 94.00 | 62.00 | 20.00 | 4.00 | 62.00 |
| 89 | c4ai-command-r-plus-08-2024-Q8_0 | 59.78 | 100.00 | 98.00 | 66.00 | 46.00 | 74.00 | 6.00 | 54.00 | 16.00 | 78.00 |
| 90 | internlm2_5-20b-chat-Q8_0-sys | 59.11 | 100.00 | 100.00 | 88.00 | 4.00 | 96.00 | 34.00 | 36.00 | 16.00 | 58.00 |
| 90 | mistral-medium-sys | 59.11 | 100.00 | 100.00 | 60.00 | 42.00 | 82.00 | 32.00 | 24.00 | 28.00 | 64.00 |
| 92 | c4ai-command-r-08-2024-Q8_0 | 58.44 | 100.00 | 100.00 | 84.00 | 10.00 | 100.00 | 22.00 | 58.00 | 14.00 | 38.00 |
| 93 | mistral-small | 58.00 | 98.00 | 98.00 | 80.00 | 22.00 | 82.00 | 14.00 | 66.00 | 8.00 | 54.00 |
| 94 | qwen1_5-72b-chat-q8_0 | 57.56 | 100.00 | 100.00 | 90.00 | 14.00 | 76.00 | 46.00 | 28.00 | 32.00 | 32.00 |
| 95 | Smaug-2-72B.Q8_0 | 57.11 | 100.00 | 100.00 | 90.00 | 6.00 | 84.00 | 48.00 | 14.00 | 24.00 | 48.00 |
| 95 | mistral-small-sys | 57.11 | 92.00 | 100.00 | 76.00 | 26.00 | 84.00 | 6.00 | 68.00 | 18.00 | 44.00 |
| 97 | qwen1_5-32b-chat-q8_0 | 56.67 | 100.00 | 94.00 | 82.00 | 16.00 | 94.00 | 18.00 | 46.00 | 12.00 | 48.00 |
| 97 | OLMo-2-1124-13B-Instruct-Q8_0-sys | 56.67 | 98.00 | 100.00 | 82.00 | 16.00 | 90.00 | 18.00 | 48.00 | 22.00 | 36.00 |
| 99 | Bielik-11B-v2.3-Instruct-Q8_0 | 56.00 | 100.00 | 100.00 | 66.00 | 28.00 | 80.00 | 4.00 | 62.00 | 30.00 | 34.00 |
| 100 | c4ai-command-r-v01-Q8_0 | 55.78 | 100.00 | 100.00 | 76.00 | 4.00 | 92.00 | 18.00 | 20.00 | 46.00 | 46.00 |
| 100 | llama-2-70b-chat.Q8_0 | 55.78 | 100.00 | 92.00 | 72.00 | 14.00 | 80.00 | 52.00 | 28.00 | 10.00 | 54.00 |
| 100 | mistral-medium | 55.78 | 100.00 | 100.00 | 54.00 | 64.00 | 66.00 | 24.00 | 40.00 | 24.00 | 30.00 |
| 100 | claude-3-haiku | 55.78 | 100.00 | 100.00 | 92.00 | 10.00 | 84.00 | 14.00 | 58.00 | 22.00 | 22.00 |
| 104 | aya-23-35b-Q8_0 | 55.33 | 100.00 | 100.00 | 92.00 | 6.00 | 98.00 | 12.00 | 24.00 | 64.00 | 2.00 |
| 105 | Meta-Llama-3-8B-Instruct.Q8_0 | 55.11 | 96.00 | 94.00 | 46.00 | 38.00 | 96.00 | 36.00 | 8.00 | 28.00 | 54.00 |
| 106 | miqu-1-70b.q5_K_M | 54.89 | 100.00 | 100.00 | 50.00 | 66.00 | 64.00 | 16.00 | 40.00 | 30.00 | 28.00 |
| 106 | c4ai-command-r-v01-Q8_0-sys | 54.89 | 94.00 | 100.00 | 72.00 | 16.00 | 88.00 | 10.00 | 18.00 | 58.00 | 38.00 |
| 108 | ggml-dbrx-instruct-16x12b-q8_0 | 54.44 | 100.00 | 100.00 | 58.00 | 34.00 | 70.00 | 12.00 | 46.00 | 20.00 | 50.00 |
| 109 | snowflake-arctic-instruct-Q5_K_M-sys | 53.56 | 86.00 | 100.00 | 56.00 | 14.00 | 86.00 | 38.00 | 28.00 | 20.00 | 54.00 |
| 110 | Phi-3-mini-4k-instruct-Q8_0 | 53.33 | 98.00 | 96.00 | 98.00 | 4.00 | 90.00 | 20.00 | 26.00 | 36.00 | 12.00 |
| 110 | OLMo-2-1124-13B-Instruct-Q8_0 | 53.33 | 96.00 | 100.00 | 82.00 | 0.00 | 88.00 | 16.00 | 48.00 | 24.00 | 26.00 |
| 112 | Meta-Llama-3-8B-Instruct.Q8_0-sys | 51.56 | 80.00 | 88.00 | 50.00 | 32.00 | 90.00 | 42.00 | 14.00 | 18.00 | 50.00 |
| 113 | gpt-3.5-turbo | 50.22 | 96.00 | 54.00 | 78.00 | 18.00 | 80.00 | 22.00 | 52.00 | 18.00 | 34.00 |
| 114 | deepseek-v2-lite-chat-Q8_0 | 49.56 | 88.00 | 100.00 | 60.00 | 14.00 | 88.00 | 8.00 | 62.00 | 6.00 | 20.00 |
| 115 | llama-3.1-8b-instruct | 48.67 | 82.00 | 92.00 | 44.00 | 32.00 | 96.00 | 24.00 | 10.00 | 10.00 | 48.00 |
| 116 | mistral-7b-instruct-v0.2.Q8_0 | 46.89 | 98.00 | 86.00 | 42.00 | 24.00 | 70.00 | 12.00 | 56.00 | 28.00 | 6.00 |
| 117 | aya-23-8b-Q8_0 | 45.78 | 72.00 | 100.00 | 32.00 | 46.00 | 56.00 | 2.00 | 52.00 | 48.00 | 4.00 |
| 117 | llama-3.1-8b-instruct-sys | 45.78 | 82.00 | 78.00 | 34.00 | 30.00 | 84.00 | 30.00 | 8.00 | 8.00 | 58.00 |
| 119 | deepseek-v2-lite-chat-Q8_0-sys | 45.56 | 54.00 | 100.00 | 62.00 | 8.00 | 90.00 | 8.00 | 70.00 | 6.00 | 12.00 |
| 120 | snowflake-arctic-instruct-Q5_K_M | 44.89 | 54.00 | 82.00 | 70.00 | 8.00 | 60.00 | 30.00 | 44.00 | 34.00 | 22.00 |
| 121 | gemma-7b-it-Q8_0 | 43.56 | 100.00 | 54.00 | 62.00 | 32.00 | 36.00 | 28.00 | 50.00 | 18.00 | 12.00 |
| 122 | llama-2-13b-chat.Q8_0 | 43.33 | 88.00 | 82.00 | 32.00 | 22.00 | 76.00 | 6.00 | 42.00 | 30.00 | 12.00 |
| 123 | mistral-7b-instruct-v0.2.Q8_0-sys | 33.33 | 72.00 | 90.00 | 20.00 | 16.00 | 52.00 | 12.00 | 20.00 | 10.00 | 8.00 |
| 124 | llama-2-7b-chat.Q8_0 | 31.56 | 36.00 | 72.00 | 34.00 | 24.00 | 28.00 | 22.00 | 22.00 | 30.00 | 16.00 |
| 125 | WizardLM-2-7B-Q8_0 | 20.00 | 36.00 | 16.00 | 8.00 | 14.00 | 18.00 | 22.00 | 36.00 | 12.00 | 18.00 |
| 126 | gemma-2b-it-Q8_0 | 5.56 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.00 | 24.00 | 14.00 | 0.00 |
| 127 | qwen1_5-7b-chat-q8_0 | 2.89 | 6.00 | 2.00 | 4.00 | 0.00 | 2.00 | 0.00 | 8.00 | 2.00 | 2.00 |
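
The file does not state how the FaRel and Nr columns are derived. The listed values are consistent with FaRel being the unweighted mean of the nine per-relation accuracies, rounded to two decimals, and with Nr being a competition-style rank in which tied scores share a rank and the next rank is skipped (1, 1, 3). The sketch below reproduces the first four rows under that assumption; the function names `farel_score` and `competition_ranks` are hypothetical and not taken from the benchmark's code.

```python
from statistics import mean

# Per-relation accuracies copied from the first four table rows, in column order:
# child, parent, grand-child, sibling, grand-parent, great grand-child,
# niece or nephew, aunt or uncle, great grand-parent.
rows = {
    "o1-mini": [100.0, 100.0, 100.0, 100.0, 98.0, 100.0, 100.0, 100.0, 100.0],
    "deepseek-r1": [100.0, 100.0, 100.0, 98.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    "o1-preview": [100.0, 100.0, 96.0, 100.0, 100.0, 100.0, 94.0, 100.0, 100.0],
    "qwq-32b-preview-sys": [100.0, 100.0, 100.0, 98.0, 100.0, 100.0, 94.0, 86.0, 100.0],
}


def farel_score(accuracies):
    """Assumed aggregate: unweighted mean of the relation accuracies, 2 decimals."""
    return round(mean(accuracies), 2)


def competition_ranks(scores):
    """Rank models by score, descending; ties share a rank and the next is skipped."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_score = None
    current_rank = 0
    for position, (model, score) in enumerate(ordered, start=1):
        if score != previous_score:
            current_rank = position
            previous_score = score
        ranks[model] = current_rank
    return ranks


scores = {model: farel_score(accs) for model, accs in rows.items()}
ranks = competition_ranks(scores)

for model, score in scores.items():
    print(f"{ranks[model]:>3}  {model:<22} {score:.2f}")
```

Because the ranking is computed on the rounded score, two models whose unrounded means differ slightly can still share a rank, which matches the ties visible in the table.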