
Missing a field in Entity Linking datasets #23

Open
dalek-who opened this issue Dec 20, 2023 · 2 comments

Comments

@dalek-who

Here is the EL (Entity Linking) data example provided in the README:

'23235546-1', # table id
'Ivan Lendl career statistics', # page title
'Singles: 19 finals (8 titles, 11 runner-ups)', # section title
'', # caption
['outcome', 'year', ...], # headers
[[[0, 4], 'Björn Borg'], [[9, 2], 'Wimbledon'], ...], # cells, [index, entity mention (cell text)]
[['Björn Borg', 'Swedish tennis player', []], ['Björn Borg', 'Swedish swimmer', ['Swimmer']], ...], # candidate entities; this is the merged set over all cells. [entity name, entity description, entity types]
[0, 12, ...] # labels, this is the index of the gold entity in the candidate entities
[[0, 1, ...], [11, 12, 13, ...], ...] # candidates for each cell

However, the final field:

[[0, 1, ...], [11, 12, 13, ...], ...] # candidates for each cell

is only provided in the test split; in the train and dev splits it is missing. How can this field be generated?
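For context, one way such a per-cell field could be reconstructed is by matching each cell's mention text against the names in the merged candidate list. This is only a hypothetical sketch (the function name `per_cell_candidates` and the surface-form matching are assumptions, not the repo's actual preprocessing):

```python
def per_cell_candidates(cells, candidate_entities):
    """Hypothetical sketch: for each cell, collect the indices of merged
    candidates whose entity name matches the cell's mention text. Assumes
    candidates were retrieved by surface-form match, which may differ from
    how the dataset was actually built."""
    result = []
    for _index, mention in cells:
        matches = [i for i, (name, _desc, _types) in enumerate(candidate_entities)
                   if name == mention]
        result.append(matches)
    return result

# Toy data shaped like the README example above.
cells = [[[0, 4], 'Björn Borg'], [[9, 2], 'Wimbledon']]
candidates = [['Björn Borg', 'Swedish tennis player', []],
              ['Björn Borg', 'Swedish swimmer', ['Swimmer']],
              ['Wimbledon', 'tennis tournament', []]]
print(per_cell_candidates(cells, candidates))  # [[0, 1], [2]]
```

The real preprocessing may instead use a candidate-generation step (e.g. an alias table or retrieval model), in which case exact name matching would be too strict.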

@belerico

belerico commented Feb 8, 2024

I'm trying to understand the same here...

cc @xiang-deng @huan-sunrise

@xiang-deng
Contributor

Hi, as you can see in

table_id, pgTitle, secTitle, caption, headers, entities, candidate_entities, labels,_ = input_data

The field is not used for training. If I recall correctly, when tuning the model I compute the loss against all candidates for the table rather than per cell, as that is more efficient.

The field is used at test time to compute the final metric: if the model predicts an entity that is not in the candidate set associated with the specific cell, we ignore that prediction. That is why we only provide it for the test set. The logic is in evaluate_task.ipynb and data_processing.ipynb.
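The evaluation behavior described here could be sketched as follows. This is an illustrative reading of the comment, not the notebook code; in particular, the choice to mask out (rather than count as wrong) predictions outside a cell's candidate set is an assumption, and `cell_accuracy` is a hypothetical name:

```python
def cell_accuracy(predictions, labels, cell_candidates):
    """Hypothetical sketch of the described metric: a prediction is counted
    correct only if it equals the gold candidate index; predictions that fall
    outside a cell's own candidate set are ignored rather than scored."""
    correct = total = 0
    for pred, gold, cands in zip(predictions, labels, cell_candidates):
        if pred not in cands:
            continue  # prediction outside this cell's candidates: ignored
        total += 1
        correct += int(pred == gold)
    return correct / total if total else 0.0
```

For the authoritative logic, see the two notebooks named above.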

Let me know if you have other questions.
