Can PretrainedTransformerTokenizer track character offsets like WordTokenizer? #3458
Question
Since character offsets are needed to calculate answer spans after wordpiece tokenization, can PretrainedTransformerTokenizer track character offsets the way WordTokenizer does?

Comments
This is a TODO in the code. I know that the huggingface repo has code to train SQuAD models, so there must be a way to do this calculation in that repo, but I haven't looked at the code to figure it out. Contributions welcome!
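For concreteness, here is a minimal sketch of the calculation under discussion: recovering a wordpiece token span from a character-level answer span. It uses the offset mapping that huggingface's fast (Rust-based) tokenizers expose via `return_offsets_mapping`, an API that postdates this comment, so treat it as one possible illustration rather than the approach used in the huggingface SQuAD code at the time; the model name is just an example.

```python
# Minimal sketch: map a character-level answer span to wordpiece token
# indices, using the offset mapping from a huggingface "fast" tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)

context = "AllenNLP was developed at the Allen Institute for AI."
answer = "Allen Institute for AI"
answer_start = context.index(answer)
answer_end = answer_start + len(answer)  # exclusive character end

encoding = tokenizer(context, return_offsets_mapping=True)
offsets = encoding["offset_mapping"]  # one (char_start, char_end) per token

token_start = token_end = None
for i, (start, end) in enumerate(offsets):
    if start == end:
        continue  # special tokens like [CLS]/[SEP] have empty offsets
    if token_start is None and start <= answer_start < end:
        token_start = i
    if start < answer_end <= end:
        token_end = i

print(token_start, token_end)  # wordpiece indices covering the answer
```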
@matt-gardner if this issue is still pending, I would love to take it up. I might need your assistance since I am relatively new to the code base; if you could point me to the relevant TODOs, that would really help.
The new …
@matt-gardner Oh, thanks for the update. |
A few weeks ago I added a parameter to …
I have no context or intuition about what time label this one should get; @dirkgr, any ideas? |
We already have the code for this (#3868), so this task is to integrate the new huggingface tokenizers once their remaining bugs are fixed, and to bring that PR up to date. I'd say that's a day's worth of work.
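For context, the "new huggingface tokenizers" are the Rust-based `tokenizers` library, which returns per-token character offsets directly on each encoding. A hedged sketch of that interface (note that `Tokenizer.from_pretrained` arrived in a later release than this thread, and the model name is illustrative):

```python
# Sketch of the offsets exposed by huggingface's standalone `tokenizers`
# library: each encoding carries a (char_start, char_end) pair per token.
from tokenizers import Tokenizer

tok = Tokenizer.from_pretrained("bert-base-uncased")
encoding = tok.encode("Character offsets come for free.")
for token, (start, end) in zip(encoding.tokens, encoding.offsets):
    print(token, start, end)
```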
I'm aware.
The new huggingface tokenizers are still broken. I'm moving this to the bottom of the stack for 1.0. Maybe we'll bump it to 1.1.
Finally done!
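For readers arriving later, a usage sketch of the behavior that eventually shipped, assuming allennlp >= 1.0, where each Token carries idx/idx_end character offsets (the exact fields and version are worth double-checking against the docs):

```python
# Sketch (assuming allennlp >= 1.0): tokens produced by
# PretrainedTransformerTokenizer carry character offsets.
from allennlp.data.tokenizers import PretrainedTransformerTokenizer

tokenizer = PretrainedTransformerTokenizer("bert-base-uncased")
for token in tokenizer.tokenize("Offsets make span extraction easy."):
    # token.idx / token.idx_end are offsets into the original string;
    # they may be None when no offset could be recovered (e.g. [CLS]).
    print(token.text, token.idx, token.idx_end)
```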