SACR (from the French "Script d'Annotation des Chaînes de Référence", i.e. "script for annotating reference chains") is a tool optimized for coreference chain annotation. It has been published in the following paper:
You can download the poster here.
SACR is a single web page: all operations are done in the browser. You can download it and open the index.html file, or use it online at boberle.com.
The workflow is as follows:
(1) Mark the referring expressions.
(2) Build the coreference chains.
(3) Add feature annotations.
(4) Explore and search the annotations.
Documentation can be found in the user_guide.pdf file.
Video tutorials (in French) are available on my YouTube channel, in a dedicated playlist:
- 01: opening a file,
- 02: annotating,
- 03: annotation strategies,
- 04: saving the annotations,
- 05: the popup,
- 06: navigating,
- 07: annotating the properties of each mention,
- 08: searching,
- 09: configuring,
- 10: summary,
- 11: how to display the popup blocked by Firefox?
Use the scripts of the coreference database project to convert your annotations into a relational database, in the form of a series of CSV (Comma-Separated Values) files that you can open in a spreadsheet program such as Microsoft Excel or LibreOffice Calc, or load into statistical software such as R or Python's pandas library.
This works for a single text or a whole corpus (several texts separately annotated with SACR).
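As a minimal sketch of using the exported files outside a spreadsheet, the following Python code counts the mentions of each chain in a mentions table with the standard library only (pandas.read_csv would work just as well). The file name mentions.csv and the column name chain are hypothetical: check the header row of the files actually produced by the scripts and adjust them. The sketch also assumes comma-separated files with a header row.

```python
import csv
import io
from collections import Counter


def chain_sizes(csv_file, chain_column="chain"):
    """Count how many mentions each chain has in a mentions table.

    `chain_column` is a hypothetical column name: adapt it to the
    header actually found in the exported mentions file.
    """
    reader = csv.DictReader(csv_file)  # assumes a header row and commas
    return Counter(row[chain_column] for row in reader)


# With a real export (hypothetical file name):
#   with open("mentions.csv", newline="", encoding="utf-8") as f:
#       print(chain_sizes(f))
sample = io.StringIO("mention,chain\nm1,Marie\nm2,Marie\nm3,Paul\n")
print(chain_sizes(sample))
```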
The tables (CSV files) are:
- tokens: all the tokens in the texts,
- sentences: all the sentences in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
- paragraphs: all the paragraphs in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
- texts: all the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
- chains: all the chains in the texts, with specific annotations (like the number of mentions, etc.),
- mentions: all the mentions in the texts, with specific annotations (like the name of the chain, the size of the chain, etc.),
- relations: all the relations in the texts, with specific annotations (like the distance between two mentions). There are several types of relations:
  - first: relations from the first mention to every other mention in the chain (A-B, A-C, A-D...),
  - consecutive: relations from each mention to the next mention in the chain (A-B, B-C, C-D...),
  - all: both first and consecutive relations.
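To make the three relation types concrete, here is a minimal Python sketch that derives them from a single chain. It is not the code of the coreference database scripts: the input format (a mention id plus the position of its first token) is a hypothetical simplification, the distance is measured in tokens only as an illustration, and the A-B pair, which is both a first and a consecutive relation, is kept only once in all (an assumption).

```python
def chain_relations(mentions, kind="consecutive"):
    """Build relation pairs for one coreference chain.

    `mentions` is a list of (mention_id, start_token) tuples in textual
    order (hypothetical format). Returns a list of
    (antecedent_id, anaphor_id, token_distance) tuples.
    """
    if kind not in ("first", "consecutive", "all"):
        raise ValueError(f"unknown relation type: {kind!r}")
    pairs = []
    if kind in ("first", "all"):
        # First mention linked to every later mention: A-B, A-C, A-D...
        pairs += [(mentions[0], m) for m in mentions[1:]]
    if kind in ("consecutive", "all"):
        # Each mention linked to the next one: A-B, B-C, C-D...
        pairs += list(zip(mentions, mentions[1:]))
    # A-B belongs to both types; keep it once (assumption for "all").
    seen = set()
    relations = []
    for (a_id, a_start), (b_id, b_start) in pairs:
        if (a_id, b_id) in seen:
            continue
        seen.add((a_id, b_id))
        relations.append((a_id, b_id, b_start - a_start))
    return relations


chain = [("A", 0), ("B", 12), ("C", 30), ("D", 41)]
for kind in ("first", "consecutive", "all"):
    print(kind, [(a, b) for a, b, _ in chain_relations(chain, kind)])
```

With this four-mention chain, first yields A-B, A-C, A-D, consecutive yields A-B, B-C, C-D, and all yields the five distinct pairs.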
See the corefconversion project to convert from and to CoNLL and other formats.
The source code is available on GitHub and at boberle.com.
The tool is distributed under the terms of the Mozilla Public License v2. This program comes with ABSOLUTELY NO WARRANTY, see the LICENSE file for more details.
Want to talk? Reach me at [email protected].