SACR -- Coreference Chain Annotation Tool

Introduction

SACR (from the French "Script d'Annotation des Chaînes de Référence") is a tool optimized for coreference chain annotation. It is described in the following paper:

Oberle B. (2018). SACR: A Drag-and-Drop Based Tool for Coreference Annotation. Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC 2018). Miyazaki, Japan.

You can download the poster here.

Usage

SACR is a single webpage. All operations are done in the browser. You can download it and open the index.html file, or use it online at boberle.com.

The workflow is as follows:

(1) Mark the referring expressions.

(2) Build the coreference chains.

(3) Add feature annotations.

(4) Play and search.

Getting help

Documentation can be found in the user_guide.pdf file.

Video tutorials (in French) are available on my YouTube channel, in a dedicated playlist.

Convert all the annotations into a relational database

Use the coreference database project scripts to convert your work into a relational database, in the form of a series of CSV (Comma-Separated Values) files that you can open in a spreadsheet program such as Microsoft Excel or LibreOffice Calc, or in a statistical environment such as R or Python's pandas.

This works for a single text or a whole corpus (several texts separately annotated with SACR).
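As a rough illustration of working with one of the exported CSV files in Python, here is a minimal sketch that uses only the standard library. The column name `chain_name` and the inline sample rows are assumptions made up for this example; check the header row of your own files for the actual column names.

```python
import csv
import io
from collections import Counter


def count_mentions_per_chain(csv_file, chain_column="chain_name"):
    """Count the rows of a mentions table for each value of `chain_column`.

    `csv_file` is any open text file (or file-like object) whose first line
    is a header row. The default column name is hypothetical.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[chain_column] for row in reader)


# Hypothetical sample standing in for an exported mentions table.
sample = io.StringIO("id,chain_name\n1,Alice\n2,Bob\n3,Alice\n")
counts = count_mentions_per_chain(sample)
print(counts.most_common())
```

With a real export, pass an opened file instead of the sample, for instance `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV file produced by the conversion scripts.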

The tables (CSV files) are:

  • tokens: all the tokens in the texts,
  • sentences: all the sentences in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
  • paragraphs: all the paragraphs in the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
  • texts: all the texts, with specific annotations (like the number of tokens, mentions, chains, etc.),
  • chains: all the chains in the texts, with specific annotations (like the number of mentions, etc.),
  • mentions: all the mentions in the texts, with specific annotations (like the name of the chain, the size of the chain, etc.),
  • relations: all the relations in the texts, with specific annotations (like the distance between two mentions). There are several types of relations:
    • first: relations from the first mention to every other mention in the chain (A-B, A-C, A-D...),
    • consecutive: relations from a mention to the next mention in the chain (A-B, B-C, C-D...),
    • all: both first and consecutive relations.
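The three relation types can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the code used by the conversion scripts. The function names are hypothetical, and the choice to drop the duplicate pair that belongs to both types (A-B) in the "all" case is an assumption.

```python
def first_relations(chain):
    """Pairs from the first mention to every other mention: A-B, A-C, A-D..."""
    return [(chain[0], mention) for mention in chain[1:]]


def consecutive_relations(chain):
    """Pairs from each mention to the next one in the chain: A-B, B-C, C-D..."""
    return list(zip(chain, chain[1:]))


def all_relations(chain):
    """Both first and consecutive relations.

    Assumption: a pair found in both sets (such as A-B) is kept only once.
    dict.fromkeys removes duplicates while preserving the original order.
    """
    pairs = first_relations(chain) + consecutive_relations(chain)
    return list(dict.fromkeys(pairs))


chain = ["A", "B", "C", "D"]
print(first_relations(chain))
print(consecutive_relations(chain))
print(all_relations(chain))
```

A chain is represented here as a plain list of mentions in text order. A chain with fewer than two mentions yields no relations.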

Conversion scripts to other formats

From and to other coreference formats

See the corefconversion project to convert from and to CoNLL and other formats.

Source code and licence

Source code may be found on GitHub or boberle.com.

The tool is distributed under the terms of the Mozilla Public License v2. This program comes with ABSOLUTELY NO WARRANTY, see the LICENSE file for more details.

Contact

Want to talk? Reach me at [email protected].
