An online GroTE demo is available at https://gsarti-grote.hf.space. You can use `admin` as a login code and upload one of the files in `assets/examples` for editing. The demo logs events to the repository `grote/grote-logs`.
- Install requirements: `pip install -r requirements.txt`.
- Make sure you have a local `npm` installation available to run the front-end.
- Edit the GroTE config to set your custom `login_codes` and `event_logs_hf_dataset_id`. By default, you will be able to access the demo using the `admin` code, and logs will be written to a local `logs` directory and synchronized with a private `grote-logs` dataset on your user profile in the Hugging Face Hub.
- Run `grote` in your command line to start the server. You will need a Hugging Face token with `Write` permissions to log edits.
- Visit http://127.0.0.1:7860 to access the demo.
- Enter your login code and load an example document from assets/examples.
- Press "📝 Start" to begin editing the document.
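Before launching the server, the prerequisites listed above can be checked with a short script. This is only a minimal sketch and not part of GroTE: the helper name `preflight_problems` is hypothetical, and it assumes that the write token is exposed through the `HF_TOKEN` environment variable (a token cached by a previous Hugging Face login would not be detected here).

```python
import os
import shutil


def preflight_problems(env=None, which=shutil.which):
    """Return a list of human-readable problems found before launching GroTE."""
    env = os.environ if env is None else env
    problems = []
    # The front-end needs a local npm installation.
    if which("npm") is None:
        problems.append("npm not found on PATH")
    # The `grote` command is expected on PATH once the requirements are installed.
    if which("grote") is None:
        problems.append("grote command not found on PATH")
    # Assumption: the write token is provided via the HF_TOKEN environment variable.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; edits may not be logged to the Hub")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("WARNING:", problem)
```

The `env` and `which` parameters exist only so that the check can be exercised without touching the real environment.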
- Use the "Duplicate this space" option from the original GroTE demo to create a copy in your user/organization profile.
- In Settings > Variables and secrets, change the default values of `EVENT_LOGS_HF_DATASET_ID`, `HF_TOKEN` and `LOGIN_CODES` to your desired values (see GroTE config for more details).
- Upon running the app and starting the editing, you should see the logs being written to the dataset whose ID is specified in `EVENT_LOGS_HF_DATASET_ID`.
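To confirm that logs are reaching the Hub, one option is to list the files in the logging dataset with `huggingface_hub.list_repo_files`. The sketch below is an illustration rather than part of GroTE: the helper name `list_log_files` is hypothetical, the dataset ID is a placeholder, and no assumption is made about the names or format of the log files beyond skipping the repository's own `README.md` and dotfiles.

```python
def list_log_files(dataset_id, token=None, lister=None):
    """Return the sorted files in a Hub dataset, skipping repo housekeeping files."""
    if lister is None:
        # Imported lazily so the helper can be tested without network access.
        from huggingface_hub import list_repo_files
        lister = list_repo_files
    files = lister(dataset_id, repo_type="dataset", token=token)
    return sorted(
        f for f in files
        if not f.startswith(".") and f != "README.md"
    )


# Example usage (placeholder values, requires network access and a valid token):
# print(list_log_files("<your_username_or_organization>/grote-logs", token="hf_<your_token>"))
```

An empty result after an editing session suggests checking the token permissions and the value of `EVENT_LOGS_HF_DATASET_ID`.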
Use or modify the following code to create multiple copies of the app programmatically:
    from huggingface_hub import duplicate_space, SpaceHardware

    NUM_TRANSLATORS = 5
    USER_OR_ORG = "<your_username_or_organization>"
    YOUR_HF_TOKEN = "hf_<your_token>"

    # One Space per translator: translator-1 ... translator-N
    names = [f"translator-{idx}" for idx in range(1, NUM_TRANSLATORS + 1)]

    for name in names:
        duplicate_space(
            from_id="gsarti/grote",
            to_id=f"{USER_OR_ORG}/grote-{name}",
            private=False,
            token=YOUR_HF_TOKEN,
            hardware=SpaceHardware.CPU_BASIC,
            # Secrets are hidden in the Space settings after creation
            secrets=[
                {
                    "key": "HF_TOKEN",
                    "value": YOUR_HF_TOKEN,
                    "description": "Hugging Face token for logging purposes",
                },
                {
                    "key": "LOGIN_CODES",
                    "value": f"{name.lower()},admin",
                    "description": "List of login codes for the users",
                },
            ],
            # Variables are publicly visible configuration values
            variables=[
                {"key": "MAX_NUM_SENTENCES", "value": "50"},
                {"key": "EVENT_LOGS_SAVE_FREQUENCY", "value": "50"},
                # Each Space logs to its own dataset
                {"key": "EVENT_LOGS_HF_DATASET_ID", "value": f"{USER_OR_ORG}/grote-{name}"},
                {"key": "EVENT_LOGS_LOCAL_DIR", "value": "logs"},
                {"key": "ALLOWED_TAGS", "value": "minor,major"},
                {"key": "TAG_LABELS", "value": "Minor,Major"},
                {"key": "TAG_COLORS", "value": "#ffedd5,#fcd29a"},
            ],
        )

    # Print the URL and login code to hand out to each translator
    for name in names:
        print(f"URL: https://{USER_OR_ORG}-grote-{name}.hf.space\nLogin code: {name.lower()}")
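When many copies are created, it can be convenient to keep the URLs and login codes in a file instead of reading them from the terminal. The following is a minimal sketch using only the standard library. It follows the same naming scheme as the script above, and the helper names `build_assignments` and `to_csv` are hypothetical.

```python
import csv
import io


def build_assignments(user_or_org, num_translators):
    """Return one row per translator with the Space ID, URL and login code."""
    rows = []
    for idx in range(1, num_translators + 1):
        name = f"translator-{idx}"
        rows.append({
            "space_id": f"{user_or_org}/grote-{name}",
            "url": f"https://{user_or_org}-grote-{name}.hf.space",
            "login_code": name.lower(),
        })
    return rows


def to_csv(rows):
    """Serialize the assignment rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["space_id", "url", "login_code"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(build_assignments("<your_username_or_organization>", 5)))
```

The resulting text can be written to a file and shared with whoever coordinates the translators.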
1. Open the webpage of the GroTE interface.
2. Insert the provided login code.
3. Load one of the provided files.
4. Press “📝 Start”.
5. Perform the editing. If needed, use green checkmarks to remove highlights from a segment.
6. When all segments for the file are finished, click “✅ Done”.
7. A message “Saving trial information. Don't close the tab until the download button is available!” will appear. Do not close the tab.
8. When the message “Saving complete! Download the output file by clicking the 'Download translations' button below.” appears, click “📥 Download translations” to download the edited files. The file will be named `<LOGIN CODE>_<FILENAME>_output.txt`.
9. Click “⬅️ Back to data loading” to return to the file loading page.
10. If needed, pause and take a break.
Steps 2-9 are repeated for each file, which represents a standalone document with ordered segments.
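Once translators send back their downloads, the `<LOGIN CODE>_<FILENAME>_output.txt` naming pattern can be used to track which files each person has completed. The sketch below is not part of GroTE. The helper `parse_output_name` is hypothetical, and it assumes that login codes contain no underscore (as with codes like `translator-1`), so the first underscore separates the login code from the file name.

```python
from pathlib import Path

SUFFIX = "_output.txt"


def parse_output_name(path):
    """Split '<LOGIN CODE>_<FILENAME>_output.txt' into (login_code, filename).

    Returns None when the name does not follow the pattern.
    Assumption: the login code itself contains no underscore.
    """
    name = Path(path).name
    if not name.endswith(SUFFIX):
        return None
    stem = name[: -len(SUFFIX)]
    login_code, sep, filename = stem.partition("_")
    if not sep or not login_code or not filename:
        return None
    return login_code, filename


def completed_by_translator(paths):
    """Group completed file names by login code, ignoring unrelated files."""
    done = {}
    for path in paths:
        parsed = parse_output_name(path)
        if parsed is not None:
            login_code, filename = parsed
            done.setdefault(login_code, []).append(filename)
    return done
```

If login codes with underscores are in use, the split would need to be driven by the list of known codes instead.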
While the current version of GroTE is functional, there are several improvements that could be made to enhance the user experience and functionality. I am unlikely to implement these changes in the near future, but I am happy to provide guidance and support to anyone interested in contributing to the project.
- Separate rendering logic for loading/editing tabs (see ICLR 2024 Papers interface for an example)
- Use latest Gradio version to integrate features like multi-page structure, client-side functions, and dynamic rendering of components.
- Enable restoring the previous state of edited sentences if matching filename and user are found in the logs in the past 24 hours (with a modal to enable starting from scratch).
- Possibly rethink logging format to reduce redundancy and improve readability.
- Add optional tab to visualize the editing process (e.g., highlighted diffs between original and edited sentences, replay of the editing process by looping `.then` with `time.sleep`, download scoped logs for single text).
- Change saving logic to use BackgroundScheduler.
- Change transition from editing to loading to preserve login code and possibly allow the pre-loading of several files for editing (would require a custom `FileExplorer` component to mark done documents).
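For anyone interested in the diff visualization idea above, a possible starting point is a word-level diff built on the standard library's `difflib`. This is only a sketch under stated assumptions and does not reflect how GroTE stores or renders edits: the function name `word_diff` and the `[-...-]` / `{+...+}` markers are illustrative choices, and tokenization is a plain whitespace split.

```python
import difflib


def word_diff(original, edited):
    """Return a word-level diff, marking deletions as [-...-] and insertions as {+...+}."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.extend(a[i1:i2])
        # A "replace" is rendered as a deletion followed by an insertion.
        if op in ("delete", "replace"):
            parts.append("[-" + " ".join(a[i1:i2]) + "-]")
        if op in ("insert", "replace"):
            parts.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(parts)


if __name__ == "__main__":
    print(word_diff("the cat sat", "the dog sat"))
```

In an interface, the markers could be swapped for highlighted spans. A whitespace split is a rough fit for languages that do not separate words with spaces.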
If you have any questions or feedback, please feel free to reach out to me at [email protected].