This repository contains food dishes and images of the BORSch dataset. In our paper "What are Foundation Models Cooking in the Post-Soviet World?", we used this dataset to explore Post-Soviet cultural understanding in Russian and Ukrainian models.
The data is split into three sub-datasets:
- RU (Russian): Dishes collected in the Russian language. This data comes with name, country of origin, and source (wikidata/bootstrapping).
- UK (Ukrainian): Dishes collected in the Ukrainian language. This data comes with name, country of origin, and source (wikidata/bootstrapping).
- PARALLEL (Russian/Ukrainian): The parallel version of the two corpora above. Only dishes from Post-Soviet countries are included.
Each dish has an ID (given in the ID column). The ID lets you trace a dish to its image in the images subfolder of the RU and UK datasets.
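The ID-to-image lookup described above can be sketched in Python. This is only a minimal illustration under stated assumptions, not the authors' tooling. It assumes the dish table is a CSV file with an `ID` column and that each image is stored in the `images` subfolder as `<ID>.<extension>`. The file name `dishes.csv` used in the usage note is hypothetical, so check the actual file names and formats in the repository before relying on it.

```python
import csv
from pathlib import Path


def load_dishes(csv_path):
    """Read a dish table into a dict keyed by the ID column.

    Assumption: the table is a UTF-8 CSV with a header row containing "ID".
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["ID"]: row for row in csv.DictReader(f)}


def find_image(dataset_dir, dish_id):
    """Return the image path for dish_id inside <dataset_dir>/images, or None.

    Assumption: images are named "<ID>.<extension>" (for example "12.jpg").
    """
    images_dir = Path(dataset_dir) / "images"
    # "12.*" matches "12.jpg" but not "120.jpg", so IDs do not collide.
    matches = sorted(p for p in images_dir.glob(f"{dish_id}.*") if p.is_file())
    return matches[0] if matches else None
```

For example, `load_dishes("RU/dishes.csv")` followed by `find_image("RU", some_id)` would return the path of the matching image, or `None` if no file with that ID exists.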
TBD
Anton Lavrouk: Scholar | LinkedIn | Personal Website antonlavrouk [AT] google [DOT] com
The authors would like to thank Oleksandr Lavreniuk, Dennis Pozhidaev, and Jad Matthew Bardawil for their valuable discussion and annotation, and Kartik Goyal for valuable discussion.