Armenian translations #13

Open
jhdeov opened this issue Sep 10, 2022 · 9 comments

Comments

@jhdeov

jhdeov commented Sep 10, 2022

Hello. A while ago I manually compiled text-file versions of three Armenian Bibles: one in Western Armenian, two in Eastern Armenian (different translations). I don't know if you're still actively adding translations.

@christos-c
Owner

christos-c commented Sep 12, 2022

Hi @jhdeov - thanks for the pointers. I'd be happy to add these versions to my corpus, but these days I don't have time to write custom readers for conversion into the TEI XML format. If you can, please use (or extend) the tools in my other repository (e.g. see https://github.com/christos-c/bible-corpus-tools/blob/master/src/bible/BibleConstructor.java) to create XML versions of these Bibles and send me a Pull Request.

@jhdeov
Author

jhdeov commented Sep 29, 2022

Hiya. Hmm. I had already extracted the content from Bible sites a while ago, so I tried to make it easier for myself by converting my text files into an Excel sheet that's organized by book-chapter-verse. Do you think there's an easy-ish modification to your code to convert such an Excel sheet (or a TSV version) to your XML notation? https://easyupload.io/u0qh95
(PS: I was going to recycle my files for a corpus project anyway, so this GitHub issue gave me an excuse to do it. So no worries if you think it's too much hassle to incorporate into your repo.)

@christos-c
Owner

Hi @jhdeov, yes, that sounds more doable than working with an arbitrary text source. If you can share a small sample (a couple of chapters) in TSV format, I should be able to write the converter.

@jhdeov
Author

jhdeov commented Sep 29, 2022

Sure @christos-c. Here is a bit of Genesis. The first row (the headers) is pretty intuitive: book, chapter #, verse #, and the actual content.

@christos-c
Owner

Thanks @jhdeov. I'm on holiday now, so I'll have a look at some point next week.

@christos-c
Owner

Hi @jhdeov, I have added a tab-separated file reader to my corpus-tools collection. To use it, add the file names to the array on line 15 (see the TODO note), then change the BibleConstructor.java file to use the new reader by specifying the location of all the files you have generated:

// Add this on line 7
import bible.readers.TSVFileReader;

// line 37
reader = new TSVFileReader("<FOLDER>");
// Comment out any other readers
// reader = new BibleOrgHTMLReader("amharic");

Also make sure to change the variables at the beginning of the file (lines 11-20) to the correct language, language code, and distribution source.

To generate the XML file, compile the folder using the instructions in the README. Make a "bin" folder and run the following:
javac -cp "lib/*" -d bin src/bible/readers/*.java src/bible/*.java
Then run the main constructor:
java -cp "lib/*:bin" bible.BibleConstructor

Hopefully that will work without errors, but let me know if you find any issues.
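
For anyone following along, here is a minimal sketch of what a tab-separated reader has to do with the four columns described earlier (book, chapter #, verse #, content). It is not the code of TSVFileReader: the class name, method names, the `div`/`seg` element names and the id scheme below are all assumptions made for illustration, so check the repository for the real output format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual TSVFileReader from bible-corpus-tools.
class TsvToXmlSketch {

    /** One row of the TSV file: book, chapter number, verse number, verse text. */
    static final class Verse {
        final String book;
        final int chapter;
        final int verse;
        final String text;

        Verse(String book, int chapter, int verse, String text) {
            this.book = book;
            this.chapter = chapter;
            this.verse = verse;
            this.text = text;
        }
    }

    /** Parses TSV lines, skipping the header row (index 0) and empty lines. */
    static List<Verse> parse(List<String> lines) {
        List<Verse> verses = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            // Limit of 4 keeps any stray tab inside the verse text in the last column.
            String[] cols = line.split("\t", 4);
            if (cols.length < 4) {
                throw new IllegalArgumentException("Expected 4 columns on line " + (i + 1));
            }
            verses.add(new Verse(cols[0].trim(), Integer.parseInt(cols[1].trim()),
                    Integer.parseInt(cols[2].trim()), cols[3].trim()));
        }
        return verses;
    }

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Nests verses as book > chapter > verse; assumes rows are already in order. */
    static String toXml(List<Verse> verses) {
        StringBuilder sb = new StringBuilder();
        String book = null;
        int chapter = -1;
        for (Verse v : verses) {
            boolean newBook = !v.book.equals(book);
            if (newBook || v.chapter != chapter) {
                if (book != null) {
                    sb.append("  </div>\n"); // close the previous chapter
                }
                if (newBook) {
                    if (book != null) {
                        sb.append("</div>\n"); // close the previous book
                    }
                    book = v.book;
                    sb.append("<div type=\"book\" id=\"").append(escape(book)).append("\">\n");
                }
                chapter = v.chapter;
                sb.append("  <div type=\"chapter\" id=\"").append(escape(book))
                        .append('.').append(chapter).append("\">\n");
            }
            sb.append("    <seg type=\"verse\" id=\"").append(escape(book)).append('.')
                    .append(v.chapter).append('.').append(v.verse).append("\">")
                    .append(escape(v.text)).append("</seg>\n");
        }
        if (book != null) {
            sb.append("  </div>\n</div>\n"); // close the last chapter and book
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder rows, not real verse text.
        List<String> lines = Arrays.asList(
                "book\tchapter\tverse\ttext",
                "GEN\t1\t1\tfirst verse text",
                "GEN\t1\t2\tsecond verse text",
                "GEN\t2\t1\tthird verse text");
        System.out.print(toXml(parse(lines)));
    }
}
```

The sketch only opens a new chapter or book element when the value in the corresponding column changes, which is why the rows need to be sorted by book, chapter and verse before conversion.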

@jhdeov
Author

jhdeov commented Oct 10, 2022

Thanks @christos-c. It did work (here). A minor issue is that I had manually compiled the original TSV a year or so ago by copying from a liturgical website. The website didn't specify the exact edition of the Bible they used, so I'm trying to contact them to double-check the official edition and other details. Given that this worked, I can also try adding an hye version and an xcl version down the line.

@christos-c
Owner

Hi @jhdeov, the file looks good! Once you have the edition details, feel free to send me the file as a PR. Just to check: would these three translations be in addition to the current Armenian one, or should one of them replace the text I have?

@jhdeov
Author

jhdeov commented Oct 11, 2022

@christos-c They would replace the current one, because they cover three separate ISO codes and they're complete.
