Stylometric Analysis for Akkadian Cuneiform Texts

This Jupyter notebook provides code to perform stylometric analysis on cuneiform texts. The code can be run without any installation through Binder at the following link: https://mybinder.org/v2/gh/ARomach/Cuneiform-Stylometry/HEAD.

The code was designed specifically for Akkadian texts; however, it is likely to work well on any language written in the cuneiform script, that is, any language whose transliteration system follows the same principles. Texts written in other non-alphabetic writing systems whose characters have Unicode representations could probably benefit from this code as well. The code is based on "Introduction to stylometry with Python" from the Programming Historian website.

It is accompanied by a metadata folder and a texts folder. The texts folder currently holds one corpus as a case study: 81 legal and administrative documents from the Neo-Babylonian period (ca. 7^th^-5^th^ centuries BCE). These texts were edited in the 1975 dissertation of Raymond B. Dillard. They are currently held in the Free Library of Philadelphia (FLP), having been purchased on the antiquities market in the early 20^th^ century by John Frederik Lewis.

The notebook employs the following methods on the texts:

Tf-idf vectorization
Computation of a distance matrix between the texts
t-SNE dimensionality reduction to display the distances between the texts on a scatter plot
Network analysis of the relationships between similar texts
A close reading: looking at groupings of nearest texts
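
The first two steps can be illustrated with a small, dependency-free sketch. It is not the notebook's own code: the function names, the toy texts, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 are assumptions chosen for illustration, and the notebook may weight and compare texts differently.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each text name to a {sign: tf-idf weight} dictionary."""
    n_docs = len(docs)
    # Signs are assumed to be separated by whitespace; case is preserved.
    tokenized = {name: text.split() for name, text in docs.items()}
    # Document frequency: the number of texts in which each sign occurs.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            sign: count * (math.log((1 + n_docs) / (1 + df[sign])) + 1)
            for sign, count in counts.items()
        }
    return vectors


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(sign, 0.0) for sign, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(docs):
    """Return nested dictionaries holding all pairwise distances."""
    vectors = tfidf_vectors(docs)
    names = sorted(vectors)
    return {
        a: {b: cosine_distance(vectors[a], vectors[b]) for b in names}
        for a in names
    }


# Hypothetical toy texts, not taken from the FLP corpus.
example_texts = {
    "text_a": "a na PN qi bi ma",
    "text_b": "a na PN um ma",
    "text_c": "NUM GUR ŠE BAR",
}
matrix = distance_matrix(example_texts)
```

In this toy example text_a and text_b share three signs and end up closer to each other than either is to text_c, which shares none with them.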

The code is written so that no previous knowledge of Python or stylometric methods is assumed. Instructions throughout explain the code and the stylometric methods, and give the information needed to choose specific parameters.

As different parameters can give different results, it is possible to run the code several times and assess the differences.

Certain parameters are fixed in advance (e.g. characters are not lowercased) and some are left to the user (e.g. choosing 1-grams, 2-grams, or 3-grams). The fixed parameters are those necessary for a logical parsing of cuneiform transliterations or of their representation in Unicode cuneiform glyphs.
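
A minimal sketch of what these two choices mean in practice, assuming signs are separated by whitespace. The function name and the sample line are illustrative only. Case is kept because transliterations conventionally write logograms in capitals and syllabic signs in lowercase, so lowercasing would merge distinct readings.

```python
def sign_ngrams(text, n):
    """Return the n-grams of whitespace-separated signs, preserving case."""
    signs = text.split()  # deliberately no .lower()
    return [" ".join(signs[i:i + n]) for i in range(len(signs) - n + 1)]


# Hypothetical sample line.
line = "a na LUGAL be li ia"
unigrams = sign_ngrams(line, 1)
bigrams = sign_ngrams(line, 2)
trigrams = sign_ngrams(line, 3)
```

Here the bigrams begin with "a na" and "na LUGAL". Larger values of n capture more of the sign sequence but yield fewer and sparser features per text.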

Adding corpora

The stylometric analysis can be performed on texts which are either transliterated or represented in Unicode cuneiform signs. For transliterated texts, it is recommended that personal names and numbers be replaced with PN and NUM (or similar), depending on the focus of study (e.g. period, genre, scribal habits).
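
One possible way to do this replacement is sketched below. It assumes that numbers appear as plain digits or simple fractions, and that the personal names to mask are supplied by the user as a set of whole tokens; both are assumptions, and real editions may mark names and numbers differently. The function name and the sample line are hypothetical.

```python
import re

# Tokens made only of digits, optionally a fraction such as 1/2.
NUMBER_TOKEN = re.compile(r"\d+(/\d+)?")


def mask_tokens(text, personal_names):
    """Replace number tokens with NUM and listed names with PN."""
    masked = []
    for token in text.split():
        if NUMBER_TOKEN.fullmatch(token):
            masked.append("NUM")
        elif token in personal_names:
            masked.append("PN")
        else:
            masked.append(token)
    return " ".join(masked)


# Hypothetical sample line and name list.
masked_line = mask_tokens(
    "5 GUR ŠE.BAR ša Nabu-zer-iddin",
    {"Nabu-zer-iddin"},
)
```

With this input, masked_line is "NUM GUR ŠE.BAR ša PN". Because names are matched as whole tokens, the masking has to run while a name is still written as one hyphenated token, that is, before hyphens and dots between signs are removed.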

The new corpora need to be added to the relevant folders before starting to use the code. This includes the following:

A folder which includes the texts for stylometric analysis:

The folder should be placed under the texts folder.
Each file should be a plain text file and contain one text only.
The texts should be transliterated texts without any editorial marks (including no hyphens and dots between signs), or texts in Unicode cuneiform.
It is recommended that each file name be the name of the text (this is later used as the text identifier in the dataframes and CSVs produced).
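
The removal of editorial marks can be sketched as follows. Which marks occur depends on the edition, so the character set below (square and half brackets, angle brackets, and the signs ?, ! and *) is an assumption to adapt, as are the function name and the sample line.

```python
import re

# Assumed set of editorial marks; adjust to the conventions of the edition.
EDITORIAL_MARKS = re.compile(r"[\[\]⸢⸣<>?!*]")
# Hyphens and dots that join signs within a word.
SIGN_SEPARATORS = re.compile(r"[-.]")


def clean_transliteration(line):
    """Strip editorial marks and separate signs with single spaces."""
    line = EDITORIAL_MARKS.sub("", line)
    line = SIGN_SEPARATORS.sub(" ", line)
    # Collapse any runs of whitespace left behind.
    return " ".join(line.split())


# Hypothetical sample line.
cleaned = clean_transliteration("[a]-na ⸢LUGAL⸣ be-lí-ia ŠE.BAR!")
```

The result here is "a na LUGAL be lí ia ŠE BAR": every sign becomes its own whitespace-separated token, and diacritics and capitalisation are left untouched.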

A metadata file:

The file should be placed under the metadata folder.
The file should be a CSV file.
One of the columns needs to be titled text_name, and its values need to be identical to the file names of the texts under the texts folder.
It is recommended that additional columns hold relevant information about the texts you want to study (period, genre, area, scribe, etc.).
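
Before running the notebook, the match between the metadata file and the text files can be checked with a short script such as the one below. It is a sketch only: the function name is hypothetical, and it assumes that text_name holds the complete file name including any extension. If the names are stored without an extension, compare against p.stem instead of p.name.

```python
import csv
from pathlib import Path


def check_metadata(texts_dir, metadata_csv):
    """Return (files missing from the metadata, metadata rows without a file)."""
    file_names = {p.name for p in Path(texts_dir).iterdir() if p.is_file()}
    with open(metadata_csv, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if "text_name" not in (reader.fieldnames or []):
            raise ValueError("metadata file has no 'text_name' column")
        listed = {row["text_name"] for row in reader}
    return sorted(file_names - listed), sorted(listed - file_names)
```

Calling check_metadata with the path of a corpus folder under texts and the path of its CSV under metadata returns two lists, and both should be empty when the corpus is set up consistently.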
