Journal article
Towards a word similarity gold standard for Akkadian: creation and model optimization
Document type
Text/Journal Article
Date
2024
Publisher
De Gruyter
Abstract
We present a word similarity gold standard for Akkadian, a language documented in ancient Mesopotamian sources from the 24th century BCE until the first century CE. The gold standard comprises 300 word pairs that five Assyriologists, working independently, ranked by paradigmatic similarity. We use the gold standard to tune PMI + SVD and fastText models and improve their performance. We also present a hyper-parametrized PMI + SVD model for building count-based word embeddings that aims to deal with the data sparsity and repetition issues encountered in Akkadian texts. Our model combines Dirichlet smoothing with context distribution smoothing and uses context similarity weighting to down-sample the distortion caused by formulaic litanies and by partially or fully duplicated passages.
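To make the count-based pipeline named in the abstract more concrete, the following is a minimal sketch of PMI + SVD embeddings with Dirichlet (add-k) smoothing and context distribution smoothing. It is not the authors' implementation. The function names, the parameters `dirichlet_k` and `cds_alpha`, the clipping of negative PMI values to zero, and the English toy corpus are all assumptions made for illustration. The context similarity weighting described in the abstract is not reproduced here.

```python
import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count symmetric word-context co-occurrences within a fixed window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1.0
    return vocab, counts


def pmi_svd(counts, dim=2, dirichlet_k=0.1, cds_alpha=0.75):
    """Build embeddings from a co-occurrence matrix (hypothetical helper)."""
    # Dirichlet (add-k) smoothing: every word-context cell gets a pseudo-count.
    smoothed = counts + dirichlet_k
    total = smoothed.sum()
    p_wc = smoothed / total
    p_w = smoothed.sum(axis=1, keepdims=True) / total
    # Context distribution smoothing: raise context counts to a power below 1,
    # which shifts probability mass toward rare contexts.
    ctx = smoothed.sum(axis=0, keepdims=True) ** cds_alpha
    p_c = ctx / ctx.sum()
    pmi = np.log(p_wc / (p_w * p_c))
    # Assumption: keep only positive associations (positive PMI).
    ppmi = np.maximum(pmi, 0.0)
    # Truncated SVD: keep the top `dim` singular directions.
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Toy placeholder corpus (English stand-ins, not Akkadian data).
corpus = [
    ["king", "built", "temple"],
    ["queen", "built", "palace"],
    ["king", "entered", "palace"],
    ["queen", "entered", "temple"],
]
vocab, counts = cooccurrence_counts(corpus, window=2)
emb = pmi_svd(counts, dim=2)
sim = cosine(emb[vocab.index("king")], emb[vocab.index("queen")])
print(f"cosine(king, queen) = {sim:.3f}")
```

In a tuning setup like the one the abstract describes, parameters such as `dirichlet_k`, `cds_alpha` and `dim` would be varied and each resulting model scored against the human similarity rankings, for example with a rank correlation between model cosines and the gold-standard judgments.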