it - Information Technology 65(6) - October 2024

Latest publications

  • Journal article
    Preparing multi-layered visualisations of Old Babylonian cuneiform tablets for a machine learning OCR training model towards automated sign recognition
    (it - Information Technology: Vol. 65, No. 6, 2023) Hameeuw, Hendrik; De Graef, Katrien; Ryberg Smidt, Gustav; Goddeeris, Anne; Homburg, Timo; Chandrasekar, Krishna Kumar Thirukokaranam
    In the framework of the CUNE-IIIF-ORM project, the aim is to train an Artificial Intelligence Optical Character Recognition (AI-OCR) model that can automatically locate and identify cuneiform signs on photorealistic representations of Old Babylonian texts (c. 2000–1600 B.C.E.). To train the model, c. 200 documentary clay tablets have been selected. Specialist cuneiformists annotate them manually on a set of 12 still raster images generated from interactive Multi-Light Reflectance images. This image set includes visualisations with varying light angles and simplifications based on the depth information of the signs impressed in the surface. In the Cuneur Cuneiform Annotator, a GitLab-based web application, the identified cuneiform signs are annotated with polygons and enriched with metadata. This methodology builds a high-quality annotated training corpus of approximately 20,000 cropped signs (i.e. 240,000 visualisations, 12 per sign), each with its Unicode code point and conventional sign name. It will act as a multi-layered core dataset for the further development and fine-tuning of a machine learning OCR model for the Old Babylonian cuneiform script. This paper discusses how the physical nature of handwritten, inscribed Old Babylonian documentary clay tablets challenges the annotation and metadata-assignment task, and how these challenges have been addressed within the CUNE-IIIF-ORM project to achieve an effective corpus for training a machine learning OCR model.
  • Journal article
    Three degrees of separation: networks in the city of Babylon during the Reign of Darius I (522–486 BCE)
    (it - Information Technology: Vol. 65, No. 6, 2023) Wang, Jinyan
    In this paper, I reconstruct the networks of Babylonian urban dwellers during the reign of Darius I (522–486 BCE) based on 803 tablets from 10 private archives in Babylon. The main aim is to examine the structure and connectivity of the network that connected different urban families and groups of individuals outside the families. I focus on the positions individuals occupied within the network that yielded them the power to connect smaller parts of the network. The first approach used to identify and analyze these positions is the betweenness centrality measure. The second approach is the analytic concept of brokerage, the role of mediating between two or more individuals or communities that would otherwise have no connection to each other. I identify differences in the ways that the intermediate position of brokers affected the formation of the network. These brokerage roles resulted from families’ strategies to increase their household wealth by constructing and optimizing marriage, prebendary, and business relations.
  • Journal article
    Keep me PoS-ted: experimenting with Part-of-Speech prediction on Old Babylonian letters
    (it - Information Technology: Vol. 65, No. 6, 2023) Ryberg Smidt, Gustav; De Graef, Katrien; Lefever, Els
    In this paper we describe a collaboration between Ghent University-based Assyriologists and computational linguists, who set up a pilot study to analyse the language of Old Babylonian (OB) letters using Natural Language Processing (NLP) techniques. OB letters make up an interesting dataset because (1) they are an invaluable source for everyday vernacular language, and (2) more than 5000 have been recovered, many of which are accessible in transliteration and translation through the series Altbabylonische Briefe and the Cuneiform Digital Library Initiative. Starting from a first batch of letters from OB Sippar, later extended with other Akkadian letters, we aim to develop machine learning approaches for semi-automatic text analysis and annotation of the letters. Here we present a machine learning model for Part-of-Speech (PoS) tag prediction. The input data is Akkadian in transliteration, and the best-performing model is a fine-tuned Multilingual BERT Transformer with word embeddings (weighted average F1: 90.19 %). Compared with the benchmark PoS tagging result on a larger Akkadian corpus (97.67 %), this leaves room for improvement. However, analysis of the results shows that multilingual word embeddings improve model performance, and that enlarging the corpus with data targeting specific classes could considerably improve the macro average F1 scores.
  • Journal article
    Frontmatter
    (it - Information Technology: Vol. 65, No. 6, 2023) Frontmatter
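
The Part-of-Speech abstract contrasts a weighted average F1 with a macro average F1. The following is a minimal sketch of how the two averages differ. It assumes the usual definitions (the same ones scikit-learn applies for average="macro" and average="weighted") and uses an invented toy example: the tags N, V and PRP are hypothetical labels, not the paper's tagset, data or results.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean: every class counts equally, however rare."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(gold, pred):
    """Mean weighted by support (number of gold tokens per class)."""
    scores = per_class_f1(gold, pred)
    support = Counter(gold)
    total = len(gold)
    return sum(f1 * support[label] / total for label, f1 in scores.items())


# Hypothetical toy data: a frequent class (N) and two rare ones (V, PRP).
# The only error is the single PRP token being tagged as N.
gold = ["N"] * 8 + ["V", "PRP"]
pred = ["N"] * 8 + ["V", "N"]

print(f"weighted F1: {weighted_f1(gold, pred):.3f}")  # weighted F1: 0.853
print(f"macro F1:    {macro_f1(gold, pred):.3f}")     # macro F1:    0.647
```

Because the weighted average scales each class's F1 by its frequency, a rare class that the model never gets right barely moves the weighted score but pulls the macro score down sharply. This is why adding training data for under-represented classes would be expected to raise the macro average most.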