A Comparative Analysis on Machine Learning Techniques for Research Metadata: the ARDUOUS Case Study

dc.contributor.author: Yadav, Dipendra
dc.contributor.author: Tonkin, Emma
dc.contributor.author: Stoev, Teodor
dc.contributor.author: Yordanova, Kristina
dc.contributor.editor: Klein, Maike
dc.contributor.editor: Krupka, Daniel
dc.contributor.editor: Winter, Cornelia
dc.contributor.editor: Gergeleit, Martin
dc.contributor.editor: Martin, Ludger
dc.date.accessioned: 2024-10-21T18:24:26Z
dc.date.available: 2024-10-21T18:24:26Z
dc.date.issued: 2024
dc.description.abstract [en]: The rapid increase in research publications necessitates effective methods for organizing and analyzing large volumes of textual data. This study evaluates various combinations of embedding models, dimensionality reduction techniques, and clustering algorithms applied to metadata from papers accepted at the ARDUOUS (Annotation of useR Data for UbiquitOUs Systems) workshop over a period of 7 years. The analysis encompasses different types of keywords, including All Keywords (a comprehensive set of all extracted keywords), Multi-word Keywords (phrases consisting of two or more words), Existing Keywords (keywords already present in the metadata), and Single-word Keywords (individual words). The study found that the highest silhouette scores were achieved with 3, 4, and 5 clusters across all keyword types. Principal Component Analysis (PCA) and Independent Component Analysis (ICA) were identified as the most effective dimensionality reduction techniques, while DistilBERT embeddings consistently yielded high scores. Clustering algorithms such as k-means, k-medoids, and Gaussian Mixture Models (GMM) demonstrated robustness in forming well-defined clusters. These findings provide valuable insights into the main topics covered in the workshop papers and suggest optimal methodologies for analyzing research metadata, thereby enhancing the understanding of semantic relationships in textual data.
dc.identifier.doi: 10.18420/inf2024_37
dc.identifier.isbn: 978-3-88579-746-3
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45196
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: INFORMATIK 2024
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-352
dc.subject: Keyword Extraction
dc.subject: Clustering Techniques
dc.subject: Dimensionality Reduction
dc.subject: ARDUOUS Workshop
dc.subject: Natural Language Processing
dc.subject: Contextual Embeddings
dc.subject: Research Metadata Analysis
dc.title [en]: A Comparative Analysis on Machine Learning Techniques for Research Metadata: the ARDUOUS Case Study
dc.type: Text/Conference Paper
gi.citation.endPage: 509
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 499
gi.conference.date: 24.-26. September 2024
gi.conference.location: Wiesbaden
gi.conference.sessiontitle: 8th International Workshop on Annotation of useR Data for UbiquitOUs Systems
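
The abstract compares clusterings by silhouette score across different cluster counts. The following is a minimal sketch of that evaluation step only, not the authors' code: it assumes the keyword embeddings have already been computed and reduced to a few dimensions, and it substitutes hand-made 2-D points for them. The function names (`kmeans`, `silhouette_score`, `sweep_cluster_counts`) and the toy data are hypothetical, and plain k-means stands in for the several algorithms the paper evaluates.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def sweep_cluster_counts(points, candidates=(2, 3, 4, 5)):
    """Score k-means for each candidate k; returns {k: silhouette score}."""
    scores = {}
    for k in candidates:
        labels = kmeans(points, k)
        if len(set(labels)) >= 2:
            scores[k] = silhouette_score(points, labels)
    return scores


# Toy stand-in for reduced embeddings: three tight groups of five points each.
OFFSETS = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
CENTERS = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
POINTS = [(cx + dx, cy + dy) for cx, cy in CENTERS for dx, dy in OFFSETS]
TRUE_LABELS = [i for i in range(len(CENTERS)) for _ in OFFSETS]

if __name__ == "__main__":
    for k, score in sweep_cluster_counts(POINTS).items():
        print(f"k={k}: silhouette={score:.3f}")
```

In a real run, `POINTS` would be replaced by the reduced embedding vectors, and the same scoring function could be applied to labels produced by any of the clustering algorithms named in the abstract.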

Files

Original bundle
Name: Yadav_et_al_A_Comparative_Analysis.pdf
Size: 3.12 MB
Format: Adobe Portable Document Format