Logo des Repositoriums
 
Zeitschriftenartikel

Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers

Vorschaubild nicht verfügbar

Volltext URI

Dokumententyp

Text/Journal Article

Zusatzinformation

Datum

2023

Zeitschriftentitel

ISSN der Zeitschrift

Bandtitel

Verlag

Springer

Zusammenfassung

Natural language has proven to be a valuable source of data for various scientific inquiries including landscape perception and preference research. However, large high quality landscape relevant corpora are scare. We here propose and discuss a natural language processing workflow to identify landscape relevant documents in large collections of unstructured text. Using a small curated high quality collection of actively crowdsourced landscape descriptions we identify and extract similar documents from two different corpora ( Geograph and WikiHow ) using sentence-transformers and cosine similarity scores. We show that 1) sentence-transformers combined with cosine similarity calculations successfully identify similar documents in both Geograph and WikiHow effectively opening the door to the creation of new landscape specific corpora, 2) the proposed sentence-transformer approach outperforms traditional Term Frequency - Inverse Document Frequency based approaches and 3) the identified documents capture similar topics when compared to the original high quality collection. The presented workflow is transferable to various scientific disciplines in need of domain specific natural language corpora as underlying data.

Beschreibung

Baer, Manuel F.; Purves, Ross S. (2023): Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers. KI - Künstliche Intelligenz: Vol. 37, No. 1. DOI: 10.1007/s13218-022-00793-3. Springer. ISSN: 1610-1987

Zitierform

Tags