Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers

Baer, Manuel F.; Purves, Ross S.

Journal article

Document type

Text/Journal Article

Date

2023

Authors

Baer, Manuel F.

Purves, Ross S.

Source

KI - Künstliche Intelligenz: Vol. 37, No. 1

Publisher

Springer

Abstract

Natural language has proven to be a valuable source of data for various scientific inquiries, including landscape perception and preference research. However, large, high-quality, landscape-relevant corpora are scarce. Here we propose and discuss a natural language processing workflow to identify landscape-relevant documents in large collections of unstructured text. Using a small, curated, high-quality collection of actively crowdsourced landscape descriptions, we identify and extract similar documents from two different corpora (Geograph and WikiHow) using sentence-transformers and cosine similarity scores. We show that 1) sentence-transformers combined with cosine similarity calculations successfully identify similar documents in both Geograph and WikiHow, effectively opening the door to the creation of new landscape-specific corpora, 2) the proposed sentence-transformer approach outperforms traditional Term Frequency - Inverse Document Frequency based approaches, and 3) the identified documents capture similar topics when compared to the original high-quality collection. The presented workflow is transferable to various scientific disciplines in need of domain-specific natural language corpora as underlying data.
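The retrieval step described in the abstract (score each candidate document against a small seed collection by cosine similarity and keep the close matches) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold of 0.5, and the example sentences are all hypothetical, and a toy bag-of-words embedding stands in for a sentence-transformer model so that the sketch runs without downloading anything. In a real pipeline, the `embed` argument would presumably wrap a sentence-transformer encoder instead.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (assumption): lowercase word counts.

    This is NOT a sentence-transformer; it only makes the sketch runnable.
    """
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    seeds: List[str],
    candidates: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return candidates whose best similarity to any seed meets the threshold."""
    seed_vectors = [embed(seed) for seed in seeds]
    hits = []
    for doc in candidates:
        doc_vector = embed(doc)
        best = max((cosine(doc_vector, s) for s in seed_vectors), default=0.0)
        if best >= threshold:
            hits.append((doc, best))
    # Highest-scoring documents first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical seed description and candidate documents.
seeds = ["a quiet lake below snowy mountains"]
candidates = [
    "snowy mountains above a quiet lake",
    "how to reset a router password",
]
results = find_similar(seeds, candidates)
```

Only the first candidate is retained here, because it shares most of its words with the seed description. With a word-count embedding this amounts to lexical overlap; the point of swapping in sentence embeddings is that semantically similar texts can score highly even when they share few words.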

Baer, Manuel F.; Purves, Ross S. (2023): Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers. KI - Künstliche Intelligenz: Vol. 37, No. 1. DOI: 10.1007/s13218-022-00793-3. Springer. ISSN: 1610-1987

Keywords

Crowdsourcing; Landscape research; Natural language processing; Sentence transformers

DOI

10.1007/s13218-022-00793-3

Collections

Künstliche Intelligenz 37(1) - March 2023
