
Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers

dc.contributor.author: Baer, Manuel F.
dc.contributor.author: Purves, Ross S.
dc.date: 2023-03-01
dc.date.accessioned: 2024-11-18T13:19:08Z
dc.date.available: 2024-11-18T13:19:08Z
dc.date.issued: 2023
dc.description.abstract: Natural language has proven to be a valuable source of data for various scientific inquiries, including landscape perception and preference research. However, large, high-quality, landscape-relevant corpora are scarce. Here we propose and discuss a natural language processing workflow to identify landscape-relevant documents in large collections of unstructured text. Using a small, curated, high-quality collection of actively crowdsourced landscape descriptions, we identify and extract similar documents from two different corpora (Geograph and WikiHow) using sentence-transformers and cosine similarity scores. We show that 1) sentence-transformers combined with cosine similarity calculations successfully identify similar documents in both Geograph and WikiHow, effectively opening the door to the creation of new landscape-specific corpora, 2) the proposed sentence-transformer approach outperforms traditional Term Frequency-Inverse Document Frequency (TF-IDF) based approaches, and 3) the identified documents capture similar topics when compared to the original high-quality collection. The presented workflow is transferable to various scientific disciplines in need of domain-specific natural language corpora as underlying data.
dc.identifier.doi: 10.1007/s13218-022-00793-3
dc.identifier.issn: 1610-1987
dc.identifier.uri: http://dx.doi.org/10.1007/s13218-022-00793-3
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/45378
dc.publisher: Springer
dc.relation.ispartof: KI - Künstliche Intelligenz: Vol. 37, No. 1
dc.relation.ispartofseries: KI - Künstliche Intelligenz
dc.subject: Crowdsourcing
dc.subject: Landscape research
dc.subject: Natural language processing
dc.subject: Sentence transformers
dc.title: Identifying Landscape Relevant Natural Language using Actively Crowdsourced Landscape Descriptions and Sentence-Transformers
dc.type: Text/Journal Article
mci.reference.pages: 55-67
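
Illustrative sketch (not part of the record): the abstract describes ranking documents from a large corpus by their cosine similarity to a small reference collection. The Python sketch below shows that ranking step in its simplest form, using a standard-library TF-IDF vectorizer, which corresponds to the baseline the abstract says the sentence-transformer approach outperforms. In the sentence-transformer variant, dense sentence embeddings would take the place of `tfidf_vectors`. All function names, the example texts, and the choice to score each candidate by its maximum similarity to any reference document are assumptions made for illustration, not details taken from the article.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokens = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokens for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokens]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(reference, candidates):
    """Score each candidate by its best similarity to any reference document
    (an assumed scoring rule) and return (score, text) pairs, best first."""
    vecs = tfidf_vectors(reference + candidates)
    ref_vecs = vecs[: len(reference)]
    cand_vecs = vecs[len(reference):]
    scored = [
        (max(cosine(c, r) for r in ref_vecs), doc)
        for c, doc in zip(cand_vecs, candidates)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical example texts, invented for this sketch.
reference = [
    "green hills and a quiet lake under mountains",
    "a forest path beside the river",
]
candidates = [
    "how to reset your router password",
    "rolling hills above a lake and distant mountains",
]
ranked = rank_candidates(reference, candidates)
```

A similarity threshold or a top-k cut-off applied to `ranked` would then select the documents to keep; any such cut-off is a design choice and is not specified in this record.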
