Text document

SubRosa: Determining Movie Similarities based on Subtitles

Date

2021

Publisher

Gesellschaft für Informatik, Bonn

Abstract

For streaming websites, media shopping platforms and movie databases, movie recommendation systems have become an important technology. Here, mostly hybrid methods of collaborative and content-based filtering, built on user ratings and user-generated content, have proven to be effective. However, these methods can lead to popularity-biased results in which movies with little user-generated data are under-represented. In this paper, we discuss the possibility of generating movie recommendations that are based neither on user-generated data nor on metadata, but solely on the content of the movies themselves, confining ourselves to movie dialog. We extract low-level features from movie subtitles using methods from Information Retrieval, Natural Language Processing and Stylometry, and examine whether the similarity of these features correlates with overall movie similarity. In addition, we present a novel web application called SubRosa (http://ch01.informatik.uni-leipzig.de:5001/), which can be used to interactively compare the results of different feature combinations.
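
As an illustration of the general idea of comparing movies by their dialog alone, the following is a minimal sketch that assumes TF-IDF term weighting and cosine similarity as the Information Retrieval feature. The abstract does not specify which features or measures SubRosa actually uses, so this is not the authors' method; the function names, the smoothed IDF formula and the toy subtitle snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: smoothed IDF, so terms found in every document keep a weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "subtitles"; real input would be full subtitle files.
subtitles = {
    "space_film_a": "launch the ship and set course for the station",
    "space_film_b": "the ship will reach the station after launch",
    "romance_film": "i love you and i always will",
}
titles = list(subtitles)
vectors = tfidf_vectors([subtitles[t] for t in titles])

sim_space = cosine(vectors[0], vectors[1])  # two films with similar dialog
sim_mixed = cosine(vectors[0], vectors[2])  # films with mostly different dialog
print(f"{titles[0]} vs {titles[1]}: {sim_space:.3f}")
print(f"{titles[0]} vs {titles[2]}: {sim_mixed:.3f}")
```

In this toy setting the two snippets that share vocabulary score higher than the unrelated pair. Whether such a lexical score tracks perceived movie similarity is exactly the kind of question the paper sets out to examine.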

Description

Luhmann, Jan; Burghardt, Manuel; Tiepmar, Jochen (2021): SubRosa: Determining Movie Similarities based on Subtitles. INFORMATIK 2020. DOI: 10.18420/inf2020_119. Gesellschaft für Informatik, Bonn. PISSN: 1617-5468. ISBN: 978-3-88579-701-2. pp. 1271-1280. Methoden und Anwendungen der Computational Humanities. Karlsruhe. September 28 - October 2, 2020
