Journal Article

Same-Same But Different: On Understanding Duplicates in Stack Overflow

Document Type

Text/Journal Article

Date

2019

Publisher

Springer Verlag

Abstract

Stack Overflow (SO) is one of the most popular online sites for asking and answering developers’ questions. New posts that cover exactly the same knowledge as previously posted questions are closed and deleted by the community. However, new posts that are very similar to previous questions but phrased slightly differently are kept and tagged as duplicates, since they might include additional information, hints, or keywords. In this paper, we study exact duplicates and similar duplicates in SO to gain insights into their properties and content and to understand how the community distinguishes useful redundant knowledge from useless redundant knowledge (i.e., knowledge to be deleted). We identified several interesting trends. Unique questions are significantly longer than others. Original questions are answered faster, receive more answers, and are viewed more frequently than exact and similar duplicates. When comparing the overlapping text in duplicate pairs, we found almost no difference between exact and similar duplicates. In both cases, about 20–25 % of the question text and 40 % of the tags are identical in an original and its duplicate. However, the answers to the duplicates seem much more diverse, with only 5–6 % repeated text.

Description

Ellmann, Mathias (2019): Same-Same But Different: On Understanding Duplicates in Stack Overflow. Informatik Spektrum: Vol. 42, No. 4. DOI: 10.1007/s00287-019-01185-y. Berlin Heidelberg: Springer Verlag. PISSN: 0170-6012. pp. 266-286. Main article
