Same-Same But Different: On Understanding Duplicates in Stack Overflow

dc.contributor.author: Ellmann, Mathias
dc.date.accessioned: 2019-09-30T07:31:21Z
dc.date.available: 2019-09-30T07:31:21Z
dc.date.issued: 2019
dc.description.abstract: Stack Overflow (SO) is one of the most popular online sites for asking and answering developers’ questions. New posts that cover exactly the same knowledge as previously posted questions get closed and deleted by the community. However, new posts that are very similar to previous questions but phrased slightly differently are kept and tagged as duplicates, since they might include additional information, hints, or keywords. In this paper, we study exact duplicates and similar duplicates in SO in order to get insights about their properties and content and to understand how the community distinguishes useful from useless (i.e., to be deleted) redundant knowledge. We identified several interesting trends. Unique questions are significantly longer than others. Original questions get answered faster, receive more answers, and are viewed more frequently than exact and similar duplicates. When comparing the overlapping text in duplicate pairs, we found almost no difference between exact and similar duplicates. In both cases, about 20–25% of the question text and 40% of the tags are identical in an original and its duplicate. However, the answers to the duplicates seem much more diverse, with only 5–6% repeated text. [en]
dc.identifier.doi: 10.1007/s00287-019-01185-y
dc.identifier.pissn: 0170-6012
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/27929
dc.language.iso: en
dc.publisher: Springer Verlag
dc.relation.ispartof: Informatik Spektrum: Vol. 42, No. 4
dc.title: Same-Same But Different: On Understanding Duplicates in Stack Overflow [en]
dc.type: Text/Journal Article
gi.citation.endPage: 286
gi.citation.publisherPlace: Berlin Heidelberg
gi.citation.startPage: 266
gi.conference.sessiontitle: Hauptbeitrag (main article)
