
Significance of low frequent terms in patent classification using IPC hierarchy

dc.contributor.author: Khattak, Akmal Saeed
dc.contributor.author: Heyer, Gerhard
dc.contributor.editor: Eichler, Gerald
dc.contributor.editor: Küpper, Axel
dc.contributor.editor: Schau, Volkmar
dc.contributor.editor: Fouchal, Hacène
dc.contributor.editor: Unger, Herwig
dc.date.accessioned: 2019-01-11T09:29:06Z
dc.date.available: 2019-01-11T09:29:06Z
dc.date.issued: 2011
dc.description.abstract: The International Patent Classification (IPC) is a standard taxonomy, or hierarchy, maintained by WIPO (World Intellectual Property Organization). Using this standard hierarchy, patents are classified with machine learning techniques. The first set of experiments investigates classification performance at different levels of the IPC hierarchy (section, class, subclass and main group). The experiments show that performance decreases going down the hierarchy, and that accuracy is very low at the highest level of detail. This might be due to the inclusion of more general terms than specific terms. The deeper levels of the hierarchy (higher levels of detail) are more specific: internal nodes are more general than leaf nodes. Classification at different levels of the hierarchy using low-frequency terms was therefore investigated. Low-frequency terms can be specific terms and cannot be ignored as noise. The second set of experiments focuses on which patent fields optimize classification accuracy at different levels of detail. The third set of experiments focuses on the significance of low-frequency terms across the IPC hierarchy. The experiments show that including low-frequency terms significantly improves accuracy at higher levels of detail. The low-frequency term set achieves better accuracy than the full term set and also substantially reduces the dimensionality of the text. [en]
dc.identifier.isbn: 978-3-88579-280-2
dc.identifier.pissn: 1617-5468
dc.identifier.uri: https://dl.gi.de/handle/20.500.12116/18993
dc.language.iso: en
dc.publisher: Gesellschaft für Informatik e.V.
dc.relation.ispartof: 11th International Conference on Innovative Internet Community Systems (I2CS 2011)
dc.relation.ispartofseries: Lecture Notes in Informatics (LNI) - Proceedings, Volume P-186
dc.title: Significance of low frequent terms in patent classification using IPC hierarchy [en]
dc.type: Text/Conference Paper
gi.citation.endPage: 250
gi.citation.publisherPlace: Bonn
gi.citation.startPage: 239
gi.conference.date: June 15-17, 2011
gi.conference.location: Berlin
gi.conference.sessiontitle: Regular Research Papers

Files

Original bundle
Name: 239.pdf
Size: 207.37 KB
Format: Adobe Portable Document Format