Conference paper

Significance of low frequent terms in patent classification using IPC hierarchy

Document type

Text/Conference Paper

Date

2011

Publisher

Gesellschaft für Informatik e.V.

Abstract

The International Patent Classification (IPC) is a standard taxonomy, or hierarchy, maintained by WIPO (World Intellectual Property Organization). Using this hierarchy, patents are classified with machine learning techniques. The first set of experiments investigates classification performance at different levels of the IPC hierarchy (section, class, subclass, and main group). The experiments show that performance decreases going down the hierarchy, and at the higher levels of detail the accuracy is very low. This might be due to the inclusion of more general terms than specific terms. The deeper levels of the hierarchy (higher levels of detail) are more specific: the internal nodes are more general than the leaf nodes. Classification at different levels of the hierarchy using low-frequency terms was investigated. Low-frequency terms can be specific terms and cannot be ignored as noise. The second set of experiments focuses on which patent fields optimize classification accuracy at different levels of detail. The third set of experiments focuses on the significance of low-frequency terms across the IPC hierarchy. The experiments show that including low-frequency terms significantly improves accuracy at the higher levels of detail. The low-frequency term set outperforms the full term set in accuracy and also substantially reduces the dimensionality of the text representation.
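To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows how an IPC symbol might be truncated to a label at each of the four hierarchy levels, and how a low-frequency vocabulary might be selected. The abstract does not say how "low frequent" is defined, so the document-frequency cutoff `max_df` is an assumption, as are the function names `ipc_label`, `low_frequency_terms`, and `restrict` and the toy documents.

```python
from collections import Counter

# The four IPC levels named in the abstract.
LEVELS = ("section", "class", "subclass", "main_group")


def ipc_label(symbol: str, level: str) -> str:
    """Truncate an IPC symbol such as 'A61K 31/12' to the given level."""
    head, _, group = symbol.strip().partition(" ")
    if level == "section":
        return head[:1]            # e.g. 'A'
    if level == "class":
        return head[:3]            # e.g. 'A61'
    if level == "subclass":
        return head[:4]            # e.g. 'A61K'
    if level == "main_group":
        main = group.strip().split("/")[0]
        return f"{head[:4]} {main}/00"   # main groups end in '/00'
    raise ValueError(f"unknown level: {level}")


def low_frequency_terms(docs, max_df):
    """Return terms occurring in at most max_df documents (assumed cutoff)."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))     # count each term once per document
    return {term for term, n in df.items() if n <= max_df}


def restrict(tokens, vocabulary):
    """Keep only the tokens that belong to the chosen vocabulary."""
    return [t for t in tokens if t in vocabulary]


# Hypothetical tokenized patent texts, for illustration only.
docs = [
    ["method", "compound", "kinase", "inhibitor"],
    ["method", "device", "antenna"],
    ["method", "compound", "polymer"],
]
rare = low_frequency_terms(docs, max_df=1)

print([ipc_label("A61K 31/12", lvl) for lvl in LEVELS])
print(restrict(docs[0], rare))
```

In this toy data, general terms such as "method" appear in every document and are dropped, while terms such as "kinase" survive. That mirrors the abstract's intuition that rare terms can carry the specificity needed at deeper levels, and the reduced vocabulary also shrinks the feature space passed to whatever classifier is trained per level.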

Description

Khattak, Akmal Saeed; Heyer, Gerhard (2011): Significance of low frequent terms in patent classification using IPC hierarchy. 11th International Conference on Innovative Internet Community Systems (I2CS 2011). Bonn: Gesellschaft für Informatik e.V. PISSN: 1617-5468. ISBN: 978-3-88579-280-2. pp. 239-250. Regular Research Papers. Berlin. June 15-17, 2011
