Significance of low frequent terms in patent classification using IPC hierarchy
dc.contributor.author | Khattak, Akmal Saeed | |
dc.contributor.author | Heyer, Gerhard | |
dc.contributor.editor | Eichler, Gerald | |
dc.contributor.editor | Küpper, Axel | |
dc.contributor.editor | Schau, Volkmar | |
dc.contributor.editor | Fouchal, Hacène | |
dc.contributor.editor | Unger, Herwig | |
dc.date.accessioned | 2019-01-11T09:29:06Z | |
dc.date.available | 2019-01-11T09:29:06Z | |
dc.date.issued | 2011 | |
dc.description.abstract | International Patent Classification (IPC) is a standard taxonomy or hierarchy maintained by WIPO (World Intellectual Property Organization). Using this standard hierarchy, patents are classified with machine learning techniques. The first set of experiments investigates how classification performance varies across the levels of the IPC hierarchy (section, class, subclass and main group). Experiments show that performance decreases going deeper into the hierarchy, and at the higher levels of detail the accuracy is very low. This may be because more general terms than specific terms are included. The deeper levels (higher levels of detail) of the hierarchy are more specific: the internal nodes of a hierarchy are more general than the leaf nodes, and the leaf nodes are more specific than the internal nodes. Classification at different levels of the hierarchy was investigated using low-frequency terms. Low-frequency terms can correspond to specific terms and cannot simply be ignored as noise. The second set of experiments examines which patent fields optimize classification accuracy at different levels of detail. The third set of experiments focuses on the significance of low-frequency terms across the IPC hierarchy. Experiments show that including low-frequency terms can significantly improve accuracy at the higher levels of detail. The low-frequency term set outperforms the full term set in accuracy and also substantially reduces the dimensionality of the text representation. | en |
dc.identifier.isbn | 978-3-88579-280-2 | |
dc.identifier.pissn | 1617-5468 | |
dc.identifier.uri | https://dl.gi.de/handle/20.500.12116/18993 | |
dc.language.iso | en | |
dc.publisher | Gesellschaft für Informatik e.V. | |
dc.relation.ispartof | 11th International Conference on Innovative Internet Community Systems (I2CS 2011) | |
dc.relation.ispartofseries | Lecture Notes in Informatics (LNI) - Proceedings, Volume P-186 | |
dc.title | Significance of low frequent terms in patent classification using IPC hierarchy | en |
dc.type | Text/Conference Paper | |
gi.citation.endPage | 250 | |
gi.citation.publisherPlace | Bonn | |
gi.citation.startPage | 239 | |
gi.conference.date | June 15-17, 2011 | |
gi.conference.location | Berlin | |
gi.conference.sessiontitle | Regular Research Papers | |
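The abstract describes classifying patents at four IPC levels and restricting documents to low-frequency terms, but the record does not say how level labels were derived or which frequency thresholds were used. As a minimal sketch under those assumptions, the Python below truncates a full IPC symbol (assumed to be written like `A01B 33/08`) to its section, class, subclass or main-group label, and builds a vocabulary of terms whose document frequency falls in a low band. The function names, the whitespace tokenizer and the `min_df`/`max_df` defaults are hypothetical and are not the authors' settings.

```python
from collections import Counter

# IPC levels named in the abstract. Symbols are assumed to be written as
# "<subclass> <main group>/<subgroup>", e.g. "A01B 33/08".
LEVELS = ("section", "class", "subclass", "main_group")


def ipc_label(symbol: str, level: str) -> str:
    """Truncate a full IPC symbol to the label used at the given level."""
    subclass, _, group = symbol.strip().partition(" ")
    if level == "section":
        return subclass[:1]          # "A"
    if level == "class":
        return subclass[:3]          # "A01"
    if level == "subclass":
        return subclass[:4]          # "A01B"
    if level == "main_group":
        main = group.split("/")[0]   # "33"
        return f"{subclass[:4]} {main}/00"  # main groups end in /00
    raise ValueError(f"unknown IPC level: {level!r}")


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def low_frequency_vocabulary(docs, min_df=1, max_df=2):
    """Terms whose document frequency lies in [min_df, max_df] (hypothetical band)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term for term, n in df.items() if min_df <= n <= max_df}


def restrict(doc: str, vocab: set[str]) -> list[str]:
    """Keep only the tokens of a document that belong to the chosen vocabulary."""
    return [t for t in tokenize(doc) if t in vocab]


if __name__ == "__main__":
    print([ipc_label("A01B 33/08", lvl) for lvl in LEVELS])
    # ['A', 'A01', 'A01B', 'A01B 33/00']
    docs = ["gear shaft motor", "gear shaft valve", "gear motor pump", "gear rotor"]
    vocab = low_frequency_vocabulary(docs)
    print(sorted(vocab))  # ['motor', 'pump', 'rotor', 'shaft', 'valve']
    print(restrict("Gear shaft motor", vocab))  # ['shaft', 'motor']
```

Training one classifier per level then amounts to relabeling the same documents with `ipc_label(symbol, level)`. A document-frequency band is only one possible reading of "low frequent terms"; a criterion based on raw corpus counts could be swapped in without changing the rest of the sketch.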