Browsing by author "Gelbukh, Alexander"
1 - 2 of 2
- Conference paper: On detection of malapropisms by multistage collocation testing (Natural language processing and information systems, 2003). Bolshakov, Igor A.; Gelbukh, Alexander. A malapropism is a (real-word) error in a text consisting of the unintended replacement of one content word by another existing content word that is similar in sound but semantically incompatible with the context, thus destroying text cohesion, e.g., they travel around the word. We present an algorithm for malapropism detection and correction based on evaluating cohesion. As a measure of the semantic compatibility of words, we consider their ability to form syntactically linked and semantically admissible word combinations (collocations), e.g., travel (around the) world. Text cohesion at a content word is then measured as the number of collocations it forms with the words in its immediate context. We detect malapropisms as words that form no collocations in the context. To test whether two words can form a collocation, we consider two types of resources: a collocation database and an Internet search engine, e.g., Google. We illustrate the proposed method by classifying, tracing, and evaluating several English malapropisms.
- Conference paper: Selection of representative documents for clusters in a document collection (Natural language processing and information systems, 2003). Gelbukh, Alexander; Alexandrov, Mikhail; Bourek, Ales; Makagonov, Pavel. An efficient way to explore a large document collection (e.g., the search results returned by a search engine) is to subdivide it into clusters of relatively similar documents, in order to get a general view of the collection and select the parts of particular interest. One way of presenting the clusters to the user is to select a document in each cluster. This can be done in different ways for different purposes. We consider three cases: selection of the average, the "most typical," and the "least typical" document. We give algorithms that rely on a dictionary of keywords reflecting the topic of the user's interest. After clustering, we select a document in each cluster based on its closeness to the other ones. Different distance measures are discussed, and preliminary experimental results are presented. Our approach was implemented in the new version of the Document Classifier system.
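
The collocation test described in the malapropism abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy `COLLOCATIONS` set stands in for a collocation database or search engine, `VOCABULARY` is a made-up word list, Levenshtein distance is a crude substitute for similarity in sound, syntactic links are ignored, and the example sentence "tourists travel around the word" is hypothetical.

```python
# Hypothetical toy collocation database: unordered pairs of content words
# assumed to form admissible collocations.
COLLOCATIONS = {
    frozenset({"tourists", "travel"}),
    frozenset({"travel", "world"}),
    frozenset({"write", "word"}),
}

# Hypothetical vocabulary from which correction candidates are drawn.
VOCABULARY = {"tourists", "travel", "world", "word", "write"}


def cohesion(word, context, db=COLLOCATIONS):
    """Count the context words that form a collocation with `word`."""
    return sum(1 for other in context
               if other != word and frozenset({word, other}) in db)


def edit_distance(a, b):
    """Levenshtein distance; an assumed stand-in for sound similarity."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def detect_and_correct(content_words):
    """Map each word with zero cohesion to its best replacement (or None)."""
    result = {}
    for i, word in enumerate(content_words):
        context = content_words[:i] + content_words[i + 1:]
        if cohesion(word, context) > 0:
            continue  # the word forms at least one collocation: not a suspect
        candidates = [c for c in VOCABULARY
                      if c != word and edit_distance(c, word) <= 1]
        scored = [(cohesion(c, context), c) for c in candidates]
        best = max(scored, default=(0, None))
        result[word] = best[1] if best[0] > 0 else None
    return result


# Content words of the hypothetical sentence "tourists travel around the word".
print(detect_and_correct(["tourists", "travel", "word"]))
```

A real system would replace the set lookup with a query to a collocation database or a search engine hit count, as the abstract suggests.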
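
The three selection cases in the abstract on representative documents can be sketched as follows. This is one possible reading, not the paper's algorithm: documents are assumed to be keyword-count vectors over a user dictionary, the "average" document is taken to be the one closest to the cluster centroid, the "most typical" and "least typical" documents are those with the smallest and largest total distance to the other documents, and Euclidean distance is an arbitrary choice among the measures the paper discusses. All function names are hypothetical.

```python
import math


def keyword_vector(text, keywords):
    """Count occurrences of each dictionary keyword in a text (naive tokenizer)."""
    tokens = text.lower().split()
    return [tokens.count(k) for k in keywords]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_representatives(vectors, distance=euclidean):
    """Return indices of the average, most typical, and least typical documents."""
    n = len(vectors)
    dims = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # Total distance from each document to all documents in the cluster.
    total = [sum(distance(v, w) for w in vectors) for v in vectors]
    return {
        "average": min(range(n), key=lambda i: distance(vectors[i], centroid)),
        "most_typical": min(range(n), key=lambda i: total[i]),
        "least_typical": max(range(n), key=lambda i: total[i]),
    }


# A made-up cluster of five documents over a two-keyword dictionary.
cluster = [[0, 1], [2, 1], [3, 1], [4, 1], [20, 1]]
print(select_representatives(cluster))
```

In this toy cluster the outlier pulls the centroid away from the bulk of the documents, so the "average" and "most typical" choices differ, which is one reason to treat the cases separately.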