Classifying business types from Twitter posts using active learning
Gesellschaft für Informatik e.V.
Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One way to organize the resulting large volume of posts is to classify them into predefined business-type categories. Applying standard text categorization techniques to Twitter is ineffective because each post is short (limited to 140 characters) and most of the available data are unlabeled. In this paper, we propose a text categorization approach based on active learning for classifying Twitter posts into three business types: airline, food, and computer & technology. We first construct an initial text categorization model from a small set of labelled data. Using this model, we then obtain more positive instances for constructing a new model by selecting the test data that the model predicts as positive. The experimental results show that our proposed approach based on active learning helps increase classification accuracy over the standard text categorization approach.
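The loop described above (train on a small labelled set, add posts the model predicts as positive, retrain) can be illustrated with a minimal sketch. The abstract does not name a classifier, a selection threshold, or a number of rounds, so everything here is an assumption made for illustration: the small multinomial Naive Bayes model, the `threshold` and `rounds` parameters, the hypothetical helper `expand_and_retrain`, and the toy posts.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}

    def predict(self, doc):
        """Return (most likely label, its posterior probability)."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {l: math.exp(s - top) for l, s in log_scores.items()}
        best = max(exp, key=exp.get)
        return best, exp[best] / sum(exp.values())


def expand_and_retrain(docs, labels, unlabeled, threshold=0.6, rounds=3):
    """Hypothetical helper: grow the training set with confident predictions."""
    docs, labels, pool = list(docs), list(labels), list(unlabeled)
    model = NaiveBayes(docs, labels)
    for _ in range(rounds):
        remaining = []
        for post in pool:
            label, prob = model.predict(post)
            if prob >= threshold:  # treat as a positive instance of `label`
                docs.append(post)
                labels.append(label)
            else:
                remaining.append(post)
        if len(remaining) == len(pool):  # nothing new was selected
            break
        pool = remaining
        model = NaiveBayes(docs, labels)  # construct the new model
    return model, docs, labels


# Toy data, invented for this sketch.
seed_docs = ["cheap flight tickets airline",
             "tasty pizza restaurant food",
             "new laptop computer software"]
seed_labels = ["airline", "food", "computer & technology"]
unlabeled = ["airline flight delayed",
             "pizza food delivery",
             "laptop software update"]

model, docs, labels = expand_and_retrain(seed_docs, seed_labels, unlabeled)
```

In this sketch, words such as "delayed" are unknown to the initial model but enter the vocabulary once the confidently predicted posts are added, which is the mechanism by which the retrained model can cover more of the short, sparse posts.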