Browsing by keyword "text mining"
1 - 5 of 5
- Conference paper: Architecture of a recommender system to support collaboration in a software environment (WM 2003: Professionelles Wissensmanagement – Erfahrungen und Visionen, Beiträge der 2. Konferenz Professionelles Wissensmanagement, 2003). Lichtnow, Daniel; Loh, Stanley; Saldana Garin, Ramiro; Caringi, Augusto; Anjos, Pablo Lucas dos. Within organizations, people learn by exchanging knowledge. This kind of task (named collaboration) is important for organizational learning. Collaboration can be supported by Information Technology tools such as chats, newsgroups, forums and e-mailing lists. However, this kind of support only enables message exchange and does not help people in the learning process. This work presents the architecture of a recommender system to support collaboration among people in a software organization. The system analyzes textual messages sent during the session, identifies the context of the discussion and suggests documents, authorities (people with competence in a subject) and past discussions within the same context.
- Text document: A Hybrid Information Extraction Approach Exploiting Structured Data Within a Text Mining Process (BTW 2019, 2019). Kiefer, Cornelia; Reimann, Peter; Mitschang, Bernhard. Many data sets encompass structured data fields with embedded free text fields. The text fields allow customers and workers to input information which cannot be encoded in structured fields. Several approaches use structured and unstructured data in isolated analyses. The result of isolated mining of structured data fields misses crucial information encoded in free text. The result of isolated text mining often mainly repeats information already available from structured data. The actual information gain of isolated text mining is thus limited. The main drawback of both isolated approaches is that they may miss crucial information. The hybrid information extraction approach suggested in this paper addresses this issue. Instead of extracting information that in large parts was already available beforehand, it extracts new, valuable information from free texts. Our solution exploits results of analyzing structured data within the text mining process, i.e., structured information guides and improves the information extraction process on textual data. Our main contributions comprise the description of the concept of hybrid information extraction as well as a prototypical implementation and an evaluation with two real-world data sets from aftersales and production with English and German free text fields.
- Conference paper: A novel, comprehensive method to detect and predict protein-protein interactions applied to the study of vesicular trafficking (German Conference on Bioinformatics, 2006). Winter, Christof; Baust, Thorsten; Hoflack, Bernard; Schroeder, Michael. Motivation. Computational methods to predict protein-protein interactions are greatly needed. They can help to formulate hypotheses, guide experimental research and serve as additional measures to assess the quality of data obtained in high-throughput interaction experiments. Here, we describe a fully automated three-step procedure to predict and confirm protein-protein interactions. By maximising the information from text mining of the biomedical literature, data from interaction databases, and from available protein structures, we aim at generating a comprehensive picture of known and novel potential interactions between a given set of proteins. Results. A recent proteomics assay to identify the protein machinery involved in vesicular trafficking between the biosynthetic and the endosomal compartments revealed 35 proteins that were found as part of membrane coats on liposomes. When applying our method to this data set, we are able to reconstruct most of the interactions known to the molecular biologist. In addition, we predict novel interactions, among these potential linkers of the AP-1 and the Arp2/3 complex to membrane-bound proteins as well as a potential GTPase-GTPase effector interaction. Conclusions. Our method allows for a comprehensive network reconstruction that can assist the molecular biologist. Predicted interactions are backed up by structural or experimental evidence and can be inferred at varying levels of confidence. Our method pinpoints existing key interactions and can facilitate the generation of hypotheses.
- Text document: Quality Indicators for Text Data (BTW 2019 – Workshopband, 2019). Kiefer, Cornelia. Textual data sets vary in terms of quality. They have different characteristics such as the average sentence length or the number of spelling mistakes and abbreviations. These text characteristics influence the quality of text mining results. They may be measured automatically by means of quality indicators. We present indicators which we implemented based on natural language processing libraries such as Stanford CoreNLP and NLTK. We discuss design decisions in the implementation of exemplary indicators and provide all indicators on GitHub. In the evaluation, we investigate free texts from production, news, prose, tweets and chat data and show that the suggested indicators predict the quality of two text mining modules.
- Journal article: SentiStorm: Echtzeit-Stimmungserkennung von Tweets [SentiStorm: Real-Time Sentiment Detection of Tweets] (HMD Praxis der Wirtschaftsinformatik: Vol. 53, No. 4, 2016). Zangerle, Eva; Illecker, Martin; Specht, Günther. The automatic detection of the sentiment of texts has become increasingly important in recent years. In particular, the rapid increase in the speed at which information spreads in social media makes real-time sentiment detection a challenging task. On the microblogging platform Twitter, more than 8,000 messages are sent every second on average. In this work, we present SentiStorm, an approach for sentiment detection within tweets. In a first step, we create feature vectors for the tweets which contain linguistic information about the tweet content (weighting of words, word categories) as well as sentiment information gathered from sentiment lexica. In a second step, we use these feature vectors for a sentiment classification task which distinguishes positive, negative and neutral tweets. Our evaluations show that the proposed approach achieves high classification accuracy. We also show that, by utilizing the Apache Storm platform, the approach can easily be scaled towards real-time sentiment classification of tweets.
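
The two steps named in the SentiStorm abstract (build a feature vector per tweet, then classify it as positive, negative or neutral) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `TOY_LEXICON`, the function names and the `margin` threshold are all hypothetical, the word weighting and word-category features mentioned in the abstract are left out, and a simple rule stands in for a trained classifier.

```python
# Toy illustration of "feature vector, then classification" for tweets.
# Everything here (lexicon, names, threshold) is an assumption for
# illustration only and is not taken from the SentiStorm article.

# Hypothetical miniature sentiment lexicon: word -> polarity score.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "awful": -2.0, "hate": -2.0,
}

PUNCTUATION = ".,!?#@"


def tokenize(tweet):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in tweet.split())
    return [tok for tok in tokens if tok]


def feature_vector(tweet, lexicon=TOY_LEXICON):
    """Step 1: derive simple lexicon-based features for one tweet."""
    tokens = tokenize(tweet)
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    return {
        "n_tokens": len(tokens),
        "pos_score": sum(s for s in scores if s > 0),
        "neg_score": sum(-s for s in scores if s < 0),
    }


def classify(features, margin=0.5):
    """Step 2: rule-based stand-in for a trained sentiment classifier."""
    diff = features["pos_score"] - features["neg_score"]
    if diff > margin:
        return "positive"
    if diff < -margin:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ["I love this great phone!", "Awful service, I hate it.",
                 "The meeting is at noon."]:
        print(text, "->", classify(feature_vector(text)))
```

In a real system the feature dictionary would be extended with weighted word and part-of-speech features, and `classify` would be replaced by a model trained on labelled tweets.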