P266 - BTW2017 - Datenbanksysteme für Business, Technologie und Web - Workshopband

Recent publications

1 - 10 of 47
  • Conference paper
    A Deep Learning-based Approach for Banana Leaf Diseases Classification
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Amara, Jihen; Bouaziz, Bassem; Algergawy, Alsayed
    Plant diseases are an important factor, as they seriously reduce the quality and quantity of agricultural products. Therefore, early detection and diagnosis of these diseases are important. To this end, we propose a deep learning-based approach that automates the process of classifying banana leaf diseases. In particular, we make use of the LeNet architecture as a convolutional neural network to classify image data sets. The preliminary results demonstrate the effectiveness of the proposed approach even under challenging conditions such as illumination, complex background, different resolution, size, pose, and orientation of real scene images.
  • Conference paper
    Post-Debugging in Large Scale Big Data Analytic Systems
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Bergen, Eduard; Edlich, Stefan
    Data scientists often need to fine-tune and resubmit their jobs when processing large quantities of data in big clusters because currently executed jobs have failed. Consequently, data scientists also need to filter, combine, and correlate large data sets. Hence, debugging a job locally helps data scientists to figure out the root cause and increases efficiency while simplifying the working process. Discovering the root cause of failures in distributed systems involves different kinds of information, such as the operating system type, executed system applications, the execution state, and environment variables. In general, log files contain this type of information in a cryptic and large structure. Data scientists need to analyze all related log files to get more insight into the failure, which is cumbersome and slow. Another possibility is to use our reference architecture. We extract remote data and replay the extraction in the developer’s local debugging environment.
  • Conference paper
    Workshop Big (and small) Data in Science and Humanities (BigDS17)
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Groß, Anika; König-Ries, Birgitta; Reimann, Peter; Seeger, Bernhard
  • Conference paper
    Experiences with the Model-based Generation of Big Data Pipelines
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Eichelberger, Holger; Qin, Cui; Schmid, Klaus
    Developing Big Data applications implies a lot of schematic or complex structural tasks, which can easily lead to implementation errors and incorrect analysis results. In this paper, we present a model-based approach that supports the automatic generation of code to handle these repetitive tasks, enabling data engineers to focus on the functional aspects without being distracted by technical issues. In order to identify a solution, we analyzed different Big Data stream-processing frameworks, extracted a common graph-based model for Big Data streaming applications and developed a tool to graphically design and generate such applications in a model-based fashion (in this work for Apache Storm). Here, we discuss the concepts of the approach, the tooling and, in particular, experiences with the approach based on feedback of our partners.
  • Conference paper
    Mining Industrial Logs for System Level Insights
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Czora, Sebastian; Dix, Marcel; Fromm, Hansjörg; Klöpper, Benjamin; Schmitz, Björn
    Industrial systems are becoming more and more complex and expensive to operate. Companies are making considerable efforts to increase operational efficiency and eliminate unplanned downtime of their equipment. Condition monitoring has been applied to improve equipment availability and reliability. Most of the condition monitoring applications, however, focus on single components, not on entire systems. The objective of this research was to demonstrate that a combination of visual analytics and association rule mining can be successfully used in a condition monitoring context on system level.
  • Conference paper
    BTW 2017 Data Science Challenge (SDSC17)
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Waizenegger, Tim
  • Conference paper
    Scalable Data Management: An In-Depth Tutorial on NoSQL Data Stores
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Gessert, Felix; Wingerath, Wolfram; Ritter, Norbert
    The unprecedented scale at which data is consumed and generated today has shown a large demand for scalable data management and given rise to non-relational, distributed “NoSQL” database systems. Two central problems triggered this process: 1) vast amounts of user-generated content in modern applications and the resulting request loads and data volumes as well as 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer amount of these systems – now commonly referred to as NoSQL data stores – make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases to provide comparative classification and highlight open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling and querying characteristics. We present how each system’s design is governed by a central set of trade-offs over irreconcilable system properties. We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems could already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges. In addition to earlier tutorials, we explicitly address how the quickly emerging topic of processing and storing massive amounts of data in real-time can be solved by different types of real-time data management systems.
  • Conference paper
    Understanding Trending Topics in Twitter
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Kahlert, Roland; Liebeck, Matthias; Cornelius, Joseph
    Many events, for instance in sports, politics, and entertainment, happen all over the globe all the time. It is difficult and time-consuming to notice all these events, even with the help of different news sites. We use tweets from Twitter to automatically extract information in order to understand hashtags of real-world events. In our paper, we focus on the topic identification of a hashtag, analyze the expressed positive, neutral, and negative sentiments of users, and further investigate the expressed emotions. We crawled English tweets from 24 hashtags and report initial investigation results.
  • Conference paper
    Vergleich und Evaluation von RDF-on-Hadoop-Lösungen [Comparison and Evaluation of RDF-on-Hadoop Solutions]
    (Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Workshopband, 2017) Amann, Wolfgang
    With the growing volume of data published in the Resource Description Framework (RDF) format, data sets arise whose data operations can no longer be handled by a single machine. This paper presents systems that address this problem by using the Hadoop framework, either exclusively or in combination with other Big Data frameworks. Two of these approaches, PigSPARQL and Rya, which exemplify the more recent development of such RDF-on-Hadoop systems, are then analyzed for specific strengths and weaknesses using the benchmark queries of the Waterloo SPARQL Diversity Test Suite.