Listing by author "Leser, Ulf"
1 - 8 of 8
- Conference paper: An urban health risk analysis for Berlin: exploration and integration of spatio-temporal information on the urban environment (Integration of Environmental Information in Europe, 2010). Lakes, Tobia; Leser, Ulf; Senf, Cornelius.
  Urban areas provide the living space for the majority of the world population. Well-being and health within urban areas are influenced by environmental and socioeconomic variables. To analyze the potential health risk in cities, one has to integrate multi-dimensional and multi-scale information describing socioeconomic, environmental, and health-related attributes. The aim of this paper is to study the urban health risk for different parts of the city of Berlin using a data-driven analysis approach. We focus on the detection and exploration of correlations between environmental, socioeconomic, and health-related attributes in Berlin. We showcase the further study of selected correlations, including the biophysical and socioeconomic burden of certain diseases, and their visual exploration using advanced geovisualization.
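The kind of district-level correlation analysis described in this abstract can be pictured with a short sketch. The Python snippet below is purely illustrative: the district names are real Berlin districts, but all attribute values are invented, and the paper's actual methodology (multi-scale data integration and geovisualization) is far more involved.

```python
import numpy as np

# All attribute values below are invented for illustration; the paper works
# with real multi-dimensional data on Berlin's districts.
districts = ["Mitte", "Pankow", "Spandau", "Neukölln"]
noise_level = np.array([68.0, 55.0, 58.0, 66.0])   # traffic noise, dB(A)
unemployment = np.array([11.0, 6.0, 9.0, 14.0])    # percent
asthma_rate = np.array([7.2, 4.1, 5.0, 7.9])       # percent

attrs = {"noise": noise_level, "unemployment": unemployment,
         "asthma": asthma_rate}
# Pairwise Pearson correlation between environmental, socioeconomic,
# and health-related attributes
for a in attrs:
    for b in attrs:
        if a < b:  # visit each unordered pair once
            r = np.corrcoef(attrs[a], attrs[b])[0, 1]
            print(f"r({a}, {b}) = {r:.2f}")
```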
- Conference paper: Benchmarking Univariate Time Series Classifiers (Datenbanksysteme für Business, Technologie und Web (BTW 2017), 2017). Schäfer, Patrick; Leser, Ulf.
  A time series is a collection of values sequentially recorded over time. Nowadays, sensors for recording time series are omnipresent in RFID chips, wearables, smart homes, and event-based systems. Time series classification aims at predicting a class label for a time series whose label is unknown; to do so, a classifier has to train a model using labeled samples. Classification time is a key challenge given new applications like event-based monitoring, real-time decision making, or streaming systems. This paper is the first benchmark that compares 12 state-of-the-art time series classifiers based on prediction and classification times. We observed that most of the state-of-the-art classifiers require extensive training and classification times and might not be applicable for these new applications.
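To make the benchmark's measurement setup concrete, here is a minimal timing harness in the spirit of the paper: training time and classification time are measured separately. The 1-NN Euclidean classifier below is the classic baseline in time series classification and merely stands in for the 12 evaluated classifiers; the data is synthetic.

```python
import time
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Label each test series with the label of its nearest training series."""
    predictions = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance
        predictions.append(y_train[np.argmin(distances)])
    return np.array(predictions)

# Synthetic data: 100 training and 20 test series of length 150
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 150))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(20, 150))

t0 = time.perf_counter()
model = (X_train, y_train)  # 1-NN "training" merely stores the data
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
labels = one_nn_predict(*model, X_test)
classify_time = time.perf_counter() - t0

print(f"training: {train_time:.6f}s, classification: {classify_time:.6f}s")
```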
- Journal article: Data Management Challenges in Next Generation Sequencing (Datenbank-Spektrum: Vol. 12, No. 3, 2012). Wandelt, Sebastian; Rheinländer, Astrid; Bux, Marc; Thalheim, Lisa; Haldemann, Berit; Leser, Ulf.
  Since the early days of the Human Genome Project, data management has been recognized as a key challenge for modern molecular biology research. By the end of the nineties, technologies had been established that adequately supported most ongoing projects, typically built upon relational database management systems. However, recent years have seen a dramatic increase in the amount of data produced by typical projects in this domain. While it took more than ten years, approximately three billion USD, and more than 200 groups worldwide to assemble the first human genome, today's sequencing machines produce the same amount of raw data within a week, at a cost of approximately 2000 USD, on a single device. Several national and international projects now deal with (tens of) thousands of genomes, and trends like personalized medicine call for efforts to sequence entire populations. In this paper, we highlight challenges that emerge from this flood of data, such as parallelization of algorithms, compression of genomic sequences, and cloud-based execution of complex scientific workflows. We also point to a number of further challenges that lie ahead due to the increasing demand for translational medicine, i.e., the accelerated transition of biomedical research results into medical practice.
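One of the challenges named in this abstract, compression of genomic sequences, is commonly tackled by referential compression: a target genome is stored as copy operations against a reference plus a few literal bases, which pays off because two human genomes are highly similar. A minimal sketch follows; it is not the paper's implementation, and the greedy matcher is quadratic and for illustration only (real tools use suffix structures or hashing).

```python
def compress(reference: str, target: str, min_match: int = 4) -> list:
    """Encode target as (ref_pos, length) copies from reference plus literals."""
    ops, i = [], 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Greedy search for the longest prefix of target[i:] inside reference.
        for length in range(len(target) - i, min_match - 1, -1):
            pos = reference.find(target[i:i + length])
            if pos != -1:
                best_pos, best_len = pos, length
                break
        if best_len >= min_match:
            ops.append(("M", best_pos, best_len))  # copy from reference
            i += best_len
        else:
            ops.append(("L", target[i]))           # store a literal base
            i += 1
    return ops

def decompress(reference: str, ops: list) -> str:
    parts = []
    for op in ops:
        if op[0] == "M":
            _, pos, length = op
            parts.append(reference[pos:pos + length])
        else:
            parts.append(op[1])
    return "".join(parts)

reference = "ACGTACGTTTGACCA"
target = "ACGTTTGACGTACGA"
ops = compress(reference, target)
assert decompress(reference, ops) == target
print(ops)  # far fewer ops than bases once sequences are long and similar
```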
- Conference paper: Experiences from developing the domain-specific entity search engine GeneView (Datenbanksysteme für Business, Technologie und Web (BTW 2013), 2013). Thomas, Philippe; Starlinger, Johannes; Leser, Ulf.
  GeneView is a semantic search engine for the Life Sciences. Unlike traditional search engines, GeneView analyzes texts upon import to recognize and properly handle biomedical entities, relationships between those entities, and the structure of documents. This allows for a number of advanced features required to work effectively with scientific texts, such as entity disambiguation, ranking of documents by entity content, linking to structured knowledge about entities, and user-friendly highlighting of entities. As of now, GeneView indexes approximately 21.4M abstracts and 358K full texts with more than 200M entities of 11 different types and more than 100K relationships. In this paper, we describe the architecture underlying the system, with a focus on the complex pipeline of advanced NLP and information extraction tools necessary for achieving the above functionality. We also discuss open challenges in developing and maintaining a semantic search engine over a large (though not web-scale) corpus.
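The core design decision described here, recognizing entities once at import time so that queries operate on entities rather than raw text, can be sketched as follows. This is not GeneView's actual code: the tiny gene dictionary and the frequency-based ranking are stand-ins for its full NLP pipeline and scoring.

```python
from collections import defaultdict

# Hypothetical mini-dictionary mapping surface forms to entity identifiers;
# GeneView's real pipeline uses full-fledged named entity recognition.
GENE_DICT = {"brca1": "GENE:672", "tp53": "GENE:7157"}

entity_index = defaultdict(set)    # entity id -> ids of documents mentioning it
mention_counts = defaultdict(int)  # (doc id, entity id) -> mention frequency

def import_document(doc_id: str, text: str) -> None:
    """Recognize entities once at import; queries later hit only the index."""
    for token in text.lower().split():
        entity = GENE_DICT.get(token.strip(".,;()"))
        if entity:
            entity_index[entity].add(doc_id)
            mention_counts[(doc_id, entity)] += 1

def search(entity_id: str) -> list:
    """Rank matching documents by how often they mention the entity."""
    hits = entity_index.get(entity_id, set())
    return sorted(hits, key=lambda d: -mention_counts[(d, entity_id)])

import_document("PMID:1", "TP53 and BRCA1 interact; TP53 is often mutated.")
import_document("PMID:2", "BRCA1 variants were sequenced.")
print(search("GENE:7157"))  # only PMID:1 mentions TP53
```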
- Journal article: How to improve information extraction from German medical records (it - Information Technology: Vol. 59, No. 5, 2017). Starlinger, Johannes; Kittner, Madeleine; Blankenstein, Oliver; Leser, Ulf.
  Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has great potential to improve clinical routine care, to support clinical research, and to advance the personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated; an essential prerequisite for such integration is information extraction from clinical documents.
- Text document: Information Retrieval for Precision Oncology (BTW 2019, 2019). Seva, Jurica; Goetze, Julian; Lamping, Mario; Rieke, Damian Tobias; Schaefer, Reinhold; Leser, Ulf.
  Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations with disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of search results in the common search engines focuses on basic science and is clinically irrelevant. We developed the Variant-Information Search Tool (VIST), a search engine designed for the targeted search of clinically relevant publications given a mutation profile. VIST indexes all PubMed abstracts, applies advanced text mining to identify mentions of genes and variants, and uses machine-learning-based scoring to judge the relevance of documents. Its functionality is available through a fast and intuitive web interface. We also performed a comparative evaluation, showing that VIST's ranking is superior to that of PubMed or vector space models.
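The machine-learning-based relevance scoring mentioned in the abstract could, for instance, be realized as a linear model over document features, as in this hedged sketch. The feature names and weights are invented; the abstract does not specify VIST's actual model or feature set.

```python
import math

# Hypothetical features and trained weights, purely for illustration.
WEIGHTS = {"mentions_variant": 2.1, "mentions_gene": 0.8, "clinical_terms": 1.5}
BIAS = -2.0

def relevance(features: dict) -> float:
    """Logistic score in (0, 1): higher means more clinically relevant."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

docs = {
    "PMID:100": {"mentions_variant": 1, "mentions_gene": 1, "clinical_terms": 1},
    "PMID:200": {"mentions_variant": 0, "mentions_gene": 1, "clinical_terms": 0},
}
# Rank documents by descending relevance score
for doc_id in sorted(docs, key=lambda d: relevance(docs[d]), reverse=True):
    print(doc_id, round(relevance(docs[doc_id]), 3))
```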
- Conference paper: PiPa: custom integration of protein interactions and pathways (INFORMATIK 2011 – Informatik schafft Communities, 2011). Arzt, Sebastian; Starlinger, Johannes; Arnold, Oliver; Kröger, Stefan; Jaeger, Samira; Leser, Ulf.
  Information about proteins and their relationships to each other is a common source of input for many areas of Systems Biology, such as protein function prediction, relevance ranking of disease genes, and simulation of biological networks. While there are numerous databases that focus on collecting such data from, for instance, literature curation, expert knowledge, or experimental studies, their individual coverage is often low, making the building of an integrated protein-protein interaction database a pressing need. Accordingly, a number of such systems have emerged, but in most cases their content is only accessible over the web on a per-protein basis, which renders them useless for the automatic analysis of sets of proteins. Even if the databases are available for download, certain data sources are often missing (e.g., because redistribution is forbidden by license), and update intervals are sporadic. We present PiPa, a system for the integration of protein-protein interactions (PPI) and pathway data. PiPa is a stand-alone tool for loading and updating a large number of common PPI and pathway databases into a homogeneously structured relational database. PiPa features a graphical administration tool for monitoring its state, triggering updates, and computing statistics on its content. Due to its modular architecture, the addition of new data sources is easy. The software is freely available from the authors.
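PiPa's modular architecture, in which adding a data source is easy, suggests an adapter pattern: each source implements a common interface that yields interactions in a homogeneous record format, so new sources plug in without schema changes. A minimal sketch with illustrative class and field names follows; it is not PiPa's actual code, which loads into a relational database rather than a list.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Interaction:
    protein_a: str  # e.g., a UniProt accession
    protein_b: str
    source: str     # database that reported the interaction

class DataSource(Protocol):
    name: str
    def interactions(self) -> Iterator[Interaction]: ...

class TabFileSource:
    """Adapter for a tab-separated dump with one 'P12345<TAB>Q67890' per line."""
    def __init__(self, name: str, path: str):
        self.name, self.path = name, path
    def interactions(self) -> Iterator[Interaction]:
        with open(self.path) as fh:
            for line in fh:
                a, b = line.rstrip("\n").split("\t")[:2]
                yield Interaction(a, b, self.name)

class InMemorySource:
    """Tiny demo source so the sketch runs without any input files."""
    name = "demo"
    def interactions(self) -> Iterator[Interaction]:
        yield Interaction("P04637", "P38398", self.name)  # TP53 - BRCA1

def load_all(sources, store: list) -> None:
    """Pour every source into the same homogeneous store (a list here,
    a relational database in PiPa)."""
    for src in sources:
        store.extend(src.interactions())

store: list = []
load_all([InMemorySource()], store)
print(store)
```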
- Journal article: The Collaborative Research Center FONDA (Datenbank-Spektrum: Vol. 21, No. 3, 2021). Leser, Ulf; Hilbrich, Marcus; Draxl, Claudia; Eisert, Peter; Grunske, Lars; Hostert, Patrick; Kainmüller, Dagmar; Kao, Odej; Kehr, Birte; Kehrer, Timo; Koch, Christoph; Markl, Volker; Meyerhenke, Henning; Rabl, Tilmann; Reinefeld, Alexander; Reinert, Knut; Ritter, Kerstin; Scheuermann, Björn; Schintke, Florian; Schweikardt, Nicole; Weidlich, Matthias.
  Today's scientific data analysis very often requires complex Data Analysis Workflows (DAWs) executed over distributed computational infrastructures, e.g., clusters. Much research effort is devoted to the tuning and performance optimization of specific workflows for specific clusters. However, an arguably even more important problem for accelerating research is the reduction of development, adaptation, and maintenance times of DAWs. We describe the design and setup of the Collaborative Research Center (CRC) 1404 "FONDA – Foundations of Workflows for Large-Scale Scientific Data Analysis", in which roughly 50 researchers jointly investigate new technologies, algorithms, and models to increase the portability, adaptability, and dependability of DAWs executed over distributed infrastructures. We describe the motivation behind our project, explain its underlying core concepts, introduce FONDA's internal structure, and sketch our vision for the future of workflow-based scientific data analysis. We also describe some lessons learned during the "making of" a CRC in Computer Science with strong interdisciplinary components, with the aim of fostering similar endeavors.