Auflistung nach Schlagwort "imbalanced data"
1 - 3 von 3
Treffer pro Seite
Sortieroptionen
- WorkshopbeitragHow can Small Data Sets be Clustered?(Mensch und Computer 2021 - Workshopband, 2021) Weigand, Anna Christina; Lange, Daniel; Rauschenberger, MariaIn many areas, only small data sets are available and big data does not play a significant role, e.g., in Human-Centered Design research. In the context of machine learning analysis, results of small data sets can be biased due to single variables or missing values. Nevertheless, reliable and interpretable results are essential for determining further actions, such as, e.g., treatments in a health-related use case. In this paper, we explore machine learning clustering algorithms on the basis of a small, health-related (variance) data set about early dyslexia screening. Therefore, we selected three different clustering algorithms from different clustering methods: K-Means, HAC and DBSCAN. In our case, K-Means and HAC showed promising results, while DBSCAN did not deliver distinct results. Based on our experiences, we provide first proposals on how to handle small data set clustering and describe situations in which using Human- Centered Design methods can increase interpretability of machine learning clustering results. Our work represents a starting point for discussing the topic of clustering small data sets.
- ZeitschriftenartikelHow to Handle Health-Related Small Imbalanced Data in Machine Learning?(i-com: Vol. 19, No. 3, 2021) Rauschenberger, Maria; Baeza-Yates, RicardoWhen discussing interpretable machine learning results, researchers need to compare them and check for reliability, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in wrong prediction of cancer, incorrect assessment of the COVID-19 pandemic situation, or missing early screening of dyslexia. Often only small data exists for these complex interdisciplinary research projects. Hence, it is essential that this type of research understands different methodologies and mindsets such as the Design Science Methodology, Human-Centered Design or Data Science approaches to ensure interpretable and reliable results. Therefore, we present various recommendations and design considerations for experiments that help to avoid over-fitting and biased interpretation of results when having small imbalanced data related to health. We also present two very different use cases: early screening of dyslexia and event prediction in multiple sclerosis.
- WorkshopbeitragRecommendations to Handle Health-related Small Imbalanced Data in Machine Learning(Mensch und Computer 2020 - Workshopband, 2020) Rauschenberger, Maria; Baeza-Yates, RicardoWhen discussing interpretable machine learning results, researchers need to compare results and reflect on reliable results, especially for health-related data. The reason is the negative impact of wrong results on a person, such as in missing early screening of dyslexia or wrong prediction of cancer. We present nine criteria that help avoiding over-fitting and biased interpretation of results when having small imbalanced data related to health. We present a use case of early screening of dyslexia with an imbalanced data set using machine learning classification to explain design decisions and discuss issues for further research.