Conference Paper
Pretreatment of Environmental Data for Forecasting Purposes
Full-text URI
Document type
Text/Conference Paper
Additional information
Date
2008
Authors
Journal title
Journal ISSN
Volume title
Publisher
Shaker Verlag
Abstract
To assess present actions on the environment, it is necessary to estimate their future impact. Niels Bohr observed that "prediction is very difficult, especially about the future". Fortunately, "the future is made of the same stuff as the present" (Simone Weil). This is what makes forecasting fundamentally possible. The present is described with data; to draw the right conclusions about the future, these data need to be significant, correct and complete.

This work is part of a project on the active control of groundwater levels. In this project, expected groundwater levels are predicted for varying infiltration masses using Artificial Neural Networks (ANN). In this way, an adequate infiltration quantity can be identified in order to reach the desired groundwater level. Before the environmental data are suitable for the actual forecasting purpose, they need to undergo a wide range of pretreatments. These efforts are described in this paper.

In a first step, substitution methods are presented to impute missing data. These methods can basically be divided into two branches. One category comprises correlation and kriging methods that use related measuring data sets, i.e. data sets of a nearby measuring station for, e.g., groundwater level, temperature or rainfall. The other category, which uses only the data set under consideration, consists of purely statistical methods, namely spline interpolation, time-series forecasts and multiple imputation.

In a second step, the completed data sets need to be freed from gross errors. For this purpose, different test criteria such as bound checking, comparisons of spacings and various statistical methods are implemented. Furthermore, the original dynamic and time-variant data sets are compared with computed data sets generated by time-series analysis models. Outliers are indicated where computed values diverge strongly from the original values.
In doubtful situations the current curve can additionally be compared with the curve of a correlated data set, if available.

In a third step, in terms of complexity reduction, the number of relevant data serving as input parameters for the ANN needs to be reduced without losing the information necessary to make predictions. This is important because, in the present case, the number of necessary input parameters is too high compared to the number of training sets available to train the ANN. Different statistical approaches are discussed, such as moving averages, time-weighted transformations and a method that combines sets of moving averages to reduce the number of ANN input parameters while preserving the information content.
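The spline-based substitution named in the first step can be illustrated with a minimal sketch: gaps in a single measured series are filled by a cubic spline fitted through the available observations. The function name, the use of SciPy's `CubicSpline`, and the sample data are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of missing-data imputation by spline interpolation,
# one of the purely statistical substitution methods mentioned above.
import numpy as np
from scipy.interpolate import CubicSpline

def impute_spline(t, y):
    """Fill NaN gaps in y (measured at times t) with a cubic spline
    fitted through the available observations."""
    y = np.asarray(y, dtype=float)
    known = ~np.isnan(y)                      # mask of valid measurements
    spline = CubicSpline(t[known], y[known])  # fit only on observed points
    filled = y.copy()
    filled[~known] = spline(t[~known])        # evaluate spline at the gaps
    return filled

# hypothetical groundwater levels [m] with three missing daily readings
t = np.arange(10.0)
y = np.array([5.0, 5.1, np.nan, 5.3, 5.4, np.nan, np.nan, 5.6, 5.5, 5.4])
print(impute_spline(t, y))
```

The correlation- and kriging-based branch would instead regress the gaps on a neighbouring station's series; the purely statistical sketch above needs no second data set.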
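The gross-error checks of the second step can likewise be sketched: plausibility bounds flag physically impossible values, and a computed reference series (here a simple centred moving average standing in for the time-series analysis models of the paper) flags values that diverge strongly from it. Window length, threshold factor and data are illustrative assumptions.

```python
# Illustrative sketch of step two: bound checking plus comparison of the
# original series with a computed reference series.
import numpy as np

def flag_outliers(y, lower, upper, window=5, k=3.0):
    """Return a boolean mask of suspected gross errors in y."""
    y = np.asarray(y, dtype=float)
    out_of_bounds = (y < lower) | (y > upper)       # plausibility bounds
    kernel = np.ones(window) / window
    ref = np.convolve(y, kernel, mode="same")       # computed reference
    # (note: zero-padding biases the reference near the series edges)
    resid = y - ref
    strong_divergence = np.abs(resid) > k * np.std(resid)
    return out_of_bounds | strong_divergence

# hypothetical groundwater levels [m] with one gross error at index 3
y = np.array([5.0, 5.1, 5.2, 9.9, 5.2, 5.1, 5.0, 5.1, 5.2, 5.1])
print(flag_outliers(y, lower=0.0, upper=8.0))
```

In practice the threshold `k` and the reference model would be tuned per measurand; doubtful flags would then be cross-checked against a correlated station's curve as described above.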
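The moving-average idea of the third step can be sketched as condensing a long input history into a few trailing means over windows of growing length, so the ANN receives a handful of inputs instead of hundreds of raw values. The window lengths and the synthetic rainfall series are illustrative assumptions, not the combination method of the paper.

```python
# Illustrative sketch of step three: complexity reduction of ANN inputs
# via sets of moving averages over a raw daily series.
import numpy as np

def condense_inputs(series, windows=(1, 7, 30, 90)):
    """Replace the raw history by one trailing mean per window length."""
    series = np.asarray(series, dtype=float)
    return np.array([series[-w:].mean() for w in windows])

# one year of synthetic daily rainfall as a stand-in input series
rain = np.random.default_rng(0).gamma(2.0, 2.0, size=365)
features = condense_inputs(rain)
print(features)  # 4 ANN inputs instead of 365 raw values
```

Longer windows summarise the slow groundwater response, short windows the recent forcing; a time-weighted transformation would additionally discount older values within each window.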