Do We Need Real Data? - Testing and Training Algorithms with Artificial Geolocation Data
Abstract
As big data becomes increasingly important, so do algorithms that operate on geolocation data. Privacy requirements and the cost of collecting large sets of geolocation data, however, make it difficult to test those algorithms with real data. Artificially generated data sets therefore present an appealing alternative. This paper explores the use of two types of neural networks as generators of geolocation data and introduces a method based on the Turing Test to determine whether generated geolocation data is indistinguishable from real data. In an extensive evaluation, we apply the method to data generated by our own implementation of neural networks and by the widely used BerlinMOD generator on the one hand, and to the four most prominent data sets of real geolocation data, covering a total of 65 million records, on the other hand. The experiments show that in eleven of twelve cases, artificial data sets can be distinguished from real ones. We conclude that, at present, the generators we tested provide no safe replacement for real data.
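The abstract describes a Turing-Test-style check of whether generated data can be told apart from real data, but this page does not give the paper's exact procedure. As a hedged illustration only, the sketch below uses a common stand-in, a classifier two-sample test: a leave-one-out 1-nearest-neighbour classifier tries to label each record as real or artificial, and a one-sided binomial test checks whether its accuracy exceeds the 50% expected by chance. The feature vectors (plain 2-D points here), the function names `loo_1nn_accuracy`, `binomial_tail` and `distinguishable`, and the significance level `alpha` are all hypothetical, and the two sets are assumed to be of equal size.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical record representation: each record is reduced to a numeric
# feature vector, e.g. (latitude, longitude) or per-trajectory statistics.
Features = Sequence[float]


def loo_1nn_accuracy(real: List[Features], fake: List[Features]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour real/artificial classifier."""
    samples = [(x, 1) for x in real] + [(x, 0) for x in fake]  # 1 = real, 0 = artificial
    correct = 0
    for i, (x, label) in enumerate(samples):
        best_dist, best_label = math.inf, None
        for j, (y, other_label) in enumerate(samples):
            if i == j:  # leave the query record itself out
                continue
            d = math.dist(x, y)  # Euclidean distance between feature vectors
            if d < best_dist:
                best_dist, best_label = d, other_label
        if best_label == label:
            correct += 1
    return correct / len(samples)


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def distinguishable(real: List[Features], fake: List[Features],
                    alpha: float = 0.05) -> Tuple[bool, float]:
    """Return (verdict, accuracy); verdict is True if the classifier beats chance.

    Assumes equally sized sets, so chance accuracy is 0.5. Leave-one-out
    predictions are not fully independent, so the p-value is approximate.
    """
    n = len(real) + len(fake)
    acc = loo_1nn_accuracy(real, fake)
    k = round(acc * n)  # number of correctly classified records
    return binomial_tail(k, n) < alpha, acc


# Usage: two clearly separated synthetic point clouds.
random.seed(0)
real = [(random.random(), random.random()) for _ in range(30)]
fake = [(10 + random.random(), 10 + random.random()) for _ in range(30)]
print(distinguishable(real, fake))  # (True, 1.0)
```

An accuracy near 0.5 suggests the classifier cannot separate the two sets. A 1-nearest-neighbour classifier is used here only because it needs no training library; a stronger learned discriminator would make for a stricter test.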
Citation
Kaiser, J., Bavendiek, K. & Schupp, S. (2019). Do We Need Real Data? - Testing and Training Algorithms with Artificial Geolocation Data. In: David, K., Geihs, K., Lange, M. & Stumme, G. (Eds.), INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft. Bonn: Gesellschaft für Informatik e.V. (pp. 205-218). DOI: 10.18420/inf2019_25
BibTeX
@inproceedings{mci/Kaiser2019,
  author = {Kaiser, Jan AND Bavendiek, Kai AND Schupp, Sibylle},
  title = {Do We Need Real Data? - Testing and Training Algorithms with Artificial Geolocation Data},
  booktitle = {INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik – Informatik für Gesellschaft},
  year = {2019},
  editor = {David, Klaus AND Geihs, Kurt AND Lange, Martin AND Stumme, Gerd},
  pages = {205--218},
  doi = {10.18420/inf2019_25},
  publisher = {Gesellschaft für Informatik e.V.},
  address = {Bonn}
}
Files | Size | Format
---|---|---
paper3_02.pdf | 1.296 MB | PDF
If no full text (PDF) is linked here, it may, for various reasons (e.g., licensing or copyright), only be available in another digital library. In that case, try accessing it via the linked DOI: 10.18420/inf2019_25
More Info
DOI: 10.18420/inf2019_25
ISBN: 978-3-88579-688-6
ISSN: 1617-5468
Date: 2019
Language: English (en)
Content Type: Text/Conference Paper