Text document
Fast and accurate creation of annotated head pose image test beds as prerequisite for training neural networks
Additional information
Date
2017
Publisher
Gesellschaft für Informatik, Bonn
Abstract
In this paper we present an experimental setup consisting of 36 cameras on 4 height levels, covering more than a half-space around a centrally seated person. The synchronous image release allows building a 3D model of the human torso in this position. Using this so-called body scanner, we recorded 36 different head positions, yielding 1296 images in total within several minutes and obtaining tens to hundreds of different pitch-roll-yaw head pose combinations with a very high precision of less than ±5°. By annotating 7 facial keypoints (ears, eyes, nose, corners of the mouth) in the 36 computed 3D models of a human head/upper body, we automatically obtain 1296 × 7 2D facial landmark points, saving a factor of 36 in annotation time. Projecting the 3D model into each camera provides a foreground/background separation mask of the person in every image, usable for data set augmentation, e.g. by inserting different backgrounds (required for training convolutional neural networks, CNNs). Moreover, we use our 3D model in combination with textures to create realistic images of the pitch-roll-yaw range not covered in the experiments. This interpolation is directly applicable to a subset of 10 central camera views (out of 36 in total), where fine-grained interpolation of head poses is possible. Using interpolation and background masks for background exchange enables us to easily augment the data set by a factor of 1000 or more, with pitch, roll, yaw and the 7 annotated facial keypoints known precisely in each image.
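The core annotation saving comes from projecting the 7 keypoints, annotated once in each 3D model, into all 36 calibrated camera views. A minimal sketch of that 3D-to-2D step using the standard pinhole model is shown below; the intrinsic matrix, camera pose, and keypoint coordinates here are hypothetical stand-ins, not values from the paper's calibration.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points into one camera via the pinhole model x ~ K (R X + t)."""
    cam = points_3d @ R.T + t        # world coordinates -> camera coordinates
    px = cam @ K.T                   # apply camera intrinsics
    return px[:, :2] / px[:, 2:3]    # perspective divide -> Nx2 pixel coordinates

# Hypothetical example: 7 keypoints near 2 m in front of a single camera;
# in the described setup this loop would run over all 36 calibrated views.
rng = np.random.default_rng(0)
keypoints_3d = rng.normal(scale=0.1, size=(7, 3)) + np.array([0.0, 0.0, 2.0])
K = np.array([[800.0, 0.0, 320.0],   # assumed focal length and principal point
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
cameras = [(np.eye(3), np.zeros(3))]  # (R, t) per camera; identity pose as a stand-in
landmarks_2d = np.stack([project_points(keypoints_3d, K, R, t) for R, t in cameras])
print(landmarks_2d.shape)  # one 7x2 landmark set per camera view
```

With real calibration data, 36 annotated 3D models thus expand to 36 × 36 = 1296 landmark sets, which matches the factor-of-36 saving stated above.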