Browsing P200 - ARCS 2012 Workshops by Issue Date
1 - 10 of 43
- Conference paper: Improving cache locality for ray casting with CUDA (ARCS 2012 Workshops, 2012)
  Sugimoto, Yuki; Ino, Fumihiko; Hagihara, Kenichi
  In this paper, we present an acceleration method for texture-based ray casting on graphics processing units (GPUs) compatible with the compute unified device architecture (CUDA). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of thread blocks (TBs) such that each warp, a group of 32 threads processed simultaneously on the GPU, achieves high data locality for a given viewpoint. The objective of this selection is to let every warp, rather than every thread, access data with a small stride, because the GPU executes multiple threads at the same time. In experiments using a GeForce GTX 480 card (based on the Fermi architecture, the latest at the time), we find that the speedup of our method ranges from 1.0× to 4.0×, depending on the viewpoint. We believe that optimizing the shape of TBs is important for achieving more cache hits on highly threaded CUDA hardware.
- Conference paper: Flexible scheduling and thread allocation for synchronous parallel tasks (ARCS 2012 Workshops, 2012)
  Kessler, Christoph W.; Hansson, Erik
  We describe a task model and a dynamic scheduling and resource allocation mechanism for synchronous parallel tasks to be executed on SPMD-programmed synchronous shared-memory MIMD parallel architectures with uniform, unit-time memory access and strict memory consistency, known in the literature as PRAMs (Parallel Random Access Machines). Our task model provides a two-tier programming model for PRAMs that flexibly combines SPMD and fork-join parallelism within the same application. It offers flexibility through dynamic scheduling and late resource binding while preserving the PRAM execution properties within each task, the only restriction being that a task cannot be assigned more threads than the underlying architecture provides. In particular, our approach opens the way for automatic performance tuning at run time by controlling the thread allocation for tasks based on run-time predictions. Using a prototype implementation of a synchronous parallel task API in the SPMD-based PRAM language Fork and an experimental evaluation with example programs on the SBPRAM simulator, we show that realizing the task model on an SPMD-programmable PRAM machine is feasible with moderate run-time overhead per task.
- Conference paper: Correction of faulty signal transmission for resilient designs of signed-digit arithmetic (ARCS 2012 Workshops, 2012)
  Neuhäuser, David; Zehendner, Eberhard
  When arithmetic components are parallelized, fault-prone interconnections can corrupt results significantly. Continued shrinking of feature sizes leads to a steady increase in errors caused by faulty transmission. We propose employing resilient data encoding schemes to offset these negative effects. Focusing on parallel signed-digit arithmetic, which is frequently used in high-speed systems, we find that a suitable data encoding can reduce error rates by about 25% with a 2-bit encoding and by about 62% with a 3-bit encoding. Data encoding should be driven by symbol occurrence probabilities. We develop a methodology to obtain these probabilities, show example fault-tolerant encodings, and discuss their impact on communicating parallel arithmetic circuits in example error scenarios.
- Conference paper: Hot topic detection in local areas using Twitter and Wikipedia (ARCS 2012 Workshops, 2012)
  Ishikawa, Shota; Arakawa, Yutaka; Tagashira, Shigeaki; Fukuda, Akira
  As microblog services become increasingly popular, the volume of spatio-temporal text data has grown explosively. Many studies have proposed methods for analyzing, in space and time, the events described by such text data. These studies have aimed at extracting the period and the location in which a specified topic frequently occurs. In this paper, we focus on a system that detects hot topics within a local area during a particular period. The words used in posts can vary even when the posts are essentially about the same hot topic. We propose a classification method that mitigates this variation in the words of posts related to the same topic.
- Conference paper: Exploiting bit-level parallelism in GPGPUs: a case study on KeeLoq exhaustive key search attack (ARCS 2012 Workshops, 2012)
  Agosta, Giovanni; Barenghi, Alessandro; Pelosi, Gerardo
  Graphics Processing Units (GPUs) are increasingly popular in high-performance computing for their ability to provide computational power for massively parallel problems at reduced cost. However, the programming model exposed by GPGPU software development tools is often insufficient to achieve full performance, and a major rethinking of algorithmic choices is needed. In this paper, we illustrate this effect with a case study drawn from the cryptography application domain. The pervasive use of cryptographic primitives in modern embedded systems is a growing trend. Small, efficient cryptosystems have been effectively employed to design and implement keyless password-based access control systems in various wireless authentication applications. The security margin provided by these lightweight ciphers should be carefully examined in light of the speed and area constraints imposed by the target environment. We present a redesign of the ASIC-oriented KEELOQ implementation that performs efficient exhaustive key search attacks while closely fitting the parallel programming model exposed by modern GPUs. In particular, the bitslicing technique allows the intrinsic parallelism offered by word-oriented SIMD computations to be exploited effectively. By adapting the algorithm implementation to a platform radically different from the one it was designed for, we achieve a 40× speedup in computation time over a single-core CPU brute-force attack, using only consumer-grade hardware. This speedup points to a significant weakening of the cipher's security margin, since it shows that anyone with off-the-shelf hardware can circumvent the security measures in place.
- Conference paper: Towards integration of user interaction and context event processing in intelligent living environments (ARCS 2012 Workshops, 2012)
  Lehmann, Simon; Schäfer, Jan; Dörner, Ralf; Schwanecke, Ulrich
  Event processing plays a significant role in the current development of intelligent living environments. It ranges from processing information produced by a multitude of sensors, to gain insight into the inhabitants' activities on a more global scale, to processing immediate and rather short-lived user-input events on and around interactive systems, such as tabletops or tablets, embedded in common household furniture. Based on work conducted separately in these two fields, we found that the still-evolving field of complex event processing (CEP) provides the methods and tools to handle both of these distinct use cases equally well. The application to interactive systems in particular, while novel and uncommon, is well suited and further demonstrates the broad applicability of CEP. A comparison of the two application fields shows that, even though the events occurring in them differ in their intention, commonalities do exist and provide integration points. Furthermore, integrating these applications in the context of smart homes makes it possible to provide demand-oriented resource management that realizes self-adaptation and control.
- Conference paper: Towards a top-view detection of body parts in an interactive tabletop environment (ARCS 2012 Workshops, 2012)
  Haubner, Nadia; Schwanecke, Ulrich; Dörner, Ralf; Lehmann, Simon; Luderschmidt, Johannes
  Integrating digital tabletop systems into private living environments is a promising approach to enhancing people's everyday lives with information technology. Beyond interaction on the surface of such a tabletop, research on detecting interaction above and around the surface is growing rapidly. So far, detection is limited either to very specific gestures above the surface or to rather abstract detection of users in a larger scenario. The detection of body parts in tabletop setups has rarely been investigated, although knowing where body parts are would help establish relationships between users and interactions. In this paper, we propose a system capable of detecting body parts above and around such a tabletop setup using a depth camera. We further adopt an existing approach to show how detection could work in this setup. Additionally, we propose a new approach for obtaining training data for the detection by means of a color suit.
- Conference paper: Hierarchical self-repair in heterogeneous multi-core systems by means of a software-based reconfiguration (ARCS 2012 Workshops, 2012)
  Müller, Sebastian; Schölzel, Mario; Vierhaus, Heinrich Theodor
  This paper deals with the problem of software-based self-repair in a statically scheduled multi-core system in the presence of permanent faults. The basic idea is to adapt the application so that faulty components are not used. This goal is achieved by recompiling the program code in an off-line repair process in the field. The repair process is organized hierarchically. First, a local repair that considers only one core is applied. If the local repair fails, a retry at a higher system level is performed. For that purpose, the local repair techniques are reused in a global context. Repair at the global system level results in specific system constraints and properties, which are investigated in this work. Because purely software-based methods are used, it becomes possible to repair defects in different components (even multiple errors) as well as defects in spare components. The presented approach is not bound to a specific architecture and can therefore be adapted to systems such as MPSoCs or NoCs, provided these systems offer some basic functionality.
- Conference paper: Sub-word handling in data-parallel mapping (ARCS 2012 Workshops, 2012)
  Psychou, Georgia; Fasthuber, Robert; Hulzink, Jos; Huisken, Jos; Catthoor, Francky
  Data-parallel processing is a widely applicable technique that can be implemented on different processor styles with varying capabilities. Here we address single- or multi-core data-parallel instruction-set processors. Handling and reorganisation of the parallel data are often needed because of the diverse requirements that arise while the application code executes. It is crucial to take signal word lengths into account because they strongly influence the outcome. This paper focuses on the broader solution space of selecting sub-word lengths (at design time), including in particular hybrid ones, so that mapping onto these data-parallel single-/multi-core processors becomes more energy-efficient. Our goal is to introduce systematic exploration techniques that remove part of the designers' effort. The methodology is evaluated on a representative application driver for a number of data-path variants, and the most promising trade-off points are indicated. The throughput-energy ratios of the different mapping implementations span a factor of 2.2.
- Conference paper: Adaptive content distribution network for live and on-demand streaming (ARCS 2012 Workshops, 2012)
  Miyauchi, Yuta; Matsumoto, Noriko; Yoshida, Norihiko; Kamiya, Yuko; Shimokawa, Toshihiko
  We previously proposed an adaptive content distribution network (CDN), FCAN (Flash Crowds Alleviation Network), which dynamically changes its structure in response to a flash crowd, that is, a rapid increase in server load caused by a sudden concentration of accesses. In our preceding studies, FCAN handled only static content delivery. In this paper, we extend FCAN to alleviate flash crowds in video streaming. Experiments confirm that FCAN for video streaming is effective in alleviating flash crowds.
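The KeeLoq entry in the list above credits the bitslicing technique for its GPU speedup. As a rough illustration only, the Python sketch below shows the core idea on a single 5-input Boolean function: bit j of each word holds one independent instance, so a fixed sequence of bitwise AND/OR/NOT operations evaluates the function for all 64 instances at once. The truth-table constant 0x3A5C742E is the value commonly given for KeeLoq's nonlinear function; treat it, the 64-lane width, the minterm (sum-of-products) evaluation, and the helper names `pack`, `nlf_scalar`, and `nlf_bitsliced` as assumptions of this sketch. It is not the authors' GPU implementation and covers neither the full cipher nor a key search.

```python
# Minimal bitslicing sketch (illustrative assumptions, not the paper's code).
LANES = 64                      # assumed word width: 64 independent instances
MASK = (1 << LANES) - 1
NLF_TABLE = 0x3A5C742E          # 5-input truth table commonly cited for KeeLoq's NLF


def nlf_scalar(x4, x3, x2, x1, x0):
    """Reference: look up one output bit for a single 5-bit input."""
    index = (x4 << 4) | (x3 << 3) | (x2 << 2) | (x1 << 1) | x0
    return (NLF_TABLE >> index) & 1


def pack(bits):
    """Transpose up to LANES single bits into one word (bit j = instance j)."""
    assert len(bits) <= LANES
    word = 0
    for j, b in enumerate(bits):
        word |= (b & 1) << j
    return word


def nlf_bitsliced(w4, w3, w2, w1, w0):
    """Evaluate the NLF on all LANES instances using only word-wide bit ops.

    Word w_i carries input bit x_i of every instance. The result is the OR of
    every minterm whose truth-table entry is 1; each minterm is the AND of the
    input words or their complements, so all lanes are processed together.
    """
    words = (w0, w1, w2, w3, w4)        # words[b] corresponds to index bit b
    result = 0
    for index in range(32):
        if not (NLF_TABLE >> index) & 1:
            continue                    # this input pattern maps to 0
        term = MASK
        for bit, w in enumerate(words):
            term &= w if (index >> bit) & 1 else (~w & MASK)
        result |= term
    return result


if __name__ == "__main__":
    # Exhaustive check: lane j receives the 5-bit input j (j = 0..31).
    w = [pack([(j >> i) & 1 for j in range(32)]) for i in range(5)]
    print(hex(nlf_bitsliced(w[4], w[3], w[2], w[1], w[0])))  # prints 0x3a5c742e
```

The generic minterm loop is chosen because it is correct by construction for any truth table; a performance-oriented bitsliced kernel would instead use a hand-minimized Boolean formula mapped onto 32- or 64-bit GPU registers.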