dc.description.abstract |
With energy efficiency becoming a major concern in the HPC community, low-power
alternatives such as heterogeneous Multi-Processor System-on-chip (MPSoC) are gaining popularity. These devices house several different kinds of accelerators on-chip, including Digital Signal Processors (DSP), alongside CPU and GPU cores. Maximum
performance from MPSoC devices can be achieved through the simultaneous use of all
the processing elements. However, partitioning work across different processing
elements is notoriously difficult on MPSoC platforms, and achieving an energy-optimal
partition is known to be non-trivial. The Keystone II Hawking (K2H) Platform is one such MPSoC from Texas Instruments that consists of 4 cache-coherent ARM cores and 8 TI DSP cores that are not cache coherent. This MPSoC provides high floating-point performance with low power consumption and is also being used in the nCore BrownDwarf supercomputer. Recent work on the K2H MPSoC describes a novel hybrid work-stealing runtime that supports concurrent execution of computation across all ARM and DSP cores. While this runtime scales well with loop-based parallelism, it offers limited scalability for recursive divide-and-conquer parallelism. Other recent work on K2H describes novel techniques to predict an energy-optimal work partition through the use of a simple energy usage model. However, this requires programmer effort in partitioning the work and applying this energy model. This thesis aims to combine these two approaches while overcoming their inherent limitations.
The key contributions of my thesis are: 1) an implementation of a novel hybrid
work-stealing runtime for the K2H MPSoC that supports high-performance execution
of both loop-based and recursive divide-and-conquer parallelism; 2) a novel automated
approach that uses this hybrid runtime for determining the energy-optimal partition; and
3) experimental analysis of the above claims using several well-known benchmarks. |
en_US |