Abstract:
In recent years, as multi-core processors have emerged as a solution to the limits of power scaling, the software domain has seen a surge of compute- and data-intensive applications. These applications often target real-time processing systems that demand high performance and energy efficiency. To meet these demands, embedded systems are moving toward heterogeneous System-on-Chip (SoC) designs. Alongside general-purpose cores, heterogeneous SoCs include application-specific customized designs, commonly known as hardware accelerators, which speed up specific functions and improve energy efficiency. All processing elements share on-chip resources, such as memory, and communicate with each other through a shared Network-on-Chip (NoC). The overwhelming performance improvement achieved through hardware acceleration has spurred a growing number of fixed-function accelerators in modern heterogeneous SoCs. The growing system sizes and time-to-market pressure of accelerator-rich heterogeneous SoCs compel chip designers to analyze only part of the design space, leading to suboptimal Intellectual Property (IP) designs. Hence, accelerators are generally designed as standalone IP blocks by third-party vendors, and chip designers often over-provision the on-chip resources allotted to each accelerator to add design flexibility. Although this modularity simplifies IP design, integrating these off-the-shelf accelerator blocks into a single SoC may overshoot the resource budget of the underlying system. Furthermore, integrating third-party accelerator IPs alongside other on-chip modules makes the system vulnerable to security threats, since the lack of design details from third-party vendors makes the designs difficult to verify.
Even when the design is available, exhaustive simulation to uncover malicious logic or a hardware Trojan is infeasible. The accelerator memory subsystem is a promising target for on-chip resource optimization, as memory constitutes a significant part of an accelerator design. A straightforward approach would be to allot a small chunk of private memory to each accelerator core so that the budget constraints are met. However, this approach can lead to excessive off-chip memory accesses as the data-intensive accelerators try to bring their respective data on chip. While sharing the accelerators' on-chip memory is a viable way to meet the budget constraints and reduce off-chip memory accesses, it also poses challenges. Excessive sharing of accelerator on-chip memory leaves less space for accelerator-private data, causing more off-chip memory accesses and increased NoC contention due to remote shared-memory accesses. Another major challenge posed by third-party accelerator integration is the threat of an attack on the on-chip resources that degrades overall application performance. One such attack is a flooding attack, in which a malicious IP injects frequent useless packets into the network to create congestion and block legitimate communication, resulting in a Denial-of-Service (DoS) attack. The distributed nature of NoCs and the dynamic behaviour of accelerator-rich heterogeneous systems make such flooding attacks difficult to detect and localize. We address the challenges of designing efficient accelerator-rich SoCs by optimizing the utilization of on-chip resources and mitigating the aforementioned performance-based security threats. We propose a design-space exploration framework that provides an optimal memory configuration for multi-accelerator systems, maintaining overall system performance under a given resource budget constraint.
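The trade-off between per-accelerator private memory and the total on-chip budget can be illustrated with a minimal sketch. All numbers, names, and the cost proxy below are hypothetical; the thesis framework solves a much richer model that also accounts for shared banks and NoC contention.

```python
# Minimal sketch (hypothetical numbers): choosing per-accelerator private
# memory sizes under a total on-chip SRAM budget. The cost proxy counts
# data that does not fit privately and must therefore go off chip.
from itertools import product

# Working-set size (KB) each accelerator would ideally keep on chip.
working_set = {"acc0": 256, "acc1": 128, "acc2": 512}
BUDGET_KB = 512  # total on-chip memory available to all accelerators

def offchip_cost(private_kb):
    """Rough proxy: working-set data exceeding private capacity."""
    return sum(max(0, working_set[a] - private_kb[a]) for a in working_set)

best = None
sizes = [0, 64, 128, 256, 512]  # candidate private-bank sizes (KB)
for alloc in product(sizes, repeat=len(working_set)):
    private = dict(zip(working_set, alloc))
    if sum(alloc) > BUDGET_KB:
        continue  # violates the resource budget constraint
    cost = offchip_cost(private)
    if best is None or cost < best[0]:
        best = (cost, private)

print(best)  # lowest off-chip cost and the allocation achieving it
```

Even this toy version shows why purely private partitioning is limiting: with an aggregate working set of 896 KB and only 512 KB on chip, at least 384 KB of traffic is pushed off chip regardless of how the private banks are sized, motivating shared on-chip memory.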
Our framework formulates a model of the optimization problem that captures an application's memory access patterns and provides the best-suited memory system configurations, taking on-chip network contention into consideration. It also employs an efficient method for allocating shared data across the distributed shared memory banks, reducing communication overhead. To minimize the performance degradation caused by flooding-based DoS attacks, we propose an attack detection framework that employs state-of-the-art machine learning algorithms to study the communication behaviour and accurately raise a flag when a flooding attack occurs. We also propose an attack localization framework, called Sniffer, which can localize one or multiple malicious IPs mounting the DoS attack. Sniffer employs a perceptron-based, collaborative approach to trace back the attack path and identify all the attacker nodes. Finally, we study the design challenges in developing convolutional neural network (CNN) accelerators, which are widely integrated into heterogeneous SoCs for application domains such as image processing, speech recognition, and search engines. The large number of input parameters and weights involved in CNNs poses a major challenge for data movement between the processing elements of a CNN accelerator and on-chip and/or off-chip memory. We address this communication bottleneck by extensively studying the application behaviour, and propose an efficient accelerator architecture with a fused wired and wireless interconnection network along with a data-flow scheduling algorithm.