dc.description.abstract |
Edge computing is a distributed computing paradigm that brings computation and data storage closer to where they are needed, improving response times and saving bandwidth. Two important requirements of edge computing are low cost and low power. Nowadays, many edge devices are being fitted with deep CNN-based accelerators that can perform object identification and detection on the fly. Our focus is on closed-circuit television (CCTV) cameras, one such class of edge device that is widely used for video surveillance across the world. Placing CCTV cameras at strategic locations can help prevent acts of vandalism, break-ins, and other serious crimes. This has led to their widespread use, and it is therefore important that a CCTV camera not remain merely a device for capturing data and transmitting it to cloud centres, but also process that data efficiently and produce meaningful outputs in real time. To make CCTV cameras active devices, a processing element must be added to them. With this in mind, in the first semester we designed a low-cost RISC-V-based processor with only a limited set of instructions so that it consumes less power. However, this approach requires a new custom compiler and a different software ecosystem, which is a problem from a different domain altogether. Moreover, the open-source hardware community would not be able to take full advantage of it. To avoid this problem, we moved to an accelerator-based approach. We first designed an accelerator for primitive image processing applications, viz. blurring and edge detection, to become acquainted with the challenges and intricacies involved in designing an accelerator.
In the current semester, we extended this design to a deep CNN accelerator. We studied and analyzed quantization techniques to improve hardware utilization on edge devices, and a scratchpad-based memory system to reduce the communication bottleneck between the host CPU and an accelerator designed specifically for deep learning applications. |
en_US |