Abstract:
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the location where they are needed, in order to improve response times and save bandwidth. Two important requirements of edge computing are low cost and low power. Nowadays, driven by edge computing, many edge devices are being fitted with deep CNN-based accelerators that can perform object identification and detection on the fly. Our main focus is on CCTV (closed-circuit television) cameras, one such class of edge devices, which are widely used for video surveillance across the world. Placing CCTV cameras at strategic locations can help prevent acts of vandalism, break-ins, and other serious crimes. This has led to their widespread use across the globe, and it is therefore equally important that a CCTV camera should not remain merely a device for capturing data and transmitting it to cloud centres, but should also be able to process that data efficiently and produce meaningful outputs in real time. To make CCTV cameras active devices, it is imperative to add a processing element to them.

With this in mind, in the first semester we designed a low-cost RISC-V based processor with only a limited set of instructions so that it consumes less power. However, this approach requires a new custom compiler along with a different software ecosystem, which altogether poses a problem in a different domain. Moreover, the open-source hardware community would not be able to take full advantage of it. To remove this problem, we moved to an accelerator-based approach. We first designed an accelerator for primitive image processing applications, viz. blurring and edge detection, so as to become acquainted with the challenges and intricacies involved in designing an accelerator.
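As an illustration of the kind of workload such an image-processing accelerator targets, the following is a plain software reference model of a 3x3 sliding-window filter, assuming grayscale input stored as nested lists of integers. The kernels (a box blur and a 4-neighbour Laplacian edge detector) and the function name conv3x3 are illustrative assumptions, not necessarily those implemented in our design. Each output pixel costs nine multiply-accumulate operations, which is the operation a hardware accelerator would typically parallelise.

```python
from typing import List

Image = List[List[int]]
Kernel = List[List[int]]

# Illustrative 3x3 kernels (assumed, not taken from our hardware design).
BOX_BLUR: Kernel = [[1, 1, 1],
                    [1, 1, 1],
                    [1, 1, 1]]   # averaging filter; divide the sum by 9
LAPLACIAN: Kernel = [[0, 1, 0],
                     [1, -4, 1],
                     [0, 1, 0]]  # 4-neighbour edge detector


def conv3x3(img: Image, kernel: Kernel, divisor: int = 1) -> Image:
    """Apply a 3x3 filter without padding; returns an (H-2) x (W-2) image.

    This is a sliding-window cross-correlation, which equals convolution
    for the symmetric kernels above.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            acc = 0  # nine multiply-accumulates per output pixel
            for kr in range(3):
                for kc in range(3):
                    acc += kernel[kr][kc] * img[r + kr - 1][c + kc - 1]
            row.append(acc // divisor)  # floor division for the blur average
        out.append(row)
    return out


if __name__ == "__main__":
    step = [[0, 0, 9, 9] for _ in range(4)]  # vertical step edge
    print(conv3x3(step, LAPLACIAN))          # [[9, -9], [9, -9]]
```

A software model like this can serve as a golden reference when checking simulated accelerator outputs pixel by pixel.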
In the current semester, we extended this accelerator to a deep CNN accelerator. We studied and analysed quantization techniques to improve hardware utilization on edge devices, and a scratchpad-based memory system to relieve the communication bottleneck between the host CPU and an accelerator designed specifically for deep learning applications.
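To make the quantization idea concrete, the following is a minimal sketch of symmetric per-tensor 8-bit quantization, one common technique; it is not necessarily the scheme we evaluated, and the names quantize_int8, dequantize and int8_dot are hypothetical. Real values are mapped to integers in [-127, 127] with a single scale factor, so multiply-accumulates can run on narrow integer units with one floating-point rescale at the end. Storing int8 instead of float32 also cuts weight and activation storage by a factor of four.

```python
from typing import List, Tuple


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = clamp(round(x / s), -127, 127)."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # largest magnitude maps to 127
    # Python's round() rounds exact halves to even.
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate real values: x_hat = q * s."""
    return [v * scale for v in q]


def int8_dot(qa: List[int], sa: float, qb: List[int], sb: float) -> float:
    """Dot product using only integer multiply-accumulates, then one rescale."""
    # In hardware this sum would typically live in a wide (e.g. int32) accumulator.
    acc = sum(a * b for a, b in zip(qa, qb))
    return acc * sa * sb


if __name__ == "__main__":
    acts = [1.0, -0.5, 0.25]
    weights = [0.5, 0.5, -1.0]
    qa, sa = quantize_int8(acts)
    qw, sw = quantize_int8(weights)
    exact = sum(a * w for a, w in zip(acts, weights))
    print(exact, int8_dot(qa, sa, qw, sw))
```

The only error introduced is the rounding step, bounded by half the scale per value, which is why the integer dot product stays close to the floating-point result.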