Abstract:
Mobile devices such as smartphones can augment their low-power processors by offloading GPU-heavy applications to cloud servers. However, cloud data centers consume a lot of energy and incur high network latency. To mitigate the high latency of data centers, offloading to edge devices located close to users has recently become popular. Such edge devices usually have much lower compute capacity, since they must be more widely distributed than data centers. Moreover, the recent rise of machine learning-based workloads makes it challenging to model execution on such devices. To address this challenge, we benchmark several widely used machine learning programs on a representative edge device and on a server-grade GPU. Our benchmarks show that, in addition to avoiding network latency, edge devices also consume less energy than server-grade GPUs, albeit at the cost of longer execution time. Based on this trade-off between energy consumption, network latency, and computation time, we study the problem of scheduling jobs, each consisting of a sequence of machine learning workloads. We formulate this scheduling as an integer linear programming (ILP) problem whose objective is to minimize energy consumption while ensuring that the maximum number of jobs finish within a specified deadline. We solve the ILP to optimality using Google OR-tools and demonstrate on several examples that our technique works well in practice.
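To make the scheduling formulation concrete, the sketch below uses the linear-solver wrapper from Google OR-tools to decide, for a single job with a short sequence of tasks, whether each task runs on the edge device or is offloaded to the server, minimizing total energy subject to a deadline. The task names, energy and time figures, network latency, and deadline are placeholder assumptions rather than measurements from the paper, and the full formulation additionally covers multiple jobs and deadline misses.

```python
# A minimal sketch of the offloading ILP, not the paper's exact model.
# All numbers below are hypothetical placeholders, not benchmark results.
from ortools.linear_solver import pywraplp

tasks = ["resnet_inference", "bert_inference", "yolo_inference"]
# Per-task (energy in J, compute time in s) on each device -- assumed values.
edge = {"energy": [4.0, 6.5, 5.2], "time": [1.8, 2.6, 2.1]}
server = {"energy": [9.0, 12.0, 10.5], "time": [0.4, 0.7, 0.5]}
net_latency = 0.3   # assumed network latency per offloaded task (s)
deadline = 6.0      # assumed deadline for the whole job (s)

solver = pywraplp.Solver.CreateSolver("SCIP")

# x[i] = 1 if task i runs on the edge device, 0 if it is offloaded to the server.
x = [solver.BoolVar(f"x_{t}") for t in tasks]

# Deadline constraint: tasks run sequentially, and offloaded tasks pay the
# network latency in addition to their server compute time.
total_time = solver.Sum(
    [x[i] * edge["time"][i] + (1 - x[i]) * (server["time"][i] + net_latency)
     for i in range(len(tasks))]
)
solver.Add(total_time <= deadline)

# Objective: minimize the total energy consumed across all tasks.
solver.Minimize(solver.Sum(
    [x[i] * edge["energy"][i] + (1 - x[i]) * server["energy"][i]
     for i in range(len(tasks))]
))

if solver.Solve() == pywraplp.Solver.OPTIMAL:
    for i, t in enumerate(tasks):
        place = "edge" if x[i].solution_value() > 0.5 else "server"
        print(f"{t} -> {place}")
    print("total energy (J):", solver.Objective().Value())
```

In this toy instance the solver trades the server's faster compute times against its higher energy cost and the per-task network latency, keeping tasks on the edge whenever the deadline still holds.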