Abstract:
Cloud providers host tenant applications on shared server infrastructure and provide isolation by leveraging virtualization solutions such as virtual machines (VMs), containers, or serverless platforms. Current virtualization solutions isolate server resources such as the CPU, memory, and I/O (e.g., network and disk); however, tenants still share the CPU's Last Level Cache (LLC). With this setup, one or more applications (or tenants) may occupy a majority of the cache blocks by evicting the cached content of other tenants, degrading the performance of non-aggressive applications, which is popularly known as the "noisy neighbor" problem. Recent research has proposed cache partitioning techniques for server processors, implemented in hardware or software, such as cache coloring, Intel's Cache Allocation Technology (CAT), and partitioning based on sets, blocks, or ways. However, the choice of solution depends on application and workload characteristics; therefore, no single static, generalized solution suffices. To match increasing network speeds (400 to 800 Gbps) and application performance requirements, cloud providers offload tenant applications (typically network- and compute-intensive) to programmable network hardware (a.k.a. smart NICs). Offloaded applications process incoming requests on the smart NIC, avoiding the latency incurred in hypervisor and kernel stack traversal. A smart NIC comprises many wimpy processors (tens to hundreds) that share the LLC for performance, making it vulnerable to the noisy neighbor problem. At the same time, a smart NIC has a smaller die area and power budget (about 5x less) than a server processor; therefore, smart NIC performance isolation cannot directly apply the cache partitioning solutions used for server processors. This project aims to design a cache partitioning framework for smart NICs that provides performance isolation for offloaded applications.
The design should account for the hardware constraints and identify the tunable parameters for performance isolation (e.g., hardware vs. software, static vs. dynamic, minimizing overall cache misses, providing QoS guarantees for a few tenants). Toward this objective, this report presents the literature on cache partitioning techniques and a simulation-based analysis that motivates the need for cache partitioning for typical offloaded applications.