Abstract:
Multi-armed bandit (MAB) algorithms are designed to identify the best arm among several arms in an unknown environment. They guarantee an optimal balance between exploration (selecting every arm a sufficient number of times) and exploitation (selecting the best arm as often as possible). They are widely used in applications such as website advertising, robotics, healthcare, finance, and wireless radios. Robotics and radio applications require MAB algorithms to be integrated with the physical layer (PHY) in hardware to meet stringent area, power, and latency constraints. Moreover, a single MAB algorithm may not suit all scenarios, so the application needs to switch between MAB algorithms on-the-fly. We efficiently map the MAB algorithms onto a Zynq System on Chip (ZSoC) and make the architecture reconfigurable, such that the number of arms as well as the type of algorithm can be changed on-the-fly. We also exploit the proposed reconfigurable architecture to switch MAB algorithms on-the-fly after initial learning, and obtain at least a tenfold improvement in latency and throughput. Since the learning duration depends on the unknown arm statistics, we embed intelligence in the architecture to decide the switching instant. To further improve the intelligence of the proposed dynamically reconfigurable architecture, we also propose an efficient aggregation algorithm that adaptively switches between various bandit algorithms in unknown environments. We validate the functional correctness and usefulness of the proposed architecture via a realistic wireless application, and a detailed complexity analysis demonstrates its feasibility for realizing intelligent radios.