IIIT-Delhi Institutional Repository

Change detection based Thompson sampling algorithm for non-stationary bandits

Show simple item record

dc.contributor.author Zaid, Kunwar
dc.contributor.author Ghatak, Gourab (Advisor)
dc.date.accessioned 2021-03-26T06:53:37Z
dc.date.available 2021-03-26T06:53:37Z
dc.date.issued 2020
dc.identifier.uri http://repository.iiitd.edu.in/xmlui/handle/123456789/868
dc.description.abstract The stationary multi-armed bandit (MAB) framework is a well-studied problem in the literature, with many rigorous mathematical treatments and optimal solutions. However, in a non-stationary environment, i.e., when the reward distribution changes over time, the MAB problem is notoriously difficult to analyze. In general, researchers have proposed two approaches to non-stationary bandit problems: i) passively adaptive techniques, which are analytically tractable, and ii) actively adaptive techniques, which track the environment and adapt as soon as a change is detected. Accordingly, researchers have proposed variants of classical bandit algorithms, e.g., sliding-window upper-confidence bound (SW-UCB), dynamic UCB (d-UCB), discounted UCB (D-UCB), and discounted Thompson sampling (DTS). In this work, we consider a piecewise-stationary environment, where the reward distribution remains stationary for a random duration and changes at an unknown instant. We propose TS-CD, a class of change-detection-based, actively adaptive Thompson sampling (TS) algorithms for this framework. In particular, the non-stationarity of the environment is modeled as a Poisson arrival process, and the reward distribution changes at each arrival. To detect changes, we employ i) mean-estimation-based methods and ii) goodness-of-fit tests, namely the Kolmogorov-Smirnov test (KS-test) and the Anderson-Darling test (AD-test). Once a change is detected, the TS algorithm either refreshes its parameters or discounts past rewards. To assess the performance of the proposed algorithm, we test it for edge control of i) multi-connectivity and ii) radio access technology (RAT) selection in a wireless network. We compare the TS-CD algorithms with other bandit algorithms designed for non-stationary environments, such as D-UCB, DTS, and change-detection-based UCB (CD-UCB).
Through extensive simulations, we establish the superior performance of the proposed TS-CD algorithms in the considered applications. en_US
dc.language.iso en_US en_US
dc.publisher IIIT-Delhi en_US
dc.subject Kolmogorov-Smirnov test, Stochastic Bandits, Thompson Sampling en_US
dc.title Change detection based Thompson sampling algorithm for non-stationary bandits en_US
dc.type Thesis en_US
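
The change-detection loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes Bernoulli rewards, Beta(1, 1) priors, arm means redrawn uniformly at every arrival of a Poisson process with a hypothetical rate `change_rate`, and a per-arm two-sample KS test that compares the oldest and newest `window` rewards collected since that arm's last detection. The names `ks_statistic`, `ks_change`, and `run_ts_cd`, the parameters `window`, `alpha`, and `gamma`, and the specific "discount" rule are illustrative assumptions.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    n, m = len(a), len(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(1 for x in a if x <= v) / n
        fb = sum(1 for x in b if x <= v) / m
        d = max(d, abs(fa - fb))
    return d


def ks_change(history, window, alpha=0.01):
    """Flag a change when the oldest and newest `window` rewards differ significantly.

    Uses the large-sample KS critical value c(alpha) * sqrt((n + m) / (n * m))
    with n = m = window and c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    if len(history) < 2 * window:
        return False  # not enough post-detection data for two disjoint windows
    old, new = history[:window], history[-window:]
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt(2 / window)
    return ks_statistic(old, new) > critical


def run_ts_cd(horizon, n_arms, change_rate, window=30, mode="refresh",
              gamma=0.5, seed=0):
    """Bernoulli Thompson sampling with per-arm KS change detection (sketch).

    Change instants follow a Poisson process with rate `change_rate` > 0
    (exponential inter-arrival times); at each one, all arm means are redrawn.
    Returns (total_reward, number_of_detected_changes).
    """
    rng = random.Random(seed)
    means = [rng.random() for _ in range(n_arms)]
    next_change = rng.expovariate(change_rate)
    a = [1.0] * n_arms  # Beta posterior parameter: 1 + successes
    b = [1.0] * n_arms  # Beta posterior parameter: 1 + failures
    history = [[] for _ in range(n_arms)]  # rewards since each arm's last detection
    total, detections = 0, 0
    for t in range(horizon):
        while t >= next_change:  # environment switches at Poisson arrivals
            means = [rng.random() for _ in range(n_arms)]
            next_change += rng.expovariate(change_rate)
        # Thompson sampling: play the arm with the largest posterior sample.
        samples = [rng.betavariate(a[k], b[k]) for k in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        total += reward
        a[arm] += reward
        b[arm] += 1 - reward
        history[arm].append(reward)
        if ks_change(history[arm], window):
            detections += 1
            history[arm] = []
            if mode == "refresh":  # forget everything: back to the Beta(1, 1) prior
                a[arm], b[arm] = 1.0, 1.0
            else:  # "discount": keep a fraction gamma of the accumulated evidence
                a[arm] = 1.0 + gamma * (a[arm] - 1.0)
                b[arm] = 1.0 + gamma * (b[arm] - 1.0)
    return total, detections
```

For example, `run_ts_cd(5000, n_arms=3, change_rate=0.001)` simulates roughly five changes over the horizon and returns the collected reward together with the number of detections. With Bernoulli rewards the KS statistic reduces to the absolute difference between the two windows' empirical means, so in this simplified setting the KS check behaves like a mean-estimation detector; the AD-test variant mentioned in the abstract is not sketched here.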
