Please use this identifier to cite or link to this item: http://repository.iiitd.edu.in/xmlui/handle/123456789/868
Full metadata record
DC Field | Value | Language
dc.contributor.author | Zaid, Kunwar |
dc.contributor.author | Ghatak, Gourab (Advisor) |
dc.date.accessioned | 2021-03-26T06:53:37Z |
dc.date.available | 2021-03-26T06:53:37Z |
dc.date.issued | 2020 |
dc.identifier.uri | http://repository.iiitd.edu.in/xmlui/handle/123456789/868 |
dc.description.abstract | The stationary multi-armed bandit (MAB) framework is a well-studied problem in the literature, with many rigorous mathematical treatments and optimal solutions. However, for a non-stationary environment, i.e., when the reward distribution changes over time, the MAB problem is notoriously difficult to analyze. In general, to address non-stationary bandit problems, researchers have proposed two approaches: i) passively adaptive techniques, which are analytically tractable, and ii) actively adaptive techniques, which keep track of the environment and adapt as soon as changes are detected. Consequently, researchers have come up with variants of bandit algorithms that are based on classical solutions, e.g., sliding-window upper-confidence bound (SW-UCB), dynamic UCB (d-UCB), discounted UCB (D-UCB), and discounted Thompson sampling (DTS). In this regard, we consider the piecewise stationary environment, where the reward distribution remains stationary for a random time and changes at an unknown instant. We propose a class of change-detection-based, actively adaptive Thompson sampling (TS) algorithms for this framework, named TS-CD. In particular, the non-stationarity in the environment is modeled as a Poisson arrival process, which changes the reward distribution on each arrival. For detecting the change, we employ i) mean-estimation-based methods and ii) goodness-of-fit tests, namely the Kolmogorov-Smirnov test (KS-test) and the Anderson-Darling test (AD-test). Once a change is detected, the TS algorithm either refreshes the parameters or discounts the past rewards. To assess the performance of the proposed algorithm, we have tested it for edge-control of i) multi-connectivity and ii) radio access technology (RAT) selection in a wireless network. We have compared the TS-CD algorithms with other bandit algorithms that are designed for non-stationary environments, such as D-UCB, DTS, and change-detection-based UCB (CD-UCB). With extensive simulations, we establish the superior performance of the proposed TS-CD in the considered applications. | en_US
dc.language.iso | en_US | en_US
dc.publisher | IIIT-Delhi | en_US
dc.subject | Kolmogorov-Smirnov test, Stochastic Bandits, Thompson Sampling | en_US
dc.title | Change detection based Thompson sampling algorithm for non-stationary bandits | en_US
dc.type | Thesis | en_US
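
Illustrative note: the change-detection idea summarized in the abstract can be sketched roughly as follows. This is not the thesis implementation. It assumes Bernoulli rewards with Beta posteriors, uses a simple windowed mean-shift check as a stand-in for the mean-estimation-based detector, and refreshes all arms when a change is flagged. The names `TSCDSketch` and `simulate`, the window size, the threshold, and the uniformly redrawn arm means are all hypothetical choices made for illustration.

```python
import random
from collections import deque


class TSCDSketch:
    """Beta-Bernoulli Thompson sampling with a windowed mean-shift detector.

    Hypothetical illustration: window and threshold are assumed values,
    not parameters taken from the thesis.
    """

    def __init__(self, n_arms, window=50, threshold=0.4, seed=0):
        self.n_arms = n_arms
        self.window = window
        self.threshold = threshold
        self.rng = random.Random(seed)
        self.alpha = [1.0] * n_arms  # Beta(1, 1) prior for every arm
        self.beta = [1.0] * n_arms
        # Per arm: the most recent 2 * window rewards.
        self.history = [deque(maxlen=2 * window) for _ in range(n_arms)]
        self.detections = 0

    def select_arm(self):
        # Thompson sampling: draw one posterior sample per arm, play the max.
        samples = [self.rng.betavariate(self.alpha[a], self.beta[a])
                   for a in range(self.n_arms)]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm, reward):
        # Standard Beta-Bernoulli posterior update (reward is 0 or 1).
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
        hist = self.history[arm]
        hist.append(reward)
        if len(hist) == 2 * self.window:
            rewards = list(hist)
            old_mean = sum(rewards[:self.window]) / self.window
            new_mean = sum(rewards[self.window:]) / self.window
            # Flag a change when the two half-window means differ strongly.
            if abs(new_mean - old_mean) > self.threshold:
                self._refresh()

    def _refresh(self):
        # Assumed policy: reset every arm to the prior and clear histories.
        self.alpha = [1.0] * self.n_arms
        self.beta = [1.0] * self.n_arms
        for hist in self.history:
            hist.clear()
        self.detections += 1


def simulate(horizon=5000, rate=1 / 1000, seed=1):
    """Two-armed Bernoulli bandit whose means are redrawn at the arrivals
    of a Poisson process with the given rate per time step."""
    env_rng = random.Random(seed)
    agent = TSCDSketch(n_arms=2, seed=seed + 1)
    means = [env_rng.random() for _ in range(2)]
    next_change = env_rng.expovariate(rate)  # exponential inter-arrival time
    total_reward = 0
    for t in range(horizon):
        while t >= next_change:
            means = [env_rng.random() for _ in range(2)]
            next_change += env_rng.expovariate(rate)
        arm = agent.select_arm()
        reward = 1 if env_rng.random() < means[arm] else 0
        agent.update(arm, reward)
        total_reward += reward
    return total_reward, agent.detections
```

Calling `simulate()` returns the cumulative reward and the number of flagged changes. A goodness-of-fit detector such as the KS-test mentioned in the abstract would replace the mean comparison inside `update`, and a discounting variant would shrink the Beta parameters instead of resetting them.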
Appears in Collections: Year-2020

Files in This Item:
File | Description | Size | Format
MT18164_Kunwar Zaid.pdf | | 830.4 kB | Adobe PDF