This repository simulates Bayesian active learning on a stationary multi-armed bandit using a Beta prior. Three agents run on the same bandit machine, each balancing exploration and exploitation with a different strategy (see the sketch after this list):
- TS - the TS agent selects an arm by Thompson Sampling: it draws one sample from each arm's posterior and picks the arm with the highest draw.
- R - the R agent selects an arm uniformly at random at each iteration.
- G - the G agent greedily selects the arm with the highest posterior mean.
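A minimal sketch of the three selection rules, assuming the Beta posteriors are tracked as NumPy arrays of `alpha` and `beta` parameters (the function names are illustrative, not the repository's API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each agent sees the Beta posterior of every arm:
# alpha[i] = 1 + successes of arm i, beta[i] = 1 + failures of arm i (flat prior).

def select_ts(alpha, beta):
    """Thompson Sampling: draw one sample per arm from its posterior, pick the best draw."""
    return int(np.argmax(rng.beta(alpha, beta)))

def select_random(alpha, beta):
    """Random: ignore the posterior and pick an arm uniformly at random."""
    return int(rng.integers(len(alpha)))

def select_greedy(alpha, beta):
    """Greedy: pick the arm with the highest posterior mean alpha / (alpha + beta)."""
    return int(np.argmax(alpha / (alpha + beta)))
```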
The bandit has 10 arms, each with a flat Beta prior (i.e., Beta(1, 1)). Three experiments vary the arms' success probabilities (a simulation sketch follows the list):
- Experiment 1
$$\theta_1 = 0.9,\ \theta_2 = 0.8,\ \theta_3 = \theta_4 = \dots = \theta_{10} = 0.5$$
- Experiment 2
$$\theta_1 = 0.9,\ \theta_2 = 0.88,\ \theta_3 = \theta_4 = \dots = \theta_{10} = 0.5$$
- Experiment 3
$$\theta_1 = 0.9,\ \theta_2 = \theta_3 = \dots = \theta_{10} = 0.5$$
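A minimal sketch of one simulation run under these settings; the function name `run_experiment`, the 1000-step horizon, and the agent-passing interface are illustrative assumptions, not the repository's actual interface:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(theta, select_arm, n_steps=1000):
    """Simulate one agent on a stationary Bernoulli bandit with a flat Beta(1, 1) prior."""
    n_arms = len(theta)
    alpha = np.ones(n_arms)  # 1 + successes per arm
    beta = np.ones(n_arms)   # 1 + failures per arm
    total_reward = 0
    for _ in range(n_steps):
        arm = select_arm(alpha, beta)
        reward = int(rng.random() < theta[arm])  # Bernoulli(theta[arm]) reward
        alpha[arm] += reward                     # posterior update on success
        beta[arm] += 1 - reward                  # posterior update on failure
        total_reward += reward
    return total_reward

# Arm success probabilities for Experiment 1.
theta_exp1 = np.array([0.9, 0.8] + [0.5] * 8)
# e.g. run_experiment(theta_exp1, select_ts), with select_ts from the sketch above.
```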