# MultiArmedBandit
Introduction and implementation of strategies (including Thompson Sampling) for the multi-armed bandit problem.
For a more detailed description, see MultiArmedBandit.pdf.
# Definition
This problem is also called the exploitation/exploration problem.
Exploitation tackles the multi-armed bandit problem by playing the arm with the highest estimated value, based on the success counts and rewards of previous plays.
Exploration tackles it by playing arms other than the one with the highest estimated value, in order to improve those estimates.
The same trade-off between exploitation and exploration arises in reinforcement learning.
# Application
- Real-time bidding in online advertising
- Website/app optimization
- Other Markov Decision Process problems
# Algorithm
## Epsilon-Greedy
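A minimal sketch of epsilon-greedy: with probability epsilon, explore a uniformly random arm; otherwise exploit the arm with the highest estimated mean reward. The class name, constructor parameters, and incremental-mean update here are illustrative choices, not the repository's exact code.

```scala
import scala.util.Random

// Epsilon-greedy: explore with probability epsilon, otherwise exploit.
class EpsilonGreedy(epsilon: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(nArms)
    else values.indexOf(values.max)

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    // Incremental update of the running mean reward for the chosen arm
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```

With epsilon = 0 this degenerates to pure exploitation; with epsilon = 1 it is pure random exploration.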
## SoftMax
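Softmax (Boltzmann) selection chooses each arm with probability proportional to exp(value / temperature), so a higher temperature means more exploration. A sketch under the same illustrative conventions as above:

```scala
import scala.util.Random

// Softmax selection: P(arm i) ∝ exp(values(i) / temperature).
class SoftMax(temperature: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int = {
    val exps = values.map(v => math.exp(v / temperature))
    val total = exps.sum
    val probs = exps.map(_ / total)
    // Inverse-CDF sampling from the categorical distribution defined by probs
    val r = Random.nextDouble()
    var cum = 0.0
    var i = 0
    while (i < nArms - 1 && { cum += probs(i); cum < r }) i += 1
    i
  }

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```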
## Upper Confidence Bound
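A common variant is UCB1: play each arm once, then always pick the arm maximising estimated value plus the confidence bonus sqrt(2 ln(total plays) / plays(arm)). A sketch (naming is illustrative):

```scala
// UCB1: deterministic selection of the arm with the highest upper confidence bound.
class UCB1(nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int = {
    // Make sure every arm has been tried at least once
    val untried = counts.indexOf(0)
    if (untried >= 0) untried
    else {
      val total = counts.sum
      val ucb = values.indices.map { i =>
        values(i) + math.sqrt(2.0 * math.log(total) / counts(i))
      }
      ucb.indexOf(ucb.max)
    }
  }

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```

Note that unlike epsilon-greedy and softmax, UCB1 needs no randomness: the bonus term shrinks as an arm is played more, which forces exploration of under-sampled arms.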
## Thompson Sampling
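For Bernoulli rewards with a Beta(1, 1) prior, Thompson Sampling draws one sample from each arm's Beta posterior and plays the best. The repository uses breeze for the beta distribution; the sketch below instead uses a dependency-free order-statistic trick (a Beta(a, b) draw with integer a, b is the a-th smallest of a + b - 1 uniform samples), which works here because the posterior parameters are success/failure counts plus one.

```scala
import scala.util.Random

// Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.
class ThompsonSampling(nArms: Int) {
  private val successes = Array.fill(nArms)(0)
  private val failures = Array.fill(nArms)(0)

  // Beta(a, b) with integer a, b via the order-statistic trick:
  // the a-th smallest of a + b - 1 Uniform(0, 1) draws.
  private def sampleBeta(a: Int, b: Int): Double = {
    val uniforms = Array.fill(a + b - 1)(Random.nextDouble()).sorted
    uniforms(a - 1)
  }

  def selectArm(): Int = {
    // Sample once from each arm's posterior and play the best draw
    val samples = (0 until nArms).map { i =>
      sampleBeta(successes(i) + 1, failures(i) + 1)
    }
    samples.indexOf(samples.max)
  }

  def update(chosenArm: Int, reward: Double): Unit =
    if (reward > 0.5) successes(chosenArm) += 1 else failures(chosenArm) += 1
}
```

The order-statistic sampler costs O(n log n) per draw, so for long horizons a proper Beta sampler (e.g. breeze's) is preferable; the trick just keeps this sketch self-contained.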

# Simulation
Use Monte Carlo simulation with a Bernoulli bandit to compare the performance of the four algorithms.
```scala
import scala.util.Random

class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double = {
    require(i < probabilityArms.length)
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
  }
}
```
```scala
trait BanditAlgorithm {
  def selectArm(): Int
  def update(chosenArm: Int, reward: Double): Unit
  def initialize(nArms: Int): Unit
}
```
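A minimal Monte Carlo loop that drives an agent against the bandit might look like the following. It restates `BernoulliBandit` and inlines a stand-in epsilon-greedy agent so the snippet runs on its own; in the project, any `BanditAlgorithm` implementation would take the agent's place.

```scala
import scala.util.Random

// Restated here so the snippet is self-contained.
class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double =
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
}

// Stand-in agent (epsilon-greedy), just to drive the loop.
class Agent(epsilon: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)
  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(nArms)
    else values.indexOf(values.max)
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}

// One simulation run: select, draw, feed back, accumulate reward.
def simulate(bandit: BernoulliBandit, agent: Agent, horizon: Int): Double = {
  var cumulative = 0.0
  for (_ <- 1 to horizon) {
    val arm = agent.selectArm()
    val reward = bandit.draw(arm)
    agent.update(arm, reward)
    cumulative += reward
  }
  cumulative
}
```

Averaging `simulate` over many runs (and tracking per-step rewards rather than only the total) gives the accuracy, average-reward, and cumulative-reward curves compared in the experiment below.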
# Experiment
We can compare the algorithms on accuracy, average reward, and cumulative reward.
Below, I only show the chart of cumulative-reward performance.
# Implementation
- Build it yourself
- This project can be built with sbt 0.13. The breeze package is used to implement the beta distribution.
- For sbt, add these lines to your project definition:

```scala
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2"
)
```
# Reference
- Wikipedia: Multi-armed bandit
- Bandit Algorithms for Website Optimization by John Myles White
- Analysis of Thompson Sampling for the Multi-armed Bandit Problem by Shipra Agrawal and Navin Goyal
- An Information-Theoretic Analysis of Thompson Sampling by Daniel Russo and Benjamin Van Roy

# To Do
- Contextual bandit
- Stochastic bandit
- Implement on Spark to support parallel and online learning