# MultiArmedBandit
Introduction and implementation of strategies (including Thompson Sampling) for the multi-armed bandit problem.
For a more detailed description, see MultiArmedBandit.pdf.
# Definition
This problem is also called the exploitation/exploration problem.
Exploitation tackles the multi-armed bandit problem by playing the arm with the highest estimated value, based on the success counts and rewards of previous plays.
Exploration tackles it by playing arms other than the one with the highest estimated value, in order to improve those estimates.
The same trade-off between exploitation and exploration arises in reinforcement learning.
# Application
- Real-time bidding in online advertising
- Website/app optimization
- Other Markov Decision Process problems
# Algorithm
## Epsilon-Greedy
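A minimal sketch of epsilon-greedy: with probability epsilon, explore a uniformly random arm; otherwise exploit the arm with the highest estimated mean reward. The class name, constructor parameters, and incremental-mean update here are illustrative choices, not the repository's exact code.

```scala
import scala.util.Random

// Epsilon-greedy: explore with probability epsilon, otherwise exploit.
class EpsilonGreedy(epsilon: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(nArms)
    else values.indexOf(values.max)

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    // Incremental update of the running mean reward for the chosen arm
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```

With epsilon = 0 this degenerates to pure exploitation; with epsilon = 1 it is pure random exploration.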
## SoftMax
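Softmax (Boltzmann) selection chooses each arm with probability proportional to exp(value / temperature), so a higher temperature means more exploration. A sketch under the same illustrative conventions as above:

```scala
import scala.util.Random

// Softmax selection: P(arm i) ∝ exp(values(i) / temperature).
class SoftMax(temperature: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int = {
    val exps = values.map(v => math.exp(v / temperature))
    val total = exps.sum
    val probs = exps.map(_ / total)
    // Inverse-CDF sampling from the categorical distribution defined by probs
    val r = Random.nextDouble()
    var cum = 0.0
    var i = 0
    while (i < nArms - 1 && { cum += probs(i); cum < r }) i += 1
    i
  }

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```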
## Upper Confidence Bound
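A common variant is UCB1: play each arm once, then always pick the arm maximising estimated value plus the confidence bonus sqrt(2 ln(total plays) / plays(arm)). A sketch (naming is illustrative):

```scala
// UCB1: deterministic selection of the arm with the highest upper confidence bound.
class UCB1(nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)

  def selectArm(): Int = {
    // Make sure every arm has been tried at least once
    val untried = counts.indexOf(0)
    if (untried >= 0) untried
    else {
      val total = counts.sum
      val ucb = values.indices.map { i =>
        values(i) + math.sqrt(2.0 * math.log(total) / counts(i))
      }
      ucb.indexOf(ucb.max)
    }
  }

  def update(chosenArm: Int, reward: Double): Unit = {
    counts(chosenArm) += 1
    values(chosenArm) += (reward - values(chosenArm)) / counts(chosenArm)
  }
}
```

Note that unlike epsilon-greedy and softmax, UCB1 needs no randomness: the bonus term shrinks as an arm is played more, which forces exploration of under-sampled arms.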
## Thompson Sampling
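For Bernoulli rewards with a Beta(1, 1) prior, Thompson Sampling draws one sample from each arm's Beta posterior and plays the best. The repository uses breeze for the beta distribution; the sketch below instead uses a dependency-free order-statistic trick (a Beta(a, b) draw with integer a, b is the a-th smallest of a + b - 1 uniform samples), which works here because the posterior parameters are success/failure counts plus one.

```scala
import scala.util.Random

// Thompson Sampling for Bernoulli bandits with Beta(1, 1) priors.
class ThompsonSampling(nArms: Int) {
  private val successes = Array.fill(nArms)(0)
  private val failures = Array.fill(nArms)(0)

  // Beta(a, b) with integer a, b via the order-statistic trick:
  // the a-th smallest of a + b - 1 Uniform(0, 1) draws.
  private def sampleBeta(a: Int, b: Int): Double = {
    val uniforms = Array.fill(a + b - 1)(Random.nextDouble()).sorted
    uniforms(a - 1)
  }

  def selectArm(): Int = {
    // Sample once from each arm's posterior and play the best draw
    val samples = (0 until nArms).map { i =>
      sampleBeta(successes(i) + 1, failures(i) + 1)
    }
    samples.indexOf(samples.max)
  }

  def update(chosenArm: Int, reward: Double): Unit =
    if (reward > 0.5) successes(chosenArm) += 1 else failures(chosenArm) += 1
}
```

The order-statistic sampler costs O(n log n) per draw, so for long horizons a proper Beta sampler (e.g. breeze's) is preferable; the trick just keeps this sketch self-contained.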

# Simulation
Use Monte Carlo simulation with a Bernoulli bandit to compare the performance of the four algorithms.
```scala
import scala.util.Random

class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double = {
    require(i < probabilityArms.length)
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
  }
}
```
```scala
trait BanditAlgorithm {
  def selectArm(): Int
  def update(chosenArm: Int, reward: Double): Unit
  def initialize(nArms: Int): Unit
}
```
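A minimal Monte Carlo loop that drives an agent against the bandit might look like the following. It restates `BernoulliBandit` and inlines a stand-in epsilon-greedy agent so the snippet runs on its own; in the project, any `BanditAlgorithm` implementation would take the agent's place.

```scala
import scala.util.Random

// Restated here so the snippet is self-contained.
class BernoulliBandit(val probabilityArms: Array[Double]) {
  def draw(i: Int): Double =
    if (Random.nextDouble() > probabilityArms(i)) 0.0 else 1.0
}

// Stand-in agent (epsilon-greedy), just to drive the loop.
class Agent(epsilon: Double, nArms: Int) {
  private val counts = Array.fill(nArms)(0)
  private val values = Array.fill(nArms)(0.0)
  def selectArm(): Int =
    if (Random.nextDouble() < epsilon) Random.nextInt(nArms)
    else values.indexOf(values.max)
  def update(arm: Int, reward: Double): Unit = {
    counts(arm) += 1
    values(arm) += (reward - values(arm)) / counts(arm)
  }
}

// One simulation run: select, draw, feed back, accumulate reward.
def simulate(bandit: BernoulliBandit, agent: Agent, horizon: Int): Double = {
  var cumulative = 0.0
  for (_ <- 1 to horizon) {
    val arm = agent.selectArm()
    val reward = bandit.draw(arm)
    agent.update(arm, reward)
    cumulative += reward
  }
  cumulative
}
```

Averaging `simulate` over many runs (and tracking per-step rewards rather than only the total) gives the accuracy, average-reward, and cumulative-reward curves compared in the experiment below.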
# Experiment
We can compare the algorithms on accuracy, average reward, and cumulative reward.
Below, I only show the chart of cumulative-reward performance.
# Implementation
- Build it yourself
- This project can be built with sbt 0.13. The breeze package is used to implement the beta distribution.
- For sbt, add these lines to your project definition:

```scala
libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.11.2"
)
```
# Reference
- Wikipedia: Multi-armed bandit
- Bandit Algorithms for Website Optimization by John Myles White
- Analysis of Thompson Sampling for the Multi-armed Bandit Problem by Shipra Agrawal and Navin Goyal
- An Information-Theoretic Analysis of Thompson Sampling by Daniel Russo and Benjamin Van Roy

# To Do
- Contextual bandit
- Stochastic bandit
- Implement on Spark to support parallel and online learning