CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++
Xiaolin Wang (xiaolin.wang@nict.go.jp, arthur.xlw@gmail.com)
================================================
1) Prerequisites
CUDA >= 9.0
cuDNN >= 7.0
OpenCV https://sourceforge.net/projects/opencvlibrary/
The Arcade Learning Environment (ALE) https://github.com/mgbellemare/Arcade-Learning-Environment
2) Compile
make
3) Train
bin/cytonRl --env roms/breakout.bin --mode train --saveModel model/model --showScreen 1
or, for faster training,
bin/cytonRl --env roms/breakout.bin --mode train --saveModel model/model --showScreen 0
The second command turns off the game window, which makes the program run much faster. The game window can be brought back by creating an empty file "model/model.screen", or turned off again by deleting that file.
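The window toggle works by probing the file system while training runs. Here is a minimal sketch of that mechanism, with a hypothetical `screenRequested` helper invented for illustration (CytonRL's actual code may differ):

```cpp
#include <fstream>
#include <string>

// Sketch, not CytonRL source: the trainer periodically checks whether
// "<saveModel>.screen" exists and shows the game window only if it does.
// Creating the file turns the window on; deleting it turns it off.
bool screenRequested(const std::string& modelPath) {
    std::ifstream f(modelPath + ".screen");
    return f.good();  // open succeeds iff the toggle file exists
}
```

Because the check is just a file-existence probe, the setting can be flipped at any time without restarting training.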
4) Test
bin/cytonRl --mode test --env roms/breakout.bin --loadModel model/model.100000000 --showScreen 1
or, using our trained model,
bin/cytonRl --mode test --env roms/breakout.bin --loadModel model-trained/model --showScreen 1
or, for a faster test,
bin/cytonRl --mode test --env roms/breakout.bin --loadModel model-trained/model --showScreen 0
The game window can be brought back by creating an empty file "model/model.screen", or turned off again by deleting that file.
================================================
If you are using our toolkit, please kindly cite our paper (available at doc/cytonRl.pdf).
@article{wang2018cytonrl,
title={CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++},
author={Wang, Xiaolin},
year={2018}
}
================================================
5) Usage
bin/cytonRl --help
A.L.E: Arcade Learning Environment (version 0.5.1)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./ale.cfg
version 1.0
--help : explanation, [valid values] (default)
--batchSize : the size of batch (32)
--dueling : using dueling DQN, 0|1 (1)
--eGreedy : e-Greedy start_value : end_value : num_steps (1.0:0.01:5000000)
--env : the rom of an Atari game (roms/seaquest.bin)
--gamma : the discount factor in the Bellman equation (0.99)
--inputFrames : the number of concatenated frames as input (4)
--learningRate : the learning rate (0.0000625)
--learnStart : the step at which learning starts (50000)
--loadModel : the file path for loading a model ()
--maxEpisodeSteps : the maximum number of steps per episode (18000)
--maxSteps : the maximum number of training steps (100000000)
--mode : working mode, train|test (train)
--networkSize : the output dimension of each hidden layer (32:64:64:512)
--optimizer : the optimizer, SGD|RMSprop (RMSprop)
--priorityAlpha : the alpha parameter of prioritized experience replay (0.6)
--priorityBeta : the beta parameter of prioritized experience replay (0.4)
--progPeriod : the period of showing training progress (10000)
--replayMemory : the capacity of replay memory (1000000)
--saveModel : the file path for saving models (model/model)
--savePeriod : the period of saving models (1000000)
--showScreen : whether to show the screen of the Atari game, 0|1. Creating or deleting the file model/model.screen changes this setting while the program is running. Note that hiding the screen makes the program run faster. (0)
--targetQ : the period of copying the current network to the target network (30000)
--testEGreedy : the e-Greedy threshold of test (0.001)
--testEpisodes : the number of test episodes (100)
--testMaxEpisodeSteps : the maximum number of steps per test episode (18000)
--testPeriod : the period of test (5000000)
--updatePeriod : the period of learning a batch (4)
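To make --eGreedy, --gamma, and --targetQ concrete, below is a short sketch of the standard algorithms these options name: linear epsilon annealing and the double-DQN update target. This is illustrative code written for this README under stated assumptions, not CytonRL's implementation; the function names are invented.

```cpp
#include <algorithm>
#include <vector>

// Linear epsilon annealing as suggested by --eGreedy start:end:steps:
// epsilon falls linearly from `start` to `end` over `steps` steps,
// then stays at `end`. Defaults mirror (1.0:0.01:5000000).
double epsilonAt(long step, double start = 1.0, double end = 0.01,
                 long steps = 5000000) {
    if (step >= steps) return end;
    return start + (end - start) * static_cast<double>(step) / steps;
}

// Double-DQN target: the online network selects the best next action,
// and the target network (refreshed every --targetQ steps) evaluates it:
//   y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
double doubleDqnTarget(double reward, bool terminal, double gamma,
                       const std::vector<double>& qOnlineNext,
                       const std::vector<double>& qTargetNext) {
    if (terminal) return reward;  // no bootstrap past a terminal state
    long bestA = std::max_element(qOnlineNext.begin(), qOnlineNext.end())
                 - qOnlineNext.begin();
    return reward + gamma * qTargetNext[bestA];
}
```

With the defaults above, epsilon reaches 0.01 exactly at step 5,000,000, matching the (1.0:0.01:5000000) default of --eGreedy.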