# <img src="./misc/images/scrambledCube.png" width="50"> DeepXube <img src="./misc/images/solvedCube.png" width="50">

Learn a domain-specific heuristic function in a domain-independent fashion to solve pathfinding problems.
Write-Up
## Overview
DeepXube (pronounced "Deep Cube") aims to solve pathfinding using a combination of deep reinforcement learning and heuristic search.
- Learn a heuristic function that maps states and goals to an estimate of the cost-to-go from the given state to the given goal.
- Use the learned heuristic function with heuristic search algorithms, such as batch weighted A* search or batch weighted Q* search, to solve problem instances.
DeepXube is a generalization of DeepCubeA (code, paper).
For any issues, you can create a GitHub issue or contact Forest Agostinelli (foresta@cse.sc.edu).
## Installation

```shell
pip install deepxube
```
See INSTALL.md for more information.
## Documentation
Documentation of all modules is here.
Command Line
deepxube can be run from the command line via the deepxube command (you may have to use python -m deepxube on Windows).
Run deepxube --help for detailed information. --help can also be run on
positional arguments (e.g. deepxube train --help).
### Quick run

There is a simple example domain and heuristic function located in `deepxube/domains/grid.py`.
After installing `deepxube`, run:
- Information
  - Domain information:
    ```shell
    deepxube domain_info
    ```
  - Heuristic information:
    ```shell
    deepxube heuristic_info
    ```
  - Pathfinding information:
    ```shell
    deepxube pathfinding_info
    ```
- Generate problem instances for solving:
  ```shell
  deepxube problem_inst --domain grid.7 --step_max 1000 --num 100 --file valid.pkl --redo
  ```
- Visualization
  - Visualize a random problem instance:
    ```shell
    deepxube viz --domain grid.7 --steps 10
    ```
  - Visualize a problem instance from a file:
    ```shell
    deepxube viz --domain grid.7 --file valid.pkl --idx 13
    ```
- Time to ensure basic functionality (can use breakpoints in your code to debug):
  - With deepxube residual neural network:
    ```shell
    deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V
    ```
  - With deepxube residual neural network and deep Q-network:
    ```shell
    deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type QFix
    ```
  - With custom neural network:
    ```shell
    deepxube time --domain grid.7 --heur gridnet.8CH_200FC --heur_type V
    ```
- Training
  - Train heuristic function:
    ```shell
    deepxube train --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V --pathfind graph_v --step_max 100 --up_itrs 100 --search_itrs 20 --backup -1 --procs 1 --batch_size 50 --max_itrs 5000 --dir dummy/
    ```
  - Use tensorboard to see training progress:
    ```shell
    tensorboard --logdir=dummy/
    ```
  - Plot more detailed training information with an interactive slider for training iteration:
    ```shell
    deepxube train_summary --dir dummy
    ```
- Solving
  - Solve problem instances with an all-zero heuristic:
    ```shell
    deepxube solve --domain grid.7 --heur_type V --pathfind graph_v.1B_1.0W --file valid.pkl --results results_zeros_ex/ --redo
    ```
  - Solve problem instances with the trained heuristic:
    ```shell
    deepxube solve --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_file dummy/heur.pt --heur_type V --pathfind graph_v.1B_1.0W --file valid.pkl --results results_trained_ex/ --redo
    ```
  - Solving with the trained heuristic should generate significantly fewer nodes and require significantly fewer iterations.
  - You can get the actions taken for each instance with the "actions" key in the dictionary saved in the `results.pkl` file in the `--results` directory.
  - Visualize solutions with:
    ```shell
    deepxube viz --domain grid.7 --file results_trained_ex/results.pkl --idx 3 --soln
    ```
    This prints the actions taken. You can implement `__repr__` for the action class to determine what gets printed for each action.
## Domains

User-defined domains should go in the `./domains/` folder.
`deepxube` will recursively search this directory and import all modules so that domains are registered.
`Grid` inherits from mixin classes from `deepxube.base.domain`, which give it additional functionality (see the write-up and domain documentation).
- `ActsEnumFixed`: `Grid` implements `get_actions_fixed`
  - Methods obtained: `sample_action`, `get_state_actions`, `expand`, `get_num_acts`
- `StartGoalWalkable`: `Grid` implements `sample_goal_from_state` and `sample_start_states`
  - Methods obtained: `sample_start_goal_pairs`
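As a rough, self-contained sketch of how this mixin pattern works (the classes below are simplified stand-ins written for illustration, not the actual `deepxube.base.domain` mixins, and `Line1D` is a made-up toy domain):

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class ActsEnumFixedStub(ABC):
    """Illustrative stand-in for a deepxube-style mixin: a domain that
    implements get_actions_fixed obtains expand and get_num_acts for free."""

    @abstractmethod
    def get_actions_fixed(self) -> List[Any]:
        ...

    @abstractmethod
    def next_state(self, state: Any, action: Any) -> Tuple[Any, float]:
        ...

    def get_num_acts(self) -> int:
        # Derived automatically from the fixed action set
        return len(self.get_actions_fixed())

    def expand(self, state: Any) -> List[Tuple[Any, float]]:
        # Apply every fixed action to the state
        return [self.next_state(state, a) for a in self.get_actions_fixed()]


class Line1D(ActsEnumFixedStub):
    """Toy 1-D domain: state is an integer position, actions move left/right."""

    def get_actions_fixed(self) -> List[int]:
        return [-1, +1]

    def next_state(self, state: int, action: int) -> Tuple[int, float]:
        return state + action, 1.0  # unit transition cost


dom = Line1D()
print(dom.get_num_acts())  # 2
print(dom.expand(0))       # [(-1, 1.0), (1, 1.0)]
```

The domain only supplies the two abstract methods; the batch of derived methods comes along automatically, which is the point of the mixin design.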
By using registers from `deepxube.factories.domain_factory`, `Grid` can be obtained from its name.
Furthermore, a parser can be implemented and registered to allow one to specify arguments for the constructor via the command line.
By convention, everything after '.' is considered an argument.
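The naming convention can be sketched as follows (an illustrative parser written for this write-up, not deepxube's actual factory code):

```python
def parse_spec(spec: str):
    """Split a 'name.args' spec into a registered name and its argument
    string, following the convention that everything after the first '.'
    is treated as constructor arguments."""
    name, _, args = spec.partition(".")
    return name, args


print(parse_spec("grid.7"))                # ('grid', '7')
print(parse_spec("resnet_fc.100H_2B_bn"))  # ('resnet_fc', '100H_2B_bn')
```

So `grid.7` names the `grid` domain with argument `7` (the grid size), and `grid.10` the same domain at a different size.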
Running `deepxube domain_info` in a directory with `domains/grid.py` should produce information about `grid`, amongst other available domains.
## Domain Visualization

Visualizing states/goals and the domain transition function can be useful for validating them.
To accomplish this, a domain can inherit from `StateGoalVizable` to convert state/goal pairs to figures,
and inherit from `StringToAct` to be able to type actions into the command line and see how they change the state.
Running `deepxube viz --domain grid.7 --steps 10` will create a start/goal pair by taking a random walk of length 10 and visualize it.
One can vary the grid size by simply changing the number (e.g. `deepxube viz --domain grid.10 --steps 10`).
After applying an action, the transition cost and whether or not the goal is reached will be printed.
## Neural Network Inputs

`deepxube` trains heuristic functions represented as neural networks. Different kinds of neural networks expect different kinds of inputs.
By inheriting from mixins from `deepxube.base.nnet_input`, a `NNetInput` class can be dynamically created for a domain.
`Grid` inherits from `HasFlatSGActsEnumFixedIn` and implements `get_input_info_flat_sg`, `to_np_flat_sg`, and `actions_to_indices`.
From this, if a neural network expects a flat (1D) input from a state/goal pair, then a `NNetInput` class is dynamically created that tells the neural network the
dimension of the input and the number of inputs, and that converts state/goal pairs to numpy arrays.
Furthermore, if a deep Q-network is used with an output for each action, then the heuristic function is automatically modified to have the correct output dimension.
Hence the `flat_sg` and `flat_sg_actfix` entries in the domain information.
A `NNetInput` object has a `to_np` function which converts a problem instance (or a problem instance along with actions, in the case of a deep Q-network that
takes an action as input) to a list of numpy arrays. Each row of each array in the list represents a different problem instance. Therefore,
multiple arrays of different sizes can be used to represent a single problem instance, if needed.
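For illustration, a batch in this representation might look like the following (plain NumPy; the shapes and contents are made-up assumptions, not deepxube's actual encoding):

```python
import numpy as np

# A batch of three problem instances. Each instance occupies one row in
# *each* array of the list, so a single instance can be split across
# several arrays of different widths (e.g. flat state features and flat
# goal features as separate arrays).
state_feats = np.zeros((3, 49), dtype=np.float32)  # e.g. a flattened 7x7 grid state
goal_feats = np.zeros((3, 25), dtype=np.float32)   # a hypothetical goal encoding
batch = [state_feats, goal_feats]

# Every array in the list must agree on the batch dimension
assert all(arr.shape[0] == 3 for arr in batch)
```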
With this dynamic `NNetInput` creation, `Grid` can be used with deepxube's built-in `resnet_fc` heuristic function, which expects a flat input.
Custom neural network input types can also be created and registered. Given a heuristic function, `deepxube` searches for a registered `NNetInput` class that
matches its expected input. If multiple exist, it uses the first one it finds.
## Heuristics

User-defined neural networks for heuristic functions can go in the domain file itself or, if generalizable to multiple domains, should go in the `./heuristics/` folder.
`deepxube` will recursively search this directory and import all modules so that heuristic functions are registered.
A heuristic function is constructed given a neural network input of a pre-determined type, the dimensionality of the output, and a boolean indicating whether
or not the neural network represents a deep Q-network with a fixed output size. The forward portion of the neural network expects a list of Tensors
(this corresponds to the list of numpy arrays from the `NNetInput`) and returns a Tensor representing heuristic values. If the output is a fixed set of actions,
the heuristic neural network automatically obtains the corresponding indices from the output.
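Conceptually, that index lookup can be sketched with NumPy (a stand-in for the Tensor operation; the variable names and numbers are illustrative only):

```python
import numpy as np

# Network output: one cost-to-go estimate per action, for a batch of 2 states
q_out = np.array([[3.0, 1.0, 2.0],
                  [0.5, 4.0, 2.5]])

# Action index chosen for each state (e.g. as produced by an
# actions_to_indices-style mapping)
act_idxs = np.array([1, 0])

# Pick, for each row, the estimate at that row's action index
heur_vals = q_out[np.arange(len(act_idxs)), act_idxs]
print(heur_vals)  # selects 1.0 for the first state and 0.5 for the second
```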
An example of a custom neural network along with its parser is shown in `GridNet` and `GridNetParser` in `deepxube/domains/grid.py`.
## Pathfinding

The currently implemented pathfinding algorithms include graph search, which unifies batched and weighted versions of A* and Q* search, and beam search. There are other pathfinding algorithms used specifically for training a heuristic function via supervised learning, which set the target cost-to-go to the path cost obtained from a random walk (see References for more information).
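The weighted best-first priority these searches share can be sketched generically. This is a minimal toy implementation of the rule $f = g + w \cdot h$, not deepxube's batched graph search:

```python
import heapq


def weighted_search(start, goal, neighbors, h, weight=1.0):
    """Best-first search with priority f = g + weight * h(state).
    weight > 1 trades solution optimality for speed, as in weighted A*."""
    frontier = [(weight * h(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in seen:
            continue
        seen.add(state)
        for nxt, cost in neighbors(state):
            if nxt not in seen:
                g_nxt = g + cost
                heapq.heappush(frontier, (g_nxt + weight * h(nxt), g_nxt, nxt, path + [nxt]))
    return None


# Toy 1-D domain: move left/right on the integers with unit cost
nbrs = lambda s: [(s - 1, 1.0), (s + 1, 1.0)]
h = lambda s: abs(7 - s)  # admissible heuristic: distance to goal 7

print(weighted_search(0, 7, nbrs, h, weight=1.0))  # (7.0, [0, 1, 2, 3, 4, 5, 6, 7])
```

A learned heuristic plays the role of `h` here; the batched variants in deepxube additionally evaluate many frontier nodes per neural-network call.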
## Training

Training is given an update frequency $U$ and a batch size $N$, and a total of $U \cdot N$ training examples are generated per update.
When doing reinforcement-learning-based training, search is performed with the heuristic function and the given pathfinding algorithm for $I$ iterations, and all states expanded during search are added to the training set. deepxube expects the pathfinding algorithm to expand exactly one state per search iteration. Therefore, $\frac{U \cdot N}{I}$ problem instances are generated per update.
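For example, assuming `--up_itrs`, `--batch_size`, and `--search_itrs` from the quick-run training command correspond to $U$, $N$, and $I$ (a mapping inferred here, not stated in the flags themselves):

```python
U = 100  # --up_itrs: update frequency
N = 50   # --batch_size
I = 20   # --search_itrs: search iterations (= states expanded) per instance

num_train_examples = U * N    # training examples needed per update
num_instances = (U * N) // I  # problem instances searched per update

print(num_train_examples, num_instances)  # 5000 250
```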
