# <img src="./misc/images/scrambledCube.png" width="50"> DeepXube <img src="./misc/images/solvedCube.png" width="50">

Learn a domain-specific heuristic function in a domain-independent fashion to solve pathfinding problems.
Write-Up
## Overview
DeepXube (pronounced "Deep Cube") aims to solve pathfinding using a combination of deep reinforcement learning and heuristic search.
- Learn a heuristic function that maps states and goals to an estimate of the cost-to-go from the given state to the given goal.
- Use the learned heuristic function with heuristic search algorithms, such as batch weighted A* search or batch weighted Q* search, to solve problem instances.
DeepXube is a generalization of DeepCubeA (code, paper).
For any issues, you can create a GitHub issue or contact Forest Agostinelli (foresta@cse.sc.edu).
## Installation

```shell
pip install deepxube
```
See INSTALL.md for more information.
## Documentation
Documentation of all modules is here.
Command Line
deepxube can be run from the command line via the deepxube command (you may have to use python -m deepxube on Windows).
Run deepxube --help for detailed information. --help can also be run on
positional arguments (e.g. deepxube train --help).
### Quick run

There is a simple example domain and heuristic function located in `deepxube/domains/grid.py`.
After installing `deepxube`, run:
- Information
  - Domain information:
    ```shell
    deepxube domain_info
    ```
  - Heuristic information:
    ```shell
    deepxube heuristic_info
    ```
  - Pathfinding information:
    ```shell
    deepxube pathfinding_info
    ```
- Generate problem instances for solving:
  ```shell
  deepxube problem_inst --domain grid.7 --step_max 1000 --num 100 --file valid.pkl --redo
  ```
- Visualization
  - Visualize a random problem instance:
    ```shell
    deepxube viz --domain grid.7 --steps 10
    ```
  - Visualize a problem instance from a file:
    ```shell
    deepxube viz --domain grid.7 --file valid.pkl --idx 13
    ```
- Time to ensure basic functionality (can use breakpoints in your code to debug):
  - With deepxube residual neural network:
    ```shell
    deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V
    ```
  - With deepxube residual neural network and deep Q-network:
    ```shell
    deepxube time --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type QFix
    ```
  - With custom neural network:
    ```shell
    deepxube time --domain grid.7 --heur gridnet.8CH_200FC --heur_type V
    ```
- Training
  - Train heuristic function:
    ```shell
    deepxube train --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_type V --pathfind graph_v --step_max 100 --up_itrs 100 --search_itrs 20 --backup -1 --procs 1 --batch_size 50 --max_itrs 5000 --dir dummy/
    ```
  - Use tensorboard to see training progress:
    ```shell
    tensorboard --logdir=dummy/
    ```
  - Plot more detailed training information with an interactive slider for training iteration:
    ```shell
    deepxube train_summary --dir dummy
    ```
- Solving
  - Solve problem instances with an all-zero heuristic:
    ```shell
    deepxube solve --domain grid.7 --heur_type V --pathfind graph_v.1B_1.0W --file valid.pkl --results results_zeros_ex/ --redo
    ```
  - Solve problem instances with the trained heuristic:
    ```shell
    deepxube solve --domain grid.7 --heur resnet_fc.100H_2B_bn --heur_file dummy/heur.pt --heur_type V --pathfind graph_v.1B_1.0W --file valid.pkl --results results_trained_ex/ --redo
    ```
  - Solving with the trained heuristic should generate significantly fewer nodes and require significantly fewer iterations.
  - You can get the actions taken for each instance with the "actions" key in the dictionary saved in the `results.pkl` file in the `--results` directory.
  - Visualize solutions with:
    ```shell
    deepxube viz --domain grid.7 --file results_trained_ex/results.pkl --idx 3 --soln
    ```
    This prints the actions taken. You can implement `__repr__` for the action class to determine what gets printed for each action.
## Domains

User-defined domains should go in the `./domains/` folder.
`deepxube` will recursively search this directory and import all modules so that domains are registered.
`Grid` inherits from mixin classes from `deepxube.base.domain`, which give it additional functionality (see the write-up and domain documentation).
- `ActsEnumFixed`: `Grid` implements `get_actions_fixed`
  - Methods obtained: `sample_action`, `get_state_actions`, `expand`, `get_num_acts`
- `StartGoalWalkable`: `Grid` implements `sample_goal_from_state` and `sample_start_states`
  - Methods obtained: `sample_start_goal_pairs`
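As a rough, self-contained sketch of how this mixin pattern works (the classes below are simplified stand-ins written for illustration, not the actual `deepxube.base.domain` mixins, and `Line1D` is a made-up toy domain):

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class ActsEnumFixedStub(ABC):
    """Illustrative stand-in for a deepxube-style mixin: a domain that
    implements get_actions_fixed obtains expand and get_num_acts for free."""

    @abstractmethod
    def get_actions_fixed(self) -> List[Any]:
        ...

    @abstractmethod
    def next_state(self, state: Any, action: Any) -> Tuple[Any, float]:
        ...

    def get_num_acts(self) -> int:
        # Derived automatically from the fixed action set
        return len(self.get_actions_fixed())

    def expand(self, state: Any) -> List[Tuple[Any, float]]:
        # Apply every fixed action to the state
        return [self.next_state(state, a) for a in self.get_actions_fixed()]


class Line1D(ActsEnumFixedStub):
    """Toy 1-D domain: state is an integer position, actions move left/right."""

    def get_actions_fixed(self) -> List[int]:
        return [-1, +1]

    def next_state(self, state: int, action: int) -> Tuple[int, float]:
        return state + action, 1.0  # unit transition cost


dom = Line1D()
print(dom.get_num_acts())  # 2
print(dom.expand(0))       # [(-1, 1.0), (1, 1.0)]
```

The domain only supplies the two abstract methods; the batch of derived methods comes along automatically, which is the point of the mixin design.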
By using registers from `deepxube.factories.domain_factory`, `Grid` can be obtained from its name.
Furthermore, a parser can be implemented and registered to allow one to specify arguments for the constructor via the command line.
By convention, everything after '.' is considered an argument.
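The naming convention can be sketched as follows (an illustrative parser written for this write-up, not deepxube's actual factory code):

```python
def parse_spec(spec: str):
    """Split a 'name.args' spec into a registered name and its argument
    string, following the convention that everything after the first '.'
    is treated as constructor arguments."""
    name, _, args = spec.partition(".")
    return name, args


print(parse_spec("grid.7"))                # ('grid', '7')
print(parse_spec("resnet_fc.100H_2B_bn"))  # ('resnet_fc', '100H_2B_bn')
```

So `grid.7` names the `grid` domain with argument `7` (the grid size), and `grid.10` the same domain at a different size.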
Running `deepxube domain_info` in a directory with `domains/grid.py` should produce information about `grid`, amongst other available domains.
## Domain Visualization

Visualizing states/goals and the domain transition function can be useful for validating them.
To accomplish this, a domain can inherit from `StateGoalVizable` to convert state/goal pairs to figures,
and inherit from `StringToAct` to be able to type actions into the command line and see how they change the state.
Running `deepxube viz --domain grid.7 --steps 10` will create a start/goal pair by taking a random walk of length 10 and visualize it.
One can vary the grid size by simply changing the number (e.g. `deepxube viz --domain grid.10 --steps 10`).
After applying an action, the transition cost and whether or not the goal is reached will be printed.
## Neural Network Inputs

`deepxube` trains heuristic functions represented as neural networks. Different kinds of neural networks expect different kinds of inputs.
By inheriting from mixins from `deepxube.base.nnet_input`, a `NNetInput` class can be dynamically created for a domain.
`Grid` inherits from `HasFlatSGActsEnumFixedIn` and implements `get_input_info_flat_sg`, `to_np_flat_sg`, and `actions_to_indices`.
From this, if a neural network expects a flat (1D) input from a state/goal pair, then a `NNetInput` class is dynamically created that tells the neural network the
dimension of the input and the number of inputs, and that converts state/goal pairs to numpy arrays.
Furthermore, if a deep Q-network is used with an output for each action, then the heuristic function is automatically modified to have the correct output dimension.
Hence the `flat_sg` and `flat_sg_actfix` entries in the domain information.
A `NNetInput` object has a `to_np` function which converts a problem instance (or a problem instance along with actions, in the case of a deep Q-network that
takes an action as input) to a list of numpy arrays. Each row of each array in the list represents a different problem instance. Therefore,
multiple arrays of different sizes can be used to represent a single problem instance, if needed.
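For illustration, a batch in this representation might look like the following (plain NumPy; the shapes and contents are made-up assumptions, not deepxube's actual encoding):

```python
import numpy as np

# A batch of three problem instances. Each instance occupies one row in
# *each* array of the list, so a single instance can be split across
# several arrays of different widths (e.g. flat state features and flat
# goal features as separate arrays).
state_feats = np.zeros((3, 49), dtype=np.float32)  # e.g. a flattened 7x7 grid state
goal_feats = np.zeros((3, 25), dtype=np.float32)   # a hypothetical goal encoding
batch = [state_feats, goal_feats]

# Every array in the list must agree on the batch dimension
assert all(arr.shape[0] == 3 for arr in batch)
```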
With this dynamic `NNetInput` creation, `Grid` can be used with deepxube's built-in `resnet_fc` heuristic function, which expects a flat input.
Custom neural network input types can also be created and registered. Given a heuristic function, `deepxube` searches for a registered `NNetInput` class that
matches its expected input. If multiple exist, it uses the first one it finds.
## Heuristics

User-defined neural networks for heuristic functions can go in the domain file itself or, if generalizable to multiple domains, should go in the `./heuristics/` folder.
`deepxube` will recursively search this directory and import all modules so that heuristic functions are registered.
A heuristic function is constructed given a neural network input of a pre-determined type, the dimensionality of the output, and a boolean indicating whether
or not the neural network represents a deep Q-network with a fixed output size. The forward portion of the neural network expects a list of Tensors
(this corresponds to the list of numpy arrays from the `NNetInput`) and returns a Tensor representing heuristic values. If the output is a fixed set of actions,
the heuristic neural network automatically obtains the corresponding indices from the output.
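Conceptually, that index lookup can be sketched with NumPy (a stand-in for the Tensor operation; the variable names and numbers are illustrative only):

```python
import numpy as np

# Network output: one cost-to-go estimate per action, for a batch of 2 states
q_out = np.array([[3.0, 1.0, 2.0],
                  [0.5, 4.0, 2.5]])

# Action index chosen for each state (e.g. as produced by an
# actions_to_indices-style mapping)
act_idxs = np.array([1, 0])

# Pick, for each row, the estimate at that row's action index
heur_vals = q_out[np.arange(len(act_idxs)), act_idxs]
print(heur_vals)  # selects 1.0 for the first state and 0.5 for the second
```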
An example of a custom neural network along with its parser is shown in `GridNet` and `GridNetParser` in `deepxube/domains/grid.py`.
## Pathfinding

The currently implemented pathfinding algorithms include graph search, which unifies batched and weighted versions of A* and Q* search, and beam search. There are other pathfinding algorithms used specifically for training a heuristic function via supervised learning, which set the target cost-to-go to the path cost obtained from a random walk (see References for more information).
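The weighted best-first priority these searches share can be sketched generically. This is a minimal toy implementation of the rule $f = g + w \cdot h$, not deepxube's batched graph search:

```python
import heapq


def weighted_search(start, goal, neighbors, h, weight=1.0):
    """Best-first search with priority f = g + weight * h(state).
    weight > 1 trades solution optimality for speed, as in weighted A*."""
    frontier = [(weight * h(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in seen:
            continue
        seen.add(state)
        for nxt, cost in neighbors(state):
            if nxt not in seen:
                g_nxt = g + cost
                heapq.heappush(frontier, (g_nxt + weight * h(nxt), g_nxt, nxt, path + [nxt]))
    return None


# Toy 1-D domain: move left/right on the integers with unit cost
nbrs = lambda s: [(s - 1, 1.0), (s + 1, 1.0)]
h = lambda s: abs(7 - s)  # admissible heuristic: distance to goal 7

print(weighted_search(0, 7, nbrs, h, weight=1.0))  # (7.0, [0, 1, 2, 3, 4, 5, 6, 7])
```

A learned heuristic plays the role of `h` here; the batched variants in deepxube additionally evaluate many frontier nodes per neural-network call.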
## Training

Training is given an update frequency $U$ and a batch size $N$, and a total of $U \cdot N$ training examples are generated per update.
When doing reinforcement-learning-based training, search is performed with the heuristic function and the given pathfinding algorithm for $I$ iterations, and all states expanded during search are added to the training set. deepxube expects the pathfinding algorithm to expand exactly one state per search iteration. Therefore, $\frac{U \cdot N}{I}$ problem instances are generated per update.
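For example, assuming `--up_itrs`, `--batch_size`, and `--search_itrs` from the quick-run training command correspond to $U$, $N$, and $I$ (a mapping inferred here, not stated in the flags themselves):

```python
U = 100  # --up_itrs: update frequency
N = 50   # --batch_size
I = 20   # --search_itrs: search iterations (= states expanded) per instance

num_train_examples = U * N    # training examples needed per update
num_instances = (U * N) // I  # problem instances searched per update

print(num_train_examples, num_instances)  # 5000 250
```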
