Generalised Filtering in Pytorch

General framework for active Bayesian inversion of continuous hierarchical dynamic models (HDM) for inference, see e.g.,

Friston, Karl. "Hierarchical models in the brain." PLoS Comput Biol 4.11 (2008),
Roweis, Sam, and Zoubin Ghahramani. "A unifying review of linear Gaussian models." Neural computation 11.2 (1999),

and control

Friston, Karl J., et al. "Action and behavior: a free-energy formulation." Biological cybernetics 102.3 (2010)
Kappen, Hilbert J., Vicenç Gómez, and Manfred Opper. "Optimal control as a graphical model inference problem." Machine learning 87.2 (2012)

with the particular structure advocated by active inference models, compared and discussed in more detail in

Baltieri, Manuel, and Christopher L. Buckley. "On Kalman-Bucy filters, linear quadratic control and active inference." arXiv preprint arXiv:2005.06269 (2020) (for continuous time models), and
Millidge, Beren, et al. "On the Relationship Between Active Inference and Control as Inference." International Workshop on Active Inference. Springer, Cham, 2020. (for discrete time models).

The present implementation further follows the relaxed assumptions leading to the development of Friston, Karl, et al. "Generalised filtering." Mathematical Problems in Engineering (2010). The framework is also heavily inspired by spm/DEM (see Statistical Parametric Mapping).

Features

GPU support

The library is currently developed using pytorch, with the option of running large-scale simulations on GPUs. Pytorch was the first, and one of the most natural, choices at the beginning of this package's development because of the flexibility of dynamic computational graphs (the first idea for this library, and the subsequent prototype, came about before Tensorflow 2.0 was released).

The native support for automatic differentiation was also one of the main reasons to try and provide an alternative to the original code in spm/DEM which, to date, relies on numerical differentiation. Over time, however, it has become increasingly clear that pytorch and its automatic differentiation system hardly make it any easier to use specialised methods for the numerical integration of SDEs, i.e., Local Linearisation (see the section on numerical integration below), at least up to March 2021 when this file was last updated. The reason is that the method adopted here relies heavily on derivatives of vector-valued functions (i.e., Jacobians rather than gradients). Given the performance of the LL algorithm, however, it is hard to imagine switching to a different integration scheme, so it is perhaps worth investigating whether moving to a different backend (JAX?) is a better option.
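To make the gradient/Jacobian distinction concrete, here is a minimal sketch using standard pytorch calls. The functions `flow` and `scalar_energy` are hypothetical examples and are not part of this library. The gradient of a scalar function is obtained with a single backward pass, whereas the Jacobian of a vector-valued function has one row per output dimension and so typically requires more work from a reverse-mode system.

```python
import torch
from torch.autograd.functional import jacobian

# Hypothetical linear vector field f: R^2 -> R^2 (a damped oscillator).
A = torch.tensor([[0.0, 1.0], [-1.0, -0.5]])

def flow(x):
    """Vector-valued function; its derivative is a 2x2 Jacobian."""
    return A @ x

def scalar_energy(x):
    """Scalar-valued function; its derivative is a gradient vector."""
    return 0.5 * (x ** 2).sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Gradient of a scalar output: a single backward pass.
(grad,) = torch.autograd.grad(scalar_energy(x), x)

# Jacobian of a vector output: one row per output dimension.
J = jacobian(flow, x.detach())

print(grad)  # gradient of 0.5 * |x|^2 is x itself
print(J)     # Jacobian of a linear map is the matrix A
```

For a linear map the Jacobian is simply the matrix itself, which makes the result easy to check by eye. For nonlinear flows the same call applies, but the cost grows with the output dimension.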

Arbitrary embedding orders for non-Markovian continuous-time processes

The treatment of non-Markovian stochastic processes is readily handled in discrete time via 'state augmentation', a technique that converts non-Markovian variables, or rather variables that are Markov of order n (i.e., with non-zero autocorrelation), into Markovian ones (Markov of order 1) by augmenting the dimension of the state space. In continuous time, however, state augmentation can be more problematic to implement, since an infinite number of extra orders might have to be added for continuous autocorrelations. Approximations are therefore usually in place, i.e., the number of extra states is truncated. A formulation of some of these issues is provided in Friston, Karl. "Hierarchical models in the brain." PLoS Comput Biol 4.11 (2008), and a comparison between state augmentation analogous to the discrete-time case and a (linearised) Taylor expansion of non-Markovian variables is provided in the section 'State space models in generalised coordinates of motion' (here 'generalised coordinates of motion' := embedding orders). This follows classical treatments of continuous-time stochastic processes (in both time and frequency domains) found, for example, in Cox, David Roxbee, and Hilton David Miller. The theory of stochastic processes. Vol. 134. CRC press, 1977.
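As an illustration of the discrete-time case only, the sketch below rewrites a process that is Markov of order 2 as a first-order recursion on an augmented state. The AR(2) model, the coefficients, and the function names are assumptions chosen for this example and do not come from the library.

```python
import random

def simulate_ar2(a1, a2, noise, x0=0.0, x1=0.0):
    """Order-2 recursion: x_t = a1 * x_{t-1} + a2 * x_{t-2} + w_t."""
    xs = [x0, x1]
    for w in noise:
        xs.append(a1 * xs[-1] + a2 * xs[-2] + w)
    return xs

def simulate_augmented(a1, a2, noise, x0=0.0, x1=0.0):
    """Same process as an order-1 recursion on z_t = (x_t, x_{t-1})."""
    z = (x1, x0)
    xs = [x0, x1]
    for w in noise:
        # z_t = A z_{t-1} + (w_t, 0), with A = [[a1, a2], [1, 0]]
        z = (a1 * z[0] + a2 * z[1] + w, z[0])
        xs.append(z[0])
    return xs

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(50)]

direct = simulate_ar2(0.5, -0.3, noise)
augmented = simulate_augmented(0.5, -0.3, noise)

# Both formulations produce the same trajectory for the same noise.
max_gap = max(abs(a - b) for a, b in zip(direct, augmented))
print(max_gap)
```

The augmented state only needs one extra component here because the memory of the process is finite. The continuous-time difficulty described above arises precisely because no finite number of extra components is guaranteed to suffice.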

Effective numerical integration

Inspired by Friston, Karl. "Hierarchical models in the brain." PLoS Comput Biol 4.11 (2008), the numerical integration of stochastic differential equations is carried out using a method first proposed by Ozaki, Tohru. "A bridge between nonlinear time series models and nonlinear stochastic dynamical systems: a local linearization approach." Statistica Sinica (1992) (see also Jimenez, J. C., I. Shoji, and T. Ozaki. "Simulation of stochastic differential equations through the local linearization method. A comparative study." Journal of Statistical Physics 94.3 (1999): 587-602.). This method was originally devised as a bridge between (discrete-time) stochastic difference equations and (continuous-time) stochastic differential equations, but has since also gained a certain amount of success as an integration scheme that addresses the problematic nature of SDE simulations in continuous time (Kloeden, Peter Eris, Eckhard Platen, and Henri Schurz. Numerical solution of SDE through computer experiments. Springer Science & Business Media, 2012).

Local linearisation (LL) provides a robust integration scheme that makes it possible to simulate continuous-time SDEs using an approximation based on the Jacobian of the integrand between pairs of time steps (cf. the constant value used in the more basic Euler-Maruyama scheme). Some references containing more details:

The method is extremely effective, but should not be seen as a 'free lunch': the main idea is simply to discard the continuous-time formulation in favour of a discrete one with an arbitrarily small step, but crucially not a step that tends to zero so as to recover the continuous-time limit. If properties of the continuous-time formulation are required (for example, for some analytical calculation), this implementation will only provide an approximation, however useful.
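The difference between the two schemes can be sketched on a deterministic scalar example (i.e., an ODE without noise). This is a simplified illustration under stated assumptions, not the implementation used in this library: the test equation dx/dt = -k x, the step size, and the function names are all chosen for the example. The LL update used here is x_{t+h} = x_t + J^{-1} (exp(J h) - 1) f(x_t), with J the Jacobian of f evaluated at x_t, while the Euler update holds f constant over the step.

```python
import math

def euler_step(f, x, h):
    """Euler update: f is held constant over the step."""
    return x + h * f(x)

def ll_step(f, jac, x, h):
    """Scalar local linearisation update: f is linearised around x."""
    J = jac(x)
    if abs(J) < 1e-12:
        # Limit of (exp(J h) - 1) / J as J -> 0 is h (Euler update).
        return x + h * f(x)
    return x + math.expm1(J * h) / J * f(x)

# Stiff linear test equation dx/dt = -k x (assumed for illustration).
k = 50.0
f = lambda x: -k * x
jac = lambda x: -k

h, n_steps = 0.05, 20
x_euler = x_ll = 1.0
for _ in range(n_steps):
    x_euler = euler_step(f, x_euler, h)
    x_ll = ll_step(f, jac, x_ll, h)

x_exact = math.exp(-k * h * n_steps)
print(x_euler, x_ll, x_exact)
```

For a linear flow the linearisation is exact, so the LL iterate tracks the analytical solution up to floating point error, while Euler with the same step oscillates and grows in magnitude. For nonlinear flows LL is still an approximation, in line with the caveat above.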

<!-- #### Local linearisation To gain an undestanding of the basic implementation of LL, we first look at its application on an ODE example (i.e., without noise), see also sec. 9.2.4 in Ozaki, Tohru. Time series modeling of neuroscience data. CRC press, 2012. (the generalisation to SDEs is rather straightforward and can be found in section 9.2.5, while in section 9.2.6 we find an application to SDEs with driving inputs): $$ \dot{x}(t) = \frac{d x(t)}{d t} = f(x(t)) $$ with $x \in R^n$ (for n>1, see section 9.3 in Ozaki, Tohru. Time series modeling of neuroscience data. CRC press, 2012). After differentiating both sides with respect to $t$ $$\frac{d}{d t} \dot{x}(t) = \frac{d}{d t} f(x(t))$$ using the chain rule, we can rewrite the equation as $$\ddot{x}(t) = \frac{d}{d x} f(x(t)) \frac{d x(t)}{d t}$$ and with the Jacobian $J_f(x(t)) = \frac{d}{d x} f(x(t))$, obtain $$\ddot{x}(t) = J_f(x(t)) \dot{x}(t)$$Applied Stochastic Differential Equations The key assumption behin
