umbralcalculations: Technical Article Repository

Building Python simulations of limit order books

Author. Hardwick, Robert J
Date. 2021-11-14
Concept. To illustrate the basic building blocks required to construct a full microsimulation of limit order book dynamics for financial markets. The simulation is a synchronous ensemble version of the well-studied epsilon-intelligence model. This is a short post on how the simulation was conceptualised, analysing some of its outputs and outlining prospects for potential future work on price emulation.

Introduction

The simulation of financial markets through the use of agent-based models is an increasingly popular technique to understand the microstructure of their dynamics from the bottom up. See, for some examples:

Raberto et al. (2001) is a basic model of a financial market with constant total cash being conserved [1].
Alfarano, Lux & Wagner (2005) details the construction of a market microsimulation with asymmetric herding to explain the fat tails in the distribution of returns while also showing how it follows a Fokker-Planck equation up to second-order corrections [2].
Smith et al. (2006) is the original Santa Fe model for limit order book (LOB) simulations [3].
Tóth et al. (2011) extends the original Santa Fe models to become ‘\(\epsilon\)-intelligence’ models [4].
Mastromatteo, Tóth & Bouchaud (2014) develops the \(\epsilon\)-intelligence model further by including non-unit market and limit meta-orders [5].
Chen, Tan & Zheng (2015) looks at multi-level herding in agent-based market simulation. This is able to maintain both sector-level structure of markets and clustering between them at the same time as their respective temporal correlation structures [6].
Bouchaud et al. (2018) is a detailed textbook on market microstructure [7].
Chen et al. (2017) looks at the ‘information driving force’ as a way to calibrate their specific form of agent-based model empirically [8].
Marcaccioli, Bouchaud, Benzaquen (2021) shows how endogenous (self-exciting) and exogeneous (driven by external factors like sudden news breaks) price jumps belong to different dynamical classes as their relaxation profiles can be inferred to be different [9].

In this post, we’ll build an example LOB simulation which replicates many of the key empirical stylised facts in real-world financial markets.

A synchronous ensemble version of the \(\epsilon\)-intelligence model

The simulation we will design is an individual-agent version of the \(\epsilon\)-intelligence extension to the original Santa Fe model. In plate notation, where have also added blue-shaded regions to indicate when a choice of path must be taken for a given moment in time, the LOB simulation we have constructed can be summarised by the following graphs

In the above model, we separate liquidity providers in the market; who place limit orders at price \(p\) with rates \(\lambda^B_p\) and \(\lambda^A_p\) on the bid and ask side, respectively, and cancel these orders at a uniform rate of \(\nu\) for each individual unit of volume from liquidity takers; who place market orders at price \(p\) for the \(i\)-th agent at rates \(\mu^B_{pi}\) and \(\mu^A_{pi}\).

Let us denote the complete set of volumes for both bid \(B_p\equiv\sum_{i=0}^{N_{\rm pr}}B_{pi}\) and ask \(A_p\equiv\sum_{i=0}^{N_{\rm pr}}A_{pi}\) for all liquidity providing agents and at all \(p\) prices, i.e., \({\cal V} = \{ \dots, B_{pi}, \dots, A_{pi}, \dots\}\). Using this notation, an approximate Markovian LOB master equation for the distribution over volumes \(P({\cal V}, t)\) is

\[ \begin{align} \frac{{\rm d}}{{\rm d}t}P({\cal V}, t) &= \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm pr}}\bigg[ \sum^\infty_{S=1}\lambda^B_{p}({\cal V}_{B_p:B_p-S}, t) L_{i} (S\vert B_{p}-S) P({\cal V}_{B_p:B_p-S}, t) - \lambda^B_{p}({\cal V}, t) P({\cal V}, t)\bigg] \\ & + \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm pr}}\bigg[ \sum^\infty_{S=1}\lambda^A_{p}({\cal V}_{A_p:A_p-S}, t) L_{i} (S\vert A_{p}-S) P({\cal V}_{A_p:A_p-S}, t) - \lambda^A_{p}({\cal V}, t) P({\cal V}, t) \bigg] \\ & + \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm ta}}\bigg[ \sum^\infty_{S=1}\mu^B_{pi}({\cal V}_{A_p:A_{p}+S}, t) M_{i} (S\vert {\cal V}_{A_p:A_p+S}, t) P({\cal V}_{A_p:A_p+S}, t) - \mu^B_{pi}({\cal V}, t) P({\cal V}, t)\bigg] \\ & + \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm ta}}\bigg[ \sum^\infty_{S=1}\mu^A_{pi}({\cal V}_{B_p:B_{p}+S}, t) M_{i} (S\vert {\cal V}_{B_p:B_{p}+S}, t) P({\cal V}_{B_p:B_{p}+S}, t) - \mu^A_{pi}({\cal V}, t) P({\cal V}, t) \bigg] \\ & + \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm pr}}\bigg[ (B_{p}+1)\nu^B P({\cal V}_{B_p:B_{p}+1}, t) - B_{p}\nu^B P({\cal V}, t)\bigg] \\ & + \sum_{p=\theta}^\infty \sum_{i=1}^{N_{\rm pr}}\bigg[ (A_{p}+1)\nu^A P({\cal V}_{A_p:A_{p}+1}, t) - A_{p}\nu^A P({\cal V}, t)\bigg] \,, \end{align} \]

where in the above relation we have used the notation \({\cal V}_{X:Y}\) to denote the set \({\cal V}\) where the element \(X\) has been replaced with \(Y\). The limit order rates for each liquidity-providing agent are given by

\[ \begin{align} \lambda^B_{p}({\cal V}, t) &= \sum_{\delta b_-\in \{0, 1\}}(1+\alpha \delta b_-)P(\delta b_-\vert {\cal V}, t) e^{-\frac{d}{2}[a({\cal V})+b({\cal V})-2p]}\lambda^B \mathbb{1}_{a({\cal V})+b({\cal V})\geq 2p} \\ \lambda^A_{p}({\cal V}, t) &= \sum_{\delta a_+\in \{0, 1\}}(1+\alpha \delta a_+)P(\delta a_+\vert {\cal V}, t) e^{-\frac{d}{2}[2p-a({\cal V})-b({\cal V})]}\lambda^A \mathbb{1}_{2p\geq a({\cal V})+b({\cal V})} \,, \end{align} \]

where the distribution \(P(\delta b_-\vert {\cal V}, t)\) denotes the conditional probability that the best bid price decreased (\(\delta b_-=1\)) or not (\(\delta b_-=0\)) in the last transaction of the market given the current state \(({\cal V},t)\). Similarly, the distribution \(P(\delta a_+\vert {\cal V}, t)\) denotes the conditional probability that the best ask price increased (\(\delta a_+=1\)) or not (\(\delta a_+=0\)) in the last transaction of the market given the current state \(({\cal V},t)\).

The market order rates for each liquidity-taking agent are given by

\[ \begin{align} \mu^B_{pi}({\cal V}, t) &= \sum_{\epsilon_i\in\{-1, 1\}}(1+\epsilon_i)P_i(\epsilon_i\vert {\cal V}, t)\mu^B \mathbb{1}_{a({\cal V})=p} \\ \mu^A_{pi}({\cal V}, t) &= \sum_{\epsilon_i\in\{-1, 1\}}(1-\epsilon_i)P_i(\epsilon_i\vert {\cal V}, t)\mu^A \mathbb{1}_{b({\cal V})=p} \,, \end{align} \]

where \(\epsilon_i\) denotes the market order ‘sign’ of each liquidity-taking agent, which can change according to the strategy adopted. The \(\epsilon_i\) values for each of these agents are autocorrelated in time and are generated according to the power-law latent volume distribution suggested in [5].

Running simulations

The model we have introduced and analysed above has been implemented in the src/ folder of this repo: https://github.com/umbralcalc/lobsim using a synchronous ensemble rejection algorithm. All we need to do is call the necessary class structures and run them to see the LOB sim in action. The key point to remember for this algorithm to produce sensible results will be to make sure the overall holding rate is large enough to minimise the number of anachronisms in the order flow.

After running a simulation with our Python code, the dynamics of the LOB can be tracked by a mid-price time series and snapshot observations of the state of orders in the book itself. These are displayed in the plots below.

In order to investigate how well the simulated LOB matches the stylised facts of real books, we can look at the distributions of returns as well as the variogram correlations for different choices of timescale.

Future potential work on price emulation

In future, we could potentially try to build a model which compresses \(({\cal V},t)\rightarrow (b, a, t)\). To do this, we would need to make a few approximations to the expressions in the previous section in order to gain some tractability. Using the processes described in the \(P({\cal V},t)\) master equation, we could write down an approximate master equation for \(P(b, a, t)\) which assumes all price movements take value \(\theta\) (we can investigate the validity of this assumption with respect to the full LOB simulation later)

\[ \begin{align} &\frac{{\rm d}}{{\rm d}t}P(b, a, t) \simeq \\ & \sum_{p=b}^\infty \sum_{i=1}^{N_{\rm pr}} \bigg[ \tilde{\lambda}^B_{p}(b-\theta, a, t)J (\theta \vert b-\theta, a, t)P(b-\theta, a, t) - \tilde{\lambda}^B_{p}(b, a, t)P(b, a, t) \bigg] \\ & + \sum_{p=\theta}^a \sum_{i=1}^{N_{\rm pr}} \bigg[ \tilde{\lambda}^A_{p}(b, a+\theta, t)J (\theta \vert b, a+\theta, t)P(b, a+\theta, t) - \tilde{\lambda}^A_{p}(b, a, t)P(b, a, t) \bigg] \\ & + \sum_{i=1}^{N_{\rm ta}} \bigg[ \tilde{\mu}^B_{(a-\theta )i}(b, a-\theta, t) \tilde{M}_{i} (A_{a-\theta}\vert b, a-\theta, t) J (\theta \vert b, a-\theta, t) P(b, a-\theta, t) - \tilde{\mu}^B_{ai}(b, a, t) \tilde{M}_{i} (A_{a}\vert b, a, t)P(b, a, t) \bigg] \\ & + \sum_{i=1}^{N_{\rm ta}} \bigg[ \tilde{\mu}^A_{(b+\theta )i}(b+\theta, a, t) \tilde{M}_{i} (B_{b+\theta}\vert b+\theta, a, t) J (\theta \vert b+\theta, a, t) P(b+\theta, a, t) - \tilde{\mu}^A_{bi}(b, a, t) \tilde{M}_{i} (B_b\vert b, a, t)P(b, a, t) \bigg] \\ & + \sum_{i=1}^{N_{\rm pr}} \nu^A \bigg[ J (\theta \vert b, a-\theta, t) P(b, a-\theta, t) - P(b, a, t) \bigg] + \sum_{i=1}^{N_{\rm pr}} \nu^B \bigg[ J (\theta \vert b+\theta, a, t)P(b+\theta, a, t) - P(b, a, t) \bigg] \,, \end{align} \]

where \(J (\theta \vert b, a, t)\) is the conditional distribution over price jump size \(\theta\) triggered by any individual market event (one could potentially distinguish between types of triggering events, however we shall attempt to approximate this as a single overall distribution here).

From the equation above we can quickly ascertain that the means and square-expecations of the marginal distributions over \(b\) and \(a\) evolve according to the equations

\[ \begin{align} \frac{{\rm d}}{{\rm d}t}{\rm E}_t(b) &\simeq \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=b}^\infty \sum_{i=1}^{N_{\rm pr}} {\rm E} (\theta \vert b, a, t) \tilde{\lambda}^B_{p}(b, a, t)P(b, a, t) \nonumber \\ &- \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} {\rm E} (\theta \vert b, a, t)\tilde{\mu}^A_{bi}(b, a, t) \tilde{M}_{i} (B_{b}\vert b, a, t)P(b, a, t) - \nu^B{\rm E}_t(\theta) \\ \frac{{\rm d}}{{\rm d}t}{\rm E}_t(a) &\simeq \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} {\rm E} (\theta \vert b, a, t)\tilde{\mu}^B_{ai}(b, a, t) \tilde{M}_{i} (A_{a}\vert b, a, t)P(b, a, t) \nonumber \\ &+ \nu^A{\rm E}_t(\theta) -\sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=a}^\theta \sum_{i=1}^{N_{\rm pr}} {\rm E} (\theta \vert b, a, t) \tilde{\lambda}^A_{p}(b, a, t)P(b, a, t) \\ \frac{{\rm d}}{{\rm d}t}{\rm E}_t(b^2) &\simeq \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=b}^\infty \sum_{i=1}^{N_{\rm pr}} \big[ 2b{\rm E} (\theta \vert b, a, t) + {\rm E} (\theta^2 \vert b, a, t) \big] \tilde{\lambda}^B_{p}(b, a, t)P(b, a, t) \\ &- \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} \big[ 2b{\rm E}(\theta \vert b, a, t) - {\rm E} (\theta^2 \vert b, a, t) \big] \tilde{\mu}^A_{bi}(b, a, t) \tilde{M}_{i} (B_{b}\vert b, a, t)P(b, a, t) - \big[ 2{\rm E}_t(b\theta) - {\rm E}_t(\theta^2) \big] \nu^B \\ \frac{{\rm d}}{{\rm d}t}{\rm E}_t(a^2) &\simeq \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} \big[ 2a{\rm E} (\theta \vert b, a, t) + {\rm E} (\theta^2 \vert b, a, t) \big] \tilde{\mu}^B_{ai}(b, a, t) \tilde{M}_{i} (A_{a}\vert b, a, t)P(b, a, t) \\ &+ \big[ 2{\rm E}_t(a\theta) - {\rm E}_t(\theta^2) \big] \nu^A -\sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=a}^\theta \sum_{i=1}^{N_{\rm pr}} \big[ 2a{\rm E} (\theta \vert b, a, t) - {\rm E} (\theta^2 \vert b, a, t)\big] \tilde{\lambda}^A_{p}(b, a, t)P(b, a, t) \\ \frac{{\rm d}}{{\rm d}t}{\rm E}_t(ba) &\simeq \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=b}^\infty \sum_{i=1}^{N_{\rm pr}} a{\rm E} (\theta \vert b, a, t) \tilde{\lambda}^B_{p}(b, a, t)P(b, a, t) \nonumber \\ &- \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} a{\rm E} (\theta \vert b, a, t) \tilde{\mu}^A_{bi}(b, a, t) \tilde{M}_{i} (B_{b}\vert b, a, t)P(b, a, t) - \nu^B {\rm E}_t(a\theta) \\ &+ \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{i=1}^{N_{\rm ta}} b{\rm E} (\theta \vert b, a, t)\tilde{\mu}^B_{ai}(b, a, t) \tilde{M}_{i} (A_{a}\vert b, a, t)P(b, a, t) + \nu^A {\rm E}_t(b\theta) \nonumber \\ &- \sum_{b=\theta}^\infty \sum_{a=\theta}^\infty \sum_{p=a}^\theta \sum_{i=1}^{N_{\rm pr}} b{\rm E} (\theta \vert b, a, t) \tilde{\lambda}^A_{p}(b, a, t)P(b, a, t) \,. \end{align} \]

These moments could be used to construct a price time series emulator. For example, one could replace \(P(b, a, t)\) with a Gaussian process approximation.

References

[1]

M. Raberto, S. Cincotti, S. M. Focardi, and M. Marchesi, “Agent-based simulation of a financial market,” Physica A: Statistical Mechanics and its Applications, vol. 299, no. 1–2, pp. 319–327, 2001.

[2]

S. Alfarano, T. Lux, and F. Wagner, “Estimation of agent-based models: The case of an asymmetric herding model,” Computational Economics, vol. 26, pp. 19–49, 2005.

[3]

E. Smith, J. D. Farmer, L. Gillemot, and S. Krishnamurthy, “Statistical theory of the continuous double auction,” Quantitative finance, vol. 3, no. 6, p. 481, 2003.

[4]

B. Tóth, Y. Lemperiere, C. Deremble, J. De Lataillade, J. Kockelkoren, and J.-P. Bouchaud, “Anomalous price impact and the critical nature of liquidity in financial markets,” Physical Review X, vol. 1, no. 2, p. 021006, 2011.

[5]

I. Mastromatteo, B. Toth, and J.-P. Bouchaud, “Agent-based models for latent liquidity and concave price impact,” Physical Review E, vol. 89, no. 4, p. 042805, 2014.

[6]

J.-J. Chen, L. Tan, and B. Zheng, “Agent-based model with multi-level herding for complex financial systems,” Scientific Reports, vol. 5, no. 1, p. 8399, 2015.

[7]

J.-P. Bouchaud, J. Bonart, J. Donier, and M. Gould, Trades, quotes and prices: Financial markets under the microscope. Cambridge University Press, 2018.

[8]

T.-T. Chen, B. Zheng, Y. Li, and X.-F. Jiang, “New approaches in agent-based modeling of complex financial systems,” Frontiers of Physics, vol. 12, pp. 1–12, 2017.

[9]

R. Marcaccioli, J.-P. Bouchaud, and M. Benzaquen, “Exogenous and endogenous price jumps belong to different dynamical classes,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2022, no. 2, p. 023403, 2022.

Cite. You can cite this article using the following BibTeX:

@article{lobsim-2021,
  title = {Building Python simulations of limit order books},
  author = {Hardwick, Robert J},
  journal = {umbralcalculations (umbralcalc.github.io/posts/lobsim.html)},
  year = {2021},
}

Code. The code for this article was developed here: https://github.com/umbralcalc/lobsim.
License. This article is shared by the author under an MIT License.