umbralcalculations: Technical Article Repository



Useful state partitions for real-world simulations



Author. Hardwick, Robert J
Date. 2024-08-30
Concept. To provide some practical examples of the real-world simulation types which are supported in the stochadex engine by describing a group of widely-applicable state partitions. In particular, we discuss how these partitions can be useful in simulating everything from sports matches and spatial disease spread to traffic networks and supply chain logistics. With these examples (and many others) in mind, we also consider the realistic types of observation and interaction which are possible in each case.


Entity state transitions

In this article we’re going to define some widely-applicable state partitions which are useful in developing simulations of real-world systems. These will both help illustrate how partitioning the state can aid in conceptualising the phenomena one wishes to simulate, and provide some practical insight into how the stochadex may be configured for different purposes.

We begin with the entity state transition, which refers to state transitions of any individual ‘entity’ that occur stochastically according to their respective transition rates. These transition rates may themselves be time-varying (even stochastically), so it is useful to hold their values in a separate state partition and create a direct dependency channel on them, as in the rough schematic below.

Note how this computational structure is slightly more generic than (but related to) the event-based simulation schematics in [1].
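To make this structure concrete, here is a minimal Go sketch of it. This is illustrative code only, with made-up rates and names rather than an actual stochadex configuration: one function plays the role of the (time-varying) rates partition, on which the entity state partition depends through a direct channel.

package main

import (
	"fmt"
	"math"
	"math/rand"
)

// transitionRates plays the role of the rates partition: it returns the
// rate of transitioning into each state given the current entity state.
// Here the rates vary deterministically in time, but they could equally
// be iterated stochastically in their own partition.
func transitionRates(state int, t float64) []float64 {
	rates := []float64{0.5 * (1.0 + math.Sin(t)), 0.2, 0.1}
	rates[state] = 0.0 // no self-transitions
	return rates
}

func main() {
	rng := rand.New(rand.NewSource(42))
	state, t, dt := 0, 0.0, 0.01
	for step := 0; step < 1000; step++ {
		t += dt
		// For small dt, each transition fires with probability rate*dt.
		for next, rate := range transitionRates(state, t) {
			if rng.Float64() < rate*dt {
				fmt.Printf("t=%.2f: entity transitioned %d -> %d\n", t, state, next)
				state = next
				break
			}
		}
	}
}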

Observations of the entity state transition in the real world typically take the form of either partial or noisy detections of the state transition times themselves over some period. Interactions with systems which require this kind of partitioning take the form of either direct changes to the entity state itself at some points in time or modifications to the rates at which state transitions occur.

Since this kind of partitioning applies so generally, it seems less useful to enumerate all of the real-world problems which might use it; it will be more informative to discuss how these same examples apply in the context of the other, more specifically applicable partitions. Having said this, it’s worth noting that our event-based representation of state transitions can also be trivially adapted to avoid the need for a continuous-time representation of the system, as sketched below. Applications of state transition models which only require sequential ordering (but not a continuous time variable) include sequential experimental design problems, e.g., for astronomical telescopes (see [2] and [3]) and biological experiments [4].
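Under the same caveats as before (hypothetical transition probabilities, purely illustrative code), the sequential adaptation simply drops the continuous time variable and lets the event index alone order the transitions:

package main

import (
	"fmt"
	"math/rand"
)

func main() {
	rng := rand.New(rand.NewSource(1))
	// Hypothetical per-event transition probabilities, indexed
	// [from][to]; each row sums to one. No continuous time variable
	// is needed: the event index alone orders the transitions.
	probs := [][]float64{
		{0.7, 0.2, 0.1},
		{0.3, 0.5, 0.2},
		{0.1, 0.4, 0.5},
	}
	state := 0
	for event := 1; event <= 10; event++ {
		u, cum := rng.Float64(), 0.0
		for next, p := range probs[state] {
			cum += p
			if u < cum {
				state = next
				break
			}
		}
		fmt.Printf("event %d: state=%d\n", event, state)
	}
}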

Weighted mean points

The weighted mean point performs a weighted average over a specified collection of neighbouring states. Given that one of the more natural use cases for this partition is spatial field averaging, the topology of the subgraph is typically totally connected and highly structured; however, some connections matter more than others, according to the weighting. We have created a rough schematic below.

In the case of spatial fields, you can think of each point as being structured topologically in a kind of ‘lattice’ configuration where connections to other points are controlled indirectly by the relationship between states and their weighted point averages over time. Different distances in the lattice can contribute different importance weights in affecting each local average.
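As an illustration, here is a minimal Go sketch (again with hypothetical names, field values and weights, not stochadex code) of a weighted mean point over a 1D lattice, where nearer lattice points carry larger importance weights:

package main

import "fmt"

// weightedMean averages the field values at lattice offsets around a
// centre point, weighting nearer neighbours more heavily; offsets that
// fall outside the lattice are simply ignored.
func weightedMean(field []float64, centre int, weights map[int]float64) float64 {
	var sum, norm float64
	for offset, w := range weights {
		idx := centre + offset
		if idx < 0 || idx >= len(field) {
			continue
		}
		sum += w * field[idx]
		norm += w
	}
	return sum / norm
}

func main() {
	// A 1D spatial field of raw state values.
	field := []float64{1, 2, 4, 8, 16}
	// Importance weights by lattice distance from the centre point.
	weights := map[int]float64{-1: 0.25, 0: 0.5, 1: 0.25}
	for i := range field {
		fmt.Printf("point %d: weighted mean = %.2f\n", i, weightedMean(field, i, weights))
	}
}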

Which real-world control problems would this partition be useful for? Given the natural spatial interpretation, the kinds of simulation that would leverage it are:

- Epidemic spread through populations and the spatial targeting of interventions against disease [5], [6]
- Forest wildfire dynamics models built from satellite images [7]
- Ecosystem management and conservation decision-making [8]
- Irrigation decision-making for crops using weather forecasts [9]
- Real-time stormwater system control and flood mitigation [10]

Observations of the weighted mean point in the real world typically take the form of either partial or noisy detections of the raw state values before averaging. Actors in systems which require this kind of partitioning could be public health or wildlife/national park authorities as well as livestock/crop farmers. The interactions with these systems would therefore focus on modifying the parameters for spatial detection of disease or damage and changing a subset of the population states directly through interventions.

Node histograms

The node histogram counts the frequencies of state occupations exhibited by all of the specified connected states. This partition provides a summary of information about a single network node which exists as part of a larger ‘state network’, and can be combined with other partitions of the same type to represent any desirable connectivity structure. We have illustrated how it works in the rough schematic below.
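A minimal Go sketch of this counting operation (assuming a simple global state vector and an illustrative neighbourhood, not the stochadex API) could be:

package main

import "fmt"

// nodeHistogram counts how many of the given node's connected
// neighbours currently occupy each discrete state value.
func nodeHistogram(states []int, neighbours []int, numStates int) []int {
	counts := make([]int, numStates)
	for _, n := range neighbours {
		counts[states[n]]++
	}
	return counts
}

func main() {
	// Global vector of node states, with values in {0, 1, 2}.
	states := []int{0, 2, 1, 1, 2, 0}
	// Indices of the nodes connected to the node of interest; composing
	// several such histogram partitions represents a whole network.
	neighbours := []int{1, 2, 4, 5}
	fmt.Println(nodeHistogram(states, neighbours, 3)) // prints [1 1 2]
}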

Which real-world control problems would this partition be useful for? If we consider networks which rely on counting the frequencies of neighbouring node states, the kinds of simulation that would leverage it are:

- Brain stimulation treatments and disease progression models in neuroscience [11], [12], [13]
- Traffic signal control over road networks [14]
- Power grid dispatch and resilience [15]
- Maintenance planning for networked infrastructure [16]

Observations of the node histogram in the real world typically take the form of either partial or noisy detections of the counts. Actors in systems which require this kind of partitioning could be a neurologist, traffic light controller or even city infrastructure maintainer. In all cases, interactions with these systems would typically involve directly changing the state of some subset of nodes in the network itself.

Pipeline stage state histograms

The pipeline stage state histogram counts the frequencies of entity types which exist in a particular stage of some pipeline. These partitions can be connected together in a directed subgraph to represent a multi-stage pipeline structure. We’ve provided a rough schematic below.
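Here is a small Go sketch (illustrative types and flows, not the stochadex implementation) of two connected pipeline stages, where each stage keeps a histogram of entity types and interactions modify the flow between stages:

package main

import "fmt"

// Stage keeps a histogram of entity types currently occupying one
// stage of the pipeline.
type Stage struct {
	Name   string
	Counts map[string]int
}

// advance moves up to n entities of a given type into the next stage;
// interactions with such systems typically modify these relative flows.
func advance(from, to *Stage, entityType string, n int) {
	if from.Counts[entityType] < n {
		n = from.Counts[entityType]
	}
	from.Counts[entityType] -= n
	to.Counts[entityType] += n
}

func main() {
	intake := &Stage{Name: "intake", Counts: map[string]int{"order": 5, "return": 2}}
	dispatch := &Stage{Name: "dispatch", Counts: map[string]int{}}
	advance(intake, dispatch, "order", 3)
	fmt.Println(intake.Counts, dispatch.Counts) // map[order:2 return:2] map[order:3]
}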

Which real-world control problems would this partition be useful for? If we think about multi-stage pipelines whose future states depend on the frequencies of entity types which exist at each stage, the following real-world examples come to mind:

- Logistics and supply (or relief) chain management [17], [18]
- Hospital capacity and logistics planning [19]
- User interface design optimisation [20]
- Data pipeline optimisation for machine learning systems [21]

Observations of the pipeline stage state histogram in the real world typically take the form of either partial or noisy detections of the entity stage transition events in time and/or the frequency counts in the stage itself. Actors in systems which require this kind of partitioning could be a supply/relief chain controller, hospital logistics manager, data pipeline maintainer or even software engineer. In all cases, interactions with these systems would likely involve directly modifying the relative flows between different pipeline stages.

Centralised entity interactions

Centralised entity interactions divide the representation of the system state into a collection of ‘entity state’ partitions and some partition of ‘centralised state’ upon which interactions between entities can depend. The subgraph topology is hence a star configuration where every entity state is connected to the centralised state, but not necessarily to each other. We have provided a rough schematic for the structure below.
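As a rough Go sketch (with an entirely hypothetical interaction rule), entities in this star configuration might each read from, and contribute back to, a single centralised state like this:

package main

import "fmt"

// Entity holds the state of one entity partition; entities never read
// each other's states directly.
type Entity struct{ Position float64 }

// centralState computes the shared partition that all entities depend
// on; here it is just the mean position, standing in for something
// like a market price or a ball position in a match.
func centralState(entities []Entity) float64 {
	var sum float64
	for _, e := range entities {
		sum += e.Position
	}
	return sum / float64(len(entities))
}

func main() {
	entities := []Entity{{0}, {4}, {8}}
	for step := 0; step < 3; step++ {
		centre := centralState(entities)
		// Each entity relaxes towards the centralised state, so all
		// interactions flow through the star topology's hub.
		for i := range entities {
			entities[i].Position += 0.5 * (centre - entities[i].Position)
		}
		fmt.Printf("step %d: centre = %.2f, entities = %v\n", step, centre, entities)
	}
}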

Which real-world control problems would this partition be useful for? Dividing the state up into a collection of entity states and some centralised state can be useful in a variety of settings. In particular, we can think of:

- Sports matches, where players interact through a centralised state of play [22], [23], [24]
- Financial, betting and other markets, where traders interact through an exchange [25], [26], [27], [28]
- Housing markets, where buyers and sellers interact through a central market state [29], [30]
- Peer-to-peer prosumer energy markets [31]

Observations of the centralised entity interactions in the real world typically take the form of either partial or noisy detections of the states and state changes. Actors in systems which require this kind of partitioning could be sports team managers, financial/betting/other market traders or market exchange mediators. The interactions with these systems would therefore typically focus on changing which entities are present, changing their parameters and/or changing the parameters of the centralised state iteration.

References

[1]
R. J. Hardwick, “Building a generalised simulation engine,” umbralcalculations (umbralcalc.github.io/posts/stochadexI.html), 2024.
[2]
P. Jia, Q. Jia, T. Jiang, and J. Liu, “Observation strategy optimization for distributed telescope arrays with deep reinforcement learning,” The Astronomical Journal, vol. 165, no. 6, p. 233, 2023.
[3]
S. Yatawatta and I. M. Avruch, “Deep reinforcement learning for smart calibration of radio telescopes,” Monthly Notices of the Royal Astronomical Society, vol. 505, no. 2, pp. 2141–2150, 2021.
[4]
N. J. Treloar, N. Braniff, B. Ingalls, and C. P. Barnes, “Deep reinforcement learning for optimal experimental design in biology,” PLOS Computational Biology, vol. 18, no. 11, p. e1010695, 2022.
[5]
A. Q. Ohi, M. Mridha, M. M. Monowar, and M. A. Hamid, “Exploring optimal control of epidemic spread using reinforcement learning,” Scientific Reports, vol. 10, no. 1, p. 22106, 2020.
[6]
R. Carter, K. N. Mendis, and D. Roberts, “Spatial targeting of interventions against malaria,” Bulletin of the World Health Organization, vol. 78, pp. 1401–1411, 2000.
[7]
S. Ganapathi Subramanian and M. Crowley, “Using spatial reinforcement learning to build forest wildfire dynamics models from satellite images,” Frontiers in ICT, vol. 5, p. 6, 2018.
[8]
M. Lapeyrolerie, M. S. Chapman, K. E. Norman, and C. Boettiger, “Deep reinforcement learning for conservation decisions,” Methods in Ecology and Evolution, vol. 13, no. 11, pp. 2649–2662, 2022.
[9]
M. Chen et al., “A reinforcement learning approach to irrigation decision-making for rice using weather forecasts,” Agricultural Water Management, vol. 250, p. 106838, 2021.
[10]
S. M. Saliba, B. D. Bowes, S. Adams, P. A. Beling, and J. L. Goodall, “Deep reinforcement learning with uncertain data for real-time stormwater system control and flood mitigation,” Water, vol. 12, no. 11, p. 3222, 2020.
[11]
M. Lu, X. Wei, Y. Che, J. Wang, and K. A. Loparo, “Application of reinforcement learning to deep brain stimulation in a computational model of Parkinson’s disease,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 1, pp. 339–349, 2019.
[12]
J. Pineau, A. Guez, R. Vincent, G. Panuccio, and M. Avoli, “Treating epilepsy via adaptive neurostimulation: A reinforcement learning approach,” International Journal of Neural Systems, vol. 19, no. 4, pp. 227–240, 2009.
[13]
K. Saboo, A. Choudhary, Y. Cao, G. Worrell, D. Jones, and R. Iyer, “Reinforcement learning based disease progression model for Alzheimer’s disease,” Advances in Neural Information Processing Systems, vol. 34, pp. 20903–20915, 2021.
[14]
K.-L. A. Yau, J. Qadir, H. L. Khoo, M. H. Ling, and P. Komisarczuk, “A survey on reinforcement learning models and algorithms for traffic signal control,” ACM Computing Surveys (CSUR), vol. 50, no. 3, pp. 1–38, 2017.
[15]
Q. Li et al., “Integrating reinforcement learning and optimal power dispatch to enhance power grid resilience,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 3, pp. 1402–1406, 2021.
[16]
Z. A. Bukhsh, H. Molegraaf, and N. Jansen, “A maintenance planning framework using online and offline deep reinforcement learning,” Neural Computing and Applications, pp. 1–12, 2023.
[17]
Y. Yan, A. H. Chow, C. P. Ho, Y.-H. Kuo, Q. Wu, and C. Ying, “Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities,” Transportation Research Part E: Logistics and Transportation Review, vol. 162, p. 102712, 2022.
[18]
L. Yu, C. Zhang, J. Jiang, H. Yang, and H. Shang, “Reinforcement learning approach for resource allocation in humanitarian logistics,” Expert Systems with Applications, vol. 173, p. 114663, 2021.
[19]
S. S. Shuvo, M. R. Ahmed, H. Symum, and Y. Yilmaz, “Deep reinforcement learning based cost-benefit analysis for hospital capacity planning,” in 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021, pp. 1–7.
[20]
J. D. Lomas et al., “Interface design optimization as a multi-armed bandit problem,” in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 4142–4153.
[21]
K. Nagrecha, L. Liu, P. Delgado, and P. Padmanabhan, “InTune: Reinforcement learning-based data pipeline optimization for deep recommendation models,” in Proceedings of the 17th ACM Conference on Recommender Systems, 2023, pp. 430–442.
[22]
M. Pulis and J. Bajada, “Reinforcement learning for football player decision making analysis,” in StatsBomb conference, 2022.
[23]
T. Sawczuk, A. Palczewska, and B. Jones, “Markov decision processes with contextual nodes as a method of assessing attacking player performance in rugby league,” in Advances in Computational Intelligence Systems: Contributions Presented at the 20th UK Workshop on Computational Intelligence, September 8-10, 2021, Aberystwyth, Wales, UK, Springer, 2022, pp. 251–263.
[24]
N. Ding, K. Takeda, and K. Fujii, “Deep reinforcement learning in a racket sport for player evaluation with technical and tactical contexts,” IEEE Access, vol. 10, pp. 54764–54772, 2022.
[25]
T. G. Fischer, “Reinforcement learning in financial markets - a survey,” FAU Discussion Papers in Economics, 2018.
[26]
T. L. Meng and M. Khushi, “Reinforcement learning in financial markets,” Data, vol. 4, no. 3, p. 110, 2019.
[27]
D. Cliff, “BBE: Simulating the microstructural dynamics of an in-play betting exchange via agent-based modelling,” arXiv preprint arXiv:2105.08310, 2021.
[28]
A. Dangi, “Financial portfolio optimization: Computationally guided agents to investigate, analyse and invest!?” arXiv preprint arXiv:1301.4194, 2013.
[29]
B. Yilmaz and A. Selcuk-Kestel, “A stochastic approach to model housing markets: The US housing market case,” Numerical Algebra, Control and Optimization, vol. 8, no. 4, 2018.
[30]
A. Carro, M. Hinterschweiger, A. Uluc, and J. D. Farmer, “Heterogeneous effects and spillovers of macroprudential policy in an agent-based model of the UK housing market,” Industrial and Corporate Change, vol. 32, no. 2, pp. 386–432, 2023.
[31]
R. May and P. Huang, “A multi-agent reinforcement learning approach for investigating and optimising peer-to-peer prosumer energy markets,” Applied Energy, vol. 334, p. 120705, 2023.

Cite. You can cite this article using the following BibTeX:
@article{stochadexIV-2024,
  title = {Useful state partitions for real-world simulations},
  author = {Hardwick, Robert J},
  journal = {umbralcalculations (umbralcalc.github.io/posts/stochadexIV.html)},
  year = {2024},
}
Code. The code for this article was developed here: https://github.com/umbralcalc/stochadex.
License. This article is shared by the author under an MIT License.