Simulating Real-World Systems as a Programmer

A collection of posts on the foundations and patterns for building simulations of the real world, written especially for programmers and non-technical readers who want to learn the fundamentals. All written material and non-interactive diagrams were human-generated, while some interactive elements were programmed using generative AI tools.

    Architectures for current and future hardware

    Classical hardware

    When we talk about ‘classical’ hardware here, we just mean standard CPUs.

    On CPUs, simulation architectures may be constructed out of several ingredients. Loosely speaking, these are: Memory, Threads, Channels between Threads, Processes and Inter-Process Communication (IPC).

    Each of these ingredients comes with its own performance tradeoffs, but all of them are useful when constructing a simulation architecture that suits a particular use case.

    In the previous posts, the simulation architectures we have mainly considered are Stepwise: they evaluate the Next State Values of the system at each point in Time, one after another.
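    A minimal sketch of a Stepwise loop, assuming a list-of-floats state and a hypothetical exponential-decay update rule (`decay`, integrated with explicit Euler steps); neither comes from the earlier posts. The point is only the shape of the loop: each Next State Value is computed from the one before it.

```python
from typing import Callable, List

State = List[float]


def simulate_stepwise(initial_state: State,
                      next_state: Callable[[State, float], State],
                      dt: float,
                      num_steps: int) -> List[State]:
    """Evaluate the Next State Values one Timestep at a time."""
    trajectory = [initial_state]
    state = initial_state
    for _ in range(num_steps):
        # Each step depends only on the state produced by the previous step.
        state = next_state(state, dt)
        trajectory.append(state)
    return trajectory


def decay(state: State, dt: float) -> State:
    """Hypothetical update rule: dx/dt = -k * x, one explicit Euler step."""
    k = 0.5
    return [x - k * x * dt for x in state]


trajectory = simulate_stepwise([1.0, 2.0], decay, dt=0.1, num_steps=3)
```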

    Stepwise simulation architectures on CPUs are typically more performant when using Memory, Threads and Channels between Threads in the right combinations.
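    One common way to combine these ingredients is a producer/consumer pipeline: one Thread advances the simulation stepwise and sends each new state down a Channel, while another Thread consumes those states (for logging, analysis or rendering), so the simulation loop only waits if the consumer falls too far behind. The sketch below uses Python's `threading` and `queue.Queue` as the Channel; the update rule and the function names are hypothetical. In CPython the global interpreter lock stops pure-Python threads from computing in parallel, so this illustrates the structure rather than the speed-up you would see with free-running threads.

```python
import queue
import threading
from typing import List, Optional, Tuple

Record = Tuple[int, float]  # (Timestep, state value)


def simulation_thread(channel: queue.Queue, x0: float, k: float,
                      dt: float, num_steps: int) -> None:
    """Producer: evaluates Next State Values stepwise and sends each one down the Channel."""
    x = x0
    for step in range(1, num_steps + 1):
        x = x - k * x * dt        # hypothetical update rule (explicit Euler decay)
        channel.put((step, x))    # blocks only if the bounded Channel is full
    channel.put(None)             # sentinel: the Simulation Run has finished


def recorder_thread(channel: queue.Queue, records: List[Record]) -> None:
    """Consumer: takes states off the Channel (a stand-in for logging or analysis)."""
    while True:
        item: Optional[Record] = channel.get()
        if item is None:
            break
        records.append(item)


def run(x0: float = 1.0, k: float = 0.5, dt: float = 0.1,
        num_steps: int = 100) -> List[Record]:
    # Bounded Channel, so a slow consumer applies back-pressure to the producer.
    channel: queue.Queue = queue.Queue(maxsize=16)
    records: List[Record] = []  # shared Memory, written only by the consumer thread
    producer = threading.Thread(target=simulation_thread,
                                args=(channel, x0, k, dt, num_steps))
    consumer = threading.Thread(target=recorder_thread, args=(channel, records))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return records
```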

    In contrast, Processes (and IPC in particular) are typically more useful when scaling computations in parallel across multiple non-interacting simulation Trajectories, which need little IPC. This is because IPC comes with more performance limitations than sharing Memory or Channels between Threads within a single Process.
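    A sketch of that pattern, assuming each Trajectory is a hypothetical seeded random walk: every worker process runs a whole Trajectory on its own and only the final value is sent back, so IPC is limited to one small message per Trajectory. `run_trajectory` and `run_ensemble` are illustrative names; `run_ensemble` also runs sequentially when no executor is passed, which is handy for checking results.

```python
import random
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional


def run_trajectory(seed: int, num_steps: int = 1_000) -> float:
    """One independent Trajectory: a hypothetical Gaussian random walk.

    Each Trajectory owns a seeded RNG, so nothing is shared between
    processes and the result is reproducible."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(num_steps):
        x += rng.gauss(0.0, 1.0)
    # Only this final float crosses the process boundary (minimal IPC).
    return x


def run_ensemble(seeds: List[int], executor: Optional[Executor] = None,
                 num_steps: int = 1_000) -> List[float]:
    """Run one Trajectory per seed, in parallel if an executor is given.

    Results come back in the same order as `seeds`."""
    mapper = map if executor is None else executor.map
    return list(mapper(run_trajectory, seeds, [num_steps] * len(seeds)))


if __name__ == "__main__":
    # The __main__ guard matters: on platforms that start workers by
    # spawning a fresh interpreter, this module is re-imported in each one.
    with ProcessPoolExecutor(max_workers=4) as pool:
        finals = run_ensemble(list(range(8)), pool)
    print(finals)
```

    Because the Trajectories never interact, adding workers scales the ensemble without adding any communication between them.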

    Batch simulation architectures, on the other hand, evaluate the Next State Values for the system at multiple successive Timesteps, spanning a wider interval in Time, as one computational block.

    Despite appearances, Batch simulation architectures cannot fundamentally evaluate the Next State Values at different Timesteps in a truly parallel fashion: simulations must still preserve the causal relationships between these Next State Values as they progress in Time.

    To preserve this causality, some form of Iteration can be performed within the batch, just as the Stepwise architecture implies by evaluating each Timestep recursively from the previous one.
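    A minimal sketch of a Batch architecture that preserves causality by iterating inside each block, assuming a hypothetical decay rule as a stand-in for a real model. On an accelerator, `evaluate_block` would typically be a single kernel launch or compiled function; here it is plain Python. Because each block starts from the last state of the previous block, the result matches a Stepwise run step for step.

```python
from typing import List


def step(x: float, k: float = 0.5, dt: float = 0.1) -> float:
    """Hypothetical update rule (explicit Euler decay)."""
    return x - k * x * dt


def evaluate_block(x_start: float, block_size: int) -> List[float]:
    """Batch: produce `block_size` successive Next State Values in one call.

    Inside the block we still iterate, because Timestep t + 1 needs the
    result of Timestep t."""
    block = []
    x = x_start
    for _ in range(block_size):
        x = step(x)
        block.append(x)
    return block


def simulate_batched(x0: float, num_steps: int, block_size: int) -> List[float]:
    """Cover the Simulation Timeline block by block; each block is seeded
    with the last state of the previous one."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    timeline = [x0]
    while len(timeline) - 1 < num_steps:
        remaining = num_steps - (len(timeline) - 1)
        timeline.extend(evaluate_block(timeline[-1], min(block_size, remaining)))
    return timeline
```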

    However, it is sometimes sufficient simply to encode the causal/temporal dependencies between State Values along the Simulation Timeline as part of a Batch prediction, which is how some Machine Learning models are used to predict time series data.
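    As a toy illustration (not a Machine Learning model), consider a hypothetical linear rule x_{t+1} = a * x_t. Its temporal dependency can be written out explicitly as x_t = a^t * x_0, so every State Value in a block can be predicted from the initial state alone, with no dependence on its neighbours; the entries could therefore be computed in parallel. A learned time series model can be thought of as approximating this kind of mapping when no exact formula is available.

```python
from typing import List


def predict_block(x0: float, a: float, horizon: int) -> List[float]:
    """Predict x_1 .. x_horizon for the hypothetical rule x_{t+1} = a * x_t.

    The causal dependency is encoded in the formula x_t = a**t * x0, so each
    entry depends only on x0 and t, never on a neighbouring entry."""
    return [x0 * a ** t for t in range(1, horizon + 1)]


def iterate_block(x0: float, a: float, horizon: int) -> List[float]:
    """The same block evaluated by Iteration, for comparison."""
    out = []
    x = x0
    for _ in range(horizon):
        x = a * x
        out.append(x)
    return out
```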

    Example: Stepwise vs Batch Simulation Timeline

    Specialised classical hardware

    Seen from the CPU side, Batch simulation architectures are often designed to hand segments of the Simulation Timeline to specialised classical hardware for evaluation.

    When we talk about ‘specialised classical’ hardware here, we mean GPUs, TPUs, IPUs and other specialised processors based on classical computing principles (as opposed to quantum processors).

    This architecture can reduce the overall processing time needed to complete a Simulation Run relative to a Stepwise equivalent, but there are tradeoffs which mean it isn’t always more efficient.

    GPUs, TPUs, IPUs, etc. all have their limitations. For example, GPUs and TPUs are highly optimised for dense arithmetic operations but struggle with branching control flow. IPUs offer more flexibility for irregular compute patterns and sparse operations, though they still prioritise throughput over the complex sequential logic that CPUs handle well.
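    To see why branching hurts, here is a simplified model of how lockstep (SIMD-style) hardware often handles a data-dependent `if`: both branches are evaluated for every element and a mask selects the result, similar to what happens when threads in a GPU warp diverge or when vectorised code uses a select operation. The update rule is hypothetical and real hardware and compilers differ in the details; the sketch simply counts branch evaluations to show the wasted work.

```python
from typing import List, Tuple


def branchy_update(xs: List[float]) -> List[float]:
    """CPU style: each element evaluates exactly one branch."""
    return [x * 0.5 if x > 1.0 else x + 1.0 for x in xs]


def masked_update(xs: List[float]) -> Tuple[List[float], int]:
    """Lockstep style: evaluate BOTH branches for every element, then
    select per element with a mask.

    Returns the results and the number of branch evaluations performed."""
    mask = [x > 1.0 for x in xs]
    if_branch = [x * 0.5 for x in xs]    # computed for every lane
    else_branch = [x + 1.0 for x in xs]  # also computed for every lane
    result = [t if m else e for m, t, e in zip(mask, if_branch, else_branch)]
    return result, len(if_branch) + len(else_branch)
```

    A CPU running `branchy_update` does one branch evaluation per element; the masked version does two, and the waste grows with more (or nested) branches.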

    So there are certain types of simulation algorithm for which GPUs, TPUs, IPUs, etc. are not well-suited to reducing the overall processing time.

    In addition, this specialised hardware typically requires data transfer to/from CPU Memory (at the very least for initialisation and final results), which also takes processing time.

    So, when deciding on the number of Timesteps a Batch simulation architecture should use for the best performance, software engineers must take into account:

    Example: Batch size tradeoffs
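    A back-of-the-envelope model of the tradeoff, in which every cost term is an assumption for illustration rather than a measured figure: each batch pays a fixed overhead (kernel launch and transfer setup), every Timestep pays a per-step cost for compute and copying results back to CPU Memory, the batch must fit in device memory, and larger batches delay the first results reaching the CPU. Larger batches amortise the fixed overhead, while the memory and latency limits cap how large they can be. All function and parameter names are hypothetical.

```python
import math


def run_time_s(total_steps: int, batch_size: int,
               launch_overhead_s: float, per_step_s: float) -> float:
    """Estimated wall-clock time for a Simulation Run split into batches."""
    num_batches = math.ceil(total_steps / batch_size)
    # Fixed cost paid once per batch + per-Timestep compute/transfer cost.
    return num_batches * launch_overhead_s + total_steps * per_step_s


def first_result_latency_s(batch_size: int, launch_overhead_s: float,
                           per_step_s: float) -> float:
    """Time until the CPU receives the first batch of results."""
    return launch_overhead_s + batch_size * per_step_s


def best_batch_size(total_steps: int, max_batch_in_memory: int,
                    max_latency_s: float, launch_overhead_s: float,
                    per_step_s: float) -> int:
    """Smallest batch size achieving the lowest total run time within the
    device-memory and first-result-latency limits."""
    feasible = [
        b for b in range(1, min(total_steps, max_batch_in_memory) + 1)
        if first_result_latency_s(b, launch_overhead_s, per_step_s) <= max_latency_s
    ]
    if not feasible:
        raise ValueError("no batch size satisfies the memory and latency limits")
    return min(feasible,
               key=lambda b: run_time_s(total_steps, b, launch_overhead_s, per_step_s))
```

    With a generous latency budget the device-memory cap decides the batch size; tightening the budget pushes the choice towards smaller batches and more launch overhead.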

    Quantum hardware

    Note that the concepts in this section are the most likely to change with future advancements in Quantum Computing.

    Quantum hardware seems to fit the Batch simulation architecture naturally, in the same way that specialised classical hardware does.

    In order to utilise this hardware within a given Batch evaluation, one would need to:

    Note also that the No-Cloning Theorem means we cannot simply copy the state of the Qubits after the quantum gates have been applied; the circuit must be run separately for each simulation Trajectory.

    Therefore, you only get a Quantum Advantage if you can store more than one Timestep’s worth of simulation Next State Values in Qubit Memory.

    Otherwise, if you effectively have only one instantaneous Timestep of Qubit Memory to use, the processing time will likely be dominated by I/O: writing to and reading from the Qubits during the simulation. This is known as the Quantum I/O Bottleneck.