Research Article

Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning


Science  06 Sep 2019:
Vol. 365, Issue 6457, eaaw1147
DOI: 10.1126/science.aaw1147

Efficient sampling of equilibrium states

Molecular dynamics or Monte Carlo methods can be used to sample equilibrium states, but these methods become computationally expensive for complex systems, where the transition from one equilibrium state to another may only occur through rare events. Noé et al. used neural networks and deep learning to generate distributions of independent soft condensed-matter samples at equilibrium (see the Perspective by Tuckerman). Supervised training is used to construct invertible transformations between the coordinates of the complex system of interest and simple Gaussian coordinates of the same dimensionality. Thus, configurations can be sampled in this simpler coordinate system and then transformed back into the complex one using the correct statistical weighting.

Science, this issue p. eaaw1147; see also p. 982

Structured Abstract

INTRODUCTION

Statistical mechanics aims to compute the average behavior of physical systems on the basis of their microscopic constituents. For example, what is the probability that a protein will be folded at a given temperature? If we could answer such questions efficiently, then we could not only comprehend the workings of molecules and materials, but we could also design drug molecules and materials with new properties in a principled way.

To this end, we need to compute statistics of the equilibrium states of many-body systems. In the protein-folding example, this means to consider each of the astronomically many ways to place all protein atoms in space, to compute the probability of each such “configuration” in the equilibrium ensemble, and then to compare the total probability of unfolded and folded configurations.

As enumeration of all configurations is infeasible, one instead must attempt to sample them from their equilibrium distribution. However, we currently have no way to generate equilibrium samples of many-body systems in “one shot.” The main approach is thus to start with one configuration, e.g., the folded protein state, and make tiny changes to it over time, e.g., by using Markov-chain Monte Carlo or molecular dynamics (MD). However, these simulations get trapped in metastable (long-lived) states: For example, sampling a single folding or unfolding event with atomistic MD may take a year on a supercomputer.

RATIONALE

Here, we combine deep machine learning and statistical mechanics to develop Boltzmann generators. Boltzmann generators are trained on the energy function of a many-body system and learn to provide unbiased, one-shot samples from its equilibrium state. This is achieved by training an invertible neural network to learn a coordinate transformation from a system’s configurations to a so-called latent space representation, in which the low-energy configurations of different states are close to each other and can be easily sampled. Because of the invertibility, every latent space sample can be back-transformed to a system configuration with high Boltzmann probability (Fig. 1). We then employ statistical mechanics, which offers a rich set of tools for reweighting the distribution generated by the neural network to the Boltzmann distribution.

RESULTS

Boltzmann generators can be trained to directly generate independent samples of low-energy structures of condensed-matter systems and protein molecules. When initialized with a few structures from different metastable states, Boltzmann generators can generate statistically independent samples from these states and efficiently compute the free-energy differences between them. This capability could be used to compute relative stabilities between different experimental structures of protein or other organic molecules, which is currently a very challenging problem. Boltzmann generators can also learn a notion of “reaction coordinates”: Simple linear interpolations between points in latent space have a high probability of corresponding to physically realistic, low-energy transition pathways. Finally, by using established sampling methods such as Metropolis Monte Carlo in the latent space variables, Boltzmann generators can discover new states and gradually explore state space.

CONCLUSION

Boltzmann generators can overcome rare event–sampling problems in many-body systems by learning to generate unbiased equilibrium samples from different metastable states in one shot. They differ conceptually from established enhanced sampling methods, as no reaction coordinates are needed to drive them between metastable states. However, by applying existing sampling methods in the latent spaces learned by Boltzmann generators, a plethora of new opportunities opens up to design efficient sampling methods for many-body systems.

Boltzmann generators overcome sampling problems between long-lived states.

The Boltzmann generator works as follows: 1. We sample from a simple (e.g., Gaussian) distribution. 2. An invertible deep neural network is trained to transform this simple distribution into a distribution pX(x) that is similar to the desired Boltzmann distribution of the system of interest. 3. To compute thermodynamic quantities, the samples are reweighted to the Boltzmann distribution using statistical mechanics methods.

Abstract

Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples in “one shot,” vast computational effort is invested for simulating these systems in small steps, e.g., using molecular dynamics. Combining deep learning and statistical mechanics, we developed Boltzmann generators, which are shown to generate unbiased one-shot equilibrium samples of representative condensed-matter systems and proteins. Boltzmann generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free-energy differences and discovery of new configurations are demonstrated, providing a statistical mechanics tool that can avoid rare events during sampling without prior knowledge of reaction coordinates.

Statistical mechanics is concerned with computing the average behavior of many copies of a physical system based on its microscopic constituents and their interactions. For example, what is the average magnetization in an Ising model of interacting magnetic spins? Or what is the probability that a protein is folded as a function of temperature? Under a wide range of conditions, the equilibrium probability of a microscopic configuration x (setting of all spins, positions of all protein atoms, etc.) is proportional to e^{−u(x)}, for example, the well-known Boltzmann distribution. The dimensionless energy u(x) contains the potential energy of the system, the temperature, and, optionally, other thermodynamic quantities.

Except for simple model systems, we presently have no approach to directly draw “one-shot,” i.e., statistically independent, samples x from Boltzmann-type distributions to compute statistics of the system, such as free-energy differences. Therefore, one currently must rely on trajectory methods such as Markov chain Monte Carlo (MCMC) or molecular dynamics (MD) simulations that make tiny changes to x in each step. These methods sample from the Boltzmann distribution in the long run, but many simulation steps are needed to produce a statistically independent sample. This is because complex systems often have metastable (long-lived) phases or states and the transitions between them are rare events; for example, 10^9 to 10^15 MD simulation steps are needed to fold or unfold a protein. As a result, MCMC and MD methods are extremely expensive and consume much of the worldwide supercomputing resources.

A common approach to enhance sampling is to speed up rare events by biasing user-defined order parameters, or “reaction coordinates” (RCs), that may be of a mechanical (1–4), thermodynamic (5–7), or alchemical nature (8, 9). Applying these techniques to high-dimensional systems with a priori unknown transition mechanisms is challenging, as identifying suitable order parameters and avoiding rare events in other, unbiased directions becomes extremely difficult. For example, the development of enhanced simulation protocols for the binding of small drug molecules to proteins has become a research area in its own right (10).

In this study, we set out to develop a “Boltzmann generator” machine that is trained on a given energy function u(x) and then produces unbiased one-shot samples from e^{−u(x)}, circumventing the sampling problem without requiring any knowledge of RCs. At first sight, this enterprise seems hopeless for condensed-matter systems and complex polymers. In these systems, strongly repulsive particles are densely packed, such that the number of low-energy configurations is vanishingly small compared with the number of possible ways to place the particles.

Key to the solution is combining the strengths of deep machine learning (11) and statistical mechanics (Fig. 1A). We train a deep invertible neural network to learn a coordinate transformation from x to a so-called “latent” representation z, in which the low-energy configurations of different states are close to each other and can be easily sampled, e.g., using a Gaussian normal distribution. Enhancing MD sampling by user-defined coordinate transformations has been proposed previously (12). The novelty of Boltzmann generators is that this transformation is learned and, owing to the deep transformation network, can be as complicated as needed to represent state changes in the many-body system. As Boltzmann generators are invertible, every sample z can be back-transformed to a configuration x with high Boltzmann probability. We can improve the ability to find relevant parts of configuration space by “learning from example,” where the potential energy u(x) used to train the Boltzmann generator is complemented by relevant samples x, e.g., from the folded or unfolded state of a protein, but without knowing the probabilities of these states. Then, we use statistical mechanics, which offers a rich set of tools to generate the target distribution e^{−u(x)} when the proposal distribution is sufficiently similar.

Fig. 1 Boltzmann generators.

(A) A Boltzmann generator is trained by minimizing the difference between its generated distribution and the desired Boltzmann distribution. Generation proceeds by drawing “latent” space samples z from a simple prior distribution (e.g., Gaussian) and transforming them to configurations x. The variable transformation is formed by stacking invertible transformations f1,...,fn to a deep neural network Fzx and its inverse, Fxz. To compute thermodynamics, such as configurational free energies, the samples must be reweighted to the Boltzmann distribution. (B) A Boltzmann generator is composed of invertible neural network blocks. Here, a non–volume-preserving transformation block is shown as an example.

This study demonstrates that Boltzmann generators can be trained to generate low-energy structures of condensed matter systems and protein molecules in one shot, as shown for model systems and a millisecond-timescale conformational change of the bovine pancreatic trypsin inhibitor (BPTI) protein. When the Boltzmann generator is initialized with a few structures from different metastable states, it can generate statistically independent samples from these states and can compute the free-energy profiles of the corresponding transitions without suffering from rare events. Although Boltzmann generators do not require RCs, they can be included in the training to sample continuous free-energy profiles and low-probability states. When trained in this way, Boltzmann generators can also generate physically realistic transition pathways by performing simple linear interpolations in latent space. We also show that multiple independent Boltzmann generators, trained on disconnected MD or MCMC simulations of different states, can be used to compute free-energy differences between these states in a direct and inexpensive way and without requiring any RCs. Finally, we demonstrate that when using established sampling methods such as Metropolis Monte Carlo in the latent space of a Boltzmann generator, efficient methods can be constructed to find new states and gradually explore state space.

Boltzmann generators

Neural networks that can draw statistically independent samples from a desired distribution are called directed generative networks (13, 14). Such networks have been demonstrated to generate photorealistic images (15), to produce deceivingly realistic speech audio (16), and even to sample formulae of chemical compounds with certain physicochemical properties (17). In these domains, the exact target distribution is not known and the network is “trained by example” using large databases of images, audio, or molecules. Here, we are in the inverse situation, as we can compute the Boltzmann weight of each generated sample x, but we do not have samples from the Boltzmann distribution a priori. The idea of Boltzmann generators is as follows (Fig. 1A):

1) A neural network transformation Fzx is learned such that when sampling z from a simple prior, e.g., a Gaussian normal distribution, Fzx(z) will provide a configuration x that has a high Boltzmann weight, i.e., is coming from a distribution pX(x) that is similar to the target Boltzmann distribution.

2) To obtain an unbiased sample and to compute Boltzmann-weighted averages, the generated distribution pX(x) is reweighted to the Boltzmann distribution e^{−u(x)}. This can be achieved with various algorithms; here, the simplest one is used: assign the statistical weight w(x) = e^{−u(x)}/pX(x) to every sample x and then compute desired statistics, such as free-energy differences, using this weight.

For both training and reweighting, it is important that we can compute the probability pX(x) of generating a configuration x. This can be achieved when Fzx is an invertible transformation, which allows us to transform the known prior distribution pZ(z) to pX(x) (Fig. 1A, materials and methods) (18, 19). Physically, invertible transformations are analogous to flows of a fluid that transform the probability density from configuration space to latent space, or vice versa. Volume-preserving transformations, comparable to incompressible fluids, were introduced in (19). Here, we use the non–volume-preserving transformations introduced in (20) (Fig. 1B), as they allow the probability distribution to be scaled differently at different parts of configuration space. Alternatively, Boltzmann generators can be built using more general invertible transformations (21–23). Invertibility is achieved by adopting special neural network architectures (Fig. 1B; materials and methods). Multiple trainable invertible “blocks” can be stacked, thus encoding complicated variable transformations in the form of a deep invertible neural network (Fig. 1A).

Boltzmann generators are trained by combining two modes: training by energy and training by example. Training by energy is the main principle behind Boltzmann generators and proceeds as follows: We sample random vectors z from a Gaussian prior distribution and then transform them through the neural network into proposal configurations, x = Fzx(z). In this way, the Boltzmann generator generates configurations from a proposal distribution pX(x), which initially will be very different from the Boltzmann distribution and will include configurations with very high energies. Next, we compute the difference between the generated distribution pX(x) and the Boltzmann distribution, whose statistical weights e^{−u(x)} are known. For Boltzmann generators, a natural measure of this difference is the relative entropy, or Kullback–Leibler (KL) divergence. The KL divergence can be computed as the following expectation value over samples z (materials and methods):

JKL = Ez[u(Fzx(z)) − log Rzx(z)]   (1)

Here, u(Fzx(z)) is the energy of the generated configuration. Rzx is the determinant of the Boltzmann generator’s Jacobian matrix and measures how much the network scales the configuration space volume at z. The invertible network layers are designed such that Rzx can be easily computed (materials and methods). We treat JKL as a loss function: To train the Boltzmann generator, we approximate JKL using a batch of ~1000 samples and then change the neural network parameters so as to decrease JKL. A few hundred or thousand such iterations are required to train the Boltzmann generator for the examples in this study. The resulting few million evaluations of the potential energy in Eq. 1 are the main computational investment and take between 1 min and a few hours for the present systems.
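To make Eq. 1 concrete, the following minimal sketch evaluates the training-by-energy loss for a toy one-dimensional double well. The affine map, the energy function, and all names in the snippet are illustrative stand-ins for a trained deep invertible network, not the implementation used in this work.

```python
# Minimal sketch of the "training by energy" loss (Eq. 1) for a toy
# one-dimensional double well. The affine map x = a*z + b stands in for the
# deep invertible network; all names here are illustrative.
import numpy as np

def double_well(x):
    return 0.25 * x**4 - 3.0 * x**2              # reduced energy u(x)

def F_zx(z, a, b):
    return a * z + b                             # toy invertible transformation

def log_R_zx(z, a, b):
    return np.log(np.abs(a)) * np.ones_like(z)   # log |det Jacobian| of F_zx

def kl_loss(a, b, batch_size=1000, seed=0):
    z = np.random.default_rng(seed).standard_normal(batch_size)  # Gaussian prior
    x = F_zx(z, a, b)
    # JKL = Ez[ u(Fzx(z)) - log Rzx(z) ]   (Eq. 1)
    return np.mean(double_well(x) - log_R_zx(z, a, b))

print(kl_loss(a=1.0, b=0.0))    # higher loss: samples sit on the barrier
print(kl_loss(a=0.5, b=2.4))    # lower loss: samples fall into one well
```

In an actual Boltzmann generator, the two fixed parameters would be replaced by the weights of the invertible network and updated by gradient descent on this loss.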

The KL divergence (Eq. 1) is equivalent to the free-energy difference of transforming the Gaussian prior distribution into the generated distribution (materials and methods, supplementary materials): The first term, Ez[u(Fzx(z))], is the mean potential energy, i.e., the internal energy of the system. The second term, −Ez[log Rzx(z)], is equal to the entropic contribution to the free energy at the chosen temperature plus a constant. The two terms in Eq. 1 counteract each other in an interesting way: the first term favors low energies and therefore trains the Boltzmann generator to sample low-energy structures, whereas the second term tries to maximize the entropy of the generated distribution and therefore prevents so-called mode collapse (13), i.e., the repetitive sampling of a single minimum-energy configuration that would minimize the first term alone.

Despite the entropy term in Eq. 1, training by energy alone is not sufficient, as it tends to focus sampling on the most stable metastable state (fig. S2). We therefore additionally use training by example, the standard training method used in other machine learning applications, implemented here with the maximum likelihood (ML) principle. We initialize the Boltzmann generator with some “valid” configurations x, e.g., from short initial MD simulations or experimental structures, and transform them to latent space via z = Fxz(x). Maximizing their likelihood under the Gaussian prior distribution corresponds to minimizing the loss function (18, 19):

JML = Ex[(1/2)‖Fxz(x)‖² − log Rxz(x)]   (2)

Here, the first term, (1/2)‖Fxz(x)‖², is the energy of a harmonic oscillator corresponding to the Gaussian prior distribution. Training by example is especially useful in the early stages of training, as it helps the Boltzmann generator to focus on relevant parts of state space.

By combining training by energy and training by example, we can sample configurations that have high probabilities and low free energies. However, sometimes we want to sample states with low equilibrium probabilities, such as transition states along a certain RC whose free-energy profile is of interest. For this purpose, we introduce an RC loss that can optionally be used to enhance the sampling of a Boltzmann generator along a chosen RC (materials and methods).

Results

Illustration on model systems

We first illustrate Boltzmann generators using two-dimensional model potentials that have metastable states separated by high energy barriers: the double-well potential and the Mueller potential (Fig. 2, A and G). MD simulations stay in one metastable state for a long time before a rare transition event occurs. Hence, the distributions in configuration space (x1, x2) are split into two modes (Fig. 2, A and G; transition state and intermediate state ensembles are shown in yellow for clarity but are not used for training). We train Boltzmann generators using the two short, disconnected simulations whose samples are shown in Fig. 2, A and G (details in supplementary materials, convergence in fig. S1). Figure 2, B and H, shows the latent spaces learned by the Boltzmann generator; note that their exact appearance varies between runs because of stochasticity in neural network training. In both latent spaces, the probability densities of the two states and the transition/intermediate states are “repacked” to form a density concentrated around the origin.

Fig. 2 Application of Boltzmann generators on two-dimensional bistable systems.

(A and G) Two-dimensional potentials: double well (x1 is the slow coordinate) and Mueller potential. Two short MD simulation trajectories (blue, red) stay in their metastable states without crossing. Transition state and intermediate state ensembles are shown (orange) but were not used for training. (B and H) Latent-space distribution of trajectories shown in (A) and (G) when mapped through trained Fxz. (C and I) Potential energy distribution sampled by MD simulation (gray) and by Boltzmann generators trained by energy and by example (KL+ML, green) and using RC training (KL+RC, orange). (D and J) Boltzmann generator sample distribution along the slow coordinates. For the Mueller potential, xproj is defined as projection along the vector (1,–1). (E and K) Free-energy estimates obtained from Boltzmann generator samples after reweighting. (F and L) Paths generated by linear interpolation in Boltzmann generator latent space (B and H) between random pairs of “blue” and “red” MD samples.

We use the Boltzmann generators by sampling from their latent spaces according to Gaussian distributions. After transforming these variables via Fzx, this produces uncorrelated and low-energy samples from both stable states without any sampling problem (Fig. 2, C, D, I, and J). A variety of training methods succeed in sampling across the barrier such that the rare event nature of the system is eliminated (fig. S2). Using a Boltzmann generator trained by energy and by example with simple reweighting reproduces the precise free-energy differences of the two metastable states, although no RC is used to indicate the direction of the rare event (Fig. 2, E and K, green). By additionally training with the RC loss to promote sampling along x1 (double well) or x2 (Mueller potential), the low-probability transition states are sampled (Fig. 2, D and J, orange) and the full free-energy profile can be reconstructed with high precision (Fig. 2, E and K, orange).

The Boltzmann generator repacks the high-probability regions of configuration space into a concentrated latent space density. We therefore wondered about the physical interpretation of direct paths in latent space. Specifically, we interpolate linearly between the latent space representations of samples from different energy minima, similar to what is done with generative networks in other disciplines (22, 17). When mapping these linear interpolations back to configuration space, they result in nonlinear pathways that have low energies and high probabilities (Fig. 2, F and L). Although there is no general guarantee that linear paths in latent space will result in low energies, this result indicates that the latent spaces learned by Boltzmann generators can be used to provide candidates of order parameters for bias-enhanced or path-based sampling methods (1, 3, 24).
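The following sketch illustrates the latent interpolation procedure; the two toy functions only stand in for a trained pair Fxz/Fzx, and all names and values are illustrative.

```python
# Sketch of a linear interpolation in latent space between two configurations,
# mapped back to configuration space. The sinh/arcsinh pair is only a
# placeholder for a trained Boltzmann generator transformation.
import numpy as np

F_xz = lambda x: np.arcsinh(x)     # placeholder for the trained transformation
F_zx = lambda z: np.sinh(z)        # its exact inverse

def latent_path(x_a, x_b, n_steps=20):
    """Interpolate linearly in latent space and map back to configurations."""
    z_a, z_b = F_xz(x_a), F_xz(x_b)
    lam = np.linspace(0.0, 1.0, n_steps)[:, None]
    z_path = (1.0 - lam) * z_a + lam * z_b
    return F_zx(z_path)            # generally a nonlinear path in x space

x_a = np.array([-2.5, 0.0])        # e.g., a sample from the "blue" state
x_b = np.array([+2.5, 0.0])        # e.g., a sample from the "red" state
print(latent_path(x_a, x_b, n_steps=5))
```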

For the double-well system, the unbiased MD simulation needs, on average, 4 × 10^6 MD steps for a single return trip between the two states (supplementary materials), and ~100 such crossings are required to compute the free-energy difference with the same precision as the Boltzmann generator results shown in Fig. 2E. The total effort of training the Boltzmann generator (including generating the initial simulation data) corresponds to ~10^6 steps, but once this is done, statistically independent samples can be generated at no significant cost. For this simple system, the Boltzmann generator is therefore about a factor of 100 more efficient than direct simulation, but much more extreme savings can be observed for complex systems, as shown below.

Thermodynamics of condensed-matter systems

As a second example, we demonstrate that Boltzmann generators can sample high-probability structures and efficiently compute the thermodynamics in crowded condensed-matter systems. We simulated a dense system of two-dimensional particles confined to a box, as suggested in (25) (Fig. 3A). Immersed in a fluid is a bistable particle dimer whose open and closed states are separated by a high barrier (Fig. 3B). Opening or closing the dimer directly is not possible due to the high density of the system but requires concerted rearrangement of solvent particles. At close distances, particles repel each other strongly, resulting in a crowded system. Thus, the fraction of low-energy configurations is vanishingly small, and manually designing a sampling method that simultaneously places all 38 particles and achieves low energies appears unfeasible.

Fig. 3 Repulsive particle system with bistable dimer.

(A) Closed (blue) and open (red) configurations from MD simulations (input data). (B) Bistable dimer potential. (C) Distribution of MD simulation data in latent space coordinates z1,z2 after training the Boltzmann generator. (D) Potential energy distribution from MD (gray) and Boltzmann generator for closed (blue), open (red), and transition configurations (yellow). Insets show one-shot Boltzmann generator samples. (E) Free-energy differences as a function of dimer distance and relative temperature sampled with Boltzmann generators (generation and reweighting, green bullets with intervals indicating one standard error from 10 independent repeats) and umbrella sampling (black lines). (F) Linear latent space interpolation between the closed and open structures shown in the top row.

We trained a Boltzmann generator to sample one-shot low-energy configurations and used it to compute the free-energy profiles of opening or closing the dimer. Key to treating explicit-solvent systems such as this one is to incorporate the indistinguishability of solvent molecules. If physically identical solvent molecules were to be distinguished, then every exchange of solvent molecule positions due to diffusion would represent a new configuration, resulting in an enormous configuration space even for this 38-particle system. We therefore removed identical-particle permutations from all configurations input into or sampled by the Boltzmann generator by exchanging particle labels to minimize the distance to a reference configuration (supplementary materials).
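One way to realize such a relabeling is to solve a linear assignment problem between the particles of a configuration and those of the reference; the sketch below uses the Hungarian algorithm for this purpose. Whether this matches the exact procedure of the supplementary materials is an assumption; the snippet only illustrates the idea.

```python
# Sketch of removing identical-particle permutations: relabel particles so
# that the summed squared distance to a reference configuration is minimal.
# The Hungarian algorithm (linear_sum_assignment) is one exact way to do this;
# the actual procedure is described in the supplementary materials.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def relabel_to_reference(x, x_ref):
    """Permute the rows of x (n_particles x dim) to best match x_ref."""
    cost = cdist(x_ref, x, metric="sqeuclidean")   # pairwise squared distances
    _, col_ind = linear_sum_assignment(cost)       # optimal particle assignment
    return x[col_ind]

x_ref = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x     = np.array([[0.1, 0.9], [0.0, 0.1], [1.1, 0.0]])   # same particles, shuffled
print(relabel_to_reference(x, x_ref))                    # rows reordered to match x_ref
```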

The training is initialized with examples from separate, disconnected simulations of the open and closed states, but in later stages, training by energy (Eq. 1) dominates (supplementary materials, fig. S1, and table S1). The trained Boltzmann generator has learned a transformation of the complex configuration space density to a concentrated, 76-dimensional ball in latent space (Fig. 3C). Indeed, direct sampling from a 76-dimensional Gaussian in latent space and transformation via Fzx generates configurations in which all particles are placed without significant clashes and whose potential energies overlap with the energy distribution of the unbiased MD trajectories (Fig. 3D). Also, realistic transition states that have not been included in any training data are sampled (Fig. 3D, middle).

To demonstrate the computation of thermodynamic quantities, we perform training by energy (Eq. 1) simultaneously at a range of temperatures (supplementary materials). Although the temperature changes the configuration space distribution in a complex way, it can be modeled as a simple scaling factor in the width of the Gaussian prior distribution in latent space (materials and methods). Then, we sample the Boltzmann generator over a range of temperatures and use simple reweighting to compute the free energies along the dimer distances. As shown in Fig. 3E, these temperature-dependent free energies agree precisely with extensive umbrella-sampling simulations that use bias potentials along the dimer distance ((1), supplementary materials).

We estimate that the MD simulation needs at least 10^12 steps to spontaneously sample a single transition from the closed to the open state and back (supplementary materials), and ~100 such transitions would be needed to compute free-energy differences with the precision of the Boltzmann generators shown in Fig. 3E. The total effort to train the Boltzmann generator is ~3 × 10^7 energy evaluations, but then statistically independent samples can be drawn in one shot over the entire temperature range trained, resulting in at least seven orders of magnitude speed-up compared with MD.

As above, we perform linear interpolations between the latent space representations of open- and closed-dimer samples. A significant fraction of all pair interpolations results in low-energy pathways. The lowest-energy interpolation of 100 randomly selected pairs of end states is shown in Fig. 3F, representing a physically meaningful rearrangement of the dimer and solvent.

Exploring configuration space

In the previous examples, Boltzmann generators were used to sample known regions of configuration space and compute statistics thereof. Here, we demonstrate that Boltzmann generators can help to explore configuration space. The basic idea is as follows: we construct an exploratory sampling method using an established sampling algorithm in latent space while simultaneously training the Boltzmann generator transformation using the configurations found so far.

We initialize the method with a (possibly small) set of configurations X. Training is done here by minimizing the symmetric loss function J=JKL+JML (Eqs. 1 and 2). The likelihood loss function JML is initially biased by the input data, but as X approaches an unbiased Boltzmann sample, the symmetric loss converges to a meaningful distance of probabilities (materials and methods). As an example, here, we use Metropolis Monte Carlo in the Boltzmann generator latent space to update X (materials and methods). The step size is chosen adaptively but reaches the order of the latent space distribution width. Thus, large-scale configuration transitions in physical space can be overcome in a single Monte Carlo step.

We now revisit the three previous examples and initialize X with only a single input configuration from the most stable state (Fig. 4, A, D, and G). The exploration method quickly fills the local metastable states and finds new metastable states within a few 10^5 energy calls, i.e., orders of magnitude faster than direct MD (Fig. 4, B, E, and H). This demonstrates that Boltzmann generators sample new, previously unseen states with a significant probability, and that this ability can be turned into exploring configuration space when past samples are stored and reused for training.

Fig. 4 Exploration with Boltzmann generators from a single snapshot.

(A to C) Double-well potential, (D to F) Mueller potential, and (G to I) solvated particle dimer. (A, D, and G) Starting configuration. (C, F, and I) Convergence of the loss terms (JML and JKL) and the MCMC efficiency (product of step length and acceptance rate). (B, E, and H) Evolution of the sample distribution over MCMC iterations. As soon as sufficient density is available in the states of interest, these distributions can be reweighted to equilibrium as in Figs. 2 and 3. iter, iterations; acc, acceptance.

The Metropolis Monte Carlo method causes the sample to converge toward the Boltzmann distribution. However, we do not need to wait for this method to be converged because, with sufficient samples in the states of interest, the equilibrium free energies can be computed by reweighting as in Figs. 2 and 3 above. Although new states are found, the data-based loss JML may increase and decrease again while the Boltzmann generator transformation is updated to include these new states (Fig. 4, C, F, and I, top row). During training, the energy-based loss JKL decreases steadily until the full Boltzmann distribution is sampled (Fig. 4, C, F, and I, middle row). We also observe that the Metropolis Monte Carlo efficiency, defined by the product of step length and acceptance rate, tends to increase over time, although it may decrease temporarily when more states are found (Fig. 4, C, F, and I, bottom row).

Due to the invertible transformation between latent and configuration space, any sampling method that involves reweighting or Monte Carlo acceptance steps can be reformulated in Boltzmann generator latent space and potentially yield enhanced performance.

Complex molecules

We demonstrate that Boltzmann generators can generate equilibrium all-atom structures of macromolecules in one shot using BPTI in an implicit solvent model (Fig. 5 and supplementary materials). To train Boltzmann generators for complex molecular models, we wrapped the energy and force computation functions of the OpenMM simulation software (26) in the standard deep learning library Tensorflow (27).

Fig. 5 One-shot sampling of all-atom structures in different conformations of the BPTI protein.

(A) Boltzmann generator for macromolecules: Backbone atoms are whitened using PCA; side-chain atoms are described in normalized internal coordinates (crds). (B) BPTI x-ray crystal structure (PDB: 5PTI). Cysteine disulfide bridges and aromatic residues are shown for orientation. (C) One-shot Boltzmann generator sample of all 892 atoms (2670 dimensions) of the BPTI protein similar to the x-ray structure. (D) Potential energy distribution from MD simulation (gray) and Boltzmann generator one-shot samples (blue). (E) Distribution of bonds and angles compared between MD simulation (black) and Boltzmann generator (blue). (F) Representative snapshots of four clusters of structures generated with the Boltzmann generator. Backbone root mean square deviation from the x-ray structure is given below the structure (in angstroms). Marked are the x-ray–like structure “X” and the open structure “O.” (G and H) Magnification of the most variable parts of the Boltzmann-generated samples from the “X” and “O” states. Side chains are shown in atomistic resolution.

Training a Boltzmann generator directly on the Cartesian coordinates resulted in large energies and unrealistic structures with distorted bond lengths and angles. This problem was solved by incorporating the following coordinate transformation in the first layer of the Boltzmann generator that is invertible up to rotation and translation of the molecule (Fig. 5A and materials and methods): the coordinates are split into Cartesian and internal coordinate sets. The Cartesian coordinates include heavy atoms of the backbone and the disulfide bridges. Cartesian coordinates are whitened, i.e., decorrelated and normalized, using a principal component analysis (PCA) of the input data. During whitening, the six degrees of freedom corresponding to global translation and rotation of the molecule are discarded. The remaining side-chain atoms are measured in internal coordinates (bond lengths, angles, and torsions with respect to parent atoms) and subsequently normalized.

After this coordinate transformation, the learning problem is substantially simplified, as the transformed input data are already nearly Gaussian distributed. We first demonstrate that Boltzmann generators can learn to sample heterogeneous equilibrium structures when trained with examples from different configurations. To this end, we generated six short simulations of 20 ns each, starting from snapshots of the well-known 1-ms simulation of BPTI produced on the Anton supercomputer (28). In the more common situation that no ultralong MD trajectory is available, the Boltzmann generator can instead be seeded with different crystallographic structures or homology models. To promote simultaneous sampling of high- and low-probability configurations, we included an RC loss using two slow collective coordinates that have been used earlier in the analysis of this BPTI simulation (supplementary materials) (29, 30). Boltzmann generators for BPTI that do not use RCs are discussed in the subsequent section.

A trained Boltzmann generator with eight invertible blocks can sample all 892 atom positions (2676 dimensions) in one shot and produce locally and globally valid structures (Fig. 5C). The potential energies of samples exhibit significant overlap with the potential energy distribution of MD simulations (Fig. 5D), and thus samples can be reweighted for free-energy calculations. The probability distributions of most bond lengths and valence angles are almost indistinguishable from the distributions of the equilibrium MD simulations (Fig. 5E). The only exception is that the distributions of valence angles involving sulfur atoms are slightly narrower.

The trained Boltzmann generator learns to encode and sample significantly different structures (Fig. 5F). In particular, it generates independent one-shot samples of the near-crystallographic structure “X” (1.4-Å mean backbone RMSD to the crystal structure) and the open “O” structure, which involves significant changes in flexible loops and repacking of side chains (Fig. 5, G and H). The X→O transition has been sampled only once in the millisecond Anton trajectory, which is consistent with the observation of a millisecond-timescale “major–minor” state transition observed in nuclear magnetic resonance spectroscopy (31). We note that X↔O transition states are not included in the Boltzmann generator training data.

Sampling such a transition and collecting statistics for it is challenging for any existing simulation method. Brute-force MD simulations would require several milliseconds of simulation data. To use enhanced sampling, an order parameter or RC able to drive the complex rearrangement shown in Fig. 5, G and H, would need to be found, but because BPTI has multiple states with lifetimes on the order of ~10 to 100 μs (28, 30), the simulation time required for convergence would still be extensive. The computation of free-energy differences using Boltzmann generators will be discussed in the next section.

Thermodynamics between disconnected states

We develop a reaction-coordinate–free approach to compute free-energy differences from disconnected MD or MCMC simulations in separated states, such as two conformations of a protein. As demonstrated above, this can be achieved by a single Boltzmann generator that simultaneously captures multiple metastable states and maps them to the same latent space Z, where they are connected via the Gaussian prior distribution (Figs. 2, B and H, and 3C). However, a more direct statistical mechanics idea that has been successfully applied to certain simple liquids and solids is to compute free-energy differences by relating to a tractable reference state, e.g., an ideal gas or crystal (32–34). Here, we show that Boltzmann generators can turn this idea into a general method applicable to complex many-body systems.

Recall that the value of the energy-loss function JKL (Eq. 1) estimates the free-energy difference of transforming the Gaussian prior distribution into the generated distribution in configuration space. If we are now given MD data sampled in two or more disconnected states, then we can train independent Boltzmann generators for each of them. The goal here is not to explore configuration space, so training by energy is combined with training by example (Eq. 2) to restrain the generated distribution around the separate states. For each Boltzmann generator, the transformation free energy is computed, e.g., JKL,1 and JKL,2, by sampling from the Gaussian prior distributions and inserting into Eq. 1. The free-energy difference between the two states is then directly given by ΔA12 = JKL,2 − JKL,1 (Fig. 6A).

Fig. 6 Thermodynamics between disconnected states by coupling multiple Boltzmann generators.

(A) Using multiple Boltzmann generators, we can compute free-energy differences between states without requiring RCs using only disconnected MD simulations in each of them. This is possible because each Boltzmann generator estimates the free-energy difference to a common reference state. (B) Example for tracking convergence: estimate of free-energy difference for the double well potential (multiple temperatures) as a function of the training iterations of two Boltzmann generators. Convergence plots for the other systems are shown in fig. S3. Results show estimates from two Boltzmann generators with mean and one standard error computed from bootstrapping the converged segment of the free-energy estimate. (C) Left-to-right transition in the double well, (D) left-to-right transition in the Mueller potential, (E) closed-to-open transition in the solvated bistable particle dimer, (F) X→O transition in an atomistic model of BPTI.

We illustrate our method by computing temperature-dependent free-energy differences for the four systems discussed above, each using two completely disconnected MD simulations as input. Because the estimate of the free-energy difference is readily available from the value of the loss function, it can be conveniently tracked for convergence while the Boltzmann generators are trained (Fig. 6B and fig. S3).

For the two-dimensional systems (double well, Mueller potential), exact reference values for the free-energy differences can be computed, and the Boltzmann generator method recovers them accurately with a small statistical uncertainty over the entire temperature range (Fig. 6, C and D) using 10-fold less simulation data and an ~10-fold shorter training time than for the estimates using a single joint Boltzmann generator reported in Fig. 2 (supplementary materials).

For the solvated bistable dimer, we use simulations that are 10-fold shorter than for the single joint Boltzmann generator reported in Fig. 3 and train two independent Boltzmann generators at multiple temperatures. As a reference, three independent umbrella-sampling simulations were conducted at each of five different temperatures. Both predictions of the free-energy difference between open and closed dimer states are consistent and have overall similar uncertainties (Fig. 6E and fig. S3B; note that the uncertainty of umbrella sampling is strongly temperature dependent). Although umbrella sampling is well-suited for this system with a clear RC, the two–Boltzmann-generator method required 50 times fewer energy calls than the umbrella-sampling simulations at five temperatures and yet makes predictions across the full temperature range (supplementary materials).

Finally, we used the same method to predict the temperature-dependent free-energy difference of the “X” and “O” states in the BPTI protein. Sampling this millisecond transition 10 times by brute-force MD would take ~30 years on one of the GTX1080 graphics cards that are used for computations in this paper and would only give us the free-energy difference at a single temperature. Although speeding up this complex transition with enhanced sampling may be possible, engineering a suitable order parameter is a time-consuming trial-and-error task. Training two Boltzmann generators does not require any RC to be defined.

Here, we used the two–Boltzmann-generator method starting from two simulations of 20 ns each, which were started from selected frames of the 1-ms trajectory but could generally be started from crystallographic structures or homology models. Conducting the MD simulations and training and analyzing the Boltzmann generators used a total of <3 × 10^7 energy calls, which results in a converged free-energy estimate within a total of about 10 hours on a GTX1080 graphics card, i.e., about five orders of magnitude faster than the brute-force approach (Fig. 6F and fig. S3C). Although no reference for this free-energy difference in the given simulation model is known, the temperature profile admits basic consistency checks: The x-ray structure is identified as the most stable structure at temperatures below 330 K. The internal energy and entropy terms of the free-energy difference (Eq. 1) are both positive across all temperatures. Therefore, the free energy decreases at high temperatures as the entropic stabilization becomes stronger. A higher configurational entropy of the “O” state is consistent with its more open loop structure (compare Fig. 5, G and H) and the higher degree of fluctuations in the “O” state observed in the analysis in (30).

Discussion and conclusion

Boltzmann generators can overcome rare event-sampling problems in many-body systems by generating independent samples from different metastable states in one shot. We have demonstrated this for dense and unstructured many-body systems with up to 892 atoms (over 2600 dimensions) that are placed simultaneously, with most samples having globally and locally valid structures and potential energies in the range of the equilibrium distribution. In contrast to other generative neural networks, Boltzmann generators produce unbiased samples, as the generated probability density is known at each sample point and can be reweighted to the target Boltzmann distribution. This feature directly translates into being able to compute free-energy differences between different metastable states.

In contrast to enhanced sampling methods that directly operate in configuration space, such as umbrella sampling or metadynamics, Boltzmann generators can sample between metastable states without any predefined RC connecting them. This is achieved by learning a coordinate transformation in which different metastable states become neighbors in the transformed space where the sampling occurs. If suitable RCs are known, then these can be incorporated into the training to sample continuous pathways between states, e.g., to compute a continuous free-energy profile along a reaction.

As in many areas of machine learning, the key to success is to choose a representation of the input data that supports the learning problem. For macromolecules, we have found that a successful representation is to describe its backbone in Cartesian coordinates and all atoms that branch off from the backbone in internal coordinates. Additionally, normalizing these coordinates to mean 0 and variance 1 already makes their probability distribution close to a Gaussian normal distribution, thus simplifying the learning problem considerably.

We have shown, in principle, how explicit solvent systems can be treated. For this, it is essential to build the physical invariances into the learning problem. Specifically, we need to account for permutation invariance: when two equivalent solvent molecules exchange positions, the potential energy of the system is unchanged and so is the Boltzmann probability.

We have demonstrated scaling of Boltzmann generators to thousands of dimensions. Generative networks in other fields have been able to generate photorealistic images with 10^6 dimensions in one shot (15). However, for the present application, the statistical efficiency, i.e., the usefulness of these samples to compute equilibrium free energies, will decline with increasing dimension. Scaling to systems with hundreds of thousands of dimensions or more, such as solvated atomistic models of large proteins, can be achieved in different ways. Sampling of the full atomistic system may be addressed with a divide-and-conquer approach: In each iteration of such an approach, one would resample the positions of a cluster of atoms using the sum of potential energies between the cluster atoms and all system atoms and then, e.g., perform Monte Carlo steps using these cluster proposals. Alternatively, Boltzmann generators could be used to sample lower-dimensional free-energy surfaces learned from all-atom models (35, 36).

A caveat of Boltzmann generators is that, depending on the training method, they may not be ergodic, i.e., they may not be able to reach all configurations. Here, we have proposed a training method that promotes state space exploration by performing Monte Carlo steps in the Boltzmann generator’s latent space while training the network. This may be viewed as a general recipe: The whole plethora of existing sampling algorithms, such as umbrella sampling, metadynamics, and replica exchange, can be reformulated in the latent space of a Boltzmann generator, potentially leading to dramatic performance gains. Any such approach can always be combined with MD or MCMC moves in configuration space to ensure ergodicity.

Finally, the Boltzmann generators described here learned a system-specific coordinate transformation. The approach would become much more general and efficient if Boltzmann generators could be pretrained on certain building blocks of a molecular system, such as oligopeptides in solvent or a protein, and then reused in a complex system consisting of these building blocks. A promising approach is to use transferable featurization methods developed in the context of machine learning for quantum mechanics (37–40).

In summary, Boltzmann generators represent a powerful approach to addressing the long-standing rare-event sampling problem in many-body systems and open the door for new developments in statistical mechanics.

Materials and methods

Invertible networks

We used invertible networks with trainable parameters θ to learn the transformation between the Gaussian random variables z and the Boltzmann-distributed random variables x:

z = Fxz(x; θ)
x = Fzx(z; θ)

Hence Fxz = Fzx^{−1}. Each transformation has a Jacobian matrix containing the pairwise first derivatives of the outputs with respect to the inputs:

Jzx(z; θ) = [∂Fzx(z; θ)/∂z1, ..., ∂Fzx(z; θ)/∂zn]
Jxz(x; θ) = [∂Fxz(x; θ)/∂x1, ..., ∂Fxz(x; θ)/∂xn]

The absolute value of the Jacobian’s determinant, |det Jzx(z; θ)|, measures how much a volume element at z is scaled by the transformation. Below, we omit the symbol θ and use the abbreviations

Rxz(x) = |det Jxz(x)|
Rzx(z) = |det Jzx(z)|

We use invertible transformations because they allow us to transform random variables as follows:

pX(x) = pZ(z) Rzx(z)^{−1} = pZ(Fxz(x)) Rxz(x)   (3)
pZ(z) = pX(x) Rxz(x)^{−1} = pX(Fzx(z)) Rzx(z)   (4)

Trainable invertible layers

We use the RealNVP transformation as the trainable part of the invertible networks (20). The main idea is to split the variables into two channels, x = (x1, x2) and z = (z1, z2), and to perform only trivially invertible operations on each channel, such as multiplication and addition. Additionally, we use arbitrary, noninvertible artificial neural networks S and T that respectively scale and translate the second input channel x2 using a nonlinear transformation of the first input channel x1:

fxz(x1, x2):  z1 = x1,  z2 = x2 ⊙ exp(S(x1; θ)) + T(x1; θ)   (5)

log Rxz = Σi Si(x1; θ)   (6)

fzx(z1, z2):  x1 = z1,  x2 = (z2 − T(x1; θ)) ⊙ exp(−S(z1; θ))   (7)

log Rzx = −Σi Si(z1; θ)   (8)

A RealNVP “block” is defined by two stacked RealNVP layers with the channels swapped, such that both channels are transformed:

(y1, y2) = fxy(x1, x2)
(z1, z2) = fyz(y2, y1)

Boltzmann generators are built by running the forward and the inverse of such blocks in parallel, where both directions share the same nonlinear transformations T and S and their parameters (Fig. 1B).
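The following NumPy sketch implements one such coupling layer (Eqs. 5 to 8) with small fixed networks S and T standing in for trained networks; it only serves to verify the forward/inverse relation, and all names are illustrative.

```python
# Minimal sketch of one RealNVP coupling layer (Eqs. 5 to 8). The scale and
# translation networks S and T are untrained stand-ins (one tanh layer with
# fixed random weights).
import numpy as np

rng = np.random.default_rng(1)
d = 2                                     # size of each channel
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def S(v):                                 # scale network (illustrative)
    return np.tanh(v @ W1)

def T(v):                                 # translation network (illustrative)
    return np.tanh(v @ W2)

def f_xz(x1, x2):
    z1 = x1
    z2 = x2 * np.exp(S(x1)) + T(x1)       # Eq. 5
    log_R_xz = np.sum(S(x1), axis=-1)     # Eq. 6
    return z1, z2, log_R_xz

def f_zx(z1, z2):
    x1 = z1
    x2 = (z2 - T(z1)) * np.exp(-S(z1))    # Eq. 7
    log_R_zx = -np.sum(S(z1), axis=-1)    # Eq. 8
    return x1, x2, log_R_zx

x1, x2 = rng.standard_normal((5, d)), rng.standard_normal((5, d))
z1, z2, _ = f_xz(x1, x2)
x1_back, x2_back, _ = f_zx(z1, z2)
print(np.allclose(x1, x1_back), np.allclose(x2, x2_back))   # True True
```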

PCA whitening layer

We define a fixed-parameter layer “W” to transform the input coordinates into whitened principal coordinates. For systems with rototranslationally invariant energy, we first remove global translation and rotation by superimposing each configuration onto a reference configuration. We then perform PCA on the N input coordinates X by solving the eigenvalue problem

(1/N) X^T X R = R Λ

where R = [r1, ..., rN] contains the principal component vectors and Λ = diag(λ1, ..., λd) their variances. For systems with rototranslationally invariant energy, the six smallest eigenvalues are 0 and are discarded along with the corresponding eigenvectors. The whitening transformation and its inverse are defined by

W(x): z = Λ^{−1/2} R^T x
W^{−1}(z): x = R Λ^{1/2} z

Note that when translation and rotation are removed in the transformation, this layer is only invertible for x in which translation and rotation have been removed as well. However, the network is always invertible for the relevant sequence z → x → z. The Jacobians of W are:

log Rxz = −(1/2) Σi log λi
log Rzx = (1/2) Σi log λi
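A small sketch of the W layer follows. It performs PCA on a toy data matrix and checks that the whitening transformation and its inverse are consistent; mean subtraction is included here for simplicity, and the removal of global translation and rotation is omitted.

```python
# Sketch of the PCA whitening layer W and its inverse (removal of global
# translation and rotation is omitted; the toy data are already "internal").
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))   # input coordinates

mean = X.mean(axis=0)
C = (X - mean).T @ (X - mean) / X.shape[0]        # covariance matrix
lam, R = np.linalg.eigh(C)                        # variances and principal components

def W(x):                                         # z = Lambda^(-1/2) R^T (x - mean)
    return (x - mean) @ R / np.sqrt(lam)

def W_inv(z):                                     # x = R Lambda^(1/2) z + mean
    return (z * np.sqrt(lam)) @ R.T + mean

log_R_xz = -0.5 * np.sum(np.log(lam))             # constant log-Jacobians of W
log_R_zx = +0.5 * np.sum(np.log(lam))

z = W(X)
print(np.allclose(X, W_inv(z)))                                    # True: invertible
print(np.allclose(np.cov(z.T, bias=True), np.eye(4), atol=1e-8))   # True: whitened
```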

Mixed coordinate transformation layer

To treat macromolecules, we defined a new transformation layer “M” that transforms into mixed whitened Cartesian and normalized internal coordinates. We first split the coordinates into a Cartesian and an internal coordinate set, x → [xC, xI]. The Cartesian set xC is whitened (see above), and xI is transformed into internal coordinates (ICs). For every particle i in xI, we define three “parent” particles, j, k, and l, and the Cartesian coordinates of particles i, j, k, and l are converted into a distance, an angle, and a dihedral (dij, αijk, ϕijkl). Finally, each IC is normalized by subtracting the mean and dividing by the standard deviation of the corresponding coordinates in the input data (Fig. 5A). PCA whitening and IC normalization are essential for training Boltzmann generators for complex molecules, as this sets large fluctuations of the whole molecule on the same scale as small vibrations of stiff coordinates such as bond lengths. We briefly call the transformation to normalized internal coordinates I(x).

The inverse transformation is straightforward: The Cartesian set is first restored by applying W^{−1}. Then, the particles in the internal coordinate set are unnormalized and placed in a valid sequence, i.e., first the particles whose parent particles are all in the Cartesian set, then particles whose parents have just been placed, and so on. As for the W layer, the M layer is invertible up to global translation and rotation of the molecule that may have been removed during whitening. Additionally, we prevent noninvertibility in dihedral space by avoiding the generation of angle values outside the range [−π, π] (supplementary materials).

The Jacobians of the M layer are computed using Tensorflow’s automatic differentiation methods.
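As an illustration of the internal-coordinate part of the M layer, the sketch below converts the Cartesian position of one particle i with parent particles j, k, and l into a bond length, a valence angle, and a dihedral. The specific angle and dihedral conventions are illustrative choices; normalization and the placement order of the inverse transformation are omitted.

```python
# Sketch: convert the position of particle i with parents j, k, l into the
# internal coordinates (d_ij, alpha_ijk, phi_ijkl). Conventions are illustrative.
import numpy as np

def to_internal(xi, xj, xk, xl):
    d = np.linalg.norm(xi - xj)                              # bond length d_ij
    u, v = xi - xj, xk - xj
    cos_a = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    alpha = np.arccos(np.clip(cos_a, -1.0, 1.0))             # valence angle alpha_ijk
    # dihedral phi_ijkl about the j-k axis
    b0, b1, b2 = xi - xj, xk - xj, xl - xk
    b1n = b1 / np.linalg.norm(b1)
    w0 = b0 - np.dot(b0, b1n) * b1n                          # components normal to j-k
    w2 = b2 - np.dot(b2, b1n) * b1n
    phi = np.arctan2(np.dot(np.cross(b1n, w0), w2), np.dot(w0, w2))
    return d, alpha, phi

xi, xj = np.array([1.0, 1.0, 0.5]), np.array([0.0, 1.0, 0.0])
xk, xl = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
print(to_internal(xi, xj, xk, xl))    # (distance, angle in rad, dihedral in [-pi, pi])
```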

Training and using Boltzmann generators

The Boltzmann generator is trained by minimizing a loss functional of the form

J = wML JML + wKL JKL + wRC JRC   (9)

where the terms represent ML (“training by example”), KL (“training by energy”), and RC optimization, and the weights w control their relative contributions. Below, we derive these terms in detail.

We call the “exact” distributions μ and the generated distributions q. In particular, μZ(z) is the Gaussian prior distribution from which we sample latent space variables, and qX(x) is the distribution that results from the network transformation Fzx. Likewise, μX(x) ∝ exp(−u(x)) is the Boltzmann distribution in configuration space, and qZ(z) is the distribution that results from the network transformation Fxz:

μZ(z) → qX(x)  (via Fzx)
μX(x) → qZ(z)  (via Fxz)

A special case is to use Boltzmann generators to sample from the Boltzmann distribution of the canonical ensemble. Other ensembles can be modeled by incorporating the choice of ensemble into the reduced potential (41). The Boltzmann distribution has the form

μX(x) = ZX^{−1} e^{−βU(x)}   (10)

where β^{−1} = kB T with Boltzmann constant kB and temperature T. When only one temperature is used, we define the reduced energy

u(x) = U(x)/(kB T)

To work with multiple temperatures (T1, ..., TK), we define a reference temperature T0 and the reduced energy u0(x) = U(x)/(kB T0). The reduced energies are then obtained by scaling with the relative temperature τk = Tk/T0:

uk(x) = (T0/Tk) u0(x) = u0(x)/τk

To generate the prior distribution, we sample the input z from an isotropic Gaussian distribution

μZk(z) = N(0, σk² I) = ZZ^{−1} e^{−‖z‖²/(2σk²)}   (11)

with normalization constant ZZ. This corresponds to the prior energy of a harmonic oscillator:

uZk(z) = −log μZk(z) = ‖z‖²/(2σk²) + const.   (12)

Thus, the variance takes the same role as the relative temperature. We (arbitrarily) choose variance 1 for the standard temperature and obtain

σk² = τk

Latent KL divergence

The KL divergence measures the difference between two distributions q and p:

KL(q ‖ p) = ∫ q(x) [log q(x) − log p(x)] dx = −Hq − ∫ q(x) log p(x) dx

where Hq is the entropy of distribution q. Here, we minimize the difference between the probability densities predicted by the Boltzmann generator and the respective reference distribution. Using Eqs. 3, 4, and 10, the KL divergence in latent space is

KLθ[μZ ‖ qZ] = −HZ − ∫ μZ(z) log qZ(z; θ) dz
             = −HZ − ∫ μZ(z) [log μX(Fzx(z; θ)) + log Rzx(z; θ)] dz
             = −HZ + log ZX + Ez∼μZ(z)[u(Fzx(z; θ)) − log Rzx(z; θ)]

Here, θ are the trainable neural network parameters. Because HZ and ZX are constants in θ, the KL loss is given by

JKL = Ez∼μZ(z)[u(Fzx(z; θ)) − log Rzx(z; θ)]   (13)

Practically, each training batch samples points z from the Gaussian prior μZ(z), transforms them via Fzx, and evaluates Eq. 13. As shown in the supplementary materials, the KL loss can be rewritten as

JKL = U − HX + HZ   (14)

which is, up to the constant HZ, equal to the free energy of the generated distribution with internal energy U and entropic factor HX.

We can extend Eq. 13 to train simultaneously at multiple temperatures, obtaining

JKL^{T1,...,TK} = Σ_{k=1}^{K} Ez∼μZk(z)[uk(Fzx(z; θ)) − log Rzx(z; θ)]

The KL divergence KLθ[μZ ‖ qZ] is also minimized in probability density distillation, which is used in different contexts, e.g., in the training of recent audio generation networks (16).

Reweighting and interpretation of latent KL as reweighting loss

A simple way to compute quantitative statistics using Boltzmann generators is to reweight the generated probability density by assigning the statistical weight wX(x) to each generated configuration x. Using Eqs. 3 and 4, we obtain

wX(x) = μX(x)/qX(x) = qZ(z)/μZ(z) ∝ e^{−uX(Fzx(z)) + uZ(z) + log Rzx(z; θ)}   (15)

Equilibrium expectation values can then be computed as

E[O] ≈ Σi wX(xi) O(xi) / Σi wX(xi)   (16)

All free-energy profiles shown in Figs. 2 and 3 and fig. S2 were computed as −kB T log p(R(x)), where p(R(x)) is a probability density computed from a weighted histogram of the coordinate R(x) using the weighted expectation (Eq. 16). Histogram bins with weights worth less than 0.01 samples are discarded to avoid making unreliable predictions.
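The sketch below illustrates Eqs. 15 and 16: statistical weights are assigned to generated samples, and a free-energy profile along a coordinate r(x) is computed from a weighted histogram. The input arrays would come from a trained Boltzmann generator; here they are filled with toy values, so all names and numbers are illustrative.

```python
# Sketch of reweighting generated samples (Eq. 15) and computing a free-energy
# profile along a coordinate r(x) via a weighted histogram (Eq. 16). The input
# arrays are toy placeholders for the output of a trained Boltzmann generator.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
u_x      = rng.normal(0.0, 1.0, n)      # reduced energies u(Fzx(z)) of the samples
u_z      = rng.normal(0.0, 1.0, n)      # prior energies 0.5*||z||^2 (toy values)
log_R_zx = rng.normal(0.0, 0.1, n)      # log-Jacobian determinants
r        = rng.normal(0.0, 1.0, n)      # coordinate r(x) of each sample

# Eq. 15: statistical weights (up to an irrelevant constant factor)
log_w = -u_x + u_z + log_R_zx
w = np.exp(log_w - log_w.max())         # shift for numerical stability

# Weighted histogram along r, then F(r) = -log p(r) in units of kT (Eq. 16)
hist, edges = np.histogram(r, bins=30, weights=w, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0
free_energy = -np.log(hist[mask])       # free-energy profile up to a constant
print(np.c_[centers[mask], free_energy][:5])
```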

Using Eq. 15, it can be shown that minimization of the KL divergence (Eq. 13) is equivalent to maximizing the sample weights:

minθ KLθ[μZ ‖ qZ] = minθ Ez∼μZ(z)[log μZ(z) − log qZ(z; θ)] = maxθ Ez∼μZ(z)[log wX(x|z)]

Configuration KL divergence and ML

Likewise, we can express the KL divergence in x space, where we compute the divergence between the probability of generated samples and their Boltzmann weights. Using Eqs. 3, 4, and 11:

KLθ[μX ‖ qX] = −HX − ∫ μX(x) log qX(x; θ) dx
             = −HX − ∫ μX(x) [log μZ(Fxz(x; θ)) + log Rxz(x; θ)] dx
             = −HX + log ZZ + Ex∼μX(x)[‖Fxz(x; θ)‖²/(2σ²) − log Rxz(x; θ)]

This loss is difficult to evaluate because we cannot sample from μX(x) a priori. However, we can approximate the configuration KL divergence by starting from a given sample ρ(x), resulting in

JML = Ex∼ρ(x)[−log qX(x; θ)] = Ex∼ρ(x)[‖Fxz(x; θ)‖²/(2σ²) − log Rxz(x; θ)]

JML is the negative log-likelihood; i.e., minimizing it maximizes the likelihood of the sample ρ(x) under the Gaussian prior density.

Symmetric divergence

The two KL divergences above can be naturally combined into the symmetric divergence

KLsym = (1/2) KL[μX ‖ qX] + (1/2) KL[μZ ‖ qZ]

which corresponds, up to an additive constant, to a Jensen–Shannon divergence that uses the geometric mean m = (qX qZ)^{1/2} instead of the arithmetic mean.

Reaction coordinate loss

In some applications, we do not want to sample from the Boltzmann distribution itself but rather promote the sampling of high-energy states along a specific direction of configuration space, for example, to compute a free-energy profile along a predefined RC r(x) (Fig. 2, E and K). This is achieved by adding the RC loss to the minimization problem:

JRC = ∫ p(r(x)) log p(r(x)) dr(x) = Ex∼qX(x)[log p(r(x))]

To implement this loss, the function r is a user input, minimum and maximum bounds are given, and p(r(x)) is computed as a batchwise kernel density estimate between the bounds.
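A minimal sketch of such an RC loss follows: a batchwise Gaussian kernel density estimate of p(r(x)) between the bounds, and the loss E[log p(r(x))] whose minimization flattens the sampled distribution along r. The kernel width and grid resolution are illustrative choices, not the settings used in this work.

```python
# Sketch of the RC loss: batchwise kernel density estimate of p(r) between
# user-given bounds; minimizing the loss (a negative entropy) flattens the
# sampling along the reaction coordinate. Kernel width and grid are assumptions.
import numpy as np

def rc_loss(r, r_min, r_max, n_grid=50, bandwidth=None):
    grid = np.linspace(r_min, r_max, n_grid)
    h = bandwidth if bandwidth is not None else (r_max - r_min) / n_grid
    kernels = np.exp(-0.5 * ((grid[None, :] - r[:, None]) / h) ** 2)
    p = kernels.mean(axis=0)
    dr = grid[1] - grid[0]
    p /= p.sum() * dr                                  # normalized density on the grid
    return np.sum(p * np.log(p + 1e-12)) * dr          # J_RC ~ integral of p log p

rng = np.random.default_rng(4)
print(rc_loss(rng.normal(0.5, 0.1, 1000), 0.0, 1.0))   # peaked along r: higher loss
print(rc_loss(rng.uniform(0.0, 1.0, 1000), 0.0, 1.0))  # flat along r: lower loss
```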

Adaptive sampling and training

We define the following adaptive sampling method, which trains a Boltzmann generator while simultaneously using it to propose new samples. The method has a sample buffer X that stores a predefined number of x samples. This number is chosen such that low-probability states of interest still have a chance to be part of the buffer when it represents an equilibrium sample. For the examples in Fig. 4, it was chosen to be 10,000 (double well, Mueller potential) and 100,000 (solvated particle dimer). X can be initialized with any candidate configurations; in the examples in Fig. 4, it was initialized with only one configuration (copied to all elements of X); for the particle dimer system, we additionally added small Gaussian noise with standard deviation 0.05 nm to avoid the initial Boltzmann generator overfitting on a single point. The Boltzmann generator was initially trained by example, minimizing JML, using batch size 128 and 20, 20, and 200 iterations for the double well, Mueller potential, and particle dimer, respectively. We then iterated the following adaptive sampling and training loop, using batch size 1000 for all examples:

1. Sample a batch {x1, ..., xB} from X.

2. Update the Boltzmann generator parameters θ by training on the batch.

3. For each x in the batch, propose a Metropolis Monte Carlo step in latent space with step size s: z′ = Fxz(x) + s N(0, I).

4. Accept or reject the proposal with probability min{1, exp(−ΔE)}, using ΔE = u(Fzx(z′)) − u(x) − log Rzx(z′; θ) + log Rzx(z; θ), where z = Fxz(x).

5. For the accepted samples, replace x by x′ = Fzx(z′).
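The following sketch runs steps 3 to 5 above for a toy one-dimensional system; the affine map and the mild double-well energy are stand-ins for a trained generator and a real potential, so all names and parameters are illustrative.

```python
# Sketch of a latent-space Metropolis Monte Carlo pass (steps 3 to 5) with a
# toy affine "generator" and a mild double-well energy. In practice, Fxz/Fzx
# would be the trained invertible network and u the system's reduced energy.
import numpy as np

rng = np.random.default_rng(5)

def u(x):                                  # toy reduced energy (1 kT barrier)
    return x**4 - 2.0 * x**2

a, b = 0.7, 0.0                            # parameters of the toy map x = a*z + b
F_xz = lambda x: (x - b) / a
F_zx = lambda z: a * z + b
log_R_zx = lambda z: np.log(abs(a)) * np.ones_like(z)

def latent_mc_pass(X, step=0.5):
    z = F_xz(X)
    z_new = z + step * rng.standard_normal(X.shape)            # propose in latent space
    x_new = F_zx(z_new)
    # dE = u(x') - u(x) - log Rzx(z') + log Rzx(z)
    dE = u(x_new) - u(X) - log_R_zx(z_new) + log_R_zx(z)
    accept = rng.random(X.shape) < np.exp(-np.clip(dE, -50.0, 50.0))
    return np.where(accept, x_new, X)

X = np.full(1000, 1.0)                     # buffer initialized in one well
for _ in range(200):
    X = latent_mc_pass(X)
print(np.mean(X > 0), np.mean(X < 0))      # both wells are now populated
```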

Supplementary Materials

science.sciencemag.org/content/365/6457/eaaw1147/suppl/DC1

Supplementary Text

Figs. S1 to S3

Table S1

References (43–49)

References and Notes

Acknowledgments: We are grateful to C. Clementi (Rice University), B. Husic, M. Sadeghi, M. Hoffmann (FU Berlin), and P. Shanahan (MIT) for valuable comments and discussions. Funding: We acknowledge funding from the European Commission (ERC CoG 772230 “ScaleCell”), the Deutsche Forschungsgemeinschaft (CRC1114/A04, GRK2433 DAEDALUS), the MATH+ Berlin Mathematics Research Center (AA1x8, EF1x2), and the Alexander von Humboldt Foundation (postdoctoral fellowship to S.O.). Author contributions: F.N., S.O., and J.K. designed and conducted research. F.N., J.K., and H.W. developed theory. F.N., S.O., and J.K. developed computer code. F.N. and S.O. wrote the paper. Competing interests: The authors declare no competing interests. Data and materials availability: Data and computer code for generating results of this paper are available at Zenodo (42).