Mathematical Questions in Molecular Dynamics

WINGS Todai - ENPC Workshop

Noé Blassel

CERMICS lab, École des Ponts ParisTech, MATHERIALS team, Inria Paris.

2025-04-02

Molecular Dynamics: a numerical microscope

Molecular Dynamics (MD) investigates the behavior of molecular systems at the level of classical motion of atoms.

Five picoseconds of Foldit1 (PDB ID 6MRR) simulated with Molly.jl, thanks to Joe Greener.

  • Replaces physical experiments in extreme (nuclear physics, materials science) or expensive (pharmaceutical trials) conditions with computer simulations (in silico experiments).

  • Very useful method in biophysics and materials science, e.g. to investigate properties of candidate drugs or new materials.

  • Many algorithmic tricks are needed to simulate long trajectories reliably and efficiently, and sample the whole configuration space.

  • Mathematical insight can help to justify algorithms theoretically and propose new methods. This is part of what the MAS (Modeling, Analysis and Simulation) team does at CERMICS.

Mathematical formulation

  • Typically the position-momentum pairs \((q_t,p_t) \in {\mathbb{{R}}}^{3N}\times{\mathbb{{R}}}^{3N}\) for \(N\) atoms follow the underdamped Langevin dynamics: \[ \left\{\begin{aligned} \mathrm{d}q_t &= \color{blue}{M^{-1}p_t\,\mathrm{d}t},\\ \mathrm{d}p_t &= \color{blue}{-\nabla V(q_t)\,\mathrm{d}t} \color{red}{-\gamma M^{-1}p_t\,\mathrm{d}t + \sqrt{\frac{2\gamma}{\beta}}\,\mathrm{d}W_t} \end{aligned}\right. \]
  • Hamiltonian dynamics + Ornstein–Uhlenbeck process which maintains the system at a target temperature \(T=(\beta k)^{-1}\).
  • Physical input: temperature parameter \(\beta>0\), friction \(\gamma>0\), mass matrix \(M \in {\mathbb{{R}}}^{3N\times 3N}\), and potential function \(V:{\mathbb{{R}}}^{3N}\to {\mathbb{{R}}}\) which captures the interactions between atoms.
  • \(V\) is usually determined by fitting an empirical model to data from ab initio computations (increasingly using machine learning models).
  • Typical scales: timestep \(\Delta t\approx 10^{-15}\)s, simulation length \(T=10^{-9}\)\(10^{-6}\)s (largest \(\approx 10^{-3}\)s with naive MD), number of atoms \(N=10^3\)\(10^{9}\) (largest \(\approx 10^{12}\)).

Physical properties from microscopic dynamics

  • The thermal equilibrium (Boltzmann–Gibbs) measure \[ \mu_\beta(q,p)=Z_\beta^{-1} \mathrm{e}^{-\beta\left(\frac12 p^\top M^{-1}p + V(q)\right)}\] is invariant under the underdamped Langevin dynamics \((q_t,p_t) \in {\mathbb{{R}}}^{3N}\times {\mathbb{{R}}}^{3N}\).
  • Thermodynamic properties: given a physical observable \(\varphi : {\mathbb{{R}}}^{3N}\times {\mathbb{{R}}}^{3N} \to {\mathbb{{R}}}\), compute a empirical time-average \[ \frac{1}{T} \int_{0}^{T} \varphi(q_t,p_t)\,\mathrm{d}t \xrightarrow{T\to \infty} \int_{{\mathbb{{R}}}^{3N}\times {\mathbb{{R}}}^{3N}} \varphi(q,p) \mathrm{d}\mu_\beta(q,p)\]
  • This convergence uses the ergodicity of the dynamics. Examples of thermodynamic properties: pressure, radial distribution function, bond angles, free-energy…
  • Dynamical properties: these depend on the trajectories themselves. These can be estimated by taking empirical averages \[ \frac{1}N\sum_{k=1}^N \mathcal F\left((q^n_t,p^n_t)_{0\leqslant t\leqslant T}\right) \xrightarrow{N\to\infty} \int_{\mathcal C([0,T];{\mathbb{{R}}}^{3N}\times {\mathbb{{R}}}^{3N})} \mathcal F((q_t,p_t)_{0\leqslant t\leqslant t})\mathrm{d} \mathbb P((q_t,p_t)_{0\leqslant t \leqslant T})\] over independent trajectories \((q^n_t,p^n_t)_{0\leqslant t\leqslant T}\), with typically \(T\gg 1\). This is a much harder problem.
  • Examples of dynamical properties: shear viscosity, heat conductivity (transport coefficients), transition times, reaction rates…
  • Both are examples of Monte-Carlo method to estimate averages w.r.t. high-dimensional probability measures.

Typical mathematical questions

  • Theoretical questions: Are the dynamics well-posed ? Do the averages converge to the target quantity ? How to estimate the statistical error, i.e. provide error bars ?
  • Discretizations / Numerical Analysis: How to discretize the continuous-time dynamics in a consistent/stable way ? How to estimate the discretization error ?
  • Enhanced Sampling / Variance reduction: How to improve the naive estimators ?
  • Algorithms: How to exploit properties of the system to design efficient simulation algorithms ?
  • Related to many fields: Statistics/Numerical Probability, Dynamical systems, Analysis of PDE…
  • Methodological ping-pong: methods in MD \(\rightleftarrows\) methods in other field of applied mathematics.

The timescale problem

  • Stability constraints force \(\Delta t \approx 10^{-15}\) s \(\sim\) vibrational period of hydrogen bonds.
  • Dynamical properties often requires much larger timescales: \(T \approx 10^{-3}\)\(10^{3}\) s \(\rightarrow \sim 10^{12}\)\(10^{18}\) timesteps with naive MD: not possible.
  • Key difficulty: metastability/separation of timescales.
  • Signature of energetic and entropic barriers separating metastable states in configuration space.

Accelerated MD

  • Methods designed to exploit metastability to accelerate transitions in dynamically unbiased ways. Allow to simulate very long trajectories.
  • The Parallel Replica Algorithm (ParRep) does this by parallelizing the exit from metastable states.
  • The question I’m working on: how to define metastable states optimally to make ParRep as efficient as possible ?
  • Mathematically: problem in shape optimization of eigenvalues of an operator, \[ \underset{\Omega\subset {\mathbb{{R}}}^{3N}}{\min}\,\frac{\lambda_1(\Omega)}{\lambda_2(\Omega)-\lambda_1(\Omega)},\] with \[ \mathcal{L}_\beta u_k(\Omega) = -\lambda_k(\Omega)\text{ in }\Omega,\quad u_k(\Omega)|_{\partial\Omega}=0,\quad \mathcal{L}_\beta =\mathrm{e}^{\beta V}\mathrm{div}\left(\mathrm{e}^{-\beta V}\nabla\cdot\right).\]

A simulation using ParRep

  • Simulation of the stretching of a silver nanowire at 300 K from Los Alamos National Laboratory. Simulation using ParRep with \(N \approx 10^5\) CPU cores.
  • Each \(\mu\)s = \(10^9\) timesteps of naive MD. With ParRep, it is \(\approx C\times 10^9 / N \approx C \times 10^4\) steps of accelerated MD, with \(C>1\).
  • My job: find mathematically principled ways to make the prefactor \(C\) as small as possible.