Learning quasistationarity diagnostics for accelerated molecular dynamics

Outline

Motivation long-time simulation of metastable systems
Theoretical results coarse-graining of killed Markov processes
Application quasi-stationarity diagnostics with RNNs

Ongoing work with choices/extensions to discuss

Dynamical quantities in MD

Molecular Dynamics (MD) models atomic motion with a stochastic process \(X\) in a configuration space \(\mathcal{X}\) (allowed positions of nuclei)
Invariant distribution is the Gibbs measure for the interaction potential \(V:\mathcal{X}\to\mathbb{R}\) \[ \mu(\mathrm{d}x) = \frac{1}{Z_\beta}\mathrm{e}^{-\beta V(x)}\,\mathrm{d}x \in\mathcal{P}(\mathcal{X}),\quad \beta = 1/(k_{\mathrm{B}}T), \] giving access to thermodynamic/static quantities (via ergodic path averages)
dynamical/trajectorial/kinetic quantities (reaction rates, transport coefficients, time-correlations, reactive states…) are much more difficult (averages in path space)

Typical kinetic sampling task in MD: protein-ligand interaction rates (Image from Wikipedia)

Metastability and the timescale problem

Metastable trajectories on a free energy landscape

Dynamical properties typically require long trajectories because of metastability (timescale problem)

Multiple timescales(\(\mathrm{fs}\) simulation timestep/ \(\mathrm{ps}\) mesoscopic fluctuations/\(\mu\mathrm{s}-\mathrm{ks}\) macroscopic transitions)

Local description of metastability

\(X\) enters a metastable state \(\Omega \subset \mathcal{X}\)
\(X\) reaches a local equilibrium in \(\Omega\)
\(X\) stays in local equilibrium for a long time before \(\tau_{\partial\Omega}=\)exit time

Local equilibria are quasi-stationary distributions

A probability measure \(\nu^X\in \mathcal{P}(\Omega)\) is a quasi-stationary distribution ¹ (QSD) if for any \(A\subset\Omega\) and \(t>0\), \[ \mathbb{P}_{\nu^X}\left(X_t\in A \,|\, \tau_{\partial \Omega} > t\right) = \nu^X(A) \]

Under broad conditions on \(X\) and \(\Omega\), \(\exists!\) QSD \(\nu^X\) ², and for each \(\eta_0^X \in \mathcal{P}(\Omega)\), \(\nu^X\) is the long-time limit in law conditioned on survival (Yaglom limit) \[ \underset{t\to+\infty}{\lim}\,\mathrm{d}_{\mathrm{TV}}(\eta_t^X,\nu^X) = 0,\qquad \eta_t^X = \mathrm{Law}\left(X_t\,\middle|\, \tau_{\partial\Omega} > t \right) \]

Accelerated dynamics

Accelerated MD methods exploit properties of the QSD to generate long approximate trajectories at lower cost

Tradeoff: short/accurate MD vs long/state-to-state AMD

Parallel replica method¹

Sample QSD using long-time limit of Fleming-Viot system

Fleming-Viot process

Interacting particle system \(({\bf X}_t)_{t\geqslant 0}\) where \({\bf X}_t = \left(X_t^{(i)}\right)_{1\leqslant i\leqslant N}\in \mathcal{X}^N\)

Algorithm

Each \(X^{(i)}\) is an independent replica \(X\)
When \(X^{(i)}\) exits, resample independently uniformly from survivors

Particle approximation of \(\eta_t^X\) via propagation of chaos results \[ \widehat{\eta}^X_{t,N}\xrightarrow{N\to +\infty} \eta_t^X,\qquad \widehat{\eta}^X_{t,N} = \frac1N\sum_{i=1}^N \delta_{X_t^{(i)}}, \] uniformly in time (see² for state of the art results)

Convergence diagnostics

Having a reliable and efficient algorithm requires knowing when \(\eta_t^X \approx \nu^X\)

Bottleneck develop sensitive/specific quasi-stationarity diagnostics \(\iff\) accurate estimates of decorrelation times

Gelman-Rubin diagnostic

\[ \widehat\eta_{t,N}^{X,(i)} = \frac1T\int_0^T \delta_{X_s^{(i)}}\,\mathrm{d}s,\qquad \widehat\eta_{t,N}^{X} = \frac1N\sum_{i=1}^N \widehat\eta_{t,N}^{X,(i)} \]

A Gelman-Rubin-type trace¹ is defined, for some collective variable (CV) \(\xi:\mathcal{X}\to\mathbb{R}\), by \[ R_{t,N}(\xi) = \frac{\mathrm{Var}\left(\xi_\star\widehat\eta_{t,N}\right)}{\displaystyle\frac1N \sum_{i=1}^N \mathrm{Var}\left(\xi_\star \widehat\eta_{t,N}^{(i)}\right)}\xrightarrow{t\to +\infty} 1 \] GR decorrelation time \[ T_{\mathop{\mathrm{corr}}}^{GR}(\alpha,t_0) = \inf\left\{t\geqslant t_0,\middle|\,\underset{\xi\in \{\xi_1,\dots,\xi_\ell\}}{\max} R_{t,N}(\xi) < 1+\alpha\right\} \]

Possible approaches

Physical heuristics ²
Quantitative analysis in asymptotic regimes³⁴⁵
MCMC (non)convergence diagnostics ⁶ on Fleming-Viot population(Gelman-Rubin)
Our approach: effective dynamics (see⁷ ⁸ for global constructions)

Only 3. and 4. are generic enough to apply to many systems (e.g. biomolecules)

3. assumes that each \(\xi(\mathbf{X})\) can be treated as a collection of independent Markov processes
Requires knowledge of good \(\xi\)/neglects correlations between replicas
Conservative rate of convergence

Coarse graining framework

Approximate the conditional law of \(\xi(X)\) with that of a Markovian approximation \(Z\)

Measure-valued flow

\(X,Z\) be an irreducible jump processes on finite state spaces \(\mathcal{X},\,\mathcal{Z}\) killed on \(\partial\mathcal{X},\partial\mathcal{Z}\). The law of \(Y\in\{X,Z\}\) conditioned on survival is a non-linear measure-valued flow \[ \eta_t^Y(f) = \frac{\eta_0^Y(Q_t^Y f)}{\eta_0^Y(Q_t^Y \mathbf{1}_{\mathcal{Y}})} \] where \(Q_t^Y\) is the killed semigroup \[ Q_t^Y f(y) = \mathbb{E}_y\left[f(Y_t)\mathbf{1}_{\tau_{\partial\mathcal{Y}}>t}\right] \]

Spectral decomposition

By Perron-Frobenius, spectral decomposition on \(Q^Y\), \((\lambda^Y, \nu^Y, h^Y)\) principal eigentriplet and a spectral gap \(\gamma^X>0\) \[ \|Q_t^Y\| \le C_\lambda^Y \mathrm{e}^{-\lambda^Y t},\qquad \|Q_t^Y P^Y\| \le C_\gamma^Y e^{-(\lambda^Y + \gamma^Y)t}\qquad P^Y f= f - \nu^Y(f)h^Y \]

Coarse-graining operators

Given a surjective CV \(\xi: \mathcal{X}\to \mathcal{Z}\) and a family \(\mathcal{F} =(\pi_z)_{z\in \mathcal{Z}}\), \(\pi_z\in \mathcal{P}(\xi^{-1}(z))\), define \[ \Pi^Z:\ell^\infty(\mathcal{X})\to\ell^\infty(\mathcal{Z}),\qquad \Pi^Z f(z) = \pi_z(f) \] and by pullback a projector onto fiber-constant observables \[ \Pi^X f = (\Pi^Z f)\circ\xi \]

Uniform bounds on the coarse-graining bias

Let \(\pi_z\) be the conditioning of \(\nu^X\) on \(\xi^{-1}(z)\), assume the timescale separation condition \[ \left\| e^{t(I-\Pi^X)\mathcal{L}^X(I-\Pi^X)} \right\| \le C_\zeta e^{-\zeta t} \]

Theorem ¹ (in preparation)

Suppose \(\zeta > 2(\lambda^X + \gamma^X)\) and \(\eta_0^Z = \xi_\star \eta_0^X\)

There exists \(Z^*\in\mathrm{Markov}(\mathcal{Z})\) such that for all \(T > 0\), \[ \mathrm{d}_{\mathrm{TV}}\left(\xi_\star\eta_T^X,\, \eta_T^{Z^*}\right) \le \frac{1}{\zeta} \left( C e^{-\gamma^X T} + C' T e^{-(\gamma^{Z^*} \wedge \gamma^X) T} \right)\xrightarrow{T\to\infty}0 \] for explicit prefactors \(C,C'\) independent of \(T\)

We have quasimode estimates (obviously \(\nu^Z =\xi_\star \nu^X\)) \[ \|h^X - \Pi^X h^X\| \le \frac{C_\lambda^X C_\zeta}{\zeta - \lambda^X} \|\mathcal{L}^X \Pi^X - \Pi^X \mathcal{L}^X\|, \qquad \|\Pi^{Z^*} h^X - h^{Z^*}\| = \frac{C_\lambda^X C_\zeta C_\gamma^{Z^*}}{\gamma^{Z^*}(\zeta - \lambda^X)} \|\mathcal{L}^X \Pi^X - \Pi^X \mathcal{L}^X\|^2 \]

Proof approach by backward numerical analysis inspired by ²

Small-parameter illustration

Consider \(\mathcal{L}_\epsilon^X = J_{\text{mac}} - K + \epsilon^{-1} J_{\text{mic}}\), with \(J_{\text{mic}}=\bigoplus_{z\in\mathcal{Z}} A_z\) fiber-reducible (\(A_z\in \mathrm{Jump}(\xi^{-1}(z))\))

\(\sup\) in time of the bias scales as \(C/\zeta\) (generally very small)

True bias decays at the bound rate

Supervised learning idea

Teach a model to detect quasi-stationarity on an ensemble of processes \({Z}\) given trajectories of \(\mathbf{Z}\) (Fleming-Viot)
Deploy the model given trajectories of \(\xi(\mathbf{X})\) (CV-Fleming-Viot population)
Hope that scale-separation holds for \((\xi,X)\) and that the ensemble covers \(Z^*\)

Classification problem

Approximate the conditional probability \[ p_\theta(\mathscr{Z}) \approx h^*(\mathscr{Z}) := \mathbb{P}_{(z,t_i,t_f,Z)\sim \rho_{\mathrm{synth}}}\left(t_f>t_{\mathop{\mathrm{corr}},Z}(z)\middle|\,\Psi_{t_i,t_f}\left({\bf Z}^z\right)=\mathscr{Z}\right) \]

Fleming-Viot started from \(z\): \({\bf Z}^z\)
Honest decorrelation time \(t_{\mathop{\mathrm{corr}},Z}(z) = \inf\{t>0:\,\mathrm{d}_{\mathrm{TV}}\left(\eta_t^Z,\nu^Z\right) < \varepsilon\}, \eta_0^Z =\delta_z\)
Featurization of \({\bf Z}^z\) on \([t_i,t_f]\): \(\Psi_{t_i,t_f}({\bf Z}^z)\)
Synthetic distribution: \(\rho_{\mathrm{synth}}\in \mathcal{P}(\mathcal{Z}\times (0,\infty)^2\times \mathrm{Markov}(\mathcal{Z}))\)

Specific instantiation

One-dimensional CV \(\mathcal{Z}= (0,1)\) and \(\Psi_{t_i,t_f}({\bf Z}^z)\) is a sequence of time-pooled population histograms \[ \left(\frac{1}{\tau}\int_{t_i}^{t_i+\tau}\widehat{\eta}^Z_{s,N}\,\mathrm{d}s,\,\dots,\frac{1}{\tau}\int_{t_f-\tau}^{t_f}\widehat{\eta}^Z_{s,N}\,\mathrm{d}s\right) \] (recall \(\widehat{\eta}^Z_{s,N}\) is the empirical measure of \({\bf Z}_s^x\))

Randomize \((\tau,N)\) in training, and classify conditionally on this information

Note that \(h^*\) minimizes the cross-entropy functional \[ J[h] = \mathbb{E}_{\rho_{\mathrm{synth}}}\left[1_{t_f>t_{\mathop{\mathrm{corr}},Z}(z)}\log\,h\left(\Psi_{t_i,t_f}\left({\bf Z}^z\right)\right) - 1_{t_f\leqslant t_{\mathop{\mathrm{corr}},Z}(z)}\log\left\{1-h\left(\Psi_{t_i,t_f}\left({\bf Z}^z\right)\right)\right\}\right] \]

Synthetic ensemble

Synthetic reversible diffusion ensemble

Sample free energy/diffusion landscapes under \(\rho_{\mathrm{synth}}\)

Distribution of honest decorrelation times

Sample honest decorrelation times under \(\rho_{\mathrm{synth}}\)

Model and training

We train on subsequences of \(\Psi_{t_i,t_f}({\bf Z}^x) =(X_1,\dots,X_K)\)

Decorrelation times (hence \(K\)) spans several orders of magnitude natural choice of architecture is a recurrent neural network (RNN)

\[ \forall\,1\leqslant k\leqslant K,\quad e_k = E_\theta(X_k),\quad h_k = R_\theta(e_k,h_{k-1},\color{red}{N,\tau}),\quad p_\theta(X)_k = D_\theta(h_k),\quad h_0=0 \]

We minimize for \[ \theta^* \in \mathrm{Argmin}_\theta J[p_\theta] \] using Adam + batched Monte-Carlo estimators of \(J\) in practice

Implementation/training using Flux.jl, hyperparameter selection using Bayesian optimization framework NePS¹

Test results: accuracy

We compare diagnoses on the original classification task in \(\rho_{\mathrm{synth}}\)

Accuracy of Gelman-Rubin diagnostic (th=\(\alpha\), stride=\(t_0\))

Accuracy of RNN diagnostic (decision threshold=\(\alpha\))

Test results: ROC curves

Receiver operating characteristic curves for various diagnoses (best \(N,\tau\) retained each time)

MD results: alanine dipeptide

Solvated alanine dipeptide and its backbone dihedrals \(\xi=(\phi,\psi)\)

Estimated \(\xi_\star\nu^X\), with initial samples for the Fleming-Viot process

MD results: time efficiency

RNN diagnostic

γ = 1

0.5	0.81 ± 0.16
0.6	0.87 ± 0.17
0.7	0.92 ± 0.16
0.8	1.01 ± 0.18
0.9	1.15 ± 0.19

0.5	0.62 ± 0.08
0.6	0.65 ± 0.08
0.7	0.69 ± 0.08
0.8	0.73 ± 0.09
0.9	0.79 ± 0.09

0.5	0.94 ± 0.19
0.6	1.02 ± 0.18
0.7	1.08 ± 0.20
0.8	1.18 ± 0.22
0.9	1.33 ± 0.24

0.5	0.81 ± 0.16
0.6	0.87 ± 0.17
0.7	0.93 ± 0.18
0.8	1.01 ± 0.19
0.9	1.13 ± 0.21

0.5	4.08 ± 0.69
0.6	4.50 ± 0.77
0.7	4.84 ± 0.84
0.8	5.30 ± 0.86
0.9	6.02 ± 0.89

γ = 2

0.5	0.96 ± 0.17
0.6	1.04 ± 0.19
0.7	1.10 ± 0.20
0.8	1.20 ± 0.21
0.9	1.37 ± 0.24

0.5	0.64 ± 0.09
0.6	0.67 ± 0.10
0.7	0.70 ± 0.11
0.8	0.74 ± 0.11
0.9	0.80 ± 0.12

0.5	0.98 ± 0.19
0.6	1.04 ± 0.21
0.7	1.12 ± 0.21
0.8	1.24 ± 0.23
0.9	1.39 ± 0.26

0.5	0.68 ± 0.09
0.6	0.73 ± 0.09
0.7	0.78 ± 0.10
0.8	0.86 ± 0.11
0.9	0.97 ± 0.12

0.5	3.24 ± 0.61
0.6	3.51 ± 0.59
0.7	3.77 ± 0.58
0.8	4.20 ± 0.69
0.9	4.69 ± 0.72

γ = 5

0.5	0.81 ± 0.15
0.6	0.87 ± 0.15
0.7	0.93 ± 0.15
0.8	1.01 ± 0.17
0.9	1.15 ± 0.18

0.5	0.52 ± 0.07
0.6	0.54 ± 0.07
0.7	0.57 ± 0.07
0.8	0.60 ± 0.07
0.9	0.66 ± 0.08

0.5	0.71 ± 0.10
0.6	0.76 ± 0.10
0.7	0.82 ± 0.11
0.8	0.88 ± 0.12
0.9	1.02 ± 0.14

0.5	0.65 ± 0.12
0.6	0.69 ± 0.12
0.7	0.74 ± 0.13
0.8	0.80 ± 0.14
0.9	0.91 ± 0.16

0.5	1.79 ± 0.29
0.6	1.96 ± 0.29
0.7	2.12 ± 0.32
0.8	2.32 ± 0.35
0.9	2.65 ± 0.43

γ = 10

0.5	0.50 ± 0.09
0.6	0.53 ± 0.09
0.7	0.57 ± 0.09
0.8	0.63 ± 0.11
0.9	0.71 ± 0.11

0.5	0.45 ± 0.07
0.6	0.48 ± 0.07
0.7	0.50 ± 0.07
0.8	0.53 ± 0.07
0.9	0.57 ± 0.08

0.5	0.66 ± 0.11
0.6	0.70 ± 0.12
0.7	0.76 ± 0.13
0.8	0.83 ± 0.15
0.9	0.95 ± 0.16

0.5	0.43 ± 0.06
0.6	0.47 ± 0.07
0.7	0.50 ± 0.07
0.8	0.55 ± 0.08
0.9	0.62 ± 0.08

0.5	1.63 ± 0.33
0.6	1.75 ± 0.34
0.7	1.89 ± 0.36
0.8	2.05 ± 0.37
0.9	2.37 ± 0.39

γ = 20

0.5	0.39 ± 0.05
0.6	0.41 ± 0.05
0.7	0.44 ± 0.06
0.8	0.48 ± 0.07
0.9	0.54 ± 0.07

0.5	0.35 ± 0.05
0.6	0.37 ± 0.06
0.7	0.39 ± 0.06
0.8	0.41 ± 0.06
0.9	0.45 ± 0.07

0.5	0.38 ± 0.05
0.6	0.41 ± 0.05
0.7	0.45 ± 0.06
0.8	0.50 ± 0.08
0.9	0.56 ± 0.09

0.5	0.33 ± 0.05
0.6	0.36 ± 0.06
0.7	0.38 ± 0.06
0.8	0.41 ± 0.07
0.9	0.47 ± 0.08

0.5	1.09 ± 0.19
0.6	1.19 ± 0.20
0.7	1.29 ± 0.23
0.8	1.44 ± 0.24
0.9	1.62 ± 0.25

α ∈ {0.5, 0.6, 0.7, 0.8, 0.9}

Gelman-Rubin diagnostic

γ = 1

0.2	0.36 ± 0.50
0.1	2.39 ± 1.89
0.05	2.76 ± 2.79 (p=0.20)
0.02	1.50 ± 1.95 (p=0.04)
0.01	ncv (> 6.00)

0.2	0.26 ± 0.29
0.1	1.46 ± 0.81
0.05	3.08 ± 1.00 (p=0.52)
0.02	ncv (> 4.29)
0.01	ncv (> 4.29)

0.2	0.20 ± 0.26
0.1	1.05 ± 1.07 (p=0.98)
0.05	3.15 ± 1.76 (p=0.42)
0.02	ncv (> 6.00)
0.01	ncv (> 6.00)

0.2	1.17 ± 1.08
0.1	3.99 ± 1.72 (p=0.98)
0.05	4.22 ± 2.68 (p=0.08)
0.02	ncv (> 6.82)
0.01	ncv (> 6.82)

0.2	6.99 ± 6.60
0.1	23.60 ± 6.63 (p=0.96)
0.05	31.88 ± 2.62 (p=0.08)
0.02	ncv (> 37.50)
0.01	ncv (> 37.50)

γ = 2

0.2	0.65 ± 0.95
0.1	3.92 ± 1.92 (p=0.94)
0.05	6.93 ± 0.74 (p=0.08)
0.02	ncv (> 7.50)
0.01	ncv (> 7.50)

0.2	0.45 ± 0.42
0.1	1.55 ± 0.73
0.05	3.16 ± 0.84 (p=0.60)
0.02	ncv (> 4.17)
0.01	ncv (> 4.17)

0.2	0.22 ± 0.17
0.1	1.80 ± 1.61
0.05	4.59 ± 2.23 (p=0.32)
0.02	ncv (> 6.52)
0.01	ncv (> 6.52)

0.2	1.44 ± 0.96
0.1	3.77 ± 1.22 (p=0.92)
0.05	5.50 ± NaN (p=0.02)
0.02	ncv (> 5.77)
0.01	ncv (> 5.77)

0.2	8.48 ± 5.29
0.1	21.42 ± 4.01 (p=0.90)
0.05	28.20 ± NaN (p=0.02)
0.02	ncv (> 30.00)
0.01	ncv (> 30.00)

γ = 5

0.2	0.81 ± 0.90
0.1	4.17 ± 1.52 (p=0.80)
0.05	ncv (> 6.25)
0.02	ncv (> 6.25)
0.01	ncv (> 6.25)

0.2	0.59 ± 0.44
0.1	1.78 ± 0.49
0.05	2.58 ± 0.41 (p=0.30)
0.02	ncv (> 3.19)
0.01	ncv (> 3.19)

0.2	0.30 ± 0.22
0.1	2.18 ± 1.25 (p=0.96)
0.05	3.47 ± 0.66 (p=0.04)
0.02	ncv (> 4.69)
0.01	ncv (> 4.69)

0.2	1.83 ± 1.14
0.1	4.27 ± 1.29 (p=0.60)
0.05	ncv (> 5.36)
0.02	ncv (> 5.36)
0.01	ncv (> 5.36)

0.2	7.11 ± 2.70
0.1	13.21 ± 2.36 (p=0.58)
0.05	ncv (> 16.67)
0.02	ncv (> 16.67)
0.01	ncv (> 16.67)

γ = 10

0.2	0.96 ± 0.82 (p=0.98)
0.1	2.96 ± 0.71 (p=0.52)
0.05	ncv (> 3.75)
0.02	ncv (> 3.75)
0.01	ncv (> 3.75)

0.2	0.54 ± 0.46
0.1	1.62 ± 0.49 (p=0.90)
0.05	1.53 ± 1.29 (p=0.06)
0.02	ncv (> 2.54)
0.01	ncv (> 2.54)

0.2	0.31 ± 0.19
0.1	2.11 ± 1.12 (p=0.70)
0.05	2.42 ± 0.27 (p=0.04)
0.02	ncv (> 4.17)
0.01	ncv (> 4.17)

0.2	1.72 ± 0.94
0.1	3.06 ± 0.40 (p=0.20)
0.05	ncv (> 3.57)
0.02	ncv (> 3.57)
0.01	ncv (> 3.57)

0.2	8.41 ± 2.81 (p=0.98)
0.1	12.68 ± 1.74 (p=0.16)
0.05	ncv (> 15.00)
0.02	ncv (> 15.00)
0.01	ncv (> 15.00)

γ = 20

0.2	0.76 ± 0.74 (p=0.96)
0.1	1.15 ± 1.41 (p=0.04)
0.05	ncv (> 2.78)
0.02	ncv (> 2.78)
0.01	ncv (> 2.78)

0.2	0.48 ± 0.42 (p=0.98)
0.1	1.39 ± 0.21 (p=0.36)
0.05	ncv (> 1.76)
0.02	ncv (> 1.76)
0.01	ncv (> 1.76)

0.2	0.33 ± 0.27 (p=0.98)
0.1	1.38 ± 0.62 (p=0.44)
0.05	ncv (> 2.42)
0.02	ncv (> 2.42)
0.01	ncv (> 2.42)

0.2	1.58 ± 0.83 (p=0.70)
0.1	ncv (> 2.54)
0.05	ncv (> 2.54)
0.02	ncv (> 2.54)
0.01	ncv (> 2.54)

0.2	7.01 ± 2.02 (p=0.76)
0.1	9.87 ± NaN (p=0.02)
0.05	ncv (> 10.00)
0.02	ncv (> 10.00)
0.01	ncv (> 10.00)

α ∈ {0.2, 0.1, 0.05, 0.02, 0.01}

Estimators of \(T_{\mathrm{corr}}(\alpha)/t_{\mathop{\mathrm{corr}}}\), where \(t_{\mathop{\mathrm{corr}}}\) is a decorrelation time for \(\xi_\star\eta_t^X\) (\(\xi\in\{\phi,\psi\}\)) estimated using many Fleming-Viot sample trajectories

MD results: bias

RNN diagnostic

γ = 1

0.5	0.06 ± 0.02
0.6	0.06 ± 0.02
0.7	0.05 ± 0.02
0.8	0.05 ± 0.02
0.9	0.03 ± 0.01

0.5	0.16 ± 0.04
0.6	0.15 ± 0.04
0.7	0.13 ± 0.03
0.8	0.11 ± 0.03
0.9	0.09 ± 0.03

0.5	0.07 ± 0.02
0.6	0.06 ± 0.02
0.7	0.05 ± 0.02
0.8	0.05 ± 0.01
0.9	0.04 ± 0.01

0.5	0.09 ± 0.03
0.6	0.08 ± 0.03
0.7	0.07 ± 0.02
0.8	0.06 ± 0.02
0.9	0.04 ± 0.02

0.5	0.02 ± 0.00
0.6	0.02 ± 0.00
0.7	0.02 ± 0.00
0.8	0.02 ± 0.00
0.9	0.02 ± 0.00

γ = 2

0.5	0.07 ± 0.02
0.6	0.06 ± 0.02
0.7	0.05 ± 0.02
0.8	0.04 ± 0.02
0.9	0.03 ± 0.01

0.5	0.17 ± 0.06
0.6	0.16 ± 0.06
0.7	0.14 ± 0.05
0.8	0.13 ± 0.05
0.9	0.10 ± 0.04

0.5	0.05 ± 0.02
0.6	0.05 ± 0.02
0.7	0.04 ± 0.02
0.8	0.03 ± 0.01
0.9	0.02 ± 0.01

0.5	0.09 ± 0.02
0.6	0.08 ± 0.02
0.7	0.07 ± 0.02
0.8	0.06 ± 0.02
0.9	0.05 ± 0.01

0.5	0.02 ± 0.00
0.6	0.02 ± 0.00
0.7	0.02 ± 0.00
0.8	0.02 ± 0.00
0.9	0.02 ± 0.00

γ = 5

0.5	0.09 ± 0.03
0.6	0.08 ± 0.03
0.7	0.07 ± 0.03
0.8	0.06 ± 0.02
0.9	0.04 ± 0.02

0.5	0.23 ± 0.05
0.6	0.21 ± 0.05
0.7	0.19 ± 0.05
0.8	0.17 ± 0.05
0.9	0.14 ± 0.04

0.5	0.08 ± 0.02
0.6	0.08 ± 0.01
0.7	0.07 ± 0.01
0.8	0.06 ± 0.01
0.9	0.05 ± 0.01

0.5	0.12 ± 0.03
0.6	0.11 ± 0.03
0.7	0.10 ± 0.03
0.8	0.09 ± 0.03
0.9	0.07 ± 0.02

0.5	0.03 ± 0.00
0.6	0.03 ± 0.00
0.7	0.03 ± 0.00
0.8	0.02 ± 0.00
0.9	0.02 ± 0.00

γ = 10

0.5	0.14 ± 0.03
0.6	0.12 ± 0.03
0.7	0.11 ± 0.02
0.8	0.10 ± 0.03
0.9	0.08 ± 0.02

0.5	0.31 ± 0.07
0.6	0.29 ± 0.06
0.7	0.27 ± 0.06
0.8	0.24 ± 0.06
0.9	0.21 ± 0.06

0.5	0.13 ± 0.04
0.6	0.11 ± 0.04
0.7	0.09 ± 0.03
0.8	0.08 ± 0.03
0.9	0.06 ± 0.02

0.5	0.15 ± 0.03
0.6	0.14 ± 0.03
0.7	0.13 ± 0.02
0.8	0.11 ± 0.02
0.9	0.09 ± 0.02

0.5	0.03 ± 0.01
0.6	0.03 ± 0.01
0.7	0.03 ± 0.01
0.8	0.03 ± 0.01
0.9	0.02 ± 0.00

γ = 20

0.5	0.20 ± 0.03
0.6	0.18 ± 0.02
0.7	0.17 ± 0.03
0.8	0.16 ± 0.02
0.9	0.14 ± 0.02

0.5	0.43 ± 0.07
0.6	0.41 ± 0.07
0.7	0.38 ± 0.07
0.8	0.35 ± 0.06
0.9	0.32 ± 0.06

0.5	0.22 ± 0.04
0.6	0.20 ± 0.03
0.7	0.18 ± 0.03
0.8	0.16 ± 0.03
0.9	0.13 ± 0.03

0.5	0.24 ± 0.03
0.6	0.23 ± 0.03
0.7	0.22 ± 0.03
0.8	0.20 ± 0.04
0.9	0.17 ± 0.04

0.5	0.05 ± 0.00
0.6	0.05 ± 0.00
0.7	0.04 ± 0.00
0.8	0.04 ± 0.00
0.9	0.04 ± 0.00

α ∈ {0.5, 0.6, 0.7, 0.8, 0.9}

Gelman-Rubin diagnostic

γ = 1

0.2	0.35 ± 0.17
0.1	0.16 ± 0.20
0.05	0.23 ± 0.23 (p=0.20)
0.02	0.23 ± 0.31 (p=0.04)
0.01	ncv

0.2	0.65 ± 0.31
0.1	0.16 ± 0.31
0.05	0.08 ± 0.23 (p=0.52)
0.02	ncv
0.01	ncv

0.2	0.58 ± 0.18
0.1	0.22 ± 0.22 (p=0.98)
0.05	0.08 ± 0.20 (p=0.42)
0.02	ncv
0.01	ncv

0.2	0.23 ± 0.22
0.1	0.07 ± 0.14 (p=0.98)
0.05	0.10 ± 0.17 (p=0.08)
0.02	ncv
0.01	ncv

0.2	0.03 ± 0.02
0.1	0.02 ± 0.01 (p=0.96)
0.05	0.01 ± 0.00 (p=0.08)
0.02	ncv
0.01	ncv

γ = 2

0.2	0.31 ± 0.17
0.1	0.07 ± 0.14 (p=0.94)
0.05	0.01 ± 0.00 (p=0.08)
0.02	ncv
0.01	ncv

0.2	0.52 ± 0.38
0.1	0.12 ± 0.26
0.05	0.05 ± 0.16 (p=0.60)
0.02	ncv
0.01	ncv

0.2	0.53 ± 0.18
0.1	0.12 ± 0.18
0.05	0.06 ± 0.14 (p=0.32)
0.02	ncv
0.01	ncv

0.2	0.15 ± 0.19
0.1	0.04 ± 0.09 (p=0.92)
0.05	0.02 ± NaN (p=0.02)
0.02	ncv
0.01	ncv

0.2	0.03 ± 0.02
0.1	0.02 ± 0.00 (p=0.90)
0.05	0.01 ± NaN (p=0.02)
0.02	ncv
0.01	ncv

γ = 5

0.2	0.26 ± 0.20
0.1	0.03 ± 0.06 (p=0.80)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.38 ± 0.38
0.1	0.02 ± 0.01
0.05	0.02 ± 0.00 (p=0.30)
0.02	ncv
0.01	ncv

0.2	0.39 ± 0.20
0.1	0.05 ± 0.07 (p=0.96)
0.05	0.01 ± 0.00 (p=0.04)
0.02	ncv
0.01	ncv

0.2	0.12 ± 0.19
0.1	0.04 ± 0.11 (p=0.60)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.02 ± 0.01
0.1	0.01 ± 0.00 (p=0.58)
0.05	ncv
0.02	ncv
0.01	ncv

γ = 10

0.2	0.19 ± 0.21 (p=0.98)
0.1	0.02 ± 0.00 (p=0.52)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.44 ± 0.41
0.1	0.06 ± 0.19 (p=0.90)
0.05	0.33 ± 0.54 (p=0.06)
0.02	ncv
0.01	ncv

0.2	0.43 ± 0.22
0.1	0.04 ± 0.05 (p=0.70)
0.05	0.02 ± 0.00 (p=0.04)
0.02	ncv
0.01	ncv

0.2	0.09 ± 0.15
0.1	0.02 ± 0.00 (p=0.20)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.02 ± 0.01 (p=0.98)
0.1	0.02 ± 0.00 (p=0.16)
0.05	ncv
0.02	ncv
0.01	ncv

γ = 20

0.2	0.29 ± 0.28 (p=0.96)
0.1	0.23 ± 0.30 (p=0.04)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.47 ± 0.39 (p=0.98)
0.1	0.02 ± 0.01 (p=0.36)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.36 ± 0.23 (p=0.98)
0.1	0.05 ± 0.04 (p=0.44)
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.12 ± 0.21 (p=0.70)
0.1	ncv
0.05	ncv
0.02	ncv
0.01	ncv

0.2	0.02 ± 0.01 (p=0.76)
0.1	0.01 ± NaN (p=0.02)
0.05	ncv
0.02	ncv
0.01	ncv

α ∈ {0.2, 0.1, 0.05, 0.02, 0.01}

Estimators of \(\mathrm{d}_{\mathrm{TV}}\left(\xi_\star\nu^X,\xi_\star\nu_{T_{\mathrm{corr}}}^X\right)\) for \(\xi\in\{\phi,\psi\}\)

Thank you !

Future steps

Generalize the theoretical bounds to (hypo)elliptic diffusions
Train diagnostics for 2D/3D CVs
Get serious about calibration of synthetic ensemble / benchmark on more serious systems
Use transfer-operator learning (TICA, shallow learning a la¹…) to parametrize \(\xi\) locally/on the fly

References

B., N. “Learning Quasi-Stationarity Diagnostics with Recurrent Neural Networks,” 2026.

Benaı̈m, M., Champagnat, N., Oçafrain, W., and Villemonais, D. “Degenerate Processes Killed at the Boundary of a Domain.” The Annals of Probability 53, no. 2 (2025): 720–52.

Binder, A., Lelièvre, T., and Simpson, G. “A Generalized Parallel Replica Dynamics.” Journal of Computational Physics 284 (2015): 595–616.

Blassel, N., Lelièvre, T., and Stoltz, G. “Quantitative Spectral Asymptotics for Reversible Diffusion in Temperature-Dependent Domains.” Preprint arXiv:2501.16082., 2025.

Brooks, S. P., and Gelman, A. “General Methods for Monitoring Convergence of Iterative Simulations.” Journal of Computational and Graphical Statistics 7, no. 4 (1998): 434–55.

Helffer, B., and Nier, F. “Quantitative Analysis of Metastability in Reversible Diffusion Processes via a Witten Complex Approach: The Case with Boundary.” Matemática Contemporânea 26 (2004): 41–85.

Journel, L., and Monmarché, P. “Uniform Convergence of the Fleming–Viot Process in a Hard Killing Metastable Case.” The Annals of Applied Probability 35, no. 2 (2025): 1019–48.

Journel, L., and Rousset, M. “The Particle Approximation of Quasi-Stationary Distributions. Part I: Concentration Bounds in the Uniform Case.” Preprint arXiv:2412.15820, 2024.

Lagardère, L., Jolly, L. H., Lipparini, F., Aviat, F., Stamm, B., Jing, Z. F., Harger, M., Torabifard, H., Cisneros, G. A., and Schnieders, M. J. “Tinker-HP: A Massively Parallel Molecular Dynamics Package for Multiscale Simulations of Large Complex Systems with Advanced Point Dipole Polarizable Force Fields.” Chemical Science 9, no. 4 (2018): 956–72.

Le Peutrec, D., and Nectoux, B. “Small Eigenvalues of the Witten Laplacian with Dirichlet Boundary Conditions: The Case with Critical Points on the Boundary.” Analysis & PDE 14, no. 8 (2021): 2595–2651.

Legoll, F., and Lelièvre, T. “Effective Dynamics Using Conditional Expectations.” Nonlinearity 23, no. 9 (2010): 21–31.

Méléard, S., and Villemonais, D. “Quasi-Stationary Distributions and Population Processes.” Probability Surveys 9 (2012): 340–410.

Perez, D., Uberuaga, B. P., and Voter, A. F. “The Parallel Replica Dynamics Method – Coming of Age.” Computational Materials Science 100, no. B (2015): 90–103.

Stoll, D., Mallik, N., Schrodi, S., Bergmann, E., Janowski, M., Garibov, S., T., Abou C., et al. “Neural Pipeline Search (NePS).” Github.com/Automl/Neps, no. 0.12.2 (2024).

Tabish, M., Leimkuhler, B., and Klus, S. “How Deep Is Your Network? Deep Vs. Shallow Learning of Transfer Operators.” Preprint arXiv:2509.19930, 2025.

Voter, A. F. “Parallel Replica Method for Dynamics of Infrequent Events.” Physical Review B 57, no. 22 (1998): 9599–9606.

Zhang, W., Hartmann, C., and Schütte, C. “Effective Dynamics Along Given Reaction Coordinates, and Reaction Rate Theory.” Faraday Discussions 195 (2016): 365–94.