Proteins form the clockwork behind biological processes, acting as molecular machines whose 3D shape changes drive essential cellular functions such as signaling, transport, and catalysis. Understanding this dynamical behavior is key to advancing biological discovery and enabling rational drug design.
Molecular Dynamics (MD) simulation provides a powerful approach to study protein motion at atomic resolution. However, the tiny time steps required for first-principles simulation severely limit our ability to study proteins over biologically meaningful timescales, creating a significant bottleneck in drug development.
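A back-of-the-envelope calculation makes the bottleneck concrete. The numbers below are illustrative and not taken from the post: all-atom MD typically uses a time step of around 2 femtoseconds, while slow conformational events can take milliseconds, so counting the sequential integration steps needed gives:

```python
# Illustrative estimate of the MD timescale gap (typical values, not from the post).
dt_fs = 2.0                   # common all-atom MD integration step, in femtoseconds
target_ms = 1.0               # timescale of a slow conformational event, in milliseconds

target_fs = target_ms * 1e12  # 1 ms = 1e12 fs
n_steps = target_fs / dt_fs   # sequential force evaluations required

print(f"{n_steps:.0e} integration steps")  # on the order of 5e11 steps
```

Hundreds of billions of sequential force evaluations is what makes millisecond-scale events impractical to reach by brute-force simulation, and why taking much larger, learned time steps is attractive.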
Recent deep learning approaches offer a promising alternative: neural networks that generate protein dynamics with much larger time steps by learning from precomputed trajectory data. As this nascent approach matures, it opens uncharted frontiers: how fast can we push protein simulations without sacrificing physically meaningful accuracy, and how well can models generalize across proteins and extend predictions from short to long timescales?
In our most recent work, we investigate these frontiers with DeepJump, a generative, Euclidean-equivariant model for simulating protein conformational transitions through large time steps. We train our model on the mdCATH dataset, which provides structural diversity across short trajectories, then evaluate its capacity to reproduce the long-timescale conformational changes and folding of the fast-folding proteins.
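The post does not spell out DeepJump's interface, but the core idea of large time stepping can be sketched generically: a learned conditional generator proposes the conformation one large step Δt ahead, and chaining such jumps yields a long trajectory. In this minimal sketch, `sample_jump` is a hypothetical stand-in (random perturbations, not a trained network) so the rollout loop is runnable:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_jump(coords, rng):
    """Stand-in for a learned conditional generator p(x_{t+dt} | x_t).

    A real model (e.g. an equivariant network like DeepJump) would propose
    the conformation one large time step ahead; here we simply add Gaussian
    noise so the rollout below executes.
    """
    return coords + 0.1 * rng.normal(size=coords.shape)

def rollout(coords0, n_jumps, rng):
    """Chain large time steps into a long trajectory."""
    traj = [coords0]
    for _ in range(n_jumps):
        traj.append(sample_jump(traj[-1], rng))
    return np.stack(traj)

traj = rollout(np.zeros((10, 3)), n_jumps=100, rng=rng)  # 10 atoms, 3D coordinates
print(traj.shape)  # (101, 10, 3): initial structure plus 100 jumps
```

Each jump replaces what would be millions of explicit MD integration steps, which is what makes reaching long-timescale events such as folding tractable with a learned propagator.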