Proteins form the clockwork behind biological processes, acting as molecular machines whose 3D shape changes drive essential cellular functions such as signaling, transport, and catalysis. Understanding this dynamical behavior is key to advancing biological discovery and enabling rational drug design.
Molecular Dynamics (MD) simulation provides a powerful approach to study protein motion at atomic resolution. However, the tiny time steps required for first-principles simulation severely limit our ability to study proteins over biologically meaningful timescales, creating a significant bottleneck in drug development.
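A back-of-the-envelope calculation makes the bottleneck concrete. The numbers below are illustrative and not taken from the post: all-atom MD typically uses a time step of around 2 femtoseconds, while slow conformational events can take milliseconds, so counting the sequential integration steps needed gives:

```python
# Illustrative estimate of the MD timescale gap (typical values, not from the post).
dt_fs = 2.0                   # common all-atom MD integration step, in femtoseconds
target_ms = 1.0               # timescale of a slow conformational event, in milliseconds

target_fs = target_ms * 1e12  # 1 ms = 1e12 fs
n_steps = target_fs / dt_fs   # sequential force evaluations required

print(f"{n_steps:.0e} integration steps")  # on the order of 5e11 steps
```

Hundreds of billions of sequential force evaluations is what makes millisecond-scale events impractical to reach by brute-force simulation, and why taking much larger, learned time steps is attractive.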
Recent deep learning approaches offer a promising alternative: neural networks that generate protein dynamics with much larger time steps by learning from precomputed trajectory data. As this nascent approach matures, it opens uncharted frontiers: how fast can we push protein simulations without sacrificing physically meaningful accuracy, and how well can models generalize across proteins and extend predictions from short to long timescales?
In our most recent work, we investigate these frontiers with DeepJump, a generative, Euclidean-equivariant model for simulating protein conformational transitions through large time steps. We train our model on the mdCATH dataset, which provides structural diversity across short trajectories, then evaluate its capacity to reproduce the long-timescale conformational changes and folding of the fast-folding proteins.
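The post does not spell out DeepJump's interface, but the core idea of large time stepping can be sketched generically: a learned conditional generator proposes the conformation one large step Δt ahead, and chaining such jumps yields a long trajectory. In this minimal sketch, `sample_jump` is a hypothetical stand-in (random perturbations, not a trained network) so the rollout loop is runnable:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_jump(coords, rng):
    """Stand-in for a learned conditional generator p(x_{t+dt} | x_t).

    A real model (e.g. an equivariant network like DeepJump) would propose
    the conformation one large time step ahead; here we simply add Gaussian
    noise so the rollout below executes.
    """
    return coords + 0.1 * rng.normal(size=coords.shape)

def rollout(coords0, n_jumps, rng):
    """Chain large time steps into a long trajectory."""
    traj = [coords0]
    for _ in range(n_jumps):
        traj.append(sample_jump(traj[-1], rng))
    return np.stack(traj)

traj = rollout(np.zeros((10, 3)), n_jumps=100, rng=rng)  # 10 atoms, 3D coordinates
print(traj.shape)  # (101, 10, 3): initial structure plus 100 jumps
```

Each jump replaces what would be millions of explicit MD integration steps, which is what makes reaching long-timescale events such as folding tractable with a learned propagator.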