Project

Ophiuchus: Deep Neural Modeling of Protein Structures from First-Principles of Symmetry

Allan dos Santos Costa

Groups

Media Lab Research Theme: Life with AI

We recently introduced Ophiuchus, a novel deep learning architecture for modeling protein structures. Ophiuchus is the first autoencoder of its kind, learning to represent all-atom proteins as coarse, compact geometric encodings.

In our work, we show how the compact and geometric Ophiuchus embeddings enable efficient protein modeling, inference and design.

Our model uses irreducible representations of SO(3) to represent protein features. This method allows the protein sequence, backbone positions and side-chain atom positions to be jointly encoded in a unified geometric embedding.

To capture a protein globally, we reduce these features and coordinates into increasingly coarser geometric embeddings:

By learning a decoder to reconstruct proteins from geometric embeddings, we can train an autoencoder to learn compact bottleneck representations of full proteins:

In our paper, we showed this geometric latent space to be well behaved for inference and sampling, with demonstrations of high-dimensional arithmetics through conformational latent interpolation, and of efficient generative sampling through latent denoising diffusion models. Our generative model is the first to generate large-scale protein structures at the speed of the blink of an eye.

Ophiuchus was jointly developed with the Atomic Architects group.

Stay tuned for more!