How a Simple Idea Revolutionized Our View of Protein Folding
In the intricate dance of life, proteins fold with breathtaking precision, and scientists have found a key to decoding their steps.
You are a masterpiece of molecular engineering. Every second, inside each of your cells, millions of tiny machines called proteins are hard at work. They digest your food, fire your neurons, and defend you from disease. Each one begins as a simple string of molecules, like a disordered bead necklace, and within milliseconds it spontaneously folds into a perfect, complex three-dimensional shape. This shape determines its function. For decades, how a simple chain transforms into a working machine—a process known as the protein folding problem—was one of biology's greatest mysteries.
While AI systems like AlphaFold have recently stunned the world by predicting final protein structures, a different class of models, known as native-structure-based or Gō-type models, has quietly tackled another crucial puzzle: how do proteins actually fold? These elegant simulations, built on simple physical principles, allow scientists to watch the folding process in action, revealing the intricate pathways and hidden intermediate states that bring proteins to life [3, 6].
[Figure: Visualization of a protein finding its native structure]
The story begins in 1975 with a scientist named Nobuhiro Gō. He and his colleagues asked a revolutionary question: What if the final, functional shape of a protein—its "native structure"—is also the key to understanding how it folds?
They created a simple computer model where each part of the protein chain was represented as a bead on a grid. The crucial innovation was that they programmed the beads to attract each other only if they were neighbors in the final, folded structure [3]. This was a "structure-based" model. The results were striking. While models with weak or generic interactions failed to fold correctly, Gō's model, with its strong bias toward the native shape, consistently found the correct structure. It was the first demonstration that by focusing exclusively on the destination, you could reliably chart the journey.
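The attraction rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reconstruction of the 1975 model: the six-bead fold, the two-dimensional grid, and the names `lattice_contacts`, `go_energy`, and `epsilon` are all assumptions made for the example.

```python
from itertools import combinations


def lattice_contacts(coords):
    """Return bead pairs (i, j) that sit on adjacent grid sites but are not bonded."""
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # beads next to each other along the chain are always adjacent
        grid_distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if grid_distance == 1:
            contacts.add((i, j))
    return contacts


def go_energy(coords, native_contacts, epsilon=1.0):
    """Each native contact that is formed lowers the energy by epsilon.

    Contacts that are absent from the native structure contribute nothing.
    """
    formed = lattice_contacts(coords) & native_contacts
    return -epsilon * len(formed)


# Hypothetical native fold: six beads packed into a 2 x 3 block.
native = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
native_contacts = lattice_contacts(native)

# A fully stretched chain forms no contacts at all.
extended = [(i, 0) for i in range(6)]

print(sorted(native_contacts))             # [(0, 5), (1, 4)]
print(go_energy(native, native_contacts))  # -2.0
```

Because only native pairs are rewarded, the folded shape is the lowest-energy arrangement by construction, which is exactly the bias the paragraph above describes.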
The theoretical underpinning for this success is known as the "principle of minimal frustration." Imagine a protein's folding journey as a ball rolling down a landscape. A "frustrated" landscape is like a rugged valley full of pits and dead ends where the ball can get stuck. A "minimally frustrated" landscape, however, is a smooth, funnel-shaped slope. No matter where the ball starts, gravity pulls it directly to the bottom, which represents the native state. Gō-type models are computational realizations of this idealized funnel, where every interaction guides the protein toward its correct form [5].
The original Gō model was simplistic, but it planted a seed that would grow into a formidable toolkit for computational biologists.
Scientists soon moved the beads off the rigid grid, creating more realistic "off-lattice" models where the protein chain could move freely in space. These coarse-grained models, often using one bead per amino acid, allowed researchers to simulate the folding of real proteins and study their dynamic fluctuations [3]. Eventually, even more detailed all-atom Gō models were developed, offering near-atomic resolution [5].
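To make the off-lattice idea concrete, here is a hedged sketch of a one-bead-per-residue model. It assumes an 8 Å distance cutoff for defining native contacts, a minimum sequence separation of three residues, and a 12-10 Lennard-Jones-style attraction, which is one common choice in the coarse-grained literature rather than the only one. The seven-bead hairpin and all function names are hypothetical.

```python
import math


def native_contact_map(native_xyz, cutoff=8.0, min_separation=3):
    """Map each native pair (i, j) to its distance in the native structure.

    cutoff (in angstroms) and min_separation are assumed values for this sketch.
    """
    contacts = {}
    n = len(native_xyz)
    for i in range(n):
        for j in range(i + min_separation, n):
            d = math.dist(native_xyz[i], native_xyz[j])
            if d <= cutoff:
                contacts[(i, j)] = d
    return contacts


def go_contact_energy(xyz, contacts, epsilon=1.0):
    """Sum 12-10 terms whose minimum, -epsilon, sits at the native distance."""
    energy = 0.0
    for (i, j), d0 in contacts.items():
        ratio = d0 / math.dist(xyz[i], xyz[j])
        energy += epsilon * (5 * ratio**12 - 6 * ratio**10)
    return energy


def fraction_native(xyz, contacts, tolerance=1.2):
    """Fraction of native pairs lying within tolerance times their native distance."""
    formed = sum(
        1 for (i, j), d0 in contacts.items()
        if math.dist(xyz[i], xyz[j]) <= tolerance * d0
    )
    return formed / len(contacts)


# Hypothetical seven-bead hairpin with beads roughly 3.8 angstroms apart.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (9.5, 3.0, 0.0),
          (7.6, 6.0, 0.0), (3.8, 6.0, 0.0), (0.0, 6.0, 0.0)]
extended = [(3.8 * i, 0.0, 0.0) for i in range(7)]

contacts = native_contact_map(native)
print(fraction_native(native, contacts))    # 1.0
print(fraction_native(extended, contacts))  # 0.0
```

The fraction of native contacts is a convenient progress variable: it runs from 0 for a stretched chain to 1 for the folded structure, so a simulation can report how far along the folding route a given snapshot sits.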
The true power of Gō models was their adaptability. Researchers realized that if they could simulate folding to a single structure, they could also model movement between multiple known structures. This opened the door to simulating critical biological processes like allostery (regulation at a distance) and the induced fit of a protein binding to a drug [5].
A major theoretical advancement came with the Wako-Saitô-Muñoz-Eaton (WSME) model, which treated each amino acid as a two-state system (folded or unfolded). This model successfully predicted folding pathways for single-domain proteins but hit a wall with larger, multi-domain proteins. The problem was its inability to account for "non-local" interactions, that is, contacts between segments far apart in the chain that come together early in folding. The solution, recently introduced, is the WSME-L model ("L" for linker). It introduces virtual linkers that allow these critical long-range interactions, enabling accurate prediction of the folding of complex, multi-domain proteins [6].
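The limitation is easier to see with the model's central rule written out. In the standard WSME formulation, a native contact between residues i and j counts only when every residue from i through j is in the folded state. The sketch below applies that rule to a hypothetical six-residue chain. The contact energies, the per-residue entropy cost (in units of k_B), and the function names are assumptions chosen for illustration, and the linker extension of WSME-L is not implemented here.

```python
import math
from itertools import product


def wsme_energy(state, contact_energies):
    """Energy of one configuration; state[k] is 1 if residue k is folded, else 0.

    A contact (i, j) contributes only if the whole segment i..j is folded.
    """
    total = 0.0
    for (i, j), eps in contact_energies.items():
        if all(state[i:j + 1]):
            total += eps
    return total


def boltzmann_weight(state, contact_energies, entropy_cost, kT=1.0):
    """Statistical weight: contact energy plus an entropy penalty per folded residue."""
    energy = wsme_energy(state, contact_energies)
    return math.exp(-energy / kT - entropy_cost * sum(state))


def native_probability(n, contact_energies, entropy_cost, kT=1.0):
    """Probability of the fully folded state, by exact enumeration of all 2**n states."""
    z = sum(
        boltzmann_weight(s, contact_energies, entropy_cost, kT)
        for s in product((0, 1), repeat=n)
    )
    return boltzmann_weight((1,) * n, contact_energies, entropy_cost, kT) / z


# Hypothetical chain: two local contacts and one long-range contact.
contacts = {(0, 2): -1.0, (3, 5): -1.0, (0, 5): -2.0}

print(wsme_energy((1, 1, 1, 1, 1, 1), contacts))  # -4.0
print(wsme_energy((1, 0, 0, 0, 0, 1), contacts))  # 0.0
```

In the second configuration the two chain ends are folded, yet the long-range contact between residues 0 and 5 earns nothing because the residues in between are still unfolded. That is the non-local blind spot that virtual linkers are meant to address.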
| Tool or resource | Role |
|---|---|
| Native structure | Serves as the "attractor" and blueprint for defining favorable interactions in the model. Often taken from the Protein Data Bank (PDB). |
| GROMACS | A highly optimized, free software package used to run the molecular dynamics simulations for many Gō-type models. |
| Structure-based model web server | A web server that automatically generates the necessary input files for structure-based simulations within GROMACS. |
| GoCa | A specialized Gō-type approach for simulating the assembly of large multi-protein complexes from their individual subunits. |
| WSME-L | A recent analytical model that introduces virtual linkers to simulate the folding of complex, multi-domain proteins. |
As deep learning systems like AlphaFold 3 (AF3) and RoseTTAFold All-Atom (RFAA) began predicting protein-ligand (protein-small molecule) complexes with astonishing accuracy, a critical question emerged: Are these AI models learning the true physics of molecular interactions, or are they just expert pattern-matchers from their training data? A compelling 2025 study put them to the test using the logic of Gō-type models [1].
The researchers designed a series of "adversarial examples" based on fundamental physical and biological principles. They used the structure of cyclin-dependent kinase 2 (CDK2) bound to its ligand, ATP. For the unmodified, wild-type protein, all of the AI models predicted this complex accurately. The challenge came with three biologically informed sets of binding-site mutations: an all-glycine mutant, an all-phenylalanine mutant, and a mutant carrying dissimilar substitutions [1].
The results revealed a significant flaw. Instead of responding to these drastic changes by displacing the ligand, as the laws of physics and chemistry would dictate, the AI models largely persisted in placing ATP in the original, now non-existent binding site [1].
This experiment demonstrated that these advanced AI co-folding models are heavily biased toward the patterns in their training data. They lack a genuine understanding of physical forces, making them prone to errors when faced with novel situations not seen during training. This highlights the complementary role of physics-based models like Gō, which are built from the ground up to obey physical principles, even if they rely on a known starting structure [1].
| Model | Wild-Type (RMSD in Å) | All-Glycine Mutant | All-Phenylalanine Mutant | Dissimilar Mutations |
|---|---|---|---|---|
| AlphaFold 3 | 0.2 Å (Excellent) | Ligand misplaced | Ligand misplaced | Ligand misplaced |
| RoseTTAFold All-Atom | 2.2 Å (Good) | Ligand misplaced | Ligand misplaced | Ligand misplaced |
| Chai-1 | Similar to native | Ligand misplaced | Ligand misplaced | Ligand misplaced |
| Boltz-1 | Similar to native | Ligand misplaced | Slight pose adjustment | Ligand misplaced |
Table: Root-mean-square deviation (RMSD) measures the difference between the predicted and actual ligand positions; a lower value is better. "Misplaced" indicates that the ligand was incorrectly predicted to remain in the original binding site despite disruptive mutations.
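For readers who want to see what the RMSD figures mean in practice, the calculation can be sketched as follows. This minimal version assumes the predicted and reference ligand atoms are listed in the same order and that both structures have already been placed in a common frame, for example by aligning the proteins. It makes no correction for symmetric atoms, and the three-atom coordinates are invented for the example.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(predicted) != len(reference):
        raise ValueError("both structures must contain the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


# Hypothetical three-atom ligand and a prediction shifted by 0.2 angstroms along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
predicted = [(x + 0.2, y, z) for x, y, z in reference]

print(round(rmsd(predicted, reference), 3))  # 0.2
```

A uniform 0.2 Å shift gives an RMSD of 0.2 Å, the same scale as the best wild-type result in the table, whereas a ligand left in the wrong pocket would typically sit many ångströms from where it belongs.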
| Feature | Gō-Type Models | All-Atom Molecular Dynamics | AI Prediction (e.g., AlphaFold) |
|---|---|---|---|
| Primary Goal | Understand folding pathways & dynamics | Observe atomic-level physics & kinetics | Predict final 3D structure from sequence |
| Computational Cost | Low | Extremely High | Moderate to High |
| Underlying Principle | Principle of minimal frustration | Newton's laws of motion | Pattern recognition in databases |
| Strength | Provides mechanistic insight; high speed | Highest physical accuracy | High accuracy for static structures |
| Limitation | Requires a known native structure | Limited to short timescales | Limited understanding of folding physics |
Gō-type models continue to evolve. The new GoCa model is extending this framework to simulate one of biology's most complex processes: the assembly of massive multi-protein machines, like the proteasome, from dozens of individual subunits [4]. This allows scientists to ask not just how a protein folds, but how a cellular factory is built.
In the post-AlphaFold era, the role of physics-based models is more relevant than ever. While AI provides a stunning static picture, Gō models offer the dynamic movie, revealing the mechanics of how that picture comes to be. They remind us that true understanding in science often comes not just from predicting an outcome, but from comprehending the process that creates it [7].
The journey that began with a simple lattice model has given us a fundamental truth: the architecture of a protein is not just a static image, but a narrative of motion and change, a story that we are increasingly well equipped to read.