[2] De novo methods, a term first coined by William DeGrado[3], tend to require vast computational resources and have therefore been applied only to relatively small proteins. Research into de novo structure prediction has focused primarily on three areas: alternative lower-resolution representations of proteins, accurate energy functions, and efficient sampling methods. A general paradigm for de novo prediction involves sampling conformation space, guided by scoring functions and other sequence-dependent biases, so that a large set of candidate (“decoy”) structures is generated.

Second, several human diseases, such as Duchenne muscular dystrophy, can be linked to a loss of protein function caused by a change in just a single amino acid of the primary sequence. One of the strongest lines of evidence that all the information needed to encode a protein's tertiary structure is contained in its primary sequence was provided in the 1950s by Christian Anfinsen.[10] By developing the QUARK program, Xu and Zhang showed that the structures of some proteins can be successfully built ab initio using a knowledge-based force field.

However, below this threshold, three other classes of strategy are used to determine a possible structure from an initial model: ab initio protein prediction, fold recognition, and threading. For example, a distributed-computing approach was used by a team of researchers at the University of Washington and the Howard Hughes Medical Institute to predict the tertiary structure of the protein T0283 from its amino acid sequence.

ESMFold is a recently developed model, built on a large protein language model, that predicts protein structures from their amino acid sequences alone. In the CASP experiments, research groups are invited to apply their prediction methods to amino acid sequences whose native structures are not yet known but are soon to be determined and published.
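The de novo paradigm of sampling conformation space under a scoring function to produce a set of ranked decoys can be illustrated on a deliberately tiny toy problem. The sketch below is not Rosetta, QUARK, or any other method named here. It assumes the two-dimensional HP lattice model as a stand-in for a real protein representation (each residue is hydrophobic "H" or polar "P"), a hypothetical scoring function that awards -1 per non-bonded H-H contact, and Metropolis Monte Carlo as the sampling method. All function names (`coords`, `hp_energy`, `sample_decoy`, `generate_decoys`) are illustrative.

```python
import math
import random

# Toy stand-in for a real protein model (an assumption, not any published method):
# residues are "H" (hydrophobic) or "P" (polar) on a 2D square lattice, and a
# conformation is a list of absolute step directions between consecutive residues.
DIRS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def coords(dirs):
    """Convert step directions into lattice coordinates, starting at the origin."""
    pos = [(0, 0)]
    for d in dirs:
        dx, dy = DIRS[d]
        x, y = pos[-1]
        pos.append((x + dx, y + dy))
    return pos


def is_self_avoiding(pos):
    """A valid chain never places two residues on the same lattice site."""
    return len(set(pos)) == len(pos)


def hp_energy(seq, pos):
    """Hypothetical scoring function: -1 for each H-H pair that touches on the
    lattice without being adjacent in the chain (lower is better)."""
    index = {p: i for i, p in enumerate(pos)}
    energy = 0
    for i, (x, y) in enumerate(pos):
        if seq[i] != "H":
            continue
        for dx, dy in DIRS.values():
            j = index.get((x + dx, y + dy))
            # j > i + 1 skips chain neighbours and counts each pair only once
            if j is not None and j > i + 1 and seq[j] == "H":
                energy -= 1
    return energy


def sample_decoy(seq, steps, temperature, rng):
    """One Metropolis Monte Carlo run from an extended chain; returns (score, coords)."""
    dirs = ["R"] * (len(seq) - 1)
    pos = coords(dirs)
    energy = hp_energy(seq, pos)
    for _ in range(steps):
        # Propose a move: change one step direction, shifting the rest of the chain
        trial = dirs.copy()
        k = rng.randrange(len(trial))
        trial[k] = rng.choice([d for d in DIRS if d != trial[k]])
        trial_pos = coords(trial)
        if not is_self_avoiding(trial_pos):
            continue  # reject conformations where the chain overlaps itself
        trial_energy = hp_energy(seq, trial_pos)
        delta = trial_energy - energy
        # Metropolis criterion: always accept improvements, sometimes accept worse moves
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            dirs, pos, energy = trial, trial_pos, trial_energy
    return energy, pos


def generate_decoys(seq, n_decoys=20, steps=2000, temperature=0.5, seed=0):
    """Run several independent samplings and return decoys sorted best-first."""
    rng = random.Random(seed)
    decoys = [sample_decoy(seq, steps, temperature, rng) for _ in range(n_decoys)]
    decoys.sort(key=lambda d: d[0])  # lowest (best) score first
    return decoys


if __name__ == "__main__":
    sequence = "HPHPPHHPHPPHPHHPPHPH"  # illustrative sequence, not a real protein
    for score, _ in generate_decoys(sequence)[:3]:
        print("decoy score:", score)
```

Each independent run yields one decoy, and ranking the decoys by score before keeping the best few mirrors the candidate-selection step in miniature. Practical de novo methods typically use much richer protein representations, more elaborate move sets, and carefully parameterized energy functions.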
An example of distributed computing (Rosetta) in predicting the 3D structure of a protein from its amino-acid sequence. The predicted structure (magenta) of a protein is overlaid with the experimentally determined crystal structure (blue) of that protein. The agreement between the two is very good.