Our major aim in the scoring project was, on the one hand, to inform the next generation about the plasticity of protein structures in a comprehensive and reproducible manner and, on the other, to accumulate basic data for exploring the nature of life. Every living organism is made up of water, carbohydrates, nucleic acids, lipids, and proteins. Polypeptides, which are composed of 20 different types of amino acids and form the chemical entity of proteins, make the earth rich in diverse living species.
The variety of polypeptides is enormous; for a stretch of 20 amino acids, the number of possible unique polypeptides is 20^20, or approximately 10^26, which exceeds the estimated number of stars in the whole universe. The real diversity is greater still, because proteins in biological cells are usually longer than 100 amino acids.
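The size of this sequence space can be checked with a few lines of arithmetic. The sketch below is only an illustration; the variable names are ours, and the 100-residue case is included simply because the text mentions proteins of that length.

```python
import math

# Number of distinct sequences of length 20 built from 20 amino acid types.
n_sequences_20 = 20 ** 20
print(n_sequences_20)              # exact integer count
print(math.log10(n_sequences_20))  # order of magnitude (a little above 26)

# For a 100-residue protein the exponent grows in proportion to the length.
log10_sequences_100 = 100 * math.log10(20)
print(log10_sequences_100)         # roughly 130, i.e. about 10^130 sequences
```

Working with logarithms for the longer chain avoids printing a 131-digit integer while still showing how quickly the number of possibilities grows.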
A randomly generated polypeptide would rarely “fold.” It is therefore remarkable that genetically encoded proteins have persisted as folded entities over the span of evolution. Moreover, many of these proteins can also change their conformation.
The three-dimensional atomic arrangement in a protein or protein complex is most commonly explored by X-ray crystallography. Such studies, boosted by several structural genomics projects worldwide, add approximately 11,000 entries to the Protein Data Bank every year. Although a substantial number of targets still need to be identified, the accumulation of multiple structures for a given protein enables systematic and reproducible examination of its “morphness,” i.e., the variability of the structure.
Our scoring method aims to assign each protein a comprehensive “unmorphness factor (umf),” previously referred to as the average score, which is refined as experimental data accumulate. By contrast, other physicochemical parameters of a protein, such as the isoelectric point and hydrophobicity index, are mostly based on the amino acid sequence alone and are rarely refined.
Cα is the central atom in each amino acid, and its position can thus be used to sample the molecular space of a protein evenly. A conventional way to estimate structural alterations of a protein is the root-mean-square deviation of the Cα positions for a superimposed pair of structures. The umf, in contrast, is defined using multiple sets of coordinates obtained under different crystallographic conditions: for every intramolecular Cα–Cα distance, the inverse of the coefficient of variation (that is, the average divided by the standard deviation) is calculated across the structures, and these values are then averaged over all distances. The umf is high for “rigid” proteins and low for “dynamic” ones, and ranges from a few tens to several hundreds. As the proportion of disordered content in a polypeptide increases, the score approaches zero. Such a confined range of umfs suggests how living organisms are balanced to have a dynamic repertoire of proteins.
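The definition above can be sketched in a few lines of NumPy. This is a minimal illustration written from the description in the text, not the authors' published implementation: the function name `umf`, the input layout (one array of Cα coordinates per structure, with the same residues in the same order) and the use of the sample standard deviation are all assumptions made here.

```python
import numpy as np

def umf(ca_coords):
    """Average of mean/std over all intramolecular Ca-Ca distances.

    ca_coords: array-like of shape (n_structures, n_residues, 3).
    Assumes every structure lists the same residues in the same order.
    """
    coords = np.asarray(ca_coords, dtype=float)
    if coords.ndim != 3 or coords.shape[0] < 2 or coords.shape[2] != 3:
        raise ValueError("need at least two structures of shape (n_residues, 3)")

    # Pairwise Ca-Ca distance matrix for each structure.
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # (n_structures, n_res, n_res)

    # Keep each residue pair once (upper triangle, no diagonal).
    i, j = np.triu_indices(coords.shape[1], k=1)
    pair_dist = dist[:, i, j]              # (n_structures, n_pairs)

    mean = pair_dist.mean(axis=0)
    std = pair_dist.std(axis=0, ddof=1)    # sample std: an assumption

    # Inverse coefficient of variation per pair, averaged over all pairs.
    # A pair with zero spread would give infinity; real data should be
    # checked for that case before relying on the result.
    return float(np.mean(mean / std))

# Toy data: one random "fold" observed five times with small or large noise.
rng = np.random.default_rng(0)
base = rng.normal(scale=10.0, size=(30, 3))
rigid = np.stack([base + rng.normal(scale=0.01, size=base.shape) for _ in range(5)])
loose = np.stack([base + rng.normal(scale=1.0, size=base.shape) for _ in range(5)])
print(umf(rigid), umf(loose))
```

Because the score depends only on internal distances, no superposition of the structures is required, unlike the root-mean-square deviation of a superimposed pair. In the toy data the low-noise ensemble receives the higher score, matching the statement that rigid proteins have a high umf.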
Neither an ensemble of all-rigid proteins nor one of all-dynamic proteins can make up a living organism, which needs to replicate using a variety of dynamic processes while also maintaining a certain range of physical volume and activity at the cellular level. Thus, it can be presumed that the total “morphness” of all proteins comprising a cell has to be balanced, except in cells exhibiting uncontrolled behavior, as in cancer.
So far, it has not been possible to obtain such insights for any living species. For a recently developed bacterium with a minimal genome, the unmorphness factors of fewer than 500 proteins could provide the first view in this respect. Although many of these proteins are of unknown structure and function at this moment, this suggests that future generations will see an even more complicated picture of life that relies on the dynamic repertoire of proteins.
These findings are described in the article entitled “Evaluation of variability in high-resolution protein structures by global distance scoring,” recently published in the journal Heliyon. This work was conducted by Risa Anzai, Yoshiki Asami, Waka Inoue, Hina Ueno, Koya Yamada, and Tetsuji Okada from Gakushuin University.