In the 1960s, Anfinsen found that reduced and denatured bovine pancreatic ribonuclease could refold without the help of any other substance. On the basis of these experiments he put forward the "self-assembly theory": "the amino acid sequence of a polypeptide chain contains all the information necessary to form its thermodynamically stable native conformation". The theory of protein folding has since been supplemented and extended. Anfinsen's thermodynamic self-assembly hypothesis has been confirmed by many in vitro experiments. Many proteins, especially those of low molecular weight, can indeed be reversibly denatured and renatured in vitro, but not all proteins can. Moreover, because of the special environment inside the cell, protein folding in vivo is far from this simple.
Protein folding in vivo often requires the participation of other auxiliary factors and is accompanied by ATP hydrolysis. In 1987, Ellis therefore put forward the "assisted assembly theory" of protein folding. This indicates that protein folding is not only a thermodynamic process but is also clearly under kinetic control. Because some proteins with similar amino acid sequences adopt different folds, while others with different amino acid sequences adopt similar structures, some scholars have hypothesized that the secondary structure of mRNA may act as a genetic code that influences protein structure. So far, however, there is no experimental evidence for this hypothesis, only purely mathematical arguments [3]. How, then, does the amino acid sequence of a protein determine its spatial conformation? Researchers have done a great deal of excellent work on this question, but our understanding of the folding mechanism remains incomplete, and in some respects mistaken views persist.
A classic study that made an important contribution to this field is the work of C. B. Anfinsen's group in the United States on the denaturation and renaturation of bovine pancreatic ribonuclease. The enzyme contains 124 amino acid residues, and its 8 sulfhydryl groups pair to form 4 disulfide bonds. It can be calculated that 8 sulfhydryl groups can form 4 disulfide bonds in 105 different ways (7 × 5 × 3 × 1 = 105), which provides a quantitative index for assessing renaturation. Under mildly alkaline conditions, 8 M urea and a large excess of mercaptoethanol completely reduce the four disulfide bonds; the molecule becomes a random coil and the enzyme is denatured. When the urea is removed by dialysis, the disulfide bonds re-form in the presence of oxygen and the enzyme is completely renatured. The sulfhydryl groups pair exactly as in the native enzyme, and the renatured molecules can be crystallized and give the same X-ray diffraction pattern as crystals of the native enzyme. This proves that the enzyme not only refolds spontaneously but also selects just one of the 105 possible disulfide pairings during renaturation.
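The figure of 105 pairings can be checked with a short calculation. The sketch below is only an illustration: the function names and the labels 1 to 8 for the sulfhydryl groups are assumptions made here, not part of the original experiments. The first unpaired sulfhydryl group can pick any of 7 partners, the next unpaired one any of 5, and so on.

```python
from math import prod


def disulfide_pairings(n_sulfhydryls: int) -> int:
    """Count the ways to pair every sulfhydryl group into disulfide bonds."""
    if n_sulfhydryls < 0 or n_sulfhydryls % 2:
        raise ValueError("need a non-negative, even number of sulfhydryl groups")
    # (n-1) * (n-3) * ... * 1: partner choices for each successive unpaired group.
    return prod(range(n_sulfhydryls - 1, 0, -2))


def enumerate_pairings(groups):
    """Yield every complete pairing explicitly, as a brute-force cross-check."""
    if not groups:
        yield []
        return
    first, rest = groups[0], groups[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    print(disulfide_pairings(8))  # 105
    # Hypothetical labels 1..8 stand in for the eight sulfhydryl groups.
    print(sum(1 for _ in enumerate_pairings(list(range(1, 9)))))  # 105
```

If pairing were random, only about 1 molecule in 105 would be expected to recover the native set of disulfide bonds, which is why complete renaturation is such strong evidence.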
Theoretical models
Framework model
The framework model [4] assumes that the local conformation of a protein depends on the local amino acid sequence. In the initial stage of folding, unstable secondary structure units, known as "flickering clusters", form rapidly. These secondary structures then come into close contact and form a stable secondary structure framework. Finally, the framework elements dock with one another and the peptide chain gradually contracts to form the tertiary structure of the protein. The model holds that even a small protein can fold part by part, and that the subdomains formed along the way are important structures of the folding intermediates.
Hydrophobic collapse model
In the hydrophobic collapse model [5], the hydrophobic interaction is regarded as the decisive factor in protein folding: a rapid, nonspecific hydrophobic collapse takes place before any secondary or tertiary structure is formed.
Diffusion-collision-adhesion model
According to this model, folding begins at several sites along the extended peptide chain, where unstable secondary structure units or hydrophobic clusters form. These are maintained mainly by short-range or medium-range (3-4 residues) interactions within the local sequence. They diffuse, collide and adhere to one another through nonspecific Brownian motion, forming larger structures of greater stability. Further collisions produce a globular pre-molten-globule intermediate with a hydrophobic core and secondary structure. This intermediate then rearranges into a compact, inactive, highly ordered molten globule that resembles the native structure. Finally, the inactive, highly ordered molten globule is converted into the fully dynamic native state.
Nucleation-condensation-growth model
According to this model, certain regions of the peptide chain form a "folding nucleus", around which the rest of the chain continues to fold until the native conformation is reached. The nucleus is in fact a network of native-like interactions formed by a few particular amino acid residues. These residues are held together not by nonspecific hydrophobic interactions but by specific interactions that pack them tightly. Formation of the nucleus is the rate-limiting step in the initial stage of folding.
Jigsaw puzzle model
The central idea of this model [9] is that a polypeptide chain can fold along many different pathways. Along each pathway more and more native structure is formed, and folding along each pathway is fast, so the chain folds faster than it would along a single pathway. In addition, small changes in the physiological or biochemical environment, or mutations, may strongly affect a single folding pathway. With multiple pathways, such changes may disturb one pathway without affecting the others, so folding as a whole is not disrupted unless the changes are large enough to affect the folding of the polypeptide chain fundamentally.
Lattice model
The lattice model (HP model for short) was first proposed by Dill et al. in 1989. Lattice models are of two kinds: two-dimensional and three-dimensional. The two-dimensional model uses an orthogonal grid of unit spacing in the plane. The amino acid residues are placed one after another on the grid points, and residues adjacent in the sequence must also be adjacent on the grid, that is, the distance between consecutive residues is 1. Each grid point can hold at most one residue: once a residue has been placed at a point, no later residue may occupy it. If at some step no free position remains for the current residue, the configuration is invalid and the chain must be placed again. The three-dimensional model is analogous, with a grid of unit spacing in three-dimensional space and the same placement rules. The difference is that in the two-dimensional model every residue after the first two has only three directions to choose from, whereas in the three-dimensional model there are five, so the complexity is much higher.
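The placement rules of the two-dimensional lattice model can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the move letters U/D/L/R and the function names are assumptions introduced here, and the scoring rule (-1 for each pair of hydrophobic "H" residues that are lattice neighbours but not sequence neighbours) is the usual HP convention, which the text above does not spell out.

```python
from itertools import product

# Unit steps on the 2D square lattice (hypothetical move letters).
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def place_chain(moves):
    """Place residues one by one; return their coordinates, or None on a clash."""
    position = (0, 0)
    coords = [position]
    occupied = {position}
    for move in moves:
        dx, dy = MOVES[move]
        position = (position[0] + dx, position[1] + dy)
        if position in occupied:  # a grid point holds at most one residue
            return None
        occupied.add(position)
        coords.append(position)
    return coords


def hp_energy(sequence, coords):
    """Score -1 per H-H contact between residues not adjacent in the sequence."""
    index_at = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_fold(sequence):
    """Exhaustively search all move strings; only practical for short chains."""
    best = None
    for moves in product(MOVES, repeat=len(sequence) - 1):
        coords = place_chain(moves)
        if coords is None:
            continue  # invalid configuration, as described in the text
        energy = hp_energy(sequence, coords)
        if best is None or energy < best[0]:
            best = (energy, "".join(moves))
    return best


if __name__ == "__main__":
    print(place_chain("RULD"))                   # None: the last step revisits (0, 0)
    print(hp_energy("HPPH", place_chain("RUL")))  # -1: the two H residues touch
```

The exhaustive search tries 4 moves at every step and discards clashes, so its cost grows exponentially with chain length; extending `MOVES` with two steps along a third axis would give the three-dimensional variant.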
Molecular chaperones
In 1978, Laskey found that histones and DNA assemble into nucleosomes only when an acidic protein of the nucleoplasm is present; otherwise they precipitate. Laskey therefore called this protein a "molecular chaperone". A molecular chaperone is a protein [10, 11] that binds to and stabilizes an unstable conformation of another protein and, through controlled binding and release, promotes the folding of nascent polypeptide chains, the assembly or degradation of multimers, and the transport of organelle proteins across membranes. Molecular chaperones are defined by function: any protein with this function is a molecular chaperone, and chaperones may differ completely in structure. The concept has since been extended to many proteins, and the chaperones identified so far belong mainly to three highly conserved protein families [12]: the stress-90, stress-70 and stress-60 families. The stress-60 family is found in the mitochondria of eukaryotes (where it is called Hsp58 in mammals) and in chloroplasts (where it is called cpn60); in the cytoplasm of prokaryotes it is called GroEL.
Significance
Elucidating the mechanism of protein folding will reveal the second genetic code of life; this is its theoretical significance. In the narrow sense, the study of protein folding concerns how the specific three-dimensional structure of a protein forms, how stable it is, and how it relates to biological activity. Conceptually, the field covers both thermodynamic and kinetic questions, folding both in vitro and in the cell, and both theoretical and experimental research. The most fundamental scientific question is how the primary structure of a polypeptide chain determines its spatial structure. Since the former determines the latter, there must be some definite relationship between them. Is there a code, just as nucleotides specify the amino acid sequence through the "triplet code"? This hypothetical code by which primary structure determines spatial structure is sometimes called the "second genetic code".
Now that the triplet code has been deciphered and has in effect become plain text, deciphering the second genetic code would be the most direct theoretical solution to the protein folding problem, one of the last unsolved mysteries in protein research. Protein structure prediction is a theoretical, thermodynamic problem: starting from the measured primary sequence of a protein, it predicts the specific spatial structure determined by Anfinsen's principle. Determining the amino acid sequence of a protein, and especially the nucleotide sequence that encodes it, has now become almost routine, and the amino acid sequence can be deduced from the complementary DNA (cDNA) sequence using the triplet code. These molecular biology techniques, which saw great breakthroughs in the last century, have greatly accelerated the determination of protein primary structures. The protein databases now hold far more primary structures than experimentally determined spatial structures, and many of the determined structures belong to closely similar homologous proteins, so the number of truly distinct structures is smaller still. With the completion of the Human Genome Project and the reading of the entire DNA sequence, primary structure data will inevitably grow explosively while the determination of spatial structures lags far behind; the gap between the two will widen, making structure prediction all the more necessary.
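Deducing an amino acid sequence from a cDNA sequence with the triplet code is a simple table lookup, as the sketch below suggests. It uses the standard genetic code written in the common compact form (bases ordered T, C, A, G; "*" marks a stop codon). The function name `translate` and the example sequence are assumptions made for illustration, and the sketch reads only the coding strand in a single frame starting at the first base.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' is a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA coding sequence into one-letter amino acid codes."""
    cdna = cdna.upper()
    protein = []
    # Walk the sequence three bases at a time, ignoring an incomplete last codon.
    for start in range(0, len(cdna) - len(cdna) % 3, 3):
        amino_acid = CODON_TABLE[cdna[start:start + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical example sequence: Met-Glu-Lys followed by a stop codon.
    print(translate("ATGGAGAAGTAA"))  # MEK
```

The contrast with the folding problem is the point: this step is a fixed 64-entry table, whereas no comparably simple rule is known for going from the resulting sequence to a three-dimensional structure.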
Prospects
Elucidating the folding mechanism also has important potential applications, for example in the following areas:
Inclusion body renaturation
Recombinant DNA technology can introduce foreign genes into host cells, but the expression products of recombinant genes often form inactive, insoluble inclusion bodies. Clarifying the folding mechanism would greatly help the renaturation of inclusion bodies.
De novo protein design
Advances in recombinant DNA and polypeptide synthesis technology allow us to design longer polypeptide chains at will. However, because we cannot know what conformation such a polypeptide will fold into, we cannot yet design proteins with specific functions at will.
Searching for pathogenic mechanisms
Many diseases, such as Alzheimer's disease, mad cow disease (bovine spongiform encephalopathy, BSE), Creutzfeldt-Jakob disease (CJD, a transmissible spongiform encephalopathy), amyotrophic lateral sclerosis (ALS) and Parkinson's disease, are linked to the aggregation or misfolding of certain important cellular proteins, in some cases as a result of mutation. A deeper understanding of the relationship between protein folding and misfolding will therefore greatly help to clarify the pathogenesis of these diseases and to find treatments.
Revealing protein function
With the progress of genome sequencing, a large number of protein sequences have been obtained, and structural information is essential for revealing their biological functions. Existing methods (X-ray crystallography, nuclear magnetic resonance, electron microscopy) take a long time to determine a protein structure, so structure determination has fallen behind the discovery of new proteins. Structure prediction is fast, but its reliability is not high. Only with a better understanding of the physical and chemical factors that maintain protein structure and drive protein folding can this approach be fundamentally improved. In addition, research on structure-activity relationships in protein-protein and ligand-protein interactions also depends on elucidating the folding mechanism.
Protein folding
It is known that a change in even a single amino acid residue of a protein, caused by a gene mutation, can cause disease. These are the so-called "molecular diseases"; an example is sickle cell anemia, which is caused by mutation of the glutamic acid at position 6 of the hemoglobin β chain to valine. It has now been found that diseases can also arise when the amino acid sequence of a protein is unchanged but its structure or conformation changes; these are called "conformational diseases" or "folding diseases".
As is well known, mad cow disease is caused by infection with a protein called a prion, which can also infect humans and cause disease of the nervous system. In a normal organism the prion protein is required for normal nerve activity. The pathogenic prion has exactly the same primary structure as the normal one but a different spatial structure. The study of this disease touches on many basic biological questions. Why can proteins with the same primary structure have different spatial structures? Does this contradict Anfinsen's principle? Clearly, questions of protein energetics and stability are involved.
It was long thought that changes in protein structure arise from changes in sequence, that changes in sequence arise from changes in genes, and that the information of life flows from nucleic acid to protein. The Nobel laureate Prusiner showed that the information carried by the pathogenic prion does not come from a genetic change. Instead, through interaction between protein molecules, the pathogenic prion protein converts the normal prion protein into the pathogenic folded state and thereby makes it infectious. What is the nature and mechanism of this interaction? How can molecules that have changed only their folding state cause serious disease? Traditional concepts cannot answer these questions satisfactorily, so they have provoked heated debate in the scientific community, and research in this area has become far more intense and competitive.
Other diseases caused by abnormal protein folding, which leads to abnormal aggregation, precipitation or failure of transport, include Alzheimer's disease, cystic fibrosis, familial hypercholesterolemia, familial amyloidosis, certain tumors and cataracts. Because molecular chaperones play an important role in protein folding, mutations in the chaperones themselves will clearly cause folding abnormalities and hence folding diseases. As research on protein folding deepens, the true causes of more diseases will be found, along with more targeted treatments and more effective drugs. It has been found that certain small molecules can enter cells and bind as ligands to mutant proteins, allowing mutant proteins with impaired function to escape the "protein quality control system" and keep working despite their defect. Such small molecules are called "pharmacological chaperones" and are expected to become new drugs for treating folding diseases. Beyond these medical applications, the folding of nascent peptides, that is, the protein folding problem, is of great scientific significance and also of great practical value in bioengineering. Genetic engineering and protein engineering have grown into major industries with an output value of billions of dollars, and will develop further in the 21st century. A frequent difficulty, however, is that when exogenous DNA is introduced into simple microbial cells, the polypeptide chains synthesized often fail to fold correctly into biologically active proteins and instead form insoluble inclusion bodies or are degraded. A thorough solution to this bottleneck requires a better understanding of nascent chain folding.