MOLECULAR BIOLOGY Historical development and basic concepts S. K. Jain Professor Hamdard University New Delhi 110062 E-mail: firstname.lastname@example.org 24-Jul-2006 (Revised 05-Sep-2007)
CONTENTS Introduction Discovery of DNA DNA is the genetic material Basic structure of nucleic acids Three-dimensional structure of DNA Double helical structure of DNA Alternate forms of DNA Structure of RNA Genome and the C-value Central dogma of molecular biology Genome organization The satellite DNA Highly and moderately repetitive sequences Denaturation of DNA G:C content of DNA and the Tm
Key words Deoxyribonucleic acid; Genetic material; Chromosome; Replication; Central dogma; Transcription; Ribonucleic acid; Translation; Proteins; Repetitive sequences; Denaturation
Introduction
Molecular biology is an old science; it started with the evolution of life. Soon after the cell was recognized as the basic unit of life and its structure was deciphered, it became clear that the cellular constituents are nothing but chemical compounds. While some of these compounds are simple chemicals such as water, NaCl and many other small molecules, others are very large and have complex chemical structures. Nucleic acids, proteins, complex carbohydrates and lipids are such molecules. They often have very high molecular weights and are relatively difficult to synthesize by chemical means, and are therefore referred to as bio-macromolecules. In the cell they are synthesized through a number of complex reactions catalyzed by enzymes, the biological catalysts. The biochemical pathways are highly regulated chemical reactions taking place in the cell. Almost all of these reactions can also be carried out in the test tube. There, however, they often require a high activation energy and their efficiency may not always be high. Within the cell, the same reactions take place at body temperature with a relatively low activation energy. Enzymes are responsible for this lowering of the activation energy.

Further, at any given time thousands of different reactions take place simultaneously in a cell. These reactions, many of which are of a diverse nature, are coordinated in a highly regulated, efficient and precisely controlled manner. A number of regulatory molecules (usually proteins) regulate the various reactions. These biochemical reactions can be influenced both by endogenous factors, such as hormones and the metabolic status of the cell, and by exogenous factors such as environmental changes.
Life is nothing but the sum of the organized interactions of these cellular components (the biomolecules), and any disorganization of these molecular interactions results either in a pathological condition or in cell death. A couplet written by the Urdu poet Chakbast Lucknowi describes this scientific fact in a very beautiful manner. The couplet is:

‘Zindgi kya hai, anasir mein zuhure tertib
Maut kya hai, inhi ajza ka parishan hona’

It can be translated as:

‘What is life? It is the organization of components (the biomolecules).
What is death? The disorganization of the same components.’

Modern molecular biology started with the discovery of DNA by Friedrich Miescher in 1869. A series of experiments finally proved that DNA is the genetic material that transfers the genetic information from one generation to another. These included the classical experiments of Frederick Griffith (1928), of Oswald Avery, Colin MacLeod and Maclyn McCarty (1944), and the famous double labeling experiment of Alfred Hershey and Martha Chase (1952), which established the role of DNA as the carrier of genetic information beyond doubt. The elucidation of the structure of DNA by Watson and Crick (1953) was the key event that set the ball rolling. The invention of new techniques for cell fractionation and for the isolation, purification, characterization and analysis of subcellular components made it possible to study cellular processes at the molecular level. The development and availability of a new generation of analytical equipment further facilitated the process. A number of other discoveries, especially those of DNA polymerase and the genetic code, helped in deciphering the secrets of life at the molecular level. Finally, the discovery of restriction endonucleases and ligases, which can act as ‘molecular scissors’ and ‘molecular needles’ respectively, led to the beginning of yet another important field, ‘genetic engineering’. In 1997, it became
possible to clone an animal from the genome of a somatic cell (cloning of the sheep Dolly), and by the turn of the century the completion of the Human Genome Project had taken us into the era of functional genomics, proteomics and micro-array technology. The developments in the field of ‘stem cell research’ have opened yet another vista in molecular biology. The discovery of siRNA and antisense RNA technology has provided new tools for regulating the molecular events of the cell. All these discoveries and developments have added a new dimension to our understanding of life. The entire health management system has been revolutionized, and it may soon be possible to have an individualized molecular profile of each person and to manage his or her health accordingly.

Plant molecular biologists and biotechnologists have also made important advances. It is possible to micropropagate plants by tissue culture techniques and save endangered plant species from extinction. Tissue culture techniques can also be used for large-scale cultivation of medicinal and economically important plants. The rate of synthesis and the cellular concentration of useful secondary metabolites can be enhanced to obtain higher yields. It has become possible to give plants new characters by developing transgenics, and to use plants as biofactories for the production of useful therapeutics and other biomolecules. Some of the important landmarks in the fields of molecular biology and biotechnology are summarized in Table 1.

Table 1: Landmarks in molecular biology and biotechnology

Year   Landmark
1869   Friedrich Miescher discovered DNA.
1941   Beadle and Tatum demonstrated that a gene codes for a single protein, and the one gene one protein theory was put forward.
1944   Avery, MacLeod and McCarty showed that DNA is the genetic material.
1951   The helical conformation of a chain of amino acids was proposed, and the α-helix and β-sheet structures in proteins were deciphered.
1952   Hershey and Chase proved that DNA is the carrier of genetic information.
1953   Watson and Crick gave the double helical structure of DNA.
1955   A method for determining the amino acid sequence of a protein was developed by Frederick Sanger, and the sequence of insulin was determined.
1957   Arthur Kornberg discovered DNA polymerase I.
1958   Meselson and Stahl showed that DNA replicates in a semi-conservative manner.
1960   The detailed 3-D structure of proteins was described at very high resolution.
1960   Polycistronic genes in bacteria were discovered. The one gene one protein theory became obsolete.
1961   The triplet nature of codons was discovered, and the genetic code was deciphered by Marshall Nirenberg and H.G. Khorana.
1961   Messenger RNA was discovered.
1961   Jacob and Monod proposed the ‘operon model’ for regulation of gene expression.
1963   The circular nature of bacterial DNA was discovered by John Cairns.
1967   The enzyme DNA ligase was discovered by Gellert.
1970   Temin and Baltimore reported the discovery of reverse transcriptase in retroviruses.
       Type II restriction endonucleases were discovered.
       Eukaryotic genes were cloned in bacterial plasmids.
       The signal hypothesis was proposed by Günter Blobel.
       Retroviral oncogenes were identified as the causative agents of cellular transformation by J.M. Bishop and H.E. Varmus.
       DNA sequencing protocols were developed (the chemical method by Maxam and Gilbert and the enzymatic method by Sanger), and it became possible to determine the nucleotide sequence of a gene.
       It was shown that eukaryotic genes are interrupted. Introns were discovered, and the splicing mechanism for their removal from primary transcripts was deciphered.
       The NIH guidelines for r-DNA technology were formulated.
       Cellular oncogenes were discovered.
       The catalytic activity of RNA was discovered, and the concept of the ribozyme was accepted.
       Transgenic mice and flies were created by introducing novel genes into the germ line.
       The polymerase chain reaction (PCR) was developed by Kary Mullis.
       Dolly the sheep was cloned from a somatic cell genome. This first animal cloning experiment established totipotency in animal cells.
       RNAi was discovered.
       The Human Genome Project was completed and the first draft of the human genome sequence was published. Functional genomics and proteomics became the new fields.
Discovery of DNA
The quest to understand the mystery of life led a number of scientists to analyze the chemical nature and composition of cells. While analyzing the cell nucleus in a systematic manner, Friedrich Miescher in 1868 studied pus cells (leukocytes) from hospital bandages and found in them phosphorus-containing compounds that were acidic in nature. Because they were identified within the nucleus of the cell, these compounds were named nuclein. Later, nuclein was fractionated into two main groups, an acidic component and a basic component. The acidic component was given the name nucleic acid. The basic component was later identified as protein. A similar chemical entity was later isolated from salmon sperm cells; it was partially purified and partially characterized by chemical studies. Miescher is regarded as the scientist who discovered DNA.

Detailed analysis of the acidic component was followed by studies of its chemical and physical characteristics. A number of scientists analyzed and established its chemical composition, and it was shown to contain a nitrogenous base, a sugar and phosphate(s). Later, two separate nucleic acids were identified, one having ribose and the other having 2-deoxyribose as the sugar moiety, and these were given the names ribonucleic acid (RNA) and deoxyribonucleic acid (DNA), respectively.
DNA is the genetic material
Nucleic acids and proteins soon became the key candidate molecules for carrying the genetic information, transferring it from one generation to another and carrying out various biochemical processes. Scientists were looking for the material responsible for the storage of genetic information and the transfer of parental characters to the offspring. One group of scientists believed that DNA is the genetic material, while many geneticists doubted it and thought that proteins were the key molecules for the transfer of genetic information.

In 1928 Frederick Griffith carried out the classical experiment in which he infected mice with two separate strains of Pneumococcus. One strain, which had a polysaccharide capsule, had a smooth surface (the ‘S’ strain) and was virulent, while the other strain had a rough cell surface (the ‘R’ strain) and was non-virulent. He showed that infection with live bacteria of the ‘S’ strain resulted in disease and led to the death of the mice, while the mice were not killed if infected with the ‘R’ strain. Further, the mice survived when infected with heat-killed bacteria of the ‘S’ strain. However, co-infection with heat-killed ‘S’ bacteria and live ‘R’ bacteria together resulted in the death of the animals. This suggested that some factor(s) present in the killed ‘S’ bacteria was capable of transforming the non-virulent ‘R’ bacteria into a virulent strain. Furthermore, in later experiments, crude DNA-containing preparations from the heat-killed ‘S’ strain, given together with live ‘R’ strain bacteria, were found to be virulent and lethal. This indicated that DNA from the killed ‘S’ strain bacteria was able to transform the ‘R’ strain bacteria and was responsible for the pathogenicity of the transformed bacteria.
However, doubts were expressed about the homogeneity and purity of the DNA preparation (many of the refined techniques for purification of subcellular components were not available at that time), and a number of scientists still thought that contaminating proteins might have been responsible for the transformation of the ‘R’ bacteria and the resulting death of the mice. The dilemma continued until 1943, and many workers still believed that proteins were the genetic material. In 1944 Avery, MacLeod and McCarty extracted the transforming principle from heat-killed bacteria of the ‘S’ strain, chemically characterized it and showed it to be DNA. Their analysis included elemental analysis as well as physical characterization such as optical properties, ultracentrifugal behaviour, electrophoretic migration and diffusion properties. Further, they showed that removal of even the last traces of lipids and proteins from the transforming principle had no effect on its effectiveness as a transforming factor. Treatment of this factor with either proteases or RNases did not result in loss of its activity, but treatment with DNase resulted in total loss of its transforming property.

In 1951 Roger Herriott studied the structure of bacteriophages and showed that the phages have a needle-like structure with a head and a sharply pointed tail. The nucleic acid was shown to be localized in the head region. Experiments demonstrated that during infection the tail pierces the host bacterium and facilitates the transfer of the phage nucleic acid into the host cell. The outer envelope (the proteins) remains outside the infected host. The phage nucleic acid was implicated as being responsible for the host cell transformation. This observation gave further support to the theory that DNA is the genetic material.
Though all these studies made it clear that DNA is the genetic material and was responsible for the transfer of virulence in Griffith's original experiment, some doubts still persisted. Finally, in 1952 Alfred Hershey and Martha Chase performed their famous double labeling experiment, which unequivocally proved that DNA is the genetic material. They radiolabeled the DNA of phage T2 with 32P and its protein with 35S. Bacteria were infected with the double-labeled phage and the fate of the radioactivity was carefully followed. It was found that while 32P (i.e. the phage DNA) was taken up by the host cells and was detected inside the infected cells, 35S (i.e. the phage proteins) did not enter the host cells and remained outside in the culture medium. This confirmed beyond any doubt that DNA was the genetic material responsible for the transformation of the host bacteria. It is now well established that DNA is the genetic material of all living cells; only certain viruses use RNA instead (see later).

Basic structure of nucleic acids
A number of efforts were made to decipher the structure of DNA and RNA. Their chemical composition was determined, and it was established that three components, namely a nitrogenous base, a pentose sugar and phosphate(s), are covalently linked together to form a nucleotide. The polymerization of nucleotides leads to the synthesis of a nucleic acid. Further, the basic backbone in the primary structure of DNA and RNA was found to be similar.

The analyses showed that only four different types of bases are present in each of DNA and RNA. Two of the bases are pyrimidines and two are purines. Pyrimidine has a six-membered ring consisting of four carbons and two nitrogens. The two pyrimidine derivatives present in DNA are cytosine (2-oxy-4-aminopyrimidine) and thymine (2,4-dioxy-5-methylpyrimidine). Thymine, however, is not present in RNA; RNA has uracil (2,4-dioxypyrimidine) in its place. Purine has five carbons and four nitrogens arranged as a pyrimidine ring fused to an imidazole ring. The same two purines are present in both DNA and RNA. These are adenine (6-aminopurine) and guanine (2-amino-6-oxypurine). The structures and the numbering of the ring members are shown in Fig. 1.

In RNA, the pentose sugar is D-ribose, present in the β-furanose configuration (Fig. 2). In DNA, the 2’ carbon of the ribose lacks the -OH group, which is replaced by an -H atom. Thus the sugar in DNA is 2-deoxy-D-ribose.
To distinguish the positions of the carbon atoms in the sugar from the positions of the ring members of the base, the carbons of the sugar are numbered 1’, 2’, 3’, 4’ and 5’, while the ring members of the base are numbered 1-6 in pyrimidines and 1-9 in purines. The nitrogen at position 1 of a pyrimidine or the nitrogen at position 9 of a purine is attached to the 1’ carbon of the sugar through a glycosidic bond. The sugar-base complex is known as a nucleoside. The 5’ carbon of the sugar is linked to the phosphate group(s). A nucleoside together with its phosphate(s) is known as a nucleotide. Naturally occurring nucleotides have one, two or three phosphates attached in a linear manner (at the α, β and γ positions): the α-phosphate is joined to the sugar through an ester bond, and the phosphates are joined to each other through phosphoanhydride bonds. These are known as nucleoside mono-, di- or triphosphates. The ribose-containing nucleosides and nucleotides present in RNA are known as ribonucleosides and ribonucleotides (rNTPs in the triphosphate form), while those present in DNA contain deoxyribose and are called deoxynucleosides and deoxynucleotides (dNTPs in the triphosphate form). The relationship between base, sugar and phosphates and the names of the resulting nucleosides and nucleotides is as follows.

Base + sugar  --(N-glycosidic linkage)-->  Nucleoside
Nucleoside + phosphate(s)  --(ester linkage)-->  Nucleotide
Fig. 1: Components of nucleic acids
Fig. 2: Nucleosides and nucleotides
Base                      Nucleoside    Nucleotide
Pyrimidines
  Cytosine                Cytidine      Cytidine (mono-, di- or tri-) phosphate
  Thymine (only in DNA)   Thymidine     Thymidine (mono-, di- or tri-) phosphate
  Uracil (only in RNA)    Uridine       Uridine (mono-, di- or tri-) phosphate
Purines
  Adenine                 Adenosine     Adenosine (mono-, di- or tri-) phosphate
  Guanine                 Guanosine     Guanosine (mono-, di- or tri-) phosphate
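The naming relationship in the table can be expressed as a small lookup. The following is only an illustrative sketch in Python; the function name `nucleotide_name` and its `deoxy` flag are hypothetical helpers introduced here, not part of any standard library.

```python
# Base -> nucleoside names, as listed in the table above.
NUCLEOSIDE = {
    "cytosine": "cytidine",
    "thymine": "thymidine",
    "uracil": "uridine",
    "adenine": "adenosine",
    "guanine": "guanosine",
}

# Number of phosphates -> prefix used in the nucleotide name.
PHOSPHATE_PREFIX = {1: "mono", 2: "di", 3: "tri"}


def nucleotide_name(base, n_phosphates, deoxy=False):
    """Build a name such as 'adenosine triphosphate' (hypothetical helper)."""
    key = base.lower()
    if key not in NUCLEOSIDE:
        raise ValueError(f"unknown base: {base!r}")
    if n_phosphates not in PHOSPHATE_PREFIX:
        raise ValueError("a nucleotide carries one, two or three phosphates")
    nucleoside = NUCLEOSIDE[key]
    # By convention 'thymidine' already denotes the deoxyribose form,
    # so the 'deoxy' prefix is added only for the other bases.
    if deoxy and key != "thymine":
        nucleoside = "deoxy" + nucleoside
    return f"{nucleoside} {PHOSPHATE_PREFIX[n_phosphates]}phosphate"


print(nucleotide_name("adenine", 3))               # ribonucleotide (ATP)
print(nucleotide_name("cytosine", 1, deoxy=True))  # deoxynucleotide (dCMP)
```

The lookup simply composes the nucleoside name with the mono-, di- or tri- prefix, mirroring the three columns of the table.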
During polymerization, the 3’-OH group of the first nucleotide condenses with the 5’-PO4 group of the second nucleotide and forms a phosphodiester bond. The process is repeated many times, and a long polynucleotide chain is formed that makes up the basic structure of both DNA and RNA. It should be noted that only one PO4 group (the α-PO4) is required for the formation of the phosphodiester bond. However, during the enzymatic biosynthesis of nucleic acids under natural conditions, the enzymes (RNA polymerase and DNA polymerase) use nucleoside triphosphates as substrates. Nucleoside mono- or diphosphates cannot serve as substrates for either of these enzymes. The α-phosphate is conserved in the phosphodiester bond, while the β- and γ-phosphates are released as inorganic pyrophosphate. The general structure of a polynucleotide is shown in Fig. 3.

As can be seen in Fig. 3, in a long polynucleotide chain only the first nucleotide has free phosphates, at the 5’-end. This is referred to as the ‘head’ or 5’-end of the nucleic acid. Similarly, only the last nucleotide has a free -OH group at the 3’-position. This end is known as the ‘tail’ or 3’-end of the nucleic acid. The polymerization is thus directional and gives polarity to the nucleic acid. It is accepted practice to write the structure of a nucleic acid in the 5’→3’ direction. If for any reason the sequence of nucleotides is written in the opposite direction, the 5’ and 3’ ends must be clearly indicated.

Further, no internal free -OH (other than that of the 3’-tail nucleotide) is present in DNA. This absence of a reactive -OH group makes DNA a very stable molecule. The stability serves a very useful purpose in its role as the carrier of genetic information and is essential for genetic conservation. It is a strong and convincing example of a structure-function relationship. RNA, on the other hand, has one free -OH group (the 2’-OH) in each of its nucleotides.
This makes RNA highly reactive and prone to chemical modification and degradation. In fact, it is possible to isolate DNA from dried blood drops and even from mummies that are thousands of years old, whereas intact RNA generally cannot be isolated from such samples.
Fig. 3: General structure of a polynucleotide (nucleic acid)
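The bookkeeping of polymerization and the 5’→3’ writing convention described above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions (a linear chain built from nucleoside triphosphates); the function names are hypothetical.

```python
def polymerization_summary(sequence):
    """Count the bonds formed when a linear chain is built from triphosphates.

    Joining n nucleotides creates n - 1 phosphodiester bonds, and each bond
    formed releases one pyrophosphate (the beta- and gamma-phosphates).
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must contain at least one nucleotide")
    bonds = n - 1
    return {
        "nucleotides": n,
        "phosphodiester_bonds": bonds,
        "pyrophosphates_released": bonds,
    }


def with_polarity(sequence):
    """Write a sequence in the conventional 5'->3' direction, ends labelled."""
    return "5'-" + sequence + "-3'"


def reverse_written(sequence):
    """Write the same strand 3'->5'; the ends must then be labelled explicitly."""
    return "3'-" + sequence[::-1] + "-5'"


print(polymerization_summary("ATGC"))
print(with_polarity("ATGC"))
print(reverse_written("ATGC"))
```

Both written forms describe the same molecule; only the reading direction differs, which is why the end labels are essential when the unconventional direction is used.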
Three-dimensional structure of DNA
Soon after the chemical nature and basic structure of DNA became clear, efforts were made to establish its three-dimensional structure. Analysis of DNA from a number of organisms at different stages of evolution showed that, while the same four bases are present in all DNAs, the actual base composition of DNA varies from species to species and is characteristic of a given organism. The base composition of DNA within a particular species, however, remains constant and does not vary between individuals of that species. This further confirmed that DNA is responsible for species-specific characters.

Detailed chemical analysis of various DNA molecules revealed a further interesting observation. It was found that the number of purines (A+G) in any DNA sample always equals the number of pyrimidines (T+C). Within this ratio, the number of A residues always equals the number of T residues, and the number of G residues always equals the number of C residues (Table 2). These regularities are known as Chargaff's rules. It therefore became evident that there has to be a relationship between A and T and between C and G. Based on these observations and on a number of biophysical and X-ray diffraction studies, several different models for the three-dimensional structure of DNA were proposed. Finally, James Watson and Francis Crick gave the three-dimensional structure of DNA in 1953. This model, popularly known as the double helical model of DNA structure, explained the three-dimensional, space-filling structure of DNA in an unequivocal manner and is universally accepted.

Table 2: Base composition of DNA from different species

                          Relative ratio of bases (%)
Organism                  A      G      C      T      A/T    G/C    Purines/Pyrimidines
Human                     30.9   19.9   19.8   29.4   1.05   1.00   1.04
Sheep                     29.3   21.4   21.0   28.3   1.03   1.02   1.03
Chicken                   28.8   20.5   21.5   29.3   1.02   0.95   0.97
Turtle                    29.7   22.0   21.3   27.9   1.05   1.03   1.00
Salmon                    29.7   20.8   20.4   29.1   1.02   1.02   1.02
Sea urchin                32.8   17.7   17.3   32.1   1.02   1.02   1.02
Locust                    29.3   20.5   20.7   29.3   1.00   1.00   1.00
Wheat                     27.3   22.7   22.0   27.1   1.01   1.00   1.00
Yeast                     31.3   18.7   17.1   32.9   0.95   1.09   1.00
E. coli                   24.7   26.0   25.7   23.6   1.04   1.01   1.03
Staphylococcus aureus     30.8   21.0   19.0   29.2   1.05   1.11   1.07
Phage T7
Bacteriophage λ
Phage ϕX174 (RF)
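The ratios in Table 2 can be recomputed from the base percentages. The sketch below uses three rows of the table and assumes the four percentage columns are A, G, C and T in that order, an assignment inferred here from the ratio columns. Because the percentages are rounded, recomputed ratios may differ from the tabulated ones in the second decimal place.

```python
# Base composition (%) for a few organisms from Table 2.
# Assumed column order: A, G, C, T.
BASE_COMPOSITION = {
    "Human": (30.9, 19.9, 19.8, 29.4),
    "Yeast": (31.3, 18.7, 17.1, 32.9),
    "E. coli": (24.7, 26.0, 25.7, 23.6),
}


def chargaff_ratios(a, g, c, t):
    """Return (A/T, G/C, purines/pyrimidines), each rounded to two decimals."""
    return (
        round(a / t, 2),
        round(g / c, 2),
        round((a + g) / (c + t), 2),
    )


for organism, composition in BASE_COMPOSITION.items():
    at, gc, pu_py = chargaff_ratios(*composition)
    print(f"{organism:8s} A/T={at}  G/C={gc}  Pu/Py={pu_py}")
```

All three ratios stay close to 1 even though the overall composition (for example, the G+C share) differs markedly between organisms, which is the observation that pointed to A:T and G:C pairing.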
Double helical structure of DNA
In their model, Watson and Crick proposed that DNA has a double-stranded structure made up of two polynucleotide strands. These strands run in an anti-parallel manner (i.e. the 5’-end of one strand faces the 3’-end of the other strand). Further, the two strands are coiled around a common axis in a spring-like manner. The coiling is right-handed. The entire structure is highly organized. It has a diameter of 23.7 Å. Each complete turn of the coil contains 10 base pairs, so two adjacent residues are rotated by 36º relative to each other. Adjacent bases are 3.4 Å apart, and the pitch, i.e. the length of one complete turn of the helix, is 34 Å. The adjacent nucleotides in each strand are joined by a phosphodiester bond between the 3’ and 5’ carbons of the sugars. The sugar-phosphates form the backbone of DNA, while the bases lie at a right angle to the axis. The two strands are held together by hydrogen bonds between the bases.

Analysis of the structures of the bases revealed that, for stability, hydrogen bonds should form between a pyrimidine in one strand and a purine in the other. A purine-purine pair would be too big, while a pyrimidine-pyrimidine pair would be too small, to fit within the helix. Detailed analysis showed that this is indeed the case. ‘A’ and ‘T’, and ‘G’ and ‘C’, are complementary to each other: an ‘A’ in one strand always pairs with a ‘T’ in the other strand, and a ‘G’ always pairs with a ‘C’, and vice versa. This explained the observed ratios between purines and pyrimidines and between A/T and G/C in DNAs from different species. There are two hydrogen bonds in an A:T pair and three in a G:C pair. The hydrogen bonds form either between the -NH2 group of one base and the =O of the other, or between the =NH of one base and the -N of the other. For stable bond formation, the N-N distance is 0.30 nm and the O-N distance is 0.28-0.29 nm. The positions of these bonds and the distances between them are shown in Fig. 4.
Fig. 4: Base pairing between complementary bases: Formation of hydrogen bonds
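The helix parameters quoted above are mutually consistent, as a short calculation shows. The sketch below is illustrative only: it takes 10 base pairs per turn and a 3.4 Å rise from the text, and counts hydrogen bonds using two per A:T pair and three per G:C pair. The function names are hypothetical.

```python
BP_PER_TURN = 10      # base pairs per helical turn (B form, from the text)
RISE_ANGSTROM = 3.4   # distance between adjacent base pairs, in angstroms


def twist_per_base_pair(bp_per_turn=BP_PER_TURN):
    """Rotation between adjacent base pairs, in degrees."""
    return 360.0 / bp_per_turn


def pitch(bp_per_turn=BP_PER_TURN, rise=RISE_ANGSTROM):
    """Length of one complete turn of the helix, in angstroms."""
    return bp_per_turn * rise


# Hydrogen bonds contributed by the pair formed at each base of one strand.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand):
    """Total hydrogen bonds in a duplex, given one of its two strands."""
    return sum(H_BONDS[base] for base in strand.upper())


print(twist_per_base_pair())   # degrees between neighbouring base pairs
print(pitch())                 # angstroms per turn
print(hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3
```

Because a G:C pair contributes one more hydrogen bond than an A:T pair, the total for a duplex of fixed length rises with its G+C content.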
Besides the hydrogen bonds, base stacking, or π-π interaction, is the other force that helps stabilize the helix. It involves hydrophobic interactions between adjacent base pairs. These interactions arise because the hydrogen-bonded structure of water forces the hydrophobic groups (the bases) into the interior of the molecule. While both base pairing and base stacking are important in holding the two strands of DNA together, base pairing has important biological implications. The complementarity of the bases, and the fact that an ‘A’ can pair only with a ‘T’ and a ‘G’ only with a ‘C’, provides a mechanism by which a new copy of DNA can be produced in a template-dependent manner. This pairing ensures that DNA is replicated with a high degree of fidelity.

The coiling of the strands around the common axis creates two grooves of different sizes in each turn. The larger is known as the major groove, while the smaller is referred to as the minor groove. The grooves provide convenient sites for the binding of many DNA-binding proteins, including regulators of gene expression such as transcription factors and enzymes such as polymerases. Further, base pairing gives the DNA strands a certain degree of flexibility. It is therefore possible for the conformation of DNA to change in response to certain signals, which forms the basis of a number of regulatory processes. The three-dimensional double helical structure of DNA based on Watson and Crick’s model is shown in Fig. 5.
Fig. 5: Structure of DNA
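The template-dependent copying made possible by base complementarity can be illustrated with a few lines of Python. This is a simplified sketch that only applies the A:T and G:C pairing rule together with the anti-parallel orientation of the strands; it does not model the enzymology of replication, and the function name is hypothetical.

```python
# Watson-Crick pairing partners in DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template):
    """Return the strand that pairs with `template`, written 5'->3'.

    The template is read 5'->3'. Because the two strands are anti-parallel,
    the partner strand written 5'->3' is the complement of the template
    read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


template = "ATGCCGTA"
copy = complementary_strand(template)
print("template 5'-" + template + "-3'")
print("partner  5'-" + copy + "-3'")

# Copying the copy regenerates the original template sequence.
assert complementary_strand(copy) == template
```

Each strand fully specifies its partner, so either strand can serve as the template from which the other is rebuilt.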
Alternate forms of DNA
The basic structure of DNA given by Watson and Crick represents the structure of the majority of cellular DNA. DNA with this basic structure is referred to as ‘B’ DNA. However, not all DNA molecules are uniform in structure, and a relatively small amount of DNA may adopt certain alternate structures. The double helical structure provides some degree of flexibility and allows the molecule to take slightly different shapes. While a number of possibilities exist, the most important conformational change involves rotation around the glycosidic bond, which changes the orientation of the base in relation to the sugar. Rotation around the bond between the 3’ and 4’ carbons can also take place. Both these rotations change the positioning of the two strands, so that certain alternate structures of DNA can be formed. The common alternate forms of DNA are as follows.

‘A’ DNA: This is a very minor species of DNA that may or may not be present under normal physiological conditions. Its presence has been demonstrated in vitro in a less hydrated environment with high Na+ and K+ concentrations. ‘A’ DNA is more compact than ‘B’ DNA. It has a diameter of 25.5 Å, the distance between two adjacent bases is 2.9 Å and the pitch is 32 Å. Thus there are 11 base pairs per turn. This form of DNA closely resembles double-stranded RNA. It has a much deeper major groove, and the minor groove is very shallow. ‘A’ DNA is right-handed in its helical turns. Variants such as B’, C, C’, D, E and T DNAs, which differ in minor ways from A or B DNA and have a right-handed double helical structure, have also been reported. However, their precise function is not clear.

‘Z’ DNA: This is the left-handed form of DNA. In this form the turns of the helix run in the opposite direction to those in the other forms of DNA. ‘Z’ DNA is slimmer and has a diameter of only 18.4 Å.
The two strands of the helix are coiled in a left-handed manner around the common axis, with about 12 base pairs per turn. There are no distinct major and minor grooves; there is only one groove, which is narrow and deep. The backbone follows a zig-zag path (this gives the name ‘Z’ DNA). Under experimental conditions, the presence of ‘Z’ DNA has been shown at high salt concentrations or in the presence of certain specific cations, such as spermine and spermidine. This form of DNA has a high degree of negative supercoiling and has certain specific proteins attached to it. In addition, relatively high methylation at the 5-position of C residues has been found in ‘Z’ DNA.

Though the precise role of the alternate forms of DNA is not well understood, they may have some regulatory function. The possibility that some of these forms are artifacts of the experimental conditions cannot be completely ruled out. As discussed, the conformation of DNA is biologically important. The majority of regulatory controls require the binding of certain factors to DNA. Any change in the structure will affect the binding of these factors and will therefore regulate the biological activity.
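The helical parameters quoted for the ‘A’ and ‘B’ forms can be compared directly. The sketch below is a minimal illustration using only the diameter, rise and pitch values given in the text; ‘Z’ DNA is left out because its rise and pitch are not stated above. The `HelixForm` class is a hypothetical container introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixForm:
    """Helical parameters of one DNA form (all lengths in angstroms)."""
    name: str
    handedness: str
    diameter: float
    rise: float    # distance between adjacent base pairs
    pitch: float   # length of one complete turn

    def base_pairs_per_turn(self):
        # One turn spans `pitch`; each base pair advances the helix by `rise`.
        return self.pitch / self.rise


# Values as quoted in the text.
A_DNA = HelixForm("A", "right", diameter=25.5, rise=2.9, pitch=32.0)
B_DNA = HelixForm("B", "right", diameter=23.7, rise=3.4, pitch=34.0)

for form in (A_DNA, B_DNA):
    print(f"{form.name}-DNA: {form.base_pairs_per_turn():.1f} base pairs per turn, "
          f"diameter {form.diameter} angstroms, {form.handedness}-handed")
```

Dividing the pitch by the rise reproduces the figures of about 11 base pairs per turn for the ‘A’ form and 10 for the ‘B’ form.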
Structure of RNA
As mentioned earlier, the basic structure of RNA is very similar to the primary structure of DNA. The ribonucleotides are polymerized through phosphodiester bonds in a directional manner to form RNA. However, the three-dimensional structure of RNA differs from that of DNA. The majority of cellular RNA is single-stranded. Many RNA molecules contain regions of internal complementarity, and these regions may create a high degree of secondary structure consisting of intra-molecular double-stranded stems. Though relatively small in quantity, some double-stranded RNA molecules with inter-molecular base pairing (similar to DNA) may also be present in the cell. Such dsRNA may have regulatory functions. The base pairing (both intra- and inter-molecular) in such double-stranded regions primarily follows the Watson-Crick model, i.e. an ‘A’ pairs with a ‘U’ by two hydrogen bonds (as discussed, RNA has uracil in place of the thymine of DNA; the other three bases are common to RNA and DNA) and a ‘C’ pairs with a ‘G’ by three hydrogen bonds. However, certain alternate base pairs, especially between U and G, can also be present. Certain RNA molecules (tRNA, for example) may have some unusual and/or modified bases. Some of these are ribothymidine (rT), dihydrouridine (D), pseudouridine (ψ), 4-thiouridine (S4U), 3- or 5-methylcytosine, inosine (I), N6-methyl (or isopentenyl) adenosine, methylguanosine, queuosine (Q) and wyosine (W). These bases can form alternate base pairs (Fig. 6).
Fig. 6: Modified bases present in tRNA Sometimes triple base pairing can also take place in which a base pairs with two different bases (for example C:G:m7: G or U:A:A). In such pairing one pairing is the Watson-Crick pairing while other is the alternate pairing (Fig. 7). Such alternate bases and unusual base pairings play an important role in maintaining the three dimensional structure of specialized molecules and have implication in specific function of the molecules. The tRNA structure: function relationship is an important example of such modifications. Three main types of RNA are present in the cell. These are ribosomal RNA (rRNA), messenger RNA (mRNA) and transfer RNA (tRNA). Each of these RNA has specific properties and specific functions. Their detailed structure and functions will be described elsewhere. The rRNA is the constituent of ribosomes and is most predominant class of cellular RNA, making almost 90% of the total RNA. A number of rRNA molecules are present that include 28S RNA (23S in prokaryotes), 18S RNA (16S in prokaryotes), 5.8 S RNA (not present in prokaryotes) and 5S RNA (present both in eukaryotes and in prokaryotes). The tRNA is relatively small in size (4S) having 78-108 bases. The tRNA acts as the adapter molecule that carries specific aminoacids to ribosomes during protein 16
synthesis. The tRNA molecules have an unusually high number of modified bases and a complex secondary structure consisting of four stems and loops, each having a specialized role during aminoacylation and/or amino acid transfer.

The mRNA contains the genetic information copied from DNA, which is translated by the translational machinery into the amino acid sequence of proteins. While in terms of content mRNA is only 1-2% of total cellular RNA, a typical mammalian cell can have up to 10,000 different mRNA molecules. Further, these are very heterogeneous in size, ranging from 4S to 32S, with 8S-15S being the most predominant class. The copy number of different mRNAs varies; some mRNAs have only 1-5 copies/cell (the rare mRNAs) while certain other mRNAs are present in up to 12,000 copies/cell (the abundant mRNAs). Furthermore, the distribution of mRNAs varies from one tissue type to another and forms the basis of tissue specific functions. The mRNA has certain specialized structures (the 5'-cap and 3'-poly(A) tail in eukaryotes and the Shine-Dalgarno (SD) sequences in prokaryotes) that help in its specialized functions.
Fig. 7: Unusual base pairing in nucleic acid (especially in tRNA molecules)
In addition to the three major classes of RNA, some small RNA molecules are also present in the cell. These include small nuclear RNAs (snRNAs), small cytoplasmic RNAs (scRNAs) and small interfering RNAs (siRNAs), which have specialized regulatory or other functions. Some of the snRNAs play a key role in mRNA splicing. Some RNA molecules with catalytic function have also been recognized. These are known as ribozymes. The discovery of ribozymes changed the earlier concept that all enzymes are proteins. It is now accepted that most (but not all) enzymes are proteins.

The majority of cellular RNAs are present in association with proteins. Usually the RNAs and proteins are associated with each other non-covalently. The RNA-protein complexes are known as ribonucleoprotein particles (RNPs). Very little free RNA is present in the cell. Unlike DNA, which is present in the nucleus (except for mitochondrial and chloroplast DNA), RNA is present both in the nucleus and in the cytoplasm. The RNA-associated proteins may help in the stability of RNA. These may also play a role in the regulation of mRNA metabolism. It has been found that certain proteins that are present when an mRNA is not being translated (the CmRNPs) get dissociated when the same mRNA is being actively translated (the PmRNPs). Certain other specific proteins are associated with the poly(A) tail region and are important for its stability.

Certain viruses have RNA as their genetic material. Retroviruses and reoviruses are examples of RNA viruses. They have specialized mechanisms for the replication of their genome.
Genome and the C-value

The total content of DNA in a cell is known as the genome. The entire amount of DNA in the haploid cell is also known as the C-value. There is a broad relationship between evolutionary complexity and the C-value. Generally (but not always), an organism at a higher position on the evolutionary ladder has a higher C-value. One reason is that genes are packed closely in less complex organisms, which saves space. A higher C-value has a clear advantage as it provides a higher potential for coding more proteins. Due to the complex cellular metabolism of higher organisms, more proteins are required for their cellular functions. However, it also has a disadvantage in terms of the need to replicate large amounts of DNA, which is very taxing to the cell in terms of energy and other requirements. The C-value, DNA content and chromosome number of certain organisms are shown in Table 3.

An exception to the general rule relating the C-value to evolution is seen in certain amphibians. Even though these are placed much lower than mammals, especially humans, on the evolutionary ladder, they have a much larger genome, sometimes up to 100 times larger than the human genome. Further, their DNA content is much higher than that of most other amphibians. This is referred to as the C-value paradox. Certain plants also have a very high C-value. It also deserves mention that the DNA content may sometimes vary from cell to cell within the same species. For example, amphibian oocytes have a higher number of rRNA genes and larger amounts of ribosomes than the somatic cells.

In eukaryotes, the DNA is present in the nucleus of the cell. However, two organelles, namely mitochondria and chloroplasts, have their own genomes. The basic structure of an organelle genome is the same as that of the nuclear genome. In prokaryotes, where there is no nucleus, the DNA is present in the centre of the cell in a semi-defined structure, the nucleoid.
This observation is in contrast to the earlier view that prokaryotic DNA is dispersed throughout the entire cytoplasmic region.
Table 3: DNA content of some organisms

Organism                                 DNA content (C-value)          Chromosome
                                         ng/cell        kbp             number (2x)
SV40                                     6.0 x 10^-9    5.3             1
HSV                                      1.7 x 10^-7    151             1
E. coli                                  5.3 x 10^-6    4.5 x 10^3      1
S. cerevisiae (baker's yeast)            25 x 10^-6     22.5 x 10^3     34
Arabidopsis thaliana                     7 x 10^-5      64 x 10^3       10
Drosophila melanogaster                  17 x 10^-5     0.15 x 10^6     8
Sea urchin                               45 x 10^-5     0.41 x 10^6     40
Gallus domesticus (chicken)              7 x 10^-4      0.63 x 10^6     78
Homo sapiens (human)                     3.6 x 10^-3    3.2 x 10^6      46
Mus musculus (mouse)                     3.0 x 10^-3    2.7 x 10^6      40
Xenopus laevis (frog)                    4.2 x 10^-3    3.8 x 10^6      36
Zea mays (corn)                          7.8 x 10^-3    7 x 10^6        20
Triturus cristatus (newt)                3.5 x 10^-2    31.5 x 10^6     24
HeLa cells (human cervical cell line     8.5 x 10^-3    77 x 10^6       Polyploid
in culture)
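The two DNA-content columns of Table 3 are related by the mass of a base pair. The following is a minimal sketch of that conversion, assuming an average mass of about 650 g/mol per base pair; this figure is a commonly used approximation and is not given in the text, and the function name is chosen here only for illustration.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
AVG_BP_MASS_G_PER_MOL = 650.0   # ASSUMPTION: average mass of one base pair


def genome_mass_ng(size_kbp: float) -> float:
    """Approximate mass (in nanograms) of a double stranded genome of size_kbp."""
    base_pairs = size_kbp * 1e3
    grams = base_pairs * AVG_BP_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9  # grams -> nanograms


if __name__ == "__main__":
    # Sizes (kbp) taken from Table 3.
    for name, kbp in [("E. coli", 4.5e3), ("Homo sapiens", 3.2e6)]:
        print(f"{name}: {genome_mass_ng(kbp):.2e} ng per genome")
```

Under this assumption the computed masses come out close to, but not identical with, the ng/cell values listed in the table, which is expected since the per-base-pair mass is only an average.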
Central dogma of molecular biology

As discussed, DNA is the basic genetic material in almost all cells (a few viruses are the exception; see later). It was found that DNA is the only macromolecule that has the capacity to self-replicate. With the help of the enzyme DNA polymerase, DNA can replicate and produce another copy of the genome. DNA replication is a highly controlled process. The cell ensures the fidelity of the primary structure of DNA, so that the daughter strands are true copies of the parent strands. The proofreading function and the repair mechanisms of the cell correct any mistakes to ensure this fidelity. On average, the error rate is 1 in 10^9 to 1 in 10^10 bases. The replication of DNA is coupled with cell division: once the DNA is replicated, the cell is committed to divide. DNA replication takes place during the S phase of the cell cycle, which is followed by the M phase when the cell divides. Thus the amount of DNA remains constant in all the living cells. The entire genome is replicated in toto during a cycle of replication, and it replicates only once during each cell cycle. No region of the genome replicates preferentially or replicates more than once during one cycle.

The information stored in DNA can be transferred to RNA by the process of transcription. The main enzyme for transcription is RNA polymerase. Both DNA and RNA use a very similar language for encoding the genetic information (four nucleotides: dNTPs in DNA and rNTPs in RNA). However, unlike replication, the entire genome is not transcribed together during transcription. A small segment of DNA, referred to as a gene, is the unit of transcription. In eukaryotic cells each gene codes for one molecule of RNA (the monocistronic genes). In prokaryotic cells the genes encoding molecules performing
similar functions are often grouped together and are transcribed as a single RNA molecule (polycistronic genes). Further, transcription is a continuous process and has no correlation with the cell cycle. Furthermore, a gene can be transcribed more than once to produce multiple copies of RNA. It may be noted that only a small portion of the DNA in the genome is represented as genes. Almost 90% (or even more) of the genome does not code for any known useful information and represents the 'junk DNA'. Besides the junk DNA, introns are also non-coding regions of genes. However, these introns are removed from the primary transcript by a process known as splicing. The mature mRNA does not have the introns.

The information in the form of RNA (mRNA) is further transferred to yet another class of molecule, the proteins. Proteins are the ultimate products of genetic information and carry out the majority of the biological functions of the cell. The proteins and nucleic acids are entirely different and unrelated molecules. They have different chemical and physical properties. Unlike nucleic acids, which are made up of four types of nucleotides, proteins are made of twenty different amino acids. The transfer of information to proteins therefore requires decoding of the information from one language (the nucleotides) to another language (the amino acids). This coding takes the form of a language referred to as the genetic code, in which three nucleotides (a triplet) code for one amino acid. The process of decoding the genetic information to synthesize proteins is referred to as translation. Translation is a complex process involving multi-factorial machinery. The site of protein synthesis is the ribosome. A number of enzymes and other 'trans' acting factors also participate in the process. The entire process is highly regulated to ensure the formation of correct proteins.
This pathway for the transfer of genetic information from its storage form (DNA) to its biologically active form (proteins) is referred to as the central dogma of molecular biology (Fig. 8).
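The flow of information described above (DNA to RNA to protein) can be illustrated with a short sketch. It assumes the standard genetic code, takes the coding (sense) strand of DNA as input, and begins translation at the first AUG; the function names are chosen here for illustration only, and the sketch ignores splicing and every other regulatory step.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G order.
# '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTGGTAA")
    print(mrna, "->", translate(mrna))
```

Each triplet is read once and without overlap, which is the sense in which three nucleotides code for one amino acid.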
Exceptions to central dogma

The description above refers to the pathway of transfer of genetic information as seen in almost all living cells. However, certain viruses have RNA as their genetic material. How do these replicate and express their genetic information? In retroviruses, which have single stranded RNA as the genetic material, synthesis of complementary DNA is an obligatory requirement for the replication of the genome. It requires a specific DNA polymerase that uses RNA as template (RNA dependent DNA polymerase) and synthesizes a DNA molecule that is complementary to the RNA genome (cDNA). This process is thus the opposite of classical transcription and is referred to as reverse transcription. The enzyme is commonly known as reverse transcriptase (RTase). It is a virus-coded enzyme, not present in host cells. The enzyme has template specificity and will use only the viral RNA as template. Specific tRNA molecules serve as primers for the RTase. Further, the process has certain specific requirements and steps (discussed elsewhere). The properties and requirements of RTase are very similar to those of DNA polymerase. However, there are two marked differences: (i) RTase is able to use RNA as template, though it can also use DNA as template, and (ii) the majority of RTases do not have a proofreading function. The fidelity of cDNA synthesis is, therefore, much lower than that of DNA synthesis. Obviously, for the replication of retroviruses the enzyme RTase has to be synthesized first. For this purpose, the sub-genomic RNA (a portion of the RNA genome that has an open reading frame coding for a protein) serves as mRNA and directs the synthesis of RTase using the host translational machinery. The resulting DNA form of the virus is referred to as the provirus that
can integrate into the host genome. Transcription of this DNA can produce multiple copies of the viral genome.
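The base pairing logic of reverse transcription can be sketched in a few lines. This is only an illustration of how a complementary DNA strand is derived from an RNA template; it does not model the primer, the enzyme or its error rate, and the function name is hypothetical.

```python
# Pairing rules used when DNA is copied from an RNA template.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA strand (written 5'->3') complementary to an RNA (5'->3').

    The two strands are antiparallel, so the template is read in reverse.
    """
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


if __name__ == "__main__":
    print(reverse_transcribe("AUGGCCUGGUAA"))
```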
[Fig. 8 diagram labels. The Central Dogma (in all living cells): DNA replication (DNA polymerase); transcription (RNA polymerase). Exceptions to the Central Dogma (only in RNA viruses): reverse transcriptase (in retroviruses); RNA replicase (in reoviruses).]
Fig. 8: Central Dogma of Molecular Biology

Another group of RNA viruses, typified by the reoviruses, does not replicate through a DNA intermediate. The RNA replicates directly. This is again an exception to the general rule that DNA is the only self-replicating biomolecule. The enzyme required for RNA replication is RNA dependent RNA polymerase, also known as RNA synthetase or replicase. This template specific enzyme synthesizes multiple copies of the viral genome. Such viruses may have either double stranded RNA (as in the reoviruses) or single stranded RNA as the genome. Further, the single stranded viruses can be either '+' strand or '-' strand. For the replication of the genome, the enzyme has to be synthesized first. The viral genome codes for RNA synthetase. The sub-genomic mRNA is first translated to produce the enzyme, and this enzyme then carries out the replication of the viral genome. The synthesis of the enzyme is straightforward in the case of double stranded or '+' strand ssRNA viruses, as the genome itself can serve as mRNA. However, there is a paradox in the case of the '-' strand ssRNA viruses. Their genome is complementary to the mRNA and cannot code for the protein. For protein synthesis it is obligatory that the complementary RNA strand (the '+' strand) is synthesized first. How, then, can this '+' strand (the mRNA) that codes for the enzyme RNA synthetase be synthesized? Without the synthesis of mRNA the necessary enzyme cannot be synthesized, and without the enzyme (RNA synthetase) the
mRNA cannot be synthesized. It was a big puzzle. However, it was solved when it was discovered that small amounts of preformed enzyme are always encapsulated along with the RNA genome during the formation of viral particles. This enzyme initiates the process of mRNA synthesis, which starts the ball rolling. In the absence of this preformed enzyme, it would not be possible for the '-' strand RNA viruses to replicate. The entire gamut of gene expression in RNA viruses is shown on the left hand side of Fig. 8 as the exception to the central dogma.

By integrating the central dogma with the exceptions, it becomes clear that the transfer of genetic information from DNA to RNA can be considered a reversible transfer (under experimental conditions, not under normal physiological conditions within the cell in vivo). In fact, the process of reverse transcription (using viral RTase) is routinely used in genetic engineering for the synthesis of cDNA, which serves as an important intermediate in gene cloning experiments. However, once the information has been expressed in the form of proteins, the process is irreversible and the information cannot be retrieved back in the form of RNA or DNA. It may be possible to deduce the likely sequence of the coding region of a gene if the amino acid sequence of the protein coded by this gene is known, though the degeneracy of the genetic code makes it difficult to deduce the correct gene sequence. Further, a protein molecule can never serve as the template for the synthesis of either an RNA or a DNA.
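The difficulty that degeneracy creates for deducing a gene sequence from a protein sequence can be made concrete by counting how many different coding sequences could specify a given peptide. The sketch below assumes the standard genetic code; the function name is illustrative only.

```python
from collections import Counter

# Standard genetic code in U, C, A, G codon order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that specify each amino acid (one-letter symbols).
CODONS_PER_AMINO_ACID = Counter(AMINO_ACIDS)


def count_coding_sequences(peptide: str) -> int:
    """Number of distinct nucleotide sequences that could encode the peptide."""
    total = 1
    for amino_acid in peptide.upper():
        if amino_acid not in CODONS_PER_AMINO_ACID or amino_acid == "*":
            raise ValueError(f"unknown amino acid symbol: {amino_acid!r}")
        total *= CODONS_PER_AMINO_ACID[amino_acid]
    return total


if __name__ == "__main__":
    for peptide in ["MW", "MLSR"]:
        print(peptide, count_coding_sequences(peptide))
```

Methionine and tryptophan each have a single codon, whereas leucine, serine and arginine have six each, so the number of candidate gene sequences grows very quickly with peptide length.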
Genome organization

A cell has large amounts of DNA. Bacteria have some of the smallest genomes of any living cells. The eukaryotes have much larger genomes. A careful analysis revealed that while the number of expressed genes in eukaryotes is 2-10 times higher than in prokaryotes, the complexity of the genome can be higher by many orders of magnitude than in prokaryotic cells. This simply means that eukaryotic cells have a substantial amount of DNA that does not code for any known gene. This DNA is referred to as 'junk DNA'. Its role, if any, is not fully understood. In higher eukaryotes up to 90% of the total genome consists of this DNA. However, the junk DNA may not necessarily always be 'junk'. Though its precise function is not known today, it may have an important role in evolution. Besides, it carries out at least one important function, that is, to protect the useful information from random spontaneous mutations by providing a type of cushion around the useful information.

The recently concluded Human Genome Project (HGP) revealed that the human genome is approximately 3.2 x 10^6 kb in size. This means that the length of the DNA is approximately one metre (as adjacent bases are 0.34 nm apart, 1 kb of DNA has a length of 0.34 x 10^-6 m; the length of the entire DNA is therefore 0.34 x 10^-6 x 3.2 x 10^6 = 1.088 m). How does such a long strand of DNA fit within a cell that is microscopic in size? It is possible only because the DNA is present in a highly condensed form. It is packed into thread like structures, the chromosomes. A number of proteins and RNA also participate in chromosome formation. Prokaryotes have their entire DNA in the form of a single chromosome. Eukaryotic cells, on the other hand, have many chromosomes. The number of chromosomes is a characteristic property of a particular organism. In the majority of eukaryotes there are two copies of each chromosome (diploid, often written as 2x). Humans have 23 pairs (i.e. 46) of chromosomes in their genome.
Occasionally more than two copies of a chromosome may be present (polyploidy). Polyploidy is more common in plants than in animals. In a similar manner, sometimes only one copy of a chromosome may be present (haploid, 1x). The chromosome represents the ultimate organized form of the genome. The E. coli DNA in its fully extended form is approximately 1100 µm in length and is circular in shape. In contrast
to the earlier belief that the bacterial genome is naked DNA spread all over the cytoplasm, it has now been established that bacterial DNA is present as a highly packed structure known as the nucleoid, which represents the folded form of the bacterial genome.

The physical form of DNA, complexed with RNA and proteins, is referred to as chromatin. Chromatin may be present in two forms. The region that is denser and stains darkly is referred to as heterochromatin. It is metabolically inactive and is relatively resistant to DNase action. The less dense region that stains lightly and is highly sensitive to DNase action is known as euchromatin. Euchromatin is the metabolically active region of the genome that is actively transcribed. It may be noted that the genomic portions present as euchromatin and heterochromatin are not irreversibly committed, and they can interchange depending on the metabolic state of the cell. Certain regions that may represent the junk DNA and are devoid of any gene may be permanently present as heterochromatin (the constitutive heterochromatin), while certain other regions may be present as heterochromatin in one cell type but as euchromatin in another cell type. These regions may represent the portions having genes that are expressed in a tissue specific manner. Such regions are known as facultative heterochromatin. Chromatin organization can also play a role in the regulation of gene expression.

Chromatin has a highly organized structure consisting of three levels of organization. The first level of organization is the nucleosome. Under the electron microscope, the nucleosomes are seen as ellipsoidal beads of 110 x 60 Å joined together at regular intervals by a thin thread like structure. Mild digestion with DNase reveals that a region of 146 bp of DNA is enzyme resistant.
Further, integral multiples of the DNase resistant fragments are seen, suggesting that a repeated structure of nuclease resistant fragments is present within chromatin. These 110 Å nucleosomes constitute the basic unit of chromatin. Detailed analysis of nucleosomes showed that each contains about 200 bp of DNA wrapped around an octamer of histone proteins. Histones are basic DNA binding proteins that participate in chromatin formation and are an integral part of chromosomes. The histone octamer includes two molecules each of the H2A, H2B, H3 and H4 histones. The DNA is wound as one and three-fourths (1¾) turns of superhelical DNA around the histone core. The repeated nucleosomes are joined together by linker DNA, one molecule of H1 histone and certain non-histone proteins. (The H1 histone is also known as the linker histone; the nucleosome to which the H1 histone is attached is also called the chromatosome.) The linker histones may be present on the surface of the nucleosome-DNA assembly and act as clamps that prevent the coiled DNA from getting detached from the nucleosomes. However, in some cases the H1 may not be present on the surface but may instead be embedded between the core octamer and the wrapped DNA. The length of the linker DNA varies from organism to organism and ranges between 8 and 114 bp. Certain variations between individuals of the same species, and from cell to cell in the same individual, have also been seen. Further, the H1 histone molecules may not be distributed evenly, and the H1/DNA ratio of the linker region may vary between different loci of the chromatin. The histones help in stabilizing the nucleosome structure. They also help in the further organization of chromatin, namely the 300 Å fibres.

The nucleosomes are further condensed to form a highly coiled lumpy fibre with an average diameter of 300 Å. In the formation of the 300 Å particles, the nucleosomes are in direct contact with each other without much of the linker DNA.
These are supercoiled into the solenoid structure. The linker histones and/or the tails of the histone octamer can help in the packaging of nucleosomes into 300 Å particles. The nucleosomes are wound into a helix with 6 nucleosomes per turn. These may be attached to the nuclear matrix via A:T rich regions of DNA
that are known as matrix associated regions (MARs) or scaffold attachment regions (SARs).

The maximum degree of condensation of chromatin is observed during metaphase. The function of these structures is to package the giant DNA molecule into a form that can easily be segregated into daughter nuclei. It is important that the DNA molecule does not get entangled during segregation and that the physical forces do not result in shearing of the genetic material. This condensation of the 300 Å particles takes place with the help of non-histone proteins. Histones do not directly participate in this condensation. This results in the formation of a scaffold structure that has a central core surrounded by a huge pool of DNA. As the entire chromosome is made up of a single DNA molecule, no apparent ends of DNA molecules are visible in this scaffold structure. The various degrees of organization of chromatin are shown in Fig. 9.
Fig. 9: Organization of DNA into chromatin
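The figures quoted in this section (0.34 nm per base pair, about 200 bp of DNA per nucleosome and 6 nucleosomes per turn of the solenoid) can be combined into a rough estimate of the scale of the packaging problem. The sketch below is only back-of-the-envelope arithmetic on those numbers for one haploid genome; the function names are chosen for illustration.

```python
RISE_PER_BP_NM = 0.34        # distance between adjacent base pairs (from the text)
BP_PER_NUCLEOSOME = 200      # approximate DNA per nucleosome repeat (from the text)
NUCLEOSOMES_PER_TURN = 6     # nucleosomes per turn of the solenoid (from the text)


def contour_length_m(size_kbp: float) -> float:
    """Length in metres of fully extended DNA of the given size."""
    return size_kbp * 1e3 * RISE_PER_BP_NM * 1e-9


def nucleosome_count(size_kbp: float) -> float:
    """Approximate number of nucleosomes needed to package the DNA."""
    return size_kbp * 1e3 / BP_PER_NUCLEOSOME


def solenoid_turns(size_kbp: float) -> float:
    """Approximate number of solenoid turns formed by those nucleosomes."""
    return nucleosome_count(size_kbp) / NUCLEOSOMES_PER_TURN


if __name__ == "__main__":
    human_kbp = 3.2e6
    print(f"contour length: {contour_length_m(human_kbp):.3f} m")
    print(f"nucleosomes:    {nucleosome_count(human_kbp):.2e}")
    print(f"solenoid turns: {solenoid_turns(human_kbp):.2e}")
```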
The satellite DNA

If DNA had an essentially random sequence and were uniform in nature, the nucleotides would be expected to be distributed uniformly within the molecule. In a CsCl gradient any molecule bands at the specific position where the buoyant density of CsCl is the same as the density of the molecule being analyzed, so a homogeneous molecule should give a single band. However, when the genomic DNA of a eukaryotic cell is centrifuged through a CsCl gradient, some unexpected results are obtained. In place of the expected single band, multiple bands are obtained. The predominant band is at about 1.7 g/cc and represents the majority of the genomic DNA. However, at least three minor bands with buoyant densities of 1.692, 1.688 and 1.671 g/cc are also seen. These bands represent minor but distinct species of DNA, known as the satellite DNA. The name satellite DNA is given to these DNA species because they are different from the main class of DNA.

The satellite DNAs represent sequences containing multiple repeats of short lengths of DNA. Often these have a distinct sequence that is different from the average DNA and can therefore show independent behaviour. The repetitive DNA represents a minor but substantial portion of the genome, comprising up to 20-50% of the genome. Some of the short sequences may be repeated up to a million times. A certain degree of sequence conservation can be seen in various regions of satellite DNA in an organism. However, the length of the repeats may vary with the evolutionary position of an organism. The satellite DNAs have changed unusually rapidly during evolution. Further, certain characteristic differences can be seen between the satellite DNAs of two closely related species. The satellite DNA sequences can also serve as transposable elements and can get interspersed within the genome in a highly characteristic manner. The DNA of the Alu family and the L1 elements represent such DNAs.
These can, therefore, confer distinct characteristics on different individuals of a species. These have been used as the targets for DNA fingerprinting. The satellite DNA is present within the heterochromatin region of the DNA and is not transcribed. So far no role has been assigned to these repeats. They do not perform any known useful function in the maintenance, metabolism, survival or replication of the cell. These DNAs are often referred to as 'selfish DNA' sequences that ensure their own retention during replication but do not make any contribution to the benefit of the cell. While many of the repeat sequences have a distinct base composition (highly enriched in A:T, for example), others have a composition similar to the average DNA and may not always band as a distinct separate entity.
Highly and moderately repetitive sequences

The repetitive DNA can be of many types. Some of the repeat sequences are clustered at certain regions of the genome in tandem arrays. These are referred to as tandemly repeated DNA. This type of repeat is very common in eukaryotic DNA but is less frequent in prokaryotic genomes. The satellite DNA is arranged in tandemly repeated form. Based on the length and number of repeats, these can be either microsatellites or minisatellites. The minisatellites consist of repeat units of up to 25 bp, present in clusters of up to 20 kb. The microsatellites, on the other hand, are smaller, having a repeat unit of only up to 13 bp present in short clusters of 150 bp or less.

The minisatellites are an important feature of chromosome structure. For example, the telomeric DNA of a eukaryotic genome contains hundreds of copies of small repeats. In humans, this repeat motif is 5'-TTAGGG-3'. This repeat motif plays an important role in DNA replication. Certain other minisatellites are present near the ends of eukaryotic chromosomes. The chromosomal ends are not the only region of the genome where
minisatellites may be present; some minisatellites are present in other regions too. However, the functions of these minisatellites are not well understood.

The microsatellites, on the other hand, are much shorter. A typical microsatellite is made of 10-20 repeats of 1, 2, 3 or 4 bp. A large number of microsatellites are present in the genome. In humans, microsatellites with CA repeats such as

5'-CACACACACACACAC-3'
3'-GTGTGTGTGTGTGTG-5'

make up about 0.25% of the entire genome (approximately 8 Mbp in total). Similarly, single base repeats such as

5'-AAAAAAAAAAAAAAA-3'
3'-TTTTTTTTTTTTTTT-5'

can make up to 0.15% of the genome. The microsatellites have proved very useful to geneticists. Due to slippage during the replication of the microsatellites, the number or location of these may vary from one individual to another. These are thus one of the characteristic features of an individual's genome. The microsatellites, therefore, are used as the main tool for DNA fingerprinting. A large number of DNA fingerprinting probes based on microsatellites have been developed.

Other types of repeats may be present interspersed throughout the genome. These may have evolved due to transposition events during the course of evolution. A number of such repeats are present in eukaryotic genomes. Depending on the length of the repeats, these may be LINEs (long interspersed nuclear elements) or SINEs (short interspersed nuclear elements). Other repeats are LTRs (long terminal repeats) and, of course, the transposons. Often such repeats form a characteristic feature of a species. For example, in a short fragment of 50 kb in human chromosome 7, which forms part of a larger (685 kb) 'human β T-cell receptor locus', 52 such repeats are present.
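The definition of a microsatellite given above (10 or more tandem copies of a unit of 1 to 4 bp) translates directly into a pattern search. The following is a minimal sketch using a regular expression with a back-reference; the function name and its default thresholds are illustrative assumptions, not a standard fingerprinting tool, and only one strand of the sequence is scanned.

```python
import re


def find_microsatellites(sequence: str, max_unit: int = 4, min_repeats: int = 10):
    """Return (start, unit, copies) for each tandem repeat found in the sequence.

    A hit is a unit of 1..max_unit bases repeated at least min_repeats times
    in direct succession (min_repeats is assumed to be at least 2).
    """
    # ([ACGT]{1,n}?) captures the shortest candidate unit;
    # \1{k,} then requires at least k further copies of that same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    test_sequence = "GG" + "CA" * 12 + "TT"
    print(find_microsatellites(test_sequence))
```

Because replication slippage changes the number of copies at a given locus, the `copies` value reported for the same locus in different individuals is the kind of quantity that fingerprinting methods compare.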
Denaturation of DNA

As discussed earlier, DNA is a double stranded molecule and the two strands are held together by hydrogen bonds between the complementary bases. There are no covalent linkages between the two strands. It is, therefore, possible to separate the two strands by relatively mild treatment without breaking the basic structure (i.e. the phosphodiester bonds). The separation of the two strands is referred to as the denaturation or melting of DNA. A number of different treatments can denature DNA. These include denaturing chemicals such as alkali, urea and formaldehyde. However, one of the simplest and most commonly used methods to melt DNA is heat denaturation, which is achieved by raising the temperature of a DNA solution. The temperature at which half of the DNA (in aqueous solution at neutral pH) is denatured is referred to as the melting temperature or Tm. This can be determined by UV-spectroscopic analysis of DNA at 260 nm (the absorption maximum of DNA). The melting of DNA is associated with an increase in absorbance. This effect is referred to as hyperchromicity. It is due to the fact that melting of DNA results in decreased stacking of the bases (or an increase in the effective concentration of UV absorbing groups), which results in an increase in the absorbance at
260 nm. Fully melted DNA has 37% higher absorbance than dsDNA. The absorbance (at 260 nm) of a solution of double stranded DNA at a concentration of 50 µg/ml is 1.0. However, the same solution will give an A260 value of 1.37 when fully melted (single stranded DNA).
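The two numbers given above (A260 of 1.0 for 50 µg/ml of double stranded DNA, and a 37% rise on complete melting) are enough for two routine calculations. The sketch below assumes, as a simplification not stated in the text, that the absorbance rises linearly with the fraction of DNA that has melted; the function names are illustrative.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0   # from the text: A260 of 1.0 = 50 ug/ml dsDNA
HYPERCHROMIC_FACTOR = 1.37        # from the text: fully melted / native absorbance


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of double stranded DNA (ug/ml) from its A260 reading."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def fraction_melted(a260_native: float, a260_observed: float) -> float:
    """Estimated fraction of DNA denatured, ASSUMING a linear absorbance rise."""
    fraction = (a260_observed / a260_native - 1.0) / (HYPERCHROMIC_FACTOR - 1.0)
    return min(1.0, max(0.0, fraction))  # clamp to the physical range 0..1


if __name__ == "__main__":
    print(dsdna_concentration(0.4, dilution_factor=10))   # ug/ml in the stock
    print(fraction_melted(1.0, 1.185))                    # about half melted
```

Under this assumption, the temperature at which `fraction_melted` reaches 0.5 on a melting curve corresponds to the Tm.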
G:C content of DNA and the Tm

As there are two hydrogen bonds in an A:T pair and three hydrogen bonds in a G:C pair, more energy is required to break a G:C pair than to melt an A:T pair. Thus, the Tm of a DNA will be higher if its G:C content is higher. Hence, the melting temperature can give an idea about the base composition of a DNA molecule. Of course, the length of the DNA also plays a role in its melting. The larger the molecule, the higher the temperature required for its melting. The following formula can be used to determine the G:C content of a DNA sample from its Tm:

Tm = 69.3 + 0.41 (%G+C) - 650/L

where L is the length of the duplex DNA (in base pairs) and Tm is in °C. It should be noted that the presence of other denaturing agents in the solution will lower the Tm. For example, if formamide is added to a DNA solution, the Tm of duplex DNA is lowered. There is a decrease of 0.7 °C in Tm for every 1% formamide in the solution. Similarly, if there is any mismatch in the duplex DNA, the Tm is lowered by 1 °C for each 1% of mismatches.

As the base composition of DNA is characteristic of a species, the Tm of the genome also becomes a characteristic property of a species. The Tm of an exclusively A:T DNA (synthetic, having no G:C) is approximately 60 °C. The Tm of the majority of eukaryotic DNAs is about 80 °C, while the DNA of M. phlei (G:C content about 70%) has a Tm of ~95 °C. The interrelationship between Tm and G:C content is shown in Fig. 10.
Fig. 10: Interrelationship between G:C content and Tm of DNA
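The Tm formula and the two corrections quoted in this section (0.7 °C per 1% formamide and 1 °C per 1% mismatch) can be written as a small calculator. This is a sketch of the empirical relation exactly as stated in the text; the function names are illustrative, and the result should be treated as approximate since the solution conditions under which the formula holds are not specified here.

```python
def melting_temperature(gc_percent: float, length_bp: float,
                        formamide_percent: float = 0.0,
                        mismatch_percent: float = 0.0) -> float:
    """Tm in degrees C: 69.3 + 0.41*(%G+C) - 650/L, with the quoted corrections."""
    tm = 69.3 + 0.41 * gc_percent - 650.0 / length_bp
    tm -= 0.7 * formamide_percent   # 0.7 C lower per 1% formamide
    tm -= 1.0 * mismatch_percent    # 1 C lower per 1% mismatched bases
    return tm


def gc_percent_from_tm(tm: float, length_bp: float) -> float:
    """Invert the same formula to estimate %G+C from a measured Tm (no additives)."""
    return (tm - 69.3 + 650.0 / length_bp) / 0.41


if __name__ == "__main__":
    tm = melting_temperature(gc_percent=50, length_bp=1000)
    print(f"Tm of a 1 kb duplex with 50% G+C: {tm:.2f} C")
    print(f"with 10% formamide: {melting_temperature(50, 1000, formamide_percent=10):.2f} C")
    print(f"back-calculated G+C: {gc_percent_from_tm(tm, 1000):.1f} %")
```

For long molecules the 650/L term becomes negligible, so the estimate depends almost entirely on base composition.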