2013 rosandic et al fundamental role of start stop regulators in whole dna and new


GENE-39021; No. of pages: 7; 4C: 2 Gene xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Fundamental role of start/stop regulators in whole DNA and new trinucleotide classification

Marija Rosandić a, Vladimir Paar a,b, Matko Glunčić a,c,⁎

a Faculty of Science, University of Zagreb, Bijenička 32, 10000 Zagreb, Croatia
b Croatian Academy of Sciences and Arts, Zrinski Trg 11, 10000 Zagreb, Croatia
c Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany

Article info

Article history: Accepted 5 September 2013; Available online xxxx

Keywords: Start trinucleotide; Stop trinucleotides; Genetic code; Trinucleotide classification; Noncoding DNA; Regulators

Abstract

The origin and logic of the genetic code are two of the greatest mysteries of the life sciences. By analyzing DNA sequences, we showed that the start/stop trinucleotides have a broader importance than just marking the start and stop of exons in coding DNA. On this basis, we introduce a new classification of trinucleotides and show that all A+T rich trinucleotides consisting of three different nucleotides arise from start-ATG, stop-TGA and stop-TAG through their complement, reverse complement and reverse transformations. Through the same transformations during crossing-over over many generations, they can switch from one form to another. By a direct process, start-ATG and stop-TAG can irreversibly transform into stop-TAA. By transforming into A+T rich trinucleotides and into 16 of the 32 C+G rich trinucleotides, they can lose the start/stop function and, reversibly, take on the role of a sense codon. The remaining 16 C+G rich trinucleotides cannot directly transform into start/stop trinucleotides and thus remain a firm skeleton for structuring C+G rich DNA. We showed that start/stops strongly enrich A+T rich noncoding DNA, frequently in extended forms. From the evolutionary viewpoint, start/stops are the chief creators of the prevailing A+T rich noncoding DNA and of the more stable coding DNA. We propose that start/stops have a basic role as “seeds” in the trinucleotide evolution of noncoding and coding sequences and lead to an asymmetry between A+T and C+G rich DNA. Through dynamical transformations during evolution, they enabled pronounced phylogenetic broadness while keeping their regulator function. © 2013 Elsevier B.V. All rights reserved.

1. Introduction

It is well known that the standard genetic code applies to coding sequences (exons in genes) (Craig et al., 2010; Osawa, 1995). The genetic code is redundant and has a highly non-random structure, which makes it highly, albeit not optimally, robust to translation errors. The origin and evolution of the genetic code represent a major puzzle of modern biology, and numerous hypotheses have been formulated (Crick, 1968; Di Giulio, 2005, 2008; Frenkel and Trifonov, 2012; Gilbert, 1986; Higgs, 2009; Jordan et al., 2005; Kawahara-Kobayashi et al., 2012; Knight and Landweber, 2000; Koonin and Novozhilov, 2009; Novozhilov and Koonin, 2009; Wong, 1975). Crick (1968) initially discussed the postulates of the stereochemical theory and the frozen accident theory, while the co-evolution theory explored a third hypothesis (Di Giulio, 2008; Wong, 1975). In summarizing the state of the art in studies of code evolution, it was noted that one cannot escape considerable skepticism: the fundamental question “why is the genetic code the way it is and how did it come to be?”, asked half a century ago, might remain pertinent even in another half century (Koonin and Novozhilov, 2009). On the other hand, noncoding DNA sequences, including introns and intergenic DNA, represent fully 98% of the human genome and do not encode protein sequences. Inspired by the recent trend of recognizing the important biological role of noncoding DNA, it is intriguing to ask: “do codon-like trinucleotides have some biological role as regulators in noncoding sequences?” Trinucleotide repeats are of interest in genetics because they are used as markers for tracing genotype–phenotype relations (Ellegren, 2004; Gyapay et al., 1994; Lander and Green, 1987; Weissenbach et al., 1992) and are directly involved in numerous human genetic diseases (Gatchel and Zoghbi, 2005; La Spada et al., 1991; Orr and Zoghbi, 2007; Richards et al., 2001; Usdin and Grabczyk, 2000). Dinucleotide and trinucleotide frequencies in noncoding sequences have been investigated in a number of studies (Astolfi et al., 2003; Collins et al., 2003; Karlin and Mrazek, 1997; Karlin et al., 1994; Kozlowski et al., 2010; Nussinov, 1984; Toth et al., 2000). A characteristic of A+T rich noncoding sequences is the under-representation of TA dinucleotides (TA-low pattern) (Karlin et al., 1994; Nussinov, 1984), for which no clear etiology has been provided.

Abbreviations: CLT, codon-like trinucleotide; HOR, higher order repeat; TNT, trinucleotide. ⁎ Corresponding author at: Faculty of Science, University of Zagreb, Bijenička 32, 10000 Zagreb, Croatia. E-mail address: matko@pks.mpg.de (M. Glunčić).

0378-1119/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.gene.2013.09.021

Please cite this article as: Rosandić, M., et al., Fundamental role of start/stop regulators in whole DNA and new trinucleotide classification, Gene (2013), http://dx.doi.org/10.1016/j.gene.2013.09.021



M. Rosandić et al. / Gene xxx (2013) xxx–xxx

Of particular interest for noncoding sequences may be the role of the trinucleotides that, in coding regions, act as start/stop codons, marking the beginning and end of coding sequences (exons) within genes. If these trinucleotides have a signaling role in coding sequences, could they also act as regulators in noncoding regions? To our knowledge, little attention has been given to this question (Rosandić et al., 2011, 2013). In this light, we reinvestigate the standard genetic code here, introducing a new classification based on the analysis of noncoding sequences. We analyze various segments of noncoding genomic sequences and find, in particular, interesting features of human α satellite repeats. α satellites are tandem repeats with repeat units of ≈171 bp, covering stretches of several Mb in the centromeric and pericentromeric regions of human and great ape chromosomes. They have been studied extensively (Alkan et al., 2011; Haaf and Ward, 1994; Manuelidis, 1978; Rudd and Willard, 2004; Tyler-Smith, 1985; Warburton and Willard, 1996; Willard, 1985) and were recently considered in the novel framework of extended codon-like trinucleotides (Rosandić et al., 2011, 2013).

2. Materials and methods

Our detection and analysis of repetitive sequences in genome sequences from the NCBI assembly (www.ncbi.nlm.nih.gov/mapview/) are performed using the GRM method (Glunčić and Paar, 2012; Paar et al., 2005, 2011a, 2011b; Rosandić et al., 2003a, 2003b, 2013).

3. Results

3.1. New trinucleotide classification

Here we introduce a new classification of trinucleotides (TNTs) in both noncoding (A+T rich) and coding (C+G rich) DNA sequences. It is based on a combinatorial approach to the symbolic algebra of the four-letter alphabet (A, C, G, T), emphasizing the special role of start/stop codons as regulators in noncoding sequences. The 4³ = 64 TNTs are first divided into two groups of 32 TNTs: the A+T rich group I and the C+G rich group II. Each group is divided into three subgroups (Fig. 1).
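The split of the 4³ = 64 TNTs into two groups of 32 can be checked mechanically. The short Python sketch below assumes that “A+T rich” means that at least two of the three nucleotides are A or T; this criterion is our reading rather than a definition stated by the authors, but it reproduces the 32 TNTs of group I listed in Fig. 1 together with their 36 A, 36 T, 12 C and 12 G nucleotides.

```python
from collections import Counter
from itertools import product


def all_trinucleotides():
    """All 4**3 = 64 trinucleotides over the alphabet A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=3)]


def is_at_rich(tnt):
    """Assumed criterion: at least two of the three nucleotides are A or T."""
    return sum(base in "AT" for base in tnt) >= 2


tnts = all_trinucleotides()
group_i = [t for t in tnts if is_at_rich(t)]        # A+T rich group I
group_ii = [t for t in tnts if not is_at_rich(t)]   # C+G rich group II

# Nucleotide composition of all 32 TNTs of group I taken together
composition = Counter("".join(group_i))

print(len(tnts), len(group_i), len(group_ii))   # 64 32 32
print(dict(sorted(composition.items())))        # {'A': 36, 'C': 12, 'G': 12, 'T': 36}
```

Because the criterion only counts A/T positions, swapping A with T (or C with G) leaves each group unchanged, which is why A and T, and C and G, occur equally often in group I.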
Our starting point is the assumption that ATG, TGA, TAG and TAA act as TNT generators. In the standard genetic code for coding sequences, these TNTs represent start/stop codons. Their A+T rich composition alone already induces an A+T vs. C+G asymmetry. In noncoding sequences we refer to them as start/stop codon-like trinucleotides (CLTs): start-ATG, stop-TGA, stop-TAG and stop-TAA CLTs (Rosandić et al., 2011, 2013). We first construct the quadruplet consisting of ATG and its R,C-counterparts: the reverse complement (RC) CAT, the complement (C) TAC, and the reverse (R) GTA (Fig. 1, first row in subgroup Ia). Analogously, we form the stop-TGA CLT and stop-TAG CLT quadruplets (Fig. 1, second and third rows in subgroup Ia, respectively). Thus, the A+T rich subgroup Ia contains 12 TNTs, each consisting of one A, one T, and one C or G nucleotide. Subgroup Ib, starting with the stop-TAA CLT, consists of 10 A-rich TNTs, each containing two or three A nucleotides. The arguments for treating stop-TAA differently from the other three start/stop CLTs, and for the specific ordering of A-rich TNTs in subgroup Ib, will become clear from the analysis of the domains of transforming start/stop CLTs (Figs. 2 and 3). Finally, subgroup Ic of group I is the RC of Ib. The A+T rich group I contains 36 A, 36 T, 12 C and 12 G nucleotides, i.e., it is characterized by the nucleotide ratio (A+T):(C+G) = 3:1. The C+G rich counterpart of the start-ATG-quadruplet is the CGT-quadruplet (Fig. 1, first row in subgroup IIa). The start-ATG-quadruplet is mapped into the CGT-quadruplet by the Watson–Crick transformation A→C, C→A, T→G, G→T, which maps the Watson–Crick pairs into each other: (A,T) into (C,G), and vice versa. The same transformation maps the whole A+T rich group I into the C+G rich group II (Fig. 1). In particular, the stop-TGA-, stop-TAG-, and

Group I (A+T rich)

Subgroup Ia:    TNT   RC    C     R
                ATG   CAT   TAC   GTA
                TGA   TCA   ACT   AGT
                TAG   CTA   ATC   GAT

Subgroup Ib / Subgroup Ic (RC of Ib):
                TAA   TTA
                AAT   ATT
                ACA   TGT
                ATA   TAT
                AAC   GTT
                CAA   TTG
                AAG   CTT
                GAA   TTC
                AGA   TCT
                AAA   TTT

Group II (C+G rich)

Subgroup IIa:   TNT   RC    C     R
                CGT   ACG   GCA   TGC
                GTC   GAC   CAG   CTG
                GCT   AGC   CGA   TCG

Subgroup IIb / Subgroup IIc (RC of IIb):
                GCC   GGC
                CCG   CGG
                CAC   GTG
                CGC   GCG
                CCA   TGG
                ACC   GGT
                CCT   AGG
                TCC   GGA
                CTC   GAG
                CCC   GGG

Fig. 1. Start/stop based trinucleotide classification. Trinucleotides (TNTs) are classified into A+T rich group (denoted I) and C+G rich group (denoted II). Each group consists of three subgroups, denoted Ia, Ib, Ic and IIa, IIb, IIc, respectively. Initial TNTs in subgroup Ia are start-ATG (row 1), stop-TGA (row 2), and stop-TAG (row 3). The first row in subgroup Ib is stop-TAA. Pink: start/stop CLTs. Other TNTs in subgroups Ib and Ic: blue — TNTs created by start-ATG; green — TNTs created by stop-TGA; yellow — TNTs created only by stop-TAG; red — TNTs created only by stop-TAA. Group II is obtained by Watson–Crick transformation of group I. For details see the text.

stop-TAA-quadruplets are mapped into the GTC-, GCT-, and GCC-quadruplets, respectively. In the C+G rich group II the nucleotide ratio is (C+G):(A+T) = 3:1.

3.2. Direct-reverse complement pairs of trinucleotides

We investigate the usefulness of our TNT classification scheme, and of the model of extended and nonextended forms of TNTs, for the analysis of large noncoding DNA sequences, with particular emphasis on start/stop CLTs and their RC, R and C maps. For direct and reverse complement pairs of standard nonextended TNTs, we determine the frequencies (in percent) and compare them with the quotient of nucleotides constituting the extended and nonextended forms of TNTs. The analysis is performed for the 12 Mb nonrepetitive A+T rich sequence from human chromosome 1, the 226 kb nonrepetitive C+G rich sequence from human chromosome 7, and the 16-mer HOR (2734 bp repeat unit) from human chromosome 7 (Table 1). Additionally, we investigated two other A+T rich nonrepetitive sequences, of 5 Mb and 0.8 Mb, which give similar results. We obtain a high degree of mutual identity between the frequencies of the two TNTs within each direct–reverse complement pair. This indicates a role of TNTs in generating Chargaff's second rule. From an evolutionary point of view, we can argue that together with each new TNT, its reverse complement counterpart was created in the same strand. The quotients of nucleotides in extended and nonextended TNTs (expressed in percent) are also similar within direct–reverse complement pairs in both sequences, with extended TNT forms prevailing and showing individual differences. Only the AAA–TTT pair has a similar content of nucleotides in extended and nonextended TNTs. In the large sequences studied here, both A+T rich and C+G rich, the TGA–TCA pair has the highest frequency among A+T rich TNTs, while among C+G rich TNTs the highest frequency belongs to the CTG–CAG pair, which lies in the GTC-quadruplet, the counterpart of the TGA-quadruplet in our TNT classification scheme.
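The R,C-transforms, the Watson–Crick transformation and the pair-frequency comparison of Table 1 can be sketched in Python as follows. The function names are hypothetical illustrations; TNTs are read in overlapping windows with a step of 1, and skipping windows that contain non-ACGT symbols (such as N) is our assumption. The extended/nonextended quotient is not reproduced, because its counting rule is defined in Rosandić et al. (2011, 2013) rather than here.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The text's "Watson–Crick transformation": A->C, C->A, T->G, G->T
WATSON_CRICK = str.maketrans("ACTG", "CAGT")


def complement(tnt):
    return tnt.translate(COMPLEMENT)


def reverse(tnt):
    return tnt[::-1]


def reverse_complement(tnt):
    return complement(tnt)[::-1]


def quadruplet(tnt):
    """The TNT followed by its RC, C and R transforms (column order of Fig. 1)."""
    return [tnt, reverse_complement(tnt), complement(tnt), reverse(tnt)]


def watson_crick(tnt):
    return tnt.translate(WATSON_CRICK)


def tnt_frequencies(seq):
    """Percent frequency of each TNT, reading overlapping windows in steps of 1."""
    seq = seq.upper()
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))  # skip N etc.
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: 100.0 * n / total for t, n in counts.items()}


def pair_rows(freqs, direct_tnts):
    """Rows (direct TNT, F %, reverse complement, F %) for a Table 1-style listing."""
    rows = []
    for t in direct_tnts:
        rc = reverse_complement(t)
        rows.append((t, freqs.get(t, 0.0), rc, freqs.get(rc, 0.0)))
    return rows


for seed in ("ATG", "TGA", "TAG", "TAA"):
    quad = quadruplet(seed)
    print(quad, [watson_crick(t) for t in quad])
# first line: ['ATG', 'CAT', 'TAC', 'GTA'] ['CGT', 'ACG', 'GCA', 'TGC']
```

A sequence concatenated with its own reverse complement is its own reverse complement, so every direct–reverse complement pair then has exactly equal frequency; this is a convenient sanity check, whereas real genomic sequences only approximate such parity (Table 1).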
Smaller repetitive sequences, such as the 171 bp α satellites, exhibit specific features both in frequencies and in the quotient of constituting nucleotides in extended and nonextended TNTs (Rosandić et al., 2011, 2013). Thus,


(A) ATG-matrix (each element: running TNTs of the dimer row TNT + column TNT)
          ATG       CAT       TAC       GTA
  ATG   TGA GAT   TGC GCA   TGT GTA   TGG GGT
  CAT   ATA TAT   ATC TCA   ATT TTA   ATG TGT
  TAC   ACA CAT   ACC CCA   ACT CTA   ACG CGT
  GTA   TAA AAT   TAC ACA   TAT ATA   TAG AGT

(B) TGA-matrix
          TGA       TCA       ACT       AGT
  TGA   GAT ATG   GAT ATC   GAA AAC   GAA AAG
  TCA   CAT ATG   CAT ATC   CAA AAC   CAA AAG
  ACT   CTT TTG   CTT TTC   CTA TAC   CTA TAG
  AGT   GTT TTG   GTT TTC   GTA TAC   GTA TAG

(C) TAG-matrix
          TAG       CTA       ATC       GAT
  TAG   AGT GTA   AGC GCT   AGA GAT   AGG GGA
  CTA   TAT ATA   TAC ACT   TAA AAT   TAG AGA
  ATC   TCA CTA   TCC CCT   TCA CAT   TCG CGA
  GAT   ATT TTA   ATC TCT   ATA TAT   ATG TGA

(D) TAA-matrix
          TAA       TTA       ATT       AAT
  TAA   AAT ATA   AAT ATT   AAA AAT   AAA AAA
  TTA   TAT ATA   TAT ATT   TAA AAT   TAA AAA
  ATT   TTT TTA   TTT TTT   TTA TAT   TTA TAA
  AAT   ATT TTA   ATT TTT   ATA TAT   ATA TAA

(E)   Subgroup   No. of TNTs   Rows
      Ia         12            1–3
      Ib         4             1–4
      Ic         4             1–4
      IIa        4             1
      IIb        2             5–6
      IIc        2             5–6

Fig. 2. Generating matrices created by the initial start/stop trinucleotides. (A) Initial start-ATG. The generating matrix contains 28 different TNTs, classified according to the new classification: 12 TNTs from subgroup Ia (rows 1–3), 4 TNTs from subgroup Ib (rows 1–4), 4 TNTs from subgroup Ic (rows 1–4), 4 TNTs from subgroup IIa (row 1), 2 TNTs from subgroup IIb (rows 5–6), and 2 TNTs from subgroup IIc (rows 5–6). The ATG-matrix also yields the four TNTs of the CGT-quadruplet (row 1 in subgroup IIa), the counterpart of the ATG-quadruplet. The remaining four C+G rich TNTs in the ATG-matrix correspond to rows 5–6 in subgroups IIb and IIc. In this sense, start-ATG is “altruistic”. (B) Initial stop-TGA. The generated TNTs are again all 12 TNTs from the A+T rich subgroup Ia, and the TNTs from rows 5–8 in subgroups Ib and Ic. Being “selfish”, TGA cannot generate in a single step any TNT from the C+G rich group II, nor those TNTs from subgroups Ib and Ic that contain AT or TA dinucleotides. (C) Initial stop-TAG. Stop-TAG generates some TNTs from all six subgroups, similarly to start-ATG. However, there are significant differences: as a unique feature among start/stop CLTs, stop-TAG generates TNTs in row 9 of subgroups Ib and Ic, row 3 in subgroup IIa, and rows 7–8 in subgroups IIb and IIc. In this sense TAG is also “altruistic”, but complementary to start-ATG. (D) Initial stop-TAA. TAA is “selfish” because it cannot generate any TNT from the C+G rich group II. Its unique role is to generate row 10 in Ib and Ic. (E) List of TNTs generated in matrix A.

the TNT TGA has the largest frequency, about 4%, with a pronounced dominance of its extended forms. In absolute terms, each α satellite monomer contains 8–12 TGAs, predominantly extended. Some difference appears when the middle G nucleotide of TGA is extended, as in TGGA, because in that case, reading the sequence in steps of 1, the nonextended TNT TGA is not identified. TAA appears once in each α satellite monomer copy, in a sizably extended form, while TAG is mostly nonextended. Their reverse complements differ sizably in the quotient of extended to nonextended forms, and the TGA–TCA pair also differs in frequency. Our results indicate that specific differences in the frequency and degree of extension of individual TNTs, especially of start/stop CLTs in repetitive sequences of medium or shorter length, could provide a new tool for phylogenetic studies.

3.3. Combinatorial experiment and trinucleotide matrix

To investigate how TNTs can generate, in a simple way, the asymmetry between the A+T rich group I and the C+G rich group II, we consider a combinatorial experiment as a model of evolutionary processes in a hypothetical primordial TNT “soup” and of crossing-over processes during meiosis. Our idea is that the four TNTs representing start/stop codons in coding sequences can be considered as “seeds” for generating other TNTs. First, we assign to each of start-ATG, stop-TGA and stop-TAG their corresponding RC, C and R transforms, collectively referred to as R,C-transforms. This approach has some resemblance to the empirically observed intra-strand second Chargaff's parity rule (Albrecht-Buehler, 2006; Chargaff, 1979; Powdel et al., 2009; Sueoka, 1999). In this way we obtain three quadruplets: the start-ATG-, stop-TGA- and stop-TAG-quadruplets (three rows in subgroup Ia, Fig. 1). The first quadruplet, arising from start-ATG, is ATG, CAT, TAC, and GTA (first row in subgroup Ia). We now construct the corresponding 4 × 4 generating matrix, denoted the ATG-matrix (Fig. 2A). At the position of each matrix element we place a dimer obtained by fusing the TNTs at the intersection of the corresponding row and column. For example, the matrix element at the intersection of the first row (ATG) and the second column (CAT) is the dimer ATGCAT. To each dimer we assign the two TNTs immersed in it, i.e., those built from nucleotides at positions 2 to 4 (TGC in the case of ATGCAT) and 3 to 5 (GCA in the case of ATGCAT). This corresponds to the concept of “running” sequences (TNTs read with a frame shift of 1) (Albrecht-Buehler, 2006). Thus the “running” TNTs immersed in the ATGCAT dimer at the ATG/CAT intersection are TGC and GCA. The first TNT (ATG) and the last TNT (CAT) of ATGCAT are already included in the ATG-quadruplet.
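The construction of the generating matrix can be sketched in Python as follows. This is a minimal illustration under one convention of ours: the quadruplet members themselves are counted among the generated TNTs (for start-ATG this makes no difference, because all four also appear inside the dimers). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def quadruplet(tnt):
    """The TNT followed by its RC, C and R transforms."""
    comp = tnt.translate(COMPLEMENT)
    return [tnt, comp[::-1], comp, tnt[::-1]]


def generating_matrix(seed):
    """4 x 4 generating matrix built on the seed's quadruplet.

    Element (row r, column c) is the pair of "running" TNTs immersed in the
    dimer r + c, i.e. nucleotides 2-4 and 3-5 (1-based) of the six-letter dimer.
    """
    quad = quadruplet(seed)
    return [[((r + c)[1:4], (r + c)[2:5]) for c in quad] for r in quad]


def generated_tnts(seed):
    """Distinct TNTs in the matrix, plus the quadruplet itself (our convention)."""
    found = set(quadruplet(seed))
    for row in generating_matrix(seed):
        for pair in row:
            found.update(pair)
    return found


atg_matrix = generating_matrix("ATG")
print(atg_matrix[0][1])             # ('TGC', 'GCA'): the ATG/CAT element
print(len(generated_tnts("ATG")))   # 28
```

With this convention the ATG-matrix yields the 28 distinct TNTs listed in Fig. 2A; its 16 elements carry 32 TNTs in total, some of them repeated.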
Analogously, all matrix elements of the ATG-matrix (Fig. 2A) are determined. Two TNTs are assigned to each position in the ATG-matrix; these are the TNTs generated by the ATG-matrix. The domain of these TNTs is shown schematically in Fig. 3A (shaded squares on the background of our classification scheme). In a single step, start-ATG generates TNTs from the A+T rich group I as well as some TNTs from the C+G rich group II. Therefore, start-ATG will be referred to as “altruistic.” Analogously, we construct the generating matrices corresponding to the three stop-TNTs (Fig. 2B–D). The corresponding domains of generated TNTs are shown in Fig. 3B–D. Stop-TGA cannot generate any TNT from the C+G rich group II; for this reason, stop-TGA will be referred to as “selfish.” Stop-TAG is “altruistic” like start-ATG, and stop-TAA is “selfish” like stop-TGA, but with some unique features (Fig. 3). The C+G rich counterparts of the start/stop CLTs are CGT, GTC, GCT, and GCC. Using the generating matrix method for these four TNTs (Supplementary Fig. S1), we obtain all 32 C+G rich TNTs. In this way, GTC and GCC, whose generated TNTs are the counterparts of those generated by the “selfish” TGA and TAA, respectively, are themselves “selfish.” For example, GTC and its R,C-transforms create all TNTs of the C+G rich subgroup IIa, but none of subgroup Ia. Analogously, the C+G rich CGT and GCT are “altruistic,” being counterparts of the “altruistic” ATG and TAG, respectively. However, none of the 32 TNTs from the C+G rich group II can create stop-TGA or stop-TAA. This effect can contribute to reducing the frequency in exons of TGA and TAA (and of their R,C-transforms), which could otherwise arise during crossing-over. In this way the integrity of the C+G rich group in exons can be protected. The transforms of start-ATG, stop-TGA and stop-TAG within subgroup Ia are reversible.
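The “altruistic”/“selfish” labels, and the statement that no C+G rich TNT can create stop-TGA or stop-TAA in a single step, can be checked by brute force. In this sketch a seed is labelled “altruistic” if its matrix reaches at least one C+G rich TNT (two or more C/G nucleotides, our assumed criterion for group II), and the same generating step is applied to all 32 TNTs of group II. Helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALL_TNTS = ["".join(p) for p in product("ACGT", repeat=3)]


def quadruplet(tnt):
    comp = tnt.translate(COMPLEMENT)
    return [tnt, comp[::-1], comp, tnt[::-1]]  # TNT, RC, C, R


def generated_tnts(seed):
    """Quadruplet plus the two running TNTs of every dimer r + c of its matrix."""
    quad = quadruplet(seed)
    found = set(quad)
    for r in quad:
        for c in quad:
            dimer = r + c
            found.update((dimer[1:4], dimer[2:5]))
    return found


def is_cg_rich(tnt):
    """Assumed criterion for group II: at least two C or G nucleotides."""
    return sum(base in "CG" for base in tnt) >= 2


def label(seed):
    """'altruistic' if the seed reaches group II in a single step, else 'selfish'."""
    reaches_ii = any(is_cg_rich(t) for t in generated_tnts(seed))
    return "altruistic" if reaches_ii else "selfish"


labels = {seed: label(seed) for seed in ("ATG", "TGA", "TAG", "TAA")}

# Everything the 32 C+G rich TNTs can create in a single step
group_ii = [t for t in ALL_TNTS if is_cg_rich(t)]
reachable_from_ii = set().union(*(generated_tnts(t) for t in group_ii))

print(labels)
print(sorted({"ATG", "TGA", "TAG", "TAA"} & reachable_from_ii))   # ['ATG', 'TAG']
```

In this sketch start-ATG and stop-TAG remain reachable from group II (for instance, the dimer GCATGC of the CGT-matrix contains ATG), whereas TGA and TAA never appear.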
However, the transforms of start-ATG and stop-TAG into TNTs of the first three rows of subgroups Ib and Ic are irreversible: by the reverse transformation it is not possible to go directly back to start-ATG or stop-TAG in Ia. Let us consider the TNT-generating potential of start-ATG. The ATG-quadruplet generates the ATG-matrix (Fig. 2A). The 32 TNTs it produces (28 of them distinct) contain three stop-CLTs (TGA, TAG, TAA) from group I, as well as the



[Fig. 3 panels (A) start-ATG maps, (B) stop-TGA maps, (C) stop-TAG maps, (D) stop-TAA maps: schematic of the A+T rich subgroups Ia, Ib, Ic and the C+G rich subgroups IIa, IIb, IIc (columns RC, C, R; rows 1–10), with the generated TNTs shaded.]

Fig. 3. Domains of trinucleotides generated by initial start/stop trinucleotides. (A) Initial start-ATG. Domain of TNTs generated in a single step by ATG-matrix. Start-ATG generates TNTs from A+T rich group I (20/32) and from C+G rich group II (8/32). (B) Initial stop-TGA. (C) Initial stop-TAG. (D) Initial stop-TAA. The generated TNTs are shown in the scheme of new TNT systematics (shaded squares). This shows that start-ATG generates all three stop TNTs in a single step. Initial TGA cannot generate TAA in a single step. See caption to Fig. 2.

C+G rich counterpart of start-ATG (CGT) from group II. In the second step, the CGT-quadruplet generates the CGT-matrix (Supplementary Fig. S1A), which produces all TNTs of subgroup IIa. In the third step, their generating matrices give rise to all TNTs of subgroups IIb and IIc. In this three-step process, the initial start-ATG generates all 64 TNTs. In this model, the A+T rich dominance originates from the primordial ATG TNT. In the proposed scenario, start-ATG generates the three stop-TNTs. In noncoding DNA, all four start/stop TNTs have the role of accumulating and binding A and T nucleotides, mostly in their extended forms, further enhancing the dominance of A+T nucleotides. It should be pointed out that the three stop-CLTs are needed in combination with start-ATG to cover all TNTs of the A+T rich group I in a single step (Fig. 3). Regarding the C+G rich group II, start-ATG covers row 1 and stop-TAG covers row 3 of subgroup IIa, while row 2 is not covered in a single step. In subgroups IIb and IIc rows 5–8 are covered, while the remaining six rows stay uncovered in single-step processes. Within the framework of our model, we address the question of why there is one start-TNT and there are three stop-TNTs. Is this a coincidence, or is it based on rational grounds? In the A+T rich group I, start-ATG, stop-TGA and stop-TAG share the characteristic that each of them acts on the whole subgroup Ia. On the other hand, stop-TAA does not act on any TNT of subgroup Ia and is therefore classified into subgroup Ib. Start-ATG also acts on the first four rows of subgroups Ib and Ic, while stop-TAG and stop-TAA act on the first three rows of subgroups Ib and Ic. Only start-ATG acts on the fourth row, only stop-TGA on rows 5 to 8, only stop-TAG on row 9, and only stop-TAA on row 10. Thus, all four start/stop TNTs are necessary for the creation of all A+T rich TNTs (group I), because only the combination of their domains covers the full domain of group I.
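The generation of all 64 TNTs from start-ATG, and the need for all four start/stop TNTs to cover group I, can be explored with a breadth-first closure. This is a sketch under our own assumption: in every round the generating-matrix step is applied to every TNT obtained so far, which is broader than the specific CGT → IIa → IIb/IIc path described in the text; under this rule the set also reaches all 64 TNTs after three rounds. Helper names are hypothetical, and “A+T rich” again means at least two A/T nucleotides.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALL_TNTS = ["".join(p) for p in product("ACGT", repeat=3)]
SEEDS = ("ATG", "TGA", "TAG", "TAA")


def quadruplet(tnt):
    comp = tnt.translate(COMPLEMENT)
    return [tnt, comp[::-1], comp, tnt[::-1]]  # TNT, RC, C, R


def generated_tnts(seed):
    """Quadruplet plus the two running TNTs of every dimer r + c of its matrix."""
    quad = quadruplet(seed)
    found = set(quad)
    for r in quad:
        for c in quad:
            dimer = r + c
            found.update((dimer[1:4], dimer[2:5]))
    return found


def closure(start):
    """Apply the matrix step to every TNT found so far until nothing new appears.

    Returns the final set and the number of rounds in which the set grew.
    """
    current, rounds = {start}, 0
    while True:
        grown = set(current)
        for tnt in current:
            grown |= generated_tnts(tnt)
        if grown == current:
            return current, rounds
        current, rounds = grown, rounds + 1


final, rounds = closure("ATG")
print(len(final), rounds)   # 64 3

# Domains of the four start/stop seeds inside the A+T rich group I
group_i = {t for t in ALL_TNTS if sum(b in "AT" for b in t) >= 2}
domains = {s: generated_tnts(s) & group_i for s in SEEDS}
covered = set().union(*domains.values())
only_by = {s: domains[s] - set().union(*(domains[o] for o in SEEDS if o != s))
           for s in SEEDS}
print(covered == group_i)   # True
print(only_by["ATG"], only_by["TAA"])
```

The `only_by` sets list the group I TNTs that exactly one seed reaches in a single step; no single seed covers group I on its own.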
We hypothesize that each start/stop TNT in noncoding sequences acts on those TNTs that are generated by the corresponding start/stop-TNT quadruplet. The action of start/stop TNTs in the domain of C+G rich TNTs is quite different. First, no TNT from this group is generated by both start- and stop-TNTs. Start-ATG acts on the first row of subgroup IIa and on rows 5 and 6 of subgroups IIb and IIc. Stop-TAG acts on row 3 of IIa and on rows 7 and 8 of subgroups IIb and IIc. The remaining 16 TNTs are “neutral,” because none of the start/stop-TNTs acts upon them. These 16 “neutral” C+G rich TNTs dominate in coding DNA. They cannot create any A+T rich TNT from group I and in this sense are outside the control of start/stops. This irreversibility contributes to a pronounced asymmetry between the action of start/stop TNTs on the A+T rich and on the C+G rich groups. In addition, the much higher density of start/stop TNTs in noncoding than in coding sequences contributes to this asymmetry.
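The 16 “neutral” C+G rich TNTs can be obtained as the TNTs of group II that none of the four start/stop generating matrices reaches. The sketch below reuses the same assumed helpers (hypothetical names, group II taken as TNTs with at least two C/G nucleotides) and also checks that the ATG and TAG contributions to group II do not overlap and that no neutral TNT creates an A+T rich TNT in a single step.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
ALL_TNTS = ["".join(p) for p in product("ACGT", repeat=3)]


def quadruplet(tnt):
    comp = tnt.translate(COMPLEMENT)
    return [tnt, comp[::-1], comp, tnt[::-1]]  # TNT, RC, C, R


def generated_tnts(seed):
    """Quadruplet plus the two running TNTs of every dimer r + c of its matrix."""
    quad = quadruplet(seed)
    found = set(quad)
    for r in quad:
        for c in quad:
            dimer = r + c
            found.update((dimer[1:4], dimer[2:5]))
    return found


def is_cg_rich(tnt):
    return sum(base in "CG" for base in tnt) >= 2


group_ii = {t for t in ALL_TNTS if is_cg_rich(t)}
from_atg = generated_tnts("ATG") & group_ii
from_tag = generated_tnts("TAG") & group_ii
acted_on = set().union(*(generated_tnts(s) for s in ("ATG", "TGA", "TAG", "TAA")))
neutral = sorted(group_ii - acted_on)

print(len(from_atg), len(from_tag), from_atg & from_tag)   # 8 8 set()
print(len(neutral), neutral)

# Single-step check: does any neutral TNT create an A+T rich TNT?
print(all(is_cg_rich(x) for t in neutral for x in generated_tnts(t)))   # True
```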

3.4. Inverse rule for stop-TGA and TA dinucleotides

The TGA-quadruplet cannot generate the six TNTs that contain AT and TA dinucleotides (Fig. 3), while the frequency of TGA in nonextended+extended form (Rosandić et al., 2011, 2013) is the highest in α satellites (Supplementary Table S1). The TGA CLTs in the corresponding consensus human α satellite monomer are TGA, TTGAA, TTTTGA, TTTTGAAA, TGGA, TTTGGA, TTTGA, and TGGAAAA. Only the first is a nonextended TGA CLT; all the others are extended. Thus TGA acts as a “seed” for multi-A and multi-T accumulation and binding. This was recognized by introducing the concept of extended CLTs (Rosandić et al., 2011, 2013). In the rest of the α satellite monomer there are only two pronounced multi-T extended TNTs (TTTGT and TTTTTGT). In the extended forms of TGA and TGT, the G nucleotide occupies the central position within the TNT, with a strong tendency to accumulate and bind Ts on one side and As on the other. Additionally, TGA, being selfish and having G in the central position, cannot generate any TNT from group II. A similar structure appears in α satellites of great apes (Rosandić et al., 2013). The extended TGAs accumulate and bind as much as 30% of all A and T nucleotides in human α satellites. This accumulation of T and A leads to their depletion in the rest of the α satellite, and therefore to the TA-low pattern. Therefore, the frequencies of extended+nonextended stop-TGAs and of TA dinucleotides are inversely proportional. Human α satellites have the lowest frequency of TA dinucleotides and the highest frequency of extended+nonextended TGAs among all noncoding sequences (Rosandić et al., 2011). In great apes, the depletion of TA is slightly less pronounced and the frequency of stop-TGA CLTs is similar to that in humans, but start/stop extensions are shorter (Rosandić et al., 2011, 2013).
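Counting TGA forms and TA dinucleotides can be sketched with a regular expression. We assume here that a (possibly extended) TGA CLT is a greedy run of the form T+G+A+ (one or more T, then G, then A), a pattern that fits every example listed above but is not necessarily the authors' exact definition; the toy sequence and the function names are hypothetical, and the toy sequence is not an α satellite.

```python
import re

# Assumed pattern for a (possibly extended) TGA CLT: Ts, then Gs, then As.
TGA_FORM = re.compile(r"T+G+A+")


def tga_forms(seq):
    """Non-overlapping, greedy T+G+A+ matches, e.g. TGA, TTGAA, TGGAAAA."""
    return [m.group() for m in TGA_FORM.finditer(seq.upper())]


def count_dinucleotide(seq, dinucleotide):
    """Overlapping count of a dinucleotide (e.g. 'TA'), reading in steps of 1."""
    seq = seq.upper()
    return sum(seq[i:i + 2] == dinucleotide for i in range(len(seq) - 1))


def at_share_in_tga_forms(seq):
    """Fraction of all A and T nucleotides of seq that lie inside TGA forms."""
    seq = seq.upper()
    inside = sum(f.count("A") + f.count("T") for f in tga_forms(seq))
    total = seq.count("A") + seq.count("T")
    return inside / total if total else 0.0


toy = "CCTGACCTTGAACCTTTTGAAACCTGGACC"   # hypothetical toy sequence
forms = tga_forms(toy)
extended = [f for f in forms if f != "TGA"]
print(forms)   # ['TGA', 'TTGAA', 'TTTTGAAA', 'TGGA']
print(len(extended), count_dinucleotide(toy, "TA"), at_share_in_tga_forms(toy))   # 3 0 1.0
```

Applied to a real sequence and to composition-preserving shuffles of it, these counts would allow the kind of comparison between TGA forms and TA depletion discussed in this section; the shuffling step is not shown.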
The opposite situation appears for repeats with a 2.4 kb repeat unit (Paar et al., 2011b) in human chromosome 1: the TA frequencies are similar to those of randomized sequences, i.e., higher than in α satellites, and the TGA frequency is reduced. The tendency of A+T rich start/stop CLTs (including their counterparts in quadruplets) to accumulate, together with the under-representation of TA dinucleotides, leads to nonequilibrium in the structure of noncoding DNA.

3.5. Saccharomyces cerevisiae as a living fossil

S. cerevisiae has a very small centromeric DNA, only 119 bp long, called a point centromere (Supplementary Fig. S2). It is markedly A+T rich (44 A + 57 T versus 9 C + 9 G). It contains two extended and one nonextended start-ATG CLTs (AATG, ATG, and ATTTG), five pronounced stop-TAA CLTs (TTTTTAAAAA, TTTTAAAA, TTTTAAAAAA, TTAAAA, and TAAAAA), and one CGT. According to the classification from Fig. 1,


Table 1
Frequencies of direct/reverse complement pairs of trinucleotides (%) and quotients of nucleotides in extended and nonextended trinucleotides (%).
Columns: (1) H. chr. 1, NT004610.18, 12 Mb; (2) H. chr. 7, AF053356.1, 226 kb; (3) H. chr. 7, 16-mer HOR, 2734 bp.

          (1)           (2)           (3)
TNT      F%  E/NE      F%  E/NE      F%  E/NE

A+T rich direct–reverse complement pairs
ATG    2.05  2.73    1.59  2.71    1.76  2.75
CAT    2.05  2.70    1.56  2.72    2.45  5.62
GTA    1.25  2.48    0.89  1.85    0.77  0.44
TAC    1.25  2.48    0.88  1.97    0.99  2.58
TGA    2.73  2.59    2.59  2.56    4.06 13.88
TCA    2.72  2.56    2.54  2.59    2.31  2.10
AGT    2.14  2.53    2.00  2.41    1.94  8.04
ACT    2.14  2.53    2.00  2.37    2.64  7.95
TAG    1.52  2.73    1.18  2.32    1.79  0.65
CTA    1.52  2.74    1.16  2.47    1.35  2.41
GAT    1.75  3.00    1.45  2.66    2.60  2.37
ATC    1.75  2.95    1.41  2.80    2.64  3.83
TAA    1.40  2.07    0.86  1.90    1.02  4.83
TTA    1.40  2.05    0.85  1.91    0.88  2.43
AAT    1.70  2.24    1.15  2.28    1.68  1.58
ATT    1.70  2.24    1.13  2.38    2.45  1.03
ATA    1.57  4.54    0.97  4.44    2.20  2.06
TAT    1.57  4.40    0.97  4.36    1.65  2.65
ACA    2.12  2.42    1.88  2.37    2.27  1.11
TGT    2.13  2.35    1.96  2.28    1.65  2.57
AAC    1.25  1.83    1.04  2.12    2.56  7.22
GTT    1.25  1.84    1.04  2.04    1.39  1.92
CAA    1.68  1.92    1.49  2.03    0.92  0.42
TTG    1.69  1.94    1.47  2.05    3.77  5.55
AAG    1.78  1.65    1.60  1.72    1.98  0.88
CTT    1.80  1.65    1.46  1.75    3.92  3.53
GAA    1.65  1.83    1.45  2.02    4.28  1.87
TTC    1.65  1.80    1.32  1.95    3.11  0.61
AGA    2.44  2.66    2.39  2.66    3.29  2.67
TCT    2.44  2.67    2.29  2.47    3.40  7.05
AAA    1.65  1.01    1.26  1.22    3.00  0.66
TTT    1.66  1.00    1.20  1.32    3.55  1.54

C+G rich direct–reverse complement pairs
CGT    0.46  2.43    0.67  2.68    0.37  5.83
ACG    0.45  2.43    0.63  2.75    0.40 18.33
TGC    2.46  3.07    2.68  3.42    1.46  0.62
GCA    2.44  3.07    2.72  3.17    2.27  0.48
GTC    1.53  2.89    1.58  3.48    0.51  1.43
GAC    1.52  2.91    1.57  3.37    2.75  5.84
CTG    3.30  2.89    3.49  3.46    2.75  4.92
CAG    3.28  2.87    3.53  3.42    2.82  0.50
GCT    2.60  3.12    2.95  3.22    1.06  5.19
AGC    2.59  3.10    3.04  3.25    2.01  1.72
TCG    0.48  2.93    0.75  3.59    0.44  3.00
CGA    0.48  2.91    0.76  3.42    0.62  5.17
GCC    1.85  1.30    2.39  1.58    0.40  0.13
GGC    1.85  1.29    2.41  1.49    0.48  1.14
CCG    0.49  2.01    0.86  2.50    0.18  0.00
CGG    0.49  2.02    0.91  2.35    0.29  0.80
CGC    0.56  4.54    0.96  5.66    0.29  0.50
GCG    0.56  4.53    1.02  5.60    0.44  0.75
CAC    1.96  2.47    2.05  2.75    1.57  0.74
GTG    1.97  2.45    2.13  2.91    2.71  2.29
CCA    2.30  1.34    2.51  1.47    0.51  0.74
TGG    2.32  1.35    2.61  1.40    1.54  2.54
ACC    1.40  1.26    1.51  1.30    0.88  4.56
GGT    1.41  1.27    1.63  1.26    0.40  1.11
CCT    2.31  0.95    2.63  0.91    1.35  5.83
AGG    2.32  0.95    2.72  0.92    1.02  1.92
TCC    1.85  1.63    2.11  1.60    0.77  4.67
GGA    1.87  1.63    2.21  1.62    1.98  1.40
CTC    2.46  2.86    2.70  3.09    3.11  2.34
GAG    2.46  2.81    2.83  3.17    2.82  4.13
CCC    1.49  0.50    2.00  0.62    0.37  0.15
GGG    1.50  0.50    2.05  0.60    0.55  0.10


such a centromere represents an archaic form, consisting mostly of the primordial TNT “seeds” start-ATG and stop-TAA, and of CGT, which represents the C+G rich counterpart of start-ATG. The remaining part of the centromere contains mostly TNTs in extended form: TTTTAAAAC, corresponding to extended TAC, and TAC, both corresponding to the complement of extended/nonextended ATG; CATTTT and CATTTTTTT, corresponding to the RC of extended ATG; AAAAGT, corresponding to the R of extended TGA; and CTTTTTAAAAA, corresponding to the RC of extended TAG. We hypothesize that in the evolutionary chain this was the first A+T rich structural centromeric unit with a tendency to form 64 TNTs. In contrast, Schizosaccharomyces pombe is more closely related to higher eukaryotes, having a complex regional centromere of 56,016 bp. The evolutionary leap between these two centromeres is large. It seems that some transitional organisms between these two species are either extinct or not yet discovered. The long evolutionary path from S. cerevisiae to great apes and humans was characterized by the development of sophisticated noncoding A+T rich α satellite repeats and higher order repeats (Warburton and Willard, 1996). We show that the A+T rich centromere is characterized by a pronounced dominance of start/stop CLTs, mostly in extended forms, regardless of whether in S. cerevisiae, as the first eukaryote, or in Homo sapiens, as the most developed species.

3.6. Start/stop entrapment in noncoding sequences

We show that the frequency of nonextended and extended start/stop CLTs in noncoding sequences, for example α satellites and introns, is significantly higher than in the corresponding randomized sequences, indicating their possible role as regulators. Illustrative examples are higher order repeats (HORs) in the Neuroblastoma BreakPoint Family (NBPF) genes in human chromosome 1 (Paar et al., 2011b). In this case we find tandem repeats of nonextended or extended dinucleotides (Fig. 4A), such as entrapped clusters of TNTs generated by the quadruplets (R,C-transforms) of start/stop-TNTs, for example, TGA → AAG, GAA; TGA → CTT, TTC; ATG → TAT, ATA; and ATG → TGG, GGT. Such arrays seem to tend to be entrapped between two start/stop CLTs. We note that the overlapped start/stop ATGA (two-nucleotide overlap of ATG and TGA) and stop/start TGATG (one-nucleotide overlap of TGA and ATG) (Fig. 4A, first three sequences) appear in noncoding DNA, especially in α satellites (Rosandić et al., 2013). There are numerous examples of shorter but less regular repetitions in introns, sometimes with extended start/stop CLTs at the beginning and end of the repeat pattern, representing a form of A+T rich nucleotide accumulation (Fig. 4B). In some cases, nucleotide accumulation develops when one of the start/stop CLTs extends appreciably, for example TAG in the extended form TTTAAAAAAAAAAAAAAAAAAGG (Fig. 4C). On the other hand, in the corresponding randomized sequences, repeats of the types shown in Fig. 4A and C practically do not appear, and short repeats are seldom. Extended

Notes to Table 1: Notation: F, frequency (%); E/NE, quotient of nucleotides in extended to nonextended TNTs (%); TNT, trinucleotide. The first and the third sequences are A+T rich, while the second (226 kb) is slightly C+G rich (54% C+G, 46% A+T). Frequencies correspond to nonextended TNTs. TNTs are organized into direct-reverse complement pairs, in the order of appearance in our TNT scheme in Fig. 1 (A+T rich 1a, 1b, 1c; C+G rich 2a, 2b, 2c). The average frequency, for comparison, is 1.56% (100% divided among 64 different TNTs). In long sequences, the frequencies within each direct-reverse complement pair of TNTs on the same strand are strongly similar, in accordance with Chargaff's second parity rule. The E/NE quotient shows a similar regularity. The best agreement is achieved for direct-reverse complement pairs in the largest, 12 Mb, sequence, where in 11 out of 16 A+T rich TNT pairs the frequencies are practically identical, while in the others the difference within pairs is very small. Each tandem of 16-mer HOR copies (repeat unit length 2734 bp; last two columns) basically consists of sixteen 171 bp α satellites, which differ from each other by 20–30%, contributing further to specific differences in frequencies between pairs of TNTs. Even larger differences between α satellites and large nonrepetitive sequences are seen in the quotients of nucleotides constituting extended and nonextended TNTs.
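The direct-reverse complement comparison described in these notes can be illustrated with a minimal sketch. This is not the authors' procedure. It assumes that TNTs are counted with an overlapping window of width 3 and step 1, which may differ from the counting scheme used for Table 1. The function names are hypothetical. Under a uniform distribution, each of the 64 TNTs would have 100/64 ≈ 1.56%, the reference value quoted above.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# All 64 trinucleotides (TNTs) in lexicographic order.
ALL_TNTS = ["".join(p) for p in product("ACGT", repeat=3)]


def reverse_complement(seq: str) -> str:
    """Reverse complement (RC) of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tnt_frequencies(seq: str) -> dict:
    """Percentage of each TNT using an overlapping window (width 3, step 1).

    Assumption: a generic counting scheme, not necessarily the one behind
    Table 1. Windows containing symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(ALL_TNTS, 0)
    for i in range(len(seq) - 2):
        tnt = seq[i:i + 3]
        if tnt in counts:
            counts[tnt] += 1
    total = sum(counts.values())
    if total == 0:
        return dict.fromkeys(ALL_TNTS, 0.0)
    return {t: 100.0 * c / total for t, c in counts.items()}


def rc_pairs(freqs: dict) -> list:
    """List each direct-RC pair once as (TNT, RC, F(TNT), F(RC)).

    No TNT equals its own RC (its middle base would have to be its own
    complement), so the 64 TNTs form exactly 32 pairs.
    """
    pairs, seen = [], set()
    for tnt in ALL_TNTS:
        if tnt in seen:
            continue
        rc = reverse_complement(tnt)
        seen.update((tnt, rc))
        pairs.append((tnt, rc, freqs[tnt], freqs[rc]))
    return pairs


if __name__ == "__main__":
    freqs = tnt_frequencies("ATGATG")  # windows: ATG, TGA, GAT, ATG
    print(freqs["ATG"], freqs["TGA"], freqs["GAT"])  # 50.0 25.0 25.0
    print(len(rc_pairs(freqs)), round(100 / 64, 2))  # 32 1.56
```

On a long sequence, comparing the third and fourth fields of each pair gives the kind of within-pair agreement discussed above.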

Please cite this article as: Rosandić, M., et al., Fundamental role of start/stop regulators in whole DNA and new trinucleotide classification, Gene (2013), http://dx.doi.org/10.1016/j.gene.2013.09.021



M. Rosandić et al. / Gene xxx (2013) xxx–xxx

A
…ACAATGAAAGAGAAAGACAGAGAGAGAGAGACAGAGACAGAGAGAGAGAGAAAGTGACCTA…
…AGAATGAAAGAGAAAGACAGGGAGAGGGAGGGAGAGAGAGAGAGAGAGAGAGAGGAGAAAGTAAGCTCA…
…AGAATGAAAGAGAAAGACAGGGAGAGGGAGAGAGAGAGAGAGAGAGGAGAAAGTGAGCTC…
…CAGATAGACACACACACACACACACACACACACACACACACACACACAGACACACACACACACACAGAGAGAACGAGCTCAGTGAATTGT…
…TGGGTTTTGATCTTCTTCCCCTTCTTTTCTTCCCCTTCTTCTTTCCTTCTTTGATCTT…
…GTGTATGTATATATATATATATATACAGATATATATAATACTTTAAGTCTT…

B
…GAAAATGGTGGGCCTTGGTCTTCTTCCTCTTCTTGGTCCTTTTTAGTTCCTGC…
…GGGCATGGTGGGCCTTGGTCTTCTTCCTCTTCTTGGTCCTTTTTAGTTCC…
…TTTCATTTTGCTTTTTTAATTTTTTTCTTTTTTGGTTTTTTGTTTTTTGTTTGAGATTGA…
…AAACTGAGGGACAGACAAGACAAGCAACATGGATGGAGCCCA…
…ACTCTGGGGACTGTTGTGGGGTGGGGGGAGGGGGGAGGGATAGCACT…

C
…TTTAAAAAAAAAAAAAAAAAAGG…   TAG
…TTTGGAAAAAAAAAA…   TGA
…TGGGGGGA…   TGA
…TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAC…   TAC
…TTTAAAAAAAAAAAAAAAAAAAAAAAAC…   TAC

Fig. 4. Examples of start/stop CLT entrapment within introns from human chromosome 1 (Introns_146618743, 146707072, 146711367, 25289230). (A, B) Entrapment of tandem repeats by CLTs (bold). (C) Substantially extended start/stop CLTs (TAG, TGA and TAC, the complement of ATG), indicated on the right.
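The labels in Fig. 4C can be recovered by collapsing each run of identical nucleotides to a single copy, as in the following minimal sketch. The helper names are hypothetical. This approach assumes that an extended CLT has the form X+Y+Z+, where each nucleotide of the TNT may be repeated. That reading is inferred from the examples in the text and the figure. Collapsing only works for TNTs with no identical neighboring bases, such as ATG, TGA, TAG and their C, R and RC transforms. It does not work for TAA.

```python
from itertools import groupby
from typing import Optional


def collapse_runs(seq: str) -> str:
    """Replace each run of identical nucleotides by one copy, e.g. 'TTTGGAAA' -> 'TGA'."""
    return "".join(base for base, _ in groupby(seq.upper()))


def underlying_clt(segment: str) -> Optional[str]:
    """Return the TNT of which `segment` is an (extended) form, or None.

    Assumption: extended CLT = X+Y+Z+; valid only for TNTs without identical
    neighboring bases (not for TAA, whose two As would collapse into one).
    """
    core = collapse_runs(segment)
    return core if len(core) == 3 else None


if __name__ == "__main__":
    panel_c = ["TTTAAAAAAAAAAAAAAAAAAGG", "TTTGGAAAAAAAAAA", "TGGGGGGA",
               "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAC", "TTTAAAAAAAAAAAAAAAAAAAAAAAAC"]
    print([underlying_clt(s) for s in panel_c])
    # ['TAG', 'TGA', 'TGA', 'TAC', 'TAC']
    print([underlying_clt(s) for s in ["CATTTT", "AAAAGT", "CTTTTTAAAAA"]])
    # ['CAT', 'AGT', 'CTA']
```

The second call reproduces the readings given in the centromere discussion: the RC of extended ATG, the R of extended TGA and the RC of extended TAG.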

forms of start/stop CLTs could be interpreted as conferring a higher regulatory potential. It should be pointed out that if the same sequence is searched for TNTs in the standard way, in steps of three nucleotides starting from the same position, without recognizing that subsequences of different lengths can belong to the same CLT pattern, the internal logic of extended CLTs cannot be recognized. As an illustration, consider the detailed distribution of start/stop TNTs and their extended forms in two nonrepetitive segments from human chromosome 1 (NT_004610.18, positions 3855747 to 3859666) and chromosome 7 (AF_053356, positions 1–7105). Their segmentation by the TGA CLT, nonextended and extended, is shown in Supplementary Figs. S3 and S4, respectively. The TGA CLT was chosen because it consistently has the highest frequency in all repetitive and nonrepetitive sequences studied here. Each sequence is thus split into subsegments between every two neighboring TGAs, nonextended or extended. The average length of these subsegments is 33 bp in the first sequence and 39 bp in the second. Presented as histograms with 10 bp wide bins, each distribution has a maximum in the 0 to 10 bp interval and gradually decreases towards larger distances. The largest distances are 161 bp and 307 bp, respectively. The largest subsequence contains approximate intermixed TAn, TGn and Tn microsatellites, flanked by a nonextended TGA and an extended TGGA. A case with entrapped microsatellites in Table S3 is the 136 bp subsequence with approximate TACACAn and TATACAn microsatellites. As seen here, microsatellites may in some cases be flanked by start/stop CLTs, but not as a general rule; in our examples, such flanking is more common in repetitive than in nonrepetitive sequences. Such expansions are known to arise from slippage structures formed during replication.
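The segmentation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code behind Supplementary Figs. S3 and S4. It makes four assumptions:

- An extended CLT is modeled by the regular expression X+Y+Z+, so TGA in any form becomes T+G+A+.
- Subsegment length is measured as the gap between the end of one CLT and the start of the next.
- Histogram bins are 10 bp wide, starting at 0.
- A subsegment counts as "C+G rich" when C+G exceed half of its length.

The function names are hypothetical.

```python
import re
from collections import Counter


def clt_regex(tnt: str) -> re.Pattern:
    """Match a TNT in nonextended or extended form, e.g. 'TGA' -> T+G+A+.

    Assumption: 'extended' is read as each nucleotide being repeatable.
    """
    return re.compile("".join(f"{base}+" for base in tnt.upper()))


def subsegments(seq: str, tnt: str = "TGA") -> list:
    """Stretches lying strictly between neighboring CLT hits (CLTs excluded)."""
    seq = seq.upper()
    hits = list(clt_regex(tnt).finditer(seq))  # left-to-right, non-overlapping
    return [seq[a.end():b.start()] for a, b in zip(hits, hits[1:])]


def length_histogram(lengths, bin_width: int = 10) -> dict:
    """Counts per bin; bin k covers k*bin_width to (k+1)*bin_width - 1 bp."""
    return dict(sorted(Counter(n // bin_width for n in lengths).items()))


def is_cg_rich(segment: str) -> bool:
    """True if C+G make up more than half of a nonempty segment (assumed threshold)."""
    cg = segment.count("C") + segment.count("G")
    return len(segment) > 0 and cg > len(segment) / 2


if __name__ == "__main__":
    # Toy sequence: a nonextended TGA, an extended TTGGAA, then another TGA.
    seq = "CCTGACGCGTTGGAACCGTGAGG"
    segs = subsegments(seq)
    print(segs)                                    # ['CGCG', 'CCG']
    print(length_histogram(len(s) for s in segs))  # {0: 2}
    print(sum(map(is_cg_rich, segs)) / len(segs))  # 1.0
```

The same helpers accept other CLTs. For example, `clt_regex("TAA")` yields T+A+A+, which requires at least two As. The last printed value is the fraction of C+G rich subsegments, the quantity discussed in the next paragraph.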
In more than 80% of cases, in large A+T rich and especially in C+G rich sequences, we find that the subsegments between neighboring start/stop TNTs (nonextended and extended) are C+G rich. This is a consequence of the high frequency of nonextended start/stops, especially TGAs, and even more of their extended forms. In this way, a significant fraction of A+T nucleotides is exhausted via extended forms of start/stops. Thus we might hypothesize that the predominantly C+G rich subsegments in noncoding DNA, largely entrapped between start/stops and their extensions, protect the A+T rich character of noncoding sequences.

4. Conclusion

To conclude, on the basis of the new classification of trinucleotides, we indicate a possible role of start/stop TNTs as DNA regulators that enable a broad spectrum of variations in the dynamical nonlinear processes of inheritance. At the same time, they can change into their C, R and RC transforms during crossing-over in meiosis, in most cases into A+T rich codons in the next generation. This process can be reversible in leading order, but upon transition, for example, to TNTs from subgroups Ib or Ic, it becomes irreversible in leading order, with no return to stop CLTs. We show that covering the domain of A+T rich TNTs requires combining the start-ATG CLT with all three stop CLTs: TGA, TAG and TAA. We hypothesize that in noncoding DNA each start/stop CLT could exert its start or stop influence mostly on those TNTs that emerge from it in a single step. With respect to the single-step action of start/stops in the domains of A+T and C+G rich TNTs, we identify reversible, irreversible and neutral TNTs. In our opinion, the separation of the genome into pronounced A+T rich and C+G rich subsequences, with a tendency to preserve the integrity of C+G rich coding DNA, could have originated when the ATG, TGA, TAG and TAA trinucleotides were active with regulator functions. On the basis of the difference in the degree of extension of start/stop CLTs in α satellites between humans and great apes, which is more pronounced in humans, we hypothesize that during evolution the dynamic role of the extended-CLT mechanism, through T and A accumulation and binding, led to the dominance of A+T rich over C+G rich frequency. This could have led to gradual changes along the evolutionary path and could provide a new tool for phylogenetic analyses.
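The single-step coverage argument can be checked by direct enumeration. The Abstract states that all A+T rich TNTs built from three different nucleotides arise from ATG, TGA and TAG through complement, reverse and reverse complement transformations. The following minimal sketch tests that statement. It reads "single step" as applying one of the C, R or RC operations to the direct form. The helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transforms(tnt: str) -> dict:
    """Direct form plus its complement (C), reverse (R) and reverse complement (RC)."""
    comp = tnt.translate(COMPLEMENT)
    return {"direct": tnt, "C": comp, "R": tnt[::-1], "RC": comp[::-1]}


def orbit(tnt: str) -> set:
    """TNTs reachable from `tnt` in a single C, R or RC step, plus `tnt` itself."""
    return set(transforms(tnt).values())


ALL_TNTS = {"".join(p) for p in product("ACGT", repeat=3)}
# A+T rich: at least two of the three positions are A or T.
AT_RICH = {t for t in ALL_TNTS if sum(base in "AT" for base in t) >= 2}
# A+T rich TNTs made of three different nucleotides.
AT_RICH_DISTINCT = {t for t in AT_RICH if len(set(t)) == 3}

from_atg_tga_tag = orbit("ATG") | orbit("TGA") | orbit("TAG")
with_taa = from_atg_tga_tag | orbit("TAA")

if __name__ == "__main__":
    print(sorted(orbit("ATG")))                  # ['ATG', 'CAT', 'GTA', 'TAC']
    print(len(AT_RICH), len(AT_RICH_DISTINCT))   # 32 12
    print(from_atg_tga_tag == AT_RICH_DISTINCT)  # True
    print(len(with_taa))                         # 16
```

Under this reading, the ATG, TGA and TAG orbits (four TNTs each) exactly cover the 12 A+T rich TNTs with three different nucleotides. Adding the TAA orbit {TAA, ATT, AAT, TTA} brings the total to 16 of the 32 A+T rich TNTs.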
In our opinion, by introducing new systematics and the corresponding combinatorial experiment, we provide evidence for a pattern of a deterministic dynamical system. A+T rich start/stop TNTs contribute to the asymmetry between A+T and C+G rich groups of TNTs in genomes. Through the concept of TNT extensions, elements of nonlinear dynamics are introduced, contributing to the richness of biological phenomena. Our findings are consistent with the idea that start/stop trinucleotides and their extensions exert strong regulatory effects.

Conflict of interest

The authors declare that they have no competing interests.

Acknowledgements

We thank C. Tyler-Smith for stimulating our interest in the structure of α satellites, and the anonymous reviewers for valuable suggestions that improved our article.

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2013.09.021.

References

Albrecht-Buehler, G., 2006. Asymptotically increasing compliance of genomes with Chargaff's second parity rules through inversions and inverted transpositions. Proc. Natl. Acad. Sci. U. S. A. 103, 17828–17833.
Alkan, C., et al., 2011. Genome-wide characterization of centromeric satellites from multiple mammalian genomes. Genome Res. 21, 137–145.
Astolfi, P., Bellizzi, D., Sgaramella, V., 2003. Frequency and coverage of trinucleotide repeats in eukaryotes. Gene 317, 117–125.
Chargaff, E., 1979. How genetics got a chemical education. Ann. N. Y. Acad. Sci. 325, 345–360.
Collins, J.R., et al., 2003. An exhaustive DNA micro-satellite map of the human genome using high performance computing. Genomics 82, 10–19.
Craig, N.L., et al., 2010. Molecular Biology Principles and Genome Function. Oxford University Press, Oxford.

Crick, F.H.C., 1968. The origin of the genetic code. J. Mol. Biol. 38, 367–379.
Di Giulio, M., 2005. The origin of the genetic code: theories and their relationships, a review. Biosystems 80, 175–184.
Di Giulio, M., 2008. An extension of the coevolution theory of the origin of the genetic code. Biol. Direct 3, 37.
Ellegren, H., 2004. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5, 435–445.
Frenkel, Z.M., Trifonov, E.N., 2012. Origin and evolution of genes and genomes. Crucial role of triplet expansions. J. Biomol. Struct. Dyn. 30, 201–210.
Gatchel, J.R., Zoghbi, H.Y., 2005. Diseases of unstable repeat expansion: mechanisms and common principles. Nat. Rev. Genet. 6, 743–755.
Gilbert, W., 1986. The RNA world. Nature 319, 618.
Glunčić, M., Paar, V., 2012. Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm. Nucleic Acids Res. 41, e17. http://dx.doi.org/10.1093/nar/gks721.
Gyapay, G., et al., 1994. The 1993–94 Genethon human genetic-linkage map. Nat. Genet. 7, 246–339.
Haaf, T., Ward, D.C., 1994. Structural analysis of alpha satellite DNA and centromere proteins using extended chromatin and chromosomes. Hum. Mol. Genet. 3, 697–709.
Higgs, P.G., 2009. A four-column theory for the origin of the genetic code: tracing the evolutionary pathways that gave rise to an optimized code. Biol. Direct 4, 16.
Jordan, I.K., et al., 2005. A universal trend of amino acid gain and loss in protein evolution. Nature 433, 633–638.
Karlin, S., Mrazek, J., 1997. Compositional differences within and between eukaryotic genomes. Proc. Natl. Acad. Sci. U. S. A. 94, 10227–10232.
Karlin, S., Ladunga, I., Blaisdell, B.E., 1994. Heterogeneity of genomes: measures and values. Proc. Natl. Acad. Sci. U. S. A. 91, 12837–12841.
Kawahara-Kobayashi, A., et al., 2012. Simplification of the genetic code: restricted diversity of genetically encoded amino acids. Nucleic Acids Res. 21, 1–9.
Knight, R.D., Landweber, L.F., 2000. The early evolution of the genetic code. Cell 101, 569–572.
Koonin, E.V., Novozhilov, A.S., 2009. Origin and evolution of the genetic code: the universal enigma. IUBMB Life 61, 99–111.
Kozlowski, P., de Mezer, M., Krzyzosiak, W.J., 2010. Trinucleotide repeats in human genome and exome. Nucleic Acids Res. 38, 4027–4039.
La Spada, A.R., Wilson, E.M., Lubahn, D.B., Harding, A.E., Fischbeck, K.H., 1991. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352, 77–79.
Lander, E.S., Green, P., 1987. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. U. S. A. 84, 2363–2367.
Manuelidis, L., 1978. Complex and simple sequences in human repeated DNAs. Chromosoma 66, 1–21.
Novozhilov, A.S., Koonin, E.V., 2009. Exceptional error minimization in putative primordial genetic codes. Biol. Direct 4, 44.
Nussinov, R., 1984. Doublet frequencies in evolutionary distinct groups. Nucleic Acids Res. 12, 1749–1763.


Orr, H.T., Zoghbi, H.Y., 2007. Trinucleotide repeat disorders. Annu. Rev. Neurosci. 30, 575–621.
Osawa, S., 1995. Evolution of the Genetic Code. Oxford University Press, Oxford.
Paar, V., et al., 2005. ColorHOR: novel graphical algorithm for fast scan of alpha satellite higher order repeats and HOR annotation for GenBank sequence of human genome. Bioinformatics 21, 846–852.
Paar, V., et al., 2011a. Large tandem, higher order repeats and regularly dispersed repeat units contribute substantially to divergence between human and chimpanzee Y chromosomes. J. Mol. Evol. 72, 34–55.
Paar, V., Glunčić, M., Rosandić, M., Basar, I., Vlahović, I., 2011b. Intragene higher order repeats in neuroblastoma breakpoint family genes distinguish humans from chimpanzees. Mol. Biol. Evol. 28, 1877–1892.
Powdel, B.R., et al., 2009. A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff's second parity rule). DNA Res. 16, 325–343.
Richards, R.I., et al., 2001. Dynamic mutations: a decade of unstable expanded repeats in human genetic disease. Hum. Mol. Genet. 10, 2187–2194.
Rosandić, M., Paar, V., Basar, I., 2003a. Key-string segmentation algorithm and higher-order repeat 16mer (54 copies) in human alpha satellite DNA in chromosome 7. J. Theor. Biol. 221, 29–37.
Rosandić, M., Paar, V., Glunčić, M., Basar, I., Pavin, N., 2003b. Key-string algorithm: novel approach to computational analysis of repetitive sequences in human centromeric DNA. Croat. Med. J. 44, 386–406.
Rosandić, M., Glunčić, M., Paar, V., 2011. Start/stop codon-like trinucleotides (CLTs) and extended clusters as new language of DNA. Croat. Chem. Acta 84, 331–341.
Rosandić, M., Glunčić, M., Paar, V., 2013. Start/stop codon like trinucleotides extensions in primate alpha satellites. J. Theor. Biol. 317, 301–309.
Rudd, M.K., Willard, H.F., 2004. Analysis of the centromeric regions of the human genome assembly. Trends Genet. 20, 529–533.
Sueoka, N., 1999. Two aspects of DNA base composition: G+C content and translation-coupled deviation from intra-strand rule of A = T and G = C. J. Mol. Evol. 49, 49–62.
Toth, G., Gaspari, Z., Jurka, J., 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 967–981.
Tyler-Smith, C., 1985. Structure of repeated sequences in the centromeric region of the human Y chromosome. Development 101, 93–100.
Usdin, K., Grabczyk, E., 2000. DNA repeat expansions and human disease. Cell. Mol. Life Sci. 57, 914–931.
Warburton, P.E., Willard, H.F., 1996. Evolution of centromeric alpha satellite DNA: molecular organization within and between human and primate chromosomes. In: Jackson, M., et al. (Eds.), Human Genome Evolution. BIOS Scientific, Oxford, pp. 121–145.
Weissenbach, J., et al., 1992. A second-generation linkage map of the human genome. Nature 359, 794–801.
Willard, H.F., 1985. Chromosome-specific organization of human alpha satellite DNA. Am. J. Hum. Genet. 37, 524–532.
Wong, J.T.F., 1975. A co-evolution theory of the genetic code. Proc. Natl. Acad. Sci. U. S. A. 72, 1909–1912.
