Page 1

Forthcoming in: G. Terzis & R. Arp (eds.), Information and the Biological Sciences, MIT Press, Cambridge, MA, 2010.

Plasticity and Complexity in Biology: Topological Organization, Regulatory Proteins Networks and Mechanisms of Genetic Expression Toward New Vistas in the Life Sciences LUCIANO BOI École des Hautes Études en Sciences Sociales, Centre de MathÊmatiques. Mailing address: EHESS-CAMS, 54, boulevard Raspail, 75006 Paris (France) ABSTRACT: The fundamental genetic events within cells (transcription, replication, recombination and repair) seem to be profoundly linked to severe changes in the topological state of the double helix, and further to different sets of elastic deformations which take place in the chromatin and the chromosome. Furthermore, processes such as DNA supercoiling and chromatin remodeling are high complex both from the structural and the functional points of views. This complexity reflects notably the subtle and extremely rich dynamics which underlies these processes, as well as the variety of interactions and pathways present in the most important biological phenomena at very different scales, from transcription to evolution. An important goal of current research in biological sciences should thus be to develop topological methods to understand some structural and functional properties of the genome, which appear to play a fundamental role in the physiological organization of cells and the development of organisms. Our aim here is at exploring the genome at the level beyond that of DNA sequence alone. Further to investigate how the genome is topologically and dynamically organized into the nuclear space within the cell. We will mainly focus on analyses of higher order nuclear architecture and the dynamic interactions of chromatin with other nuclear components. We want to know how and why these levels of organization influences genes expression and chromosome functions, as well as the emergence of new patterns during the spatial and temporal development of multicellular organisms. The proper understanding of these processes require new concepts approaches be introduced and developed. KEY WORDS: plasticity, complexity, biological information, protein networks regulation, chromatin remodeling factors, DNA methylation, genes expression, epigenetics, geometrical modeling, topological organization, conformational changes, relational structures, self-organization, interpretation, meaning.

1. Introduction This paper is aimed first at studying some important aspects of the plasticity and complexity of biological systems and their links. We shall further investigate the relationship between the topological organisation and dynamics of chromatin and chromosome, the regulatory proteins networks and the mechanisms of genetic expression. Our final goal is to show the need for new scientific and epistemological approaches to the life sciences. In this respect, we think for instance that in the near future research in biology has to shift drastically from a genetic and molecular approach to an epigenetic and organismal approach, particularly, by studying the network of interactions among gene pathways, the formation and dynamics of chromatin


structures, and how environmental (intra- and extracellular) conditions may affect the response and evolution of cells and living systems. The comprehension of the connection between genetic expression, cells differentiation and embryo’s development is one of the more difficult scientific problems of life sciences and perhaps one of the great intellectual adventures of our times. A better characterization of these fundamental deeply related biological events could provide a key to our understanding of the growth and evolution of higher organisms, as well as of several mechanisms responsible for serious health diseases. The task is really complex and challenging, and it seems that, in this connection, a profound change of the biological thought be required and a significantly more organismal and integrative approach needs to be carried out. This change entails the working out of a process of relational unification in biology, which should be based on a multi-scale method for studying the properties and behaviors of biological systems. This method should be aimed at exploring the different, and to some extent irreducible (in the sense that, for instance, epigenetic properties of the genome cannot described or understood in terms only of genetic properties of DNA sequences) levels of organisation of living organisms. Yet, the question is whether it can provide a formal systematic basis for biological sciences as a whole. In our perspective, a system and integrative approach to biological sciences might allow to understand a meaningful ontological and epistemological reality, namely that living beings display a plurality of different ontogenetic, morphological and functional levels, which while being the outcome of genuine biological processes, they can affect changes in organisms and also influence the course of evolution. Let us now describe the state of the field and the main problems we will address in the following. In the last years it has become more and more clear that the linear sequence map of the human genome is an incomplete description of our genetic information. This is because information on genome function and gene regulation is also encoded in the way DNA sequence is folded up with proteins to form chromatin structures, which then compact through different fundamental steps into chromosomes inside the nucleus. This means that biological information and organization on and of living organisms cannot be portrayed in the DNA sequence alone. In a post-genomic era, the importance of chromatin-chromosome/epigenetic interface has become increasingly apparent, and the role of proteins in the regulation and modulation of genes expression and cellular activity appear henceforth to be very fundamental.


In fact, the eukaryotic genome is a highly complex system, which is regulated at three major hierarchical levels: (i) The DNA sequence level which support, locally and globally, different kinds of elastic deformations of the molecule such as bending, twisting, knotting and unknotting. These geometrical and topological transformations carry an amount of precious information on the emergence of certain genetic functions like transcription, replication, recombination and repair. (ii) The chromatin level, which involves three major remodeling processes needing the action of different ensembles of regulatory factors and co-factors, namely, folding and packing of complexes DNA-histones, histones modifications, and methylation of DNA molecule. Both remodeling and compaction of chromatin results from the inherently controlled flexible character of those macromolecular complexes which take part in the formation of the chromatin. (iii) The nuclear level, which includes the dynamic and three-dimensional spatial organization of the chromosome within the cell nucleus. It must be stressed at this point that chromatin remodeling and the chromosome organization constitute two novel layers of biological information, which enriches and completes that carried by the genetic DNA-sequence. There is increasing evidence that such a higher-order organization of chromatin arrangement contributes essentially to the regulation of gene expression and other nuclear functions. Furthermore, in eukaryotes, DNA topology and chromatin remodeling may have allowed the evolution of specific molecular mechanism to set the default state of DNA functions in response to external and/or internal signals in differentiated cells. Epigenetic aspects of hierarchical DNA/protein complexes have started to be elucidated in different model systems including Drosophila and yeast. The epigenetic mechanisms might thus constitute the molecular memory of the expression state of genes or gene sets that must be transmitted to progeny. The length and chromatin organization of the genetic material imposes topological constraints to DNA during fundamental nuclear processes such as DNA replication, gene expression, DNA recombination and repair, and modulation of chromatin/chromosome organization. DNA topology, which rest upon the ideas of deformability (change and adaptability of forms) and plasticity (reversible transformations) of molecular and macromolecular structures, has becoming a unifying topic for a variety of different fields that deal with DNA dealings such as replication, recombination and transcription. More than a simply packaging solution of DNA in the cell, chromatin organization has lately emerged as an active gene regulation complex structure, and recent studies have suggested that the 3

enzymes that modify chromatin generate local as well as global changes in DNA topology that drive formation of multiple, remodeled nucleosomal states. Thus, DNA topology itself is a possible way of regulating dynamically gene expression. Moreover, DNA topology likely has a crucial role in chromatin and chromosome organization. The organization and the localization of the chromosomes within the nucleus seem to play a fundamental role in the regulation of gene expression and epigenetic memory. The architecture of the nucleus itself and the relative position of specific chromosomal domains in different and specific nuclear territories might play a role in facilitating controlling gene expression and cellular functions. We think it is very worth to emphasize the enormous impact of chromatin organization and dynamics on epigenetic phenomena and cell metabolism. In a post-genomic era, the importance of epigenetics has become indeed increasingly apparent. The definition of epigenetics is constantly evolving to encompass the many processes that cannot be accounted for by the simple genetic (DNA) code, and the term now refers to extra layers of instructions and information (especially cellular, organismal and environmental) that influences gene activity without altering the DNA sequence. In this context, the chromatin/epigenetics interface is one of the foremost frontiers of recent research in biology. Theoretically, we are thus in the need of a deep and global rethinking of some fundamental biological concepts like “gene code”, “molecular mechanisms” and “genetic information”. At least they require to be supplemented by the concepts, respectively, of “chromatin code”, “multi-level regulatory mechanisms” and “epigenetic information”. In fact, contrary to the prevailing dogma in biology during the second half of the 20th century, according to which the nucleic acids and particularly DNA where the exclusive carrier of genetic information, in the recent years our understanding of epigenetic phenomena has progressed to the point where it appears that the true carrier of genetic information is the chromosome rather than just DNA. Indeed, the chromatin substrate appears to harbor metastable key features determining the recognition and the reading of genetic information. This information layer needs to be deciphered and integrated with the information layer of the genome sequence to model genetic networks properly. Thereby we will acquire a global understanding of the way the information contained in the genome is interpreted by the cell. This study addresses the correlation between geometrical structure, topological organization, complex dynamics and biological functions of the cell nucleus and its components, as well as their multi-levels regulatory systems. We will focus on the spatial organization of DNA, chromatin and chromosome, as well as their effects on the global 4

regulation of the genome functions and the cell activities. All along this paper, we shall suggest a multilevel and integrative approach to the study of some fundamental biological processes and we shall deal with a broad spectrum of questions ranging from mathematical ideas, biological implications and theoretical issues. Our paper is organized in four main sections: (1) first, we start with some remarks on the geometry and topology of the genome, its compaction into the chromosome and the biological meaning of such a process, (2) second, we address more specifically the structure and dynamics of the chromatin and the chromosome, and the role of epigenetic phenomena in cell regulation, (3) third, we briefly examine the way in which epigenetic phenomena influences cell differentiation and embryonic development, as well as different types of pathological diseases; this will lead us to consider the fundamental question of the relation between form and function pertaining to biological systems. The elucidation of these three central issues may help in understanding some key still open theoretical questions, such as the role and meaning of plasticity and complexity in the living systems, the nature and interpretation of biological information at the cell and organism levels, and some new aspects of the interaction between genotype and phenotype. Our principal goal is to demonstrate that certain geometrical and topological objects and transformations are responsible for the formation of many structures and patterns at the mesoscopic and macroscopic levels of living organisms, and that they carry a tremendous amount of precious information on the emergence of new biological forms and the developmental paths of organisms. This last remark allow for stressing two important points which deserves a particular attention: (1) The use of the notion of “information” in biology, in order to avoid semantic ambiguities and to go beyond its reductive (borrowed from cybernetic and the engineeristic theory of information) translation in biology, should be linked to the concept of “topological (dynamic) form” on the one hand, and to that of “system complexity” on the other hand. In fact, (i) first, what is transmitted in living systems during reproduction, development and evolution is not only the genetic code carried by the DNA molecule or the chemical instructions of genes, but also and at the same time the form and the meaning of these code and instructions, which are not from the outset completely contained in the DNA-code itself, and (ii) second, the informational content of biological processes can less be measured or captured in terms of discrete units (bits) of information, than rather by determining the


degrees of complexity of the structural modifications and the functional mechanisms needed to carry out these processes and thus to realize living forms. (2) The expression, regulation and modulation of a single gene or of sets of genes within the chromatin, the chromosome, and inside the cell could not be performed only through genetic information, for other more dynamic and complex physical principles and morphogenetic mechanisms are required for promoting and enhancing epigenetic events and cell activity.

2. Remarks on the geometry and topology of the genome and supercoiling: how forms modulate biological functions. To understand the workings of a complex macromolecule—whether it is a protein, polysaccharide, lipid, or nucleic acid—it is essential to know how that molecule is constructed. Information about the three-dimensional structure of DNA was needed if its biological activity was to be understood. The structure of DNA proposed by Watson and Crick in 1953 [158] included the following components (we stress here more the structuralgeometrical aspects than the chemical ones): 1. The molecule is composed of two chains of nucleotides. 2. The two chains spiral around each other to form a pair of right-handed helices. In a right-handed helix, an observer looking down the central axis of the molecule would see that each strand follows a clockwise path, moving away from the observer. 3. The two chains composing one double helix run in opposite directions; that is, they are antiparallel. Thus, if one chain is aligned in the 5′ → 3′ direction, its partner must be aligned in the 3′ → 5′ direction. 4. The -sugar-phosphate-sugar-phosphate-backbone of each strand is located on the outside of the molecule with the two sets of bases projecting toward the center. The phosphate group gives the molecule a negative charge. 5. The bases occupy planes that are approximately perpendicular to the long axis of the molecule and are, therefore, stacked one on top of another like a pile of plates. Hydrophobic interactions and van der Waal forces between the stacked, planar bases provide stability from the entire DNA molecule. Together, the helical turns and planar base pairs cause the molecule to resemble a spiral staircase. 6. The two strands are held together by hydrogen bonds between each base of one strand and an associated base on the other strand. Because individual hydrogen bonds are weak and easily broken, the DNA strands can become separated during various activities. But the strengths of hydrogen bonds are additive, and the large numbers of hydrogen bonds holding the strands together make the double helix a stable structure. 7. The spaces between adjacent turns of the 6

helix form two grooves of different width—a wider major groove and a more narrower minor groove—that spiral around the outer surface of the double helix. Proteins that bind to DNA often contain domains that fit into these grooves. In some cases, a protein bound in a groove is able to read the sequence of nucleotides along the DNA without having to separate the strands. 8. The double helix makes one complete turn every 10 residues (3.4 nm), or 150 turns per million daltons of molecular mass. 9. Because an A (adenine) on one strand is always bonded to a T (thymine) on the other strand, and a G (guanine) is always bonded to a C (cytosine), the nucleotide sequences of the two strands are always fixed relative to one another. This relationship between the two chains of the double helix is referred to as complementarity. For example, A is complementary to T, AGC is complementary to TCG, and one entire chain is complementary to the other. In fact, complementarity, which is a structural property of the molecule resulting from the geometrical characteristics of the double helix, i.e., the spiraled conformation and the antiparellism (asymmetry) of the two strands, is of paramount importance in nearly all the activities and mechanisms in which nucleic acids are involved. From the time biologists first considered DNA as the genetic material, there were three primary functions it was expected to fulfill. We will see in sections 4, 5 and 6, however, that all three functions need to be deeply reviewed in a more integrative and systemic framework. Furthermore, some of main assumptions they entail must be abandoned. We have putted in italics below the assumptions that need to be reconsidered or dropped. According to the Watson-Crick model, the three primary functions are the following: 1. As genetic material, DNA must contain a stored record of instructions that determine all the heritable characteristics that an organism exhibits. In molecular terms, DNA must contain the information for the specific order of amino acids in all the proteins that are synthesized by an organism. 2. DNA must contain the information for its own replication (duplication). DNA replication allows genetic instructions to be transmitted from one cell to its daughter cells and from one individual to its offspring. 3. DNA is more than a storage center; it is also a director of cellular activity. Consequently, the information encoded in DNA has to be expressed in some form that can take part in events that are going on within the cell. More specifically, the information in DNA must be used to direct the order by which specific amino acids are incorporated into a polypeptide chain. In the 1960s it has been discovered that two closed, circular DNA molecules of identical molecular mass could exhibit very different rates of sedimentation during centrifugation. 7

Further analysis indicated that the DNA molecule sedimenting more rapidly had a more compact shape because the molecule was twisted upon itself, much like a rubber band in which the two ends are twisted in opposite directions or a tangled telephone cord after extended use. DNA in this state is said to be supercoiled. Because supercoiled DNA is more compact than its relaxed counterpart, it occupies a smaller volume and moves more rapidly in response to centrifugal force or an electric field. Supercoiling is best understood if one pictures a length of double-helix DNA lying free in a flat surface. A molecule in this condition has the standard number of 10 base pairs per turn of the helix and it is said to be relaxed. The DNA would still be relaxed if both ends of the two strands were simply fused to form a circle. Consider, however, what would happen if the molecule were to be twisted before the ends were sealed. If the length of DNA were twisted in a direction opposite to that in which the duplex was wound, the molecule would tend to unwind. An underwound DNA molecule has a greater number of base pairs per turn of the helix. Because the molecule is most stable with 10 residues per turn, it tends to resist the strain of becoming underwound by becoming twisted upon itself into a supercoiled conformation. DNA is referred to as negatively supercoiled when it is underwound and positively supercoiled when it is overwound. Circular DNAs found in nature (e.g., mitochondrial, viral, bacterial) are invariably negatively supercoiled. Supercoiling is not restricted to small, circular DNAs but also occurs in linear, eukaryotic DNA. For example, negative supercoiling plays a key role in allowing the DNA of the chromosomes to be compacted so as to fit inside the confines of a microscopic cell nucleus. Because negatively supercoiled DNA is underwound, it exerts a force that helps separate the two strands of the helix, which is required during both replication (DNA synthesis) and transcription (RNA synthesis).

2.1. The role of topoisomerases: mathematical properties and functional abilities Certain enzymes in both prokaryotic and eukaryotic cells are able to change the supercoiled state of DNA duplex. These enzymes are called topoisomerases because they change the topology of the DNA. Cells contain a variety of topoisomerases that can be divided into two main classes, to which, however, could be added two other: type-3 and type-4 topoisomerases. Type I topoisomerases change the supercoiled state of a DNA molecule by creating a transient break in one strand of the duplex. The enzyme cleaves one strand of the DNA and then allows the intact, complementary strand to undergo a controlled rotation, which relaxes the


supercoiled molecule. Topoisomerases I is essential for processes such as DNA replication and transcription to prevent excessive supercoiling from developing as the complementary strands of a DNA duplex separate and unwind. Type II topoisomerases make a transient break in both strands of a DNA duplex. Another segment of the DNA molecule (or a separate molecule entirely) is then transported through the break, and the severed strands are resealed. As expected, the complex reaction mechanism is accompanied by a series of dramatic conformational changes. These enzymes are capable of some remarkable “tricks�. In addition to being able to supercoil and relax DNA, type II topoisomerases can tie a DNA molecule into knots or untie a DNA knot. They can also cause a population of independent DNA circles to become interlinked (catenated), or separate interlinked circles into individual components. Topoisomerase II is required to unlink DNA molecules before duplicated chromosomes can be separated during mitosis. The participation of topoisomerases in nearly all cellular processes involving DNA constitute an important aspect of the biological functioning of living organisms at the genetic as well as the epigenetic levels. Because the enzymes affect the topology and organization of intracellular DNA, the primary effects of inactivating a topoisomerase are also likely to generate far-reaching ripples. The regulation of the cellular levels of the enzymes themselves and the association of the enzymes with other cellular proteins are closely tied to the cellular functions of the enzymes. One major cellular function of the topoisomerases is to prevent excessive supercoiling of intracellular DNA. However, supercoiling is sometimes utilized in vivo to drive a particular region of intracellular DNA into a conformation suitable for a particular process. Initiation of DNA replication, for example, often requires that the DNA be in a negatively supercoiled state. Indeed, replication is the best-known process that generates supercoils in intracellular DNA. The involvement of various topoisomerases in the removal of positive supercoils generated by replication is generally in accordance with their known in vitro specifities. Namely, eukaryotic DNA topoisomerases I and II, and bacterial DNA topoisomerase IV, can efficiently remove supercoils of either sense; bacterial DNA topoisomerases I and III, and eukaryotic DNA topoisomerases III, can remove negative supercoils, but not positive supercoils, unless a single-stranded region is present in the DNA. Bacterial gyrase is unique in its ability to convert positive to negative supercoils; depending on how fast the positive supercoils are generated and how fast they are converted to negative supercoils, gyrase can either prevent accumulation of positive supercoils in an intracellular DNA segment or keep the segment in a negatively supercoiled state. 9

The DNA topoisomerase presumably co-evolved with the formation of very long and/or ring-shaped DNA molecule. To solve a variety of problems that are rooted in the double-helix structure of DNA, nature has created not one but three distinct enzymes. In eukaryotes, members of all three subfamilies of DNA topoisomerases have been found in the same cells; in bacteria, four members from two subfamilies participate in nearly all-cellular transactions of DNA. The past decade saw much progress in the study of the DNA topoisomerases, but many questions remain. The key to answering to them may lie in the elucidation of interactions between the DNA topoisomerases and other cellular proteins. Complexes between these enzymes and transcription factors and chromosomal proteins illustrate new avenues yet to be fully explored. Furthermore, whereas the information available on topoisomerases-DNA interactions is substantial, that on interactions in the context of chromatin is still scarce; whether eukaryotic DNA topoisomerase II has a structural role in the organization of interphase and/or metaphase chromosomes, for example, is yet to be settled.

3. The topological compaction of the double-helix and the link with its biological functions. Conformational flexibility, supercoiling and compaction By the 1980s, it began to become clear that—although the informational content of the genetic code was embodied in a linear array of bases—it was the three-dimensional structure and the topological condensation in the chromatin-like assembly of the DNA double helix in the chromosomes that ultimately would govern its physiological functions in the cells. This is an important point. As an illustration of this point, in perhaps the most striking biological example of “forms dictates function”, the two complementary parental strands of DNA must separate during semi-conservative replication in order to act as the templates for each of the two newly synthesized daughter strands. This discovery leads to the realization that the structure of DNA, while elegant, burdened the cell with previously unimagined topological problems. Although these topological problems were originally recognized only for circular molecules, because of the long length of chromosomal DNA, we now know that they apply to linear genomes as well. The key for finding the solution of these problems seems to lies in the following issues: (1) in the conformational, organizational and biological roles of the topoisomerases that, because of their extreme structural and functional complexity, still remains in part to be elucidated. (2) In the DNA supercoiling process, because it links the biological activity of DNA to its tertiary


structure and not just its sequence. All essential cellular processes seem to be related to the way in which supercoiling is realized. (3) In the three-dimensional organization of the chromatin, which is a nucleoprotein complex and the stuff chromosomes are made of. This organization not only compacts the DNA but also plays a fundamental role in regulating interactions with the DNA during its metabolism. One of the most striking phenomena which reveal the profound interdependence between topological problems and biological processes is those of the compaction of chromatin into the chromosome of cells, whose explanation is one of most challenging task of biology today. Here we are faced with a genuine problem of differential topology. What kind of deformations the double-strands linear DNA molecule undergoes in order that it condenses into an extremely compact form, corresponding to the metaphase of the chromosome? The answer to that question actually is far of being clear or complete. However, some facts have been elucidated very recently, which we shall now describe in a schematic manner. The key distinguishing characteristic of the eukaryotic genome is its tight packaging into chromatin, a hierarchically organized complex of DNA and histone and nonhistone proteins. How genome operates in the chromatin context is a central question in the molecular genetics of eukaryotes. The chromatin packaging presents different levels of organization. Every level of chromatin organization, from nucleosome to higher-order structure up to its intranuclear localization, can contribute to the regulation of gene expression, as well as affect other functions of the genome, such as replication and repair. Concerning gene expression, chromatin is important not only because of the accessibility problem it poses for the transcription apparatus, but also due to the phenomenon of chromatin memory, that is, the apparent ability of alternative chromatin states to be maintained through many cell divisions. This phenomenon is believed to be involved in the mechanism of epigenetic inheritance, an important concept of developmental biology. Supercoiling is one of the three fundamental aspects of DNA compaction; the other two are flexibility and intrinsic DNA curvature. For example, the problem of DNA compaction in E. coli can be putted in the following words: the DNA must be compacted more than a thousand fold in the cell, yet it still needs to be available to be transcribed. Recall that the length of a typical bacterial operon—usually about three genes—, is about as long as the entire bacterial cell, if it is stretched out in its B-DNA double-helical conformation! In order this compaction to be achieved, it is requested some kind of anisotropic flexibility or “bendability” of DNA, which is very much sequence-specific, and is different from the static “rigidity” of DNA. 11

Whereas persistence length of DNA is relatively non-specific, and just has to do with its overall “rigidity”1, anisotropic flexibility is a measure of a particular sequence to be deformed by a protein (or some other external forces). Some sequences are both isotropically flexible and “bendable” – for example, the TATA motifs. Perhaps one of the best examples of this is the binding site for the Integration Host Factor (IHF)—there are certain base pairs that are highly distorted upon binding of this protein. It is quite impressive that this protein induces a bend of 180 degrees into a DNA helix. In other words, the curvature k at each sequence of the two strands of DNA helix must be very sharp in order the DNA double helix may assume its extremely compact form. Indeed, when one consider that the DNA must be compacted more than a thousand fold in the cell, it is probably not surprising that almost any protein that binds to DNA will bend it. Moreover, since the total curvature K of entire DNA double-helix segment depends on the torsional stress which applies to DNA strands, and these strands form hence a twisted curve, i.e., a curve with torsion in the three-dimensional space of the cell nucleus, DNA must coil many times in a very ordered way inside the nucleus (otherwise, if the chromosome of a human cell were in the form of a random coil, they would not fit inside the nucleus). The DNA double helix coils first by overwinding or underwinding of the duplex. The supercoiled form of a circular DNA molecule is much more compact than the other possible conformations, i.e., nicked and linear. In its supercoiled form, DNA molecule minimizes to the highest the space volume which it occupies in the nucleus. Supercoils condense DNA and promote the disentanglement of topological domains. Today we know that DNA is topologically polymorphic. The overwound or underwound double-helix can assume exotic forms known as plectonemes, like the braided structures of a tangled telephone cord, or solenoids, similar to the winding of a magnetic coil. (i) Plectonemically supercoiled DNA is unrestrained and frequently branched, while toroidal supercoils is restrained by proteins and it is more compact. (ii) DNA can be either positive or negatively supercoiled. In particular, eukaryotic DNA is negatively supercoiled in and around genes, and it is transiently negatively supercoiled behind RNA polymerase during transcription. (iii) Negative supercoiling favors DNA-histone association and the formation of nucleosomes, the first step in packaging DNA. Because the solenoidal DNA wrapping around a nucleosome core creates about two negative supercoils, it is understandable that DNA that 1

On average, DNA has a persistence length of about 44nm, which is quite a bit longer than proteins—one way to thinking about this is that proteins tend to fold up into little spheres, or “blobs”, and DNA is a bit more rigid.


fulfills this topological prerequisite will more easily form nucleosome. (iv) These tertiary structures have an important effect on the molecule's secondary structure and eventually its functions. For example, supercoiling induced destabilization of certain DNA sequences and allows the extrusion of cruciform or even the transcriptional activation of eukaryotic promoters. Another essential process, DNA transcription, can both generate and be regulated by supercoiling. During replication, the chromosomes need to be partitioned and the two strands of DNA must be continuously unlinked during replication. The topoisomerases that accomplish this might instead be expected to entangle and knot chromosomes because of the huge DNA concentration in vivo. There are actually several factors that solve this problem and contribute to the orderly unlinking of DNA. A major contributor to chromosome partitioning is the condensation of daughter DNA upon itself soon after replication. DNA condensation is due primarily to supercoiling. Another factor promoting chromosome partitioning is that the type2 topoisomerases of all organisms do not just speed up the approach to topological equilibrium, but actually change the equilibrium position. They actively remove all DNA entanglements. This requires that topoisomerases sense the global conformation of DNA even though they interact with DNA only locally. In fact, topoisomerases achieve this because they are like Maxwell's demon and by positioning themselves at sharp bends in DNA carry out net disentanglement of DNA. An equal partner to the topoisomerases in chromosome segregation is the helicases. They seem to convert the energy of ATP hydrolysis into unwinding DNA. All the enzymes that play critical roles in DNA unlinking and chromosome segregation, topoisomerases, helicases, and condensins, are motor proteins. They use the energy of ATP hydrolysis to move large pieces of DNA over long distances. The foregoing can be summed up by saying that supercoiling has three essential roles. 1. First, (–) supercoiling promotes the unwinding of DNA and thereby the myriad processes that depend on helix opening. Whenever DNA is doing anything interesting, it is single-stranded, and (–) supercoils provide a vital sequence-independent assistance to denaturation. 2. The second essential role of supercoiling is in DNA replication. For replication to be completed, the linking number of the DNA, Lk, must be reduced from its vast (+) value to exactly zero. In bacteria, DNA gyrase introduces (–) supercoils and thereby removes parental linking number Lk. 3. The third essential role of supercoiling is conformational. DNA manifests the difference between the relaxed and naturally occurring values of Lk by winding up into


supercoils. These supercoils condense DNA and promote the disentanglement of topological domains. This can be accomplished equally well by (–) or (+) supercoiling. Recall the general definition that the linking number Lk is the number of times one strand of DNA molecule goes around the other. The shape of a DNA ring is strongly affected by the linking number. This is a topological quantity, hence, it cannot be altered while the strands are intact regardless of how the ring is pulled or twisted. Besides, this topological invariant carries a precious amount of genuine biological information on the adaptability of the forms of macromolecules and macromolecules complexes to the variation of their functional tasks during development and evolution. Let us still underline two important facts. First, the promotion of decatenation by supercoiling has also been directly demonstrated in vivo. Second, the volume occupied by a supercoiled molecule is much more smaller than that of a relaxed DNA. This difference in volume is due mostly to the formation of superhelical branches. Indeed, supercoiled DNA branches and bends itself into a ball. The decrease in chromosomal volume by supercoiling decrease the probability that the septum will pass through the chromosome during cell division. It seems clear that supercoiling play a fundamental role in the condensation of the double helix and that this condensation is responsible for DNA unlinking and chromosome partitioning. Supercoiling results from topological strain and the contortion of DNA by proteins, notably the nucleosomal histone octet and the structural maintenance of chromosomes (SMC) proteins. There are three ways, actually experimentally observed in vivo, in which condensation of chromosome by supercoiling occurs, and to each of them corresponds a topological model for explaining the compaction of chromosomes in the cell's nucleus. 1) (–) Supercoiling by gyrase compacts the chromosomes such that random passages by topoisomerase IV disentangle them. In particular, topoisomerase IV is responsible for decatenation of DNA. (For a comprehensive and lucid account of this subject, see Roca [134]). 2) The second type of condensation via supercoiling, is that by core histones. DNA is compacted in independent successive stages such that the total compaction is the product of compaction in each stage. The first stage of this compaction is via solenoidal wrapping of DNA in the nucleosome. Although the compaction achieved is modest, the nucleosome provides a fundamental structure for genome organization and function. The structure of a nucleosome reveals a scaffolding that forces the DNA to adopt ordered solenoidal supercoils.


3) The third type of compaction cum supercoiling, that by condensin, is needed for the formation of mitotic chromosomes from the open interphase forms. Recently, it has been experimentally demonstrated that condensin was required for both the assembly and maintenance of these chromosomes [44]. It is very worth of note that a simple local interpretation of these results isn’t a satisfactory explanation, because a local overwinding of DNA would have no effect on condensation; nor could a tight wrapping around condensin greatly compact DNA, because there is no more than one condensin molecule per 10kb of DNA. Fortunately, there is a third possible explanation for the (+) supercoiling that is compatible with its physiological role. Condensin is so large, reaching out perhaps 1,000 Å, that it could torque the DNA between its reach. Thus, condensin could introduce (+) supercoiling by effecting global writhe. Strong evidence for this was provided by the finding that incubation of condensin and a type-2 topoisomerase with plasmid DNA forms chiral DNA knots. These knots were almost exclusively (+), as expected if condensin introduces a regular (+) writhe. (For more details on these mathematical concepts, see next section, and also the reference [161]).

4. The complex and multi-levels “interpretation” of the genome by cells: from DNA code to RNA transcription and to proteins regulation One might have predicted that the information present in genomes would be arranged in an orderly fashion, resembling a dictionary or a telephone directory. Although the genome of some bacteria seems fairly well organized, the genomes of most multicellular organisms, such as Drosophila for example, are surprisingly disorderly. Small pieces of coding DNA (that is, DNA that codes for protein) are interspersed with large blocks of seemingly meaningless DNA. Some sections of the genome contain many genes and other lack genes altogether. Proteins that work closely with one another in the cell often have their genes located on different chromosomes, and adjacent genes typically encode proteins that have little to do with each other in the cell. Decoding genomes is therefore no simple matter. In particular, it is difficult to locate definitively the beginning and end of genes in the DNA sequences of complex genomes. The DNA in genomes does not direct protein synthesis itself, but instead uses RNA as an intermediary molecule. When the cell needs a particular protein, the nucleotide sequence of the appropriate portion of the immensely long DNA molecule in a chromosome is first copied


into RNA—a process called transcription. It is these RNA copies of segments of the DNA that are used directly as templates to direct the synthesis of the protein—a process called translation. The flow of genetic information in cells is therefore from DNA to RNA to protein. All cells, from bacteria to humans, express their genetic information in this way, which is thus considered as a fundamental principle of molecular biology. However, there are important variations in the way information flows from DNA to protein. Principal among them is that RNA transcripts in eukaryotic cells are subject to a series of processing steps in the nucleus, including RNA splicing, before they are permitted to exit from the nucleus and be translated into protein. These processing steps can critically change the “meaning” of an RNA molecule and are therefore crucial for understanding how eukaryotic cells read the genome. It has to be said that for some genes RNA is the final product. Like proteins, many of these RNAs fold into precise three-dimensional structures that have structural and catalytic roles in the cell [35]. Transcription and translation are the means by which cells read out, or express, the genetic instructions in their genes. Because many identical RNA copies can be made from the same gene, and each RNA molecule can direct the synthesis of many identical protein molecules, cells can synthesize a large amount of protein rapidly when necessary. But each gene can also be transcribed and translated with a different efficiency, allowing the cell to make vast quantities of some proteins and tiny quantities of others. Moreover, a cell can change the expression of each of its genes according to the need of the moment—most obviously by controlling the production of its RNA. The first step a cell takes in reading out a needed part of its genetic instructions is to copy a particular portion of its DNA nucleotide sequence—a gene—into an RNA nucleotide sequence. The information in RNA, although copied into another chemical form, is still written in essentially the same language as it is in DNA—the language of a nucleotide sequence. Hence the name transcription. Despite their small chemical differences, DNA and RNA differ quite dramatically in overall structure. Whereas DNA always occurs in cells as a double-stranded helix, RNA is single-stranded. RNA chains therefore fold up into a variety of shapes, just as a polypeptide chain folds up to form the final shape of a protein. The ability to fold into complex three-dimensional shapes allows some RNA molecules to have structural and catalytic functions. All of the RNA in a cell is made by DNA transcription, a process that has certain similarities to the process of DNA replication. Transcription begins with the opening and 16

unwinding of a small portion of the DNA double helix to expose the bases on each DNA strand. One of the two strands of the DNA double helix then acts as a template for the synthesis of an RNA molecule. As in DNA replication, the nucleotide sequence of the RNA chain is determined by the complementary base-pairing between incoming nucleotides and the DNA template. When a good match is made, the incoming ribonucleotide is covalently linked to the growing RNA chain in an enzymatically-catalyzed reaction. The RNA chain produced by transcription—the transcript— is therefore elongated one nucleotide at a time, and it has a nucleotide sequence that is exactly complementary to the strand of DNA used as the template. Transcription, however, differs from DNA replication in several crucial ways. Unlike a newly formed DNA strand, the RNA strand does not remain hydrogen-bonded to the DNA template strand. Instead, just behind the region where the ribonucleotides are being added, the RNA chain is displaced and the DNA helix re-forms. Thus, the RNA molecules produced by transcription are released from the DNA template as single strands. In addition, because they are copied from only a limited region of the DNA, RNA molecules are much shorter than DNA molecules. A DNA molecule in a human chromosome can be up to 250 million nucleotide-pairs long; in contrast, most RNAs are no more than a few thousand nucleotides long, and many are considerably shorter. The enzymes that perform transcription are called RNA polymerases. Like the DNA polymerase that catalyzes DNA replication, RNA polymerases catalyze the formation of the phosphodiester bonds that link the nucleotides together to form a linear chain. The RNA polymerase moves stepwise along the DNA, unwinding the DNA helix just ahead of the active site for polymerization to expose a new region of the template strand for complementary base-pairing. In this way, the growing RNA chain is extended by one nucleotide at a time in the 5′-to-3′ direction. The substrates are nucleosides triphosphates (ATP, CTP, UTP, and GTP); as for DNA replication, a hydrolysis of high-energy bonds provides the energy needed to drive the reaction forward. The almost immediate release of the RNA strand from the DNA as it is synthesized means that many RNA copies can be made from the same gene in a relatively short time, the synthesis of additional RNA molecules being started before the first RNA is completed. When RNA polymerase molecules follow hard on each other’s heels in this way, each moving at about 20 nucleotides per second (the speed in eukaryotes), over a thousand transcripts can be synthesized in an hour from a single gene. Although RNA polymerase catalyze essentially like the same chemical reaction as DNA polymerase, there are some important differences between the two enzymes. First, and most 17

obvious, RNA polymerase catalyzes the linkage of ribonucleotides, not deoxyribonucleotides. Second, unlike the DNA polymerases involved in DNA replication, RNA polymerases can start an RNA chain without a primer. This difference may exist because transcription need not be as accurate as DNA replication. Unlike DNA, RNA does not permanently store genetic information in cells. RNA polymerases make about one mistake for every 104 nucleotides copied into RNA (compared with an error rate for direct copying by DNA polymerase of about one in 107 nucleotides), and the consequences of an error in RNA transcription are much less significant than that in DNA replication. The majority of genes carried in a cell’s DNA specify the amino acid sequence of proteins; the RNA molecules that are copied from these genes (which ultimately direct the synthesis of proteins) are called messenger RNA (mRNA) molecules [81]. The final product of a minority of genes, however, is the RNA itself. Careful analysis of the complete DNA sequence of the genome of the yeast S. cerevisiae has uncovered well over 750 genes (somewhat more than 10% of the total number of yeast genes) that produce RNA as their final product, although this number includes multiple copies of some highly repeated genes. These RNAs, like proteins, serve as enzymatic and structural components for a wide variety of processes in the cell. Each transcribed segment of DNA is called a transcription unit. In eukaryotes, a transcription unit typically carries the information of just one gene, and therefore codes for either a single RNA molecule or a single protein (or group of related proteins if the initial RNA transcript is spliced in more than one way to produce different mRNAs). In bacteria, a set of adjacent genes is often transcribed as a unit; the resulting mRNA molecule therefore carries the information for several distinct proteins. Overall, RNA makes up a few percent of a cell’s dry weight. Most of the RNA in cells is rRNA (ribosomal RNA); mRNA comprises only 3–5% of the total RNA in a typical mammalian cell. The mRNA population is made up of tens of thousands of different species, and there are on average only 10–15 molecules of each species of mRNA present in each cell. To transcribe a gene accurately, RNA polymerase must recognize where on the genome to start and where to finish. The initiation of transcription is an especially important step in gene expression because it is the main point at which the cell regulates which proteins are to be produced and at what rate. During transcription initiation a series of conformational changes take place as a successive tightening of the enzyme around the DNA and RNA to ensure that it does not dissociate before it has finished transcribing a gene. If an RNA polymerase does dissociate prematurely, it cannot resume synthesis but must start over again at the promoter. 18

Eukaryotic nuclei have three type of RNA, called RNA polymerase I, RNA polymerase II, and RNA polymerase III. The three polymerases are structurally similar to one another. They share some common subunits and many structural features, but they transcribe different types of genes. RNA polymerases I and III transcribe the genes encoding transfer RNA, ribosomal RNA, and various small RNAs. RNA polymerase II transcribes the vast majority of genes, including all those that encode proteins. Although eukaryotic RNA polymerase II has many structural similarities to bacterial RNA polymerase, there are several important differences in the way in which the bacterial and eukaryotic enzymes function, two of which are worth of noticing here: 1. While bacterial RNA polymerase is able to initiate transcription on a DNA template in vitro without the help of additional proteins, eukaryotic RNA polymerase cannot. They require the help of a large set of proteins called general transcription factors, which must assemble at the promoter with the polymerase before the polymerase can begin transcription. 2. Eukaryotic transcription initiation must deal with the packing of DNA into nucleosomes and higher order forms of chromatin structure, features absent from bacterial chromosomes (see sections for further details on this topic). These general transcription factors help to position the RNA polymerase correctly at the promoter, aid in pulling apart the two strands of DNA to allow transcription to begin, and release RNA polymerase from the promoter into the elongation mode once transcription has begun. The protein are “general” because they assemble on all promoter used by RNA polymerase II; consisting of a set of interacting proteins, they are designated as TFII (for transcription factors for polymerase II), and listed as TFIIA, TFIIB, and so on. In a broad sense, the eukaryotic general transcription factors carry out functions equivalent to those of the factor σ in bacteria. The assembly process starts with the binding of the general transcription factor TFIID to a short double-helical DNA sequence primarily composed of T and A nucleotides. For this reason, this sequence is known as the TATA sequence, or TATA box, and the subunit of TFIID that recognizes it is called TBP (for TATA-binding protein). The TATA box is typically located 25 nucleotides upstream from the transcription start site. It is not the only DNA sequence that signals the start of transcription, but for most polymerase II promoters, it is the most important. The binding of TFIID causes a large distortion in the DNA of the TATA box. This distortion is thought to serve as a physical landmark for the location of an active promoter in the midst of a very large genome, and it brings DNA sequences on both sides of the distortion together to allow for subsequent protein assembly


steps. Other factors are then assembled, along with RNA polymerase II, to form a complete transcription initiation complex. After RNA polymerase II has been guided onto the promoter DNA to form a transcription initiation complex, it must gain access to the template strand at the transcription start point. This step is aided by one of the general transcription factors. TFIIH, which contains a DNA helicases. Next, like the bacterial polymerase, polymerase II remains at the promoter, synthesizing short lengths of RNA until it undergoes a conformational change and id released to begin transcribing a gene. A key step in this release is the addition of phosphate groups to the “tail� of the RNA polymerase (known as the CTD or C-terminal domain). This phosphorylation is also catalyzed by TFIIH, which, in addition to a helicases, contains a protein kinase as one of its subunits. The polymerase can then disengage from the cluster of general transcription factors, undergoing a series of conformational changes that tighten its interactions with DNA and acquiring new proteins that allow it to transcribe for long distances without dissociating. Once the polymerase II has begun elongating the RNA transcript, most of the general transcription factors are released from the DNA so that they are available to initiate another round of transcription with a new RNA polymerase molecule. The phosphorylation of the tail of RNA polymerase II also causes components of the RNA processing machinery to load onto the polymerase and thus be in position to modify the newly transcribed RNA as it emerges from the polymerase. The model for transcription initiation just described was established by studying the action of RNA polymerase II and its general transcription factors on purified DNA templates in vitro. However, as we will see thoroughly in sections 5 to 8, DNA in eukaryotic cells is packaged into nucleosomes, which are further arranged in higher-order chromatin structures. As a result, transcription initiation in a eukaryotic cell is more complex and requires more proteins than it does on purified DNA. First, gene regulatory proteins known as transcriptional activators bind to specific sequences in DNA and help to attract RNA polymerase II to the start point of transcription. This attraction is needed to help the RNA polymerase and the general transcription factors in overcoming the difficulty of binding to DNA that is packaged in chromatin. The role of activators represents one of the many ways in which cells regulate expression of their genes, and it will be discussed in next sections. Here we simply note that their presence on DNA is required for transcription initiation in a eukaryotic cell. Second, eukaryotic transcription initiation in vivo requires the presence of a protein complex known as the mediator, which allows the activator proteins to communicate 20

properly with the polymerase II and with the general transcription factors. Finally, transcription initiation in the cell often requires the local recruitment of chromatin-modifying enzymes, including chromatin remodeling complexes and Histone acetylases. Both types of enzymes can allow greater accessibility to the DNA present in chromatin, and by doing so, they facilitate the assembly of the transcription initiation machinery onto DNA. Many proteins (well over hundred individual subunits) must assemble at the start point of transcription to initiate transcription in a eukaryotic cell. The order of assembly of these proteins is probably different for different genes and therefore may not follow a prescribed pathway. In fact, some of these different protein assemblies may interact with each other away from the DNA and be brought to DNA as performed subcomplexes. For example, the mediator, RNA polymerase II, and some of the general transcription factors can bind to each other in the nucleoplasm and be brought to the DNA as a unit. This means that there are many ways in which eukaryotic cells can regulate the process of transcription initiation.

5. From genomic to epigenomic approach to living systems Much of what we said in the section 1–3, strongly suggest that the secrets of life and what allows the biological growth of all organisms maybe lies, at least partly, in topology, namely in the fact that forms posses the capacity to transform dynamically structures and functions one into another, which require different series of conformational changes be performed with accuracy in the right space and time stages of the cell cycle and organism’s growth. In fact, the topological compaction of our genome, the basic building block of which is the nucleosome (a protein-DNA structure), provides a whole repertoire of information in addition to that furnished by the genetic code. This mitotically stable information is not inherited genetically and is termed epigenetic. Epigenetic phenomena are propagated alternative states of gene expressions, and alternative states of protein folding, and they are closely linked with histone- and chromatin modifications. One of the challenges in chromatin research is to understand how levels of chromosome organization beyond the 30-nm chromatin filaments condense to form the cell metaphase chromosome. We need very likely a topological model that accounts for the several ordered transformations that are required for the dimensions of metaphase chromosomes, which are 10,000-fold shorter and 400 to 500-fold thicker than the double stranded DNA helices contained within them. Loops-like arrangement of chromatin and their stacking into a cylinder 21

of 800 to 1000 nm in thickness, which is in good agreement with the diameter of the metaphase chromosome, and twisting the cylinder into a superhelix would further compact it, is a model that account well for the corkscrew appearance of metaphase chromosome. Remarkably, cells achieve this tight packing of DNA while still maintaining the chromosome in a form that allows regulatory proteins to gain access to the DNA to turn on (or off) specific genes or to duplicate the chromosomal DNA. This means that all epigenetic states and processes have to be established, inherited, controlled and modified in such a way as to permit that their integrity is maintained while preserving the possibility of flexibility (or deformability). Thus, the topological plasticity of the many-levels structure of chromosomes, the chromatin dynamics and the gene’s regulatory modifications are intimately interconnected processes and determinant factors of cellular organization. Condensation of genetic material appears to be a very fundamental mechanism of life. Now, since condensation realize as a kind of topological embedding of one space, the restrained linear DNA, into another space, the 3-dimesional chromosome structure within the nucleus, it seems reasonable to think that topological embeddings and immersions are dynamic processes that are essential for the maintenance and the integrity of life. One demonstration of that is the fact that the exotic supercoiled forms that double helix can assume are tertiary structures, which may have an important effect on the molecule’s secondary structure and its function. DNA and chromosome organization must fulfill precise topological prerequisite in order to achieve certain functional processes. In particular, DNA transcription and replication can both be enhanced and regulated by topological supercoiling. It now appear clear, for example, that for replication to be completed, the linking number of the DNA, Lk, must be reduced from its vast (+) value to exactly zero. In bacteria, DNA gyrase introduces (-) supercoils and thereby removes parental Lk. Moreover, in certain cases, the severity of the phenotype can be controlled by changing the level of supercoiling in the cell. We have thus three interrelated theoretical and experimental facts, which we would like to stress: 1) DNA condensation is a driving force for double helix unlinking and chromosome portioning, by folding, in topological domains. 2) Condensation is achieved by supercoiling, which is a topological state of macromolecules enhanced by three kinds of deformations: twisting, writhing and knotting. If the DNA is modeled as a ribbon in three-dimensional space whose axis is not flat in the plane, we can define the twist of the ribbon abstractly as the integral of the incremental twist of the ribbon about the axis, integrated as we traverse the axis once; so it simply measures how much the ribbon twists about the axis from the frame of 22

reference of the axis: it need not be an integer. The writhe measures how much the axis of the ribbon is contorted in space. Because (–) supercoiling in bacteria arises from a topological misalignment and not a protein corset, it has the flexibility to do work. 3) Supercoiling results from topological strain and the contortion of DNA by proteins, notably the nucleosomal histone octet and the structural maintenance of chromosomes (SMC) proteins. Let us sum up the previous consideration by highlighting two crucial facts. 1. We already stressed (see section 1) that the chromatin structure appears to harbor metastable key features allowing the proper interpretation of genetic information. Among these features, two seem to be of paramount importance, from the structural as well as from the functional point of view: first, the two strands of DNA must be continuously unlinked during replication, and second, chromatin must be topologically condensed inside the cells of organisms with nuclei. Three families of huge ATP-powered enzymes—helicases, type-2 topoisomerases, and condensins—contribute to the orderly unlinking of DNA and to the chromosome segregation in vivo. In this process, two steps seem to be very fundamental. For replication to occur, the DNA must be decondensed. Helicases unwind DNA creating (+) supercoils and precatenanes which are rapidly removed by topoisomerases. Type-2 topoisomerases actively remove all DNA entanglements. Then, the organized recompaction by condensins and supercoiling are essential for chromosome partitioning. The chromosome must, indeed, be folded into topological domains. Besides, we stress that chromosome need to be topologically remodeled in order the genetic events and the cellular process to be performed. Finally, chromatin structure and chromosome conformation are dynamic and complex and exert profound control over gene expression and other fundamental cellular processes. 2. The fundamental determinants of many biological phenomena are now known to be geometrically organized and topologically constrained. And the biologically most important molecules or macromolecules (proteins, RNA and DNA) are comprised of linear chains of building blocks, which yet ordered theyself according to some mathematical rules which are highly non-linear and extremely system-dynamically complex. So, even if each protein or RNA and DNA molecule commonly folds into a specific structure that depends sensitively on its sequence, however, when we take into account large ensemble of these macromolecules and their associated systems, then we observe that they depends rather on the action of some large-range correlation structures. From a systematic statistical mechanics point of view analysis of DNA sequences, these large-range correlations can be interpreted in terms of the 23

signature of the hierarchical structural and dynamical organization of chromatin in relation with DNA replication, gene expression and cell division. At any rate, DNA in living systems is topologically constrained, so its patterns and functions essentially depend on how it is constrained.

6. The complex reality of genes: topological organization, regulatory processes and levels of biological expression Most molecular biologists believe that the building molecular blocks, which form the genetic material of organisms, encapsulate and determine the entire process of life and its evolution on earth. Yet, it became more and more clear that we hardly understand in any detail the links between the molecular substrate and the nature of organisms. It is at present widely recognized that the most striking questions in biology are the following two: 1. How explain the organization of chromatin in the cell and his influence on the replications of cells and on the entire metabolism of eukaryotic organisms. 2. Why the different modes of expression of genes essentially rely on different interrelated epigenetic processes and on the existence of sets of regulatory networks that act on the dynamics of the sophisticated engineering of organismal evolution. Before to go thoroughly into these questions, we would like to stress few points which show the novelty and the far-reaching significance of these issues for developing new approaches to biological processes and forms. 1. There is now strong evidence for believe that certain domains of the chromatin and some epigenetic processes control gene expression and regulate chromosome and cell behavior. Eukaryotic DNA is organized into structurally distinct domains that regulate gene expression and chromosome behavior. Epigenetically heritable domains of heterochromatin control the structure and expression of large chromosome domains and are required for proper chromosome segregation. Recent studies have identified many of the enzymes and structural proteins that work together to assemble heterochromatin. The assembly process appears to occur in a stepwise manner involving sequential rounds of histone modification by silencing complexes that spread along the chromatin fiber by self-oligomerization, as well as by association with specifically modified histone amino-terminal tails. Finally, an unexpected role for non-coding RNAs (polymerase) and RNA interference in the formation of epigenetic chromatin domains has been uncovered.


2. The rule governing physiological regulation and higher levels of cellular organization are located not in the genome, but rather in interactive epigenetic networks which themselves organize genomic responses to environmental signals. 3. It is now well-known that a genome is prepared to respond in a programmed manner to “shocks”, like the “heat shock” response in eukaryotic organisms and the “SOS” response in bacteria. However, by contrast to these programmed responses, there are responses of genome to unanticipated challenges that are not so precisely programmed. The genome is unprepared for these shocks. Nevertheless, they are sensed, and the genome responds in a discernible but initially unforeseen manner. Familiar examples are the production of mutation by x-rays and by some mutagenic agents. 4. One has long-time believed that the close mapping between genotype and morphological phenotype in many contemporary metazoans shows that the evolution of organismal form is a direct consequence of evolving genetic programs. However, recently it has been proposed that the present relationship between genes and form is a highly derived condition, a product of evolution rather than its precondition. Prior to the biochemical canalization of developmental pathways, and the stabilization of phenotypes, interaction of multicellular organisms with their physico-chemical environments dictated a many-to-many mapping between genome and forms. These forms would have been generated by epigenetic mechanisms: initially physical processes characteristic of condensed, chemical active materials, and later conditional, inductive interactions among the organism's constituent tissues. This concept, that epigenetic mechanisms are the generative agents of morphological character origination, helps to explain findings that are difficult to reconcile with the standard neo-Darwinian model, e.g., the burst of body plans in the early Cambrian, the origins of morphological innovation, homology, and rapid change of form. This concept entails a new interpretation of the relationship between genes and biological form.

7. Epigenetics and change in living beings. The new link between genes activity and organism’s environment Loosely defined, epigenetics studies how certain patterns of gene expression may change and become stably inherited, without affecting the actual base sequence of the DNA. We know today that epigenetics play an important role in gene expression, cell regulation, in embryonic development, and also in tumorigenesis. In humans, the main epigenetic events are proteins-


histones modifications and DNA methylation. These events correlate with age and lifestyle. The largest twin study on epigenetic profiles yet reveals the extent to which lifestyle and age can impact gene expression [65]. Its aim was to quantify how genetically identical individuals could differ in gene expression on a global level due to epigenetics. It has been found that 35% of twin pairs had significant differences in DNA methylation and histone modification profiles. The study revealed that twins who reported having spent less time together during their lives, or who had different medical histories, had the greatest epigenetic differences. Gene expression microarray analysis revealed that in the two twin pairs most epigenetically distinct from each other—the 3- and 50-year-olds—there were four times as many differentially expressed genes in the older pair than in the younger pair, confirming that the epigenetic differences the researchers saw in twins could lead to increased phenotypic differences. These findings help show how environmental factors can change one’s gene expression and susceptibility to disease, by affecting epigenetics. In the nucleus of eukaryotic cells, the three-dimensional organization of the genome takes the form of a nucleoprotein complex: chromatin. As we already saw above, this organization not only compacts the DNA but also plays a critical role in regulating interactions with the DNA during its metabolism. This packaging of our genome, the basic building block of which is the nucleosome, provide a whole repertoire of information in addition to that furnished by the genetic code. This mitotically stable information is not inherited genetically and is not inherited genetically and is termed epigenetics. One of the challenges in chromatin research is to understand how epigenetic states are established, inherited, controlled and modified so as to guarantee that their integrity is maintained while preserving the possibility of plasticity; this plasticity works through different degrees and at different levels, depending on the dynamics of the biological process concerned and on the complexity of the task required to perform these process. In other words the aim is to understand the temporal and spatial dynamics of chromatin organization, during the cell cycle, in response to different stimuli and in different cell types. Chromatin dynamics are affected by alterations to the nucleosome, the basic repeating unit formed from DNA wrapped around an octamer of histone proteins. Such alterations can be controlled by three classes of compounds: 1. histone chaperones principally implicated in the transfer of histones to DNA to form the nucleosome; 2. Nucleosome remodeling factors or possibly disassembly factors that enable DNA sequences within nucleosomal structures to become accessible to protein complexes, allowing replication, transcription, repair or 26

recombination; 3. Factors involved in post-translational modifications of histones. The large repertoire of these epigenetic modifications is at the heart of the chromatin code hypothesis. Histones are major protein components of chromatin and are subject to numerous posttranslational modifications like acetylation, methylation, phosphorylation, polyADPribosylation and ubiquitination. The combination of these various potential modifications could generate a great diversity of chromatin states in the nucleus. This repertoire of modifications is at the centre of the histone code hypothesis, which is considered to establish two types of epigenetic markers: heritable (or epigenetic) which contribute to the maintenance of gene activity during cell divisions, or labile for rapid response to the environment. This code would be decoded by proteins or protein complexes able to recognize specific modifications. The modification by methylation of lysine 9 on histone H3 recognized by proteins of the HP1 family provides an example supporting this hypothesis. The structural rearrangements within chromatin, which occur during repair of damage produced by UV radiation, show certain parallels with those occurring during replication. To explain the dynamics of these rearrangements, one has put forward a three-step model: access, repair, restoration of structures [5]. The de novo assembly of nucleosomes takes place during replication and possibly during repair of DNA in the restoration phase. The construction of the nucleosome is facilitated by assembly factors or histone chaperones. The best characterized of these factors to date is CAF-1 (chromatin assembly factor-1), which was discovered in 1986, and which stimulates the formation of nucleosomes on DNA newly replicated in vitro. It comprises three subunits (p150, p60 and p48) and interacts with acetylated histones H3 and H4. It has been subsequently showed that CAF-1 is also able to promote the assembly of nucleosomes specifically coupled to the repair of DNA by nucleotide excision repair (NER) (which involves DNA synthesis) in vitro systems. Then demonstrated the in vitro recruitment of a phosphorylated form of CAF-1 (p60) to chromatin in response to UV irradiation of human cells. Such in vitro coupling between chromatin assembly and repair would ensure restoration of chromatin organization immediately after repair of the lesion. The coupling between repair and assembly in chromatin was reproduced in Drosophila embryo extracts to show that the nucleosome assembly commences from the lesion site. Single-strand breaks and gaps are the most effective lesions in stimulating nucleosome assembly via CAF-1. The search for two-hybrid partners of CAF-1, using the large subunit p150 as bait, has allowed identification of PCNA (proliferating cellular nuclear antigen), a marker of proliferation, whose involvement in replication and repair of DNA is well known. 27

The interaction of CAF-1 with PCNA provides the direct molecular link between chromatin assembly and replication or repair. Finally, protein Asf1 (anti-silencing factor 1) interacts with CAF-1 and can stimulate assembly. A chromatin assembly chain centered around PCNA is therefore gradually coming to light. After these studies in extracts and in cell cultures, these researchers investigated the importance of CAF-1 in a whole organism, Xenopus. They identified the Xenopus homologue of the p150 subunit of CAF-1. Using a dominant-negative strategy that takes advantage of the dimerization properties of xp150, they have revealed the critical role of CAF-1 in the rapid cell divisions characteristic of the embryonic development of Xenopus. Thus, CAF-1 plays a vital role in early vertebrate development. Beyond the level of the nucleosomes, the chromatin is compacted into higher structures which delimit specialized nuclear domains such as regions of heterochromatin and euchromatin. Heterochromatin is defined as the regions of chromatin that do not change their condensation during the cell cycle and represents the majority of the genome of higher eukaryotes. Heterochromatin principally comprises repeated non-coding DNA sequences, its characteristics generally contrast with those of euchromatin. One essential characteristic of the heterochromatin regions, which has been highly conserved during evolution, is the presence of hypoacetylated histones (H3 and H4). Apart from its repression of transcription, heterochromatin’s function remains largely unknown. The current evidence suggests that chromatin remodeling factors use the energy of ATP hydrolysis to generate superhelical torsion in DNA, to alter local DNA topology, and to disrupt histone-DNA interactions, perhaps by a mechanism that involves ATP-driven translocation along the DNA. Many chromatin remodeling factors have been found to affect transcription regulation, but it also appears that chromatin remodeling is important for processes other than transcription. Moreover, proteins in the SNF2-like family of ATPases have been found to participate in diverse processes such as homologous recombination (RAD54), transcription-coupled DBA repair (ERCC6/CSB), mitotic sister chromatid segregation (lodestar; Hrp1), histone deacetylation (Mi-2/CHD3/CHD4), and maintenance of DNA methylation states (ATRX). It thus appears that there is a broad range of nontranscriptional functions of chromatin remodeling proteins. In addition, the recent sequencing has led to the identification of many novel SWI2/SNF2-related putative ATPases. For example, there are al least seventeen SWI2/SNF2-related open reading frames in the Drosophila genome, and only six of the corresponding proteins have been analyzed. In their 28

native state, SWI2/SNF2-related proteins have been generally found to exist as subunits of multiprotein complexes. Thus, the purification of the native forms of the novel SWI2/SNF2like proteins would likely reveal many new chromatin remodeling complexes. It will be an interesting and important challenge to identify the functions of these new factors.

7.1. Non-transcriptional chromatin remodeling factors. Chromatin organization and dynamics, and the emerging of function Let us now schematically indicate the different processes and functions in which chromatin remodeling factors and co-factors seems to be involved: 1. Chromatin structure is an important component of eukaryotic DNA replication. Nucleosomes appear to be generally inhibitory to replication. For instance, the positioning of a nucleosome over a yeast autonomously replicating sequence (ARS) inhibits plasmid DNA replication in vivo, and the packaging of DNA into chromatin represses SV40 DNA replication in vitro. It has been found that specific DNA binding factors, such as the yeast origin recognition complex, can establish an arrangement of nucleosomes that allows the initiation of DNA replication. Hence, in the cell, it is possible that replication-competent chromatin structures are generated by the coordinate action of the DNA replication machinery and ATP-dependent chromatin remodeling factors. Recent studies have shown that a chromatin remodeling factor termed CHRAC (chromatin accessibility complex) play a function in the initiation of DNA replication. On the other hand, CHRAC did not appear to affect the elongation of replication with DNA or chromatin. There may be other activities that are involved in the progression of DNA polymerase through chromatin. Other studies have revealed a connection between the SWI/SNF complex and DNA replication. First, a direct interaction was identified between the Ini1/hSNF5 subunit of human SWI/SNF complex and the human papillomavirus E1 replication protein, which is a sequence-specific DNA binding factor that functions in a manner similar to SV40 large T antigen. Transient transfection, antisense, and mutational analyses revealed that the interaction between Ini1/hSNF5 and E1 is essential for the efficient replication of papillomavirus DNA. It was not determined, however, whether Ini1/hSNF5 protein facilitates papillomavirus replication by itself or as a subunit of the SWI/SNF complex. In a separate work, the function of yeast SWI/SNF complex in DNA replication was investigated by using a mitotic plasmid stability assay that reflects the replication efficiency of ARS-containing minichromosomes. These experiments showed that


mutations in subunits of the SWI/SNF complex cause a decrease in the maintenance of plasmids that contain one particular ARS but not three other ARSs that were tested. In addition, the recruitment of a LexA-SWI2 fusion protein was observed to increase the maintenance of an ARS-containing plasmid. These findings indicate that the SWI/SNF complex can, in some instances, increase the efficiency of DNA replication. 2. Chromatin assembly is a fundamental biological process by which nuclear DNA is packaged into nucleosomes. ACF (ATP-utilizing chromatin assembly and remodeling factor) was identified and purified on the basis of its ability to mediate the ATP-dependent assembly of periodic nucleosome arrays, and it consists of two subunits: ISWI and a polypeptide termed Acf1. The Acf1 subunit functions cooperatively with the ISWI subunit for the assembly of chromatin. ACF-mediated chromatin assembly can be carried out with purified recombinant ACF, purified recombinant NAP-1 (a core histone chaperone), purified core histones, DNA (either linear or circular), and ATP. ACF requires the hydrolysis of ATP for both the deposition of histones onto DNA as well as the establishment of periodic nucleosome arrays. In addition to its function in the assembly of chromatin, ACF can catalyze the ATP-dependent mobilization of nucleosomes, and is therefore a chromatin remodeling factor. Thus, ACF provides an example of a chromatin remodeling factor that has been purified and characterized based on its function in a nontranscriptional process. The ISWI subunit of ACF is also present in the NURF (nucleosome remodeling factor) and CHRAC chromatin remodeling factors. NURF was identified on the basis of its ability to disrupt the regularity of a periodic nucleosome array in the presence of the GAGA factor (a sequence-specific DNA binding factor), whereas CHRAC was identified as a factor that increases the accessibility of restriction enzymes to DNA packaged into chromatin. CHRAC has been found to be devoid of topoisomerases II as well as to contain Acf1 and two smaller subunits. It is therefore likely to be closely related to ACF. NURF, on the other hand, appears to be distinct from ACF, aside from the common ISWI subunit. The analysis of ISWI in Drosophila revealed that ISWI is essential for viability and is localized to both euchromatic and heterochromatic sites in polytene and mitotic chromosomes. The localization of ISWI and RNA polymerase II was found to be mostly nonoverlapping, and thus, ISWI is present mainly at sites that are not actively transcribed. In addition, mutations in ISWI caused a gross change in the structure of the male X chromosome. In Xenopus, ISWI was found to be required for the ATP-dependent global remodeling of nuclei. Thus, these results are consistent with a function of ISWIcontaining complexes in the establishment and maintenance of chromatin structure. The RSC 30

(remodel the structure of chromatin) chromatin remodeling complex, which contains the STH1 ATPase, has the ability to catalyze the transfer of a histone octamer from a nucleosome core particle to naked DNA. This activity may reflect a specialized function of RSC in the establishment and/or maintenance of chromosome structure, such as the transfer of preexisting histones to newly synthesized DNA during replication or the nonreplicative exchange of histones. Mutations in subunits of RSC cause a G2/M cell cycle arrest, which could be due to defects in chromatin assembly and chromosome structure. 3. Chromatin structure is an important component of DNA repair in eukaryotes. The DNA repair machinery must have access to the DNA lesions in chromatin, and the newly-repaired DNA must also be reassembled into chromatin. A recent biochemical study of chromatin structure and nucleotide excision repair revealed that ACF is able to facilitate the excision of pyrimidine (6-4) pyrimidonne photoproducts in a dinuclesome. In that study, chromatinmediated repression of nucleotide excision was only partially relieved by ACF, but it nevertheless appears that ACF can increase the efficiency of nucleotide excision in chromatin. Different SWI2/SNF2-like ATPase seems to contribute to DNA repair. For example, a SWI2/SNF2-related protein with a function in DNA repair is Cockayne syndrome B protein (CSB). More precisely, CSB is involved in the coupling of nucleotide excision repair to transcription. Purified recombinant CSB polypeptide is a DNA-dependent ATPase that exhibits ATP-dependent chromatin remodeling activity. These findings suggest that CSB, possibly as a component of a multisubunit complex, may function to remodel chromatin during transcription-coupled repair. Chromatin remodeling is also likely to be important for DNA recombination. For instance, human SWI/SNF complex was observed to stimulate the cleavage and processing of DNA by the RAG1 and RAG2 proteins that are involved in V(D)J recombination. In addition, RAD54 is a SWI2/SNF2-like protein that is involved in the recombinational repair of double-strand breaks and homologous recombination during meiosis. It is thus possible that RAD54 functions to remodel chromatin during recombination. As we have just seen, there are a handful of examples of ATP-driven chromatin remodeling-reorganizing factors in processes other than transcription. These studies are likely to be the proverbial “tip of the iceberg� of an exciting and important area of chromatin research. One of the key challenges for the future will be to devise chromatin remodeling assays that accurately reflect the specific functions of the factors in the cell.


7.2. DNA methylation, silencing and gene expression: from regulation to biological information Let us now consider the other most important epigenetic event: genomic DNA methylation. Recent studies have illuminated the role of DNA methylation in controlling gene expression and have strengthened its links with histones modifications and chromatin remodeling. DNA methylation is found in the genomes of diverse organisms including both prokaryotes and eukaryotes. In prokaryotes, DNA methylation occurs on both cytosine and adenine bases and encompasses part of the host restriction system. In multicellular eukaryotes, however, methylation seems to be confined to cytosine bases and is associated with a repressed chromatin state and inhibition of gene expression. DNA methylation is essential for viability in mice, because targeted disruption of the DNA methyltransferase enzymes results in lethality. There are two general mechanisms by which DNA methylation inhibits gene expression: first, modification of cytosine bases can inhibit the association of some DNAbinding factors with their cognate DNA recognition sequences; and second, proteins that recognize methyl-CpG can elicit the repressive potential of methylated DNA. Methyl-CpGbinding proteins (MBPs) use transcriptional co-repressor molecules to silence transcription and to modify surrounding chromatin, providing a link between DNA methylation and chromatin remodeling and modification. Recently, there have been significant advances in our understanding of the mechanisms by which DNA methylation is targeted for transcriptional repression and the role of MBPs in interpreting the methyl-CpG signal and silencing gene expression. We emphasize examples from mammalian systems, including studies on animal models, because several recent reviews have covered topics of DNA methylation and silencing in plants and fungi. Mammalian cytosine DNA methyltransferase enzymes fit into two general classes based on their preferred DNA substrate. The de novo methyltransferases DNMT3a and DNMT3b are mainly responsible for introducing cytosine methylation at previously unmethylated CpG sites, whereas the maintenance methyltransferase DNMT1 copies pre-existing methylation patterns onto the new DNA strand during DNA replication. A fourth DNA methyltransferase, DNMT2, shows weak DNA methyltransferases activity in vitro, but targeted deletion of the DNMT2 gene in embryonic stem cells causes no detectable effect on global DNA methylation, suggesting that this enzyme has little involvement in setting DNA methylation patterns. Examples of global de novo methylation have been well documented during germcell development and early embryogenesis, when many DNA methylation marks are re32

established after phases of genome demethylation. Recent studies based on cell-culture model systems have suggested at least three possible means by which de novo methylation might be targeted: first, DNMT3 enzymes themselves might recognize DNA or chromatin via specific domains; second, DNMT3a and DNMT3b might be recruited through protein-protein interactions with transcriptional repressors or other factors; third, the RNA-mediated interference (RNAi) system might target de novo methylation to specific DNA sequences. Clearly, DNA methylation is a multilevel regulating process and a multilayer informational mechanism. Let us specify some of the conformational and functional complexities associated with DNA methylation in cell activity. (a) In mouse cells, DNMT3 enzymes partially localize to regions of pericentromeric heterochromatin. Functional studies shows that the conserved PWWP domain is required to target the catalytic activity to these regions of the genome. The importance of this domain was highlighted by the discovery that a mutation in the PWWP domain of the human DNMT3b protein causes ICF syndrome, a several autosomal recessive disease in humans. The mutation abolishes normal chromatin binding by DNMT3b in tissue-culture cells and causes a reduction in DNA methylation of classical satellite 2 DNA in affected individuals. (b) DNMT’s can also be targeted to endogenous genes by interaction with site-specific transcriptional repressor proteins. This idea was first suggested for oncogenic fusion protein PML-RAR, which can recruit DNA methyltransferases and cause hypermethylation of target genes in cancer cells. More recently, Brenner et al. [32] have shown that the Myc protein associates with DNA methyltransferase activity, and that a direct interaction between DNMT3a and Myc is required for efficient repression of the Myc target gene p21cip1. Chromatin immunoprecipitation studies using tissue-culture cells have shown that Myc is required for recruitment of DNMT3a to the p21cip1 promoter region, leading to de novo methylation of the p21cip1 promoter. The DNA methyltransferase activity of DNMT3a is required for this gene-silencing event, because a point mutation in the catalytic domain alleviates silencing. The exact role of DNMT3a in regulating normal expression of p21cip1 gene remains to be elucidated, but this model system has uncovered a potential mechanism by which de novo methylation is recruited by factors that repress transcription. (c) Other studies [54] have shown that DNA methylation has a role in repressing the expression of genes encoding ribosomal RNA (rRNA). This silencing event relies on de novo methylation of a single CpG dinucleotide in the promoter region of the rRNA gene. TIP5, a component of the NoRC repressor complex, associates with both DNMT3b and DNMT1, 33

providing the link between NoRC silencing at the rRNA genes and the DNA methylation system. Chromatin immunoprecipitation analysis indicates that DNMT enzymes are actively recruited to the promoter region of the rRNA genes, and subsequent silencing is dependent on de novo methylation. Together, these studies indicate that protein-protein interactions are important mediators of de novo DNA methylation, and that DNA methyltransferase enzymes can function as classical co-repressor molecules for some transcription factors. (d) In plants and some fungi, induction of the RNAi gene silencing system results in both posttranscriptional silencing of gene expression. RNAi-mediated transcriptional silencing in plants often results in de novo methylation of the silenced gene. It has also been reported a similar mechanism of de novo methylation during RNAi silencing in mammalian cell-culture systems. When double-stranded RNA corresponding to the promoter sequence of a gene is introduced into mammalian tissue-culture cells, the target gene is efficiently silenced concomitant with de novo DNA methylation of the corresponding promoter sequence. Thus, as we just saw, epigenetic modification of DNA is coupled to gene expression silencing. DNA methylation is linked with transcriptional silencing of associated genes, and much effort has been invested in studying the mechanisms that underpin this relationship. Two basic models have evolved: in the first, DNA methylation can directly repress transcription by blocking transcriptional activators from binding to cognate DNA sequences; in the second, MBPs recognize methylated DNA and recruit co-repressors to silence gene expression directly. For some promoters, repression mediated by DNA methylation is most efficient in a chromatin context, indicating that the “active� component of this repression system might rely on chromatin modification. In keeping with this observation, MBPs associate with chromatin remodeling co-repressor complexes. Two unexpected facets of the DNA-methylation-mediated silencing system have recently become apparent: first, DNA methyltransferase enzymes themselves might be involved in setting up the silenced state in addition to their catalytic activities; and second, DNA methylation can affect transcriptional elongation in addition to its characterized role in inhibiting transcriptional activation.

8. Epigenetics: a new relationship between genetics and environment, or between genotype and phenotypes As we already said, the massive sequencing of our genome leave the key questions concerning the organization and functioning of cells and organisms unanswered. Epigenetics seems to be


the most promising line of research able to unfold this post-genomic era. The epigenetic processes discussed above are natural and essential to many organism functions, but if they occur improperly, there can be major adverse health and behavioral effects. There is the need of a large-scale epigenetic mapping of our cells that moves biology to this new century with a new visage. Epigenetics can be understood as the process that initiate and maintain heritable patterns of gene expression and gene function in an inheritable manner without changing the sequence of the genome. Epigenetics can also be understood as the interplay between environment and genetics. In this last issue, epigenetics provides the best explanation about how the same genotype can be translated to different phenotypes. Perhaps still more important: Epigenetics supply organisms with new layers of biological information that result from specific rules of regulation and organization inherent to chromosomes, cells and organisms. This high-order biological information can, in turn, retroact in many different ways on the genome profile and functioning, according to the cellular and extracellular contexts. One essential epigenetic mechanism is that for repressing transcription, in which, first, methyltransferases attach methyl groups (CH3) to cytosine bases of DNA, then, protein complexes, recruited to methylated DNA, remove acetyl groups and repress transcription. Repression of transcription—the transfer of genetic information from DNA to RNA—is one route by which epigenetic mechanisms can adversely impact health. Examples of the powerful modulator effects of epigenetics in this scenario are starting to emerge in an exponential increasing number: strains of Agouti mice can undergo changes of DNA methylation status of an inserted IAP (intracisternal A-particle) element that changes the animal’s cot color; cloned animals demonstrate an inefficient epigenetic reprogramming of the transplanted nucleus that it is associated with aberrations in imprinting, aberrant growth and lethality beyond a threshold of faulty epigenetic control; and monozygotic twins, that thus share the same DNA sequence, can present anthropomorphic difference and distinct disease susceptibility related to epigenetic differences such as DNA methylation and histone modifications. The goal of epigenetics research is to identify all the organizational and chemical changes and relationships among chromatin constituents that provide function to the genetic code, which will allow a better understanding of normal development, aging, abnormal gene control in cancer, and other diseases as well the role of environment in human health. It is important first to assess what we know about epigenetics in health and disease. The epigenetic network has many layers of complexity that could be summarized in four as 35

follows: DNA methylation, histone modifications, chromatin remodeling and microRNAs. The most studied modification in humans is the methylation of the cytosine located within the dinucleotide CpG. 5-methylcytosine (5mC) in normal human tissue DNAs constitutes 0.75– 1% of all nucleotide bases and we should remember that ∼3–4% of all cytosines are methylated in normal human DNA. CpG dinucleotides are not randomly distributed throughout the vast human genome. There are CpG-rich regions, known as GpG islands, which are usually unmethylated in all normal tissues and frequently span the 5′ end region (promoter, untranslated region and exon 1) of a number of genes: they are excellent markers of the beginning of a gene. If the corresponding transcription factors are available, the histones modifications are in a permissive state and the CpG island remains in an unmethylated state, that particular gene will be transcribed. Of course, there are exceptions to the general rule. We can find certain normally methylated CpG islands in at least four cases as follows: imprinted genes, X-chromosome genes in women, germline-specific genes and tissue-specific genes. Genomic or parental imprinting is a process involving acquisition of DNA hypermethylation in one allele of a gene early on in the male and female germline that leads to monoallelic expression. A similar phenomenon of gene-dosage reduction can also be invoked with regard to the methylation of CpG islands in one X-chromosome in women, which renders these genes inactive in order to avoid redundancy. In addition, although DNA methylation is not a widely occurring system for regulating “normal” gene expression, sometimes it does indeed accomplish this purpose. We have the case, for example, of those genes whose expression is restricted to the male or female germline and that are not expressed later in any adult tissue, such as the MAGE gene family. Furthermore, methylation has been postulated as a mechanism for silencing tissue-specific genes in cell types in which they should not be expressed (see section 6.2. for more details). However, it is still not clear whether this type of methylation is secondary to a lack of gene expression owing to the absence the particular cell-type-specific transcription factor or whether it is the main force behind transcriptional tissue-specific silencing. What is the significance of the presence of DNA methylation outside the CpG islands? One of the most exciting possibilities for the normal function of DNA methylation is its role in repressing parasitic DNA sequences. Our genome is plagued with transposons and endogenous retroviruses acquired throughout the history of the human species. We can control these imported sequences thanks to direct transcriptional repression mediated by several host 36

proteins, but our main line of defense against the large burden of parasitic sequence elements (> 35% of our genome) may be DNA methylation. Methylation of the promoters of our intragenomic parasites inactivates these sequences and, over time, will destroy many transposons. The DNA methylation landscape of a normal cell occurs in the context of all the other epigenetic marks. In this manner, DNA methylation is associated with the formation of nuclease-resistant chromatin, and methyl-CpG binding proteins and DNA methyltransferases, two superfamily enzymes that are key regulators of histone function. Histone function is one of several important members of the epigenetic network. The status of acetylation and methylation of specific lysine residues contained within the tails of nucleosomal core histones is known to play a critical role in chromatin packaging and gene expression. Overall, histone hypoacetylation and hypermethylation is characteristic of DNA sequences methylated and repressed in normal cells, such as X-chromosome in females, imprinted genes and tissue-specific genes. However, each particular lysine residue can be a marker for a different signal. For example, the underacetylated lysine positions of K5, K8 and K12 of histone H4 are characteristic of heterochromatic X-chromosomes, whilst acetylated K16 distribution is similar in the X-chromosome and autosomes. In this regard, recent data suggest that acetyl-K16 behaves differently from the other acetylated residues. It provides a barrier to the spreading of Sir proteins, histone hypoacetylation and silencing within adjacent subtelomeric DNA regions. With regard to histone H4 methylation, the only lysine methylation event in this tail occurs at position K20. This modification seems to trigger many biological processes and the trimethylation of histone H4 has recently been identified as a marker of constitutive heterochromatin and gene silencing, and has also been found to be associated with aging.

8.1. The chromosome organization in topological territories as a principle of gene expression and cell activity during development Gene silencing in mammalian cells may be mediated by positioning of a gene in proximity to the heterochromatic territory in interphase nuclei, suggesting that the eukaryotic nucleus is divided into heterochromatin territories that repress transcription, and territories in which transcription is favored. It is currently believed that this spatial organization of the nucleus in discrete territories may help to establish a tissue-specific pattern of gene expression required for the onset and progression of cellular differentiation. During erythroid maturation the entire


genome is progressively silenced and packaged into heterochromatin, whereas the b-globin locus is among the last to be silenced. The tissue-specific activation and maintenance of its expression in the repressive environment of a terminally differentiating red cell are due, at least in part, to the Locus Control Region (LCR) comprised of several DNasel hypersensitive sites (HS) that contain numerous binding sites for erythroid and ubiquitous transcription factors. It has been previously demonstrated that stable gene expression and open chromatin configuration require both a functional enhancer and positioning away from centromeric heterochromatin, and revealed that enhancers can mediate the localization of genes to nuclear compartments that favor gene activation, away from the repressive compartment of heterochromatin. This led to the hypothesis that activators bound to tissue-specific LCRs/enhancers may act to establish and maintain gene expression in differentiated cells by ensuring that a linked gene resides in a nuclear compartment permissive for transcription, thereby preventing its inclusion in facultative heterochromatin that forms during cell differentiation, and permitting it to be active in the appropriate lineage. Interestingly, as red cells mature, the majority of DNA is heterochromatized at non-centromeric sites in the nucleus. Thus, this observation that repressors move from centromers during differentiation provides a possible basis for this maturation associated increase in heterochromatin formation. The main point to be stressed here is that the way in which the genome is organized within the nuclear space—i.e., how chromatin fibers are folded across the entire human genome—, both within normal, and diseased, cells, influences gene regulation and chromosome function. This kind of organization entails a layer of biological information that stand at a level beyond that carried by DNA sequence, and which is essential for processing the genetic information itself. The spatial organization of human chromosomes and genes in the nucleus is changed, for example, during development and in certain diseases. Let us give some details on this spatial organization of chromosomes. The nucleoplasm is territorially organized into a number of mobile subnuclear organelles in which certain protein and nucleic acid components with specific biological activities are concentrated. The most prominent example is the nucleolus, which contains regions with ribosomal RNA genes from several chromosomes, and also contains the machinery for the assembly of ribosomal subunits. Other nuclear substructures include the SC35 domains, Cajal and promyelocytic leukaemia (PML) bodies. The localization, organization, dynamics and biological activities of these suborganelles appear to be closely related to gene expression. Using whole chromosome painting probes and fluorescence in situ hybridization (FISH), a territorial organization of 38

interphase chromosomes has been demonstrated [46]. Chromosome territories have irregular shapes and occupy discrete nuclear positions with little overlap. In general, gene-rich chromosomes are located more in the nuclear interior while gene-poor chromosome territories are located at the nuclear periphery. In agreement with this, non-transcribed sequences were predominantly found at the nuclear periphery or perinucleolar while active genes and generich regions tended to localize on chromosome surfaces exposed to the nuclear interior or on loops extending from the territories. Recent experimental findings support the concept of a functional nuclear space, the interchromosomal domain (ICD) compartment. According to the ICD model, the interface between chromosome territories is more easily accessible to large nuclear complexes than regions within the territory. More recently, it has been proposed that chromosomes territories are further organized into 1-Mb domains, extending the more accessible space to open intrachromosomal regions surrounded by denser chromatin domains. Using high-resolution light microscopy, an apparent bead-like structure of chromatin can be visualized in which âˆź1-Mb domains of chromatin are more densely packed into an approximately spherical subcompartment structure with dimensions of 300–400 nm. These domains are thought to be formed by a specific folding of the 30-nm chromatin fiber, to which the chain of nucleosomes associates under physiological salt concentrations. Other models have been proposed. The radial-loop models propose small loops of roughly 100 kb arranged in rosettes, while the random-walk/giant-loop (RW/GL) model proposes large loops of chromatin back-folded to an underlying structure. In the chromonema model, the compaction of the 30-nm fiber is achieved by its folding into 60- to 80-nm fibers that undergo additional folding to 100- to 130nm chromonema fibers. Besides the chromatin density local variations on a length scale of several hundred nanometers that have been assigned to the existence of the above-mentioned 1-Mb domains, regions in the micrometer-length scale at the nuclear periphery around the nucleolus and at the centromeres also display a high degree of compaction. These dense chromatin regions are often referred to as heterochromatin as opposed to the less dense euchromatin. Heterochromatin has been described as containing increased DNA methylation at cytosines, specific histone modification patterns like methylation of lysine 8 on histone H3 and histone hypoacetylation, binding of heterochromatin protein 1 (HP1), interactions with non-coding RNA and activities of the RNAi-mediated silencing machinery. Regions of so-called facultative heterochromatin are known, which display a transition from a more open 39

transcriptionally active conformation into a biologically inactive conformation. The most prominent example of facultative heterochromatin is the inactivation of one of the two X chromosomes in mammalian females, which shows a strong chromatin compaction during embryogenesis into a dense structure referred to as the Barr body. The relation of a dense heterochromatin state with a biologically inactive chromatin conformation has led to the concept that the biological activity of chromatin is regulated via its accessibility to protein factors.

8.2. Chromatin compaction and biological processes: how form determines function Depending on the degree of compaction, chromatin regions have different accessibility, and this accessibility is related to the biological function of chromosomes. The organizational, topological and dynamical properties of the chromatin environment determine the mobility of Cajal and PML bodies and other supramolecular complexes. Large particles with sizes around 100 nm (100-nm diameter nanospheres, 2.5-MDa dextrans) are completely excluded from dense chromatin regions. The nuclear Cajal and PML bodies with a size of âˆź1Âľm will therefore have access to only a subspace of the nucleus. Any movement of these bodies over distances above a few hundred nanometers will require a chromatin reorganization that allows the separation of chromatin subdomains to create accessible regions within and through the chromatin network. The interface between chromosome territories (ICD) would provide such a subcompartment. More generally, the nuclear subspace accessible for nuclear bodies is likely to include the interface between chromosome territories that allows a movement of the bodies by a transient separation of chromatin domains. By random movements, the nuclear bodies explore this accessible space and are expected to be localized more frequently in these more open chromatin regions. In as much as they coincide with regions of active gene transcription, they would also constitute possible biological targets of PML and Cajal bodies. This hypothesis can be tested by analyzing the intranuclear mobility and localization of multiple nuclear components simultaneously. In this respect, however, we need to make an ambitious step to develop a more integrative and global approach by studying nuclear bodies, RNA and chromatin loci in parallel within the same living cell, in order to identify their functional relations. Finally, we should not forget that DNA methylation and histone modifications occur in the context of a higher-order chromatin structure. Nucleosomes, formed by the wrapping of 147


bp of DNA segment around a histone octamer core organized into the central (H3–H4)2 tetramer and of two peripheral H2A–H2B dimers, are the champions of that league. The nucleosome is the first level of DNA compaction in the nucleus2. A second level of compaction consists of a solenoid structure formed by the nucleosomal array and stabilized by the linker histone H1 [24]. Multi-subunit complexes, such as those constituted by the SWI/SNF proteins, use the energy of ATP to mobilize nucleosomes and allow the access of the transcriptional machinery; or massive repressive complexes counteract SWI/SNF functions, as does the polycomb group gene family. In the end, the gene expression and function, and overall the genome activity, of the healthy cell is the result of the balance between these massive forces shaping our human epigenome. The organization of DNA into chromatin is a highly dynamic process, which reveals different degrees of plasticity of the many nuclear complexes involved in the remodeling of chromatin. Moreover, this plasticity plays a dynamic regulatory role in the gene transcription process and in the DNA repair process. The organization of DNA must be compatible with access of those DNA binding factors that regulate genome replication, the transcription of genes, recombination of chromosomes and repair of damaged DNA. Modulation of the type and extent of chromatin folding emerges and an important, early regulatory principle [162]. Recent years have witnessed a rapid discovery of enzymes that modify the structure of chromatin in response to cell-internal and –external cues, rendering chromatin a highly dynamic structure. Chromatin “plasticity” is mainly brought about by ATP-dependent chromatin “remodeling” factors, multiprotein complexes containing nucleic acid-stimulated DEAD/H ATPases of the Swi2/Snf2 subfamily. These enzymes couple ATP hydrolysis to alterations of the chromatin structure at the level of the nucleosomal array, which generally facilitates the access to DNA binding proteins to their cognate sites. Energy-dependent modifications of chromatin structure revealed distinct phenomena for individual classes of 2

The genomic DNA of eukaryotes is very long (about 2 m in humans) compared to the diameter of the cell’s nucleus (about 10–5 m). Packaging of the genome involves coiling of the DNA in a left-handed spiral around molecular spools, made of histone octamers, to form nucleosomes. About 80% of the genomic DNA is organized as nucleosomes. Nucleosome assembly is initiated by wrapping a 121 bp DNA segment around a tetramer of histones (H3/H4)2. Association of H2A/H2B dimmers at either side of the tetramer organizes 147 bp of DNA. DNA is a moderately flexible polymer with a persistence length of about 150 pb. In the absence of exogenous forces, 150pb of DNA essentially follow a straight path, but in a nucleosome, it coils in 1.65 toroidal superhelical turns around the octamer and thus is severely distorted. This means that DNA bending around the nucleosome is expected to happen at high energy costs. This energy cost is compensated by DNA–histone interactions occurring approximately every 10 bp on each DNA strand, generating 7 histone–DNA interaction clusters per DNA coil (superhelical locations (SHL) 0.5, 1.5, 2.5, … , 6.5). The DNA-histone interactions are stabilized by more than 116 direct and 358 water-bridged interactions, rendering the nucleosome a stable particle in the absence of additional factors.


remodeling factors, such as the modification of the path of DNA supercoiling around the histone octamer, the generation of accessibility of the nucleosomal DNA to DNA-binding proteins and of the stable distortion of nucleosome structure including formation of particle with “dinucleosome” characteristics. Not all enzymes are equally active in all assays, but all are capable of inducing ATPdependent relocation of histone octamers, their “sliding”, on DNA. It seems that the diversity of nucleosome “remodeling” phenomena brought about the enzymes of the different classes may result from variations of one basic plastic theme, reminiscent of the action of DNA translocases. Conceivably, nucleosome remodeling enzymes simply enhance the intrinsic dynamic properties of nucleosomes by lowering the energy barrier due to the destabilization of histone–DNA interactions. Accordingly, “twisting” and “looping” models have also been invoked to explain catalyzed nucleosome mobility. Taken together, the available data do not support models invoking DNA twisting as the main driving force for nucleosome movements. Rather, the data favor a “loop recapture” model, in which the distortion of DNA into a loop at the nucleosome border initiates nucleosome sliding. Taken together, the current phenomenology is consistent with chromatin remodelers working as anchored DNA translocases. Most, if not all, experimental results can be explained by one general mechanism. DNA translocation against a fixed histone body leads to detachment of DNA segments from the edge of the nucleosome, their bending and recapturing by the histones to form a loop. Depending on the step length of the remodeling cycle, which corresponds to the length of the DNA segment detached from the nucleosomal edge and the extent of inclusion of nucleosomal linker DNA into a loop, variable-sized DNA loops may be generated on the nucleosome surface. The speed, directionality and processivity with which such loops are propagated will determine the predominant result of a remodeling reaction, the detection of an “altered path” of the DNA or nucleosome translocation. Thus, the rich phenomenology of nucleosome necessarily involves quantitative differences in certain kinetic and geometric parameters. However, it is clear that all types of remodeling enzymes are able to increase the dynamic properties of nucleosomes in arrays and to generate accessible sites. Therefore, the looping model discussed mentioned above will need to be refined as one considers nucleosome dynamics in a folded nucleosomal fiber.

8.3. The link between epigenetics and diseases, mediated by aberrant chromatin alterations 42

In order to highlight the fundamental fact that the organizational properties of chromatin influences the genome activity, in the sense that they are the principal carrier and activator of the multilevel genetic and epigenetic information, let us now consider the epigenome of a sick cell. Most human diseases have an epigenetic cause. The perfect control of our cells by DNA methylation, histone modifications, chromatin-remodeling and microRNAs become dramatically distorted in the sick cell. In other words, severe alterations of nuclear forms and especially of the chromatin and the chromosome may provoke different damages to the cell’s activity, suggesting thus that the topological forms of living systems are one of the most, if not the most, fundamental determinant of the unfolding of biological functions during development and evolution. The ground-breaking discoveries have been initially made in cancer cells, but it is just the beginning of the characterization of the wrong epigenomes underlying neurological, cardiovascular and immunological pathologies. In human cancer, the DNA methylation aberrations observed can be considered as falling into one of two categories: transcriptional silencing of tumor suppressor genes by CpG island promoter hypermethylation in the context of a massive global genomic hypermethylation. CpG islands become hypermethylated with the result that the expression of the contiguous gene is shut down. If this aberration affects a tumor suppressor gene it confers a selective advantage on that cell and is selected generation after generation. Recently, researchers have contributed to the identification of a long list of hypermethylated genes in human neoplasias, and this epigenetic alteration is now considered to be a common hallmark of all human cancers affecting all cellular pathways. At the same as the aforementioned CpG islands become hypermethylated, the genome of the cancer cell undergoes global hypomethylation. The malignant cell can have 20–60% less genomic 5mC than its normal counterpart. The loss of methyl groups is accomplished mainly by hypomethylation of the “body” (coding regions and introns) of genes and through demethylation of repetitive DNA sequences, which account for 20–30% of the human genome. How does global DNA hypomethylation contribute to carcinogenesis? Three mechanisms can be invoked as follows: chromosomal instability, reactivation of transposable elements and loss of imprinting. Undermethylation of DNA may favor mitotic recombination, leading to loss of herezygosity as well as promoting karyotypically detectable rearrangements. Additionally, extensive demethylation in centromeric sequences is common in human tumors and may play a role in aneuploidy. As evidence of this, patients with germline mutations in DNA methyltransferase 3b (DNMT3b) are known to have numerous chromosome aberrations. 43

Hypomethylation of malignant cell DNA can also reactivate intragenomic parasitic DNA, such as L1 (Long Interspersed Nuclear Elements, LINEs) and Alu (recombinogenic sequence) repeats. These, and other previously silent transposons, may now be transcribed and even “moved� to other genomic regions, where they can disrupt normal cellular genes. Finally, the loss of methyl groups can affect imprinted genes and genes from the methylated-X chromosome of women. The best-studied case is of the effects of the H19/IGF-2 locus on chromosome 11p15 in certain childhood tumors. DNA methylation also occupies a place at the crossroads of many pathways in immunology, providing us with a clearer understanding of the molecular network of the immune system. Besides, aberrant DNA methylation patterns go beyond the fields of oncology and immunology to touch a wide range of fields of biomedical and scientific knowledge. Regarding histone modifications, we are largely ignorant of how these histone modification markers are disrupted in human diseases. In cancer cells, it is known that hypermethylated promoter CpG islands of transcriptionally repressed tumor suppressor genes are associated with hypoacetylated and hypermethylated histones H3 and H4. It is also recognized that certain genes with tumor suppressor-like properties such as p21WAF1 are silent at the transcriptional level, in the absence of CpG island hypermethylation in association with hypoacetylated and hypermethylated histones H3 and H4. However, until very recently there was not a profile of overall histone modifications and their genomic locations in the transformed cell. This need to determine the histone modification pattern of tumors was even more urgent, given the rapid development of histone deacetylase inhibitors as putative anticancer drugs. It has been provided this missing linking demonstrating that human tumors undergo an overall loss of monoacetylation of lysine 16 and trimethylation of lysine 20 in the tail of histone H4 [66]. These two histone modification losses can be considered as almost universal epigenetic markers of malignant transformation, as has now been accepted for global DNA hypomethylation and CpG island hypermethylation. Certain histone acetylation and methylation marks may have prognostic value. For other human pathologies, research is still in the infancy to define their histone modification signatures. The most important theme of the previous remarks on the human epigenome, which is mainly related to a methodological and epistemological revolution in epigenetics, may be summarized as follow. Cells of a multicellular organism are genetically homogeneous but structurally and functionally heterogeneous owing to the differential expression of genes. Many of these differences in gene expression arise during development and are subsequently 44

retained through mitosis. Stable modifications of this kind are said to be “epigenetic”, because they are heritable in the short term but do not involve mutations of the DNA itself. The two most important nuclear processes that mediate epigenetic phenomena are DNA methylation and histone modifications. Epigenetic effects by means of DNA methylation have an important role in development but can also arise stochastically as humans and animals age. Identification of proteins that mediate these effects has provided insight into this complex process and diseases that occur when it is perturbed. External influences on epigenetic processes are seen in the effects of diet on long-term diseases such as cancer. Thus, epigenetic mechanisms seem to allow an organism to respond to the environment through changes in gene expression. The extent to which environmental effects can provoke epigenetic response is a crucial question which is still largely unanswered.

9. The emergence of new paradigms for explaining gene expression and cell activity The development in recent years of epigenetics entails the emergence of a more integrative and global approach to the study of biological forms and functions. To tackle the whole human epigenome and to deal with the entire organisms, it is needed to elucidate the relationship between the different level of plasticity of protein complexes associated with chromatin remodeling and gene regulation, and the various levels of complexity exhibited by the phenotypic patterns during embryogenesis. The landscape of genetic expression revealed by epigenetics studies appear to be much more complex than that showed by DNA sequence alone, and it clearly results from diverse layers of biological information (DNA folding, histone modifications, the complex regulatory roles of DNA methylation, chromatin remodeling complexes, spatial organization of chromosomes, architecture of nuclear bodies, cell morphology and mobility), which intervenes at different stages of the spatial and temporal development and evolution of a living human organism. In fact, the weirdest genetic phenomena have very little to do with the genes themselves. True, as the units of DNA that encode proteins needed for life, genes have played biology’s center stage foe decades. But works over the past ten years suggests that they are little more than puppets. An assortment of proteins and, sometimes, RNAs, pull the strings, telling the genes when and where to turn on or off. The findings are helping researchers understand longstanding puzzles. Why, for example, are some genes from one parent “silenced” in the embryo, so that certain traits are determined only by the other parent’s genes? Or how are


some tumor suppressor genes inactivated—without any mutation—increasing the propensity for cancer? Such phenomena are clues suggesting that gene expression is not determined solely by the DNA code itself. Instead, as we have shown in the above sections, that cellular and oorganismic activity also depends on a host of so-called epigenetic phenomena—defined as any gene-regulating activity that doesn’t involve changes to the DNA code and that can persist through one or more generations. Over the past ten years, mainly thanks to the development of a more integrative and global approach, cell and molecular biologists have been able to show the four fundamental facts which, putted together, represent a major conceptual and experimental breakthrough in the life sciences. 1. Gene activity is influenced by the proteins that package the DNA into chromatin, the protein-DNA complex that helps the genome fit nicely into the nucleus, by enzymes that modify both those proteins and the DNA itself, and even by RNAs. Chromatin structure affects the binding of transcription factors, proteins that control gene activity, to the DNA. Proteins histones in chromatin are modified in different ways to modulate gene expression. The chromatin-modifying enzymes are now considered the “master puppeteers” of gene expression. During embryonic development, they orchestrate the many changes through which a single fertilized egg cell turns into a complex organism. And throughout life, epigenetic changes enable cells to respond to environmental signals conveyed by hormones, growth factors, and other regulatory macromolecules without to alter the DNA itself. In other words, epigenetic effects give a mechanism by which the environment can very stably change living beings. The most important point which needs to be stressed here is that chromatin is not just a way to package the DNA to keep it stable. All the recent work on acetylation, methylation, phosphorylation and histone modifications and their direct correlation with gene expression show that chromatin’s proteins are much more than static scaffolding. Instead, they form an interface between DNA and the rest of the organism. The topological and dynamical modifications of chromatin structure play a crucial role sometime for clearing the way for transcription and other times blocking it. The exact nature of these modifications remains mostly mysterious. One may think that the different modifications mean different things, because they recruit different kinds of proteins and prevent other kinds of modifications. 2. These proteins and RNAs control patterns of gene expression that are passed on to successive generations. A variety of RNAs can interfere with gene expression at multiple points along the road from DNA to protein. More than a decade ago, plant biologists


recognized a phenomenon called posttranscriptional gene silencing in which RNA causes structurally similar mRNAs to be degraded before their messages can be translated into proteins. In 1998, researchers found a similar phenomenon in nematodes, and it has since turned up in a wide range of other organisms, including mammals. RNAs can also act directly on chromatin, binding to specific regions to shut down gene expression. Sometimes an RNA can even shut down an entire chromosome. Furthermore, newly formed female embryos solve the so-called “dosage compensation problem—female mammals have two X chromosomes, and if both were active, their cells would be making twice as much of the X-encoded proteins as males′ cells do—with the aid of an RNA called XIST, translated from an X chromosome gene. By binding to one copy of the X chromosome, XIST, somehow sets in motion a series of modifications of its chromatin that shuts the chromosome down permanently. Thus, regulatory noncoding RNAs could be widespread in the genome, and influence gene function. The unit of inheritance, i.e., a gene, now extends beyond the sequence to epigenetic modifications of that sequence. Moreover, the various epigenetic profiles that generate phenotypic differences may retroact on the arrangement of gene sequences and influence thus the genome integrity. 3. In the 1950s, the late C. Waddington proposed an epigenetic hypothesis according to which patterns of gene expression, not genes themselves, define each cell type. Moreover, many biologists thought that the genome changes all the time as cells differentiate. Liver cells, for instance, became liver cells by losing unnecessary genes, such as those involved in making kidney or muscle cells. In other words, certain genes would be lost during development. One of the best clues for this phenomenon came from the realization that the addition of methyl groups to DNA plays some role in silencing genes—and that somehow the methylation pattern carries biological information over from one generation to the next. Besides, since the 1970s, cancer biologists observed that the DNA in cancer cells tends to be more heavily methylated than DNA in healthy cells. So methylation might contribute to cancer development by altering gene expression. The demonstration was furnished recently [12]. The combined observations that DNA methylation can result in repression of gene expression, and that promoters of tumor suppressor genes are often methylated in human cancers provided an alternative mechanism for the inactivation of these genes which does not involve genetic mutations. Thus, the changes in methylation in tumors are in fact the cause, and not merely a consequence, of tumor formation.


4. Many observational data concerning anatomic and morphological differences in the phenotypic lineage made researchers aware that there could be parent-specific effects in the offspring. Other observations through the centuries suggested that the genes passed on by each parent had somehow been permanently marked—or imprinted—so that expression patterns of the maternal and paternal genes differ in their progeny. These so-called imprints have since been found in angiosperms, mammals, and some protozoa. Over the past few years, several genes have been identified that are active only when inherited from the mother, and others turned on only when inherited from the father. Many imprinted genes have been found; about half are expressed when they come from the father and half when they come from the mother. Among these are a number of disease genes, including the necdin and UBE3A genes on chromosome 15 that are involved in Prader-Willi and Angelman syndromes, and possibly p73, a tumor suppressor gene involved in the brain cancer neuroblastoma. Several others, including Peg3 and Igf2, affect embryonic growth or are expressed in the placenta. In addition, a lot of organizational and morphological features of imprinted genes regarding the way in which they are arranged in the genome have been discovered; in particular, it has often been found that imprinted genes are clustered. For example, the H19 and Igf2 genes and six other imprinted genes are located near one another on human chromosomes 11 (11p15.5). Another finding is that the imprinted genes DKK1 and GTL2 are neighbors on human chromosome 14q32, arranged much the same way that one has found them in the mouse. The organization of the DNA around both these genes clusters is similar, suggesting that the surrounding DNA somehow specifies the imprinting arrangement. On both chromosome, genes next to one another are imprinted so as to be reciprocally expresses—that is, one is turned off when the other is turned on, depending on whether the chromosome come from the mother or the father. And in both cases one gene in the pair on each chromosome codes not for a protein but for an RNA that never gets translated into a protein. Indeed, an estimated one-quarter produces these non-coding RNAs. Finally, it has been found that on both chromosomes, the pairs of genes within the clusters are separated by a stretch of DNA that includes so-called CpG islands, regions of DNA where the bases cytosine and guanine alternate with one another (for more details on this theme, see section 6). That stretch of DNA contains a binding site for a protein called CTCF, which forms a chromosomal “boundary.” When CTCF is attached, it isolates DNA upstream of the binding site from DNA downstream. Recently, it has been showed a connection between methylation of some of the CpG islands, CTCF binding, and the activity of the H19 and Igf2 48

genes. The Igf2 gene is located before the H19 gene on chromosome 11; farther along the chromosome, after both genes, are regulatory regions called enhancers. Transcription can occur only if the enhancers interact with promoters located near each gene. One important recent finding is that CTCF binding blocks the enhancer’ access to the Igf2 promoter, thereby silencing that gene. However, the enhancer can still interact with the H19 promoter, which coincides with the CpG island and CTCF binding site. Thus H19 is active. But when the CpG island at the CTCF binding site is methylated, the enhancers cannot interact with the H19 promoter and instead cause the Igf2 genes to turn on.

10. General theoretical discussion of some fundamental themes in the life sciences We would like to conclude this article with some remarks on some new research roads in biology to which we should pay much more attention in the future. The crucial point is that there are, as we tried to show all along this paper, different layers of biological information depending on the level of organization one consider for studying properly the properties and behaviors of any living organism at the various stages of its embryogenetic development and overall growth. Moreover, these different layers are interconnected and may all get involved at the same time and in a coordinated way in cell’s activity and organism’s development. Let us point to few important aspects of this multi-layer organization of biological information. A first general worthwhile theoretical remark is that biologists are striving to move beyond a “parts list” to more fully understand the ways in which network components interact with one another to influence complex processes. Thus attention has turned to the analysis of networks that operate at many levels. At the scale of networks of interacting proteins that govern cellular function, the flagellated bacterium Caulobacter crescentus has been a model system for cell cycle regulation for at least 25 years. This example shows clearly that the transcriptional regulatory circuits provide only a fraction of the signaling pathways and regulatory mechanisms that control the cell. Rates of gene expression acting in cells are modulated through posttranscriptional mechanisms that affect mRNA half-lives and translation initiation and progression, as well as DNA structural and chemical state modifications that affect transcription initiation rates. Phosphotransfer cascades provide fast point-to-point signaling and conditional signaling mechanisms to integrate internal and external status signals, activate regulatory molecules, and coordinate the progress of diverse asynchronous pathways. As if this were not complex enough, one is now finding that the


interior of bacterial cells is highly spatially structured, with the cellular position of many regulatory proteins as tightly controlled at each time in the cell cycle as are their concentrations. The second remark relates to the proteomics challenge. Learning to read patterns of proteins synthesis could provide new insights into the working of the cell and thereby a better understanding of how organisms, including humans, develop and function. By identifying proteins on the scale of the proteome—which can involve tens or even hundreds of thousands of proteins, depending on the state of the cells being analyzed—proteomics can answer fundamental questions about biological mechanisms at a much faster pace than the singleprotein approach. The “global” picture painted by proteomics can, for example, allow cell biologists to start building a complex map of cell function by discovering how changes in one signaling pathways—the cascade of molecular events sparked by a signal such as a hormone or neurotransmitter—affect other pathways, or how proteins within one signaling pathways interact with each other. The “global” picture also allows medical researchers to look at the multiplicity of factors involved in diseases, very few of which are caused by a single gene. Proteomics is very likely one of the most important of the “post-genomic” approaches to understanding gene function because it is the proteins encoded by genes that are ultimately responsible for all processes that take place within the cell. But, while proteins may yield the most important clues to cellular function, they are also the most difficult of the cell’s components to detect on large scale. A second, complementary, post-genomic approach is expression profiling, also known as transcriptomics. When a gene is expressed in a cell, its code is first transcribed to an intermediary “messenger RNA” (mRNA) which is then translated into a protein. Transcriptomics involves identifying the mRNAs expressed by the genome at a given time. This gives a snapshot of the genome’s plans for protein synthesis under the cellular conditions at that moment. Transcriptomics can, specifically, yield important biological information about what genes are turned on, and when. But it has the disadvantage that, although the snapshot it provides reflects the genome’s plans for protein synthesis, it does not represent the realization of those plans. The correlation between mRNA and protein levels is poor, generally lower than 0.5, because the rates of degradation of individual mRNAs and proteins differ, and because many proteins are modified after they have been translated, so that one mRNA can give rise to more than one protein. Even in the simplest self-replicating organism, Mycoplasma genitalium, there are 24 per cent more proteins than genes, and in humans there 50

could be at least three times more. Post-translational modification of proteins is important for biological processes, particularly in the propagation of cellular signals, where, for example, the attachment of a phosphate group to a protein can trigger either activation or inactivation of a signaling cascade. So measuring proteins directly might provide more accurate picture of biological information involved in cell’s activity. The development of a proteomics program has led in recent years to a significant elucidation of the relationship between structure and function in biomolecules and to an important revision of the prevailing paradigm that (rigid) structure determines (linearly) function. Several studies on the role played by proteins and proteins interactions in biological phenomena have permitted to elucidate several misconceptions regarding the nature of the relation between the structure and function of biomolecules. (i) Since the overall three-dimensional structure of proteins is always much better conserved than their sequence, it is not uncommon for members of a protein family that possess no more than 10–30% sequence identity to have structures that are practically superimposable. Residues critical for maintaining the protein-fold and those involved in functional activity tend to be highly conserved. However, since proteins during evolution gradually lose some functions and acquire new ones, the residues implicated in the function will not necessarily be retained even when the protein-fold remains the same. Conservation of protein-fold will then not be correlated with retention of function since a link between structure and function would be expected only if attention were restricted to the functional binding site region instead of the whole protein. (ii) Another difficulty in analyzing correlations between structure and function lies in the fact that individual proteins usually have several functions. It has been estimated that proteins are able, on average, to interact with as many as five partners through a variety of binding sites. (iii) A further ambiguity lies in the term function itself. This term is used in different ways and a possible correlation with structure will depend on which aspect of function and which level of biological organization are being considered. Biochemists tend to focus on the molecular level and consider mainly activities like binding, catalysis or signaling. In many instances, the only activity that is discussed is binding activity and function is then taken as synonymous with binding. However, functions can also be defined at the cellular and organismic level, in which case they acquire a meaning only with respect to the biological


system as a whole, for instance by contributing to its health, performance, survival or reproduction. (iv) Protein functions can also be distinguished in terms of the biological roles they play at the organismic level and this has led to a classification in three classes corresponding to energy-, information- and communication-associated proteins. The link between such biological roles and protein structure is less direct than between binding activity and structure, since these functions tend to result from the integrated interactions of many individual proteins or macromolecular assemblies. (v) The prevailing paradigm structure determines function is often interpreted to mean that there is a causal relation between structure and function. Although a biological activity always depends on an underlying physical structure, the structure in fact does not possess causal efficacy in bringing about a certain activity. Causal relations are dynamic relations between successive events and not between two material objects or between a geometrical static structure and a physico-chemical event. A biological event such as a binding reaction can thus not be caused by something that is not an event, like the structure of one or both interacting partners. It is also impossible to deduce binding activity from the structure of one of the interacting molecules if a particular relationship with a specific partner has not first been identified. This is because a binding site is essentially a relational entity defined by the interacting partner and not merely by structural features that are identifiable independently of the relational nexus with a particular ligand. The structure of a binding site, as opposed to the structure of a molecule, cannot be described without considering the binding partner. Since the static geometrical structure of a protein is not the only and most important cause of its function, attempts to analyze structurefunction relationships should consist in uncovering correlations rather than (linear) causal relations. A conception shift is thus needed in proteomics, for there is not a unique, necessary and sufficient relation between the three-dimensional structure of a protein and its biological activity, but a nexus of dynamical relationships between protein complexes and their interactions and activities. There is definitely a direct and fundamental link between the topological folding of proteins, the tertiary forms which result from these folding and their dynamics in the context of cell’s activity. However, biological information of proteins does not derives only from structural information, but also from the complex functional networks which connect specific binding sites at the molecular level to the cell’s activity and to the more global organismic level of organization and working. 52

10.1. The need for a systems biology approach: on the role of interactions and emergent properties The third remark, which relates to the previous one, is aimed at highlighting the importance of a systems biology approach. System biology is about interactions rather than about constituents, although knowing the constituents of the system under study may be a prerequisite for starting description and modeling. Interactions often bring about properties (sometimes called emergent properties), for example, a system may start oscillating although the constituent alone would not. For example, evolutionary biologists have wondered for long jump-like transitions can occur in evolution. From the viewpoint of systems theory, the answer arises from bifurcations. In an non-linear system, at certain points in parameter space (called critical points) bifurcations occur, that is, a small change in a parameter leads to a qualitative change in system behaviour (e.g. a switch from steady state to oscillation). It is clear that the number of potential interactions within a system is far greater than the number of constituents. If only pairwise interactions were allowed, the former number would be n2 if the latter number were denoted by n. The number of interactions is even larger if interactions within triples and larger sets are allowed, as is the case in multi-protein complexes. In the sense of systems biology, a biological object or being is a system if emergent properties result from it. Genomics has certainly been a very important and fruitful undertaking and gave us much new insights into molecular biology. However, much of molecular biology is based on reductionism and simple determinism. It is an extreme exaggeration to say that the human genome has been “deciphered�. Besides the fact that not to all ORFs functions have been assigned yet, it should be acknowledged that even if all functions were known, we would be far from understanding the phenomenon of life because knowledge of all the individual gene products does not say much about the interactions between them. According to a system’s view of life, the study of the dynamics and interaction networks is essential for understanding the ways in which living organisms regulate their cellular activity and organize their physiological growth. One of the major goals of systems biology is to find appropriate ways of diagramming and mathematically describing the specific, complex interactions within and between living cells. Because complex systems have emergent properties, their behaviour cannot be understood or predicted simply by analyzing the structure of their components. The constituents of a complex system interact in many ways, including negative feed-back and


feed-forward control, which lead to dynamic features that cannot be captured satisfactorily by linear mathematical models that disregard cooperativity and non-additive effects. In view of the complexity of informational pathways and networks, new types of mathematics are required for modeling these systems. It is worth of noticing that the specificity of a complex biological activity does not arise from the specificity of the individual molecules that are involved, as these components frequently function in many different processes. For instance, genes that affect memory formation in the fruit fly encode proteins in the cyclic AMP (camp) signaling pathway that are not specific to memory. It is the particular cellular compartment and environment in which a second messenger, such as camp, is released that allow a gene product to have a unique effect. Biological specificity results from the way in which these components assemble and function together. Interactions between the parts, as well as influences from the environment, give rise to new features, such as network behaviour which are absent in the isolated components. Consequently, “emergence” has appeared as a new concept that complements “reduction” when reduction fails. Emergent properties resist any attempt at being predicted or deduced by explicit calculation or any other means. In this regard, emergent properties differ from resultant properties, which can be predicted from lower-level information. For instance, the resultant mass of a multi-component protein assembly is simply equal to the sum of the masses of each individual component. However, the way in which we taste the saltiness of sodium chloride is not reducible to the properties of sodium and chlorine gas. An important aspect of emergent properties is that they have their own causal powers, which are not reducible to the powers of their constituents. According to the principles of emergence, the natural world is organized into stages that have evolved over evolutionary time through continuous and discontinuous processes. Reductionists advocate the idea of “upward causation” by which molecular states generally bring about higher-level phenomena, whereas proponents of emergence admit “down-ward” causation by which higher-level systems may influence lower-level configurations.

10.2. The chromatin code as a complex regulatory principle of cell activity The last remark relate to the dynamic reorganization of chromatin during the cell cycle. Chromatin remodeling faces major questions concerning the intricate and multi-level interplay between the topological plasticity of nuclear structures involved in genome regulation and cell


activity and the ever-increasing complexity of gene regulatory networks. The experimental evidence suggests that chromatin form and its modifications play a critical role in gene regulatory coding (gene activation or gene silencing), in the emergence of cellular differentiation and in development. Even if genomic DNA is the ultimate template of our heredity, clearly DNA is far from being the exclusive entity responsible for generating the full range of information that ultimately results in a complex eukaryotic organism, such as ourselves. We favor the view that epigenetics, imposed at the level of DNA-packaging proteins (histones and nonhistones), is a critical feature of a genome-wide mechanism of information storage and retrieval that is only beginning to be understood. All the theoretical and experimental work we considered along this paper suggests that a “chromatin code” exists that may considerably extend the information potential of the genetic (DNA) code. Chromatin code is a second layer of coding implemented by histone tail post-translational modifications outside the nucleosome. This second-level code is required in eukaryotic cells to provide the additional information necessary to process their long genome (compared to prokaryote ones). There is more and more evidence that histone proteins and their associated covalent modifications contribute to a mechanism that can alter chromatin structure, thereby leading to inherited differences in transcriptional “on-off” states or to the stable propagation of chromosomes by defining a specialized higher order structure at centromers. Differences in “on-off” transcriptional states are reflected by differences in histone modifications that are either “euchromatic” (on) or “heterochromatic” (off). The “chromatin code” is read out by coregulators analog and prior to the reading out of the primary coding (the nucleotide sequence) by transcription factors, and then translated by means of different transcriptional and posttranscriptional steps into biological functions. Furthermore, it has been shown that control of chromatin packaging, into condensed or decondensed fibers play a major role in the regulation of gene expression. Several complex events are associated with the decondensation of chromatin, which is characteristic of the “open” or active state. Genes in open regions of chromatin can be expressed efficiently whereas genes in condensed or closed regions are silent. Recent research suggests that large regions of chromatin are literally “ploughed” open by RNA polymerase II complex in a process known as intergenic transcription. The RNA polymerase complex has been shown to contact a number of factors capable of modifying the structure of the chromatin fiber by adding chemical side groups to nucleosomes and other chromatin proteins. These modifications are thought to increase the accessibility of the genes within these regions or 55

domains resulting in augmented binding of transcription factors to the gene. However this alone is not enough for efficient gene expression. Many genes require additional regulatory regions of DNA known as enhancers that are often located at considerable distances from the gene along the chromatin fiber. One has recently shown that distant enhancers actually physically contact their target genes in the nucleus by looping out the intervening DNA. Such long-range interactions between enhancers and genes are powerful switches that turn on transcription of individual genes resulting in high levels of expression. Recent works suggests that these regulatory interactions between enhancers and genes can only occur if the chromatin containing them is first remodeled to the open state by intergenic transcription. These are essential processes in the chain of events that control gene expression. All the previous clearly indicates that there is a severe correlation between the different local plastic modifications and the overall topological reorganization of chromatin structure and the series of events that lead to high levels of genes expression. Chromatin remodeling and dynamic chromatin reorganization is a key regulatory principle whose processing is essential to open the way to gene activity, but also to control cell differentiation and to orchestrate embryonic development. Overall chromosome stability and identity seem to be influenced by epigenetic alterations of the underlying chromatin structure. In keeping with the distinct qualities of accessible and inaccessible nucleosomal states, it could be that “open” (euchromatic) chromatin represents the underlying principle that is required for inheritance of progenitor character and young cells division. Conversely, “closed” (heterochromatic) chromatin is possibly the reflection of a developmental “memory” that stabilizes lineage commitment and gradually restricts the self-renewal potential of our somatic cells. Whatever it may be, epigenetics imparts a fundamental regulatory system beyond the sequence information of our genetic code and emphasizes that Mendel’s gene is much more than just a DNA moiety.

11. Some far-reaching paths of research in the biological sciences and concluding remarks All what we tried to show in the previous analysis is, first, that the phenomenon of epigenetics clearly reveal the existence of a cryptic code—that is, a systemic web of regulated processes— of physico-chemical (the DNA sequence is chemically altered) and topological (the structure of chromatin is spatially modified into new forms) nature which is written (or


unfolds) over our genome’s DNA sequence. Secondly, which is still more fundamental, that not only is the DNA sequence important but also how gene activity is regulated in response to environment. In other words, epigenetics gives greater place to the interactions of genes with their environment, which bring the phenotype into being. Thus, the epigenetic phenomena refer to extra layers of nuclear plasticity and information processing that influences in an essential way gene activity and cell’s functioning without altering the DNA sequence. Furthermore, recent research shows that the epigenetic code is “read” out by coregulator complexes analog and prior to the reading out of the primary coding (the nucleotide sequence) by transcription-factors [70], [105]. These coregulators interact with other genome maintenance and regulation pathways for permitting the cellular transcription machinery to “interpret” properly the gene regulatory code. Chromatin is indeed highly dynamic rather than static, and this dynamism represents another important mechanism of gene regulation in eukaryotes. Biologists now understand that cell must continually “remodel” the local chromatin structure to give regulatory molecules access to the DNA substrate. This remodeling rests on the action of multi-protein complexes, and generally involves either the hydrolysis of ATP to alter histone-DNA interactions, or the post-translational modification of histone tails. In contrast to transcription factors, coregulator complexes, while not interacting directly with DNA, frequently generate appreciable “synergies” between the actions of transcription factors, with changes in concentration having disproportionate (non-additive) consequences on the rate of transcription. For example, studies show that the nuclear receptor transcription intermediate factor-2 (TIF2) promotes synergy between two transcription activators. Researchers have also found that coregulators can act to switch transcription factors from being repressors to being activators, as well as link gene regulation to other molecular processes such as DNA maintenance and replication. All this suggests that transcription regulation can be predicted accurately only if coregulators’ activity is taken into account. The sequence of the human genome is the same in all our cells, whereas the epigenome differs from tissue to tissue, from organ to organ, from organism to organism, and changes in response to the cell’s environment. Epigenetic codes are much more subject to environmental influences than the DNA sequence. This could also help to explain how lifestyle and toxic chemicals affect susceptibility to diseases. In fact, up to 70% of the contribution to particular disease can be nongenetic. The challenge today is to pin down this vast, complex and everchanging code in a meaningful way. The diversity of epigenomes in different cell types means 57

that it may not make sense to restrict our study to one single tissue, or to a particular time in tissue’s development, but the epigenome of all tissues and the overall organism’s growth processes has to be mapped out. If the DNA sequence is like the musical score of a symphony, the epigenome is like the key signatures, phrasing and dynamics that show how the notes of the melody should be played. The musical score is only one component of the symphony rather static and silent, which is necessary but not at all sufficient for cells to start and develop their multi-level activity. When thinking at the systems level, it become clear that genes only matter because they are one of many cellular and organismic codes, each of them contribute to the construction of an organism by conveying a specific form of biological information. No single gene is more or less important than any other, and the loss of function of gene x causing phenotype X is not itself an interesting observation. It is only interesting if we can begin to qualitatively and quantitatively explain how gene x interacting with genes w, y and z together produce phenotype X in context A but not in context B, and what predictive value this interaction has on the system. Other stated, the complex processes of reading and interpreting the genome can take place only in the context of the embryonic development (during which, besides, the genome is reprogrammed many times), the action of all proteins, all lipids and other cellular mechanisms inherited from one parent. There are at least one hundred of different proteins involved in the cellular machinery, and without their action the genome couldn’t be expressed, which means that the biological informational content of DNA sequences would be very poor without this role of orchestration and construction of organisms played by proteins. Another fundamental fact is the role at almost every level of organization and communication in living cells of protein–protein interaction. Before the proteomics revolution, it was known that proteins were capable of interacting with each other and that protein function was regulated by interacting partners. However, the extent and degree of the protein–protein interaction network was not realized. It is now believed that, not only are the majority of proteins in an eukaryotic cell involved in complex formation at some point the life of the cell, but also that each protein might have, on average, 6-8 interacting partners. The way in which proteins interact ranges from direct apposition of extensive complementary surfaces to the association of specialized, often modular domains, to short, unstructured peptide stretches (“linear motifs”). With the availability of complete genome sequences, it has emerged that up to 30% of the proteomes in higher organisms consists of natively disordered elements. This material is now understood to often have a major role in the formation of large 58

regulatory protein complexes by incorporating linear binding motifs or acting as spacers between protein-binding and activity-bearing modules. Association of such a sequence with modular domains affords grater flexibility because both the peptide and peptide-binding domain can more readily be separated from the functional core of the binding partners (such as the active site of an enzyme) and, thus, do not usually impose considerable evolutionary constraints on the domains that support activity. Peptides or linear motifs can bind to proteins in a variety of ways. They are frequently held in an extended conformation, but recognition motifs can also consist of ß-turns, ß-strands or α-helical structures. One particularly interesting case is the protein–protein interactions that involves association of a ß-strand from the ligand with a strand or a ß sheet in the binding partner. The point to be stressed is that different kinds of ß-strand additions mediate protein–protein interactions in important cell processes, such as cell signaling or host-pathogen interaction. We have to consider that a fully detailed image of a complex organism requires knowledge of all of the proteins and RNAs produced from its genome. Due to the production of multiple mRNAs through alternative RNA processing pathways, human proteins often come in multiple variant forms. Only a global view of splicing regulation combined with a detailed understanding of its mechanisms will allow us to paint a picture of an organism’s total complement of proteins and of how this complement changes with development and the environment. From the very beginning of a living organism—i.e., the fertilized egg and embryogenesis—these proteins and cellular system promote and control transcription and posttranscriptional modifications. And it is this whole system that enable cellular machinery to translate the lower layers of chemical information stored in the DNA sequence and packaged in the nucleus in viable and significant biological information and further to interpret genes properly in the framework of cell’s differentiation and organism’s growth. The comprehension is now increasingly emerging that there are many crucial biological questions that cannot be resolved only by means of genetic sequencing and local molecular mechanism analysis. Development and evolution, the formation and the function of the neural networks in the brain are processes that are not easily broken down into elements corresponding to effects of individual genes, individual biochemical components, or even individual cells. A whole-istic or systems approach seems to be required, and this is a challenge for theoretical as well as for molecular biologists: in particular, if development as such is to be understood, we need to uncover—presumably in a topological-combinatorial and dynamical manner—patterns of the activation of different sets of genes in its course. In this 59

respect, it has been recently demonstrated that a genomic regulatory network can explain the early development of the sea urchin embryo. These regulatory circuits prescribe the ordered expression of genes that determine the fates of developing cells and move those cells together down a one-way path to yield a functional organism. The definition of whole-istic (systems) biology needs a theoretical integration of mathematics, physics and biology for a better understanding, for instance, of a range of complex biological regulatory systems. It is very likely that the answer to many interesting biological issues lie on the frontier of mathematical patterns, physical constraints and biological processes. In the previous section we tried to show two very significant examples where mathematics, physics and biology interact very deeply, namely the topological manipulations of topoisomerases on DNA and the spatial folding of DNA into chromatin structure within the nucleus. In both cases, it is impossible to separate the three following levels of activity and organization of cells and organisms, namely the extremely accurate conformational flexibility of macromolecules, which seems to be a fundamental property of living matter acting locally and globally, the tendency of biological systems to work cooperatively into a wealthy variety of complex regulatory networks for ensuring the overall physiological integrity of organisms, and the multilevel dynamics whose action is very essential for sustaining biochemical metabolism, cells activity and the growth of any embryo into an adult organism. Among the different dynamical principles acting in living systems, two seem to be very pervasive: the continuous remodeling of macromolecular structures the cell’s nucleus is made of (especially chromatin and chromosome), and the role of selforganization in the formation, maintenance and organization of cellular structures. The most important processes related to the remodeling of macromolecular structures like chromatin and chromosome are those of folding (which lead to their condensation) and unfolding (conducting to their decondensation). These processes are connected, respectively, to conformational constraints (large-scale interactions and binding sites connectivity), and to organizational regulatory functions (expression of genes and cells activity). To the analysis of the first principle is dedicated a large part of this article. Let us make few brief remarks on the last principle. In contrast to the mechanism of self-assembly which involves the physical association of molecules into an equilibrium structure (for example, virus and phage proteins self-assemble to true equilibrium and form stable, static structures), the concept of self-organization is based on observations of chemical reactions far from equilibrium, and the associated processes involve the physical interactions of molecules in a 60

steady-state structure. It is well established in chemistry, physics, ecology, and sociobiology. Self-organization in the context of cell biology can be defined as the capacity of a macromolecular complex or organelle to determine its own structure based on the functional interactions of its components [117], [94]. In a self-organizing system, the interactions of its molecular parts determine its architectural and functional features. The processes that occur within a self-organized structure are not underpinned by a rigid architectural framework, rather, they determine its organization. For self-organization to act on macroscopic cellular structures, three requirements must be fulfilled: a cellular structure must be dynamic, material must be continuously exchanged, and an overall stable configuration must be generated from dynamic components. Observations from recent studies on the dynamic properties of cellular organelles indicate that many macroscopic cellular structures, such as cytoskeleton, the cell nucleus and the Golgi complex, fulfill the requirements for self-organization. These structures are characterized by two apparently contradictory properties. On one hand, they must be architecturally stable; on the other hand they must be flexible and prepared for change. Selforganization ensures structural stability without loss of plasticity. Fluctuations in the interaction properties of its components do not have deleterious effects on the structure as a whole. However, global and persistent changes rapidly result in morphological changes. The basis for the responsiveness of self-organized structures is the transient nature of the interactions among their components. The dynamic interplay of components generates frequent windows of opportunity during which proteins can change their interaction patterns or be modified. The effective availability of components is controlled by posttranslational modifications via signal transduction pathways. Another important point is the following: The key principle that runs through many large biological systems (proteins networks, cell components) is the notion of specificity, which enable the different constituent molecules or macromolecules to recognize each other and exclude others that do not belong, so that no external instructions are necessary to form the assembly. In other words, the pattern of an ordered structure is built into the bonding properties of its constituents, so that the system “assembles itself” without the need for a scaffold, which means that the system is capable of self-organization. It has to be stressed that a systems approach may benefit strongly from currently discussed large-scale programs of “transcriptomics” and “proteomics”. These include systematic studies of the expression of messenger RNA and proteins within cells of one and the same organism under different conditions of development. Further aspects of such post-genomic or 61

epigenomics programs are systematic comparative analysis of structures, modules and functions of proteins and regulatory sequences outside of genes, as well as their posttranslational modifications and associations. For the understanding of cell differentiation, the regulatory functions of noncoding sequences are of particular importance. Many different fundamental issues are connected with such programs. For example, for the developmental biologist, it is hoped that in this way the internal order of the network of gene regulation, its relation to morphogenetic processes, may be revealed. Comparison of different organisms may allow us to reconstruct pathways of evolution with respect to protein structure and function and the genomic organization of the regulation of gene activities. One of the many differences between biology and the physical sciences lies in the uniqueness of biological entities and the fact that these are the product of a long history [82], [122]. Living beings are truly historical structure. It can be said that all biological orders result from structural and geometrical constraints, biological robustness and adaptation, epigenetic flexibility and variability, and historical contingencies (the possible pathways of evolution). This simultaneously controlled and contingent natural history unfolds along different scales in time and space and follows different possible paths, so that living systems may encounter bifurcations, singularities and criticalities during their development. In fact, time and space are highly dynamics as own-proper parameters and also in the sense that they are the very outcome of the action of intrinsic dynamics and of their interactions with external factors or constraints, such as environmental changes. This problem may also be characterized by saying that there are not absolute phenomena in biology. Another outstanding feature of all organisms is their unlimited organizational and dynamical complexity. Every biological system is so involved in multiple interactions and pathways, so rich in rich in feedback devices, so plenty in retroactions and unforeseen effects, that one wonders whether a complete description is possible. As one goes to higher levels of organization not all the properties of the new entity are knowable consequences of the properties of the components, no more than chemistry is, in practice, predictable from physics, whatever shares, in principle, many general principles and laws of physics. We mention a last significant characteristic of complex structures one finds in biology, namely, the difference between contingency (variation) and necessity (sameness). In the inorganic world this distinction may be illustrated by the problem of the shape of snow crystal [148]. The fairly correct explanation, known since the time of Kepler and Descartes, is that the hexagonal form of the crystals is produced by the close packing in a plane of spherical water globules; in more precise terms, the internal structure 62

involves puckered hexagonal layers. The hexagonal symmetry is thus a necessity and follows from what Kepler called the demands of matter. (Necessity does not mean, however, that the crystals fulfill a universal law, since some of them may present erratic features or even another kind of symmetries). But what of the external (or, more exactly, dynamic) shape in the living world? Many different individual shapes are found and each is contingent on the particular history of its formation. How the symmetry of the dynamic shape is maintained during growth remains an unsolved problem. A few ideas are at the core of this paper. To conclude, it might be useful to rephrase them as brief statements: (i) The DNA sequences do not contain all the information for producing an organism. In other words, the genomic DNA is not the sole purveyor of biological information, first of all because, as we had thoroughly shown, in most relevant cases and situations, genes are expressed in an epigenetic framework and activated within specific cellular regulatory activities. Even the so-called central dogma of molecular biology, namely, that DNA sequences defines protein sequences in a way that the latter do not define the former (or, other stated, that DNA sequence contains the necessary and complete information for a protein), is only partly true and it need to be deeply revised. Think, for example, of the evident fact that genetic code is degenerate, and also of the more important phenomenon that noncoding regions of the DNA can vary without any effect on the final protein product. In eukaryotes, these noncoding regions occur within as well as between genes. Because of this and other complexities, a DNA sequence and the genetic code cannot be used by themselves to predict protein sequences. We need more than just a sequence and the code—for instance, the boundaries between coding and noncoding regions [139]. The point is that a fully detailed picture and understanding of a complex organism requires knowledge of all the protein sets and RNAs produced from its genome, and of how this complement changes with development and the environment [19]. It should here be recalled that due to the production of multiple mRNAs through alternative RNA processing pathways, human proteins often come in multiple variant forms. Variation in mRNA structure takes many different forms. Exons can be spliced into the mRNA or skipped. Introns that are normally excised can be retained in the mRNA. The position of either 5′ or 3′ splice sites can shift to make exons longer or shorter. In addition to these changes in splicing, alterations in transcriptional start site or polyadenylation site also allow production of multiple mRNAs from a single gene. All of these changes in mRNA structure can be regulated in diverse ways, depending on sexual genotype, cellular 63

differentiation, or the activation of particular cell signaling pathways. The effect of altered mRNA splicing on the structure of the encoded protein is similarly diverse. In some transcripts, whole functional domains can be added or subtracted from the protein coding sequences. In other systems, the introduction of an early stop codon can result in a truncated protein coding sequence or an unstable mRNA. Changes in splicing have been shown to determine the ligand binding of growth factor receptors and cell adhesion molecules, and to alter the activation domains of transcription. In other systems, the splicing pattern of an mRNA determines the subcellular localization of the encoded protein, the phosphorylation of the protein by kinases, or the binding of an enzyme by its allosteric effector. Determining how these sometimes subtle changes in sequence affect protein function is a crucial question in many different problems in developmental and cell biology including control of apoptosis, tumor progression, neuronal connectivity and the tuning of cell excitation and cell contraction. A quite recent discovery in Drosophila is both a fascinating example of the subtle structural changes that can be made in a protein and a remarkable demonstration of the number of proteins that can be produced from a single gene using alternative splicing. The Drosophila genome contains approximately 13, 600 identified genes, whereas the single DSCAM gene can produce nearly three times that number of proteins. It has been a puzzle that an organism as complex as a fly would need so few genes to describe all of its functions. It seems clear that due to alternative splicing the gene number is not an estimate of the protein complexity of the organism. This has many important theoretical consequences for biological sciences, which should lead to abandon the very incomplete description of life that has been conveyed by molecular biology in the last fifty years, and at the same time to the working out of a new vision of living forms and functions (on this topic see the interesting book of D. Noble [123]). The other crucial point, related to the previous one, is that other kind of biological information comes from other (chemical, physical, conformational, organizational and environmental) properties and different levels of organization of living systems. We already addressed in detail this point in different sections of the paper. (ii) In addition, the concept of “information� itself is misleading and really reductionist to take into account the highly complex processes responsible for gene expression and regulation and cell activity, as recent researches in epigenetics plainly demonstrate. For biologists, reductionism means that a particular characteristic of a living organism can be explained in terms of chemistry and physics. This would, in other words, eliminate the need for biology as a science. The problem with biology, unlike physics, is that its objects of interest are 64

extremely complex. Exploring the limits of reductionism in biology is important, because there is ample evidence that many fields of biological studies are non-reductionist in nature; in other words, much of biology cannot be reduced to physico-chemical properties. (iii) Many relevant biological mechanisms involved particularly in the development of the embryo and in morphogenesis reside less in the genomic DNA than in those epigenetic phenomena such as chromatin dynamics and chromosome organization, which control cell fate and the embryo development. For example, is has been showed that organization and cell fate switch respond to positional information in certain plants; in other words, cells are sensitive to (and are controlled by) the spatial location of gradients of morphogenetic substances or morphogens in tissues in the developing embryo [47], [167]. The cells would have “sensors” that would respond differently to different concentrations of the gradient. If the morphogen was a transcription factor, then enhancer or promoter factors might bind the morphogen at different strengths. For example, if a morphogen was being made at the anterior of the body, the genes responsible for organizing head development might have an enhancer that would bind the morphogen poorly. Only when there was a large concentration of the morphogen present would that gene be active. The gene(s) responsible for thorax formation, on the other hand, might have an enhancer that would bind the morphogen rather well, enabling it to respond to relatively low levels of that morphogen. The cells of the head would express both these genes, while the cells of the thorax would express only that gene whose enhancer could bind low amounts of the morphogen. The cells in the posterior portion of the body would not see any of this morphogen, and neither of these genes would be activated. In this way, cells could sense the presence of a morphogen and respond differentially, depending on the morphogen concentration. The sensor would not have to be an enhancer; it could just as well be a cell-surface receptor for a specific growth factor. The interpretation of gradients does not have to be linear. Indeed, starting in the 1980s, researchers have used indeed multigradient models in developmental biology. A simple and nice example is a two-gradient model used to explain the development of “eyespot” patterns on butterfly wings. One gradient consists of a linear diffusion of a morphogen. The second gradient involves the interpretation of this morphogen; in other words, the sensitivity threshold of the cells involved differs at different regions or morphodynamic domains of the wing. The existence of the second gradient gives rise to an elliptical spot, not the circular spot that would result if the sensitivity gradient were absent. Let us return for a moment to plants. Many types of plant cell retain their developmental plasticity and have the capacity to switch fate when exposed to a new 65

source of positional information. For example, in the root epidermis of Arabidopsis, cells differentiate in alternating files of hair cells and non-hair cells, in response to positional information and the activity of the homoeodomain transcription factor GLABRA2 (GL2) in future non-hair cells. It has been shown, by three-dimensional fluorescence in situ hybridization on intact root epidermal tissue, that alternative states of chromatin organization around the GL2 locus are required to control position-dependent cell-type specification. When, as a result of an atypical cell division, a cell is displaced from a hair file into a non-hair file, it switches fate. What was observed is that during this event the chromatin state around the GL2 locus is not inherited, but is reorganized in the G1 phase of the cell cycle in response to local positional information. This ability to remodel chromatin organization may provide the basis for the plasticity in plant cell fate changes. (iv) Gene are not all in life, and genetic coding in unable to explain many fundamental biological processes of living systems, for example, chromatin structures and chromosome spatial organization, cytoskeleton dynamics and mobility, cells signalization and communication, the mechanisms of patterns formation, and the embryogenesis development. Moreover, it is nowadays largely acknowledged the role of the non-genetically encoded properties of elasticity and of deformability of biological structures at the macromolecular, the cell or multi-cellular level, into the regulation and the generation of active physiological processes. Two very significant examples are, first, the motor role of biological membrane elasticity into the budding driving force of vesiculation initiating plasma membrane endocytosis, and second, the role of geometrical strains and deformations of embryonic tissues into the regulation of developmental genes expression during the early steps of Drosophila embryo development at gastrulation [62], [31]. The origins and the nature of the forces driving plasma membrane vesiculation still remain up to now an open question. Two main mechanisms have been proposed in recent years. The first one belongs to local bending forces thought to be developed by the Clathrin polymerization at the cytosolic surface of the membrane, giving rise to the curvature of the associated plasma membrane. The second belongs to the physical properties of elasticity of the phospholipids bilayer membrane. More specifically, the ubiquitous activity of trans-membrane translocation of phospholipids was proposed to generate the bending constraint necessary for vesicularisation in living cells, due to the induction of a difference of surface area between the two-coupled elastic leaflets of the phospholipids bilayer membrane. Such mechanism allowed to understand one of the Tangier disease cell phenotypes: an anomalous high dynamics of endocytosis (strongly perturbing the 66

cholesterol transporters traffic) that was genetically linked to a mutation of the inverse pump of PS, from the inner to the outer of the plasma membrane. Regarding the second example, what briefly can be said is that, during embryogenesis, the geometrical morphing of the embryo is controlled by a sequence of active morphogenetic movements, that is, under the control of developmental genes expression. It is very likely that these epigenetic constraints influence, in return, the expression of some of the developmental genes. It seems that there are developmental genes which expression is geometric-sensitive [62]. In the model of the Drosophila embryo, which developmental genes are well-known during early developmental stages [69], two different methods have been proposed to answer the issue. The first one consisted in searching developmental genes which expression pattern could be profoundly modified in response to an induced deformation exogenously, like twist, applied to the entire embryo. Five developmental genes were identified as geometric-sensitive to such a deformation, including the four developmental genes regulating the dorso-ventral polarity of the embryo. The second is the identification of cells specifically strained by endogenous morphogenetic movements at gastrulation, potentially resulting in a mechanical modulation of the expression of some of the developmental genes previously found as geometric-sensitive. Two distinct morphogenetic movements were, and are studied, involving geometrical induction of twist expression. Effectively, it has been found that twist is expressed in response to endogenous geometrical strains applied to anterior pole stomodeal cells, due to the convergent extension morphogenetic movement of gastrulation. Indeed, this geometrically induced expression of twist could participate to the control of the anterior gut tracks formation, twist expression being necessary for such formation initiated from stomodeal cells. The noteworthy meaning of this above described mechanism is that geometrical force applied to a fly embryo influences the expression of its developmental genes. So not everything is purely genetic and some remarkable features of the living cells and organisms are also geometric-sensitive, so that these cells and organisms may reorganize their form in response to dynamical constraints. It remains to be established whether this phenomenon also applies to human tissues and organs. And could be that geometrical pressures (coupled to environmental chemical-like stresses [114]) exerted on tissues and organs play a role in genes deregulation? (v) In fact, the concept of creativity (which include the ideas of mobility, action and emergence) comes closer to describing the process of development, rather than the prevalent notion of simply following a set of instructions (i.e., a mechanical code). Biological development of an organism is not merely a read-out and implementation of a set of genetic 67

instructions. Development is a continuing interaction between the “painter genotype” and the “canvas phenotype” that finally produces a living organism [39]. The generation of an individual entity through developmental processes is indeed a more creative act than the purely mechanical concept of molecular copying and reproducing could explain (the aboveexplained property of self-organization serves as a beautiful example which illustrate clearly certain striking features of creativity in biological structures and patterns.) It is due to the emerging-like process of development that no two biological entities are exactly the same— not even monozygotic twins or clones (as we thoroughly shown in section 7). This does not, however, mean that there are no rules or boundaries for growth and development to proceed [29], [152]. Certainly, there is an intrinsic consistency and reproducibility in development, which ensures that the principle “like begets like” is maintained. (vi) The formation of patterns during development (cells differentiation, tissues shaping, organogenesis) cannot be explained solely by genetic coding and molecular mechanisms. We need much more than the metaphors of coding and machines. For example, the principles of topological flexibility and dynamic organization allow (at least in part) for explaining how such patterns are generated by functional- driven remodeling processes (such as in chromatin folding or in chromosome reconfiguration), by geometric and mechanical constraints (see point (iv) above), and by principles of self-reorganization under the action both of intrinsic dynamical factors (chemical, kinetic and catalytic parameters and reactions) and the influence of external or environmental factors (energetic, metabolic, ecc.), Another interesting example is the dynamic control exerted by positional (or space-dependent flow of) information on development of the early Drosophila embryo. In fact, morphogen gradients contribute to pattern formation by determining positional information in morphogenetic fields. In Drosophila, maternal gradients establish the initial position of boundaries for zygotic gap gene expression, which in turn convey positional information to pair-rule and segmentpolarity genes, the latter forming a segmental pre-pattern by the onset of gastrulation. However, it has been reported recently substantial anterior shifts in the position of gap domains after their initial establishment; and further showed that these shifts are based on a regulatory mechanism that relies on asymmetric gap–gap cross-repression and does not require the diffusion of gap proteins. The analysis implies that the threshold-dependent interpretation of maternal morphogen concentration is not sufficient to determine shifting gap domain boundary positions, and suggests that establishing and interpreting positional information are not independent processes in the Drosophila blastoderm [84]. 68

(vii) We would like to conclude by stressing the fact that a true new theory of living organisms not only requires new mathematics, new physics, and new epistemology, but also that some striking new ideas and methods borrowed from each of these disciplines be worked out together and merged into a meaningful interdisciplinary and global explanation of biological systems. Such an approach might contribute to restore to our fragmented biological sciences the kind of integration and unity they very need. The plethora of recent biological observations and theoretical models my pave the way to discover new mathematical objects and concepts and open to new frontier-problems, and conversely, biology may benefit from these mathematical structures in structuring experimental data and clarifying their descriptions. This gathering of deep mathematical concepts, physical principles and epistemological analysis is necessary for thinking to a kind of relational biology, which will enable us to explore relationships between systems, properties and behaviors of living organisms. An attempt should be made to construct a theory—say, a semiobiology— in which the informational language (code, program, computation) so characteristic of molecular biology be completed by (and somehow translated into) the language of dynamical systems (phase space, bifurcations, trajectories) and the language of topology (deformations, plasticity, forms). This considerations lead ultimately to the suggestion that our traditional modes of system representation, involving fixed sets of sequential states together with imposed mechanical laws, strictly pertain to an extremely limited class of systems that can be called simple (static) systems or mechanisms. Biological systems are not in this class and they must be called complex or dynamic. Complex systems can only be in some sense approximated, locally and temporally, by simple ones, and to may be described a bouquet of new mathematical idea is necessary. Such a fundamental change of viewpoint leads to a number of theoretical and experimental consequences, some of which are described in this work.

References [1] Abbott, A., “A post-genomic challenge: learning to read patterns of protein synthesis”, Nature, 402 (1999), 715-720. [2] Ageno, M., Dal vivente al non vivente, Theoria, Rom, 1992. [3] Alberghina, L., Westerhoff, H.V. (eds.), Systems Biology–Definitions and Perspectives, Springer, Berlin, 2005. [4] Alberts, B., et al., Molecular Biology of the Cell, fourth edition, GS Garland Science, New York, 2002.


[5] Almouzni, G., “Assembly of spaced chromatin: Involvement of ATP and DNA topoisomerases”, EMBO J., 7 (1988), 4355-4365. [6] Almouzni, G., Kaufman, P.D., “DNA replication, nucleotide excision repair, and nucleosome assembly”, in S.C.R. Elgin and J.L. Workman (eds.), Chromatin and gene expression, Oxford University Press, Oxford, 2000, 24-48. [7] Almouzni, G., Khochbin, S., Dimitrov, S. and Wolffe, A. P., “Histone acetylation influences both gene expression and development of Xenopus laevis”, Dev. Biol., 165 (1994), 654-669. [8] Anderson, P.W., “More is different. Broken symmetry and the nature of the hierarchical structure of science”, Science, 177 (1972), 393-396. [9] Atlan, H., La fin du «tout génétique»? Vers de nouveaux paradigms en biologie, INRA Éditions, Paris, 1999. [10] Atlan, H., Cohen, I.R., “Immune Information, Self-organization and Meaning”, International Immunology, 10(6), 1998, 711-717. [11] Azzalin, C.M., Reichenbach, P., Khoriauli, L., Giulotto, E., and J. Lingner, “TolomericRepeat–; Containing RNA and RNA Surveillance Factors at Mammalian Chromosome Ends”, Science, 318(5851), 2007, 798-801. [12] Ballestar, E., Esteller, M., “The epigenetic breakdown of cancer cells: from DNA methylation to histone modification”, Prog. Mol. Subcell. Biol., 38 (2005), 169-181. [13] Bates, A. and A. Maxwell, DNA Topology, Oxford University Press, Oxford, 1993. [14] Belmont, A. S., “Mitotic chromosome scaffold structure: new approaches to an old controversy”, Proc. Natl. Acad. Sci. USA, 99(25), 2002, 15855-15857. [15] Benecke, A., “Chromatin code, local non-equilibrium dynamics, and the emergence of transcription regulatory programs”, Eur. Phys. J. E, 19 (2006), 379-384. [16] Benecke, A., “Genomic plasticity and information processing by transcriptional coregulators”, ComPlexUs, 1 (2003), 65-76. [17] Berger, J.M., Gambin, S.J., Harrison, S.C., and Wang, J.C., “Structure and Mechanism of DNA Topoisomerase II”, Nature, 379 (1996), 225-232. [18] Bickmore, W. A., Teague, P., “Influences of chromosome size, gene density and nuclear position on the frequency of constitutional translocations in the human population”, Chromosome Res., 10 (2002), 707-715. [19] Black, D.L., “Protein Diversity from Alternative Slicing: A Challenge for Bioinformatics and Post-Genome Biology”, Cell, 103 (2000), 367-370. [20] Bock, G. R., Goode, J. A. (eds.), Complexity in Biological information Processing, Novartis Foundation Series, John Wiley & Sons, New York, 2001. [21] Boi, L., “Beyond genetic determinism and natural selection. New approaches in the study of living beings and biological systems”, Biology and Philosophy, 2008 (to appear). [22] Boi, L., “Geometrical and Topological Modeling of Supercoiling in Supramolecular Structures”, Biophysical Reviews and Letters, 2(3), 2007, 1-13.


[23] Boi, L., “Geometry of dynamical systems and topological stability: from bifurcations, chaos and fractals to dynamics in the natural and life sciences”, International Journal of Bifurcation and Chaos, 2007 (forthcoming). [24] Boi, L., “Interfaces between geometry, dynamics and biology: from molecular topology to the chromosome organization”, in New Trends in Geometry, and Its Role in the Natural and Living Sciences, L. Boi, C. Bartocci, C. Sinigaglia (Eds.), Elsevier, London, 2008 (forthcoming). [25] Boi, L., “Les formes vivantes: de la philosophie à la biologie”, in Vaysse J.-M. (dir.), Vie, Monde, Individuation, Georg Olms Verlag, Hildesheim, 2003, 159-170. [26] Boi, L., “Limites du réductionnisme et nouvelles approches dans l’étude des phénomènes naturels et des systèmes vivants”, in La Fabrication du Psychisme, S. Mancini (ed.), Editions La Découverte, Paris, 2006, 207-240. [27] Boi, L. (ed.), Symétries, Brisures de Symétries et Complexité, en Mathématiques, Physique e Biologie, Peter Lang, Bern, 2005. [28] Boi, L., “Sur qualques propriétés géométriques globales des systèmes vivants”, Bulletin d’Histoire et d’Epistemologie des Sciences de la Vie, 14 (2007), 71-113. [29] Boi, L., “Topological Knots Models in Physics and Biology: mathematical ideas for explaining inanimate and living matter”, in Geometries of Nature, Living Systems and Human Cognition, L. Boi (ed.), World Scientific, Singapore, 203-278. [30] Boi, L., “When topology meets biology ‘for life’: Interdisciplinary remarks on the way in which form modulates function”, (first invited lecture in the Program “Mathematics and Biology”, Interdisciplinary Laboratory for Advanced Studies, March 2, 2007), Preprint SISSA/ISAS, Trieste, p. 50. [31] Boi, L., “Le retournement de la sphere et la gastrulation: un modèle topologique de la gastrulation en embryogenèse”, Journal of Theoretical Biology, 2008 (to appear). [32] Brenner, C. et al., “Myc represses transcription through recruitment of DNA methyltransferase co-repressor”, EMBO J., 24 (2005), 336-346. [33] Calladine, C.R. et al., Understanding DNA. The molecule and how it woks, Elsevier, London, 2004. [34] Callinan, P.A. and A.P. Feinberg, “The emerging science of epigenomics”, Hum. Mol. Genet., 15 (2006), 95-101. [35] Cech, T. R., “A model for the RNA-catalyzed replication of RNA”, Proc. Natl. Acad. Sci. USA, 83 (1986), 360-367. [36] Chauvet, G., The mathematical nature of the living world, World Scientific, Singapore, 2004. [37] Choo, Y., Sánchez-García, I. and A. Klug, “In vivo repression by a site-specific DNAbinding protein designed against an oncogenic sequence”, Nature, 372 (2002), 642-645. [39] Coen, E., The Art of Genes. How Organisms Make Themselves, Oxford University Press, New York, 1999. [40] Crick, F.H.C., “Diffusion in embryogenesis”, Nature, 225 (1970), 420-422. 71

[41] Collier, J., “Information Increase in Biological Systems: How does Adaptation Fit?”, in Evolutionary Systems, G. van der Vijver, S.N. Salthe and M. Delpos (eds.), Kluwer, Dordrecht, 1998, 129-140. [42] Cornish-Bowden, A., Cárdenas, M.L., “Complex networks of interactions connect genes to phenotypes”, Trends Biochem. Sci., 26 (2001), 463-465. [43] Costa, S. and P. Shaw, “Chromatin organization and cell fate switch respond to positional information in Arabidopsis”, Nature, 439 (2006), 493-496. [44] Cozzarelli, N.R. and V.F. Holmes, “Closing the ring: Links between SMC proteins and chromosome portioning, condensation, and supercoiling”, Proc. Natl. Acad. Sci. USA, 97(4), 2000, 1322-1324. [45] Cremer, T. et al., “Higher order chromatin architecture in the cell nucleus: on the way from structure to function”, Biol. Cell, 96 (2004), 555-567. [46] Cremer, T., Cremer, C., “Chromosome territories, nuclear architecture and gene regulation in mammalian cells”, Nat. Rev. Genet., 2(4), 2001, 292-301. [47] Crick, F.H.C., Barnett, S., Brenner, S. and R.S. Watts-Tobin, “General Nature of the Genetic Code”, Nature, 192 (1961), 1227-1232. [48] Davidson, E.H., et al., “A Genomic Regulatory Network for Development”, Science, 295 (2002), 1669-1678. [49] De Witt, E., Greil, F., and B. van Steensel, “Genome-wide HP1 binding in Drosophila: Developmental plasticity and genomic targeting signals”, Genome Res., 15 (2005), 12651273. [50] Del Re, G., “Organization, Information, Autopoiesis: From Molecules to Life”, in The Emergence of Complexity in Mathematics, Physics, Chemistry, and Biology, B. Pullman (Ed.), Pontificiae Academiae Scientiarum/Princeton University Press, 1996, 276-293. [51] Dubnau, J. and G. Struhl, “RNA recognition and translational regulation by a homoeodomain protein”, Nature, 379 (1996), 694-699. [52] Dyson, F., Origins of Life, Cambridge University Press, Cambridge, 1985. [53] Edelman, G. M., Topobiology. An Introduction to Molecular Embryology, Basic Books, New York, 1988. [54] Ehrenhofer-Murray, A.E., “Chromatin dynamics at DNA replication, transcription and repair”, Eur. J. Biochem., 271 (2004), 2335-2349. [55] Eichler, E.E. and D. Sankoff, “Structural Dynamics of Eukaryotic Chromosome Evolution”, Science, 301 (2003), 793-797. [56] Eigen, M., et al., “The origin of genetic information”, Sci. Am., 244(4), 1981, 88-92. [57] Elgin, S. C. R. and L. Workman, Chromatin Structure and Gene Expression, Oxford University Press, Oxford, 2000. [58] Esteller, M. (Ed.), DNA Methylation. Approaches, Methods and Applications, CRC Press, 2004.


[59] Esteller, M. and Almouzni, G., “How epigenetics integrates nuclear functions”, Workshop on epigenetics and chromatin: transcriptional regulation and beyond”, EMBO Rep., 6 (2005), 624-628. [60] Esteller, M., “Aberrant DNA methylation as a cancer-inducing mechanism”, Annu. Rev. Pharmacol. Toxicol., 45 (2005), 629-656. [61] Fan, Y. et al., “Histone H1 Depletion in Mammals Alters Global Chromatin Structure but Causes Specific Changes in Gene Regulation”, Cell, 123 (2005), 1199-1212. [62] Farge, E., “Mechanical Induction of Twist in the Drosophila Foregut/Stomodeal Primordium”, Current Biology, 13 (2003), 1365-1377. [63] Felsenfeld, G. and M. Groudine, “Controlling the double helix”, Nature, 421 (2003), 448-453. [64] Felsenfeld, G., “Chromatin: an essential part of transcriptional apparatus”, Nature (London), 421(355), 1992, 219-223. [65] Fraga, M. F., Ballestar, E., Paz, M. F. et al., “Epigenetic differences arise during the lifetime of monozygotic twins”, Proc. Natl. Acad. Sci. USA, 102 (2005), 10604-10609. [66] Fraga, M. F., Ballestar, E., Villar-Garea, A. et al., “Loss of acetylation at Lys16 and trimethylation at Lys20 of histone H4 is a common hallmark of human cancer”, Nat Genet., 37 (2005), 391-400. [67] Garcia-Bellido, A., Ripoll, P., Morata, G., “Developmental compartmentalization in the wing disc of drosophila”, Nat. New Biol., 245 (1973), 251-253. [68] Gasser, S., “Visualizing chromatin dynamics in interphase nuclei”, Science, 296 (2002), 1412-1416. [69] Gehring, W. J., Master Control Genes in Development and Evolution, Yale University Press, New Haven, 1998. [70] Georgel, P.T., “Chromatin structure of eukaryotic promoters: A changing perspective”, Biochem. Cell Biol., 80 (2002), 295-300. [71] Gierer, A., “Holistic Biology – Back on Stage? Comments on post-genomics in historical perspective”, Philosophia Naturalis, 39 (2002), 25-44. [72] Gilbert, N., Bickmore, W., “The relationship between higher-order chromatin structure and transcription”, Biochem. Soc. Symp., 73 (2006), 59-66. [73] Gilbert, S. F., Developmental Biology, sixth edition, Sinauer Associates, Inc., Sunderland, MA, 2000. [74] Goodwin, B., How the Leopard Changed Its Spots, Weidenfeld & Nicolson, London, 1994. [75] Görisch, S.M. et al., “Nuclear body movement is determined by chromatin accessibility and dynamics”, Proc. Natl. Acad. Sci. USA, 101 (2004), 13221-13226. [76] Görisch, S.M., Lichter, P., Rippe, K., “Mobility of multi-subunit complexes in the nucleus: accessibility and dynamics in chromatin subcompartments”, Histochem. Cell Biol., 123 (2005), 217-228.


[77] Grewal, S.I.S., Moazed, D., “Heterochromatin and Epigenetic Control of Gene Expression”, Science, 301 (2003), 798-802. [78] Grimaud, C., Negre, N. and G. Cavalli, “From genetics to epigenetics: the tale of Polycomb group and trithorax group genes”, Chromosome Res., 14 (2006), 363-375. [79] Hill, D.A., “Influence of linker Histone H1 on chromatin remodeling”, Biochem. Cell Biol., 79 (2001), 317-324. [80] Holliday, R., “The Inheritance of Epigenetic Defects”, Science, 238 (1987), 163-170. [81] Jacob, F. and J. Monod, “On the Regulation of Gene Activity”, C.S.H. Symp. Quant. Biol., 26 (1961), 193-211. [82] Jacob, F., La logique du vivant, Gallimard, Paris, 1970. [83] Jaenisch, R., Bird, A., “Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals”, Nat Genet., 33 (2003), 245-254. [84] Jaeger, J. et al., “Dynamic control of positional information in the early Drosophila embryo”, Nature, 430(6997), 368-371. [85] Jenuwein, T. and Allis, C. D., “Translating the Histone Code”, Science, 293 (2001), 1074-1080. [87] Jenuwein, T., “The epigenetic magic of histone lysine methylation”, FEBS J., 273(14), 2006, 3121-3135. [88] Jin, J., et al., “In and out: histone variant exchange in chromatin”, Trends Biochem. Sci., 30(12), 2005, 680-687. [89] John, B., and G.L.G. Miklos, The eukaryote genome in development and evolution, Allen & Unwin, London, 1988. [90] Jones, P. A. and Takai, D., “The Role of DNA Methylation in Mammalian Epigenetics”, Science, 293 (2001), 1068-1070. [91] Jost, J., “On the notion of complexity”, Theory in Biosciences, 117 (1998), 161-171. [92] Jost, J., External and internal complexity of complex adaptive systems, Theory in Biosciences, Vol. 123, Issue 1 (2004), 69-88. [93] Karp, G., Cell and Molecular Biology. Concepts and Experiments, third edition, John Wiley & Sons, Inc., New York, 2002. [94] Karsenti, E., “Self-organization processes in living matter”, Interdisciplinary Science Reviews, 32(6), 2007, pp. 21-38. [95] Kauffmann, S., The Origins of Order. Self-Organization and Selection in Evolution, Oxford University Press, New York, 1993. [96] Kepes, F. and Vaillant, C., “Transcription-based solenoidal model of chromosomes”, Complexus, 1 (2003), 171-180. [97] Kireeva, N., Lakonishok, M., Kireev, I., Hirano, T., and A.S. Belmont, “Visualization of early chromosome condensation: a hierarchical folding, axial glue model of chromosome structure”, J. Cell Biol., 166(6), 2004, 775-785. [98] Kirschner, M.W., “The meaning of system biology”, Cell, 121(4), 2005, 503-504. [99] Kitano, H. “Systems biology: a brief overview”, Science, 295 (2002), 1662-1664. 74

[100] Klose, R.J. and Bird, A.P., “Genomic DNA Methylation: the mark and its mediators”, Trends in Biochem. Sci., 31 (2006), 89-97. [101] Klug, A., “Macromolecular order in biology”, Phil. Trans. R. Soc. Lond. A, 348 (1994), 167-178. [102] Lambert, D. and Rezsöhazy, R., Comment les pattes viennent au serpent. Essai sur l’etonnante plasticité du vivant, Frammarion, Paris, 2004. [103] Lesne, A., “The chromatin regulatory code: beyond an histone code”, in Proceedings Modelling and Simulation of Biological Processes in the Context of Genomics, P. Amar, F. Kepes, V. Norris, and P. Tracqui (eds.), Platypus Press, 2003, 1-4. [104] Lewontin, R., The Triple Helix. Organisms and Environment, Harvard University Press, Cambridge, MA, 2000. [105] Li, E., “ Chromatin modification and epigenetic reprogramming in mammalian development”, Nat. Rev. Genet., 3 (2002), 662-673. [106] Lodish, H., et al., Molecular Cell Biology, fourth edition, W. H. Freeman and Co., New York, 2000. [107] Lopez, A. J., “Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation”, Ann. Rev. Genet., 32 (1998), 279-305. [108] Lutter, L.C., Judis, L., and R.F. Paretti, “Effects oh Histone Acetylation on Chromatin Topology In Vivo”, Mol. Cell. Biol., 12 (1992), 5004-5014. [109] Mahy, N. L. et al., “Localisation of active and inactive genes, and non-coding DNA, within chromosome territories”, J. Cell Biol., 159 (2002), 579-589. [110] Mahy, N.L., Perry, P.E., Gilchrist, S., Baldock, R.A., Bickmore, W.A., “Spatial organization of active and inactive genes and noncoding DNA within chromosome territories”, J. Cell Biol., 157 (2002), 579-589. [111] Maynard Smith, J., “The Concept of Information in Biology”, Philosophy of Science, 67 (2000), 177-194. [112] Maynard Smith, J., Shaping Life: Genes, Embryos and Evolution, Weidenfeld & Nicolson, London, 1998. [113] McAdams, H. H. and Shapiro, L., “A Bacterial Cell-Cycle Regulatory Network Operating in Time and Space”, Science, 301 (2003), 1874-1877. [114] McClintock, B., “The significance and responses of the genome to challenge”, Science, 226 (1984), 792-801. [115] Haswell, E.S. and Meyerowitz, E.M., “MscS-like proteins control plastid size and shape in Arabidopsis thaliana”, Curr. Biol., 16 (2006), 1-11. [116] Mistelli, T., “Protein dynamics: implications for nuclear architecture and gene expression”, Science, 291 (2001), 843-847. [117] Mistelli, T., “The concept of self-organization in cellular architecture”, The Journal of Cell Biology, 155(2), 2001, 181-185. [118] Morange, M., “Post-genomic, between reduction and emergence”, Synthese, 151 (2006), 355-360. 75

[119] Morange, M., “The Relations between Genetics and Epigenetics”, Ann. N.Y. Acad. Sci., 981 (2002), 50-60. [120] Morange, M., The Misunderstood Gene, Harvard University Press, Cambridge, MA, 2001. [121] Nakayama, J.-J. et al., “Role of Histone H3 Lysine 9 Methylation in Epigenetic Control of Heterochromatin Assembly”, Science, 292 (2007), 110-113. [122] Nicolis, G. and Prigogine, I., Exploring Complexity, Piper, Munich, 1987. [123] Noble, D., The Music of Life. Biology Beyond the Genome, Oxford University Press, Oxford, 2006. [124] Pâques, F. and T. Grange, “Artchitecture du noyau et regulation transcriptionnelle”, Médecine/Sciences, 18 (2002), 1245-1256. [125] Pennisi, E., “Behind the Scenes of Gene Expression”, Science, 293 (2001), 1064-1067. [126] Prohaska, S.J., Mosig, A. and P. F. Stadler, “Regulatory Signals in Genomic Sequences”, in Networks: From Biology to Theory, J. Feng, J. Jost, M. Qian (Eds.), Springer, New York, 2007, 191-220. [127] Raff, R. A., The Shape of Life: Genes, Development, and the Evolution of Animal Form, Chicago University Press, Chicago, 2004. [128] Reik, W., Dean, W., Walter, J., “Epigenetic Reprogramming in Mammalian Development”, Science, 293 (2001), 1089-1093. [129] Remaut, H. and Waksman, G., “Protein–protein interaction through ß-strand addition”, Trends Biochem. Sci., 31(8), 2006, 436-444. [130] Ricca, R.L., “Structural Complexity”, in Encyclopedia of Nonlinear Science (ed. A. Scott), Routledge, New York & London, 2005, 885-887. [131] Richards, E.J. and S.C.R. Elgin, “Epigenetic Codes for Heterochromatin Formation and Silencing: Rounding up the Usual Suspects”, Cell, 108 (2002), 489-500. [132] Richmond, T.J. and J. Widom, “Nucleosome and chromatin structure”, in Chromatin Structure and Gene Expression, S. C. R. Elgin and L. Workman (eds.), Oxford University Press, Oxford, 2000, 1-23. [133] Ridgway, P. and Almouzni, G., “Chromatin assembly and organization”, J. Cell Sci., 114 (2001), 2711-2722. [134] Roca, J., “The mechanisms of DNA topoisomerases”, Trends Biochem. Sci., 20 (1995), 156-160. [135] Rosen, R., “Organisms as causal systems, which are not mechanisms: an essay into the nature of complexity”, in Theoretical Biology and Complexity. Three Essays on Natural Philosophy of Complex Systems, R. Rosen (ed.), Academic Press, Orlando, FL, 1985, 165203. [136] Rupp, R.A.V. and Becker, P.B., “Gene Regulation by Histone H1: New Links to DNA Methylation”, Cell, 123 (2005), 1178-1179.


[137] Santoro, R. and Grummt, I., “Epigenetic mechanism of rRNA gene silencing: temporal order of NoRC-mediated Histone modification, chromatin remodeling, and DNA Methylation”, Mol. Cell. Biol., 25 (2005), 2539-2546. [138] Sapp, J., Beyond the Gene: Cytoplasmic Inheritance and the Struggle for Authority in Genetics, Oxford University Press, Oxford, 1987. [139] Sarkar, S., “Biological Information: A Skeptical Look at Some Central Dogmas of Molecular Biology”, in S. Sarkar (ed.), The Philosophy and History of Molecular Biology. New Perspectives, Kluwer, Dordrecht, 1996, 187-231. [140] Sarkar, S., Genetics and Reductionism, Cambridge University Press, Cambridge, 1998. [141] Scherrer, K. and Jost, J., “Gene and genon concept: coding versus regulation. A conceptual and information-theoretic analysis of genetic storage and expression in the light of modern molecular biology”, 2007, Theory in Biosciences, to appear. [142] Schrödinger, E., What is Life?, Cambridge University Press, Cambridge, 1944. [143] Schuster, P., Stadler, P. F., “Modeling Conformational Flexibility and Evolution of Structure: RNA as an Example”, in Structural Approaches to Sequence Evolution. Molecules, Networks, Populations, U. Bastolla, M. Porto, H. E. Roman and M. Vendruscolo (Eds.), Springer, Berlin & Heidelberg, 2007, 3-36. [144] Scott, M.P., and P.H. O’Farrell, “Spatial programming of gene expression in early Drosophila embryogenesis”, Annu. Rev. Cell Biol., 2 (1986), 49-80. [145] Shiokawa, K. et al., “The developing Xenopus embryo as a complex system: Maternal and zygotic contribution of gene products in nucleo-cytoplasmic and cell-to-cell interactions”, in Complexity and Diversity, K. Kudo, O. Yamakawa, Y. Tamagawa (Eds.), Springer-Verlag, Tokyo, 1997, 154-162. [146] Smet-Nocca, C., Paldi, A. and A. Benecke, “De l’épigénomique à l’émergence morphogénétique”, in Morphogenèse: l’origine des formes, P. Bourgine and A. Lesne (eds.), Belin, Paris, 2006, 153-178. [147] Solé, R. and B. Goodwin, Signs of Life: How Complexity Pervades Biology, Basic Books, New York, 2000. [148] Stewart, I., Life’s Other Secrets: The New Mathematics of the Living World, Allen Lane, London, 1998. [149] Strohman, R.C., “Epigenesis and complexity. The coming Kuhnian revolution in biology”, Nature Biotechnology, 15 (1997), 194-200. [150] Summers, D. W., “Lifting the certain: using the topology to probe the hidden action of enzymes”, Notices of the American Mathematical Society, 42(5), 1995, 528-537. [151] Tautz, D., “Redundancies, development and the flow of information”, BioEssays, 14 (1992), 263-266. [151] The Limits of Reductionism in Biology, by Novartis Foundation Symposium, John Wiley, New York, 1998. [152] Thom, R., Structural Stability and Morphogenesis, Benjamin, New York, 1972.


[153] Van Regenmortel, M.H.V., “Reductionism and the search for structure-function relationships in antibody molecules”, J. Mol. Recognit., 15 (2002), 240-247. [154] Verschure, P.J., van Der Kraan, I., Manders, E.M., van Driel, R., “Spatial relationship between transcription sites and chromosome territories”, J. Cell Biol., 147 (1999), 13-24. [155] Vogelauer, M., Wu, J., Suka, N., and M. Grunstein, “Global histone acetylation and deacetylation in yeast”, Nature, 408 (2000), 495-498. [156] Waddington, C.H., “The Basic Ideas of Biology”, in Towards a Theoretical Biology. 1. Prolegomena, C.H. Waddington (ed.), Aldine Publishing Company, Chicago, 1968. [157] Waddington, C.H., The Strategy of Genes, Allen & Unwin, London, 1957. [158] Watson, J. D., Crick, F.H.C., “A Structure for Deoxyribose Nucleic Acid”, Nature, 171 (1953), 737-738. [159] Weng, G., Bhalla, U.S., Iyengar, R., “Complexity in Biological Signaling Systems”, Science, 284 (1999), 92-96. [160] Westhof, E., Dujon, B., “RNA folding: beyond Watson-Crick pairs”, Structure Fold Des., 8 (2000), 55-65. [161] White, J. H., “Geometry and Topology of DNA and DNA–protein interactions”, in New Scientific Applications of Geometry and Topology, De Witt L. Summers (ed.), Proceedings of Symposia in Applied Mathematics, Vol. 45, Amer. Math. Soc., 1992, 17-37. [162] Widom, J., “Structure, Dynamics, and Function of Chromatin in Vitro”, Annu. Rev. Biophys. Biomol. Struct., 27 (1998), 285-327. [163] Wolffe, A. P., “Chromatin Structure and DNA replication: Implications for Transcriptional Activity”, in Concepts in Eukaryotic DNA Replication, M. L. DePamphilis (Ed.), Cold Spring Harbor Laboratory Press, New York, 1999, 271-293. [164] Wolffe, A. P., Chromatin. Structure and Function, Academic Press, London, 1998. [165] Wolffe, A.P., and J.C. Hansen, “Nuclear visions: functional flexibility from structural instability”, Cell, 104 (2001), 631-634. [166] Wolpert, L. “Positional information and pattern formation”, Curr. Top. Dev. Biol., 6 (1971), 183-224. [167] Wolpert, L., “Positional information and patterns formation in development”, Dev. Genet., 15(6), 1994, 485-490. [168] Wu, C., “Chromatin Remodeling and the Control of Gene Expression”, J. Biol. Chem., 272 (1997), 28171-28174. [169] Yoshikava, K., “Complexity in a Molecular String: Hierarchical Structure as Is Exemplified in a DNA Chain”, in Complexity and Diversity, K. Kudo, O. Yamakawa, Y. Tamagawa (Eds.), Springer-Verlag, Tokyo, 1997, 81-90.


Plasticity and Complexity in Biology  

Topological Organization, Regulatory Proteins Networks and Mechanisms of Genetic Expression Toward New Vistas in the Life Sciences / LUCIANO...

Plasticity and Complexity in Biology  

Topological Organization, Regulatory Proteins Networks and Mechanisms of Genetic Expression Toward New Vistas in the Life Sciences / LUCIANO...